[GitHub] [hudi] garyli1019 closed pull request #2783: [DOCS]Add docs for 0.8.0 release
garyli1019 closed pull request #2783: URL: https://github.com/apache/hudi/pull/2783 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] garyli1019 commented on pull request #2783: [DOCS]Add docs for 0.8.0 release
garyli1019 commented on pull request #2783: URL: https://github.com/apache/hudi/pull/2783#issuecomment-815496613 closing pr for now, will reopen once fixed
[jira] [Updated] (HUDI-1716) rt view w/ MOR tables fails after schema evolution
[ https://issues.apache.org/jira/browse/HUDI-1716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aditya Tiwari updated HUDI-1716:
--------------------------------
    Status: In Progress  (was: Open)

> rt view w/ MOR tables fails after schema evolution
> --------------------------------------------------
>
>                 Key: HUDI-1716
>                 URL: https://issues.apache.org/jira/browse/HUDI-1716
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: Storage Management
>            Reporter: sivabalan narayanan
>            Assignee: Aditya Tiwari
>            Priority: Major
>              Labels: pull-request-available, sev:critical, user-support-issues
>             Fix For: 0.9.0
>
> The realtime view of an MOR table fails if the schema in an existing log file has been evolved to add a new field. Writing succeeds, but reading fails.
> More info: https://github.com/apache/hudi/issues/2675
>
> Gist of the stack trace:
> Caused by: org.apache.avro.AvroTypeException: Found hoodie.hudi_trips_cow.hudi_trips_cow_record, expecting hoodie.hudi_trips_cow.hudi_trips_cow_record, missing required field evolvedField
>   at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:292)
>   at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
>   at org.apache.avro.io.ResolvingDecoder.readFieldOrder(ResolvingDecoder.java:130)
>   at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:215)
>   at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:175)
>   at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153)
>   at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:145)
>   at org.apache.hudi.common.table.log.block.HoodieAvroDataBlock.deserializeRecords(HoodieAvroDataBlock.java:165)
>   at org.apache.hudi.common.table.log.block.HoodieDataBlock.createRecordsFromContentBytes(HoodieDataBlock.java:128)
>   at org.apache.hudi.common.table.log.block.HoodieDataBlock.getRecords(HoodieDataBlock.java:106)
>   at org.apache.hudi.common.table.log.AbstractHoodieLogRecordScanner.processDataBlock(AbstractHoodieLogRecordScanner.java:289)
>   at org.apache.hudi.common.table.log.AbstractHoodieLogRecordScanner.processQueuedBlocksForInstant(AbstractHoodieLogRecordScanner.java:324)
>   at org.apache.hudi.common.table.log.AbstractHoodieLogRecordScanner.scan(AbstractHoodieLogRecordScanner.java:252)
>   ... 24 more
> 21/03/25 11:27:03 WARN TaskSetManager: Lost task 0.0 in stage 83.0 (TID 667, sivabala-c02xg219jgh6.attlocal.net, executor driver): org.apache.hudi.exception.HoodieException: Exception when reading log file
>   at org.apache.hudi.common.table.log.AbstractHoodieLogRecordScanner.scan(AbstractHoodieLogRecordScanner.java:261)
>   at org.apache.hudi.common.table.log.HoodieMergedLogRecordScanner.performScan(HoodieMergedLogRecordScanner.java:100)
>   at org.apache.hudi.common.table.log.HoodieMergedLogRecordScanner.<init>(HoodieMergedLogRecordScanner.java:93)
>   at org.apache.hudi.common.table.log.HoodieMergedLogRecordScanner.<init>(HoodieMergedLogRecordScanner.java:75)
>   at org.apache.hudi.common.table.log.HoodieMergedLogRecordScanner$Builder.build(HoodieMergedLogRecordScanner.java:230)
>   at org.apache.hudi.HoodieMergeOnReadRDD$.scanLog(HoodieMergeOnReadRDD.scala:328)
>   at org.apache.hudi.HoodieMergeOnReadRDD$$anon$3.<init>(HoodieMergeOnReadRDD.scala:210)
>   at org.apache.hudi.HoodieMergeOnReadRDD.payloadCombineFileIterator(HoodieMergeOnReadRDD.scala:200)
>   at org.apache.hudi.HoodieMergeOnReadRDD.compute(HoodieMergeOnReadRDD.scala:77)
>
> Logs from local run: https://gist.github.com/nsivabalan/656956ab313676617d84002ef8942198
> Diff with which the above logs were generated: https://gist.github.com/nsivabalan/84dad29bc1ab567ebb6ee8c63b3969ec
>
> Steps to reproduce in spark shell:
> 1. Create an MOR table with schema1.
> 2. Ingest (with schema1) until log files are created; verify via hudi-cli. It took me two batches of updates to see a log file.
> 3. Create a new schema2 with one additional field; ingest a batch with schema2 that updates existing records.
> 4. Read the entire dataset.

-- 
This message was sent by Atlassian Jira
(v8.3.4#803005)
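The "missing required field evolvedField" error in the trace above comes from Avro schema resolution: when records written with the old schema are decoded against the evolved reader schema, any field present only in the reader schema must declare a default, or ResolvingDecoder fails. A minimal sketch of a backward-compatible evolved schema (record name taken from the trace; the other field names are illustrative, not from the issue):

```
{
  "type": "record",
  "name": "hudi_trips_cow_record",
  "namespace": "hoodie.hudi_trips_cow",
  "fields": [
    {"name": "uuid", "type": "string"},
    {"name": "fare", "type": "double"},
    {"name": "evolvedField", "type": ["null", "string"], "default": null}
  ]
}
```

With the nullable union and `"default": null` declared, Avro can fill in `evolvedField` when it is absent from older log blocks; without a default, decoding old data with the new schema fails exactly as shown in the stack trace.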
[GitHub] [hudi] jintaoguan commented on a change in pull request #2773: [HUDI-1764] Add Hudi-CLI support for clustering
jintaoguan commented on a change in pull request #2773:
URL: https://github.com/apache/hudi/pull/2773#discussion_r609351211

## File path: hudi-cli/src/main/java/org/apache/hudi/cli/commands/ClusteringCommand.java

@@ -0,0 +1,102 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.cli.commands;
+
+import org.apache.hudi.cli.HoodieCLI;
+import org.apache.hudi.cli.commands.SparkMain.SparkCommand;
+import org.apache.hudi.cli.utils.InputStreamConsumer;
+import org.apache.hudi.cli.utils.SparkUtil;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.timeline.HoodieActiveTimeline;
+import org.apache.hudi.utilities.UtilHelpers;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+import org.apache.spark.launcher.SparkLauncher;
+import org.apache.spark.util.Utils;
+import org.springframework.shell.core.CommandMarker;
+import org.springframework.shell.core.annotation.CliCommand;
+import org.springframework.shell.core.annotation.CliOption;
+import org.springframework.stereotype.Component;
+import scala.collection.JavaConverters;
+
+@Component
+public class ClusteringCommand implements CommandMarker {
+
+  private static final Logger LOG = LogManager.getLogger(ClusteringCommand.class);
+
+  @CliCommand(value = "clustering schedule", help = "Schedule Clustering")
+  public String scheduleClustering(
+      @CliOption(key = "sparkMemory", help = "Spark executor memory",
+          unspecifiedDefaultValue = "1G") final String sparkMemory,
+      @CliOption(key = "propsFilePath", help = "path to properties file on localfs or dfs with configurations for hoodie client for clustering",
+          unspecifiedDefaultValue = "") final String propsFilePath,
+      @CliOption(key = "hoodieConfigs", help = "Any configuration that can be set in the properties file can be passed here in the form of an array",
+          unspecifiedDefaultValue = "") final String[] configs) throws Exception {
+    HoodieTableMetaClient client = HoodieCLI.getTableMetaClient();

Review comment: Good catch! Thanks.
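Based on the `@CliOption` keys shown in the diff above (`sparkMemory`, `propsFilePath`, `hoodieConfigs`), invoking the new command from the hudi-cli shell would look roughly like this; the table path, properties file, and config value are illustrative, not from the PR:

```
hudi-cli> connect --path /tmp/hoodie/my_table
hudi-cli> clustering schedule --sparkMemory 2G \
    --propsFilePath file:///tmp/clusteringjob.properties \
    --hoodieConfigs hoodie.clustering.plan.strategy.max.num.groups=30
```

Per the option defaults in the code, all three flags are optional: `sparkMemory` falls back to `1G`, and the properties file and inline configs default to empty.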
[GitHub] [hudi] n3nash commented on pull request #2388: [HUDI-1353] add incremental timeline support for pending clustering ops
n3nash commented on pull request #2388: URL: https://github.com/apache/hudi/pull/2388#issuecomment-815488402 @satishkotha gentle reminder
[GitHub] [hudi] codecov-io edited a comment on pull request #2785: [HUDI-1775] Add option for compaction parallelism
codecov-io edited a comment on pull request #2785:
URL: https://github.com/apache/hudi/pull/2785#issuecomment-815400787

# [Codecov](https://codecov.io/gh/apache/hudi/pull/2785?src=pr&el=h1) Report
> Merging [#2785](https://codecov.io/gh/apache/hudi/pull/2785?src=pr&el=desc) (a53b11e) into [master](https://codecov.io/gh/apache/hudi/commit/3a926aacf6552fc06005db4a7880a233db904330?el=desc) (3a926aa) will **increase** coverage by `0.01%`.
> The diff coverage is `94.11%`.

[![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/2785/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2785?src=pr&el=tree)

```diff
@@             Coverage Diff              @@
##             master    #2785      +/-   ##
============================================
+ Coverage     47.05%   47.07%   +0.01%
- Complexity     3357     3359       +2
  Files           484      484
  Lines         23094    23107      +13
  Branches       2456     2457       +1
+ Hits          10868    10878      +10
- Misses        11280    11282       +2
- Partials        946      947       +1
```

| Flag | Coverage Δ | Complexity Δ | |
|---|---|---|---|
| hudicli | `36.94% <ø> (ø)` | `0.00 <ø> (ø)` | |
| hudiclient | `∅ <ø> (∅)` | `0.00 <ø> (ø)` | |
| hudicommon | `50.77% <ø> (-0.02%)` | `0.00 <ø> (ø)` | |
| hudiflink | `56.71% <94.11%> (+0.12%)` | `0.00 <0.00> (ø)` | |
| hudihadoopmr | `33.44% <ø> (ø)` | `0.00 <ø> (ø)` | |
| hudisparkdatasource | `71.33% <ø> (ø)` | `0.00 <ø> (ø)` | |
| hudisync | `45.47% <ø> (ø)` | `0.00 <ø> (ø)` | |
| huditimelineservice | `64.36% <ø> (ø)` | `0.00 <ø> (ø)` | |
| hudiutilities | `9.37% <ø> (ø)` | `0.00 <ø> (ø)` | |

Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more.

| [Impacted Files](https://codecov.io/gh/apache/hudi/pull/2785?src=pr&el=tree) | Coverage Δ | Complexity Δ | |
|---|---|---|---|
| [...in/java/org/apache/hudi/table/HoodieTableSink.java](https://codecov.io/gh/apache/hudi/pull/2785/diff?src=pr&el=tree#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS90YWJsZS9Ib29kaWVUYWJsZVNpbmsuamF2YQ==) | `11.90% <0.00%> (-0.30%)` | `2.00 <0.00> (ø)` | |
| [...va/org/apache/hudi/configuration/FlinkOptions.java](https://codecov.io/gh/apache/hudi/pull/2785/diff?src=pr&el=tree#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9jb25maWd1cmF0aW9uL0ZsaW5rT3B0aW9ucy5qYXZh) | `89.42% <100.00%> (+0.35%)` | `11.00 <0.00> (ø)` | |
| [...he/hudi/sink/partitioner/BucketAssignFunction.java](https://codecov.io/gh/apache/hudi/pull/2785/diff?src=pr&el=tree#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9zaW5rL3BhcnRpdGlvbmVyL0J1Y2tldEFzc2lnbkZ1bmN0aW9uLmphdmE=) | `88.00% <100.00%> (+0.32%)` | `17.00 <0.00> (+2.00)` | |
| [...e/hudi/common/table/log/HoodieLogFormatWriter.java](https://codecov.io/gh/apache/hudi/pull/2785/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL2xvZy9Ib29kaWVMb2dGb3JtYXRXcml0ZXIuamF2YQ==) | `78.12% <0.00%> (-1.57%)` | `26.00% <0.00%> (ø%)` | |
[GitHub] [hudi] n3nash commented on issue #2743: Do we have any TTL mechanism in Hudi?
n3nash commented on issue #2743: URL: https://github.com/apache/hudi/issues/2743#issuecomment-815473655 @aditiwari01 Here is the ticket, and it is assigned to you for now :) BTW, there is some relevant work happening here: https://github.com/apache/hudi/pull/2452. Please comment on the PR for further changes.
[GitHub] [hudi] n3nash closed issue #2743: Do we have any TTL mechanism in Hudi?
n3nash closed issue #2743: URL: https://github.com/apache/hudi/issues/2743
[jira] [Updated] (HUDI-1777) Add SparkDatasource support for delete_partition API
[ https://issues.apache.org/jira/browse/HUDI-1777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nishith Agarwal updated HUDI-1777: -- Labels: feature-request sev:normal (was: ) > Add SparkDatasource support for delete_partition API > > > Key: HUDI-1777 > URL: https://issues.apache.org/jira/browse/HUDI-1777 > Project: Apache Hudi > Issue Type: Improvement > Components: Writer Core >Reporter: Nishith Agarwal >Assignee: Aditya Tiwari >Priority: Major > Labels: feature-request, sev:normal > > The `delete_partition` API is supported through the hoodie write client but > not through spark datasource; this ticket tracks the effort to add support > there. > See [https://github.com/apache/hudi/pull/2452] for more details.
[jira] [Created] (HUDI-1777) Add SparkDatasource support for delete_partition API
Nishith Agarwal created HUDI-1777: - Summary: Add SparkDatasource support for delete_partition API Key: HUDI-1777 URL: https://issues.apache.org/jira/browse/HUDI-1777 Project: Apache Hudi Issue Type: Improvement Components: Writer Core Reporter: Nishith Agarwal Assignee: Aditya Tiwari The `delete_partition` API is supported through the hoodie write client but not through spark datasource; this ticket tracks the effort to add support there. See [https://github.com/apache/hudi/pull/2452] for more details.
[GitHub] [hudi] n3nash commented on issue #2623: org.apache.hudi.exception.HoodieDependentSystemUnavailableException:System HBASE unavailable.
n3nash commented on issue #2623: URL: https://github.com/apache/hudi/issues/2623#issuecomment-815470053 @root18039532923 Let me know if your issue was resolved after backporting that PR.
[GitHub] [hudi] danny0405 closed pull request #2785: [HUDI-1775] Add option for compaction parallelism
danny0405 closed pull request #2785: URL: https://github.com/apache/hudi/pull/2785
[GitHub] [hudi] n3nash commented on issue #2680: [SUPPORT]Hive sync error by using run_sync_tool.sh
n3nash commented on issue #2680: URL: https://github.com/apache/hudi/issues/2680#issuecomment-815465310 @ztcheck What changes did you make to `run_sync_tool.sh`? Can you list the jars you added to the classpath? It seems like some of the classes should be packaged in the `hudi-hive-sync-bundle` but are not. Once you provide the packages you added to your classpath, we can see how to add those to the bundle.
[GitHub] [hudi] n3nash commented on issue #2692: [SUPPORT] Corrupt Blocks in Google Cloud Storage
n3nash commented on issue #2692: URL: https://github.com/apache/hudi/issues/2692#issuecomment-815463131 @stackfun Can you respond to @vburenin's question? We can try to go from there.
[GitHub] [hudi] n3nash edited a comment on issue #2692: [SUPPORT] Corrupt Blocks in Google Cloud Storage
n3nash edited a comment on issue #2692: URL: https://github.com/apache/hudi/issues/2692#issuecomment-815462588 @vburenin Can you please open a JIRA ticket with the details on "huge data losses with hudi 0.5.0 and EMR"? This seems super critical and I would like to know the issues ASAP; I don't want to pollute this thread.
[GitHub] [hudi] n3nash commented on issue #2692: [SUPPORT] Corrupt Blocks in Google Cloud Storage
n3nash commented on issue #2692: URL: https://github.com/apache/hudi/issues/2692#issuecomment-815462588 @vburenin Can you please open a JIRA ticket with the details on "huge data losses with hudi 0.5.0 and EMR"? I don't want to pollute this thread.
[GitHub] [hudi] aditiwari01 commented on issue #2743: Do we have any TTL mechanism in Hudi?
aditiwari01 commented on issue #2743: URL: https://github.com/apache/hudi/issues/2743#issuecomment-815462565 @n3nash Thanks for the clarification. Can we create a jira for the same? I can't pick this up right away but will try to contribute as and when I get time. Meanwhile I will try to use the low-level API directly to unblock myself.
[jira] [Updated] (HUDI-1711) Avro Schema Exception with Spark 3.0 in 0.7
[ https://issues.apache.org/jira/browse/HUDI-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nishith Agarwal updated HUDI-1711:
----------------------------------
    Labels: sev:critical user-support-issues  (was: sev:triage user-support-issues)

> Avro Schema Exception with Spark 3.0 in 0.7
> -------------------------------------------
>
>                 Key: HUDI-1711
>                 URL: https://issues.apache.org/jira/browse/HUDI-1711
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: DeltaStreamer
>            Reporter: Balaji Varadarajan
>            Assignee: sivabalan narayanan
>            Priority: Major
>              Labels: sev:critical, user-support-issues
>
> GH: https://github.com/apache/hudi/issues/2705
>
> 21/03/22 10:10:35 WARN util.package: Truncated the string representation of a plan since it was too large. This behavior can be adjusted by setting 'spark.sql.debug.maxToStringFields'.
> 21/03/22 10:10:35 ERROR executor.Executor: Exception in task 0.0 in stage 1.0 (TID 1)
> java.lang.RuntimeException: Error while decoding: java.lang.NegativeArraySizeException: -1255727808
> createexternalrow(...) [truncated Catalyst deserializer expression over before/after rows with fields StructField(id,IntegerType,false), StructField(name,StringType,true), StructField(type,StringType,true), StructField(url,StringType,true), StructField(user,StringType,true), StructField(password,StringType,true), StructField(create_time,StringType,true), StructField(create_user,StringType,true), StructField(update_time,StringType,true), StructField(update_user,StringType,true), StructField(del_flag,IntegerType,true), plus a non-null source struct with fields version, connector, name, ts_ms, snapshot, ...; the struct type parameters were stripped in the archived message]
[GitHub] [hudi] n3nash closed issue #2705: [SUPPORT] Can not read data schema using Spark3.0.2 on k8s with hudi-utilities (build in 2.12 and spark3)
n3nash closed issue #2705: URL: https://github.com/apache/hudi/issues/2705
[GitHub] [hudi] n3nash commented on issue #2705: [SUPPORT] Can not read data schema using Spark3.0.2 on k8s with hudi-utilities (build in 2.12 and spark3)
n3nash commented on issue #2705: URL: https://github.com/apache/hudi/issues/2705#issuecomment-815461755 Closing this issue since it requires a bug fix; please follow the JIRA above for updates/details.
[jira] [Assigned] (HUDI-1711) Avro Schema Exception with Spark 3.0 in 0.7
[ https://issues.apache.org/jira/browse/HUDI-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nishith Agarwal reassigned HUDI-1711:
-------------------------------------
    Assignee: sivabalan narayanan

> Avro Schema Exception with Spark 3.0 in 0.7
> -------------------------------------------
>
>                 Key: HUDI-1711
>                 URL: https://issues.apache.org/jira/browse/HUDI-1711
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: DeltaStreamer
>            Reporter: Balaji Varadarajan
>            Assignee: sivabalan narayanan
>            Priority: Major
>              Labels: sev:triage, user-support-issues
>
> GH: https://github.com/apache/hudi/issues/2705
[hudi] branch master updated: [MINOR] Some unit test code optimize (#2782)
This is an automated email from the ASF dual-hosted git repository.

wangxianghu pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git

The following commit(s) were added to refs/heads/master by this push:
     new 18459d4  [MINOR] Some unit test code optimize (#2782)
18459d4 is described below

commit 18459d4045ec4a85081c227893b226a4d759f84b
Author: Simon <3656...@qq.com>
AuthorDate: Thu Apr 8 13:35:03 2021 +0800

    [MINOR] Some unit test code optimize (#2782)

    * Optimized code

    * Optimized code
---
 .../java/org/apache/hudi/utils/TestConcatenatingIterator.java | 9 +
 .../hudi/integ/testsuite/converter/TestUpdateConverter.java   | 9 +
 2 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/hudi-client/hudi-client-common/src/test/java/org/apache/hudi/utils/TestConcatenatingIterator.java b/hudi-client/hudi-client-common/src/test/java/org/apache/hudi/utils/TestConcatenatingIterator.java
index af4c4fb..fc591ed 100644
--- a/hudi-client/hudi-client-common/src/test/java/org/apache/hudi/utils/TestConcatenatingIterator.java
+++ b/hudi-client/hudi-client-common/src/test/java/org/apache/hudi/utils/TestConcatenatingIterator.java
@@ -23,6 +23,7 @@ import org.junit.jupiter.api.Test;
 
 import java.util.ArrayList;
 import java.util.Arrays;
+import java.util.Collections;
 import java.util.Iterator;
 import java.util.List;
 
@@ -36,8 +37,8 @@ public class TestConcatenatingIterator {
   @Test
   public void testConcatBasic() {
     Iterator i1 = Arrays.asList(5, 3, 2, 1).iterator();
-    Iterator i2 = new ArrayList().iterator(); // empty iterator
-    Iterator i3 = Arrays.asList(3).iterator();
+    Iterator i2 = Collections.emptyIterator(); // empty iterator
+    Iterator i3 = Collections.singletonList(3).iterator();
 
     ConcatenatingIterator ci = new ConcatenatingIterator<>(Arrays.asList(i1, i2, i3));
     List allElements = new ArrayList<>();
@@ -51,9 +52,9 @@ public class TestConcatenatingIterator {
   @Test
   public void testConcatError() {
-    Iterator i1 = new ArrayList().iterator(); // empty iterator
+    Iterator i1 = Collections.emptyIterator(); // empty iterator
 
-    ConcatenatingIterator ci = new ConcatenatingIterator<>(Arrays.asList(i1));
+    ConcatenatingIterator ci = new ConcatenatingIterator<>(Collections.singletonList(i1));
     assertFalse(ci.hasNext());
     try {
       ci.next();
diff --git a/hudi-integ-test/src/test/java/org/apache/hudi/integ/testsuite/converter/TestUpdateConverter.java b/hudi-integ-test/src/test/java/org/apache/hudi/integ/testsuite/converter/TestUpdateConverter.java
index c48d1b1..e162448 100644
--- a/hudi-integ-test/src/test/java/org/apache/hudi/integ/testsuite/converter/TestUpdateConverter.java
+++ b/hudi-integ-test/src/test/java/org/apache/hudi/integ/testsuite/converter/TestUpdateConverter.java
@@ -21,6 +21,7 @@ package org.apache.hudi.integ.testsuite.converter;
 import static junit.framework.TestCase.assertTrue;
 
 import java.util.Arrays;
+import java.util.Collections;
 import java.util.List;
 import java.util.Map;
 
@@ -65,7 +66,7 @@ public class TestUpdateConverter {
     // 2. DFS converter reads existing records and generates random updates for the same row keys
     UpdateConverter updateConverter = new UpdateConverter(schemaStr, minPayloadSize,
-        Arrays.asList("timestamp"), Arrays.asList("_row_key"));
+        Collections.singletonList("timestamp"), Collections.singletonList("_row_key"));
     List insertRowKeys = inputRDD.map(r -> r.get("_row_key").toString()).collect();
     assertTrue(inputRDD.count() == 10);
     JavaRDD outputRDD = updateConverter.convert(inputRDD);
@@ -75,7 +76,7 @@ public class TestUpdateConverter {
     Map inputRecords = inputRDD.mapToPair(r -> new Tuple2<>(r.get("_row_key").toString(), r))
         .collectAsMap();
     List updateRecords = outputRDD.collect();
-    updateRecords.stream().forEach(updateRecord -> {
+    updateRecords.forEach(updateRecord -> {
       GenericRecord inputRecord = inputRecords.get(updateRecord.get("_row_key").toString());
       assertTrue(areRecordsDifferent(inputRecord, updateRecord));
     });
@@ -87,11 +88,11 @@ public class TestUpdateConverter {
    */
  private boolean areRecordsDifferent(GenericRecord in, GenericRecord up) {
    for (Field field : in.getSchema().getFields()) {
-     if (field.name() == "_row_key") {
+     if (field.name().equals("_row_key")) {
        continue;
      } else {
        // Just convert all types to string for now since all are primitive
-       if (in.get(field.name()).toString() != up.get(field.name()).toString()) {
+       if (!in.get(field.name()).toString().equals(up.get(field.name()).toString())) {
          return true;
        }
      }
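The patch above makes two kinds of fixes: it replaces reference comparison (`==`/`!=`) on `String`s with `equals`, and swaps throwaway `ArrayList`/`Arrays.asList` instances for the `Collections` factory methods. A minimal standalone sketch (class name and values are illustrative, not from the Hudi codebase) of why the original comparisons were bugs:

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

public class EqualsVsReference {
    public static void main(String[] args) {
        // Two equal strings that are distinct objects: '==' compares identity, not content,
        // so the original `field.name() == "_row_key"` check could silently fail.
        String a = new String("_row_key");
        String b = "_row_key";
        System.out.println(a == b);        // false: different objects
        System.out.println(a.equals(b));   // true: same characters

        // Collections.emptyIterator() avoids allocating a throwaway ArrayList,
        // and Collections.singletonList(x) states the single-element intent
        // more clearly than Arrays.asList(x).
        Iterator<Integer> empty = Collections.emptyIterator();
        List<Integer> one = Collections.singletonList(3);
        System.out.println(empty.hasNext()); // false
        System.out.println(one.size());      // 1
    }
}
```

This is the same reason the diff hunk rewrites the `toString()` comparison with `!...equals(...)`: interned literals can make `==` appear to work in tests while failing on runtime-constructed strings.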
[GitHub] [hudi] wangxianghu merged pull request #2782: [MINOR] Some unit test code optimize
wangxianghu merged pull request #2782: URL: https://github.com/apache/hudi/pull/2782 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] n3nash closed issue #2707: [SUPPORT] insert_ovewrite_table failed on archiving
n3nash closed issue #2707: URL: https://github.com/apache/hudi/issues/2707
[GitHub] [hudi] n3nash commented on issue #2707: [SUPPORT] insert_ovewrite_table failed on archiving
n3nash commented on issue #2707: URL: https://github.com/apache/hudi/issues/2707#issuecomment-815460958

@ssdong Thanks for opening the PR! Closing this issue now
[GitHub] [hudi] n3nash commented on pull request #2783: [DOCS]Add docs for 0.8.0 release
n3nash commented on pull request #2783: URL: https://github.com/apache/hudi/pull/2783#issuecomment-815457616

@garyli1019 The CI is failing, can you take a look?
[GitHub] [hudi] n3nash commented on issue #2743: Do we have any TTL mechanism in Hudi?
n3nash commented on issue #2743: URL: https://github.com/apache/hudi/issues/2743#issuecomment-815456351

@aditiwari01 I think you mentioned 2 issues here:

1. Record-level TTL -> We don't have such a feature in Hudi. Like others have pointed out, using the `hudiTable.deletePartitions()` API is a way to manage older partitions. Yes, you could partition based on `_hoodie_commit_time` or any other date-based partitioning that structures your table to be eligible for deleting older partitions completely.
2. Duplicates across partitions -> If you have an update workload and are using the `upsert` API, yes, using a GlobalIndex will help eliminate duplicates in your table. As @nsivabalan pointed out, we don't have such support out of the box in the Spark datasource, but we do have a low-level API as pointed out above. We welcome contributions and it would be good to add this support in the Spark datasource - let me know if you want to contribute this feature and we can guide you.
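For the date-based partitioning approach suggested in the comment above, the bookkeeping amounts to computing which partition paths fall past a TTL cutoff and then deleting those partitions through Hudi's delete-partitions support. The `partitionsOlderThan` helper below is a hypothetical illustration of the cutoff computation only, not a Hudi API:

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.List;
import java.util.stream.Collectors;

public class PartitionTtl {
    // Assumes partitions are laid out by date as yyyy/MM/dd (a common Hudi layout).
    static final DateTimeFormatter FMT = DateTimeFormatter.ofPattern("yyyy/MM/dd");

    // Return the partition paths whose date is strictly older than `ttlDays`
    // days before `today`. These become the candidates to pass to a
    // deletePartitions-style call.
    static List<String> partitionsOlderThan(List<String> partitionPaths, LocalDate today, long ttlDays) {
        LocalDate cutoff = today.minusDays(ttlDays);
        return partitionPaths.stream()
            .filter(p -> LocalDate.parse(p, FMT).isBefore(cutoff))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> parts = List.of("2021/01/01", "2021/03/15", "2021/04/07");
        // With a 30-day TTL from 2021-04-08, the cutoff is 2021-03-09,
        // so only 2021/01/01 qualifies for deletion.
        System.out.println(partitionsOlderThan(parts, LocalDate.of(2021, 4, 8), 30));
        // → [2021/01/01]
    }
}
```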
[GitHub] [hudi] codecov-io edited a comment on pull request #2645: [HUDI-1659] Basic Implementation Of Spark Sql Support
codecov-io edited a comment on pull request #2645: URL: https://github.com/apache/hudi/pull/2645#issuecomment-792430670

# [Codecov](https://codecov.io/gh/apache/hudi/pull/2645?src=pr&el=h1) Report
> Merging [#2645](https://codecov.io/gh/apache/hudi/pull/2645?src=pr&el=desc) (151b9d4) into [master](https://codecov.io/gh/apache/hudi/commit/e970e1f48302aec3af7eeca009a2c793757cd501?el=desc) (e970e1f) will **decrease** coverage by `42.94%`.
> The diff coverage is `n/a`.
> :exclamation: Current head 151b9d4 differs from pull request most recent head a63cf5e. Consider uploading reports for the commit a63cf5e to get more accurate results.

[![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/2645/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2645?src=pr&el=tree)

```diff
@@             Coverage Diff              @@
##             master   #2645       +/-   ##
============================================
- Coverage     52.32%   9.37%   -42.95%
+ Complexity     3689      48     -3641
  Files           483      54      -429
  Lines         23095    1995    -21100
  Branches       2460     235     -2225
============================================
- Hits          12084     187    -11897
+ Misses         9942    1795     -8147
+ Partials       1069      13     -1056
```

| Flag | Coverage Δ | Complexity Δ | |
|---|---|---|---|
| hudicli | `?` | `?` | |
| hudiclient | `?` | `?` | |
| hudicommon | `?` | `?` | |
| hudiflink | `?` | `?` | |
| hudihadoopmr | `?` | `?` | |
| hudisparkdatasource | `?` | `?` | |
| hudisync | `?` | `?` | |
| huditimelineservice | `?` | `?` | |
| hudiutilities | `9.37% <ø> (-60.33%)` | `0.00 <ø> (ø)` | |

Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more.
| [Impacted Files](https://codecov.io/gh/apache/hudi/pull/2645?src=pr&el=tree) | Coverage Δ | Complexity Δ | |
|---|---|---|---|
| [...va/org/apache/hudi/utilities/IdentitySplitter.java](https://codecov.io/gh/apache/hudi/pull/2645/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0lkZW50aXR5U3BsaXR0ZXIuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-2.00%)` | |
| [...va/org/apache/hudi/utilities/schema/SchemaSet.java](https://codecov.io/gh/apache/hudi/pull/2645/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9TY2hlbWFTZXQuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-3.00%)` | |
| [...a/org/apache/hudi/utilities/sources/RowSource.java](https://codecov.io/gh/apache/hudi/pull/2645/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvUm93U291cmNlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-4.00%)` | |
| [.../org/apache/hudi/utilities/sources/AvroSource.java](https://codecov.io/gh/apache/hudi/pull/2645/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQXZyb1NvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | |
| [.../org/apache/hudi/utilities/sources/JsonSource.java](https://codecov.io/gh/apache/hudi/pull/2645/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvblNvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | |
| [...rg/apache/hudi/utilities/sources/CsvDFSSource.java](https://codecov.io/gh/apache/hudi/pull/2645/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQ3N2REZTU291cmNlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-10.00%)` | |
| [...g/apache/hudi/utilities/sources/JsonDFSSource.java](https://codecov.io/gh/apache/hudi/pull/2645/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvbkRGU1NvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-4.00%)` | |
| [...apache/hudi/utilities/sources/JsonKafkaSource.java](https://codecov.io/gh/apache/hudi/pull/2645/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvbkthZmthU291cmNlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-6.00%)` | |
| [...pache/hudi/utilities/sources/ParquetDFSSource.java](https://codecov.io/gh/apache/hudi/pull/2645/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvUGFycXVldERGU1NvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-5.00%)` | |
| [...lities/schema/SchemaProviderWithPostProces
[GitHub] [hudi] yanghua commented on pull request #2325: [HUDI-699]Fix CompactionCommand and add unit test for CompactionCommand
yanghua commented on pull request #2325: URL: https://github.com/apache/hudi/pull/2325#issuecomment-815419885

> @wangxianghu: It's OK now.

Thanks for your patience, I will do a final check soon.
[GitHub] [hudi] yanghua commented on pull request #2747: [HUDI-1743] Added support for SqlFileBasedTransformer
yanghua commented on pull request #2747: URL: https://github.com/apache/hudi/pull/2747#issuecomment-815417193

> @yanghua - I don't see the unit tests for the existing transformers except for two functions. I don't have time now to write unit tests; can I handle it in a separate pull request where I can write unit tests for all transformers?

It's better to follow a unified contribution guide. If we can test it, we should test it, so that we can ensure code quality.

> This is blocking my data pipelines, can we make an exception and merge this pull request? I'm happy to create a JIRA to track the unit tests for all transformers. thoughts?

You can pick this patch into your internal branch in the meantime. wdyt?
[jira] [Created] (HUDI-1776) Support AlterCommand For Hoodie
pengzhiwei created HUDI-1776:

Summary: Support AlterCommand For Hoodie
Key: HUDI-1776
URL: https://issues.apache.org/jira/browse/HUDI-1776
Project: Apache Hudi
Issue Type: Sub-task
Components: Spark Integration
Reporter: pengzhiwei
Assignee: pengzhiwei
Fix For: 0.9.0

Support AlterCommand for hoodie. The AlterCommand will change the hoodie.properties file and the metastore.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [hudi] ssdong commented on pull request #2784: [HUDI-1740] Fix insert-overwrite API archival
ssdong commented on pull request #2784: URL: https://github.com/apache/hudi/pull/2784#issuecomment-815413234

@satishkotha Thank you for your review! I'll take a look when I get back - currently on a day trip. 😄 Basically, I want to stop the abuse of `REQUESTED` here, at least for the insert-overwrite write operation, and separate it from `INFLIGHT`; with non-empty inflight commit files, we would otherwise suffer information loss. However, as you pointed out, this solution should also work against empty inflight files, i.e. clustering. I consider this a start to clean up and clarify the various commit-file logics, as we have another issue of creating completely empty `REQUESTED` commit files.
[GitHub] [hudi] ssdong commented on a change in pull request #2784: [HUDI-1740] Fix insert-overwrite API archival
ssdong commented on a change in pull request #2784: URL: https://github.com/apache/hudi/pull/2784#discussion_r609231425

## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/utils/MetadataConversionUtils.java

@@ -72,9 +76,14 @@ public static HoodieArchivedMetaEntry createMetaWrapper(HoodieInstant hoodieInst
       HoodieReplaceCommitMetadata replaceCommitMetadata = HoodieReplaceCommitMetadata
           .fromBytes(metaClient.getActiveTimeline().getInstantDetails(hoodieInstant).get(), HoodieReplaceCommitMetadata.class);
       archivedMetaWrapper.setHoodieReplaceCommitMetadata(ReplaceArchivalHelper.convertReplaceCommitMetadata(replaceCommitMetadata));
+    } else if (hoodieInstant.isInflight()) {
+      // inflight replacecommit files have the same metadata body as HoodieCommitMetadata

Review comment: Thanks for pointing that out. Will test against clustering and see what happens. If it doesn't work, will find an alternative way. 😄
[GitHub] [hudi] susudong commented on a change in pull request #2784: [HUDI-1740] Fix insert-overwrite API archival
susudong commented on a change in pull request #2784: URL: https://github.com/apache/hudi/pull/2784#discussion_r609229811

## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/utils/MetadataConversionUtils.java

Review comment: Thanks for pointing that out! Let me test it with clustering and see what happens. 😄
[GitHub] [hudi] lw309637554 commented on pull request #2765: [HUDI-1716]: Resolving default values for schema from dataframe
lw309637554 commented on pull request #2765: URL: https://github.com/apache/hudi/pull/2765#issuecomment-815405833

LGTM
[GitHub] [hudi] lw309637554 commented on pull request #2773: [HUDI-1764] Add Hudi-CLI support for clustering
lw309637554 commented on pull request #2773: URL: https://github.com/apache/hudi/pull/2773#issuecomment-815405634

@jintaoguan added some minor comments
[GitHub] [hudi] lw309637554 commented on a change in pull request #2773: [HUDI-1764] Add Hudi-CLI support for clustering
lw309637554 commented on a change in pull request #2773: URL: https://github.com/apache/hudi/pull/2773#discussion_r609227257

## File path: hudi-cli/src/main/java/org/apache/hudi/cli/commands/ClusteringCommand.java

@@ -0,0 +1,102 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.cli.commands;
+
+import org.apache.hudi.cli.HoodieCLI;
+import org.apache.hudi.cli.commands.SparkMain.SparkCommand;
+import org.apache.hudi.cli.utils.InputStreamConsumer;
+import org.apache.hudi.cli.utils.SparkUtil;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.timeline.HoodieActiveTimeline;
+import org.apache.hudi.utilities.UtilHelpers;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+import org.apache.spark.launcher.SparkLauncher;
+import org.apache.spark.util.Utils;
+import org.springframework.shell.core.CommandMarker;
+import org.springframework.shell.core.annotation.CliCommand;
+import org.springframework.shell.core.annotation.CliOption;
+import org.springframework.stereotype.Component;
+import scala.collection.JavaConverters;
+
+@Component
+public class ClusteringCommand implements CommandMarker {
+
+  private static final Logger LOG = LogManager.getLogger(ClusteringCommand.class);
+
+  @CliCommand(value = "clustering schedule", help = "Schedule Clustering")
+  public String scheduleClustering(
+      @CliOption(key = "sparkMemory", help = "Spark executor memory",
+          unspecifiedDefaultValue = "1G") final String sparkMemory,
+      @CliOption(key = "propsFilePath", help = "path to properties file on localfs or dfs with configurations for hoodie client for clustering",
+          unspecifiedDefaultValue = "") final String propsFilePath,
+      @CliOption(key = "hoodieConfigs", help = "Any configuration that can be set in the properties file can be passed here in the form of an array",
+          unspecifiedDefaultValue = "") final String[] configs) throws Exception {
+    HoodieTableMetaClient client = HoodieCLI.getTableMetaClient();
+    String sparkPropertiesPath =
+        Utils.getDefaultPropertiesFile(JavaConverters.mapAsScalaMapConverter(System.getenv()).asScala());
+    SparkLauncher sparkLauncher = SparkUtil.initLauncher(sparkPropertiesPath);
+
+    // First get a clustering instant time and pass it to spark launcher for scheduling clustering
+    String clusteringInstantTime = HoodieActiveTimeline.createNewInstantTime();
+
+    sparkLauncher.addAppArgs(SparkCommand.CLUSTERING_SCHEDULE.toString(), client.getBasePath(),
+        client.getTableConfig().getTableName(), clusteringInstantTime, sparkMemory, propsFilePath);
+    UtilHelpers.validateAndAddProperties(configs, sparkLauncher);
+    Process process = sparkLauncher.launch();
+    InputStreamConsumer.captureOutput(process);
+    int exitCode = process.waitFor();
+    if (exitCode != 0) {
+      return "Failed to schedule clustering for " + clusteringInstantTime;
+    }
+    return "Attempted to schedule clustering for " + clusteringInstantTime;
+  }
+
+  @CliCommand(value = "clustering run", help = "Run Clustering")
+  public String runClustering(
+      @CliOption(key = "parallelism", help = "Parallelism for hoodie clustering",
+          unspecifiedDefaultValue = "1") final String parallelism,
+      @CliOption(key = "sparkMemory", help = "Spark executor memory",
+          unspecifiedDefaultValue = "4G") final String sparkMemory,
+      @CliOption(key = "retry", help = "Number of retries",
+          unspecifiedDefaultValue = "1") final String retry,
+      @CliOption(key = "clusteringInstant", help = "Clustering instant time",
+          mandatory = true) final String clusteringInstantTime,
+      @CliOption(key = "propsFilePath", help = "path to properties file on localfs or dfs with configurations for hoodie client for compacting",
+          unspecifiedDefaultValue = "") final String propsFilePath,
+      @CliOption(key = "hoodieConfigs", help = "Any configuration that can be set in the properties file can be passed here in the form of an array",
+          unspecifiedDefaultValue = "") final String[] configs
+  ) throws Exception {
+    HoodieTableMetaClie
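The `ClusteringCommand` methods above follow one pattern: launch an external process, stream its output, wait for it, and map the exit code to a status string. A minimal self-contained sketch of that control flow, using `ProcessBuilder` as an illustrative stand-in for `SparkLauncher` (not the Hudi CLI's actual implementation):

```java
import java.io.IOException;

public class LaunchPattern {
    // Launch a command, wait for completion, and report success/failure based
    // on the exit code - the same control flow as scheduleClustering above.
    static String runAndReport(String taskName, String... command) {
        try {
            Process process = new ProcessBuilder(command)
                .inheritIO() // stream child output, like InputStreamConsumer.captureOutput
                .start();
            int exitCode = process.waitFor();
            if (exitCode != 0) {
                return "Failed to run " + taskName;
            }
            return "Succeeded to run " + taskName;
        } catch (IOException | InterruptedException e) {
            return "Failed to launch " + taskName + ": " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        // `true` is a POSIX no-op command that always exits 0.
        System.out.println(runAndReport("noop", "true"));
    }
}
```

Note that `waitFor()` blocks the CLI thread until the Spark job finishes, which is why the real command only reports "Attempted"/"Failed" after the child process exits.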
[GitHub] [hudi] lw309637554 commented on a change in pull request #2773: [HUDI-1764] Add Hudi-CLI support for clustering
lw309637554 commented on a change in pull request #2773: URL: https://github.com/apache/hudi/pull/2773#discussion_r609227046

## File path: hudi-cli/src/main/java/org/apache/hudi/cli/commands/ClusteringCommand.java

+    return "Attempted to schedule clustering for " + clusteringInstantTime;

Review comment: Succeed to schedule clustering for " + clusteringInstantTime
[GitHub] [hudi] lw309637554 commented on a change in pull request #2773: [HUDI-1764] Add Hudi-CLI support for clustering
lw309637554 commented on a change in pull request #2773: URL: https://github.com/apache/hudi/pull/2773#discussion_r609224532

## File path: hudi-cli/src/main/java/org/apache/hudi/cli/commands/ClusteringCommand.java

+    HoodieTableMetaClient client = HoodieCLI.getTableMetaClient();

Review comment: Why do we not need initFS here, just like the compaction command?

    HoodieTableMetaClient client = checkAndGetMetaClient();
    boolean initialized = HoodieCLI.initConf();
    HoodieCLI.initFS(initialized);
[GitHub] [hudi] codecov-io commented on pull request #2785: [HUDI-1775] Add option for compaction parallelism
codecov-io commented on pull request #2785: URL: https://github.com/apache/hudi/pull/2785#issuecomment-815400787

# [Codecov](https://codecov.io/gh/apache/hudi/pull/2785?src=pr&el=h1) Report

> Merging [#2785](https://codecov.io/gh/apache/hudi/pull/2785?src=pr&el=desc) (4fca1f0) into [master](https://codecov.io/gh/apache/hudi/commit/3a926aacf6552fc06005db4a7880a233db904330?el=desc) (3a926aa) will **decrease** coverage by `37.68%`.
> The diff coverage is `n/a`.

[![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/2785/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2785?src=pr&el=tree)

```diff
@@             Coverage Diff              @@
##             master   #2785       +/-   ##
============================================
- Coverage     47.05%   9.37%    -37.69%
+ Complexity     3357      48      -3309
  Files           484      54       -430
  Lines         23094    1995     -21099
  Branches       2456     235      -2221
- Hits          10868     187     -10681
+ Misses        11280    1795      -9485
+ Partials        946      13       -933
```

| Flag | Coverage Δ | Complexity Δ | |
|---|---|---|---|
| hudicli | `?` | `?` | |
| hudiclient | `?` | `?` | |
| hudicommon | `?` | `?` | |
| hudiflink | `?` | `?` | |
| hudihadoopmr | `?` | `?` | |
| hudisparkdatasource | `?` | `?` | |
| hudisync | `?` | `?` | |
| huditimelineservice | `?` | `?` | |
| hudiutilities | `9.37% <ø> (ø)` | `0.00 <ø> (ø)` | |

Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more.
| [Impacted Files](https://codecov.io/gh/apache/hudi/pull/2785?src=pr&el=tree) | Coverage Δ | Complexity Δ | | |---|---|---|---| | [.../org/apache/hudi/cli/commands/MetadataCommand.java](https://codecov.io/gh/apache/hudi/pull/2785/diff?src=pr&el=tree#diff-aHVkaS1jbGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY2xpL2NvbW1hbmRzL01ldGFkYXRhQ29tbWFuZC5qYXZh) | | | | | [...di/common/table/log/block/HoodieAvroDataBlock.java](https://codecov.io/gh/apache/hudi/pull/2785/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL2xvZy9ibG9jay9Ib29kaWVBdnJvRGF0YUJsb2NrLmphdmE=) | | | | | [...ache/hudi/cli/commands/ArchivedCommitsCommand.java](https://codecov.io/gh/apache/hudi/pull/2785/diff?src=pr&el=tree#diff-aHVkaS1jbGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY2xpL2NvbW1hbmRzL0FyY2hpdmVkQ29tbWl0c0NvbW1hbmQuamF2YQ==) | | | | | [...rg/apache/hudi/common/model/HoodieAvroPayload.java](https://codecov.io/gh/apache/hudi/pull/2785/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL21vZGVsL0hvb2RpZUF2cm9QYXlsb2FkLmphdmE=) | | | | | [...di/hadoop/hive/HoodieCombineRealtimeFileSplit.java](https://codecov.io/gh/apache/hudi/pull/2785/diff?src=pr&el=tree#diff-aHVkaS1oYWRvb3AtbXIvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaGFkb29wL2hpdmUvSG9vZGllQ29tYmluZVJlYWx0aW1lRmlsZVNwbGl0LmphdmE=) | | | | | [...oning/compaction/CompactionV1MigrationHandler.java](https://codecov.io/gh/apache/hudi/pull/2785/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3RpbWVsaW5lL3ZlcnNpb25pbmcvY29tcGFjdGlvbi9Db21wYWN0aW9uVjFNaWdyYXRpb25IYW5kbGVyLmphdmE=) | | | | | [.../hudi/table/format/cow/CopyOnWriteInputFormat.java](https://codecov.io/gh/apache/hudi/pull/2785/diff?src=pr&el=tree#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS90YWJsZS9mb3JtYXQvY293L0NvcHlPbldyaXRlSW5wdXRGb3JtYXQuamF2YQ==) | | | | | 
[...hudi/hadoop/hive/HoodieCombineHiveInputFormat.java](https://codecov.io/gh/apache/hudi/pull/2785/diff?src=pr&el=tree#diff-aHVkaS1oYWRvb3AtbXIvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaGFkb29wL2hpdmUvSG9vZGllQ29tYmluZUhpdmVJbnB1dEZvcm1hdC5qYXZh) | | | | | [...in/java/org/apache/hudi/common/model/BaseFile.java](https://codecov.io/gh/apache/hudi/pull/2785/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL21vZGVsL0Jhc2VGaWxlLmphdmE=) | | | | | [...g/apache/hudi/common/function/FunctionWrapper.java](https://codecov.io/gh/apache/hudi/pull/2785/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2Z1bmN0aW9uL0Z1bmN0aW9uV3JhcHBlci5qYXZh) | | | | | ... and [418 more](https://codecov.io/gh/apache/hudi/pull/2785/diff?src=pr&el=tree-more) | | -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (HUDI-1775) Add option for compaction parallelism
[ https://issues.apache.org/jira/browse/HUDI-1775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-1775: - Labels: pull-request-available (was: ) > Add option for compaction parallelism > - > > Key: HUDI-1775 > URL: https://issues.apache.org/jira/browse/HUDI-1775 > Project: Apache Hudi > Issue Type: Task > Components: Flink Integration >Reporter: Danny Chen >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [hudi] danny0405 opened a new pull request #2785: [HUDI-1775] Add option for compaction parallelism
danny0405 opened a new pull request #2785: URL: https://github.com/apache/hudi/pull/2785

## *Tips*
- *Thank you very much for contributing to Apache Hudi.*
- *Please review https://hudi.apache.org/contributing.html before opening a pull request.*

## What is the purpose of the pull request

*(For example: This pull request adds quick-start document.)*

## Brief change log

*(for example:)*
- *Modify AnnotationLocation checkstyle rule in checkstyle.xml*

## Verify this pull request

*(Please pick either of the following options)*

This pull request is a trivial rework / code cleanup without any test coverage.

*(or)*

This pull request is already covered by existing tests, such as *(please describe tests)*.

(or)

This change added tests and can be verified as follows:

*(example:)*
- *Added integration tests for end-to-end.*
- *Added HoodieClientWriteTest to verify the change.*
- *Manually verified the change by running a job locally.*

## Committer checklist
- [ ] Has a corresponding JIRA in PR title & commit
- [ ] Commit message is descriptive of the change
- [ ] CI is green
- [ ] Necessary doc changes done or have another open PR
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
[GitHub] [hudi] lw309637554 commented on a change in pull request #2773: [HUDI-1764] Add Hudi-CLI support for clustering
lw309637554 commented on a change in pull request #2773: URL: https://github.com/apache/hudi/pull/2773#discussion_r609215373

File path: hudi-utilities/src/test/java/org/apache/hudi/utilities/functional/TestHoodieDeltaStreamer.java

@@ -1013,26 +1014,22 @@ public void testHoodieAsyncClusteringJob() throws Exception {
   HoodieDeltaStreamer ds = new HoodieDeltaStreamer(cfg, jsc);
   deltaStreamerTestRunner(ds, cfg, (r) -> {
     TestHelpers.assertAtLeastNCommits(2, tableBasePath, dfs);
+    String scheduleClusteringInstantTime = HoodieActiveTimeline.createNewInstantTime();

Review comment: Yes. We also have a doc for async compaction usage: https://cwiki.apache.org/confluence/display/HUDI/RFC+-+19+Clustering+data+for+freshness+and+query+performance
[jira] [Updated] (HUDI-1775) Add option for compaction parallelism
[ https://issues.apache.org/jira/browse/HUDI-1775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Chen updated HUDI-1775: - Issue Type: Task (was: New Feature) > Add option for compaction parallelism > - > > Key: HUDI-1775 > URL: https://issues.apache.org/jira/browse/HUDI-1775 > Project: Apache Hudi > Issue Type: Task > Components: Flink Integration >Reporter: Danny Chen >Priority: Major >
[jira] [Created] (HUDI-1775) Add option for compaction parallelism
Danny Chen created HUDI-1775: Summary: Add option for compaction parallelism Key: HUDI-1775 URL: https://issues.apache.org/jira/browse/HUDI-1775 Project: Apache Hudi Issue Type: New Feature Components: Flink Integration Reporter: Danny Chen
[jira] [Commented] (HUDI-1674) add partition level delete DOC or example
[ https://issues.apache.org/jira/browse/HUDI-1674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17316820#comment-17316820 ] liwei commented on HUDI-1674: - [~shivnarayan] The Spark datasource does not have a delete-partition API; it needs to go through the catalog. https://stackoverflow.com/questions/52531327/drop-partitions-from-spark After [https://github.com/apache/hudi/pull/2645] lands, we can support 'alter table xx drop partition ()'. > add partition level delete DOC or example > - > > Key: HUDI-1674 > URL: https://issues.apache.org/jira/browse/HUDI-1674 > Project: Apache Hudi > Issue Type: Sub-task >Reporter: liwei >Priority: Minor > Labels: docs, user-support-issues > Attachments: image-2021-03-08-09-57-05-768.png > > > !image-2021-03-08-09-57-05-768.png!
[GitHub] [hudi] zherenyu831 commented on a change in pull request #2784: [HUDI-1740] Fix insert-overwrite API archival
zherenyu831 commented on a change in pull request #2784: URL: https://github.com/apache/hudi/pull/2784#discussion_r609200500

File path: hudi-common/src/main/java/org/apache/hudi/common/table/view/AbstractTableFileSystemView.java

@@ -245,7 +245,7 @@ public final void reset() {
   bootstrapIndex = null;
   // Initialize with new Hoodie timeline.
-  init(metaClient, getTimeline());
+  init(metaClient, metaClient.reloadActiveTimeline());

Review comment: I think the root problem is why we are calling reset() when closing the timeline.
[GitHub] [hudi] zherenyu831 commented on a change in pull request #2784: [HUDI-1740] Fix insert-overwrite API archival
zherenyu831 commented on a change in pull request #2784: URL: https://github.com/apache/hudi/pull/2784#discussion_r609192912

File path: hudi-common/src/main/java/org/apache/hudi/common/table/view/AbstractTableFileSystemView.java

@@ -245,7 +245,7 @@ public final void reset() {
   bootstrapIndex = null;
   // Initialize with new Hoodie timeline.
-  init(metaClient, getTimeline());
+  init(metaClient, metaClient.reloadActiveTimeline());

Review comment: @satishkotha Since this part is called after archival and the archived commits are still in the timeline, the post-process will try to load the bytes from them, which causes an IO error.
[GitHub] [hudi] zherenyu831 commented on a change in pull request #2784: [HUDI-1740] Fix insert-overwrite API archival
zherenyu831 commented on a change in pull request #2784: URL: https://github.com/apache/hudi/pull/2784#discussion_r609186934

File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/utils/MetadataConversionUtils.java

@@ -105,14 +114,15 @@ public static HoodieArchivedMetaEntry createMetaWrapper(HoodieInstant hoodieInst
   return archivedMetaWrapper;
 }

-  public static HoodieArchivedMetaEntry createMetaWrapper(HoodieInstant hoodieInstant,
-      HoodieCommitMetadata hoodieCommitMetadata) {
-    HoodieArchivedMetaEntry archivedMetaWrapper = new HoodieArchivedMetaEntry();
-    archivedMetaWrapper.setCommitTime(hoodieInstant.getTimestamp());
-    archivedMetaWrapper.setActionState(hoodieInstant.getState().name());
-    archivedMetaWrapper.setHoodieCommitMetadata(convertCommitMetadata(hoodieCommitMetadata));
-    archivedMetaWrapper.setActionType(ActionType.commit.name());
-    return archivedMetaWrapper;
+  public static Option getRequestedReplaceMetadata(HoodieTableMetaClient metaClient, HoodieInstant pendingReplaceInstant) throws IOException {
+    final HoodieInstant requestedInstant = HoodieTimeline.getReplaceCommitRequestedInstant(pendingReplaceInstant.getTimestamp());
+
+    Option content = metaClient.getActiveTimeline().getInstantDetails(requestedInstant);
+    if (!content.isPresent() || content.get().length == 0) {
+      LOG.warn("No content found in requested file for instant " + pendingReplaceInstant);
+      return Option.of(new HoodieRequestedReplaceMetadata());

Review comment: I will try what you suggested.

Review comment (same hunk): The current logic uses `org.apache.hudi.common.model.HoodieCommitMetadata.fromBytes()` to fetch an empty deltacommit (bytes = []) and creates a new metadata instance whose bytes are not empty, so I was thinking it may be better to keep the same behaviour for the replacecommit requested instant.
[GitHub] [hudi] codecov-io edited a comment on pull request #2773: [HUDI-1764] Add Hudi-CLI support for clustering
codecov-io edited a comment on pull request #2773: URL: https://github.com/apache/hudi/pull/2773#issuecomment-813928206

# [Codecov](https://codecov.io/gh/apache/hudi/pull/2773?src=pr&el=h1) Report

> Merging [#2773](https://codecov.io/gh/apache/hudi/pull/2773?src=pr&el=desc) (582e348) into [master](https://codecov.io/gh/apache/hudi/commit/920537cac83d59ac05676fb952d5479c41adf757?el=desc) (920537c) will **increase** coverage by `17.31%`.
> The diff coverage is `0.00%`.

[![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/2773/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2773?src=pr&el=tree)

```diff
@@              Coverage Diff              @@
##             master    #2773       +/-   ##
=============================================
+ Coverage     52.30%   69.61%    +17.31%
+ Complexity     3689      373      -3316
  Files           483       54       -429
  Lines         23099     1998     -21101
  Branches       2460      236      -2224
- Hits          12082     1391     -10691
+ Misses         9949      475      -9474
+ Partials       1068      132       -936
```

| Flag | Coverage Δ | Complexity Δ | |
|---|---|---|---|
| hudicli | `?` | `?` | |
| hudiclient | `?` | `?` | |
| hudicommon | `?` | `?` | |
| hudiflink | `?` | `?` | |
| hudihadoopmr | `?` | `?` | |
| hudisparkdatasource | `?` | `?` | |
| hudisync | `?` | `?` | |
| huditimelineservice | `?` | `?` | |
| hudiutilities | `69.61% <0.00%> (-0.13%)` | `0.00 <0.00> (ø)` | |

Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more.
| [Impacted Files](https://codecov.io/gh/apache/hudi/pull/2773?src=pr&el=tree) | Coverage Δ | Complexity Δ | | |---|---|---|---| | [...org/apache/hudi/utilities/HoodieClusteringJob.java](https://codecov.io/gh/apache/hudi/pull/2773/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0hvb2RpZUNsdXN0ZXJpbmdKb2IuamF2YQ==) | `62.50% <0.00%> (-2.72%)` | `9.00 <0.00> (ø)` | | | [...apache/hudi/utilities/deltastreamer/DeltaSync.java](https://codecov.io/gh/apache/hudi/pull/2773/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvRGVsdGFTeW5jLmphdmE=) | `71.08% <0.00%> (-0.35%)` | `55.00% <0.00%> (-1.00%)` | | | [.../common/bloom/HoodieDynamicBoundedBloomFilter.java](https://codecov.io/gh/apache/hudi/pull/2773/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2Jsb29tL0hvb2RpZUR5bmFtaWNCb3VuZGVkQmxvb21GaWx0ZXIuamF2YQ==) | | | | | [...java/org/apache/hudi/common/fs/StorageSchemes.java](https://codecov.io/gh/apache/hudi/pull/2773/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2ZzL1N0b3JhZ2VTY2hlbWVzLmphdmE=) | | | | | [...e/hudi/common/table/timeline/dto/FileGroupDTO.java](https://codecov.io/gh/apache/hudi/pull/2773/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3RpbWVsaW5lL2R0by9GaWxlR3JvdXBEVE8uamF2YQ==) | | | | | [...he/hudi/hadoop/SafeParquetRecordReaderWrapper.java](https://codecov.io/gh/apache/hudi/pull/2773/diff?src=pr&el=tree#diff-aHVkaS1oYWRvb3AtbXIvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaGFkb29wL1NhZmVQYXJxdWV0UmVjb3JkUmVhZGVyV3JhcHBlci5qYXZh) | | | | | [...n/java/org/apache/hudi/common/HoodieCleanStat.java](https://codecov.io/gh/apache/hudi/pull/2773/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL0hvb2RpZUNsZWFuU3RhdC5qYXZh) | | | | | 
[.../hudi/common/config/SerializableConfiguration.java](https://codecov.io/gh/apache/hudi/pull/2773/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2NvbmZpZy9TZXJpYWxpemFibGVDb25maWd1cmF0aW9uLmphdmE=) | | | | | [...e/hudi/exception/HoodieDeltaStreamerException.java](https://codecov.io/gh/apache/hudi/pull/2773/diff?src=pr&el=tree#diff-aHVkaS1zcGFyay1kYXRhc291cmNlL2h1ZGktc3Bhcmsvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvZXhjZXB0aW9uL0hvb2RpZURlbHRhU3RyZWFtZXJFeGNlcHRpb24uamF2YQ==) | | | | | [...org/apache/hudi/common/model/HoodieFileFormat.java](https://codecov.io/gh/apache/hudi/pull/2773/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL21vZGVsL0hvb2RpZUZpbGVGb3JtYXQuamF2YQ==) | | | | | ... and [421 more](https://codecov.io/gh/apache/hudi/pull/2773/diff?src=pr&el=tree-more) | | -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comm
[GitHub] [hudi] hddong commented on pull request #1946: [HUDI-1176] Upgrade to log4j2
hddong commented on pull request #1946: URL: https://github.com/apache/hudi/pull/1946#issuecomment-815376490 @wangxianghu: have upgraded to `2.13.3` and fixed the warning.
[jira] [Resolved] (HUDI-1750) Fail to load user's class if user move hudi-spark-bundle_2.11-0.7.0.jar into spark classpath
[ https://issues.apache.org/jira/browse/HUDI-1750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lrz resolved HUDI-1750. --- Resolution: Fixed > Fail to load user's class if user move hudi-spark-bundle_2.11-0.7.0.jar into > spark classpath > > > Key: HUDI-1750 > URL: https://issues.apache.org/jira/browse/HUDI-1750 > Project: Apache Hudi > Issue Type: Bug >Reporter: lrz >Priority: Major > Labels: pull-request-available > Fix For: 0.9.0 > > Attachments: image-2021-04-01-10-55-43-760.png > > > Hudi uses Class.forName(clazzName) to load the user's class, which resolves > against the caller's classloader; see here: > !image-2021-04-01-10-55-43-760.png! > If the user moves the hudi-spark-bundle jar into the Spark classpath and uses > --jars to add custom jars, the caller's classloader will be the AppClassLoader, > while the custom jars are loaded by Spark's MutableURLClassLoader, which leads > to a ClassNotFoundException.
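The classloader mismatch above can be sketched in plain Java. This is an illustrative helper, not Hudi's actual fix: it resolves a class against the thread context classloader (which, inside a Spark task, is the MutableURLClassLoader that also knows about `--jars` entries) instead of the caller's classloader. The class name and method here are hypothetical; stdlib classes are used only so the snippet runs standalone.

```java
public class ContextClassLoaderDemo {
    // Resolve clazzName against the context classloader when one is set,
    // falling back to this class's own loader otherwise.
    static Class<?> loadUserClass(String clazzName) {
        try {
            ClassLoader ctx = Thread.currentThread().getContextClassLoader();
            return Class.forName(clazzName, true,
                    ctx != null ? ctx : ContextClassLoaderDemo.class.getClassLoader());
        } catch (ClassNotFoundException e) {
            throw new RuntimeException("Failed to load class " + clazzName, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(loadUserClass("java.lang.String").getName());
    }
}
```

A bare `Class.forName(clazzName)` pins resolution to whichever loader defined the calling class, which is exactly the failure mode described in the issue.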
[jira] [Resolved] (HUDI-1751) DeltaStream print many unnecessary warn log because of passing hoodie config to kafka consumer
[ https://issues.apache.org/jira/browse/HUDI-1751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lrz resolved HUDI-1751. --- Resolution: Fixed > DeltaStream print many unnecessary warn log because of passing hoodie config > to kafka consumer > -- > > Key: HUDI-1751 > URL: https://issues.apache.org/jira/browse/HUDI-1751 > Project: Apache Hudi > Issue Type: Improvement >Reporter: lrz >Priority: Minor > Labels: pull-request-available > Fix For: 0.9.0 > > > Because we add both Kafka parameters and Hudi configs in the same properties > file (such as kafka-source.properties), the kafkaParams object created from it > also picks up some Hoodie configs, which leads to the warning being printed: > !https://wa.vision.huawei.com/vision-file-storage/api/file/download/upload-v2/2021/2/15/qwx352829/76572ba9f4094fb29b018db91fbf1450/image.png!
[jira] [Resolved] (HUDI-1749) Clean/Compaction/Rollback command maybe never exit when operation fail
[ https://issues.apache.org/jira/browse/HUDI-1749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lrz resolved HUDI-1749. --- Resolution: Fixed > Clean/Compaction/Rollback command maybe never exit when operation fail > -- > > Key: HUDI-1749 > URL: https://issues.apache.org/jira/browse/HUDI-1749 > Project: Apache Hudi > Issue Type: Bug >Reporter: lrz >Priority: Major > Labels: pull-request-available > Fix For: 0.9.0 > > > There are two issues: > 1) After a Clean/Compaction/Rollback command finishes, the YARN application > will always show as failed because the command exits directly without waiting > for the SparkContext to stop. > 2) When a Clean/Compaction/Rollback command fails with an exception, it never > exits because the SparkContext did not stop. The Spark UI uses Jetty, which > introduces non-daemon threads, and sparkContext.stop() must stop the UI to > shut those threads down.
[GitHub] [hudi] jintaoguan commented on a change in pull request #2773: [HUDI-1764] Add Hudi-CLI support for clustering
jintaoguan commented on a change in pull request #2773: URL: https://github.com/apache/hudi/pull/2773#discussion_r609161570

File path: hudi-utilities/src/test/java/org/apache/hudi/utilities/functional/TestHoodieDeltaStreamer.java

@@ -1013,26 +1014,22 @@ public void testHoodieAsyncClusteringJob() throws Exception {
   HoodieDeltaStreamer ds = new HoodieDeltaStreamer(cfg, jsc);
   deltaStreamerTestRunner(ds, cfg, (r) -> {
     TestHelpers.assertAtLeastNCommits(2, tableBasePath, dfs);
+    String scheduleClusteringInstantTime = HoodieActiveTimeline.createNewInstantTime();

Review comment: Sure, I will make it compatible with the old usage mode. The behavior will be: 1) if the user provides an instant time, we will use it to schedule clustering and return it to the user; 2) if the user doesn't provide an instant time, we will generate one and return it to the user.
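The two-branch behavior described above can be sketched as a tiny helper. This is illustrative only: the class and method names are hypothetical, not Hudi's `HoodieActiveTimeline` API; the 14-digit `yyyyMMddHHmmss` pattern mirrors the shape of Hudi commit timestamps but is an assumption of the sketch.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class ClusteringInstantResolver {
    // 1) A user-supplied instant time is used as-is;
    // 2) otherwise a fresh instant time is generated and returned.
    static String resolveInstantTime(String userInstant) {
        if (userInstant != null && !userInstant.isEmpty()) {
            return userInstant;
        }
        return LocalDateTime.now().format(DateTimeFormatter.ofPattern("yyyyMMddHHmmss"));
    }
}
```

Either way the caller gets back the instant that was actually scheduled, which keeps the new code compatible with the old usage mode.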
[GitHub] [hudi] nsivabalan commented on issue #2770: [SUPPORT] How column _hoodie_is_deleted works?
nsivabalan commented on issue #2770: URL: https://github.com/apache/hudi/issues/2770#issuecomment-815315164 Sorry, what feature are you looking for? Can you please clarify? Hudi automatically deletes records that have "_hoodie_is_deleted" set to true. In other words, if you have a batch of writes with a mixed set of records (inserts, updates, deletes), Hudi will honor all three; you just need to use "upsert" as your operation type.
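The merge semantics described above can be modeled with a toy upsert in plain Java. This is a sketch of the behavior, not Hudi's write path: records are keyed strings, and the third field stands in for the `_hoodie_is_deleted` column.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SoftDeleteUpsertDemo {
    // Each batch record is {recordKey, value, isDeleted}. Records flagged
    // "true" are removed from the table; everything else is upserted by key,
    // so one batch can mix inserts, updates and deletes.
    static Map<String, String> upsert(Map<String, String> table, List<String[]> batch) {
        Map<String, String> merged = new HashMap<>(table);
        for (String[] rec : batch) {
            if (Boolean.parseBoolean(rec[2])) {
                merged.remove(rec[0]);       // _hoodie_is_deleted = true
            } else {
                merged.put(rec[0], rec[1]);  // insert or update
            }
        }
        return merged;
    }
}
```

The point of the column is exactly this: a single "upsert" operation carries the delete signal inline with the data, so no separate delete call is needed.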
[GitHub] [hudi] rubenssoto commented on issue #2770: [SUPPORT] How column _hoodie_is_deleted works?
rubenssoto commented on issue #2770: URL: https://github.com/apache/hudi/issues/2770#issuecomment-815296739 @nsivabalan I think the error is on my side: I didn't filter the deleted records on the first batch. It could be a great feature for Hudi in the future.
[GitHub] [hudi] stackfun commented on issue #2771: [SUPPORT] Log files are not compacted
stackfun commented on issue #2771: URL: https://github.com/apache/hudi/issues/2771#issuecomment-815292886 Setting the "hoodie.compaction.target.io" config worked like a charm. Thanks a lot!
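For reference, the setting mentioned above caps how much IO a single compaction run may spend, so raising it lets more log files be compacted per run. A properties-file sketch (the value below matches what I believe is the shipped default of 512000 MB, i.e. 500 GB; treat it as an example, not a recommendation):

```properties
# Amount of IO (in MB) a compaction run is allowed to spend; the planner
# stops adding file slices to the plan once this budget is reached.
hoodie.compaction.target.io=512000
```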
[GitHub] [hudi] stackfun closed issue #2771: [SUPPORT] Log files are not compacted
stackfun closed issue #2771: URL: https://github.com/apache/hudi/issues/2771
[GitHub] [hudi] satishkotha commented on pull request #2784: [HUDI-1740] Fix insert-overwrite API archival
satishkotha commented on pull request #2784: URL: https://github.com/apache/hudi/pull/2784#issuecomment-815291682 @ssdong thanks for bringing this up and contributing. I added some comments, please take a look. Also, looks like there are some CI failures. Please fix those as well.
[GitHub] [hudi] satishkotha commented on a change in pull request #2784: [HUDI-1740] Fix insert-overwrite API archival
satishkotha commented on a change in pull request #2784: URL: https://github.com/apache/hudi/pull/2784#discussion_r609094597

File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/utils/MetadataConversionUtils.java

@@ -105,14 +114,15 @@ public static HoodieArchivedMetaEntry createMetaWrapper(HoodieInstant hoodieInst
   return archivedMetaWrapper;
 }

-  public static HoodieArchivedMetaEntry createMetaWrapper(HoodieInstant hoodieInstant,
-      HoodieCommitMetadata hoodieCommitMetadata) {
-    HoodieArchivedMetaEntry archivedMetaWrapper = new HoodieArchivedMetaEntry();
-    archivedMetaWrapper.setCommitTime(hoodieInstant.getTimestamp());
-    archivedMetaWrapper.setActionState(hoodieInstant.getState().name());
-    archivedMetaWrapper.setHoodieCommitMetadata(convertCommitMetadata(hoodieCommitMetadata));
-    archivedMetaWrapper.setActionType(ActionType.commit.name());
-    return archivedMetaWrapper;
+  public static Option getRequestedReplaceMetadata(HoodieTableMetaClient metaClient, HoodieInstant pendingReplaceInstant) throws IOException {
+    final HoodieInstant requestedInstant = HoodieTimeline.getReplaceCommitRequestedInstant(pendingReplaceInstant.getTimestamp());
+
+    Option content = metaClient.getActiveTimeline().getInstantDetails(requestedInstant);
+    if (!content.isPresent() || content.get().length == 0) {
+      LOG.warn("No content found in requested file for instant " + pendingReplaceInstant);
+      return Option.of(new HoodieRequestedReplaceMetadata());

Review comment: Why not return Option.empty() and skip archival for this case?

File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/utils/MetadataConversionUtils.java

@@ -72,9 +76,14 @@ public static HoodieArchivedMetaEntry createMetaWrapper(HoodieInstant hoodieInst
   HoodieReplaceCommitMetadata replaceCommitMetadata = HoodieReplaceCommitMetadata
       .fromBytes(metaClient.getActiveTimeline().getInstantDetails(hoodieInstant).get(), HoodieReplaceCommitMetadata.class);
   archivedMetaWrapper.setHoodieReplaceCommitMetadata(ReplaceArchivalHelper.convertReplaceCommitMetadata(replaceCommitMetadata));
+} else if (hoodieInstant.isInflight()) {
+  // inflight replacecommit files have the same meta data body as HoodieCommitMetadata

Review comment: We also use replacecommit for the 'clustering' operation. Clustering has an empty replacecommit.inflight file, so this may not work.

File path: hudi-common/src/main/java/org/apache/hudi/common/table/view/AbstractTableFileSystemView.java

@@ -245,7 +245,7 @@ public final void reset() {
   bootstrapIndex = null;
   // Initialize with new Hoodie timeline.
-  init(metaClient, getTimeline());
+  init(metaClient, metaClient.reloadActiveTimeline());

Review comment: IIUC, this is breaking some fundamental assumptions. There are many places where we pass a "trimmed" timeline for time-travel queries etc. You are replacing that with all instants from the active timeline, which is not desired. Is this change needed if we handle empty partitionToReplaceFileIds in archival?

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
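The "return Option.empty() and skip archival" alternative being discussed can be sketched with `java.util.Optional` standing in for Hudi's `Option` (class and method names here are hypothetical, and the payload is simplified to raw bytes): an empty result lets the archival loop simply skip the instant instead of fabricating a default metadata object with empty partitionToReplaceFileIds.

```java
import java.util.Optional;

public class RequestedMetadataLookup {
    // Empty when the requested-replacecommit file is absent or zero-length,
    // so callers can skip archiving that instant rather than deserialize a
    // placeholder metadata object.
    static Optional<byte[]> requestedReplaceContent(byte[] content) {
        if (content == null || content.length == 0) {
            return Optional.empty();
        }
        return Optional.of(content);
    }
}
```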
[GitHub] [hudi] kvallala commented on issue #2528: [SUPPORT] Spark read hudi data from hive (metastore)
kvallala commented on issue #2528: URL: https://github.com/apache/hudi/issues/2528#issuecomment-815182803 We are having the same issue. It works with `spark.sql.hive.convertMetastoreParquet=false` when querying the Hudi table from a Spark session, but we see duplicates when querying through the external Hive metastore. Could you please suggest the required configuration for the external Hive Metastore so it works when querying from Hue/Spark SQL or any other speed layer (like Dremio) that connects to the Hive Metastore to query Hudi tables?
[GitHub] [hudi] ze-engineering-code-challenge commented on pull request #2665: [HUDI-1160] Support update partial fields for CoW table
ze-engineering-code-challenge commented on pull request #2665: URL: https://github.com/apache/hudi/pull/2665#issuecomment-815168500 Hello @liujinhui1994 Do I need to enable any option to make this work? I'm trying to do an upsert on a Hudi table with version 0.8.0 and it didn't work :( Caused by: org.apache.parquet.io.InvalidRecordException: Parquet/Avro schema mismatch: Avro field 'cf_categoria' not found
[GitHub] [hudi] vingov commented on pull request #2747: [HUDI-1743] Added support for SqlFileBasedTransformer
vingov commented on pull request #2747: URL: https://github.com/apache/hudi/pull/2747#issuecomment-815167427 @yanghua - I don't see unit tests for the existing transformers except for two functions, and I don't have time to write unit tests right now. Can I handle it in a separate pull request where I write unit tests for all transformers? This is blocking my data pipelines; can we make an exception and merge this pull request? I'm happy to create a JIRA to track the unit tests for all transformers. Thoughts?
[GitHub] [hudi] codecov-io commented on pull request #2784: [HUDI-1740] Fix insert-overwrite API archival
codecov-io commented on pull request #2784: URL: https://github.com/apache/hudi/pull/2784#issuecomment-815166346 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2784?src=pr&el=h1) Report > Merging [#2784](https://codecov.io/gh/apache/hudi/pull/2784?src=pr&el=desc) (5572b9f) into [master](https://codecov.io/gh/apache/hudi/commit/920537cac83d59ac05676fb952d5479c41adf757?el=desc) (920537c) will **decrease** coverage by `42.93%`. > The diff coverage is `n/a`. [![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/2784/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2784?src=pr&el=tree)

```diff
@@             Coverage Diff              @@
##            master    #2784       +/-   ##
============================================
- Coverage    52.30%    9.37%    -42.94%
+ Complexity    3689       48      -3641
============================================
  Files          483       54       -429
  Lines        23099     1995     -21104
  Branches      2460      235      -2225
============================================
- Hits         12082      187     -11895
+ Misses        9949     1795      -8154
+ Partials      1068       13      -1055
```

| Flag | Coverage Δ | Complexity Δ |
|---|---|---|
| hudicli | `?` | `?` |
| hudiclient | `?` | `?` |
| hudicommon | `?` | `?` |
| hudiflink | `?` | `?` |
| hudihadoopmr | `?` | `?` |
| hudisparkdatasource | `?` | `?` |
| hudisync | `?` | `?` |
| huditimelineservice | `?` | `?` |
| hudiutilities | `9.37% <ø> (-60.38%)` | `0.00 <ø> (ø)` |

Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more.
| Impacted Files | Coverage Δ | Complexity Δ |
|---|---|---|
| `...va/org/apache/hudi/utilities/IdentitySplitter.java` | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-2.00%)` |
| `...va/org/apache/hudi/utilities/schema/SchemaSet.java` | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-3.00%)` |
| `...a/org/apache/hudi/utilities/sources/RowSource.java` | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-4.00%)` |
| `.../org/apache/hudi/utilities/sources/AvroSource.java` | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` |
| `.../org/apache/hudi/utilities/sources/JsonSource.java` | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` |
| `...rg/apache/hudi/utilities/sources/CsvDFSSource.java` | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-10.00%)` |
| `...g/apache/hudi/utilities/sources/JsonDFSSource.java` | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-4.00%)` |
| `...apache/hudi/utilities/sources/JsonKafkaSource.java` | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-6.00%)` |
| `...pache/hudi/utilities/sources/ParquetDFSSource.java` | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-5.00%)` |
| `...lities/schema/SchemaProviderWithPostProcessor.java` | | |
[GitHub] [hudi] ssdong commented on issue #2707: [SUPPORT] insert_ovewrite_table failed on archiving
ssdong commented on issue #2707: URL: https://github.com/apache/hudi/issues/2707#issuecomment-815156685 Hi @satishkotha @jsbali! I've created the pull request for this issue. I observed more issues along the way, tried my best to clarify them, and hopefully wrote a detailed enough description in the PR. Let me know. Thanks!
[GitHub] [hudi] ssdong opened a new pull request #2784: [HUDI-1740] Fix insert-overwrite API archival
ssdong opened a new pull request #2784: URL: https://github.com/apache/hudi/pull/2784 ## *Tips* - *Thank you very much for contributing to Apache Hudi.* - *Please review https://hudi.apache.org/contributing.html before opening a pull request.* ## What is the purpose of the pull request ### Summary: This pull request fixes the archival logic within the insert overwrite API against requested & inflight commit files. ### Issue: `0.7.0` throws the exception `Caused by: java.lang.IllegalArgumentException: Positive number of partitions required`, while `0.9.0-SNAPSHOT` (latest master branch) adds `java.util.NoSuchElementException: No value present in Option` on top of it when Hudi tries to archive replace commit files (`COMPLETED`, `REQUESTED` and `INFLIGHT`). Please check out issue https://github.com/apache/hudi/issues/2707 and ticket https://issues.apache.org/jira/browse/HUDI-1740 for further information about the above two exceptions. The inner causes are somewhat involved; I've tried my best to understand them and to apply a proper fix, instead of applying a tricky one-line patch just to make the errors go away. Of course, the approaches I am taking are open to discussion. ### Fixes 1. The `Positive number of partitions required` error is easier to fix: we just have to filter out empty `partitionToReplaceFileIds` for `COMPLETED` replace commit files within `ReplaceArchivalHelper.java`. 2. `java.util.NoSuchElementException: No value present in Option` is much more complicated; it happens due to a call to `ClusteringUtils.getRequestedReplaceMetadata()` against _both_ `REQUESTED` and `INFLIGHT` commit files to retrieve their metadata body. Now, I get the idea that we are encouraged to use existing utils classes for code reuse. However, a closer inspection of `getRequestedReplaceMetadata` shows that the clustering feature retrieves the metadata for an `INFLIGHT` commit file through a `REQUESTED` instant.
This is _not_ fundamentally wrong, since there is no "clustering plan" for either `REQUESTED` or `INFLIGHT` replace commit files, so the outcome is the same for both, as is also pointed out in the comment within `getRequestedReplaceMetadata`. However, since the `REQUESTED` instant is empty (there is a corresponding [ticket](https://issues.apache.org/jira/browse/HUDI-1740) for it), it generates an `Option.empty()` which is later fetched by `.get()`, triggering the `NoSuchElementException`. What's more, it _loses_ the information in the `INFLIGHT` commit file when fetching via the `REQUESTED` instant, as we observed in the following screenshot: https://user-images.githubusercontent.com/3754011/113918516-7e937a80-981d-11eb-84b6-e2c4bec2c3b1.png This does not make sense to me: we pretty much _abuse_ the `REQUESTED` concept to deal with `INFLIGHT`, with `REQUESTED` itself being empty. I've taken the approach of defining an extra field (placeholder) for `INFLIGHT` and reusing `HoodieCommitMetadata` for deserializing `INFLIGHT`, since they share the same structure (a `COMPLETED` replace commit extends `HoodieCommitMetadata` with the extra `partitionToReplaceFileIds` field). Here's what I gained after adopting this strategy: https://user-images.githubusercontent.com/3754011/113919657-cff03980-981e-11eb-928e-65c719d15ca5.png ### The overall outcome after the fix on the latest master branch: https://user-images.githubusercontent.com/3754011/113919768-ee563500-981e-11eb-80e8-fe62ba868709.png Let me know if there is anything I am missing. :) _The simplest solution to the 2nd issue is to actually have `ClusteringUtils.getRequestedReplaceMetadata` return `Option.of(new HoodieRequestedReplaceMetadata())` upon retrieving the empty `REQUESTED` replace commit file, for both `REQUESTED` and `INFLIGHT`.
I chose not to fix the problem this way, fearing it would merely put a bandage on an inappropriate approach rather than addressing it._ ## Brief change log *(for example:)* - *Modify AnnotationLocation checkstyle rule in checkstyle.xml* ## Verify this pull request *(Please pick either of the following options)* This pull request is a trivial rework / code cleanup without any test coverage. *(or)* This pull request is already covered by existing tests, such as *(please describe tests)*. (or) This change added tests and can be verified as follows: - *Fix insert overwrite API archival* ## Committer checklist - [x] Has a corresponding JIRA in PR title & commit - [x] Commit message is descriptive of the change - [ ] CI is green - [ ] Necessary doc changes done or have another open PR - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
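The `No value present` failure the PR describes is the standard behavior of calling `.get()` on an empty option. The sketch below reproduces it with `java.util.Optional` — the JDK type, standing in for Hudi's `org.apache.hudi.common.util.Option`, which behaves the same way in this respect:

```java
import java.util.NoSuchElementException;
import java.util.Optional;

public class EmptyOptionSketch {
    public static void main(String[] args) {
        // An empty replacecommit.requested file deserializes to "no metadata",
        // analogous to an empty Optional.
        Optional<String> requestedReplaceMetadata = Optional.empty();

        try {
            // Blindly calling get(), as the pre-fix archival path effectively did,
            // throws instead of returning a usable (if empty) metadata object.
            requestedReplaceMetadata.get();
        } catch (NoSuchElementException e) {
            System.out.println("NoSuchElementException: " + e.getMessage());
        }

        // A defensive alternative: fall back to an explicit empty value.
        String metadata = requestedReplaceMetadata.orElse("<empty replace metadata>");
        System.out.println(metadata); // prints <empty replace metadata>
    }
}
```

This mirrors the PR's simpler alternative fix — returning an explicit empty `HoodieRequestedReplaceMetadata` instead of letting `.get()` blow up — without taking a position on which approach Hudi should adopt.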
[GitHub] [hudi] nsivabalan commented on a change in pull request #2783: [DOCS]Add docs for 0.8.0 release
nsivabalan commented on a change in pull request #2783: URL: https://github.com/apache/hudi/pull/2783#discussion_r608886318 ## File path: docs/_docs/0.8.0/1_1_spark_quick_start_guide.md ## @@ -0,0 +1,530 @@ +--- +version: 0.8.0 +title: "Quick-Start Guide" +permalink: /docs/spark_quick-start-guide.html +toc: true +last_modified_at: 2019-12-30T15:59:57-04:00 +--- + +This guide provides a quick peek at Hudi's capabilities using spark-shell. Using Spark datasources, we will walk through +code snippets that allow you to insert and update a Hudi table of default table type: +[Copy on Write](/docs/concepts.html#copy-on-write-table). +After each write operation we will also show how to read the data both snapshot and incrementally. +# Scala example + +## Setup + +Hudi works with Spark-2.x & Spark 3.x versions. You can follow instructions [here](https://spark.apache.org/downloads.html) for setting up spark. Review comment: fix min versions here for spark2 ## File path: docs/_docs/0.8.0/0_3_migration_guide.md ## @@ -0,0 +1,72 @@ +--- +version: 0.8.0 +title: Migration Guide +keywords: hudi, migration, use case +permalink: /docs/migration_guide.html +summary: In this page, we will discuss some available tools for migrating your existing table into a Hudi table +last_modified_at: 2019-12-30T15:59:57-04:00 +--- + +Hudi maintains metadata such as commit timeline and indexes to manage a table. The commit timeline helps to understand the actions happening on a table as well as the current state of a table. Indexes are used by Hudi to maintain a record key to file id mapping to efficiently locate a record. At the moment, Hudi supports writing only the Parquet columnar format. +To be able to start using Hudi for your existing table, you will need to migrate your existing table into a Hudi managed table. There are a couple of ways to achieve this.
+ + +## Approaches + + +### Use Hudi for new partitions alone + +Hudi can be used to manage an existing table without affecting/altering the historical data already present in the +table. Hudi has been implemented to be compatible with such a mixed table with a caveat that either the complete +Hive partition is Hudi managed or not. Thus the lowest granularity at which Hudi manages a table is a Hive +partition. Start using the datasource API or the WriteClient to write to the table and make sure you start writing +to a new partition or convert your last N partitions into Hudi instead of the entire table. Note, since the historical + partitions are not managed by HUDI, none of the primitives provided by HUDI work on the data in those partitions. More concretely, one cannot perform upserts or incremental pull on such older partitions not managed by the HUDI table. +Take this approach if your table is an append only type of table and you do not expect to perform any updates to existing (or non Hudi managed) partitions. + + +### Convert existing table to Hudi + +Import your existing table into a Hudi managed table. Since all the data is Hudi managed, none of the limitations + of Approach 1 apply here. Updates spanning any partitions can be applied to this table and Hudi will efficiently + make the update available to queries. Note that not only do you get to use all Hudi primitives on this table, + there are other additional advantages of doing this. Hudi automatically manages file sizes of a Hudi managed table + . You can define the desired file size when converting this table and Hudi will ensure it writes out files + adhering to the config. It will also ensure that smaller files later get corrected by routing some new inserts into + small files rather than writing new small ones thus maintaining the health of your cluster. + +There are a few options when choosing this approach. 
+ +**Option 1** Review comment: shouldn't we also briefly talk about bootstrap here as one of the options? ## File path: docs/_docs/0.8.0/1_1_spark_quick_start_guide.md ## @@ -0,0 +1,530 @@ +--- +version: 0.8.0 +title: "Quick-Start Guide" +permalink: /docs/spark_quick-start-guide.html +toc: true +last_modified_at: 2019-12-30T15:59:57-04:00 +--- + +This guide provides a quick peek at Hudi's capabilities using spark-shell. Using Spark datasources, we will walk through +code snippets that allows you to insert and update a Hudi table of default table type: +[Copy on Write](/docs/concepts.html#copy-on-write-table). +After each write operation we will also show how to read the data both snapshot and incrementally. +# Scala example + +## Setup + +Hudi works with Spark-2.x & Spark 3.x versions. You can follow instructions [here](https://spark.apache.org/downloads.html) for setting up spark. +From the extracted directory run spark-shell with Hudi as: + +```scala +// spark-shell +spark-shell \ + --packages org.apache.hudi:hudi-spark-bundle_2.12:0.7.0,org.apache.spark:spark-avro_2.12:3.0.1 \ + --conf 'spark.s
[jira] [Updated] (HUDI-1740) insert_overwrite_table and insert_overwrite first replacecommit has empty partitionToReplaceFileIds
[ https://issues.apache.org/jira/browse/HUDI-1740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Susu Dong updated HUDI-1740: Description: insert_overwrite_table and insert_overwrite first replacecommit has empty partitionToReplaceFileIds which messes up archival code. Fix: The code needs to only proceed if partitionToReplaceFileIds is not empty. Updates: Archival also breaks upon requested/inflight commits in 0.9.0-SNAPSHOT. It wasn't an issue in 0.7.0, so this Jira ticket is fixing two things. Please refer to the detailed description in the PR. was: insert_overwrite_table and insert_overwrite first replacecommit has empty partitionToReplaceFileIds which messes up archival code. Fix The code needs to only proceed if partitionToReplaceFileIds is not Empty. > insert_overwrite_table and insert_overwrite first replacecommit has empty > partitionToReplaceFileIds > --- > > Key: HUDI-1740 > URL: https://issues.apache.org/jira/browse/HUDI-1740 > Project: Apache Hudi > Issue Type: Bug >Reporter: Jagmeet Bali >Assignee: Susu Dong >Priority: Minor > Labels: pull-request-available > > insert_overwrite_table and insert_overwrite first replacecommit has empty > partitionToReplaceFileIds which messes up archival code. > Fix: The code needs to only proceed if partitionToReplaceFileIds is not empty. > > Updates: Archival also breaks upon requested/inflight commits in > 0.9.0-SNAPSHOT. It wasn't an issue in 0.7.0, so this Jira ticket is fixing two > things. Please refer to the detailed description in the PR. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HUDI-1739) insert_overwrite_table and insert_overwrite create empty replacecommit.requested file which breaks archival
[ https://issues.apache.org/jira/browse/HUDI-1739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Susu Dong reassigned HUDI-1739: --- Assignee: Susu Dong > insert_overwrite_table and insert_overwrite create empty > replacecommit.requested file which breaks archival > --- > > Key: HUDI-1739 > URL: https://issues.apache.org/jira/browse/HUDI-1739 > Project: Apache Hudi > Issue Type: Bug >Reporter: Jagmeet Bali >Assignee: Susu Dong >Priority: Minor > > Fixes can be to > # Ignore empty replacecommit.requested files. > # Standardise the replacecommit.requested format across all invocations be > it from clustering or this use case. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HUDI-1774) Add support for delete_partition with spark ds
[ https://issues.apache.org/jira/browse/HUDI-1774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan reassigned HUDI-1774: - Assignee: liwei > Add support for delete_partition with spark ds > -- > > Key: HUDI-1774 > URL: https://issues.apache.org/jira/browse/HUDI-1774 > Project: Apache Hudi > Issue Type: Improvement > Components: Spark Integration >Reporter: sivabalan narayanan >Assignee: liwei >Priority: Major > > I see we have added support for delete_partitions at the write client, but we > don't have the support in the Spark datasource. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [hudi] nsivabalan commented on issue #2743: Do we have any TTL mechanism in Hudi?
nsivabalan commented on issue #2743: URL: https://github.com/apache/hudi/issues/2743#issuecomment-815015923 @lw309637554 @satishkotha : fyi we are yet to add spark ds support for this "delete_partition" operation.
[jira] [Commented] (HUDI-1674) add partition level delete DOC or example
[ https://issues.apache.org/jira/browse/HUDI-1674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17316432#comment-17316432 ] sivabalan narayanan commented on HUDI-1674: --- [~309637554]: we are yet to add this operation to spark ds: https://issues.apache.org/jira/browse/HUDI-1774 > add partition level delete DOC or example > - > > Key: HUDI-1674 > URL: https://issues.apache.org/jira/browse/HUDI-1674 > Project: Apache Hudi > Issue Type: Sub-task >Reporter: liwei >Priority: Minor > Labels: docs, user-support-issues > Attachments: image-2021-03-08-09-57-05-768.png > > > !image-2021-03-08-09-57-05-768.png! -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [hudi] nsivabalan commented on issue #2399: [SUPPORT] Hudi deletes not being properly commited
nsivabalan commented on issue #2399: URL: https://github.com/apache/hudi/issues/2399#issuecomment-815011207 btw, we have filed a feature request to support reusing existing hudi configs https://issues.apache.org/jira/browse/HUDI-1640
[jira] [Assigned] (HUDI-1760) Incorrect Documentation for HoodieWriteConfigs
[ https://issues.apache.org/jira/browse/HUDI-1760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gary Li reassigned HUDI-1760: - Assignee: Gary Li > Incorrect Documentation for HoodieWriteConfigs > -- > > Key: HUDI-1760 > URL: https://issues.apache.org/jira/browse/HUDI-1760 > Project: Apache Hudi > Issue Type: Bug >Reporter: Pratyaksh Sharma >Assignee: Gary Li >Priority: Major > > GH Issue - https://github.com/apache/hudi/issues/2760 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [hudi] BenjMaq commented on issue #2399: [SUPPORT] Hudi deletes not being properly commited
BenjMaq commented on issue #2399: URL: https://github.com/apache/hudi/issues/2399#issuecomment-814990246 Just want to add that I faced the same issue. For me, the problem was related to the option `.option(DataSourceWriteOptions.KEYGENERATOR_CLASS_OPT_KEY, "org.apache.hudi.keygen.ComplexKeyGenerator")` that I set for the `UPSERT` but not for the `DELETE`. Thanks @afeldman1 for coming back to explain the fix; it led me on the right track. By the way, I'm also wondering why this fails?
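One way to avoid this class of mismatch is to define the writer options once and reuse them for every operation. The sketch below does that with a plain map of option strings — the `hoodie.*` keys are Hudi's documented configuration names, but the helper structure itself is hypothetical, purely for illustration:

```java
import java.util.HashMap;
import java.util.Map;

public class SharedHudiOptionsSketch {
    // Hypothetical helper: one place defines the options every write must share,
    // so upsert and delete cannot diverge on e.g. the key generator class.
    static Map<String, String> baseWriteOptions() {
        Map<String, String> opts = new HashMap<>();
        opts.put("hoodie.datasource.write.recordkey.field", "uuid");
        opts.put("hoodie.datasource.write.partitionpath.field", "partitionpath");
        opts.put("hoodie.datasource.write.keygenerator.class",
                 "org.apache.hudi.keygen.ComplexKeyGenerator");
        return opts;
    }

    // Only the operation differs between calls; everything else stays identical.
    static Map<String, String> forOperation(String operation) {
        Map<String, String> opts = baseWriteOptions();
        opts.put("hoodie.datasource.write.operation", operation);
        return opts;
    }

    public static void main(String[] args) {
        Map<String, String> upsert = forOperation("upsert");
        Map<String, String> delete = forOperation("delete");
        // Both operations see the same key generator, so delete records are
        // keyed identically to the records they are meant to remove.
        System.out.println(upsert.get("hoodie.datasource.write.keygenerator.class")
            .equals(delete.get("hoodie.datasource.write.keygenerator.class"))); // prints true
    }
}
```

In a real pipeline the resulting map would be passed to `DataFrameWriter.options(...)` for both the upsert and the delete write, which is exactly the symmetry that was missing in the report above.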
[jira] [Updated] (HUDI-73) Support vanilla Avro Kafka Source in HoodieDeltaStreamer
[ https://issues.apache.org/jira/browse/HUDI-73?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gary Li updated HUDI-73: Fix Version/s: (was: 0.8.0) 0.9.0 > Support vanilla Avro Kafka Source in HoodieDeltaStreamer > > > Key: HUDI-73 > URL: https://issues.apache.org/jira/browse/HUDI-73 > Project: Apache Hudi > Issue Type: New Feature > Components: DeltaStreamer >Reporter: Balaji Varadarajan >Assignee: sivabalan narayanan >Priority: Blocker > Labels: pull-request-available, sev:high, user-support-issues > Fix For: 0.9.0 > > > Context : [https://github.com/uber/hudi/issues/597] > Currently, Avro Kafka Source expects the installation to use Confluent > version with SchemaRegistry server running. We need to support the Kafka > installations which do not use Schema Registry by allowing > FileBasedSchemaProvider to be integrated to AvroKafkaSource. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HUDI-1774) Add support for delete_partition with spark ds
sivabalan narayanan created HUDI-1774: - Summary: Add support for delete_partition with spark ds Key: HUDI-1774 URL: https://issues.apache.org/jira/browse/HUDI-1774 Project: Apache Hudi Issue Type: Improvement Components: Spark Integration Reporter: sivabalan narayanan I see we have added support for delete_partitions at the write client, but we don't have the support in the Spark datasource. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [hudi] garyli1019 opened a new pull request #2783: [DOCS]Add docs for 0.8.0 release
garyli1019 opened a new pull request #2783: URL: https://github.com/apache/hudi/pull/2783 ## *Tips* - *Thank you very much for contributing to Apache Hudi.* - *Please review https://hudi.apache.org/contributing.html before opening a pull request.* ## What is the purpose of the pull request *(For example: This pull request adds quick-start document.)* ## Brief change log *(for example:)* - *Modify AnnotationLocation checkstyle rule in checkstyle.xml* ## Verify this pull request *(Please pick either of the following options)* This pull request is a trivial rework / code cleanup without any test coverage. *(or)* This pull request is already covered by existing tests, such as *(please describe tests)*. (or) This change added tests and can be verified as follows: *(example:)* - *Added integration tests for end-to-end.* - *Added HoodieClientWriteTest to verify the change.* - *Manually verified the change by running a job locally.* ## Committer checklist - [ ] Has a corresponding JIRA in PR title & commit - [ ] Commit message is descriptive of the change - [ ] CI is green - [ ] Necessary doc changes done or have another open PR - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] li36909 commented on pull request #2754: [HUDI-1751] Remove irrelevant properties from passing to kafkaConsumer which in turn prints lot of warn logs
li36909 commented on pull request #2754: URL: https://github.com/apache/hudi/pull/2754#issuecomment-814932095 @n3nash @pratyakshsharma thank you
[GitHub] [hudi] li36909 commented on pull request #2752: [HUDI-1749] Clean/Compaction/Rollback command maybe never exit when operation fail
li36909 commented on pull request #2752: URL: https://github.com/apache/hudi/pull/2752#issuecomment-814931307 @n3nash thank you
[GitHub] [hudi] li36909 commented on pull request #2753: [HUDI-1750] Fail to load user's class if user move hudi-spark-bundle jar into spark classpath
li36909 commented on pull request #2753: URL: https://github.com/apache/hudi/pull/2753#issuecomment-814930600 @nsivabalan thank you
[GitHub] [hudi] codecov-io edited a comment on pull request #2645: [HUDI-1659] Basic Implementation Of Spark Sql Support
codecov-io edited a comment on pull request #2645: URL: https://github.com/apache/hudi/pull/2645#issuecomment-792430670 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2645?src=pr&el=h1) Report > Merging [#2645](https://codecov.io/gh/apache/hudi/pull/2645?src=pr&el=desc) (647e322) into [master](https://codecov.io/gh/apache/hudi/commit/e970e1f48302aec3af7eeca009a2c793757cd501?el=desc) (e970e1f) will **increase** coverage by `17.40%`. > The diff coverage is `n/a`. [![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/2645/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2645?src=pr&el=tree)

```diff
@@             Coverage Diff              @@
##            master    #2645       +/-   ##
============================================
+ Coverage    52.32%   69.72%    +17.40%
+ Complexity    3689      373      -3316
============================================
  Files          483       54       -429
  Lines        23095     1995     -21100
  Branches      2460      235      -2225
============================================
- Hits         12084     1391     -10693
+ Misses        9942      473      -9469
+ Partials      1069      131       -938
```

| Flag | Coverage Δ | Complexity Δ |
|---|---|---|
| hudicli | `?` | `?` |
| hudiclient | `?` | `?` |
| hudicommon | `?` | `?` |
| hudiflink | `?` | `?` |
| hudihadoopmr | `?` | `?` |
| hudisparkdatasource | `?` | `?` |
| hudisync | `?` | `?` |
| huditimelineservice | `?` | `?` |
| hudiutilities | `69.72% <ø> (+0.03%)` | `0.00 <ø> (ø)` |

Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more.
| Impacted Files | Coverage Δ | Complexity Δ |
|---|---|---|
| `.../versioning/compaction/CompactionPlanMigrator.java` | | |
| `...ecution/datasources/Spark3ParsePartitionUtil.scala` | | |
| `...rg/apache/hudi/cli/commands/CompactionCommand.java` | | |
| `...di/hadoop/BootstrapColumnStichingRecordReader.java` | | |
| `...org/apache/hudi/common/util/SpillableMapUtils.java` | | |
| `...ain/scala/org/apache/hudi/HoodieBootstrapRDD.scala` | | |
| `...he/hudi/common/model/EmptyHoodieRecordPayload.java` | | |
| `...i/table/format/cow/Int64TimestampColumnReader.java` | | |
| `...e/hudi/table/format/mor/MergeOnReadTableState.java` | | |
| `.../hudi/table/format/cow/ParquetSplitReaderUtil.java` | | |
| ... and 409 more | | |
[GitHub] [hudi] codecov-io edited a comment on pull request #2765: [HUDI-1716]: Resolving default values for schema from dataframe
codecov-io edited a comment on pull request #2765: URL: https://github.com/apache/hudi/pull/2765#issuecomment-813008111 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] nsivabalan commented on a change in pull request #2769: [HUDI-1762] Added HiveStylePartitionExtractor to support Hive style partitions
nsivabalan commented on a change in pull request #2769: URL: https://github.com/apache/hudi/pull/2769#discussion_r608577463 ## File path: hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveStylePartitionValueExtractor.java ## @@ -0,0 +1,42 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hudi.hive; + +import java.util.Collections; +import java.util.List; + +/** + * Extractor for Hive style partitioned tables, where the partition folders are key=value pairs. + * + * This implementation extracts the partition value of yyyy-mm-dd from a path of type datestr=yyyy-mm-dd. + */ +public class HiveStylePartitionValueExtractor implements PartitionValueExtractor { + private static final long serialVersionUID = 1L; + + @Override + public List<String> extractPartitionValuesInPath(String partitionPath) { +// the partition path is expected to be in the format partition_key=partition_value. +String[] splits = partitionPath.split("="); Review comment: Maybe I am being very nitpicky, but is there a chance that the partition path has "=" in its field name? I know it does not make sense. Anyways, the partition path field name is "datestr" in your example; can it be "partition=path"?
If yes, Collections.singletonList(splits[1]) at line 40 might break, right? @n3nash: do we make any such assumptions in Hudi's code base wrt the partition path's field name?
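The reviewer's question about "=" appearing in the partition field name can be checked with a small standalone snippet. This is not Hudi code: the class name is hypothetical, and the `split("=", 2)` variant at the end is simply one possible mitigation, not what the PR does.

```java
// Quick check of how String.split behaves when the hive-style partition
// field name itself contains '=' -- the edge case raised in the review.
public class PartitionSplitDemo {
    static void check(boolean cond, String msg) {
        if (!cond) throw new AssertionError(msg);
    }

    public static void main(String[] args) {
        // Normal hive-style path: key=value splits cleanly into two parts.
        String[] ok = "datestr=2021-04-07".split("=");
        check(ok.length == 2 && ok[1].equals("2021-04-07"), "normal case");

        // A field name containing '=' yields three parts, so splits[1]
        // returns a fragment of the key, not the partition value.
        String[] broken = "partition=path=2021-04-07".split("=");
        check(broken.length == 3 && broken[1].equals("path"), "ambiguous key");

        // Splitting with a limit of 2 keeps everything after the first '='
        // together, which tolerates '=' in the value (though not in the key).
        String[] limited = "datestr=a=b".split("=", 2);
        check(limited.length == 2 && limited[1].equals("a=b"), "limit 2");

        System.out.println("all checks passed");
    }
}
```

So with `splits[1]`, a key like `partition=path` would indeed yield the wrong value, as the review suspects.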
[GitHub] [hudi] nsivabalan commented on pull request #2720: [HUDI-1719]hive on spark/mr,Incremental query of the mor table, the partition field is incorrect
nsivabalan commented on pull request #2720: URL: https://github.com/apache/hudi/pull/2720#issuecomment-814844457 @xushiyan: I see you have disabled a test in TestHoodieCombineHiveInputFormat. Can you help explain the reason?
[GitHub] [hudi] codecov-io commented on pull request #2782: [MINOR] ut code optimize
codecov-io commented on pull request #2782: URL: https://github.com/apache/hudi/pull/2782#issuecomment-814831752 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2782?src=pr&el=h1) Report > Merging [#2782](https://codecov.io/gh/apache/hudi/pull/2782?src=pr&el=desc) (ca38e68) into [master](https://codecov.io/gh/apache/hudi/commit/e926c1a45ca95fa1911f6f88a0577554f2797760?el=desc) (e926c1a) will **decrease** coverage by `41.35%`. > The diff coverage is `n/a`. [![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/2782/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2782?src=pr&el=tree) ```diff @@ Coverage Diff @@ ## master #2782 +/- ## - Coverage 50.73% 9.37% -41.36% + Complexity 3064 48 -3016 Files 419 54 -365 Lines 18797 1995 -16802 Branches 1922 235 -1687 - Hits 9536 187 -9349 + Misses 8485 1795 -6690 + Partials 776 13 -763 ``` | Flag | Coverage Δ | Complexity Δ | | |---|---|---|---| | hudicli | `?` | `?` | | | hudiclient | `?` | `?` | | | hudicommon | `?` | `?` | | | hudiflink | `?` | `?` | | | hudihadoopmr | `?` | `?` | | | hudisparkdatasource | `?` | `?` | | | hudisync | `?` | `?` | | | huditimelineservice | `?` | `?` | | | hudiutilities | `9.37% <ø> (-60.06%)` | `0.00 <ø> (ø)` | | Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more. 
| [Impacted Files](https://codecov.io/gh/apache/hudi/pull/2782?src=pr&el=tree) | Coverage Δ | Complexity Δ | | |---|---|---|---| | [...va/org/apache/hudi/utilities/IdentitySplitter.java](https://codecov.io/gh/apache/hudi/pull/2782/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0lkZW50aXR5U3BsaXR0ZXIuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-2.00%)` | | | [...va/org/apache/hudi/utilities/schema/SchemaSet.java](https://codecov.io/gh/apache/hudi/pull/2782/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9TY2hlbWFTZXQuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-3.00%)` | | | [...a/org/apache/hudi/utilities/sources/RowSource.java](https://codecov.io/gh/apache/hudi/pull/2782/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvUm93U291cmNlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-4.00%)` | | | [.../org/apache/hudi/utilities/sources/AvroSource.java](https://codecov.io/gh/apache/hudi/pull/2782/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQXZyb1NvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | | | [.../org/apache/hudi/utilities/sources/JsonSource.java](https://codecov.io/gh/apache/hudi/pull/2782/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvblNvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | | | [...rg/apache/hudi/utilities/sources/CsvDFSSource.java](https://codecov.io/gh/apache/hudi/pull/2782/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQ3N2REZTU291cmNlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-10.00%)` | | | 
[...g/apache/hudi/utilities/sources/JsonDFSSource.java](https://codecov.io/gh/apache/hudi/pull/2782/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvbkRGU1NvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-4.00%)` | | | [...apache/hudi/utilities/sources/JsonKafkaSource.java](https://codecov.io/gh/apache/hudi/pull/2782/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvbkthZmthU291cmNlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-6.00%)` | | | [...pache/hudi/utilities/sources/ParquetDFSSource.java](https://codecov.io/gh/apache/hudi/pull/2782/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvUGFycXVldERGU1NvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-5.00%)` | | | [...lities/schema/SchemaProviderWithPostProcessor.java](https://codecov.io/gh/apache/hudi/pull/2782/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9TY2hlbWFQcm92aWRlc
[jira] [Closed] (HUDI-1773) HoodieFileGroup code optimize
[ https://issues.apache.org/jira/browse/HUDI-1773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] vinoyang closed HUDI-1773. -- Resolution: Done 3a926aacf6552fc06005db4a7880a233db904330 > HoodieFileGroup code optimize > - > > Key: HUDI-1773 > URL: https://issues.apache.org/jira/browse/HUDI-1773 > Project: Apache Hudi > Issue Type: Improvement > Components: Common Core >Reporter: 谢波 >Assignee: 谢波 >Priority: Major > Labels: pull-request-available > Fix For: 0.9.0 > > > optimize HoodieFileGroup getAllFileSlicesIncludingInflight and > getAllFileSlices > remove unused import. > import java.util.Map; -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-1773) HoodieFileGroup code optimize
[ https://issues.apache.org/jira/browse/HUDI-1773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] vinoyang updated HUDI-1773: --- Fix Version/s: 0.9.0 > HoodieFileGroup code optimize > - > > Key: HUDI-1773 > URL: https://issues.apache.org/jira/browse/HUDI-1773 > Project: Apache Hudi > Issue Type: Improvement > Components: Common Core >Reporter: 谢波 >Assignee: 谢波 >Priority: Major > Labels: pull-request-available > Fix For: 0.9.0 > > > optimize HoodieFileGroup getAllFileSlicesIncludingInflight and > getAllFileSlices > remove unused import. > import java.util.Map;
[hudi] branch master updated: [HUDI-1773] HoodieFileGroup code optimize (#2781)
This is an automated email from the ASF dual-hosted git repository. vinoyang pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new 3a926aa [HUDI-1773] HoodieFileGroup code optimize (#2781) 3a926aa is described below commit 3a926aacf6552fc06005db4a7880a233db904330 Author: hiscat <46845236+mylanpan...@users.noreply.github.com> AuthorDate: Wed Apr 7 18:16:03 2021 +0800 [HUDI-1773] HoodieFileGroup code optimize (#2781) --- .../main/java/org/apache/hudi/common/model/HoodieFileGroup.java| 7 +++ 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/hudi-common/src/main/java/org/apache/hudi/common/model/HoodieFileGroup.java b/hudi-common/src/main/java/org/apache/hudi/common/model/HoodieFileGroup.java index 849f08e..6979c30 100644 --- a/hudi-common/src/main/java/org/apache/hudi/common/model/HoodieFileGroup.java +++ b/hudi-common/src/main/java/org/apache/hudi/common/model/HoodieFileGroup.java @@ -26,7 +26,6 @@ import org.apache.hudi.common.util.collection.Pair; import java.io.Serializable; import java.util.Comparator; import java.util.List; -import java.util.Map; import java.util.TreeMap; import java.util.stream.Collectors; import java.util.stream.Stream; @@ -133,7 +132,7 @@ public class HoodieFileGroup implements Serializable { * Get all the the file slices including in-flight ones as seen in underlying file-system. 
*/ public Stream<FileSlice> getAllFileSlicesIncludingInflight() { -return fileSlices.entrySet().stream().map(Map.Entry::getValue); +return fileSlices.values().stream(); } /** @@ -148,7 +147,7 @@ public class HoodieFileGroup implements Serializable { */ public Stream<FileSlice> getAllFileSlices() { if (!timeline.empty()) { - return fileSlices.entrySet().stream().map(Map.Entry::getValue).filter(this::isFileSliceCommitted); + return fileSlices.values().stream().filter(this::isFileSliceCommitted); } return Stream.empty(); } @@ -182,7 +181,7 @@ public class HoodieFileGroup implements Serializable { * Obtain the latest file slice, upto an instantTime i.e < maxInstantTime. * * @param maxInstantTime Max Instant Time - * @return + * @return the latest file slice */ public Option<FileSlice> getLatestFileSliceBefore(String maxInstantTime) { return Option.fromJavaOptional(getAllFileSlices().filter(
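The refactor in this commit swaps `fileSlices.entrySet().stream().map(Map.Entry::getValue)` for `fileSlices.values().stream()`. A minimal standalone sketch (plain strings standing in for `FileSlice` objects, class name hypothetical) shows the two forms yield the same elements in the same order on a `TreeMap`:

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class ValuesStreamDemo {
    public static void main(String[] args) {
        // A TreeMap keyed by commit time, mirroring the fileSlices field
        // (values are plain strings here instead of FileSlice objects).
        TreeMap<String, String> fileSlices = new TreeMap<>();
        fileSlices.put("20210401", "slice-1");
        fileSlices.put("20210402", "slice-2");
        fileSlices.put("20210403", "slice-3");

        // Before: stream the entry set and project out the values.
        List<String> before = fileSlices.entrySet().stream()
            .map(Map.Entry::getValue)
            .collect(Collectors.toList());

        // After: stream the values view directly -- same elements, same
        // key-sorted encounter order, one less intermediate mapping step.
        List<String> after = fileSlices.values().stream()
            .collect(Collectors.toList());

        if (!before.equals(after)) throw new AssertionError("not equivalent");
        System.out.println(after); // [slice-1, slice-2, slice-3]
    }
}
```

Since `TreeMap.values()` iterates in key order, the behavior is unchanged; the shorter form simply drops the redundant `entrySet`-to-value mapping.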
[GitHub] [hudi] yanghua merged pull request #2781: [HUDI-1773] HoodieFileGroup code optimize
yanghua merged pull request #2781: URL: https://github.com/apache/hudi/pull/2781
[jira] [Closed] (HUDI-1772) HoodieFileGroupId compareTo logical error(fileId self compare)
[ https://issues.apache.org/jira/browse/HUDI-1772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] vinoyang closed HUDI-1772. -- Resolution: Fixed f4f9dd9d83a6a852c0e733802c6c49747cde5531 > HoodieFileGroupId compareTo logical error(fileId self compare) > -- > > Key: HUDI-1772 > URL: https://issues.apache.org/jira/browse/HUDI-1772 > Project: Apache Hudi > Issue Type: Bug > Components: Common Core >Reporter: 谢波 >Assignee: 谢波 >Priority: Major > Labels: pull-request-available > Fix For: 0.9.0 > >
[hudi] branch master updated (dadd081 -> f4f9dd9)
This is an automated email from the ASF dual-hosted git repository. vinoyang pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git. from dadd081 [HUDI-1751] DeltaStreamer print many unnecessary warn log (#2754) add f4f9dd9 [HUDI-1772] HoodieFileGroupId compareTo logical error(fileId self compare) (#2780) No new revisions were added by this update. Summary of changes: .../src/main/java/org/apache/hudi/common/model/HoodieFileGroupId.java | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
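The one-line fix in HoodieFileGroupId.java is not quoted in this digest, so the following is a hypothetical reconstruction of the "self compare" pattern the ticket title describes: a two-field `compareTo` that accidentally compares `fileId` against itself instead of the other object's. The class below is a simplified stand-in, not the actual Hudi source.

```java
// Sketch of the bug class: a composite id ordered by partitionPath, then fileId.
public class FileGroupId implements Comparable<FileGroupId> {
    private final String partitionPath;
    private final String fileId;

    FileGroupId(String partitionPath, String fileId) {
        this.partitionPath = partitionPath;
        this.fileId = fileId;
    }

    @Override
    public int compareTo(FileGroupId o) {
        int c = partitionPath.compareTo(o.partitionPath);
        if (c != 0) {
            return c;
        }
        // Buggy form: fileId.compareTo(fileId) -- always 0, so two distinct
        // file ids in the same partition would compare as equal.
        return fileId.compareTo(o.fileId); // corrected: compare against o.fileId
    }

    public static void main(String[] args) {
        FileGroupId a = new FileGroupId("2021/04/07", "f1");
        FileGroupId b = new FileGroupId("2021/04/07", "f2");
        if (a.compareTo(b) >= 0) throw new AssertionError("ordering broken");
        System.out.println("f1 sorts before f2 as expected");
    }
}
```

With the self-compare, any sorted collection of ids within one partition would treat all file groups as duplicates, which is why the one-character fix matters.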
[GitHub] [hudi] yanghua merged pull request #2780: [HUDI-1772] HoodieFileGroupId compareTo logical error(fileId self compare)
yanghua merged pull request #2780: URL: https://github.com/apache/hudi/pull/2780
[GitHub] [hudi] ztcheck edited a comment on issue #2680: [SUPPORT]Hive sync error by using run_sync_tool.sh
ztcheck edited a comment on issue #2680: URL: https://github.com/apache/hudi/issues/2680#issuecomment-814772442 @n3nash, yes, `hudi-hive-sync-bundle` is already in the script `run_sync_tool.sh`. I use the default value of `HUDI_HIVE_UBER_JAR` in the script, like this: ' HUDI_HIVE_UBER_JAR=`ls -c $DIR/../../packaging/hudi-hive-sync-bundle/target/hudi-hive-sync-*.jar | grep -v source | head -1` '
[GitHub] [hudi] ztcheck edited a comment on issue #2680: [SUPPORT]Hive sync error by using run_sync_tool.sh
ztcheck edited a comment on issue #2680: URL: https://github.com/apache/hudi/issues/2680#issuecomment-800020976 My environment is k8s.