[jira] [Created] (HUDI-1551) Support Partition with BigDecimal field

2021-01-25 Thread Chanh Le (Jira)
Chanh Le created HUDI-1551:
--

 Summary: Support Partition with BigDecimal field
 Key: HUDI-1551
 URL: https://issues.apache.org/jira/browse/HUDI-1551
 Project: Apache Hudi
  Issue Type: New Feature
  Components: newbie
Reporter: Chanh Le
 Fix For: 0.7.0


In my data the time indicator field is a BigDecimal; since it comes from trading 
data, the records need more precision than usual.

I would like to add support for partitioning based on this field type in 
TimestampBasedKeyGenerator.
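For illustration, a minimal sketch of what partitioning on such a field could look like. The helper name, the assumption that the BigDecimal holds fractional epoch seconds, and the yyyy/MM/dd output format are all hypothetical, not part of TimestampBasedKeyGenerator:

```java
import java.math.BigDecimal;
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class BigDecimalPartitionSketch {

  // Hypothetical helper: interpret a BigDecimal as fractional epoch seconds
  // (common for high-precision trading timestamps) and format the date part
  // into a yyyy/MM/dd partition path, discarding the sub-second precision.
  static String toPartitionPath(BigDecimal epochSeconds) {
    long millis = epochSeconds.multiply(BigDecimal.valueOf(1000)).longValue();
    return DateTimeFormatter.ofPattern("yyyy/MM/dd")
        .withZone(ZoneOffset.UTC)
        .format(Instant.ofEpochMilli(millis));
  }

  public static void main(String[] args) {
    // 2021-01-25 00:00:00 UTC with microsecond precision
    System.out.println(toPartitionPath(new BigDecimal("1611532800.123456")));
  }
}
```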

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] shenh062326 commented on pull request #2382: [HUDI-1477] Support CopyOnWriteTable in java client

2021-01-25 Thread GitBox


shenh062326 commented on pull request #2382:
URL: https://github.com/apache/hudi/pull/2382#issuecomment-767297371


   > @shenh062326 Thanks for your contribution, would you please add some tests 
to verify the java client functionality?
   
   Added TestJavaCopyOnWriteActionExecutor.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] vinothchandar commented on issue #2013: [SUPPORT] MoR tables SparkDataSource Incremental Querys

2021-01-25 Thread GitBox


vinothchandar commented on issue #2013:
URL: https://github.com/apache/hudi/issues/2013#issuecomment-767265499


   This is now out in the 0.7.0 release. 
   
   See this test for examples: 
https://github.com/apache/hudi/blame/release-0.7.0/hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestMORDataSource.scala#L183







[GitHub] [hudi] codecov-io edited a comment on pull request #2487: [WIP HUDI-53] Adding Record Level Index based on hoodie backed table

2021-01-25 Thread GitBox


codecov-io edited a comment on pull request #2487:
URL: https://github.com/apache/hudi/pull/2487#issuecomment-767228748











[GitHub] [hudi] nsivabalan commented on issue #1962: [SUPPORT] Unable to filter hudi table in hive on partition column

2021-01-25 Thread GitBox


nsivabalan commented on issue #1962:
URL: https://github.com/apache/hudi/issues/1962#issuecomment-767209175


   @bvaradar : I guess you missed following up on this thread. Can you check it 
out and respond when you can?







[GitHub] [hudi] nsivabalan commented on issue #1981: [SUPPORT] Huge performance Difference Between Hudi and Regular Parquet in Athena

2021-01-25 Thread GitBox


nsivabalan commented on issue #1981:
URL: https://github.com/apache/hudi/issues/1981#issuecomment-767206596


   @vinothchandar @umehrot2 : can either of you respond here w.r.t. metadata 
support (RFC-15) in Athena? When can we possibly expect it?







[GitHub] [hudi] jingweiz2017 commented on issue #1971: Schema evoluation causes issue when using kafka source in hudi deltastreamer

2021-01-25 Thread GitBox


jingweiz2017 commented on issue #1971:
URL: https://github.com/apache/hudi/issues/1971#issuecomment-767242422


   @nsivabalan @bvaradar , thanks for the reply. The commit mentioned by 
bvaradar should work for my case.







[GitHub] [hudi] wangxianghu commented on a change in pull request #2431: [HUDI-1526]translate the api partitionBy to hoodie.datasource.write.partitionpath.field

2021-01-25 Thread GitBox


wangxianghu commented on a change in pull request #2431:
URL: https://github.com/apache/hudi/pull/2431#discussion_r563537637



##
File path: hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/DataSourceOptions.scala
##
@@ -181,16 +183,33 @@ object DataSourceWriteOptions {
   @Deprecated
   val DEFAULT_STORAGE_TYPE_OPT_VAL = COW_STORAGE_TYPE_OPT_VAL

-  def translateStorageTypeToTableType(optParams: Map[String, String]) : Map[String, String] = {
+  def translateOptParams(optParams: Map[String, String]): Map[String, String] = {
+    // translate StorageType to TableType
+    var newOptParams = optParams
     if (optParams.contains(STORAGE_TYPE_OPT_KEY) && !optParams.contains(TABLE_TYPE_OPT_KEY)) {
       log.warn(STORAGE_TYPE_OPT_KEY + " is deprecated and will be removed in a later release; Please use " + TABLE_TYPE_OPT_KEY)
-      optParams ++ Map(TABLE_TYPE_OPT_KEY -> optParams(STORAGE_TYPE_OPT_KEY))
-    } else {
-      optParams
+      newOptParams = optParams ++ Map(TABLE_TYPE_OPT_KEY -> optParams(STORAGE_TYPE_OPT_KEY))
     }
+    // translate the api partitionBy of spark DataFrameWriter to PARTITIONPATH_FIELD_OPT_KEY
+    if (optParams.contains(SparkDataSourceUtils.PARTITIONING_COLUMNS_KEY) && !optParams.contains(PARTITIONPATH_FIELD_OPT_KEY)) {
+      val partitionColumns = optParams.get(SparkDataSourceUtils.PARTITIONING_COLUMNS_KEY)
+        .map(SparkDataSourceUtils.decodePartitioningColumns)
+        .getOrElse(Nil)
+
+      val keyGeneratorClass = optParams.getOrElse(DataSourceWriteOptions.KEYGENERATOR_CLASS_OPT_KEY,
+        DataSourceWriteOptions.DEFAULT_KEYGENERATOR_CLASS_OPT_VAL)
+      val partitionPathField =
+        keyGeneratorClass match {
+          case "org.apache.hudi.keygen.CustomKeyGenerator" =>
+            partitionColumns.map(e => s"$e:SIMPLE").mkString(",")

Review comment:
   we cannot simply append `SIMPLE` to the `partitionBy` field. When the user 
uses `CustomKeyGenerator` and the partition path field is of timestamp type, 
the suffix after the `partitionBy` field should be `TIMESTAMP`.

##
File path: hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/DataSourceOptions.scala
##
@@ -181,16 +183,33 @@ object DataSourceWriteOptions {
   @Deprecated
   val DEFAULT_STORAGE_TYPE_OPT_VAL = COW_STORAGE_TYPE_OPT_VAL

-  def translateStorageTypeToTableType(optParams: Map[String, String]) : Map[String, String] = {
+  def translateOptParams(optParams: Map[String, String]): Map[String, String] = {
+    // translate StorageType to TableType
+    var newOptParams = optParams
     if (optParams.contains(STORAGE_TYPE_OPT_KEY) && !optParams.contains(TABLE_TYPE_OPT_KEY)) {
       log.warn(STORAGE_TYPE_OPT_KEY + " is deprecated and will be removed in a later release; Please use " + TABLE_TYPE_OPT_KEY)
-      optParams ++ Map(TABLE_TYPE_OPT_KEY -> optParams(STORAGE_TYPE_OPT_KEY))
-    } else {
-      optParams
+      newOptParams = optParams ++ Map(TABLE_TYPE_OPT_KEY -> optParams(STORAGE_TYPE_OPT_KEY))
     }
+    // translate the api partitionBy of spark DataFrameWriter to PARTITIONPATH_FIELD_OPT_KEY
+    if (optParams.contains(SparkDataSourceUtils.PARTITIONING_COLUMNS_KEY) && !optParams.contains(PARTITIONPATH_FIELD_OPT_KEY)) {
+      val partitionColumns = optParams.get(SparkDataSourceUtils.PARTITIONING_COLUMNS_KEY)
+        .map(SparkDataSourceUtils.decodePartitioningColumns)
+        .getOrElse(Nil)
+
+      val keyGeneratorClass = optParams.getOrElse(DataSourceWriteOptions.KEYGENERATOR_CLASS_OPT_KEY,
+        DataSourceWriteOptions.DEFAULT_KEYGENERATOR_CLASS_OPT_VAL)
+      val partitionPathField =
+        keyGeneratorClass match {
+          case "org.apache.hudi.keygen.CustomKeyGenerator" =>
+            partitionColumns.map(e => s"$e:SIMPLE").mkString(",")

Review comment:
   > @wangxianghu Thank you for your review. My opinion is this: in keeping 
with Spark conventions, the partition field value corresponding to partitionBy 
is the original value, so SIMPLE is the default. If we automatically inferred 
whether to use TIMESTAMP based on the field type, the rules would be hard to 
determine. For example, if a field is a long, do we need to convert it to 
TIMESTAMP? If we convert it but the value is not a timestamp, an error will be 
reported, so SIMPLE is used by default. Users who want TIMESTAMP can specify it 
directly via `hoodie.datasource.write.partitionpath.field`.
   
   yes, I get your point. We'd better support both `SIMPLE` and `TIMESTAMP` 
type partitionpath in a unified way.
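One way to sketch such a unified approach: build the value for `hoodie.datasource.write.partitionpath.field` from the `partitionBy` columns, defaulting to `SIMPLE` but letting the user override the type suffix per column. The helper and the override map are hypothetical illustrations, not the actual patch:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class PartitionPathOptionSketch {

  // Hypothetical helper: join partition columns into the "field:TYPE,..."
  // form expected by CustomKeyGenerator, using SIMPLE unless the caller
  // overrides a column's type (e.g. TIMESTAMP for a timestamp field).
  static String toPartitionPathField(List<String> columns, Map<String, String> typeOverrides) {
    return columns.stream()
        .map(c -> c + ":" + typeOverrides.getOrDefault(c, "SIMPLE"))
        .collect(Collectors.joining(","));
  }

  public static void main(String[] args) {
    Map<String, String> overrides = new HashMap<>();
    overrides.put("ts", "TIMESTAMP");
    // region keeps the SIMPLE default; ts is explicitly marked TIMESTAMP
    System.out.println(toPartitionPathField(Arrays.asList("region", "ts"), overrides));
  }
}
```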









[GitHub] [hudi] nsivabalan closed issue #1958: [SUPPORT] Global Indexes return old partition value when querying Hive tables

2021-01-25 Thread GitBox


nsivabalan closed issue #1958:
URL: https://github.com/apache/hudi/issues/1958


   







[GitHub] [hudi] codecov-io edited a comment on pull request #2438: [HUDI-1447] DeltaStreamer kafka source supports consuming from specified timestamp

2021-01-25 Thread GitBox


codecov-io edited a comment on pull request #2438:
URL: https://github.com/apache/hudi/pull/2438#issuecomment-759677298











[GitHub] [hudi] rubenssoto commented on issue #2484: [SUPPORT] Hudi Write Performance

2021-01-25 Thread GitBox


rubenssoto commented on issue #2484:
URL: https://github.com/apache/hudi/issues/2484#issuecomment-767143513











[GitHub] [hudi] nsivabalan commented on issue #1982: [SUPPORT] Not able to write to ADLS Gen2 in Azure Databricks, with error has invalid authority.

2021-01-25 Thread GitBox


nsivabalan commented on issue #1982:
URL: https://github.com/apache/hudi/issues/1982#issuecomment-767205667


   @Ac-Rush : would you mind updating the ticket?







[GitHub] [hudi] vinothchandar commented on pull request #2488: 0.7.0 Doc Revamp

2021-01-25 Thread GitBox


vinothchandar commented on pull request #2488:
URL: https://github.com/apache/hudi/pull/2488#issuecomment-767158167


   I am going to also cut the release versions for the doc, once I finalize 
everything w.r.t the release. 







[GitHub] [hudi] nsivabalan commented on issue #1958: [SUPPORT] Global Indexes return old partition value when querying Hive tables

2021-01-25 Thread GitBox


nsivabalan commented on issue #1958:
URL: https://github.com/apache/hudi/issues/1958#issuecomment-767210126


   https://github.com/apache/hudi/pull/1978 has fixed it.







[GitHub] [hudi] Karl-WangSK commented on pull request #2260: [HUDI-1381] Schedule compaction based on time elapsed

2021-01-25 Thread GitBox


Karl-WangSK commented on pull request #2260:
URL: https://github.com/apache/hudi/pull/2260#issuecomment-767261660


   cc @yanghua 







[GitHub] [hudi] nsivabalan commented on a change in pull request #2487: [WIP HUDI-53] Adding Record Level Index based on hoodie backed table

2021-01-25 Thread GitBox


nsivabalan commented on a change in pull request #2487:
URL: https://github.com/apache/hudi/pull/2487#discussion_r564142151



##
File path: hudi-common/src/main/java/org/apache/hudi/index/HoodieRecordLevelIndexPayload.java
##
@@ -0,0 +1,98 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.index;
+
+import org.apache.hudi.avro.model.HoodieRecordLevelIndexRecord;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.common.util.Option;
+
+import org.apache.avro.Schema;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.avro.generic.IndexedRecord;
+
+import java.io.IOException;
+
+/**
+ * Payload used in index table for Hoodie Record level index.
+ */
+public class HoodieRecordLevelIndexPayload implements HoodieRecordPayload<HoodieRecordLevelIndexPayload> {
+
+  private String key;
+  private String partitionPath;
+  private String instantTime;
+  private String fileId;
+
+  public HoodieRecordLevelIndexPayload(Option<GenericRecord> record) {
+    if (record.isPresent()) {
+      // This can be simplified using SpecificData.deepcopy once this bug is fixed
+      // https://issues.apache.org/jira/browse/AVRO-1811
+      key = record.get().get("key").toString();
+      partitionPath = record.get().get("partitionPath").toString();
+      instantTime = record.get().get("instantTime").toString();
+      fileId = record.get().get("fileId").toString();
+    }
+  }
+
+  private HoodieRecordLevelIndexPayload(String key, String partitionPath, String instantTime, String fileId) {
+    this.key = key;
+    this.partitionPath = partitionPath;
+    this.instantTime = instantTime;
+    this.fileId = fileId;
+  }
+
+  @Override
+  public HoodieRecordLevelIndexPayload preCombine(HoodieRecordLevelIndexPayload another) {
+    if (this.instantTime.compareTo(another.instantTime) >= 0) {

Review comment:
   Note: this needs some fixing. Can we just convert the string to a long 
and compare?
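The suggested fix can be sketched as follows. The helper is hypothetical; Hudi instant times are fixed-format timestamp strings such as `20210125093000`, so parsing to a long makes the numeric comparison explicit:

```java
public class InstantTimePreCombineSketch {

  // Pick the payload with the later instant time. Lexicographic comparison
  // of the strings works only while lengths match; parsing to long (as the
  // review suggests) states the numeric intent directly.
  static String pickLatest(String instantA, String instantB) {
    return Long.parseLong(instantA) >= Long.parseLong(instantB) ? instantA : instantB;
  }

  public static void main(String[] args) {
    System.out.println(pickLatest("20210125093000", "20210124093000"));
  }
}
```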









[GitHub] [hudi] rubenssoto closed issue #2484: [SUPPORT] Hudi Write Performance

2021-01-25 Thread GitBox


rubenssoto closed issue #2484:
URL: https://github.com/apache/hudi/issues/2484


   







[GitHub] [hudi] codecov-io commented on pull request #2486: Filtering abnormal data which the recordKeyField or precombineField is null in avro format

2021-01-25 Thread GitBox


codecov-io commented on pull request #2486:
URL: https://github.com/apache/hudi/pull/2486#issuecomment-766863772


   # [Codecov](https://codecov.io/gh/apache/hudi/pull/2486?src=pr=h1) Report
   > Merging 
[#2486](https://codecov.io/gh/apache/hudi/pull/2486?src=pr=desc) (5476bf0) 
into 
[master](https://codecov.io/gh/apache/hudi/commit/c4afd179c1983a382b8a5197d800b0f5dba254de?el=desc)
 (c4afd17) will **decrease** coverage by `1.27%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/2486/graphs/tree.svg?width=650=150=pr=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2486?src=pr=tree)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #2486      +/-   ##
   ============================================
   - Coverage     50.18%   48.90%    -1.28%
   + Complexity     3050     2155     -895
   ============================================
     Files           419      266     -153
     Lines         18931    12041    -6890
     Branches       1948     1133     -815
   ============================================
   - Hits           9500     5889    -3611
   + Misses         8656     5715    -2941
   + Partials        775      437     -338
   ```
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | hudicli | `37.21% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudiclient | `100.00% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudicommon | `51.47% <ø> (-0.03%)` | `0.00 <ø> (ø)` | |
   | hudiflink | `?` | `?` | |
   | hudihadoopmr | `?` | `?` | |
   | hudisparkdatasource | `?` | `?` | |
   | hudisync | `?` | `?` | |
   | huditimelineservice | `?` | `?` | |
   | hudiutilities | `?` | `?` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/2486?src=pr=tree) | Coverage Δ 
| Complexity Δ | |
   |---|---|---|---|
   | 
[...e/hudi/common/table/log/HoodieLogFormatWriter.java](https://codecov.io/gh/apache/hudi/pull/2486/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL2xvZy9Ib29kaWVMb2dGb3JtYXRXcml0ZXIuamF2YQ==)
 | `78.12% <0.00%> (-1.57%)` | `26.00% <0.00%> (ø%)` | |
   | 
[.../hadoop/realtime/RealtimeUnmergedRecordReader.java](https://codecov.io/gh/apache/hudi/pull/2486/diff?src=pr=tree#diff-aHVkaS1oYWRvb3AtbXIvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaGFkb29wL3JlYWx0aW1lL1JlYWx0aW1lVW5tZXJnZWRSZWNvcmRSZWFkZXIuamF2YQ==)
 | | | |
   | 
[...hudi/utilities/schema/FilebasedSchemaProvider.java](https://codecov.io/gh/apache/hudi/pull/2486/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9GaWxlYmFzZWRTY2hlbWFQcm92aWRlci5qYXZh)
 | | | |
   | 
[...ties/exception/HoodieIncrementalPullException.java](https://codecov.io/gh/apache/hudi/pull/2486/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2V4Y2VwdGlvbi9Ib29kaWVJbmNyZW1lbnRhbFB1bGxFeGNlcHRpb24uamF2YQ==)
 | | | |
   | 
[...in/java/org/apache/hudi/schema/SchemaProvider.java](https://codecov.io/gh/apache/hudi/pull/2486/diff?src=pr=tree#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9zY2hlbWEvU2NoZW1hUHJvdmlkZXIuamF2YQ==)
 | | | |
   | 
[...udi/utilities/schema/DelegatingSchemaProvider.java](https://codecov.io/gh/apache/hudi/pull/2486/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9EZWxlZ2F0aW5nU2NoZW1hUHJvdmlkZXIuamF2YQ==)
 | | | |
   | 
[...adoop/realtime/RealtimeBootstrapBaseFileSplit.java](https://codecov.io/gh/apache/hudi/pull/2486/diff?src=pr=tree#diff-aHVkaS1oYWRvb3AtbXIvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaGFkb29wL3JlYWx0aW1lL1JlYWx0aW1lQm9vdHN0cmFwQmFzZUZpbGVTcGxpdC5qYXZh)
 | | | |
   | 
[...in/java/org/apache/hudi/hive/HoodieHiveClient.java](https://codecov.io/gh/apache/hudi/pull/2486/diff?src=pr=tree#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvSG9vZGllSGl2ZUNsaWVudC5qYXZh)
 | | | |
   | 
[...hadoop/realtime/RealtimeCompactedRecordReader.java](https://codecov.io/gh/apache/hudi/pull/2486/diff?src=pr=tree#diff-aHVkaS1oYWRvb3AtbXIvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaGFkb29wL3JlYWx0aW1lL1JlYWx0aW1lQ29tcGFjdGVkUmVjb3JkUmVhZGVyLmphdmE=)
 | | | |
   | 
[...di/timeline/service/handlers/FileSliceHandler.java](https://codecov.io/gh/apache/hudi/pull/2486/diff?src=pr=tree#diff-aHVkaS10aW1lbGluZS1zZXJ2aWNlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL3RpbWVsaW5lL3NlcnZpY2UvaGFuZGxlcnMvRmlsZVNsaWNlSGFuZGxlci5qYXZh)
 | | | |
   | ... and [142 
more](https://codecov.io/gh/apache/hudi/pull/2486/diff?src=pr=tree-more) | |
   




[GitHub] [hudi] vinothchandar commented on pull request #2442: Adding new configurations in 0.7.0

2021-01-25 Thread GitBox


vinothchandar commented on pull request #2442:
URL: https://github.com/apache/hudi/pull/2442#issuecomment-767102394


   Will close this and open a new one







[GitHub] [hudi] codecov-io edited a comment on pull request #2430: [HUDI-1522] Add a new pipeline for Flink writer

2021-01-25 Thread GitBox


codecov-io edited a comment on pull request #2430:
URL: https://github.com/apache/hudi/pull/2430#issuecomment-757736411











[GitHub] [hudi] vinothchandar commented on pull request #2485: [HUDI-1109] Support Spark Structured Streaming read from Hudi table

2021-01-25 Thread GitBox


vinothchandar commented on pull request #2485:
URL: https://github.com/apache/hudi/pull/2485#issuecomment-766593559


   cc @garyli1019 mind taking a first pass at this PR? :) 







[GitHub] [hudi] codecov-io edited a comment on pull request #2443: [HUDI-1269] Make whether the failure of connect hive affects hudi ingest process configurable

2021-01-25 Thread GitBox


codecov-io edited a comment on pull request #2443:
URL: https://github.com/apache/hudi/pull/2443#issuecomment-760147630











[GitHub] [hudi] codecov-io edited a comment on pull request #2486: Filtering abnormal data which the recordKeyField or precombineField is null in avro format

2021-01-25 Thread GitBox


codecov-io edited a comment on pull request #2486:
URL: https://github.com/apache/hudi/pull/2486#issuecomment-766863772











[GitHub] [hudi] codecov-io commented on pull request #2487: [WIP HUDI-53] Adding Record Level Index based on hoodie backed table

2021-01-25 Thread GitBox


codecov-io commented on pull request #2487:
URL: https://github.com/apache/hudi/pull/2487#issuecomment-767228748


   # [Codecov](https://codecov.io/gh/apache/hudi/pull/2487?src=pr=h1) Report
   > Merging 
[#2487](https://codecov.io/gh/apache/hudi/pull/2487?src=pr=desc) (8b07157) 
into 
[master](https://codecov.io/gh/apache/hudi/commit/e302c6bc12c7eb764781898fdee8ee302ef4ec10?el=desc)
 (e302c6b) will **increase** coverage by `19.24%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/2487/graphs/tree.svg?width=650=150=pr=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2487?src=pr=tree)
   
   ```diff
   @@              Coverage Diff              @@
   ##             master    #2487       +/-   ##
   =============================================
   + Coverage     50.18%   69.43%   +19.24%
   + Complexity     3050      357     -2693
   =============================================
     Files           419       53      -366
     Lines         18931     1930    -17001
     Branches       1948      230     -1718
   =============================================
   - Hits           9500     1340     -8160
   + Misses         8656      456     -8200
   + Partials        775      134      -641
   ```
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | hudicli | `?` | `?` | |
   | hudiclient | `100.00% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudicommon | `?` | `?` | |
   | hudiflink | `?` | `?` | |
   | hudihadoopmr | `?` | `?` | |
   | hudisparkdatasource | `?` | `?` | |
   | hudisync | `?` | `?` | |
   | huditimelineservice | `?` | `?` | |
   | hudiutilities | `69.43% <ø> (ø)` | `0.00 <ø> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/2487?src=pr=tree) | Coverage Δ 
| Complexity Δ | |
   |---|---|---|---|
   | 
[...e/hudi/common/engine/HoodieLocalEngineContext.java](https://codecov.io/gh/apache/hudi/pull/2487/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2VuZ2luZS9Ib29kaWVMb2NhbEVuZ2luZUNvbnRleHQuamF2YQ==)
 | | | |
   | 
[.../org/apache/hudi/MergeOnReadSnapshotRelation.scala](https://codecov.io/gh/apache/hudi/pull/2487/diff?src=pr=tree#diff-aHVkaS1zcGFyay1kYXRhc291cmNlL2h1ZGktc3Bhcmsvc3JjL21haW4vc2NhbGEvb3JnL2FwYWNoZS9odWRpL01lcmdlT25SZWFkU25hcHNob3RSZWxhdGlvbi5zY2FsYQ==)
 | | | |
   | 
[.../org/apache/hudi/exception/HoodieKeyException.java](https://codecov.io/gh/apache/hudi/pull/2487/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvZXhjZXB0aW9uL0hvb2RpZUtleUV4Y2VwdGlvbi5qYXZh)
 | | | |
   | 
[.../apache/hudi/common/bloom/BloomFilterTypeCode.java](https://codecov.io/gh/apache/hudi/pull/2487/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2Jsb29tL0Jsb29tRmlsdGVyVHlwZUNvZGUuamF2YQ==)
 | | | |
   | 
[...able/timeline/versioning/AbstractMigratorBase.java](https://codecov.io/gh/apache/hudi/pull/2487/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3RpbWVsaW5lL3ZlcnNpb25pbmcvQWJzdHJhY3RNaWdyYXRvckJhc2UuamF2YQ==)
 | | | |
   | 
[...rc/main/java/org/apache/hudi/cli/HoodiePrompt.java](https://codecov.io/gh/apache/hudi/pull/2487/diff?src=pr=tree#diff-aHVkaS1jbGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY2xpL0hvb2RpZVByb21wdC5qYXZh)
 | | | |
   | 
[.../org/apache/hudi/common/model/HoodieTableType.java](https://codecov.io/gh/apache/hudi/pull/2487/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL21vZGVsL0hvb2RpZVRhYmxlVHlwZS5qYXZh)
 | | | |
   | 
[.../scala/org/apache/hudi/Spark2RowDeserializer.scala](https://codecov.io/gh/apache/hudi/pull/2487/diff?src=pr=tree#diff-aHVkaS1zcGFyay1kYXRhc291cmNlL2h1ZGktc3BhcmsyL3NyYy9tYWluL3NjYWxhL29yZy9hcGFjaGUvaHVkaS9TcGFyazJSb3dEZXNlcmlhbGl6ZXIuc2NhbGE=)
 | | | |
   | 
[...hudi/common/table/log/block/HoodieDeleteBlock.java](https://codecov.io/gh/apache/hudi/pull/2487/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL2xvZy9ibG9jay9Ib29kaWVEZWxldGVCbG9jay5qYXZh)
 | | | |
   | 
[...cala/org/apache/hudi/HoodieBootstrapRelation.scala](https://codecov.io/gh/apache/hudi/pull/2487/diff?src=pr=tree#diff-aHVkaS1zcGFyay1kYXRhc291cmNlL2h1ZGktc3Bhcmsvc3JjL21haW4vc2NhbGEvb3JnL2FwYWNoZS9odWRpL0hvb2RpZUJvb3RzdHJhcFJlbGF0aW9uLnNjYWxh)
 | | | |
   | ... and [356 
more](https://codecov.io/gh/apache/hudi/pull/2487/diff?src=pr=tree-more) | |
   




[GitHub] [hudi] nsivabalan commented on issue #2013: [SUPPORT] MoR tables SparkDataSource Incremental Querys

2021-01-25 Thread GitBox


nsivabalan commented on issue #2013:
URL: https://github.com/apache/hudi/issues/2013#issuecomment-767204986


   @garyli1019 : can you give any updates you have in this regard?







[GitHub] [hudi] vinothchandar closed issue #2013: [SUPPORT] MoR tables SparkDataSource Incremental Querys

2021-01-25 Thread GitBox


vinothchandar closed issue #2013:
URL: https://github.com/apache/hudi/issues/2013


   







[GitHub] [hudi] kirkuz commented on issue #2323: [SUPPORT] GLOBAL_BLOOM index significantly slowing down processing time

2021-01-25 Thread GitBox


kirkuz commented on issue #2323:
URL: https://github.com/apache/hudi/issues/2323#issuecomment-766649165


   Hi @nsivabalan,
   
   I think we can close this issue for now. I've changed from GLOBAL_BLOOM to a 
SIMPLE index with static partition keys, because GLOBAL_BLOOM was too slow in my 
use case.
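For reference, the switch described above amounts to changing the write options along these lines. `hoodie.index.type` and `hoodie.datasource.write.partitionpath.field` are standard Hudi config keys; the partition field name here is a made-up placeholder:

```java
import java.util.HashMap;
import java.util.Map;

public class IndexConfigSketch {

  // Sketch of the change: select the SIMPLE index instead of GLOBAL_BLOOM
  // and partition on a static key. "partition_key" is a placeholder name.
  static Map<String, String> indexOptions() {
    Map<String, String> hudiOptions = new HashMap<>();
    hudiOptions.put("hoodie.index.type", "SIMPLE"); // was GLOBAL_BLOOM
    hudiOptions.put("hoodie.datasource.write.partitionpath.field", "partition_key");
    return hudiOptions;
  }

  public static void main(String[] args) {
    System.out.println(indexOptions().get("hoodie.index.type"));
  }
}
```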







[GitHub] [hudi] vinothchandar commented on issue #2484: [SUPPORT] Hudi Write Performance

2021-01-25 Thread GitBox


vinothchandar commented on issue #2484:
URL: https://github.com/apache/hudi/issues/2484#issuecomment-767154231











[GitHub] [hudi] nsivabalan commented on issue #1971: Schema evoluation causes issue when using kafka source in hudi deltastreamer

2021-01-25 Thread GitBox


nsivabalan commented on issue #1971:
URL: https://github.com/apache/hudi/issues/1971#issuecomment-767208636


   @jingweiz2017 : can you please check the above response and let us know if 
you need anything more from the Hudi community.







[GitHub] [hudi] vinothchandar closed pull request #2442: Adding new configurations in 0.7.0

2021-01-25 Thread GitBox


vinothchandar closed pull request #2442:
URL: https://github.com/apache/hudi/pull/2442


   







[GitHub] [hudi] rubenssoto commented on pull request #2283: [HUDI-1415] Incorrect query result for hudi hive table when using spa…

2021-01-25 Thread GitBox


rubenssoto commented on pull request #2283:
URL: https://github.com/apache/hudi/pull/2283#issuecomment-767117951


   I had the same problem, but I saw fewer rows, not more.
   Reading with the Spark datasource I get more than 30 million rows, but using 
Spark SQL with Hive only 4 million.
   
   I had this problem only when these two options are enabled:
   
"spark.sql.hive.convertMetastoreParquet": "false"
"spark.hadoop.hoodie.metadata.enable": "true"
   
   @pengzhiwei2018 
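   For reference, a sketch of how those two settings would be supplied when 
building a Spark session (PySpark; illustrative configuration only — whether 
the metadata table interacts badly with `convertMetastoreParquet` is exactly 
what is being reported here):

```python
from pyspark.sql import SparkSession

# Illustrative session setup reproducing the two options mentioned above.
spark = (
    SparkSession.builder
    .appName("hudi-read-check")
    # Read Hive-synced Hudi tables through the Hive SerDe instead of
    # Spark's built-in parquet reader.
    .config("spark.sql.hive.convertMetastoreParquet", "false")
    # Enable Hudi's metadata-table-based file listing.
    .config("spark.hadoop.hoodie.metadata.enable", "true")
    .getOrCreate()
)
```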







[GitHub] [hudi] pengzhiwei2018 commented on pull request #1880: [WIP] [HUDI-1125] build framework to support structured streaming

2021-01-25 Thread GitBox


pengzhiwei2018 commented on pull request #1880:
URL: https://github.com/apache/hudi/pull/1880#issuecomment-766562247


   > Hello,
   > 
   > Hudi will have nice features like clustering, and clustering will probably 
rewrite a lot of data, so is it possible that these rewrites (without new data) 
don't affect downstream consumers of Spark Structured Streaming?
   > 
   > It is something like delta lake has on compaction operation
   > 
   > https://docs.delta.io/latest/best-practices.html
   > 
   > On compaction has .option("dataChange", "false"), so the downstream 
consumer won't be affected.
   > 
   > Thank you.
   
   Hi @leesf  @n3nash @rubenssoto A new PR has been proposed at 
https://github.com/apache/hudi/pull/2485; we can move the discussion there.







[GitHub] [hudi] teeyog commented on a change in pull request #2431: [HUDI-1526]translate the api partitionBy to hoodie.datasource.write.partitionpath.field

2021-01-25 Thread GitBox


teeyog commented on a change in pull request #2431:
URL: https://github.com/apache/hudi/pull/2431#discussion_r563598187



##
File path: 
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/DataSourceOptions.scala
##
@@ -181,16 +183,33 @@ object DataSourceWriteOptions {
   @Deprecated
   val DEFAULT_STORAGE_TYPE_OPT_VAL = COW_STORAGE_TYPE_OPT_VAL
 
-  def translateStorageTypeToTableType(optParams: Map[String, String]) : 
Map[String, String] = {
+  def translateOptParams(optParams: Map[String, String]): Map[String, String] 
= {
+// translate StorageType to TableType
+var newOptParams = optParams
 if (optParams.contains(STORAGE_TYPE_OPT_KEY) && 
!optParams.contains(TABLE_TYPE_OPT_KEY)) {
   log.warn(STORAGE_TYPE_OPT_KEY + " is deprecated and will be removed in a 
later release; Please use " + TABLE_TYPE_OPT_KEY)
-  optParams ++ Map(TABLE_TYPE_OPT_KEY -> optParams(STORAGE_TYPE_OPT_KEY))
-} else {
-  optParams
+  newOptParams = optParams ++ Map(TABLE_TYPE_OPT_KEY -> 
optParams(STORAGE_TYPE_OPT_KEY))
 }
+// translate the api partitionBy of spark DataFrameWriter to 
PARTITIONPATH_FIELD_OPT_KEY
+if (optParams.contains(SparkDataSourceUtils.PARTITIONING_COLUMNS_KEY) && 
!optParams.contains(PARTITIONPATH_FIELD_OPT_KEY)) {
+  val partitionColumns = 
optParams.get(SparkDataSourceUtils.PARTITIONING_COLUMNS_KEY)
+.map(SparkDataSourceUtils.decodePartitioningColumns)
+.getOrElse(Nil)
+
+  val keyGeneratorClass = 
optParams.getOrElse(DataSourceWriteOptions.KEYGENERATOR_CLASS_OPT_KEY,
+DataSourceWriteOptions.DEFAULT_KEYGENERATOR_CLASS_OPT_VAL)
+  val partitionPathField =
+keyGeneratorClass match {
+  case "org.apache.hudi.keygen.CustomKeyGenerator" =>
+partitionColumns.map(e => s"$e:SIMPLE").mkString(",")

Review comment:
   @wangxianghu  Thank you for your review. My opinion is this: in keeping with 
the usual Spark conventions, the partition field value passed to partitionBy is 
used as-is, so SIMPLE is the default. If we automatically inferred whether to 
use TIMESTAMP based on the field type, the rules would be hard to pin down. For 
example, if a field is a long, should it be converted to TIMESTAMP? If it is 
converted but the value is not actually a timestamp, an error will be reported. 
So SIMPLE is used by default; users who want TIMESTAMP can specify it directly 
via ```hoodie.datasource.write.partitionpath.field```.

##
File path: 
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/DataSourceOptions.scala
##
@@ -181,16 +183,33 @@ object DataSourceWriteOptions {
   @Deprecated
   val DEFAULT_STORAGE_TYPE_OPT_VAL = COW_STORAGE_TYPE_OPT_VAL
 
-  def translateStorageTypeToTableType(optParams: Map[String, String]) : 
Map[String, String] = {
+  def translateOptParams(optParams: Map[String, String]): Map[String, String] 
= {
+// translate StorageType to TableType
+var newOptParams = optParams
 if (optParams.contains(STORAGE_TYPE_OPT_KEY) && 
!optParams.contains(TABLE_TYPE_OPT_KEY)) {
   log.warn(STORAGE_TYPE_OPT_KEY + " is deprecated and will be removed in a 
later release; Please use " + TABLE_TYPE_OPT_KEY)
-  optParams ++ Map(TABLE_TYPE_OPT_KEY -> optParams(STORAGE_TYPE_OPT_KEY))
-} else {
-  optParams
+  newOptParams = optParams ++ Map(TABLE_TYPE_OPT_KEY -> 
optParams(STORAGE_TYPE_OPT_KEY))
 }
+// translate the api partitionBy of spark DataFrameWriter to 
PARTITIONPATH_FIELD_OPT_KEY
+if (optParams.contains(SparkDataSourceUtils.PARTITIONING_COLUMNS_KEY) && 
!optParams.contains(PARTITIONPATH_FIELD_OPT_KEY)) {
+  val partitionColumns = 
optParams.get(SparkDataSourceUtils.PARTITIONING_COLUMNS_KEY)
+.map(SparkDataSourceUtils.decodePartitioningColumns)
+.getOrElse(Nil)
+
+  val keyGeneratorClass = 
optParams.getOrElse(DataSourceWriteOptions.KEYGENERATOR_CLASS_OPT_KEY,
+DataSourceWriteOptions.DEFAULT_KEYGENERATOR_CLASS_OPT_VAL)
+  val partitionPathField =
+keyGeneratorClass match {
+  case "org.apache.hudi.keygen.CustomKeyGenerator" =>
+partitionColumns.map(e => s"$e:SIMPLE").mkString(",")

Review comment:
   Yes, now if the parameters include ```TIMESTAMP_TYPE_FIELD_PROP``` and 
```TIMESTAMP_OUTPUT_DATE_FORMAT_PROP```, TIMESTAMP is used by default; 
otherwise SIMPLE is used.
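   A minimal pure-Python sketch of the mapping discussed above (the actual 
implementation lives in DataSourceOptions.scala; the function name and the 
exact rule for choosing TIMESTAMP here are illustrative, not the merged logic): 
for CustomKeyGenerator, each partitionBy column is suffixed with `:SIMPLE` by 
default, switching to `:TIMESTAMP` only when the timestamp key-generator 
properties are supplied.

```python
def partition_path_field(partition_columns, opt_params,
                         key_generator="org.apache.hudi.keygen.CustomKeyGenerator"):
    """Sketch: map DataFrameWriter.partitionBy columns to a value for
    hoodie.datasource.write.partitionpath.field (illustrative only)."""
    if key_generator == "org.apache.hudi.keygen.CustomKeyGenerator":
        # Use TIMESTAMP only when both timestamp-related props are present;
        # otherwise fall back to SIMPLE (the raw column value).
        has_ts_props = (
            "hoodie.deltastreamer.keygen.timebased.timestamp.type" in opt_params
            and "hoodie.deltastreamer.keygen.timebased.output.dateformat" in opt_params
        )
        suffix = ":TIMESTAMP" if has_ts_props else ":SIMPLE"
        return ",".join(col + suffix for col in partition_columns)
    # Other key generators take the plain comma-separated column list.
    return ",".join(partition_columns)
```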









[GitHub] [hudi] vinothchandar commented on issue #1829: [SUPPORT] S3 slow file listing causes Hudi read performance.

2021-01-25 Thread GitBox


vinothchandar commented on issue #1829:
URL: https://github.com/apache/hudi/issues/1829#issuecomment-766590769











[GitHub] [hudi] satishkotha commented on a change in pull request #2483: [HUDI-1545] Add test cases for INSERT_OVERWRITE Operation

2021-01-25 Thread GitBox


satishkotha commented on a change in pull request #2483:
URL: https://github.com/apache/hudi/pull/2483#discussion_r563962124



##
File path: 
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestCOWDataSource.scala
##
@@ -198,6 +198,31 @@ class TestCOWDataSource extends HoodieClientTestBase {
   .mode(SaveMode.Append)
   .save(basePath)
 
+val records2 = recordsToStrings(dataGen.generateInserts("002", 5)).toList
+val inputDF2 = spark.read.json(spark.sparkContext.parallelize(records2, 2))
+inputDF2.write.format("org.apache.hudi")
+  .options(commonOpts)
+  .option(DataSourceWriteOptions.OPERATION_OPT_KEY, 
DataSourceWriteOptions.INSERT_OVERWRITE_OPERATION_OPT_VAL)
+  .mode(SaveMode.Append)
+  .save(basePath)
+
+val metaClient = new 
HoodieTableMetaClient(spark.sparkContext.hadoopConfiguration, basePath, true)
+val commits = 
metaClient.getActiveTimeline.filterCompletedInstants().getInstants.toArray
+  .map(instant => (instant.asInstanceOf[HoodieInstant]).getAction)
+assertEquals(2, commits.size)
+assertEquals("commit", commits(0))
+assertEquals("replacecommit", commits(1))

Review comment:
   Hi, can you also read back the records and verify that only records2 shows 
up (i.e., the data in records1 does not show up)?









[GitHub] [hudi] vinothchandar commented on pull request #2111: [HUDI-1234] Insert new records to data files without merging for "Insert" operation.

2021-01-25 Thread GitBox


vinothchandar commented on pull request #2111:
URL: https://github.com/apache/hudi/pull/2111#issuecomment-767103157


   @nsivabalan I thought we were going to get this into 0.7.0? Checked back 
again to see why this was missing.







[GitHub] [hudi] vburenin commented on pull request #2476: [HUDI-1538] Try to init class trying different signatures instead of checking its name

2021-01-25 Thread GitBox


vburenin commented on pull request #2476:
URL: https://github.com/apache/hudi/pull/2476#issuecomment-766947415


   Can anybody merge this PR, please?







[GitHub] [hudi] codecov-io edited a comment on pull request #2431: [HUDI-1526]translate the api partitionBy to hoodie.datasource.write.partitionpath.field

2021-01-25 Thread GitBox


codecov-io edited a comment on pull request #2431:
URL: https://github.com/apache/hudi/pull/2431#issuecomment-757929313











[GitHub] [hudi] cadl closed issue #2063: [SUPPORT] change column type from int to long, schema compatibility check failed

2021-01-25 Thread GitBox


cadl closed issue #2063:
URL: https://github.com/apache/hudi/issues/2063


   







[GitHub] [hudi] nsivabalan commented on issue #2204: [SUPPORT] Hive count(*) query on _rt table failing with exception

2021-01-25 Thread GitBox


nsivabalan commented on issue #2204:
URL: https://github.com/apache/hudi/issues/2204#issuecomment-766437535


   @BalaMahesh : Would you mind updating the ticket? We will close this out in 
a week's time if there is no activity. But feel free to re-open or create a new 
ticket if you have more questions/issues. 







[GitHub] [hudi] nsivabalan commented on issue #2284: [SUPPORT] : Is there a option to achieve SCD 2 in Hudi?

2021-01-25 Thread GitBox


nsivabalan commented on issue #2284:
URL: https://github.com/apache/hudi/issues/2284#issuecomment-766436747


   @sanket-khedikar : can you please let us know whether the suggested 
approaches work for you, or whether you still need more enhancements from Hudi? 
If it's solved, we would appreciate it if you could close this ticket.







[GitHub] [hudi] nsivabalan commented on issue #2330: Concurrent writes from multiple Spark drivers to S3 support

2021-01-25 Thread GitBox


nsivabalan commented on issue #2330:
URL: https://github.com/apache/hudi/issues/2330#issuecomment-766433970


   @vinothchandar  @borislitvak : since we have a tracking jira, do you think 
we can close this? Or is there anything pending to be resolved or discussed? 







[GitHub] [hudi] nsivabalan closed issue #2429: [SUPPORT] S3 throws ConnectionPoolTimeoutException: Timeout waiting for connection from pool when metadata table is turned on

2021-01-25 Thread GitBox


nsivabalan closed issue #2429:
URL: https://github.com/apache/hudi/issues/2429


   







[GitHub] [hudi] xushiyan merged pull request #2478: [HUDI-1476] Introduce unit test infra for java client

2021-01-25 Thread GitBox


xushiyan merged pull request #2478:
URL: https://github.com/apache/hudi/pull/2478


   







[GitHub] [hudi] vinothchandar merged pull request #2481: [MINOR] Removing spring repos from pom

2021-01-25 Thread GitBox


vinothchandar merged pull request #2481:
URL: https://github.com/apache/hudi/pull/2481


   







[GitHub] [hudi] git-raj commented on issue #2284: [SUPPORT] : Is there a option to achieve SCD 2 in Hudi?

2021-01-25 Thread GitBox


git-raj commented on issue #2284:
URL: https://github.com/apache/hudi/issues/2284#issuecomment-766523668


   Using AWS Glue PySpark with Hudi and S3 as the data store: I'm trying to do 
the traditional SCD Type 2, where the old record gets updated with the insert 
datetime in its 'effective to' field and its 'isActive' field becomes 'false', 
and a new row is inserted with the insert datetime in the 'effective from' 
field and 'isActive' set to 'true'. Any solution post, or pointers to solve 
this if possible, is highly appreciated.
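   Hudi's upsert overwrites the matched record in place rather than emitting 
SCD2 rows, so the close-and-insert shaping described above has to be applied to 
the incoming batch before writing. A minimal pure-Python sketch of that logic 
(field names such as `is_active`, `effective_from`, and `effective_to` are 
illustrative, not a Hudi API):

```python
def scd2_merge(current_rows, incoming, now):
    """Expire the currently active row for a business key and append the
    new version (SCD Type 2). Rows are plain dicts for illustration.

    current_rows: list of dicts holding existing table rows.
    incoming:     dict with the new attribute values for one business key.
    now:          timestamp used for effective_from / effective_to.
    """
    out = []
    for row in current_rows:
        if row["key"] == incoming["key"] and row["is_active"]:
            # Close the previously active version of this key.
            row = {**row, "effective_to": now, "is_active": False}
        out.append(row)
    # Insert the new active version of the record.
    out.append({
        "key": incoming["key"],
        "value": incoming["value"],
        "effective_from": now,
        "effective_to": None,
        "is_active": True,
    })
    return out
```

   In a Glue job, the same shaping would typically be a join between the 
incoming batch and the currently active rows, producing both the expired and 
the new rows, before writing the result with Hudi's upsert operation.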







[GitHub] [hudi] nsivabalan commented on issue #2323: [SUPPORT] GLOBAL_BLOOM index significantly slowing down processing time

2021-01-25 Thread GitBox


nsivabalan commented on issue #2323:
URL: https://github.com/apache/hudi/issues/2323#issuecomment-766435871


   @kirkuz: Do you have any updates in this regard? Can you please respond or 
let us know if you have more questions. 







[GitHub] [hudi] nsivabalan closed issue #2480: [SUPPORT] The Docker demo document description is incorrect

2021-01-25 Thread GitBox


nsivabalan closed issue #2480:
URL: https://github.com/apache/hudi/issues/2480


   







[GitHub] [hudi] rubenssoto edited a comment on issue #1829: [SUPPORT] S3 slow file listing causes Hudi read performance.

2021-01-25 Thread GitBox


rubenssoto edited a comment on issue #1829:
URL: https://github.com/apache/hudi/issues/1829#issuecomment-766496187











[GitHub] [hudi] vinothchandar commented on issue #2330: Concurrent writes from multiple Spark drivers to S3 support

2021-01-25 Thread GitBox


vinothchandar commented on issue #2330:
URL: https://github.com/apache/hudi/issues/2330#issuecomment-766441408


   we can close this out 







[GitHub] [hudi] vinothchandar commented on issue #2479: [SUPPORT] Dependency Issue When I Try Build Hudi From Source

2021-01-25 Thread GitBox


vinothchandar commented on issue #2479:
URL: https://github.com/apache/hudi/issues/2479#issuecomment-766369989


   Great. No, thank you for catching it :). Eventually, as the m2 caches are 
lost, I think the build would have failed, maybe a month or so from now :). 
Will merge the fix.







[GitHub] [hudi] rubenssoto commented on issue #1829: [SUPPORT] S3 slow file listing causes Hudi read performance.

2021-01-25 Thread GitBox


rubenssoto commented on issue #1829:
URL: https://github.com/apache/hudi/issues/1829#issuecomment-766496187











[GitHub] [hudi] zherenyu831 commented on issue #2285: [SUPPORT] Exception on snapshot query while compaction (hudi 0.6.0)

2021-01-25 Thread GitBox


zherenyu831 commented on issue #2285:
URL: https://github.com/apache/hudi/issues/2285#issuecomment-766482729


   @bvaradar 
   Hi @bvaradar, it will be a little difficult to replicate the problem, since 
it only happens on huge amounts of data.
   







[GitHub] [hudi] codecov-io edited a comment on pull request #2382: [HUDI-1477] Support CopyOnWriteTable in java client

2021-01-25 Thread GitBox


codecov-io edited a comment on pull request #2382:
URL: https://github.com/apache/hudi/pull/2382#issuecomment-751367927











[GitHub] [hudi] nsivabalan edited a comment on issue #2285: [SUPPORT] Exception on snapshot query while compaction (hudi 0.6.0)

2021-01-25 Thread GitBox


nsivabalan edited a comment on issue #2285:
URL: https://github.com/apache/hudi/issues/2285#issuecomment-766436364


   @zherenyu831 : can you please respond with any updates on your end. 
   @n3nash : can you please take a look when you have time. If you were able to 
narrow down the issue, please do file a jira and add "user-support-issues" 
label. 







[GitHub] [hudi] nsivabalan commented on issue #2135: [SUPPORT] GDPR safe deletes is complex

2021-01-25 Thread GitBox


nsivabalan commented on issue #2135:
URL: https://github.com/apache/hudi/issues/2135#issuecomment-766439085


   @andaag : I have created a Hudi ticket for this. Feel free to update the 
desc of the ticket with more details
   https://issues.apache.org/jira/browse/HUDI-1549
   







[GitHub] [hudi] nsivabalan commented on issue #2123: Timestamp not parsed correctly on Athena

2021-01-25 Thread GitBox


nsivabalan commented on issue #2123:
URL: https://github.com/apache/hudi/issues/2123#issuecomment-766439219


   @satishkotha : when you get a chance, can you please follow up on this? 







[GitHub] [hudi] nsivabalan commented on issue #2285: [SUPPORT] Exception on snapshot query while compaction (hudi 0.6.0)

2021-01-25 Thread GitBox


nsivabalan commented on issue #2285:
URL: https://github.com/apache/hudi/issues/2285#issuecomment-766436364


   @zherenyu831 : can you please respond with any updates on your end. 
   @n3nash : can you take a look when you have time. 







[GitHub] [hudi] nsivabalan commented on issue #2467: [Travis issue] TestJsonStringToHoodieRecordMapFunction.testMapFunction failed

2021-01-25 Thread GitBox


nsivabalan commented on issue #2467:
URL: https://github.com/apache/hudi/issues/2467#issuecomment-766427684


   Have created a tracking jira https://issues.apache.org/jira/browse/HUDI-1547
   







[GitHub] [hudi] nsivabalan commented on issue #2121: [SUPPORT] How to define scehma for data in jsonArray format when using Deltastreamer

2021-01-25 Thread GitBox


nsivabalan commented on issue #2121:
URL: https://github.com/apache/hudi/issues/2121#issuecomment-766439932


   @liujinhui1994 : We already have an [example in our 
HoodieTestDataGenerator](https://github.com/apache/hudi/blob/c4afd179c1983a382b8a5197d800b0f5dba254de/hudi-common/src/test/java/org/apache/hudi/common/testutils/HoodieTestDataGenerator.java#L101)
 using an array type. Let us know if this helps.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] vinothchandar closed issue #2330: Concurrent writes from multiple Spark drivers to S3 support

2021-01-25 Thread GitBox


vinothchandar closed issue #2330:
URL: https://github.com/apache/hudi/issues/2330


   







[GitHub] [hudi] nsivabalan commented on issue #2429: [SUPPORT] S3 throws ConnectionPoolTimeoutException: Timeout waiting for connection from pool when metadata table is turned on

2021-01-25 Thread GitBox


nsivabalan commented on issue #2429:
URL: https://github.com/apache/hudi/issues/2429#issuecomment-766428773


   @vinothchandar : closing this for now. feel free to re-open if you see more 
issues. 







[GitHub] [hudi] vinothchandar commented on issue #2285: [SUPPORT] Exception on snapshot query while compaction (hudi 0.6.0)

2021-01-25 Thread GitBox


vinothchandar commented on issue #2285:
URL: https://github.com/apache/hudi/issues/2285#issuecomment-766450275


   cc @garyli1019 as well







[GitHub] [hudi] nsivabalan commented on issue #2399: [SUPPORT] Hudi deletes not being properly commited

2021-01-25 Thread GitBox


nsivabalan commented on issue #2399:
URL: https://github.com/apache/hudi/issues/2399#issuecomment-766431496


   @afeldman1 : can you please respond when you get a chance? 







[GitHub] [hudi] codecov-io commented on pull request #2485: [HUDI-1109] Support Spark Structured Streaming read from Hudi table

2021-01-25 Thread GitBox


codecov-io commented on pull request #2485:
URL: https://github.com/apache/hudi/pull/2485#issuecomment-766519181


   # [Codecov](https://codecov.io/gh/apache/hudi/pull/2485?src=pr=h1) Report
   > Merging 
[#2485](https://codecov.io/gh/apache/hudi/pull/2485?src=pr=desc) (91cf083) 
into 
[master](https://codecov.io/gh/apache/hudi/commit/e302c6bc12c7eb764781898fdee8ee302ef4ec10?el=desc)
 (e302c6b) will **decrease** coverage by `40.49%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/2485/graphs/tree.svg?width=650=150=pr=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2485?src=pr=tree)
   
   ```diff
    @@              Coverage Diff              @@
    ##             master    #2485        +/-   ##
    ==============================================
    - Coverage      50.18%    9.68%    -40.50%
    + Complexity      3050       48      -3002
    ==============================================
      Files            419       53       -366
      Lines          18931     1930     -17001
      Branches        1948      230      -1718
    ==============================================
    - Hits            9500      187      -9313
    + Misses          8656     1730      -6926
    + Partials         775       13       -762
   ```
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | hudicli | `?` | `?` | |
   | hudiclient | `?` | `?` | |
   | hudicommon | `?` | `?` | |
   | hudiflink | `?` | `?` | |
   | hudihadoopmr | `?` | `?` | |
   | hudisparkdatasource | `?` | `?` | |
   | hudisync | `?` | `?` | |
   | huditimelineservice | `?` | `?` | |
   | hudiutilities | `9.68% <ø> (-59.75%)` | `0.00 <ø> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/2485?src=pr=tree) | Coverage Δ 
| Complexity Δ | |
   |---|---|---|---|
   | 
[...va/org/apache/hudi/utilities/IdentitySplitter.java](https://codecov.io/gh/apache/hudi/pull/2485/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0lkZW50aXR5U3BsaXR0ZXIuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-2.00%)` | |
   | 
[...va/org/apache/hudi/utilities/schema/SchemaSet.java](https://codecov.io/gh/apache/hudi/pull/2485/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9TY2hlbWFTZXQuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-3.00%)` | |
   | 
[...a/org/apache/hudi/utilities/sources/RowSource.java](https://codecov.io/gh/apache/hudi/pull/2485/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvUm93U291cmNlLmphdmE=)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-4.00%)` | |
   | 
[.../org/apache/hudi/utilities/sources/AvroSource.java](https://codecov.io/gh/apache/hudi/pull/2485/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQXZyb1NvdXJjZS5qYXZh)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | |
   | 
[.../org/apache/hudi/utilities/sources/JsonSource.java](https://codecov.io/gh/apache/hudi/pull/2485/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvblNvdXJjZS5qYXZh)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | |
   | 
[...rg/apache/hudi/utilities/sources/CsvDFSSource.java](https://codecov.io/gh/apache/hudi/pull/2485/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQ3N2REZTU291cmNlLmphdmE=)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-10.00%)` | |
   | 
[...g/apache/hudi/utilities/sources/JsonDFSSource.java](https://codecov.io/gh/apache/hudi/pull/2485/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvbkRGU1NvdXJjZS5qYXZh)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-4.00%)` | |
   | 
[...apache/hudi/utilities/sources/JsonKafkaSource.java](https://codecov.io/gh/apache/hudi/pull/2485/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvbkthZmthU291cmNlLmphdmE=)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-6.00%)` | |
   | 
[...pache/hudi/utilities/sources/ParquetDFSSource.java](https://codecov.io/gh/apache/hudi/pull/2485/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvUGFycXVldERGU1NvdXJjZS5qYXZh)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-5.00%)` | |
   | 
[...lities/schema/SchemaProviderWithPostProcessor.java](https://codecov.io/gh/apache/hudi/pull/2485/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9TY2hlbWFQcm92aWRlcldpdGhQb3N0UHJvY2Vzc29yLmphdmE=)
 | `0.00% <0.00%> 

[GitHub] [hudi] nsivabalan edited a comment on issue #2100: [SUPPORT] 0.6.0 - using keytab authentication gives issues

2021-01-25 Thread GitBox


nsivabalan edited a comment on issue #2100:
URL: https://github.com/apache/hudi/issues/2100#issuecomment-766440534


   @n3nash @bhasudha : sorry, the thread is a bit long, so I couldn't gauge it 
correctly. I see some workarounds have been proposed and they worked. But do we 
need any fixes in Hudi in general? If yes, can you file a Jira and close this 
out?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] nsivabalan commented on issue #2329: [SUPPORT] Time Travel (querying the historical versions of data) ability for Hudi Table

2021-01-25 Thread GitBox


nsivabalan commented on issue #2329:
URL: https://github.com/apache/hudi/issues/2329#issuecomment-766435383


   https://issues.apache.org/jira/browse/HUDI-1460
   







[GitHub] [hudi] nsivabalan commented on issue #2480: [SUPPORT] The Docker demo document description is incorrect

2021-01-25 Thread GitBox


nsivabalan commented on issue #2480:
URL: https://github.com/apache/hudi/issues/2480#issuecomment-766427153


   Sure, will take it up. 
   Closing it as we have a tracking jira. 
https://issues.apache.org/jira/browse/HUDI-1546
   







[GitHub] [hudi] nsivabalan edited a comment on issue #2121: [SUPPORT] How to define scehma for data in jsonArray format when using Deltastreamer

2021-01-25 Thread GitBox


nsivabalan edited a comment on issue #2121:
URL: https://github.com/apache/hudi/issues/2121#issuecomment-766439932


   @liujinhui1994 : Sorry about the delay. We already have an [example in our 
HoodieTestDatagenerator](https://github.com/apache/hudi/blob/c4afd179c1983a382b8a5197d800b0f5dba254de/hudi-common/src/test/java/org/apache/hudi/common/testutils/HoodieTestDataGenerator.java#L101)
 using array type. Let us know if this helps.







[GitHub] [hudi] nsivabalan commented on issue #2066: [SUPPORT] Hudi is increasing the storage size big time

2021-01-25 Thread GitBox


nsivabalan commented on issue #2066:
URL: https://github.com/apache/hudi/issues/2066#issuecomment-766449665


   @KarthickAN : did you get a chance to try out the suggestion from Balaji? 
Please do update the issue with any progress. If the issue is resolved, feel 
free to close it out.







[GitHub] [hudi] nsivabalan commented on issue #2367: [SUPPORT] Seek error when querying MOR Tables in GCP

2021-01-25 Thread GitBox


nsivabalan commented on issue #2367:
URL: https://github.com/apache/hudi/issues/2367#issuecomment-766431687


   Sure, sorry about the delay. Will get to this in a day or two. 







[GitHub] [hudi] codecov-io edited a comment on pull request #2485: [HUDI-1109] Support Spark Structured Streaming read from Hudi table

2021-01-25 Thread GitBox


codecov-io edited a comment on pull request #2485:
URL: https://github.com/apache/hudi/pull/2485#issuecomment-766519181











[GitHub] [hudi] nsivabalan commented on issue #2331: Why does Hudi not support field deletions?

2021-01-25 Thread GitBox


nsivabalan commented on issue #2331:
URL: https://github.com/apache/hudi/issues/2331#issuecomment-766432877


   @prashantwason : In light of this ticket, do you think we can update our 
documentation wrt schema evolution? If you don't mind, can you take it up and 
fix our documentation? https://issues.apache.org/jira/browse/HUDI-1548







[GitHub] [hudi] nsivabalan commented on issue #2178: [SUPPORT] Hudi writing 10MB worth of org.apache.hudi.bloomfilter data in each of the parquet files produced

2021-01-25 Thread GitBox


nsivabalan commented on issue #2178:
URL: https://github.com/apache/hudi/issues/2178#issuecomment-766438221


   @KarthickAN : hope you got a chance to go through our [blog on indexes in 
Hudi](https://hudi.apache.org/blog/hudi-indexing-mechanisms/). Wrt this gh 
issue, please do let us know if you have any more specific questions. If not, 
we will close this out in a week's time.







[GitHub] [hudi] nsivabalan commented on issue #2100: [SUPPORT] 0.6.0 - using keytab authentication gives issues

2021-01-25 Thread GitBox


nsivabalan commented on issue #2100:
URL: https://github.com/apache/hudi/issues/2100#issuecomment-766440534


   @n3nash @bhasudha : sorry, the thread is a bit long. I see some workarounds 
have been proposed and they worked. But do we need any fixes in Hudi in 
general? If yes, can you file a Jira and close this out?







[GitHub] [hudi] nsivabalan commented on issue #2063: [SUPPORT] change column type from int to long, schema compatibility check failed

2021-01-25 Thread GitBox


nsivabalan commented on issue #2063:
URL: https://github.com/apache/hudi/issues/2063#issuecomment-766449860


   @cadl : did you get a chance to try out the setting? We plan to close out 
this issue due to inactivity in a week's time. But feel free to reopen it or 
create a new ticket if you find any more issues.







[GitHub] [hudi] lshg opened a new issue #2490: spark read hudi data from hive

2021-01-25 Thread GitBox


lshg opened a new issue #2490:
URL: https://github.com/apache/hudi/issues/2490


   package com.gjr.recommend

   import org.apache.spark.sql.hive.HiveContext
   import org.apache.spark.sql.{Row, SparkSession}
   import org.apache.spark.{SparkConf, SparkContext}

   object DWDTenderLog {
     def main(args: Array[String]): Unit = {

       val conf = new SparkConf()
         .setAppName(this.getClass.getSimpleName)
         .setMaster("local[2]")
         .set("spark.executor.memory", "512m")
       val sc: SparkContext = new SparkContext(conf)

       val spark: SparkSession = SparkSession.builder().config(conf).getOrCreate()

       val hc = new HiveContext(sc)
       hc.setConf("spark.sql.crossJoin.enabled", "true")

       val tenderLog: Array[Row] = hc.sql(
         """
           |SELECT
           |  projectid,
           |  provinceid,
           |  typeId,
           |  tender_tag
           |FROM
           |(
           |  SELECT
           |    projectid,
           |    provinceid,
           |    typeId,
           |    antistop
           |  FROM
           |    app.dwd_recommend_tender_ds
           |  WHERE
           |    createTime >= 1608280608479 AND createTime <= 1611628847000
           |    AND antistop != ''
           |  GROUP BY
           |    projectid,
           |    provinceid,
           |    typeId,
           |    antistop
           |) AS a lateral VIEW explode(split(antistop, "#")) table_tmp AS tender_tag
         """.stripMargin).collect()

       println(tenderLog.toBuffer)

       sc.stop()
     }
   }
   
   0[main] INFO  org.apache.spark.SparkContext  - Running Spark version 
2.4.7
   346  [main] INFO  org.apache.spark.SparkContext  - Submitted application: 
DWDTenderLog$
   390  [main] INFO  org.apache.spark.SecurityManager  - Changing view acls to: 
lsh
   390  [main] INFO  org.apache.spark.SecurityManager  - Changing modify acls 
to: lsh
   390  [main] INFO  org.apache.spark.SecurityManager  - Changing view acls 
groups to: 
   390  [main] INFO  org.apache.spark.SecurityManager  - Changing modify acls 
groups to: 
   391  [main] INFO  org.apache.spark.SecurityManager  - SecurityManager: 
authentication disabled; ui acls disabled; users  with view permissions: 
Set(lsh); groups with view permissions: Set(); users  with modify permissions: 
Set(lsh); groups with modify permissions: Set()
   2533 [main] INFO  org.apache.spark.util.Utils  - Successfully started 
service 'sparkDriver' on port 54347.
   2575 [main] INFO  org.apache.spark.SparkEnv  - Registering MapOutputTracker
   2588 [main] INFO  org.apache.spark.SparkEnv  - Registering BlockManagerMaster
   2589 [main] INFO  org.apache.spark.storage.BlockManagerMasterEndpoint  - 
Using org.apache.spark.storage.DefaultTopologyMapper for getting topology 
information
   2590 [main] INFO  org.apache.spark.storage.BlockManagerMasterEndpoint  - 
BlockManagerMasterEndpoint up
   2596 [main] INFO  org.apache.spark.storage.DiskBlockManager  - Created local 
directory at 
C:\Users\lsh\AppData\Local\Temp\blockmgr-d134fb11-0552-4b4b-8f20-ea7e04fd086d
   2609 [main] INFO  org.apache.spark.storage.memory.MemoryStore  - MemoryStore 
started with capacity 1979.1 MB
   2619 [main] INFO  org.apache.spark.SparkEnv  - Registering 
OutputCommitCoordinator
   2675 [main] INFO  org.spark_project.jetty.util.log  - Logging initialized 
@23630ms
   2720 [main] INFO  org.spark_project.jetty.server.Server  - 
jetty-9.3.z-SNAPSHOT, build timestamp: 2019-02-16T00:53:49+08:00, git hash: 
eb70b240169fcf1abbd86af36482d1c49826fa0b
   2731 [main] INFO  org.spark_project.jetty.server.Server  - Started @23687ms
   2747 [main] INFO  org.spark_project.jetty.server.AbstractConnector  - 
Started ServerConnector@4d63b624{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
   2747 [main] INFO  org.apache.spark.util.Utils  - Successfully started 
service 'SparkUI' on port 4040.
   2767 [main] INFO  org.spark_project.jetty.server.handler.ContextHandler  - 
Started o.s.j.s.ServletContextHandler@27eb3298{/jobs,null,AVAILABLE,@Spark}
   2768 [main] INFO  org.spark_project.jetty.server.handler.ContextHandler  - 
Started o.s.j.s.ServletContextHandler@1b58ff9e{/jobs/json,null,AVAILABLE,@Spark}
   2768 [main] INFO  org.spark_project.jetty.server.handler.ContextHandler  - 
Started o.s.j.s.ServletContextHandler@2f66e802{/jobs/job,null,AVAILABLE,@Spark}
   2769 [main] INFO  org.spark_project.jetty.server.handler.ContextHandler  - 
Started 
o.s.j.s.ServletContextHandler@76318a7d{/jobs/job/json,null,AVAILABLE,@Spark}
   2770 [main] INFO  org.spark_project.jetty.server.handler.ContextHandler  - 
Started o.s.j.s.ServletContextHandler@2a492f2a{/stages,null,AVAILABLE,@Spark}
   2770 [main] INFO  org.spark_project.jetty.server.handler.ContextHandler  - 
Started 
o.s.j.s.ServletContextHandler@3277e499{/stages/json,null,AVAILABLE,@Spark}
   2771 [main] INFO  org.spark_project.jetty.server.handler.ContextHandler  - 
Started 
o.s.j.s.ServletContextHandler@585811a4{/stages/stage,null,AVAILABLE,@Spark}
   2772 [main] 

[GitHub] [hudi] lshg opened a new issue #2489: [SUPPORT]

2021-01-25 Thread GitBox


lshg opened a new issue #2489:
URL: https://github.com/apache/hudi/issues/2489


   hive (app)> SELECT
 > projectid,
 > provinceid,
 > typeId,
 > antistop
 > FROM
 > app.dwd_recommend_tender_ds
 > WHERE
 > createTime >= 1608280608479 
 > AND antistop != '' limit 2;
   OK
   projectid   provinceid  typeid  antistop
   7876350 15  9   装修#扩建
   7876350 15  9   装修#扩建
   Time taken: 0.133 seconds, Fetched: 2 row(s)
   
   it works!
   
   but!
   
   hive (app)> select count(1) as total from app.dwd_recommend_tender_ds;
   Query ID = root_20210126113448_acb65504-b31b-4309-9d65-39c48743326e
   Total jobs = 1
   Launching Job 1 out of 1
   Tez session was closed. Reopening...
   Session re-established.
   
--
   VERTICES  MODESTATUS  TOTAL  COMPLETED  RUNNING  PENDING 
 FAILED  KILLED  
   
--
   Map 1container  INITIALIZING -1  00   -1 
  0   0  
   Reducer 2containerINITED  1  001 
  0   0  
   
--
   VERTICES: 00/02  [>>--] 0%ELAPSED TIME: 0.55 s   
  
   
--
   Status: Failed
   Vertex failed, vertexName=Map 1, vertexId=vertex_1611208773582_0084_1_00, 
diagnostics=[Vertex vertex_1611208773582_0084_1_00 [Map 1] killed/failed due 
to:ROOT_INPUT_INIT_FAILURE, Vertex Input: dwd_recommend_tender_ds initializer 
failed, vertex=vertex_1611208773582_0084_1_00 [Map 1], 
java.lang.NoSuchMethodError: 
org.apache.hudi.common.table.timeline.HoodieActiveTimeline.getCommitsTimeline()Lorg/apache/hudi/common/table/HoodieTimeline;
   at 
org.apache.hudi.hadoop.HoodieParquetInputFormat.filterFileStatusForSnapshotMode(HoodieParquetInputFormat.java:238)
   at 
org.apache.hudi.hadoop.HoodieParquetInputFormat.listStatus(HoodieParquetInputFormat.java:110)
   at 
org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:315)
   at 
org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:442)
   at 
org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:561)
   at 
org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:196)
   at 
org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:278)
   at 
org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:269)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:422)
   at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
   at 
org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:269)
   at 
org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:253)
   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
   at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
   at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
   at java.lang.Thread.run(Thread.java:748)
   ]
   Vertex killed, vertexName=Reducer 2, 
vertexId=vertex_1611208773582_0084_1_01, diagnostics=[Vertex received Kill in 
INITED state., Vertex vertex_1611208773582_0084_1_01 [Reducer 2] killed/failed 
due to:OTHER_VERTEX_FAILURE]
   DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:1
   FAILED: Execution Error, return code 2 from 
org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 1, 
vertexId=vertex_1611208773582_0084_1_00, diagnostics=[Vertex 
vertex_1611208773582_0084_1_00 [Map 1] killed/failed due 
to:ROOT_INPUT_INIT_FAILURE, Vertex Input: dwd_recommend_tender_ds initializer 
failed, vertex=vertex_1611208773582_0084_1_00 [Map 1], 
java.lang.NoSuchMethodError: 
org.apache.hudi.common.table.timeline.HoodieActiveTimeline.getCommitsTimeline()Lorg/apache/hudi/common/table/HoodieTimeline;
   at 
org.apache.hudi.hadoop.HoodieParquetInputFormat.filterFileStatusForSnapshotMode(HoodieParquetInputFormat.java:238)
   at 
org.apache.hudi.hadoop.HoodieParquetInputFormat.listStatus(HoodieParquetInputFormat.java:110)
   at 

[GitHub] [hudi] vinothchandar commented on issue #2013: [SUPPORT] MoR tables SparkDataSource Incremental Querys

2021-01-25 Thread GitBox


vinothchandar commented on issue #2013:
URL: https://github.com/apache/hudi/issues/2013#issuecomment-767265499


   This is now out in the 0.7.0 release. 
   
   See 
https://github.com/apache/hudi/blame/release-0.7.0/hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestMORDataSource.scala#L183
 this test for examples
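
   For readers following up, the incremental read exercised in that test is 
driven by Hudi's DataSource read options; a minimal sketch of the relevant 
settings (the begin instant below is a placeholder, not a value from this 
thread):

   ```
   hoodie.datasource.query.type=incremental
   hoodie.datasource.read.begin.instanttime=<commit instant to start from>
   ```

   Commits strictly after the begin instant are returned; these options are 
typically passed via `option(...)` on `spark.read.format("hudi")` before 
`load(basePath)`.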







[GitHub] [hudi] vinothchandar closed issue #2013: [SUPPORT] MoR tables SparkDataSource Incremental Querys

2021-01-25 Thread GitBox


vinothchandar closed issue #2013:
URL: https://github.com/apache/hudi/issues/2013


   







[GitHub] [hudi] Karl-WangSK commented on pull request #2260: [HUDI-1381] Schedule compaction based on time elapsed

2021-01-25 Thread GitBox


Karl-WangSK commented on pull request #2260:
URL: https://github.com/apache/hudi/pull/2260#issuecomment-767261660


   cc @yanghua 







[GitHub] [hudi] jingweiz2017 commented on issue #1971: Schema evoluation causes issue when using kafka source in hudi deltastreamer

2021-01-25 Thread GitBox


jingweiz2017 commented on issue #1971:
URL: https://github.com/apache/hudi/issues/1971#issuecomment-767242422


   @nsivabalan @bvaradar , thanks for the reply. The commit mentioned by 
bvaradar should work for my case. 







[GitHub] [hudi] codecov-io edited a comment on pull request #2487: [WIP HUDI-53] Adding Record Level Index based on hoodie backed table

2021-01-25 Thread GitBox


codecov-io edited a comment on pull request #2487:
URL: https://github.com/apache/hudi/pull/2487#issuecomment-767228748











[GitHub] [hudi] codecov-io commented on pull request #2487: [WIP HUDI-53] Adding Record Level Index based on hoodie backed table

2021-01-25 Thread GitBox


codecov-io commented on pull request #2487:
URL: https://github.com/apache/hudi/pull/2487#issuecomment-767228748


   # [Codecov](https://codecov.io/gh/apache/hudi/pull/2487?src=pr=h1) Report
   > Merging 
[#2487](https://codecov.io/gh/apache/hudi/pull/2487?src=pr=desc) (8b07157) 
into 
[master](https://codecov.io/gh/apache/hudi/commit/e302c6bc12c7eb764781898fdee8ee302ef4ec10?el=desc)
 (e302c6b) will **increase** coverage by `19.24%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/2487/graphs/tree.svg?width=650=150=pr=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2487?src=pr=tree)
   
   ```diff
   @@  Coverage Diff  @@
    ##    master    #2487       +/-   ##
   =
   + Coverage 50.18%   69.43%   +19.24% 
   + Complexity 3050  357 -2693 
   =
 Files   419   53  -366 
 Lines 18931 1930-17001 
 Branches   1948  230 -1718 
   =
   - Hits   9500 1340 -8160 
   + Misses 8656  456 -8200 
   + Partials775  134  -641 
   ```
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | hudicli | `?` | `?` | |
   | hudiclient | `100.00% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudicommon | `?` | `?` | |
   | hudiflink | `?` | `?` | |
   | hudihadoopmr | `?` | `?` | |
   | hudisparkdatasource | `?` | `?` | |
   | hudisync | `?` | `?` | |
   | huditimelineservice | `?` | `?` | |
   | hudiutilities | `69.43% <ø> (ø)` | `0.00 <ø> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/hudi/pull/2487?src=pr=tree) | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | [...e/hudi/common/engine/HoodieLocalEngineContext.java](https://codecov.io/gh/apache/hudi/pull/2487/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2VuZ2luZS9Ib29kaWVMb2NhbEVuZ2luZUNvbnRleHQuamF2YQ==) | | | |
   | [.../org/apache/hudi/MergeOnReadSnapshotRelation.scala](https://codecov.io/gh/apache/hudi/pull/2487/diff?src=pr=tree#diff-aHVkaS1zcGFyay1kYXRhc291cmNlL2h1ZGktc3Bhcmsvc3JjL21haW4vc2NhbGEvb3JnL2FwYWNoZS9odWRpL01lcmdlT25SZWFkU25hcHNob3RSZWxhdGlvbi5zY2FsYQ==) | | | |
   | [.../org/apache/hudi/exception/HoodieKeyException.java](https://codecov.io/gh/apache/hudi/pull/2487/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvZXhjZXB0aW9uL0hvb2RpZUtleUV4Y2VwdGlvbi5qYXZh) | | | |
   | [.../apache/hudi/common/bloom/BloomFilterTypeCode.java](https://codecov.io/gh/apache/hudi/pull/2487/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2Jsb29tL0Jsb29tRmlsdGVyVHlwZUNvZGUuamF2YQ==) | | | |
   | [...able/timeline/versioning/AbstractMigratorBase.java](https://codecov.io/gh/apache/hudi/pull/2487/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3RpbWVsaW5lL3ZlcnNpb25pbmcvQWJzdHJhY3RNaWdyYXRvckJhc2UuamF2YQ==) | | | |
   | [...rc/main/java/org/apache/hudi/cli/HoodiePrompt.java](https://codecov.io/gh/apache/hudi/pull/2487/diff?src=pr=tree#diff-aHVkaS1jbGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY2xpL0hvb2RpZVByb21wdC5qYXZh) | | | |
   | [.../org/apache/hudi/common/model/HoodieTableType.java](https://codecov.io/gh/apache/hudi/pull/2487/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL21vZGVsL0hvb2RpZVRhYmxlVHlwZS5qYXZh) | | | |
   | [.../scala/org/apache/hudi/Spark2RowDeserializer.scala](https://codecov.io/gh/apache/hudi/pull/2487/diff?src=pr=tree#diff-aHVkaS1zcGFyay1kYXRhc291cmNlL2h1ZGktc3BhcmsyL3NyYy9tYWluL3NjYWxhL29yZy9hcGFjaGUvaHVkaS9TcGFyazJSb3dEZXNlcmlhbGl6ZXIuc2NhbGE=) | | | |
   | [...hudi/common/table/log/block/HoodieDeleteBlock.java](https://codecov.io/gh/apache/hudi/pull/2487/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL2xvZy9ibG9jay9Ib29kaWVEZWxldGVCbG9jay5qYXZh) | | | |
   | [...cala/org/apache/hudi/HoodieBootstrapRelation.scala](https://codecov.io/gh/apache/hudi/pull/2487/diff?src=pr=tree#diff-aHVkaS1zcGFyay1kYXRhc291cmNlL2h1ZGktc3Bhcmsvc3JjL21haW4vc2NhbGEvb3JnL2FwYWNoZS9odWRpL0hvb2RpZUJvb3RzdHJhcFJlbGF0aW9uLnNjYWxh) | | | |
   | ... and [356 more](https://codecov.io/gh/apache/hudi/pull/2487/diff?src=pr=tree-more) | |
   




svn commit: r45595 - in /release/hudi: 0.7.0/ hudi-0.7.0/

2021-01-25 Thread vinoth
Author: vinoth
Date: Tue Jan 26 01:37:48 2021
New Revision: 45595

Log:
Renaming for Hudi 0.7.0

Added:
release/hudi/0.7.0/
  - copied from r45594, release/hudi/hudi-0.7.0/
Removed:
release/hudi/hudi-0.7.0/



[jira] [Commented] (HUDI-1547) CI intermittent failure: TestJsonStringToHoodieRecordMapFunction.testMapFunction

2021-01-25 Thread wangxianghu (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17271766#comment-17271766
 ] 

wangxianghu commented on HUDI-1547:
---

[~vinoth] I can take it

> CI intermittent failure: 
> TestJsonStringToHoodieRecordMapFunction.testMapFunction 
> -
>
> Key: HUDI-1547
> URL: https://issues.apache.org/jira/browse/HUDI-1547
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Release  Administrative
>Affects Versions: 0.8.0
>Reporter: sivabalan narayanan
>Assignee: wangxianghu
>Priority: Major
>  Labels: user-support-issues
>
> [https://github.com/apache/hudi/issues/2467]
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HUDI-1547) CI intermittent failure: TestJsonStringToHoodieRecordMapFunction.testMapFunction

2021-01-25 Thread wangxianghu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wangxianghu reassigned HUDI-1547:
-

Assignee: wangxianghu

> CI intermittent failure: 
> TestJsonStringToHoodieRecordMapFunction.testMapFunction 
> -
>
> Key: HUDI-1547
> URL: https://issues.apache.org/jira/browse/HUDI-1547
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Release  Administrative
>Affects Versions: 0.8.0
>Reporter: sivabalan narayanan
>Assignee: wangxianghu
>Priority: Major
>  Labels: user-support-issues
>
> [https://github.com/apache/hudi/issues/2467]
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] nsivabalan closed issue #1958: [SUPPORT] Global Indexes return old partition value when querying Hive tables

2021-01-25 Thread GitBox


nsivabalan closed issue #1958:
URL: https://github.com/apache/hudi/issues/1958


   







[GitHub] [hudi] nsivabalan commented on issue #1958: [SUPPORT] Global Indexes return old partition value when querying Hive tables

2021-01-25 Thread GitBox


nsivabalan commented on issue #1958:
URL: https://github.com/apache/hudi/issues/1958#issuecomment-767210126


   https://github.com/apache/hudi/pull/1978 has fixed it.







[GitHub] [hudi] nsivabalan commented on a change in pull request #2487: [WIP HUDI-53] Adding Record Level Index based on hoodie backed table

2021-01-25 Thread GitBox


nsivabalan commented on a change in pull request #2487:
URL: https://github.com/apache/hudi/pull/2487#discussion_r564142151



##
File path: 
hudi-common/src/main/java/org/apache/hudi/index/HoodieRecordLevelIndexPayload.java
##
@@ -0,0 +1,98 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.index;
+
+import org.apache.hudi.avro.model.HoodieRecordLevelIndexRecord;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.common.util.Option;
+
+import org.apache.avro.Schema;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.avro.generic.IndexedRecord;
+
+import java.io.IOException;
+
+/**
+ * Payload used in index table for Hoodie Record level index.
+ */
+public class HoodieRecordLevelIndexPayload implements HoodieRecordPayload<HoodieRecordLevelIndexPayload> {
+
+  private String key;
+  private String partitionPath;
+  private String instantTime;
+  private String fileId;
+
+  public HoodieRecordLevelIndexPayload(Option<GenericRecord> record) {
+if (record.isPresent()) {
+  // This can be simplified using SpecificData.deepcopy once this bug is 
fixed
+  // https://issues.apache.org/jira/browse/AVRO-1811
+  key = record.get().get("key").toString();
+  partitionPath = record.get().get("partitionPath").toString();
+  instantTime = record.get().get("instantTime").toString();
+  fileId = record.get().get("fileId").toString();
+}
+  }
+
+  private HoodieRecordLevelIndexPayload(String key, String partitionPath, String instantTime, String fileId) {
+this.key = key;
+this.partitionPath = partitionPath;
+this.instantTime = instantTime;
+this.fileId = fileId;
+  }
+
+  @Override
+  public HoodieRecordLevelIndexPayload preCombine(HoodieRecordLevelIndexPayload another) {
+if (this.instantTime.compareTo(another.instantTime) >= 0) {

Review comment:
   Note: this needs some fixing. Can we just convert the string to a long 
and compare?
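
   As an illustration of that suggestion, a numeric comparison of instant-time 
strings could look like the sketch below; the class and helper names here are 
hypothetical, not part of this PR:

   ```java
   // Hypothetical sketch: compare Hudi instant times (e.g. "20210125093000")
   // numerically rather than lexicographically, per the review suggestion.
   public class InstantTimeComparison {

       // Returns the later of two instant times by numeric value.
       static String laterInstant(String a, String b) {
           return Long.parseLong(a) >= Long.parseLong(b) ? a : b;
       }

       public static void main(String[] args) {
           // Numeric parsing handles strings of differing widths, where a
           // plain String comparison would pick "9" over "10".
           System.out.println(laterInstant("20210125093000", "20210124120000"));
       }
   }
   ```

   In `preCombine`, the payload whose parsed instant time is larger would be 
the one retained.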









[GitHub] [hudi] nsivabalan commented on issue #1962: [SUPPORT] Unable to filter hudi table in hive on partition column

2021-01-25 Thread GitBox


nsivabalan commented on issue #1962:
URL: https://github.com/apache/hudi/issues/1962#issuecomment-767209175


   @bvaradar : guess you missed following up on this thread. Can you check it 
out and respond when you can?







[GitHub] [hudi] nsivabalan commented on issue #1971: Schema evoluation causes issue when using kafka source in hudi deltastreamer

2021-01-25 Thread GitBox


nsivabalan commented on issue #1971:
URL: https://github.com/apache/hudi/issues/1971#issuecomment-767208636


   @jingweiz2017 : can you please check the above response and let us know if 
you need anything more from the Hudi community? 







[GitHub] [hudi] nsivabalan commented on issue #1981: [SUPPORT] Huge performance Difference Between Hudi and Regular Parquet in Athena

2021-01-25 Thread GitBox


nsivabalan commented on issue #1981:
URL: https://github.com/apache/hudi/issues/1981#issuecomment-767206596


   @vinothchandar @umehrot2 : can either of you respond here wrt metadata 
support (RFC-15) in Athena? When can we possibly expect it? 







[GitHub] [hudi] nsivabalan commented on issue #1982: [SUPPORT] Not able to write to ADLS Gen2 in Azure Databricks, with error has invalid authority.

2021-01-25 Thread GitBox


nsivabalan commented on issue #1982:
URL: https://github.com/apache/hudi/issues/1982#issuecomment-767205667


   @Ac-Rush : would you mind updating the ticket? 







[GitHub] [hudi] nsivabalan commented on issue #2013: [SUPPORT] MoR tables SparkDataSource Incremental Querys

2021-01-25 Thread GitBox


nsivabalan commented on issue #2013:
URL: https://github.com/apache/hudi/issues/2013#issuecomment-767204986


   @garyli1019 : can you give any updates you have in this regard? 







[jira] [Reopened] (HUDI-284) Need Tests for Hudi handling of schema evolution

2021-01-25 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar reopened HUDI-284:
-

> Need Tests for Hudi handling of schema evolution
> -
>
> Key: HUDI-284
> URL: https://issues.apache.org/jira/browse/HUDI-284
> Project: Apache Hudi
>  Issue Type: Test
>  Components: Common Core, newbie, Testing
>Reporter: Balaji Varadarajan
>Assignee: liwei
>Priority: Major
>  Labels: help-wanted, pull-request-available, starter
> Fix For: 0.7.0
>
>
> Context in : 
> https://github.com/apache/incubator-hudi/pull/927#pullrequestreview-293449514



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HUDI-575) Support Async Compaction for spark streaming writes to hudi table

2021-01-25 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar resolved HUDI-575.
-
Resolution: Fixed

> Support Async Compaction for spark streaming writes to hudi table
> -
>
> Key: HUDI-575
> URL: https://issues.apache.org/jira/browse/HUDI-575
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Spark Integration
>Reporter: Balaji Varadarajan
>Assignee: Balaji Varadarajan
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.7.0
>
>
> Currently, only inline compaction is supported for Structured Streaming 
> writes. 
>  
> We need to 
>  * Enable configuring async compaction for streaming writes 
>  * Implement a parallel compaction process like we did for delta streamer
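A hedged sketch of what the first bullet enables, built as a plain options map so it is self-contained and runnable; the option keys are those documented for Hudi 0.7.0, while the table name is illustrative:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class AsyncCompactionOptions {
    // Writer options one would pass to a Spark structured-streaming write
    // into a MERGE_ON_READ Hudi table to turn on async compaction.
    // Without the async flag, only inline compaction runs for streaming
    // writes.
    static Map<String, String> asyncCompactionOptions() {
        Map<String, String> opts = new LinkedHashMap<>();
        opts.put("hoodie.table.name", "trades"); // illustrative table name
        opts.put("hoodie.datasource.write.table.type", "MERGE_ON_READ");
        opts.put("hoodie.datasource.compaction.async.enable", "true");
        return opts;
    }

    public static void main(String[] args) {
        asyncCompactionOptions().forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```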





[jira] [Resolved] (HUDI-284) Need Tests for Hudi handling of schema evolution

2021-01-25 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar resolved HUDI-284.
-
Resolution: Fixed

> Need Tests for Hudi handling of schema evolution
> -
>
> Key: HUDI-284
> URL: https://issues.apache.org/jira/browse/HUDI-284
> Project: Apache Hudi
>  Issue Type: Test
>  Components: Common Core, newbie, Testing
>Reporter: Balaji Varadarajan
>Assignee: liwei
>Priority: Major
>  Labels: help-wanted, pull-request-available, starter
> Fix For: 0.7.0
>
>
> Context in : 
> https://github.com/apache/incubator-hudi/pull/927#pullrequestreview-293449514





[jira] [Reopened] (HUDI-575) Support Async Compaction for spark streaming writes to hudi table

2021-01-25 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar reopened HUDI-575:
-

> Support Async Compaction for spark streaming writes to hudi table
> -
>
> Key: HUDI-575
> URL: https://issues.apache.org/jira/browse/HUDI-575
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Spark Integration
>Reporter: Balaji Varadarajan
>Assignee: Balaji Varadarajan
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.7.0
>
>
> Currently, only inline compaction is supported for Structured Streaming 
> writes. 
>  
> We need to 
>  * Enable configuring async compaction for streaming writes 
>  * Implement a parallel compaction process like we did for delta streamer





[jira] [Resolved] (HUDI-791) Replace null by Option in Delta Streamer

2021-01-25 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar resolved HUDI-791.
-
Resolution: Fixed

> Replace null by Option in Delta Streamer
> 
>
> Key: HUDI-791
> URL: https://issues.apache.org/jira/browse/HUDI-791
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: DeltaStreamer, newbie
>Reporter: Yanjia Gary Li
>Assignee: liwei
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.7.0
>
>
> There are a lot of nulls in Delta Streamer. It would be great if we could 
> replace those nulls with Option. 
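A small before/after sketch of the refactoring this issue asks for. It uses java.util.Optional so the example is self-contained; Hudi itself has its own org.apache.hudi.common.util.Option with a similar API. Method names and the checkpoint value are illustrative:

```java
import java.util.Optional;

public class OptionOverNull {
    // Before: a method that signals "absent" with null, which callers can
    // easily forget to check.
    static String findCheckpointNullable(boolean present) {
        return present ? "chkpt-001" : null;
    }

    // After: the same logic with Optional, making absence explicit at the
    // type level so callers must handle the empty case.
    static Optional<String> findCheckpoint(boolean present) {
        return present ? Optional.of("chkpt-001") : Optional.empty();
    }

    public static void main(String[] args) {
        // Callers handle the empty case explicitly instead of risking an NPE.
        System.out.println(findCheckpoint(false).orElse("no-checkpoint")); // prints no-checkpoint
    }
}
```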




