[jira] [Updated] (HUDI-5798) spark sql query fail on mor table after flink cdc delete records

2023-02-15 Thread lrz (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lrz updated HUDI-5798:
--
Summary: spark sql query fail on mor table after flink cdc delete records  
(was: spark sql query fail on mor table after flink cdc application delete 
records)

> spark sql query fail on mor table after flink cdc delete records
> 
>
> Key: HUDI-5798
> URL: https://issues.apache.org/jira/browse/HUDI-5798
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: lrz
>Priority: Major
>  Labels: pull-request-available
>
> After the Flink CDC application deletes records from a MOR table, Spark SQL 
> queries on the table fail with the exception below:
>  
> Serialization trace:
> orderingVal (org.apache.hudi.common.model.DeleteRecord)
>     at 
> com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:160)
>     at 
> com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:133)
>     at com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:693)
>     at 
> com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:118)
>     at 
> com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:543)
>     at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:731)
>     at 
> com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:391)
>     at 
> com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:302)
>     at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:813)
>     at 
> org.apache.hudi.common.util.SerializationUtils$KryoSerializerInstance.deserialize(SerializationUtils.java:104)
>     at 
> org.apache.hudi.common.util.SerializationUtils.deserialize(SerializationUtils.java:78)
>     at 
> org.apache.hudi.common.table.log.block.HoodieDeleteBlock.deserialize(HoodieDeleteBlock.java:106)
>     at 
> org.apache.hudi.common.table.log.block.HoodieDeleteBlock.getRecordsToDelete(HoodieDeleteBlock.java:91)
>     at 
> org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader.processQueuedBlocksForInstant(AbstractHoodieLogRecordReader.java:473)
>     at 
> org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader.scanInternal(AbstractHoodieLogRecordReader.java:343)
>     ... 23 more
> Caused by: java.lang.ClassNotFoundException: 
> org.apache.hudi.org.apache.avro.util.Utf8
>     at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
>     at java.lang.Class.forName0(Native Method)
>     at java.lang.Class.forName(Class.java:348)
>     at 
> com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:154)
>     ... 37 more



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-5798) spark sql query fail on mor table after flink cdc application delete records

2023-02-15 Thread lrz (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lrz updated HUDI-5798:
--
Summary: spark sql query fail on mor table after flink cdc application 
delete records  (was: spark-sql query fail on mor table after flink cdc 
application delete records)

> spark sql query fail on mor table after flink cdc application delete records
> 
>
> Key: HUDI-5798
> URL: https://issues.apache.org/jira/browse/HUDI-5798
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: lrz
>Priority: Major
>  Labels: pull-request-available
>
> After the Flink CDC application deletes records from a MOR table, Spark SQL 
> queries on the table fail with the exception below:
>  
> Serialization trace:
> orderingVal (org.apache.hudi.common.model.DeleteRecord)
>     at 
> com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:160)
>     at 
> com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:133)
>     at com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:693)
>     at 
> com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:118)
>     at 
> com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:543)
>     at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:731)
>     at 
> com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:391)
>     at 
> com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:302)
>     at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:813)
>     at 
> org.apache.hudi.common.util.SerializationUtils$KryoSerializerInstance.deserialize(SerializationUtils.java:104)
>     at 
> org.apache.hudi.common.util.SerializationUtils.deserialize(SerializationUtils.java:78)
>     at 
> org.apache.hudi.common.table.log.block.HoodieDeleteBlock.deserialize(HoodieDeleteBlock.java:106)
>     at 
> org.apache.hudi.common.table.log.block.HoodieDeleteBlock.getRecordsToDelete(HoodieDeleteBlock.java:91)
>     at 
> org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader.processQueuedBlocksForInstant(AbstractHoodieLogRecordReader.java:473)
>     at 
> org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader.scanInternal(AbstractHoodieLogRecordReader.java:343)
>     ... 23 more
> Caused by: java.lang.ClassNotFoundException: 
> org.apache.hudi.org.apache.avro.util.Utf8
>     at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
>     at java.lang.Class.forName0(Native Method)
>     at java.lang.Class.forName(Class.java:348)
>     at 
> com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:154)
>     ... 37 more



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HUDI-5798) spark-sql query fail on mor table after flink cdc application delete records

2023-02-15 Thread lrz (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-5798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17689110#comment-17689110
 ] 

lrz commented on HUDI-5798:
---

I fixed this issue by adding a specially shaded avro jar under spark/jars, 
but that does not seem like a good thing to introduce into the hudi project.

> spark-sql query fail on mor table after flink cdc application delete records
> 
>
> Key: HUDI-5798
> URL: https://issues.apache.org/jira/browse/HUDI-5798
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: lrz
>Priority: Major
>  Labels: pull-request-available
>
> After the Flink CDC application deletes records from a MOR table, Spark SQL 
> queries on the table fail with the exception below:
>  
> Serialization trace:
> orderingVal (org.apache.hudi.common.model.DeleteRecord)
>     at 
> com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:160)
>     at 
> com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:133)
>     at com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:693)
>     at 
> com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:118)
>     at 
> com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:543)
>     at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:731)
>     at 
> com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:391)
>     at 
> com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:302)
>     at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:813)
>     at 
> org.apache.hudi.common.util.SerializationUtils$KryoSerializerInstance.deserialize(SerializationUtils.java:104)
>     at 
> org.apache.hudi.common.util.SerializationUtils.deserialize(SerializationUtils.java:78)
>     at 
> org.apache.hudi.common.table.log.block.HoodieDeleteBlock.deserialize(HoodieDeleteBlock.java:106)
>     at 
> org.apache.hudi.common.table.log.block.HoodieDeleteBlock.getRecordsToDelete(HoodieDeleteBlock.java:91)
>     at 
> org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader.processQueuedBlocksForInstant(AbstractHoodieLogRecordReader.java:473)
>     at 
> org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader.scanInternal(AbstractHoodieLogRecordReader.java:343)
>     ... 23 more
> Caused by: java.lang.ClassNotFoundException: 
> org.apache.hudi.org.apache.avro.util.Utf8
>     at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
>     at java.lang.Class.forName0(Native Method)
>     at java.lang.Class.forName(Class.java:348)
>     at 
> com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:154)
>     ... 37 more



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HUDI-5805) hive query on mor get empty result before compaction

2023-02-15 Thread lrz (Jira)
lrz created HUDI-5805:
-

 Summary: hive query on mor get empty result before compaction
 Key: HUDI-5805
 URL: https://issues.apache.org/jira/browse/HUDI-5805
 Project: Apache Hudi
  Issue Type: Bug
Reporter: lrz
 Attachments: image-2023-02-15-20-48-08-819.png, 
image-2023-02-15-20-48-21-988.png

When a MOR table is written only by Flink CDC, its partitions contain only 
log files and no base files until compaction runs, so before compaction a 
Hive query on the table always returns an empty result.

It happens because, when Hive computes splits for a native table, it ignores 
files whose names start with '.', and because Hudi does not set a storage 
handler when syncing the Hive metadata, Hive treats the table as a native 
table and skips the dot-prefixed log files.

!image-2023-02-15-20-48-08-819.png!

!image-2023-02-15-20-48-21-988.png!
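
For reference, the split-planning rule in play is essentially Hadoop's 
hidden-file filter, which FileInputFormat applies when planning splits for a 
native table. A minimal sketch of that rule (hypothetical class name, and the 
log-file name below is only an illustrative example of the dot-prefixed 
pattern):

{code:java}
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.PathFilter;

public class HiddenFileRule {
  // Same spirit as FileInputFormat's built-in hidden-file filter:
  // anything starting with '_' or '.' is treated as hidden.
  static final PathFilter HIDDEN_FILE_FILTER =
      p -> !p.getName().startsWith("_") && !p.getName().startsWith(".");

  public static void main(String[] args) {
    // Hudi MOR log files are dot-prefixed, so a partition holding only
    // log files yields zero input splits for a "native" Hive table.
    System.out.println(HIDDEN_FILE_FILTER.accept(new Path("/p/.f1_0-1-0.log.1"))); // false
    System.out.println(HIDDEN_FILE_FILTER.accept(new Path("/p/f1.parquet")));      // true
  }
}
{code}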



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HUDI-5798) spark-sql query fail on mor table after flink cdc application delete records

2023-02-14 Thread lrz (Jira)
lrz created HUDI-5798:
-

 Summary: spark-sql query fail on mor table after flink cdc 
application delete records
 Key: HUDI-5798
 URL: https://issues.apache.org/jira/browse/HUDI-5798
 Project: Apache Hudi
  Issue Type: Bug
Reporter: lrz


After the Flink CDC application deletes records from a MOR table, Spark SQL 
queries on the table fail with the exception below:

 

Serialization trace:
orderingVal (org.apache.hudi.common.model.DeleteRecord)
    at 
com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:160)
    at 
com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:133)
    at com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:693)
    at 
com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:118)
    at 
com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:543)
    at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:731)
    at 
com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:391)
    at 
com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:302)
    at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:813)
    at 
org.apache.hudi.common.util.SerializationUtils$KryoSerializerInstance.deserialize(SerializationUtils.java:104)
    at 
org.apache.hudi.common.util.SerializationUtils.deserialize(SerializationUtils.java:78)
    at 
org.apache.hudi.common.table.log.block.HoodieDeleteBlock.deserialize(HoodieDeleteBlock.java:106)
    at 
org.apache.hudi.common.table.log.block.HoodieDeleteBlock.getRecordsToDelete(HoodieDeleteBlock.java:91)
    at 
org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader.processQueuedBlocksForInstant(AbstractHoodieLogRecordReader.java:473)
    at 
org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader.scanInternal(AbstractHoodieLogRecordReader.java:343)
    ... 23 more
Caused by: java.lang.ClassNotFoundException: 
org.apache.hudi.org.apache.avro.util.Utf8
    at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:348)
    at 
com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:154)
    ... 37 more
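
As far as this report goes, the root cause is that the writer's bundle 
relocates Avro under org.apache.hudi.org.apache.avro, the DeleteRecord 
orderingVal ends up holding an instance of that shaded Utf8, and Kryo embeds 
the concrete class name into the delete block, so a reader whose classpath 
lacks that exact shaded name fails in Class.forName. Below is a minimal, 
hypothetical Java sketch of the mechanism, not Hudi's actual code; the plain 
org.apache.avro.util.Utf8 stands in for the shaded class:

{code:java}
import com.esotericsoftware.kryo.Kryo;
import com.esotericsoftware.kryo.io.Input;
import com.esotericsoftware.kryo.io.Output;
import java.io.ByteArrayOutputStream;

public class KryoShadedClassRepro {
  static class DeleteRecordLike {   // stand-in for DeleteRecord
    Comparable<?> orderingVal;      // Kryo records the value's runtime class
  }

  public static void main(String[] args) {
    Kryo kryo = new Kryo();
    kryo.setRegistrationRequired(false);

    DeleteRecordLike rec = new DeleteRecordLike();
    rec.orderingVal = new org.apache.avro.util.Utf8("ts-1"); // imagine this shaded

    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    Output out = new Output(bos);
    kryo.writeClassAndObject(out, rec);  // the class name "...Utf8" goes into the bytes
    out.close();

    // On a JVM whose classpath lacks that exact class name, this read fails
    // in DefaultClassResolver.readName -> Class.forName, as in the stack
    // trace above. Here the class is present, so it round-trips fine.
    Object back = kryo.readClassAndObject(new Input(bos.toByteArray()));
    System.out.println(back);
  }
}
{code}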



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HUDI-1759) Save one connection retry when hiveSyncTool run with useJdbc=false

2021-05-07 Thread lrz (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lrz resolved HUDI-1759.
---
Resolution: Fixed

> Save one connection retry when hiveSyncTool run with useJdbc=false
> --
>
> Key: HUDI-1759
> URL: https://issues.apache.org/jira/browse/HUDI-1759
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: lrz
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
> Attachments: image-2021-04-02-15-43-15-854.png, 
> image-2021-04-02-15-48-42-895.png
>
>
> When syncing metadata to Hive with useJdbc=false, there are two problems:
> First: if the Hive server has Kerberos enabled and Hudi syncs to Hive with 
> useJdbc=false, the synced metadata is missing its owner; check the metadata 
> here (tested with Hive 3.1.1):
> !image-2021-04-02-15-43-15-854.png!
> Second: there is a connection retry to the Hive metastore on every 
> syncToHive; this exception also shows up in the UT 
> "TestHiveSyncTool.testBasicSync":
> !image-2021-04-02-15-48-42-895.png!
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HUDI-1744) [Rollback] rollback fail on mor table when the partition path hasn't any files

2021-04-20 Thread lrz (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lrz resolved HUDI-1744.
---
Resolution: Fixed

> [Rollback] rollback fail on mor table when the partition path hasn't any files
> --
>
> Key: HUDI-1744
> URL: https://issues.apache.org/jira/browse/HUDI-1744
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: lrz
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> When rolling back a MOR table, if the partition path has no files, an 
> exception is thrown because rdd.flatMap is called with 0 as numPartitions.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1779) Fail to bootstrap/upsert a table which contains timestamp column

2021-04-08 Thread lrz (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lrz updated HUDI-1779:
--
Attachment: upsertFail.png

> Fail to bootstrap/upsert a table which contains timestamp column
> 
>
> Key: HUDI-1779
> URL: https://issues.apache.org/jira/browse/HUDI-1779
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: lrz
>Priority: Major
> Fix For: 0.9.0
>
> Attachments: unsupportInt96.png, upsertFail.png, upsertFail2.png
>
>
> Currently, when Hudi bootstraps a parquet file, or upserts into a parquet 
> file that contains a timestamp column, it fails because of these issues:
> 1) During bootstrap, if the original parquet file was written by a Spark 
> application, Spark saves timestamps as INT96 by default (see 
> spark.sql.parquet.int96AsTimestamp), and bootstrap fails because Hudi cannot 
> read the INT96 type yet. (This can be solved by upgrading parquet to 1.12.0 
> and setting parquet.avro.readInt96AsFixed=true; please check 
> https://github.com/apache/parquet-mr/pull/831/files)
> 2) After bootstrap, upserts fail because we use the hoodie schema to read 
> the original parquet file. The schemas do not match: the hoodie schema 
> treats the timestamp as long, while in the original file it is INT96.
> 3) After bootstrap, a partial update of a parquet file fails because we 
> copy the old record and save it with the hoodie schema (we are missing a 
> convertFixedToLong operation like the one Spark performs).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1779) Fail to bootstrap/upsert a table which contains timestamp column

2021-04-08 Thread lrz (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lrz updated HUDI-1779:
--
Attachment: unsupportInt96.png

> Fail to bootstrap/upsert a table which contains timestamp column
> 
>
> Key: HUDI-1779
> URL: https://issues.apache.org/jira/browse/HUDI-1779
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: lrz
>Priority: Major
> Fix For: 0.9.0
>
> Attachments: unsupportInt96.png, upsertFail.png, upsertFail2.png
>
>
> Currently, when Hudi bootstraps a parquet file, or upserts into a parquet 
> file that contains a timestamp column, it fails because of these issues:
> 1) During bootstrap, if the original parquet file was written by a Spark 
> application, Spark saves timestamps as INT96 by default (see 
> spark.sql.parquet.int96AsTimestamp), and bootstrap fails because Hudi cannot 
> read the INT96 type yet. (This can be solved by upgrading parquet to 1.12.0 
> and setting parquet.avro.readInt96AsFixed=true; please check 
> https://github.com/apache/parquet-mr/pull/831/files)
> 2) After bootstrap, upserts fail because we use the hoodie schema to read 
> the original parquet file. The schemas do not match: the hoodie schema 
> treats the timestamp as long, while in the original file it is INT96.
> 3) After bootstrap, a partial update of a parquet file fails because we 
> copy the old record and save it with the hoodie schema (we are missing a 
> convertFixedToLong operation like the one Spark performs).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1779) Fail to bootstrap/upsert a table which contains timestamp column

2021-04-08 Thread lrz (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lrz updated HUDI-1779:
--
Attachment: upsertFail2.png

> Fail to bootstrap/upsert a table which contains timestamp column
> 
>
> Key: HUDI-1779
> URL: https://issues.apache.org/jira/browse/HUDI-1779
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: lrz
>Priority: Major
> Fix For: 0.9.0
>
> Attachments: unsupportInt96.png, upsertFail.png, upsertFail2.png
>
>
> Currently, when Hudi bootstraps a parquet file, or upserts into a parquet 
> file that contains a timestamp column, it fails because of these issues:
> 1) During bootstrap, if the original parquet file was written by a Spark 
> application, Spark saves timestamps as INT96 by default (see 
> spark.sql.parquet.int96AsTimestamp), and bootstrap fails because Hudi cannot 
> read the INT96 type yet. (This can be solved by upgrading parquet to 1.12.0 
> and setting parquet.avro.readInt96AsFixed=true; please check 
> https://github.com/apache/parquet-mr/pull/831/files)
> 2) After bootstrap, upserts fail because we use the hoodie schema to read 
> the original parquet file. The schemas do not match: the hoodie schema 
> treats the timestamp as long, while in the original file it is INT96.
> 3) After bootstrap, a partial update of a parquet file fails because we 
> copy the old record and save it with the hoodie schema (we are missing a 
> convertFixedToLong operation like the one Spark performs).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HUDI-1779) Fail to bootstrap/upsert a table which contains timestamp column

2021-04-08 Thread lrz (Jira)
lrz created HUDI-1779:
-

 Summary: Fail to bootstrap/upsert a table which contains timestamp 
column
 Key: HUDI-1779
 URL: https://issues.apache.org/jira/browse/HUDI-1779
 Project: Apache Hudi
  Issue Type: Bug
Reporter: lrz
 Fix For: 0.9.0


Currently, when Hudi bootstraps a parquet file, or upserts into a parquet 
file that contains a timestamp column, it fails because of these issues:

1) During bootstrap, if the original parquet file was written by a Spark 
application, Spark saves timestamps as INT96 by default (see 
spark.sql.parquet.int96AsTimestamp), and bootstrap fails because Hudi cannot 
read the INT96 type yet. (This can be solved by upgrading parquet to 1.12.0 
and setting parquet.avro.readInt96AsFixed=true; please check 
https://github.com/apache/parquet-mr/pull/831/files, and see the reader-side 
sketch after this list.)

2) After bootstrap, upserts fail because we use the hoodie schema to read the 
original parquet file. The schemas do not match: the hoodie schema treats the 
timestamp as long, while in the original file it is INT96.

3) After bootstrap, a partial update of a parquet file fails because we copy 
the old record and save it with the hoodie schema (we are missing a 
convertFixedToLong operation like the one Spark performs).
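
As a reader-side sketch of the remedy mentioned in 1), and assuming 
parquet-avro 1.12.0+ on the classpath: the flag below makes parquet-avro map 
INT96 to an Avro fixed(12) instead of failing. The file path argument and 
class name are hypothetical:

{code:java}
import org.apache.avro.generic.GenericRecord;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.avro.AvroParquetReader;
import org.apache.parquet.hadoop.ParquetReader;

public class ReadInt96AsFixed {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // map INT96 to an Avro fixed(12) instead of throwing (parquet-mr PR #831)
    conf.setBoolean("parquet.avro.readInt96AsFixed", true);

    try (ParquetReader<GenericRecord> reader =
             AvroParquetReader.<GenericRecord>builder(new Path(args[0]))
                              .withConf(conf)
                              .build()) {
      GenericRecord rec;
      while ((rec = reader.read()) != null) {
        System.out.println(rec); // the INT96 column arrives as a 12-byte fixed
      }
    }
  }
}
{code}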



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HUDI-1750) Fail to load user's class if user move hudi-spark-bundle_2.11-0.7.0.jar into spark classpath

2021-04-07 Thread lrz (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lrz resolved HUDI-1750.
---
Resolution: Fixed

> Fail to load user's class if user move hudi-spark-bundle_2.11-0.7.0.jar into 
> spark classpath
> 
>
> Key: HUDI-1750
> URL: https://issues.apache.org/jira/browse/HUDI-1750
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: lrz
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
> Attachments: image-2021-04-01-10-55-43-760.png
>
>
> Hudi uses Class.forName(clazzName) to load the user's class, which resolves 
> against the caller's classloader; see here:
> !image-2021-04-01-10-55-43-760.png!
> If the user moves the hudi-spark-bundle jar into the Spark classpath and 
> uses --jars to add their own jars, the caller's classloader is the 
> AppClassLoader, while the user jars are loaded by Spark's 
> MutableURLClassLoader, which leads to a ClassNotFoundException.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HUDI-1751) DeltaStream print many unnecessary warn log because of passing hoodie config to kafka consumer

2021-04-07 Thread lrz (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lrz resolved HUDI-1751.
---
Resolution: Fixed

> DeltaStream print many unnecessary warn log because of passing hoodie config 
> to kafka consumer
> --
>
> Key: HUDI-1751
> URL: https://issues.apache.org/jira/browse/HUDI-1751
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: lrz
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> Because we put both Kafka parameters and Hudi configs in the same 
> properties file, such as kafka-source.properties, the kafkaParams object 
> picks up some hoodie configs as well, which leads to the warning logs:
> !https://wa.vision.huawei.com/vision-file-storage/api/file/download/upload-v2/2021/2/15/qwx352829/76572ba9f4094fb29b018db91fbf1450/image.png!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HUDI-1749) Clean/Compaction/Rollback command maybe never exit when operation fail

2021-04-07 Thread lrz (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lrz resolved HUDI-1749.
---
Resolution: Fixed

> Clean/Compaction/Rollback command maybe never exit when operation fail
> --
>
> Key: HUDI-1749
> URL: https://issues.apache.org/jira/browse/HUDI-1749
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: lrz
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> There are two issues:
> 1) After a Clean/Compaction/Rollback command finishes, the YARN application 
> always shows as failed because the command exits directly without waiting 
> for the SparkContext to stop.
> 2) When a Clean/Compaction/Rollback command fails with an exception, the 
> command never exits because the SparkContext is not stopped. This happens 
> because the Spark UI uses jetty, which introduces non-daemon threads, and 
> only sparkContext.stop() shuts the UI and those non-daemon threads down.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1751) DeltaStream print many unnecessary warn log because of passing hoodie config to kafka consumer

2021-04-02 Thread lrz (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lrz updated HUDI-1751:
--
Summary: DeltaStream print many unnecessary warn log because of passing 
hoodie config to kafka consumer  (was: DeltaStream print many unnecessary warn 
log)

> DeltaStream print many unnecessary warn log because of passing hoodie config 
> to kafka consumer
> --
>
> Key: HUDI-1751
> URL: https://issues.apache.org/jira/browse/HUDI-1751
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: lrz
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> Because we put both Kafka parameters and Hudi configs in the same 
> properties file, such as kafka-source.properties, the kafkaParams object 
> picks up some hoodie configs as well, which leads to the warning logs:
> !https://wa.vision.huawei.com/vision-file-storage/api/file/download/upload-v2/2021/2/15/qwx352829/76572ba9f4094fb29b018db91fbf1450/image.png!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1759) Save one connection retry when hiveSyncTool run with useJdbc=false

2021-04-02 Thread lrz (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lrz updated HUDI-1759:
--
Description: 
When syncing metadata to Hive with useJdbc=false, there are two problems:

First: if the Hive server has Kerberos enabled and Hudi syncs to Hive with 
useJdbc=false, the synced metadata is missing its owner; check the metadata 
here (tested with Hive 3.1.1):

!image-2021-04-02-15-43-15-854.png!

Second: there is a connection retry to the Hive metastore on every 
syncToHive; this exception also shows up in the UT "TestHiveSyncTool.testBasicSync":

!image-2021-04-02-15-48-42-895.png!

 

  was:
When syncing metadata to Hive with useJdbc=false, there are two problems:

First: if the Hive server has Kerberos enabled and Hudi syncs to Hive with 
useJdbc=false, the synced metadata is missing its owner; check the metadata 
here (tested with Hive 3.1.1):

!image-2021-04-02-15-43-15-854.png!

Second: there is a connection retry to the Hive metastore on every 
syncToHive; this exception also shows up in the UT "TestHiveSyncTool.testBasicSync":

 


> Save one connection retry when hiveSyncTool run with useJdbc=false
> --
>
> Key: HUDI-1759
> URL: https://issues.apache.org/jira/browse/HUDI-1759
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: lrz
>Priority: Major
> Fix For: 0.9.0
>
> Attachments: image-2021-04-02-15-43-15-854.png, 
> image-2021-04-02-15-48-42-895.png
>
>
> When syncing metadata to Hive with useJdbc=false, there are two problems:
> First: if the Hive server has Kerberos enabled and Hudi syncs to Hive with 
> useJdbc=false, the synced metadata is missing its owner; check the metadata 
> here (tested with Hive 3.1.1):
> !image-2021-04-02-15-43-15-854.png!
> Second: there is a connection retry to the Hive metastore on every 
> syncToHive; this exception also shows up in the UT 
> "TestHiveSyncTool.testBasicSync":
> !image-2021-04-02-15-48-42-895.png!
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1759) Save one connection retry when hiveSyncTool run with useJdbc=false

2021-04-02 Thread lrz (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lrz updated HUDI-1759:
--
Attachment: image-2021-04-02-15-48-42-895.png

> Save one connection retry when hiveSyncTool run with useJdbc=false
> --
>
> Key: HUDI-1759
> URL: https://issues.apache.org/jira/browse/HUDI-1759
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: lrz
>Priority: Major
> Fix For: 0.9.0
>
> Attachments: image-2021-04-02-15-43-15-854.png, 
> image-2021-04-02-15-48-42-895.png
>
>
> When syncing metadata to Hive with useJdbc=false, there are two problems:
> First: if the Hive server has Kerberos enabled and Hudi syncs to Hive with 
> useJdbc=false, the synced metadata is missing its owner; check the metadata 
> here (tested with Hive 3.1.1):
> !image-2021-04-02-15-43-15-854.png!
> Second: there is a connection retry to the Hive metastore on every 
> syncToHive; this exception also shows up in the UT 
> "TestHiveSyncTool.testBasicSync":
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HUDI-1759) Save one connection retry when hiveSyncTool run with useJdbc=false

2021-04-02 Thread lrz (Jira)
lrz created HUDI-1759:
-

 Summary: Save one connection retry when hiveSyncTool run with 
useJdbc=false
 Key: HUDI-1759
 URL: https://issues.apache.org/jira/browse/HUDI-1759
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: lrz
 Fix For: 0.9.0
 Attachments: image-2021-04-02-15-43-15-854.png

When syncing metadata to Hive with useJdbc=false, there are two problems:

First: if the Hive server has Kerberos enabled and Hudi syncs to Hive with 
useJdbc=false, the synced metadata is missing its owner; check the metadata 
here (tested with Hive 3.1.1):

!image-2021-04-02-15-43-15-854.png!

Second: there is a connection retry to the Hive metastore on every 
syncToHive; this exception also shows up in the UT "TestHiveSyncTool.testBasicSync":

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HUDI-1751) DeltaStream print many unnecessary warn log

2021-03-31 Thread lrz (Jira)
lrz created HUDI-1751:
-

 Summary: DeltaStream print many unnecessary warn log
 Key: HUDI-1751
 URL: https://issues.apache.org/jira/browse/HUDI-1751
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: lrz
 Fix For: 0.9.0


Because we put both Kafka parameters and Hudi configs in the same properties 
file, such as kafka-source.properties, the kafkaParams object picks up some 
hoodie configs as well, which leads to the warning logs (a filtering sketch 
follows below):

!https://wa.vision.huawei.com/vision-file-storage/api/file/download/upload-v2/2021/2/15/qwx352829/76572ba9f4094fb29b018db91fbf1450/image.png!
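
A minimal sketch of the obvious remedy, with hypothetical names and not 
necessarily the merged patch: strip the hoodie.* keys before handing the 
shared properties to the Kafka consumer, so it stops warning about configs 
it does not know:

{code:java}
import java.util.Map;
import java.util.Properties;
import java.util.stream.Collectors;

public class KafkaParamsFilter {
  // Keep only the keys the Kafka consumer should see; everything
  // prefixed "hoodie." stays on the Hudi side of the fence.
  static Map<String, Object> kafkaOnly(Properties combined) {
    return combined.stringPropertyNames().stream()
        .filter(key -> !key.startsWith("hoodie."))
        .collect(Collectors.toMap(key -> key,
                                  key -> (Object) combined.getProperty(key)));
  }
}
{code}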



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HUDI-1750) Fail to load user's class if user move hudi-spark-bundle_2.11-0.7.0.jar into spark classpath

2021-03-31 Thread lrz (Jira)
lrz created HUDI-1750:
-

 Summary: Fail to load user's class if user move 
hudi-spark-bundle_2.11-0.7.0.jar into spark classpath
 Key: HUDI-1750
 URL: https://issues.apache.org/jira/browse/HUDI-1750
 Project: Apache Hudi
  Issue Type: Bug
Reporter: lrz
 Fix For: 0.9.0
 Attachments: image-2021-04-01-10-55-43-760.png

Hudi uses Class.forName(clazzName) to load the user's class, which resolves 
against the caller's classloader; see here:

!image-2021-04-01-10-55-43-760.png!

If the user moves the hudi-spark-bundle jar into the Spark classpath and uses 
--jars to add their own jars, the caller's classloader is the AppClassLoader, 
while the user jars are loaded by Spark's MutableURLClassLoader, which leads 
to a ClassNotFoundException. A sketch of the usual remedy follows below.
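
A minimal sketch of the usual remedy for this classloading pattern (an 
assumption, not necessarily the merged patch): resolve user classes against 
the thread context classloader, which in Spark points at the 
MutableURLClassLoader and therefore can see jars added via --jars:

{code:java}
public final class ReflectionHelper {
  // Prefer the context classloader over the caller's defining classloader;
  // fall back to this class's own loader when no context loader is set.
  static Class<?> loadUserClass(String clazzName) throws ClassNotFoundException {
    ClassLoader ctx = Thread.currentThread().getContextClassLoader();
    return Class.forName(clazzName, true,
        ctx != null ? ctx : ReflectionHelper.class.getClassLoader());
  }
}
{code}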



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HUDI-1749) Clean/Compaction/Rollback command maybe never exit when operation fail

2021-03-31 Thread lrz (Jira)
lrz created HUDI-1749:
-

 Summary: Clean/Compaction/Rollback command maybe never exit when 
operation fail
 Key: HUDI-1749
 URL: https://issues.apache.org/jira/browse/HUDI-1749
 Project: Apache Hudi
  Issue Type: Bug
Reporter: lrz


There are two issues:

1) After a Clean/Compaction/Rollback command finishes, the YARN application 
always shows as failed because the command exits directly without waiting for 
the SparkContext to stop.

2) When a Clean/Compaction/Rollback command fails with an exception, the 
command never exits because the SparkContext is not stopped. This happens 
because the Spark UI uses jetty, which introduces non-daemon threads, and only 
sparkContext.stop() shuts the UI and those non-daemon threads down. A sketch 
of the fix shape follows below.
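
A minimal sketch of the fix shape (hypothetical names, not the merged patch): 
stop the SparkContext in a finally block whether the operation succeeds or 
fails, so jetty's non-daemon UI threads shut down and the JVM can exit, and 
surface the real status through the exit code so YARN reports it correctly:

{code:java}
import org.apache.spark.api.java.JavaSparkContext;

public class CommandRunner {
  public static void main(String[] args) {
    JavaSparkContext jsc = new JavaSparkContext("local[1]", "hudi-command");
    int exitCode = 0;
    try {
      // runOperation(jsc);   // clean / compaction / rollback goes here
    } catch (Exception e) {
      exitCode = 1;           // remember the failure instead of dying mid-flight
    } finally {
      jsc.stop();             // stops the UI and its non-daemon threads
    }
    System.exit(exitCode);    // YARN now sees success/failure correctly
  }
}
{code}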



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1749) Clean/Compaction/Rollback command maybe never exit when operation fail

2021-03-31 Thread lrz (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lrz updated HUDI-1749:
--
Fix Version/s: 0.9.0

> Clean/Compaction/Rollback command maybe never exit when operation fail
> --
>
> Key: HUDI-1749
> URL: https://issues.apache.org/jira/browse/HUDI-1749
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: lrz
>Priority: Major
> Fix For: 0.9.0
>
>
> There are two issues:
> 1) After a Clean/Compaction/Rollback command finishes, the YARN application 
> always shows as failed because the command exits directly without waiting 
> for the SparkContext to stop.
> 2) When a Clean/Compaction/Rollback command fails with an exception, the 
> command never exits because the SparkContext is not stopped. This happens 
> because the Spark UI uses jetty, which introduces non-daemon threads, and 
> only sparkContext.stop() shuts the UI and those non-daemon threads down.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HUDI-1748) Read operation will possibility fail on mor table rt view when a write operations is concurrency running

2021-03-31 Thread lrz (Jira)
lrz created HUDI-1748:
-

 Summary: Read operation will possibility fail on mor table rt view 
when a write operations is concurrency running
 Key: HUDI-1748
 URL: https://issues.apache.org/jira/browse/HUDI-1748
 Project: Apache Hudi
  Issue Type: Bug
Reporter: lrz
 Fix For: 0.9.0


During a read operation, a new base file may be produced by a concurrent 
write operation, and the read can then hit an NPE in getSplit. Here is the 
exception stack:

!https://wa.vision.huawei.com/vision-file-storage/api/file/download/upload-v2/2021/2/15/qwx352829/7bacca8042104499b0991d50b4bc3f2a/image.png!

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HUDI-1744) [Rollback] rollback fail on mor table when the partition path hasn't any files

2021-03-31 Thread lrz (Jira)
lrz created HUDI-1744:
-

 Summary: [Rollback] rollback fail on mor table when the partition 
path hasn't any files
 Key: HUDI-1744
 URL: https://issues.apache.org/jira/browse/HUDI-1744
 Project: Apache Hudi
  Issue Type: Bug
Reporter: lrz
 Fix For: 0.9.0


When rolling back a MOR table, if the partition path has no files, an 
exception is thrown because rdd.flatMap is called with 0 as numPartitions. A 
guard sketch follows below.
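
A minimal guard sketch with hypothetical names (not the merged patch): 
short-circuit when there is nothing to roll back, and never pass 0 as the 
partition count when handing work to Spark:

{code:java}
import java.util.Collections;
import java.util.List;
import org.apache.spark.api.java.JavaSparkContext;

public class RollbackGuard {
  static List<String> deleteFiles(JavaSparkContext jsc, List<String> files) {
    if (files.isEmpty()) {
      return Collections.emptyList();           // nothing to roll back
    }
    // clamp parallelism to [1, files.size()] so it can never be 0
    int parallelism = Math.max(1, Math.min(files.size(), 100));
    return jsc.parallelize(files, parallelism)
              .map(path -> path /* delete the file here, return its path */)
              .collect();
  }
}
{code}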



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-57) [UMBRELLA] Support ORC Storage

2020-11-20 Thread lrz (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-57?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17235994#comment-17235994
 ] 

lrz commented on HUDI-57:
-

Hi [~vinoth], we are eager to use this feature. Could you share any updates 
when you are free? Also, if you could help break this down into sub-tasks, we 
would love to pick some of them up. Thank you very much.
 

> [UMBRELLA] Support ORC Storage
> --
>
> Key: HUDI-57
> URL: https://issues.apache.org/jira/browse/HUDI-57
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Hive Integration, Writer Core
>Reporter: Vinoth Chandar
>Assignee: Mani Jindal
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> [https://github.com/uber/hudi/issues/68]
> https://github.com/uber/hudi/issues/155



--
This message was sent by Atlassian Jira
(v8.3.4#803005)