[jira] [Updated] (HUDI-209) Implement JMX metrics reporter
[ https://issues.apache.org/jira/browse/HUDI-209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

leesf updated HUDI-209:
    Fix Version/s: 0.6.0 (was: 0.5.1)

> Implement JMX metrics reporter
>
> Key: HUDI-209
> URL: https://issues.apache.org/jira/browse/HUDI-209
> Project: Apache Hudi (incubating)
> Issue Type: New Feature
> Reporter: vinoyang
> Assignee: Forward Xu
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.6.0
> Time Spent: 1h
> Remaining Estimate: 0h
>
> Currently there are only two reporters, {{MetricsGraphiteReporter}} and
> {{InMemoryMetricsReporter}}, and {{InMemoryMetricsReporter}} is used only for
> testing, so in practice we have just one metrics reporter. Since JMX is the
> standard monitoring interface on the JVM platform, I propose providing a JMX
> metrics reporter.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
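The core of a JMX reporter is just registering MBeans whose getters expose metric values; any JMX client (jconsole, jmxterm) can then poll them. A minimal stdlib-only sketch of that pattern, with illustrative names not taken from Hudi's implementation:

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class JmxReporterSketch {
    // Standard MBean convention: the interface must be public and named
    // <ImplementationClass>MBean; its getters become readable JMX attributes.
    public interface HoodieMetricsMBean {
        long getCommitCount();
    }

    public static class HoodieMetrics implements HoodieMetricsMBean {
        private volatile long commitCount;
        public long getCommitCount() { return commitCount; }
        public void increment() { commitCount++; }
    }

    public static void main(String[] args) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        HoodieMetrics metrics = new HoodieMetrics();
        // Hypothetical object name; a real reporter would choose its own domain.
        ObjectName name = new ObjectName("org.apache.hudi.example:type=HoodieMetrics");
        server.registerMBean(metrics, name);

        metrics.increment();
        metrics.increment();

        // Read the attribute back through the server to show the round trip
        // that an external JMX client would perform.
        System.out.println("CommitCount=" + server.getAttribute(name, "CommitCount"));
    }
}
```

Unlike the Graphite reporter, nothing is pushed anywhere: the JVM merely exposes the attributes, and monitoring systems pull them on their own schedule.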
[jira] [Resolved] (HUDI-437) Support user-defined index
[ https://issues.apache.org/jira/browse/HUDI-437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

leesf resolved HUDI-437.
    Resolution: Fixed

Fixed via master: f1d7bb381d4a370beeedb45132b24c2cac00aabf

> Support user-defined index
>
> Key: HUDI-437
> URL: https://issues.apache.org/jira/browse/HUDI-437
> Project: Apache Hudi (incubating)
> Issue Type: Improvement
> Components: Index, newbie, Writer Core
> Reporter: leesf
> Assignee: leesf
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.6.0
> Time Spent: 40m
> Remaining Estimate: 0h
>
> Currently, Hudi does not support user-defined indexes and throws an exception
> if any index type other than HBASE/INMEMORY/BLOOM/GLOBAL_BLOOM is configured.
[jira] [Updated] (HUDI-539) RO Path filter does not pick up hadoop configs from the spark context
[ https://issues.apache.org/jira/browse/HUDI-539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

leesf updated HUDI-539:
    Status: Open (was: New)

> RO Path filter does not pick up hadoop configs from the spark context
>
> Key: HUDI-539
> URL: https://issues.apache.org/jira/browse/HUDI-539
> Project: Apache Hudi (incubating)
> Issue Type: Bug
> Components: Common Core
> Affects Versions: 0.5.1
> Environment: Spark version: 2.4.4, Hadoop version: 2.7.3, Databricks Runtime: 6.1
> Reporter: Sam Somuah
> Assignee: Vinoth Chandar
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.6.0
> Time Spent: 40m
> Remaining Estimate: 0h
>
> Hi,
> I'm trying to use Hudi to write to one of the Azure storage container file
> systems, ADLS Gen 2 (abfs://), which is one of the whitelisted file schemes.
> The issue I'm facing is that {{HoodieROTablePathFilter}} tries to get the
> file system for a path while passing in a blank Hadoop configuration. This
> manifests as {{java.io.IOException: No FileSystem for scheme: abfss}} because
> the blank configuration carries none of the settings from the environment.
>
> The problematic line is
> https://github.com/apache/incubator-hudi/blob/2bb0c21a3dd29687e49d362ed34f050380ff47ae/hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/HoodieROTablePathFilter.java#L96
>
> Stacktrace:
> java.io.IOException: No FileSystem for scheme: abfss
>   at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2660)
>   at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2667)
>   at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94)
>   at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703)
>   at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685)
>   at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
>   at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
>   at org.apache.hudi.hadoop.HoodieROTablePathFilter.accept(HoodieROTablePathFilter.java:96)
>   at org.apache.spark.sql.execution.datasources.InMemoryFileIndex$$anonfun$16.apply(InMemoryFileIndex.scala:349)
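The failure mode can be illustrated without any Hadoop dependency: the scheme-to-implementation binding lives in the configuration object, so a freshly constructed, blank configuration simply cannot resolve `abfss`. A toy stand-in (the map plays the role of Hadoop's `Configuration`; all names here are illustrative):

```java
import java.util.HashMap;
import java.util.Map;

public class PathFilterConfSketch {
    // Toy stand-in for FileSystem.get(uri, conf): the "fs.<scheme>.impl"
    // binding must be present in the configuration that is passed in.
    static String resolveScheme(String scheme, Map<String, String> conf) {
        String impl = conf.get("fs." + scheme + ".impl");
        if (impl == null) {
            throw new IllegalStateException("No FileSystem for scheme: " + scheme);
        }
        return impl;
    }

    public static void main(String[] args) {
        // The ambient (Spark-session) Hadoop configuration carries the binding...
        Map<String, String> sparkHadoopConf = new HashMap<>();
        sparkHadoopConf.put("fs.abfss.impl", "example.AzureBlobFileSystem");

        // ...but a blank configuration, which is effectively what the path
        // filter was constructing, does not.
        Map<String, String> blankConf = new HashMap<>();

        System.out.println(resolveScheme("abfss", sparkHadoopConf));
        try {
            resolveScheme("abfss", blankConf);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The fix direction implied by the ticket is to thread the Spark context's Hadoop configuration through to the filter instead of instantiating an empty one.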
[jira] [Resolved] (HUDI-616) Parquet files not getting created on DFS docker instance but on local FS in TestHoodieDeltaStreamer
[ https://issues.apache.org/jira/browse/HUDI-616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

leesf resolved HUDI-616.
    Resolution: Fixed

Fixed via master: https://github.com/apache/incubator-hudi/pull/1434

> Parquet files not getting created on DFS docker instance but on local FS in
> TestHoodieDeltaStreamer
>
> Key: HUDI-616
> URL: https://issues.apache.org/jira/browse/HUDI-616
> Project: Apache Hudi (incubating)
> Issue Type: Bug
> Components: DeltaStreamer, Testing
> Reporter: Pratyaksh Sharma
> Assignee: Pratyaksh Sharma
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.6.0
> Time Spent: 20m
> Remaining Estimate: 0h
>
> In TestHoodieDeltaStreamer, PARQUET_SOURCE_ROOT is initialized even before
> the method annotated with @BeforeClass is called:
>
> private static final String PARQUET_SOURCE_ROOT = dfsBasePath + "/parquetFiles";
>
> At that point the dfsBasePath variable is still null, so the parquet files
> get created on the local FS and need to be cleared manually after testing.
> This needs to be rectified.
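The bug above is the classic Java static-initialization-order pitfall: a `static final` field is computed once, at class-initialization time, before any `@BeforeClass`-style setup runs. A minimal stdlib-only reproduction (names mirror the ticket but the class is illustrative), with a lazy accessor showing the usual fix:

```java
public class StaticInitOrderSketch {
    static String dfsBasePath;                        // still null at class init
    // Computed eagerly: captures "null" before any setup method can run.
    static final String PARQUET_SOURCE_ROOT = dfsBasePath + "/parquetFiles";

    // Deferring the concatenation until call time picks up the real value.
    static String lazySourceRoot() {
        return dfsBasePath + "/parquetFiles";
    }

    public static void main(String[] args) {
        dfsBasePath = "hdfs://docker-dfs";            // plays the role of @BeforeClass
        System.out.println(PARQUET_SOURCE_ROOT);      // prints "null/parquetFiles"
        System.out.println(lazySourceRoot());         // prints "hdfs://docker-dfs/parquetFiles"
    }
}
```

String concatenation with a `null` reference yields the literal text "null", which is exactly why the test's files silently landed on the local FS instead of failing fast.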
[jira] [Closed] (HUDI-409) Replace Log Magic header with a secure hash to avoid clashes with data
[ https://issues.apache.org/jira/browse/HUDI-409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

leesf closed HUDI-409.
    Resolution: Fixed

Fixed via master: 9d46ce380a3929605b3838238e8aa07a9918ab7a

> Replace Log Magic header with a secure hash to avoid clashes with data
>
> Key: HUDI-409
> URL: https://issues.apache.org/jira/browse/HUDI-409
> Project: Apache Hudi (incubating)
> Issue Type: Improvement
> Components: Common Core
> Reporter: Nishith Agarwal
> Assignee: Ramachandran M S
> Priority: Major
> Fix For: 0.5.2
[jira] [Closed] (HUDI-836) Implement datadog metrics reporter
[ https://issues.apache.org/jira/browse/HUDI-836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

leesf closed HUDI-836.

> Implement datadog metrics reporter
>
> Key: HUDI-836
> URL: https://issues.apache.org/jira/browse/HUDI-836
> Project: Apache Hudi (incubating)
> Issue Type: New Feature
> Components: Common Core
> Reporter: Raymond Xu
> Assignee: Raymond Xu
> Priority: Major
> Labels: bug-bash-0.6.0, pull-request-available
> Fix For: 0.6.0
>
> Implement a new metrics reporter type for the Datadog API.
[jira] [Closed] (HUDI-803) Improve Unit test coverage of HoodieAvroUtils around default values
[ https://issues.apache.org/jira/browse/HUDI-803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

leesf closed HUDI-803.

> Improve Unit test coverage of HoodieAvroUtils around default values
>
> Key: HUDI-803
> URL: https://issues.apache.org/jira/browse/HUDI-803
> Project: Apache Hudi (incubating)
> Issue Type: Improvement
> Components: Testing
> Reporter: Pratyaksh Sharma
> Assignee: Pratyaksh Sharma
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.6.0
>
> Recently there has been a lot of work and improvement around schema evolution,
> and around the HoodieAvroUtils class in particular. A few bugs have already
> been fixed in this area. With the version bump of Avro from 1.7.7 to 1.8.2,
> the handling of default values for Schema.Field has changed significantly.
> This JIRA aims to improve the test coverage of the HoodieAvroUtils class so
> that our functionality remains intact with respect to default values and
> schema evolution.
[jira] [Resolved] (HUDI-803) Improve Unit test coverage of HoodieAvroUtils around default values
[ https://issues.apache.org/jira/browse/HUDI-803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

leesf resolved HUDI-803.
    Resolution: Fixed

Fixed via master: 6a0aa9a645d11ed7b50e18aa0563dafcd9d145f7

> Improve Unit test coverage of HoodieAvroUtils around default values
>
> Key: HUDI-803
> URL: https://issues.apache.org/jira/browse/HUDI-803
> Project: Apache Hudi (incubating)
> Issue Type: Improvement
> Components: Testing
> Reporter: Pratyaksh Sharma
> Assignee: Pratyaksh Sharma
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.6.0
>
> Recently there has been a lot of work and improvement around schema evolution,
> and around the HoodieAvroUtils class in particular. A few bugs have already
> been fixed in this area. With the version bump of Avro from 1.7.7 to 1.8.2,
> the handling of default values for Schema.Field has changed significantly.
> This JIRA aims to improve the test coverage of the HoodieAvroUtils class so
> that our functionality remains intact with respect to default values and
> schema evolution.
[jira] [Updated] (HUDI-888) NPE when compacting via hudi-cli and providing a compaction props file
[ https://issues.apache.org/jira/browse/HUDI-888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

leesf updated HUDI-888:
    Status: Closed (was: Patch Available)

> NPE when compacting via hudi-cli and providing a compaction props file
>
> Key: HUDI-888
> URL: https://issues.apache.org/jira/browse/HUDI-888
> Project: Apache Hudi (incubating)
> Issue Type: Bug
> Reporter: Roland Johann
> Priority: Major
> Labels: pull-request-available
>
> When we schedule compaction via hudi-cli and provide compaction props via the
> `propsFilePath` argument, we get an NPE because the file system has not yet
> been initialized when the constructor of HoodieCompactor.java runs.
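The pattern behind this NPE is a constructor eagerly dereferencing a collaborator (the file system) that is only wired up later. A stdlib-only sketch of the shape of the bug and a guard-style fix; all names are illustrative, not Hudi's actual classes:

```java
public class EagerConstructorSketch {
    // Stand-in for the file system dependency that may not exist yet.
    public interface Fs { String read(String path); }

    public static class Compactor {
        private final Fs fs;
        private String props;

        public Compactor(Fs fs, String propsFilePath) {
            this.fs = fs;
            // The buggy flow read the props file unconditionally here; with
            // fs still null, any propsFilePath triggered an NPE. Guarding
            // (or deferring the read until fs is initialized) avoids that.
            if (propsFilePath != null && fs != null) {
                this.props = fs.read(propsFilePath);
            }
        }

        public String props() { return props; }
    }

    public static void main(String[] args) {
        Compactor ok = new Compactor(path -> "compaction.props", "/tmp/props");
        System.out.println(ok.props());      // props were loaded
        Compactor noFs = new Compactor(null, "/tmp/props");
        System.out.println(noFs.props());    // null, instead of an NPE
    }
}
```

An alternative to the guard is restructuring so the file system is constructed before anything that consumes it, which is closer to what an initialization-order fix looks like.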
[jira] [Closed] (HUDI-858) Allow multiple operations to be executed within a single commit
[ https://issues.apache.org/jira/browse/HUDI-858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

leesf closed HUDI-858.

Fixed via master: e6f3bf10cf2c62a1008b82765abdcd33cfd64c67

> Allow multiple operations to be executed within a single commit
>
> Key: HUDI-858
> URL: https://issues.apache.org/jira/browse/HUDI-858
> Project: Apache Hudi (incubating)
> Issue Type: Improvement
> Components: Writer Core
> Reporter: Balaji Varadarajan
> Assignee: Balaji Varadarajan
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.5.3
>
> There are users who have been using the RDD APIs directly and relied on a
> behavior in 0.4.x that allowed multiple write operations
> (upsert/bulk-insert/...) to be executed within a single commit.
> Given Hudi's commit protocol, these are generally unsafe operations and users
> need to handle failure scenarios themselves. It only works with COW tables.
> Hudi 0.5.x stopped supporting this behavior.
> Given the importance of supporting such cases for users migrating to 0.5.x,
> we are proposing a safety flag (disabled by default) that allows this old
> behavior.
[jira] [Closed] (HUDI-846) Turn on incremental cleaning by default in 0.6.0
[ https://issues.apache.org/jira/browse/HUDI-846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

leesf closed HUDI-846.

> Turn on incremental cleaning by default in 0.6.0
>
> Key: HUDI-846
> URL: https://issues.apache.org/jira/browse/HUDI-846
> Project: Apache Hudi (incubating)
> Issue Type: Sub-task
> Components: Cleaner
> Reporter: Balaji Varadarajan
> Assignee: Balaji Varadarajan
> Priority: Blocker
> Labels: pull-request-available
> Fix For: 0.6.0, 0.5.3
>
> The incremental cleaner tracks the commits that have happened since the last
> clean operation to figure out which partitions need to be scanned for
> cleaning. This avoids the costly scanning of all partition paths.
> Incremental cleaning is currently disabled by default. We need to enable it
> by default in 0.6.0.
> No special handling is required for upgrade/downgrade scenarios, as
> incremental cleaning relies on the standard format of commit metadata.
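The idea described above can be sketched in a few lines: instead of listing every partition, derive the candidate partitions from the commit metadata written after the last clean instant. A stdlib-only sketch with made-up instants and partition names (not Hudi's actual timeline API):

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class IncrementalCleanSketch {
    // Only partitions touched by commits strictly after lastCleanInstant
    // need to be scanned; everything else was covered by the previous clean.
    static Set<String> partitionsToScan(Map<String, List<String>> commits,
                                        String lastCleanInstant) {
        Set<String> toScan = new TreeSet<>();
        commits.forEach((instant, partitions) -> {
            if (instant.compareTo(lastCleanInstant) > 0) {
                toScan.addAll(partitions);
            }
        });
        return toScan;
    }

    public static void main(String[] args) {
        // commit instant -> partitions touched by that commit (illustrative data)
        Map<String, List<String>> commits = new LinkedHashMap<>();
        commits.put("20200520", Arrays.asList("2020/05/20"));
        commits.put("20200521", Arrays.asList("2020/05/20", "2020/05/21"));
        commits.put("20200522", Arrays.asList("2020/05/22"));

        System.out.println(partitionsToScan(commits, "20200520"));
    }
}
```

The "standard format of commit metadata" remark in the ticket is what makes this safe: the touched-partition information is already recorded per commit, so no extra state needs to survive an upgrade or downgrade.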
[jira] [Created] (HUDI-938) Update hudi name in NOTICE file
leesf created HUDI-938:
    Summary: Update hudi name in NOTICE file
    Key: HUDI-938
    URL: https://issues.apache.org/jira/browse/HUDI-938
    Project: Apache Hudi
    Issue Type: Sub-task
    Reporter: leesf
[jira] [Created] (HUDI-939) Update release scripts
leesf created HUDI-939:
    Summary: Update release scripts
    Key: HUDI-939
    URL: https://issues.apache.org/jira/browse/HUDI-939
    Project: Apache Hudi
    Issue Type: Sub-task
    Reporter: leesf
[jira] [Updated] (HUDI-938) Remove incubating from NOTICE
[ https://issues.apache.org/jira/browse/HUDI-938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

leesf updated HUDI-938:
    Summary: Remove incubating from NOTICE (was: Update hudi name in NOTICE file)

> Remove incubating from NOTICE
>
> Key: HUDI-938
> URL: https://issues.apache.org/jira/browse/HUDI-938
> Project: Apache Hudi
> Issue Type: Sub-task
> Reporter: leesf
> Priority: Major
[jira] [Closed] (HUDI-935) update travis name
[ https://issues.apache.org/jira/browse/HUDI-935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

leesf closed HUDI-935.

> update travis name
>
> Key: HUDI-935
> URL: https://issues.apache.org/jira/browse/HUDI-935
> Project: Apache Hudi
> Issue Type: Sub-task
> Reporter: leesf
> Priority: Major
[jira] [Resolved] (HUDI-935) update travis name
[ https://issues.apache.org/jira/browse/HUDI-935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

leesf resolved HUDI-935.
    Resolution: Not A Problem

> update travis name
>
> Key: HUDI-935
> URL: https://issues.apache.org/jira/browse/HUDI-935
> Project: Apache Hudi
> Issue Type: Sub-task
> Reporter: leesf
> Priority: Major
[jira] [Assigned] (HUDI-938) Remove incubating from NOTICE
[ https://issues.apache.org/jira/browse/HUDI-938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

leesf reassigned HUDI-938:
    Assignee: leesf

> Remove incubating from NOTICE
>
> Key: HUDI-938
> URL: https://issues.apache.org/jira/browse/HUDI-938
> Project: Apache Hudi
> Issue Type: Sub-task
> Reporter: leesf
> Assignee: leesf
> Priority: Major
> Labels: pull-request-available
[jira] [Commented] (HUDI-928) Consider changes needed in pom.xml to exit incubation
[ https://issues.apache.org/jira/browse/HUDI-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17115115#comment-17115115 ]

leesf commented on HUDI-928:

Also fixed via master: 492f324bc79febd8299fbb837b67c900ace18ac2

> Consider changes needed in pom.xml to exit incubation
>
> Key: HUDI-928
> URL: https://issues.apache.org/jira/browse/HUDI-928
> Project: Apache Hudi
> Issue Type: Sub-task
> Components: Release Administrative
> Reporter: Vinoth Chandar
> Assignee: Suneel Marthi
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.5.3
[jira] [Closed] (HUDI-939) Update release scripts
[ https://issues.apache.org/jira/browse/HUDI-939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

leesf closed HUDI-939.
    Resolution: Fixed

Also fixed via master: 492f324bc79febd8299fbb837b67c900ace18ac2

> Update release scripts
>
> Key: HUDI-939
> URL: https://issues.apache.org/jira/browse/HUDI-939
> Project: Apache Hudi
> Issue Type: Sub-task
> Reporter: leesf
> Assignee: Suneel Marthi
> Priority: Major
> Fix For: 0.5.3
[jira] [Closed] (HUDI-933) Examine the DOAP file for any necessary changes
[ https://issues.apache.org/jira/browse/HUDI-933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

leesf closed HUDI-933.
    Resolution: Fixed

Also fixed via master: 492f324bc79febd8299fbb837b67c900ace18ac2

> Examine the DOAP file for any necessary changes
>
> Key: HUDI-933
> URL: https://issues.apache.org/jira/browse/HUDI-933
> Project: Apache Hudi
> Issue Type: Sub-task
> Components: Release Administrative
> Reporter: Vinoth Chandar
> Assignee: Suneel Marthi
> Priority: Major
> Fix For: 0.5.3
[jira] [Closed] (HUDI-938) Remove incubating from NOTICE
[ https://issues.apache.org/jira/browse/HUDI-938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

leesf closed HUDI-938.
    Resolution: Fixed

Fixed via master: 492f324bc79febd8299fbb837b67c900ace18ac2

> Remove incubating from NOTICE
>
> Key: HUDI-938
> URL: https://issues.apache.org/jira/browse/HUDI-938
> Project: Apache Hudi
> Issue Type: Sub-task
> Reporter: leesf
> Assignee: leesf
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.5.3
[jira] [Updated] (HUDI-928) Consider changes needed in pom.xml to exit incubation
[ https://issues.apache.org/jira/browse/HUDI-928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

leesf updated HUDI-928:
    Status: Closed (was: Patch Available)

> Consider changes needed in pom.xml to exit incubation
>
> Key: HUDI-928
> URL: https://issues.apache.org/jira/browse/HUDI-928
> Project: Apache Hudi
> Issue Type: Sub-task
> Components: Release Administrative
> Reporter: Vinoth Chandar
> Assignee: Suneel Marthi
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.5.3
[jira] [Assigned] (HUDI-926) Removing DISCLAIMER from the repo
[ https://issues.apache.org/jira/browse/HUDI-926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

leesf reassigned HUDI-926:
    Assignee: leesf

> Removing DISCLAIMER from the repo
>
> Key: HUDI-926
> URL: https://issues.apache.org/jira/browse/HUDI-926
> Project: Apache Hudi
> Issue Type: Sub-task
> Components: Release Administrative
> Reporter: Vinoth Chandar
> Assignee: leesf
> Priority: Major
>
> We need to understand whether we still need the DISCLAIMER placed in the code
> repo.
[jira] [Commented] (HUDI-935) update travis name
[ https://issues.apache.org/jira/browse/HUDI-935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17115081#comment-17115081 ]

leesf commented on HUDI-935:

Yes, new PRs are good. We should manually change incubator-hudi to hudi for old PRs.

> update travis name
>
> Key: HUDI-935
> URL: https://issues.apache.org/jira/browse/HUDI-935
> Project: Apache Hudi
> Issue Type: Sub-task
> Reporter: leesf
> Priority: Major
[jira] [Closed] (HUDI-926) Removing DISCLAIMER from the repo
[ https://issues.apache.org/jira/browse/HUDI-926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

leesf closed HUDI-926.

> Removing DISCLAIMER from the repo
>
> Key: HUDI-926
> URL: https://issues.apache.org/jira/browse/HUDI-926
> Project: Apache Hudi
> Issue Type: Sub-task
> Components: Release Administrative
> Reporter: Vinoth Chandar
> Assignee: leesf
> Priority: Major
> Labels: pull-request-available
>
> We need to understand whether we still need the DISCLAIMER placed in the code
> repo.
[jira] [Reopened] (HUDI-928) Consider changes needed in pom.xml to exit incubation
[ https://issues.apache.org/jira/browse/HUDI-928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

leesf reopened HUDI-928:

> Consider changes needed in pom.xml to exit incubation
>
> Key: HUDI-928
> URL: https://issues.apache.org/jira/browse/HUDI-928
> Project: Apache Hudi
> Issue Type: Sub-task
> Components: Release Administrative
> Reporter: Vinoth Chandar
> Assignee: Suneel Marthi
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.5.3
[jira] [Resolved] (HUDI-928) Consider changes needed in pom.xml to exit incubation
[ https://issues.apache.org/jira/browse/HUDI-928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

leesf resolved HUDI-928.
    Resolution: Fixed

> Consider changes needed in pom.xml to exit incubation
>
> Key: HUDI-928
> URL: https://issues.apache.org/jira/browse/HUDI-928
> Project: Apache Hudi
> Issue Type: Sub-task
> Components: Release Administrative
> Reporter: Vinoth Chandar
> Assignee: Suneel Marthi
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.5.3
[jira] [Closed] (HUDI-928) Consider changes needed in pom.xml to exit incubation
[ https://issues.apache.org/jira/browse/HUDI-928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

leesf closed HUDI-928.

> Consider changes needed in pom.xml to exit incubation
>
> Key: HUDI-928
> URL: https://issues.apache.org/jira/browse/HUDI-928
> Project: Apache Hudi
> Issue Type: Sub-task
> Components: Release Administrative
> Reporter: Vinoth Chandar
> Assignee: Suneel Marthi
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.5.3
[jira] [Resolved] (HUDI-926) Removing DISCLAIMER from the repo
[ https://issues.apache.org/jira/browse/HUDI-926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

leesf resolved HUDI-926.
    Resolution: Fixed

Fixed via master: f22c3e933e828d1342bed67874b9ab3fee0ad099

> Removing DISCLAIMER from the repo
>
> Key: HUDI-926
> URL: https://issues.apache.org/jira/browse/HUDI-926
> Project: Apache Hudi
> Issue Type: Sub-task
> Components: Release Administrative
> Reporter: Vinoth Chandar
> Assignee: leesf
> Priority: Major
> Labels: pull-request-available
>
> We need to understand whether we still need the DISCLAIMER placed in the code
> repo.
[jira] [Updated] (HUDI-926) Removing DISCLAIMER from the repo
[ https://issues.apache.org/jira/browse/HUDI-926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

leesf updated HUDI-926:
    Fix Version/s: 0.5.3

> Removing DISCLAIMER from the repo
>
> Key: HUDI-926
> URL: https://issues.apache.org/jira/browse/HUDI-926
> Project: Apache Hudi
> Issue Type: Sub-task
> Components: Release Administrative
> Reporter: Vinoth Chandar
> Assignee: leesf
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.5.3
>
> We need to understand whether we still need the DISCLAIMER placed in the code
> repo.
[jira] [Updated] (HUDI-927) https://hudi.incubator.apache.org should auto redirect to https://hudi.apache.org
[ https://issues.apache.org/jira/browse/HUDI-927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

leesf updated HUDI-927:
    Status: Open (was: New)

> https://hudi.incubator.apache.org should auto redirect to
> https://hudi.apache.org
>
> Key: HUDI-927
> URL: https://issues.apache.org/jira/browse/HUDI-927
> Project: Apache Hudi
> Issue Type: Sub-task
> Components: Release Administrative
> Reporter: Vinoth Chandar
> Assignee: Suneel Marthi
> Priority: Major
>
> This is still not happening. We need to wait a few days and, if it is still
> not working, raise an INFRA JIRA.
[jira] [Closed] (HUDI-927) https://hudi.incubator.apache.org should auto redirect to https://hudi.apache.org
[ https://issues.apache.org/jira/browse/HUDI-927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

leesf closed HUDI-927.

> https://hudi.incubator.apache.org should auto redirect to
> https://hudi.apache.org
>
> Key: HUDI-927
> URL: https://issues.apache.org/jira/browse/HUDI-927
> Project: Apache Hudi
> Issue Type: Sub-task
> Components: Release Administrative
> Reporter: Vinoth Chandar
> Assignee: Suneel Marthi
> Priority: Major
> Fix For: 0.5.3
>
> This is still not happening. We need to wait a few days and, if it is still
> not working, raise an INFRA JIRA.
[jira] [Resolved] (HUDI-927) https://hudi.incubator.apache.org should auto redirect to https://hudi.apache.org
[ https://issues.apache.org/jira/browse/HUDI-927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

leesf resolved HUDI-927.
    Fix Version/s: 0.5.3
    Resolution: Fixed

> https://hudi.incubator.apache.org should auto redirect to
> https://hudi.apache.org
>
> Key: HUDI-927
> URL: https://issues.apache.org/jira/browse/HUDI-927
> Project: Apache Hudi
> Issue Type: Sub-task
> Components: Release Administrative
> Reporter: Vinoth Chandar
> Assignee: Suneel Marthi
> Priority: Major
> Fix For: 0.5.3
>
> This is still not happening. We need to wait a few days and, if it is still
> not working, raise an INFRA JIRA.
[jira] [Commented] (HUDI-304) Bring back spotless plugin
[ https://issues.apache.org/jira/browse/HUDI-304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17115345#comment-17115345 ]

leesf commented on HUDI-304:

[~shivnarayan] sorry, I do not have much time to focus on the PR recently, so I
am marking it HELP-WANTED in case anyone wants to work on the issue.

> Bring back spotless plugin
>
> Key: HUDI-304
> URL: https://issues.apache.org/jira/browse/HUDI-304
> Project: Apache Hudi
> Issue Type: Task
> Components: Code Cleanup, Testing
> Reporter: Balaji Varadarajan
> Assignee: leesf
> Priority: Major
> Labels: bug-bash-0.6.0, help-wanted, pull-request-available
> Fix For: 0.6.0
> Time Spent: 10m
> Remaining Estimate: 0h
>
> The spotless plugin was turned off because the eclipse style format it was
> referencing was removed for compliance reasons.
> We use the google-style eclipse format with some changes:
>
> 90c90
> <
> ---
> >
> 242c242
> < value="100"/>
> ---
> > value="120"/>
>
> The eclipse style sheet was originally obtained from
> https://github.com/google/styleguide, whose CC-BY 3.0 license is not
> compatible with source distribution (see
> https://www.apache.org/legal/resolved.html#cc-by).
>
> We need to figure out a way to bring this back.
[jira] [Updated] (HUDI-304) Bring back spotless plugin
[ https://issues.apache.org/jira/browse/HUDI-304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

leesf updated HUDI-304:
    Labels: bug-bash-0.6.0 help-wanted pull-request-available (was: bug-bash-0.6.0 pull-request-available)

> Bring back spotless plugin
>
> Key: HUDI-304
> URL: https://issues.apache.org/jira/browse/HUDI-304
> Project: Apache Hudi
> Issue Type: Task
> Components: Code Cleanup, Testing
> Reporter: Balaji Varadarajan
> Assignee: leesf
> Priority: Major
> Labels: bug-bash-0.6.0, help-wanted, pull-request-available
> Fix For: 0.6.0
> Time Spent: 10m
> Remaining Estimate: 0h
>
> The spotless plugin was turned off because the eclipse style format it was
> referencing was removed for compliance reasons.
> We use the google-style eclipse format with some changes:
>
> 90c90
> <
> ---
> >
> 242c242
> < value="100"/>
> ---
> > value="120"/>
>
> The eclipse style sheet was originally obtained from
> https://github.com/google/styleguide, whose CC-BY 3.0 license is not
> compatible with source distribution (see
> https://www.apache.org/legal/resolved.html#cc-by).
>
> We need to figure out a way to bring this back.
[jira] [Created] (HUDI-935) update travis name
leesf created HUDI-935:
    Summary: update travis name
    Key: HUDI-935
    URL: https://issues.apache.org/jira/browse/HUDI-935
    Project: Apache Hudi
    Issue Type: Sub-task
    Reporter: leesf
[jira] [Closed] (HUDI-819) missing write status in MergeOnReadLazyInsertIterable
[ https://issues.apache.org/jira/browse/HUDI-819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

leesf closed HUDI-819.

> missing write status in MergeOnReadLazyInsertIterable
>
> Key: HUDI-819
> URL: https://issues.apache.org/jira/browse/HUDI-819
> Project: Apache Hudi (incubating)
> Issue Type: Improvement
> Reporter: satish
> Assignee: satish
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.6.0
>
> The variable declared
> [here|https://github.com/apache/incubator-hudi/blob/master/hudi-client/src/main/java/org/apache/hudi/execution/MergeOnReadLazyInsertIterable.java#L53]
> masks the protected statuses variable.
> So although Hudi writes the data, the write status will not be included in
> the completed section. This can cause duplicates to be written.
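The mechanism in HUDI-819 is Java field shadowing: re-declaring an inherited field creates a second, distinct field, so writes land in the shadow while code reading the superclass field sees nothing. A minimal reproduction; class and method names echo the ticket but are illustrative, not Hudi's actual code:

```java
import java.util.ArrayList;
import java.util.List;

// Superclass holds the "real" statuses list that downstream code reads.
class LazyInsertIterableSketch {
    protected List<String> statuses = new ArrayList<>();
    List<String> completedStatuses() { return statuses; }
}

// Re-declaring `statuses` here (the shape of the bug) masks the inherited
// field: every write below goes into this shadow copy.
class MergeOnReadLazyInsertIterableSketch extends LazyInsertIterableSketch {
    protected List<String> statuses = new ArrayList<>();
    void recordStatus(String status) { statuses.add(status); }
}

public class ShadowedFieldSketch {
    public static void main(String[] args) {
        MergeOnReadLazyInsertIterableSketch it = new MergeOnReadLazyInsertIterableSketch();
        it.recordStatus("write-ok");
        // The superclass reader misses the recorded status entirely.
        System.out.println(it.completedStatuses().size()); // prints 0
    }
}
```

The fix is simply deleting the redeclaration so the subclass writes into the inherited field; javac only flags this with `-Xlint:hides`-style analysis in some tools, which is why it slipped through.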
[jira] [Updated] (HUDI-819) missing write status in MergeOnReadLazyInsertIterable
[ https://issues.apache.org/jira/browse/HUDI-819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

leesf updated HUDI-819:
    Status: Open (was: New)

> missing write status in MergeOnReadLazyInsertIterable
>
> Key: HUDI-819
> URL: https://issues.apache.org/jira/browse/HUDI-819
> Project: Apache Hudi (incubating)
> Issue Type: Improvement
> Reporter: satish
> Assignee: satish
> Priority: Major
> Labels: pull-request-available
>
> The variable declared
> [here|https://github.com/apache/incubator-hudi/blob/master/hudi-client/src/main/java/org/apache/hudi/execution/MergeOnReadLazyInsertIterable.java#L53]
> masks the protected statuses variable.
> So although Hudi writes the data, the write status will not be included in
> the completed section. This can cause duplicates to be written.
[jira] [Resolved] (HUDI-819) missing write status in MergeOnReadLazyInsertIterable
[ https://issues.apache.org/jira/browse/HUDI-819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

leesf resolved HUDI-819.
    Fix Version/s: 0.6.0
    Resolution: Fixed

> missing write status in MergeOnReadLazyInsertIterable
>
> Key: HUDI-819
> URL: https://issues.apache.org/jira/browse/HUDI-819
> Project: Apache Hudi (incubating)
> Issue Type: Improvement
> Reporter: satish
> Assignee: satish
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.6.0
>
> The variable declared
> [here|https://github.com/apache/incubator-hudi/blob/master/hudi-client/src/main/java/org/apache/hudi/execution/MergeOnReadLazyInsertIterable.java#L53]
> masks the protected statuses variable.
> So although Hudi writes the data, the write status will not be included in
> the completed section. This can cause duplicates to be written.
[jira] [Closed] (HUDI-850) Avoid unnecessary listings in incremental cleaning mode
[ https://issues.apache.org/jira/browse/HUDI-850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

leesf closed HUDI-850.

> Avoid unnecessary listings in incremental cleaning mode
>
> Key: HUDI-850
> URL: https://issues.apache.org/jira/browse/HUDI-850
> Project: Apache Hudi (incubating)
> Issue Type: Improvement
> Components: Cleaner, Performance
> Reporter: Vinoth Chandar
> Assignee: Balaji Varadarajan
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.6.0
>
> Came up during https://github.com/apache/incubator-hudi/issues/1552.
> Even with incremental cleaning turned on, we can hit a scenario where there
> are no commits to clean yet, but we still end up listing needlessly.
[jira] [Resolved] (HUDI-850) Avoid unnecessary listings in incremental cleaning mode
[ https://issues.apache.org/jira/browse/HUDI-850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf resolved HUDI-850. Resolution: Fixed > Avoid unnecessary listings in incremental cleaning mode > --- > > Key: HUDI-850 > URL: https://issues.apache.org/jira/browse/HUDI-850 > Project: Apache Hudi (incubating) > Issue Type: Improvement > Components: Cleaner, Performance >Reporter: Vinoth Chandar >Assignee: Balaji Varadarajan >Priority: Major > Labels: pull-request-available > Fix For: 0.6.0 > > > Came up during https://github.com/apache/incubator-hudi/issues/1552 > Even with incremental cleaning turned on, we would have a scenario where > there are no commits yet to clean, but we end up listing needlessly -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-850) Avoid unnecessary listings in incremental cleaning mode
[ https://issues.apache.org/jira/browse/HUDI-850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf updated HUDI-850: --- Status: Open (was: New) > Avoid unnecessary listings in incremental cleaning mode > --- > > Key: HUDI-850 > URL: https://issues.apache.org/jira/browse/HUDI-850 > Project: Apache Hudi (incubating) > Issue Type: Improvement > Components: Cleaner, Performance >Reporter: Vinoth Chandar >Assignee: Balaji Varadarajan >Priority: Major > Labels: pull-request-available > Fix For: 0.6.0 > > > Came up during https://github.com/apache/incubator-hudi/issues/1552 > Even with incremental cleaning turned on, we would have a scenario where > there are no commits yet to clean, but we end up listing needlessly -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (HUDI-1225) Avro Date logical type not handled correctly when converting to Spark Row
[ https://issues.apache.org/jira/browse/HUDI-1225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf closed HUDI-1225. --- > Avro Date logical type not handled correctly when converting to Spark Row > - > > Key: HUDI-1225 > URL: https://issues.apache.org/jira/browse/HUDI-1225 > Project: Apache Hudi > Issue Type: Bug > Components: Spark Integration >Reporter: Balaji Varadarajan >Assignee: Balaji Varadarajan >Priority: Major > Labels: pull-request-available > Fix For: 0.6.1 > > > [https://github.com/apache/hudi/issues/2034] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HUDI-1225) Avro Date logical type not handled correctly when converting to Spark Row
[ https://issues.apache.org/jira/browse/HUDI-1225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf resolved HUDI-1225. - Resolution: Fixed > Avro Date logical type not handled correctly when converting to Spark Row > - > > Key: HUDI-1225 > URL: https://issues.apache.org/jira/browse/HUDI-1225 > Project: Apache Hudi > Issue Type: Bug > Components: Spark Integration >Reporter: Balaji Varadarajan >Assignee: Balaji Varadarajan >Priority: Major > Labels: pull-request-available > Fix For: 0.6.1 > > > [https://github.com/apache/hudi/issues/2034] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HUDI-1268) Fix UpgradeDowngrade Rename Exception in aliyun OSS
leesf created HUDI-1268: --- Summary: Fix UpgradeDowngrade Rename Exception in aliyun OSS Key: HUDI-1268 URL: https://issues.apache.org/jira/browse/HUDI-1268 Project: Apache Hudi Issue Type: Bug Components: Writer Core Reporter: leesf Fix For: 0.6.1 When using the HoodieWriteClient API to write data to Hudi with the following config: {code:java} Properties properties = new Properties(); properties.setProperty(HoodieTableConfig.HOODIE_TABLE_NAME_PROP_NAME, tableName); properties.setProperty(HoodieTableConfig.HOODIE_TABLE_TYPE_PROP_NAME, tableType.name()); properties.setProperty(HoodieTableConfig.HOODIE_PAYLOAD_CLASS_PROP_NAME, OverwriteWithLatestAvroPayload.class.getName()); properties.setProperty(HoodieTableConfig.HOODIE_ARCHIVELOG_FOLDER_PROP_NAME, "archived"); return HoodieTableMetaClient.initTableAndGetMetaClient(hadoopConf, basePath, properties); {code} a FileAlreadyExistsException is thrown in Aliyun OSS. After debugging, the following code is what throws the exception: {code:java} // Rename the .updated file to hoodie.properties. This is atomic in hdfs, but not in cloud stores. // But as long as this does not leave a partial hoodie.properties file, we are okay. fs.rename(updatedPropsFilePath, propsFilePath); {code} However, we should ignore the FileAlreadyExistsException since hoodie.properties already exists. -- This message was sent by Atlassian Jira (v8.3.4#803005)
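The proposed handling can be sketched outside of Hadoop's FileSystem API. This illustrative version uses java.nio instead (the method name is hypothetical) and treats an already-existing target as success:

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch of "rename, but tolerate an existing target". Uses java.nio
// rather than Hadoop's FileSystem; the method name is hypothetical.
final class SafeRename {
    static void renameIgnoringExisting(Path src, Path dst) throws IOException {
        try {
            Files.move(src, dst);
        } catch (FileAlreadyExistsException e) {
            // hoodie.properties is already in place, so the rename has
            // effectively happened; just clean up the leftover source.
            Files.deleteIfExists(src);
        }
    }
}
```

The design point is that on stores where rename is not atomic, "target already exists" can mean a prior attempt already promoted the file, so it is safe to swallow.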
[jira] [Resolved] (HUDI-802) AWSDmsTransformer does not handle insert -> delete of a row in a single batch correctly
[ https://issues.apache.org/jira/browse/HUDI-802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf resolved HUDI-802. Resolution: Fixed > AWSDmsTransformer does not handle insert -> delete of a row in a single batch > correctly > --- > > Key: HUDI-802 > URL: https://issues.apache.org/jira/browse/HUDI-802 > Project: Apache Hudi > Issue Type: Bug > Components: DeltaStreamer >Reporter: Christopher Weaver >Assignee: Balaji Varadarajan >Priority: Blocker > Labels: pull-request-available > Fix For: 0.6.1 > > > The provided AWSDmsAvroPayload class > ([https://github.com/apache/incubator-hudi/blob/master/hudi-spark/src/main/java/org/apache/hudi/payload/AWSDmsAvroPayload.java]) > currently handles cases where the "Op" column is a "D" for updates, and > successfully removes the row from the resulting table. > However, when an insert is quickly followed by a delete on the row (e.g. DMS > processes them together and puts the update records together in the same > parquet file), the row incorrectly appears in the resulting table. In this > case, the record is not in the table and getInsertValue is called rather than > combineAndGetUpdateValue. Since the logic to check for a delete is in > combineAndGetUpdateValue, it is skipped and the delete is missed. Something > like this could fix this issue: > [https://github.com/Weves/incubator-hudi/blob/release-0.5.1/hudi-spark/src/main/java/org/apache/hudi/payload/CustomAWSDmsAvroPayload.java]. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
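The gist of the suggested fix is to apply the same "Op" check on the insert path that already exists on the update path. A simplified sketch, using a plain Map as a stand-in for the Avro record (the class name is illustrative, not the real AWSDmsAvroPayload):

```java
import java.util.Map;
import java.util.Optional;

// Simplified stand-in: a Map plays the role of the Avro record, and the
// class name is illustrative, not the real AWSDmsAvroPayload.
final class DmsPayloadSketch {
    private final Map<String, String> record;
    DmsPayloadSketch(Map<String, String> record) { this.record = record; }

    // The fix: check the DMS "Op" column on the insert path too, so an
    // insert followed by a delete in the same batch yields no value.
    Optional<Map<String, String>> getInsertValue() {
        if ("D".equals(record.get("Op"))) {
            return Optional.empty();
        }
        return Optional.of(record);
    }
}
```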
[jira] [Closed] (HUDI-802) AWSDmsTransformer does not handle insert -> delete of a row in a single batch correctly
[ https://issues.apache.org/jira/browse/HUDI-802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf closed HUDI-802. -- > AWSDmsTransformer does not handle insert -> delete of a row in a single batch > correctly > --- > > Key: HUDI-802 > URL: https://issues.apache.org/jira/browse/HUDI-802 > Project: Apache Hudi > Issue Type: Bug > Components: DeltaStreamer >Reporter: Christopher Weaver >Assignee: Balaji Varadarajan >Priority: Blocker > Labels: pull-request-available > Fix For: 0.6.1 > > > The provided AWSDmsAvroPayload class > ([https://github.com/apache/incubator-hudi/blob/master/hudi-spark/src/main/java/org/apache/hudi/payload/AWSDmsAvroPayload.java]) > currently handles cases where the "Op" column is a "D" for updates, and > successfully removes the row from the resulting table. > However, when an insert is quickly followed by a delete on the row (e.g. DMS > processes them together and puts the update records together in the same > parquet file), the row incorrectly appears in the resulting table. In this > case, the record is not in the table and getInsertValue is called rather than > combineAndGetUpdateValue. Since the logic to check for a delete is in > combineAndGetUpdateValue, it is skipped and the delete is missed. Something > like this could fix this issue: > [https://github.com/Weves/incubator-hudi/blob/release-0.5.1/hudi-spark/src/main/java/org/apache/hudi/payload/CustomAWSDmsAvroPayload.java]. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HUDI-1181) Decimal type display issue for record key field
[ https://issues.apache.org/jira/browse/HUDI-1181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf resolved HUDI-1181. - Fix Version/s: 0.6.1 Resolution: Fixed > Decimal type display issue for record key field > --- > > Key: HUDI-1181 > URL: https://issues.apache.org/jira/browse/HUDI-1181 > Project: Apache Hudi > Issue Type: Bug >Reporter: Wenning Ding >Assignee: Wenning Ding >Priority: Major > Labels: pull-request-available > Fix For: 0.6.1 > > > When using *fixed_len_byte_array* decimal type as the Hudi record key, Hudi would > not correctly display the decimal value; instead, Hudi would display it as a > byte array. > During the Hudi writing phase, Hudi would save the parquet source data into > an Avro Generic Record. For example, the source parquet data has a column with > decimal type: > > {code:java} > optional fixed_len_byte_array(16) OBJ_ID (DECIMAL(38,0));{code} > > Then Hudi will convert it into the following avro decimal type: > {code:java} > { > "name" : "OBJ_ID", > "type" : [ { > "type" : "fixed", > "name" : "fixed", > "namespace" : "hoodie.hudi_ln.hudi_ln_record.OBJ_ID", > "size" : 16, > "logicalType" : "decimal", > "precision" : 38, > "scale" : 0 > }, "null" ] > } > {code} > This decimal field would be stored as a fixed-length bytes array. And in the > reading phase, Hudi will convert this bytes array back to a readable decimal > value through this > [converter|https://github.com/apache/hudi/blob/master/hudi-spark/src/main/scala/org/apache/hudi/AvroConversionHelper.scala#L58]. > However, the problem is, when setting the decimal type as the record key, Hudi would > read the value from the Avro Generic Record and then directly convert it into > String type (See > [here|https://github.com/apache/hudi/blob/master/hudi-spark/src/main/java/org/apache/hudi/DataSourceUtils.java#L76]). 
> As a result, what shows in the _hoodie_record_key field would be something > like: LN_LQDN_OBJ_ID:[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 25, 40, 95, -71]. So > we need to handle this special case and convert the bytes array back before > converting to String. -- This message was sent by Atlassian Jira (v8.3.4#803005)
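The conversion the description calls for can be sketched in a few lines: interpret the fixed-width two's-complement bytes as an unscaled integer, apply the Avro scale, and only then stringify (the class and method names here are hypothetical):

```java
import java.math.BigDecimal;
import java.math.BigInteger;

// Sketch of the conversion the issue asks for: interpret the fixed-width
// two's-complement bytes as an unscaled integer and apply the Avro scale,
// instead of printing the raw byte array. Names are hypothetical.
final class DecimalKeySketch {
    static String toKeyString(byte[] fixedBytes, int scale) {
        BigDecimal value = new BigDecimal(new BigInteger(fixedBytes), scale);
        return value.toPlainString();
    }
}
```

For example, the bytes {0x04, 0xD2} with scale 2 decode to 12.34 rather than the raw array.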
[jira] [Closed] (HUDI-1181) Decimal type display issue for record key field
[ https://issues.apache.org/jira/browse/HUDI-1181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf closed HUDI-1181. --- > Decimal type display issue for record key field > --- > > Key: HUDI-1181 > URL: https://issues.apache.org/jira/browse/HUDI-1181 > Project: Apache Hudi > Issue Type: Bug >Reporter: Wenning Ding >Assignee: Wenning Ding >Priority: Major > Labels: pull-request-available > Fix For: 0.6.1 > > > When using *fixed_len_byte_array* decimal type as Hudi record key, Hudi would > not correctly display the decimal value, instead, Hudi would display it as a > byte array. > During the Hudi writing phase, Hudi would save the parquet source data into > Avro Generic Record. For example, the source parquet data has a column with > decimal type: > > {code:java} > optional fixed_len_byte_array(16) OBJ_ID (DECIMAL(38,0));{code} > > Then Hudi will convert it into the following avro decimal type: > {code:java} > { > "name" : "OBJ_ID", > "type" : [ { > "type" : "fixed", > "name" : "fixed", > "namespace" : "hoodie.hudi_ln.hudi_ln_record.OBJ_ID", > "size" : 16, > "logicalType" : "decimal", > "precision" : 38, > "scale" : 0 > }, "null" ] > } > {code} > This decimal field would be stored as a fixed length bytes array. And in the > reading phase, Hudi will convert this bytes array back to a readable decimal > value through this > [converter|https://github.com/apache/hudi/blob/master/hudi-spark/src/main/scala/org/apache/hudi/AvroConversionHelper.scala#L58]. > However, the problem is, when setting decimal type as record keys, Hudi would > read the value from Avro Generic Record and then directly convert it into > String type(See > [here|https://github.com/apache/hudi/blob/master/hudi-spark/src/main/java/org/apache/hudi/DataSourceUtils.java#L76]). 
> As a result, what shows in the _hoodie_record_key field would be something > like: LN_LQDN_OBJ_ID:[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 25, 40, 95, -71]. So > we need to handle this special case and convert the bytes array back before > converting to String. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-1181) Decimal type display issue for record key field
[ https://issues.apache.org/jira/browse/HUDI-1181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf updated HUDI-1181: Status: Open (was: New) > Decimal type display issue for record key field > --- > > Key: HUDI-1181 > URL: https://issues.apache.org/jira/browse/HUDI-1181 > Project: Apache Hudi > Issue Type: Bug >Reporter: Wenning Ding >Assignee: Wenning Ding >Priority: Major > Labels: pull-request-available > > When using *fixed_len_byte_array* decimal type as Hudi record key, Hudi would > not correctly display the decimal value, instead, Hudi would display it as a > byte array. > During the Hudi writing phase, Hudi would save the parquet source data into > Avro Generic Record. For example, the source parquet data has a column with > decimal type: > > {code:java} > optional fixed_len_byte_array(16) OBJ_ID (DECIMAL(38,0));{code} > > Then Hudi will convert it into the following avro decimal type: > {code:java} > { > "name" : "OBJ_ID", > "type" : [ { > "type" : "fixed", > "name" : "fixed", > "namespace" : "hoodie.hudi_ln.hudi_ln_record.OBJ_ID", > "size" : 16, > "logicalType" : "decimal", > "precision" : 38, > "scale" : 0 > }, "null" ] > } > {code} > This decimal field would be stored as a fixed length bytes array. And in the > reading phase, Hudi will convert this bytes array back to a readable decimal > value through this > [converter|https://github.com/apache/hudi/blob/master/hudi-spark/src/main/scala/org/apache/hudi/AvroConversionHelper.scala#L58]. > However, the problem is, when setting decimal type as record keys, Hudi would > read the value from Avro Generic Record and then directly convert it into > String type(See > [here|https://github.com/apache/hudi/blob/master/hudi-spark/src/main/java/org/apache/hudi/DataSourceUtils.java#L76]). 
> As a result, what shows in the _hoodie_record_key field would be something > like: LN_LQDN_OBJ_ID:[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 25, 40, 95, -71]. So > we need to handle this special case and convert the bytes array back before > converting to String. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-1130) Allow for schema evolution within DAG for hudi test suite
[ https://issues.apache.org/jira/browse/HUDI-1130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf updated HUDI-1130: Status: Open (was: New) > Allow for schema evolution within DAG for hudi test suite > - > > Key: HUDI-1130 > URL: https://issues.apache.org/jira/browse/HUDI-1130 > Project: Apache Hudi > Issue Type: Improvement > Components: Testing >Reporter: Nishith Agarwal >Assignee: Nishith Agarwal >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (HUDI-1254) TypedProperties can not get values by initializing an existing properties
[ https://issues.apache.org/jira/browse/HUDI-1254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf closed HUDI-1254. --- > TypedProperties can not get values by initializing an existing properties > - > > Key: HUDI-1254 > URL: https://issues.apache.org/jira/browse/HUDI-1254 > Project: Apache Hudi > Issue Type: Bug > Components: Common Core >Reporter: cdmikechen >Assignee: linshan-ma >Priority: Major > Labels: pull-request-available > Fix For: 0.6.1 > > > If I create a test that constructs a TypedProperties from an existing Properties like > below: > {code:java} > public class TestTypedProperties { > @Test > public void testNewTypedProperties() { > Properties properties = new Properties(); > properties.put("test_key1", "test_value1"); > TypedProperties typedProperties = new TypedProperties(properties); > assertEquals("test_value1", typedProperties.getString("test_key1")); > } > } > {code} > the test does not pass and fails with this error: *java.lang.IllegalArgumentException: > Property test_key1 not found* > I think this is a bug and needs to be fixed. -- This message was sent by Atlassian Jira (v8.3.4#803005)
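A likely root cause (stated here as an assumption, not from the issue itself) is that `new Properties(other)` only installs `other` as a *defaults* fallback consulted by `getProperty()`, so Hashtable-level lookups such as `containsKey()` miss the entries. A sketch of a copying constructor that avoids this, with a simplified stand-in for Hudi's class:

```java
import java.util.Properties;

// Sketch only -- not Hudi's actual TypedProperties. Copying the entries
// (rather than passing them as java.util.Properties defaults) makes
// direct Hashtable lookups like containsKey() see them.
class TypedPropsSketch extends Properties {
    TypedPropsSketch(Properties source) {
        if (source != null) {
            for (String key : source.stringPropertyNames()) {
                setProperty(key, source.getProperty(key));
            }
        }
    }

    String getString(String key) {
        if (!containsKey(key)) {
            throw new IllegalArgumentException("Property " + key + " not found");
        }
        return getProperty(key);
    }
}
```

With the entries copied in, the failing test from the report passes.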
[jira] [Resolved] (HUDI-1254) TypedProperties can not get values by initializing an existing properties
[ https://issues.apache.org/jira/browse/HUDI-1254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf resolved HUDI-1254. - Resolution: Fixed > TypedProperties can not get values by initializing an existing properties > - > > Key: HUDI-1254 > URL: https://issues.apache.org/jira/browse/HUDI-1254 > Project: Apache Hudi > Issue Type: Bug > Components: Common Core >Reporter: cdmikechen >Assignee: linshan-ma >Priority: Major > Labels: pull-request-available > Fix For: 0.6.1 > > > If I create a test that constructs a TypedProperties from an existing Properties like > below: > {code:java} > public class TestTypedProperties { > @Test > public void testNewTypedProperties() { > Properties properties = new Properties(); > properties.put("test_key1", "test_value1"); > TypedProperties typedProperties = new TypedProperties(properties); > assertEquals("test_value1", typedProperties.getString("test_key1")); > } > } > {code} > the test does not pass and fails with this error: *java.lang.IllegalArgumentException: > Property test_key1 not found* > I think this is a bug and needs to be fixed. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HUDI-1130) Allow for schema evolution within DAG for hudi test suite
[ https://issues.apache.org/jira/browse/HUDI-1130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf resolved HUDI-1130. - Fix Version/s: 0.6.1 Resolution: Fixed > Allow for schema evolution within DAG for hudi test suite > - > > Key: HUDI-1130 > URL: https://issues.apache.org/jira/browse/HUDI-1130 > Project: Apache Hudi > Issue Type: Improvement > Components: Testing >Reporter: Nishith Agarwal >Assignee: Nishith Agarwal >Priority: Major > Labels: pull-request-available > Fix For: 0.6.1 > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-1254) TypedProperties can not get values by initializing an existing properties
[ https://issues.apache.org/jira/browse/HUDI-1254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf updated HUDI-1254: Status: Open (was: New) > TypedProperties can not get values by initializing an existing properties > - > > Key: HUDI-1254 > URL: https://issues.apache.org/jira/browse/HUDI-1254 > Project: Apache Hudi > Issue Type: Bug > Components: Common Core >Reporter: cdmikechen >Assignee: linshan-ma >Priority: Major > Labels: pull-request-available > Fix For: 0.6.1 > > > If I create a test that constructs a TypedProperties from an existing Properties like > below: > {code:java} > public class TestTypedProperties { > @Test > public void testNewTypedProperties() { > Properties properties = new Properties(); > properties.put("test_key1", "test_value1"); > TypedProperties typedProperties = new TypedProperties(properties); > assertEquals("test_value1", typedProperties.getString("test_key1")); > } > } > {code} > the test does not pass and fails with this error: *java.lang.IllegalArgumentException: > Property test_key1 not found* > I think this is a bug and needs to be fixed. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (HUDI-1130) Allow for schema evolution within DAG for hudi test suite
[ https://issues.apache.org/jira/browse/HUDI-1130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf closed HUDI-1130. --- > Allow for schema evolution within DAG for hudi test suite > - > > Key: HUDI-1130 > URL: https://issues.apache.org/jira/browse/HUDI-1130 > Project: Apache Hudi > Issue Type: Improvement > Components: Testing >Reporter: Nishith Agarwal >Assignee: Nishith Agarwal >Priority: Major > Labels: pull-request-available > Fix For: 0.6.1 > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (HUDI-1255) Combine and get updateValue in multiFields
[ https://issues.apache.org/jira/browse/HUDI-1255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf closed HUDI-1255. --- > Combine and get updateValue in multiFields > -- > > Key: HUDI-1255 > URL: https://issues.apache.org/jira/browse/HUDI-1255 > Project: Apache Hudi > Issue Type: Improvement > Components: Writer Core >Reporter: karl wang >Priority: Major > Labels: pull-request-available > Fix For: 0.6.1 > > > Update the current value for only the fields you want to change. > The default payload OverwriteWithLatestAvroPayload overwrites the whole record > when > comparing to orderingVal. This doesn't meet our need when we just want to change > specified fields. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HUDI-1227) Document the usage of CLI
leesf created HUDI-1227: --- Summary: Document the usage of CLI Key: HUDI-1227 URL: https://issues.apache.org/jira/browse/HUDI-1227 Project: Apache Hudi Issue Type: Bug Components: CLI Reporter: leesf -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-1227) Document the usage of CLI
[ https://issues.apache.org/jira/browse/HUDI-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf updated HUDI-1227: Issue Type: Improvement (was: Bug) > Document the usage of CLI > - > > Key: HUDI-1227 > URL: https://issues.apache.org/jira/browse/HUDI-1227 > Project: Apache Hudi > Issue Type: Improvement > Components: CLI >Reporter: leesf >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HUDI-1234) Insert new records regardless of small file when using insert operation
leesf created HUDI-1234: --- Summary: Insert new records regardless of small file when using insert operation Key: HUDI-1234 URL: https://issues.apache.org/jira/browse/HUDI-1234 Project: Apache Hudi Issue Type: Bug Components: Writer Core Reporter: leesf context here [https://github.com/apache/hudi/issues/2051] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HUDI-1231) Duplicate record while querying from hive synced table
[ https://issues.apache.org/jira/browse/HUDI-1231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17186874#comment-17186874 ] leesf commented on HUDI-1231: - [~vbalaji] would you please take a look > Duplicate record while querying from hive synced table > -- > > Key: HUDI-1231 > URL: https://issues.apache.org/jira/browse/HUDI-1231 > Project: Apache Hudi > Issue Type: Bug >Reporter: Ashok Kumar >Priority: Major > > I am writing in upsert mode with the precombine flag enabled. Still, when I query, > I see the same record 3 times in the same parquet file > > spark.sql("select > _hoodie_commit_time,_hoodie_commit_seqno,_hoodie_record_key,_hoodie_partition_path,_hoodie_file_name > from hudi5_mor_ro where id1=1086187 and timestamp=1598461500 and > _hoodie_record_key='timestamp:1598461500,id1:1086187,id2:1872725,flowId:23'").show(10,false) > > +--+ > |_hoodie_commit_time|_hoodie_commit_seqno|_hoodie_record_key|_hoodie_partition_path|_hoodie_file_name| > +--+ > |20200826171813|20200826171813_13856_855766|timestamp:1598461500,id1:1086187,id2:1872725,flowId:23|1086187/2020082617|5ecb020f-29be-4eed-b130-8c02ae819603-0_13856-104-296775_20200826171813.parquet| > |20200826171813|20200826171813_13856_855766|timestamp:1598461500,id1:1086187,id2:1872725,flowId:23|1086187/2020082617|5ecb020f-29be-4eed-b130-8c02ae819603-0_13856-104-296775_20200826171813.parquet| > |20200826171813|20200826171813_13856_855766|timestamp:1598461500,id1:1086187,id2:1872725,flowId:23|1086187/2020082617|5ecb020f-29be-4eed-b130-8c02ae819603-0_13856-104-296775_20200826171813.parquet| > +--+ > > I am getting this issue with both kinds of tables, i.e. COW and MOR. > I have tried version 0.6.3, but I had also tried 0.5.3 and this bug > occurred there as well. > This issue does not occur with a small data set. 
> > The strange thing is that when I query the parquet file directly, it gives only one record (i.e. > correct) > df.filter(col("_hoodie_record_key")==="timestamp:1598461500,id1:1086187,id2:1872725,flowId:23").count > res13: Long = 1 > > Note: > When I query the filesystem, it's fine. > I see this issue when I query from the Hive-synced table. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-1025) Meter RPC calls in HoodieWrapperFileSystem
[ https://issues.apache.org/jira/browse/HUDI-1025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf updated HUDI-1025: Status: Open (was: New) > Meter RPC calls in HoodieWrapperFileSystem > -- > > Key: HUDI-1025 > URL: https://issues.apache.org/jira/browse/HUDI-1025 > Project: Apache Hudi > Issue Type: Improvement > Components: Common Core >Reporter: Abhishek Modi >Assignee: Abhishek Modi >Priority: Minor > Labels: pull-request-available > Fix For: 0.6.1 > > > Hudi issues a very large number of RPC calls to DFS. When making changes to > Hudi, we try to ensure that the number of RPC calls does not increase > appreciably, as this could impact the DFS. > We should therefore meter HoodieWrapperFileSystem so that we can track the > RPC calls. This will help in service observability / SLA tracking and will > make it easier to tell when a change results in increased RPC load. -- This message was sent by Atlassian Jira (v8.3.4#803005)
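A minimal sketch of the kind of metering proposed: count calls per filesystem operation so RPC volume can be compared across changes. The class and method names are illustrative, not Hudi's actual metrics registry API:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Illustrative counter, not Hudi's metrics registry: each wrapped
// FileSystem call would mark() its operation name before delegating,
// making per-operation RPC volume visible.
final class CallMeter {
    private final Map<String, LongAdder> counts = new ConcurrentHashMap<>();

    void mark(String op) {
        counts.computeIfAbsent(op, k -> new LongAdder()).increment();
    }

    long count(String op) {
        LongAdder adder = counts.get(op);
        return adder == null ? 0L : adder.sum();
    }
}
```

A wrapper filesystem would call `mark("listStatus")`, `mark("getFileStatus")`, etc., before delegating to the underlying filesystem, and periodically publish the counts to the metrics reporter.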
[jira] [Closed] (HUDI-1025) Meter RPC calls in HoodieWrapperFileSystem
[ https://issues.apache.org/jira/browse/HUDI-1025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf closed HUDI-1025. --- > Meter RPC calls in HoodieWrapperFileSystem > -- > > Key: HUDI-1025 > URL: https://issues.apache.org/jira/browse/HUDI-1025 > Project: Apache Hudi > Issue Type: Improvement > Components: Common Core >Reporter: Abhishek Modi >Assignee: Abhishek Modi >Priority: Minor > Labels: pull-request-available > Fix For: 0.6.1 > > > Hudi issues a very large number of RPC calls to DFS. When making changes to > Hudi, we try to ensure that the number of RPC calls does not increase > appreciably, as this could impact the DFS. > We should therefore meter HoodieWrapperFileSystem so that we can track the > RPC calls. This will help in service observability / SLA tracking and will > make it easier to tell when a change results in increased RPC load. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HUDI-1025) Meter RPC calls in HoodieWrapperFileSystem
[ https://issues.apache.org/jira/browse/HUDI-1025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf resolved HUDI-1025. - Resolution: Fixed > Meter RPC calls in HoodieWrapperFileSystem > -- > > Key: HUDI-1025 > URL: https://issues.apache.org/jira/browse/HUDI-1025 > Project: Apache Hudi > Issue Type: Improvement > Components: Common Core >Reporter: Abhishek Modi >Assignee: Abhishek Modi >Priority: Minor > Labels: pull-request-available > Fix For: 0.6.1 > > > Hudi issues a very large number of RPC calls to DFS. When making changes to > Hudi, we try to ensure that the number of RPC calls does not increase > appreciably, as this could impact the DFS. > We should therefore meter HoodieWrapperFileSystem so that we can track the > RPC calls. This will help in service observability / SLA tracking and will > make it easier to tell when a change results in increased RPC load. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-1025) Meter RPC calls in HoodieWrapperFileSystem
[ https://issues.apache.org/jira/browse/HUDI-1025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf updated HUDI-1025: Fix Version/s: 0.6.1 > Meter RPC calls in HoodieWrapperFileSystem > -- > > Key: HUDI-1025 > URL: https://issues.apache.org/jira/browse/HUDI-1025 > Project: Apache Hudi > Issue Type: Improvement > Components: Common Core >Reporter: Abhishek Modi >Assignee: Abhishek Modi >Priority: Minor > Labels: pull-request-available > Fix For: 0.6.1 > > > Hudi issues a very large number of RPC calls to DFS. When making changes to > Hudi, we try to ensure that the number of RPC calls does not increase > appreciably, as this could impact the DFS. > We should therefore meter HoodieWrapperFileSystem so that we can track the > RPC calls. This will help in service observability / SLA tracking and will > make it easier to tell when a change results in increased RPC load. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-1083) Minor optimization in Determining insert bucket location for a given key
[ https://issues.apache.org/jira/browse/HUDI-1083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf updated HUDI-1083: Status: Open (was: New) > Minor optimization in Determining insert bucket location for a given key > > > Key: HUDI-1083 > URL: https://issues.apache.org/jira/browse/HUDI-1083 > Project: Apache Hudi > Issue Type: Improvement > Components: Writer Core >Reporter: sivabalan narayanan >Assignee: shenh062326 >Priority: Major > Labels: pull-request-available > > As of now, this is how the bucket for a given key is determined. > In every partition, we find all insert buckets and assign weights. > For example, weights 0.2, 0.3, 0.5 for a given partition with 100 records to be inserted > mean 20 will go into B0, 30 will go into B1 and 50 will go into B2. > Within getPartition(Object key), we linearly walk through the bucket weights > and find the right bucket for a given key. For instance, if mod(hash value) > is 90/100 = 0.9, we keep adding the bucket weights until the value exceeds > 0.9. > Instead, we could calculate cumulative weights upfront and do a binary search > within getPartition(): > so, 0.2, 0.5, 1. > Then, given mod(hash value), we could do a binary search to find the right bucket, > which would cut the cost from O(N) to O(log N). > -- This message was sent by Atlassian Jira (v8.3.4#803005)
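The optimization described above, cumulative weights plus binary search, can be sketched as follows (class and method names are illustrative, not Hudi's partitioner API):

```java
import java.util.Arrays;

// Sketch of the proposal: precompute cumulative bucket weights once,
// then find the bucket for hash(key)'s fraction with binary search --
// O(log N) instead of the linear O(N) scan.
final class BucketIndexSketch {
    private final double[] cumulative;

    BucketIndexSketch(double[] weights) {
        cumulative = new double[weights.length];
        double running = 0.0;
        for (int i = 0; i < weights.length; i++) {
            running += weights[i];
            cumulative[i] = running; // e.g. {0.2, 0.3, 0.5} -> {0.2, 0.5, 1.0}
        }
    }

    int bucketFor(double fraction) {
        int pos = Arrays.binarySearch(cumulative, fraction);
        // On a miss, binarySearch encodes the insertion point as -(ip) - 1.
        return pos >= 0 ? pos : -pos - 1;
    }
}
```

With weights {0.2, 0.3, 0.5}, a key whose hash fraction is 0.9 lands in B2, matching the linear walk's answer while touching only log N entries.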
[jira] [Closed] (HUDI-1083) Minor optimization in Determining insert bucket location for a given key
[ https://issues.apache.org/jira/browse/HUDI-1083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf closed HUDI-1083. --- > Minor optimization in Determining insert bucket location for a given key > > > Key: HUDI-1083 > URL: https://issues.apache.org/jira/browse/HUDI-1083 > Project: Apache Hudi > Issue Type: Improvement > Components: Writer Core >Reporter: sivabalan narayanan >Assignee: shenh062326 >Priority: Major > Labels: pull-request-available > Fix For: 0.6.1 > > > As of now, this is how the bucket for a given key is determined. > In every partition, we find all insert buckets and assign weights. > For example, weights 0.2, 0.3, 0.5 for a given partition with 100 records to be inserted > mean 20 will go into B0, 30 will go into B1 and 50 will go into B2. > Within getPartition(Object key), we linearly walk through the bucket weights > and find the right bucket for a given key. For instance, if mod(hash value) > is 90/100 = 0.9, we keep adding the bucket weights until the value exceeds > 0.9. > Instead, we could calculate cumulative weights upfront and do a binary search > within getPartition(): > so, 0.2, 0.5, 1. > Then, given mod(hash value), we could do a binary search to find the right bucket, > which would cut the cost from O(N) to O(log N). > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HUDI-1083) Minor optimization in Determining insert bucket location for a given key
[ https://issues.apache.org/jira/browse/HUDI-1083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf resolved HUDI-1083. - Resolution: Fixed > Minor optimization in Determining insert bucket location for a given key > > > Key: HUDI-1083 > URL: https://issues.apache.org/jira/browse/HUDI-1083 > Project: Apache Hudi > Issue Type: Improvement > Components: Writer Core >Reporter: sivabalan narayanan >Assignee: shenh062326 >Priority: Major > Labels: pull-request-available > Fix For: 0.6.1 > > > As of now, this is how the bucket for a given key is determined. > In every partition, we find all insert buckets and assign weights. > For example, weights 0.2, 0.3, 0.5 for a given partition with 100 records to be inserted > mean 20 will go into B0, 30 will go into B1 and 50 will go into B2. > Within getPartition(Object key), we linearly walk through the bucket weights > and find the right bucket for a given key. For instance, if mod(hash value) > is 90/100 = 0.9, we keep adding the bucket weights until the value exceeds > 0.9. > Instead, we could calculate cumulative weights upfront and do a binary search > within getPartition(): > so, 0.2, 0.5, 1. > Then, given mod(hash value), we could do a binary search to find the right bucket, > which would cut the cost from O(N) to O(log N). > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-1083) Minor optimization in Determining insert bucket location for a given key
[ https://issues.apache.org/jira/browse/HUDI-1083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf updated HUDI-1083: Fix Version/s: 0.6.1 > Minor optimization in Determining insert bucket location for a given key > > > Key: HUDI-1083 > URL: https://issues.apache.org/jira/browse/HUDI-1083 > Project: Apache Hudi > Issue Type: Improvement > Components: Writer Core >Reporter: sivabalan narayanan >Assignee: shenh062326 >Priority: Major > Labels: pull-request-available > Fix For: 0.6.1 > > > As of now, this is how bucket for a given key is determined. > In every partition, we find all insert buckets and assign weights. > for eg: 0.2, 0.3, 0.5 for a given partition with 100 records to be inserted > means, 20 will go into B0, 30 will go into B1 and 50 will go into B2. > within getPartition(Object key), we linearly walk through the bucket weights > and find the right bucket for a given key. for instance if mod (hash value) > is 90/100 = 0.9, we keep adding the bucket weights until the value exceeds > 0.9. > Instead we could calculate cumulative weights upfront and do a binary search > within getPartition() > so, 0.2, 0.5, 1 > so with mod(hash value), we could do binary search and find the right bucket > and would cut cost from O(N) to log N. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
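The optimization described in HUDI-1083 above can be sketched as follows. This is an illustrative standalone class, not Hudi's actual `UpsertPartitioner`: it precomputes cumulative bucket weights once (e.g. {0.2, 0.3, 0.5} becomes {0.2, 0.5, 1.0}) and then binary-searches per key, cutting the per-key cost from O(N) to O(log N).

```java
import java.util.Arrays;

// Illustrative sketch (class and method names are hypothetical, not Hudi's):
// assign a key to an insert bucket via binary search over cumulative weights.
public class BucketSearchSketch {

  private final double[] cumulativeWeights;

  public BucketSearchSketch(double[] bucketWeights) {
    // Precompute the running sum once, e.g. {0.2, 0.3, 0.5} -> {0.2, 0.5, 1.0}.
    cumulativeWeights = new double[bucketWeights.length];
    double sum = 0.0;
    for (int i = 0; i < bucketWeights.length; i++) {
      sum += bucketWeights[i];
      cumulativeWeights[i] = sum;
    }
  }

  // hashFraction is mod(hash value) / total records, in [0, 1) --
  // e.g. 90/100 = 0.9 in the example from the issue description.
  public int getBucket(double hashFraction) {
    int idx = Arrays.binarySearch(cumulativeWeights, hashFraction);
    // binarySearch returns -(insertionPoint) - 1 when the value is not an
    // exact match; the insertion point is the first cumulative weight that
    // exceeds hashFraction, which is exactly the bucket we want.
    return idx >= 0 ? idx : -idx - 1;
  }
}
```

With weights {0.2, 0.3, 0.5}, a hash fraction of 0.9 lands in bucket 2, matching the linear walk in the description.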
[jira] [Closed] (HUDI-1197) Fix build issue in scala 2.12
[ https://issues.apache.org/jira/browse/HUDI-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf closed HUDI-1197. --- > Fix build issue in scala 2.12 > - > > Key: HUDI-1197 > URL: https://issues.apache.org/jira/browse/HUDI-1197 > Project: Apache Hudi > Issue Type: Bug > Components: Writer Core >Reporter: Bhavani Sudha >Assignee: Bhavani Sudha >Priority: Major > Labels: pull-request-available > Fix For: 0.6.0 > > > During release process ran into build issues with scala 2.12 in > HoodieWriterUtils. The error msg looks like below: > [ERROR] > /...hudi/hudi-spark/src/main/scala/org/apache/hudi/HoodieWriterUtils.scala:32: > error: reference to mapAsJavaMap is ambiguous; > [ERROR] it is imported twice in the same scope by > [ERROR] import scala.collection.JavaConverters._ > [ERROR] and import scala.collection.JavaConversions._ > [ERROR] mapAsJavaMap(parametersWithWriteDefaults(parameters.asScala.toMap)) > [ERROR] ^ > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (HUDI-1188) MOR hbase index tables not deduplicating records
[ https://issues.apache.org/jira/browse/HUDI-1188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf closed HUDI-1188. --- > MOR hbase index tables not deduplicating records > > > Key: HUDI-1188 > URL: https://issues.apache.org/jira/browse/HUDI-1188 > Project: Apache Hudi > Issue Type: Bug >Reporter: Ryan Pifer >Assignee: Ryan Pifer >Priority: Major > Labels: pull-request-available > Fix For: 0.6.1 > > > After fetching hbase index for a record, Hudi performs a validation that the > commit timestamp stored in hbase for that record is a commit on the timeline. > This makes any record that is stored to hbase index during a deltacommit > (upsert on MOR table) considered an invalid commit and treated as a new > record. This causes the hbase index to be updated every time which leads to > records being able to be in multiple partitions and even in different file > groups within same partition. -- This message was sent by Atlassian Jira (v8.3.4#803005)
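The fix direction for HUDI-1188 above can be sketched generically. This is an illustrative class (not Hudi's actual HBase index code): when validating the instant time stored in the index for a record, the set of valid instants must include deltacommits as well as commits, otherwise every MOR upsert looks like an unknown commit and the record is treated as new.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: validate an index entry against ALL completed
// instants (commits AND deltacommits), not commit actions alone.
public class IndexEntryValidator {

  private final Set<String> completedInstantTimes;

  public IndexEntryValidator(Set<String> completedInstantTimes) {
    // Caller populates this from the full timeline, including deltacommits.
    this.completedInstantTimes = new HashSet<>(completedInstantTimes);
  }

  // An indexed location is only trustworthy if its instant time belongs to
  // a completed instant on the timeline.
  public boolean isValidEntry(String indexedInstantTime) {
    return completedInstantTimes.contains(indexedInstantTime);
  }
}
```

If the set is built from commits only, `isValidEntry` returns false for every deltacommit-written record, which is the duplication behavior the issue describes.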
[jira] [Closed] (HUDI-1177) fix TimestampBasedKeyGenerator Task not serializableException
[ https://issues.apache.org/jira/browse/HUDI-1177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf closed HUDI-1177. --- > fix TimestampBasedKeyGenerator Task not serializableException > -- > > Key: HUDI-1177 > URL: https://issues.apache.org/jira/browse/HUDI-1177 > Project: Apache Hudi > Issue Type: Bug >Affects Versions: 0.6.0 >Reporter: liujinhui >Assignee: Pratyaksh Sharma >Priority: Blocker > Labels: pull-request-available > Fix For: 0.6.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HUDI-1188) MOR hbase index tables not deduplicating records
[ https://issues.apache.org/jira/browse/HUDI-1188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf resolved HUDI-1188. - Fix Version/s: 0.6.1 Resolution: Fixed > MOR hbase index tables not deduplicating records > > > Key: HUDI-1188 > URL: https://issues.apache.org/jira/browse/HUDI-1188 > Project: Apache Hudi > Issue Type: Bug >Reporter: Ryan Pifer >Assignee: Ryan Pifer >Priority: Major > Labels: pull-request-available > Fix For: 0.6.1 > > > After fetching hbase index for a record, Hudi performs a validation that the > commit timestamp stored in hbase for that record is a commit on the timeline. > This makes any record that is stored to hbase index during a deltacommit > (upsert on MOR table) considered an invalid commit and treated as a new > record. This causes the hbase index to be updated every time which leads to > records being able to be in multiple partitions and even in different file > groups within same partition. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-1188) MOR hbase index tables not deduplicating records
[ https://issues.apache.org/jira/browse/HUDI-1188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf updated HUDI-1188: Status: Open (was: New) > MOR hbase index tables not deduplicating records > > > Key: HUDI-1188 > URL: https://issues.apache.org/jira/browse/HUDI-1188 > Project: Apache Hudi > Issue Type: Bug >Reporter: Ryan Pifer >Assignee: Ryan Pifer >Priority: Major > Labels: pull-request-available > > After fetching hbase index for a record, Hudi performs a validation that the > commit timestamp stored in hbase for that record is a commit on the timeline. > This makes any record that is stored to hbase index during a deltacommit > (upsert on MOR table) considered an invalid commit and treated as a new > record. This causes the hbase index to be updated every time which leads to > records being able to be in multiple partitions and even in different file > groups within same partition. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HUDI-1161) Support update partial fields for MoR table
[ https://issues.apache.org/jira/browse/HUDI-1161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf reassigned HUDI-1161: --- Assignee: Nicholas Jiang (was: leesf) > Support update partial fields for MoR table > --- > > Key: HUDI-1161 > URL: https://issues.apache.org/jira/browse/HUDI-1161 > Project: Apache Hudi > Issue Type: Sub-task > Components: Writer Core >Reporter: leesf >Assignee: Nicholas Jiang >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HUDI-972) Update hudi logo
[ https://issues.apache.org/jira/browse/HUDI-972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17118297#comment-17118297 ] leesf commented on HUDI-972: hi [~shivnarayan] the logo has been updated. you need refresh the website. > Update hudi logo > > > Key: HUDI-972 > URL: https://issues.apache.org/jira/browse/HUDI-972 > Project: Apache Hudi > Issue Type: Sub-task >Reporter: sivabalan narayanan >Assignee: leesf >Priority: Major > Attachments: Screen Shot 2020-05-28 at 12.10.12 AM.png > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HUDI-974) Fields out of order in MOR mode when using Hive
leesf created HUDI-974: -- Summary: Fields out of order in MOR mode when using Hive Key: HUDI-974 URL: https://issues.apache.org/jira/browse/HUDI-974 Project: Apache Hudi Issue Type: Bug Components: Hive Integration Reporter: leesf Assignee: liwei Fix For: 0.6.0 Attachments: image-2020-05-28-21-06-02-396.png, image-2020-05-28-21-07-30-803.png When querying MOR hudi dataset via hive hive table: CREATE EXTERNAL TABLE `unknown_rt`( `_hoodie_commit_time` string, `_hoodie_commit_seqno` string, `_hoodie_record_key` string, `_hoodie_partition_path` string, `_hoodie_file_name` string, `age` bigint, `name` string, `sex` string, `ts` bigint) PARTITIONED BY ( `location` string) ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' STORED AS INPUTFORMAT 'org.apache.hudi.hadoop.realtime.HoodieParquetRealtimeInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' LOCATION 'file:/Users/sflee/personal/backup_demo' TBLPROPERTIES ( 'last_commit_time_sync'='20200528153331', 'transient_lastDdlTime'='1590650733') sql: set hoodie.realtime.merge.skip = true; select sex, name, age from unknown_rt; result: !image-2020-05-28-21-06-02-396.png! the fields is out of order when setting hoodie.realtime.merge.skip = true; sql: set hoodie.realtime.merge.skip = false; select sex, name, age from unknown_rt !image-2020-05-28-21-07-30-803.png! query result is ok when setting hoodie.realtime.merge.skip = false; after debugging, I found that hudi use getWriterSchema in RealtimeUnmergedRecordReader instead of getHiveSchema, we need fix it. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-974) Fields out of order in MOR mode when using Hive
[ https://issues.apache.org/jira/browse/HUDI-974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf updated HUDI-974: --- Description: When querying MOR hudi dataset via hive hive table: CREATE EXTERNAL TABLE `unknown_rt`( `_hoodie_commit_time` string, `_hoodie_commit_seqno` string, `_hoodie_record_key` string, `_hoodie_partition_path` string, `_hoodie_file_name` string, `age` bigint, `name` string, `sex` string, `ts` bigint) PARTITIONED BY ( `location` string) ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' STORED AS INPUTFORMAT 'org.apache.hudi.hadoop.realtime.HoodieParquetRealtimeInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' LOCATION 'file:/Users/sflee/personal/backup_demo' TBLPROPERTIES ( 'last_commit_time_sync'='20200528153331', 'transient_lastDdlTime'='1590650733') sql: set hoodie.realtime.merge.skip = true; select sex, name, age from unknown_rt; result: !image-2020-05-28-21-06-02-396.png! the fields is out of order when setting hoodie.realtime.merge.skip = true; sql: set hoodie.realtime.merge.skip = false; select sex, name, age from unknown_rt !image-2020-05-28-21-07-30-803.png! query result is ok when setting hoodie.realtime.merge.skip = false; after debugging, I found that hudi use getWriterSchema in RealtimeUnmergedRecordReader instead of getHiveSchema, we need fix it. 
cc [~vbalaji] was: When querying MOR hudi dataset via hive hive table: CREATE EXTERNAL TABLE `unknown_rt`( `_hoodie_commit_time` string, `_hoodie_commit_seqno` string, `_hoodie_record_key` string, `_hoodie_partition_path` string, `_hoodie_file_name` string, `age` bigint, `name` string, `sex` string, `ts` bigint) PARTITIONED BY ( `location` string) ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' STORED AS INPUTFORMAT 'org.apache.hudi.hadoop.realtime.HoodieParquetRealtimeInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' LOCATION 'file:/Users/sflee/personal/backup_demo' TBLPROPERTIES ( 'last_commit_time_sync'='20200528153331', 'transient_lastDdlTime'='1590650733') sql: set hoodie.realtime.merge.skip = true; select sex, name, age from unknown_rt; result: !image-2020-05-28-21-06-02-396.png! the fields is out of order when setting hoodie.realtime.merge.skip = true; sql: set hoodie.realtime.merge.skip = false; select sex, name, age from unknown_rt !image-2020-05-28-21-07-30-803.png! query result is ok when setting hoodie.realtime.merge.skip = false; after debugging, I found that hudi use getWriterSchema in RealtimeUnmergedRecordReader instead of getHiveSchema, we need fix it. 
> Fields out of order in MOR mode when using Hive > --- > > Key: HUDI-974 > URL: https://issues.apache.org/jira/browse/HUDI-974 > Project: Apache Hudi > Issue Type: Bug > Components: Hive Integration >Reporter: leesf >Assignee: liwei >Priority: Major > Fix For: 0.6.0 > > Attachments: image-2020-05-28-21-06-02-396.png, > image-2020-05-28-21-07-30-803.png > > > When querying MOR hudi dataset via hive > hive table: > CREATE EXTERNAL TABLE `unknown_rt`( > `_hoodie_commit_time` string, > `_hoodie_commit_seqno` string, > `_hoodie_record_key` string, > `_hoodie_partition_path` string, > `_hoodie_file_name` string, > `age` bigint, > `name` string, > `sex` string, > `ts` bigint) > PARTITIONED BY ( > `location` string) > ROW FORMAT SERDE > 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' > STORED AS INPUTFORMAT > 'org.apache.hudi.hadoop.realtime.HoodieParquetRealtimeInputFormat' > OUTPUTFORMAT > 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' > LOCATION > 'file:/Users/sflee/personal/backup_demo' > TBLPROPERTIES ( > 'last_commit_time_sync'='20200528153331', > 'transient_lastDdlTime'='1590650733') > > sql: > set hoodie.realtime.merge.skip = true; > select sex, name, age from unknown_rt; > result: > !image-2020-05-28-21-06-02-396.png! > the fields is out of order when setting hoodie.realtime.merge.skip = true; > sql: > set hoodie.realtime.merge.skip = false; > select sex, name, age from unknown_rt > !image-2020-05-28-21-07-30-803.png! > query result is ok when setting hoodie.realtime.merge.skip = false; > after debugging, I found that hudi use getWriterSchema in > RealtimeUnmergedRecordReader instead of getHiveSchema, we need fix it. > > cc [~vbalaji] > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HUDI-786) InlineFileSystem.read API should ensure content beyond inline length gets an EOF
[ https://issues.apache.org/jira/browse/HUDI-786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf resolved HUDI-786. Resolution: Fixed Fixed via master: 5a0d3f1cf963e0061364d915ac86a465dd079bac > InlineFileSystem.read API should ensure content beyond inline length gets an > EOF > > > Key: HUDI-786 > URL: https://issues.apache.org/jira/browse/HUDI-786 > Project: Apache Hudi > Issue Type: Bug > Components: Common Core >Reporter: Vinoth Chandar >Assignee: sivabalan narayanan >Priority: Major > Labels: bug-bash-0.6.0, pull-request-available > Fix For: 0.6.0 > > Time Spent: 20m > Remaining Estimate: 0h > > While trying to investigate a flaky test, noticed that the readFully() just > proceeds to read bytes from the outerStream without any bounds checking > {code} > @Override > public void readFully(long position, byte[] buffer, int offset, int length) > throws IOException { > if ((length - offset) > this.length) { > throw new IOException("Attempting to read past inline content"); > } > outerStream.readFully(startOffset + position, buffer, offset, length); > } > @Override > public void readFully(long position, byte[] buffer) > throws IOException { > readFully(position, buffer, 0, buffer.length); > } > {code} > we need to throw an error for buffers that are trying to read past the inline > content.. (potentially buggy) example shown above. > I have also ignored the TestInlineFileSystem#testFileSystemAPIs() ... we need > to make a change to respect suffix length (we randomly generate) while > attempting to read past the 1000 bytes of inline content.. > {code} > actualBytes = new byte[1000 + outerPathInfo.suffixLength]; > fsDataInputStream.readFully(0, actualBytes); > verifyArrayEquality(outerPathInfo.expectedBytes, 0, 1000, actualBytes, 0, > 1000); > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
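The bounds check quoted in HUDI-786 above compares `length - offset` against the inline length, which does not account for the read position. A corrected check can be sketched as below; the class name is illustrative and this is not Hudi's actual `InlineFileSystem` implementation, just the guard logic the issue asks for.

```java
import java.io.IOException;

// Hypothetical sketch of the bounds check: a read of `length` bytes starting
// at `position` must not run past the end of the inline content.
public class InlineBoundsChecker {

  private final long inlineLength; // total bytes of inline content

  public InlineBoundsChecker(long inlineLength) {
    this.inlineLength = inlineLength;
  }

  public void checkRead(long position, int length) throws IOException {
    // The original code only compared (length - offset) to the inline
    // length; the end of the read, position + length, is what matters.
    if (position < 0 || length < 0 || position + length > inlineLength) {
      throw new IOException("Attempting to read past inline content");
    }
  }
}
```

Under this check, reading `1000 + suffixLength` bytes from offset 0 of 1000 bytes of inline content fails with an EOF-style error, which is the behavior the ignored `testFileSystemAPIs()` test expects.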
[jira] [Resolved] (HUDI-476) Add a hudi-examples module
[ https://issues.apache.org/jira/browse/HUDI-476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf resolved HUDI-476. Resolution: Fixed Fixed via master: bde7a7043e100242fec8fc0111e489a269a1d997 > Add a hudi-examples module > -- > > Key: HUDI-476 > URL: https://issues.apache.org/jira/browse/HUDI-476 > Project: Apache Hudi > Issue Type: Sub-task >Reporter: dengziming >Assignee: dengziming >Priority: Major > Labels: pull-request-available > > add a hudi-examples module to add some examples code -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (HUDI-973) RemoteHoodieTableFileSystemView supports non-partitioned table queries
[ https://issues.apache.org/jira/browse/HUDI-973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf closed HUDI-973. -- > RemoteHoodieTableFileSystemView supports non-partitioned table queries > -- > > Key: HUDI-973 > URL: https://issues.apache.org/jira/browse/HUDI-973 > Project: Apache Hudi > Issue Type: Bug >Reporter: dzcxzl >Assignee: Balaji Varadarajan >Priority: Minor > Labels: pull-request-available > Fix For: 0.5.3 > > > When hoodie.embed.timeline.server = true, the written table is a > non-partitioned table, will get an exception. > > {code:java} > io.javalin.BadRequestResponse: Query parameter 'partition' with value '' > cannot be null or empty > at io.javalin.validation.TypedValidator.getOrThrow(Validator.kt:25) > at > org.apache.hudi.timeline.service.FileSystemViewHandler.lambda$registerDataFilesAPI$3(FileSystemViewHandler.java:172) > {code} > > Because api checks whether the value of partition is null or empty. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (HUDI-936) Fix fetching orderingval logic in HoodieSparkSqlWriter.write(...) to remove unnecessary casting to String
[ https://issues.apache.org/jira/browse/HUDI-936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf closed HUDI-936. -- > Fix fetching orderingval logic in HoodieSparkSqlWriter.write(...) to remove > unnecessary casting to String > -- > > Key: HUDI-936 > URL: https://issues.apache.org/jira/browse/HUDI-936 > Project: Apache Hudi > Issue Type: Bug >Reporter: Bhavani Sudha >Assignee: Bhavani Sudha >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HUDI-936) Fix fetching orderingval logic in HoodieSparkSqlWriter.write(...) to remove unnecessary casting to String
[ https://issues.apache.org/jira/browse/HUDI-936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf resolved HUDI-936. Resolution: Fixed Fixed via master: 9697fbf71ead328cae6d56e9f99872e871342887 > Fix fetching orderingval logic in HoodieSparkSqlWriter.write(...) to remove > unnecessary casting to String > -- > > Key: HUDI-936 > URL: https://issues.apache.org/jira/browse/HUDI-936 > Project: Apache Hudi > Issue Type: Bug >Reporter: Bhavani Sudha >Assignee: Bhavani Sudha >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (HUDI-476) Add a hudi-examples module
[ https://issues.apache.org/jira/browse/HUDI-476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf closed HUDI-476. -- > Add a hudi-examples module > -- > > Key: HUDI-476 > URL: https://issues.apache.org/jira/browse/HUDI-476 > Project: Apache Hudi > Issue Type: Sub-task >Reporter: dengziming >Assignee: dengziming >Priority: Major > Labels: pull-request-available > > add a hudi-examples module to add some examples code -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HUDI-690) filtercompletedInstants in HudiSnapshotCopier not working as expected for MOR tables
[ https://issues.apache.org/jira/browse/HUDI-690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf resolved HUDI-690. Resolution: Fixed Fixed via master: 6c450957ced051de6231ad047bce22752210b786 > filtercompletedInstants in HudiSnapshotCopier not working as expected for MOR > tables > > > Key: HUDI-690 > URL: https://issues.apache.org/jira/browse/HUDI-690 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Jasmine Omeke >Assignee: Raymond Xu >Priority: Major > Labels: bug-bash-0.6.0, pull-request-available > Fix For: 0.6.0 > > > Hi. I encountered an error while using the HudiSnapshotCopier class to make a > Backup of merge on read tables: > [https://github.com/apache/incubator-hudi/blob/release-0.5.0/hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieSnapshotCopier.java] > > The error: > > {code:java} > 20/03/09 15:43:19 INFO AmazonHttpClient: Configuring Proxy. Proxy Host: > web-proxy.bt.local Proxy Port: 3128 > 20/03/09 15:43:19 INFO HoodieTableConfig: Loading dataset properties from > /.hoodie/hoodie.properties > 20/03/09 15:43:19 INFO AmazonHttpClient: Configuring Proxy. 
Proxy Host: > web-proxy.bt.local Proxy Port: 3128 > 20/03/09 15:43:19 INFO HoodieTableMetaClient: Finished Loading Table of type > MERGE_ON_READ from > 20/03/09 15:43:20 INFO HoodieActiveTimeline: Loaded instants > java.util.stream.ReferencePipeline$Head@77f7352a > 20/03/09 15:43:21 INFO YarnSchedulerBackend$YarnDriverEndpoint: Registered > executor NettyRpcEndpointRef(spark-client://Executor) (10.49.26.74:40894) > with ID 2 > 20/03/09 15:43:21 INFO ExecutorAllocationManager: New executor 2 has > registered (new total is 1) > 20/03/09 15:43:21 INFO BlockManagerMasterEndpoint: Registering block manager > ip-10-49-26-74.us-east-2.compute.internal:32831 with 12.4 GB RAM, > BlockManagerId(2, ip-10-49-26-74.us-east-2.compute.internal, 3283 > 1, None) > 20/03/09 15:43:21 INFO YarnSchedulerBackend$YarnDriverEndpoint: Registered > executor NettyRpcEndpointRef(spark-client://Executor) (10.49.26.74:40902) > with ID 4 > 20/03/09 15:43:21 INFO ExecutorAllocationManager: New executor 4 has > registered (new total is 2)Exception in thread "main" > java.lang.IllegalStateException: Hudi File Id > (HoodieFileGroupId{partitionPath='created_at_month=2020-03', > fileId='7104bb0b-20f6-4dec-981b-c11bf20ade4a-0'}) has more than 1 pending > compactions. 
Instants: (20200309011643,{"baseInstantTime": "20200308213934", > "deltaFilePaths": > [".7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_20200308213934.log.1_3-751289-170568496", > ".7104bb0b-20f6-4dec-981b-c11 > bf20ade4a-0_20200308213934.log.2_3-761601-172985464", > ".7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_20200308213934.log.3_1-772174-175483657", > ".7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_20200308213934.log.4_2-782377- > 177872977", > ".7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_20200308213934.log.5_1-790994-179909226"], > "dataFilePath": > "7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_0-746201-169642460_20200308213934.parquet", > "fileId": "7 > 104bb0b-20f6-4dec-981b-c11bf20ade4a-0", "partitionPath": > "created_at_month=2020-03", "metrics": {"TOTAL_LOG_FILES": 5.0, > "TOTAL_IO_READ_MB": 512.0, "TOTAL_LOG_FILES_SIZE": 33789.0, > "TOTAL_IO_WRITE_MB": 512.0, > "TOTAL_IO_MB": 1024.0, "TOTAL_LOG_FILE_SIZE": 33789.0}}), > (20200308213934,{"baseInstantTime": "20200308180755", "deltaFilePaths": > [".7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_20200308180755.log.1_3-696047-158157865", > > ".7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_20200308180755.log.2_2-706457-160605423", > > ".7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_20200308180755.log.3_1-716977-163056814", > ".7104bb0b-20f6-4dec-981b-c11bf20ad > e4a-0_20200308180755.log.4_3-727192-165430450", > ".7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_20200308180755.log.5_3-737755-167913339"], > "dataFilePath": "7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_0-690668-157158597_2 > 0200308180755.parquet", "fileId": "7104bb0b-20f6-4dec-981b-c11bf20ade4a-0", > "partitionPath": "created_at_month=2020-03", "metrics": {"TOTAL_LOG_FILES": > 5.0, "TOTAL_IO_READ_MB": 5
[jira] [Closed] (HUDI-690) filtercompletedInstants in HudiSnapshotCopier not working as expected for MOR tables
[ https://issues.apache.org/jira/browse/HUDI-690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf closed HUDI-690. -- > filtercompletedInstants in HudiSnapshotCopier not working as expected for MOR > tables > > > Key: HUDI-690 > URL: https://issues.apache.org/jira/browse/HUDI-690 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Jasmine Omeke >Assignee: Raymond Xu >Priority: Major > Labels: bug-bash-0.6.0, pull-request-available > Fix For: 0.6.0 > > > Hi. I encountered an error while using the HudiSnapshotCopier class to make a > Backup of merge on read tables: > [https://github.com/apache/incubator-hudi/blob/release-0.5.0/hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieSnapshotCopier.java] > > The error: > > {code:java} > 20/03/09 15:43:19 INFO AmazonHttpClient: Configuring Proxy. Proxy Host: > web-proxy.bt.local Proxy Port: 3128 > 20/03/09 15:43:19 INFO HoodieTableConfig: Loading dataset properties from > /.hoodie/hoodie.properties > 20/03/09 15:43:19 INFO AmazonHttpClient: Configuring Proxy. 
Proxy Host: > web-proxy.bt.local Proxy Port: 3128 > 20/03/09 15:43:19 INFO HoodieTableMetaClient: Finished Loading Table of type > MERGE_ON_READ from > 20/03/09 15:43:20 INFO HoodieActiveTimeline: Loaded instants > java.util.stream.ReferencePipeline$Head@77f7352a > 20/03/09 15:43:21 INFO YarnSchedulerBackend$YarnDriverEndpoint: Registered > executor NettyRpcEndpointRef(spark-client://Executor) (10.49.26.74:40894) > with ID 2 > 20/03/09 15:43:21 INFO ExecutorAllocationManager: New executor 2 has > registered (new total is 1) > 20/03/09 15:43:21 INFO BlockManagerMasterEndpoint: Registering block manager > ip-10-49-26-74.us-east-2.compute.internal:32831 with 12.4 GB RAM, > BlockManagerId(2, ip-10-49-26-74.us-east-2.compute.internal, 3283 > 1, None) > 20/03/09 15:43:21 INFO YarnSchedulerBackend$YarnDriverEndpoint: Registered > executor NettyRpcEndpointRef(spark-client://Executor) (10.49.26.74:40902) > with ID 4 > 20/03/09 15:43:21 INFO ExecutorAllocationManager: New executor 4 has > registered (new total is 2)Exception in thread "main" > java.lang.IllegalStateException: Hudi File Id > (HoodieFileGroupId{partitionPath='created_at_month=2020-03', > fileId='7104bb0b-20f6-4dec-981b-c11bf20ade4a-0'}) has more than 1 pending > compactions. 
Instants: (20200309011643,{"baseInstantTime": "20200308213934", > "deltaFilePaths": > [".7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_20200308213934.log.1_3-751289-170568496", > ".7104bb0b-20f6-4dec-981b-c11 > bf20ade4a-0_20200308213934.log.2_3-761601-172985464", > ".7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_20200308213934.log.3_1-772174-175483657", > ".7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_20200308213934.log.4_2-782377- > 177872977", > ".7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_20200308213934.log.5_1-790994-179909226"], > "dataFilePath": > "7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_0-746201-169642460_20200308213934.parquet", > "fileId": "7 > 104bb0b-20f6-4dec-981b-c11bf20ade4a-0", "partitionPath": > "created_at_month=2020-03", "metrics": {"TOTAL_LOG_FILES": 5.0, > "TOTAL_IO_READ_MB": 512.0, "TOTAL_LOG_FILES_SIZE": 33789.0, > "TOTAL_IO_WRITE_MB": 512.0, > "TOTAL_IO_MB": 1024.0, "TOTAL_LOG_FILE_SIZE": 33789.0}}), > (20200308213934,{"baseInstantTime": "20200308180755", "deltaFilePaths": > [".7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_20200308180755.log.1_3-696047-158157865", > > ".7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_20200308180755.log.2_2-706457-160605423", > > ".7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_20200308180755.log.3_1-716977-163056814", > ".7104bb0b-20f6-4dec-981b-c11bf20ad > e4a-0_20200308180755.log.4_3-727192-165430450", > ".7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_20200308180755.log.5_3-737755-167913339"], > "dataFilePath": "7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_0-690668-157158597_2 > 0200308180755.parquet", "fileId": "7104bb0b-20f6-4dec-981b-c11bf20ade4a-0", > "partitionPath": "created_at_month=2020-03", "metrics": {"TOTAL_LOG_FILES": > 5.0, "TOTAL_IO_READ_MB": 512.0, "TOTAL_LOG_FILES_SIZE": > 44197.0, "
[jira] [Closed] (HUDI-980) Some remnants after running TestHiveSyncTool is created in local source dir
[ https://issues.apache.org/jira/browse/HUDI-980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf closed HUDI-980. -- > Some remnants after running TestHiveSyncTool is created in local source dir > --- > > Key: HUDI-980 > URL: https://issues.apache.org/jira/browse/HUDI-980 > Project: Apache Hudi > Issue Type: Bug > Components: Release Administrative >Affects Versions: 0.5.3 >Reporter: sivabalan narayanan >Assignee: sivabalan narayanan >Priority: Major > Labels: pull-request-available > Fix For: 0.6.0 > > > After running tests in TestHiveSyncTool, metadatastore_db directory is > created in hudi-hive-sync/ . Need to fix this to be generated under the work > dir created as part of the test. > > This in turn creates issues while compiling, due to license header missing in > these generated files. > > ``` > [INFO] hudi-integ-test SKIPPED > [INFO] > > [INFO] BUILD FAILURE > [INFO] > > [INFO] Total time: 20.168 s > [INFO] Finished at: 2020-05-29T11:50:08-04:00 > [INFO] > > [ERROR] Failed to execute goal org.apache.rat:apache-rat-plugin:0.12:check > (default) on project hudi-hive: Too many files with unapproved license: 5 See > RAT report in: > /Users/sivabala/Documents/personal/projects/siva_hudi/apache_hudi/hudi/hudi-hive/target/rat.txt > -> [Help 1] > [ERROR] > [ERROR] To see the full stack trace of the errors, re-run Maven with the -e > switch. > [ERROR] Re-run Maven using the -X switch to enable full debug logging. 
> [ERROR] > ``` > Contents of rat file > ``` > 5 Unknown Licenses > * > Files with unapproved licenses: > /Users/sivabala/Documents/personal/projects/siva_hudi/apache_hudi/hudi/hudi-hive/metastore_db/db.lck > > /Users/sivabala/Documents/personal/projects/siva_hudi/apache_hudi/hudi/hudi-hive/metastore_db/README_DO_NOT_TOUCH_FILES.txt > > /Users/sivabala/Documents/personal/projects/siva_hudi/apache_hudi/hudi/hudi-hive/metastore_db/seg0/README_DO_NOT_TOUCH_FILES.txt > > /Users/sivabala/Documents/personal/projects/siva_hudi/apache_hudi/hudi/hudi-hive/metastore_db/log/README_DO_NOT_TOUCH_FILES.txt > > /Users/sivabala/Documents/personal/projects/siva_hudi/apache_hudi/hudi/hudi-hive/metastore_db/service.properties > ``` > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-980) Some remnants after running TestHiveSyncTool is created in local source dir
[ https://issues.apache.org/jira/browse/HUDI-980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf updated HUDI-980: --- Fix Version/s: 0.6.0 > Some remnants after running TestHiveSyncTool is created in local source dir > --- > > Key: HUDI-980 > URL: https://issues.apache.org/jira/browse/HUDI-980 > Project: Apache Hudi > Issue Type: Bug > Components: Release Administrative >Affects Versions: 0.5.3 >Reporter: sivabalan narayanan >Assignee: sivabalan narayanan >Priority: Major > Labels: pull-request-available > Fix For: 0.6.0 > > > After running tests in TestHiveSyncTool, metadatastore_db directory is > created in hudi-hive-sync/ . Need to fix this to be generated under the work > dir created as part of the test. > > This in turn creates issues while compiling, due to license header missing in > these generated files. > > ``` > [INFO] hudi-integ-test SKIPPED > [INFO] > > [INFO] BUILD FAILURE > [INFO] > > [INFO] Total time: 20.168 s > [INFO] Finished at: 2020-05-29T11:50:08-04:00 > [INFO] > > [ERROR] Failed to execute goal org.apache.rat:apache-rat-plugin:0.12:check > (default) on project hudi-hive: Too many files with unapproved license: 5 See > RAT report in: > /Users/sivabala/Documents/personal/projects/siva_hudi/apache_hudi/hudi/hudi-hive/target/rat.txt > -> [Help 1] > [ERROR] > [ERROR] To see the full stack trace of the errors, re-run Maven with the -e > switch. > [ERROR] Re-run Maven using the -X switch to enable full debug logging. 
> [ERROR] > ``` > Contents of rat file > ``` > 5 Unknown Licenses > * > Files with unapproved licenses: > /Users/sivabala/Documents/personal/projects/siva_hudi/apache_hudi/hudi/hudi-hive/metastore_db/db.lck > > /Users/sivabala/Documents/personal/projects/siva_hudi/apache_hudi/hudi/hudi-hive/metastore_db/README_DO_NOT_TOUCH_FILES.txt > > /Users/sivabala/Documents/personal/projects/siva_hudi/apache_hudi/hudi/hudi-hive/metastore_db/seg0/README_DO_NOT_TOUCH_FILES.txt > > /Users/sivabala/Documents/personal/projects/siva_hudi/apache_hudi/hudi/hudi-hive/metastore_db/log/README_DO_NOT_TOUCH_FILES.txt > > /Users/sivabala/Documents/personal/projects/siva_hudi/apache_hudi/hudi/hudi-hive/metastore_db/service.properties > ``` > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HUDI-1288) DeltaSync:writeToSink fails with Unknown datum type org.apache.avro.JsonProperties$Null
[ https://issues.apache.org/jira/browse/HUDI-1288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17198992#comment-17198992 ] leesf commented on HUDI-1288: - [~soltar] I found there are still some users facing the issue https://github.com/apache/avro/pull/290#issuecomment-625731714. Does 0.5.2-incubating work well for you? > DeltaSync:writeToSink fails with Unknown datum type > org.apache.avro.JsonProperties$Null > --- > > Key: HUDI-1288 > URL: https://issues.apache.org/jira/browse/HUDI-1288 > Project: Apache Hudi > Issue Type: Bug > Components: DeltaStreamer >Reporter: Michal Swiatowy >Priority: Major > > After updating to Hudi version 0.5.3 (prev. 0.5.2-incubating) I ran into the > following error message on write to HDFS: > {code:java} > 2020-09-18 12:54:38,651 [Driver] INFO > HoodieTableMetaClient:initTableAndGetMetaClient:379 - Finished initializing > Table of type MERGE_ON_READ from > /master_data/6FQS/hudi_test/S_INCOMINGMESSAGEDETAIL_CDC > 2020-09-18 12:54:38,663 [Driver] INFO DeltaSync:setupWriteClient:470 - > Setting up Hoodie Write Client > 2020-09-18 12:54:38,695 [Driver] INFO DeltaSync:registerAvroSchemas:522 - > Registering Schema > 
:[{"type":"record","name":"Value","namespace":"ARC_6FQS_W.dbo.S_INCOMINGMESSAGEDETAIL","fields":[{"name":"ID","type":"long"},{"name":"OPTIMISTICLOCK","type":{"type":"long","connect.version":1,"connect.name":"io.debezium.time.Timestamp"}},{"name":"DOCUMENTAMOUNT","type":["null",{"type":"bytes","scale":4,"precision":17,"connect.version":1,"connect.parameters":{"scale":"4","connect.decimal.precision":"17"},"connect.name":"org.apache.kafka.connect.data.Decimal","logicalType":"decimal"}],"default":null},{"name":"DOCUMENTDATE","type":["null",{"type":"long","connect.version":1,"connect.name":"io.debezium.time.Timestamp"}],"default":null},{"name":"DOCUMENTNUMBER","type":["null","string"],"default":null},{"name":"PAYMENTTYPE","type":["null","string"],"default":null},{"name":"PURCHASEORDERNUMBER","type":["null","string"],"default":null},{"name":"VALUEDATE","type":["null",{"type":"long","connect.version":1,"connect.name":"io.debezium.time.Timestamp"}],"default":null},{"name":"INCOMINGMESSAGEHEADERID","type":["null","long"],"default":null},{"name":"MESSAGETEXTID","type":["null","long"],"default":null},{"name":"DUEDATE","type":["null",{"type":"long","connect.version":1,"connect.name":"io.debezium.time.Timestamp"}],"default":null},{"name":"DEBTORASCNUMBER","type":["null","string"],"default":null},{"name":"DOCUMENTTYPE","type":["null","string"],"default":null},{"name":"NUMBEROFDUEDATES","type":["null","string"],"default":null},{"name":"DUEDATEINDICATOR","type":["null","string"],"default":null},{"name":"DISPUTECODE","type":["null","string"],"default":null},{"name":"INSTRUCTIONCODE","type":["null","string"],"default":null},{"name":"PAYMENTTERMS","type":["null","string"],"default":null},{"name":"PAYMENTCONDITION","type":["null","string"],"default":null},{"name":"DISCOUNTDAYS1","type":["null","string"],"default":null},{"name":"DISCOUNTDAYS2","type":["null","string"],"default":null},{"name":"ERRORID","type
[jira] [Updated] (HUDI-1124) Document the usage of Tencent COSN
[ https://issues.apache.org/jira/browse/HUDI-1124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf updated HUDI-1124: Status: Open (was: New) > Document the usage of Tencent COSN > -- > > Key: HUDI-1124 > URL: https://issues.apache.org/jira/browse/HUDI-1124 > Project: Apache Hudi > Issue Type: Improvement > Reporter: leesf >Assignee: deyzhong >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (HUDI-1124) Document the usage of Tencent COSN
[ https://issues.apache.org/jira/browse/HUDI-1124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf closed HUDI-1124. --- > Document the usage of Tencent COSN > -- > > Key: HUDI-1124 > URL: https://issues.apache.org/jira/browse/HUDI-1124 > Project: Apache Hudi > Issue Type: Improvement > Reporter: leesf >Assignee: deyzhong >Priority: Major > Labels: pull-request-available > Fix For: 0.6.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HUDI-1124) Document the usage of Tencent COSN
[ https://issues.apache.org/jira/browse/HUDI-1124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf resolved HUDI-1124. - Fix Version/s: 0.6.0 Resolution: Fixed > Document the usage of Tencent COSN > -- > > Key: HUDI-1124 > URL: https://issues.apache.org/jira/browse/HUDI-1124 > Project: Apache Hudi > Issue Type: Improvement > Reporter: leesf >Assignee: deyzhong >Priority: Major > Labels: pull-request-available > Fix For: 0.6.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-1123) Document the usage of user define metrics reporter
[ https://issues.apache.org/jira/browse/HUDI-1123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf updated HUDI-1123: Status: Open (was: New) > Document the usage of user define metrics reporter > -- > > Key: HUDI-1123 > URL: https://issues.apache.org/jira/browse/HUDI-1123 > Project: Apache Hudi > Issue Type: Improvement > Reporter: leesf >Assignee: Zheren Yu >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HUDI-1123) Document the usage of user define metrics reporter
[ https://issues.apache.org/jira/browse/HUDI-1123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf resolved HUDI-1123. - Fix Version/s: 0.6.0 Resolution: Fixed > Document the usage of user define metrics reporter > -- > > Key: HUDI-1123 > URL: https://issues.apache.org/jira/browse/HUDI-1123 > Project: Apache Hudi > Issue Type: Improvement > Reporter: leesf >Assignee: Zheren Yu >Priority: Major > Labels: pull-request-available > Fix For: 0.6.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (HUDI-1123) Document the usage of user define metrics reporter
[ https://issues.apache.org/jira/browse/HUDI-1123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf closed HUDI-1123. --- > Document the usage of user define metrics reporter > -- > > Key: HUDI-1123 > URL: https://issues.apache.org/jira/browse/HUDI-1123 > Project: Apache Hudi > Issue Type: Improvement > Reporter: leesf >Assignee: Zheren Yu >Priority: Major > Labels: pull-request-available > Fix For: 0.6.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HUDI-1287) Make deltastrmer supports custom ETL transformer
[ https://issues.apache.org/jira/browse/HUDI-1287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17198994#comment-17198994 ] leesf commented on HUDI-1287: - [~liujinhui] DeltaStreamer already supports user-defined transformers; you just need to write your own transformer that implements the Transformer interface. > Make deltastrmer supports custom ETL transformer > > > Key: HUDI-1287 > URL: https://issues.apache.org/jira/browse/HUDI-1287 > Project: Apache Hudi > Issue Type: Improvement > Components: DeltaStreamer >Reporter: liujinhui >Priority: Minor > -- This message was sent by Atlassian Jira (v8.3.4#803005)
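For reference, a minimal sketch of such a transformer. Hudi's actual contract is `org.apache.hudi.utilities.transform.Transformer`, whose `apply()` takes Spark types (`JavaSparkContext`, `SparkSession`, `Dataset<Row>`, `TypedProperties`); to stay self-contained, this sketch models the same shape with plain Java collections, and the class and property names here are illustrative assumptions only:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.stream.Collectors;

// Simplified stand-in for Hudi's Transformer contract: take a batch of
// rows plus configuration, return the transformed batch.
interface SimpleTransformer {
  List<Map<String, Object>> apply(List<Map<String, Object>> rows, Properties props);
}

// Example custom transformer: keep only rows whose "amount" field exceeds
// a threshold read from configuration.
class ThresholdFilterTransformer implements SimpleTransformer {
  @Override
  public List<Map<String, Object>> apply(List<Map<String, Object>> rows, Properties props) {
    long threshold = Long.parseLong(props.getProperty("filter.threshold", "0"));
    return rows.stream()
        .filter(r -> ((Number) r.get("amount")).longValue() > threshold)
        .collect(Collectors.toList());
  }
}

public class TransformerSketch {
  public static void main(String[] args) {
    Properties props = new Properties();
    props.setProperty("filter.threshold", "100");

    Map<String, Object> r1 = new HashMap<>();
    r1.put("id", 1); r1.put("amount", 50);
    Map<String, Object> r2 = new HashMap<>();
    r2.put("id", 2); r2.put("amount", 150);
    List<Map<String, Object>> rows = new ArrayList<>();
    rows.add(r1); rows.add(r2);

    List<Map<String, Object>> out = new ThresholdFilterTransformer().apply(rows, props);
    System.out.println(out.size()); // prints 1: only the 150-amount row survives
  }
}
```

With the real API, the same logic would be expressed as a `Dataset<Row>` filter, and the class name is passed to DeltaStreamer via `--transformer-class`.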
[jira] [Closed] (HUDI-1087) Realtime Record Reader needs to handle decimal types
[ https://issues.apache.org/jira/browse/HUDI-1087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf closed HUDI-1087. --- > Realtime Record Reader needs to handle decimal types > > > Key: HUDI-1087 > URL: https://issues.apache.org/jira/browse/HUDI-1087 > Project: Apache Hudi > Issue Type: Bug > Components: Hive Integration >Reporter: Balaji Varadarajan >Assignee: Wenning Ding >Priority: Blocker > Labels: pull-request-available > Fix For: 0.6.0 > > > For MOR, Realtime queries, decimal types are not getting handled correctly > resulting in the following exception: > > > {{scala> spark.sql("select * from testTable_rt").show > java.lang.ClassCastException: org.apache.hadoop.io.BytesWritable cannot be > cast to org.apache.hadoop.hive.serde2.io.HiveDecimalWritable > at > org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableHiveDecimalObjectInspector.getPrimitiveWritableObject(WritableHiveDecimalObjectInspector.java:41) > at > org.apache.spark.sql.hive.HiveShim$.toCatalystDecimal(HiveShim.scala:107) > at > org.apache.spark.sql.hive.HadoopTableReader$$anonfun$14$$anonfun$apply$11.apply(TableReader.scala:414) > at > org.apache.spark.sql.hive.HadoopTableReader$$anonfun$14$$anonfun$apply$11.apply(TableReader.scala:413) > at > org.apache.spark.sql.hive.HadoopTableReader$$anonfun$fillObject$2.apply(TableReader.scala:442) > at > org.apache.spark.sql.hive.HadoopTableReader$$anonfun$fillObject$2.apply(TableReader.scala:433) > at scala.collection.Iterator$$anon$11.next(Iterator.scala:410) > at scala.collection.Iterator$$anon$11.next(Iterator.scala:410) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown > Source) > at > org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) > at > org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$13$$anon$1.hasNext(WholeStageCodegenExec.scala:636) > at > 
org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:291) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:283) > at > org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:836) > at > org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:836) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:288) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:288) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90) > at org.apache.spark.scheduler.Task.run(Task.scala:123) > at > org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408) > at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748)}} > {{Issue : [https://github.com/apache/hudi/issues/1790]}} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HUDI-1112) Blog on Tracking Hudi Data along transaction time and buisness time
[ https://issues.apache.org/jira/browse/HUDI-1112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf reassigned HUDI-1112: --- Assignee: Sandeep Maji > Blog on Tracking Hudi Data along transaction time and buisness time > --- > > Key: HUDI-1112 > URL: https://issues.apache.org/jira/browse/HUDI-1112 > Project: Apache Hudi > Issue Type: Task > Components: Docs >Reporter: Vinoth Chandar >Assignee: Sandeep Maji >Priority: Major > Fix For: 0.6.0 > > > https://github.com/apache/hudi/issues/1705 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HUDI-1112) Blog on Tracking Hudi Data along transaction time and buisness time
[ https://issues.apache.org/jira/browse/HUDI-1112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17171214#comment-17171214 ] leesf commented on HUDI-1112: - [~nandini57] Assigned to you. > Blog on Tracking Hudi Data along transaction time and buisness time > --- > > Key: HUDI-1112 > URL: https://issues.apache.org/jira/browse/HUDI-1112 > Project: Apache Hudi > Issue Type: Task > Components: Docs >Reporter: Vinoth Chandar >Assignee: Sandeep Maji >Priority: Major > Fix For: 0.6.0 > > > https://github.com/apache/hudi/issues/1705 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (HUDI-841) Abstract common meta sync module support multiple meta service
[ https://issues.apache.org/jira/browse/HUDI-841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf closed HUDI-841. -- > Abstract common meta sync module support multiple meta service > -- > > Key: HUDI-841 > URL: https://issues.apache.org/jira/browse/HUDI-841 > Project: Apache Hudi > Issue Type: Improvement > Components: Hive Integration >Reporter: liwei >Assignee: liwei >Priority: Blocker > Fix For: 0.6.0 > > > Currently Hudi only supports syncing dataset metadata to Hive, through Hive > JDBC and IMetaStoreClient. When you need to sync to other frameworks, such as > AWS Glue or Aliyun DataLake Analytics, you have to copy a lot of code from > HoodieHiveClient, which creates a lot of redundant code. So the hudi-hive-sync > module needs to be redesigned to support other frameworks while reusing the > current code as much as possible: only the interface is provided by Hudi, and > the implementation is customized by the different services such as Hive, AWS > Glue, and Aliyun DataLake Analytics. -- This message was sent by Atlassian Jira (v8.3.4#803005)
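The proposed split can be sketched as follows; all names here are illustrative assumptions, not the final Hudi API. Hudi owns the sync contract and a shared driver, while each metastore service (Hive, AWS Glue, Aliyun DataLake Analytics) supplies only its own client, so none of them copies code from HoodieHiveClient:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// The contract Hudi would own: how to push Hudi table metadata to a metastore.
interface HoodieMetaSyncClient {
  void createTable(String tableName, Map<String, String> schema);
  void addPartitions(String tableName, List<String> partitions);
}

// One concrete service implementation (in-memory, for illustration);
// a Glue or DLA variant would implement the same interface.
class InMemoryMetaSyncClient implements HoodieMetaSyncClient {
  final Map<String, Map<String, String>> tables = new HashMap<>();
  final Map<String, List<String>> partitions = new HashMap<>();

  @Override
  public void createTable(String tableName, Map<String, String> schema) {
    tables.put(tableName, schema);
  }

  @Override
  public void addPartitions(String tableName, List<String> parts) {
    partitions.computeIfAbsent(tableName, k -> new ArrayList<>()).addAll(parts);
  }
}

// The shared sync driver Hudi would provide once and reuse for every service.
public class MetaSyncTool {
  private final HoodieMetaSyncClient client;

  public MetaSyncTool(HoodieMetaSyncClient client) { this.client = client; }

  public void syncHoodieTable(String tableName, Map<String, String> schema,
                              List<String> newPartitions) {
    client.createTable(tableName, schema);
    client.addPartitions(tableName, newPartitions);
  }
}
```

Swapping the target metastore then means constructing `MetaSyncTool` with a different client, with no change to the sync driver itself.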