[jira] [Updated] (HUDI-209) Implement JMX metrics reporter
[ https://issues.apache.org/jira/browse/HUDI-209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

leesf updated HUDI-209:
    Fix Version/s: 0.6.0 (was: 0.5.1)

> Implement JMX metrics reporter
>
> Key: HUDI-209
> URL: https://issues.apache.org/jira/browse/HUDI-209
> Project: Apache Hudi (incubating)
> Issue Type: New Feature
> Reporter: vinoyang
> Assignee: Forward Xu
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.6.0
> Time Spent: 1h
> Remaining Estimate: 0h
>
> Currently there are only two reporters, {{MetricsGraphiteReporter}} and
> {{InMemoryMetricsReporter}}, and {{InMemoryMetricsReporter}} is used only for
> testing, so in practice we have just one metrics reporter. Since JMX is the
> standard monitoring interface on the JVM platform, I propose providing a JMX
> metrics reporter.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
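The core of a JMX reporter is just registering MBeans whose getters expose metric values; any JMX client (jconsole, jmxterm) can then poll them. A minimal stdlib-only sketch of that pattern, with illustrative names not taken from Hudi's implementation:

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class JmxReporterSketch {
    // Standard MBean convention: the interface must be public and named
    // <ImplementationClass>MBean; its getters become readable JMX attributes.
    public interface HoodieMetricsMBean {
        long getCommitCount();
    }

    public static class HoodieMetrics implements HoodieMetricsMBean {
        private volatile long commitCount;
        public long getCommitCount() { return commitCount; }
        public void increment() { commitCount++; }
    }

    public static void main(String[] args) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        HoodieMetrics metrics = new HoodieMetrics();
        // Hypothetical object name; a real reporter would choose its own domain.
        ObjectName name = new ObjectName("org.apache.hudi.example:type=HoodieMetrics");
        server.registerMBean(metrics, name);

        metrics.increment();
        metrics.increment();

        // Read the attribute back through the server to show the round trip
        // that an external JMX client would perform.
        System.out.println("CommitCount=" + server.getAttribute(name, "CommitCount"));
    }
}
```

Unlike the Graphite reporter, nothing is pushed anywhere: the JVM merely exposes the attributes, and monitoring systems pull them on their own schedule.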
[jira] [Resolved] (HUDI-437) Support user-defined index
[ https://issues.apache.org/jira/browse/HUDI-437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

leesf resolved HUDI-437.
    Resolution: Fixed

Fixed via master: f1d7bb381d4a370beeedb45132b24c2cac00aabf

> Support user-defined index
>
> Key: HUDI-437
> URL: https://issues.apache.org/jira/browse/HUDI-437
> Project: Apache Hudi (incubating)
> Issue Type: Improvement
> Components: Index, newbie, Writer Core
> Reporter: leesf
> Assignee: leesf
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.6.0
> Time Spent: 40m
> Remaining Estimate: 0h
>
> Currently, Hudi does not support user-defined indexes and throws an exception
> if any index type other than HBASE/INMEMORY/BLOOM/GLOBAL_BLOOM is configured.
[jira] [Updated] (HUDI-539) RO Path filter does not pick up hadoop configs from the spark context
[ https://issues.apache.org/jira/browse/HUDI-539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

leesf updated HUDI-539:
    Status: Open (was: New)

> RO Path filter does not pick up hadoop configs from the spark context
>
> Key: HUDI-539
> URL: https://issues.apache.org/jira/browse/HUDI-539
> Project: Apache Hudi (incubating)
> Issue Type: Bug
> Components: Common Core
> Affects Versions: 0.5.1
> Environment: Spark version: 2.4.4, Hadoop version: 2.7.3, Databricks Runtime: 6.1
> Reporter: Sam Somuah
> Assignee: Vinoth Chandar
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.6.0
> Time Spent: 40m
> Remaining Estimate: 0h
>
> Hi,
> I'm trying to use Hudi to write to one of the Azure storage container file
> systems, ADLS Gen 2 (abfs://), which is one of the whitelisted file schemes.
> The issue I'm facing is that {{HoodieROTablePathFilter}} tries to get the
> file system for a path while passing in a blank Hadoop configuration. This
> manifests as {{java.io.IOException: No FileSystem for scheme: abfss}} because
> the blank configuration carries none of the settings from the environment.
>
> The problematic line is
> https://github.com/apache/incubator-hudi/blob/2bb0c21a3dd29687e49d362ed34f050380ff47ae/hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/HoodieROTablePathFilter.java#L96
>
> Stacktrace:
> java.io.IOException: No FileSystem for scheme: abfss
>   at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2660)
>   at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2667)
>   at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94)
>   at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703)
>   at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685)
>   at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
>   at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
>   at org.apache.hudi.hadoop.HoodieROTablePathFilter.accept(HoodieROTablePathFilter.java:96)
>   at org.apache.spark.sql.execution.datasources.InMemoryFileIndex$$anonfun$16.apply(InMemoryFileIndex.scala:349)
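The failure mode can be illustrated without any Hadoop dependency: the scheme-to-implementation binding lives in the configuration object, so a freshly constructed, blank configuration simply cannot resolve `abfss`. A toy stand-in (the map plays the role of Hadoop's `Configuration`; all names here are illustrative):

```java
import java.util.HashMap;
import java.util.Map;

public class PathFilterConfSketch {
    // Toy stand-in for FileSystem.get(uri, conf): the "fs.<scheme>.impl"
    // binding must be present in the configuration that is passed in.
    static String resolveScheme(String scheme, Map<String, String> conf) {
        String impl = conf.get("fs." + scheme + ".impl");
        if (impl == null) {
            throw new IllegalStateException("No FileSystem for scheme: " + scheme);
        }
        return impl;
    }

    public static void main(String[] args) {
        // The ambient (Spark-session) Hadoop configuration carries the binding...
        Map<String, String> sparkHadoopConf = new HashMap<>();
        sparkHadoopConf.put("fs.abfss.impl", "example.AzureBlobFileSystem");

        // ...but a blank configuration, which is effectively what the path
        // filter was constructing, does not.
        Map<String, String> blankConf = new HashMap<>();

        System.out.println(resolveScheme("abfss", sparkHadoopConf));
        try {
            resolveScheme("abfss", blankConf);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The fix direction implied by the ticket is to thread the Spark context's Hadoop configuration through to the filter instead of instantiating an empty one.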
[jira] [Resolved] (HUDI-616) Parquet files not getting created on DFS docker instance but on local FS in TestHoodieDeltaStreamer
[ https://issues.apache.org/jira/browse/HUDI-616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

leesf resolved HUDI-616.
    Resolution: Fixed

Fixed via master: https://github.com/apache/incubator-hudi/pull/1434

> Parquet files not getting created on DFS docker instance but on local FS in
> TestHoodieDeltaStreamer
>
> Key: HUDI-616
> URL: https://issues.apache.org/jira/browse/HUDI-616
> Project: Apache Hudi (incubating)
> Issue Type: Bug
> Components: DeltaStreamer, Testing
> Reporter: Pratyaksh Sharma
> Assignee: Pratyaksh Sharma
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.6.0
> Time Spent: 20m
> Remaining Estimate: 0h
>
> In TestHoodieDeltaStreamer, PARQUET_SOURCE_ROOT is initialized even before
> the method annotated with @BeforeClass is called:
>
> private static final String PARQUET_SOURCE_ROOT = dfsBasePath + "/parquetFiles";
>
> At that point the dfsBasePath variable is still null, so the parquet files
> get created on the local FS and need to be cleared manually after testing.
> This needs to be rectified.
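The bug above is the classic Java static-initialization-order pitfall: a `static final` field is computed once, at class-initialization time, before any `@BeforeClass`-style setup runs. A minimal stdlib-only reproduction (names mirror the ticket but the class is illustrative), with a lazy accessor showing the usual fix:

```java
public class StaticInitOrderSketch {
    static String dfsBasePath;                        // still null at class init
    // Computed eagerly: captures "null" before any setup method can run.
    static final String PARQUET_SOURCE_ROOT = dfsBasePath + "/parquetFiles";

    // Deferring the concatenation until call time picks up the real value.
    static String lazySourceRoot() {
        return dfsBasePath + "/parquetFiles";
    }

    public static void main(String[] args) {
        dfsBasePath = "hdfs://docker-dfs";            // plays the role of @BeforeClass
        System.out.println(PARQUET_SOURCE_ROOT);      // prints "null/parquetFiles"
        System.out.println(lazySourceRoot());         // prints "hdfs://docker-dfs/parquetFiles"
    }
}
```

String concatenation with a `null` reference yields the literal text "null", which is exactly why the test's files silently landed on the local FS instead of failing fast.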
[jira] [Closed] (HUDI-409) Replace Log Magic header with a secure hash to avoid clashes with data
[ https://issues.apache.org/jira/browse/HUDI-409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

leesf closed HUDI-409.
    Resolution: Fixed

Fixed via master: 9d46ce380a3929605b3838238e8aa07a9918ab7a

> Replace Log Magic header with a secure hash to avoid clashes with data
>
> Key: HUDI-409
> URL: https://issues.apache.org/jira/browse/HUDI-409
> Project: Apache Hudi (incubating)
> Issue Type: Improvement
> Components: Common Core
> Reporter: Nishith Agarwal
> Assignee: Ramachandran M S
> Priority: Major
> Fix For: 0.5.2
[jira] [Closed] (HUDI-836) Implement datadog metrics reporter
[ https://issues.apache.org/jira/browse/HUDI-836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

leesf closed HUDI-836.

> Implement datadog metrics reporter
>
> Key: HUDI-836
> URL: https://issues.apache.org/jira/browse/HUDI-836
> Project: Apache Hudi (incubating)
> Issue Type: New Feature
> Components: Common Core
> Reporter: Raymond Xu
> Assignee: Raymond Xu
> Priority: Major
> Labels: bug-bash-0.6.0, pull-request-available
> Fix For: 0.6.0
>
> Implement a new metrics reporter type for the Datadog API.
[jira] [Closed] (HUDI-803) Improve Unit test coverage of HoodieAvroUtils around default values
[ https://issues.apache.org/jira/browse/HUDI-803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

leesf closed HUDI-803.

> Improve Unit test coverage of HoodieAvroUtils around default values
>
> Key: HUDI-803
> URL: https://issues.apache.org/jira/browse/HUDI-803
> Project: Apache Hudi (incubating)
> Issue Type: Improvement
> Components: Testing
> Reporter: Pratyaksh Sharma
> Assignee: Pratyaksh Sharma
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.6.0
>
> Recently there has been a lot of work and improvement around schema evolution,
> and around the HoodieAvroUtils class in particular. A few bugs have already
> been fixed in this area. With the version bump of Avro from 1.7.7 to 1.8.2,
> the handling of default values for Schema.Field has changed significantly.
> This JIRA aims to improve the test coverage of the HoodieAvroUtils class so
> that our functionality remains intact with respect to default values and
> schema evolution.
[jira] [Resolved] (HUDI-803) Improve Unit test coverage of HoodieAvroUtils around default values
[ https://issues.apache.org/jira/browse/HUDI-803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

leesf resolved HUDI-803.
    Resolution: Fixed

Fixed via master: 6a0aa9a645d11ed7b50e18aa0563dafcd9d145f7

> Improve Unit test coverage of HoodieAvroUtils around default values
>
> Key: HUDI-803
> URL: https://issues.apache.org/jira/browse/HUDI-803
> Project: Apache Hudi (incubating)
> Issue Type: Improvement
> Components: Testing
> Reporter: Pratyaksh Sharma
> Assignee: Pratyaksh Sharma
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.6.0
>
> Recently there has been a lot of work and improvement around schema evolution,
> and around the HoodieAvroUtils class in particular. A few bugs have already
> been fixed in this area. With the version bump of Avro from 1.7.7 to 1.8.2,
> the handling of default values for Schema.Field has changed significantly.
> This JIRA aims to improve the test coverage of the HoodieAvroUtils class so
> that our functionality remains intact with respect to default values and
> schema evolution.
[jira] [Updated] (HUDI-888) NPE when compacting via hudi-cli and providing a compaction props file
[ https://issues.apache.org/jira/browse/HUDI-888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

leesf updated HUDI-888:
    Status: Closed (was: Patch Available)

> NPE when compacting via hudi-cli and providing a compaction props file
>
> Key: HUDI-888
> URL: https://issues.apache.org/jira/browse/HUDI-888
> Project: Apache Hudi (incubating)
> Issue Type: Bug
> Reporter: Roland Johann
> Priority: Major
> Labels: pull-request-available
>
> When we schedule compaction via hudi-cli and provide compaction props via the
> `propsFilePath` argument, we get an NPE because the file system has not yet
> been initialized when the constructor of HoodieCompactor.java runs.
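The pattern behind this NPE is a constructor eagerly dereferencing a collaborator (the file system) that is only wired up later. A stdlib-only sketch of the shape of the bug and a guard-style fix; all names are illustrative, not Hudi's actual classes:

```java
public class EagerConstructorSketch {
    // Stand-in for the file system dependency that may not exist yet.
    public interface Fs { String read(String path); }

    public static class Compactor {
        private final Fs fs;
        private String props;

        public Compactor(Fs fs, String propsFilePath) {
            this.fs = fs;
            // The buggy flow read the props file unconditionally here; with
            // fs still null, any propsFilePath triggered an NPE. Guarding
            // (or deferring the read until fs is initialized) avoids that.
            if (propsFilePath != null && fs != null) {
                this.props = fs.read(propsFilePath);
            }
        }

        public String props() { return props; }
    }

    public static void main(String[] args) {
        Compactor ok = new Compactor(path -> "compaction.props", "/tmp/props");
        System.out.println(ok.props());      // props were loaded
        Compactor noFs = new Compactor(null, "/tmp/props");
        System.out.println(noFs.props());    // null, instead of an NPE
    }
}
```

An alternative to the guard is restructuring so the file system is constructed before anything that consumes it, which is closer to what an initialization-order fix looks like.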
[jira] [Closed] (HUDI-858) Allow multiple operations to be executed within a single commit
[ https://issues.apache.org/jira/browse/HUDI-858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

leesf closed HUDI-858.

Fixed via master: e6f3bf10cf2c62a1008b82765abdcd33cfd64c67

> Allow multiple operations to be executed within a single commit
>
> Key: HUDI-858
> URL: https://issues.apache.org/jira/browse/HUDI-858
> Project: Apache Hudi (incubating)
> Issue Type: Improvement
> Components: Writer Core
> Reporter: Balaji Varadarajan
> Assignee: Balaji Varadarajan
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.5.3
>
> There are users who have been using the RDD APIs directly and relied on a
> behavior in 0.4.x that allowed multiple write operations
> (upsert/bulk-insert/...) to be executed within a single commit.
> Given Hudi's commit protocol, these are generally unsafe operations and users
> need to handle failure scenarios themselves. It only works with COW tables.
> Hudi 0.5.x stopped supporting this behavior.
> Given the importance of supporting such cases for users migrating to 0.5.x,
> we are proposing a safety flag (disabled by default) that allows this old
> behavior.
[jira] [Closed] (HUDI-846) Turn on incremental cleaning by default in 0.6.0
[ https://issues.apache.org/jira/browse/HUDI-846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

leesf closed HUDI-846.

> Turn on incremental cleaning by default in 0.6.0
>
> Key: HUDI-846
> URL: https://issues.apache.org/jira/browse/HUDI-846
> Project: Apache Hudi (incubating)
> Issue Type: Sub-task
> Components: Cleaner
> Reporter: Balaji Varadarajan
> Assignee: Balaji Varadarajan
> Priority: Blocker
> Labels: pull-request-available
> Fix For: 0.6.0, 0.5.3
>
> The incremental cleaner tracks the commits that have happened since the last
> clean operation to figure out which partitions need to be scanned for
> cleaning. This avoids the costly scanning of all partition paths.
> Incremental cleaning is currently disabled by default. We need to enable it
> by default in 0.6.0.
> No special handling is required for upgrade/downgrade scenarios, as
> incremental cleaning relies on the standard format of commit metadata.
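The idea described above can be sketched in a few lines: instead of listing every partition, derive the candidate partitions from the commit metadata written after the last clean instant. A stdlib-only sketch with made-up instants and partition names (not Hudi's actual timeline API):

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class IncrementalCleanSketch {
    // Only partitions touched by commits strictly after lastCleanInstant
    // need to be scanned; everything else was covered by the previous clean.
    static Set<String> partitionsToScan(Map<String, List<String>> commits,
                                        String lastCleanInstant) {
        Set<String> toScan = new TreeSet<>();
        commits.forEach((instant, partitions) -> {
            if (instant.compareTo(lastCleanInstant) > 0) {
                toScan.addAll(partitions);
            }
        });
        return toScan;
    }

    public static void main(String[] args) {
        // commit instant -> partitions touched by that commit (illustrative data)
        Map<String, List<String>> commits = new LinkedHashMap<>();
        commits.put("20200520", Arrays.asList("2020/05/20"));
        commits.put("20200521", Arrays.asList("2020/05/20", "2020/05/21"));
        commits.put("20200522", Arrays.asList("2020/05/22"));

        System.out.println(partitionsToScan(commits, "20200520"));
    }
}
```

The "standard format of commit metadata" remark in the ticket is what makes this safe: the touched-partition information is already recorded per commit, so no extra state needs to survive an upgrade or downgrade.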
[jira] [Created] (HUDI-938) Update hudi name in NOTICE file
leesf created HUDI-938:
    Summary: Update hudi name in NOTICE file
    Key: HUDI-938
    URL: https://issues.apache.org/jira/browse/HUDI-938
    Project: Apache Hudi
    Issue Type: Sub-task
    Reporter: leesf
[jira] [Created] (HUDI-939) Update release scripts
leesf created HUDI-939:
    Summary: Update release scripts
    Key: HUDI-939
    URL: https://issues.apache.org/jira/browse/HUDI-939
    Project: Apache Hudi
    Issue Type: Sub-task
    Reporter: leesf
[jira] [Updated] (HUDI-938) Remove incubating from NOTICE
[ https://issues.apache.org/jira/browse/HUDI-938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

leesf updated HUDI-938:
    Summary: Remove incubating from NOTICE (was: Update hudi name in NOTICE file)

> Remove incubating from NOTICE
>
> Key: HUDI-938
> URL: https://issues.apache.org/jira/browse/HUDI-938
> Project: Apache Hudi
> Issue Type: Sub-task
> Reporter: leesf
> Priority: Major
[jira] [Closed] (HUDI-935) update travis name
[ https://issues.apache.org/jira/browse/HUDI-935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

leesf closed HUDI-935.

> update travis name
>
> Key: HUDI-935
> URL: https://issues.apache.org/jira/browse/HUDI-935
> Project: Apache Hudi
> Issue Type: Sub-task
> Reporter: leesf
> Priority: Major
[jira] [Resolved] (HUDI-935) update travis name
[ https://issues.apache.org/jira/browse/HUDI-935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

leesf resolved HUDI-935.
    Resolution: Not A Problem

> update travis name
>
> Key: HUDI-935
> URL: https://issues.apache.org/jira/browse/HUDI-935
> Project: Apache Hudi
> Issue Type: Sub-task
> Reporter: leesf
> Priority: Major
[jira] [Assigned] (HUDI-938) Remove incubating from NOTICE
[ https://issues.apache.org/jira/browse/HUDI-938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

leesf reassigned HUDI-938:
    Assignee: leesf

> Remove incubating from NOTICE
>
> Key: HUDI-938
> URL: https://issues.apache.org/jira/browse/HUDI-938
> Project: Apache Hudi
> Issue Type: Sub-task
> Reporter: leesf
> Assignee: leesf
> Priority: Major
> Labels: pull-request-available
[jira] [Commented] (HUDI-928) Consider changes needed in pom.xml to exit incubation
[ https://issues.apache.org/jira/browse/HUDI-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17115115#comment-17115115 ]

leesf commented on HUDI-928:

Also fixed via master: 492f324bc79febd8299fbb837b67c900ace18ac2

> Consider changes needed in pom.xml to exit incubation
>
> Key: HUDI-928
> URL: https://issues.apache.org/jira/browse/HUDI-928
> Project: Apache Hudi
> Issue Type: Sub-task
> Components: Release Administrative
> Reporter: Vinoth Chandar
> Assignee: Suneel Marthi
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.5.3
[jira] [Closed] (HUDI-939) Update release scripts
[ https://issues.apache.org/jira/browse/HUDI-939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

leesf closed HUDI-939.
    Resolution: Fixed

Also fixed via master: 492f324bc79febd8299fbb837b67c900ace18ac2

> Update release scripts
>
> Key: HUDI-939
> URL: https://issues.apache.org/jira/browse/HUDI-939
> Project: Apache Hudi
> Issue Type: Sub-task
> Reporter: leesf
> Assignee: Suneel Marthi
> Priority: Major
> Fix For: 0.5.3
[jira] [Closed] (HUDI-933) Examine the DOAP file for any necessary changes
[ https://issues.apache.org/jira/browse/HUDI-933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

leesf closed HUDI-933.
    Resolution: Fixed

Also fixed via master: 492f324bc79febd8299fbb837b67c900ace18ac2

> Examine the DOAP file for any necessary changes
>
> Key: HUDI-933
> URL: https://issues.apache.org/jira/browse/HUDI-933
> Project: Apache Hudi
> Issue Type: Sub-task
> Components: Release Administrative
> Reporter: Vinoth Chandar
> Assignee: Suneel Marthi
> Priority: Major
> Fix For: 0.5.3
[jira] [Closed] (HUDI-938) Remove incubating from NOTICE
[ https://issues.apache.org/jira/browse/HUDI-938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

leesf closed HUDI-938.
    Resolution: Fixed

Fixed via master: 492f324bc79febd8299fbb837b67c900ace18ac2

> Remove incubating from NOTICE
>
> Key: HUDI-938
> URL: https://issues.apache.org/jira/browse/HUDI-938
> Project: Apache Hudi
> Issue Type: Sub-task
> Reporter: leesf
> Assignee: leesf
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.5.3
[jira] [Updated] (HUDI-928) Consider changes needed in pom.xml to exit incubation
[ https://issues.apache.org/jira/browse/HUDI-928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

leesf updated HUDI-928:
    Status: Closed (was: Patch Available)

> Consider changes needed in pom.xml to exit incubation
>
> Key: HUDI-928
> URL: https://issues.apache.org/jira/browse/HUDI-928
> Project: Apache Hudi
> Issue Type: Sub-task
> Components: Release Administrative
> Reporter: Vinoth Chandar
> Assignee: Suneel Marthi
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.5.3
[jira] [Assigned] (HUDI-926) Removing DISCLAIMER from the repo
[ https://issues.apache.org/jira/browse/HUDI-926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

leesf reassigned HUDI-926:
    Assignee: leesf

> Removing DISCLAIMER from the repo
>
> Key: HUDI-926
> URL: https://issues.apache.org/jira/browse/HUDI-926
> Project: Apache Hudi
> Issue Type: Sub-task
> Components: Release Administrative
> Reporter: Vinoth Chandar
> Assignee: leesf
> Priority: Major
>
> We need to understand whether we still need the DISCLAIMER placed in the code
> repo.
[jira] [Commented] (HUDI-935) update travis name
[ https://issues.apache.org/jira/browse/HUDI-935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17115081#comment-17115081 ]

leesf commented on HUDI-935:

Yes, new PRs are good. We should manually change incubator-hudi to hudi for old PRs.

> update travis name
>
> Key: HUDI-935
> URL: https://issues.apache.org/jira/browse/HUDI-935
> Project: Apache Hudi
> Issue Type: Sub-task
> Reporter: leesf
> Priority: Major
[jira] [Closed] (HUDI-926) Removing DISCLAIMER from the repo
[ https://issues.apache.org/jira/browse/HUDI-926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

leesf closed HUDI-926.

> Removing DISCLAIMER from the repo
>
> Key: HUDI-926
> URL: https://issues.apache.org/jira/browse/HUDI-926
> Project: Apache Hudi
> Issue Type: Sub-task
> Components: Release Administrative
> Reporter: Vinoth Chandar
> Assignee: leesf
> Priority: Major
> Labels: pull-request-available
>
> We need to understand whether we still need the DISCLAIMER placed in the code
> repo.
[jira] [Reopened] (HUDI-928) Consider changes needed in pom.xml to exit incubation
[ https://issues.apache.org/jira/browse/HUDI-928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

leesf reopened HUDI-928:

> Consider changes needed in pom.xml to exit incubation
>
> Key: HUDI-928
> URL: https://issues.apache.org/jira/browse/HUDI-928
> Project: Apache Hudi
> Issue Type: Sub-task
> Components: Release Administrative
> Reporter: Vinoth Chandar
> Assignee: Suneel Marthi
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.5.3
[jira] [Resolved] (HUDI-928) Consider changes needed in pom.xml to exit incubation
[ https://issues.apache.org/jira/browse/HUDI-928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

leesf resolved HUDI-928.
    Resolution: Fixed

> Consider changes needed in pom.xml to exit incubation
>
> Key: HUDI-928
> URL: https://issues.apache.org/jira/browse/HUDI-928
> Project: Apache Hudi
> Issue Type: Sub-task
> Components: Release Administrative
> Reporter: Vinoth Chandar
> Assignee: Suneel Marthi
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.5.3
[jira] [Closed] (HUDI-928) Consider changes needed in pom.xml to exit incubation
[ https://issues.apache.org/jira/browse/HUDI-928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

leesf closed HUDI-928.

> Consider changes needed in pom.xml to exit incubation
>
> Key: HUDI-928
> URL: https://issues.apache.org/jira/browse/HUDI-928
> Project: Apache Hudi
> Issue Type: Sub-task
> Components: Release Administrative
> Reporter: Vinoth Chandar
> Assignee: Suneel Marthi
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.5.3
[jira] [Resolved] (HUDI-926) Removing DISCLAIMER from the repo
[ https://issues.apache.org/jira/browse/HUDI-926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

leesf resolved HUDI-926.
    Resolution: Fixed

Fixed via master: f22c3e933e828d1342bed67874b9ab3fee0ad099

> Removing DISCLAIMER from the repo
>
> Key: HUDI-926
> URL: https://issues.apache.org/jira/browse/HUDI-926
> Project: Apache Hudi
> Issue Type: Sub-task
> Components: Release Administrative
> Reporter: Vinoth Chandar
> Assignee: leesf
> Priority: Major
> Labels: pull-request-available
>
> We need to understand whether we still need the DISCLAIMER placed in the code
> repo.
[jira] [Updated] (HUDI-926) Removing DISCLAIMER from the repo
[ https://issues.apache.org/jira/browse/HUDI-926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

leesf updated HUDI-926:
    Fix Version/s: 0.5.3

> Removing DISCLAIMER from the repo
>
> Key: HUDI-926
> URL: https://issues.apache.org/jira/browse/HUDI-926
> Project: Apache Hudi
> Issue Type: Sub-task
> Components: Release Administrative
> Reporter: Vinoth Chandar
> Assignee: leesf
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.5.3
>
> We need to understand whether we still need the DISCLAIMER placed in the code
> repo.
[jira] [Updated] (HUDI-927) https://hudi.incubator.apache.org should auto redirect to https://hudi.apache.org
[ https://issues.apache.org/jira/browse/HUDI-927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

leesf updated HUDI-927:
    Status: Open (was: New)

> https://hudi.incubator.apache.org should auto redirect to
> https://hudi.apache.org
>
> Key: HUDI-927
> URL: https://issues.apache.org/jira/browse/HUDI-927
> Project: Apache Hudi
> Issue Type: Sub-task
> Components: Release Administrative
> Reporter: Vinoth Chandar
> Assignee: Suneel Marthi
> Priority: Major
>
> This is still not happening. We need to wait a few days and, if it is still
> not working, raise an INFRA JIRA.
[jira] [Closed] (HUDI-927) https://hudi.incubator.apache.org should auto redirect to https://hudi.apache.org
[ https://issues.apache.org/jira/browse/HUDI-927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

leesf closed HUDI-927.

> https://hudi.incubator.apache.org should auto redirect to
> https://hudi.apache.org
>
> Key: HUDI-927
> URL: https://issues.apache.org/jira/browse/HUDI-927
> Project: Apache Hudi
> Issue Type: Sub-task
> Components: Release Administrative
> Reporter: Vinoth Chandar
> Assignee: Suneel Marthi
> Priority: Major
> Fix For: 0.5.3
>
> This is still not happening. We need to wait a few days and, if it is still
> not working, raise an INFRA JIRA.
[jira] [Resolved] (HUDI-927) https://hudi.incubator.apache.org should auto redirect to https://hudi.apache.org
[ https://issues.apache.org/jira/browse/HUDI-927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

leesf resolved HUDI-927.
    Fix Version/s: 0.5.3
    Resolution: Fixed

> https://hudi.incubator.apache.org should auto redirect to
> https://hudi.apache.org
>
> Key: HUDI-927
> URL: https://issues.apache.org/jira/browse/HUDI-927
> Project: Apache Hudi
> Issue Type: Sub-task
> Components: Release Administrative
> Reporter: Vinoth Chandar
> Assignee: Suneel Marthi
> Priority: Major
> Fix For: 0.5.3
>
> This is still not happening. We need to wait a few days and, if it is still
> not working, raise an INFRA JIRA.
[jira] [Commented] (HUDI-304) Bring back spotless plugin
[ https://issues.apache.org/jira/browse/HUDI-304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17115345#comment-17115345 ]

leesf commented on HUDI-304:

[~shivnarayan] sorry, I do not have much time to focus on the PR recently, so I
am marking it HELP-WANTED in case anyone wants to work on the issue.

> Bring back spotless plugin
>
> Key: HUDI-304
> URL: https://issues.apache.org/jira/browse/HUDI-304
> Project: Apache Hudi
> Issue Type: Task
> Components: Code Cleanup, Testing
> Reporter: Balaji Varadarajan
> Assignee: leesf
> Priority: Major
> Labels: bug-bash-0.6.0, help-wanted, pull-request-available
> Fix For: 0.6.0
> Time Spent: 10m
> Remaining Estimate: 0h
>
> The spotless plugin was turned off because the eclipse style format it was
> referencing was removed for compliance reasons.
> We use the google-style eclipse format with some changes:
>
> 90c90
> <
> ---
> >
> 242c242
> < value="100"/>
> ---
> > value="120"/>
>
> The eclipse style sheet was originally obtained from
> https://github.com/google/styleguide, whose CC-BY 3.0 license is not
> compatible with source distribution (see
> https://www.apache.org/legal/resolved.html#cc-by).
>
> We need to figure out a way to bring this back.
[jira] [Updated] (HUDI-304) Bring back spotless plugin
[ https://issues.apache.org/jira/browse/HUDI-304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

leesf updated HUDI-304:
    Labels: bug-bash-0.6.0 help-wanted pull-request-available (was: bug-bash-0.6.0 pull-request-available)

> Bring back spotless plugin
>
> Key: HUDI-304
> URL: https://issues.apache.org/jira/browse/HUDI-304
> Project: Apache Hudi
> Issue Type: Task
> Components: Code Cleanup, Testing
> Reporter: Balaji Varadarajan
> Assignee: leesf
> Priority: Major
> Labels: bug-bash-0.6.0, help-wanted, pull-request-available
> Fix For: 0.6.0
> Time Spent: 10m
> Remaining Estimate: 0h
>
> The spotless plugin was turned off because the eclipse style format it was
> referencing was removed for compliance reasons.
> We use the google-style eclipse format with some changes:
>
> 90c90
> <
> ---
> >
> 242c242
> < value="100"/>
> ---
> > value="120"/>
>
> The eclipse style sheet was originally obtained from
> https://github.com/google/styleguide, whose CC-BY 3.0 license is not
> compatible with source distribution (see
> https://www.apache.org/legal/resolved.html#cc-by).
>
> We need to figure out a way to bring this back.
[jira] [Created] (HUDI-935) update travis name
leesf created HUDI-935:
    Summary: update travis name
    Key: HUDI-935
    URL: https://issues.apache.org/jira/browse/HUDI-935
    Project: Apache Hudi
    Issue Type: Sub-task
    Reporter: leesf
[jira] [Closed] (HUDI-819) missing write status in MergeOnReadLazyInsertIterable
[ https://issues.apache.org/jira/browse/HUDI-819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

leesf closed HUDI-819.

> missing write status in MergeOnReadLazyInsertIterable
>
> Key: HUDI-819
> URL: https://issues.apache.org/jira/browse/HUDI-819
> Project: Apache Hudi (incubating)
> Issue Type: Improvement
> Reporter: satish
> Assignee: satish
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.6.0
>
> The variable declared
> [here|https://github.com/apache/incubator-hudi/blob/master/hudi-client/src/main/java/org/apache/hudi/execution/MergeOnReadLazyInsertIterable.java#L53]
> masks the protected statuses variable.
> So although Hudi writes the data, the write status will not be included in
> the completed section. This can cause duplicates to be written.
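The mechanism in HUDI-819 is Java field shadowing: re-declaring an inherited field creates a second, distinct field, so writes land in the shadow while code reading the superclass field sees nothing. A minimal reproduction; class and method names echo the ticket but are illustrative, not Hudi's actual code:

```java
import java.util.ArrayList;
import java.util.List;

// Superclass holds the "real" statuses list that downstream code reads.
class LazyInsertIterableSketch {
    protected List<String> statuses = new ArrayList<>();
    List<String> completedStatuses() { return statuses; }
}

// Re-declaring `statuses` here (the shape of the bug) masks the inherited
// field: every write below goes into this shadow copy.
class MergeOnReadLazyInsertIterableSketch extends LazyInsertIterableSketch {
    protected List<String> statuses = new ArrayList<>();
    void recordStatus(String status) { statuses.add(status); }
}

public class ShadowedFieldSketch {
    public static void main(String[] args) {
        MergeOnReadLazyInsertIterableSketch it = new MergeOnReadLazyInsertIterableSketch();
        it.recordStatus("write-ok");
        // The superclass reader misses the recorded status entirely.
        System.out.println(it.completedStatuses().size()); // prints 0
    }
}
```

The fix is simply deleting the redeclaration so the subclass writes into the inherited field; javac only flags this with `-Xlint:hides`-style analysis in some tools, which is why it slipped through.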
[jira] [Updated] (HUDI-819) missing write status in MergeOnReadLazyInsertIterable
[ https://issues.apache.org/jira/browse/HUDI-819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

leesf updated HUDI-819:
    Status: Open (was: New)

> missing write status in MergeOnReadLazyInsertIterable
>
> Key: HUDI-819
> URL: https://issues.apache.org/jira/browse/HUDI-819
> Project: Apache Hudi (incubating)
> Issue Type: Improvement
> Reporter: satish
> Assignee: satish
> Priority: Major
> Labels: pull-request-available
>
> The variable declared
> [here|https://github.com/apache/incubator-hudi/blob/master/hudi-client/src/main/java/org/apache/hudi/execution/MergeOnReadLazyInsertIterable.java#L53]
> masks the protected statuses variable.
> So although Hudi writes the data, the write status will not be included in
> the completed section. This can cause duplicates to be written.
[jira] [Resolved] (HUDI-819) missing write status in MergeOnReadLazyInsertIterable
[ https://issues.apache.org/jira/browse/HUDI-819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

leesf resolved HUDI-819.
    Fix Version/s: 0.6.0
    Resolution: Fixed

> missing write status in MergeOnReadLazyInsertIterable
>
> Key: HUDI-819
> URL: https://issues.apache.org/jira/browse/HUDI-819
> Project: Apache Hudi (incubating)
> Issue Type: Improvement
> Reporter: satish
> Assignee: satish
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.6.0
>
> The variable declared
> [here|https://github.com/apache/incubator-hudi/blob/master/hudi-client/src/main/java/org/apache/hudi/execution/MergeOnReadLazyInsertIterable.java#L53]
> masks the protected statuses variable.
> So although Hudi writes the data, the write status will not be included in
> the completed section. This can cause duplicates to be written.
[jira] [Closed] (HUDI-850) Avoid unnecessary listings in incremental cleaning mode
[ https://issues.apache.org/jira/browse/HUDI-850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

leesf closed HUDI-850.

> Avoid unnecessary listings in incremental cleaning mode
>
> Key: HUDI-850
> URL: https://issues.apache.org/jira/browse/HUDI-850
> Project: Apache Hudi (incubating)
> Issue Type: Improvement
> Components: Cleaner, Performance
> Reporter: Vinoth Chandar
> Assignee: Balaji Varadarajan
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.6.0
>
> Came up during https://github.com/apache/incubator-hudi/issues/1552.
> Even with incremental cleaning turned on, we can hit a scenario where there
> are no commits to clean yet, but we still end up listing needlessly.
[jira] [Resolved] (HUDI-850) Avoid unnecessary listings in incremental cleaning mode
[ https://issues.apache.org/jira/browse/HUDI-850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf resolved HUDI-850. Resolution: Fixed > Avoid unnecessary listings in incremental cleaning mode > --- > > Key: HUDI-850 > URL: https://issues.apache.org/jira/browse/HUDI-850 > Project: Apache Hudi (incubating) > Issue Type: Improvement > Components: Cleaner, Performance >Reporter: Vinoth Chandar >Assignee: Balaji Varadarajan >Priority: Major > Labels: pull-request-available > Fix For: 0.6.0 > > > Came up during https://github.com/apache/incubator-hudi/issues/1552 > Even with incremental cleaning turned on, we would have a scenario where > there are no commits yet to clean, but we end up listing needlessly -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-850) Avoid unnecessary listings in incremental cleaning mode
[ https://issues.apache.org/jira/browse/HUDI-850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf updated HUDI-850: --- Status: Open (was: New) > Avoid unnecessary listings in incremental cleaning mode > --- > > Key: HUDI-850 > URL: https://issues.apache.org/jira/browse/HUDI-850 > Project: Apache Hudi (incubating) > Issue Type: Improvement > Components: Cleaner, Performance >Reporter: Vinoth Chandar >Assignee: Balaji Varadarajan >Priority: Major > Labels: pull-request-available > Fix For: 0.6.0 > > > Came up during https://github.com/apache/incubator-hudi/issues/1552 > Even with incremental cleaning turned on, we would have a scenario where > there are no commits yet to clean, but we end up listing needlessly -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (HUDI-1225) Avro Date logical type not handled correctly when converting to Spark Row
[ https://issues.apache.org/jira/browse/HUDI-1225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf closed HUDI-1225. --- > Avro Date logical type not handled correctly when converting to Spark Row > - > > Key: HUDI-1225 > URL: https://issues.apache.org/jira/browse/HUDI-1225 > Project: Apache Hudi > Issue Type: Bug > Components: Spark Integration >Reporter: Balaji Varadarajan >Assignee: Balaji Varadarajan >Priority: Major > Labels: pull-request-available > Fix For: 0.6.1 > > > [https://github.com/apache/hudi/issues/2034] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HUDI-1225) Avro Date logical type not handled correctly when converting to Spark Row
[ https://issues.apache.org/jira/browse/HUDI-1225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf resolved HUDI-1225. - Resolution: Fixed > Avro Date logical type not handled correctly when converting to Spark Row > - > > Key: HUDI-1225 > URL: https://issues.apache.org/jira/browse/HUDI-1225 > Project: Apache Hudi > Issue Type: Bug > Components: Spark Integration >Reporter: Balaji Varadarajan >Assignee: Balaji Varadarajan >Priority: Major > Labels: pull-request-available > Fix For: 0.6.1 > > > [https://github.com/apache/hudi/issues/2034] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HUDI-1268) Fix UpgradeDowngrade Rename Exception in aliyun OSS
leesf created HUDI-1268: --- Summary: Fix UpgradeDowngrade Rename Exception in aliyun OSS Key: HUDI-1268 URL: https://issues.apache.org/jira/browse/HUDI-1268 Project: Apache Hudi Issue Type: Bug Components: Writer Core Reporter: leesf Fix For: 0.6.1 When using the HoodieWriteClient API to write data to Hudi with the following config: {code:java} Properties properties = new Properties(); properties.setProperty(HoodieTableConfig.HOODIE_TABLE_NAME_PROP_NAME, tableName); properties.setProperty(HoodieTableConfig.HOODIE_TABLE_TYPE_PROP_NAME, tableType.name()); properties.setProperty(HoodieTableConfig.HOODIE_PAYLOAD_CLASS_PROP_NAME, OverwriteWithLatestAvroPayload.class.getName()); properties.setProperty(HoodieTableConfig.HOODIE_ARCHIVELOG_FOLDER_PROP_NAME, "archived"); return HoodieTableMetaClient.initTableAndGetMetaClient(hadoopConf, basePath, properties); {code} a FileAlreadyExistsException is thrown in Aliyun OSS. After debugging, the following code is what throws the exception: {code:java} // Rename the .updated file to hoodie.properties. This is atomic in hdfs, but not in cloud stores. // But as long as this does not leave a partial hoodie.properties file, we are okay. fs.rename(updatedPropsFilePath, propsFilePath); {code} However, we should ignore the FileAlreadyExistsException since hoodie.properties already exists. -- This message was sent by Atlassian Jira (v8.3.4#803005)
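The proposed handling can be sketched outside of Hadoop's FileSystem API. This illustrative version uses java.nio instead (the method name is hypothetical) and treats an already-existing target as success:

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch of "rename, but tolerate an existing target". Uses java.nio
// rather than Hadoop's FileSystem; the method name is hypothetical.
final class SafeRename {
    static void renameIgnoringExisting(Path src, Path dst) throws IOException {
        try {
            Files.move(src, dst);
        } catch (FileAlreadyExistsException e) {
            // hoodie.properties is already in place, so the rename has
            // effectively happened; just clean up the leftover source.
            Files.deleteIfExists(src);
        }
    }
}
```

The design point is that on stores where rename is not atomic, "target already exists" can mean a prior attempt already promoted the file, so it is safe to swallow.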
[jira] [Resolved] (HUDI-802) AWSDmsTransformer does not handle insert -> delete of a row in a single batch correctly
[ https://issues.apache.org/jira/browse/HUDI-802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf resolved HUDI-802. Resolution: Fixed > AWSDmsTransformer does not handle insert -> delete of a row in a single batch > correctly > --- > > Key: HUDI-802 > URL: https://issues.apache.org/jira/browse/HUDI-802 > Project: Apache Hudi > Issue Type: Bug > Components: DeltaStreamer >Reporter: Christopher Weaver >Assignee: Balaji Varadarajan >Priority: Blocker > Labels: pull-request-available > Fix For: 0.6.1 > > > The provided AWSDmsAvroPayload class > ([https://github.com/apache/incubator-hudi/blob/master/hudi-spark/src/main/java/org/apache/hudi/payload/AWSDmsAvroPayload.java]) > currently handles cases where the "Op" column is a "D" for updates, and > successfully removes the row from the resulting table. > However, when an insert is quickly followed by a delete on the row (e.g. DMS > processes them together and puts the update records together in the same > parquet file), the row incorrectly appears in the resulting table. In this > case, the record is not in the table and getInsertValue is called rather than > combineAndGetUpdateValue. Since the logic to check for a delete is in > combineAndGetUpdateValue, it is skipped and the delete is missed. Something > like this could fix this issue: > [https://github.com/Weves/incubator-hudi/blob/release-0.5.1/hudi-spark/src/main/java/org/apache/hudi/payload/CustomAWSDmsAvroPayload.java]. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
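The gist of the suggested fix is to apply the same "Op" check on the insert path that already exists on the update path. A simplified sketch, using a plain Map as a stand-in for the Avro record (the class name is illustrative, not the real AWSDmsAvroPayload):

```java
import java.util.Map;
import java.util.Optional;

// Simplified stand-in: a Map plays the role of the Avro record, and the
// class name is illustrative, not the real AWSDmsAvroPayload.
final class DmsPayloadSketch {
    private final Map<String, String> record;
    DmsPayloadSketch(Map<String, String> record) { this.record = record; }

    // The fix: check the DMS "Op" column on the insert path too, so an
    // insert followed by a delete in the same batch yields no value.
    Optional<Map<String, String>> getInsertValue() {
        if ("D".equals(record.get("Op"))) {
            return Optional.empty();
        }
        return Optional.of(record);
    }
}
```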
[jira] [Closed] (HUDI-802) AWSDmsTransformer does not handle insert -> delete of a row in a single batch correctly
[ https://issues.apache.org/jira/browse/HUDI-802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf closed HUDI-802. -- > AWSDmsTransformer does not handle insert -> delete of a row in a single batch > correctly > --- > > Key: HUDI-802 > URL: https://issues.apache.org/jira/browse/HUDI-802 > Project: Apache Hudi > Issue Type: Bug > Components: DeltaStreamer >Reporter: Christopher Weaver >Assignee: Balaji Varadarajan >Priority: Blocker > Labels: pull-request-available > Fix For: 0.6.1 > > > The provided AWSDmsAvroPayload class > ([https://github.com/apache/incubator-hudi/blob/master/hudi-spark/src/main/java/org/apache/hudi/payload/AWSDmsAvroPayload.java]) > currently handles cases where the "Op" column is a "D" for updates, and > successfully removes the row from the resulting table. > However, when an insert is quickly followed by a delete on the row (e.g. DMS > processes them together and puts the update records together in the same > parquet file), the row incorrectly appears in the resulting table. In this > case, the record is not in the table and getInsertValue is called rather than > combineAndGetUpdateValue. Since the logic to check for a delete is in > combineAndGetUpdateValue, it is skipped and the delete is missed. Something > like this could fix this issue: > [https://github.com/Weves/incubator-hudi/blob/release-0.5.1/hudi-spark/src/main/java/org/apache/hudi/payload/CustomAWSDmsAvroPayload.java]. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HUDI-1181) Decimal type display issue for record key field
[ https://issues.apache.org/jira/browse/HUDI-1181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf resolved HUDI-1181. - Fix Version/s: 0.6.1 Resolution: Fixed > Decimal type display issue for record key field > --- > > Key: HUDI-1181 > URL: https://issues.apache.org/jira/browse/HUDI-1181 > Project: Apache Hudi > Issue Type: Bug >Reporter: Wenning Ding >Assignee: Wenning Ding >Priority: Major > Labels: pull-request-available > Fix For: 0.6.1 > > > When using *fixed_len_byte_array* decimal type as the Hudi record key, Hudi would > not correctly display the decimal value; instead, Hudi would display it as a > byte array. > During the Hudi writing phase, Hudi would save the parquet source data into > an Avro Generic Record. For example, the source parquet data has a column with > decimal type: > > {code:java} > optional fixed_len_byte_array(16) OBJ_ID (DECIMAL(38,0));{code} > > Then Hudi will convert it into the following avro decimal type: > {code:java} > { > "name" : "OBJ_ID", > "type" : [ { > "type" : "fixed", > "name" : "fixed", > "namespace" : "hoodie.hudi_ln.hudi_ln_record.OBJ_ID", > "size" : 16, > "logicalType" : "decimal", > "precision" : 38, > "scale" : 0 > }, "null" ] > } > {code} > This decimal field would be stored as a fixed-length bytes array. And in the > reading phase, Hudi will convert this bytes array back to a readable decimal > value through this > [converter|https://github.com/apache/hudi/blob/master/hudi-spark/src/main/scala/org/apache/hudi/AvroConversionHelper.scala#L58]. > However, the problem is, when setting the decimal type as the record key, Hudi would > read the value from the Avro Generic Record and then directly convert it into > String type (See > [here|https://github.com/apache/hudi/blob/master/hudi-spark/src/main/java/org/apache/hudi/DataSourceUtils.java#L76]). 
> As a result, what shows in the _hoodie_record_key field would be something > like: LN_LQDN_OBJ_ID:[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 25, 40, 95, -71]. So > we need to handle this special case and convert the bytes array back before > converting to String. -- This message was sent by Atlassian Jira (v8.3.4#803005)
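The conversion the description calls for can be sketched in a few lines: interpret the fixed-width two's-complement bytes as an unscaled integer, apply the Avro scale, and only then stringify (the class and method names here are hypothetical):

```java
import java.math.BigDecimal;
import java.math.BigInteger;

// Sketch of the conversion the issue asks for: interpret the fixed-width
// two's-complement bytes as an unscaled integer and apply the Avro scale,
// instead of printing the raw byte array. Names are hypothetical.
final class DecimalKeySketch {
    static String toKeyString(byte[] fixedBytes, int scale) {
        BigDecimal value = new BigDecimal(new BigInteger(fixedBytes), scale);
        return value.toPlainString();
    }
}
```

For example, the bytes {0x04, 0xD2} with scale 2 decode to 12.34 rather than the raw array.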
[jira] [Closed] (HUDI-1181) Decimal type display issue for record key field
[ https://issues.apache.org/jira/browse/HUDI-1181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf closed HUDI-1181. --- > Decimal type display issue for record key field > --- > > Key: HUDI-1181 > URL: https://issues.apache.org/jira/browse/HUDI-1181 > Project: Apache Hudi > Issue Type: Bug >Reporter: Wenning Ding >Assignee: Wenning Ding >Priority: Major > Labels: pull-request-available > Fix For: 0.6.1 > > > When using *fixed_len_byte_array* decimal type as Hudi record key, Hudi would > not correctly display the decimal value, instead, Hudi would display it as a > byte array. > During the Hudi writing phase, Hudi would save the parquet source data into > Avro Generic Record. For example, the source parquet data has a column with > decimal type: > > {code:java} > optional fixed_len_byte_array(16) OBJ_ID (DECIMAL(38,0));{code} > > Then Hudi will convert it into the following avro decimal type: > {code:java} > { > "name" : "OBJ_ID", > "type" : [ { > "type" : "fixed", > "name" : "fixed", > "namespace" : "hoodie.hudi_ln.hudi_ln_record.OBJ_ID", > "size" : 16, > "logicalType" : "decimal", > "precision" : 38, > "scale" : 0 > }, "null" ] > } > {code} > This decimal field would be stored as a fixed length bytes array. And in the > reading phase, Hudi will convert this bytes array back to a readable decimal > value through this > [converter|https://github.com/apache/hudi/blob/master/hudi-spark/src/main/scala/org/apache/hudi/AvroConversionHelper.scala#L58]. > However, the problem is, when setting decimal type as record keys, Hudi would > read the value from Avro Generic Record and then directly convert it into > String type(See > [here|https://github.com/apache/hudi/blob/master/hudi-spark/src/main/java/org/apache/hudi/DataSourceUtils.java#L76]). 
> As a result, what shows in the _hoodie_record_key field would be something > like: LN_LQDN_OBJ_ID:[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 25, 40, 95, -71]. So > we need to handle this special case and convert the bytes array back before > converting to String. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-1181) Decimal type display issue for record key field
[ https://issues.apache.org/jira/browse/HUDI-1181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf updated HUDI-1181: Status: Open (was: New) > Decimal type display issue for record key field > --- > > Key: HUDI-1181 > URL: https://issues.apache.org/jira/browse/HUDI-1181 > Project: Apache Hudi > Issue Type: Bug >Reporter: Wenning Ding >Assignee: Wenning Ding >Priority: Major > Labels: pull-request-available > > When using *fixed_len_byte_array* decimal type as Hudi record key, Hudi would > not correctly display the decimal value, instead, Hudi would display it as a > byte array. > During the Hudi writing phase, Hudi would save the parquet source data into > Avro Generic Record. For example, the source parquet data has a column with > decimal type: > > {code:java} > optional fixed_len_byte_array(16) OBJ_ID (DECIMAL(38,0));{code} > > Then Hudi will convert it into the following avro decimal type: > {code:java} > { > "name" : "OBJ_ID", > "type" : [ { > "type" : "fixed", > "name" : "fixed", > "namespace" : "hoodie.hudi_ln.hudi_ln_record.OBJ_ID", > "size" : 16, > "logicalType" : "decimal", > "precision" : 38, > "scale" : 0 > }, "null" ] > } > {code} > This decimal field would be stored as a fixed length bytes array. And in the > reading phase, Hudi will convert this bytes array back to a readable decimal > value through this > [converter|https://github.com/apache/hudi/blob/master/hudi-spark/src/main/scala/org/apache/hudi/AvroConversionHelper.scala#L58]. > However, the problem is, when setting decimal type as record keys, Hudi would > read the value from Avro Generic Record and then directly convert it into > String type(See > [here|https://github.com/apache/hudi/blob/master/hudi-spark/src/main/java/org/apache/hudi/DataSourceUtils.java#L76]). 
> As a result, what shows in the _hoodie_record_key field would be something > like: LN_LQDN_OBJ_ID:[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 25, 40, 95, -71]. So > we need to handle this special case and convert the bytes array back before > converting to String. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-1130) Allow for schema evolution within DAG for hudi test suite
[ https://issues.apache.org/jira/browse/HUDI-1130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf updated HUDI-1130: Status: Open (was: New) > Allow for schema evolution within DAG for hudi test suite > - > > Key: HUDI-1130 > URL: https://issues.apache.org/jira/browse/HUDI-1130 > Project: Apache Hudi > Issue Type: Improvement > Components: Testing >Reporter: Nishith Agarwal >Assignee: Nishith Agarwal >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (HUDI-1254) TypedProperties can not get values by initializing an existing properties
[ https://issues.apache.org/jira/browse/HUDI-1254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf closed HUDI-1254. --- > TypedProperties can not get values by initializing an existing properties > - > > Key: HUDI-1254 > URL: https://issues.apache.org/jira/browse/HUDI-1254 > Project: Apache Hudi > Issue Type: Bug > Components: Common Core >Reporter: cdmikechen >Assignee: linshan-ma >Priority: Major > Labels: pull-request-available > Fix For: 0.6.1 > > > If I create a test that constructs a TypedProperties from an existing Properties like > below: > {code:java} > public class TestTypedProperties { > @Test > public void testNewTypedProperties() { > Properties properties = new Properties(); > properties.put("test_key1", "test_value1"); > TypedProperties typedProperties = new TypedProperties(properties); > assertEquals("test_value1", typedProperties.getString("test_key1")); > } > } > {code} > the test does not pass and fails with this error: *java.lang.IllegalArgumentException: > Property test_key1 not found* > I think this is a bug and needs to be fixed. -- This message was sent by Atlassian Jira (v8.3.4#803005)
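A likely root cause (stated here as an assumption, not from the issue itself) is that `new Properties(other)` only installs `other` as a *defaults* fallback consulted by `getProperty()`, so Hashtable-level lookups such as `containsKey()` miss the entries. A sketch of a copying constructor that avoids this, with a simplified stand-in for Hudi's class:

```java
import java.util.Properties;

// Sketch only -- not Hudi's actual TypedProperties. Copying the entries
// (rather than passing them as java.util.Properties defaults) makes
// direct Hashtable lookups like containsKey() see them.
class TypedPropsSketch extends Properties {
    TypedPropsSketch(Properties source) {
        if (source != null) {
            for (String key : source.stringPropertyNames()) {
                setProperty(key, source.getProperty(key));
            }
        }
    }

    String getString(String key) {
        if (!containsKey(key)) {
            throw new IllegalArgumentException("Property " + key + " not found");
        }
        return getProperty(key);
    }
}
```

With the entries copied in, the failing test from the report passes.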
[jira] [Resolved] (HUDI-1254) TypedProperties can not get values by initializing an existing properties
[ https://issues.apache.org/jira/browse/HUDI-1254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf resolved HUDI-1254. - Resolution: Fixed > TypedProperties can not get values by initializing an existing properties > - > > Key: HUDI-1254 > URL: https://issues.apache.org/jira/browse/HUDI-1254 > Project: Apache Hudi > Issue Type: Bug > Components: Common Core >Reporter: cdmikechen >Assignee: linshan-ma >Priority: Major > Labels: pull-request-available > Fix For: 0.6.1 > > > If I create a test that constructs a TypedProperties from an existing Properties like > below: > {code:java} > public class TestTypedProperties { > @Test > public void testNewTypedProperties() { > Properties properties = new Properties(); > properties.put("test_key1", "test_value1"); > TypedProperties typedProperties = new TypedProperties(properties); > assertEquals("test_value1", typedProperties.getString("test_key1")); > } > } > {code} > the test does not pass and fails with this error: *java.lang.IllegalArgumentException: > Property test_key1 not found* > I think this is a bug and needs to be fixed. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HUDI-1130) Allow for schema evolution within DAG for hudi test suite
[ https://issues.apache.org/jira/browse/HUDI-1130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf resolved HUDI-1130. - Fix Version/s: 0.6.1 Resolution: Fixed > Allow for schema evolution within DAG for hudi test suite > - > > Key: HUDI-1130 > URL: https://issues.apache.org/jira/browse/HUDI-1130 > Project: Apache Hudi > Issue Type: Improvement > Components: Testing >Reporter: Nishith Agarwal >Assignee: Nishith Agarwal >Priority: Major > Labels: pull-request-available > Fix For: 0.6.1 > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-1254) TypedProperties can not get values by initializing an existing properties
[ https://issues.apache.org/jira/browse/HUDI-1254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf updated HUDI-1254: Status: Open (was: New) > TypedProperties can not get values by initializing an existing properties > - > > Key: HUDI-1254 > URL: https://issues.apache.org/jira/browse/HUDI-1254 > Project: Apache Hudi > Issue Type: Bug > Components: Common Core >Reporter: cdmikechen >Assignee: linshan-ma >Priority: Major > Labels: pull-request-available > Fix For: 0.6.1 > > > If I create a test that constructs a TypedProperties from an existing Properties like > below: > {code:java} > public class TestTypedProperties { > @Test > public void testNewTypedProperties() { > Properties properties = new Properties(); > properties.put("test_key1", "test_value1"); > TypedProperties typedProperties = new TypedProperties(properties); > assertEquals("test_value1", typedProperties.getString("test_key1")); > } > } > {code} > the test does not pass and fails with this error: *java.lang.IllegalArgumentException: > Property test_key1 not found* > I think this is a bug and needs to be fixed. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (HUDI-1130) Allow for schema evolution within DAG for hudi test suite
[ https://issues.apache.org/jira/browse/HUDI-1130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf closed HUDI-1130. --- > Allow for schema evolution within DAG for hudi test suite > - > > Key: HUDI-1130 > URL: https://issues.apache.org/jira/browse/HUDI-1130 > Project: Apache Hudi > Issue Type: Improvement > Components: Testing >Reporter: Nishith Agarwal >Assignee: Nishith Agarwal >Priority: Major > Labels: pull-request-available > Fix For: 0.6.1 > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (HUDI-1255) Combine and get updateValue in multiFields
[ https://issues.apache.org/jira/browse/HUDI-1255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf closed HUDI-1255. --- > Combine and get updateValue in multiFields > -- > > Key: HUDI-1255 > URL: https://issues.apache.org/jira/browse/HUDI-1255 > Project: Apache Hudi > Issue Type: Improvement > Components: Writer Core >Reporter: karl wang >Priority: Major > Labels: pull-request-available > Fix For: 0.6.1 > > > Update the current value for only the fields you want to change. > The default payload OverwriteWithLatestAvroPayload overwrites the whole record > when > comparing to orderingVal. This doesn't meet our need when we just want to change > specified fields. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HUDI-1227) Document the usage of CLI
leesf created HUDI-1227: --- Summary: Document the usage of CLI Key: HUDI-1227 URL: https://issues.apache.org/jira/browse/HUDI-1227 Project: Apache Hudi Issue Type: Bug Components: CLI Reporter: leesf -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-1227) Document the usage of CLI
[ https://issues.apache.org/jira/browse/HUDI-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf updated HUDI-1227: Issue Type: Improvement (was: Bug) > Document the usage of CLI > - > > Key: HUDI-1227 > URL: https://issues.apache.org/jira/browse/HUDI-1227 > Project: Apache Hudi > Issue Type: Improvement > Components: CLI >Reporter: leesf >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HUDI-1234) Insert new records regardless of small file when using insert operation
leesf created HUDI-1234: --- Summary: Insert new records regardless of small file when using insert operation Key: HUDI-1234 URL: https://issues.apache.org/jira/browse/HUDI-1234 Project: Apache Hudi Issue Type: Bug Components: Writer Core Reporter: leesf context here [https://github.com/apache/hudi/issues/2051] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HUDI-1231) Duplicate record while querying from hive synced table
[ https://issues.apache.org/jira/browse/HUDI-1231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17186874#comment-17186874 ] leesf commented on HUDI-1231: - [~vbalaji] would you please take a look > Duplicate record while querying from hive synced table > -- > > Key: HUDI-1231 > URL: https://issues.apache.org/jira/browse/HUDI-1231 > Project: Apache Hudi > Issue Type: Bug >Reporter: Ashok Kumar >Priority: Major > > I am writing in upsert mode with the precombine flag enabled. Still, when I query, > I see the same record 3 times in the same parquet file > > spark.sql("select > _hoodie_commit_time,_hoodie_commit_seqno,_hoodie_record_key,_hoodie_partition_path,_hoodie_file_name > from hudi5_mor_ro where id1=1086187 and timestamp=1598461500 and > _hoodie_record_key='timestamp:1598461500,id1:1086187,id2:1872725,flowId:23'").show(10,false) > > +--+ > |_hoodie_commit_time|_hoodie_commit_seqno|_hoodie_record_key|_hoodie_partition_path|_hoodie_file_name| > +--+ > |20200826171813|20200826171813_13856_855766|timestamp:1598461500,id1:1086187,id2:1872725,flowId:23|1086187/2020082617|5ecb020f-29be-4eed-b130-8c02ae819603-0_13856-104-296775_20200826171813.parquet| > |20200826171813|20200826171813_13856_855766|timestamp:1598461500,id1:1086187,id2:1872725,flowId:23|1086187/2020082617|5ecb020f-29be-4eed-b130-8c02ae819603-0_13856-104-296775_20200826171813.parquet| > |20200826171813|20200826171813_13856_855766|timestamp:1598461500,id1:1086187,id2:1872725,flowId:23|1086187/2020082617|5ecb020f-29be-4eed-b130-8c02ae819603-0_13856-104-296775_20200826171813.parquet| > +--+ > > I am getting this issue with both kinds of tables, i.e. COW and MOR. > I have tried version 0.6.3, but I had also tried 0.5.3 and this bug > occurred there as well. > This issue does not occur with a small data set. 
> > The strange thing is that when I query the parquet file directly, it gives only one record (i.e. > correct) > df.filter(col("_hoodie_record_key")==="timestamp:1598461500,id1:1086187,id2:1872725,flowId:23").count > res13: Long = 1 > > Note: > When I query the filesystem, it's fine. > I see this issue when I query from the Hive-synced table. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-1025) Meter RPC calls in HoodieWrapperFileSystem
[ https://issues.apache.org/jira/browse/HUDI-1025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf updated HUDI-1025: Status: Open (was: New) > Meter RPC calls in HoodieWrapperFileSystem > -- > > Key: HUDI-1025 > URL: https://issues.apache.org/jira/browse/HUDI-1025 > Project: Apache Hudi > Issue Type: Improvement > Components: Common Core >Reporter: Abhishek Modi >Assignee: Abhishek Modi >Priority: Minor > Labels: pull-request-available > Fix For: 0.6.1 > > > Hudi issues a very large number of RPC calls to DFS. When making changes to > Hudi, we try to ensure that the number of RPC calls does not increase > appreciably, as this could impact the DFS. > We should therefore meter HoodieWrapperFileSystem so that we can track the > RPC calls. This will help in service observability / SLA tracking and will > make it easier to tell when a change results in increased RPC load. -- This message was sent by Atlassian Jira (v8.3.4#803005)
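A minimal sketch of the kind of metering proposed: count calls per filesystem operation so RPC volume can be compared across changes. The class and method names are illustrative, not Hudi's actual metrics registry API:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Illustrative counter, not Hudi's metrics registry: each wrapped
// FileSystem call would mark() its operation name before delegating,
// making per-operation RPC volume visible.
final class CallMeter {
    private final Map<String, LongAdder> counts = new ConcurrentHashMap<>();

    void mark(String op) {
        counts.computeIfAbsent(op, k -> new LongAdder()).increment();
    }

    long count(String op) {
        LongAdder adder = counts.get(op);
        return adder == null ? 0L : adder.sum();
    }
}
```

A wrapper filesystem would call `mark("listStatus")`, `mark("getFileStatus")`, etc., before delegating to the underlying filesystem, and periodically publish the counts to the metrics reporter.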
[jira] [Closed] (HUDI-1025) Meter RPC calls in HoodieWrapperFileSystem
[ https://issues.apache.org/jira/browse/HUDI-1025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf closed HUDI-1025. --- > Meter RPC calls in HoodieWrapperFileSystem > -- > > Key: HUDI-1025 > URL: https://issues.apache.org/jira/browse/HUDI-1025 > Project: Apache Hudi > Issue Type: Improvement > Components: Common Core >Reporter: Abhishek Modi >Assignee: Abhishek Modi >Priority: Minor > Labels: pull-request-available > Fix For: 0.6.1 > > > Hudi issues a very large number of RPC calls to DFS. When making changes to > Hudi, we try to ensure that the number of RPC calls does not increase > appreciably, as this could impact the DFS. > We should therefore meter HoodieWrapperFileSystem so that we can track the > RPC calls. This will help in service observability / SLA tracking and will > make it easier to tell when a change results in increased RPC load. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HUDI-1025) Meter RPC calls in HoodieWrapperFileSystem
[ https://issues.apache.org/jira/browse/HUDI-1025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf resolved HUDI-1025. - Resolution: Fixed > Meter RPC calls in HoodieWrapperFileSystem > -- > > Key: HUDI-1025 > URL: https://issues.apache.org/jira/browse/HUDI-1025 > Project: Apache Hudi > Issue Type: Improvement > Components: Common Core >Reporter: Abhishek Modi >Assignee: Abhishek Modi >Priority: Minor > Labels: pull-request-available > Fix For: 0.6.1 > > > Hudi issues a very large number of RPC calls to DFS. When making changes to > Hudi, we try to ensure that the number of RPC calls does not increase > appreciably, as this could impact the DFS. > We should therefore meter HoodieWrapperFileSystem so that we can track the > RPC calls. This will help in service observability / SLA tracking and will > make it easier to tell when a change results in increased RPC load. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-1025) Meter RPC calls in HoodieWrapperFileSystem
[ https://issues.apache.org/jira/browse/HUDI-1025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf updated HUDI-1025: Fix Version/s: 0.6.1 > Meter RPC calls in HoodieWrapperFileSystem > -- > > Key: HUDI-1025 > URL: https://issues.apache.org/jira/browse/HUDI-1025 > Project: Apache Hudi > Issue Type: Improvement > Components: Common Core >Reporter: Abhishek Modi >Assignee: Abhishek Modi >Priority: Minor > Labels: pull-request-available > Fix For: 0.6.1 > > > Hudi issues a very large number of RPC calls to DFS. When making changes to > Hudi, we try to ensure that the number of RPC calls does not increase > appreciably, as this could impact the DFS. > We should therefore meter HoodieWrapperFileSystem so that we can track the > RPC calls. This will help in service observability / SLA tracking and will > make it easier to tell when a change results in increased RPC load. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-1083) Minor optimization in Determining insert bucket location for a given key
[ https://issues.apache.org/jira/browse/HUDI-1083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf updated HUDI-1083: Status: Open (was: New) > Minor optimization in Determining insert bucket location for a given key > > > Key: HUDI-1083 > URL: https://issues.apache.org/jira/browse/HUDI-1083 > Project: Apache Hudi > Issue Type: Improvement > Components: Writer Core >Reporter: sivabalan narayanan >Assignee: shenh062326 >Priority: Major > Labels: pull-request-available > > As of now, this is how the bucket for a given key is determined. > In every partition, we find all insert buckets and assign weights. > For example, weights 0.2, 0.3, 0.5 for a given partition with 100 records to be inserted > mean 20 will go into B0, 30 will go into B1 and 50 will go into B2. > Within getPartition(Object key), we linearly walk through the bucket weights > and find the right bucket for a given key. For instance, if mod(hash value) > is 90/100 = 0.9, we keep adding the bucket weights until the value exceeds > 0.9. > Instead, we could calculate cumulative weights upfront and do a binary search > within getPartition(): > so, 0.2, 0.5, 1. > Then, given mod(hash value), we could do a binary search to find the right bucket, > which would cut the cost from O(N) to O(log N). > -- This message was sent by Atlassian Jira (v8.3.4#803005)
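The optimization described above, cumulative weights plus binary search, can be sketched as follows (class and method names are illustrative, not Hudi's partitioner API):

```java
import java.util.Arrays;

// Sketch of the proposal: precompute cumulative bucket weights once,
// then find the bucket for hash(key)'s fraction with binary search --
// O(log N) instead of the linear O(N) scan.
final class BucketIndexSketch {
    private final double[] cumulative;

    BucketIndexSketch(double[] weights) {
        cumulative = new double[weights.length];
        double running = 0.0;
        for (int i = 0; i < weights.length; i++) {
            running += weights[i];
            cumulative[i] = running; // e.g. {0.2, 0.3, 0.5} -> {0.2, 0.5, 1.0}
        }
    }

    int bucketFor(double fraction) {
        int pos = Arrays.binarySearch(cumulative, fraction);
        // On a miss, binarySearch encodes the insertion point as -(ip) - 1.
        return pos >= 0 ? pos : -pos - 1;
    }
}
```

With weights {0.2, 0.3, 0.5}, a key whose hash fraction is 0.9 lands in B2, matching the linear walk's answer while touching only log N entries.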
[jira] [Closed] (HUDI-1083) Minor optimization in Determining insert bucket location for a given key
[ https://issues.apache.org/jira/browse/HUDI-1083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf closed HUDI-1083. --- > Minor optimization in Determining insert bucket location for a given key > > > Key: HUDI-1083 > URL: https://issues.apache.org/jira/browse/HUDI-1083 > Project: Apache Hudi > Issue Type: Improvement > Components: Writer Core >Reporter: sivabalan narayanan >Assignee: shenh062326 >Priority: Major > Labels: pull-request-available > Fix For: 0.6.1 > > > As of now, this is how the bucket for a given key is determined. > In every partition, we find all insert buckets and assign weights. > For example, weights 0.2, 0.3, 0.5 for a given partition with 100 records to be inserted > mean 20 will go into B0, 30 will go into B1 and 50 will go into B2. > Within getPartition(Object key), we linearly walk through the bucket weights > and find the right bucket for a given key. For instance, if mod(hash value) > is 90/100 = 0.9, we keep adding the bucket weights until the value exceeds > 0.9. > Instead, we could calculate cumulative weights upfront and do a binary search > within getPartition(): > so, 0.2, 0.5, 1. > Then, given mod(hash value), we could do a binary search to find the right bucket, > which would cut the cost from O(N) to O(log N). > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HUDI-1083) Minor optimization in Determining insert bucket location for a given key
[ https://issues.apache.org/jira/browse/HUDI-1083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf resolved HUDI-1083. - Resolution: Fixed > Minor optimization in Determining insert bucket location for a given key > > > Key: HUDI-1083 > URL: https://issues.apache.org/jira/browse/HUDI-1083 > Project: Apache Hudi > Issue Type: Improvement > Components: Writer Core >Reporter: sivabalan narayanan >Assignee: shenh062326 >Priority: Major > Labels: pull-request-available > Fix For: 0.6.1 > > > As of now, this is how the bucket for a given key is determined. > In every partition, we find all insert buckets and assign weights. > For example, weights 0.2, 0.3, 0.5 for a given partition with 100 records to be inserted > mean 20 will go into B0, 30 will go into B1 and 50 will go into B2. > Within getPartition(Object key), we linearly walk through the bucket weights > and find the right bucket for a given key. For instance, if mod(hash value) > is 90/100 = 0.9, we keep adding the bucket weights until the value exceeds > 0.9. > Instead, we could calculate cumulative weights upfront and do a binary search > within getPartition(): > so, 0.2, 0.5, 1. > Then, given mod(hash value), we could do a binary search to find the right bucket, > which would cut the cost from O(N) to O(log N). > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-1083) Minor optimization in Determining insert bucket location for a given key
[ https://issues.apache.org/jira/browse/HUDI-1083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf updated HUDI-1083: Fix Version/s: 0.6.1 > Minor optimization in Determining insert bucket location for a given key > > > Key: HUDI-1083 > URL: https://issues.apache.org/jira/browse/HUDI-1083 > Project: Apache Hudi > Issue Type: Improvement > Components: Writer Core >Reporter: sivabalan narayanan >Assignee: shenh062326 >Priority: Major > Labels: pull-request-available > Fix For: 0.6.1 > > > As of now, this is how bucket for a given key is determined. > In every partition, we find all insert buckets and assign weights. > for eg: 0.2, 0.3, 0.5 for a given partition with 100 records to be inserted > means, 20 will go into B0, 30 will go into B1 and 50 will go into B2. > within getPartition(Object key), we linearly walk through the bucket weights > and find the right bucket for a given key. for instance if mod (hash value) > is 90/100 = 0.9, we keep adding the bucket weights until the value exceeds > 0.9. > Instead we could calculate cumulative weights upfront and do a binary search > within getPartition() > so, 0.2, 0.5, 1 > so with mod(hash value), we could do binary search and find the right bucket > and would cut cost from O(N) to log N. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
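The optimization described in HUDI-1083 above can be sketched as follows. This is an illustrative standalone class, not Hudi's actual `UpsertPartitioner`: it precomputes cumulative bucket weights once (e.g. {0.2, 0.3, 0.5} becomes {0.2, 0.5, 1.0}) and then binary-searches per key, cutting the per-key cost from O(N) to O(log N).

```java
import java.util.Arrays;

// Illustrative sketch (class and method names are hypothetical, not Hudi's):
// assign a key to an insert bucket via binary search over cumulative weights.
public class BucketSearchSketch {

  private final double[] cumulativeWeights;

  public BucketSearchSketch(double[] bucketWeights) {
    // Precompute the running sum once, e.g. {0.2, 0.3, 0.5} -> {0.2, 0.5, 1.0}.
    cumulativeWeights = new double[bucketWeights.length];
    double sum = 0.0;
    for (int i = 0; i < bucketWeights.length; i++) {
      sum += bucketWeights[i];
      cumulativeWeights[i] = sum;
    }
  }

  // hashFraction is mod(hash value) / total records, in [0, 1) --
  // e.g. 90/100 = 0.9 in the example from the issue description.
  public int getBucket(double hashFraction) {
    int idx = Arrays.binarySearch(cumulativeWeights, hashFraction);
    // binarySearch returns -(insertionPoint) - 1 when the value is not an
    // exact match; the insertion point is the first cumulative weight that
    // exceeds hashFraction, which is exactly the bucket we want.
    return idx >= 0 ? idx : -idx - 1;
  }
}
```

With weights {0.2, 0.3, 0.5}, a hash fraction of 0.9 lands in bucket 2, matching the linear walk in the description.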
[jira] [Closed] (HUDI-1197) Fix build issue in scala 2.12
[ https://issues.apache.org/jira/browse/HUDI-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf closed HUDI-1197. --- > Fix build issue in scala 2.12 > - > > Key: HUDI-1197 > URL: https://issues.apache.org/jira/browse/HUDI-1197 > Project: Apache Hudi > Issue Type: Bug > Components: Writer Core >Reporter: Bhavani Sudha >Assignee: Bhavani Sudha >Priority: Major > Labels: pull-request-available > Fix For: 0.6.0 > > > During release process ran into build issues with scala 2.12 in > HoodieWriterUtils. The error msg looks like below: > [ERROR] > /...hudi/hudi-spark/src/main/scala/org/apache/hudi/HoodieWriterUtils.scala:32: > error: reference to mapAsJavaMap is ambiguous; > [ERROR] it is imported twice in the same scope by > [ERROR] import scala.collection.JavaConverters._ > [ERROR] and import scala.collection.JavaConversions._ > [ERROR] mapAsJavaMap(parametersWithWriteDefaults(parameters.asScala.toMap)) > [ERROR] ^ > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (HUDI-1188) MOR hbase index tables not deduplicating records
[ https://issues.apache.org/jira/browse/HUDI-1188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf closed HUDI-1188. --- > MOR hbase index tables not deduplicating records > > > Key: HUDI-1188 > URL: https://issues.apache.org/jira/browse/HUDI-1188 > Project: Apache Hudi > Issue Type: Bug >Reporter: Ryan Pifer >Assignee: Ryan Pifer >Priority: Major > Labels: pull-request-available > Fix For: 0.6.1 > > > After fetching hbase index for a record, Hudi performs a validation that the > commit timestamp stored in hbase for that record is a commit on the timeline. > This makes any record that is stored to hbase index during a deltacommit > (upsert on MOR table) considered an invalid commit and treated as a new > record. This causes the hbase index to be updated every time which leads to > records being able to be in multiple partitions and even in different file > groups within same partition. -- This message was sent by Atlassian Jira (v8.3.4#803005)
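The fix direction for HUDI-1188 above can be sketched generically. This is an illustrative class (not Hudi's actual HBase index code): when validating the instant time stored in the index for a record, the set of valid instants must include deltacommits as well as commits, otherwise every MOR upsert looks like an unknown commit and the record is treated as new.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: validate an index entry against ALL completed
// instants (commits AND deltacommits), not commit actions alone.
public class IndexEntryValidator {

  private final Set<String> completedInstantTimes;

  public IndexEntryValidator(Set<String> completedInstantTimes) {
    // Caller populates this from the full timeline, including deltacommits.
    this.completedInstantTimes = new HashSet<>(completedInstantTimes);
  }

  // An indexed location is only trustworthy if its instant time belongs to
  // a completed instant on the timeline.
  public boolean isValidEntry(String indexedInstantTime) {
    return completedInstantTimes.contains(indexedInstantTime);
  }
}
```

If the set is built from commits only, `isValidEntry` returns false for every deltacommit-written record, which is the duplication behavior the issue describes.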
[jira] [Closed] (HUDI-1177) fix TimestampBasedKeyGenerator Task not serializableException
[ https://issues.apache.org/jira/browse/HUDI-1177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf closed HUDI-1177. --- > fix TimestampBasedKeyGenerator Task not serializableException > -- > > Key: HUDI-1177 > URL: https://issues.apache.org/jira/browse/HUDI-1177 > Project: Apache Hudi > Issue Type: Bug >Affects Versions: 0.6.0 >Reporter: liujinhui >Assignee: Pratyaksh Sharma >Priority: Blocker > Labels: pull-request-available > Fix For: 0.6.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HUDI-1188) MOR hbase index tables not deduplicating records
[ https://issues.apache.org/jira/browse/HUDI-1188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf resolved HUDI-1188. - Fix Version/s: 0.6.1 Resolution: Fixed > MOR hbase index tables not deduplicating records > > > Key: HUDI-1188 > URL: https://issues.apache.org/jira/browse/HUDI-1188 > Project: Apache Hudi > Issue Type: Bug >Reporter: Ryan Pifer >Assignee: Ryan Pifer >Priority: Major > Labels: pull-request-available > Fix For: 0.6.1 > > > After fetching hbase index for a record, Hudi performs a validation that the > commit timestamp stored in hbase for that record is a commit on the timeline. > This makes any record that is stored to hbase index during a deltacommit > (upsert on MOR table) considered an invalid commit and treated as a new > record. This causes the hbase index to be updated every time which leads to > records being able to be in multiple partitions and even in different file > groups within same partition. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-1188) MOR hbase index tables not deduplicating records
[ https://issues.apache.org/jira/browse/HUDI-1188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf updated HUDI-1188: Status: Open (was: New) > MOR hbase index tables not deduplicating records > > > Key: HUDI-1188 > URL: https://issues.apache.org/jira/browse/HUDI-1188 > Project: Apache Hudi > Issue Type: Bug >Reporter: Ryan Pifer >Assignee: Ryan Pifer >Priority: Major > Labels: pull-request-available > > After fetching hbase index for a record, Hudi performs a validation that the > commit timestamp stored in hbase for that record is a commit on the timeline. > This makes any record that is stored to hbase index during a deltacommit > (upsert on MOR table) considered an invalid commit and treated as a new > record. This causes the hbase index to be updated every time which leads to > records being able to be in multiple partitions and even in different file > groups within same partition. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HUDI-1161) Support update partial fields for MoR table
[ https://issues.apache.org/jira/browse/HUDI-1161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf reassigned HUDI-1161: --- Assignee: Nicholas Jiang (was: leesf) > Support update partial fields for MoR table > --- > > Key: HUDI-1161 > URL: https://issues.apache.org/jira/browse/HUDI-1161 > Project: Apache Hudi > Issue Type: Sub-task > Components: Writer Core >Reporter: leesf >Assignee: Nicholas Jiang >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HUDI-972) Update hudi logo
[ https://issues.apache.org/jira/browse/HUDI-972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17118297#comment-17118297 ] leesf commented on HUDI-972: hi [~shivnarayan] the logo has been updated. you need refresh the website. > Update hudi logo > > > Key: HUDI-972 > URL: https://issues.apache.org/jira/browse/HUDI-972 > Project: Apache Hudi > Issue Type: Sub-task >Reporter: sivabalan narayanan >Assignee: leesf >Priority: Major > Attachments: Screen Shot 2020-05-28 at 12.10.12 AM.png > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HUDI-974) Fields out of order in MOR mode when using Hive
leesf created HUDI-974: -- Summary: Fields out of order in MOR mode when using Hive Key: HUDI-974 URL: https://issues.apache.org/jira/browse/HUDI-974 Project: Apache Hudi Issue Type: Bug Components: Hive Integration Reporter: leesf Assignee: liwei Fix For: 0.6.0 Attachments: image-2020-05-28-21-06-02-396.png, image-2020-05-28-21-07-30-803.png When querying MOR hudi dataset via hive hive table: CREATE EXTERNAL TABLE `unknown_rt`( `_hoodie_commit_time` string, `_hoodie_commit_seqno` string, `_hoodie_record_key` string, `_hoodie_partition_path` string, `_hoodie_file_name` string, `age` bigint, `name` string, `sex` string, `ts` bigint) PARTITIONED BY ( `location` string) ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' STORED AS INPUTFORMAT 'org.apache.hudi.hadoop.realtime.HoodieParquetRealtimeInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' LOCATION 'file:/Users/sflee/personal/backup_demo' TBLPROPERTIES ( 'last_commit_time_sync'='20200528153331', 'transient_lastDdlTime'='1590650733') sql: set hoodie.realtime.merge.skip = true; select sex, name, age from unknown_rt; result: !image-2020-05-28-21-06-02-396.png! the fields is out of order when setting hoodie.realtime.merge.skip = true; sql: set hoodie.realtime.merge.skip = false; select sex, name, age from unknown_rt !image-2020-05-28-21-07-30-803.png! query result is ok when setting hoodie.realtime.merge.skip = false; after debugging, I found that hudi use getWriterSchema in RealtimeUnmergedRecordReader instead of getHiveSchema, we need fix it. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-974) Fields out of order in MOR mode when using Hive
[ https://issues.apache.org/jira/browse/HUDI-974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf updated HUDI-974: --- Description: When querying MOR hudi dataset via hive hive table: CREATE EXTERNAL TABLE `unknown_rt`( `_hoodie_commit_time` string, `_hoodie_commit_seqno` string, `_hoodie_record_key` string, `_hoodie_partition_path` string, `_hoodie_file_name` string, `age` bigint, `name` string, `sex` string, `ts` bigint) PARTITIONED BY ( `location` string) ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' STORED AS INPUTFORMAT 'org.apache.hudi.hadoop.realtime.HoodieParquetRealtimeInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' LOCATION 'file:/Users/sflee/personal/backup_demo' TBLPROPERTIES ( 'last_commit_time_sync'='20200528153331', 'transient_lastDdlTime'='1590650733') sql: set hoodie.realtime.merge.skip = true; select sex, name, age from unknown_rt; result: !image-2020-05-28-21-06-02-396.png! the fields is out of order when setting hoodie.realtime.merge.skip = true; sql: set hoodie.realtime.merge.skip = false; select sex, name, age from unknown_rt !image-2020-05-28-21-07-30-803.png! query result is ok when setting hoodie.realtime.merge.skip = false; after debugging, I found that hudi use getWriterSchema in RealtimeUnmergedRecordReader instead of getHiveSchema, we need fix it. 
cc [~vbalaji] was: When querying MOR hudi dataset via hive hive table: CREATE EXTERNAL TABLE `unknown_rt`( `_hoodie_commit_time` string, `_hoodie_commit_seqno` string, `_hoodie_record_key` string, `_hoodie_partition_path` string, `_hoodie_file_name` string, `age` bigint, `name` string, `sex` string, `ts` bigint) PARTITIONED BY ( `location` string) ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' STORED AS INPUTFORMAT 'org.apache.hudi.hadoop.realtime.HoodieParquetRealtimeInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' LOCATION 'file:/Users/sflee/personal/backup_demo' TBLPROPERTIES ( 'last_commit_time_sync'='20200528153331', 'transient_lastDdlTime'='1590650733') sql: set hoodie.realtime.merge.skip = true; select sex, name, age from unknown_rt; result: !image-2020-05-28-21-06-02-396.png! the fields is out of order when setting hoodie.realtime.merge.skip = true; sql: set hoodie.realtime.merge.skip = false; select sex, name, age from unknown_rt !image-2020-05-28-21-07-30-803.png! query result is ok when setting hoodie.realtime.merge.skip = false; after debugging, I found that hudi use getWriterSchema in RealtimeUnmergedRecordReader instead of getHiveSchema, we need fix it. 
> Fields out of order in MOR mode when using Hive > --- > > Key: HUDI-974 > URL: https://issues.apache.org/jira/browse/HUDI-974 > Project: Apache Hudi > Issue Type: Bug > Components: Hive Integration >Reporter: leesf >Assignee: liwei >Priority: Major > Fix For: 0.6.0 > > Attachments: image-2020-05-28-21-06-02-396.png, > image-2020-05-28-21-07-30-803.png > > > When querying MOR hudi dataset via hive > hive table: > CREATE EXTERNAL TABLE `unknown_rt`( > `_hoodie_commit_time` string, > `_hoodie_commit_seqno` string, > `_hoodie_record_key` string, > `_hoodie_partition_path` string, > `_hoodie_file_name` string, > `age` bigint, > `name` string, > `sex` string, > `ts` bigint) > PARTITIONED BY ( > `location` string) > ROW FORMAT SERDE > 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' > STORED AS INPUTFORMAT > 'org.apache.hudi.hadoop.realtime.HoodieParquetRealtimeInputFormat' > OUTPUTFORMAT > 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' > LOCATION > 'file:/Users/sflee/personal/backup_demo' > TBLPROPERTIES ( > 'last_commit_time_sync'='20200528153331', > 'transient_lastDdlTime'='1590650733') > > sql: > set hoodie.realtime.merge.skip = true; > select sex, name, age from unknown_rt; > result: > !image-2020-05-28-21-06-02-396.png! > the fields is out of order when setting hoodie.realtime.merge.skip = true; > sql: > set hoodie.realtime.merge.skip = false; > select sex, name, age from unknown_rt > !image-2020-05-28-21-07-30-803.png! > query result is ok when setting hoodie.realtime.merge.skip = false; > after debugging, I found that hudi use getWriterSchema in > RealtimeUnmergedRecordReader instead of getHiveSchema, we need fix it. > > cc [~vbalaji] > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HUDI-786) InlineFileSystem.read API should ensure content beyond inline length gets an EOF
[ https://issues.apache.org/jira/browse/HUDI-786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf resolved HUDI-786. Resolution: Fixed Fixed via master: 5a0d3f1cf963e0061364d915ac86a465dd079bac > InlineFileSystem.read API should ensure content beyond inline length gets an > EOF > > > Key: HUDI-786 > URL: https://issues.apache.org/jira/browse/HUDI-786 > Project: Apache Hudi > Issue Type: Bug > Components: Common Core >Reporter: Vinoth Chandar >Assignee: sivabalan narayanan >Priority: Major > Labels: bug-bash-0.6.0, pull-request-available > Fix For: 0.6.0 > > Time Spent: 20m > Remaining Estimate: 0h > > While trying to investigate a flaky test, noticed that the readFully() just > proceeds to read bytes from the outerStream without any bounds checking > {code} > @Override > public void readFully(long position, byte[] buffer, int offset, int length) > throws IOException { > if ((length - offset) > this.length) { > throw new IOException("Attempting to read past inline content"); > } > outerStream.readFully(startOffset + position, buffer, offset, length); > } > @Override > public void readFully(long position, byte[] buffer) > throws IOException { > readFully(position, buffer, 0, buffer.length); > } > {code} > we need to throw an error for buffers that are trying to read past the inline > content.. (potentially buggy) example shown above. > I have also ignored the TestInlineFileSystem#testFileSystemAPIs() ... we need > to make a change to respect suffix length (we randomly generate) while > attempting to read past the 1000 bytes of inline content.. > {code} > actualBytes = new byte[1000 + outerPathInfo.suffixLength]; > fsDataInputStream.readFully(0, actualBytes); > verifyArrayEquality(outerPathInfo.expectedBytes, 0, 1000, actualBytes, 0, > 1000); > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
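The bounds check quoted in HUDI-786 above compares `length - offset` against the inline length, which does not account for the read position. A corrected check can be sketched as below; the class name is illustrative and this is not Hudi's actual `InlineFileSystem` implementation, just the guard logic the issue asks for.

```java
import java.io.IOException;

// Hypothetical sketch of the bounds check: a read of `length` bytes starting
// at `position` must not run past the end of the inline content.
public class InlineBoundsChecker {

  private final long inlineLength; // total bytes of inline content

  public InlineBoundsChecker(long inlineLength) {
    this.inlineLength = inlineLength;
  }

  public void checkRead(long position, int length) throws IOException {
    // The original code only compared (length - offset) to the inline
    // length; the end of the read, position + length, is what matters.
    if (position < 0 || length < 0 || position + length > inlineLength) {
      throw new IOException("Attempting to read past inline content");
    }
  }
}
```

Under this check, reading `1000 + suffixLength` bytes from offset 0 of 1000 bytes of inline content fails with an EOF-style error, which is the behavior the ignored `testFileSystemAPIs()` test expects.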
[jira] [Resolved] (HUDI-476) Add a hudi-examples module
[ https://issues.apache.org/jira/browse/HUDI-476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf resolved HUDI-476. Resolution: Fixed Fixed via master: bde7a7043e100242fec8fc0111e489a269a1d997 > Add a hudi-examples module > -- > > Key: HUDI-476 > URL: https://issues.apache.org/jira/browse/HUDI-476 > Project: Apache Hudi > Issue Type: Sub-task >Reporter: dengziming >Assignee: dengziming >Priority: Major > Labels: pull-request-available > > add a hudi-examples module to add some examples code -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (HUDI-973) RemoteHoodieTableFileSystemView supports non-partitioned table queries
[ https://issues.apache.org/jira/browse/HUDI-973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf closed HUDI-973. -- > RemoteHoodieTableFileSystemView supports non-partitioned table queries > -- > > Key: HUDI-973 > URL: https://issues.apache.org/jira/browse/HUDI-973 > Project: Apache Hudi > Issue Type: Bug >Reporter: dzcxzl >Assignee: Balaji Varadarajan >Priority: Minor > Labels: pull-request-available > Fix For: 0.5.3 > > > When hoodie.embed.timeline.server = true, the written table is a > non-partitioned table, will get an exception. > > {code:java} > io.javalin.BadRequestResponse: Query parameter 'partition' with value '' > cannot be null or empty > at io.javalin.validation.TypedValidator.getOrThrow(Validator.kt:25) > at > org.apache.hudi.timeline.service.FileSystemViewHandler.lambda$registerDataFilesAPI$3(FileSystemViewHandler.java:172) > {code} > > Because api checks whether the value of partition is null or empty. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (HUDI-936) Fix fetching orderingval logic in HoodieSparkSqlWriter.write(...) to remove unnecessary casting to String
[ https://issues.apache.org/jira/browse/HUDI-936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf closed HUDI-936. -- > Fix fetching orderingval logic in HoodieSparkSqlWriter.write(...) to remove > unnecessary casting to String > -- > > Key: HUDI-936 > URL: https://issues.apache.org/jira/browse/HUDI-936 > Project: Apache Hudi > Issue Type: Bug >Reporter: Bhavani Sudha >Assignee: Bhavani Sudha >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HUDI-936) Fix fetching orderingval logic in HoodieSparkSqlWriter.write(...) to remove unnecessary casting to String
[ https://issues.apache.org/jira/browse/HUDI-936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf resolved HUDI-936. Resolution: Fixed Fixed via master: 9697fbf71ead328cae6d56e9f99872e871342887 > Fix fetching orderingval logic in HoodieSparkSqlWriter.write(...) to remove > unnecessary casting to String > -- > > Key: HUDI-936 > URL: https://issues.apache.org/jira/browse/HUDI-936 > Project: Apache Hudi > Issue Type: Bug >Reporter: Bhavani Sudha >Assignee: Bhavani Sudha >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (HUDI-476) Add a hudi-examples module
[ https://issues.apache.org/jira/browse/HUDI-476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf closed HUDI-476. -- > Add a hudi-examples module > -- > > Key: HUDI-476 > URL: https://issues.apache.org/jira/browse/HUDI-476 > Project: Apache Hudi > Issue Type: Sub-task >Reporter: dengziming >Assignee: dengziming >Priority: Major > Labels: pull-request-available > > add a hudi-examples module to add some examples code -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HUDI-690) filtercompletedInstants in HudiSnapshotCopier not working as expected for MOR tables
[ https://issues.apache.org/jira/browse/HUDI-690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf resolved HUDI-690. Resolution: Fixed Fixed via master: 6c450957ced051de6231ad047bce22752210b786 > filtercompletedInstants in HudiSnapshotCopier not working as expected for MOR > tables > > > Key: HUDI-690 > URL: https://issues.apache.org/jira/browse/HUDI-690 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Jasmine Omeke >Assignee: Raymond Xu >Priority: Major > Labels: bug-bash-0.6.0, pull-request-available > Fix For: 0.6.0 > > > Hi. I encountered an error while using the HudiSnapshotCopier class to make a > Backup of merge on read tables: > [https://github.com/apache/incubator-hudi/blob/release-0.5.0/hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieSnapshotCopier.java] > > The error: > > {code:java} > 20/03/09 15:43:19 INFO AmazonHttpClient: Configuring Proxy. Proxy Host: > web-proxy.bt.local Proxy Port: 3128 > 20/03/09 15:43:19 INFO HoodieTableConfig: Loading dataset properties from > /.hoodie/hoodie.properties > 20/03/09 15:43:19 INFO AmazonHttpClient: Configuring Proxy. 
Proxy Host: > web-proxy.bt.local Proxy Port: 3128 > 20/03/09 15:43:19 INFO HoodieTableMetaClient: Finished Loading Table of type > MERGE_ON_READ from > 20/03/09 15:43:20 INFO HoodieActiveTimeline: Loaded instants > java.util.stream.ReferencePipeline$Head@77f7352a > 20/03/09 15:43:21 INFO YarnSchedulerBackend$YarnDriverEndpoint: Registered > executor NettyRpcEndpointRef(spark-client://Executor) (10.49.26.74:40894) > with ID 2 > 20/03/09 15:43:21 INFO ExecutorAllocationManager: New executor 2 has > registered (new total is 1) > 20/03/09 15:43:21 INFO BlockManagerMasterEndpoint: Registering block manager > ip-10-49-26-74.us-east-2.compute.internal:32831 with 12.4 GB RAM, > BlockManagerId(2, ip-10-49-26-74.us-east-2.compute.internal, 3283 > 1, None) > 20/03/09 15:43:21 INFO YarnSchedulerBackend$YarnDriverEndpoint: Registered > executor NettyRpcEndpointRef(spark-client://Executor) (10.49.26.74:40902) > with ID 4 > 20/03/09 15:43:21 INFO ExecutorAllocationManager: New executor 4 has > registered (new total is 2)Exception in thread "main" > java.lang.IllegalStateException: Hudi File Id > (HoodieFileGroupId{partitionPath='created_at_month=2020-03', > fileId='7104bb0b-20f6-4dec-981b-c11bf20ade4a-0'}) has more than 1 pending > compactions. 
Instants: (20200309011643,{"baseInstantTime": "20200308213934", > "deltaFilePaths": > [".7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_20200308213934.log.1_3-751289-170568496", > ".7104bb0b-20f6-4dec-981b-c11 > bf20ade4a-0_20200308213934.log.2_3-761601-172985464", > ".7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_20200308213934.log.3_1-772174-175483657", > ".7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_20200308213934.log.4_2-782377- > 177872977", > ".7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_20200308213934.log.5_1-790994-179909226"], > "dataFilePath": > "7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_0-746201-169642460_20200308213934.parquet", > "fileId": "7 > 104bb0b-20f6-4dec-981b-c11bf20ade4a-0", "partitionPath": > "created_at_month=2020-03", "metrics": {"TOTAL_LOG_FILES": 5.0, > "TOTAL_IO_READ_MB": 512.0, "TOTAL_LOG_FILES_SIZE": 33789.0, > "TOTAL_IO_WRITE_MB": 512.0, > "TOTAL_IO_MB": 1024.0, "TOTAL_LOG_FILE_SIZE": 33789.0}}), > (20200308213934,{"baseInstantTime": "20200308180755", "deltaFilePaths": > [".7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_20200308180755.log.1_3-696047-158157865", > > ".7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_20200308180755.log.2_2-706457-160605423", > > ".7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_20200308180755.log.3_1-716977-163056814", > ".7104bb0b-20f6-4dec-981b-c11bf20ad > e4a-0_20200308180755.log.4_3-727192-165430450", > ".7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_20200308180755.log.5_3-737755-167913339"], > "dataFilePath": "7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_0-690668-157158597_2 > 0200308180755.parquet", "fileId": "7104bb0b-20f6-4dec-981b-c11bf20ade4a-0", > "partitionPath": "created_at_month=2020-03", "metrics": {"TOTAL_LOG_FILES": > 5.0, "TOTAL_IO_READ_MB": 5
[jira] [Closed] (HUDI-690) filtercompletedInstants in HudiSnapshotCopier not working as expected for MOR tables
[ https://issues.apache.org/jira/browse/HUDI-690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf closed HUDI-690. -- > filtercompletedInstants in HudiSnapshotCopier not working as expected for MOR > tables > > > Key: HUDI-690 > URL: https://issues.apache.org/jira/browse/HUDI-690 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Jasmine Omeke >Assignee: Raymond Xu >Priority: Major > Labels: bug-bash-0.6.0, pull-request-available > Fix For: 0.6.0 > > > Hi. I encountered an error while using the HudiSnapshotCopier class to make a > Backup of merge on read tables: > [https://github.com/apache/incubator-hudi/blob/release-0.5.0/hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieSnapshotCopier.java] > > The error: > > {code:java} > 20/03/09 15:43:19 INFO AmazonHttpClient: Configuring Proxy. Proxy Host: > web-proxy.bt.local Proxy Port: 3128 > 20/03/09 15:43:19 INFO HoodieTableConfig: Loading dataset properties from > /.hoodie/hoodie.properties > 20/03/09 15:43:19 INFO AmazonHttpClient: Configuring Proxy. 
Proxy Host: > web-proxy.bt.local Proxy Port: 3128 > 20/03/09 15:43:19 INFO HoodieTableMetaClient: Finished Loading Table of type > MERGE_ON_READ from > 20/03/09 15:43:20 INFO HoodieActiveTimeline: Loaded instants > java.util.stream.ReferencePipeline$Head@77f7352a > 20/03/09 15:43:21 INFO YarnSchedulerBackend$YarnDriverEndpoint: Registered > executor NettyRpcEndpointRef(spark-client://Executor) (10.49.26.74:40894) > with ID 2 > 20/03/09 15:43:21 INFO ExecutorAllocationManager: New executor 2 has > registered (new total is 1) > 20/03/09 15:43:21 INFO BlockManagerMasterEndpoint: Registering block manager > ip-10-49-26-74.us-east-2.compute.internal:32831 with 12.4 GB RAM, > BlockManagerId(2, ip-10-49-26-74.us-east-2.compute.internal, 3283 > 1, None) > 20/03/09 15:43:21 INFO YarnSchedulerBackend$YarnDriverEndpoint: Registered > executor NettyRpcEndpointRef(spark-client://Executor) (10.49.26.74:40902) > with ID 4 > 20/03/09 15:43:21 INFO ExecutorAllocationManager: New executor 4 has > registered (new total is 2)Exception in thread "main" > java.lang.IllegalStateException: Hudi File Id > (HoodieFileGroupId{partitionPath='created_at_month=2020-03', > fileId='7104bb0b-20f6-4dec-981b-c11bf20ade4a-0'}) has more than 1 pending > compactions. 
Instants: (20200309011643,{"baseInstantTime": "20200308213934", > "deltaFilePaths": > [".7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_20200308213934.log.1_3-751289-170568496", > ".7104bb0b-20f6-4dec-981b-c11 > bf20ade4a-0_20200308213934.log.2_3-761601-172985464", > ".7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_20200308213934.log.3_1-772174-175483657", > ".7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_20200308213934.log.4_2-782377- > 177872977", > ".7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_20200308213934.log.5_1-790994-179909226"], > "dataFilePath": > "7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_0-746201-169642460_20200308213934.parquet", > "fileId": "7 > 104bb0b-20f6-4dec-981b-c11bf20ade4a-0", "partitionPath": > "created_at_month=2020-03", "metrics": {"TOTAL_LOG_FILES": 5.0, > "TOTAL_IO_READ_MB": 512.0, "TOTAL_LOG_FILES_SIZE": 33789.0, > "TOTAL_IO_WRITE_MB": 512.0, > "TOTAL_IO_MB": 1024.0, "TOTAL_LOG_FILE_SIZE": 33789.0}}), > (20200308213934,{"baseInstantTime": "20200308180755", "deltaFilePaths": > [".7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_20200308180755.log.1_3-696047-158157865", > > ".7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_20200308180755.log.2_2-706457-160605423", > > ".7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_20200308180755.log.3_1-716977-163056814", > ".7104bb0b-20f6-4dec-981b-c11bf20ad > e4a-0_20200308180755.log.4_3-727192-165430450", > ".7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_20200308180755.log.5_3-737755-167913339"], > "dataFilePath": "7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_0-690668-157158597_2 > 0200308180755.parquet", "fileId": "7104bb0b-20f6-4dec-981b-c11bf20ade4a-0", > "partitionPath": "created_at_month=2020-03", "metrics": {"TOTAL_LOG_FILES": > 5.0, "TOTAL_IO_READ_MB": 512.0, "TOTAL_LOG_FILES_SIZE": > 44197.0, "
[jira] [Closed] (HUDI-980) Some remnants after running TestHiveSyncTool is created in local source dir
[ https://issues.apache.org/jira/browse/HUDI-980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf closed HUDI-980. -- > Some remnants after running TestHiveSyncTool is created in local source dir > --- > > Key: HUDI-980 > URL: https://issues.apache.org/jira/browse/HUDI-980 > Project: Apache Hudi > Issue Type: Bug > Components: Release Administrative >Affects Versions: 0.5.3 >Reporter: sivabalan narayanan >Assignee: sivabalan narayanan >Priority: Major > Labels: pull-request-available > Fix For: 0.6.0 > > > After running tests in TestHiveSyncTool, metadatastore_db directory is > created in hudi-hive-sync/ . Need to fix this to be generated under the work > dir created as part of the test. > > This in turn creates issues while compiling, due to license header missing in > these generated files. > > ``` > [INFO] hudi-integ-test SKIPPED > [INFO] > > [INFO] BUILD FAILURE > [INFO] > > [INFO] Total time: 20.168 s > [INFO] Finished at: 2020-05-29T11:50:08-04:00 > [INFO] > > [ERROR] Failed to execute goal org.apache.rat:apache-rat-plugin:0.12:check > (default) on project hudi-hive: Too many files with unapproved license: 5 See > RAT report in: > /Users/sivabala/Documents/personal/projects/siva_hudi/apache_hudi/hudi/hudi-hive/target/rat.txt > -> [Help 1] > [ERROR] > [ERROR] To see the full stack trace of the errors, re-run Maven with the -e > switch. > [ERROR] Re-run Maven using the -X switch to enable full debug logging. 
> [ERROR] > ``` > Contents of rat file > ``` > 5 Unknown Licenses > * > Files with unapproved licenses: > /Users/sivabala/Documents/personal/projects/siva_hudi/apache_hudi/hudi/hudi-hive/metastore_db/db.lck > > /Users/sivabala/Documents/personal/projects/siva_hudi/apache_hudi/hudi/hudi-hive/metastore_db/README_DO_NOT_TOUCH_FILES.txt > > /Users/sivabala/Documents/personal/projects/siva_hudi/apache_hudi/hudi/hudi-hive/metastore_db/seg0/README_DO_NOT_TOUCH_FILES.txt > > /Users/sivabala/Documents/personal/projects/siva_hudi/apache_hudi/hudi/hudi-hive/metastore_db/log/README_DO_NOT_TOUCH_FILES.txt > > /Users/sivabala/Documents/personal/projects/siva_hudi/apache_hudi/hudi/hudi-hive/metastore_db/service.properties > ``` > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-980) Some remnants after running TestHiveSyncTool is created in local source dir
[ https://issues.apache.org/jira/browse/HUDI-980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf updated HUDI-980: --- Fix Version/s: 0.6.0 > Some remnants after running TestHiveSyncTool is created in local source dir > --- > > Key: HUDI-980 > URL: https://issues.apache.org/jira/browse/HUDI-980 > Project: Apache Hudi > Issue Type: Bug > Components: Release Administrative >Affects Versions: 0.5.3 >Reporter: sivabalan narayanan >Assignee: sivabalan narayanan >Priority: Major > Labels: pull-request-available > Fix For: 0.6.0 > > > After running tests in TestHiveSyncTool, metadatastore_db directory is > created in hudi-hive-sync/ . Need to fix this to be generated under the work > dir created as part of the test. > > This in turn creates issues while compiling, due to license header missing in > these generated files. > > ``` > [INFO] hudi-integ-test SKIPPED > [INFO] > > [INFO] BUILD FAILURE > [INFO] > > [INFO] Total time: 20.168 s > [INFO] Finished at: 2020-05-29T11:50:08-04:00 > [INFO] > > [ERROR] Failed to execute goal org.apache.rat:apache-rat-plugin:0.12:check > (default) on project hudi-hive: Too many files with unapproved license: 5 See > RAT report in: > /Users/sivabala/Documents/personal/projects/siva_hudi/apache_hudi/hudi/hudi-hive/target/rat.txt > -> [Help 1] > [ERROR] > [ERROR] To see the full stack trace of the errors, re-run Maven with the -e > switch. > [ERROR] Re-run Maven using the -X switch to enable full debug logging. 
> [ERROR] > ``` > Contents of rat file > ``` > 5 Unknown Licenses > * > Files with unapproved licenses: > /Users/sivabala/Documents/personal/projects/siva_hudi/apache_hudi/hudi/hudi-hive/metastore_db/db.lck > > /Users/sivabala/Documents/personal/projects/siva_hudi/apache_hudi/hudi/hudi-hive/metastore_db/README_DO_NOT_TOUCH_FILES.txt > > /Users/sivabala/Documents/personal/projects/siva_hudi/apache_hudi/hudi/hudi-hive/metastore_db/seg0/README_DO_NOT_TOUCH_FILES.txt > > /Users/sivabala/Documents/personal/projects/siva_hudi/apache_hudi/hudi/hudi-hive/metastore_db/log/README_DO_NOT_TOUCH_FILES.txt > > /Users/sivabala/Documents/personal/projects/siva_hudi/apache_hudi/hudi/hudi-hive/metastore_db/service.properties > ``` > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HUDI-1288) DeltaSync:writeToSink fails with Unknown datum type org.apache.avro.JsonProperties$Null
[ https://issues.apache.org/jira/browse/HUDI-1288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17198992#comment-17198992 ] leesf commented on HUDI-1288: - [~soltar] I found there are still some users facing the issue https://github.com/apache/avro/pull/290#issuecomment-625731714. Does 0.5.2-incubating work well for you? > DeltaSync:writeToSink fails with Unknown datum type > org.apache.avro.JsonProperties$Null > --- > > Key: HUDI-1288 > URL: https://issues.apache.org/jira/browse/HUDI-1288 > Project: Apache Hudi > Issue Type: Bug > Components: DeltaStreamer >Reporter: Michal Swiatowy >Priority: Major > > After updating to Hudi version 0.5.3 (prev. 0.5.2-incubating) I ran into the > following error message on write to HDFS: > {code:java} > 2020-09-18 12:54:38,651 [Driver] INFO > HoodieTableMetaClient:initTableAndGetMetaClient:379 - Finished initializing > Table of type MERGE_ON_READ from > /master_data/6FQS/hudi_test/S_INCOMINGMESSAGEDETAIL_CDC > 2020-09-18 12:54:38,663 [Driver] INFO DeltaSync:setupWriteClient:470 - > Setting up Hoodie Write Client > 2020-09-18 12:54:38,695 [Driver] INFO DeltaSync:registerAvroSchemas:522 - > Registering Schema > 
:[{"type":"record","name":"Value","namespace":"ARC_6FQS_W.dbo.S_INCOMINGMESSAGEDETAIL","fields":[{"name":"ID","type":"long"},{"name":"OPTIMISTICLOCK","type":{"type":"long","connect.version":1,"connect.name":"io.debezium.time.Timestamp"}},{"name":"DOCUMENTAMOUNT","type":["null",{"type":"bytes","scale":4,"precision":17,"connect.version":1,"connect.parameters":{"scale":"4","connect.decimal.precision":"17"},"connect.name":"org.apache.kafka.connect.data.Decimal","logicalType":"decimal"}],"default":null},{"name":"DOCUMENTDATE","type":["null",{"type":"long","connect.version":1,"connect.name":"io.debezium.time.Timestamp"}],"default":null},{"name":"DOCUMENTNUMBER","type":["null","string"],"default":null},{"name":"PAYMENTTYPE","type":["null","string"],"default":null},{"name":"PURCHASEORDERNUMBER","type":["null","string"],"default":null},{"name":"VALUEDATE","type":["null",{"type":"long","connect.version":1,"connect.name":"io.debezium.time.Timestamp"}],"default":null},{"name":"INCOMINGMESSAGEHEADERID","type":["null","long"],"default":null},{"name":"MESSAGETEXTID","type":["null","long"],"default":null},{"name":"DUEDATE","type":["null",{"type":"long","connect.version":1,"connect.name":"io.debezium.time.Timestamp"}],"default":null},{"name":"DEBTORASCNUMBER","type":["null","string"],"default":null},{"name":"DOCUMENTTYPE","type":["null","string"],"default":null},{"name":"NUMBEROFDUEDATES","type":["null","string"],"default":null},{"name":"DUEDATEINDICATOR","type":["null","string"],"default":null},{"name":"DISPUTECODE","type":["null","string"],"default":null},{"name":"INSTRUCTIONCODE","type":["null","string"],"default":null},{"name":"PAYMENTTERMS","type":["null","string"],"default":null},{"name":"PAYMENTCONDITION","type":["null","string"],"default":null},{"name":"DISCOUNTDAYS1","type":["null","string"],"default":null},{"name":"DISCOUNTDAYS2","type":["null","string"],"default":null},{"name":"ERRORID","type
[jira] [Updated] (HUDI-1124) Document the usage of Tencent COSN
[ https://issues.apache.org/jira/browse/HUDI-1124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf updated HUDI-1124: Status: Open (was: New) > Document the usage of Tencent COSN > -- > > Key: HUDI-1124 > URL: https://issues.apache.org/jira/browse/HUDI-1124 > Project: Apache Hudi > Issue Type: Improvement > Reporter: leesf >Assignee: deyzhong >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (HUDI-1124) Document the usage of Tencent COSN
[ https://issues.apache.org/jira/browse/HUDI-1124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf closed HUDI-1124. --- > Document the usage of Tencent COSN > -- > > Key: HUDI-1124 > URL: https://issues.apache.org/jira/browse/HUDI-1124 > Project: Apache Hudi > Issue Type: Improvement > Reporter: leesf >Assignee: deyzhong >Priority: Major > Labels: pull-request-available > Fix For: 0.6.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HUDI-1124) Document the usage of Tencent COSN
[ https://issues.apache.org/jira/browse/HUDI-1124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf resolved HUDI-1124. - Fix Version/s: 0.6.0 Resolution: Fixed > Document the usage of Tencent COSN > -- > > Key: HUDI-1124 > URL: https://issues.apache.org/jira/browse/HUDI-1124 > Project: Apache Hudi > Issue Type: Improvement > Reporter: leesf >Assignee: deyzhong >Priority: Major > Labels: pull-request-available > Fix For: 0.6.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-1123) Document the usage of user define metrics reporter
[ https://issues.apache.org/jira/browse/HUDI-1123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf updated HUDI-1123: Status: Open (was: New) > Document the usage of user define metrics reporter > -- > > Key: HUDI-1123 > URL: https://issues.apache.org/jira/browse/HUDI-1123 > Project: Apache Hudi > Issue Type: Improvement > Reporter: leesf >Assignee: Zheren Yu >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HUDI-1123) Document the usage of user define metrics reporter
[ https://issues.apache.org/jira/browse/HUDI-1123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf resolved HUDI-1123. - Fix Version/s: 0.6.0 Resolution: Fixed > Document the usage of user define metrics reporter > -- > > Key: HUDI-1123 > URL: https://issues.apache.org/jira/browse/HUDI-1123 > Project: Apache Hudi > Issue Type: Improvement > Reporter: leesf >Assignee: Zheren Yu >Priority: Major > Labels: pull-request-available > Fix For: 0.6.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (HUDI-1123) Document the usage of user define metrics reporter
[ https://issues.apache.org/jira/browse/HUDI-1123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf closed HUDI-1123. --- > Document the usage of user define metrics reporter > -- > > Key: HUDI-1123 > URL: https://issues.apache.org/jira/browse/HUDI-1123 > Project: Apache Hudi > Issue Type: Improvement > Reporter: leesf >Assignee: Zheren Yu >Priority: Major > Labels: pull-request-available > Fix For: 0.6.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HUDI-1287) Make deltastrmer supports custom ETL transformer
[ https://issues.apache.org/jira/browse/HUDI-1287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17198994#comment-17198994 ] leesf commented on HUDI-1287: - [~liujinhui] DeltaStreamer already supports user-defined transformers; you just need to write your own transformer that implements the Transformer interface. > Make deltastrmer supports custom ETL transformer > > > Key: HUDI-1287 > URL: https://issues.apache.org/jira/browse/HUDI-1287 > Project: Apache Hudi > Issue Type: Improvement > Components: DeltaStreamer >Reporter: liujinhui >Priority: Minor > -- This message was sent by Atlassian Jira (v8.3.4#803005)
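For reference, a minimal sketch of such a transformer. Hudi's actual contract is `org.apache.hudi.utilities.transform.Transformer`, whose `apply()` takes Spark types (`JavaSparkContext`, `SparkSession`, `Dataset<Row>`, `TypedProperties`); to stay self-contained, this sketch models the same shape with plain Java collections, and the class and property names here are illustrative assumptions only:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.stream.Collectors;

// Simplified stand-in for Hudi's Transformer contract: take a batch of
// rows plus configuration, return the transformed batch.
interface SimpleTransformer {
  List<Map<String, Object>> apply(List<Map<String, Object>> rows, Properties props);
}

// Example custom transformer: keep only rows whose "amount" field exceeds
// a threshold read from configuration.
class ThresholdFilterTransformer implements SimpleTransformer {
  @Override
  public List<Map<String, Object>> apply(List<Map<String, Object>> rows, Properties props) {
    long threshold = Long.parseLong(props.getProperty("filter.threshold", "0"));
    return rows.stream()
        .filter(r -> ((Number) r.get("amount")).longValue() > threshold)
        .collect(Collectors.toList());
  }
}

public class TransformerSketch {
  public static void main(String[] args) {
    Properties props = new Properties();
    props.setProperty("filter.threshold", "100");

    Map<String, Object> r1 = new HashMap<>();
    r1.put("id", 1); r1.put("amount", 50);
    Map<String, Object> r2 = new HashMap<>();
    r2.put("id", 2); r2.put("amount", 150);
    List<Map<String, Object>> rows = new ArrayList<>();
    rows.add(r1); rows.add(r2);

    List<Map<String, Object>> out = new ThresholdFilterTransformer().apply(rows, props);
    System.out.println(out.size()); // prints 1: only the 150-amount row survives
  }
}
```

With the real API, the same logic would be expressed as a `Dataset<Row>` filter, and the class name is passed to DeltaStreamer via `--transformer-class`.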
[jira] [Closed] (HUDI-1087) Realtime Record Reader needs to handle decimal types
[ https://issues.apache.org/jira/browse/HUDI-1087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf closed HUDI-1087. --- > Realtime Record Reader needs to handle decimal types > > > Key: HUDI-1087 > URL: https://issues.apache.org/jira/browse/HUDI-1087 > Project: Apache Hudi > Issue Type: Bug > Components: Hive Integration >Reporter: Balaji Varadarajan >Assignee: Wenning Ding >Priority: Blocker > Labels: pull-request-available > Fix For: 0.6.0 > > > For MOR, Realtime queries, decimal types are not getting handled correctly > resulting in the following exception: > > > {{scala> spark.sql("select * from testTable_rt").show > java.lang.ClassCastException: org.apache.hadoop.io.BytesWritable cannot be > cast to org.apache.hadoop.hive.serde2.io.HiveDecimalWritable > at > org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableHiveDecimalObjectInspector.getPrimitiveWritableObject(WritableHiveDecimalObjectInspector.java:41) > at > org.apache.spark.sql.hive.HiveShim$.toCatalystDecimal(HiveShim.scala:107) > at > org.apache.spark.sql.hive.HadoopTableReader$$anonfun$14$$anonfun$apply$11.apply(TableReader.scala:414) > at > org.apache.spark.sql.hive.HadoopTableReader$$anonfun$14$$anonfun$apply$11.apply(TableReader.scala:413) > at > org.apache.spark.sql.hive.HadoopTableReader$$anonfun$fillObject$2.apply(TableReader.scala:442) > at > org.apache.spark.sql.hive.HadoopTableReader$$anonfun$fillObject$2.apply(TableReader.scala:433) > at scala.collection.Iterator$$anon$11.next(Iterator.scala:410) > at scala.collection.Iterator$$anon$11.next(Iterator.scala:410) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown > Source) > at > org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) > at > org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$13$$anon$1.hasNext(WholeStageCodegenExec.scala:636) > at > 
org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:291) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:283) > at > org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:836) > at > org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:836) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:288) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:288) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90) > at org.apache.spark.scheduler.Task.run(Task.scala:123) > at > org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408) > at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748)}} > {{Issue : [https://github.com/apache/hudi/issues/1790]}} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HUDI-1112) Blog on Tracking Hudi Data along transaction time and buisness time
[ https://issues.apache.org/jira/browse/HUDI-1112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf reassigned HUDI-1112: --- Assignee: Sandeep Maji > Blog on Tracking Hudi Data along transaction time and buisness time > --- > > Key: HUDI-1112 > URL: https://issues.apache.org/jira/browse/HUDI-1112 > Project: Apache Hudi > Issue Type: Task > Components: Docs >Reporter: Vinoth Chandar >Assignee: Sandeep Maji >Priority: Major > Fix For: 0.6.0 > > > https://github.com/apache/hudi/issues/1705 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HUDI-1112) Blog on Tracking Hudi Data along transaction time and buisness time
[ https://issues.apache.org/jira/browse/HUDI-1112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17171214#comment-17171214 ] leesf commented on HUDI-1112: - [~nandini57] Assigned to you. > Blog on Tracking Hudi Data along transaction time and buisness time > --- > > Key: HUDI-1112 > URL: https://issues.apache.org/jira/browse/HUDI-1112 > Project: Apache Hudi > Issue Type: Task > Components: Docs >Reporter: Vinoth Chandar >Assignee: Sandeep Maji >Priority: Major > Fix For: 0.6.0 > > > https://github.com/apache/hudi/issues/1705 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (HUDI-841) Abstract common meta sync module support multiple meta service
[ https://issues.apache.org/jira/browse/HUDI-841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf closed HUDI-841. -- > Abstract common meta sync module support multiple meta service > -- > > Key: HUDI-841 > URL: https://issues.apache.org/jira/browse/HUDI-841 > Project: Apache Hudi > Issue Type: Improvement > Components: Hive Integration >Reporter: liwei >Assignee: liwei >Priority: Blocker > Fix For: 0.6.0 > > > Currently Hudi only supports syncing dataset metadata to Hive, through Hive > JDBC and IMetaStoreClient. When you need to sync to other frameworks, such as > AWS Glue or Aliyun DataLake Analytics, you have to copy a lot of code from > HoodieHiveClient, which creates a lot of redundant code. So the hudi-hive-sync > module needs to be redesigned to support other frameworks while reusing the > current code as much as possible: only the interface is provided by Hudi, and > the implementation is customized by the different services such as Hive, AWS > Glue, and Aliyun DataLake Analytics. -- This message was sent by Atlassian Jira (v8.3.4#803005)
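The proposed split can be sketched as follows; all names here are illustrative assumptions, not the final Hudi API. Hudi owns the sync contract and a shared driver, while each metastore service (Hive, AWS Glue, Aliyun DataLake Analytics) supplies only its own client, so none of them copies code from HoodieHiveClient:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// The contract Hudi would own: how to push Hudi table metadata to a metastore.
interface HoodieMetaSyncClient {
  void createTable(String tableName, Map<String, String> schema);
  void addPartitions(String tableName, List<String> partitions);
}

// One concrete service implementation (in-memory, for illustration);
// a Glue or DLA variant would implement the same interface.
class InMemoryMetaSyncClient implements HoodieMetaSyncClient {
  final Map<String, Map<String, String>> tables = new HashMap<>();
  final Map<String, List<String>> partitions = new HashMap<>();

  @Override
  public void createTable(String tableName, Map<String, String> schema) {
    tables.put(tableName, schema);
  }

  @Override
  public void addPartitions(String tableName, List<String> parts) {
    partitions.computeIfAbsent(tableName, k -> new ArrayList<>()).addAll(parts);
  }
}

// The shared sync driver Hudi would provide once and reuse for every service.
public class MetaSyncTool {
  private final HoodieMetaSyncClient client;

  public MetaSyncTool(HoodieMetaSyncClient client) { this.client = client; }

  public void syncHoodieTable(String tableName, Map<String, String> schema,
                              List<String> newPartitions) {
    client.createTable(tableName, schema);
    client.addPartitions(tableName, newPartitions);
  }
}
```

Swapping the target metastore then means constructing `MetaSyncTool` with a different client, with no change to the sync driver itself.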