[GitHub] [hudi] nsivabalan commented on pull request #2697: [HUDI-1211] clean up spark session for each test of FunctionalTestHar…

2021-03-28 Thread GitBox


nsivabalan commented on pull request #2697:
URL: https://github.com/apache/hudi/pull/2697#issuecomment-809070684


   FYI, we merged a hot fix a couple of days back along similar lines. Locally, 
both functional and hudi-utilities tests ran in parallel and we ran into some 
issues.
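
   For reference, a minimal sketch of per-test SparkSession cleanup, assuming a 
JUnit 5 harness (the field and class names here are illustrative, not the actual 
FunctionalTestHarness code):

{code:java}
import org.apache.spark.sql.SparkSession;
import org.junit.jupiter.api.AfterEach;

public class SparkSessionCleanupExample {
  private SparkSession spark;

  @AfterEach
  void cleanUpSparkSession() {
    if (spark != null) {
      spark.stop();                      // stops the underlying SparkContext
      spark = null;
    }
    SparkSession.clearActiveSession();   // so the next test builds a fresh session
    SparkSession.clearDefaultSession();
  }
}
{code}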


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Created] (HUDI-1734) Hive sync script (run_sync_tool.sh) fails w/ ClassNotFound org/apache/log4j/LogManager

2021-03-28 Thread sivabalan narayanan (Jira)
sivabalan narayanan created HUDI-1734:
-

 Summary: Hive sync script (run_sync_tool.sh) fails w/ 
ClassNotFound org/apache/log4j/LogManager
 Key: HUDI-1734
 URL: https://issues.apache.org/jira/browse/HUDI-1734
 Project: Apache Hudi
  Issue Type: Bug
Reporter: sivabalan narayanan


./run_sync_tool.sh --jdbc-url jdbc:hive://dxbigdata102:1000 \
  --user appuser \
  --pass '' \
  --base-path 'hdfs://dxbigdata101:8020/user/hudi/test/data/hudi_trips_cow' \
  --database test \
  --table hudi_trips_cow

 

Exception in thread "main" java.lang.NoClassDefFoundError: 
org/apache/log4j/LogManager
at org.apache.hudi.hive.HiveSyncTool.<init>(HiveSyncTool.java:55)
Caused by: java.lang.ClassNotFoundException: org.apache.log4j.LogManager
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:355)
at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
... 1 more

 

https://github.com/apache/hudi/issues/2728
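
The error surfaces in the HiveSyncTool constructor because the class references 
log4j's LogManager, which must resolve when the class loads. A minimal sketch of 
that pattern (illustrative, not Hudi's exact code); the practical fix is ensuring 
a log4j jar ends up on the classpath that run_sync_tool.sh assembles:

{code:java}
import org.apache.log4j.LogManager;
import org.apache.log4j.Logger;

public class HiveSyncToolLike {
  // Resolved while loading this class: if no log4j jar is on the classpath,
  // the JVM throws NoClassDefFoundError: org/apache/log4j/LogManager here.
  private static final Logger LOG = LogManager.getLogger(HiveSyncToolLike.class);

  public static void main(String[] args) {
    LOG.info("log4j is on the classpath");
  }
}
{code}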



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1734) Hive sync script (run_sync_tool.sh) fails w/ ClassNotFoundError : org/apache/log4j/LogManager

2021-03-28 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-1734:
--
Summary: Hive sync script (run_sync_tool.sh) fails w/ ClassNotFoundError : 
org/apache/log4j/LogManager  (was: Hive sync script (run_sync_tool.sh) fails w/ 
ClassNotFound org/apache/log4j/LogManager)

> Hive sync script (run_sync_tool.sh) fails w/ ClassNotFoundError : 
> org/apache/log4j/LogManager
> -
>
> Key: HUDI-1734
> URL: https://issues.apache.org/jira/browse/HUDI-1734
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: sivabalan narayanan
>Priority: Major
>  Labels: sev:critical, user-support-issues
>
> ./run_sync_tool.sh --jdbc-url jdbc:hive://dxbigdata102:1000 \
>   --user appuser \
>   --pass '' \
>   --base-path 'hdfs://dxbigdata101:8020/user/hudi/test/data/hudi_trips_cow' \
>   --database test \
>   --table hudi_trips_cow
>  
> Exception in thread "main" java.lang.NoClassDefFoundError: 
> org/apache/log4j/LogManager
> at org.apache.hudi.hive.HiveSyncTool.<init>(HiveSyncTool.java:55)
> Caused by: java.lang.ClassNotFoundException: org.apache.log4j.LogManager
> at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:355)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
> ... 1 more
>  
> https://github.com/apache/hudi/issues/2728



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1734) Hive sync script (run_sync_tool.sh) fails w/ ClassNotFound org/apache/log4j/LogManager

2021-03-28 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-1734:
--
Labels: sev:critical user-support-issues  (was: )

> Hive sync script (run_sync_tool.sh) fails w/ ClassNotFound 
> org/apache/log4j/LogManager
> --
>
> Key: HUDI-1734
> URL: https://issues.apache.org/jira/browse/HUDI-1734
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: sivabalan narayanan
>Priority: Major
>  Labels: sev:critical, user-support-issues
>
> ./run_sync_tool.sh --jdbc-url jdbc:hive://dxbigdata102:1000 \
>   --user appuser \
>   --pass '' \
>   --base-path 'hdfs://dxbigdata101:8020/user/hudi/test/data/hudi_trips_cow' \
>   --database test \
>   --table hudi_trips_cow
>  
> Exception in thread "main" java.lang.NoClassDefFoundError: 
> org/apache/log4j/LogManager
> at org.apache.hudi.hive.HiveSyncTool.<init>(HiveSyncTool.java:55)
> Caused by: java.lang.ClassNotFoundException: org.apache.log4j.LogManager
> at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:355)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
> ... 1 more
>  
> https://github.com/apache/hudi/issues/2728



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-1724) run_sync_tool support for hive3.1.2 on hadoop3.1.4

2021-03-28 Thread sivabalan narayanan (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17310413#comment-17310413
 ] 

sivabalan narayanan commented on HUDI-1724:
---

[#2719|https://github.com/apache/hudi/pull/2719]

> run_sync_tool support for hive3.1.2 on hadoop3.1.4
> --
>
> Key: HUDI-1724
> URL: https://issues.apache.org/jira/browse/HUDI-1724
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Hive Integration
>Reporter: Balaji Varadarajan
>Priority: Major
>  Labels: sev:triage, user-support-issues
>
> Context: https://github.com/apache/hudi/issues/2717



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1724) run_sync_tool support for hive3.1.2 on hadoop3.1.4

2021-03-28 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-1724:
--
Labels: sev:triage user-support-issues  (was: sev:high user-support-issues)

> run_sync_tool support for hive3.1.2 on hadoop3.1.4
> --
>
> Key: HUDI-1724
> URL: https://issues.apache.org/jira/browse/HUDI-1724
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Hive Integration
>Reporter: Balaji Varadarajan
>Priority: Major
>  Labels: sev:triage, user-support-issues
>
> Context: https://github.com/apache/hudi/issues/2717



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-1572) timeline-server request exception

2021-03-28 Thread sivabalan narayanan (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17310400#comment-17310400
 ] 

sivabalan narayanan commented on HUDI-1572:
---

[~nishith29]: can you please triage this?

> timeline-server request exception
> -
>
> Key: HUDI-1572
> URL: https://issues.apache.org/jira/browse/HUDI-1572
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Utilities
>Reporter: hushenmin
>Priority: Major
> Attachments: image-2021-02-02-18-08-59-503.png
>
>
> When I use hudi with timeline-service installed and deployed in standalone 
> mode, every time I initiate a get/post request an error is 
> reported. I traced the corresponding source code and found that checkArgument 
> (org.apache.hudi.common.util.ValidationUtils.checkArgument) will always throw 
> an exception. The information returned to me by the server is "internal server 
> error".
> Java stack info :
> java.lang.IllegalArgumentException: Last known instant from client was 0 but 
> server has the following timeline [[20210115214840__commit__COMPLETED], 
> [20210120101841__commit__COMPLETED]] at 
> org.apache.hudi.common.util.ValidationUtils.checkArgument(ValidationUtils.java:40)
>  at 
> org.apache.hudi.timeline.service.FileSystemViewHandler$ViewHandler.handle(FileSystemViewHandler.java:372)
>  at io.javalin.security.SecurityUtil.noopAccessManager(SecurityUtil.kt:22) at 
> io.javalin.Javalin.lambda$addHandler$0(Javalin.java:606) at 
> io.javalin.core.JavalinServlet$service$2$1.invoke(JavalinServlet.kt:46) at 
> io.javalin.core.JavalinServlet$service$2$1.invoke(JavalinServlet.kt:17) at 
> io.javalin.core.JavalinServlet$service$1.invoke(JavalinServlet.kt:143) at 
> io.javalin.core.JavalinServlet$service$2.invoke(JavalinServlet.kt:41) at 
> io.javalin.core.JavalinServlet.service(JavalinServlet.kt:107) at 
> io.javalin.core.util.JettyServerUtil$initialize$httpHandler$1.doHandle(JettyServerUtil.kt:72)
>  at 
> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)
>  at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:480) 
> at 
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1668)
>  at 
> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)
>  at 
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1247)
>  at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144) 
> at org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:61) 
> at 
> org.eclipse.jetty.server.handler.StatisticsHandler.handle(StatisticsHandler.java:174)
>  at 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
>  at org.eclipse.jetty.server.Server.handle(Server.java:502) at 
> org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:370) at 
> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:267) 
> at 
> org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:305)
>  at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103) at 
> org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:117) at 
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:333)
>  at 
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:310)
>  at 
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168)
>  at 
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:126)
>  at 
> org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:366)
>  at 
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:765)
>  at 
> org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:683)
>  at java.lang.Thread.run(Thread.java:748) 2021-02-02 16:54:04,417 INFO 
> service.FileSystemViewHandler: TimeTakenMillis[Total=5328, Refresh=179, 
> handle=27, Check=0], Success=false, 
> Query=partition=1007=/user/hushenmin/warehouse/datalake/sampletable, 
> Host=localhost:26754, synced=true 2021-02-02 16:54:59,901 ERROR 
> service.FileSystemViewHandler: Got runtime exception servicing request null 
> java.lang.NullPointerException at 
> java.util.concurrent.ConcurrentHashMap.computeIfAbsent(ConcurrentHashMap.java:1645)
>  at 
> org.apache.hudi.common.table.view.FileSystemViewManager.getFileSystemView(FileSystemViewManager.java:96)
>  at 
> org.apache.hudi.timeline.service.handlers.TimelineHandler.getLastInstant(TimelineHandler.java:42)
>  at 
> org.apache.hudi.timeline.service.FileSystemViewHandler.lambda$registerTimelineAPI$0(FileSystemViewHandler.java:148)
>  at 
> 

[jira] [Commented] (HUDI-1574) Trim existing unit tests to finish in much shorter amount of time

2021-03-28 Thread sivabalan narayanan (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17310398#comment-17310398
 ] 

sivabalan narayanan commented on HUDI-1574:
---

We could do one simple fix: TestHoodieMultiTableDeltaStreamer extends 
TestHoodieDeltaStreamer, and hence I guess it runs all the tests from 
TestHoodieDeltaStreamer again.
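
A minimal sketch of the pitfall and one possible fix, assuming JUnit 5 (class 
names abbreviated, not the real test classes):

{code:java}
import org.junit.jupiter.api.Test;

// Pitfall: JUnit runs inherited @Test methods, so this subclass re-executes
// every test defined in the parent in addition to its own.
class BaseDeltaStreamerTest {
  @Test
  void testIngest() { /* ... */ }
}

class MultiTableDeltaStreamerTest extends BaseDeltaStreamerTest {
  @Test
  void testMultiTableIngest() { /* the parent's testIngest() also runs here */ }
}

// One possible fix: share only the fixture and helpers, not the @Test methods.
abstract class DeltaStreamerTestHarness {
  protected void ingest(String table) { /* shared setup + helpers, no @Test */ }
}
{code}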

> Trim existing unit tests to finish in much shorter amount of time
> -
>
> Key: HUDI-1574
> URL: https://issues.apache.org/jira/browse/HUDI-1574
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Testing
>Affects Versions: 0.9.0
>Reporter: Vinoth Chandar
>Priority: Major
> Fix For: 0.9.0
>
>
> spark-client-tests
> 278.165 s - in org.apache.hudi.table.TestHoodieMergeOnReadTable
> 201.628 s - in org.apache.hudi.metadata.TestHoodieBackedMetadata
> 185.716 s - in org.apache.hudi.client.TestHoodieClientOnCopyOnWriteStorage
> 158.361 s - in org.apache.hudi.index.TestHoodieIndex
> 156.196 s - in org.apache.hudi.table.TestCleaner
> 132.369 s - in 
> org.apache.hudi.table.action.commit.TestCopyOnWriteActionExecutor
> 93.307 s - in org.apache.hudi.table.action.compact.TestAsyncCompaction
> 67.301 s - in org.apache.hudi.table.upgrade.TestUpgradeDowngrade
> 45.794 s - in org.apache.hudi.client.TestHoodieReadClient
> 38.615 s - in org.apache.hudi.index.bloom.TestHoodieBloomIndex
> 31.181 s - in org.apache.hudi.client.TestTableSchemaEvolution
> 20.072 s - in org.apache.hudi.table.action.compact.TestInlineCompaction
> grep " Time elapsed" hudi-client/hudi-spark-client/target/surefire-reports/* 
> | awk -F',' ' { print $5 } ' | awk -F':' ' { print $2 } ' | sort -nr | less
> hudi-utilities
> 209.936 s - in org.apache.hudi.utilities.functional.TestHoodieDeltaStreamer
> 204.653 s - in 
> org.apache.hudi.utilities.functional.TestHoodieMultiTableDeltaStreamer
> 34.116 s - in org.apache.hudi.utilities.sources.TestKafkaSource
> 29.865 s - in org.apache.hudi.utilities.sources.TestParquetDFSSource
> 26.189 s - in 
> org.apache.hudi.utilities.sources.helpers.TestDatePartitionPathSelector
> Other Tests
> 42.595 s - in org.apache.hudi.common.functional.TestHoodieLogFormat
> 38.918 s - in org.apache.hudi.common.bootstrap.TestBootstrapIndex
> 22.046 s - in 
> org.apache.hudi.common.functional.TestHoodieLogFormatAppendFailure



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-1580) Disaster Recovery in case where HBASE index table becomes unrecoverable

2021-03-28 Thread sivabalan narayanan (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17310395#comment-17310395
 ] 

sivabalan narayanan commented on HUDI-1580:
---

CC [~nishith29]: do we have any such tool within Uber that you can upstream? Or 
do we have any options ATM until the tool is built?

> Disaster Recovery in case where HBASE index table becomes unrecoverable
> ---
>
> Key: HUDI-1580
> URL: https://issues.apache.org/jira/browse/HUDI-1580
> Project: Apache Hudi
>  Issue Type: New Feature
>  Components: Index
>Affects Versions: 0.9.0
>Reporter: Ryan Pifer
>Priority: Major
>  Labels: sev:high, user-support-issues
> Fix For: 0.9.0
>
>
> If an HBASE table becomes corrupted for any reason, we should think of a way 
> to provide some sort of built-in disaster recovery to re-build the 
> index. Currently the only way to do this without any extra utilities is to 
> rewrite the entire dataset.
> Can we create a CLI command which rebuilds just the index?
> Can we add checkpointing to the hbase table to reduce disaster recovery time? 
> i.e. only re-build the index for records written after the last checkpoint time



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1580) Disaster Recovery in case where HBASE index table becomes unrecoverable

2021-03-28 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-1580:
--
Labels: sev:high user-support-issues  (was: )

> Disaster Recovery in case where HBASE index table becomes unrecoverable
> ---
>
> Key: HUDI-1580
> URL: https://issues.apache.org/jira/browse/HUDI-1580
> Project: Apache Hudi
>  Issue Type: New Feature
>  Components: Index
>Affects Versions: 0.9.0
>Reporter: Ryan Pifer
>Priority: Major
>  Labels: sev:high, user-support-issues
> Fix For: 0.9.0
>
>
> If an HBASE table becomes corrupted for any reason, we should think of a way 
> to provide some sort of built-in disaster recovery to re-build the 
> index. Currently the only way to do this without any extra utilities is to 
> rewrite the entire dataset.
> Can we create a CLI command which rebuilds just the index?
> Can we add checkpointing to the hbase table to reduce disaster recovery time? 
> i.e. only re-build the index for records written after the last checkpoint time



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-1585) Reduce the coupling of hadoop.

2021-03-28 Thread sivabalan narayanan (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17310394#comment-17310394
 ] 

sivabalan narayanan commented on HUDI-1585:
---

CC: [~nishith29] [~vinoth], your thoughts?

> Reduce the coupling of hadoop.
> --
>
> Key: HUDI-1585
> URL: https://issues.apache.org/jira/browse/HUDI-1585
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Common Core, Flink Integration
>Reporter: 张超明
>Priority: Major
>
> In my opinion, the hadoop configuration should be read entirely from the actual 
> environment. For example, {{fs.hdfs.impl}} may be 
> {{org.apache.hadoop.fs.viewfs.ViewFileSystem}} rather than 
> {{org.apache.hadoop.hdfs.DistributedFileSystem}}. Now, the implementation is 
> forced to be set to {{org.apache.hadoop.hdfs.DistributedFileSystem}}, which 
> is not a graceful solution.
> Also, the hadoop dependency scope level should be 'test'.
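
A minimal sketch of the environment-driven alternative being suggested 
(hypothetical call site, not Hudi's actual code): resolve the FileSystem through 
the loaded Hadoop Configuration instead of forcing fs.hdfs.impl.

{code:java}
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class FsResolutionExample {
  public static FileSystem open(String basePath) throws Exception {
    // Loads core-site.xml/hdfs-site.xml from the classpath, so fs.hdfs.impl may
    // resolve to ViewFileSystem on federated clusters rather than
    // DistributedFileSystem; nothing is forced here.
    Configuration conf = new Configuration();
    return FileSystem.get(URI.create(basePath), conf);
  }
}
{code}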



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1592) Metadata listing fails for non-partitioned dataset

2021-03-28 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-1592:
--
Labels: sev:high user-support-issues  (was: )

> Metadata listing fails for non-partitioned dataset
> -
>
> Key: HUDI-1592
> URL: https://issues.apache.org/jira/browse/HUDI-1592
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Storage Management
>Affects Versions: 0.7.0
>Reporter: sivabalan narayanan
>Assignee: Prashant Wason
>Priority: Major
>  Labels: sev:high, user-support-issues
>
> https://github.com/apache/hudi/issues/2507



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HUDI-1593) Add support for "show restore" in hudi-cli

2021-03-28 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan reassigned HUDI-1593:
-

Assignee: sivabalan narayanan

> Add support for "show restore" in hudi-cli
> --
>
> Key: HUDI-1593
> URL: https://issues.apache.org/jira/browse/HUDI-1593
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Utilities
>Affects Versions: 0.9.0
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Major
>  Labels: user-support-issues
> Fix For: 0.9.0
>
>
> https://github.com/apache/hudi/issues/2072



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HUDI-1594) Add support for clustering node and validating async operations

2021-03-28 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan resolved HUDI-1594.
---
Fix Version/s: 0.8.0
   Resolution: Fixed

> Add support for clustering node and validating async operations
> ---
>
> Key: HUDI-1594
> URL: https://issues.apache.org/jira/browse/HUDI-1594
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Testing
>Reporter: sivabalan narayanan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> Add support for clustering node and validating async operations to test suite 
> job



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1594) Add support for clustering node and validating async operations

2021-03-28 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-1594:
--
Status: In Progress  (was: Open)

> Add support for clustering node and validating async operations
> ---
>
> Key: HUDI-1594
> URL: https://issues.apache.org/jira/browse/HUDI-1594
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Testing
>Reporter: sivabalan narayanan
>Priority: Major
>  Labels: pull-request-available
>
> Add support for clustering node and validating async operations to test suite 
> job



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-1597) Maven build fail

2021-03-28 Thread sivabalan narayanan (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17310389#comment-17310389
 ] 

sivabalan narayanan commented on HUDI-1597:
---

[~caidezhi655] [~vinoth]: any more pending work on this (like updating the README 
or anything)? I see all related PRs are merged. 

> Maven build fail
> 
>
> Key: HUDI-1597
> URL: https://issues.apache.org/jira/browse/HUDI-1597
> Project: Apache Hudi
>  Issue Type: Bug
>Affects Versions: 0.9.0
>Reporter: Dezhi Cai
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> Maven build fails because the Spring repo no longer supports anonymous download 
> of 3rd-party Maven Central artifacts from 
> [repo.spring.io|https://repo.spring.io/].
> The master branch has been fixed, but other branches still have the problem.
> Affected branches: 0.7.0, 0.6.0, 0.5.3, 0.5.2, 0.5.1
> Solution: backport the fix
> Ref link: 
> [https://spring.io/blog/2020/10/29/notice-of-permissions-changes-to-repo-spring-io-fall-and-winter-2020]
>  
> build error:
> [INFO] Total time: 22.357 s (Wall Clock)
>  [INFO] Finished at: 2021-02-08T13:21:10+08:00
>  [INFO] 
> 
>  [ERROR] Failed to execute goal on project hudi-hadoop-mr: Could not resolve 
> dependencies for project org.apache.hudi:hudi-hadoop-mr:jar:0.7.0: Could not 
> transfer artifact org.pentaho:pentaho-aggdesigner-algorithm:jar:5.1.5-jhyde 
> from/to libs-milestone ([https://repo.spring.io/libs-milestone/]): 
> Authentication failed for 
> [https://repo.spring.io/libs-milestone/org/pentaho/pentaho-aggdesigner-algorithm/5.1.5-jhyde/pentaho-aggdesigner-algorithm-5.1.5-jhyde.jar]
>  401 Unauthorized -> [Help 1]
>  [ERROR] 
>  [ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
> switch.
>  [ERROR] Re-run Maven using the -X switch to enable full debug logging.
>  [ERROR]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1595) Support AWS Glue Schema Registry provider with Delta Streamer

2021-03-28 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-1595:
--
Labels: sev:normal user-support-issues  (was: )

> Support AWS Glue Schema Registry provider with Delta Streamer
> -
>
> Key: HUDI-1595
> URL: https://issues.apache.org/jira/browse/HUDI-1595
> Project: Apache Hudi
>  Issue Type: New Feature
>  Components: DeltaStreamer
>Reporter: Udit Mehrotra
>Priority: Major
>  Labels: sev:normal, user-support-issues
>
> This feature has been asked for in [https://github.com/apache/hudi/issues/2527] 
> and in the Hudi slack channel as well.
> This would be a very useful addition to Hudi that can help easily consume 
> records from Amazon MSK, Kafka, Kinesis, etc. using the *AWS Glue Schema 
> Registry*.
> Reference: 
> [https://docs.aws.amazon.com/glue/latest/dg/schema-registry-integrations.html]
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-1600) Fix document to reflect Hudi supports MOR for spark datasource for incremental queries

2021-03-28 Thread sivabalan narayanan (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17310385#comment-17310385
 ] 

sivabalan narayanan commented on HUDI-1600:
---

CC [~garyli]

> Fix document to reflect Hudi supports MOR for spark datasource for 
> incremental queries
> --
>
> Key: HUDI-1600
> URL: https://issues.apache.org/jira/browse/HUDI-1600
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Docs
>Affects Versions: 0.9.0
>Reporter: Bhavani Sudha
>Assignee: Gary Li
>Priority: Minor
> Fix For: 0.9.0
>
>
> The document should be updated to reflect the same 
> [https://hudi.apache.org/docs/querying_data.html#merge-on-read-tables]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HUDI-1600) Fix document to reflect Hudi supports MOR for spark datasource for incremental queries

2021-03-28 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan reassigned HUDI-1600:
-

Assignee: Gary Li  (was: sivabalan narayanan)

> Fix document to reflect Hudi supports MOR for spark datasource for 
> incremental queries
> --
>
> Key: HUDI-1600
> URL: https://issues.apache.org/jira/browse/HUDI-1600
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Docs
>Affects Versions: 0.9.0
>Reporter: Bhavani Sudha
>Assignee: Gary Li
>Priority: Minor
> Fix For: 0.9.0
>
>
> The document should be updated to reflect the same 
> [https://hudi.apache.org/docs/querying_data.html#merge-on-read-tables]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1604) Fix archival max log size and potentially a bug in archival

2021-03-28 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-1604:
--
Labels: sev:triage user-support-issues  (was: user-support-issues)

> Fix archival max log size and potentially a bug in archival
> ---
>
> Key: HUDI-1604
> URL: https://issues.apache.org/jira/browse/HUDI-1604
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Cleaner
>Affects Versions: 0.7.0
>Reporter: sivabalan narayanan
>Priority: Major
>  Labels: sev:triage, user-support-issues
>
> Gist of the issue from Udit
>  
> I took a deeper look at this. For you this seems to be happening in the 
> archival code path:
>  
> {{ at 
> org.apache.hudi.table.HoodieTimelineArchiveLog.writeToFile(HoodieTimelineArchiveLog.java:309)
>  at 
> org.apache.hudi.table.HoodieTimelineArchiveLog.archive(HoodieTimelineArchiveLog.java:282)
>  at 
> org.apache.hudi.table.HoodieTimelineArchiveLog.archiveIfRequired(HoodieTimelineArchiveLog.java:133)
>  at 
> org.apache.hudi.client.HoodieWriteClient.postCommit(HoodieWriteClient.java:381)}}
> This is in {{HoodieTimelineArchiveLog}}, where it needs to write log files with 
> commit records, similar to how log files are written for MOR tables. However, 
> in this code I notice a couple of issues:
>  * The default maximum log block size of 256 MB defined 
> [here|https://github.com/apache/hudi/blob/master/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieStorageConfig.java#L51],
>  is not utilized for this class and is only used for the MOR log blocks 
> writing case. As a result, there is no real control over the block size that 
> it can end up writing which can potentially overflow 
> {{ByteArrayOutputStream}}, whose maximum size is {{Integer.MAX_VALUE - 8}}. 
> That is what seems to be happening in this scenario here because of an 
> integer overflow following that code path inside {{ByteArrayOutputStream}}. 
> So we need to use the maximum block size concept here as well.
>  * In addition I see a bug in code 
> [here|https://github.com/apache/hudi/blob/master/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/HoodieTimelineArchiveLog.java#L302]
>  where even after flushing the records out to a file after a batch size of 
> 10 (default), it does not clear the list and just goes on accumulating the 
> records. This seems logically wrong as well (duplication), apart from the 
> fact that it keeps increasing the log block size it is writing.
> Reference: https://github.com/apache/hudi/issues/2408#issuecomment-758320870
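
A minimal sketch of the second bug and its fix (heavily simplified relative to 
the real HoodieTimelineArchiveLog): the batch list must be cleared after each 
flush, otherwise every later flush rewrites all previously flushed records.

{code:java}
import java.util.ArrayList;
import java.util.List;

public class BatchedArchiveExample {
  private static final int BATCH_SIZE = 10; // the default batch size mentioned above

  void archive(List<String> instants) {
    List<String> records = new ArrayList<>();
    for (String instant : instants) {
      records.add(instant);
      if (records.size() >= BATCH_SIZE) {
        writeToFile(records);
        records.clear(); // the missing step: without this, records accumulate
                         // and each flush duplicates everything written so far
      }
    }
    if (!records.isEmpty()) {
      writeToFile(records);
    }
  }

  void writeToFile(List<String> records) { /* append one log block */ }
}
{code}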



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1607) Decimal handling bug in SparkAvroPostProcessor

2021-03-28 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-1607:
--
Labels: sev:critical user-support-issues  (was: )

> Decimal handling bug in SparkAvroPostProcessor 
> ---
>
> Key: HUDI-1607
> URL: https://issues.apache.org/jira/browse/HUDI-1607
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Jingwei Zhang
>Priority: Major
>  Labels: sev:critical, user-support-issues
>
> This issue relates to [HUDI-1343|https://github.com/apache/hudi/pull/2192].
> I think the purpose of HUDI-1343 was to bridge the difference between avro 
> 1.8.2 (used by hudi) and avro 1.9.2 (used by the upstream system) through the 
> internal Struct type, in particular the incompatible forms the two versions 
> use to express nullable types. 
> It was all good until I hit the Decimal type. Since it can be either FIXED or 
> BYTES, if an avro schema contains a decimal type with BYTES as its literal 
> type, after this two-way conversion its literal type becomes FIXED instead. 
> This causes an exception to be thrown in AvroConversionHelper, as the data 
> underneath is a HeapByteBuffer rather than a GenericFixed.
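
For illustration, a minimal sketch of the two physical encodings of Avro's 
decimal logical type (the precision/scale values are arbitrary): a round trip 
that silently switches BYTES to FIXED changes the runtime datum from a 
ByteBuffer to a GenericFixed, which is what trips AvroConversionHelper.

{code:java}
import org.apache.avro.LogicalTypes;
import org.apache.avro.Schema;

public class DecimalEncodings {
  public static void main(String[] args) {
    // decimal backed by BYTES: datums are java.nio.ByteBuffer (e.g. HeapByteBuffer)
    Schema bytesDecimal = LogicalTypes.decimal(10, 2)
        .addToSchema(Schema.create(Schema.Type.BYTES));

    // decimal backed by FIXED: datums are org.apache.avro.generic.GenericFixed
    Schema fixedDecimal = LogicalTypes.decimal(10, 2)
        .addToSchema(Schema.createFixed("dec", null, "example", 5));

    System.out.println(bytesDecimal);
    System.out.println(fixedDecimal);
  }
}
{code}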



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1627) DeltaStreamer does not work with SQLtransformation

2021-03-28 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-1627:
--
Labels: sev:triage user-support-issues  (was: )

> DeltaStreamer does not work with SQLtransformation 
> ---
>
> Key: HUDI-1627
> URL: https://issues.apache.org/jira/browse/HUDI-1627
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Wenning Ding
>Priority: Major
>  Labels: sev:triage, user-support-issues
>
> Hudi DeltaStreamer, using a SQL transformation:
> 21/02/04 17:19:50 INFO SqlQueryBasedTransformer: SQL Query for transformation 
> : (select order_date from `default`.ordertest)
> org.apache.spark.sql.AnalysisException: Table or view not found: 
> `default`.`ordertest`; line 1 pos 23;
> 'Project ['order_date]
> +- 'UnresolvedRelation `default`.`ordertest` 
>  
> This is because:
> DeltaStreamer does not enable HiveSupport when initiating the SparkSession.
> this.sparkSession = 
> SparkSession.builder().config(jssc.getConf()).getOrCreate();
> line 526 in 
> [https://github.com/a0x8o/hudi/blob/b8d0747959bc6f101b5b90b8e3ad323aafa2aa6e/hudi-u[…]rg/apache/hudi/utilities/deltastreamer/HoodieDeltaStreamer.java|https://github.com/a0x8o/hudi/blob/b8d0747959bc6f101b5b90b8e3ad323aafa2aa6e/hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/HoodieDeltaStreamer.java]
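
A minimal sketch of the fix implied above (assuming Hive support should always 
be enabled; the real change may need to be conditional, and it requires Hive 
classes on the classpath):

{code:java}
import org.apache.spark.SparkConf;
import org.apache.spark.sql.SparkSession;

public class HiveEnabledSessionExample {
  static SparkSession build(SparkConf conf) {
    return SparkSession.builder()
        .config(conf)            // the same conf the DeltaStreamer already passes (jssc.getConf())
        .enableHiveSupport()     // the missing call: lets SQL transformers see Hive tables
        .getOrCreate();
  }
}
{code}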



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1633) Make callback return HoodieWriteStat

2021-03-28 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-1633:
--
Labels: pull-request-available sev:normal  (was: pull-request-available)

> Make callback return HoodieWriteStat
> 
>
> Key: HUDI-1633
> URL: https://issues.apache.org/jira/browse/HUDI-1633
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: liujinhui
>Assignee: liujinhui
>Priority: Minor
>  Labels: pull-request-available, sev:normal
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1640) Implement Spark Datasource option to read hudi configs from properties file

2021-03-28 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-1640:
--
Labels: sev:high user-support-issues  (was: )

> Implement Spark Datasource option to read hudi configs from properties file
> ---
>
> Key: HUDI-1640
> URL: https://issues.apache.org/jira/browse/HUDI-1640
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Spark Integration
>Reporter: Balaji Varadarajan
>Priority: Major
>  Labels: sev:high, user-support-issues
>
> Provide a config option like "hoodie.datasource.props.file" to load all the 
> options from a file.
>  
> GH: https://github.com/apache/hudi/issues/2605
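
Until such an option exists, a minimal sketch of the equivalent done by hand 
(the file path and property names are hypothetical): load the options from a 
properties file and pass them to the datasource writer.

{code:java}
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;

public class PropsFileWriteExample {
  static void write(Dataset<Row> df, String propsPath, String basePath) throws Exception {
    Properties props = new Properties();
    try (InputStream in = Files.newInputStream(Paths.get(propsPath))) {
      props.load(in); // e.g. hoodie.datasource.write.recordkey.field=..., etc.
    }
    Map<String, String> options = new HashMap<>();
    props.stringPropertyNames().forEach(k -> options.put(k, props.getProperty(k)));
    df.write().format("org.apache.hudi").options(options).mode(SaveMode.Append).save(basePath);
  }
}
{code}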



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-1642) Add Links to Uber engineering blog and meet up slides

2021-03-28 Thread sivabalan narayanan (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17310361#comment-17310361
 ] 

sivabalan narayanan commented on HUDI-1642:
---

Please close this if already done.

> Add Links to Uber engineering blog and meet up slides
> -
>
> Key: HUDI-1642
> URL: https://issues.apache.org/jira/browse/HUDI-1642
> Project: Apache Hudi
>  Issue Type: Task
>  Components: Docs
>Reporter: Nishith Agarwal
>Assignee: Nishith Agarwal
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1652) DiskBasedMap:As time goes by, the number of /temp/***** file handles held by the executor process is increasing

2021-03-28 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-1652:
--
Labels: sev:critical user-support-issues  (was: )

> DiskBasedMap:As time goes by, the number of /temp/* file handles held by 
> the executor process is increasing
> ---
>
> Key: HUDI-1652
> URL: https://issues.apache.org/jira/browse/HUDI-1652
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: DeltaStreamer
>Affects Versions: 0.6.0
>Reporter: wangmeng
>Priority: Major
>  Labels: sev:critical, user-support-issues
>
> We encountered a problem in the hudi production environment, which is very 
> similar to the HUDI-945 problem.
>  *Software environment:* spark 2.4.5, hudi 0.6
>  *Scenario:* consume Kafka data and write hudi, using spark streaming 
> (non-StructedStreaming).
>  *Problem:* As time goes by, the number of /temp/* file handles held by 
> the executor process is increasing.
> "
> /tmp/10ded0f7-1bcc-4316-91e9-9b4d0507e1e0
>  /tmp/49251680-0efd-4cc4-a55e-1af2038d3900
>  /tmp/cc7dd284-3444-4c17-a5c8-84b3090c17f9
> "
>  *Reason analysis:* ExternalSpillableMap is used in HoodieMergeHandle class, 
> and DiskBasedMap is used to flush overflowed data to the disk. But the file 
> stream can only be closed and deleted by the hook when the jvm exits. When 
> the clear method is executed in the program, the stream is not closed and the 
> file is not deleted. As a result, over time, more and more file handles are 
> still held, leading to errors. This error is similar to Hudi-945.
>  
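
A minimal sketch of the described fix (heavily simplified relative to the real 
DiskBasedMap): close the spill-file stream and delete the file eagerly in 
clear()/close(), instead of relying only on a JVM shutdown hook.

{code:java}
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

public class SpillFileCleanupExample {
  private final File spillFile;
  private RandomAccessFile writeHandle;

  SpillFileCleanupExample(File spillFile) throws IOException {
    this.spillFile = spillFile;
    this.writeHandle = new RandomAccessFile(spillFile, "rw");
    spillFile.deleteOnExit(); // a shutdown hook alone is not enough for long-running executors
  }

  public void clear() throws IOException {
    close(); // release the handle now, not at JVM exit
  }

  public void close() throws IOException {
    if (writeHandle != null) {
      writeHandle.close();
      writeHandle = null;
    }
    spillFile.delete(); // reclaim /tmp space eagerly
  }
}
{code}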



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1649) Bugs with Metadata Table in 0.7 release

2021-03-28 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-1649:
--
Labels: sev:high user-support-issues  (was: )

> Bugs with Metadata Table in 0.7 release
> ---
>
> Key: HUDI-1649
> URL: https://issues.apache.org/jira/browse/HUDI-1649
> Project: Apache Hudi
>  Issue Type: Sub-task
>Affects Versions: 0.9.0
>Reporter: Prashant Wason
>Assignee: Prashant Wason
>Priority: Critical
>  Labels: sev:high, user-support-issues
> Fix For: 0.9.0
>
>
> We have discovered the following issues while using the Metadata Table code 
> in production:
>  
> *Issue 1: Automatic rollbacks during commit get a timestamp which is out of 
> order*
> Suppose commit C1 failed. The next commit will try to roll back C1 
> automatically. This will create the following two instants, C2.commit and 
> R3.rollback. Hence, the rollback will have a timestamp greater than that of 
> the commit which logically occurs after it. 
> This is because of how the code is implemented in 
> AbstractHoodieWriteClient.startCommitWithTime() where the timestamp of the 
> next commit is chosen before the timestamp of the rollback instant.
>  
> *Issue 2: Syncing of rollbacks is not working*
> Due to the above HUDI issue, syncing of rollbacks in Metadata Table does not 
> work correctly. 
> Assume the timeline is as follows: 
> Dataset Timeline: C1 C2 C3
> Metadata Timeline: DC1 DC2  (DC = delta-commit)
>  
> Suppose the next commit C4 fails. When C5 is attempted, C4 will be 
> automatically rolled back. Due to issue #1, the timelines will become as 
> follows:
> Dataset Timeline: C1 C2 C3 C5 R6 
> Metadata Timeline: DC1 DC2 
> Now if the Metadata Table is synced (AbstractHoodieWriteClient.postCommit), 
> the code will end up processing C5 first and then R6 which will mean that the 
> file rolled back in R6 will be committed to the metadata table as deleted 
> files. There is logic within 
> HoodieTableMetadataUtils.processRollbackMetadata() to ignore R6 in this 
> scenario but it does not work because of the issue #1.
>   
> *Issue #3: Rollback instants are deleted inline*
> Current rollback code deletes older instants inline. The delete logic keeps the 
> oldest ten instants (hardcoded) and removes all more-recent rollback 
> instants. Furthermore, the deletion ONLY deletes the rollback.complete files 
> and does not remove the corresponding rollback.inflight files. 
> Hence, with many rollbacks the following timeline is possible:
> Timeline: C1 C2 C3 C4 R5.inflight C5 C6 C7 ...
> (there are 9 rollback instants prior to R5).
>  
> *Issue #4: Metadata Table reader does not show correct view of the metadata*
> Assume the timeline is as in Issue #3 with a leftover rollback.inflight 
> instant. Also assume that the metadata table is synced only till C4. The 
> MetadataTableWriter will not sync any more instants to the Metadata Table 
> since an incomplete instant is present next.
> The same sync logic is also used by the MetadataReader to perform the 
> in-memory merge of timeline. Hence, the reader will also not consider C5, C6 
> and C7 thereby providing an incorrect and older view of the FileSlices and 
> FileGroups. 
>  
> Any future ingestion into this table MAY insert data into older versions of 
> the FileSlices which will end up being a data loss when queried.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1656) Loading history data to new hudi table taking longer time

2021-03-28 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-1656:
--
Labels: sev:triage user-support-issues  (was: )

> Loading history data to new hudi table taking longer time
> -
>
> Key: HUDI-1656
> URL: https://issues.apache.org/jira/browse/HUDI-1656
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: newbie
>Reporter: Fredrick jose antony cruz
>Priority: Major
>  Labels: sev:triage, user-support-issues
> Fix For: 0.7.0
>
>
> spark-submit --jars 
> /u/users/svcordrdats/order_hudi_poc/hudi-support-jars/org.apache.avro_avro-1.8.2.jar,/u/users/svcordrdats/order_hudi_poc/hudi-support-jars/spark-avro_2.11-2.4.4.jar,/u/users/svcordrdats/order_hudi_poc/hudi-support-jars/hudi-spark-bundle_2.11-0.7.0.jar
>  --master yarn --deploy-mode cluster --num-executors 50 --executor-cores 4 
> --executor-memory 32g --driver-memory=24g --queue=default --conf 
> spark.serializer=org.apache.spark.serializer.KryoSerializer --conf 
> spark.driver.extraClassPath=org.apache.avro_avro-1.8.2.jar:spark-avro_2.11-2.4.4.jar
>  --conf 
> spark.executor.extraClassPath=org.apache.avro_avro-1.8.2.jar:spark-avro_2.11-2.4.4.jar:hudi-spark-bundle_2.11-0.7.0.jar
>  --conf spark.memory.fraction=0.2 --driver-java-options "-XX:NewSize=1g 
> -XX:SurvivorRatio=2 -XX:+UseCompressedOops -XX:+UseConcMarkSweepGC 
> -XX:+UseParNewGC -XX:CMSInitiatingOccupancyFraction=70 -XX:+PrintGCDetails 
> -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps 
> -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCApplicationConcurrentTime 
> -XX:+PrintTenuringDistribution -XX:+HeapDumpOnOutOfMemoryError 
> -XX:HeapDumpPath=/tmp/hoodie-heapdump.hprof" --files 
> /usr/hdp/current/spark2-client/conf/hive-site.xml --class 
> com.walmart.gis.order.workflows.WorkflowController 
> lib/orders-poc-1.0.41-SNAPSHOT-shaded.jar workflow="stgStsWorkflow" 
> runmode="global"
> We are running on a 29-node GCS cluster with 3 TB and 870 vcores.
> pom
> 1.8
> 2.11.12
> 2.3.0
> 1.8.2
> 2.4.4
> 0.7.0
> 1.4.0
> UTF-8
> 1.8
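
The IllegalArgumentException above comes from a precondition check of this shape 
(a minimal sketch, not Hudi's exact code): the server refuses to serve a view 
when the client's last known instant is behind the server's timeline.

{code:java}
public final class ValidationUtilsLike {
  // Mirrors the shape of org.apache.hudi.common.util.ValidationUtils.checkArgument
  public static void checkArgument(boolean condition, String message) {
    if (!condition) {
      throw new IllegalArgumentException(message);
    }
  }

  public static void main(String[] args) {
    String clientInstant = "0"; // the client has never synced
    String serverTimeline = "[[20210115214840__commit__COMPLETED], "
        + "[20210120101841__commit__COMPLETED]]";
    // Fails, reproducing the message seen in the report above:
    checkArgument(clientInstant.compareTo("20210115214840") >= 0,
        "Last known instant from client was " + clientInstant
            + " but server has the following timeline " + serverTimeline);
  }
}
{code}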
> 1.8
> stsDailyDf.write.format("org.apache.hudi")
>   .option("hoodie.cleaner.commits.retained", 2)
>   .option("hoodie.copyonwrite.record.size.estimate", 70)
>   .option("hoodie.parquet.small.file.limit", 1)
>   .option("hoodie.parquet.max.file.size", 12800)
>   .option("hoodie.index.bloom.num_entries", 180)
>   .option("hoodie.bloom.index.filter.type", "DYNAMIC_V0")
>   .option("hoodie.bloom.index.filter.dynamic.max.entries", 250)
>   .option("hoodie.datasource.write.operation", "upsert")
>   .option("hoodie.datasource.write.storage.type", "COPY_ON_WRITE")
>   .option(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY, 
> "sales_order_sts_line_key")
>   .option(DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY, 
> "REL_STS_DT")
>   .option(DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY, "src_upd_ts")
>   .option(HoodieWriteConfig.TABLE_NAME, tableName.toString)
>   .option("hoodie.bloom.index.bucketized.checking", "false")
>   .mode(SaveMode.Append)
>   .save(tablePath.toString)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1657) build failed on AArch64, Fedora 33

2021-03-28 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-1657:
--
Labels: sev:triage user-support-issues  (was: )

> build failed on AArch64, Fedora 33 
> ---
>
> Key: HUDI-1657
> URL: https://issues.apache.org/jira/browse/HUDI-1657
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Lutz Weischer
>Priority: Major
>  Labels: sev:triage, user-support-issues
>
> [jw@cn05 hudi]$ mvn package -DskipTests
> [INFO] Scanning for projects...
> [WARNING]
> [WARNING] Some problems were encountered while building the effective model 
> for org.apache.hudi:hudi-java-client:jar:0.8.0-SNAPSHOT
> [WARNING] The expression ${parent.version} is deprecated. Please use 
> ${project.parent.version} instead.
> [WARNING]
> [WARNING] Some problems were encountered while building the effective model 
> for org.apache.hudi:hudi-spark-client:jar:0.8.0-SNAPSHOT
> [WARNING] The expression ${parent.version} is deprecated. Please use 
> ${project.parent.version} instead.
> [WARNING]
> [WARNING] Some problems were encountered while building the effective model 
> for org.apache.hudi:hudi-flink-client:jar:0.8.0-SNAPSHOT
> [WARNING] The expression ${parent.version} is deprecated. Please use 
> ${project.parent.version} instead.
> [WARNING]
> [WARNING] Some problems were encountered while building the effective model 
> for org.apache.hudi:hudi-spark_2.11:jar:0.8.0-SNAPSHOT
> [WARNING] 'artifactId' contains an expression but should be a constant. @ 
> org.apache.hudi:hudi-spark_${scala.binary.version}:0.8.0-SNAPSHOT, 
> /home/jw/apache/hudi/hudi-spark-datasource/hudi-spark/pom.xml, line 26, 
> column 15
> [WARNING]
> [WARNING] Some problems were encountered while building the effective model 
> for org.apache.hudi:hudi-spark2_2.11:jar:0.8.0-SNAPSHOT
> [WARNING] 'artifactId' contains an expression but should be a constant. @ 
> org.apache.hudi:hudi-spark2_${scala.binary.version}:0.8.0-SNAPSHOT, 
> /home/jw/apache/hudi/hudi-spark-datasource/hudi-spark2/pom.xml, line 24, 
> column 15
> [WARNING]
> [WARNING] Some problems were encountered while building the effective model 
> for org.apache.hudi:hudi-utilities_2.11:jar:0.8.0-SNAPSHOT
> [WARNING] 'artifactId' contains an expression but should be a constant. @ 
> org.apache.hudi:hudi-utilities_${scala.binary.version}:0.8.0-SNAPSHOT, 
> /home/jw/apache/hudi/hudi-utilities/pom.xml, line 26, column 15
> [WARNING]
> [WARNING] Some problems were encountered while building the effective model 
> for org.apache.hudi:hudi-spark-bundle_2.11:jar:0.8.0-SNAPSHOT
> [WARNING] 'artifactId' contains an expression but should be a constant. @ 
> org.apache.hudi:hudi-spark-bundle_${scala.binary.version}:0.8.0-SNAPSHOT, 
> /home/jw/apache/hudi/packaging/hudi-spark-bundle/pom.xml, line 26, column 15
> [WARNING]
> [WARNING] Some problems were encountered while building the effective model 
> for org.apache.hudi:hudi-utilities-bundle_2.11:jar:0.8.0-SNAPSHOT
> [WARNING] 'artifactId' contains an expression but should be a constant. @ 
> org.apache.hudi:hudi-utilities-bundle_${scala.binary.version}:0.8.0-SNAPSHOT, 
> /home/jw/apache/hudi/packaging/hudi-utilities-bundle/pom.xml, line 26, column 
> 15
> [WARNING]
> [WARNING] Some problems were encountered while building the effective model 
> for org.apache.hudi:hudi-flink_2.11:jar:0.8.0-SNAPSHOT
> [WARNING] 'artifactId' contains an expression but should be a constant. @ 
> org.apache.hudi:hudi-flink_${scala.binary.version}:0.8.0-SNAPSHOT, 
> /home/jw/apache/hudi/hudi-flink/pom.xml, line 28, column 15
> [WARNING]
> [WARNING] Some problems were encountered while building the effective model 
> for org.apache.hudi:hudi-flink-bundle_2.11:jar:0.8.0-SNAPSHOT
> [WARNING] 'artifactId' contains an expression but should be a constant. @ 
> org.apache.hudi:hudi-flink-bundle_${scala.binary.version}:0.8.0-SNAPSHOT, 
> /home/jw/apache/hudi/packaging/hudi-flink-bundle/pom.xml, line 28, column 15
> [WARNING]
> [WARNING] It is highly recommended to fix these problems because they 
> threaten the stability of your build.
> [WARNING]
> [WARNING] For this reason, future Maven versions might no longer support 
> building such malformed projects.
> [WARNING]
> [INFO] 
> 
> [INFO] Reactor Build Order:
> [INFO]
> [INFO] Hudi   
> [pom]
> [INFO] hudi-common
> [jar]
> [INFO] hudi-timeline-service  
> [jar]
> [INFO] hudi-client
> [pom]
> [INFO] hudi-client-common 
> [jar]
> [INFO] hudi-hadoop-mr

[jira] [Updated] (HUDI-1674) add partition level delete DOC or example

2021-03-28 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-1674:
--
Labels: docs user-support-issues  (was: )

> add partition level delete DOC or example
> -
>
> Key: HUDI-1674
> URL: https://issues.apache.org/jira/browse/HUDI-1674
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: liwei
>Priority: Minor
>  Labels: docs, user-support-issues
> Attachments: image-2021-03-08-09-57-05-768.png
>
>
> !image-2021-03-08-09-57-05-768.png!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1668) GlobalSortPartitioner is getting called twice during bulk_insert.

2021-03-28 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-1668:
--
Labels: sev:high user-support-issues  (was: )

> GlobalSortPartitioner is getting called twice during bulk_insert.
> -
>
> Key: HUDI-1668
> URL: https://issues.apache.org/jira/browse/HUDI-1668
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Sugamber
>Priority: Minor
>  Labels: sev:high, user-support-issues
> Attachments: 1st.png, 2nd.png
>
>
> Hi Team,
> I'm using the bulk insert option to load close to 2 TB of data. The process is 
> taking nearly 2 hours to complete. While looking at the job log, it was 
> identified that [sortBy at 
> GlobalSortPartitioner.java:41|https://gdlcuspc1a3-6.us-central1.us.walmart.net:18481/history/application_1614298633248_1444/1/jobs/job?id=1]
>  is running twice. 
> It is first triggered at stage 1. *Refer to this screenshot -> [^1st.png]*.
> The second time it is triggered from the *HoodieSparkSqlWriter.scala:433* 
> *[count at 
> HoodieSparkSqlWriter.scala:433|https://gdlcuspc1a3-6.us-central1.us.walmart.net:18481/history/application_1614298633248_1444/1/jobs/job?id=2]*
>  step.
> In both cases, the same number of jobs got triggered and the running times are 
> close to each other. *Refer to this screenshot* -> [^2nd.png]
> Is there any way to run it only once so that data can be loaded faster, or is 
> this expected behaviour?
> *Spark and Hudi configurations*
>  
> {code:java}
> Spark - 2.3.0
> Scala- 2.11.12
> Hudi - 0.7.0
>  
> {code}
>  
> Hudi Configuration
> {code:java}
> "hoodie.cleaner.commits.retained" = 2  
> "hoodie.bulkinsert.shuffle.parallelism"=2000  
> "hoodie.parquet.small.file.limit" = 1  
> "hoodie.parquet.max.file.size" = 12800  
> "hoodie.index.bloom.num_entries" = 180  
> "hoodie.bloom.index.filter.type" = "DYNAMIC_V0"  
> "hoodie.bloom.index.filter.dynamic.max.entries" = 250  
> "hoodie.bloom.index.bucketized.checking" = "false"  
> "hoodie.datasource.write.operation" = "bulk_insert"  
> "hoodie.datasource.write.table.type" = "COPY_ON_WRITE"
> {code}
>  
> Spark Configuration -
> {code:java}
> --num-executors 180 
> --executor-cores 4 
> --executor-memory 16g 
> --driver-memory=24g 
> --conf spark.rdd.compress=true 
> --queue=default 
> --conf spark.serializer=org.apache.spark.serializer.KryoSerializer
> --conf spark.executor.memoryOverhead=1600 
> --conf spark.driver.memoryOverhead=1200 
> --conf spark.driver.maxResultSize=2g
> --conf spark.kryoserializer.buffer.max=512m 
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-1675) Externalize all Hudi configurations

2021-03-28 Thread sivabalan narayanan (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17310352#comment-17310352
 ] 

sivabalan narayanan commented on HUDI-1675:
---

[~xiaotaotao]: May I know what the intent is here? If we have this, then for 
future writes the user does not need to set every config option and can just 
rely on the existing options; is my understanding right? In general, what are 
the benefits of adding the configs to a file? 

> Externalize all Hudi configurations
> ---
>
> Key: HUDI-1675
> URL: https://issues.apache.org/jira/browse/HUDI-1675
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Spark Integration
>Affects Versions: 0.9.0
>Reporter: tao meng
>Priority: Major
> Fix For: 0.9.0
>
>
> # Externalize all Hudi configurations (separate configuration file)
>  # Save table related properties into hoodie.properties file.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-1682) Rasa4D Situs Game Togel Slot Online Terbaik Deposit Pulsa Tanpa Potongan

2021-03-28 Thread sivabalan narayanan (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17310351#comment-17310351
 ] 

sivabalan narayanan commented on HUDI-1682:
---

[~RASA4D]: I guess you have wrongly tagged this ticket to Hudi. 

> Rasa4D Situs Game Togel Slot Online Terbaik Deposit Pulsa Tanpa Potongan
> 
>
> Key: HUDI-1682
> URL: https://issues.apache.org/jira/browse/HUDI-1682
> Project: Apache Hudi
>  Issue Type: Test
>  Components: Docs
>Affects Versions: 0.9.0
>Reporter: Rasa4D Situs Game Togel Slot Online Terbaik Deposit 
> Pulsa Tanpa Potongan
>Priority: Trivial
>  Labels: newbie
> Fix For: 0.9.0
>
> Attachments: eventslot2.jpg
>
>   Original Estimate: 1,932h
>  Remaining Estimate: 1,932h
>
> RASA4D is the best, most trusted, and most popular online gaming site, 
> providing various types of games that can be played using just one account. 
> There are 4 types of games provided: online slots, live casino, online togel 
> (lottery), and sports. Each of the four comes with a variety of different games.
> All the online slot, online togel, live casino, and sports games provided by 
> the RASA4D site can be played on Android and iOS phones, and all of them can 
> be played using just one account.
> Why can RASA4D be the best, most trusted, and most popular online gaming site?
> Because RASA4D provides attractive things that are rarely found on other 
> online gambling sites. One of the things that makes RASA4D the best online 
> gaming site is one of its promos, which is very rarely found: the pulsa 
> (phone-credit) deposit promo with no deduction.
> RASA4D provides pulsa deposits with no deduction for all slot, togel, casino, 
> and sports players who play on its site. All players who deposit via pulsa 
> will incur no deduction, and of course the balance credited to their account 
> will match the amount of pulsa they transferred to RASA4D's deposit number.
> The RASA4D site is known as a no-deduction site, which is very popular with 
> all online gambling players. All players playing any game will incur no 
> deduction when depositing via pulsa at RASA4D.
> RASA4D also provides other promos that can benefit its players in any game. 
> The other promos provided are:
> - PULSA DEPOSITS ACCEPTED WITH NO DEDUCTION
> - E-WALLET DEPOSITS ACCEPTED
> - 10% NEW MEMBER BONUS
> - 5% DAILY DEPOSIT BONUS
> - REFERRAL UP TO 1%
> - CASHBACK UP TO 10%
> - HIGHEST PRIZES AND DISCOUNTS
> - STRAIGHT AND REVERSED CONSOLATION PRIZES
> - PRIZE 2&3
> - SMALLEST MINIMUM BET
> RASA4D does not only accept deposits via pulsa; the togel and slot site also 
> provides deposits via e-wallet. So if you have an e-wallet, you can play on 
> the best online gaming site, RASA4D. There are 4 e-wallet providers that work 
> with RASA4D:
> - OVO
> - DANA
> - GOPAY
> - LINK AJA
> You can choose whichever of the four e-wallet providers you want to use for 
> deposit and withdrawal transactions.
> RASA4D also provides a bonus for new members and a daily deposit bonus. The 
> new member bonus is very profitable, since you can get a bonus of up to 10%. 
> This bonus is very profitable for players of online slots, online togel, live 
> casino games, and sports.
> For those of you playing online slots, online togel, live casino games, or 
> sports, there is still a chance to get an abundant bonus of 5%.
> RASA4D also provides a referral bonus for players who play online slot and 
> casino games on the RASA4D site. For you online slot and casino players, you 
> can get very large profits by joining the best site, RASA4D.
> There are also other weekly bonuses that can be obtained on the best site 
> RASA4D. Another bonus is cashback of up to 10%. Certainly everyone can get 
> this cashback bonus when playing on the best site RASA4D.
> And for you online togel players, there is also the chance to get very large 
> consolation prizes. The big consolation prizes you can get at RASA4D are the 
> straight and reversed consolation prizes and prize 2&3.
> - The Best Service of the RASA4D Online Togel Slot Site.
> The service provided by RASA4D is very good and very friendly. Especially if 
> you are served by RASA4D's cs (customer service) named riri. Certainly you 
> will be served with 

[jira] [Updated] (HUDI-1690) Fix StackOverflowError while running clustering with large number of partitions

2021-03-28 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-1690:
--
Labels: sev:high user-support-issues  (was: )

> Fix StackOverflowError while running clustering with large number of 
> partitions
> ---
>
> Key: HUDI-1690
> URL: https://issues.apache.org/jira/browse/HUDI-1690
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Spark Integration
>Affects Versions: 0.9.0
>Reporter: Rong Ma
>Priority: Major
>  Labels: sev:high, user-support-issues
> Fix For: 0.9.0
>
>
> We are testing clustering on a Hudi table with about 3000 partitions. The 
> Spark driver throws a StackOverflowError before all the partitions are sorted:
> 21/03/11 19:51:20 ERROR [main] UtilHelpers: Cluster failed
>  java.lang.StackOverflowError
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498)
>  at java.io.ObjectStreamClass.invokeWriteReplace(ObjectStreamClass.java:1118)
>  at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1136)
>  at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
>  at 
> org.apache.spark.RangePartitioner.$anonfun$writeObject$1(Partitioner.scala:261)
>  at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
>  at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1343)
>  at org.apache.spark.RangePartitioner.writeObject(Partitioner.scala:254)
>  at sun.reflect.GeneratedMethodAccessor201.invoke(Unknown Source)
>  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498)
>  at java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:1028)
>  at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1496)
>  at 
> java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
>  at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
>  at 
> java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
>  at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
>  at 
> java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
>  at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
>  at 
> java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
>  at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
>  at 
> java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
>  at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
>  at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
>  at 
> scala.collection.immutable.List$SerializationProxy.writeObject(List.scala:477)
>  at sun.reflect.GeneratedMethodAccessor51.invoke(Unknown Source)
>  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498)
>  at java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:1028)
>  at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1496)
>  at 
> java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
>  at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
>  at 
> java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
> ...
>  
> I see a similar issue here:
> [https://stackoverflow.com/questions/30522564/spark-when-union-a-lot-of-rdd-throws-stack-overflow-error]
> Even after setting the driver's stack size to 100M, this error still occurs, so 
> it is probably because rdd.union has been called too many times and the 
> resulting RDD lineage is too large. I think we should use JavaSparkContext.union 
> instead of RDD.union here: 
> [https://github.com/apache/hudi/blob/e93c6a569310ce55c5a0fc0655328e7fd32a9da2/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/cluster/SparkExecuteClusteringCommitActionExecutor.java#L96]
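For illustration, here is a minimal sketch of the difference being suggested. This is not the Hudi executor code; the generic helper methods are hypothetical, and the flat union assumes the Spark 2.4 Java API:

```java
import java.util.List;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class UnionSketch {

  // Pairwise union: every call wraps the previous result, so the RDD
  // lineage (the object graph Java serialization must walk) grows
  // linearly with the number of inputs -- ~3000 partitions here.
  static <T> JavaRDD<T> unionPairwise(List<JavaRDD<T>> rdds) {
    JavaRDD<T> result = rdds.get(0);
    for (int i = 1; i < rdds.size(); i++) {
      result = result.union(rdds.get(i)); // lineage depth grows by one each time
    }
    return result;
  }

  // Context-level union: a single flat UnionRDD over all inputs, so the
  // lineage depth stays constant regardless of the partition count.
  // (Spark 2.4 signature; Spark 3 exposes a varargs overload instead.)
  static <T> JavaRDD<T> unionFlat(JavaSparkContext jsc, List<JavaRDD<T>> rdds) {
    return jsc.union(rdds.get(0), rdds.subList(1, rdds.size()));
  }
}
```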



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1689) Support Multipath query for HoodieFileIndex

2021-03-28 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-1689:
--
Labels: sev:high  (was: )

> Support Multipath query for HoodieFileIndex
> ---
>
> Key: HUDI-1689
> URL: https://issues.apache.org/jira/browse/HUDI-1689
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Spark Integration
>Reporter: pengzhiwei
>Assignee: pengzhiwei
>Priority: Major
>  Labels: sev:high
>
> Support Multipath query for the HoodieFileIndex to benefit from the partition 
> prune.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-1690) Fix StackOverflowError while running clustering with large number of partitions

2021-03-28 Thread sivabalan narayanan (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17310348#comment-17310348
 ] 

sivabalan narayanan commented on HUDI-1690:
---

[~satishkotha]: would you mind taking a look?

> Fix StackOverflowError while running clustering with large number of 
> partitions
> ---
>
> Key: HUDI-1690
> URL: https://issues.apache.org/jira/browse/HUDI-1690
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Spark Integration
>Affects Versions: 0.9.0
>Reporter: Rong Ma
>Priority: Major
> Fix For: 0.9.0
>
>
> We are testing clustering on a Hudi table with about 3000 partitions. The 
> Spark driver throws a StackOverflowError before all the partitions are sorted 
> (the full stack trace and analysis are quoted in the issue update above).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1697) A parallel scan needed for FS.

2021-03-28 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-1697:
--
Labels: sev:high user-support-issues  (was: )

> A parallel scan needed for FS.
> --
>
> Key: HUDI-1697
> URL: https://issues.apache.org/jira/browse/HUDI-1697
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: DeltaStreamer
>Reporter: Volodymyr Burenin
>Priority: Major
>  Labels: sev:high, user-support-issues
>
> I am running Hudi with GCS as a backend. It takes way too long to update the 
> file system view for several hundred partitions. I think this can be done in 
> parallel, so the process could be sped up significantly (see the sketch after 
> the logs below).
> Here is a small excerpt from the logs where I noticed the slow processing. The 
> original log is much longer, and the process takes several minutes to complete.
> ```
> 21/03/16 20:02:56 INFO AbstractTableFileSystemView: #files found in partition 
> (2020/05/12) =66, Time taken =45
> 21/03/16 20:02:56 INFO HoodieTableFileSystemView: Adding file-groups for 
> partition :2020/05/12, #FileGroups=22
> 21/03/16 20:02:56 INFO AbstractTableFileSystemView: addFilesToView: 
> NumFiles=66, NumFileGroups=22, FileGroupsCreationTime=3, StoreTimeTaken=1
> 21/03/16 20:02:56 INFO AbstractTableFileSystemView: Time to load partition 
> (2020/05/12) =76
> 21/03/16 20:02:56 INFO AbstractTableFileSystemView: Took 1 ms to read 0 
> instants, 0 replaced file groups
> 21/03/16 20:02:56 INFO ClusteringUtils: Found 0 files in pending clustering 
> operations
> 21/03/16 20:02:56 INFO AbstractTableFileSystemView: Building file system view 
> for partition (2020/03/25)
> 21/03/16 20:02:56 INFO AbstractTableFileSystemView: #files found in partition 
> (2020/03/25) =36, Time taken =36
> 21/03/16 20:02:56 INFO HoodieTableFileSystemView: Adding file-groups for 
> partition :2020/03/25, #FileGroups=12
> 21/03/16 20:02:56 INFO AbstractTableFileSystemView: addFilesToView: 
> NumFiles=36, NumFileGroups=12, FileGroupsCreationTime=1, StoreTimeTaken=1
> 21/03/16 20:02:56 INFO AbstractTableFileSystemView: Time to load partition 
> (2020/03/25) =62
> 21/03/16 20:02:56 INFO AbstractTableFileSystemView: Took 0 ms to read 0 
> instants, 0 replaced file groups
> 21/03/16 20:02:56 INFO ClusteringUtils: Found 0 files in pending clustering 
> operations
> 21/03/16 20:02:56 INFO AbstractTableFileSystemView: Building file system view 
> for partition (2020/10/15)
> 21/03/16 20:02:57 INFO AbstractTableFileSystemView: #files found in partition 
> (2020/10/15) =201, Time taken =100
> 21/03/16 20:02:57 INFO HoodieTableFileSystemView: Adding file-groups for 
> partition :2020/10/15, #FileGroups=128
> 21/03/16 20:02:57 INFO AbstractTableFileSystemView: addFilesToView: 
> NumFiles=201, NumFileGroups=128, FileGroupsCreationTime=6, StoreTimeTaken=1
> 21/03/16 20:02:57 INFO AbstractTableFileSystemView: Time to load partition 
> (2020/10/15) =148
> 21/03/16 20:02:57 INFO AbstractTableFileSystemView: Took 0 ms to read 0 
> instants, 0 replaced file groups
> 21/03/16 20:02:57 INFO ClusteringUtils: Found 0 files in pending clustering 
> operations
> 21/03/16 20:02:57 INFO AbstractTableFileSystemView: Building file system view 
> for partition (2021/01/11)
> 21/03/16 20:02:57 INFO AbstractTableFileSystemView: #files found in partition 
> (2021/01/11) =311, Time taken =71
> 21/03/16 20:02:57 INFO HoodieTableFileSystemView: Adding file-groups for 
> partition :2021/01/11, #FileGroups=302
> 21/03/16 20:02:57 INFO AbstractTableFileSystemView: addFilesToView: 
> NumFiles=311, NumFileGroups=302, FileGroupsCreationTime=9, StoreTimeTaken=1
> 21/03/16 20:02:57 INFO AbstractTableFileSystemView: Time to load partition 
> (2021/01/11) =110
> 21/03/16 20:02:57 INFO AbstractTableFileSystemView: Took 0 ms to read 0 
> instants, 0 replaced file groups
> 21/03/16 20:02:57 INFO ClusteringUtils: Found 0 files in pending clustering 
> operations
> 21/03/16 20:02:57 INFO AbstractTableFileSystemView: Building file system view 
> for partition (2019/07/08)
> 21/03/16 20:02:57 INFO AbstractTableFileSystemView: #files found in partition 
> (2019/07/08) =2, Time taken =40
> 21/03/16 20:02:57 INFO HoodieTableFileSystemView: Adding file-groups for 
> partition :2019/07/08, #FileGroups=1
> 21/03/16 20:02:57 INFO AbstractTableFileSystemView: addFilesToView: 
> NumFiles=2, NumFileGroups=1, FileGroupsCreationTime=0, StoreTimeTaken=1
> 21/03/16 20:02:57 INFO AbstractTableFileSystemView: Time to load partition 
> (2019/07/08) =63
> 21/03/16 20:02:57 INFO AbstractTableFileSystemView: Took 0 ms to read 0 
> instants, 0 replaced file groups
> 21/03/16 20:02:57 INFO ClusteringUtils: Found 0 files in pending clustering 
> operations
> ```
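As a rough illustration of the parallelism being asked for, a sketch that lists all partitions concurrently instead of one at a time. This is illustrative only, not the Hudi file-system-view code, and it assumes the Hadoop FileSystem handle is safe to share across threads:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ParallelListingSketch {

  // Sequential listing pays the object store's per-request latency
  // (~40-150 ms per partition in the logs above) several hundred times
  // in a row; listing partitions concurrently amortizes that latency.
  static Map<String, FileStatus[]> listAllPartitions(FileSystem fs, List<Path> partitionPaths) {
    return partitionPaths.parallelStream()
        .collect(Collectors.toConcurrentMap(
            Path::toString,
            path -> {
              try {
                return fs.listStatus(path); // one remote call per partition
              } catch (IOException e) {
                throw new UncheckedIOException(e);
              }
            }));
  }
}
```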



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1696) artifactSet of maven-shade-plugin has not commons-codec

2021-03-28 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-1696:
--
Labels: pull-request-available sev:high user-support-issues  (was: 
pull-request-available)

> artifactSet of maven-shade-plugin has not commons-codec
> ---
>
> Key: HUDI-1696
> URL: https://issues.apache.org/jira/browse/HUDI-1696
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Spark Integration
>Affects Versions: 0.7.0
> Environment: spark2.4.4
> scala2.11.8
> centos7
>Reporter: peng-xin
>Priority: Critical
>  Labels: pull-request-available, sev:high, user-support-issues
> Fix For: 0.7.0
>
> Attachments: image-2021-03-16-18-20-16-477.png
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> When I use the HBase index, it causes an error like the one below:
> !image-2021-03-16-18-20-16-477.png!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] nsivabalan commented on a change in pull request #2687: [HUDI-1700] Hudi Meetup with Uber video link

2021-03-28 Thread GitBox


nsivabalan commented on a change in pull request #2687:
URL: https://github.com/apache/hudi/pull/2687#discussion_r602990036



##
File path: docs/_docs/0.7.0/1_4_powered_by.md
##
@@ -146,6 +146,8 @@ Meanwhile, we build a set of data access standards based on 
Hudi, which provides
 
 21. ["Meetup talk by Nishith 
Agarwal"](https://www.meetup.com/UberEvents/events/274924537/) - Uber Data 
Platforms Meetup, Dec 2020
 
+22. ["Apache Hudi Meetup at Uber with talks from AWS, CityStorageSystems & 
Uber"](https://youtu.be/cAvbBfMbaiA) - By Udit Mehrotra, Wenning Ding (AWS), 
Alexander Filipchik (CityStorageSystems), Prashant Wason, Satish Kotha (Uber), 
Feb 2021

Review comment:
   @vburenin: Did you try this: https://youtu.be/iXBInMLbjo0? It works for 
me. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (HUDI-1702) TestHoodieMergeOnReadTable.init fails randomly on Travis CI

2021-03-28 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-1702:
--
Labels: sev:triage  (was: )

> TestHoodieMergeOnReadTable.init fails randomly on Travis CI
> ---
>
> Key: HUDI-1702
> URL: https://issues.apache.org/jira/browse/HUDI-1702
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Spark Integration
>Reporter: Danny Chen
>Priority: Major
>  Labels: sev:triage
>
> The test case fails randomly from time to time, which is annoying; take this 
> as an example:
> https://travis-ci.com/github/apache/hudi/jobs/491671521



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-1708) Add HiveMetastore URL based configs to allow for using locks with custom metastore URI's

2021-03-28 Thread sivabalan narayanan (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17310346#comment-17310346
 ] 

sivabalan narayanan commented on HUDI-1708:
---

Is this still valid? 

> Add HiveMetastore URL based configs to allow for using locks with custom 
> metastore URI's
> 
>
> Key: HUDI-1708
> URL: https://issues.apache.org/jira/browse/HUDI-1708
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Writer Core
>Reporter: Nishith Agarwal
>Assignee: Nishith Agarwal
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1711) Avro Schema Exception with Spark 3.0 in 0.7

2021-03-28 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-1711:
--
Labels: sev:triage user-support-issues  (was: sev:critical 
user-support-issues)

> Avro Schema Exception with Spark 3.0 in 0.7
> ---
>
> Key: HUDI-1711
> URL: https://issues.apache.org/jira/browse/HUDI-1711
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: DeltaStreamer
>Reporter: Balaji Varadarajan
>Priority: Major
>  Labels: sev:triage, user-support-issues
>
> GH: [https://github.com/apache/hudi/issues/2705]
>  
>  
> {{21/03/22 10:10:35 WARN util.package: Truncated the string representation of 
> a plan since it was too large. This behavior can be adjusted by setting 
> 'spark.sql.debug.maxToStringFields'.
> 21/03/22 10:10:35 ERROR executor.Executor: Exception in task 0.0 in stage 1.0 
> (TID 1)
> java.lang.RuntimeException: Error while decoding: 
> java.lang.NegativeArraySizeException: -1255727808
> createexternalrow(...) — a long Catalyst decoding expression over rows of 
> type struct<id:int, name:string, type:string, url:string, user:string, 
> password:string, create_time:string, create_user:string, update_time:string, 
> update_user:string, del_flag:int>, plus a Debezium-style source struct 
> (version, connector, name, ts_ms, snapshot, ...). The struct type parameters 
> were stripped by the mail archive, and the dump is truncated here.

[jira] [Commented] (HUDI-1713) Fix config name for concurrency

2021-03-28 Thread sivabalan narayanan (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17310345#comment-17310345
 ] 

sivabalan narayanan commented on HUDI-1713:
---

[~nishith29]: can we close this? 

> Fix config name for concurrency
> ---
>
> Key: HUDI-1713
> URL: https://issues.apache.org/jira/browse/HUDI-1713
> Project: Apache Hudi
>  Issue Type: Task
>  Components: Writer Core
>Reporter: Nishith Agarwal
>Assignee: Nishith Agarwal
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1711) Avro Schema Exception with Spark 3.0 in 0.7

2021-03-28 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-1711:
--
Labels: sev:critical user-support-issues  (was: )

> Avro Schema Exception with Spark 3.0 in 0.7
> ---
>
> Key: HUDI-1711
> URL: https://issues.apache.org/jira/browse/HUDI-1711
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: DeltaStreamer
>Reporter: Balaji Varadarajan
>Priority: Major
>  Labels: sev:critical, user-support-issues
>
> GH: [https://github.com/apache/hudi/issues/2705]
>  
>  
> {{21/03/22 10:10:35 WARN util.package: Truncated the string representation of 
> a plan since it was too large. This behavior can be adjusted by setting 
> 'spark.sql.debug.maxToStringFields'.
> 21/03/22 10:10:35 ERROR executor.Executor: Exception in task 0.0 in stage 1.0 
> (TID 1)
> java.lang.RuntimeException: Error while decoding: 
> java.lang.NegativeArraySizeException: -1255727808
> createexternalrow(...) — the same truncated Catalyst decoding expression that 
> is quoted in the issue update above.
[jira] [Updated] (HUDI-1717) Metadata Table reader does not show correct view of the metadata

2021-03-28 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-1717:
--
Labels: sev:critical user-support-issues  (was: )

> Metadata Table reader does not show correct view of the metadata
> 
>
> Key: HUDI-1717
> URL: https://issues.apache.org/jira/browse/HUDI-1717
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Prashant Wason
>Priority: Blocker
>  Labels: sev:critical, user-support-issues
>
> Dataset timeline: C1 C2 C3 Compaction.inflight C4 C5
> Metadata timeline: DC1 DC2 DC3. (DC=deltaCommit)
> Assume the dataset timeline has some completed commits (C1, C2 ... C5) and an 
> async compaction operation in progress. Also assume that the metadata table 
> is synced only up to C3.
> The MetadataTableWriter will not sync any more instants to the Metadata Table 
> since an incomplete instant (Compaction.inflight) comes next.
> The same sync logic is also used by the MetadataReader to perform the 
> in-memory merge of the timeline. Hence, the reader will also not consider C4 
> and C5, thereby providing an incorrect, stale view of the FileSlices and 
> FileGroups (see the sketch below).
> Any future ingestion into this table MAY insert data into older versions of 
> the FileSlices, which ends up as data loss when queried.
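A minimal sketch of the cut-off described above. This is illustrative only; the Instant interface is a hypothetical stand-in for Hudi's timeline classes:

```java
import java.util.ArrayList;
import java.util.List;

public class TimelineCutoffSketch {

  // Hypothetical stand-in for a timeline instant; Hudi's real classes differ.
  interface Instant {
    boolean isCompleted();
  }

  // Both the metadata writer and the reader stop at the first incomplete
  // instant, so completed instants after it become invisible.
  static List<Instant> syncableInstants(List<Instant> datasetTimeline) {
    List<Instant> out = new ArrayList<>();
    for (Instant instant : datasetTimeline) {
      if (!instant.isCompleted()) {
        break; // Compaction.inflight stops the scan here
      }
      out.add(instant);
    }
    return out; // [C1, C2, C3] -- C4 and C5 are silently dropped
  }
}
```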



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1718) when query incr view of mor table which has Multi level partitions, the query failed

2021-03-28 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-1718:
--
Labels: pull-request-available sev:critical user-support-issues  (was: 
pull-request-available)

> when query incr view of  mor table which has Multi level partitions, the 
> query failed
> -
>
> Key: HUDI-1718
> URL: https://issues.apache.org/jira/browse/HUDI-1718
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Hive Integration
>Affects Versions: 0.7.0, 0.8.0
>Reporter: tao meng
>Priority: Major
>  Labels: pull-request-available, sev:critical, user-support-issues
> Fix For: 0.9.0
>
>
> HoodieCombineHiveInputFormat uses "," to join multiple partitions, whereas 
> Hive uses "/" to join multiple partitions. This gap causes the failure below, 
> so HoodieCombineHiveInputFormat's logic should be modified (see the sketch 
> below).
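To illustrate the separator mismatch, a sketch only, assuming (as the stack trace below suggests) that the reader side splits the partition-column property on "/":

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class PartitionSeparatorSketch {

  // The reader splits the partition-column property on "/", so a value
  // joined with "," survives the split as the single bogus column name
  // "p,p1,p2", which Avro then rejects as an illegal field name.
  static List<String> partitionFields(String joinedColumns) {
    return Arrays.stream(joinedColumns.split("/"))
        .filter(s -> !s.isEmpty())
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    System.out.println(partitionFields("p,p1,p2")); // [p,p1,p2]   -> one illegal Avro field
    System.out.println(partitionFields("p/p1/p2")); // [p, p1, p2] -> three valid fields
  }
}
```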
>  test env
> spark2.4.5, hadoop 3.1.1, hive 3.1.1
>  
> step1:
> val df = spark.range(0, 1).toDF("keyid")
>  .withColumn("col3", expr("keyid + 1000"))
>  .withColumn("p", lit(0))
>  .withColumn("p1", lit(0))
>  .withColumn("p2", lit(6))
>  .withColumn("a1", lit(Array[String]("sb1", "rz")))
>  .withColumn("a2", lit(Array[String]("sb1", "rz")))
> // bulk_insert df,   partition by p,p1,p2
>   merge(df, 4, "default", "hive_8b", 
> DataSourceWriteOptions.MOR_TABLE_TYPE_OPT_VAL, op = "bulk_insert")
> step2:
> val df = spark.range(0, 1).toDF("keyid")
>  .withColumn("col3", expr("keyid + 1000"))
>  .withColumn("p", lit(0))
>  .withColumn("p1", lit(0))
>  .withColumn("p2", lit(7))
>  .withColumn("a1", lit(Array[String]("sb1", "rz")))
>  .withColumn("a2", lit(Array[String]("sb1", "rz")))
> // upsert table hive8b
> merge(df, 4, "default", "hive_8b", 
> DataSourceWriteOptions.MOR_TABLE_TYPE_OPT_VAL, op = "upsert")
> step3:
> start hive beeline:
> set 
> hive.input.format=org.apache.hudi.hadoop.hive.HoodieCombineHiveInputFormat;
> set hoodie.hive_8b.consume.mode=INCREMENTAL;
> set hoodie.hive_8b.consume.max.commits=3;
> set hoodie.hive_8b.consume.start.timestamp=20210325141300;  // this timestamp 
> is smaller than the earliest commit, so we can query all commits
> select `p`, `p1`, `p2`,`keyid` from hive_8b_rt where 
> `_hoodie_commit_time`>'20210325141300'
>  
> 2021-03-25 14:14:36,036 | INFO  | AsyncDispatcher event handler | Diagnostics 
> report from attempt_1615883368881_0028_m_00_3: Error: 
> org.apache.hudi.org.apache.avro.SchemaParseException: Illegal character in: 
> p,p1,p2 2021-03-25 14:14:36,036 | INFO  | AsyncDispatcher event handler | 
> Diagnostics report from attempt_1615883368881_0028_m_00_3: Error: 
> org.apache.hudi.org.apache.avro.SchemaParseException: Illegal character in: 
> p,p1,p2 at 
> org.apache.hudi.org.apache.avro.Schema.validateName(Schema.java:1151) at 
> org.apache.hudi.org.apache.avro.Schema.access$200(Schema.java:81) at 
> org.apache.hudi.org.apache.avro.Schema$Field.(Schema.java:403) at 
> org.apache.hudi.org.apache.avro.Schema$Field.(Schema.java:396) at 
> org.apache.hudi.avro.HoodieAvroUtils.appendNullSchemaFields(HoodieAvroUtils.java:268)
>  at 
> org.apache.hudi.hadoop.utils.HoodieRealtimeRecordReaderUtils.addPartitionFields(HoodieRealtimeRecordReaderUtils.java:286)
>  at 
> org.apache.hudi.hadoop.realtime.AbstractRealtimeRecordReader.init(AbstractRealtimeRecordReader.java:98)
>  at 
> org.apache.hudi.hadoop.realtime.AbstractRealtimeRecordReader.(AbstractRealtimeRecordReader.java:67)
>  at 
> org.apache.hudi.hadoop.realtime.RealtimeCompactedRecordReader.(RealtimeCompactedRecordReader.java:53)
>  at 
> org.apache.hudi.hadoop.realtime.HoodieRealtimeRecordReader.constructRecordReader(HoodieRealtimeRecordReader.java:70)
>  at 
> org.apache.hudi.hadoop.realtime.HoodieRealtimeRecordReader.(HoodieRealtimeRecordReader.java:47)
>  at 
> org.apache.hudi.hadoop.realtime.HoodieParquetRealtimeInputFormat.getRecordReader(HoodieParquetRealtimeInputFormat.java:123)
>  at 
> org.apache.hudi.hadoop.hive.HoodieCombineHiveInputFormat$HoodieCombineFileInputFormatShim.getRecordReader(HoodieCombineHiveInputFormat.java:975)
>  at 
> org.apache.hudi.hadoop.hive.HoodieCombineHiveInputFormat.getRecordReader(HoodieCombineHiveInputFormat.java:556)
>  at 
> org.apache.hadoop.mapred.MapTask$TrackedRecordReader.(MapTask.java:175) 
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:444) at 
> org.apache.hadoop.mapred.MapTask.run(MapTask.java:349) at 
> org.apache.hadoop.mapred.YarnChild$1.run(YarnChild.java:183) at 
> java.security.AccessController.doPrivileged(Native Method) at 
> javax.security.auth.Subject.doAs(Subject.java:422) at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1761)
>  at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:177)
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Updated] (HUDI-1720) when query incr view of mor table which has many delete records use sparksql/hive-beeline, StackOverflowError

2021-03-28 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-1720:
--
Labels: pull-request-available sev:critical user-support-issues  (was: 
pull-request-available)

> when query incr view of  mor table which has many delete records use 
> sparksql/hive-beeline,  StackOverflowError
> ---
>
> Key: HUDI-1720
> URL: https://issues.apache.org/jira/browse/HUDI-1720
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Hive Integration, Spark Integration
>Affects Versions: 0.7.0, 0.8.0
>Reporter: tao meng
>Priority: Major
>  Labels: pull-request-available, sev:critical, user-support-issues
> Fix For: 0.9.0
>
>
>  RealtimeCompactedRecordReader.next currently handles delete records by 
> recursion, see:
> [https://github.com/apache/hudi/blob/6e803e08b1328b32a5c3a6acd8168fdabc8a1e50/hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/realtime/RealtimeCompactedRecordReader.java#L106]
> However, when the log file contains many delete records, this recursion in 
> RealtimeCompactedRecordReader.next leads to a StackOverflowError (see the 
> iterative sketch below).
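A minimal sketch of the recursive shape versus an iterative rewrite. This is illustrative only, not the actual code or the patch; isDeleted is a hypothetical stand-in for "this row was deleted by a merged log record":

```java
import java.io.IOException;
import org.apache.hadoop.io.ArrayWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapred.RecordReader;

class DeleteSkippingSketch {
  private final RecordReader<NullWritable, ArrayWritable> parquetReader;

  DeleteSkippingSketch(RecordReader<NullWritable, ArrayWritable> parquetReader) {
    this.parquetReader = parquetReader;
  }

  // Hypothetical: whether the merged log files mark this row as deleted.
  private boolean isDeleted(ArrayWritable value) {
    return false;
  }

  // Recursive form (the shape of the linked code): one stack frame per
  // consecutive deleted record, so ~900,000 deletes overflow the stack.
  boolean nextRecursive(NullWritable key, ArrayWritable value) throws IOException {
    if (!parquetReader.next(key, value)) {
      return false;
    }
    return isDeleted(value) ? nextRecursive(key, value) : true;
  }

  // Iterative form: constant stack depth however long the run of deletes.
  boolean nextIterative(NullWritable key, ArrayWritable value) throws IOException {
    while (parquetReader.next(key, value)) {
      if (!isDeleted(value)) {
        return true; // found a live record
      }
      // deleted record: keep scanning
    }
    return false; // base file exhausted
  }
}
```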
> test step:
> step1:
> val df = spark.range(0, 1000000).toDF("keyid")
>  .withColumn("col3", expr("keyid + 1000"))
>  .withColumn("p", lit(0))
>  .withColumn("p1", lit(0))
>  .withColumn("p2", lit(7))
>  .withColumn("a1", lit(Array[String]("sb1", "rz")))
>  .withColumn("a2", lit(Array[String]("sb1", "rz")))
> // bulk_insert 1,000,000 rows (keyid from 0 to 1,000,000)
> merge(df, 4, "default", "hive_9b", 
> DataSourceWriteOptions.MOR_TABLE_TYPE_OPT_VAL, op = "bulk_insert")
> step2:
> val df = spark.range(0, 900000).toDF("keyid")
>  .withColumn("col3", expr("keyid + 1000"))
>  .withColumn("p", lit(0))
>  .withColumn("p1", lit(0))
>  .withColumn("p2", lit(7))
>  .withColumn("a1", lit(Array[String]("sb1", "rz")))
>  .withColumn("a2", lit(Array[String]("sb1", "rz")))
> // delete 900,000 rows (keyid from 0 to 900,000)
> delete(df, 4, "default", "hive_9b")
> step3:
> query on beeline/spark-sql :  select count(col3)  from hive_9b_rt
> 2021-03-25 15:33:29,029 | INFO  | main | RECORDS_OUT_OPERATOR_RS_3:1, 
> RECORDS_OUT_INTERMEDIATE:1,  | Operator.java:10382021-03-25 15:33:29,029 | 
> INFO  | main | RECORDS_OUT_OPERATOR_RS_3:1, RECORDS_OUT_INTERMEDIATE:1,  | 
> Operator.java:10382021-03-25 15:33:29,029 | ERROR | main | Error running 
> child : java.lang.StackOverflowError at 
> org.apache.parquet.bytes.BytesUtils.readIntLittleEndian(BytesUtils.java:83) 
> at 
> org.apache.parquet.column.values.plain.BinaryPlainValuesReader.readBytes(BinaryPlainValuesReader.java:39)
>  at 
> org.apache.parquet.column.impl.ColumnReaderBase$2$6.read(ColumnReaderBase.java:344)
>  at 
> org.apache.parquet.column.impl.ColumnReaderBase.readValue(ColumnReaderBase.java:503)
>  at 
> org.apache.parquet.column.impl.ColumnReaderImpl.readValue(ColumnReaderImpl.java:30)
>  at 
> org.apache.parquet.column.impl.ColumnReaderBase.writeCurrentValueToConverter(ColumnReaderBase.java:409)
>  at 
> org.apache.parquet.column.impl.ColumnReaderImpl.writeCurrentValueToConverter(ColumnReaderImpl.java:30)
>  at 
> org.apache.parquet.io.RecordReaderImplementation.read(RecordReaderImplementation.java:406)
>  at 
> org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:226)
>  at 
> org.apache.parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:207)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.next(ParquetRecordReaderWrapper.java:159)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.next(ParquetRecordReaderWrapper.java:41)
>  at 
> org.apache.hudi.hadoop.realtime.RealtimeCompactedRecordReader.next(RealtimeCompactedRecordReader.java:84)
>  at 
> org.apache.hudi.hadoop.realtime.RealtimeCompactedRecordReader.next(RealtimeCompactedRecordReader.java:106)
>  at 
> org.apache.hudi.hadoop.realtime.RealtimeCompactedRecordReader.next(RealtimeCompactedRecordReader.java:106)
>  at 
> org.apache.hudi.hadoop.realtime.RealtimeCompactedRecordReader.next(RealtimeCompactedRecordReader.java:106)
>  at 
> org.apache.hudi.hadoop.realtime.RealtimeCompactedRecordReader.next(RealtimeCompactedRecordReader.java:106)
>  at 
> org.apache.hudi.hadoop.realtime.RealtimeCompactedRecordReader.next(RealtimeCompactedRecordReader.java:106)
>  at 
> org.apache.hudi.hadoop.realtime.RealtimeCompactedRecordReader.next(RealtimeCompactedRecordReader.java:106)
>  at 
> org.apache.hudi.hadoop.realtime.RealtimeCompactedRecordReader.next(RealtimeCompactedRecordReader.java:106)
>  at 
> org.apache.hudi.hadoop.realtime.RealtimeCompactedRecordReader.next(RealtimeCompactedRecordReader.java:106)
>  at 
> 

[jira] [Updated] (HUDI-1719) hive on spark/mr,Incremental query of the mor table, the partition field is incorrect

2021-03-28 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-1719:
--
Labels: pull-request-available sev:critical user-support-issues  (was: 
pull-request-available)

> hive on spark/mr,Incremental query of the mor table, the partition field is 
> incorrect
> -
>
> Key: HUDI-1719
> URL: https://issues.apache.org/jira/browse/HUDI-1719
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Hive Integration
>Affects Versions: 0.7.0, 0.8.0
> Environment: spark2.4.5, hadoop 3.1.1, hive 3.1.1
>Reporter: tao meng
>Priority: Major
>  Labels: pull-request-available, sev:critical, user-support-issues
> Fix For: 0.9.0
>
>
> Hudi currently uses HoodieCombineHiveInputFormat for incremental queries of 
> the MOR table.
> When there are small files in different partitions, 
> HoodieCombineHiveInputFormat combines their file readers. It builds the 
> partition fields based on the first file reader it holds, even though the 
> other file readers it holds come from different partitions.
> When switching readers, we should update the ioctx (see the sketch below).
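A sketch of the shape of the fix. This is a hypothetical wrapper, not HoodieCombineRealtimeRecordReader itself; the point is that per-partition state must be refreshed on every reader switch:

```java
import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.mapred.RecordReader;

class SwitchingReaderSketch<K, V> {
  private final Iterator<RecordReader<K, V>> readers;
  private RecordReader<K, V> current;

  SwitchingReaderSketch(Iterator<RecordReader<K, V>> readers) {
    this.readers = readers;
    this.current = readers.next();
  }

  boolean next(K key, V value) throws IOException {
    while (!current.next(key, value)) {
      if (!readers.hasNext()) {
        return false;
      }
      current = readers.next();
      onReaderSwitch(current); // the missing step: refresh the ioctx here
    }
    return true;
  }

  // Hook standing in for "update ioctx": recompute the partition values
  // (p, p1, p2) from the new reader's input path before serving rows,
  // instead of keeping the first reader's values for the whole split.
  void onReaderSwitch(RecordReader<K, V> newReader) {
  }
}
```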
> test env:
> spark2.4.5, hadoop 3.1.1, hive 3.1.1
> test step:
> step1:
> val df = spark.range(0, 1).toDF("keyid")
>  .withColumn("col3", expr("keyid + 1000"))
>  .withColumn("p", lit(0))
>  .withColumn("p1", lit(0))
>  .withColumn("p2", lit(6))
>  .withColumn("a1", lit(Array[String]("sb1", "rz")))
>  .withColumn("a2", lit(Array[String]("sb1", "rz")))
> // create hudi table which has three  level partitions p,p1,p2
> merge(df, 4, "default", "hive_8b", 
> DataSourceWriteOptions.MOR_TABLE_TYPE_OPT_VAL, op = "bulk_insert")
>  
> step2:
> val df = spark.range(0, 1).toDF("keyid")
>  .withColumn("col3", expr("keyid + 1000"))
>  .withColumn("p", lit(0))
>  .withColumn("p1", lit(0))
>  .withColumn("p2", lit(7))
>  .withColumn("a1", lit(Array[String]("sb1", "rz")))
>  .withColumn("a2", lit(Array[String]("sb1", "rz")))
> // upsert current table
> merge(df, 4, "default", "hive_8b", 
> DataSourceWriteOptions.MOR_TABLE_TYPE_OPT_VAL, op = "upsert")
> hive beeline:
> set 
> hive.input.format=org.apache.hudi.hadoop.hive.HoodieCombineHiveInputFormat;
> set hoodie.hive_8b.consume.mode=INCREMENTAL;
> set hoodie.hive_8b.consume.max.commits=3;
> set hoodie.hive_8b.consume.start.timestamp=20210325141300; // this timestamp 
> is smaller than the earliest commit, so we can query all commits
> select `p`, `p1`, `p2`,`keyid` from hive_8b_rt where 
> `_hoodie_commit_time`>'20210325141300' and `keyid` < 5;
> query result:
> +---+----+----+-------+
> | p | p1 | p2 | keyid |
> +---+----+----+-------+
> | 0 | 0  | 6  | 0     |
> | 0 | 0  | 6  | 1     |
> | 0 | 0  | 6  | 2     |
> | 0 | 0  | 6  | 3     |
> | 0 | 0  | 6  | 4     |
> | 0 | 0  | 6  | 4     |
> | 0 | 0  | 6  | 0     |
> | 0 | 0  | 6  | 3     |
> | 0 | 0  | 6  | 2     |
> | 0 | 0  | 6  | 1     |
> +---+----+----+-------+
> This result is wrong: in the second step we inserted new data with p2=7 into 
> the table, yet the query result contains no rows with p2=7; every row has p2=6.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1721) run_sync_tool support hive3.1.2 on hadoop3.1.4

2021-03-28 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-1721:
--
Labels: pull-request-available sev:critical user-support-issues  (was: 
pull-request-available)

> run_sync_tool support hive3.1.2 on hadoop3.1.4 
> ---
>
> Key: HUDI-1721
> URL: https://issues.apache.org/jira/browse/HUDI-1721
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Hive Integration
>Affects Versions: 0.8.0
>Reporter: 谢波
>Priority: Major
>  Labels: pull-request-available, sev:critical, user-support-issues
>
> [https://github.com/apache/hudi/issues/2717]
> run_sync_tool should support Hive 3



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1722) hive beeline/spark-sql query specified field on mor table occur NPE

2021-03-28 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-1722:
--
Labels: pull-request-available sev:high user-support-issues  (was: 
pull-request-available)

> hive beeline/spark-sql  query specified field on mor table occur NPE
> 
>
> Key: HUDI-1722
> URL: https://issues.apache.org/jira/browse/HUDI-1722
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Hive Integration, Spark Integration
>Affects Versions: 0.7.0
> Environment: spark2.4.5, hadoop3.1.1, hive 3.1.1
>Reporter: tao meng
>Priority: Major
>  Labels: pull-request-available, sev:high, user-support-issues
> Fix For: 0.9.0
>
>
> HUDI-892 introduced this problem.
>  That PR skips adding projection columns if there are no log files in the 
> hoodieRealtimeSplit, but it does not consider that multiple getRecordReaders 
> share the same jobConf.
>  Consider the following scenario:
>  we have four getRecordReaders: 
>  reader1 (its hoodieRealtimeSplit contains no log files)
>  reader2 (its hoodieRealtimeSplit contains log files)
>  reader3 (its hoodieRealtimeSplit contains log files)
>  reader4 (its hoodieRealtimeSplit contains no log files)
> If reader1 runs first, HoodieInputFormatUtils.HOODIE_READ_COLUMNS_PROP in the 
> jobConf is set to true, and no additional Hoodie projection columns are added 
> to the jobConf (see HoodieParquetRealtimeInputFormat.addProjectionToJobConf).
> When reader2 runs later, since HoodieInputFormatUtils.HOODIE_READ_COLUMNS_PROP 
> in the jobConf is already true, no additional Hoodie projection columns are 
> added either (see HoodieParquetRealtimeInputFormat.addProjectionToJobConf), 
> which means _hoodie_record_key is missing and the merge step throws 
> exceptions.
> 2021-03-25 20:23:14,014 | INFO  | AsyncDispatcher event handler | Diagnostics 
> report from attempt_1615883368881_0038_m_00_0: Error: 
> java.lang.NullPointerException2021-03-25 20:23:14,014 | INFO  | 
> AsyncDispatcher event handler | Diagnostics report from 
> attempt_1615883368881_0038_m_00_0: Error: java.lang.NullPointerException 
> at 
> org.apache.hudi.hadoop.realtime.RealtimeCompactedRecordReader.next(RealtimeCompactedRecordReader.java:101)
>  at 
> org.apache.hudi.hadoop.realtime.RealtimeCompactedRecordReader.next(RealtimeCompactedRecordReader.java:43)
>  at 
> org.apache.hudi.hadoop.realtime.HoodieRealtimeRecordReader.next(HoodieRealtimeRecordReader.java:79)
>  at 
> org.apache.hudi.hadoop.realtime.HoodieRealtimeRecordReader.next(HoodieRealtimeRecordReader.java:36)
>  at 
> org.apache.hudi.hadoop.realtime.RealtimeCompactedRecordReader.next(RealtimeCompactedRecordReader.java:92)
>  at 
> org.apache.hudi.hadoop.realtime.RealtimeCompactedRecordReader.next(RealtimeCompactedRecordReader.java:43)
>  at 
> org.apache.hudi.hadoop.realtime.HoodieRealtimeRecordReader.next(HoodieRealtimeRecordReader.java:79)
>  at 
> org.apache.hudi.hadoop.realtime.HoodieCombineRealtimeRecordReader.next(HoodieCombineRealtimeRecordReader.java:68)
>  at 
> org.apache.hudi.hadoop.realtime.HoodieCombineRealtimeRecordReader.next(HoodieCombineRealtimeRecordReader.java:77)
>  at 
> org.apache.hudi.hadoop.realtime.HoodieCombineRealtimeRecordReader.next(HoodieCombineRealtimeRecordReader.java:42)
>  at 
> org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:205)
>  at 
> org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:191) 
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:52) at 
> org.apache.hadoop.hive.ql.exec.mr.ExecMapRunner.run(ExecMapRunner.java:37) at 
> org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:465) at 
> org.apache.hadoop.mapred.MapTask.run(MapTask.java:349) at 
> org.apache.hadoop.mapred.YarnChild$1.run(YarnChild.java:183) at 
> java.security.AccessController.doPrivileged(Native Method) at 
> javax.security.auth.Subject.doAs(Subject.java:422) at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1761)
>  at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:177)
>  
> Obviously, this is an intermittent problem: if reader2 runs first, the 
> additional Hoodie projection columns are added to the jobConf and the query 
> works.
> Spark SQL can avoid this problem by setting spark.hadoop.cloneConf=true, which 
> is not recommended in Spark; Hive, however, has no way to avoid it (see the 
> sketch below).
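A sketch of the race on the shared jobConf. This is illustrative only; the property constant is the HoodieInputFormatUtils.HOODIE_READ_COLUMNS_PROP mentioned above, but its string value here is a placeholder:

```java
import org.apache.hadoop.mapred.JobConf;

public class SharedJobConfSketch {

  // Placeholder value; the real constant lives in HoodieInputFormatUtils.
  private static final String HOODIE_READ_COLUMNS_PROP = "hoodie.read.columns.set";

  // Whichever reader runs first freezes the projection for every later
  // reader of the same task, because they all share one JobConf.
  static synchronized void addProjectionToJobConf(boolean splitHasLogFiles, JobConf jobConf) {
    if (jobConf.getBoolean(HOODIE_READ_COLUMNS_PROP, false)) {
      return; // reader1 (no log files) got here first: meta columns never added
    }
    if (splitHasLogFiles) {
      // would append _hoodie_record_key and friends to the read-columns lists
    }
    jobConf.setBoolean(HOODIE_READ_COLUMNS_PROP, true);
  }
}
```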
>  test step:
> step1:
> val df = spark.range(0, 10).toDF("keyid")
>  .withColumn("col3", expr("keyid"))
>  .withColumn("p", lit(0))
>  .withColumn("p1", lit(0))
>  .withColumn("p2", lit(7))
>  .withColumn("a1", lit(Array[String] ("sb1", "rz")))
>  .withColumn("a2", lit(Array[String] ("sb1", "rz")))
> // create 

[jira] [Updated] (HUDI-1723) DFSPathSelector skips files with the same modify date when read up to source limit

2021-03-28 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-1723:
--
Labels: sev:critical user-support-issues  (was: )

> DFSPathSelector skips files with the same modify date when read up to source 
> limit
> --
>
> Key: HUDI-1723
> URL: https://issues.apache.org/jira/browse/HUDI-1723
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: DeltaStreamer
>Reporter: Raymond Xu
>Priority: Critical
>  Labels: sev:critical, user-support-issues
> Fix For: 0.9.0
>
> Attachments: Screen Shot 2021-03-26 at 1.42.42 AM.png
>
>
> org.apache.hudi.utilities.sources.helpers.DFSPathSelector#listEligibleFiles 
> filters the input files based on the last saved checkpoint, which is the 
> modification time of the last file read. However, multiple files can share 
> that modification time, so some of them get skipped when reading up to the 
> source limit. An illustration is shown in the attached picture, and a sketch 
> of the failure mode follows below.
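A sketch of the failure mode. This is not the Hudi source; the names and the byte-budget loop are illustrative. Suppose files f1..f5 all share modification time T and the source limit only admits f1..f3; the saved checkpoint becomes T, so the strict ">" filter drops f4 and f5 on every later round:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import org.apache.hadoop.fs.FileStatus;

public class CheckpointSkipSketch {

  static List<FileStatus> eligibleFiles(List<FileStatus> files, long lastCheckpointTime,
                                        long sourceLimitBytes) {
    List<FileStatus> sorted = files.stream()
        .filter(f -> f.getModificationTime() > lastCheckpointTime) // strict ">": the bug
        .sorted(Comparator.comparingLong(FileStatus::getModificationTime))
        .collect(Collectors.toList());

    List<FileStatus> picked = new ArrayList<>();
    long bytes = 0;
    for (FileStatus f : sorted) {
      if (!picked.isEmpty() && bytes + f.getLen() > sourceLimitBytes) {
        break; // stop mid-batch: files after this point stay unread
      }
      picked.add(f);
      bytes += f.getLen();
    }
    // The new checkpoint is the max modification time of `picked`; unread
    // files that share that exact timestamp fail the ">" filter next round.
    return picked;
  }
}
```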



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1724) run_sync_tool support for hive3.1.2 on hadoop3.1.4

2021-03-28 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-1724:
--
Labels: sev:high user-support-issues  (was: )

> run_sync_tool support for hive3.1.2 on hadoop3.1.4
> --
>
> Key: HUDI-1724
> URL: https://issues.apache.org/jira/browse/HUDI-1724
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Hive Integration
>Reporter: Balaji Varadarajan
>Priority: Major
>  Labels: sev:high, user-support-issues
>
> Context: https://github.com/apache/hudi/issues/2717



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (HUDI-1729) Asynchronous Hive sync and commits cleaning for Flink writer

2021-03-28 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang closed HUDI-1729.
--
Resolution: Done

d415d45416707ca4d5b1dbad65dc80e6fccfa378

> Asynchronous Hive sync and commits cleaning for Flink writer
> 
>
> Key: HUDI-1729
> URL: https://issues.apache.org/jira/browse/HUDI-1729
> Project: Apache Hudi
>  Issue Type: New Feature
>  Components: Flink Integration
>Reporter: Danny Chen
>Assignee: Danny Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> - Asynchronous Hive sync
> - Asynchronous commits cleaning with pluggable strategies



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[hudi] branch master updated (ecbd389 -> d415d45)

2021-03-28 Thread vinoyang
This is an automated email from the ASF dual-hosted git repository.

vinoyang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git.


from ecbd389  [HUDI-1478] Introduce HoodieBloomIndex to hudi-java-client 
(#2608)
 add d415d45  [HUDI-1729] Asynchronous Hive sync and commits cleaning for 
Flink writer (#2732)

No new revisions were added by this update.

Summary of changes:
 .../apache/hudi/client/HoodieFlinkWriteClient.java |  91 +---
 hudi-flink/pom.xml |  10 ++
 .../apache/hudi/configuration/FlinkOptions.java| 117 -
 .../java/org/apache/hudi/sink/CleanFunction.java   |  89 
 .../hudi/sink/StreamWriteOperatorCoordinator.java  |  47 -
 .../hudi/sink/compact/CompactionCommitSink.java|   8 +-
 .../sink/partitioner/BucketAssignFunction.java |  17 ++-
 .../apache/hudi/sink/utils/HiveSyncContext.java|  89 
 .../apache/hudi/sink/utils/NonThrownExecutor.java  |  78 ++
 .../hudi/streamer/HoodieFlinkStreamerV2.java   |  12 ++-
 .../org/apache/hudi/table/HoodieTableSink.java |   9 +-
 .../java/org/apache/hudi/util/StreamerUtil.java|  22 
 .../org/apache/hudi/sink/StreamWriteITCase.java|  15 ++-
 .../sink/TestStreamWriteOperatorCoordinator.java   |  32 +-
 .../org/apache/hudi/sink/TestWriteCopyOnWrite.java |   4 +-
 .../apache/hudi/table/HoodieDataSourceITCase.java  |  35 +-
 .../org/apache/hudi/utils/TestConfigurations.java  |  11 +-
 .../test/java/org/apache/hudi/utils/TestData.java  |   2 +-
 .../utils/factory/ContinuousFileSourceFactory.java |  11 +-
 .../hudi/utils/source/ContinuousFileSource.java|   4 +-
 .../{test_source2.data => test_source_2.data}  |   0
 hudi-flink/src/test/resources/test_source_3.data   |   8 ++
 packaging/hudi-flink-bundle/pom.xml|  34 ++
 23 files changed, 704 insertions(+), 41 deletions(-)
 create mode 100644 
hudi-flink/src/main/java/org/apache/hudi/sink/CleanFunction.java
 create mode 100644 
hudi-flink/src/main/java/org/apache/hudi/sink/utils/HiveSyncContext.java
 create mode 100644 
hudi-flink/src/main/java/org/apache/hudi/sink/utils/NonThrownExecutor.java
 rename hudi-flink/src/test/resources/{test_source2.data => test_source_2.data} 
(100%)
 create mode 100644 hudi-flink/src/test/resources/test_source_3.data


[GitHub] [hudi] yanghua merged pull request #2732: [HUDI-1729] Asynchronous Hive sync and commits cleaning for Flink writer

2021-03-28 Thread GitBox


yanghua merged pull request #2732:
URL: https://github.com/apache/hudi/pull/2732


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] stayrascal closed issue #2712: [SUPPORT] May I ask how to delete data by Flink SQL

2021-03-28 Thread GitBox


stayrascal closed issue #2712:
URL: https://github.com/apache/hudi/issues/2712


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] stayrascal commented on issue #2712: [SUPPORT] May I ask how to delete data by Flink SQL

2021-03-28 Thread GitBox


stayrascal commented on issue #2712:
URL: https://github.com/apache/hudi/issues/2712#issuecomment-809011948


   @danny0405 thanks for your reply, got it.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




svn commit: r46766 - in /dev/hudi/hudi-0.8.0-rc1: ./ hudi-0.8.0-rc1.src.tgz hudi-0.8.0-rc1.src.tgz.asc hudi-0.8.0-rc1.src.tgz.sha512

2021-03-28 Thread garyli
Author: garyli
Date: Sun Mar 28 15:45:52 2021
New Revision: 46766

Log:
Hudi 0.8.0 RC1

Added:
dev/hudi/hudi-0.8.0-rc1/
dev/hudi/hudi-0.8.0-rc1/hudi-0.8.0-rc1.src.tgz   (with props)
dev/hudi/hudi-0.8.0-rc1/hudi-0.8.0-rc1.src.tgz.asc
dev/hudi/hudi-0.8.0-rc1/hudi-0.8.0-rc1.src.tgz.sha512

Added: dev/hudi/hudi-0.8.0-rc1/hudi-0.8.0-rc1.src.tgz
==
Binary file - no diff available.

Propchange: dev/hudi/hudi-0.8.0-rc1/hudi-0.8.0-rc1.src.tgz
--
svn:mime-type = application/octet-stream

Added: dev/hudi/hudi-0.8.0-rc1/hudi-0.8.0-rc1.src.tgz.asc
==
--- dev/hudi/hudi-0.8.0-rc1/hudi-0.8.0-rc1.src.tgz.asc (added)
+++ dev/hudi/hudi-0.8.0-rc1/hudi-0.8.0-rc1.src.tgz.asc Sun Mar 28 15:45:52 2021
@@ -0,0 +1,16 @@
+-BEGIN PGP SIGNATURE-
+
+iQIzBAABCAAdFiEE4qlxTg+6Ogh73uZV5yhz12XWxAYFAmBgkk0ACgkQ5yhz12XW
+xAZ3rhAAh8wVotQqFOhXMeIF28i9i/LDgMYRTl3pEIwDOJ8rYs5n3fowfBaH5mBA
+P87zJKg2O7mW4IYGQE8kXOJgQTHdMblTWNOkexxOYh8ewmXDurh20IF5deaHMxif
+Qe8u3RLkWQ+B4SpkiulvCvNIgGw6nNtHj+Cmm5jlMYGskrLoNrWbH18LraVTZ0g/
+vlW+m8C9sDniIUqZ+z9s/Led4TFz2JsF6Xl/tROrelBIiAmbawIF6SeCGARelDIx
+gXSDu9B+jHKVLI+tuktWj9QnXIb+FFbI9eZbqZe2pE4hH0hZGF1JSgEOIXatoFCq
+7MilR4uP+N/nZfScnWUH/RIfRULAtUiQjBN61RQVDm51nH2KQEtGoEQJAKuhzzHq
+oza15QdKnTVQsnH+DiF4Te7Okq+WjQhMjhvBCQhKSevNITQdyW0ERyfdIxPSwNPG
+/cc3kWjldRBsEAJuz2ko9fo9Y2RRztsvs4mphK2Zk/10tpLn+1RWucw5Cjldn8hf
+Wo1lgOhtf3yW5/A17+u9NsznOERHmVTxrPLvco0h/dsOEpPUhKK7xYeVmoc5Zq5o
+mmSdBYK+1e3GmGi9DRBvNbSW8bopzNilyJNEmqRWaoQ/fz5D9dj9DQTrB58J5KeB
+9bceLxUau8+9g8t+gFrDdZNGVA87yy8T9z6q/WxMB57bfZvbcx4=
+=24Q/
+-END PGP SIGNATURE-

Added: dev/hudi/hudi-0.8.0-rc1/hudi-0.8.0-rc1.src.tgz.sha512
==
--- dev/hudi/hudi-0.8.0-rc1/hudi-0.8.0-rc1.src.tgz.sha512 (added)
+++ dev/hudi/hudi-0.8.0-rc1/hudi-0.8.0-rc1.src.tgz.sha512 Sun Mar 28 15:45:52 
2021
@@ -0,0 +1 @@
+0a809ab2bd26b642e03a6fc1338c23e6cd4e09aa7264f2028572bf770465e9bb9cc68b936bb9ca656834e0f074de023bbb46602543fca055387feeafe4cd2b57
  hudi-0.8.0-rc1.src.tgz




[hudi] annotated tag release-0.8.0-rc1 updated (0921272 -> fc8cfc3)

2021-03-28 Thread garyli
This is an automated email from the ASF dual-hosted git repository.

garyli pushed a change to annotated tag release-0.8.0-rc1
in repository https://gitbox.apache.org/repos/asf/hudi.git.


*** WARNING: tag release-0.8.0-rc1 was modified! ***

from 0921272  (commit)
  to fc8cfc3  (tag)
 tagging 0921272442b424000f39fea73c5abc30037d67f1 (commit)
 replaces hoodie-0.4.7
  by garyli1019
  on Sun Mar 28 22:32:58 2021 +0800

- Log -
0.8.0
-BEGIN PGP SIGNATURE-

iQIzBAABCAAdFiEE4qlxTg+6Ogh73uZV5yhz12XWxAYFAmBgk5oACgkQ5yhz12XW
xAaqUxAAtBaY1Us2Wymn3z14HVYSQipW5RvLYrZ8hsQo+ID2cPZRdkcl3pxTW0l6
XZaZtVERxkn5n4Cw0m5ETHzuWLX+VYLd0A5jR3n/cGUWB2mBnlVe1+aMDQPm60rJ
kOsE3SagfWiAxNvtkmQnT7nZrDEeIhwKFbRnaODrmL5gOH2JKbu+Tka4JH17FoZw
DCuw3t4jN14h6iwY3m+EasFrf4ZkhSgpK6/pEN7zx2Qz49FNH8iVM/JJMM7A4Geg
iHM/9O0bgBwXa3GuEOekVtXcHBlQSuJvyu0XOELT45Kw9/+HJhQrQbRyAQl7GGd8
nn34xkravNg9rxp2fEmfQsA43+EtY8rtGrEmoDUe28CUyCwa0V4k9JZwWYeK0VqE
lpzKFq5gDos9h78nHdyrS+WrajuPXk0N+49Miv++A0pHmWK7ByBf9n5Z2ax77Dkp
jbKpjc2a55WF15lRS4QI4aCN2awRJevlHv3fOoCSrubSkgHe6yWhI8YFZ+z9CFuh
h3ausKh0Ac5LuQaaxLAHOitlURDwxCLcKc8T71ttOFeb7s4QxkxgC46yQOt+d+5V
Av6rhLNPO4NA486CpdmknVIIg1ODqM72XzrlYYWYxobWOI+wnK40GFbiWwfSl7QO
cVAHfm/CGWr09X/+SH88r4wJbvMr5Nsx5+NBp2DZeXSaJqh0ZFY=
=69re
-END PGP SIGNATURE-
---


No new revisions were added by this update.

Summary of changes:


[GitHub] [hudi] codecov-io edited a comment on pull request #2732: [HUDI-1729] Asynchronous Hive sync and commits cleaning for Flink writer

2021-03-28 Thread GitBox


codecov-io edited a comment on pull request #2732:
URL: https://github.com/apache/hudi/pull/2732#issuecomment-808716532


   # [Codecov](https://codecov.io/gh/apache/hudi/pull/2732?src=pr=h1) Report
   > Merging 
[#2732](https://codecov.io/gh/apache/hudi/pull/2732?src=pr=desc) (c96b2cc) 
into 
[master](https://codecov.io/gh/apache/hudi/commit/8b774fe3313757a8b94ca408079327c62b4a664c?el=desc)
 (8b774fe) will **increase** coverage by `18.00%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/2732/graphs/tree.svg?width=650=150=pr=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2732?src=pr=tree)
   
   ```diff
   @@              Coverage Diff              @@
   ##             master    #2732       +/-   ##
   =============================================
   + Coverage     51.73%   69.73%   +18.00%
   + Complexity     3606      371     -3235
   =============================================
     Files           476       54      -422
     Lines         22611     1989    -20622
     Branches       2410      236     -2174
   =============================================
   - Hits          11697     1387    -10310
   + Misses         9891      471     -9420
   + Partials       1023      131      -892
   ```
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | hudicli | `?` | `?` | |
   | hudiclient | `?` | `?` | |
   | hudicommon | `?` | `?` | |
   | hudiflink | `?` | `?` | |
   | hudihadoopmr | `?` | `?` | |
   | hudisparkdatasource | `?` | `?` | |
   | hudisync | `?` | `?` | |
   | huditimelineservice | `?` | `?` | |
   | hudiutilities | `69.73% <ø> (ø)` | `0.00 <ø> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/2732?src=pr=tree) | Coverage Δ 
| Complexity Δ | |
   |---|---|---|---|
   | 
[...org/apache/hudi/hadoop/HoodieHFileInputFormat.java](https://codecov.io/gh/apache/hudi/pull/2732/diff?src=pr=tree#diff-aHVkaS1oYWRvb3AtbXIvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaGFkb29wL0hvb2RpZUhGaWxlSW5wdXRGb3JtYXQuamF2YQ==)
 | | | |
   | 
[...org/apache/hudi/HoodieDatasetBulkInsertHelper.java](https://codecov.io/gh/apache/hudi/pull/2732/diff?src=pr=tree#diff-aHVkaS1zcGFyay1kYXRhc291cmNlL2h1ZGktc3Bhcmsvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvSG9vZGllRGF0YXNldEJ1bGtJbnNlcnRIZWxwZXIuamF2YQ==)
 | | | |
   | 
[...che/hudi/common/table/timeline/dto/InstantDTO.java](https://codecov.io/gh/apache/hudi/pull/2732/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3RpbWVsaW5lL2R0by9JbnN0YW50RFRPLmphdmE=)
 | | | |
   | 
[...java/org/apache/hudi/table/HoodieTableFactory.java](https://codecov.io/gh/apache/hudi/pull/2732/diff?src=pr=tree#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS90YWJsZS9Ib29kaWVUYWJsZUZhY3RvcnkuamF2YQ==)
 | | | |
   | 
[...g/apache/hudi/sink/StreamWriteOperatorFactory.java](https://codecov.io/gh/apache/hudi/pull/2732/diff?src=pr=tree#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9zaW5rL1N0cmVhbVdyaXRlT3BlcmF0b3JGYWN0b3J5LmphdmE=)
 | | | |
   | 
[...in/scala/org/apache/hudi/HoodieStreamingSink.scala](https://codecov.io/gh/apache/hudi/pull/2732/diff?src=pr=tree#diff-aHVkaS1zcGFyay1kYXRhc291cmNlL2h1ZGktc3Bhcmsvc3JjL21haW4vc2NhbGEvb3JnL2FwYWNoZS9odWRpL0hvb2RpZVN0cmVhbWluZ1Npbmsuc2NhbGE=)
 | | | |
   | 
[.../apache/hudi/common/model/HoodieRecordPayload.java](https://codecov.io/gh/apache/hudi/pull/2732/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL21vZGVsL0hvb2RpZVJlY29yZFBheWxvYWQuamF2YQ==)
 | | | |
   | 
[...udi/timeline/service/handlers/TimelineHandler.java](https://codecov.io/gh/apache/hudi/pull/2732/diff?src=pr=tree#diff-aHVkaS10aW1lbGluZS1zZXJ2aWNlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL3RpbWVsaW5lL3NlcnZpY2UvaGFuZGxlcnMvVGltZWxpbmVIYW5kbGVyLmphdmE=)
 | | | |
   | 
[.../common/table/view/RocksDbBasedFileSystemView.java](https://codecov.io/gh/apache/hudi/pull/2732/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3ZpZXcvUm9ja3NEYkJhc2VkRmlsZVN5c3RlbVZpZXcuamF2YQ==)
 | | | |
   | 
[...va/org/apache/hudi/table/format/FilePathUtils.java](https://codecov.io/gh/apache/hudi/pull/2732/diff?src=pr=tree#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS90YWJsZS9mb3JtYXQvRmlsZVBhdGhVdGlscy5qYXZh)
 | | | |
   | ... and [405 
more](https://codecov.io/gh/apache/hudi/pull/2732/diff?src=pr=tree-more) | |
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[hudi] branch release-0.8.0 updated: [HOTFIX] fix deploy staging jars script

2021-03-28 Thread garyli
This is an automated email from the ASF dual-hosted git repository.

garyli pushed a commit to branch release-0.8.0
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/release-0.8.0 by this push:
 new 0921272  [HOTFIX] fix deploy staging jars script
0921272 is described below

commit 0921272442b424000f39fea73c5abc30037d67f1
Author: garyli1019 
AuthorDate: Sun Mar 28 22:10:00 2021 +0800

[HOTFIX] fix deploy staging jars script
---
 scripts/release/deploy_staging_jars.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/scripts/release/deploy_staging_jars.sh 
b/scripts/release/deploy_staging_jars.sh
index b2c5bf3..4bd9158 100755
--- a/scripts/release/deploy_staging_jars.sh
+++ b/scripts/release/deploy_staging_jars.sh
@@ -45,7 +45,7 @@ else
if [[ $param =~ --scala_version\=(2\.1[1-2]) ]]; then
SCALA_VERSION=${BASH_REMATCH[1]}
   elif [[ $param =~ --spark_version\=([2-3]) ]]; then
-  SPARK_VERSION=${BASH_REMATCH[0]}
+  SPARK_VERSION=${BASH_REMATCH[1]}
fi
 done
 fi
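
The one-character fix above swaps the whole regex match (`BASH_REMATCH[0]`, e.g.
`--spark_version=3`) for the first capture group (`BASH_REMATCH[1]`, e.g. `3`),
so SPARK_VERSION now holds just the version digit instead of the entire flag.
The same whole-match vs. capture-group distinction, sketched in Java's regex API
for illustration (the CaptureGroupDemo class is a hypothetical demo, not part of
this commit; the script itself is bash):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CaptureGroupDemo {
  public static void main(String[] args) {
    // Same shape as the script's --spark_version\=([2-3]) test.
    Matcher m = Pattern.compile("--spark_version=([2-3])")
                       .matcher("--spark_version=3");
    if (m.matches()) {
      System.out.println(m.group(0)); // whole match: "--spark_version=3" (the old bug)
      System.out.println(m.group(1)); // first capture group: "3" (the fix)
    }
  }
}
```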


[hudi] annotated tag release-0.8.0-rc1 updated (8cc8c0b -> 6cce34d)

2021-03-28 Thread garyli
This is an automated email from the ASF dual-hosted git repository.

garyli pushed a change to annotated tag release-0.8.0-rc1
in repository https://gitbox.apache.org/repos/asf/hudi.git.


*** WARNING: tag release-0.8.0-rc1 was modified! ***

from 8cc8c0b  (commit)
  to 6cce34d  (tag)
 tagging 8cc8c0b743840e1b365e8625202fed0a53f66621 (commit)
 replaces hoodie-0.4.7
  by garyli1019
  on Sun Mar 28 20:17:37 2021 +0800

- Log -
0.8.0
-BEGIN PGP SIGNATURE-

iQIzBAABCAAdFiEE4qlxTg+6Ogh73uZV5yhz12XWxAYFAmBgc+EACgkQ5yhz12XW
xAZBwQ//eUP+D5ACaR3qcuSevELMOFQlOB0FY8Sq+9vLzzwpN87pEwuY6pSMYZBN
xBcijCc2dUkaiOot5IRW85UVoRtEsvWpDZxxFBrsHpAq+UrjQChzNJa5nLTzWLzG
tLO9uFUC44N/KEjWn2lxGKH4KGViC6xD6Ltm1+656QuDk5B43AE6hYWmhM5xyhdo
EtoFrwKiCxW99NnipGHH8XLk9ZtOBL2D0cD6GfUXn2fIpV3YIzsFRP2j97wSwoJr
8IsEjLtPObQMBAEB8SCZWjTGNroWhf2aaAHrh8MNCBFcnTdUl8qsDecyGzBM5MFJ
LG69x0W4rwmYj0V2pZjQ6Ys7ueuqw44RNU/+b99D+i9D15F/cDwbPKw4vf0vZg06
FcU454R7hLn75or/AY2KYg0PILNLiys+gTzkWc4rtixevDFMiAxxlo9KE4i7kr+d
9mf0y272nUcRDbF4t6nBqjU4Wy+x9SGwvXmqBUM7eX5c6wetKYITMSlsfwYf682P
50sHKznlcAORUfJ1vQSxc6hO/DxN2iN9UQtCLRn1cVAcG25Js6kFJp0HjBrZMz3Q
K5L0+xfO+L35y5FjRY58HMR2Vt7tZPBO7+M+5kHMVIR4IKSxDNrgAEYYgvU74XI9
M9km5kzsmZ/F+VpmUwaA/8MQIxty9ipuAhypFj8AyHVVaVCfMk8=
=/wbj
-END PGP SIGNATURE-
---


No new revisions were added by this update.

Summary of changes:


[GitHub] [hudi] leesf merged pull request #2608: [HUDI-1478] Introduce HoodieBloomIndex to hudi-java-client

2021-03-28 Thread GitBox


leesf merged pull request #2608:
URL: https://github.com/apache/hudi/pull/2608


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[hudi] branch master updated: [HUDI-1478] Introduce HoodieBloomIndex to hudi-java-client (#2608)

2021-03-28 Thread leesf
This is an automated email from the ASF dual-hosted git repository.

leesf pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new ecbd389  [HUDI-1478] Introduce HoodieBloomIndex to hudi-java-client 
(#2608)
ecbd389 is described below

commit ecbd389a3f9215e219cb19b8641f2faea4fa3ad7
Author: Shen Hong 
AuthorDate: Sun Mar 28 20:28:40 2021 +0800

[HUDI-1478] Introduce HoodieBloomIndex to hudi-java-client (#2608)
---
 .../hudi/index/bloom/HoodieBaseBloomIndex.java}|  26 +--
 .../bloom/HoodieBaseBloomIndexCheckFunction.java}  |   4 +-
 .../org/apache/hudi/index/FlinkHoodieIndex.java|   2 +-
 .../hudi/index/bloom/FlinkHoodieBloomIndex.java| 235 +
 .../apache/hudi/index/JavaHoodieBloomIndex.java|  32 +++
 .../org/apache/hudi/index/JavaHoodieIndex.java |   4 +-
 6 files changed, 50 insertions(+), 253 deletions(-)

diff --git 
a/hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/index/bloom/FlinkHoodieBloomIndex.java
 
b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/bloom/HoodieBaseBloomIndex.java
similarity index 92%
copy from 
hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/index/bloom/FlinkHoodieBloomIndex.java
copy to 
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/bloom/HoodieBaseBloomIndex.java
index 255a66b..75ab693 100644
--- 
a/hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/index/bloom/FlinkHoodieBloomIndex.java
+++ 
b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/bloom/HoodieBaseBloomIndex.java
@@ -18,6 +18,7 @@
 
 package org.apache.hudi.index.bloom;
 
+import com.beust.jcommander.internal.Lists;
 import org.apache.hudi.client.WriteStatus;
 import org.apache.hudi.common.engine.HoodieEngineContext;
 import org.apache.hudi.common.model.HoodieKey;
@@ -28,15 +29,13 @@ import org.apache.hudi.common.util.Option;
 import org.apache.hudi.common.util.collection.Pair;
 import org.apache.hudi.config.HoodieWriteConfig;
 import org.apache.hudi.exception.MetadataNotFoundException;
-import org.apache.hudi.index.FlinkHoodieIndex;
+import org.apache.hudi.index.HoodieIndex;
 import org.apache.hudi.index.HoodieIndexUtils;
 import org.apache.hudi.io.HoodieKeyLookupHandle;
 import org.apache.hudi.io.HoodieRangeInfoHandle;
 import org.apache.hudi.table.HoodieTable;
-
 import org.apache.log4j.LogManager;
 import org.apache.log4j.Logger;
-import com.beust.jcommander.internal.Lists;
 
 import java.util.ArrayList;
 import java.util.HashMap;
@@ -44,20 +43,17 @@ import java.util.Iterator;
 import java.util.List;
 import java.util.Map;
 
-import static java.util.stream.Collectors.mapping;
 import static java.util.stream.Collectors.groupingBy;
+import static java.util.stream.Collectors.mapping;
 import static java.util.stream.Collectors.toList;
 import static 
org.apache.hudi.index.HoodieIndexUtils.getLatestBaseFilesForAllPartitions;
 
-/**
- * Indexing mechanism based on bloom filter. Each parquet file includes its 
row_key bloom filter in its metadata.
- */
 @SuppressWarnings("checkstyle:LineLength")
-public class FlinkHoodieBloomIndex<T extends HoodieRecordPayload> extends FlinkHoodieIndex<T> {
+public class HoodieBaseBloomIndex<T extends HoodieRecordPayload> extends HoodieIndex<T, List<HoodieRecord<T>>, List<HoodieKey>, List<WriteStatus>> {
 
-  private static final Logger LOG = 
LogManager.getLogger(FlinkHoodieBloomIndex.class);
+  private static final Logger LOG = 
LogManager.getLogger(HoodieBaseBloomIndex.class);
 
-  public FlinkHoodieBloomIndex(HoodieWriteConfig config) {
+  public HoodieBaseBloomIndex(HoodieWriteConfig config) {
 super(config);
   }
 
@@ -112,7 +108,7 @@ public class FlinkHoodieBloomIndex extends FlinkH
 // Step 3: Obtain a List, for each incoming record, that already exists, 
with the file id,
 // that contains it.
 List> fileComparisons =
-explodeRecordsWithFileComparisons(partitionToFileInfo, 
partitionRecordKeyMap);
+explodeRecordsWithFileComparisons(partitionToFileInfo, 
partitionRecordKeyMap);
 return findMatchingFilesForRecordKeys(fileComparisons, hoodieTable);
   }
 
@@ -121,7 +117,7 @@ public class FlinkHoodieBloomIndex extends FlinkH
*/
   //TODO duplicate code with spark, we can optimize this method later
   List> loadInvolvedFiles(List 
partitions, final HoodieEngineContext context,
- final HoodieTable 
hoodieTable) {
+   final HoodieTable 
hoodieTable) {
 // Obtain the latest data files from all the partitions.
 List> partitionPathFileIDList = 
getLatestBaseFilesForAllPartitions(partitions, context, hoodieTable).stream()
 .map(pair -> Pair.of(pair.getKey(), pair.getValue().getFileId()))
@@ -197,7 +193,7 @@ public class FlinkHoodieBloomIndex extends FlinkH
   hoodieRecordKeys.forEach(hoodieRecordKey -> {
 indexFileFilter.getMatchingFilesAndPartition(partitionPath, 
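
The Javadoc removed in the hunk above captures the mechanism this class
implements: each base file embeds a bloom filter over its record keys, so the
index can rule out files that definitely do not contain a given key and only
probe the remaining candidates. A minimal conceptual sketch of that pruning
step, using hypothetical names (BloomPruningDemo, with a Predicate standing in
for the per-file bloom filter) rather than the Hudi classes in the diff:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

public class BloomPruningDemo {
  // Keep only the files whose bloom filter says the key "might" be present.
  // A bloom filter can yield false positives but never false negatives, so a
  // skipped file is guaranteed not to hold the record key; surviving files
  // still need a real key lookup.
  static List<String> candidateFiles(String recordKey, List<String> files,
                                     Map<String, Predicate<String>> bloomByFile) {
    List<String> candidates = new ArrayList<>();
    for (String file : files) {
      if (bloomByFile.get(file).test(recordKey)) {
        candidates.add(file);
      }
    }
    return candidates;
  }

  public static void main(String[] args) {
    // Toy stand-ins: f1 "might contain" k1, f2 definitely does not.
    Map<String, Predicate<String>> blooms =
        Map.of("f1", k -> k.equals("k1"), "f2", k -> false);
    System.out.println(candidateFiles("k1", List.of("f1", "f2"), blooms)); // [f1]
  }
}
```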

[GitHub] [hudi] codecov-io edited a comment on pull request #2732: [HUDI-1729] Asynchronous Hive sync and commits cleaning for Flink writer

2021-03-28 Thread GitBox


codecov-io edited a comment on pull request #2732:
URL: https://github.com/apache/hudi/pull/2732#issuecomment-808716532


   # [Codecov](https://codecov.io/gh/apache/hudi/pull/2732?src=pr=h1) Report
   > Merging 
[#2732](https://codecov.io/gh/apache/hudi/pull/2732?src=pr=desc) (e5121ef) 
into 
[master](https://codecov.io/gh/apache/hudi/commit/8b774fe3313757a8b94ca408079327c62b4a664c?el=desc)
 (8b774fe) will **increase** coverage by `18.00%`.
   > The diff coverage is `n/a`.
   
   > :exclamation: Current head e5121ef differs from pull request most recent 
head dcd8294. Consider uploading reports for the commit dcd8294 to get more 
accurate results
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/2732/graphs/tree.svg?width=650=150=pr=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2732?src=pr=tree)
   
   ```diff
   @@              Coverage Diff              @@
   ##             master    #2732       +/-   ##
   =============================================
   + Coverage     51.73%   69.73%   +18.00%
   + Complexity     3606      371     -3235
   =============================================
     Files           476       54      -422
     Lines         22611     1989    -20622
     Branches       2410      236     -2174
   =============================================
   - Hits          11697     1387    -10310
   + Misses         9891      471     -9420
   + Partials       1023      131      -892
   ```
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | hudicli | `?` | `?` | |
   | hudiclient | `?` | `?` | |
   | hudicommon | `?` | `?` | |
   | hudiflink | `?` | `?` | |
   | hudihadoopmr | `?` | `?` | |
   | hudisparkdatasource | `?` | `?` | |
   | hudisync | `?` | `?` | |
   | huditimelineservice | `?` | `?` | |
   | hudiutilities | `69.73% <ø> (ø)` | `0.00 <ø> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/2732?src=pr=tree) | Coverage Δ 
| Complexity Δ | |
   |---|---|---|---|
   | 
[...di/hadoop/hive/HoodieCombineRealtimeFileSplit.java](https://codecov.io/gh/apache/hudi/pull/2732/diff?src=pr=tree#diff-aHVkaS1oYWRvb3AtbXIvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaGFkb29wL2hpdmUvSG9vZGllQ29tYmluZVJlYWx0aW1lRmlsZVNwbGl0LmphdmE=)
 | | | |
   | 
[...op/realtime/HoodieCombineRealtimeRecordReader.java](https://codecov.io/gh/apache/hudi/pull/2732/diff?src=pr=tree#diff-aHVkaS1oYWRvb3AtbXIvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaGFkb29wL3JlYWx0aW1lL0hvb2RpZUNvbWJpbmVSZWFsdGltZVJlY29yZFJlYWRlci5qYXZh)
 | | | |
   | 
[...common/table/view/PriorityBasedFileSystemView.java](https://codecov.io/gh/apache/hudi/pull/2732/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3ZpZXcvUHJpb3JpdHlCYXNlZEZpbGVTeXN0ZW1WaWV3LmphdmE=)
 | | | |
   | 
[...rg/apache/hudi/sink/KeyedWriteProcessOperator.java](https://codecov.io/gh/apache/hudi/pull/2732/diff?src=pr=tree#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9zaW5rL0tleWVkV3JpdGVQcm9jZXNzT3BlcmF0b3IuamF2YQ==)
 | | | |
   | 
[.../apache/hudi/common/model/ClusteringOperation.java](https://codecov.io/gh/apache/hudi/pull/2732/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL21vZGVsL0NsdXN0ZXJpbmdPcGVyYXRpb24uamF2YQ==)
 | | | |
   | 
[.../apache/hudi/hadoop/RecordReaderValueIterator.java](https://codecov.io/gh/apache/hudi/pull/2732/diff?src=pr=tree#diff-aHVkaS1oYWRvb3AtbXIvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaGFkb29wL1JlY29yZFJlYWRlclZhbHVlSXRlcmF0b3IuamF2YQ==)
 | | | |
   | 
[...apache/hudi/common/util/collection/RocksDBDAO.java](https://codecov.io/gh/apache/hudi/pull/2732/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3V0aWwvY29sbGVjdGlvbi9Sb2Nrc0RCREFPLmphdmE=)
 | | | |
   | 
[...3/internal/HoodieDataSourceInternalBatchWrite.java](https://codecov.io/gh/apache/hudi/pull/2732/diff?src=pr=tree#diff-aHVkaS1zcGFyay1kYXRhc291cmNlL2h1ZGktc3BhcmszL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL3NwYXJrMy9pbnRlcm5hbC9Ib29kaWVEYXRhU291cmNlSW50ZXJuYWxCYXRjaFdyaXRlLmphdmE=)
 | | | |
   | 
[...che/hudi/common/table/timeline/dto/LogFileDTO.java](https://codecov.io/gh/apache/hudi/pull/2732/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3RpbWVsaW5lL2R0by9Mb2dGaWxlRFRPLmphdmE=)
 | | | |
   | 
[...java/org/apache/hudi/table/HoodieTableFactory.java](https://codecov.io/gh/apache/hudi/pull/2732/diff?src=pr=tree#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS90YWJsZS9Ib29kaWVUYWJsZUZhY3RvcnkuamF2YQ==)
 | | | |
   | ... and [405 
more](https://codecov.io/gh/apache/hudi/pull/2732/diff?src=pr=tree-more) | |
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[GitHub] [hudi] codecov-io edited a comment on pull request #2732: [HUDI-1729] Asynchronous Hive sync and commits cleaning for Flink writer

2021-03-28 Thread GitBox


codecov-io edited a comment on pull request #2732:
URL: https://github.com/apache/hudi/pull/2732#issuecomment-808716532


   # [Codecov](https://codecov.io/gh/apache/hudi/pull/2732?src=pr=h1) Report
   > Merging 
[#2732](https://codecov.io/gh/apache/hudi/pull/2732?src=pr=desc) (dc6bddf) 
into 
[master](https://codecov.io/gh/apache/hudi/commit/8b774fe3313757a8b94ca408079327c62b4a664c?el=desc)
 (8b774fe) will **decrease** coverage by `4.94%`.
   > The diff coverage is `85.93%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/2732/graphs/tree.svg?width=650=150=pr=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2732?src=pr=tree)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #2732      +/-   ##
   ============================================
   - Coverage     51.73%   46.78%    -4.95%
   + Complexity     3606     3300     -306
   ============================================
     Files           476      479       +3
     Lines         22611    22799     +188
     Branches       2410     2413       +3
   ============================================
   - Hits          11697    10667    -1030
   - Misses         9891    11226    +1335
   + Partials       1023      906     -117
   ```
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | hudicli | `37.01% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudiclient | `∅ <ø> (∅)` | `0.00 <ø> (ø)` | |
   | hudicommon | `50.97% <ø> (+0.06%)` | `0.00 <ø> (ø)` | |
   | hudiflink | `55.95% <85.93%> (+1.77%)` | `0.00 <15.00> (ø)` | |
   | hudihadoopmr | `33.44% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudisparkdatasource | `70.87% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudisync | `45.47% <ø> (-0.12%)` | `0.00 <ø> (ø)` | |
   | huditimelineservice | `64.36% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudiutilities | `9.40% <ø> (-60.34%)` | `0.00 <ø> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/2732?src=pr=tree) | Coverage Δ 
| Complexity Δ | |
   |---|---|---|---|
   | 
[...rg/apache/hudi/streamer/HoodieFlinkStreamerV2.java](https://codecov.io/gh/apache/hudi/pull/2732/diff?src=pr=tree#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9zdHJlYW1lci9Ib29kaWVGbGlua1N0cmVhbWVyVjIuamF2YQ==)
 | `0.00% <0.00%> (ø)` | `0.00 <0.00> (ø)` | |
   | 
[...in/java/org/apache/hudi/table/HoodieTableSink.java](https://codecov.io/gh/apache/hudi/pull/2732/diff?src=pr=tree#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS90YWJsZS9Ib29kaWVUYWJsZVNpbmsuamF2YQ==)
 | `12.19% <0.00%> (ø)` | `2.00 <0.00> (ø)` | |
   | 
[.../main/java/org/apache/hudi/sink/CleanFunction.java](https://codecov.io/gh/apache/hudi/pull/2732/diff?src=pr=tree#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9zaW5rL0NsZWFuRnVuY3Rpb24uamF2YQ==)
 | `36.84% <36.84%> (ø)` | `2.00 <2.00> (?)` | |
   | 
[.../org/apache/hudi/sink/utils/NonThrownExecutor.java](https://codecov.io/gh/apache/hudi/pull/2732/diff?src=pr=tree#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9zaW5rL3V0aWxzL05vblRocm93bkV4ZWN1dG9yLmphdmE=)
 | `77.77% <77.77%> (ø)` | `5.00 <5.00> (?)` | |
   | 
[...ache/hudi/sink/StreamWriteOperatorCoordinator.java](https://codecov.io/gh/apache/hudi/pull/2732/diff?src=pr=tree#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9zaW5rL1N0cmVhbVdyaXRlT3BlcmF0b3JDb29yZGluYXRvci5qYXZh)
 | `70.28% <85.71%> (+1.34%)` | `37.00 <3.00> (+5.00)` | |
   | 
[...va/org/apache/hudi/sink/utils/HiveSyncContext.java](https://codecov.io/gh/apache/hudi/pull/2732/diff?src=pr=tree#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9zaW5rL3V0aWxzL0hpdmVTeW5jQ29udGV4dC5qYXZh)
 | `97.14% <97.14%> (ø)` | `3.00 <3.00> (?)` | |
   | 
[...va/org/apache/hudi/configuration/FlinkOptions.java](https://codecov.io/gh/apache/hudi/pull/2732/diff?src=pr=tree#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9jb25maWd1cmF0aW9uL0ZsaW5rT3B0aW9ucy5qYXZh)
 | `88.88% <100.00%> (+4.83%)` | `11.00 <0.00> (ø)` | |
   | 
[...apache/hudi/sink/compact/CompactionCommitSink.java](https://codecov.io/gh/apache/hudi/pull/2732/diff?src=pr=tree#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9zaW5rL2NvbXBhY3QvQ29tcGFjdGlvbkNvbW1pdFNpbmsuamF2YQ==)
 | `70.21% <100.00%> (ø)` | `11.00 <1.00> (ø)` | |
   | 
[...c/main/java/org/apache/hudi/util/StreamerUtil.java](https://codecov.io/gh/apache/hudi/pull/2732/diff?src=pr=tree#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS91dGlsL1N0cmVhbWVyVXRpbC5qYXZh)
 | `52.21% <100.00%> (+2.67%)` | `18.00 <1.00> (+1.00)` | |
   | 
[...va/org/apache/hudi/utilities/IdentitySplitter.java](https://codecov.io/gh/apache/hudi/pull/2732/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0lkZW50aXR5U3BsaXR0ZXIuamF2YQ==)

[GitHub] [hudi] codecov-io edited a comment on pull request #2732: [HUDI-1729] Asynchronous Hive sync and commits cleaning for Flink writer

2021-03-28 Thread GitBox


codecov-io edited a comment on pull request #2732:
URL: https://github.com/apache/hudi/pull/2732#issuecomment-808716532


   # [Codecov](https://codecov.io/gh/apache/hudi/pull/2732?src=pr=h1) Report
   > Merging 
[#2732](https://codecov.io/gh/apache/hudi/pull/2732?src=pr=desc) (dc6bddf) 
into 
[master](https://codecov.io/gh/apache/hudi/commit/8b774fe3313757a8b94ca408079327c62b4a664c?el=desc)
 (8b774fe) will **decrease** coverage by `5.24%`.
   > The diff coverage is `85.93%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/2732/graphs/tree.svg?width=650=150=pr=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2732?src=pr=tree)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #2732      +/-   ##
   ============================================
   - Coverage     51.73%   46.48%    -5.25%
   + Complexity     3606     3110     -496
   ============================================
     Files           476      457      -19
     Lines         22611    21191    -1420
     Branches       2410     2258     -152
   ============================================
   - Hits          11697     9851    -1846
   - Misses         9891    10509     +618
   + Partials       1023      831     -192
   ```
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | hudicli | `37.01% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudiclient | `∅ <ø> (∅)` | `0.00 <ø> (ø)` | |
   | hudicommon | `50.97% <ø> (+0.06%)` | `0.00 <ø> (ø)` | |
   | hudiflink | `55.95% <85.93%> (+1.77%)` | `0.00 <15.00> (ø)` | |
   | hudihadoopmr | `33.44% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudisparkdatasource | `70.87% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudisync | `?` | `?` | |
   | huditimelineservice | `?` | `?` | |
   | hudiutilities | `9.40% <ø> (-60.34%)` | `0.00 <ø> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/2732?src=pr=tree) | Coverage Δ 
| Complexity Δ | |
   |---|---|---|---|
   | 
[...rg/apache/hudi/streamer/HoodieFlinkStreamerV2.java](https://codecov.io/gh/apache/hudi/pull/2732/diff?src=pr=tree#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9zdHJlYW1lci9Ib29kaWVGbGlua1N0cmVhbWVyVjIuamF2YQ==)
 | `0.00% <0.00%> (ø)` | `0.00 <0.00> (ø)` | |
   | 
[...in/java/org/apache/hudi/table/HoodieTableSink.java](https://codecov.io/gh/apache/hudi/pull/2732/diff?src=pr=tree#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS90YWJsZS9Ib29kaWVUYWJsZVNpbmsuamF2YQ==)
 | `12.19% <0.00%> (ø)` | `2.00 <0.00> (ø)` | |
   | 
[.../main/java/org/apache/hudi/sink/CleanFunction.java](https://codecov.io/gh/apache/hudi/pull/2732/diff?src=pr=tree#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9zaW5rL0NsZWFuRnVuY3Rpb24uamF2YQ==)
 | `36.84% <36.84%> (ø)` | `2.00 <2.00> (?)` | |
   | 
[.../org/apache/hudi/sink/utils/NonThrownExecutor.java](https://codecov.io/gh/apache/hudi/pull/2732/diff?src=pr=tree#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9zaW5rL3V0aWxzL05vblRocm93bkV4ZWN1dG9yLmphdmE=)
 | `77.77% <77.77%> (ø)` | `5.00 <5.00> (?)` | |
   | 
[...ache/hudi/sink/StreamWriteOperatorCoordinator.java](https://codecov.io/gh/apache/hudi/pull/2732/diff?src=pr=tree#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9zaW5rL1N0cmVhbVdyaXRlT3BlcmF0b3JDb29yZGluYXRvci5qYXZh)
 | `70.28% <85.71%> (+1.34%)` | `37.00 <3.00> (+5.00)` | |
   | 
[...va/org/apache/hudi/sink/utils/HiveSyncContext.java](https://codecov.io/gh/apache/hudi/pull/2732/diff?src=pr=tree#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9zaW5rL3V0aWxzL0hpdmVTeW5jQ29udGV4dC5qYXZh)
 | `97.14% <97.14%> (ø)` | `3.00 <3.00> (?)` | |
   | 
[...va/org/apache/hudi/configuration/FlinkOptions.java](https://codecov.io/gh/apache/hudi/pull/2732/diff?src=pr=tree#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9jb25maWd1cmF0aW9uL0ZsaW5rT3B0aW9ucy5qYXZh)
 | `88.88% <100.00%> (+4.83%)` | `11.00 <0.00> (ø)` | |
   | 
[...apache/hudi/sink/compact/CompactionCommitSink.java](https://codecov.io/gh/apache/hudi/pull/2732/diff?src=pr=tree#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9zaW5rL2NvbXBhY3QvQ29tcGFjdGlvbkNvbW1pdFNpbmsuamF2YQ==)
 | `70.21% <100.00%> (ø)` | `11.00 <1.00> (ø)` | |
   | 
[...c/main/java/org/apache/hudi/util/StreamerUtil.java](https://codecov.io/gh/apache/hudi/pull/2732/diff?src=pr=tree#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS91dGlsL1N0cmVhbWVyVXRpbC5qYXZh)
 | `52.21% <100.00%> (+2.67%)` | `18.00 <1.00> (+1.00)` | |
   | 
[...va/org/apache/hudi/utilities/IdentitySplitter.java](https://codecov.io/gh/apache/hudi/pull/2732/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0lkZW50aXR5U3BsaXR0ZXIuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> 

[GitHub] [hudi] codecov-io edited a comment on pull request #2732: [HUDI-1729] Asynchronous Hive sync and commits cleaning for Flink writer

2021-03-28 Thread GitBox


codecov-io edited a comment on pull request #2732:
URL: https://github.com/apache/hudi/pull/2732#issuecomment-808716532


   # [Codecov](https://codecov.io/gh/apache/hudi/pull/2732?src=pr=h1) Report
   > Merging 
[#2732](https://codecov.io/gh/apache/hudi/pull/2732?src=pr=desc) (dc6bddf) 
into 
[master](https://codecov.io/gh/apache/hudi/commit/8b774fe3313757a8b94ca408079327c62b4a664c?el=desc)
 (8b774fe) will **decrease** coverage by `42.32%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/2732/graphs/tree.svg?width=650=150=pr=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2732?src=pr=tree)
   
   ```diff
   @@             Coverage Diff               @@
   ##             master    #2732       +/-   ##
   =============================================
   - Coverage     51.73%    9.40%   -42.33%
   + Complexity     3606       48     -3558
   =============================================
     Files           476       54      -422
     Lines         22611     1989    -20622
     Branches       2410      236     -2174
   =============================================
   - Hits          11697      187    -11510
   + Misses         9891     1789     -8102
   + Partials       1023       13     -1010
   ```
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | hudicli | `?` | `?` | |
   | hudiclient | `?` | `?` | |
   | hudicommon | `?` | `?` | |
   | hudiflink | `?` | `?` | |
   | hudihadoopmr | `?` | `?` | |
   | hudisparkdatasource | `?` | `?` | |
   | hudisync | `?` | `?` | |
   | huditimelineservice | `?` | `?` | |
   | hudiutilities | `9.40% <ø> (-60.34%)` | `0.00 <ø> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/2732?src=pr=tree) | Coverage Δ 
| Complexity Δ | |
   |---|---|---|---|
   | 
[...va/org/apache/hudi/utilities/IdentitySplitter.java](https://codecov.io/gh/apache/hudi/pull/2732/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0lkZW50aXR5U3BsaXR0ZXIuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-2.00%)` | |
   | 
[...va/org/apache/hudi/utilities/schema/SchemaSet.java](https://codecov.io/gh/apache/hudi/pull/2732/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9TY2hlbWFTZXQuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-3.00%)` | |
   | 
[...a/org/apache/hudi/utilities/sources/RowSource.java](https://codecov.io/gh/apache/hudi/pull/2732/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvUm93U291cmNlLmphdmE=)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-4.00%)` | |
   | 
[.../org/apache/hudi/utilities/sources/AvroSource.java](https://codecov.io/gh/apache/hudi/pull/2732/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQXZyb1NvdXJjZS5qYXZh)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | |
   | 
[.../org/apache/hudi/utilities/sources/JsonSource.java](https://codecov.io/gh/apache/hudi/pull/2732/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvblNvdXJjZS5qYXZh)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | |
   | 
[...rg/apache/hudi/utilities/sources/CsvDFSSource.java](https://codecov.io/gh/apache/hudi/pull/2732/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQ3N2REZTU291cmNlLmphdmE=)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-10.00%)` | |
   | 
[...g/apache/hudi/utilities/sources/JsonDFSSource.java](https://codecov.io/gh/apache/hudi/pull/2732/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvbkRGU1NvdXJjZS5qYXZh)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-4.00%)` | |
   | 
[...apache/hudi/utilities/sources/JsonKafkaSource.java](https://codecov.io/gh/apache/hudi/pull/2732/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvbkthZmthU291cmNlLmphdmE=)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-6.00%)` | |
   | 
[...pache/hudi/utilities/sources/ParquetDFSSource.java](https://codecov.io/gh/apache/hudi/pull/2732/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvUGFycXVldERGU1NvdXJjZS5qYXZh)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-5.00%)` | |
   | 
[...lities/schema/SchemaProviderWithPostProcessor.java](https://codecov.io/gh/apache/hudi/pull/2732/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9TY2hlbWFQcm92aWRlcldpdGhQb3N0UHJvY2Vzc29yLmphdmE=)
 | `0.00% <0.00%> 

[GitHub] [hudi] garyli1019 closed pull request #2729: [DO NOT MERGE] Verify IT scala2.12

2021-03-28 Thread GitBox


garyli1019 closed pull request #2729:
URL: https://github.com/apache/hudi/pull/2729


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] garyli1019 closed pull request #2730: [DO NOT MERGE]Verify IT spark3

2021-03-28 Thread GitBox


garyli1019 closed pull request #2730:
URL: https://github.com/apache/hudi/pull/2730


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[hudi] branch release-0.8.0 updated: [HOTFIX] Disable ITs for Spark3 and scala2.12 (#2733)

2021-03-28 Thread garyli
This is an automated email from the ASF dual-hosted git repository.

garyli pushed a commit to branch release-0.8.0
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/release-0.8.0 by this push:
 new 8cc8c0b  [HOTFIX] Disable ITs for Spark3 and scala2.12 (#2733)
8cc8c0b is described below

commit 8cc8c0b743840e1b365e8625202fed0a53f66621
Author: Gary Li 
AuthorDate: Sun Mar 28 01:07:57 2021 -0700

[HOTFIX] Disable ITs for Spark3 and scala2.12 (#2733)
---
 pom.xml | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/pom.xml b/pom.xml
index db1a798..e088cff 100644
--- a/pom.xml
+++ b/pom.xml
@@ -1426,6 +1426,7 @@
 ${scala12.version}
 2.12
 true
+true
   
   
 
@@ -1473,6 +1474,7 @@
 
${fasterxml.spark3.version}
 
${fasterxml.spark3.version}
 true
+true
   
   
 


[GitHub] [hudi] garyli1019 merged pull request #2733: [HOTFIX] Disable ITs for Spark3 and scala2.12

2021-03-28 Thread GitBox


garyli1019 merged pull request #2733:
URL: https://github.com/apache/hudi/pull/2733


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] codecov-io edited a comment on pull request #2729: [DO NOT MERGE] Verify IT scala2.12

2021-03-28 Thread GitBox


codecov-io edited a comment on pull request #2729:
URL: https://github.com/apache/hudi/pull/2729#issuecomment-808659555


   # [Codecov](https://codecov.io/gh/apache/hudi/pull/2729?src=pr=h1) Report
   > Merging 
[#2729](https://codecov.io/gh/apache/hudi/pull/2729?src=pr=desc) (642264b) 
into 
[release-0.8.0](https://codecov.io/gh/apache/hudi/commit/b7c47b195861c90269f10932dd2838e2f53bf326?el=desc)
 (b7c47b1) will **increase** coverage by `0.13%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/2729/graphs/tree.svg?width=650=150=pr=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2729?src=pr=tree)
   
   ```diff
   @@               Coverage Diff                @@
   ##           release-0.8.0    #2729      +/-   ##
   ===============================================
   + Coverage        51.73%   51.87%   +0.13%
   - Complexity        3602     3651      +49
   ===============================================
     Files              476      468       -8
     Lines            22598    22492     -106
     Branches          2409     2365      -44
   ===============================================
   - Hits             11692    11668      -24
   + Misses            9888     9829      -59
   + Partials          1018      995      -23
   ```
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | hudicli | `37.06% <ø> (+0.05%)` | `0.00 <ø> (ø)` | |
   | hudiclient | `∅ <ø> (∅)` | `0.00 <ø> (ø)` | |
   | hudicommon | `50.92% <ø> (-0.05%)` | `0.00 <ø> (ø)` | |
   | hudiflink | `54.08% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudihadoopmr | `33.44% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudisparkdatasource | `74.47% <ø> (+3.59%)` | `0.00 <ø> (ø)` | |
   | hudisync | `45.47% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | huditimelineservice | `64.36% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudiutilities | `69.78% <ø> (+0.05%)` | `0.00 <ø> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/2729?src=pr=tree) | Coverage Δ 
| Complexity Δ | |
   |---|---|---|---|
   | 
[...ache/hudi/common/fs/inline/InMemoryFileSystem.java](https://codecov.io/gh/apache/hudi/pull/2729/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2ZzL2lubGluZS9Jbk1lbW9yeUZpbGVTeXN0ZW0uamF2YQ==)
 | `79.31% <0.00%> (-10.35%)` | `15.00% <0.00%> (-1.00%)` | |
   | 
[...e/hudi/common/table/log/HoodieLogFormatWriter.java](https://codecov.io/gh/apache/hudi/pull/2729/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL2xvZy9Ib29kaWVMb2dGb3JtYXRXcml0ZXIuamF2YQ==)
 | `78.12% <0.00%> (-1.57%)` | `26.00% <0.00%> (ø%)` | |
   | 
[.../main/scala/org/apache/hudi/cli/SparkHelpers.scala](https://codecov.io/gh/apache/hudi/pull/2729/diff?src=pr=tree#diff-aHVkaS1jbGkvc3JjL21haW4vc2NhbGEvb3JnL2FwYWNoZS9odWRpL2NsaS9TcGFya0hlbHBlcnMuc2NhbGE=)
 | `0.00% <0.00%> (ø)` | `0.00% <0.00%> (ø%)` | |
   | 
[...ain/scala/org/apache/hudi/cli/DedupeSparkJob.scala](https://codecov.io/gh/apache/hudi/pull/2729/diff?src=pr=tree#diff-aHVkaS1jbGkvc3JjL21haW4vc2NhbGEvb3JnL2FwYWNoZS9odWRpL2NsaS9EZWR1cGVTcGFya0pvYi5zY2FsYQ==)
 | `0.00% <0.00%> (ø)` | `0.00% <0.00%> (ø%)` | |
   | 
[...spark/src/main/scala/org/apache/hudi/package.scala](https://codecov.io/gh/apache/hudi/pull/2729/diff?src=pr=tree#diff-aHVkaS1zcGFyay1kYXRhc291cmNlL2h1ZGktc3Bhcmsvc3JjL21haW4vc2NhbGEvb3JnL2FwYWNoZS9odWRpL3BhY2thZ2Uuc2NhbGE=)
 | `0.00% <0.00%> (ø)` | `0.00% <0.00%> (ø%)` | |
   | 
[...nal/HoodieBulkInsertDataInternalWriterFactory.java](https://codecov.io/gh/apache/hudi/pull/2729/diff?src=pr=tree#diff-aHVkaS1zcGFyay1kYXRhc291cmNlL2h1ZGktc3BhcmszL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL3NwYXJrMy9pbnRlcm5hbC9Ib29kaWVCdWxrSW5zZXJ0RGF0YUludGVybmFsV3JpdGVyRmFjdG9yeS5qYXZh)
 | | | |
   | 
[...org/apache/hudi/spark3/internal/DefaultSource.java](https://codecov.io/gh/apache/hudi/pull/2729/diff?src=pr=tree#diff-aHVkaS1zcGFyay1kYXRhc291cmNlL2h1ZGktc3BhcmszL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL3NwYXJrMy9pbnRlcm5hbC9EZWZhdWx0U291cmNlLmphdmE=)
 | | | |
   | 
[...nal/HoodieDataSourceInternalBatchWriteBuilder.java](https://codecov.io/gh/apache/hudi/pull/2729/diff?src=pr=tree#diff-aHVkaS1zcGFyay1kYXRhc291cmNlL2h1ZGktc3BhcmszL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL3NwYXJrMy9pbnRlcm5hbC9Ib29kaWVEYXRhU291cmNlSW50ZXJuYWxCYXRjaFdyaXRlQnVpbGRlci5qYXZh)
 | | | |
   | 
[...udi/spark3/internal/HoodieWriterCommitMessage.java](https://codecov.io/gh/apache/hudi/pull/2729/diff?src=pr=tree#diff-aHVkaS1zcGFyay1kYXRhc291cmNlL2h1ZGktc3BhcmszL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL3NwYXJrMy9pbnRlcm5hbC9Ib29kaWVXcml0ZXJDb21taXRNZXNzYWdlLmphdmE=)
 | | | |
   | 

[GitHub] [hudi] codecov-io edited a comment on pull request #2733: [HOTFIX] Disable ITs for Spark3 and scala2.12

2021-03-28 Thread GitBox


codecov-io edited a comment on pull request #2733:
URL: https://github.com/apache/hudi/pull/2733#issuecomment-808858281






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] codecov-io commented on pull request #2734: [HUDI-1731] Rename UpsertPartitioner in both hudi-java-client and hud…

2021-03-28 Thread GitBox


codecov-io commented on pull request #2734:
URL: https://github.com/apache/hudi/pull/2734#issuecomment-808859046


   # [Codecov](https://codecov.io/gh/apache/hudi/pull/2734?src=pr=h1) Report
   > Merging 
[#2734](https://codecov.io/gh/apache/hudi/pull/2734?src=pr=desc) (eef9891) 
into 
[master](https://codecov.io/gh/apache/hudi/commit/bec70413c0943f38ee5cdf62fa3a79af44d8cded?el=desc)
 (bec7041) will **decrease** coverage by `42.33%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/2734/graphs/tree.svg?width=650=150=pr=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2734?src=pr=tree)
   
   ```diff
   @@             Coverage Diff               @@
   ##             master    #2734       +/-   ##
   =============================================
   - Coverage     51.73%    9.40%   -42.34%
   + Complexity     3607       48     -3559
   =============================================
     Files           476       54      -422
     Lines         22614     1989    -20625
     Branches       2410      236     -2174
   =============================================
   - Hits          11700      187    -11513
   + Misses         9892     1789     -8103
   + Partials       1022       13     -1009
   ```
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | hudicli | `?` | `?` | |
   | hudiclient | `?` | `?` | |
   | hudicommon | `?` | `?` | |
   | hudiflink | `?` | `?` | |
   | hudihadoopmr | `?` | `?` | |
   | hudisparkdatasource | `?` | `?` | |
   | hudisync | `?` | `?` | |
   | huditimelineservice | `?` | `?` | |
   | hudiutilities | `9.40% <ø> (-60.39%)` | `0.00 <ø> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/2734?src=pr=tree) | Coverage Δ 
| Complexity Δ | |
   |---|---|---|---|
   | 
[...va/org/apache/hudi/utilities/IdentitySplitter.java](https://codecov.io/gh/apache/hudi/pull/2734/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0lkZW50aXR5U3BsaXR0ZXIuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-2.00%)` | |
   | 
[...va/org/apache/hudi/utilities/schema/SchemaSet.java](https://codecov.io/gh/apache/hudi/pull/2734/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9TY2hlbWFTZXQuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-3.00%)` | |
   | 
[...a/org/apache/hudi/utilities/sources/RowSource.java](https://codecov.io/gh/apache/hudi/pull/2734/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvUm93U291cmNlLmphdmE=)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-4.00%)` | |
   | 
[.../org/apache/hudi/utilities/sources/AvroSource.java](https://codecov.io/gh/apache/hudi/pull/2734/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQXZyb1NvdXJjZS5qYXZh)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | |
   | 
[.../org/apache/hudi/utilities/sources/JsonSource.java](https://codecov.io/gh/apache/hudi/pull/2734/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvblNvdXJjZS5qYXZh)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | |
   | 
[...rg/apache/hudi/utilities/sources/CsvDFSSource.java](https://codecov.io/gh/apache/hudi/pull/2734/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQ3N2REZTU291cmNlLmphdmE=)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-10.00%)` | |
   | 
[...g/apache/hudi/utilities/sources/JsonDFSSource.java](https://codecov.io/gh/apache/hudi/pull/2734/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvbkRGU1NvdXJjZS5qYXZh)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-4.00%)` | |
   | 
[...apache/hudi/utilities/sources/JsonKafkaSource.java](https://codecov.io/gh/apache/hudi/pull/2734/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvbkthZmthU291cmNlLmphdmE=)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-6.00%)` | |
   | 
[...pache/hudi/utilities/sources/ParquetDFSSource.java](https://codecov.io/gh/apache/hudi/pull/2734/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvUGFycXVldERGU1NvdXJjZS5qYXZh)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-5.00%)` | |
   | 
[...lities/schema/SchemaProviderWithPostProcessor.java](https://codecov.io/gh/apache/hudi/pull/2734/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9TY2hlbWFQcm92aWRlcldpdGhQb3N0UHJvY2Vzc29yLmphdmE=)
 | `0.00% <0.00%> 

[GitHub] [hudi] codecov-io edited a comment on pull request #2730: [DO NOT MERGE]Verify IT spark3

2021-03-28 Thread GitBox


codecov-io edited a comment on pull request #2730:
URL: https://github.com/apache/hudi/pull/2730#issuecomment-808660342






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (HUDI-1731) Rename UpsertPartitioner in both hudi-java-client and hudi-spark-client to differentiate them from each other

2021-03-28 Thread Leo Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Leo Zhu updated HUDI-1731:
--
Status: Patch Available  (was: In Progress)

> Rename UpsertPartitioner in both hudi-java-client and hudi-spark-client to 
> differentiate them from each other
> -
>
> Key: HUDI-1731
> URL: https://issues.apache.org/jira/browse/HUDI-1731
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Code Cleanup
>Reporter: Leo Zhu
>Priority: Minor
>  Labels: pull-request-available
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> There's a class with the same fully-qualified name - 
> "org.apache.hudi.table.action.commit.UpsertPartitioner" - in both the 
> hudi-spark-client and hudi-java-client modules. When both jars are included in 
> the classpath, one overrides the other, and the error below occurs:
> Exception in thread "main" java.lang.VerifyError: Bad return type
> Exception Details:
>   Location:
>     org/apache/hudi/table/action/commit/BaseSparkCommitActionExecutor.getUpsertPartitioner(Lorg/apache/hudi/table/WorkloadProfile;)Lorg/apache/spark/Partitioner; @34: areturn
>   Reason:
>     Type 'org/apache/hudi/table/action/commit/UpsertPartitioner' (current frame, stack[0]) is not assignable to 'org/apache/spark/Partitioner' (from method signature)
>   Current Frame:
>     bci: @34
>     flags: { }
>     locals: { 'org/apache/hudi/table/action/commit/BaseSparkCommitActionExecutor', 'org/apache/hudi/table/WorkloadProfile' }
>     stack: { 'org/apache/hudi/table/action/commit/UpsertPartitioner' }
>   Bytecode:
>     0x000: 2bc7 000d bb00 9859 12bc b700 9bbf bb00
>     0x010: 8d59 2b2a b400 0f2a b400 052a b400 03b7
>     0x020: 00bd b0
>   Stackmap Table:
>     same_frame(@14)
>   at org.apache.hudi.table.HoodieSparkCopyOnWriteTable.insert(HoodieSparkCopyOnWriteTable.java:97)
>   at org.apache.hudi.table.HoodieSparkCopyOnWriteTable.insert(HoodieSparkCopyOnWriteTable.java:82)
>   at org.apache.hudi.client.SparkRDDWriteClient.insert(SparkRDDWriteClient.java:169)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] codecov-io edited a comment on pull request #2729: [DO NOT MERGE] Verify IT scala2.12

2021-03-28 Thread GitBox


codecov-io edited a comment on pull request #2729:
URL: https://github.com/apache/hudi/pull/2729#issuecomment-808659555


   # [Codecov](https://codecov.io/gh/apache/hudi/pull/2729?src=pr=h1) Report
   > Merging 
[#2729](https://codecov.io/gh/apache/hudi/pull/2729?src=pr=desc) (642264b) 
into 
[release-0.8.0](https://codecov.io/gh/apache/hudi/commit/b7c47b195861c90269f10932dd2838e2f53bf326?el=desc)
 (b7c47b1) will **increase** coverage by `0.13%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/2729/graphs/tree.svg?width=650=150=pr=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2729?src=pr=tree)
   
   ```diff
   @@               Coverage Diff                @@
   ##           release-0.8.0    #2729      +/-   ##
   ===============================================
   + Coverage        51.73%   51.87%   +0.13%
   - Complexity        3602     3651      +49
   ===============================================
     Files              476      468       -8
     Lines            22598    22492     -106
     Branches          2409     2365      -44
   ===============================================
   - Hits             11692    11668      -24
   + Misses            9888     9829      -59
   + Partials          1018      995      -23
   ```
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | hudicli | `37.06% <ø> (+0.05%)` | `0.00 <ø> (ø)` | |
   | hudiclient | `∅ <ø> (∅)` | `0.00 <ø> (ø)` | |
   | hudicommon | `50.92% <ø> (-0.05%)` | `0.00 <ø> (ø)` | |
   | hudiflink | `54.08% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudihadoopmr | `33.44% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudisparkdatasource | `74.47% <ø> (+3.59%)` | `0.00 <ø> (ø)` | |
   | hudisync | `45.47% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | huditimelineservice | `64.36% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudiutilities | `69.78% <ø> (+0.05%)` | `0.00 <ø> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/2729?src=pr=tree) | Coverage Δ 
| Complexity Δ | |
   |---|---|---|---|
   | 
[...ache/hudi/common/fs/inline/InMemoryFileSystem.java](https://codecov.io/gh/apache/hudi/pull/2729/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2ZzL2lubGluZS9Jbk1lbW9yeUZpbGVTeXN0ZW0uamF2YQ==)
 | `79.31% <0.00%> (-10.35%)` | `15.00% <0.00%> (-1.00%)` | |
   | 
[...e/hudi/common/table/log/HoodieLogFormatWriter.java](https://codecov.io/gh/apache/hudi/pull/2729/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL2xvZy9Ib29kaWVMb2dGb3JtYXRXcml0ZXIuamF2YQ==)
 | `78.12% <0.00%> (-1.57%)` | `26.00% <0.00%> (ø%)` | |
   | 
[.../main/scala/org/apache/hudi/cli/SparkHelpers.scala](https://codecov.io/gh/apache/hudi/pull/2729/diff?src=pr=tree#diff-aHVkaS1jbGkvc3JjL21haW4vc2NhbGEvb3JnL2FwYWNoZS9odWRpL2NsaS9TcGFya0hlbHBlcnMuc2NhbGE=)
 | `0.00% <0.00%> (ø)` | `0.00% <0.00%> (ø%)` | |
   | 
[...ain/scala/org/apache/hudi/cli/DedupeSparkJob.scala](https://codecov.io/gh/apache/hudi/pull/2729/diff?src=pr=tree#diff-aHVkaS1jbGkvc3JjL21haW4vc2NhbGEvb3JnL2FwYWNoZS9odWRpL2NsaS9EZWR1cGVTcGFya0pvYi5zY2FsYQ==)
 | `0.00% <0.00%> (ø)` | `0.00% <0.00%> (ø%)` | |
   | 
[...spark/src/main/scala/org/apache/hudi/package.scala](https://codecov.io/gh/apache/hudi/pull/2729/diff?src=pr=tree#diff-aHVkaS1zcGFyay1kYXRhc291cmNlL2h1ZGktc3Bhcmsvc3JjL21haW4vc2NhbGEvb3JnL2FwYWNoZS9odWRpL3BhY2thZ2Uuc2NhbGE=)
 | `0.00% <0.00%> (ø)` | `0.00% <0.00%> (ø%)` | |
   | 
[...udi/spark3/internal/HoodieWriterCommitMessage.java](https://codecov.io/gh/apache/hudi/pull/2729/diff?src=pr=tree#diff-aHVkaS1zcGFyay1kYXRhc291cmNlL2h1ZGktc3BhcmszL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL3NwYXJrMy9pbnRlcm5hbC9Ib29kaWVXcml0ZXJDb21taXRNZXNzYWdlLmphdmE=)
 | | | |
   | 
[...nal/HoodieDataSourceInternalBatchWriteBuilder.java](https://codecov.io/gh/apache/hudi/pull/2729/diff?src=pr=tree#diff-aHVkaS1zcGFyay1kYXRhc291cmNlL2h1ZGktc3BhcmszL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL3NwYXJrMy9pbnRlcm5hbC9Ib29kaWVEYXRhU291cmNlSW50ZXJuYWxCYXRjaFdyaXRlQnVpbGRlci5qYXZh)
 | | | |
   | 
[...spark3/internal/HoodieDataSourceInternalTable.java](https://codecov.io/gh/apache/hudi/pull/2729/diff?src=pr=tree#diff-aHVkaS1zcGFyay1kYXRhc291cmNlL2h1ZGktc3BhcmszL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL3NwYXJrMy9pbnRlcm5hbC9Ib29kaWVEYXRhU291cmNlSW50ZXJuYWxUYWJsZS5qYXZh)
 | | | |
   | 
[...org/apache/hudi/spark3/internal/DefaultSource.java](https://codecov.io/gh/apache/hudi/pull/2729/diff?src=pr=tree#diff-aHVkaS1zcGFyay1kYXRhc291cmNlL2h1ZGktc3BhcmszL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL3NwYXJrMy9pbnRlcm5hbC9EZWZhdWx0U291cmNlLmphdmE=)
 | | | |
   | 

[jira] [Updated] (HUDI-1731) Rename UpsertPartitioner in both hudi-java-client and hudi-spark-client to differentiate them from each other

2021-03-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-1731:
-
Labels: pull-request-available  (was: )

> Rename UpsertPartitioner in both hudi-java-client and hudi-spark-client to 
> differentiate them from each other
> -
>
> Key: HUDI-1731
> URL: https://issues.apache.org/jira/browse/HUDI-1731
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Code Cleanup
>Reporter: Leo Zhu
>Priority: Minor
>  Labels: pull-request-available
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> There's the same fully-qualified class name - 
> "org.apache.hudi.table.action.commit.UpsertPartitioner" - in both the 
> hudi-spark-client and hudi-java-client modules. When both jars are included 
> in the classpath, one overrides the other, and the error below occurs:
> Exception in thread "main" java.lang.VerifyError: Bad return type
> Exception Details:
>   Location:
>     org/apache/hudi/table/action/commit/BaseSparkCommitActionExecutor.getUpsertPartitioner(Lorg/apache/hudi/table/WorkloadProfile;)Lorg/apache/spark/Partitioner; @34: areturn
>   Reason:
>     Type 'org/apache/hudi/table/action/commit/UpsertPartitioner' (current frame, stack[0]) is not assignable to 'org/apache/spark/Partitioner' (from method signature)
>   Current Frame:
>     bci: @34
>     flags: { }
>     locals: { 'org/apache/hudi/table/action/commit/BaseSparkCommitActionExecutor', 'org/apache/hudi/table/WorkloadProfile' }
>     stack: { 'org/apache/hudi/table/action/commit/UpsertPartitioner' }
>   Bytecode:
>     0x000: 2bc7 000d bb00 9859 12bc b700 9bbf bb00
>     0x010: 8d59 2b2a b400 0f2a b400 052a b400 03b7
>     0x020: 00bd b0
>   Stackmap Table:
>     same_frame(@14)
>   at org.apache.hudi.table.HoodieSparkCopyOnWriteTable.insert(HoodieSparkCopyOnWriteTable.java:97)
>   at org.apache.hudi.table.HoodieSparkCopyOnWriteTable.insert(HoodieSparkCopyOnWriteTable.java:82)
>   at org.apache.hudi.client.SparkRDDWriteClient.insert(SparkRDDWriteClient.java:169)
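
A quick way to confirm which copy of the class actually won on the classpath is 
a small diagnostic like the one below. This is an illustrative sketch only, not 
part of the Hudi codebase; it assumes both Hudi client jars and Spark are on the 
classpath when it runs:

{code:java}
import java.net.URL;

public class UpsertPartitionerProbe {
  public static void main(String[] args) throws Exception {
    // Resolve the conflicting class exactly as the JVM would.
    Class<?> clazz = Class.forName("org.apache.hudi.table.action.commit.UpsertPartitioner");
    URL source = clazz.getProtectionDomain().getCodeSource().getLocation();
    System.out.println("UpsertPartitioner loaded from: " + source);

    // BaseSparkCommitActionExecutor.getUpsertPartitioner declares a return type
    // of org.apache.spark.Partitioner; if the hudi-java-client copy shadows the
    // hudi-spark-client one, this prints false, matching the VerifyError above.
    Class<?> sparkPartitioner = Class.forName("org.apache.spark.Partitioner");
    System.out.println("Assignable to org.apache.spark.Partitioner: "
        + sparkPartitioner.isAssignableFrom(clazz));
  }
}
{code}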



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] leo-Iamok opened a new pull request #2734: [HUDI-1731] Rename UpsertPartitioner in both hudi-java-client and hud…

2021-03-28 Thread GitBox


leo-Iamok opened a new pull request #2734:
URL: https://github.com/apache/hudi/pull/2734


   Add "Java" prefix for "UpsertPartitioner" in hudi-java-client to 
differentiate it from another same fully qualified name "UpsertPartitioner" in 
hudi-spark-client, otherwise when both modules are included in classpath, the 
two classes would conflict.
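   
   A minimal sketch of the shape of the change (illustrative skeleton only; the 
actual partitioning logic in hudi-java-client is carried over unchanged, only 
the class name differs):
   
   ```java
   // hudi-java-client, package org.apache.hudi.table.action.commit.
   // Before: public class UpsertPartitioner { ... } -- the same fully qualified
   // name as the hudi-spark-client class, so one jar shadows the other.
   // After: the "Java" prefix makes the fully qualified name unique.
   public class JavaUpsertPartitioner {
     // ... existing upsert-partitioning logic, unchanged apart from the name ...
   }
   ```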


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] codecov-io commented on pull request #2733: [HOTFIX] Disable ITs for Spark3 and scala2.12

2021-03-28 Thread GitBox


codecov-io commented on pull request #2733:
URL: https://github.com/apache/hudi/pull/2733#issuecomment-808858281


   # [Codecov](https://codecov.io/gh/apache/hudi/pull/2733?src=pr=h1) Report
   > Merging 
[#2733](https://codecov.io/gh/apache/hudi/pull/2733?src=pr=desc) (ad21bee) 
into 
[release-0.8.0](https://codecov.io/gh/apache/hudi/commit/b7c47b195861c90269f10932dd2838e2f53bf326?el=desc)
 (b7c47b1) will **increase** coverage by `17.99%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/2733/graphs/tree.svg?width=650=150=pr=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2733?src=pr=tree)
   
   ```diff
   @@               Coverage Diff                @@
   ##           release-0.8.0    #2733         +/-   ##
   =====================================================
   + Coverage        51.73%    69.73%    +17.99%
   + Complexity        3602       371      -3231
   =====================================================
     Files              476        54       -422
     Lines            22598      1989     -20609
     Branches          2409       236      -2173
   =====================================================
   - Hits             11692      1387     -10305
   + Misses            9888       471      -9417
   + Partials          1018       131       -887
   ```
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | hudicli | `?` | `?` | |
   | hudiclient | `?` | `?` | |
   | hudicommon | `?` | `?` | |
   | hudiflink | `?` | `?` | |
   | hudihadoopmr | `?` | `?` | |
   | hudisparkdatasource | `?` | `?` | |
   | hudisync | `?` | `?` | |
   | huditimelineservice | `?` | `?` | |
   | hudiutilities | `69.73% <ø> (ø)` | `0.00 <ø> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/hudi/pull/2733?src=pr=tree) | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | [...org/apache/hudi/spark3/internal/DefaultSource.java](https://codecov.io/gh/apache/hudi/pull/2733/diff?src=pr=tree#diff-aHVkaS1zcGFyay1kYXRhc291cmNlL2h1ZGktc3BhcmszL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL3NwYXJrMy9pbnRlcm5hbC9EZWZhdWx0U291cmNlLmphdmE=) | | | |
   | [...pache/hudi/hadoop/config/HoodieRealtimeConfig.java](https://codecov.io/gh/apache/hudi/pull/2733/diff?src=pr=tree#diff-aHVkaS1oYWRvb3AtbXIvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaGFkb29wL2NvbmZpZy9Ib29kaWVSZWFsdGltZUNvbmZpZy5qYXZh) | | | |
   | [...a/org/apache/hudi/io/storage/HoodieFileReader.java](https://codecov.io/gh/apache/hudi/pull/2733/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaW8vc3RvcmFnZS9Ib29kaWVGaWxlUmVhZGVyLmphdmE=) | | | |
   | [.../apache/hudi/exception/HoodieCompactException.java](https://codecov.io/gh/apache/hudi/pull/2733/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvZXhjZXB0aW9uL0hvb2RpZUNvbXBhY3RFeGNlcHRpb24uamF2YQ==) | | | |
   | [.../org/apache/hudi/cli/commands/TempViewCommand.java](https://codecov.io/gh/apache/hudi/pull/2733/diff?src=pr=tree#diff-aHVkaS1jbGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY2xpL2NvbW1hbmRzL1RlbXBWaWV3Q29tbWFuZC5qYXZh) | | | |
   | [...n/java/org/apache/hudi/internal/DefaultSource.java](https://codecov.io/gh/apache/hudi/pull/2733/diff?src=pr=tree#diff-aHVkaS1zcGFyay1kYXRhc291cmNlL2h1ZGktc3BhcmsyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2ludGVybmFsL0RlZmF1bHRTb3VyY2UuamF2YQ==) | | | |
   | [...g/apache/hudi/exception/HoodieRemoteException.java](https://codecov.io/gh/apache/hudi/pull/2733/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvZXhjZXB0aW9uL0hvb2RpZVJlbW90ZUV4Y2VwdGlvbi5qYXZh) | | | |
   | [...ache/hudi/common/fs/inline/InMemoryFileSystem.java](https://codecov.io/gh/apache/hudi/pull/2733/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2ZzL2lubGluZS9Jbk1lbW9yeUZpbGVTeXN0ZW0uamF2YQ==) | | | |
   | [.../main/scala/org/apache/hudi/cli/SparkHelpers.scala](https://codecov.io/gh/apache/hudi/pull/2733/diff?src=pr=tree#diff-aHVkaS1jbGkvc3JjL21haW4vc2NhbGEvb3JnL2FwYWNoZS9odWRpL2NsaS9TcGFya0hlbHBlcnMuc2NhbGE=) | | | |
   | [...i/common/util/collection/ExternalSpillableMap.java](https://codecov.io/gh/apache/hudi/pull/2733/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3V0aWwvY29sbGVjdGlvbi9FeHRlcm5hbFNwaWxsYWJsZU1hcC5qYXZh) | | | |
   | ... and [412 more](https://codecov.io/gh/apache/hudi/pull/2733/diff?src=pr=tree-more) | |
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] garyli1019 opened a new pull request #2733: [HOTFIX] Disable ITs for Spark3 and scala2.12

2021-03-28 Thread GitBox


garyli1019 opened a new pull request #2733:
URL: https://github.com/apache/hudi/pull/2733


   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contributing.html before opening a 
pull request.*
   
   ## What is the purpose of the pull request
   
   *(For example: This pull request adds quick-start document.)*
   
   ## Brief change log
   
   *(for example:)*
 - *Modify AnnotationLocation checkstyle rule in checkstyle.xml*
   
   ## Verify this pull request
   
   *(Please pick either of the following options)*
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   *(or)*
   
   This pull request is already covered by existing tests, such as *(please 
describe tests)*.
   
   (or)
   
   This change added tests and can be verified as follows:
   
   *(example:)*
   
 - *Added integration tests for end-to-end.*
 - *Added HoodieClientWriteTest to verify the change.*
 - *Manually verified the change by running a job locally.*
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Created] (HUDI-1733) Support scala2.12 integration test

2021-03-28 Thread Gary Li (Jira)
Gary Li created HUDI-1733:
-

 Summary: Support scala2.12 integration test
 Key: HUDI-1733
 URL: https://issues.apache.org/jira/browse/HUDI-1733
 Project: Apache Hudi
  Issue Type: Improvement
  Components: Testing
Reporter: Gary Li


Integration test runs fail when the build uses the -Dscala-2.12 flag.

 
{code:java}
[INFO] 
[INFO] Results:
[INFO] 
[ERROR] Failures: 
[ERROR]   ITTestRepairsCommand.testDeduplicateWithInserts:128 expected:  but was: 
[ERROR]   ITTestRepairsCommand.testDeduplicateWithReal:213 expected:  but was: 
[ERROR]   ITTestRepairsCommand.testDeduplicateWithUpdates:155 expected:  but was: 
[ERROR]   ITTestRepairsCommand.testDeduplicateWithUpserts:182 expected:  but was: 
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HUDI-1732) Support Spark3 integration test

2021-03-28 Thread Gary Li (Jira)
Gary Li created HUDI-1732:
-

 Summary: Support Spark3 integration test
 Key: HUDI-1732
 URL: https://issues.apache.org/jira/browse/HUDI-1732
 Project: Apache Hudi
  Issue Type: Improvement
  Components: Testing
Reporter: Gary Li


Currently, the integration test has spark2.4 hard-coded in the docker image. We 
need to make both the Spark version and the Scala version configurable.
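
One possible direction, sketched below as a hypothetical helper (this is not the 
actual Hudi integration-test code, and the image repository/tag layout is an 
assumption): resolve the docker image tag from build properties instead of a 
hard-coded spark2.4 constant, so the same suite can also target Spark 3 and 
Scala 2.12 builds.

{code:java}
public final class ItDockerImages {

  private ItDockerImages() {
  }

  // Hypothetical helper: pick versions from system properties set by the
  // build (e.g. -Dspark.version=3.0.1 -Dscala.version=2.12), falling back
  // to the currently hard-coded defaults.
  public static String sparkAdhocImage() {
    String sparkVersion = System.getProperty("spark.version", "2.4.4");
    String scalaVersion = System.getProperty("scala.version", "2.11");
    // The image name below is illustrative only; the real repository/tag
    // layout used by the Hudi docker setup may differ.
    return "apachehudi/hudi-spark_" + scalaVersion + "-adhoc:" + sparkVersion;
  }
}
{code}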



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1731) Rename UpsertPartitioner in both hudi-java-client and hudi-spark-client to differentiate them from each other

2021-03-28 Thread Leo Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Leo Zhu updated HUDI-1731:
--
Status: In Progress  (was: Open)

> Rename UpsertPartitioner in both hudi-java-client and hudi-spark-client to 
> differentiate them from each other
> -
>
> Key: HUDI-1731
> URL: https://issues.apache.org/jira/browse/HUDI-1731
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Code Cleanup
>Reporter: Leo Zhu
>Priority: Minor
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> There's the same fully-qualified class name - 
> "org.apache.hudi.table.action.commit.UpsertPartitioner" - in both the 
> hudi-spark-client and hudi-java-client modules. When both jars are included 
> in the classpath, one overrides the other, and the error below occurs:
> Exception in thread "main" java.lang.VerifyError: Bad return type
> Exception Details:
>   Location:
>     org/apache/hudi/table/action/commit/BaseSparkCommitActionExecutor.getUpsertPartitioner(Lorg/apache/hudi/table/WorkloadProfile;)Lorg/apache/spark/Partitioner; @34: areturn
>   Reason:
>     Type 'org/apache/hudi/table/action/commit/UpsertPartitioner' (current frame, stack[0]) is not assignable to 'org/apache/spark/Partitioner' (from method signature)
>   Current Frame:
>     bci: @34
>     flags: { }
>     locals: { 'org/apache/hudi/table/action/commit/BaseSparkCommitActionExecutor', 'org/apache/hudi/table/WorkloadProfile' }
>     stack: { 'org/apache/hudi/table/action/commit/UpsertPartitioner' }
>   Bytecode:
>     0x000: 2bc7 000d bb00 9859 12bc b700 9bbf bb00
>     0x010: 8d59 2b2a b400 0f2a b400 052a b400 03b7
>     0x020: 00bd b0
>   Stackmap Table:
>     same_frame(@14)
>   at org.apache.hudi.table.HoodieSparkCopyOnWriteTable.insert(HoodieSparkCopyOnWriteTable.java:97)
>   at org.apache.hudi.table.HoodieSparkCopyOnWriteTable.insert(HoodieSparkCopyOnWriteTable.java:82)
>   at org.apache.hudi.client.SparkRDDWriteClient.insert(SparkRDDWriteClient.java:169)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (HUDI-1730) Rename UpsertPartitioner in both hudi-java-client and hudi-spark-client to differentiate them from each other

2021-03-28 Thread Leo Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Leo Zhu closed HUDI-1730.
-
Resolution: Duplicate

Duplicate of HUDI-1731

> Rename UpsertPartitioner in both hudi-java-client and hudi-spark-client to 
> differentiate them from each other
> -
>
> Key: HUDI-1730
> URL: https://issues.apache.org/jira/browse/HUDI-1730
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Leo Zhu
>Priority: Minor
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> There's the same fully-qualified class name - 
> "org.apache.hudi.table.action.commit.UpsertPartitioner" - in both the 
> hudi-spark-client and hudi-java-client modules. When both jars are included 
> in the classpath, one overrides the other, and the error below occurs:
> Exception in thread "main" java.lang.VerifyError: Bad return type
> Exception Details:
>   Location:
>     org/apache/hudi/table/action/commit/BaseSparkCommitActionExecutor.getUpsertPartitioner(Lorg/apache/hudi/table/WorkloadProfile;)Lorg/apache/spark/Partitioner; @34: areturn
>   Reason:
>     Type 'org/apache/hudi/table/action/commit/UpsertPartitioner' (current frame, stack[0]) is not assignable to 'org/apache/spark/Partitioner' (from method signature)
>   Current Frame:
>     bci: @34
>     flags: { }
>     locals: { 'org/apache/hudi/table/action/commit/BaseSparkCommitActionExecutor', 'org/apache/hudi/table/WorkloadProfile' }
>     stack: { 'org/apache/hudi/table/action/commit/UpsertPartitioner' }
>   Bytecode:
>     0x000: 2bc7 000d bb00 9859 12bc b700 9bbf bb00
>     0x010: 8d59 2b2a b400 0f2a b400 052a b400 03b7
>     0x020: 00bd b0
>   Stackmap Table:
>     same_frame(@14)
>   at org.apache.hudi.table.HoodieSparkCopyOnWriteTable.insert(HoodieSparkCopyOnWriteTable.java:97)
>   at org.apache.hudi.table.HoodieSparkCopyOnWriteTable.insert(HoodieSparkCopyOnWriteTable.java:82)
>   at org.apache.hudi.client.SparkRDDWriteClient.insert(SparkRDDWriteClient.java:169)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)