[GitHub] [hudi] garyli1019 commented on a change in pull request #1848: [HUDI-69] Support Spark Datasource for MOR table - RDD approach

2020-07-21 Thread GitBox
garyli1019 commented on a change in pull request #1848: URL: https://github.com/apache/hudi/pull/1848#discussion_r458230756 ## File path: hudi-spark/src/main/scala/org/apache/hudi/SnapshotRelation.scala ## @@ -0,0 +1,113 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] [hudi] garyli1019 commented on a change in pull request #1848: [HUDI-69] Support Spark Datasource for MOR table - RDD approach

2020-07-21 Thread GitBox
garyli1019 commented on a change in pull request #1848: URL: https://github.com/apache/hudi/pull/1848#discussion_r458232450 ## File path: hudi-spark/src/main/scala/org/apache/hudi/SnapshotRelation.scala ## @@ -0,0 +1,113 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] [hudi] garyli1019 commented on a change in pull request #1848: [HUDI-69] Support Spark Datasource for MOR table - RDD approach

2020-07-21 Thread GitBox
garyli1019 commented on a change in pull request #1848: URL: https://github.com/apache/hudi/pull/1848#discussion_r458242933 ## File path: hudi-spark/src/main/scala/org/apache/hudi/HudiMergeOnReadRDD.scala ## @@ -0,0 +1,195 @@ +/* + * Licensed to the Apache Software Foundation

[hudi] branch hudi_test_suite_refactor updated (247d923 -> ea2c616)

2020-07-21 Thread nagarwal
This is an automated email from the ASF dual-hosted git repository. nagarwal pushed a change to branch hudi_test_suite_refactor in repository https://gitbox.apache.org/repos/asf/hudi.git. discard 247d923 [HUDI-394] Provide a basic implementation of test suite add ea2c616 [HUDI-394]

[GitHub] [hudi] tooptoop4 commented on issue #1846: [SUPPORT] HoodieSnapshotCopier example

2020-07-21 Thread GitBox
tooptoop4 commented on issue #1846: URL: https://github.com/apache/hudi/issues/1846#issuecomment-662028709 @xushiyan I want to replace contents of existing table, i.e., read existing 10k small files from tableA and replace tableA with 20 big files

[GitHub] [hudi] nsivabalan opened a new pull request #1858: [WIP] [1014] Part 1: Adding Upgrade or downgrade infra

2020-07-21 Thread GitBox
nsivabalan opened a new pull request #1858: URL: https://github.com/apache/hudi/pull/1858 ## *Tips* - *Thank you very much for contributing to Apache Hudi.* - *Please review https://hudi.apache.org/contributing.html before opening a pull request.* ## What is the purpose of the

[GitHub] [hudi] ssomuah commented on issue #1852: [SUPPORT]

2020-07-21 Thread GitBox
ssomuah commented on issue #1852: URL: https://github.com/apache/hudi/issues/1852#issuecomment-661970919 I don't see any exceptions in the driver logs or executor logs. I see these two warnings in driver logs ``` 20/07/21 13:12:28 WARN IncrementalTimelineSyncFileSystemView:

[GitHub] [hudi] xushiyan commented on issue #1846: [SUPPORT] HoodieSnapshotCopier example

2020-07-21 Thread GitBox
xushiyan commented on issue #1846: URL: https://github.com/apache/hudi/issues/1846#issuecomment-662007289 > Can I use it to read all 0.4.6 COW hoodie data from one path and write back into fewer files in 0.5.3 format on same path? IIUC, this is to perform write operation from one

[GitHub] [hudi] vinothchandar opened a new pull request #1857: [HUDI-1029] In inline compaction mode, previously failed compactions …

2020-07-21 Thread GitBox
vinothchandar opened a new pull request #1857: URL: https://github.com/apache/hudi/pull/1857 …needs to be retried before new compactions - Prevents failed compactions from causing issues with future commits - Need to add tests ## *Tips* - *Thank you very much for

[GitHub] [hudi] xushiyan commented on issue #1846: [SUPPORT] HoodieSnapshotCopier example

2020-07-21 Thread GitBox
xushiyan commented on issue #1846: URL: https://github.com/apache/hudi/issues/1846#issuecomment-662059975 @tooptoop4 actually you should be able to achieve that with `HoodieDeltaStreamer`: just point the source to the existing Hudi table and write to another dir, and make sure to set file size

[GitHub] [hudi] garyli1019 commented on a change in pull request #1848: [HUDI-69] Support Spark Datasource for MOR table - RDD approach

2020-07-21 Thread GitBox
garyli1019 commented on a change in pull request #1848: URL: https://github.com/apache/hudi/pull/1848#discussion_r458245661 ## File path: hudi-spark/src/main/scala/org/apache/hudi/HudiMergeOnReadRDD.scala ## @@ -0,0 +1,195 @@ +/* + * Licensed to the Apache Software Foundation

[jira] [Assigned] (HUDI-767) Support transformation when export to Hudi

2020-07-21 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu reassigned HUDI-767: --- Assignee: (was: Raymond Xu) > Support transformation when export to Hudi >

[GitHub] [hudi] ssomuah edited a comment on issue #1852: [SUPPORT]

2020-07-21 Thread GitBox
ssomuah edited a comment on issue #1852: URL: https://github.com/apache/hudi/issues/1852#issuecomment-661970919 I don't see any exceptions in the driver logs or executor logs. I see these two warnings in driver logs ``` 20/07/21 13:12:28 WARN

[GitHub] [hudi] garyli1019 commented on a change in pull request #1848: [HUDI-69] Support Spark Datasource for MOR table - RDD approach

2020-07-21 Thread GitBox
garyli1019 commented on a change in pull request #1848: URL: https://github.com/apache/hudi/pull/1848#discussion_r458245661 ## File path: hudi-spark/src/main/scala/org/apache/hudi/HudiMergeOnReadRDD.scala ## @@ -0,0 +1,195 @@ +/* + * Licensed to the Apache Software Foundation

[jira] [Resolved] (HUDI-92) Include custom names for spark HUDI spark DAG stages for easier understanding

2020-07-21 Thread Prashant Wason (Jira)
[ https://issues.apache.org/jira/browse/HUDI-92?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prashant Wason resolved HUDI-92. Resolution: Fixed > Include custom names for spark HUDI spark DAG stages for easier understanding >

[GitHub] [hudi] garyli1019 commented on a change in pull request #1848: [HUDI-69] Support Spark Datasource for MOR table - RDD approach

2020-07-21 Thread GitBox
garyli1019 commented on a change in pull request #1848: URL: https://github.com/apache/hudi/pull/1848#discussion_r458241891 ## File path: hudi-spark/src/main/scala/org/apache/hudi/HudiMergeOnReadRDD.scala ## @@ -0,0 +1,195 @@ +/* + * Licensed to the Apache Software Foundation

[hudi] branch master updated: [HUDI-994] Move TestHoodieIndex test cases to unit tests (#1850)

2020-07-21 Thread vinoth
This is an automated email from the ASF dual-hosted git repository. vinoth pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new 5e7ab11 [HUDI-994] Move TestHoodieIndex test

[GitHub] [hudi] vinothchandar merged pull request #1850: [HUDI-994] Move TestHoodieIndex test cases to unit tests

2020-07-21 Thread GitBox
vinothchandar merged pull request #1850: URL: https://github.com/apache/hudi/pull/1850 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

[GitHub] [hudi] prashantwason commented on a change in pull request #1804: [HUDI-960] Implementation of the HFile base and log file format.

2020-07-21 Thread GitBox
prashantwason commented on a change in pull request #1804: URL: https://github.com/apache/hudi/pull/1804#discussion_r458355792 ## File path: hudi-client/src/main/java/org/apache/hudi/io/HoodieSortedMergeHandle.java ## @@ -0,0 +1,125 @@ +/* + * Licensed to the Apache Software

[GitHub] [hudi] garyli1019 commented on a change in pull request #1848: [HUDI-69] Support Spark Datasource for MOR table - RDD approach

2020-07-21 Thread GitBox
garyli1019 commented on a change in pull request #1848: URL: https://github.com/apache/hudi/pull/1848#discussion_r458237536 ## File path: hudi-spark/src/main/scala/org/apache/hudi/SnapshotRelation.scala ## @@ -0,0 +1,113 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] [hudi] prashantwason commented on a change in pull request #1804: [HUDI-960] Implementation of the HFile base and log file format.

2020-07-21 Thread GitBox
prashantwason commented on a change in pull request #1804: URL: https://github.com/apache/hudi/pull/1804#discussion_r458319202 ## File path: hudi-common/src/main/java/org/apache/hudi/common/table/log/block/HoodieLogBlock.java ## @@ -110,7 +110,7 @@ public long

[GitHub] [hudi] prashantwason commented on a change in pull request #1804: [HUDI-960] Implementation of the HFile base and log file format.

2020-07-21 Thread GitBox
prashantwason commented on a change in pull request #1804: URL: https://github.com/apache/hudi/pull/1804#discussion_r458354886 ## File path: hudi-client/src/main/java/org/apache/hudi/io/HoodieSortedMergeHandle.java ## @@ -0,0 +1,125 @@ +/* + * Licensed to the Apache Software

[GitHub] [hudi] prashantwason commented on a change in pull request #1804: [HUDI-960] Implementation of the HFile base and log file format.

2020-07-21 Thread GitBox
prashantwason commented on a change in pull request #1804: URL: https://github.com/apache/hudi/pull/1804#discussion_r458355188 ## File path: hudi-common/src/main/java/org/apache/hudi/io/storage/HoodieFileReader.java ## @@ -34,7 +35,17 @@ public Set filterRowKeys(Set

[GitHub] [hudi] prashantwason commented on a change in pull request #1804: [HUDI-960] Implementation of the HFile base and log file format.

2020-07-21 Thread GitBox
prashantwason commented on a change in pull request #1804: URL: https://github.com/apache/hudi/pull/1804#discussion_r458355367 ## File path: hudi-common/src/main/java/org/apache/hudi/io/storage/HoodieHFileReader.java ## @@ -0,0 +1,301 @@ +/* + * Licensed to the Apache

[GitHub] [hudi] prashantwason commented on a change in pull request #1804: [HUDI-960] Implementation of the HFile base and log file format.

2020-07-21 Thread GitBox
prashantwason commented on a change in pull request #1804: URL: https://github.com/apache/hudi/pull/1804#discussion_r458355532 ## File path: hudi-common/src/main/java/org/apache/hudi/io/storage/HoodieHFileReader.java ## @@ -0,0 +1,301 @@ +/* + * Licensed to the Apache

[GitHub] [hudi] garyli1019 commented on a change in pull request #1848: [HUDI-69] Support Spark Datasource for MOR table - RDD approach

2020-07-21 Thread GitBox
garyli1019 commented on a change in pull request #1848: URL: https://github.com/apache/hudi/pull/1848#discussion_r458247881 ## File path: hudi-spark/src/main/scala/org/apache/hudi/HudiMergeOnReadRDD.scala ## @@ -0,0 +1,195 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] [hudi] xushiyan commented on issue #1846: [SUPPORT] HoodieSnapshotCopier example

2020-07-21 Thread GitBox
xushiyan commented on issue #1846: URL: https://github.com/apache/hudi/issues/1846#issuecomment-662038478 > @xushiyan I want to replace contents of existing table, i.e., read existing 10k small files from tableA and replace tableA with 20 big files @tooptoop4 as I mentioned,

[GitHub] [hudi] vinothchandar commented on pull request #1765: [HUDI-1049] 0.5.3 Patch - In inline compaction mode, previously failed compactions needs to be retried before new compactions

2020-07-21 Thread GitBox
vinothchandar commented on pull request #1765: URL: https://github.com/apache/hudi/pull/1765#issuecomment-662094985 Closing this in favor of #1857 This is an automated message from the Apache Git Service. To respond to the

[GitHub] [hudi] vinothchandar closed pull request #1765: [HUDI-1049] 0.5.3 Patch - In inline compaction mode, previously failed compactions needs to be retried before new compactions

2020-07-21 Thread GitBox
vinothchandar closed pull request #1765: URL: https://github.com/apache/hudi/pull/1765 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

[GitHub] [hudi] nsivabalan commented on a change in pull request #1858: [WIP] [1014] Part 1: Adding Upgrade or downgrade infra

2020-07-21 Thread GitBox
nsivabalan commented on a change in pull request #1858: URL: https://github.com/apache/hudi/pull/1858#discussion_r458365915 ## File path: hudi-client/src/main/java/org/apache/hudi/client/AbstractHoodieWriteClient.java ## @@ -190,6 +192,7 @@ public HoodieMetrics getMetrics() {

[GitHub] [hudi] leesf commented on a change in pull request #1851: [HUDI-1113] Add user define metrics reporter

2020-07-21 Thread GitBox
leesf commented on a change in pull request #1851: URL: https://github.com/apache/hudi/pull/1851#discussion_r458477384 ## File path: hudi-client/src/main/java/org/apache/hudi/metrics/MetricsReporterFactory.java ## @@ -48,6 +51,10 @@ public static MetricsReporter

[GitHub] [hudi] leesf commented on a change in pull request #1851: [HUDI-1113] Add user define metrics reporter

2020-07-21 Thread GitBox
leesf commented on a change in pull request #1851: URL: https://github.com/apache/hudi/pull/1851#discussion_r458477826 ## File path: hudi-client/src/main/java/org/apache/hudi/metrics/userdefined/DefaultUserDefinedMetricsReporter.java ## @@ -0,0 +1,48 @@ +/* + * Licensed to

[GitHub] [hudi] leesf commented on a change in pull request #1851: [HUDI-1113] Add user define metrics reporter

2020-07-21 Thread GitBox
leesf commented on a change in pull request #1851: URL: https://github.com/apache/hudi/pull/1851#discussion_r458477465 ## File path: hudi-client/src/main/java/org/apache/hudi/metrics/MetricsReporterType.java ## @@ -22,5 +22,5 @@ * Types of the reporter. Right now we only

[GitHub] [hudi] satishkotha commented on a change in pull request #1853: [HUDI-1072] Add replace metadata file to timeline

2020-07-21 Thread GitBox
satishkotha commented on a change in pull request #1853: URL: https://github.com/apache/hudi/pull/1853#discussion_r458511328 ## File path: hudi-common/src/main/java/org/apache/hudi/common/table/timeline/HoodieTimeline.java ## @@ -126,6 +129,13 @@ */ HoodieTimeline

[GitHub] [hudi] vinothchandar commented on a change in pull request #1858: [WIP] [1014] Part 1: Adding Upgrade or downgrade infra

2020-07-21 Thread GitBox
vinothchandar commented on a change in pull request #1858: URL: https://github.com/apache/hudi/pull/1858#discussion_r458411863 ## File path: hudi-client/src/main/java/org/apache/hudi/client/AbstractHoodieWriteClient.java ## @@ -190,6 +192,7 @@ public HoodieMetrics

[GitHub] [hudi] bvaradar commented on issue #1852: [SUPPORT]

2020-07-21 Thread GitBox
bvaradar commented on issue #1852: URL: https://github.com/apache/hudi/issues/1852#issuecomment-662177092 MacBook-Pro:hudi balaji.varadarajan$ grep -c '\.clean.requested' ~/Downloads/dot_hoodie_folder.txt 16 MacBook-Pro:hudi balaji.varadarajan$ grep -c '\.deltacommit.requested'

[GitHub] [hudi] n3nash commented on a change in pull request #1853: [HUDI-1072] Add replace metadata file to timeline

2020-07-21 Thread GitBox
n3nash commented on a change in pull request #1853: URL: https://github.com/apache/hudi/pull/1853#discussion_r458476891 ## File path: hudi-common/src/main/java/org/apache/hudi/common/table/timeline/HoodieTimeline.java ## @@ -126,6 +129,13 @@ */ HoodieTimeline

[GitHub] [hudi] Mathieu1124 commented on a change in pull request #1842: [HUDI-1037]Introduce a write committed callback hook

2020-07-21 Thread GitBox
Mathieu1124 commented on a change in pull request #1842: URL: https://github.com/apache/hudi/pull/1842#discussion_r458550790 ## File path: hudi-client/src/main/java/org/apache/hudi/config/HoodieWriteCommitCallbackConfig.java ## @@ -0,0 +1,106 @@ +/* + * Licensed to the Apache

[GitHub] [hudi] n3nash commented on a change in pull request #1853: [HUDI-1072] Add replace metadata file to timeline

2020-07-21 Thread GitBox
n3nash commented on a change in pull request #1853: URL: https://github.com/apache/hudi/pull/1853#discussion_r458476085 ## File path: hudi-common/src/main/java/org/apache/hudi/common/table/timeline/HoodieActiveTimeline.java ## @@ -304,6 +305,22 @@ public HoodieInstant

[GitHub] [hudi] n3nash commented on a change in pull request #1853: [HUDI-1072] Add replace metadata file to timeline

2020-07-21 Thread GitBox
n3nash commented on a change in pull request #1853: URL: https://github.com/apache/hudi/pull/1853#discussion_r458475548 ## File path: hudi-common/src/main/avro/HoodieReplaceMetadata.avsc ## @@ -0,0 +1,44 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + *

[GitHub] [hudi] n3nash commented on a change in pull request #1859: [HUDI-1072] Use replace metadata file to filter excluded files in views

2020-07-21 Thread GitBox
n3nash commented on a change in pull request #1859: URL: https://github.com/apache/hudi/pull/1859#discussion_r458490961 ## File path: hudi-common/src/main/java/org/apache/hudi/common/table/view/AbstractTableFileSystemView.java ## @@ -103,14 +105,19 @@ protected void

[GitHub] [hudi] satishkotha commented on a change in pull request #1853: [HUDI-1072] Add replace metadata file to timeline

2020-07-21 Thread GitBox
satishkotha commented on a change in pull request #1853: URL: https://github.com/apache/hudi/pull/1853#discussion_r458510963 ## File path: hudi-common/src/main/java/org/apache/hudi/common/table/timeline/HoodieActiveTimeline.java ## @@ -65,7 +65,8 @@ COMMIT_EXTENSION,

[GitHub] [hudi] satishkotha commented on pull request #1859: [HUDI-1072] Use replace metadata file to filter excluded files in views

2020-07-21 Thread GitBox
satishkotha commented on pull request #1859: URL: https://github.com/apache/hudi/pull/1859#issuecomment-662223025 > Reviewed 50%, high level, I feel the changes of excludeFileGroups is being forced into many of the `TableFileSystem` implementations. Need to think more if there is a way to

[GitHub] [hudi] n3nash commented on a change in pull request #1853: [HUDI-1072] Add replace metadata file to timeline

2020-07-21 Thread GitBox
n3nash commented on a change in pull request #1853: URL: https://github.com/apache/hudi/pull/1853#discussion_r458476453 ## File path: hudi-common/src/main/java/org/apache/hudi/common/table/timeline/HoodieActiveTimeline.java ## @@ -65,7 +65,8 @@ COMMIT_EXTENSION,

[GitHub] [hudi] satishkotha commented on a change in pull request #1853: [HUDI-1072] Add replace metadata file to timeline

2020-07-21 Thread GitBox
satishkotha commented on a change in pull request #1853: URL: https://github.com/apache/hudi/pull/1853#discussion_r458510585 ## File path: hudi-common/src/main/avro/HoodieReplaceMetadata.avsc ## @@ -0,0 +1,44 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

[jira] [Resolved] (HUDI-896) Parallelize CI testing to reduce CI wait time

2020-07-21 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu resolved HUDI-896. - Resolution: Done > Parallelize CI testing to reduce CI wait time >

[jira] [Updated] (HUDI-896) Parallelize CI testing to reduce CI wait time

2020-07-21 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-896: Status: In Progress (was: Open) > Parallelize CI testing to reduce CI wait time >

[GitHub] [hudi] Mathieu1124 commented on a change in pull request #1842: [HUDI-1037]Introduce a write committed callback hook

2020-07-21 Thread GitBox
Mathieu1124 commented on a change in pull request #1842: URL: https://github.com/apache/hudi/pull/1842#discussion_r458552435 ## File path: hudi-client/src/main/java/org/apache/hudi/config/HoodieWriteCommitCallbackConfig.java ## @@ -0,0 +1,106 @@ +/* + * Licensed to the Apache

[GitHub] [hudi] vinothchandar commented on a change in pull request #1858: [WIP] [1014] Part 1: Adding Upgrade or downgrade infra

2020-07-21 Thread GitBox
vinothchandar commented on a change in pull request #1858: URL: https://github.com/apache/hudi/pull/1858#discussion_r458412092 ## File path: hudi-client/src/main/java/org/apache/hudi/table/UpgradeDowngradeHelper.java ## @@ -0,0 +1,175 @@ +/* + * Licensed to the Apache

[jira] [Updated] (HUDI-1117) Add tdunning json library to spark and utilities bundle

2020-07-21 Thread Balaji Varadarajan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Balaji Varadarajan updated HUDI-1117: - Status: Open (was: New) > Add tdunning json library to spark and utilities bundle >

[jira] [Assigned] (HUDI-1117) Add tdunning json library to spark and utilities bundle

2020-07-21 Thread Balaji Varadarajan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Balaji Varadarajan reassigned HUDI-1117: Assignee: Balaji Varadarajan > Add tdunning json library to spark and utilities

[GitHub] [hudi] bvaradar commented on issue #1787: Exception During Insert

2020-07-21 Thread GitBox
bvaradar commented on issue #1787: URL: https://github.com/apache/hudi/issues/1787#issuecomment-662166742 @asheeshgarg : JSONException class is coming from https://mvnrepository.com/artifact/org.json/json There is a licensing issue, and hence it is not part of the Hudi bundle packages. The underlying

[jira] [Created] (HUDI-1117) Add tdunning json library to spark and utilities bundle

2020-07-21 Thread Balaji Varadarajan (Jira)
Balaji Varadarajan created HUDI-1117: Summary: Add tdunning json library to spark and utilities bundle Key: HUDI-1117 URL: https://issues.apache.org/jira/browse/HUDI-1117 Project: Apache Hudi

[GitHub] [hudi] n3nash commented on a change in pull request #1859: [HUDI-1072] Use replace metadata file to filter excluded files in views

2020-07-21 Thread GitBox
n3nash commented on a change in pull request #1859: URL: https://github.com/apache/hudi/pull/1859#discussion_r458478588 ## File path: hudi-common/src/main/java/org/apache/hudi/common/table/view/AbstractTableFileSystemView.java ## @@ -103,14 +105,19 @@ protected void

[GitHub] [hudi] leesf commented on a change in pull request #1851: [HUDI-1113] Add user define metrics reporter

2020-07-21 Thread GitBox
leesf commented on a change in pull request #1851: URL: https://github.com/apache/hudi/pull/1851#discussion_r458478400 ## File path: hudi-client/src/main/java/org/apache/hudi/config/HoodieMetricsConfig.java ## @@ -58,6 +59,12 @@ public static final String

[GitHub] [hudi] nsivabalan commented on a change in pull request #1858: [WIP] [1014] Part 1: Adding Upgrade or downgrade infra

2020-07-21 Thread GitBox
nsivabalan commented on a change in pull request #1858: URL: https://github.com/apache/hudi/pull/1858#discussion_r458509862 ## File path: hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableConfig.java ## @@ -151,6 +154,27 @@ public HoodieTableType

[jira] [Updated] (HUDI-1050) Support filter pushdown and column pruning for MOR table on Spark Datasource

2020-07-21 Thread Yanjia Gary Li (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yanjia Gary Li updated HUDI-1050: - Fix Version/s: (was: 0.6.1) 0.6.0 > Support filter pushdown and column

[GitHub] [hudi] yihua commented on a change in pull request #1149: [HUDI-472] Introduce configurations and new modes of sorting for bulk_insert

2020-07-21 Thread GitBox
yihua commented on a change in pull request #1149: URL: https://github.com/apache/hudi/pull/1149#discussion_r458381933 ## File path: hudi-client/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java ## @@ -245,6 +250,16 @@ public int getMaxConsistencyCheckIntervalMs() {

[GitHub] [hudi] bvaradar commented on issue #1825: [SUPPORT] Compaction of parquet and meta file

2020-07-21 Thread GitBox
bvaradar commented on issue #1825: URL: https://github.com/apache/hudi/issues/1825#issuecomment-662170951 With 0.5.[1/2], Hudi stopped using renames for state transition. Hence, you are seeing separate state files for each action. All these files (except rollback) will be cleaned up as

[GitHub] [hudi] leesf commented on a change in pull request #1851: [HUDI-1113] Add user define metrics reporter

2020-07-21 Thread GitBox
leesf commented on a change in pull request #1851: URL: https://github.com/apache/hudi/pull/1851#discussion_r458478829 ## File path: hudi-client/src/main/java/org/apache/hudi/config/HoodieMetricsConfig.java ## @@ -58,6 +59,12 @@ public static final String

[jira] [Assigned] (HUDI-781) Re-design test utilities

2020-07-21 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu reassigned HUDI-781: --- Assignee: Raymond Xu > Re-design test utilities > > > Key:

[jira] [Updated] (HUDI-781) Re-design test utilities

2020-07-21 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-781: Status: In Progress (was: Open) > Re-design test utilities > > >

[jira] [Updated] (HUDI-1118) Cleanup rollback files residing in .hoodie folder

2020-07-21 Thread Balaji Varadarajan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Balaji Varadarajan updated HUDI-1118: - Status: Open (was: New) > Cleanup rollback files residing in .hoodie folder >

[jira] [Updated] (HUDI-1118) Cleanup rollback files residing in .hoodie folder

2020-07-21 Thread Balaji Varadarajan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Balaji Varadarajan updated HUDI-1118: - Fix Version/s: (was: 0.6.1) 0.6.0 > Cleanup rollback files

[jira] [Created] (HUDI-1118) Cleanup rollback files residing in .hoodie folder

2020-07-21 Thread Balaji Varadarajan (Jira)
Balaji Varadarajan created HUDI-1118: Summary: Cleanup rollback files residing in .hoodie folder Key: HUDI-1118 URL: https://issues.apache.org/jira/browse/HUDI-1118 Project: Apache Hudi

[GitHub] [hudi] yanghua commented on a change in pull request #1770: [HUDI-708]Add temps show and unit test for TempViewCommand

2020-07-21 Thread GitBox
yanghua commented on a change in pull request #1770: URL: https://github.com/apache/hudi/pull/1770#discussion_r458466053 ## File path: hudi-cli/src/main/java/org/apache/hudi/cli/commands/TempViewCommand.java ## @@ -20,36 +20,55 @@ import org.apache.hudi.cli.HoodieCLI;

[GitHub] [hudi] satishkotha commented on pull request #1853: [HUDI-1072] Add replace metadata file to timeline

2020-07-21 Thread GitBox
satishkotha commented on pull request #1853: URL: https://github.com/apache/hudi/pull/1853#issuecomment-662220607 > High level, introducing `replace` action changes seem fine to me, interested in learning how old_file_group -> new_file_group mapping is stored and accessed. Yet to review

Build failed in Jenkins: hudi-snapshot-deployment-0.5 #346

2020-07-21 Thread Apache Jenkins Server
See Changes: -- [...truncated 2.34 KB...] /home/jenkins/tools/maven/apache-maven-3.5.4/conf: logging settings.xml toolchains.xml

[GitHub] [hudi] stackfun opened a new issue #1860: [SUPPORT] Issue when querying from Spark Datasource if COW table is being written to at the same time

2020-07-21 Thread GitBox
stackfun opened a new issue #1860: URL: https://github.com/apache/hudi/issues/1860 **Describe the problem you faced** In one pyspark job, I'm appending 10 rows to a COW table in a loop. In another pyspark job, I'm doing a select count(*) on the same table in another loop.

[jira] [Updated] (HUDI-781) Re-design test utilities

2020-07-21 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-781: Labels: pull-request-available (was: ) > Re-design test utilities > > >

[GitHub] [hudi] xushiyan opened a new pull request #1861: [HUDI-781] [WIP] Refactor test utils classes

2020-07-21 Thread GitBox
xushiyan opened a new pull request #1861: URL: https://github.com/apache/hudi/pull/1861 ## Committer checklist - [ ] Has a corresponding JIRA in PR title & commit - [ ] Commit message is descriptive of the change - [ ] CI is green - [ ] Necessary

[jira] [Commented] (HUDI-1117) Add tdunning json library to spark and utilities bundle

2020-07-21 Thread Balaji Varadarajan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17162402#comment-17162402 ] Balaji Varadarajan commented on HUDI-1117: -- This can also potentially be solved by including

[jira] [Comment Edited] (HUDI-1117) Add tdunning json library to spark and utilities bundle

2020-07-21 Thread Balaji Varadarajan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17162402#comment-17162402 ] Balaji Varadarajan edited comment on HUDI-1117 at 7/22/20, 12:07 AM: -

[GitHub] [hudi] satishkotha opened a new pull request #1859: [HUDI-1072] Use replace metadata file to filter excluded files in views

2020-07-21 Thread GitBox
satishkotha opened a new pull request #1859: URL: https://github.com/apache/hudi/pull/1859 ## What is the purpose of the pull request Follow up on #1853. Use metadata and filter excluded files from views. Changed base views. If general approach looks good, I can update

[GitHub] [hudi] satishkotha commented on a change in pull request #1859: [HUDI-1072] Use replace metadata file to filter excluded files in views

2020-07-21 Thread GitBox
satishkotha commented on a change in pull request #1859: URL: https://github.com/apache/hudi/pull/1859#discussion_r458513411 ## File path: hudi-common/src/main/java/org/apache/hudi/common/table/view/AbstractTableFileSystemView.java ## @@ -103,14 +105,19 @@ protected void

[GitHub] [hudi] n3nash commented on a change in pull request #1100: [HUDI-289] Implement a test suite to support long running test for Hudi writing and querying end-end

2020-07-21 Thread GitBox
n3nash commented on a change in pull request #1100: URL: https://github.com/apache/hudi/pull/1100#discussion_r457858872 ## File path: hudi-hadoop-mr/pom.xml ## @@ -125,6 +125,10 @@ mockito-junit-jupiter test + + org.mockito +

[GitHub] [hudi] codecov-commenter edited a comment on pull request #1149: [HUDI-472] Introduce configurations and new modes of sorting for bulk_insert

2020-07-21 Thread GitBox
codecov-commenter edited a comment on pull request #1149: URL: https://github.com/apache/hudi/pull/1149#issuecomment-652734921 # [Codecov](https://codecov.io/gh/apache/hudi/pull/1149?src=pr=h1) Report > Merging [#1149](https://codecov.io/gh/apache/hudi/pull/1149?src=pr=desc) into

[GitHub] [hudi] vinothchandar commented on a change in pull request #1848: [HUDI-69] Support Spark Datasource for MOR table - RDD approach

2020-07-21 Thread GitBox
vinothchandar commented on a change in pull request #1848: URL: https://github.com/apache/hudi/pull/1848#discussion_r457864770 ## File path: hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/utils/HoodieRealtimeRecordReaderUtils.java ## @@ -69,6 +71,17 @@ public static

[GitHub] [hudi] vinothchandar commented on pull request #1848: [HUDI-69] Support Spark Datasource for MOR table - RDD approach

2020-07-21 Thread GitBox
vinothchandar commented on pull request #1848: URL: https://github.com/apache/hudi/pull/1848#issuecomment-661677452 @umehrot2 can you also please make a quick second pass. This is an automated message from the Apache Git

[GitHub] [hudi] Mathieu1124 commented on a change in pull request #1842: [HUDI-1037]Introduce a write committed callback hook

2020-07-21 Thread GitBox
Mathieu1124 commented on a change in pull request #1842: URL: https://github.com/apache/hudi/pull/1842#discussion_r457886851 ## File path: hudi-client/src/main/java/org/apache/hudi/callback/common/HoodieBaseCommitCallbackMessage.java ## @@ -0,0 +1,67 @@ +/* + * Licensed to

[GitHub] [hudi] bvaradar commented on a change in pull request #1819: [HUDI-1058] Make delete marker configurable

2020-07-21 Thread GitBox
bvaradar commented on a change in pull request #1819: URL: https://github.com/apache/hudi/pull/1819#discussion_r457912958 ## File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/DeltaSync.java ## @@ -337,9 +337,15 @@ private void refreshTimeline()

[GitHub] [hudi] vinothchandar commented on a change in pull request #1149: [HUDI-472] Introduce configurations and new modes of sorting for bulk_insert

2020-07-21 Thread GitBox
vinothchandar commented on a change in pull request #1149: URL: https://github.com/apache/hudi/pull/1149#discussion_r457859023 ## File path: hudi-client/src/main/java/org/apache/hudi/execution/bulkinsert/RDDPartitionRangePartitioner.java ## @@ -0,0 +1,61 @@ +/* + * Licensed

[hudi] branch hudi_test_suite_refactor updated (8980e09 -> 247d923)

2020-07-21 Thread nagarwal
This is an automated email from the ASF dual-hosted git repository. nagarwal pushed a change to branch hudi_test_suite_refactor in repository https://gitbox.apache.org/repos/asf/hudi.git. discard 8980e09 [HUDI-394] Provide a basic implementation of test suite add 247d923 [HUDI-394]

[GitHub] [hudi] garyli1019 closed pull request #1722: [HUDI-69] Support Spark Datasource for MOR table

2020-07-21 Thread GitBox
garyli1019 closed pull request #1722: URL: https://github.com/apache/hudi/pull/1722

[GitHub] [hudi] lw309637554 commented on pull request #1756: [HUDI-839] Introducing support for rollbacks using marker files

2020-07-21 Thread GitBox
lw309637554 commented on pull request #1756: URL: https://github.com/apache/hudi/pull/1756#issuecomment-661690560 > @lw309637554 Looks good. Planning to merge after CI passes this time. Thanks a lot for your contributions. This is one very important PR! Great, a very good

[GitHub] [hudi] bvaradar commented on issue #1854: query MOR table using spark sql error

2020-07-21 Thread GitBox
bvaradar commented on issue #1854: URL: https://github.com/apache/hudi/issues/1854#issuecomment-661695358 Is the table (._acidtest2) registered as a Hive table? If so, can you provide the complete description of the table (desc formatted ) in the Hive metastore?

[GitHub] [hudi] vinothchandar commented on pull request #1768: [HUDI-1054][Peformance] Several performance fixes during finalizing writes

2020-07-21 Thread GitBox
vinothchandar commented on pull request #1768: URL: https://github.com/apache/hudi/pull/1768#issuecomment-661679756 @umehrot2 just landed the changes I mentioned. Can we rework this PR and try again? We can make things parallel, i.e. working for S3 for now, and then we can adjust for HDFS

[GitHub] [hudi] vinothchandar commented on pull request #1792: [HUDI-802] Fixing deletes for inserts in same batch in write path

2020-07-21 Thread GitBox
vinothchandar commented on pull request #1792: URL: https://github.com/apache/hudi/pull/1792#issuecomment-661682772 @nsivabalan this does seem good to me. Can we add a unit test specifically for `OverwriteWithLatestAvroPayload` which just tests these scenarios at the single class level?

[GitHub] [hudi] bvaradar commented on issue #1852: [SUPPORT]

2020-07-21 Thread GitBox
bvaradar commented on issue #1852: URL: https://github.com/apache/hudi/issues/1852#issuecomment-661692328 ``` And looking at the thread dump of the executors they are almost always spending their time listing files. ``` This looks surprising to me. file listing for finding

[GitHub] [hudi] sbernauer commented on issue #1845: [SUPPORT] Support for Schema evolution. Facing an error

2020-07-21 Thread GitBox
sbernauer commented on issue #1845: URL: https://github.com/apache/hudi/issues/1845#issuecomment-661711698 @bvaradar, yes, I am appending the field to the end of the schema (as reproduced in the test). The definition of the event is outside my scope, we just consume these events ;)

[GitHub] [hudi] vinothchandar commented on pull request #1722: [HUDI-69] Support Spark Datasource for MOR table

2020-07-21 Thread GitBox
vinothchandar commented on pull request #1722: URL: https://github.com/apache/hudi/pull/1722#issuecomment-661660112 @garyli1019 should we close this in favor of #1848?

[GitHub] [hudi] bvaradar commented on issue #1845: [SUPPORT] Support for Schema evolution. Facing an error

2020-07-21 Thread GitBox
bvaradar commented on issue #1845: URL: https://github.com/apache/hudi/issues/1845#issuecomment-661709618 @sbernauer: Are you appending this field to the end of the schema? Otherwise looks OK. Although honestly, I have not seen the usage of "avro.java.string": "String" before.

[GitHub] [hudi] Mathieu1124 commented on a change in pull request #1842: [HUDI-1037]Introduce a write committed callback hook

2020-07-21 Thread GitBox
Mathieu1124 commented on a change in pull request #1842: URL: https://github.com/apache/hudi/pull/1842#discussion_r458060234 ## File path: hudi-client/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java ## @@ -632,6 +632,21 @@ public FileSystemViewStorageConfig

[jira] [Commented] (HUDI-1116) Support time travel using timestamp type

2020-07-21 Thread linshan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17161997#comment-17161997 ] linshan commented on HUDI-1116: --- Hi [~vbalaji], would you describe the problem in detail? I want to

[GitHub] [hudi] leesf commented on pull request #1855: [HUDI-871] Add support for Tencent Cloud Object Storage(COS)

2020-07-21 Thread GitBox
leesf commented on pull request #1855: URL: https://github.com/apache/hudi/pull/1855#issuecomment-661790185 close to retrigger

[GitHub] [hudi] leesf closed pull request #1855: [HUDI-871] Add support for Tencent Cloud Object Storage(COS)

2020-07-21 Thread GitBox
leesf closed pull request #1855: URL: https://github.com/apache/hudi/pull/1855

[jira] [Updated] (HUDI-1116) Support time travel using timestamp type

2020-07-21 Thread Balaji Varadarajan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Balaji Varadarajan updated HUDI-1116: - Status: Open (was: New) > Support time travel using timestamp type >

[jira] [Created] (HUDI-1116) Support time travel using timestamp type

2020-07-21 Thread Balaji Varadarajan (Jira)
Balaji Varadarajan created HUDI-1116: Summary: Support time travel using timestamp type Key: HUDI-1116 URL: https://issues.apache.org/jira/browse/HUDI-1116 Project: Apache Hudi Issue

[jira] [Commented] (HUDI-1116) Support time travel using timestamp type

2020-07-21 Thread Balaji Varadarajan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17161883#comment-17161883 ] Balaji Varadarajan commented on HUDI-1116: -- One option is to provide a mapping utility which can

[jira] [Commented] (HUDI-871) Add support for Tencent cloud COS

2020-07-21 Thread deyzhong (Jira)
[ https://issues.apache.org/jira/browse/HUDI-871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17161900#comment-17161900 ] deyzhong commented on HUDI-871: --- I have submitted a PR ([https://github.com/apache/hudi/pull/1855]), please help
