[jira] [Issue Comment Deleted] (HUDI-1057) optional int32 is not a group

2020-06-26 Thread Selvaraj Periyasamy (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Selvaraj Periyasamy updated HUDI-1057: -- Comment: was deleted (was: I have some of the old records inserted with null for those

[jira] [Commented] (HUDI-1057) optional int32 is not a group

2020-06-26 Thread Selvaraj Periyasamy (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17146781#comment-17146781 ] Selvaraj Periyasamy commented on HUDI-1057: --- I have some of the old records inserted with null

[jira] [Created] (HUDI-1057) optional int32 is not a group

2020-06-26 Thread Selvaraj Periyasamy (Jira)
Selvaraj Periyasamy created HUDI-1057: - Summary: optional int32 is not a group Key: HUDI-1057 URL: https://issues.apache.org/jira/browse/HUDI-1057 Project: Apache Hudi Issue Type: Bug

[GitHub] [hudi] RajasekarSribalan commented on issue #1766: [SUPPORT] Hudi COW - Bulk Insert followed by Upsert via Spark streaming job

2020-06-26 Thread GitBox
RajasekarSribalan commented on issue #1766: URL: https://github.com/apache/hudi/issues/1766#issuecomment-650487308 @vinothchandar Thanks for the quick response. Much appreciated. Version details : Hudi : 0.5.2 Hive version : hive-1.1.0+cdh5.12.2+1218 I get these

[GitHub] [hudi] umehrot2 commented on pull request #1768: [HUDI-1054][Peformance] Several performance fixes during finalizing writes

2020-06-26 Thread GitBox
umehrot2 commented on pull request #1768: URL: https://github.com/apache/hudi/pull/1768#issuecomment-650479862 @bvaradar fyi This is an automated message from the Apache Git Service. To respond to the message, please log on

[GitHub] [hudi] umehrot2 commented on issue #1764: [SUPPORT] Commits stays INFLIGHT forever after S3 consistency check fails when Hudi tries to delete duplicate datafiles

2020-06-26 Thread GitBox
umehrot2 commented on issue #1764: URL: https://github.com/apache/hudi/issues/1764#issuecomment-650478776 Actually this is not just a problem with `Throttling`. AWS S3 can throw intermittent `Throttling` as well as `Internal Errors`, which can potentially succeed upon retrying. I
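The point that S3 `Throttling` and `Internal Errors` are often transient and may succeed on a retry can be illustrated with a generic retry-with-exponential-backoff helper. This is a minimal Python sketch under stated assumptions, not Hudi's or the AWS SDK's actual retry logic; `TransientStorageError`, `retry_with_backoff`, and the default delay values are hypothetical.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientStorageError(Exception):
    """Hypothetical stand-in for a retryable S3 error such as Throttling or InternalError."""


def retry_with_backoff(
    op: Callable[[], T],
    max_attempts: int = 5,
    base_delay_s: float = 0.1,
    max_delay_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call op(), retrying only TransientStorageError with capped exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return op()
        except TransientStorageError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last transient error
            # The delay ceiling doubles each attempt (base, 2*base, 4*base, ...) up to
            # max_delay_s; a random "full jitter" delay below it keeps concurrent
            # tasks from retrying in lockstep.
            ceiling = min(max_delay_s, base_delay_s * (2 ** (attempt - 1)))
            sleep(random.uniform(0, ceiling))
    raise AssertionError("unreachable")
```

Only the transient error type is retried; anything else (for example an access-denied error) propagates on the first attempt. The injectable `sleep` lets the helper be exercised without real delays.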

[GitHub] [hudi] vinothchandar commented on issue #1766: [SUPPORT] Hudi COW - Bulk Insert followed by Upsert via Spark streaming job

2020-06-26 Thread GitBox
vinothchandar commented on issue #1766: URL: https://github.com/apache/hudi/issues/1766#issuecomment-650472589 Are you getting this error consistently?

[GitHub] [hudi] vinothchandar commented on issue #1766: [SUPPORT] Hudi COW - Bulk Insert followed by Upsert via Spark streaming job

2020-06-26 Thread GitBox
vinothchandar commented on issue #1766: URL: https://github.com/apache/hudi/issues/1766#issuecomment-650472516 @RajasekarSribalan as you may have guessed, the issue seems to be that the right input format is not getting invoked. Hudi input formats filter for the latest parquet files after each
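The filtering described in this comment, where only the latest Parquet file of each file group is exposed after each commit, can be modeled in a few lines. This is a simplified Python sketch, not Hudi's `HoodieParquetInputFormat`; it assumes base files are named `<fileId>_<writeToken>_<instantTime>.parquet`, that instant times are fixed-width `yyyyMMddHHmmss` strings, and that the set of completed instants is already known. `parse_base_file` and `latest_base_files` are hypothetical helpers.

```python
from typing import Dict, Iterable, List, Set, Tuple


def parse_base_file(name: str) -> Tuple[str, str]:
    """Split '<fileId>_<writeToken>_<instantTime>.parquet' into (fileId, instantTime).

    The naming convention is an assumption of this sketch.
    """
    stem = name[: -len(".parquet")]
    file_id, _write_token, instant_time = stem.rsplit("_", 2)
    return file_id, instant_time


def latest_base_files(names: Iterable[str], completed_instants: Set[str]) -> List[str]:
    """Keep, per file group, only the base file written by the latest completed instant."""
    latest: Dict[str, Tuple[str, str]] = {}  # fileId -> (instantTime, fileName)
    for name in names:
        if not name.endswith(".parquet"):
            continue
        file_id, instant = parse_base_file(name)
        if instant not in completed_instants:
            continue  # written by an inflight or failed commit: hidden from readers
        # Fixed-width timestamps sort correctly as strings.
        if file_id not in latest or instant > latest[file_id][0]:
            latest[file_id] = (instant, name)
    return sorted(name for _, name in latest.values())
```

A reader that skips this step and lists every Parquet file would return rows from all versions of a file group, which is consistent with the symptom discussed in this thread when the Hudi input format is not in effect.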

[GitHub] [hudi] vinothchandar commented on a change in pull request #1732: [HUDI-1004] Support update metrics in HoodieDeltaStreamerMetrics

2020-06-26 Thread GitBox
vinothchandar commented on a change in pull request #1732: URL: https://github.com/apache/hudi/pull/1732#discussion_r446469349 ## File path: hudi-client/src/main/java/org/apache/hudi/metrics/HudiGauge.java ## @@ -25,22 +25,21 @@ * Similar to {@link Gauge}, but metric value

[jira] [Updated] (HUDI-1054) Address performance issues with finalizing writes on S3

2020-06-26 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-1054: - Labels: pull-request-available (was: ) > Address performance issues with finalizing writes on S3

[GitHub] [hudi] umehrot2 opened a new pull request #1768: [HUDI-1054][Peformance] Several performance fixes during finalizing writes

2020-06-26 Thread GitBox
umehrot2 opened a new pull request #1768: URL: https://github.com/apache/hudi/pull/1768 ## What is the purpose of the

[jira] [Commented] (HUDI-1056) Ensure validate_staged_release.sh also runs against released version in release repo

2020-06-26 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17146672#comment-17146672 ] sivabalan narayanan commented on HUDI-1056: --- sure. > Ensure validate_staged_release.sh also

[GitHub] [hudi] afeldman1 commented on a change in pull request #1761: [MINOR] Add documentation for using multi-column table keys and for n…

2020-06-26 Thread GitBox
afeldman1 commented on a change in pull request #1761: URL: https://github.com/apache/hudi/pull/1761#discussion_r446389709 ## File path: docs/_docs/2_2_writing_data.md ## @@ -176,15 +176,49 @@ In some cases, you may want to migrate your existing table into Hudi beforehand.

[GitHub] [hudi] afeldman1 commented on a change in pull request #1761: [MINOR] Add documentation for using multi-column table keys and for n…

2020-06-26 Thread GitBox
afeldman1 commented on a change in pull request #1761: URL: https://github.com/apache/hudi/pull/1761#discussion_r446390993 ## File path: docs/_docs/2_2_writing_data.md ## @@ -176,15 +176,49 @@ In some cases, you may want to migrate your existing table into Hudi beforehand.

[GitHub] [hudi] afeldman1 commented on a change in pull request #1761: [MINOR] Add documentation for using multi-column table keys and for n…

2020-06-26 Thread GitBox
afeldman1 commented on a change in pull request #1761: URL: https://github.com/apache/hudi/pull/1761#discussion_r446384084 ## File path: docs/_docs/2_2_writing_data.md ## @@ -176,15 +176,49 @@ In some cases, you may want to migrate your existing table into Hudi beforehand.

[GitHub] [hudi] afeldman1 commented on a change in pull request #1761: [MINOR] Add documentation for using multi-column table keys and for n…

2020-06-26 Thread GitBox
afeldman1 commented on a change in pull request #1761: URL: https://github.com/apache/hudi/pull/1761#discussion_r446381750 ## File path: docs/_docs/2_2_writing_data.md ## @@ -176,15 +176,49 @@ In some cases, you may want to migrate your existing table into Hudi beforehand.

[GitHub] [hudi] vinothchandar commented on issue #1764: [SUPPORT] Commits stays INFLIGHT forever after S3 consistency check fails when Hudi tries to delete duplicate datafiles

2020-06-26 Thread GitBox
vinothchandar commented on issue #1764: URL: https://github.com/apache/hudi/issues/1764#issuecomment-650366410 >I am just thinking if we really need to wait for all files to appear here, or even if we need to wait, if at the end of the wait period the file is not present it should be safe

[GitHub] [hudi] afeldman1 commented on a change in pull request #1761: [MINOR] Add documentation for using multi-column table keys and for n…

2020-06-26 Thread GitBox
afeldman1 commented on a change in pull request #1761: URL: https://github.com/apache/hudi/pull/1761#discussion_r446372801 ## File path: docs/_docs/2_2_writing_data.md ## @@ -176,15 +176,49 @@ In some cases, you may want to migrate your existing table into Hudi beforehand.

[GitHub] [hudi] afeldman1 commented on a change in pull request #1761: [MINOR] Add documentation for using multi-column table keys and for n…

2020-06-26 Thread GitBox
afeldman1 commented on a change in pull request #1761: URL: https://github.com/apache/hudi/pull/1761#discussion_r446374024 ## File path: docs/_docs/2_3_querying_data.md ## @@ -136,6 +136,16 @@ The Spark Datasource API is a popular way of authoring Spark ETL pipelines. Hudi

[GitHub] [hudi] afeldman1 commented on a change in pull request #1761: [MINOR] Add documentation for using multi-column table keys and for n…

2020-06-26 Thread GitBox
afeldman1 commented on a change in pull request #1761: URL: https://github.com/apache/hudi/pull/1761#discussion_r446373497 ## File path: docs/_docs/2_2_writing_data.md ## @@ -176,15 +176,49 @@ In some cases, you may want to migrate your existing table into Hudi beforehand.

[GitHub] [hudi] afeldman1 commented on a change in pull request #1761: [MINOR] Add documentation for using multi-column table keys and for n…

2020-06-26 Thread GitBox
afeldman1 commented on a change in pull request #1761: URL: https://github.com/apache/hudi/pull/1761#discussion_r446365843 ## File path: docs/_docs/2_2_writing_data.md ## @@ -176,15 +176,49 @@ In some cases, you may want to migrate your existing table into Hudi beforehand.

[GitHub] [hudi] umehrot2 commented on issue #1764: [SUPPORT] Commits stays INFLIGHT forever after S3 consistency check fails when Hudi tries to delete duplicate datafiles

2020-06-26 Thread GitBox
umehrot2 commented on issue #1764: URL: https://github.com/apache/hudi/issues/1764#issuecomment-650344638 @vinothchandar Agreed, I realized this soon after, so I didn't proceed with this approach. I am just thinking if we really need to wait for all files to appear here, or even if we need
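The question raised here, whether the consistency check must wait for every file and what happens when the wait period ends, concerns a bounded polling loop. A minimal Python sketch, assuming a caller-supplied `exists` predicate; `wait_for_visibility` and its default values are hypothetical and do not reproduce Hudi's consistency guard. Instead of raising at the end of the wait, it returns the paths still not in the expected state, so the caller can decide whether that outcome is safe.

```python
import time
from typing import Callable, Iterable, List


def wait_for_visibility(
    paths: Iterable[str],
    exists: Callable[[str], bool],
    expect_present: bool,
    max_checks: int = 5,
    initial_interval_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Poll until every path reaches the expected state or the checks run out.

    Returns the paths that still do not match; an empty list means all matched.
    The caller, not this hypothetical helper, decides whether leftovers are an error.
    """
    pending = list(paths)
    interval = initial_interval_s
    for check in range(max_checks):
        # Keep only paths whose visibility still differs from what we expect.
        pending = [p for p in pending if exists(p) != expect_present]
        if not pending:
            return []
        if check < max_checks - 1:
            sleep(interval)
            interval *= 2  # back off between checks to limit listing requests
    return pending
```

Passing `expect_present=False` covers the deletion case in this issue (waiting for removed duplicate files to disappear), while `expect_present=True` covers waiting for newly written files to appear.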

[GitHub] [hudi] tooptoop4 commented on issue #1586: [SUPPORT] DMS with 2 key example

2020-06-26 Thread GitBox
tooptoop4 commented on issue #1586: URL: https://github.com/apache/hudi/issues/1586#issuecomment-650338788 @bvaradar pls reopen

[GitHub] [hudi] vinothchandar commented on issue #1764: [SUPPORT] Commits stays INFLIGHT forever after S3 consistency check fails when Hudi tries to delete duplicate datafiles

2020-06-26 Thread GitBox
vinothchandar commented on issue #1764: URL: https://github.com/apache/hudi/issues/1764#issuecomment-650243029 @umehrot2 if we reverse the order, then it might violate guarantee that if there was a file created in storage, then there is a marker file involved... the task can open a
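The guarantee described in this comment (if a data file exists in storage, a marker for it exists too) holds only if the marker is created before the data file. A minimal local-filesystem sketch in Python; the `.markers/<instant>/` directory, the `.marker` suffix, and the helper names are hypothetical simplifications, not Hudi's actual marker layout.

```python
from pathlib import Path
from typing import List


def create_with_marker(table: Path, instant: str, name: str, data: bytes) -> Path:
    """Write a data file, creating its marker first so a crash never leaves an unmarked file.

    The '.markers/<instant>/<name>.marker' layout is hypothetical.
    """
    marker_dir = table / ".markers" / instant
    marker_dir.mkdir(parents=True, exist_ok=True)
    (marker_dir / (name + ".marker")).touch()  # step 1: record the intent to create
    target = table / name
    target.write_bytes(data)                   # step 2: only then create the data file
    return target


def rollback(table: Path, instant: str) -> List[str]:
    """Delete every data file the instant may have created, found via its markers."""
    marker_dir = table / ".markers" / instant
    deleted: List[str] = []
    if not marker_dir.exists():
        return deleted
    for marker in sorted(marker_dir.iterdir()):
        data_name = marker.name[: -len(".marker")]
        data_path = table / data_name
        # A marker without a data file means the task died between step 1 and step 2.
        if data_path.exists():
            data_path.unlink()
            deleted.append(data_name)
        marker.unlink()
    marker_dir.rmdir()
    return deleted
```

If the order were reversed, a task that crashed between the two steps could leave a data file with no marker, and a marker-based rollback would never find it; this is the violation the comment warns about.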

[GitHub] [hudi] nandurj commented on issue #1586: [SUPPORT] DMS with 2 key example

2020-06-26 Thread GitBox
nandurj commented on issue #1586: URL: https://github.com/apache/hudi/issues/1586#issuecomment-650238181 We are still facing this issue after setting the hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.ComplexKeyGenerator
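For reference, a two-column record key with `ComplexKeyGenerator` is configured by listing the key columns, comma-separated, in `hoodie.datasource.write.recordkey.field` alongside the key generator class mentioned in this comment. A minimal PySpark-style sketch; the column names (`id1`, `id2`, `dt`, `updated_at`), the table name, and both helper functions are hypothetical, and the sketch does not diagnose the issue being reported.

```python
from typing import Dict


def complex_key_options(table_name: str) -> Dict[str, str]:
    """Hudi write options for a two-column record key (column names are hypothetical)."""
    return {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.operation": "upsert",
        # Comma-separated: ComplexKeyGenerator builds the record key from all listed columns.
        "hoodie.datasource.write.recordkey.field": "id1,id2",
        "hoodie.datasource.write.partitionpath.field": "dt",
        "hoodie.datasource.write.precombine.field": "updated_at",
        "hoodie.datasource.write.keygenerator.class": "org.apache.hudi.keygen.ComplexKeyGenerator",
    }


def write_upsert(df, base_path: str, table_name: str) -> None:
    """Upsert a Spark DataFrame into a Hudi table (needs a Spark session with the Hudi bundle)."""
    (df.write.format("org.apache.hudi")
        .options(**complex_key_options(table_name))
        .mode("append")
        .save(base_path))
```

Keeping the options in one function makes it easier to use the same key generator and key fields on every write to the table, since changing them between the initial load and later upserts would produce record keys that no longer match.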

[GitHub] [hudi] nsivabalan opened a new pull request #1767: [MINOR] Adding test to WriteClient to validate update partition path with global bloom

2020-06-26 Thread GitBox
nsivabalan opened a new pull request #1767: URL: https://github.com/apache/hudi/pull/1767 ## What is the purpose of the pull request Adding a test to validate update partition path ## Brief change log ## Verify this pull request this patch is just adding tests
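The behavior such a test validates can be modeled briefly. With a global index, a record key is unique across all partitions, so an incoming record whose partition path differs from the stored one is either moved (deleted from the old partition and inserted into the new one) or updated in place in its original partition. A simplified Python model, assuming the choice is governed by the `hoodie.bloom.index.update.partition.path` flag; `tag_location` and the dict-based index are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical global index: record key -> partition path where the key currently lives.
GlobalIndex = Dict[str, str]


def tag_location(
    index: GlobalIndex, key: str, new_partition: str, update_partition_path: bool
) -> List[Tuple[str, str]]:
    """Return the write actions for one incoming record as (action, partition) pairs."""
    old_partition = index.get(key)
    if old_partition is None:
        index[key] = new_partition
        return [("insert", new_partition)]
    if old_partition == new_partition or not update_partition_path:
        # Key already exists: update it where it lives, ignoring the incoming partition path.
        return [("update", old_partition)]
    # Partition path changed and moves are enabled: delete the old copy, insert the new one.
    index[key] = new_partition
    return [("delete", old_partition), ("insert", new_partition)]
```

Because the key is global, the model never leaves two live copies of the same key in different partitions, whichever way the flag is set.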

[GitHub] [hudi] leesf commented on a change in pull request #1761: [MINOR] Add documentation for using multi-column table keys and for n…

2020-06-26 Thread GitBox
leesf commented on a change in pull request #1761: URL: https://github.com/apache/hudi/pull/1761#discussion_r446121818 ## File path: docs/_docs/2_3_querying_data.md ## @@ -136,6 +136,16 @@ The Spark Datasource API is a popular way of authoring Spark ETL pipelines. Hudi

[GitHub] [hudi] leesf commented on a change in pull request #1761: [MINOR] Add documentation for using multi-column table keys and for n…

2020-06-26 Thread GitBox
leesf commented on a change in pull request #1761: URL: https://github.com/apache/hudi/pull/1761#discussion_r446120155 ## File path: docs/_docs/2_3_querying_data.md ## @@ -136,6 +136,16 @@ The Spark Datasource API is a popular way of authoring Spark ETL pipelines. Hudi

[GitHub] [hudi] leesf commented on a change in pull request #1761: [MINOR] Add documentation for using multi-column table keys and for n…

2020-06-26 Thread GitBox
leesf commented on a change in pull request #1761: URL: https://github.com/apache/hudi/pull/1761#discussion_r446119811 ## File path: docs/_docs/2_2_writing_data.md ## @@ -176,15 +176,49 @@ In some cases, you may want to migrate your existing table into Hudi beforehand. ##

[GitHub] [hudi] leesf commented on a change in pull request #1761: [MINOR] Add documentation for using multi-column table keys and for n…

2020-06-26 Thread GitBox
leesf commented on a change in pull request #1761: URL: https://github.com/apache/hudi/pull/1761#discussion_r446115740 ## File path: docs/_docs/2_2_writing_data.md ## @@ -176,15 +176,49 @@ In some cases, you may want to migrate your existing table into Hudi beforehand. ##

[GitHub] [hudi] leesf commented on a change in pull request #1761: [MINOR] Add documentation for using multi-column table keys and for n…

2020-06-26 Thread GitBox
leesf commented on a change in pull request #1761: URL: https://github.com/apache/hudi/pull/1761#discussion_r446115055 ## File path: docs/_docs/2_2_writing_data.md ## @@ -176,15 +176,49 @@ In some cases, you may want to migrate your existing table into Hudi beforehand. ##

[GitHub] [hudi] leesf commented on a change in pull request #1761: [MINOR] Add documentation for using multi-column table keys and for n…

2020-06-26 Thread GitBox
leesf commented on a change in pull request #1761: URL: https://github.com/apache/hudi/pull/1761#discussion_r446113235 ## File path: docs/_docs/2_2_writing_data.md ## @@ -176,15 +176,49 @@ In some cases, you may want to migrate your existing table into Hudi beforehand. ##

[jira] [Assigned] (HUDI-1048) Support synchronize clustering in MoR mode

2020-06-26 Thread leesf (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf reassigned HUDI-1048: --- Assignee: leesf > Support synchronize clustering in MoR mode > -- >

[jira] [Assigned] (HUDI-1047) Support synchronize clustering in CoW mode

2020-06-26 Thread leesf (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf reassigned HUDI-1047: --- Assignee: leesf > Support synchronize clustering in CoW mode > -- >

[GitHub] [hudi] lw309637554 edited a comment on pull request #1756: [HUDI-839] Adding unit test for MarkerFiles,RollbackUtils, RollbackActionExecutor for markers and filelisting

2020-06-26 Thread GitBox
lw309637554 edited a comment on pull request #1756: URL: https://github.com/apache/hudi/pull/1756#issuecomment-650096565 > Took a quick pass at the three test classes you have added.. LGTM . > Will do a detailed pass once you confirm PR is indeed ready.. @vinothchandar hello, I

[GitHub] [hudi] lw309637554 commented on pull request #1756: [HUDI-839] Adding unit test for MarkerFiles,RollbackUtils, RollbackActionExecutor for markers and filelisting

2020-06-26 Thread GitBox
lw309637554 commented on pull request #1756: URL: https://github.com/apache/hudi/pull/1756#issuecomment-650096565 > Took a quick pass at the three test classes you have added.. LGTM . > Will do a detailed pass once you confirm PR is indeed ready.. @vinothchandar hello, I have added

[GitHub] [hudi] RajasekarSribalan edited a comment on issue #1766: Hudi COW - Bulk Insert followed by Upsert via Spark streaming job

2020-06-26 Thread GitBox
RajasekarSribalan edited a comment on issue #1766: URL: https://github.com/apache/hudi/issues/1766#issuecomment-649994437 @vinothchandar @bvaradar could you please help! I am also getting the error below when querying from Hive Beeline. 1) In below scenario set

[GitHub] [hudi] RajasekarSribalan edited a comment on issue #1766: Hudi COW - Bulk Insert followed by Upsert via Spark streaming job

2020-06-26 Thread GitBox
RajasekarSribalan edited a comment on issue #1766: URL: https://github.com/apache/hudi/issues/1766#issuecomment-649994437 @vinothchandar @bvaradar could you please help! I am also getting the error below when querying from Hive Beeline. set

[jira] [Commented] (HUDI-983) Add Metrics section to asf-site

2020-06-26 Thread Hong Shen (Jira)
[ https://issues.apache.org/jira/browse/HUDI-983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17146094#comment-17146094 ] Hong Shen commented on HUDI-983: Sounds good. I will move deployment#metrics to the new metrics section. >

[GitHub] [hudi] bvaradar merged pull request #1687: [WIP] [HUDI-684] Introduced abstraction for writing and reading different types of base file formats.

2020-06-26 Thread GitBox
bvaradar merged pull request #1687: URL: https://github.com/apache/hudi/pull/1687

[GitHub] [hudi] bvaradar commented on a change in pull request #1687: [WIP] [HUDI-684] Introduced abstraction for writing and reading different types of base file formats.

2020-06-26 Thread GitBox
bvaradar commented on a change in pull request #1687: URL: https://github.com/apache/hudi/pull/1687#discussion_r445995261 ## File path: hudi-client/src/main/java/org/apache/hudi/table/action/rollback/RollbackHelper.java ## @@ -71,8 +71,9 @@ public

[GitHub] [hudi] RajasekarSribalan commented on issue #1766: Hudi COW - Bulk Insert followed by Upsert via Spark streaming job

2020-06-26 Thread GitBox
RajasekarSribalan commented on issue #1766: URL: https://github.com/apache/hudi/issues/1766#issuecomment-649994437 @vinothchandar @bvaradar could you please help!

[hudi] annotated tag 0.5.3 updated (c51ac65 -> d1e00ca)

2020-06-26 Thread vbalaji
This is an automated email from the ASF dual-hosted git repository. vbalaji pushed a change to annotated tag 0.5.3 in repository https://gitbox.apache.org/repos/asf/hudi.git. *** WARNING: tag 0.5.3 was modified! *** from c51ac65 (commit) to d1e00ca (tag) tagging