date:20200324

[GitHub] [incubator-hudi] lamber-ken edited a comment on issue #1442: [HUDI-734] Fix error: cannot create directory ‘test-content’: File exists

2020-03-24 Thread GitBox

lamber-ken edited a comment on issue #1442: [HUDI-734] Fix error: cannot create 
directory ‘test-content’: File exists
URL: https://github.com/apache/incubator-hudi/pull/1442#issuecomment-603673964
 
 
   hi @leesf, this bug causes travis didn't push build content to 
`test-content`. https://github.com/apache/incubator-hudi/pull/1441


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] lamber-ken commented on issue #1442: [HUDI-734] Fix error: cannot create directory ‘test-content’: File exists

2020-03-24 Thread GitBox

lamber-ken commented on issue #1442: [HUDI-734] Fix error: cannot create 
directory ‘test-content’: File exists
URL: https://github.com/apache/incubator-hudi/pull/1442#issuecomment-603673964
 
 
   hi @leesf, this bug causes travis didn't push build content to 
`test-content`.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

[jira] [Updated] (HUDI-734) Fix error: cannot create directory ‘test-content’: File exists

2020-03-24 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-734:

Labels: pull-request-available  (was: )

> Fix error: cannot create directory ‘test-content’: File exists
> --
>
> Key: HUDI-734
> URL: https://issues.apache.org/jira/browse/HUDI-734
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: Docs
>Reporter: lamber-ken
>Assignee: lamber-ken
>Priority: Major
>  Labels: pull-request-available
>
> Fix error: cannot create directory ‘test-content’: File exists



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[GitHub] [incubator-hudi] lamber-ken opened a new pull request #1442: [HUDI-734] Fix error: cannot create directory ‘test-content’: File exists

2020-03-24 Thread GitBox

lamber-ken opened a new pull request #1442: [HUDI-734] Fix error: cannot create 
directory ‘test-content’: File exists
URL: https://github.com/apache/incubator-hudi/pull/1442
 
 
   ## What is the purpose of the pull request
   
   The `mkdir: cannot create directory ‘test-content’: File exists` error 
affact `\cp -rf ...` command.
   
   https://travis-ci.org/github/apache/incubator-hudi/builds/47086
   
   ## Brief change log
   
- Add `-p` option
   
   ## Verify this pull request
   
   This pull request is a trivial rework without any test coverage.
   
   ## Committer checklist
   
- [X] Has a corresponding JIRA in PR title & commit

- [X] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

[jira] [Assigned] (HUDI-734) Fix error: cannot create directory ‘test-content’: File exists

2020-03-24 Thread lamber-ken (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lamber-ken reassigned HUDI-734:
---

Assignee: lamber-ken

> Fix error: cannot create directory ‘test-content’: File exists
> --
>
> Key: HUDI-734
> URL: https://issues.apache.org/jira/browse/HUDI-734
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: Docs
>Reporter: lamber-ken
>Assignee: lamber-ken
>Priority: Major
>
> Fix error: cannot create directory ‘test-content’: File exists



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Created] (HUDI-734) Fix error: cannot create directory ‘test-content’: File exists

2020-03-24 Thread lamber-ken (Jira)

lamber-ken created HUDI-734:
---

 Summary: Fix error: cannot create directory ‘test-content’: File 
exists
 Key: HUDI-734
 URL: https://issues.apache.org/jira/browse/HUDI-734
 Project: Apache Hudi (incubating)
  Issue Type: Bug
  Components: Docs
Reporter: lamber-ken


Fix error: cannot create directory ‘test-content’: File exists



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Updated] (HUDI-734) Fix error: cannot create directory ‘test-content’: File exists

2020-03-24 Thread lamber-ken (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lamber-ken updated HUDI-734:

Status: Open  (was: New)

> Fix error: cannot create directory ‘test-content’: File exists
> --
>
> Key: HUDI-734
> URL: https://issues.apache.org/jira/browse/HUDI-734
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: Docs
>Reporter: lamber-ken
>Priority: Major
>
> Fix error: cannot create directory ‘test-content’: File exists



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Commented] (HUDI-733) presto query data error

2020-03-24 Thread jing (Jira)



[ 
https://issues.apache.org/jira/browse/HUDI-733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17066417#comment-17066417
 ] 

jing commented on HUDI-733:
---

[~bhasudha]   help me 

> presto query data error
> ---
>
> Key: HUDI-733
> URL: https://issues.apache.org/jira/browse/HUDI-733
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: Presto Integration
>Affects Versions: 0.5.1
>Reporter: jing
>Priority: Major
> Attachments: hive_table.png, parquet_context.png, parquet_schema.png, 
> presto_query_data.png
>
>
> We found a data sequence issue in Hudi when we use API to import data(use 
> spark.read.json("filename") read to dataframe then write  to hudi). The 
> original d is rowkey:1 dt:2 time:3.
> But the value is unexpected when query the data by Presto(rowkey:2 dt:1 
> time:2), but correctly in Hive.
> After analysis, if I use dt to partition the column data, it is also written 
> in the parquet file. dt = xxx, and the value of the partition column should 
> be the value in the path of the hudi. However, I found that the value of the 
> presto query must be one-to-one with the columns in the parquet. He will not 
> detect the column names.
> Transformation methods and suggestions:
>  # Can the inputformat class be ignored to read the column value of the 
> partition column dt in parquet?
>  # Can hive data be synchronized without dt as a partition column? Consider 
> adding a column such as repl_dt as a partition column and dt as an ordinary 
> field.
>  # The dt column is not written to the parquet file.
>      4, dt is written to the parquet file, but as the last column.
>  
> @Sudha



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Created] (HUDI-733) presto query data error

2020-03-24 Thread jing (Jira)

jing created HUDI-733:
-

 Summary: presto query data error
 Key: HUDI-733
 URL: https://issues.apache.org/jira/browse/HUDI-733
 Project: Apache Hudi (incubating)
  Issue Type: Bug
  Components: Presto Integration
Affects Versions: 0.5.1
Reporter: jing
 Attachments: hive_table.png, parquet_context.png, parquet_schema.png, 
presto_query_data.png

We found a data sequence issue in Hudi when we use API to import data(use 
spark.read.json("filename") read to dataframe then write  to hudi). The 
original d is rowkey:1 dt:2 time:3.

But the value is unexpected when query the data by Presto(rowkey:2 dt:1 
time:2), but correctly in Hive.

After analysis, if I use dt to partition the column data, it is also written in 
the parquet file. dt = xxx, and the value of the partition column should be the 
value in the path of the hudi. However, I found that the value of the presto 
query must be one-to-one with the columns in the parquet. He will not detect 
the column names.

Transformation methods and suggestions:
 # Can the inputformat class be ignored to read the column value of the 
partition column dt in parquet?
 # Can hive data be synchronized without dt as a partition column? Consider 
adding a column such as repl_dt as a partition column and dt as an ordinary 
field.
 # The dt column is not written to the parquet file.

     4, dt is written to the parquet file, but as the last column.

 

@Sudha



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[incubator-hudi] branch asf-site updated: [MINOR] Add Hudi Online Meetup (#1441)

2020-03-24 Thread vinoyang

This is an automated email from the ASF dual-hosted git repository.

vinoyang pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git


The following commit(s) were added to refs/heads/asf-site by this push:
 new 691c522  [MINOR] Add Hudi Online Meetup (#1441)
691c522 is described below

commit 691c52212a109a1e564363d3b658008fad062f8e
Author: leesf <490081...@qq.com>
AuthorDate: Wed Mar 25 13:36:26 2020 +0800

[MINOR] Add Hudi Online Meetup (#1441)

* [MINOR] Add Hudi Online Meetup
---
 docs/_docs/1_4_powered_by.cn.md | 11 ++-
 docs/_docs/1_4_powered_by.md|  4 
 2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/docs/_docs/1_4_powered_by.cn.md b/docs/_docs/1_4_powered_by.cn.md
index 2bb41f5..82f9c02 100644
--- a/docs/_docs/1_4_powered_by.cn.md
+++ b/docs/_docs/1_4_powered_by.cn.md
@@ -48,10 +48,19 @@ Hudi在Yotpo有不少用途。首先，在他们的[开源ETL框架](https://git
 
 7. ["Building highly efficient data lakes using Apache Hudi 
(Incubating)"](https://www.slideshare.net/ChesterChen/sf-big-analytics-20190612-building-highly-efficient-data-lakes-using-apache-hudi)
 - By Vinoth Chandar 
June 2019, SF Big Analytics Meetup, San Mateo, CA
-   
+
 8. ["Apache Hudi (Incubating) - The Past, Present and Future Of Efficient Data 
Lake 
Architectures"](https://docs.google.com/presentation/d/1FHhsvh70ZP6xXlHdVsAI0g__B_6Mpto5KQFlZ0b8-mM)
 - By Vinoth Chandar & Balaji Varadarajan
September 2019, ApacheCon NA 19, Las Vegas, NV, USA
 
+9. ["Insert, upsert, and delete data in Amazon S3 using Amazon 
EMR"](https://www.portal.reinvent.awsevents.com/connect/sessionDetail.ww?SESSION_ID=98662&csrftkn=YS67-AG7B-QIAV-ZZBK-E6TT-MD4Q-1HEP-747P)
 - By Paul Codding & Vinoth Chandar
+   December 2019, AWS re:Invent 2019, Las Vegas, NV, USA
+
+10. ["Building Robust CDC Pipeline With Apache Hudi And 
Debezium"](https://www.slideshare.net/SyedKather/building-robust-cdc-pipeline-with-apache-hudi-and-debezium)
 - By Pratyaksh, Purushotham, Syed and Shaik December 2019, Hadoop Summit 
Bangalore, India
+
+11. ["Using Apache Hudi to build the next-generation data lake and its 
application in medical big 
data"](https://drive.google.com/open?id=1dmH2kWJF69PNdifPp37QBgjivOHaSLDn) - By 
JingHuang & Leesf March 2020, Apache Hudi & Apache Kylin Online Meetup, China
+
+12. ["Building a near real-time, high-performance data warehouse based on 
Apache Hudi and Apache 
Kylin"](https://drive.google.com/open?id=1Pk_WdFxfEZxMMfAOn0R8-m3ALkcN6G9e) - 
By ShaoFeng Shi March 2020, Apache Hudi & Apache Kylin Online Meetup, China
+
 ## 文章
 
 1. ["The Case for incremental processing on 
Hadoop"](https://www.oreilly.com/ideas/ubers-case-for-incremental-processing-on-hadoop)
 - O'reilly Ideas article by Vinoth Chandar
diff --git a/docs/_docs/1_4_powered_by.md b/docs/_docs/1_4_powered_by.md
index 761876e..229150e 100644
--- a/docs/_docs/1_4_powered_by.md
+++ b/docs/_docs/1_4_powered_by.md
@@ -63,6 +63,10 @@ Using Hudi at Yotpo for several usages. Firstly, integrated 
Hudi as a writer in

 10. ["Building Robust CDC Pipeline With Apache Hudi And 
Debezium"](https://www.slideshare.net/SyedKather/building-robust-cdc-pipeline-with-apache-hudi-and-debezium)
 - By Pratyaksh, Purushotham, Syed and Shaik December 2019, Hadoop Summit 
Bangalore, India
 
+11. ["Using Apache Hudi to build the next-generation data lake and its 
application in medical big 
data"](https://drive.google.com/open?id=1dmH2kWJF69PNdifPp37QBgjivOHaSLDn) - By 
JingHuang & Leesf March 2020, Apache Hudi & Apache Kylin Online Meetup, China
+
+12. ["Building a near real-time, high-performance data warehouse based on 
Apache Hudi and Apache 
Kylin"](https://drive.google.com/open?id=1Pk_WdFxfEZxMMfAOn0R8-m3ALkcN6G9e) - 
By ShaoFeng Shi March 2020, Apache Hudi & Apache Kylin Online Meetup, China
+
 ## Articles
 
 1. ["The Case for incremental processing on 
Hadoop"](https://www.oreilly.com/ideas/ubers-case-for-incremental-processing-on-hadoop)
 - O'reilly Ideas article by Vinoth Chandar

[GitHub] [incubator-hudi] yanghua merged pull request #1441: [MINOR] Add Hudi Online Meetup

2020-03-24 Thread GitBox

yanghua merged pull request #1441: [MINOR] Add Hudi Online Meetup
URL: https://github.com/apache/incubator-hudi/pull/1441
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] yanghua commented on a change in pull request #1441: [MINOR] Add Hudi Online Meetup

2020-03-24 Thread GitBox

yanghua commented on a change in pull request #1441: [MINOR] Add Hudi Online 
Meetup
URL: https://github.com/apache/incubator-hudi/pull/1441#discussion_r397618497
 
 

 ##
 File path: docs/_docs/1_4_powered_by.cn.md
 ##
 @@ -48,10 +48,20 @@ Hudi在Yotpo有不少用途。首先，在他们的[开源ETL框架](https://git
 
 7. ["Building highly efficient data lakes using Apache Hudi 
(Incubating)"](https://www.slideshare.net/ChesterChen/sf-big-analytics-20190612-building-highly-efficient-data-lakes-using-apache-hudi)
 - By Vinoth Chandar 
June 2019, SF Big Analytics Meetup, San Mateo, CA
-   
+
 8. ["Apache Hudi (Incubating) - The Past, Present and Future Of Efficient Data 
Lake 
Architectures"](https://docs.google.com/presentation/d/1FHhsvh70ZP6xXlHdVsAI0g__B_6Mpto5KQFlZ0b8-mM)
 - By Vinoth Chandar & Balaji Varadarajan
September 2019, ApacheCon NA 19, Las Vegas, NV, USA
 
+9. ["Insert, upsert, and delete data in Amazon S3 using Amazon 
EMR"](https://www.portal.reinvent.awsevents.com/connect/sessionDetail.ww?SESSION_ID=98662&csrftkn=YS67-AG7B-QIAV-ZZBK-E6TT-MD4Q-1HEP-747P)
 - By Paul Codding & Vinoth Chandar
+   December 2019, AWS re:Invent 2019, Las Vegas, NV, USA
 
 Review comment:
   OK~


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] leesf commented on a change in pull request #1441: [MINOR] Add Hudi Online Meetup

2020-03-24 Thread GitBox

leesf commented on a change in pull request #1441: [MINOR] Add Hudi Online 
Meetup
URL: https://github.com/apache/incubator-hudi/pull/1441#discussion_r397607372
 
 

 ##
 File path: docs/_docs/1_4_powered_by.cn.md
 ##
 @@ -48,10 +48,20 @@ Hudi在Yotpo有不少用途。首先，在他们的[开源ETL框架](https://git
 
 7. ["Building highly efficient data lakes using Apache Hudi 
(Incubating)"](https://www.slideshare.net/ChesterChen/sf-big-analytics-20190612-building-highly-efficient-data-lakes-using-apache-hudi)
 - By Vinoth Chandar 
June 2019, SF Big Analytics Meetup, San Mateo, CA
-   
+
 8. ["Apache Hudi (Incubating) - The Past, Present and Future Of Efficient Data 
Lake 
Architectures"](https://docs.google.com/presentation/d/1FHhsvh70ZP6xXlHdVsAI0g__B_6Mpto5KQFlZ0b8-mM)
 - By Vinoth Chandar & Balaji Varadarajan
September 2019, ApacheCon NA 19, Las Vegas, NV, USA
 
+9. ["Insert, upsert, and delete data in Amazon S3 using Amazon 
EMR"](https://www.portal.reinvent.awsevents.com/connect/sessionDetail.ww?SESSION_ID=98662&csrftkn=YS67-AG7B-QIAV-ZZBK-E6TT-MD4Q-1HEP-747P)
 - By Paul Codding & Vinoth Chandar
+   December 2019, AWS re:Invent 2019, Las Vegas, NV, USA
 
 Review comment:
   > Why here need to break a new line? Can we merge into the previous line?
   
   Just copied from english version.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] codecov-io edited a comment on issue #1436: [HUDI-711] Refactor exporter main logic

2020-03-24 Thread GitBox

codecov-io edited a comment on issue #1436: [HUDI-711] Refactor exporter main 
logic
URL: https://github.com/apache/incubator-hudi/pull/1436#issuecomment-602121990
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1436?src=pr&el=h1) 
Report
   > Merging 
[#1436](https://codecov.io/gh/apache/incubator-hudi/pull/1436?src=pr&el=desc) 
into 
[master](https://codecov.io/gh/apache/incubator-hudi/commit/c5030f77a0e63f609ed2c674bea00201b97d8bb6&el=desc)
 will **increase** coverage by `0.03%`.
   > The diff coverage is `87.93%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-hudi/pull/1436/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2)](https://codecov.io/gh/apache/incubator-hudi/pull/1436?src=pr&el=tree)
   
   ```diff
   @@ Coverage Diff  @@
   ## master#1436  +/-   ##
   
   + Coverage 67.60%   67.63%   +0.03% 
   - Complexity  255  261   +6 
   
 Files   340  341   +1 
 Lines 1651416511   -3 
 Branches   1689 1688   -1 
   
   + Hits  1116411167   +3 
   + Misses 4612 4607   -5 
   + Partials738  737   -1 
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/incubator-hudi/pull/1436?src=pr&el=tree) | 
Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | 
[.../apache/hudi/utilities/HoodieSnapshotExporter.java](https://codecov.io/gh/apache/incubator-hudi/pull/1436/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0hvb2RpZVNuYXBzaG90RXhwb3J0ZXIuamF2YQ==)
 | `87.96% <87.50%> (+4.46%)` | `28.00 <13.00> (+6.00)` | |
   | 
[...ies/exception/HoodieSnapshotExporterException.java](https://codecov.io/gh/apache/incubator-hudi/pull/1436/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2V4Y2VwdGlvbi9Ib29kaWVTbmFwc2hvdEV4cG9ydGVyRXhjZXB0aW9uLmphdmE=)
 | `100.00% <100.00%> (ø)` | `1.00 <1.00> (?)` | |
   | 
[...g/apache/hudi/metrics/InMemoryMetricsReporter.java](https://codecov.io/gh/apache/incubator-hudi/pull/1436/diff?src=pr&el=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvbWV0cmljcy9Jbk1lbW9yeU1ldHJpY3NSZXBvcnRlci5qYXZh)
 | `40.00% <0.00%> (-60.00%)` | `0.00% <0.00%> (ø%)` | |
   | 
[...src/main/java/org/apache/hudi/metrics/Metrics.java](https://codecov.io/gh/apache/incubator-hudi/pull/1436/diff?src=pr&el=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvbWV0cmljcy9NZXRyaWNzLmphdmE=)
 | `56.75% <0.00%> (-13.52%)` | `0.00% <0.00%> (ø%)` | |
   | 
[...hudi/utilities/sources/helpers/KafkaOffsetGen.java](https://codecov.io/gh/apache/incubator-hudi/pull/1436/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvaGVscGVycy9LYWZrYU9mZnNldEdlbi5qYXZh)
 | `85.54% <0.00%> (-1.21%)` | `12.00% <0.00%> (ø%)` | |
   | 
[...java/org/apache/hudi/utilities/sources/Source.java](https://codecov.io/gh/apache/incubator-hudi/pull/1436/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvU291cmNlLmphdmE=)
 | `87.50% <0.00%> (-0.74%)` | `5.00% <0.00%> (-1.00%)` | |
   | 
[...apache/hudi/utilities/deltastreamer/DeltaSync.java](https://codecov.io/gh/apache/incubator-hudi/pull/1436/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvRGVsdGFTeW5jLmphdmE=)
 | `72.27% <0.00%> (ø)` | `38.00% <0.00%> (ø%)` | |
   | 
[...e/hudi/common/table/log/HoodieLogFormatWriter.java](https://codecov.io/gh/apache/incubator-hudi/pull/1436/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL2xvZy9Ib29kaWVMb2dGb3JtYXRXcml0ZXIuamF2YQ==)
 | `76.92% <0.00%> (+1.92%)` | `0.00% <0.00%> (ø%)` | |
   | 
[.../apache/hudi/client/AbstractHoodieWriteClient.java](https://codecov.io/gh/apache/incubator-hudi/pull/1436/diff?src=pr&el=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY2xpZW50L0Fic3RyYWN0SG9vZGllV3JpdGVDbGllbnQuamF2YQ==)
 | `74.44% <0.00%> (+3.54%)` | `0.00% <0.00%> (ø%)` | |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1436?src=pr&el=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1436?src=pr&el=footer).
 Last update 
[c5030f7...6bb0346](https://codecov.io/gh/apache/incubator-hudi/pull/1436?src=pr&el=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).

[GitHub] [incubator-hudi] yanghua commented on a change in pull request #1441: [MINOR] Add Hudi Online Meetup

2020-03-24 Thread GitBox

yanghua commented on a change in pull request #1441: [MINOR] Add Hudi Online 
Meetup
URL: https://github.com/apache/incubator-hudi/pull/1441#discussion_r397600716
 
 

 ##
 File path: docs/_docs/1_4_powered_by.cn.md
 ##
 @@ -48,10 +48,20 @@ Hudi在Yotpo有不少用途。首先，在他们的[开源ETL框架](https://git
 
 7. ["Building highly efficient data lakes using Apache Hudi 
(Incubating)"](https://www.slideshare.net/ChesterChen/sf-big-analytics-20190612-building-highly-efficient-data-lakes-using-apache-hudi)
 - By Vinoth Chandar 
June 2019, SF Big Analytics Meetup, San Mateo, CA
-   
+
 8. ["Apache Hudi (Incubating) - The Past, Present and Future Of Efficient Data 
Lake 
Architectures"](https://docs.google.com/presentation/d/1FHhsvh70ZP6xXlHdVsAI0g__B_6Mpto5KQFlZ0b8-mM)
 - By Vinoth Chandar & Balaji Varadarajan
September 2019, ApacheCon NA 19, Las Vegas, NV, USA
 
+9. ["Insert, upsert, and delete data in Amazon S3 using Amazon 
EMR"](https://www.portal.reinvent.awsevents.com/connect/sessionDetail.ww?SESSION_ID=98662&csrftkn=YS67-AG7B-QIAV-ZZBK-E6TT-MD4Q-1HEP-747P)
 - By Paul Codding & Vinoth Chandar
+   December 2019, AWS re:Invent 2019, Las Vegas, NV, USA
 
 Review comment:
   Why here need to break a new line? Can we merge into the previous line?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] yanghua commented on a change in pull request #1441: [MINOR] Add Hudi Online Meetup

2020-03-24 Thread GitBox

yanghua commented on a change in pull request #1441: [MINOR] Add Hudi Online 
Meetup
URL: https://github.com/apache/incubator-hudi/pull/1441#discussion_r397600382
 
 

 ##
 File path: docs/_docs/1_4_powered_by.cn.md
 ##
 @@ -48,10 +48,20 @@ Hudi在Yotpo有不少用途。首先，在他们的[开源ETL框架](https://git
 
 7. ["Building highly efficient data lakes using Apache Hudi 
(Incubating)"](https://www.slideshare.net/ChesterChen/sf-big-analytics-20190612-building-highly-efficient-data-lakes-using-apache-hudi)
 - By Vinoth Chandar 
June 2019, SF Big Analytics Meetup, San Mateo, CA
-   
+
 8. ["Apache Hudi (Incubating) - The Past, Present and Future Of Efficient Data 
Lake 
Architectures"](https://docs.google.com/presentation/d/1FHhsvh70ZP6xXlHdVsAI0g__B_6Mpto5KQFlZ0b8-mM)
 - By Vinoth Chandar & Balaji Varadarajan
September 2019, ApacheCon NA 19, Las Vegas, NV, USA
 
+9. ["Insert, upsert, and delete data in Amazon S3 using Amazon 
EMR"](https://www.portal.reinvent.awsevents.com/connect/sessionDetail.ww?SESSION_ID=98662&csrftkn=YS67-AG7B-QIAV-ZZBK-E6TT-MD4Q-1HEP-747P)
 - By Paul Codding & Vinoth Chandar
+   December 2019, AWS re:Invent 2019, Las Vegas, NV, USA
+
+10. ["Building Robust CDC Pipeline With Apache Hudi And 
Debezium"](https://www.slideshare.net/SyedKather/building-robust-cdc-pipeline-with-apache-hudi-and-debezium)
 - By Pratyaksh, Purushotham, Syed and Shaik December 2019, Hadoop Summit 
Bangalore, India
+
+11. ["Using Apache Hudi to build the next-generation data lake and its 
application in medical big 
data"](https://drive.google.com/open?id=1dmH2kWJF69PNdifPp37QBgjivOHaSLDn) - By 
JingHuang & Leesf March 2020, Apache Hudi & Apache Kylin Online Meetup, China
+
+12. ["Building a near real-time, high-performance data warehouse based on 
Apache Hudi and Apache 
Kylin"](https://drive.google.com/open?id=1Pk_WdFxfEZxMMfAOn0R8-m3ALkcN6G9e) - 
By ShaoFeng Shi March 2020, Apache Hudi & Apache Kylin Online Meetup, China
+
+
 
 Review comment:
   Is here a duplicated empty line?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] leesf opened a new pull request #1441: [MINOR] Add Hudi Online Meetup

2020-03-24 Thread GitBox

leesf opened a new pull request #1441: [MINOR] Add Hudi Online Meetup
URL: https://github.com/apache/incubator-hudi/pull/1441
 
 
   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contributing.html before opening a 
pull request.*
   
   ## What is the purpose of the pull request
   
   Add Hudi online meeup talk to website
   
   ## Brief change log
   
   *(for example:)*
 - *Modify AnnotationLocation checkstyle rule in checkstyle.xml*
   
   ## Verify this pull request
   
   *(Please pick either of the following options)*
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   *(or)*
   
   This pull request is already covered by existing tests, such as *(please 
describe tests)*.
   
   (or)
   
   This change added tests and can be verified as follows:
   
   *(example:)*
   
 - *Added integration tests for end-to-end.*
 - *Added HoodieClientWriteTest to verify the change.*
 - *Manually verified the change by running a job locally.*
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] xushiyan commented on issue #1436: [HUDI-711] Refactor exporter main logic

2020-03-24 Thread GitBox

xushiyan commented on issue #1436: [HUDI-711] Refactor exporter main logic
URL: https://github.com/apache/incubator-hudi/pull/1436#issuecomment-603626530
 
 
   @leesf Please kindly review the last commit for the commented testcases. 
Note that I omitted the `assertFalse(dfs.exists(new Path(targetPath)));` 
statement; figure it is ok as we obviously don't expect that when exception was 
thrown.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] xushiyan commented on a change in pull request #1436: [HUDI-711] Refactor exporter main logic

2020-03-24 Thread GitBox

xushiyan commented on a change in pull request #1436: [HUDI-711] Refactor 
exporter main logic
URL: https://github.com/apache/incubator-hudi/pull/1436#discussion_r397592137
 
 

 ##
 File path: 
hudi-utilities/src/test/java/org/apache/hudi/utilities/TestHoodieSnapshotExporter.java
 ##
 @@ -159,18 +161,85 @@ public void testExportAsHudi() throws IOException {
   assertTrue(dfs.exists(new Path(partition + 
"/.hoodie_partition_metadata")));
   assertTrue(dfs.exists(new Path(targetPath + "/_SUCCESS")));
 }
+  }
+
+  public static class TestHoodieSnapshotExporterForEarlyAbort extends 
ExporterTestHarness {
+
+private HoodieSnapshotExporter.Config cfg;
+
+@Before
+public void setUp() throws Exception {
+  super.setUp();
+  cfg = new Config();
+  cfg.sourceBasePath = sourcePath;
+  cfg.targetOutputPath = targetPath;
+  cfg.outputFormat = OutputFormatValidator.HUDI;
+}
 
 @Test
-public void testExportEmptyDataset() throws IOException {
+public void testExportWhenTargetPathExists() throws IOException {
+  // make target output path present
+  dfs.mkdirs(new Path(targetPath));
+
+  // export
+  Throwable t = null;
+  try {
+new HoodieSnapshotExporter().export(jsc, cfg);
+  } catch (Exception e) {
+t = e;
+  } finally {
+assertNotNull(t);
+assertTrue(t instanceof HoodieSnapshotExporterException);
+assertEquals("The target output path already exists.", t.getMessage());
 
 Review comment:
   let me change these cases to using `ExpectedException`; something to do with
   
   ```java
   @Rule
   public ExpectedException exceptionRule = ExpectedException.none();
   ```
   
   I'll do a small commit and please see if it looks better. I kinda miss junit 
5 😂 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

Build failed in Jenkins: hudi-snapshot-deployment-0.5 #227

2020-03-24 Thread Apache Jenkins Server

See 


Changes:


--
[...truncated 2.32 KB...]
/home/jenkins/tools/maven/apache-maven-3.5.4/conf:
logging
settings.xml
toolchains.xml

/home/jenkins/tools/maven/apache-maven-3.5.4/conf/logging:
simplelogger.properties

/home/jenkins/tools/maven/apache-maven-3.5.4/lib:
aopalliance-1.0.jar
cdi-api-1.0.jar
cdi-api.license
commons-cli-1.4.jar
commons-cli.license
commons-io-2.5.jar
commons-io.license
commons-lang3-3.5.jar
commons-lang3.license
ext
guava-20.0.jar
guice-4.2.0-no_aop.jar
jansi-1.17.1.jar
jansi-native
javax.inject-1.jar
jcl-over-slf4j-1.7.25.jar
jcl-over-slf4j.license
jsr250-api-1.0.jar
jsr250-api.license
maven-artifact-3.5.4.jar
maven-artifact.license
maven-builder-support-3.5.4.jar
maven-builder-support.license
maven-compat-3.5.4.jar
maven-compat.license
maven-core-3.5.4.jar
maven-core.license
maven-embedder-3.5.4.jar
maven-embedder.license
maven-model-3.5.4.jar
maven-model-builder-3.5.4.jar
maven-model-builder.license
maven-model.license
maven-plugin-api-3.5.4.jar
maven-plugin-api.license
maven-repository-metadata-3.5.4.jar
maven-repository-metadata.license
maven-resolver-api-1.1.1.jar
maven-resolver-api.license
maven-resolver-connector-basic-1.1.1.jar
maven-resolver-connector-basic.license
maven-resolver-impl-1.1.1.jar
maven-resolver-impl.license
maven-resolver-provider-3.5.4.jar
maven-resolver-provider.license
maven-resolver-spi-1.1.1.jar
maven-resolver-spi.license
maven-resolver-transport-wagon-1.1.1.jar
maven-resolver-transport-wagon.license
maven-resolver-util-1.1.1.jar
maven-resolver-util.license
maven-settings-3.5.4.jar
maven-settings-builder-3.5.4.jar
maven-settings-builder.license
maven-settings.license
maven-shared-utils-3.2.1.jar
maven-shared-utils.license
maven-slf4j-provider-3.5.4.jar
maven-slf4j-provider.license
org.eclipse.sisu.inject-0.3.3.jar
org.eclipse.sisu.inject.license
org.eclipse.sisu.plexus-0.3.3.jar
org.eclipse.sisu.plexus.license
plexus-cipher-1.7.jar
plexus-cipher.license
plexus-component-annotations-1.7.1.jar
plexus-component-annotations.license
plexus-interpolation-1.24.jar
plexus-interpolation.license
plexus-sec-dispatcher-1.4.jar
plexus-sec-dispatcher.license
plexus-utils-3.1.0.jar
plexus-utils.license
slf4j-api-1.7.25.jar
slf4j-api.license
wagon-file-3.1.0.jar
wagon-file.license
wagon-http-3.1.0-shaded.jar
wagon-http.license
wagon-provider-api-3.1.0.jar
wagon-provider-api.license

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/ext:
README.txt

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native:
freebsd32
freebsd64
linux32
linux64
osx
README.txt
windows32
windows64

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/freebsd32:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/freebsd64:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/linux32:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/linux64:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/osx:
libjansi.jnilib

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/windows32:
jansi.dll

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/windows64:
jansi.dll
Finished /home/jenkins/tools/maven/apache-maven-3.5.4 Directory Listing :
Detected current version as: 
'HUDI_home=
0.6.0-SNAPSHOT'
[INFO] Scanning for projects...
[WARNING] 
[WARNING] Some problems were encountered while building the effective model for 
org.apache.hudi:hudi-spark_2.11:jar:0.6.0-SNAPSHOT
[WARNING] 'artifactId' contains an expression but should be a constant. @ 
org.apache.hudi:hudi-spark_${scala.binary.version}:[unknown-version], 

 line 26, column 15
[WARNING] 
[WARNING] Some problems were encountered while building the effective model for 
org.apache.hudi:hudi-timeline-service:jar:0.6.0-SNAPSHOT
[WARNING] 'build.plugins.plugin.(groupId:artifactId)' must be unique but found 
duplicate declaration of plugin org.jacoco:jacoco-maven-plugin @ 
org.apache.hudi:hudi-timeline-service:[unknown-version], 

 line 58, column 15
[WARNING] 
[WARNING] Some problems were encountered while building the effective model for 
org.apache.hudi:hudi-utilities_2.11:jar:0.6.0-SNAPSHOT
[WARNING] 'artifactId' contains an expression but should be a constant. @ 
org.apache.hudi:hudi-utilities_${scala.binary.version}:[unknown-version], 

 line 26, column 15
[WARNING] 
[WARNING] Some problems were encountered while building the effective model for 
org.apache.hudi:hudi-spark-bundle_2.11:jar:0.6.0-SNAPSHOT
[WARNING] 'artifactId' contains an expression but should be a constant. @ 
or

[GitHub] [incubator-hudi] xushiyan commented on a change in pull request #1436: [HUDI-711] Refactor exporter main logic

2020-03-24 Thread GitBox

xushiyan commented on a change in pull request #1436: [HUDI-711] Refactor 
exporter main logic
URL: https://github.com/apache/incubator-hudi/pull/1436#discussion_r397592137
 
 

 ##
 File path: 
hudi-utilities/src/test/java/org/apache/hudi/utilities/TestHoodieSnapshotExporter.java
 ##
 @@ -159,18 +161,85 @@ public void testExportAsHudi() throws IOException {
   assertTrue(dfs.exists(new Path(partition + 
"/.hoodie_partition_metadata")));
   assertTrue(dfs.exists(new Path(targetPath + "/_SUCCESS")));
 }
+  }
+
+  public static class TestHoodieSnapshotExporterForEarlyAbort extends 
ExporterTestHarness {
+
+private HoodieSnapshotExporter.Config cfg;
+
+@Before
+public void setUp() throws Exception {
+  super.setUp();
+  cfg = new Config();
+  cfg.sourceBasePath = sourcePath;
+  cfg.targetOutputPath = targetPath;
+  cfg.outputFormat = OutputFormatValidator.HUDI;
+}
 
 @Test
-public void testExportEmptyDataset() throws IOException {
+public void testExportWhenTargetPathExists() throws IOException {
+  // make target output path present
+  dfs.mkdirs(new Path(targetPath));
+
+  // export
+  Throwable t = null;
+  try {
+new HoodieSnapshotExporter().export(jsc, cfg);
+  } catch (Exception e) {
+t = e;
+  } finally {
+assertNotNull(t);
+assertTrue(t instanceof HoodieSnapshotExporterException);
+assertEquals("The target output path already exists.", t.getMessage());
 
 Review comment:
   actually it reminds me to use annotation of expected exception 
`@Test(expected = NullPointerException.class)`. Though i couldn't 
`assertFalse(dfs.exists(new Path(targetPath)))` in that way, guess it is okay; 
when exception is thrown, obviously we don't expect target path being created. 
   So can I just change these testcases to the annotation way?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] xushiyan commented on a change in pull request #1436: [HUDI-711] Refactor exporter main logic

2020-03-24 Thread GitBox

xushiyan commented on a change in pull request #1436: [HUDI-711] Refactor 
exporter main logic
URL: https://github.com/apache/incubator-hudi/pull/1436#discussion_r397590026
 
 

 ##
 File path: 
hudi-utilities/src/test/java/org/apache/hudi/utilities/TestHoodieSnapshotExporter.java
 ##
 @@ -159,18 +161,85 @@ public void testExportAsHudi() throws IOException {
   assertTrue(dfs.exists(new Path(partition + 
"/.hoodie_partition_metadata")));
   assertTrue(dfs.exists(new Path(targetPath + "/_SUCCESS")));
 }
+  }
+
+  public static class TestHoodieSnapshotExporterForEarlyAbort extends 
ExporterTestHarness {
+
+private HoodieSnapshotExporter.Config cfg;
+
+@Before
+public void setUp() throws Exception {
+  super.setUp();
+  cfg = new Config();
+  cfg.sourceBasePath = sourcePath;
+  cfg.targetOutputPath = targetPath;
+  cfg.outputFormat = OutputFormatValidator.HUDI;
+}
 
 @Test
-public void testExportEmptyDataset() throws IOException {
+public void testExportWhenTargetPathExists() throws IOException {
+  // make target output path present
+  dfs.mkdirs(new Path(targetPath));
+
+  // export
+  Throwable t = null;
+  try {
+new HoodieSnapshotExporter().export(jsc, cfg);
+  } catch (Exception e) {
+t = e;
+  } finally {
+assertNotNull(t);
+assertTrue(t instanceof HoodieSnapshotExporterException);
+assertEquals("The target output path already exists.", t.getMessage());
 
 Review comment:
   @leesf The reason of checking the assertion in `finally` is to cover the 
case when exporter does not throw exception, which should be deemed as failed 
test. Putting in catch block may still pass in that case. 😄 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] leesf commented on a change in pull request #1436: [HUDI-711] Refactor exporter main logic

2020-03-24 Thread GitBox

leesf commented on a change in pull request #1436: [HUDI-711] Refactor exporter 
main logic
URL: https://github.com/apache/incubator-hudi/pull/1436#discussion_r397585126
 
 

 ##
 File path: 
hudi-utilities/src/test/java/org/apache/hudi/utilities/TestHoodieSnapshotExporter.java
 ##
 @@ -159,18 +161,85 @@ public void testExportAsHudi() throws IOException {
   assertTrue(dfs.exists(new Path(partition + 
"/.hoodie_partition_metadata")));
   assertTrue(dfs.exists(new Path(targetPath + "/_SUCCESS")));
 }
+  }
+
+  public static class TestHoodieSnapshotExporterForEarlyAbort extends 
ExporterTestHarness {
+
+private HoodieSnapshotExporter.Config cfg;
+
+@Before
+public void setUp() throws Exception {
+  super.setUp();
+  cfg = new Config();
+  cfg.sourceBasePath = sourcePath;
+  cfg.targetOutputPath = targetPath;
+  cfg.outputFormat = OutputFormatValidator.HUDI;
+}
 
 @Test
-public void testExportEmptyDataset() throws IOException {
+public void testExportWhenTargetPathExists() throws IOException {
+  // make target output path present
+  dfs.mkdirs(new Path(targetPath));
+
+  // export
+  Throwable t = null;
+  try {
+new HoodieSnapshotExporter().export(jsc, cfg);
+  } catch (Exception e) {
+t = e;
+  } finally {
+assertNotNull(t);
+assertTrue(t instanceof HoodieSnapshotExporterException);
+assertEquals("The target output path already exists.", t.getMessage());
+  }
+}
+
+@Test
+public void testExportDatasetWithNoCommit() throws IOException {
+  // delete commit files
+  List commitFiles = Arrays.stream(dfs.listStatus(new 
Path(sourcePath + "/.hoodie")))
+  .map(FileStatus::getPath)
+  .filter(filePath -> filePath.getName().endsWith(".commit"))
+  .collect(Collectors.toList());
+  for (Path p : commitFiles) {
+dfs.delete(p, false);
+  }
+
+  // export
+  Throwable t = null;
+  try {
+new HoodieSnapshotExporter().export(jsc, cfg);
+  } catch (Exception e) {
+t = e;
+  } finally {
+assertNotNull(t);
+assertTrue(t instanceof HoodieSnapshotExporterException);
+assertEquals("No commits present. Nothing to snapshot.", 
t.getMessage());
+  }
+
+  // Check results
+  assertFalse(dfs.exists(new Path(targetPath)));
+}
+
+@Test
+public void testExportDatasetWithNoPartition() throws IOException {
   // delete all source data
   dfs.delete(new Path(sourcePath + "/" + PARTITION_PATH), true);
 
   // export
-  new 
HoodieSnapshotExporter().export(SparkSession.builder().config(jsc.getConf()).getOrCreate(),
 cfg);
+  Throwable t = null;
+  try {
+new HoodieSnapshotExporter().export(jsc, cfg);
+  } catch (Exception e) {
+t = e;
+  } finally {
+assertNotNull(t);
+assertTrue(t instanceof HoodieSnapshotExporterException);
+assertEquals("The source dataset has 0 partition to snapshot.", 
t.getMessage());
 
 Review comment:
   ditto


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] leesf commented on a change in pull request #1436: [HUDI-711] Refactor exporter main logic

2020-03-24 Thread GitBox

leesf commented on a change in pull request #1436: [HUDI-711] Refactor exporter 
main logic
URL: https://github.com/apache/incubator-hudi/pull/1436#discussion_r397585042
 
 

 ##
 File path: 
hudi-utilities/src/test/java/org/apache/hudi/utilities/TestHoodieSnapshotExporter.java
 ##
 @@ -159,18 +161,85 @@ public void testExportAsHudi() throws IOException {
   assertTrue(dfs.exists(new Path(partition + 
"/.hoodie_partition_metadata")));
   assertTrue(dfs.exists(new Path(targetPath + "/_SUCCESS")));
 }
+  }
+
+  public static class TestHoodieSnapshotExporterForEarlyAbort extends 
ExporterTestHarness {
+
+private HoodieSnapshotExporter.Config cfg;
+
+@Before
+public void setUp() throws Exception {
+  super.setUp();
+  cfg = new Config();
+  cfg.sourceBasePath = sourcePath;
+  cfg.targetOutputPath = targetPath;
+  cfg.outputFormat = OutputFormatValidator.HUDI;
+}
 
 @Test
-public void testExportEmptyDataset() throws IOException {
+public void testExportWhenTargetPathExists() throws IOException {
+  // make target output path present
+  dfs.mkdirs(new Path(targetPath));
+
+  // export
+  Throwable t = null;
+  try {
+new HoodieSnapshotExporter().export(jsc, cfg);
+  } catch (Exception e) {
+t = e;
+  } finally {
+assertNotNull(t);
+assertTrue(t instanceof HoodieSnapshotExporterException);
+assertEquals("The target output path already exists.", t.getMessage());
+  }
+}
+
+@Test
+public void testExportDatasetWithNoCommit() throws IOException {
+  // delete commit files
+  List commitFiles = Arrays.stream(dfs.listStatus(new 
Path(sourcePath + "/.hoodie")))
+  .map(FileStatus::getPath)
+  .filter(filePath -> filePath.getName().endsWith(".commit"))
+  .collect(Collectors.toList());
+  for (Path p : commitFiles) {
+dfs.delete(p, false);
+  }
+
+  // export
+  Throwable t = null;
+  try {
+new HoodieSnapshotExporter().export(jsc, cfg);
+  } catch (Exception e) {
+t = e;
+  } finally {
+assertNotNull(t);
+assertTrue(t instanceof HoodieSnapshotExporterException);
+assertEquals("No commits present. Nothing to snapshot.", 
t.getMessage());
 
 Review comment:
   ditto


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] leesf commented on a change in pull request #1436: [HUDI-711] Refactor exporter main logic

2020-03-24 Thread GitBox

leesf commented on a change in pull request #1436: [HUDI-711] Refactor exporter 
main logic
URL: https://github.com/apache/incubator-hudi/pull/1436#discussion_r397584926
 
 

 ##
 File path: 
hudi-utilities/src/test/java/org/apache/hudi/utilities/TestHoodieSnapshotExporter.java
 ##
 @@ -159,18 +161,85 @@ public void testExportAsHudi() throws IOException {
   assertTrue(dfs.exists(new Path(partition + 
"/.hoodie_partition_metadata")));
   assertTrue(dfs.exists(new Path(targetPath + "/_SUCCESS")));
 }
+  }
+
+  public static class TestHoodieSnapshotExporterForEarlyAbort extends 
ExporterTestHarness {
+
+private HoodieSnapshotExporter.Config cfg;
+
+@Before
+public void setUp() throws Exception {
+  super.setUp();
+  cfg = new Config();
+  cfg.sourceBasePath = sourcePath;
+  cfg.targetOutputPath = targetPath;
+  cfg.outputFormat = OutputFormatValidator.HUDI;
+}
 
 @Test
-public void testExportEmptyDataset() throws IOException {
+public void testExportWhenTargetPathExists() throws IOException {
+  // make target output path present
+  dfs.mkdirs(new Path(targetPath));
+
+  // export
+  Throwable t = null;
+  try {
+new HoodieSnapshotExporter().export(jsc, cfg);
+  } catch (Exception e) {
+t = e;
+  } finally {
+assertNotNull(t);
+assertTrue(t instanceof HoodieSnapshotExporterException);
+assertEquals("The target output path already exists.", t.getMessage());
 
 Review comment:
   would move to catch block?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] leesf commented on a change in pull request #1436: [HUDI-711] Refactor exporter main logic

2020-03-24 Thread GitBox

leesf commented on a change in pull request #1436: [HUDI-711] Refactor exporter 
main logic
URL: https://github.com/apache/incubator-hudi/pull/1436#discussion_r397584926
 
 

 ##
 File path: 
hudi-utilities/src/test/java/org/apache/hudi/utilities/TestHoodieSnapshotExporter.java
 ##
 @@ -159,18 +161,85 @@ public void testExportAsHudi() throws IOException {
   assertTrue(dfs.exists(new Path(partition + 
"/.hoodie_partition_metadata")));
   assertTrue(dfs.exists(new Path(targetPath + "/_SUCCESS")));
 }
+  }
+
+  public static class TestHoodieSnapshotExporterForEarlyAbort extends 
ExporterTestHarness {
+
+private HoodieSnapshotExporter.Config cfg;
+
+@Before
+public void setUp() throws Exception {
+  super.setUp();
+  cfg = new Config();
+  cfg.sourceBasePath = sourcePath;
+  cfg.targetOutputPath = targetPath;
+  cfg.outputFormat = OutputFormatValidator.HUDI;
+}
 
 @Test
-public void testExportEmptyDataset() throws IOException {
+public void testExportWhenTargetPathExists() throws IOException {
+  // make target output path present
+  dfs.mkdirs(new Path(targetPath));
+
+  // export
+  Throwable t = null;
+  try {
+new HoodieSnapshotExporter().export(jsc, cfg);
+  } catch (Exception e) {
+t = e;
+  } finally {
+assertNotNull(t);
+assertTrue(t instanceof HoodieSnapshotExporterException);
+assertEquals("The target output path already exists.", t.getMessage());
 
 Review comment:
   would move to catch ?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] leesf commented on a change in pull request #1436: [HUDI-711] Refactor exporter main logic

2020-03-24 Thread GitBox

leesf commented on a change in pull request #1436: [HUDI-711] Refactor exporter 
main logic
URL: https://github.com/apache/incubator-hudi/pull/1436#discussion_r397584857
 
 

 ##
 File path: 
hudi-utilities/src/test/java/org/apache/hudi/utilities/TestHoodieSnapshotExporter.java
 ##
 @@ -159,18 +161,85 @@ public void testExportAsHudi() throws IOException {
   assertTrue(dfs.exists(new Path(partition + 
"/.hoodie_partition_metadata")));
   assertTrue(dfs.exists(new Path(targetPath + "/_SUCCESS")));
 }
+  }
+
+  public static class TestHoodieSnapshotExporterForEarlyAbort extends 
ExporterTestHarness {
+
+private HoodieSnapshotExporter.Config cfg;
+
+@Before
+public void setUp() throws Exception {
+  super.setUp();
+  cfg = new Config();
+  cfg.sourceBasePath = sourcePath;
+  cfg.targetOutputPath = targetPath;
+  cfg.outputFormat = OutputFormatValidator.HUDI;
+}
 
 @Test
-public void testExportEmptyDataset() throws IOException {
+public void testExportWhenTargetPathExists() throws IOException {
+  // make target output path present
+  dfs.mkdirs(new Path(targetPath));
+
+  // export
+  Throwable t = null;
+  try {
+new HoodieSnapshotExporter().export(jsc, cfg);
 
 Review comment:
   use fail assert?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] vinothchandar commented on issue #1406: [HUDI-713] Fix conversion of Spark array of struct type to Avro schema

2020-03-24 Thread GitBox

vinothchandar commented on issue #1406: [HUDI-713] Fix conversion of Spark 
array of struct type to Avro schema
URL: https://github.com/apache/incubator-hudi/pull/1406#issuecomment-603506326
 
 
   yes I am also wondering if we can do something about incompatibility.. that 
a large issue.. we don't want our users to go through this.. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] bvaradar commented on issue #1371: [SUPPORT] Upsert for S3 Hudi dataset with large partitions takes a lot of time in writing

2020-03-24 Thread GitBox

bvaradar commented on issue #1371: [SUPPORT] Upsert for S3 Hudi dataset with 
large partitions takes a lot of time in writing
URL: https://github.com/apache/incubator-hudi/issues/1371#issuecomment-603492434
 
 
   https://github.com/apache/incubator-hudi/pull/1394 merged. Resolving this 
ticket.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] bvaradar closed issue #1371: [SUPPORT] Upsert for S3 Hudi dataset with large partitions takes a lot of time in writing

2020-03-24 Thread GitBox

bvaradar closed issue #1371: [SUPPORT] Upsert for S3 Hudi dataset with large 
partitions takes a lot of time in writing
URL: https://github.com/apache/incubator-hudi/issues/1371
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] bvaradar commented on issue #1284: [SUPPORT]

2020-03-24 Thread GitBox

bvaradar commented on issue #1284: [SUPPORT]
URL: https://github.com/apache/incubator-hudi/issues/1284#issuecomment-603491329
 
 
   closing this ticket due to inactivity


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] bvaradar closed issue #1284: [SUPPORT]

2020-03-24 Thread GitBox

bvaradar closed issue #1284: [SUPPORT]
URL: https://github.com/apache/incubator-hudi/issues/1284
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] nsivabalan commented on issue #1438: How to get the file name corresponding to HoodieKey through the GlobalBloomIndex

2020-03-24 Thread GitBox

nsivabalan commented on issue #1438: How to get the file name corresponding to 
HoodieKey through the GlobalBloomIndex 
URL: https://github.com/apache/incubator-hudi/issues/1438#issuecomment-603483852
 
 
   Sure. will take a look. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] afilipchik commented on issue #1406: [HUDI-713] Fix conversion of Spark array of struct type to Avro schema

2020-03-24 Thread GitBox

afilipchik commented on issue #1406: [HUDI-713] Fix conversion of Spark array 
of struct type to Avro schema
URL: https://github.com/apache/incubator-hudi/pull/1406#issuecomment-603459354
 
 
   is there a way to avoid incompatibility? We have couple of streams with 
complex schemas written with 0.5 and it would block the upgrade.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] xushiyan commented on issue #1440: [HUDI-731] Add ChainedTransformer

2020-03-24 Thread GitBox

xushiyan commented on issue #1440: [HUDI-731] Add ChainedTransformer
URL: https://github.com/apache/incubator-hudi/pull/1440#issuecomment-603417382
 
 
   @vinothchandar @yanghua Sure thanks for the input. Since the parsing 
approach saves some efforts for users, I can make it that way. May I have your 
👍 on this comment to get the green light to work on it?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] zhedoubushishi commented on issue #1406: [HUDI-713] Fix conversion of Spark array of struct type to Avro schema

2020-03-24 Thread GitBox

zhedoubushishi commented on issue #1406: [HUDI-713] Fix conversion of Spark 
array of struct type to Avro schema
URL: https://github.com/apache/incubator-hudi/pull/1406#issuecomment-603400047
 
 
   > > > So anyone who has written data using databricks-avro will face issues 
reading.
   > 
   > By this you mean, reading for merging data (i.e during ingestion/writing) 
or querying via Spark/Hive/Presto?
   
   As @umehrot2 said, we have done some testing shows that inserting new data 
with ```spark-avro``` on top of ```databricks-avro``` data would cause some 
schema issues.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] zhedoubushishi commented on a change in pull request #1406: [HUDI-713] Fix conversion of Spark array of struct type to Avro schema

2020-03-24 Thread GitBox

zhedoubushishi commented on a change in pull request #1406: [HUDI-713] Fix 
conversion of Spark array of struct type to Avro schema
URL: https://github.com/apache/incubator-hudi/pull/1406#discussion_r397341690
 
 

 ##
 File path: hudi-spark/src/main/scala/org/apache/hudi/AvroConversionHelper.scala
 ##
 @@ -338,12 +337,13 @@ object AvroConversionHelper {
 }
   case structType: StructType =>
 val schema: Schema = SchemaConverters.toAvroType(structType, nullable 
= false, structName, recordNamespace)
+val childNameSpace = if (recordNamespace != "") 
s"$recordNamespace.$structName" else structName
 
 Review comment:
   This line is directly copied from: 
https://github.com/apache/spark/blob/master/external/avro/src/main/scala/org/apache/spark/sql/avro/SchemaConverters.scala#L175
   Do you still want me to change it?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] bvaradar commented on issue #1439: Hudi class loading problem

2020-03-24 Thread GitBox

bvaradar commented on issue #1439: Hudi class loading problem
URL: https://github.com/apache/incubator-hudi/issues/1439#issuecomment-603394312
 
 
   @melkimohamed :  From your description,
   
   hive.reloadable.aux.jars.path=/usr/hudi/hudi-hive-bundle-0.5.0-incubating.jar
   
   
   It looks like you are using 0.5.0. The bundle you need to be using is 
hudi-hadoop-mr-bundle and not hudi-hive-bundle.  


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] bvaradar commented on issue #1420: Broken Maven dependencies.

2020-03-24 Thread GitBox

bvaradar commented on issue #1420: Broken Maven dependencies.
URL: https://github.com/apache/incubator-hudi/issues/1420#issuecomment-603380871
 
 
   @deabreu : Were you able to build locally ? 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] wangxianghu commented on a change in pull request #1409: [HUDI-714]Add javadoc and comments to hudi write method link

2020-03-24 Thread GitBox

wangxianghu commented on a change in pull request #1409: [HUDI-714]Add javadoc 
and comments to hudi write method link
URL: https://github.com/apache/incubator-hudi/pull/1409#discussion_r397124853
 
 

 ##
 File path: hudi-spark/src/main/java/org/apache/hudi/DataSourceUtils.java
 ##
 @@ -218,6 +218,13 @@ public static HoodieRecord 
createHoodieRecord(GenericRecord gr, Comparable order
 return new HoodieRecord<>(hKey, payload);
   }
 
+  /**
+   * Drop duplicate records, whose location (file group/file id) mapped by 
index exists.
 
 Review comment:
   Hi  @nsivabalan. I have addressed all of your comments, please take another 
look


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] umehrot2 commented on a change in pull request #1427: [HUDI-727]: Copy default values of fields if not present when rewriting incoming record with new schema

2020-03-24 Thread GitBox

umehrot2 commented on a change in pull request #1427: [HUDI-727]: Copy default 
values of fields if not present when rewriting incoming record with new schema
URL: https://github.com/apache/incubator-hudi/pull/1427#discussion_r397037244
 
 

 ##
 File path: 
hudi-common/src/test/java/org/apache/hudi/common/util/TestHoodieAvroUtils.java
 ##
 @@ -57,4 +60,16 @@ public void testPropsPresent() {
 }
 Assert.assertTrue("column pii_col doesn't show up", piiPresent);
   }
+
+  @Test
+  public void testDefaultValue() {
+GenericRecord rec = new GenericData.Record(new 
Schema.Parser().parse(EXAMPLE_SCHEMA));
+rec.put("_row_key", "key1");
+rec.put("non_pii_col", "val1");
+rec.put("pii_col", "val2");
+rec.put("timestamp", 3.5);
 
 Review comment:
   Can you help me understand how you are running into this issue with default 
values ? 
   
   Based on my understanding, conversion to avro is internal to Hudi and a 
custom avro schema (with default values) is not something that user can 
themselves pass. And how `spark-avro` converts `struct schema to avro` there is 
no special handling there from `default value` perspective. So I guess I am not 
sure whether this is an issue in the first place.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] umehrot2 commented on a change in pull request #1416: [HUDI-717] Fixed usage of HiveDriver for DDL statements for Hive 2.x

2020-03-24 Thread GitBox

umehrot2 commented on a change in pull request #1416: [HUDI-717] Fixed usage of 
HiveDriver for DDL statements for Hive 2.x
URL: https://github.com/apache/incubator-hudi/pull/1416#discussion_r397012179
 
 

 ##
 File path: 
hudi-hive-sync/src/test/java/org/apache/hudi/hive/TestHiveSyncTool.java
 ##
 @@ -363,4 +404,51 @@ public void testMultiPartitionKeySync() throws Exception {
 assertEquals("The last commit that was sycned should be updated in the 
TBLPROPERTIES", commitTime,
 hiveClient.getLastCommitTimeSynced(hiveSyncConfig.tableName).get());
   }
+
+  @Test
+  public void testSchemeFromMOR() throws Exception {
+TestUtil.hiveSyncConfig.useJdbc = this.useJdbc;
+String commitTime = "100";
+String snapshotTableName = TestUtil.hiveSyncConfig.tableName + 
HiveSyncTool.SUFFIX_SNAPSHOT_TABLE;
+TestUtil.createMORTable(commitTime, "", 5, false);
+HoodieHiveClient hiveClientRT =
+new HoodieHiveClient(TestUtil.hiveSyncConfig, TestUtil.getHiveConf(), 
TestUtil.fileSystem);
+
+assertFalse("Table " + TestUtil.hiveSyncConfig.tableName + 
HiveSyncTool.SUFFIX_SNAPSHOT_TABLE
++ " should not exist initially", 
hiveClientRT.doesTableExist(snapshotTableName));
+
+// Lets do the sync
+HiveSyncTool tool = new HiveSyncTool(TestUtil.hiveSyncConfig, 
TestUtil.getHiveConf(), TestUtil.fileSystem);
+tool.syncHoodieTable();
+
+assertTrue("Table " + TestUtil.hiveSyncConfig.tableName + 
HiveSyncTool.SUFFIX_SNAPSHOT_TABLE
++ " should exist after sync completes", 
hiveClientRT.doesTableExist(snapshotTableName));
+
+// Schema being read from compacted base files
+assertEquals("Hive Schema should match the table schema + partition 
field", hiveClientRT.getTableSchema(snapshotTableName).size(),
+SchemaTestUtil.getSimpleSchema().getFields().size() + 1);
 
 Review comment:
   Is this the only reason to add this test, so that schema is read from base 
file ? Otherwise this seems exactly like `testSyncMergeOnRead` ? I guess I am 
also trying to understand how this test relates to the change you are making in 
this review.
   
   On that note name of the test `testSchemeFromMOR` can probably be changed to 
something more meaningful to reflect what this test is trying to achieve 
differently.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] umehrot2 commented on a change in pull request #1416: [HUDI-717] Fixed usage of HiveDriver for DDL statements for Hive 2.x

2020-03-24 Thread GitBox

umehrot2 commented on a change in pull request #1416: [HUDI-717] Fixed usage of 
HiveDriver for DDL statements for Hive 2.x
URL: https://github.com/apache/incubator-hudi/pull/1416#discussion_r397009308
 
 

 ##
 File path: 
hudi-hive-sync/src/test/java/org/apache/hudi/hive/TestHiveSyncTool.java
 ##
 @@ -363,4 +404,51 @@ public void testMultiPartitionKeySync() throws Exception {
 assertEquals("The last commit that was sycned should be updated in the 
TBLPROPERTIES", commitTime,
 hiveClient.getLastCommitTimeSynced(hiveSyncConfig.tableName).get());
   }
+
+  @Test
+  public void testSchemeFromMOR() throws Exception {
+TestUtil.hiveSyncConfig.useJdbc = this.useJdbc;
+String commitTime = "100";
+String snapshotTableName = TestUtil.hiveSyncConfig.tableName + 
HiveSyncTool.SUFFIX_SNAPSHOT_TABLE;
+TestUtil.createMORTable(commitTime, "", 5, false);
+HoodieHiveClient hiveClientRT =
+new HoodieHiveClient(TestUtil.hiveSyncConfig, TestUtil.getHiveConf(), 
TestUtil.fileSystem);
+
+assertFalse("Table " + TestUtil.hiveSyncConfig.tableName + 
HiveSyncTool.SUFFIX_SNAPSHOT_TABLE
++ " should not exist initially", 
hiveClientRT.doesTableExist(snapshotTableName));
+
+// Lets do the sync
+HiveSyncTool tool = new HiveSyncTool(TestUtil.hiveSyncConfig, 
TestUtil.getHiveConf(), TestUtil.fileSystem);
+tool.syncHoodieTable();
+
+assertTrue("Table " + TestUtil.hiveSyncConfig.tableName + 
HiveSyncTool.SUFFIX_SNAPSHOT_TABLE
++ " should exist after sync completes", 
hiveClientRT.doesTableExist(snapshotTableName));
+
+// Schema being read from compacted base files
+assertEquals("Hive Schema should match the table schema + partition 
field", hiveClientRT.getTableSchema(snapshotTableName).size(),
+SchemaTestUtil.getSimpleSchema().getFields().size() + 1);
+assertEquals("Table partitions should match the number of partitions we 
wrote", 5,
+hiveClientRT.scanTablePartitions(snapshotTableName).size());
+
+// Now lets create more partitions and these are the only ones which needs 
to be synced
+DateTime dateTime = DateTime.now().plusDays(6);
+String commitTime2 = "102";
+String deltaCommitTime2 = "103";
+
+TestUtil.addCOWPartitions(1, true, dateTime, commitTime2);
+TestUtil.addMORPartitions(1, true, false, dateTime, commitTime2, 
deltaCommitTime2);
+// Lets do the sync
+tool = new HiveSyncTool(TestUtil.hiveSyncConfig, TestUtil.getHiveConf(), 
TestUtil.fileSystem);
+tool.syncHoodieTable();
+hiveClientRT = new HoodieHiveClient(TestUtil.hiveSyncConfig, 
TestUtil.getHiveConf(), TestUtil.fileSystem);
+
+// Schema being read from the log files
+assertEquals("Hive Schema should match the evolved table schema + 
partition field",
+hiveClientRT.getTableSchema(snapshotTableName).size(), 
SchemaTestUtil.getEvolvedSchema().getFields().size() + 1);
+// Sync should add the one partition
+assertEquals("The 2 partitions we wrote should be added to hive", 6, 
hiveClientRT.scanTablePartitions(snapshotTableName).size());
 
 Review comment:
   Is this message correct ? We only added `1 partition` right and thats why 
count went from `5 => 6` ?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] pratyakshsharma commented on a change in pull request #1427: [HUDI-727]: Copy default values of fields if not present when rewriting incoming record with new schema

2020-03-24 Thread GitBox

pratyakshsharma commented on a change in pull request #1427: [HUDI-727]: Copy 
default values of fields if not present when rewriting incoming record with new 
schema
URL: https://github.com/apache/incubator-hudi/pull/1427#discussion_r396992378
 
 

 ##
 File path: 
hudi-common/src/test/java/org/apache/hudi/common/util/TestHoodieAvroUtils.java
 ##
 @@ -57,4 +60,16 @@ public void testPropsPresent() {
 }
 Assert.assertTrue("column pii_col doesn't show up", piiPresent);
   }
+
+  @Test
+  public void testDefaultValue() {
+GenericRecord rec = new GenericData.Record(new 
Schema.Parser().parse(EXAMPLE_SCHEMA));
+rec.put("_row_key", "key1");
+rec.put("non_pii_col", "val1");
+rec.put("pii_col", "val2");
+rec.put("timestamp", 3.5);
 
 Review comment:
   No its not that the original record has default values as null. Its just 
while getting the values from the record, default values are not considered. 
Please have a look at this function from avro-1.8.2 library - 
   
   @Override public Object get(String key) {
 Field field = schema.getField(key);
 if (field == null) return null;
 return values[field.pos()];
   }
   
   Ideally the above function should return field.defaultVal() in case 
values[field.pos()] is null, but that is not the case. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] pratyakshsharma commented on a change in pull request #1427: [HUDI-727]: Copy default values of fields if not present when rewriting incoming record with new schema

2020-03-24 Thread GitBox

pratyakshsharma commented on a change in pull request #1427: [HUDI-727]: Copy 
default values of fields if not present when rewriting incoming record with new 
schema
URL: https://github.com/apache/incubator-hudi/pull/1427#discussion_r396992378
 
 

 ##
 File path: 
hudi-common/src/test/java/org/apache/hudi/common/util/TestHoodieAvroUtils.java
 ##
 @@ -57,4 +60,16 @@ public void testPropsPresent() {
 }
 Assert.assertTrue("column pii_col doesn't show up", piiPresent);
   }
+
+  @Test
+  public void testDefaultValue() {
+GenericRecord rec = new GenericData.Record(new 
Schema.Parser().parse(EXAMPLE_SCHEMA));
+rec.put("_row_key", "key1");
+rec.put("non_pii_col", "val1");
+rec.put("pii_col", "val2");
+rec.put("timestamp", 3.5);
 
 Review comment:
   No its not that the original record has default values as null. Its just 
while getting the values from the record, default values are not considered. 
Please have a look at this function from avro-1.8.2 library - 
   
   @Override public Object get(String key) {
 Field field = schema.getField(key);
 if (field == null) return null;
 return values[field.pos()];
   }
   
   Ideally the above function should get field.defaultVal() in case 
values[field.pos()] is null, but that is not the case. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] pratyakshsharma commented on a change in pull request #1427: [HUDI-727]: Copy default values of fields if not present when rewriting incoming record with new schema

2020-03-24 Thread GitBox

pratyakshsharma commented on a change in pull request #1427: [HUDI-727]: Copy 
default values of fields if not present when rewriting incoming record with new 
schema
URL: https://github.com/apache/incubator-hudi/pull/1427#discussion_r396992378
 
 

 ##
 File path: 
hudi-common/src/test/java/org/apache/hudi/common/util/TestHoodieAvroUtils.java
 ##
 @@ -57,4 +60,16 @@ public void testPropsPresent() {
 }
 Assert.assertTrue("column pii_col doesn't show up", piiPresent);
   }
+
+  @Test
+  public void testDefaultValue() {
+GenericRecord rec = new GenericData.Record(new 
Schema.Parser().parse(EXAMPLE_SCHEMA));
+rec.put("_row_key", "key1");
+rec.put("non_pii_col", "val1");
+rec.put("pii_col", "val2");
+rec.put("timestamp", 3.5);
 
 Review comment:
   No its not that the original record has default values as null. Its just 
while getting the values from the record, default values are not considered. 
Please have a look at this function - 
   
   @Override public Object get(String key) {
 Field field = schema.getField(key);
 if (field == null) return null;
 return values[field.pos()];
   }
   
   Ideally the above function should get field.defaultVal() in case 
values[field.pos()] is null, but that is not the case. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] pratyakshsharma commented on a change in pull request #1427: [HUDI-727]: Copy default values of fields if not present when rewriting incoming record with new schema

2020-03-24 Thread GitBox

pratyakshsharma commented on a change in pull request #1427: [HUDI-727]: Copy 
default values of fields if not present when rewriting incoming record with new 
schema
URL: https://github.com/apache/incubator-hudi/pull/1427#discussion_r396965661
 
 

 ##
 File path: 
hudi-common/src/main/java/org/apache/hudi/common/util/HoodieAvroUtils.java
 ##
 @@ -104,15 +103,15 @@ public static Schema addMetadataFields(Schema schema) {
 List parentFields = new ArrayList<>();
 
 Schema.Field commitTimeField =
-new Schema.Field(HoodieRecord.COMMIT_TIME_METADATA_FIELD, 
METADATA_FIELD_SCHEMA, "", NullNode.getInstance());
+new Schema.Field(HoodieRecord.COMMIT_TIME_METADATA_FIELD, 
METADATA_FIELD_SCHEMA, "", null);
 
 Review comment:
   yeah. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

47 matches

Mail list logo