[jira] [Updated] (HUDI-398) Add set env for spark launcher

2019-12-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-398:

Labels: pull-request-available  (was: )

> Add set env for spark launcher
> --
>
> Key: HUDI-398
> URL: https://issues.apache.org/jira/browse/HUDI-398
> Project: Apache Hudi (incubating)
>  Issue Type: New Feature
>  Components: CLI
>Reporter: hong dongdong
>Priority: Major
>  Labels: pull-request-available
> Attachments: image-2019-12-11-14-44-55-064.png, 
> image-2019-12-11-14-45-27-764.png
>
>
> hudi-cli always throws the exception 'SPARK_HOME not found' when SPARK_HOME 
> is not set, so we have to quit the CLI, set it, and start over.
> !image-2019-12-11-14-45-27-764.png!
> After adding this function to the CLI, we can set SPARK_HOME and other 
> configuration directly from hudi-cli.
> !image-2019-12-11-14-44-55-064.png!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-hudi] hddong opened a new pull request #1096: [HUDI-398]Add set env for spark launcher

2019-12-10 Thread GitBox
hddong opened a new pull request #1096: [HUDI-398]Add set env for spark launcher
URL: https://github.com/apache/incubator-hudi/pull/1096
 
 
   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contributing.html before opening a 
pull request.*
   
   ## What is the purpose of the pull request
   
   *hudi-cli always throws the exception `SPARK_HOME not found` when SPARK_HOME 
is not set, so we have to quit the CLI, set it, and start over.*
   ```
   hudi:hudi_cow_table->commit rollback --commit 20191210155859
   Command failed java.lang.IllegalStateException: Spark home not found; set it 
explicitly or use the SPARK_HOME environment variable.
   Listening for transport dt_socket at address: 5005
   Spark home not found; set it explicitly or use the SPARK_HOME environment 
variable.
   java.lang.IllegalStateException: Spark home not found; set it explicitly or 
use the SPARK_HOME environment variable.
   at 
org.apache.spark.launcher.CommandBuilderUtils.checkState(CommandBuilderUtils.java:248)
   at 
org.apache.spark.launcher.AbstractCommandBuilder.getSparkHome(AbstractCommandBuilder.java:248)
   ```
   After adding this function to the CLI, we can set SPARK_HOME and other 
configuration directly from hudi-cli:
   `hudi->set --conf SPARK_HOME=/usr/bch/3.0.1/spark`
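   Conceptually, the new `set` command only needs to collect `KEY=VALUE` pairs and hand them to the Spark launcher when a job is started. A minimal stdlib-only sketch of that parsing step (class and method names here are illustrative, not the actual Hudi CLI code):

   ```java
   import java.util.HashMap;
   import java.util.Map;

   // Illustrative sketch only: collects env overrides typed via `set --conf`,
   // to be handed to Spark's launcher later. Not the actual Hudi CLI code.
   public class SetConfDemo {
       private static final Map<String, String> env = new HashMap<>();

       // Parses a single "KEY=VALUE" argument, as typed after `set --conf`.
       static void setConf(String arg) {
           int idx = arg.indexOf('=');
           if (idx <= 0) {
               throw new IllegalArgumentException("Expected KEY=VALUE, got: " + arg);
           }
           env.put(arg.substring(0, idx), arg.substring(idx + 1));
       }

       public static void main(String[] args) {
           setConf("SPARK_HOME=/usr/bch/3.0.1/spark");
           // The collected map could later be passed to Spark via the
           // org.apache.spark.launcher.SparkLauncher(Map<String, String> env)
           // constructor, so SPARK_HOME need not be exported before starting hudi-cli.
           System.out.println(env.get("SPARK_HOME"));
       }
   }
   ```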
   
   ## Brief change log
   
   *(for example:)*
 - *add a new cli command to set conf for spark launcher*
   
   ## Verify this pull request
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Created] (HUDI-398) Add set env for spark launcher

2019-12-10 Thread hong dongdong (Jira)
hong dongdong created HUDI-398:
--

 Summary: Add set env for spark launcher
 Key: HUDI-398
 URL: https://issues.apache.org/jira/browse/HUDI-398
 Project: Apache Hudi (incubating)
  Issue Type: New Feature
  Components: CLI
Reporter: hong dongdong
 Attachments: image-2019-12-11-14-44-55-064.png, 
image-2019-12-11-14-45-27-764.png

hudi-cli always throws the exception 'SPARK_HOME not found' when SPARK_HOME is 
not set, so we have to quit the CLI, set it, and start over.

!image-2019-12-11-14-45-27-764.png!

After adding this function to the CLI, we can set SPARK_HOME and other 
configuration directly from hudi-cli.

!image-2019-12-11-14-44-55-064.png!





[GitHub] [incubator-hudi] lamber-ken commented on a change in pull request #1095: [HUDI-210] Implement prometheus metrics reporter

2019-12-10 Thread GitBox
lamber-ken commented on a change in pull request #1095: [HUDI-210] Implement 
prometheus metrics reporter
URL: https://github.com/apache/incubator-hudi/pull/1095#discussion_r356425369
 
 

 ##
 File path: hudi-client/pom.xml
 ##
 @@ -117,6 +117,18 @@
   io.dropwizard.metrics
   metrics-core
 
+
 
 Review comment:
   Hi, this dependency can be removed:
   ```
   <dependency>
     <groupId>io.prometheus</groupId>
     <artifactId>simpleclient</artifactId>
   </dependency>
   ```








[incubator-hudi] branch hudi_test_suite_refactor updated (c82d6d9 -> c2c9347)

2019-12-10 Thread vinoyang
This is an automated email from the ASF dual-hosted git repository.

vinoyang pushed a change to branch hudi_test_suite_refactor
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git.


 discard c82d6d9  Hudi Test Suite - Flexible schema payload generation 
- Different types of workload generation such as inserts, upserts etc - 
Post process actions to perform validations - Interoperability of test 
suite to use HoodieWriteClient and HoodieDeltaStreamer so both code paths can 
be tested - Custom workload sequence generator - Ability to perform 
parallel operations, such as upsert and compaction
 new c2c9347  [HUDI-394] Provide a basic implementation of test suite

This update added new revisions after undoing existing revisions.
That is to say, some revisions that were in the old version of the
branch are not in the new version.  This situation occurs
when a user --force pushes a change and generates a repository
containing something like this:

 * -- * -- B -- O -- O -- O   (c82d6d9)
\
 N -- N -- N   refs/heads/hudi_test_suite_refactor (c2c9347)

You should already have received notification emails for all of the O
revisions, and so the following emails describe only the N revisions
from the common base, B.

Any revisions marked "omit" are not gone; other references still
refer to them.  Any revisions marked "discard" are gone forever.

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:



[jira] [Commented] (HUDI-334) Clean up terminologies in code/docs

2019-12-10 Thread Shahida Khan (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16993239#comment-16993239
 ] 

Shahida Khan commented on HUDI-334:
---

[~vinoth]: did we come up with a final suggestion?
I would like to work on this.

> Clean up terminologies in code/docs
> ---
>
> Key: HUDI-334
> URL: https://issues.apache.org/jira/browse/HUDI-334
> Project: Apache Hudi (incubating)
>  Issue Type: New Feature
>  Components: Docs
>Reporter: Vinoth Chandar
>Assignee: Vinoth Chandar
>Priority: Major
> Fix For: 0.5.1
>
>
> https://lists.apache.org/thread.html/9fa4d2eec9347ba640649151da659e159f27dc3444b003752bee0176@%3Cdev.hudi.apache.org%3E





[jira] [Comment Edited] (HUDI-322) DeltaSteamer should pick checkpoints off only deltacommits for MOR tables

2019-12-10 Thread Shahida Khan (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16993237#comment-16993237
 ] 

Shahida Khan edited comment on HUDI-322 at 12/11/19 6:28 AM:
-

[~vinoth] [~xleesf] I would like to pick this up, if nobody is working on 
it.


was (Author: srkhan):
I would like to pick this up, if nobody is working on this..

> DeltaSteamer should pick checkpoints off only deltacommits for MOR tables
> -
>
> Key: HUDI-322
> URL: https://issues.apache.org/jira/browse/HUDI-322
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: deltastreamer
>Reporter: Vinoth Chandar
>Priority: Major
> Fix For: 0.5.1
>
>






[jira] [Commented] (HUDI-322) DeltaSteamer should pick checkpoints off only deltacommits for MOR tables

2019-12-10 Thread Shahida Khan (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16993237#comment-16993237
 ] 

Shahida Khan commented on HUDI-322:
---

I would like to pick this up, if nobody is working on it.

> DeltaSteamer should pick checkpoints off only deltacommits for MOR tables
> -
>
> Key: HUDI-322
> URL: https://issues.apache.org/jira/browse/HUDI-322
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: deltastreamer
>Reporter: Vinoth Chandar
>Priority: Major
> Fix For: 0.5.1
>
>






[GitHub] [incubator-hudi] lamber-ken edited a comment on issue #1095: [HUDI-210] Implement prometheus metrics reporter

2019-12-10 Thread GitBox
lamber-ken edited a comment on issue #1095: [HUDI-210] Implement prometheus 
metrics reporter
URL: https://github.com/apache/incubator-hudi/pull/1095#issuecomment-564394638
 
 
   Hi @XuQianJin-Stars, thanks for opening the PR. This test case does not 
really verify the reporter, because PushGateway uses 
`io.prometheus.client.Gauge` instead of `com.codahale.metrics.Gauge`.
   We need to register an `io.prometheus.client.Gauge` first; after that we can 
push these metrics.
   
   For example, the `MetricRegistry registry = new MetricRegistry();` part 
below is redundant.
   ```
   PushGateway pushGateway = new PushGateway("localhost:9091");
   
   
   MetricRegistry registry = new MetricRegistry();
   registry.register("push_gateway1", (Gauge) () -> 123L);
   
   
   io.prometheus.client.Gauge submitActiveTasksGauge = 
io.prometheus.client.Gauge.build()
   .name("active_tasks")
   .help("active_tasks").register();
   submitActiveTasksGauge.set(5L);
   
   pushGateway.push(CollectorRegistry.defaultRegistry, "PushGateway");
   ```




[jira] [Updated] (HUDI-394) Provide a basic implementation of test suite

2019-12-10 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang updated HUDI-394:
--
Description: 
It provides:
 * Flexible schema payload generation
 * Different types of workload generation such as inserts, upserts etc
 * Post process actions to perform validations
 * Interoperability of test suite to use HoodieWriteClient and 
HoodieDeltaStreamer so both code paths can be tested
 * Custom workload sequence generator
 * Ability to perform parallel operations, such as upsert and compaction

> Provide a basic implementation of test suite
> 
>
> Key: HUDI-394
> URL: https://issues.apache.org/jira/browse/HUDI-394
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>  Components: Testing
>Reporter: vinoyang
>Assignee: Nishith Agarwal
>Priority: Major
>
> It provides:
>  * Flexible schema payload generation
>  * Different types of workload generation such as inserts, upserts etc
>  * Post process actions to perform validations
>  * Interoperability of test suite to use HoodieWriteClient and 
> HoodieDeltaStreamer so both code paths can be tested
>  * Custom workload sequence generator
>  * Ability to perform parallel operations, such as upsert and compaction





[jira] [Assigned] (HUDI-210) Implement prometheus metrics reporter

2019-12-10 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang reassigned HUDI-210:
-

Assignee: Forward Xu  (was: vinoyang)

> Implement prometheus metrics reporter
> -
>
> Key: HUDI-210
> URL: https://issues.apache.org/jira/browse/HUDI-210
> Project: Apache Hudi (incubating)
>  Issue Type: New Feature
>Reporter: vinoyang
>Assignee: Forward Xu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Since Prometheus is a very popular monitoring system and time series 
> database, it would be better to provide a metrics reporter to report metrics 
> to prometheus.











[GitHub] [incubator-hudi] lamber-ken commented on a change in pull request #1095: [HUDI-210] Implement prometheus metrics reporter

2019-12-10 Thread GitBox
lamber-ken commented on a change in pull request #1095: [HUDI-210] Implement 
prometheus metrics reporter
URL: https://github.com/apache/incubator-hudi/pull/1095#discussion_r356413234
 
 

 ##
 File path: 
hudi-client/src/main/java/org/apache/hudi/config/HoodieMetricsConfig.java
 ##
 @@ -57,6 +57,14 @@
 
   public static final String GRAPHITE_METRIC_PREFIX = GRAPHITE_PREFIX + 
".metric.prefix";
 
+  // Prometheus
+  public static final String PROMETHEUS_PUSHGATEWAY_PREFIX = METRIC_PREFIX + 
".graphite";
+  public static final String PROMETHEUS_PUSHGATEWAY_HOST = 
PROMETHEUS_PUSHGATEWAY_PREFIX + ".host";
+  public static final String DEFAULT_PROMETHEUS_PUSHGATEWAY_HOST = "localhost";
+
+  public static final String  PROMETHEUS_PUSHGATEWAY_PORT = 
PROMETHEUS_PUSHGATEWAY_PREFIX + ".port";
+  public static final int DEFAULT_PROMETHEUS_PUSHGATEWAY_PORT = 8080;
 
 Review comment:
   Hi, the default PushGateway port is `9091`.




[jira] [Assigned] (HUDI-251) JDBC incremental load to HUDI with DeltaStreamer

2019-12-10 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar reassigned HUDI-251:
---

Assignee: (was: Taher Koitawala)

> JDBC incremental load to HUDI with DeltaStreamer
> 
>
> Key: HUDI-251
> URL: https://issues.apache.org/jira/browse/HUDI-251
> Project: Apache Hudi (incubating)
>  Issue Type: New Feature
>  Components: deltastreamer
>Reporter: Taher Koitawala
>Priority: Trivial
>  Labels: pull-request-available
> Fix For: 0.5.1
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Mirroring RDBMS to HUDI is one of the most basic use cases of HUDI. Hence, 
> for such use cases, DeltaStreamer should provide inbuilt support.
> DeltaStreamer should accept something like jdbc-source.properties, where 
> users can define the RDBMS connection properties along with a timestamp 
> column and an interval that expresses how frequently Hudi should check the 
> RDBMS data source for new inserts or updates.
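To make this concrete, such a jdbc-source.properties might look like the sketch below; every key name is hypothetical, since the issue does not define the actual property names:

```properties
# Hypothetical jdbc-source.properties; key names are illustrative only.
hoodie.datasource.jdbc.url=jdbc:mysql://localhost:3306/sales
hoodie.datasource.jdbc.user=hudi
hoodie.datasource.jdbc.password.file=/etc/hudi/jdbc.pass
hoodie.datasource.jdbc.table=orders
# Timestamp column used to detect inserts/updates since the last checkpoint
hoodie.datasource.jdbc.incremental.column=last_updated_ts
# How frequently to poll the RDBMS for new changes
hoodie.datasource.jdbc.poll.interval=5m
```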





[jira] [Updated] (HUDI-309) General Redesign of Archived Timeline for efficient scan and management

2019-12-10 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-309:

Fix Version/s: (was: 0.5.1)
   0.5.2

> General Redesign of Archived Timeline for efficient scan and management
> ---
>
> Key: HUDI-309
> URL: https://issues.apache.org/jira/browse/HUDI-309
> Project: Apache Hudi (incubating)
>  Issue Type: New Feature
>  Components: Common Core
>Reporter: Balaji Varadarajan
>Assignee: Balaji Varadarajan
>Priority: Major
> Fix For: 0.5.2
>
> Attachments: Archive TImeline Notes by Vinoth 1.jpg, Archived 
> Timeline Notes by Vinoth 2.jpg
>
>
> As designed by Vinoth:
> Goals
>  # Archived Metadata should be scannable in the same way as data
>  # Provides more safety by always serving committed data independent of 
> timeframe when the corresponding commit action was tried. Currently, we 
> implicitly assume a data file to be valid if its commit time is older than 
> the earliest time in the active timeline. While this works ok, any inherent 
> bugs in rollback could inadvertently expose a possibly duplicate file when 
> its commit timestamp becomes older than that of any commits in the timeline.
>  # We had to deal with lot of corner cases because of the way we treat a 
> "commit" as special after it gets archived. Examples also include Savepoint 
> handling logic by cleaner.
>  # Small Files: For cloud stores, archiving simply moves files from one 
> directory to another, causing the archive folder to grow. We need a way to 
> efficiently compact these files while remaining friendly to scans.
> Design:
>  The basic file-group abstraction for managing file versions for data files 
> can be extended to managing archived commit metadata. The idea is to use an 
> optimal format (like HFile) for storing compacted versions of <CommitTime, 
> Metadata> pairs. Every archiving run will read <CommitTime, Metadata> pairs 
> from the active timeline and append them to indexable log files. We will run 
> periodic minor compactions to merge multiple log files into a compacted 
> HFile storing metadata for a time range. It should also be noted that we 
> will partition by the action types (commit/clean). This design would allow 
> the archived timeline to be queryable for determining whether a timeline 
> entry is valid or not.
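As a rough, stdlib-only illustration of the minor-compaction idea (a sorted map stands in for the HFile, and the class and method names are invented; this is not Hudi code):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy sketch of the proposed archived-timeline compaction: several
// append-only "log files" of (commitTime -> metadata) entries are merged
// into one time-ordered structure (standing in for an HFile).
public class ArchiveCompactionSketch {
    static TreeMap<String, String> minorCompact(List<Map<String, String>> logs) {
        TreeMap<String, String> compacted = new TreeMap<>(); // sorted by commit time
        for (Map<String, String> log : logs) {
            compacted.putAll(log); // later logs win for duplicate commit times
        }
        return compacted;
    }

    public static void main(String[] args) {
        List<Map<String, String>> logs = new ArrayList<>();
        logs.add(Map.of("20191210", "commit-meta-a", "20191211", "commit-meta-b"));
        logs.add(Map.of("20191212", "clean-meta-c"));
        TreeMap<String, String> compacted = minorCompact(logs);
        // Sorted keys make the archived metadata range-scannable, like data files.
        System.out.println(compacted.firstKey() + ".." + compacted.lastKey());
    }
}
```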





[jira] [Updated] (HUDI-389) Updates sent to diff partition for a given key with Global Index

2019-12-10 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-389:

Status: In Progress  (was: Open)

> Updates sent to diff partition for a given key with Global Index 
> -
>
> Key: HUDI-389
> URL: https://issues.apache.org/jira/browse/HUDI-389
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: Index
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.1
>
>   Original Estimate: 48h
>  Time Spent: 10m
>  Remaining Estimate: 47h 50m
>
> Updates sent to a different partition for a given key with the Global Index 
> should succeed by updating the record under the original partition. As of 
> now, it throws an exception. 
> [https://github.com/apache/incubator-hudi/issues/1021] 
>  
>  
> error log:
> {code:java}
>  14738 [Executor task launch worker-0] INFO 
> com.uber.hoodie.common.table.timeline.HoodieActiveTimeline - Loaded instants 
> java.util.stream.ReferencePipeline$Head@d02b1c7
>  14738 [Executor task launch worker-0] INFO 
> com.uber.hoodie.common.table.view.AbstractTableFileSystemView - Building file 
> system view for partition (2016/04/15)
>  14738 [Executor task launch worker-0] INFO 
> com.uber.hoodie.common.table.view.AbstractTableFileSystemView - #files found 
> in partition (2016/04/15) =0, Time taken =0
>  14738 [Executor task launch worker-0] INFO 
> com.uber.hoodie.common.table.view.AbstractTableFileSystemView - 
> addFilesToView: NumFiles=0, FileGroupsCreationTime=0, StoreTimeTaken=0
>  14738 [Executor task launch worker-0] INFO 
> com.uber.hoodie.common.table.view.HoodieTableFileSystemView - Adding 
> file-groups for partition :2016/04/15, #FileGroups=0
>  14738 [Executor task launch worker-0] INFO 
> com.uber.hoodie.common.table.view.AbstractTableFileSystemView - Time to load 
> partition (2016/04/15) =0
>  14754 [Executor task launch worker-0] ERROR 
> com.uber.hoodie.table.HoodieCopyOnWriteTable - Error upserting bucketType 
> UPDATE for partition :0
>  java.util.NoSuchElementException: No value present
>  at com.uber.hoodie.common.util.Option.get(Option.java:112)
>  at com.uber.hoodie.io.HoodieMergeHandle.<init>(HoodieMergeHandle.java:71)
>  at 
> com.uber.hoodie.table.HoodieCopyOnWriteTable.getUpdateHandle(HoodieCopyOnWriteTable.java:226)
>  at 
> com.uber.hoodie.table.HoodieCopyOnWriteTable.handleUpdate(HoodieCopyOnWriteTable.java:180)
>  at 
> com.uber.hoodie.table.HoodieCopyOnWriteTable.handleUpsertPartition(HoodieCopyOnWriteTable.java:263)
>  at 
> com.uber.hoodie.HoodieWriteClient.lambda$upsertRecordsInternal$7ef77fd$1(HoodieWriteClient.java:442)
>  at 
> org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:102)
>  at 
> org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:102)
>  at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$26.apply(RDD.scala:843)
>  at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$26.apply(RDD.scala:843)
>  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
>  at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
>  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
>  at org.apache.spark.rdd.RDD$$anonfun$8.apply(RDD.scala:336)
>  at org.apache.spark.rdd.RDD$$anonfun$8.apply(RDD.scala:334)
>  at 
> org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:973)
>  at 
> org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:948)
>  at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:888)
>  at 
> org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:948)
>  at 
> org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:694)
>  at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:334)
>  at org.apache.spark.rdd.RDD.iterator(RDD.scala:285)
>  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
>  at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
>  at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
>  at org.apache.spark.scheduler.Task.run(Task.scala:99)
>  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at 

[jira] [Assigned] (HUDI-322) DeltaSteamer should pick checkpoints off only deltacommits for MOR tables

2019-12-10 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar reassigned HUDI-322:
---

Assignee: (was: Vinoth Chandar)

> DeltaSteamer should pick checkpoints off only deltacommits for MOR tables
> -
>
> Key: HUDI-322
> URL: https://issues.apache.org/jira/browse/HUDI-322
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: deltastreamer
>Reporter: Vinoth Chandar
>Priority: Major
> Fix For: 0.5.1
>
>






[jira] [Updated] (HUDI-335) Improvements to DiskBasedMap

2019-12-10 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-335:

Status: In Progress  (was: Open)

> Improvements to DiskBasedMap
> 
>
> Key: HUDI-335
> URL: https://issues.apache.org/jira/browse/HUDI-335
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Common Core
>Reporter: Balajee Nagasubramaniam
>Priority: Major
>  Labels: Hoodie
> Fix For: 0.5.1
>
> Attachments: Screen Shot 2019-11-11 at 1.22.44 PM.png, Screen Shot 
> 2019-11-13 at 2.56.53 PM.png
>
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>
> DiskBasedMap is used by ExternalSpillableMap for writing (K,V) pairs to a 
> file while keeping only (K, fileMetadata) in memory, to reduce the 
> in-memory footprint of the records.
> This change improves the performance of record get/read operations against 
> the spill file by using a BufferedInputStream to buffer the data.
> Results from the POC are promising. Before the write performance improvement, 
> spilling/writing 1 million records (record size ~350 bytes) to the file took 
> about 104 seconds; after the improvement, the same operation can be performed 
> in under 5 seconds.
> Similarly, before the read performance improvement, reading 1 million records 
> (size ~350 bytes) from the spill file took about 23 seconds; after the 
> improvement, the same operation can be performed in under 4 seconds.
> {{without read/write performance improvements
> RecordsHandled: 1    totalTestTime: 3145    writeTime: 1176    readTime: 255
> RecordsHandled: 5    totalTestTime: 5775    writeTime: 4187    readTime: 1175
> RecordsHandled: 10   totalTestTime: 10570   writeTime: 7718    readTime: 2203
> RecordsHandled: 50   totalTestTime: 59723   writeTime: 45618   readTime: 11093
> RecordsHandled: 100  totalTestTime: 120022  writeTime: 87918   readTime: 22355
> RecordsHandled: 200  totalTestTime: 258627  writeTime: 187185  readTime: 56431}}
> {{With write improvement:
> RecordsHandled: 1    totalTestTime: 2013    writeTime: 700     readTime: 503
> RecordsHandled: 5    totalTestTime: 2525    writeTime: 390     readTime: 1247
> RecordsHandled: 10   totalTestTime: 3583    writeTime: 464     readTime: 2352
> RecordsHandled: 50   totalTestTime: 22934   writeTime: 3731    readTime: 15778
> RecordsHandled: 100  totalTestTime: 42415   writeTime: 4816    readTime: 30332
> RecordsHandled: 200  totalTestTime: 74158   writeTime: 10192   readTime: 53195}}
> {{With read improvements:
> RecordsHandled: 1    totalTestTime: 2473    writeTime: 1562    readTime: 87
> RecordsHandled: 5    totalTestTime: 6169    writeTime: 5151    readTime: 438
> RecordsHandled: 10   totalTestTime: 9967    writeTime: 8636    readTime: 252
> RecordsHandled: 50   totalTestTime: 50889   writeTime: 46766   readTime: 1014
> RecordsHandled: 100  totalTestTime: 114482  writeTime: 104353  readTime: 3776
> RecordsHandled: 200  totalTestTime: 239251  writeTime: 219041  readTime: 8127}}
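The spill-file layout described above (only the per-key file metadata kept in memory, values written sequentially through a buffer, reads served by seeking) can be sketched in plain Java. This is an illustrative stand-alone sketch, not Hudi's actual DiskBasedMap; the class and method names here are hypothetical.

```java
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch (NOT Hudi's DiskBasedMap): keep only a
// (key -> offset/length) metadata map in memory, spill values to a file
// through a BufferedOutputStream so small records do not each cost a
// separate write syscall, and serve gets by seeking into the file.
public class SpillSketch {
    private final File file;
    // key -> {offset, length} of the serialized value in the spill file
    private final Map<String, long[]> valueMetadata = new HashMap<>();
    private long nextOffset = 0;

    public SpillSketch(File file) {
        this.file = file;
    }

    // Spill a batch of records sequentially with buffered writes.
    public void writeAll(Map<String, String> records) {
        try (OutputStream out = new BufferedOutputStream(new FileOutputStream(file, true))) {
            for (Map.Entry<String, String> e : records.entrySet()) {
                byte[] bytes = e.getValue().getBytes(StandardCharsets.UTF_8);
                valueMetadata.put(e.getKey(), new long[] {nextOffset, bytes.length});
                out.write(bytes);
                nextOffset += bytes.length;
            }
        } catch (IOException ioe) {
            throw new UncheckedIOException(ioe);
        }
    }

    // Random-access read of a single record using the in-memory metadata.
    public String get(String key) {
        long[] meta = valueMetadata.get(key);
        if (meta == null) {
            return null;
        }
        try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
            raf.seek(meta[0]);
            byte[] buf = new byte[(int) meta[1]];
            raf.readFully(buf);
            return new String(buf, StandardCharsets.UTF_8);
        } catch (IOException ioe) {
            throw new UncheckedIOException(ioe);
        }
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("spill", ".data");
        f.deleteOnExit();
        SpillSketch map = new SpillSketch(f);
        Map<String, String> batch = new HashMap<>();
        for (int i = 0; i < 1000; i++) {
            batch.put("key-" + i, "value-" + i);
        }
        map.writeAll(batch);
        System.out.println(map.get("key-42"));  // prints value-42
    }
}
```

The POC numbers above suggest the buffering on the write path matters most; a buffered wrapper on the read side (e.g. reading through a BufferedInputStream during sequential scans) is the analogous fix for reads.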



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-349) Make cleaner retention based on time period to account for higher deviations in ingestion runs

2019-12-10 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-349:

Fix Version/s: (was: 0.5.1)

> Make cleaner retention based on time period to account for higher deviations 
> in ingestion runs
> --
>
> Key: HUDI-349
> URL: https://issues.apache.org/jira/browse/HUDI-349
> Project: Apache Hudi (incubating)
>  Issue Type: Task
>  Components: Cleaner, newbie
>Reporter: Balaji Varadarajan
>Assignee: Aravind Suresh
>Priority: Major
>
> Cleaner by commits is based on the number of commits to be retained. Ingestion
> time can vary across runs due to various factors. To provide a bound on the
> maximum running time of a query and a consistent retention period, it is
> better to use a retention config based on time (e.g. 12h).
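As a rough illustration of the time-based policy (the class and method names here are hypothetical, not Hudi's cleaner API), selecting the instants that fall outside a 12h retention window from Hudi-style yyyyMMddHHmmss commit timestamps could look like:

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch of a time-based retention check: commits whose
// instant time is older than (now - retention) become clean candidates.
public class TimeBasedRetention {
    private static final DateTimeFormatter FMT = DateTimeFormatter.ofPattern("yyyyMMddHHmmss");

    static List<String> instantsOlderThan(List<String> instants, LocalDateTime now, Duration retention) {
        LocalDateTime cutoff = now.minus(retention);
        return instants.stream()
                .filter(ts -> LocalDateTime.parse(ts, FMT).isBefore(cutoff))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        LocalDateTime now = LocalDateTime.parse("20191211120000", FMT);
        List<String> commits = List.of("20191210220000", "20191211010000", "20191211090000");
        // With a 12h window, only the 2019-12-10 22:00 commit is outside it.
        System.out.println(instantsOlderThan(commits, now, Duration.ofHours(12)));
    }
}
```

Unlike a count-based policy, the number of retained commits now adapts to how fast ingestion actually ran in a given period.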





[jira] [Updated] (HUDI-305) Presto MOR "_rt" queries only reads base parquet file

2019-12-10 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-305:

Fix Version/s: (was: 0.5.1)

> Presto MOR "_rt" queries only reads base parquet file 
> --
>
> Key: HUDI-305
> URL: https://issues.apache.org/jira/browse/HUDI-305
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: Presto Integration
> Environment: On AWS EMR
>Reporter: Brandon Scheller
>Assignee: Bhavani Sudha Saktheeswaran
>Priority: Major
>
> Code example to reproduce.
> {code:java}
> import org.apache.hudi.DataSourceWriteOptions
> import org.apache.hudi.config.HoodieWriteConfig
> import org.apache.spark.sql.SaveMode
> import org.apache.spark.sql.functions.lit  // needed for lit() below
> import spark.implicits._                   // needed for toDF and $"..."
> val df = Seq(
>   ("100", "event_name_900", "2015-01-01T13:51:39.340396Z", "type1"),
>   ("101", "event_name_546", "2015-01-01T12:14:58.597216Z", "type2"),
>   ("104", "event_name_123", "2015-01-01T12:15:00.512679Z", "type1"),
>   ("105", "event_name_678", "2015-01-01T13:51:42.248818Z", "type2")
>   ).toDF("event_id", "event_name", "event_ts", "event_type")
> var tableName = "hudi_events_mor_1"
> var tablePath = "s3://emr-users/wenningd/hudi/tables/events/" + tableName
> // write hudi dataset
> df.write.format("org.apache.hudi")
>   .option(HoodieWriteConfig.TABLE_NAME, tableName)
>   .option(DataSourceWriteOptions.OPERATION_OPT_KEY, 
> DataSourceWriteOptions.INSERT_OPERATION_OPT_VAL)
>   .option(DataSourceWriteOptions.STORAGE_TYPE_OPT_KEY, 
> DataSourceWriteOptions.MOR_STORAGE_TYPE_OPT_VAL)
>   .option(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY, "event_id")
>   .option(DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY, "event_type") 
>   .option(DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY, "event_ts")
>   .option(DataSourceWriteOptions.HIVE_SYNC_ENABLED_OPT_KEY, "true")
>   .option(DataSourceWriteOptions.HIVE_TABLE_OPT_KEY, tableName)
>   .option(DataSourceWriteOptions.HIVE_PARTITION_FIELDS_OPT_KEY, "event_type")
>   .option(DataSourceWriteOptions.HIVE_ASSUME_DATE_PARTITION_OPT_KEY, "false")
>   .option(DataSourceWriteOptions.HIVE_PARTITION_EXTRACTOR_CLASS_OPT_KEY, 
> "org.apache.hudi.hive.MultiPartKeysValueExtractor")
>   .mode(SaveMode.Overwrite)
>   .save(tablePath)
> // update a record with event_name "event_name_123" => "event_name_changed"
> val df1 = spark.read.format("org.apache.hudi").load(tablePath + "/*/*")
> val df2 = df1.filter($"event_id" === "104")
> val df3 = df2.withColumn("event_name", lit("event_name_changed"))
> // update hudi dataset
> df3.write.format("org.apache.hudi")
>.option(HoodieWriteConfig.TABLE_NAME, tableName)
>.option(DataSourceWriteOptions.OPERATION_OPT_KEY, 
> DataSourceWriteOptions.UPSERT_OPERATION_OPT_VAL)
>.option(DataSourceWriteOptions.STORAGE_TYPE_OPT_KEY, 
> DataSourceWriteOptions.MOR_STORAGE_TYPE_OPT_VAL)
>.option(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY, "event_id")
>.option(DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY, "event_type") 
>.option(DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY, "event_ts")
>.option("hoodie.compact.inline", "false")
>.option(DataSourceWriteOptions.HIVE_SYNC_ENABLED_OPT_KEY, "true")
>.option(DataSourceWriteOptions.HIVE_TABLE_OPT_KEY, tableName)
>.option(DataSourceWriteOptions.HIVE_PARTITION_FIELDS_OPT_KEY, "event_type")
>.option(DataSourceWriteOptions.HIVE_ASSUME_DATE_PARTITION_OPT_KEY, "false")
>.option(DataSourceWriteOptions.HIVE_PARTITION_EXTRACTOR_CLASS_OPT_KEY, 
> "org.apache.hudi.hive.MultiPartKeysValueExtractor")
>.mode(SaveMode.Append)
>.save(tablePath)
> {code}
> Now when querying the real-time table from Hive, we have no issue seeing the 
> updated value:
> {code:java}
> hive> select event_name from hudi_events_mor_1_rt;
> OK
> event_name_900
> event_name_changed
> event_name_546
> event_name_678
> Time taken: 0.103 seconds, Fetched: 4 row(s)
> {code}
> But when querying the real-time table from Presto, we only read the base 
> parquet file and do not see the update that should be merged in from the log 
> file.
> {code:java}
> presto:default> select event_name from hudi_events_mor_1_rt;
>event_name
> 
>  event_name_900
>  event_name_123
>  event_name_546
>  event_name_678
> (4 rows)
> {code}
> Our current understanding of this issue is that while the
> HoodieParquetRealtimeInputFormat correctly generates the splits, the
> RealtimeCompactedRecordReader record reader is not used, so the log file is
> never read and only the base parquet file is returned.
>  





[jira] [Updated] (HUDI-53) Implement Record level Index to map a record key to a <PartitionPath, FileID> pair #90

2019-12-10 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-53?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-53:
---
Fix Version/s: (was: 0.5.1)

> Implement Record level Index to map a record key to a <PartitionPath,
> FileID> pair #90
> ---
>
> Key: HUDI-53
> URL: https://issues.apache.org/jira/browse/HUDI-53
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Write Client
>Reporter: Vinoth Chandar
>Assignee: Vinoth Chandar
>Priority: Major
>
> [https://github.com/uber/hudi/issues/90] 
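For context, the mapping such an index would maintain can be sketched with an in-memory stand-in. A real implementation would need to be persistent and scalable; all names below are hypothetical, not Hudi's index API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical in-memory stand-in for a record-level index: it resolves
// a record key directly to its (partitionPath, fileId) location, so an
// upsert can be routed to the right file group without consulting
// file-level indexes such as bloom filters on every base file.
public class RecordLevelIndexSketch {
    // recordKey -> {partitionPath, fileId}
    private final Map<String, String[]> index = new HashMap<>();

    public void tag(String recordKey, String partitionPath, String fileId) {
        index.put(recordKey, new String[] {partitionPath, fileId});
    }

    public Optional<String[]> lookup(String recordKey) {
        return Optional.ofNullable(index.get(recordKey));
    }

    public static void main(String[] args) {
        RecordLevelIndexSketch idx = new RecordLevelIndexSketch();
        idx.tag("event_104", "type1", "file-0001");
        String[] loc = idx.lookup("event_104").get();
        System.out.println(loc[0] + "/" + loc[1]);  // prints type1/file-0001
    }
}
```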





[jira] [Updated] (HUDI-377) Add Delete() support to HoodieDeltaStreamer

2019-12-10 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-377:

Status: In Progress  (was: Open)

> Add Delete() support to HoodieDeltaStreamer
> ---
>
> Key: HUDI-377
> URL: https://issues.apache.org/jira/browse/HUDI-377
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: deltastreamer
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.1
>
>   Original Estimate: 72h
>  Time Spent: 10m
>  Remaining Estimate: 71h 50m
>
> Add Delete() support to HoodieDeltaStreamer





[jira] [Updated] (HUDI-310) DynamoDB/Kinesis Change Capture using Delta Streamer

2019-12-10 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-310:

Fix Version/s: (was: 0.5.1)

> DynamoDB/Kinesis Change Capture using Delta Streamer
> 
>
> Key: HUDI-310
> URL: https://issues.apache.org/jira/browse/HUDI-310
> Project: Apache Hudi (incubating)
>  Issue Type: New Feature
>  Components: deltastreamer
>Reporter: Vinoth Chandar
>Priority: Major
>






[jira] [Assigned] (HUDI-289) Implement a test suite to support long running test for Hudi writing and querying end-end

2019-12-10 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar reassigned HUDI-289:
---

Assignee: vinoyang

> Implement a test suite to support long running test for Hudi writing and 
> querying end-end
> -
>
> Key: HUDI-289
> URL: https://issues.apache.org/jira/browse/HUDI-289
> Project: Apache Hudi (incubating)
>  Issue Type: Test
>  Components: Usability
>Reporter: Vinoth Chandar
>Assignee: vinoyang
>Priority: Major
> Fix For: 0.5.1
>
>
> We would need an equivalent of an end-to-end test which runs some workload
> for a few hours at least, triggers various actions like commit, deltacommit,
> rollback, compaction and ensures correctness of the code before every release.
> P.S: Learn from all the CSS issues managing compaction.
> The feature branch is here: 
> [https://github.com/apache/incubator-hudi/tree/hudi_test_suite_refactor]





[jira] [Updated] (HUDI-210) Implement prometheus metrics reporter

2019-12-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-210:

Labels: pull-request-available  (was: )

> Implement prometheus metrics reporter
> -
>
> Key: HUDI-210
> URL: https://issues.apache.org/jira/browse/HUDI-210
> Project: Apache Hudi (incubating)
>  Issue Type: New Feature
>Reporter: vinoyang
>Assignee: vinoyang
>Priority: Major
>  Labels: pull-request-available
>
> Since Prometheus is a very popular monitoring system and time series 
> database, it would be better to provide a metrics reporter to report metrics 
> to prometheus.





[GitHub] [incubator-hudi] XuQianJin-Stars opened a new pull request #1095: [HUDI-210] Implement prometheus metrics reporter

2019-12-10 Thread GitBox
XuQianJin-Stars opened a new pull request #1095: [HUDI-210] Implement 
prometheus metrics reporter
URL: https://github.com/apache/incubator-hudi/pull/1095
 
 
   ## What is the purpose of the pull request
   
   Since Prometheus is a very popular monitoring system and time series 
database, it would be better to provide a metrics reporter to report metrics to 
prometheus.
   
   ## Brief change log
   
   *The changes are as follows:*
 - *pom.xml*
 - *hudi-client/pom.xml*
 - 
*hudi-client/src/main/java/org/apache/hudi/config/HoodieMetricsConfig.java*
 - *hudi-client/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java*
 - *metrics/MetricsPrometheusPushGatewayReporter.java*
 - *metrics/MetricsReporterFactory.java*
 - *metrics/MetricsReporterType.java*
   
   ## Verify this pull request
   
   This change added tests and can be verified as follows:
   
 - *By TestHoodiePrometheusPushGatewayMetrics.java to verify the change.*
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


Build failed in Jenkins: hudi-snapshot-deployment-0.5 #125

2019-12-10 Thread Apache Jenkins Server
See 


Changes:


--
[...truncated 2.19 KB...]
/home/jenkins/tools/maven/apache-maven-3.5.4/bin:
m2.conf
mvn
mvn.cmd
mvnDebug
mvnDebug.cmd
mvnyjp

/home/jenkins/tools/maven/apache-maven-3.5.4/boot:
plexus-classworlds-2.5.2.jar

/home/jenkins/tools/maven/apache-maven-3.5.4/conf:
logging
settings.xml
toolchains.xml

/home/jenkins/tools/maven/apache-maven-3.5.4/conf/logging:
simplelogger.properties

/home/jenkins/tools/maven/apache-maven-3.5.4/lib:
aopalliance-1.0.jar
cdi-api-1.0.jar
cdi-api.license
commons-cli-1.4.jar
commons-cli.license
commons-io-2.5.jar
commons-io.license
commons-lang3-3.5.jar
commons-lang3.license
ext
guava-20.0.jar
guice-4.2.0-no_aop.jar
jansi-1.17.1.jar
jansi-native
javax.inject-1.jar
jcl-over-slf4j-1.7.25.jar
jcl-over-slf4j.license
jsr250-api-1.0.jar
jsr250-api.license
maven-artifact-3.5.4.jar
maven-artifact.license
maven-builder-support-3.5.4.jar
maven-builder-support.license
maven-compat-3.5.4.jar
maven-compat.license
maven-core-3.5.4.jar
maven-core.license
maven-embedder-3.5.4.jar
maven-embedder.license
maven-model-3.5.4.jar
maven-model-builder-3.5.4.jar
maven-model-builder.license
maven-model.license
maven-plugin-api-3.5.4.jar
maven-plugin-api.license
maven-repository-metadata-3.5.4.jar
maven-repository-metadata.license
maven-resolver-api-1.1.1.jar
maven-resolver-api.license
maven-resolver-connector-basic-1.1.1.jar
maven-resolver-connector-basic.license
maven-resolver-impl-1.1.1.jar
maven-resolver-impl.license
maven-resolver-provider-3.5.4.jar
maven-resolver-provider.license
maven-resolver-spi-1.1.1.jar
maven-resolver-spi.license
maven-resolver-transport-wagon-1.1.1.jar
maven-resolver-transport-wagon.license
maven-resolver-util-1.1.1.jar
maven-resolver-util.license
maven-settings-3.5.4.jar
maven-settings-builder-3.5.4.jar
maven-settings-builder.license
maven-settings.license
maven-shared-utils-3.2.1.jar
maven-shared-utils.license
maven-slf4j-provider-3.5.4.jar
maven-slf4j-provider.license
org.eclipse.sisu.inject-0.3.3.jar
org.eclipse.sisu.inject.license
org.eclipse.sisu.plexus-0.3.3.jar
org.eclipse.sisu.plexus.license
plexus-cipher-1.7.jar
plexus-cipher.license
plexus-component-annotations-1.7.1.jar
plexus-component-annotations.license
plexus-interpolation-1.24.jar
plexus-interpolation.license
plexus-sec-dispatcher-1.4.jar
plexus-sec-dispatcher.license
plexus-utils-3.1.0.jar
plexus-utils.license
slf4j-api-1.7.25.jar
slf4j-api.license
wagon-file-3.1.0.jar
wagon-file.license
wagon-http-3.1.0-shaded.jar
wagon-http.license
wagon-provider-api-3.1.0.jar
wagon-provider-api.license

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/ext:
README.txt

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native:
freebsd32
freebsd64
linux32
linux64
osx
README.txt
windows32
windows64

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/freebsd32:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/freebsd64:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/linux32:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/linux64:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/osx:
libjansi.jnilib

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/windows32:
jansi.dll

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/windows64:
jansi.dll
Finished /home/jenkins/tools/maven/apache-maven-3.5.4 Directory Listing :
Detected current version as: 
'HUDI_home=
0.5.1-SNAPSHOT'
[INFO] Scanning for projects...
[INFO] 
[INFO] Reactor Build Order:
[INFO] 
[INFO] Hudi   [pom]
[INFO] hudi-common[jar]
[INFO] hudi-timeline-service  [jar]
[INFO] hudi-hadoop-mr [jar]
[INFO] hudi-client[jar]
[INFO] hudi-hive  [jar]
[INFO] hudi-spark [jar]
[INFO] hudi-utilities [jar]
[INFO] hudi-cli   [jar]
[INFO] hudi-hadoop-mr-bundle  [jar]
[INFO] hudi-hive-bundle   [jar]
[INFO] hudi-spark-bundle  [jar]
[INFO] hudi-presto-bundle [jar]
[INFO] hudi-utilities-bundle  [jar]
[INFO] hudi-timeline-server-bundle

[GitHub] [incubator-hudi] lamber-ken edited a comment on issue #1094: [HUDI-375] Refactor the configure framework of hudi project

2019-12-10 Thread GitBox
lamber-ken edited a comment on issue #1094: [HUDI-375] Refactor the configure 
framework of hudi project
URL: https://github.com/apache/incubator-hudi/pull/1094#issuecomment-564364755
 
 
   > Hi lamber-ken, Thanks for opening the PR. Just a reminder, should we reach 
a consensus and start a VOTE thread in ML before coding?
   
   hi, this is a quick preview for understanding.   




[GitHub] [incubator-hudi] lamber-ken commented on issue #1094: [HUDI-375] Refactor the configure framework of hudi project

2019-12-10 Thread GitBox
lamber-ken commented on issue #1094: [HUDI-375] Refactor the configure 
framework of hudi project
URL: https://github.com/apache/incubator-hudi/pull/1094#issuecomment-564364755
 
 
   > Hi lamber-ken, Thanks for opening the PR. Just a reminder, should we reach 
a consensus and start a VOTE thread in ML before coding?
   
   hi, this is a quick preview for understanding.




[GitHub] [incubator-hudi] lamber-ken edited a comment on issue #1093: [MINOR] replace scala map add operator

2019-12-10 Thread GitBox
lamber-ken edited a comment on issue #1093: [MINOR] replace scala map add 
operator
URL: https://github.com/apache/incubator-hudi/pull/1093#issuecomment-564251346
 
 
   From the Scala API, we can learn about `++:`, but we may be more familiar
with `++`. From my side, it's better to replace `++:` with `++`.
   
   
https://www.scala-lang.org/api/2.11.0/index.html#scala.collection.immutable.List




[GitHub] [incubator-hudi] lamber-ken edited a comment on issue #1093: [MINOR] replace scala map add operator

2019-12-10 Thread GitBox
lamber-ken edited a comment on issue #1093: [MINOR] replace scala map add 
operator
URL: https://github.com/apache/incubator-hudi/pull/1093#issuecomment-564251346
 
 
   From the Scala API, we can learn about `++:`, but we may be more familiar
with `++`. From my side, it's better to replace `++:` with `++`.
   
   It's better to unify the operator across the project;
`IncrementalRelation#buildScan` uses `++=`.
   
   
https://www.scala-lang.org/api/2.11.0/index.html#scala.collection.immutable.List




[GitHub] [incubator-hudi] yanghua commented on issue #1057: Hudi Test Suite

2019-12-10 Thread GitBox
yanghua commented on issue #1057: Hudi Test Suite
URL: https://github.com/apache/incubator-hudi/pull/1057#issuecomment-564340367
 
 
   > @yanghua Thanks for the details, I've squashed the commits.
   
   Thanks, if you do not mind, I'd like to rename your squashed commit message
and carry `HUDI-394` at the beginning of the commit message so that all the
work can be tracked via Jira issues. It can provide a more detailed context
for the community. WDYT?




[jira] [Updated] (HUDI-376) AWS Glue dependency issue for EMR 5.28.0

2019-12-10 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-376:

Fix Version/s: 0.5.1

> AWS Glue dependency issue for EMR 5.28.0
> 
>
> Key: HUDI-376
> URL: https://issues.apache.org/jira/browse/HUDI-376
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: CLI
>Reporter: Xing Pan
>Priority: Minor
> Fix For: 0.5.1
>
>
> Hi hudi team, it's really encouraging that Hudi is finally an officially
> supported application on AWS EMR. Great job!
> I found a *ClassNotFound* exception when using:
> {code:java}
> /usr/lib/hudi/bin/run_sync_tool.sh
> {code}
> on the EMR master.
> I think it is due to the missing AWS Glue Data Catalog SDK dependency (I use
> AWS Glue as the Hive metastore).
> So I added a line to run_sync_tool.sh as a quick fix:
> {code:java}
> HIVE_JARS=$HIVE_JARS:/usr/lib/hive/auxlib/aws-glue-datacatalog-hive2-client.jar:/usr/share/aws/emr/emr-metrics-collector/lib/aws-java-sdk-glue-1.11.475.jar{code}
> Not sure if any more jars are needed, but these two jars fixed my problem.
>  
> I think it would be great to take Glue into consideration in the EMR scripts.





[GitHub] [incubator-hudi] yanghua commented on issue #1057: Hudi Test Suite

2019-12-10 Thread GitBox
yanghua commented on issue #1057: Hudi Test Suite
URL: https://github.com/apache/incubator-hudi/pull/1057#issuecomment-564339647
 
 
   > @yanghua thanks! So I take it that HUDI-394 is what @n3nash owns via fixes 
you mentioned?
   
   Yes, it's the first step of HUDI-289 and it blocks other subtasks. It has
been done and just maps the existing implementation provided by @n3nash.




[jira] [Assigned] (HUDI-394) Provide a basic implementation of test suite

2019-12-10 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang reassigned HUDI-394:
-

Assignee: Nishith Agarwal

> Provide a basic implementation of test suite
> 
>
> Key: HUDI-394
> URL: https://issues.apache.org/jira/browse/HUDI-394
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>  Components: Testing
>Reporter: vinoyang
>Assignee: Nishith Agarwal
>Priority: Major
>






[GitHub] [incubator-hudi] leesf commented on issue #1094: [HUDI-375] Refactor the configure framework of hudi project

2019-12-10 Thread GitBox
leesf commented on issue #1094: [HUDI-375] Refactor the configure framework of 
hudi project
URL: https://github.com/apache/incubator-hudi/pull/1094#issuecomment-564338776
 
 
   Hi lamber-ken, Thanks for opening the PR. Just a reminder, should we reach a 
consensus and start a VOTE thread in ML before coding?




[GitHub] [incubator-hudi] bvaradar commented on a change in pull request #1009: [HUDI-308] Avoid Renames for tracking state transitions of all actions on dataset

2019-12-10 Thread GitBox
bvaradar commented on a change in pull request #1009:  [HUDI-308] Avoid Renames 
for tracking state transitions of all actions on dataset
URL: https://github.com/apache/incubator-hudi/pull/1009#discussion_r356340283
 
 

 ##
 File path: hudi-client/src/main/java/org/apache/hudi/HoodieWriteClient.java
 ##
 @@ -824,14 +843,14 @@ private String startInstant() {
   "Found commits after time :" + lastCommit + ", please rollback 
greater commits first");
 }
 
-List<String> inflights =
-    inflightCommitTimeline.getInstants().map(HoodieInstant::getTimestamp).collect(Collectors.toList());
+List<String> inflights = inflightAndRequestedCommitTimeline.getInstants().map(HoodieInstant::getTimestamp)
 
 Review comment:
   Yes, it should work fine. For restoring (reverting completed) compactions, 
we would still delete log files as part of first rolling-back the inflight 
state of the compaction before we try to rollback requested state. 




[GitHub] [incubator-hudi] n3nash commented on a change in pull request #1009: [HUDI-308] Avoid Renames for tracking state transitions of all actions on dataset

2019-12-10 Thread GitBox
n3nash commented on a change in pull request #1009:  [HUDI-308] Avoid Renames 
for tracking state transitions of all actions on dataset
URL: https://github.com/apache/incubator-hudi/pull/1009#discussion_r356321928
 
 

 ##
 File path: 
hudi-client/src/main/java/org/apache/hudi/io/HoodieCommitArchiveLog.java
 ##
 @@ -169,7 +180,14 @@ public boolean archiveIfRequired(final JavaSparkContext 
jsc) throws IOException
   }).limit(commitTimeline.countInstants() - minCommitsToKeep));
 }
 
-return instants;
+// For archiving and cleaning instants, we need to include intermediate 
state files if they exist
+HoodieActiveTimeline rawActiveTimeline = new 
HoodieActiveTimeline(metaClient, false);
+Map<Pair<String, String>, List<HoodieInstant>> groupByTsAction = rawActiveTimeline.getInstants()
+    .collect(Collectors.groupingBy(x -> Pair.of(x.getTimestamp(),
+        x.getAction().equals(HoodieTimeline.COMPACTION_ACTION) ? HoodieTimeline.COMMIT_ACTION : x.getAction())));
+
+return instants.flatMap(hoodieInstant ->
 
 Review comment:
   What I'm saying is group by only the instants chosen to archive, all types 
of actions is fine..




[GitHub] [incubator-hudi] n3nash commented on a change in pull request #1009: [HUDI-308] Avoid Renames for tracking state transitions of all actions on dataset

2019-12-10 Thread GitBox
n3nash commented on a change in pull request #1009:  [HUDI-308] Avoid Renames 
for tracking state transitions of all actions on dataset
URL: https://github.com/apache/incubator-hudi/pull/1009#discussion_r356321448
 
 

 ##
 File path: hudi-client/src/main/java/org/apache/hudi/HoodieWriteClient.java
 ##
 @@ -824,14 +843,14 @@ private String startInstant() {
   "Found commits after time :" + lastCommit + ", please rollback 
greater commits first");
 }
 
-List<String> inflights =
-    inflightCommitTimeline.getInstants().map(HoodieInstant::getTimestamp).collect(Collectors.toList());
+List<String> inflights = inflightAndRequestedCommitTimeline.getInstants().map(HoodieInstant::getTimestamp)
 
 Review comment:
   So, that's true for other types of actions but for compaction we need to 
also delete the extra log file created (I'm guessing this part of the code 
works as expected already and isn't changed as part of this diff, correct ? )




[GitHub] [incubator-hudi] n3nash commented on issue #1057: Hudi Test Suite

2019-12-10 Thread GitBox
n3nash commented on issue #1057: Hudi Test Suite
URL: https://github.com/apache/incubator-hudi/pull/1057#issuecomment-564295355
 
 
   @yanghua Thanks for the details, I've squashed the commits.
   
   




[incubator-hudi] branch hudi_test_suite_refactor updated (504c2cc -> c82d6d9)

2019-12-10 Thread nagarwal
This is an automated email from the ASF dual-hosted git repository.

nagarwal pushed a change to branch hudi_test_suite_refactor
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git.


 discard 504c2cc  Fixing some unit tests
omit 0c2ed53  fixing build issues due to javax servlet
omit 07b4c12  Adressing CR comments part 1
omit c3f8c8c  Hudi Test Suite - Flexible schema payload generation  
   - Different types of workload generation such as inserts, upserts etc - 
Post process actions to perform validations - Interoperability of test 
suite to use HoodieWriteClient and HoodieDeltaStreamer so both code paths can 
be tested - Custom workload sequence generator - Ability to perform 
parallel operations, such as upsert and compaction
 new c82d6d9  Hudi Test Suite - Flexible schema payload generation 
- Different types of workload generation such as inserts, upserts etc - 
Post process actions to perform validations - Interoperability of test 
suite to use HoodieWriteClient and HoodieDeltaStreamer so both code paths can 
be tested - Custom workload sequence generator - Ability to perform 
parallel operations, such as upsert and compaction

This update added new revisions after undoing existing revisions.
That is to say, some revisions that were in the old version of the
branch are not in the new version.  This situation occurs
when a user --force pushes a change and generates a repository
containing something like this:

 * -- * -- B -- O -- O -- O   (504c2cc)
\
 N -- N -- N   refs/heads/hudi_test_suite_refactor (c82d6d9)

You should already have received notification emails for all of the O
revisions, and so the following emails describe only the N revisions
from the common base, B.

Any revisions marked "omit" are not gone; other references still
refer to them.  Any revisions marked "discard" are gone forever.

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:



[jira] [Closed] (HUDI-368) Code clean up in TestAsyncCompaction class

2019-12-10 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf closed HUDI-368.
--
Resolution: Fixed

Fixed via master: 3790b75e059a06e6f5467c8b8d549ef38cd6b98a

> Code clean up in TestAsyncCompaction class
> --
>
> Key: HUDI-368
> URL: https://issues.apache.org/jira/browse/HUDI-368
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Compaction, Testing
>Reporter: Pratyaksh Sharma
>Assignee: Pratyaksh Sharma
>Priority: Major
>  Labels: newbie, pull-request-available
> Fix For: 0.5.1
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The TestAsyncCompaction class has a lot of redundant method calls and lambda
> functions which can be simplified further. Also, there are a few unused
> variables defined which can be removed.
>  
> For example:
> assertFalse("Verify all file-slices have no log-files",
>  fileSliceList.stream().filter(fs -> fs.getLogFiles().count() > 
> 0).findAny().isPresent());
> can be simplified as - 
> assertFalse("Verify all file-slices have no log-files",
>  fileSliceList.stream().anyMatch(fs -> fs.getLogFiles().count() > 0));
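The simplification quoted above can be checked with a small, self-contained sketch. Note the `FileSlice` class below is a hypothetical stand-in with just enough surface for the demo, not Hudi's real `FileSlice`:

```java
import java.util.List;
import java.util.stream.Stream;

public class AnyMatchDemo {

    // Hypothetical stand-in for Hudi's FileSlice, just enough for the demo.
    static class FileSlice {
        private final List<String> logFiles;
        FileSlice(List<String> logFiles) { this.logFiles = logFiles; }
        Stream<String> getLogFiles() { return logFiles.stream(); }
    }

    public static void main(String[] args) {
        List<FileSlice> fileSliceList =
            List.of(new FileSlice(List.of()), new FileSlice(List.of()));

        // Verbose form: build a filtered stream, then test for presence.
        boolean verbose = fileSliceList.stream()
            .filter(fs -> fs.getLogFiles().count() > 0)
            .findAny().isPresent();

        // Equivalent idiomatic form: anyMatch short-circuits on the first hit.
        boolean idiomatic = fileSliceList.stream()
            .anyMatch(fs -> fs.getLogFiles().count() > 0);

        System.out.println(verbose == idiomatic); // prints "true"
    }
}
```

Both forms answer "does any element match?", but `anyMatch` states the intent directly and avoids the intermediate `Optional`.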



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (HUDI-390) Hive Sync should support keywords as table names

2019-12-10 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf closed HUDI-390.
--
Resolution: Fixed

Fixed via master: 8df4b83017f74173b8289ab50b0f723a38e8eebe

> Hive Sync should support keywords as table names
> -
>
> Key: HUDI-390
> URL: https://issues.apache.org/jira/browse/HUDI-390
> Project: Apache Hudi (incubating)
>  Issue Type: New Feature
>  Components: Hive Integration
>Reporter: Vinoth Chandar
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.1
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> [https://github.com/apache/incubator-hudi/issues/1084]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-hudi] leesf merged pull request #1050: [HUDI-368]: code clean up in TestAsyncCompaction class

2019-12-10 Thread GitBox
leesf merged pull request #1050: [HUDI-368]: code clean up in 
TestAsyncCompaction class
URL: https://github.com/apache/incubator-hudi/pull/1050
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[incubator-hudi] branch master updated: [HUDI-368] code clean up in TestAsyncCompaction class (#1050)

2019-12-10 Thread leesf
This is an automated email from the ASF dual-hosted git repository.

leesf pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new 3790b75  [HUDI-368] code clean up in TestAsyncCompaction class (#1050)
3790b75 is described below

commit 3790b75e059a06e6f5467c8b8d549ef38cd6b98a
Author: Pratyaksh Sharma 
AuthorDate: Wed Dec 11 03:22:41 2019 +0530

[HUDI-368] code clean up in TestAsyncCompaction class (#1050)
---
 .../java/org/apache/hudi/TestAsyncCompaction.java  | 89 +-
 1 file changed, 34 insertions(+), 55 deletions(-)

diff --git a/hudi-client/src/test/java/org/apache/hudi/TestAsyncCompaction.java 
b/hudi-client/src/test/java/org/apache/hudi/TestAsyncCompaction.java
index 451a2b1..0643708 100644
--- a/hudi-client/src/test/java/org/apache/hudi/TestAsyncCompaction.java
+++ b/hudi-client/src/test/java/org/apache/hudi/TestAsyncCompaction.java
@@ -35,7 +35,6 @@ import 
org.apache.hudi.common.table.timeline.HoodieInstant.State;
 import org.apache.hudi.common.table.view.FileSystemViewStorageConfig;
 import org.apache.hudi.common.table.view.FileSystemViewStorageType;
 import org.apache.hudi.common.table.view.HoodieTableFileSystemView;
-import org.apache.hudi.common.util.AvroUtils;
 import org.apache.hudi.common.util.CompactionUtils;
 import org.apache.hudi.common.util.Option;
 import org.apache.hudi.common.util.collection.Pair;
@@ -108,17 +107,15 @@ public class TestAsyncCompaction extends 
TestHoodieClientBase {
 
   HoodieInstant pendingCompactionInstant =
   
metaClient.getActiveTimeline().filterPendingCompactionTimeline().firstInstant().get();
-  assertTrue("Pending Compaction instant has expected instant time",
-  
pendingCompactionInstant.getTimestamp().equals(compactionInstantTime));
-  assertTrue("Pending Compaction instant has expected state",
-  pendingCompactionInstant.getState().equals(State.REQUESTED));
+  assertEquals("Pending Compaction instant has expected instant time", 
pendingCompactionInstant.getTimestamp(),
+  compactionInstantTime);
+  assertEquals("Pending Compaction instant has expected state", 
pendingCompactionInstant.getState(), State.REQUESTED);
 
-  moveCompactionFromRequestedToInflight(compactionInstantTime, client, 
cfg);
+  moveCompactionFromRequestedToInflight(compactionInstantTime, cfg);
 
   // Reload and rollback inflight compaction
   metaClient = new HoodieTableMetaClient(jsc.hadoopConfiguration(), 
cfg.getBasePath());
   HoodieTable hoodieTable = HoodieTable.getHoodieTable(metaClient, cfg, 
jsc);
-  hoodieTable.rollback(jsc, compactionInstantTime, false);
 
   client.rollbackInflightCompaction(
   new HoodieInstant(State.INFLIGHT, HoodieTimeline.COMPACTION_ACTION, 
compactionInstantTime), hoodieTable);
@@ -139,11 +136,6 @@ public class TestAsyncCompaction extends 
TestHoodieClientBase {
 }
   }
 
-  private Path getInstantPath(HoodieTableMetaClient metaClient, String 
timestamp, String action, State state) {
-HoodieInstant instant = new HoodieInstant(state, action, timestamp);
-return new Path(metaClient.getMetaPath(), instant.getFileName());
-  }
-
   @Test
   public void testRollbackInflightIngestionWithPendingCompaction() throws 
Exception {
 // Rollback inflight ingestion when there is pending compaction
@@ -171,12 +163,11 @@ public class TestAsyncCompaction extends 
TestHoodieClientBase {
   metaClient = new HoodieTableMetaClient(jsc.hadoopConfiguration(), 
cfg.getBasePath());
   HoodieInstant pendingCompactionInstant =
   
metaClient.getActiveTimeline().filterPendingCompactionTimeline().firstInstant().get();
-  assertTrue("Pending Compaction instant has expected instant time",
-  
pendingCompactionInstant.getTimestamp().equals(compactionInstantTime));
+  assertEquals("Pending Compaction instant has expected instant time", 
pendingCompactionInstant.getTimestamp(),
+  compactionInstantTime);
   HoodieInstant inflightInstant =
   
metaClient.getActiveTimeline().filterInflightsExcludingCompaction().firstInstant().get();
-  assertTrue("inflight instant has expected instant time",
-  inflightInstant.getTimestamp().equals(inflightInstantTime));
+  assertEquals("inflight instant has expected instant time", 
inflightInstant.getTimestamp(), inflightInstantTime);
 
   // This should rollback
   client.startCommitWithTime(nextInflightInstantTime);
@@ -184,14 +175,13 @@ public class TestAsyncCompaction extends 
TestHoodieClientBase {
   // Validate
   metaClient = new HoodieTableMetaClient(jsc.hadoopConfiguration(), 
cfg.getBasePath());
   inflightInstant = 
metaClient.getActiveTimeline().filterInflightsExcludingCompaction().firstInstant().get();
-  assertTrue("inflight instant has expected instant time",
-  

[GitHub] [incubator-hudi] lamber-ken commented on issue #1093: [MINOR] replace scala map add operator

2019-12-10 Thread GitBox
lamber-ken commented on issue #1093: [MINOR] replace scala map add operator
URL: https://github.com/apache/incubator-hudi/pull/1093#issuecomment-564251346
 
 
   From the Scala API, we can learn about `++:`; we may be more familiar with 
`++`. From my side, it's better to replace `++:` with `++`.  
   
   
https://www.scala-lang.org/api/2.11.0/index.html#scala.collection.immutable.List


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Updated] (HUDI-375) Refactor the configure framework of hudi project

2019-12-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-375:

Labels: pull-request-available  (was: )

> Refactor the configure framework of hudi project
> 
>
> Key: HUDI-375
> URL: https://issues.apache.org/jira/browse/HUDI-375
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>Reporter: lamber-ken
>Assignee: lamber-ken
>Priority: Major
>  Labels: pull-request-available
>
> Currently, config items and their default values are dispersed across the 
> Java class files. It's easy to get confused as more and more config items 
> are defined, so it's necessary to refactor the configuration framework.
> Some things may need to be considered:
>  # config items and their default values may be defined in one class
>  # provide a mechanism that can extract some config items for a specific 
> component.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-hudi] lamber-ken opened a new pull request #1094: [HUDI-375] Refactor the configure framework of hudi project

2019-12-10 Thread GitBox
lamber-ken opened a new pull request #1094: [HUDI-375] Refactor the configure 
framework of hudi project
URL: https://github.com/apache/incubator-hudi/pull/1094
 
 
   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contributing.html before opening a 
pull request.*
   
   ## What is the purpose of the pull request
   
   Currently, many configuration items and their default values are dispersed 
in config files like HoodieWriteConfig. It's very confusing for developers, 
and it's easy for them to use an item in the wrong place, especially as more 
and more configuration items are added. If we can solve this, developers will 
benefit and the code structure will be more concise. 
   
   More details
   [1] https://issues.apache.org/jira/projects/HUDI/issues/HUDI-375
   [2] 
https://cwiki.apache.org/confluence/display/HUDI/RFC-11+%3A+Refactor+of+the+configuration+framework+of+hudi+project
   
   ## Brief change log
   
 - Add ConfigOption
 - Modify HoodieWriteConfig
 - Make graphite metrics reporter as a demo
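As a rough illustration of the first change-log item, a typed config-option holder could look like the following minimal sketch. The class and method names here are assumptions chosen for illustration, not necessarily the `ConfigOption` API this PR actually introduces:

```java
// Hypothetical sketch: pairs a config key with its default value in one
// place, instead of scattering them across classes.
public final class ConfigOption<T> {

    private final String key;
    private final T defaultValue;

    private ConfigOption(String key, T defaultValue) {
        this.key = key;
        this.defaultValue = defaultValue;
    }

    // Factory method keeps construction uniform at the definition site.
    public static <T> ConfigOption<T> key(String key, T defaultValue) {
        return new ConfigOption<>(key, defaultValue);
    }

    public String key() { return key; }

    public T defaultValue() { return defaultValue; }

    public static void main(String[] args) {
        // Hypothetical option name, for demonstration only.
        ConfigOption<Integer> parallelism =
            ConfigOption.key("hoodie.example.parallelism", 200);
        System.out.println(parallelism.key() + " -> " + parallelism.defaultValue());
        // prints "hoodie.example.parallelism -> 200"
    }
}
```

Grouping the key and default together like this lets a component extract exactly the options it needs, which is the mechanism the Jira asks for.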
   
   ## Verify this pull request
   
   This pull request is code cleanup without any test coverage.
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] bschell closed pull request #1052: [HUDI-326] Add new index to support global update/delete

2019-12-10 Thread GitBox
bschell closed pull request #1052: [HUDI-326] Add new index to support global 
update/delete
URL: https://github.com/apache/incubator-hudi/pull/1052
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Closed] (HUDI-387) Fix NPE when create savepoint via hudi-cli

2019-12-10 Thread lamber-ken (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lamber-ken closed HUDI-387.
---
Resolution: Fixed

Fixed via master: 24a09c775f6d4b855e2096001f5e034da80158a8

> Fix NPE when create savepoint via hudi-cli
> --
>
> Key: HUDI-387
> URL: https://issues.apache.org/jira/browse/HUDI-387
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>Reporter: lamber-ken
>Assignee: lamber-ken
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> When creating a savepoint via hudi-cli, an NPE is thrown
> {code:java}
> java.lang.NullPointerException
> at 
> org.apache.hudi.AbstractHoodieClient.(AbstractHoodieClient.java:68)
> at org.apache.hudi.HoodieWriteClient.(HoodieWriteClient.java:133)
> at org.apache.hudi.HoodieWriteClient.(HoodieWriteClient.java:128)
> at org.apache.hudi.HoodieWriteClient.(HoodieWriteClient.java:123)
> at 
> org.apache.hudi.cli.commands.SavepointsCommand.createHoodieClient(SavepointsCommand.java:143)
> at 
> org.apache.hudi.cli.commands.SavepointsCommand.savepoint(SavepointsCommand.java:98)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.springframework.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:216)
> at 
> org.springframework.shell.core.SimpleExecutionStrategy.invoke(SimpleExecutionStrategy.java:68)
> at 
> org.springframework.shell.core.SimpleExecutionStrategy.execute(SimpleExecutionStrategy.java:59)
> at 
> org.springframework.shell.core.AbstractShell.executeCommand(AbstractShell.java:134)
> at 
> org.springframework.shell.core.JLineShell.promptLoop(JLineShell.java:533)
> at org.springframework.shell.core.JLineShell.run(JLineShell.java:179)
> at java.lang.Thread.run(Thread.java:748)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-hudi] bvaradar commented on issue #1077: Improvements to DiskbasedMap

2019-12-10 Thread GitBox
bvaradar commented on issue #1077: Improvements to DiskbasedMap
URL: https://github.com/apache/incubator-hudi/pull/1077#issuecomment-564155359
 
 
   @nbalajee: Can you take a look at the comments and address them?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] vinothchandar commented on issue #1075: [HUDI-114]: added option to overwrite payload implementation in hoodie.properties file

2019-12-10 Thread GitBox
vinothchandar commented on issue #1075: [HUDI-114]: added option to overwrite 
payload implementation in hoodie.properties file
URL: https://github.com/apache/incubator-hudi/pull/1075#issuecomment-564153012
 
 
   Will take another pass! 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] vinothchandar commented on issue #1084: Hive Sync fails when table name is a keyword

2019-12-10 Thread GitBox
vinothchandar commented on issue #1084: Hive Sync fails when table name is a 
keyword
URL: https://github.com/apache/incubator-hudi/issues/1084#issuecomment-564152263
 
 
   Merged the fix. Closing


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] vinothchandar closed issue #1084: Hive Sync fails when table name is a keyword

2019-12-10 Thread GitBox
vinothchandar closed issue #1084: Hive Sync fails when table name is a keyword
URL: https://github.com/apache/incubator-hudi/issues/1084
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] vinothchandar commented on issue #1057: Hudi Test Suite

2019-12-10 Thread GitBox
vinothchandar commented on issue #1057: Hudi Test Suite
URL: https://github.com/apache/incubator-hudi/pull/1057#issuecomment-564147389
 
 
   @yanghua thanks! So I take it that HUDI-394 is what @n3nash owns via fixes 
you mentioned?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Commented] (HUDI-389) Updates sent to diff partition for a given key with Global Index

2019-12-10 Thread Vinoth Chandar (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16992762#comment-16992762
 ] 

Vinoth Chandar commented on HUDI-389:
-

Brandon and I can review. 

> Updates sent to diff partition for a given key with Global Index 
> -
>
> Key: HUDI-389
> URL: https://issues.apache.org/jira/browse/HUDI-389
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: Index
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.1
>
>   Original Estimate: 48h
>  Time Spent: 10m
>  Remaining Estimate: 47h 50m
>
> Updates sent to a different partition for a given key with Global Index 
> should succeed by updating the record under the original partition. As of 
> now, it throws an exception. 
> [https://github.com/apache/incubator-hudi/issues/1021] 
>  
>  
> error log:
> {code:java}
>  14738 [Executor task launch worker-0] INFO 
> com.uber.hoodie.common.table.timeline.HoodieActiveTimeline - Loaded instants 
> java.util.stream.ReferencePipeline$Head@d02b1c7
>  14738 [Executor task launch worker-0] INFO 
> com.uber.hoodie.common.table.view.AbstractTableFileSystemView - Building file 
> system view for partition (2016/04/15)
>  14738 [Executor task launch worker-0] INFO 
> com.uber.hoodie.common.table.view.AbstractTableFileSystemView - #files found 
> in partition (2016/04/15) =0, Time taken =0
>  14738 [Executor task launch worker-0] INFO 
> com.uber.hoodie.common.table.view.AbstractTableFileSystemView - 
> addFilesToView: NumFiles=0, FileGroupsCreationTime=0, StoreTimeTaken=0
>  14738 [Executor task launch worker-0] INFO 
> com.uber.hoodie.common.table.view.HoodieTableFileSystemView - Adding 
> file-groups for partition :2016/04/15, #FileGroups=0
>  14738 [Executor task launch worker-0] INFO 
> com.uber.hoodie.common.table.view.AbstractTableFileSystemView - Time to load 
> partition (2016/04/15) =0
>  14754 [Executor task launch worker-0] ERROR 
> com.uber.hoodie.table.HoodieCopyOnWriteTable - Error upserting bucketType 
> UPDATE for partition :0
>  java.util.NoSuchElementException: No value present
>  at com.uber.hoodie.common.util.Option.get(Option.java:112)
>  at com.uber.hoodie.io.HoodieMergeHandle.(HoodieMergeHandle.java:71)
>  at 
> com.uber.hoodie.table.HoodieCopyOnWriteTable.getUpdateHandle(HoodieCopyOnWriteTable.java:226)
>  at 
> com.uber.hoodie.table.HoodieCopyOnWriteTable.handleUpdate(HoodieCopyOnWriteTable.java:180)
>  at 
> com.uber.hoodie.table.HoodieCopyOnWriteTable.handleUpsertPartition(HoodieCopyOnWriteTable.java:263)
>  at 
> com.uber.hoodie.HoodieWriteClient.lambda$upsertRecordsInternal$7ef77fd$1(HoodieWriteClient.java:442)
>  at 
> org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:102)
>  at 
> org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:102)
>  at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$26.apply(RDD.scala:843)
>  at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$26.apply(RDD.scala:843)
>  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
>  at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
>  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
>  at org.apache.spark.rdd.RDD$$anonfun$8.apply(RDD.scala:336)
>  at org.apache.spark.rdd.RDD$$anonfun$8.apply(RDD.scala:334)
>  at 
> org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:973)
>  at 
> org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:948)
>  at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:888)
>  at 
> org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:948)
>  at 
> org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:694)
>  at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:334)
>  at org.apache.spark.rdd.RDD.iterator(RDD.scala:285)
>  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
>  at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
>  at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
>  at org.apache.spark.scheduler.Task.run(Task.scala:99)
>  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> 

[incubator-hudi] branch asf-site updated: [HUDI-380]: updated ide setup on contributing.html page (#1082)

2019-12-10 Thread vinoth
This is an automated email from the ASF dual-hosted git repository.

vinoth pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git


The following commit(s) were added to refs/heads/asf-site by this push:
 new 9e30add  [HUDI-380]: updated ide setup on contributing.html page 
(#1082)
9e30add is described below

commit 9e30add249bdadb6b94cf0ff0090c4eaac625d68
Author: Pratyaksh Sharma 
AuthorDate: Tue Dec 10 23:03:56 2019 +0530

[HUDI-380]: updated ide setup on contributing.html page (#1082)
---
 docs/contributing.md | 11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/docs/contributing.md b/docs/contributing.md
index 0320ef7..335c1cd 100644
--- a/docs/contributing.md
+++ b/docs/contributing.md
@@ -21,10 +21,13 @@ To contribute code, you need
 
 ## IDE Setup
 
-To contribute, you would need to fork the Hudi code on Github & then clone 
your own fork locally. Once cloned, we recommend building as per instructions 
on [quickstart](quickstart.html)
-
-We have embraced the code style largely based on [google 
format](https://google.github.io/styleguide/javaguide.html). Please setup your 
IDE with style files from 
[here](https://github.com/apache/incubator-hudi/tree/master/style).
-These instructions have been tested on IntelliJ. We also recommend setting up 
the [Save Action 
Plugin](https://plugins.jetbrains.com/plugin/7642-save-actions) to auto format 
& organize imports on save. The Maven Compilation life-cycle will fail if there 
are checkstyle violations.
+To contribute, you would need to do the following
+ 
+ - Fork the Hudi code on Github & then clone your own fork locally. Once 
cloned, we recommend building as per instructions on 
[quickstart](quickstart.html)
+ - [Recommended] We have embraced the code style largely based on [google 
format](https://google.github.io/styleguide/javaguide.html). Please setup your 
IDE with style files from 
[here](https://github.com/apache/incubator-hudi/tree/master/style).
+These instructions have been tested on IntelliJ. 
+ - [Recommended] Set up the [Save Action 
Plugin](https://plugins.jetbrains.com/plugin/7642-save-actions) to auto format 
& organize imports on save. The Maven Compilation life-cycle will fail if there 
are checkstyle violations.
+ - [Optional] If needed, add spark jars to the classpath of your module in 
Intellij by following the steps from 
[here](https://stackoverflow.com/questions/1051640/correct-way-to-add-external-jars-lib-jar-to-an-intellij-idea-project).
 
 
 
 ## Lifecycle



[GitHub] [incubator-hudi] vinothchandar merged pull request #1082: [HUDI-380]: updated ide setup on contributing.html page

2019-12-10 Thread GitBox
vinothchandar merged pull request #1082: [HUDI-380]: updated ide setup on 
contributing.html page
URL: https://github.com/apache/incubator-hudi/pull/1082
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] vinothchandar edited a comment on issue #1052: [HUDI-326] Add new index to support global update/delete

2019-12-10 Thread GitBox
vinothchandar edited a comment on issue #1052: [HUDI-326] Add new index to 
support global update/delete
URL: https://github.com/apache/incubator-hudi/pull/1052#issuecomment-564102891
 
 
   Okay saw your comment here. https://issues.apache.org/jira/browse/HUDI-389 . 
I will let you clarify if you think we need both the PRs


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] vinothchandar commented on issue #1052: [HUDI-326] Add new index to support global update/delete

2019-12-10 Thread GitBox
vinothchandar commented on issue #1052: [HUDI-326] Add new index to support 
global update/delete
URL: https://github.com/apache/incubator-hudi/pull/1052#issuecomment-564102891
 
 
   Okay saw your comment here. https://issues.apache.org/jira/browse/HUDI-389 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[incubator-hudi] branch master updated (8df4b83 -> 24a09c7)

2019-12-10 Thread vinoth
This is an automated email from the ASF dual-hosted git repository.

vinoth pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git.


from 8df4b83  [HUDI-390] Add backtick character in hive queries to support 
hive identifier as tablename (#1090)
 add 24a09c7  [HUDI-387] Fix NPE when create savepoint via hudi-cli (#1085)

No new revisions were added by this update.

Summary of changes:
 .../org/apache/hudi/cli/commands/SavepointsCommand.java   | 15 ++-
 .../main/java/org/apache/hudi/cli/utils/SparkUtil.java|  6 +-
 2 files changed, 15 insertions(+), 6 deletions(-)



[GitHub] [incubator-hudi] vinothchandar merged pull request #1085: [HUDI-387] Fix NPE when create savepoint via hudi-cli

2019-12-10 Thread GitBox
vinothchandar merged pull request #1085: [HUDI-387] Fix NPE when create 
savepoint via hudi-cli
URL: https://github.com/apache/incubator-hudi/pull/1085
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] vinothchandar commented on issue #1052: [HUDI-326] Add new index to support global update/delete

2019-12-10 Thread GitBox
vinothchandar commented on issue #1052: [HUDI-326] Add new index to support 
global update/delete
URL: https://github.com/apache/incubator-hudi/pull/1052#issuecomment-564101116
 
 
   @bschell help review his PR? :) .. and close this one? 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] vinothchandar merged pull request #1090: [HUDI-390] Add backtick character in hive queries to support hive identifier as tablename

2019-12-10 Thread GitBox
vinothchandar merged pull request #1090: [HUDI-390] Add backtick character in 
hive queries to support hive identifier as tablename
URL: https://github.com/apache/incubator-hudi/pull/1090
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[incubator-hudi] branch master updated (d447e2d -> 8df4b83)

2019-12-10 Thread vinoth
This is an automated email from the ASF dual-hosted git repository.

vinoth pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git.


from d447e2d  [checkstyle] Unify LOG form (#1092)
 add 8df4b83  [HUDI-390] Add backtick character in hive queries to support 
hive identifier as tablename (#1090)

No new revisions were added by this update.

Summary of changes:
 .../java/org/apache/hudi/hive/HoodieHiveClient.java | 17 +++--
 .../main/java/org/apache/hudi/hive/util/SchemaUtil.java |  4 +++-
 2 files changed, 14 insertions(+), 7 deletions(-)



[GitHub] [incubator-hudi] vinothchandar commented on issue #1090: [HUDI-390] Add backtick character in hive queries to support hive identifier as tablename

2019-12-10 Thread GitBox
vinothchandar commented on issue #1090: [HUDI-390] Add backtick character in 
hive queries to support hive identifier as tablename
URL: https://github.com/apache/incubator-hudi/pull/1090#issuecomment-564093259
 
 
   thanks for reviewing @leesf :) 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] lamber-ken commented on a change in pull request #1093: [MINOR] replace scala map add operator

2019-12-10 Thread GitBox
lamber-ken commented on a change in pull request #1093: [MINOR] replace scala 
map add operator
URL: https://github.com/apache/incubator-hudi/pull/1093#discussion_r356016830
 
 

 ##
 File path: hudi-spark/src/main/scala/org/apache/hudi/DefaultSource.scala
 ##
 @@ -50,7 +50,7 @@ class DefaultSource extends RelationProvider
   optParams: Map[String, String],
   schema: StructType): BaseRelation = {
 // Add default options for unspecified read options keys.
-val parameters = Map(VIEW_TYPE_OPT_KEY -> DEFAULT_VIEW_TYPE_OPT_VAL) ++: 
optParams
+val parameters = Map(VIEW_TYPE_OPT_KEY -> DEFAULT_VIEW_TYPE_OPT_VAL) ++ 
optParams
 
 Review comment:
   hi, @yanghua, can you help to verify whether this works fine in your env?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services




[GitHub] [incubator-hudi] nisheet195 commented on a change in pull request #1090: [HUDI-390] Add backtick character in hive queries to support hive identifier as table name

2019-12-10 Thread GitBox
nisheet195 commented on a change in pull request #1090: [HUDI-390] Add backtick 
character in hive queries to support hive identifier as table name
URL: https://github.com/apache/incubator-hudi/pull/1090#discussion_r356014371
 
 

 ##
 File path: hudi-hive/src/main/java/org/apache/hudi/hive/util/SchemaUtil.java
 ##
 @@ -402,7 +402,7 @@ public static String generateCreateDDL(MessageType 
storageSchema, HiveSyncConfig
 
 String partitionsStr = 
partitionFields.stream().collect(Collectors.joining(","));
 StringBuilder sb = new StringBuilder("CREATE EXTERNAL TABLE  IF NOT EXISTS 
");
-sb = sb.append(config.databaseName).append(".").append(config.tableName);
+sb = 
sb.append("`").append(config.databaseName).append("`.`").append(config.tableName).append("`");
 
 Review comment:
   Moved it to a variable




[GitHub] [incubator-hudi] pratyakshsharma commented on a change in pull request #1050: [HUDI-368]: code clean up in TestAsyncCompaction class

2019-12-10 Thread GitBox
pratyakshsharma commented on a change in pull request #1050: [HUDI-368]: code 
clean up in TestAsyncCompaction class
URL: https://github.com/apache/incubator-hudi/pull/1050#discussion_r356001634
 
 

 ##
 File path: hudi-client/src/test/java/org/apache/hudi/TestAsyncCompaction.java
 ##
 @@ -248,8 +238,7 @@ public void testScheduleIngestionBeforePendingCompaction() 
throws Exception {
 metaClient = new HoodieTableMetaClient(jsc.hadoopConfiguration(), 
cfg.getBasePath());
 HoodieInstant pendingCompactionInstant =
 
metaClient.getActiveTimeline().filterPendingCompactionTimeline().firstInstant().get();
-assertTrue("Pending Compaction instant has expected instant time",
-pendingCompactionInstant.getTimestamp().equals(compactionInstantTime));
+assertEquals("Pending Compaction instant has expected instant time", 
pendingCompactionInstant.getTimestamp(), compactionInstantTime);
 
 
 Review comment:
   done.




[GitHub] [incubator-hudi] pratyakshsharma commented on a change in pull request #1050: [HUDI-368]: code clean up in TestAsyncCompaction class

2019-12-10 Thread GitBox
pratyakshsharma commented on a change in pull request #1050: [HUDI-368]: code 
clean up in TestAsyncCompaction class
URL: https://github.com/apache/incubator-hudi/pull/1050#discussion_r356001481
 
 

 ##
 File path: hudi-client/src/test/java/org/apache/hudi/TestAsyncCompaction.java
 ##
 @@ -552,16 +535,14 @@ private void executeCompaction(String 
compactionInstantTime, HoodieWriteClient c
 FileStatus[] allFiles = 
HoodieTestUtils.listAllDataFilesInPath(table.getMetaClient().getFs(), 
cfg.getBasePath());
 HoodieTableFileSystemView view =
 new HoodieTableFileSystemView(table.getMetaClient(), 
table.getCompletedCommitsTimeline(), allFiles);
-List dataFilesToRead = 
view.getLatestDataFiles().collect(Collectors.toList());
-return dataFilesToRead;
+return view.getLatestDataFiles().collect(Collectors.toList());
   }
 
   private List getCurrentLatestFileSlices(HoodieTable table, 
HoodieWriteConfig cfg) throws IOException {
 
 Review comment:
   done




[GitHub] [incubator-hudi] leesf commented on a change in pull request #1093: [MINOR] replace scala map add operator

2019-12-10 Thread GitBox
leesf commented on a change in pull request #1093: [MINOR] replace scala map 
add operator
URL: https://github.com/apache/incubator-hudi/pull/1093#discussion_r355995018
 
 

 ##
 File path: hudi-spark/src/main/scala/org/apache/hudi/DefaultSource.scala
 ##
 @@ -50,7 +50,7 @@ class DefaultSource extends RelationProvider
   optParams: Map[String, String],
   schema: StructType): BaseRelation = {
 // Add default options for unspecified read options keys.
-val parameters = Map(VIEW_TYPE_OPT_KEY -> DEFAULT_VIEW_TYPE_OPT_VAL) ++: 
optParams
+val parameters = Map(VIEW_TYPE_OPT_KEY -> DEFAULT_VIEW_TYPE_OPT_VAL) ++ 
optParams
 
 Review comment:
   `++:` works fine in my IDEA, did you install scala lib?




[GitHub] [incubator-hudi] leesf commented on a change in pull request #1050: [HUDI-368]: code clean up in TestAsyncCompaction class

2019-12-10 Thread GitBox
leesf commented on a change in pull request #1050: [HUDI-368]: code clean up in 
TestAsyncCompaction class
URL: https://github.com/apache/incubator-hudi/pull/1050#discussion_r355992635
 
 

 ##
 File path: hudi-client/src/test/java/org/apache/hudi/TestAsyncCompaction.java
 ##
 @@ -552,16 +535,14 @@ private void executeCompaction(String 
compactionInstantTime, HoodieWriteClient c
 FileStatus[] allFiles = 
HoodieTestUtils.listAllDataFilesInPath(table.getMetaClient().getFs(), 
cfg.getBasePath());
 HoodieTableFileSystemView view =
 new HoodieTableFileSystemView(table.getMetaClient(), 
table.getCompletedCommitsTimeline(), allFiles);
-List dataFilesToRead = 
view.getLatestDataFiles().collect(Collectors.toList());
-return dataFilesToRead;
+return view.getLatestDataFiles().collect(Collectors.toList());
   }
 
   private List getCurrentLatestFileSlices(HoodieTable table, 
HoodieWriteConfig cfg) throws IOException {
 
 Review comment:
   remove throws IOException? along with `moveCompactionFromRequestedToInflight`





[GitHub] [incubator-hudi] leesf commented on a change in pull request #1050: [HUDI-368]: code clean up in TestAsyncCompaction class

2019-12-10 Thread GitBox
leesf commented on a change in pull request #1050: [HUDI-368]: code clean up in 
TestAsyncCompaction class
URL: https://github.com/apache/incubator-hudi/pull/1050#discussion_r355991790
 
 

 ##
 File path: hudi-client/src/test/java/org/apache/hudi/TestAsyncCompaction.java
 ##
 @@ -248,8 +238,7 @@ public void testScheduleIngestionBeforePendingCompaction() 
throws Exception {
 metaClient = new HoodieTableMetaClient(jsc.hadoopConfiguration(), 
cfg.getBasePath());
 HoodieInstant pendingCompactionInstant =
 
metaClient.getActiveTimeline().filterPendingCompactionTimeline().firstInstant().get();
-assertTrue("Pending Compaction instant has expected instant time",
-pendingCompactionInstant.getTimestamp().equals(compactionInstantTime));
+assertEquals("Pending Compaction instant has expected instant time", 
pendingCompactionInstant.getTimestamp(), compactionInstantTime);
 
 
 Review comment:
   line 246 `HoodieTableMetaClient metaClient = new 
HoodieTableMetaClient(jsc.hadoopConfiguration(), cfg.getBasePath());` would be 
removed too?


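The motivation for the change shown in the diff above, replacing `assertTrue(msg, a.equals(b))` with `assertEquals(msg, a, b)`, is the failure message: assertEquals reports both operands. The sketch below uses minimal JUnit-style stand-ins (the real methods live in `org.junit.Assert`) to show the difference.

```java
public class AssertDemo {
    // Stand-in for assertTrue: on failure, the operand values are lost.
    static void assertTrue(String msg, boolean cond) {
        if (!cond) throw new AssertionError(msg);
    }

    // Stand-in for assertEquals: on failure, both values appear in the message.
    static void assertEquals(String msg, Object expected, Object actual) {
        if (!expected.equals(actual))
            throw new AssertionError(msg + " expected:<" + expected + "> but was:<" + actual + ">");
    }

    public static void main(String[] args) {
        try {
            assertEquals("Pending Compaction instant has expected instant time",
                "20191210155859", "20191210160000");
        } catch (AssertionError e) {
            System.out.println(e.getMessage()); // both instant times appear in the failure
        }
    }
}
```

With `assertTrue` the same failure would only print the message string, leaving the reader to re-run the test under a debugger to see the mismatched timestamps.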


[incubator-hudi] branch master updated (70a1040 -> d447e2d)

2019-12-10 Thread leesf
This is an automated email from the ASF dual-hosted git repository.

leesf pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git.


from 70a1040  [MINOR] Beautify the cli banner (#1089)
 add d447e2d  [checkstyle] Unify LOG form (#1092)

No new revisions were added by this update.

Summary of changes:
 .../hudi/cli/commands/CompactionCommand.java   |  4 +-
 .../cli/commands/HDFSParquetImportCommand.java |  2 +-
 .../org/apache/hudi/cli/commands/SparkMain.java|  2 +-
 .../apache/hudi/cli/utils/InputStreamConsumer.java |  2 +-
 .../java/org/apache/hudi/cli/utils/SparkUtil.java  |  2 +-
 .../org/apache/hudi/CompactionAdminClient.java | 18 +++
 .../java/org/apache/hudi/HoodieCleanClient.java| 16 +++---
 .../java/org/apache/hudi/HoodieWriteClient.java| 62 +++---
 .../client/embedded/EmbeddedTimelineService.java   | 12 ++---
 .../bloom/BucketizedBloomCheckPartitioner.java | 10 ++--
 .../apache/hudi/index/bloom/HoodieBloomIndex.java  | 12 ++---
 .../hbase/DefaultHBaseQPSResourceAllocator.java|  4 +-
 .../org/apache/hudi/index/hbase/HBaseIndex.java| 44 +++
 .../org/apache/hudi/io/HoodieAppendHandle.java | 14 ++---
 .../java/org/apache/hudi/io/HoodieCleanHelper.java | 10 ++--
 .../org/apache/hudi/io/HoodieCommitArchiveLog.java | 18 +++
 .../org/apache/hudi/io/HoodieCreateHandle.java | 10 ++--
 .../org/apache/hudi/io/HoodieKeyLookupHandle.java  | 20 +++
 .../java/org/apache/hudi/io/HoodieMergeHandle.java | 22 
 .../java/org/apache/hudi/io/HoodieWriteHandle.java |  6 +--
 .../io/compact/HoodieRealtimeTableCompactor.java   | 22 
 .../org/apache/hudi/metrics/HoodieMetrics.java | 10 ++--
 .../apache/hudi/metrics/JmxMetricsReporter.java|  4 +-
 .../main/java/org/apache/hudi/metrics/Metrics.java |  4 +-
 .../hudi/metrics/MetricsGraphiteReporter.java  |  6 +--
 .../hudi/metrics/MetricsReporterFactory.java   |  4 +-
 .../apache/hudi/table/HoodieCopyOnWriteTable.java  | 48 -
 .../apache/hudi/table/HoodieMergeOnReadTable.java  | 22 
 .../java/org/apache/hudi/table/HoodieTable.java| 10 ++--
 .../org/apache/hudi/table/RollbackExecutor.java| 10 ++--
 hudi-client/src/test/java/HoodieClientExample.java |  8 +--
 .../src/test/java/org/apache/hudi/TestCleaner.java |  4 +-
 .../org/apache/hudi/TestCompactionAdminClient.java |  6 ++-
 .../java/org/apache/hudi/TestHoodieClientBase.java |  2 +-
 .../hudi/TestHoodieClientOnCopyOnWriteStorage.java |  6 ++-
 .../src/test/java/org/apache/hudi/TestMultiFS.java | 10 ++--
 .../apache/hudi/common/HoodieClientTestUtils.java  |  2 +-
 .../apache/hudi/table/TestCopyOnWriteTable.java|  4 +-
 .../hudi/common/model/HoodieCommitMetadata.java|  4 +-
 .../hudi/common/model/HoodiePartitionMetadata.java |  6 +--
 .../common/model/HoodieRollingStatMetadata.java|  4 +-
 .../hudi/common/table/HoodieTableConfig.java   |  2 +-
 .../hudi/common/table/HoodieTableMetaClient.java   |  2 +-
 .../table/timeline/HoodieActiveTimeline.java   |  2 +-
 .../table/timeline/HoodieArchivedTimeline.java |  2 +-
 .../table/timeline/HoodieDefaultTimeline.java  |  2 +-
 .../table/view/AbstractTableFileSystemView.java| 16 +++---
 .../common/table/view/FileSystemViewManager.java   | 20 +++
 .../table/view/HoodieTableFileSystemView.java  |  4 +-
 .../IncrementalTimelineSyncFileSystemView.java | 42 +++
 .../table/view/PriorityBasedFileSystemView.java| 18 +++
 .../view/RemoteHoodieTableFileSystemView.java  |  4 +-
 .../table/view/RocksDbBasedFileSystemView.java | 24 -
 .../view/SpillableMapBasedFileSystemView.java  |  6 +--
 .../common/util/DFSPropertiesConfiguration.java|  4 +-
 .../hudi/common/util/FailSafeConsistencyGuard.java |  2 +-
 .../common/util/HoodieRecordSizeEstimator.java |  4 +-
 .../org/apache/hudi/common/util/RocksDBDAO.java|  2 +-
 .../hudi/common/util/TimelineDiffHelper.java   |  6 +--
 .../common/util/queue/BoundedInMemoryExecutor.java | 10 ++--
 .../common/util/queue/BoundedInMemoryQueue.java|  4 +-
 .../table/view/TestHoodieTableFileSystemView.java  |  2 +-
 .../table/view/TestIncrementalFSViewSync.java  |  2 +-
 .../hudi/hadoop/HoodieParquetInputFormat.java  |  2 +-
 .../hudi/hadoop/HoodieROTablePathFilter.java   |  2 +-
 .../hudi/hadoop/RecordReaderValueIterator.java |  2 +-
 .../realtime/HoodieParquetRealtimeInputFormat.java |  2 +-
 .../realtime/HoodieRealtimeRecordReader.java   |  2 +-
 .../java/org/apache/hudi/hive/HiveSyncTool.java|  2 +-
 .../org/apache/hudi/hive/HoodieHiveClient.java |  2 +-
 .../org/apache/hudi/hive/util/HiveTestService.java |  2 +-
 hudi-spark/src/test/java/HoodieJavaApp.java| 12 ++---
 .../src/test/java/HoodieJavaStreamingApp.java  | 18 +++
 .../hudi/timeline/service/TimelineService.java |  8 +--
 

[GitHub] [incubator-hudi] leesf merged pull request #1092: [checkstyle] Unify LOG form

2019-12-10 Thread GitBox
leesf merged pull request #1092: [checkstyle] Unify LOG form
URL: https://github.com/apache/incubator-hudi/pull/1092
 
 
   




[GitHub] [incubator-hudi] pratyakshsharma commented on issue #1082: [HUDI-380]: updated ide setup on contributing.html page

2019-12-10 Thread GitBox
pratyakshsharma commented on issue #1082: [HUDI-380]: updated ide setup on 
contributing.html page
URL: https://github.com/apache/incubator-hudi/pull/1082#issuecomment-563939322
 
 
   @vinothchandar comments are taken care of. Please review again. 




[GitHub] [incubator-hudi] pratyakshsharma commented on a change in pull request #1082: [HUDI-380]: updated ide setup on contributing.html page

2019-12-10 Thread GitBox
pratyakshsharma commented on a change in pull request #1082: [HUDI-380]: 
updated ide setup on contributing.html page
URL: https://github.com/apache/incubator-hudi/pull/1082#discussion_r355915096
 
 

 ##
 File path: docs/contributing.md
 ##
 @@ -25,6 +25,7 @@ To contribute, you would need to fork the Hudi code on 
Github & then clone your
 
 We have embraced the code style largely based on [google 
format](https://google.github.io/styleguide/javaguide.html). Please setup your 
IDE with style files from 
[here](https://github.com/apache/incubator-hudi/tree/master/style).
 These instructions have been tested on IntelliJ. We also recommend setting up 
the [Save Action 
Plugin](https://plugins.jetbrains.com/plugin/7642-save-actions) to auto format 
& organize imports on save. The Maven Compilation life-cycle will fail if there 
are checkstyle violations.
If you face jetty version-related issues while running test cases, we 
recommend adding the spark jars to the classpath of your module in IntelliJ. 
 
 Review comment:
   Done with the changes @vinothchandar 




[GitHub] [incubator-hudi] leesf commented on a change in pull request #1090: [HUDI-390] Add backtick character in hive queries to support hive identifier as table name

2019-12-10 Thread GitBox
leesf commented on a change in pull request #1090: [HUDI-390] Add backtick 
character in hive queries to support hive identifier as table name
URL: https://github.com/apache/incubator-hudi/pull/1090#discussion_r355912206
 
 

 ##
 File path: hudi-hive/src/main/java/org/apache/hudi/hive/util/SchemaUtil.java
 ##
 @@ -402,7 +402,7 @@ public static String generateCreateDDL(MessageType 
storageSchema, HiveSyncConfig
 
 String partitionsStr = 
partitionFields.stream().collect(Collectors.joining(","));
 StringBuilder sb = new StringBuilder("CREATE EXTERNAL TABLE  IF NOT EXISTS 
");
-sb = sb.append(config.databaseName).append(".").append(config.tableName);
+sb = 
sb.append("`").append(config.databaseName).append("`.`").append(config.tableName).append("`");
 
 Review comment:
   Could we define \` as a variable, and use it in HoodieHiveClient?


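As a sketch of the suggestion above: hoist the backtick into a shared constant and build the quoted `` `db`.`table` `` string in one place, so both SchemaUtil and HoodieHiveClient can reuse it. Names here are illustrative, not Hudi's actual API.

```java
public class DdlQuoting {
    // Hypothetical shared constant for the Hive identifier-quoting character.
    static final String HIVE_ESCAPE = "`";

    // Build a backtick-quoted db.table reference, e.g. `default`.`2019_orders`.
    static String qualifiedTableName(String db, String table) {
        return HIVE_ESCAPE + db + HIVE_ESCAPE + "." + HIVE_ESCAPE + table + HIVE_ESCAPE;
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder("CREATE EXTERNAL TABLE IF NOT EXISTS ");
        // An identifier starting with a digit is only legal in Hive when quoted:
        sb.append(qualifiedTableName("default", "2019_orders"));
        System.out.println(sb); // prints: CREATE EXTERNAL TABLE IF NOT EXISTS `default`.`2019_orders`
    }
}
```

Centralizing the escape character also makes it easy to switch quoting styles later without hunting through string concatenations.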


[jira] [Updated] (HUDI-397) Normalize log print statement

2019-12-10 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang updated HUDI-397:
--
Component/s: Testing

> Normalize log print statement
> -
>
> Key: HUDI-397
> URL: https://issues.apache.org/jira/browse/HUDI-397
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>  Components: Testing
>Reporter: vinoyang
>Priority: Major
>
> In the test suite module, there are many logging statements that look like this 
> pattern:
> {code:java}
> log.info(String.format("- inserting input data %s 
> --", this.getName()));
> {code}
> IMO, it's not a good design. We need to refactor it.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
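One common fix for the pattern quoted above is to defer the `String.format` call until the log level is known to be enabled, which is what SLF4J/Log4j2 parameterized logging does under the hood. A minimal sketch with a stand-in logger (not Hudi's actual logging setup) shows the difference.

```java
import java.util.function.Supplier;

public class LazyLogDemo {
    static boolean infoEnabled = false;

    // Eager: the message string is built whether or not it is logged.
    static void infoEager(String msg) {
        if (infoEnabled) System.out.println(msg);
    }

    // Lazy: the Supplier runs only when the level is enabled,
    // mirroring parameterized logging in SLF4J/Log4j2.
    static void infoLazy(Supplier<String> msg) {
        if (infoEnabled) System.out.println(msg.get());
    }

    public static void main(String[] args) {
        String name = "insert_step";
        infoEager(String.format("----- inserting input data %s -----", name)); // formatted, then dropped
        infoLazy(() -> String.format("----- inserting input data %s -----", name)); // never formatted
        System.out.println("done"); // only "done" is printed: INFO is disabled above
    }
}
```

With INFO disabled, the eager call still pays the formatting cost on every invocation; the lazy form skips it entirely.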


[jira] [Created] (HUDI-397) Normalize log print statement

2019-12-10 Thread vinoyang (Jira)
vinoyang created HUDI-397:
-

 Summary: Normalize log print statement
 Key: HUDI-397
 URL: https://issues.apache.org/jira/browse/HUDI-397
 Project: Apache Hudi (incubating)
  Issue Type: Sub-task
Reporter: vinoyang


In the test suite module, there are many logging statements that look like this pattern:
{code:java}
log.info(String.format("- inserting input data %s 
--", this.getName()));
{code}
IMO, it's not a good design. We need to refactor it.

 

 





[jira] [Created] (HUDI-396) Provide documentation to describe how to use the test suite

2019-12-10 Thread vinoyang (Jira)
vinoyang created HUDI-396:
-

 Summary: Provide documentation to describe how to use the test suite
 Key: HUDI-396
 URL: https://issues.apache.org/jira/browse/HUDI-396
 Project: Apache Hudi (incubating)
  Issue Type: Sub-task
  Components: Docs
Reporter: vinoyang





