[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1200: [HUDI-514] A schema provider to get metadata through Jdbc

2020-01-13 Thread GitBox
vinothchandar commented on a change in pull request #1200: [HUDI-514] A schema 
provider to get metadata through Jdbc
URL: https://github.com/apache/incubator-hudi/pull/1200#discussion_r366183041
 
 

 ##
 File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/UtilHelpers.java
 ##
 @@ -236,4 +250,57 @@ public static TypedProperties readConfig(InputStream in) throws IOException {
     defaults.load(in);
     return defaults;
   }
+
+  /***
+   * call spark function get the schema through jdbc.
+   * @param options
+   * @return
+   * @throws Exception
+   */
+  public static Schema getSchema(Map<String, String> options) throws Exception {
+    scala.collection.immutable.Map<String, String> ioptions = toScalaImmutableMap(options);
+    JDBCOptions jdbcOptions = new JDBCOptions(ioptions);
+    Connection conn = JdbcUtils.createConnectionFactory(jdbcOptions).apply();
+    String url = jdbcOptions.url();
+    String table = jdbcOptions.tableOrQuery();
+    JdbcOptionsInWrite jdbcOptionsInWrite = new JdbcOptionsInWrite(ioptions);
+    boolean tableExists = JdbcUtils.tableExists(conn, jdbcOptionsInWrite);
+    if (tableExists) {
+      JdbcDialect dialect = JdbcDialects.get(url);
+      try {
+        PreparedStatement statement = conn.prepareStatement(dialect.getSchemaQuery(table));
+        try {
+          statement.setQueryTimeout(Integer.parseInt(options.get("timeout")));
+          ResultSet rs = statement.executeQuery();
+          try {
+            StructType structType;
+            if (Boolean.parseBoolean(ioptions.get("nullable").get())) {
+              structType = JdbcUtils.getSchema(rs, dialect, true);
+            } else {
+              structType = JdbcUtils.getSchema(rs, dialect, false);
+            }
+            return AvroConversionUtils.convertStructTypeToAvroSchema(structType, table, "hoodie." + table);
+          } finally {
+            rs.close();
+          }
+        } finally {
+          statement.close();
+        }
+      } finally {
+        conn.close();
+      }
+    } else {
+      throw new HoodieException(String.format("%s table not exists!", table));
 
 Review comment:
   change to `table does not exist!`? 
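
   For context, a minimal usage sketch of the `getSchema` method quoted above. Option values are illustrative, not from the PR; `url`/`dbtable`/`user`/`password` follow Spark's JDBC source options, while `timeout` and `nullable` are the extra keys this method itself reads:
   ```java
   // Illustrative values only, not from the PR.
   Map<String, String> options = new HashMap<>();
   options.put("url", "jdbc:mysql://localhost:3306/test"); // Spark JDBC option
   options.put("dbtable", "test.trips");                   // table whose schema we want
   options.put("user", "hudi");
   options.put("password", "hudi");
   options.put("timeout", "60");    // read by getSchema for setQueryTimeout
   options.put("nullable", "true"); // read by getSchema when building the StructType
   Schema avroSchema = UtilHelpers.getSchema(options);
   ```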


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1200: [HUDI-514] A schema provider to get metadata through Jdbc

2020-01-13 Thread GitBox
vinothchandar commented on a change in pull request #1200: [HUDI-514] A schema 
provider to get metadata through Jdbc
URL: https://github.com/apache/incubator-hudi/pull/1200#discussion_r366183326
 
 

 ##
 File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/UtilHelpers.java
 ##
 @@ -236,4 +250,57 @@ public static TypedProperties readConfig(InputStream in) throws IOException {
     defaults.load(in);
     return defaults;
   }
+
+  /***
+   * call spark function get the schema through jdbc.
+   * @param options
+   * @return
+   * @throws Exception
+   */
+  public static Schema getSchema(Map<String, String> options) throws Exception {
+    scala.collection.immutable.Map<String, String> ioptions = toScalaImmutableMap(options);
+    JDBCOptions jdbcOptions = new JDBCOptions(ioptions);
+    Connection conn = JdbcUtils.createConnectionFactory(jdbcOptions).apply();
+    String url = jdbcOptions.url();
+    String table = jdbcOptions.tableOrQuery();
+    JdbcOptionsInWrite jdbcOptionsInWrite = new JdbcOptionsInWrite(ioptions);
+    boolean tableExists = JdbcUtils.tableExists(conn, jdbcOptionsInWrite);
+    if (tableExists) {
+      JdbcDialect dialect = JdbcDialects.get(url);
+      try {
+        PreparedStatement statement = conn.prepareStatement(dialect.getSchemaQuery(table));
+        try {
+          statement.setQueryTimeout(Integer.parseInt(options.get("timeout")));
+          ResultSet rs = statement.executeQuery();
+          try {
+            StructType structType;
+            if (Boolean.parseBoolean(ioptions.get("nullable").get())) {
+              structType = JdbcUtils.getSchema(rs, dialect, true);
+            } else {
+              structType = JdbcUtils.getSchema(rs, dialect, false);
+            }
+            return AvroConversionUtils.convertStructTypeToAvroSchema(structType, table, "hoodie." + table);
+          } finally {
+            rs.close();
+          }
+        } finally {
+          statement.close();
+        }
+      } finally {
+        conn.close();
+      }
+    } else {
+      throw new HoodieException(String.format("%s table not exists!", table));
+    }
+  }
+
+  @SuppressWarnings("unchecked")
+  private static <K, V> scala.collection.immutable.Map<K, V> toScalaImmutableMap(java.util.Map<K, V> javaMap) {
 
 Review comment:
   import the java collection classes? `Map`, `List`, `ArrayList`? 
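
   The method body is cut off in the hunk above; for reference, a plausible sketch of such a Java-to-Scala map conversion (an assumption, not necessarily the PR's actual body):
   ```java
   // Sketch: fold the Java entries into an immutable Scala map.
   @SuppressWarnings("unchecked")
   private static <K, V> scala.collection.immutable.Map<K, V> toScalaImmutableMap(java.util.Map<K, V> javaMap) {
     scala.collection.immutable.Map<K, V> scalaMap = scala.collection.immutable.HashMap$.MODULE$.empty();
     for (java.util.Map.Entry<K, V> entry : javaMap.entrySet()) {
       scalaMap = scalaMap.$plus(new scala.Tuple2<>(entry.getKey(), entry.getValue()));
     }
     return scalaMap;
   }
   ```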




[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1200: [HUDI-514] A schema provider to get metadata through Jdbc

2020-01-13 Thread GitBox
vinothchandar commented on a change in pull request #1200: [HUDI-514] A schema 
provider to get metadata through Jdbc
URL: https://github.com/apache/incubator-hudi/pull/1200#discussion_r366184290
 
 

 ##
 File path: hudi-utilities/src/test/resources/delta-streamer-config/source-jdbc.avsc
 ##
 @@ -0,0 +1,59 @@
+/*
 
 Review comment:
   any reason why the existing `source.avsc` won't work for you? We'd like to avoid 
creating a new schema if possible 




[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1200: [HUDI-514] A schema provider to get metadata through Jdbc

2020-01-13 Thread GitBox
vinothchandar commented on a change in pull request #1200: [HUDI-514] A schema 
provider to get metadata through Jdbc
URL: https://github.com/apache/incubator-hudi/pull/1200#discussion_r366182135
 
 

 ##
 File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/UtilHelpers.java
 ##
 @@ -236,4 +250,57 @@ public static TypedProperties readConfig(InputStream in) throws IOException {
     defaults.load(in);
     return defaults;
   }
+
+  /***
+   * call spark function get the schema through jdbc.
+   * @param options
+   * @return
+   * @throws Exception
+   */
+  public static Schema getSchema(Map<String, String> options) throws Exception {
 
 Review comment:
   rename to `getJDBCSchema`? 




[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1200: [HUDI-514] A schema provider to get metadata through Jdbc

2020-01-13 Thread GitBox
vinothchandar commented on a change in pull request #1200: [HUDI-514] A schema 
provider to get metadata through Jdbc
URL: https://github.com/apache/incubator-hudi/pull/1200#discussion_r366183554
 
 

 ##
 File path: hudi-utilities/src/test/java/org/apache/hudi/utilities/TestHoodieDeltaStreamer.java
 ##
 @@ -511,6 +524,22 @@ public void testNullSchemaProvider() throws Exception {
     }
   }
 
+  @Test
+  public void testJdbcbasedSchemaProvider() throws Exception {
 
 Review comment:
   can we create a separate test class for this? given you are only testing the 
schema provider?




[GitHub] [incubator-hudi] zhedoubushishi opened a new pull request #1226: [HUDI-238] Make Hudi support Scala 2.12

2020-01-13 Thread GitBox
zhedoubushishi opened a new pull request #1226: [HUDI-238] Make Hudi support 
Scala 2.12
URL: https://github.com/apache/incubator-hudi/pull/1226
 
 
   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contributing.html before opening a 
pull request.*
   
   ## What is the purpose of the pull request
   
   *(For example: This pull request adds quick-start document.)*
   
   ## Brief change log
   
   Most ideas of this PR are pretty similar to https://github.com/apache/incubator-hudi/pull/1109.
   
   This PR is also compatible with Scala 2.12. You can build it with:
   ```mvn clean install -Dscala.version=2.12.10 -Dscala.binary.version=2.12```
   
   Here are some major differences from https://github.com/apache/incubator-hudi/pull/1109:
   - updated kafka-source.properties & kafka-source.properties.
   - The parameter ```ConsumerConfig.GROUP_ID_CONFIG``` is defined in ```TestKafkaSource.java``` rather than in ```KafkaOffsetGen.java```, because this config should be decided by the client side, not the Hudi side.
   - For ```AvroKafkaSource.java```, ```KafkaAvroDeserializer.class``` needs to be set:
   ```
   props.put("key.deserializer", StringDeserializer.class);
   props.put("value.deserializer", KafkaAvroDeserializer.class);
   ```
   ## Verify this pull request
   
   This pull request is already covered by existing tests.
   
   
   ## Committer checklist
   
- [x] Has a corresponding JIRA in PR title & commit

- [x] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.




[GitHub] [incubator-hudi] OpenOpened commented on issue #1200: [HUDI-514] A schema provider to get metadata through Jdbc

2020-01-13 Thread GitBox
OpenOpened commented on issue #1200: [HUDI-514] A schema provider to get 
metadata through Jdbc
URL: https://github.com/apache/incubator-hudi/pull/1200#issuecomment-574033535
 
 
   @vinothchandar 
   Please review. The main changes:
   1. Added test cases
   2. Implemented all logic in Java
   3. Based the JDBC code logic on Spark 2.4.4 and Spark 3.x




[jira] [Updated] (HUDI-531) Add java doc for hudi test suite general classes

2020-01-13 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang updated HUDI-531:
--
Description: Currently, the general classes (under the src/main dir) have no 
java docs. We should add docs for those classes.  (was: Currently, the general 
classes (under the src dir) have no java docs. We should add docs for those classes.)

> Add java doc for hudi test suite general classes
> 
>
> Key: HUDI-531
> URL: https://issues.apache.org/jira/browse/HUDI-531
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>  Components: Testing
>Reporter: vinoyang
>Assignee: wangxianghu
>Priority: Major
>
> Currently, the general classes (under the src/main dir) have no java docs. We 
> should add docs for those classes.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HUDI-532) Add java doc for hudi test suite test classes

2020-01-13 Thread vinoyang (Jira)
vinoyang created HUDI-532:
-

 Summary: Add java doc for hudi test suite test classes
 Key: HUDI-532
 URL: https://issues.apache.org/jira/browse/HUDI-532
 Project: Apache Hudi (incubating)
  Issue Type: Sub-task
Reporter: vinoyang
Assignee: wangxianghu


Currently, the test classes (under the test/java dir) have no java docs. We should 
add more docs for those classes.





[jira] [Created] (HUDI-531) Add java doc for hudi test suite general classes

2020-01-13 Thread vinoyang (Jira)
vinoyang created HUDI-531:
-

 Summary: Add java doc for hudi test suite general classes
 Key: HUDI-531
 URL: https://issues.apache.org/jira/browse/HUDI-531
 Project: Apache Hudi (incubating)
  Issue Type: Sub-task
  Components: Testing
Reporter: vinoyang
Assignee: wangxianghu


Currently, the general classes (under the src dir) have no java docs. We should add 
docs for those classes.





[GitHub] [incubator-hudi] vinothchandar commented on issue #1157: [HUDI-332]Add operation type (insert/upsert/bulkinsert/delete) to HoodieCommitMetadata

2020-01-13 Thread GitBox
vinothchandar commented on issue #1157: [HUDI-332]Add operation type 
(insert/upsert/bulkinsert/delete) to HoodieCommitMetadata
URL: https://github.com/apache/incubator-hudi/pull/1157#issuecomment-574027065
 
 
   @hddong thanks! Avro works based on field positions, so reordering them was 
my concern. Thanks for addressing this. Over to @bvaradar 




[GitHub] [incubator-hudi] vinothchandar commented on issue #1208: [HUDI-304] Bring back spotless plugin

2020-01-13 Thread GitBox
vinothchandar commented on issue #1208: [HUDI-304] Bring back spotless plugin
URL: https://github.com/apache/incubator-hudi/pull/1208#issuecomment-574024869
 
 
   > Document that developers could use the checkstyle.xml file in the style folder 
with the checkstyle plugin and things will go well 
   
   I was able to use checkstyle to format in IntelliJ. This is fine.. but we 
should clearly document this. Maybe file a JIRA?
   
   On import order, we can take a second stab maybe down the line? Again, 
filing a JIRA would be great for tracking.. 
   
   On this PR, my concern was that we are reformatting again due to the 120 
character limit? I was trying to see if we can avoid it. @leesf could you 
explain why 100+ files are being touched in this PR? If these were all 
checkstyle failures, then master would be broken, right? I am just trying to 
understand what code really changed here, given we are close to a release..
   




[GitHub] [incubator-hudi] bhasudha commented on issue #1225: [MINOR] Adding util methods to assist in adding deletion support to Quick Start

2020-01-13 Thread GitBox
bhasudha commented on issue #1225: [MINOR] Adding util methods to assist in 
adding deletion support to Quick Start
URL: https://github.com/apache/incubator-hudi/pull/1225#issuecomment-574024580
 
 
   @nsivabalan will merge this once you are able to verify this method with 
quickstart steps.




[jira] [Updated] (HUDI-503) Add hudi test suite documentation into the README file of the test suite module

2020-01-13 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang updated HUDI-503:
--
Status: Open  (was: New)

> Add hudi test suite documentation into the README file of the test suite 
> module
> ---
>
> Key: HUDI-503
> URL: https://issues.apache.org/jira/browse/HUDI-503
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>  Components: Docs
>Reporter: vinoyang
>Assignee: vinoyang
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>






[jira] [Updated] (HUDI-503) Add hudi test suite documentation into the README file of the test suite module

2020-01-13 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang updated HUDI-503:
--
Status: In Progress  (was: Open)

> Add hudi test suite documentation into the README file of the test suite 
> module
> ---
>
> Key: HUDI-503
> URL: https://issues.apache.org/jira/browse/HUDI-503
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>  Components: Docs
>Reporter: vinoyang
>Assignee: vinoyang
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>






[GitHub] [incubator-hudi] vinothchandar commented on issue #1223: [HUDI-530] Fix conversion of Spark struct type to Avro schema

2020-01-13 Thread GitBox
vinothchandar commented on issue #1223: [HUDI-530] Fix conversion of Spark 
struct type to Avro schema
URL: https://github.com/apache/incubator-hudi/pull/1223#issuecomment-574015931
 
 
   @umehrot2 is this in any way related to the quickstart breakage that 
@nsivabalan reported? 




[GitHub] [incubator-hudi] vinothchandar commented on issue #1225: Adding util methods to assist in adding deletion support to Quick Start

2020-01-13 Thread GitBox
vinothchandar commented on issue #1225: Adding util methods to assist in adding 
deletion support to Quick Start
URL: https://github.com/apache/incubator-hudi/pull/1225#issuecomment-574005188
 
 
   @nsivabalan can you add a `[MINOR]` prefix to your commit and PR? 




[GitHub] [incubator-hudi] OpenOpened closed pull request #1200: [HUDI-514] A schema provider to get metadata through Jdbc

2020-01-13 Thread GitBox
OpenOpened closed pull request #1200: [HUDI-514] A schema provider to get 
metadata through Jdbc
URL: https://github.com/apache/incubator-hudi/pull/1200
 
 
   




[GitHub] [incubator-hudi] OpenOpened opened a new pull request #1200: [HUDI-514] A schema provider to get metadata through Jdbc

2020-01-13 Thread GitBox
OpenOpened opened a new pull request #1200: [HUDI-514] A schema provider to get 
metadata through Jdbc
URL: https://github.com/apache/incubator-hudi/pull/1200
 
 
   ## What is the purpose of the pull request
   
   In our production environment, we usually need to synchronize data from 
MySQL, and at the same time we need to get the schema from the database. So I 
submitted this PR: a schema provider that obtains metadata through JDBC, which 
by design calls the Spark JDBC related methods. This ensures uniformity of the 
schema, e.g., reading historical data via Spark JDBC while using the delta 
streamer to synchronize data.
   




[GitHub] [incubator-hudi] nsivabalan opened a new pull request #1225: Adding util methods to assist in adding deletion support to Quick Start

2020-01-13 Thread GitBox
nsivabalan opened a new pull request #1225: Adding util methods to assist in 
adding deletion support to Quick Start
URL: https://github.com/apache/incubator-hudi/pull/1225
 
 
   Adding util methods to assist in adding deletion support to Quick Start
   
   ## Verify this pull request
   
   Latest master has issues with the spark-avro dependency, so I couldn't verify. But 
the code as such is not production code; it is just used in the Quick Start. 
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.




[GitHub] [incubator-hudi] yanghua commented on issue #1100: [HUDI-289] Implement a test suite to support long running test for Hudi writing and querying end-end

2020-01-13 Thread GitBox
yanghua commented on issue #1100: [HUDI-289] Implement a test suite to support 
long running test for Hudi writing and querying end-end
URL: https://github.com/apache/incubator-hudi/pull/1100#issuecomment-573978162
 
 
   @n3nash This PR has conflicts, I have rebased.




[jira] [Updated] (HUDI-523) Upgrade Hudi to Spark DataSource V2

2020-01-13 Thread hong dongdong (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hong dongdong updated HUDI-523:
---
Description: Maybe needs Spark 3  (was: As Spark upgraded to 2.4, we can 
upgrade to the DataSource API v2 now.)

> Upgrade Hudi to Spark DataSource V2
> ---
>
> Key: HUDI-523
> URL: https://issues.apache.org/jira/browse/HUDI-523
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Writer Core
>Reporter: hong dongdong
>Priority: Major
>
> Maybe needs Spark 3





[incubator-hudi] branch hudi_test_suite_refactor updated (09c34a0 -> 3dc85eb)

2020-01-13 Thread vinoyang
This is an automated email from the ASF dual-hosted git repository.

vinoyang pushed a change to branch hudi_test_suite_refactor
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git.


omit 09c34a0  [HUDI-442] Fix 
TestComplexKeyGenerator#testSingleValueKeyGenerator and 
testMultipleValueKeyGenerator NPE
omit 66463ff  [MINOR] Fix compile error about the deletion of 
HoodieActiveTimeline#createNewCommitTime
omit 1d2ecbc  [HUDI-391] Rename module name from hudi-bench to 
hudi-test-suite and fix some checkstyle issues (#1102)
omit 9b55d37  [HUDI-394] Provide a basic implementation of test suite
 add 8172197  Fix Error: java.lang.IllegalArgumentException: Can not create 
a Path from an empty string in HoodieCopyOnWrite#deleteFilesFunc (#1126)
 add 4b1b3fc  [MINOR] Set info servity for ImportOrder temporarily (#1127)
 add 41f3677  [MINOR] fix typo
 add dd06660  [MINOR] fix typo
 add 94aec96  [minor] Fix few typos in the java docs (#1132)
 add 9c4217a  [HUDI-389] Fixing Index look up to return right partitions 
for a given key along with fileId with Global Bloom (#1091)
 add 8affdf8  [HUDI-416] Improve hint information for cli (#1110)
 add 3c811ec  [MINOR] fix typos
 add def18a5  [MINOR] optimize hudi timeline service (#1137)
 add 842eabb  [HUDI-470] Fix NPE when print result via hudi-cli (#1138)
 add f20a130  [MINOR] typo fix (#1142)
 add 01c25d6  [MINOR] Update the java doc of HoodieTableType (#1148)
 add 58c5bed  [HUDI-453] Fix throw failed to archive commits error when 
writing data to MOR/COW table
 add 179837e  Fix checkstyle
 add 2f25416  Skip setting commit metadata
 add 8440482  Fix empty content clean plan
 add e4ea7a2  Update comment
 add 2a823f3  [MINOR]: alter some wrong params which bring fatal exception
 add ab6ae5c  [HUDI-482] Fix missing @Override annotation on methods (#1156)
 add e637d9e  [HUDI-455] Redo hudi-client log statements using SLF4J (#1145)
 add bb90ded  [MINOR] Fix out of limits for results
 add 36c0e6b  [MINOR] Fix out of limits for results
 add 74b00d1  trigger rebuild
 add 619f501  Clean up code
 add add4b1e  Merge pull request #1143 from BigDataArtisans/outoflimit
 add 47c1f74  [HUDI-343]: Create a DOAP file for Hudi
 add 98c0d8c  Merge pull request #1160 from smarthi/HUDI-343
 add dde21e7  [HUDI-402]: code clean up in test cases
 add e1e5fe3  [MINOR] Fix error usage of String.format (#1169)
 add ff1113f  [HUDI-492]Fix show env all in hudi-cli
 add 290278f  [HUDI-118]: Options provided for passing properties to 
Cleaner, compactor and importer commands
 add a733f4e  [MINOR] Optimize hudi-cli module (#1136)
 add 726ae47  [MINOR]Optimize hudi-client module (#1139)
 add 7031445  [HUDI-377] Adding Delete() support to DeltaStreamer (#1073)
 add 28ccf8c  [HUDI-484] Fix NPE when reading IncrementalPull.sqltemplate 
in HiveIncrementalPuller (#1167)
 add b9fab0b  Revert "[HUDI-455] Redo hudi-client log statements using 
SLF4J (#1145)" (#1181)
 add 2d5b79d  [HUDI-438] Merge duplicated code fragment in 
HoodieSparkSqlWriter (#1114)
 add 8f935e7  [HUDI-406]: added default partition path in 
TimestampBasedKeyGenerator
 add c78092d  [HUDI-501] Execute docker/setup_demo.sh in any directory
 add 75c3f63  [HUDI-405] Remove HIVE_ASSUME_DATE_PARTITION_OPT_KEY config 
from DataSource
 add b5df672  [HUDI-464] Use Hive Exec Core for tests (#1125)
 add 8306f74  [HUDI-417] Refactor HoodieWriteClient so that commit logic 
can be shareable by both bootstrap and normal write operations (#1166)
 add 9706f65  [HUDI-508] Standardizing on "Table" instead of "Dataset" 
across code (#1197)
 add 9884972  [MINOR] Remove old jekyll config file (#1198)
 add aba8387  Update deprecated HBase API
 add 480fc78  [HUDI-319] Add a new maven profile to generate unified 
Javadoc for all Java and Scala classes (#1195)
 add d09eacd  [HUDI-25] Optimize HoodieInputformat.listStatus() for faster 
Hive incremental queries on Hoodie
 add 5af3dc6  [HUDI-331]Fix java docs for all public apis in 
HoodieWriteClient (#)
 add 3c90d25  [HUDI-114]: added option to overwrite payload implementation 
in hoodie.properties file
 add 04afac9  [HUDI-248] CLI doesn't allow rolling back a Delta commit
 add b95367d  [HUDI-469] Fix: HoodieCommitMetadata only show first commit 
insert rows.
 add e103165  [CLEAN] replace utf-8 constant with StandardCharsets.UTF_8
 add 017ee8e  [MINOR] Fix partition typo (#1209)
 add d9675c4  [HUDI-522] Use the same version jcommander uniformly (#1214)
 add ad50008  [HUDI-91][HUDI-12]Migrate to spark 2.4.4, migrate to 
spark-avro library instead of databricks-avro, add support for Decimal/Date 
types
 add 971c7d4  [HUDI-322] DeltaSteamer should pick checkpoints off only 
deltacommits for MOR tables
 add a44c61b  [HUDI-502] provide a custom time zone 

[jira] [Assigned] (HUDI-523) Upgrade Hudi to Spark DataSource V2

2020-01-13 Thread hong dongdong (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hong dongdong reassigned HUDI-523:
--

Assignee: (was: hong dongdong)

> Upgrade Hudi to Spark DataSource V2
> ---
>
> Key: HUDI-523
> URL: https://issues.apache.org/jira/browse/HUDI-523
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Writer Core
>Reporter: hong dongdong
>Priority: Major
>
> As Spark upgraded to 2.4, we can upgrade to the DataSource API v2 now.





[GitHub] [incubator-hudi] wangxianghu opened a new pull request #1224: [HUDI-397] Normalize log print statement

2020-01-13 Thread GitBox
wangxianghu opened a new pull request #1224: [HUDI-397] Normalize log print 
statement
URL: https://github.com/apache/incubator-hudi/pull/1224
 
 
   ## What is the purpose of the pull request
   
   *Normalize log print statement*
   *Redo hudi-test-suite log statements using SLF4J*
   
   ## Brief change log
   
   *Normalize log print statement*
   *Redo hudi-test-suite log statements using SLF4J*
   
   ## Verify this pull request
   
   This pull request should be covered by existing tests.
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.




[incubator-hudi] branch master updated (c1f8aca -> fd8f1c7)

2020-01-13 Thread vinoth
This is an automated email from the ASF dual-hosted git repository.

vinoth pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git.


from c1f8aca  [HUDI-526] fix the HoodieAppendHandle
 add fd8f1c7  [MINOR] Reuse random object (#1222)

No new revisions were added by this update.

Summary of changes:
 .../java/org/apache/hudi/io/strategy/TestHoodieCompactionStrategy.java | 3 ++-
 .../test/java/org/apache/hudi/utilities/TestHoodieDeltaStreamer.java   | 3 ++-
 2 files changed, 4 insertions(+), 2 deletions(-)



[GitHub] [incubator-hudi] vinothchandar merged pull request #1222: [MINOR] Reuse random object

2020-01-13 Thread GitBox
vinothchandar merged pull request #1222: [MINOR] Reuse random object
URL: https://github.com/apache/incubator-hudi/pull/1222
 
 
   




[GitHub] [incubator-hudi] wangxianghu commented on issue #1220: [HUDI-397] Normalize log print statement

2020-01-13 Thread GitBox
wangxianghu commented on issue #1220: [HUDI-397] Normalize log print statement
URL: https://github.com/apache/incubator-hudi/pull/1220#issuecomment-573958871
 
 
   @hmatu I rolled back, and it closed automatically




[GitHub] [incubator-hudi] hmatu commented on issue #1220: [HUDI-397] Normalize log print statement

2020-01-13 Thread GitBox
hmatu commented on issue #1220: [HUDI-397] Normalize log print statement
URL: https://github.com/apache/incubator-hudi/pull/1220#issuecomment-573957902
 
 
   Close again?




[GitHub] [incubator-hudi] wangxianghu closed pull request #1220: [HUDI-397] Normalize log print statement

2020-01-13 Thread GitBox
wangxianghu closed pull request #1220: [HUDI-397] Normalize log print statement
URL: https://github.com/apache/incubator-hudi/pull/1220
 
 
   




[GitHub] [incubator-hudi] wangxianghu commented on issue #1220: [HUDI-397] Normalize log print statement

2020-01-13 Thread GitBox
wangxianghu commented on issue #1220: [HUDI-397] Normalize log print statement
URL: https://github.com/apache/incubator-hudi/pull/1220#issuecomment-573955655
 
 
   Hi @n3nash, this PR covers all the logs in the test-suite module. Besides, I 
found the wrong email was used; I will fix it.




[GitHub] [incubator-hudi] wangxianghu commented on issue #1220: [HUDI-397] Normalize log print statement

2020-01-13 Thread GitBox
wangxianghu commented on issue #1220: [HUDI-397] Normalize log print statement
URL: https://github.com/apache/incubator-hudi/pull/1220#issuecomment-573953742
 
 
   Hi @hmatu, Thanks for your advice, I will pay attention next time.




[GitHub] [incubator-hudi] umehrot2 commented on issue #1223: [HUDI-530] Fix conversion of Spark struct type to Avro schema

2020-01-13 Thread GitBox
umehrot2 commented on issue #1223: [HUDI-530] Fix conversion of Spark struct 
type to Avro schema
URL: https://github.com/apache/incubator-hudi/pull/1223#issuecomment-573951118
 
 
   @vinothchandar @bvaradar The migration to `spark-avro` has introduced this 
issue, which was earlier reported for EMR in 
https://github.com/apache/incubator-hudi/issues/1034, as we were already using 
`spark-avro` internally. Let's try to get this in before the code freeze if possible.




[jira] [Updated] (HUDI-530) Datasource Writer throws error on resolving struct fields

2020-01-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-530:

Labels: pull-request-available  (was: )

> Datasource Writer throws error on resolving struct fields
> -
>
> Key: HUDI-530
> URL: https://issues.apache.org/jira/browse/HUDI-530
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: Spark Integration
>Reporter: Udit Mehrotra
>Assignee: Udit Mehrotra
>Priority: Major
>  Labels: pull-request-available
>
> The issue was reported in 
> [https://github.com/apache/incubator-hudi/issues/1034] . With the migration of 
> Hudi to Spark 2.4.4 and the use of Spark's native spark-avro module, this issue 
> now exists in Hudi master.
>  
> Thus struct fields will not work as of now.





[GitHub] [incubator-hudi] umehrot2 opened a new pull request #1223: [HUDI-530] Fix conversion of Spark struct type to Avro schema

2020-01-13 Thread GitBox
umehrot2 opened a new pull request #1223: [HUDI-530] Fix conversion of Spark 
struct type to Avro schema
URL: https://github.com/apache/incubator-hudi/pull/1223
 
 
   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contributing.html before opening a 
pull request.*
   
   ## What is the purpose of the pull request
   
   With the migration of Hudi to `spark 2.4.4` and to using `native spark-avro`, 
there is an issue with the conversion of struct fields, because of the way 
spark-avro handles avro schema conversion vs databricks-avro. This has been 
reported earlier for EMR in 
https://github.com/apache/incubator-hudi/issues/1034 and now exists in Hudi 
master as well.
   
   The issue is that `spark-avro` has a different way of naming the `Avro namespace` 
than `databricks-avro` while converting the schema to an avro schema. For example, 
suppose the data is:
   
   ```
   List("{ \"deviceId\": \"a\", \"eventType\": \"uditevent1\", \"eventTimeMilli\": 1574297893836, \"location\": { \"latitude\": 2.5, \"longitude\": 3.5 }}");
   ```
   
   `databricks-avro` used to convert it to an avro schema such that the namespace of 
the `location` struct field has the field name in it:
   ```
   {
     "type" : "record",
     "name" : "hudi_issue_1034_dec30_01_record",
     "namespace" : "hoodie.hudi_issue_1034_dec30_01",
     "fields" : [ {
       "name" : "deviceId",
       "type" : [ "string", "null" ]
     }, {
       "name" : "eventTimeMilli",
       "type" : [ "long", "null" ]
     }, {
       "name" : "location",
       "type" : [ {
         "type" : "record",
         "name" : "location",
         "namespace" : "hoodie.hudi_issue_1034_dec30_01.location",
         "fields" : [ {
           "name" : "latitude",
           "type" : [ "double", "null" ]
         }, {
           "name" : "longitude",
           "type" : [ "double", "null" ]
         } ]
       }, "null" ]
     } ]
   }
   ```
   `spark-avro` now converts the same to the following, and uses the top-level `record name` in the nested namespace instead:
   ```
   {
     "type" : "record",
     "name" : "hudi_issue_1034_dec31_01_record",
     "namespace" : "hoodie.hudi_issue_1034_dec31_01",
     "fields" : [ {
       "name" : "deviceId",
       "type" : [ "string", "null" ]
     }, {
       "name" : "eventTimeMilli",
       "type" : [ "long", "null" ]
     }, {
       "name" : "location",
       "type" : [ {
         "type" : "record",
         "name" : "location",
         "namespace" : "hoodie.hudi_issue_1034_dec31_01.hudi_issue_1034_dec31_01_record",
         "fields" : [ {
           "name" : "latitude",
           "type" : [ "double", "null" ]
         }, {
           "name" : "longitude",
           "type" : [ "double", "null" ]
         } ]
       }, "null" ]
     } ]
   }
   ```
   
   This PR fixes the above issue as we have now migrated to spark-avro.
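
   To make the namespace difference concrete, here is a small sketch, assuming Spark 2.4's public `SchemaConverters.toAvroType(dataType, nullable, recordName, nameSpace)` API; the names mirror the example above:
   ```java
   import org.apache.avro.Schema;
   import org.apache.spark.sql.avro.SchemaConverters;
   import org.apache.spark.sql.types.DataTypes;
   import org.apache.spark.sql.types.StructType;
   
   public class NamespaceDemo {
     public static void main(String[] args) {
       // Struct mirroring the example payload above.
       StructType location = new StructType()
           .add("latitude", DataTypes.DoubleType)
           .add("longitude", DataTypes.DoubleType);
       StructType schema = new StructType()
           .add("deviceId", DataTypes.StringType)
           .add("eventTimeMilli", DataTypes.LongType)
           .add("location", location);
   
       // spark-avro derives the nested record's namespace from the top-level
       // record name, which is where the extra "..._record" segment comes from.
       Schema avro = SchemaConverters.toAvroType(
           schema, false, "hudi_issue_1034_dec31_01_record", "hoodie.hudi_issue_1034_dec31_01");
       System.out.println(avro.toString(true));
     }
   }
   ```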
   
   ## Brief change log
   
   - Fix conversion of Spark struct type to Avro schema
   - Modify the schema of data used in unit tests and integration tests to have 
struct type data as well, so that any issue with struct type can be caught 
earlier
   
   ## Verify this pull request
   
   This PR modifies the schema of the data that is being used across unit tests 
and certain integration tests to have a struct field. From now on 
Unit/Integration tests would catch any issue with struct fields.
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.




[GitHub] [incubator-hudi] lamber-ken opened a new pull request #1222: [MINOR] Reuse random object

2020-01-13 Thread GitBox
lamber-ken opened a new pull request #1222: [MINOR] Reuse random object
URL: https://github.com/apache/incubator-hudi/pull/1222
 
 
   ## What is the purpose of the pull request
   
   Reuse random object.
   
   ## Brief change log
   
 - *Reuse random object.*
   
   ## Verify this pull request
   
   This pull request is code cleanup without any test coverage.
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.




[jira] [Created] (HUDI-530) Datasource Writer throws error on resolving struct fields

2020-01-13 Thread Udit Mehrotra (Jira)
Udit Mehrotra created HUDI-530:
--

 Summary: Datasource Writer throws error on resolving struct fields
 Key: HUDI-530
 URL: https://issues.apache.org/jira/browse/HUDI-530
 Project: Apache Hudi (incubating)
  Issue Type: Bug
  Components: Spark Integration
Reporter: Udit Mehrotra


The issue was reported in 
[https://github.com/apache/incubator-hudi/issues/1034] . With the migration of Hudi 
to Spark 2.4.4 and the use of Spark's native spark-avro module, this issue now 
exists in Hudi master.

 

Thus struct fields will not work as of now.





[jira] [Assigned] (HUDI-530) Datasource Writer throws error on resolving struct fields

2020-01-13 Thread Udit Mehrotra (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Udit Mehrotra reassigned HUDI-530:
--

Assignee: Udit Mehrotra

> Datasource Writer throws error on resolving struct fields
> -
>
> Key: HUDI-530
> URL: https://issues.apache.org/jira/browse/HUDI-530
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: Spark Integration
>Reporter: Udit Mehrotra
>Assignee: Udit Mehrotra
>Priority: Major
>
> The issue was reported in 
> [https://github.com/apache/incubator-hudi/issues/1034] . With the migration of 
> Hudi to Spark 2.4.4 and the use of Spark's native spark-avro module, this issue 
> now exists in Hudi master.
>  
> Thus struct fields will not work as of now.





[GitHub] [incubator-hudi] bvaradar commented on a change in pull request #1194: [HUDI-326] Add support to delete records with only record_key

2020-01-13 Thread GitBox
bvaradar commented on a change in pull request #1194: [HUDI-326] Add support to 
delete records with only record_key
URL: https://github.com/apache/incubator-hudi/pull/1194#discussion_r366091392
 
 

 ##
 File path: hudi-spark/src/main/java/org/apache/hudi/keygen/GlobalDeleteKeyGenerator.java
 ##
 @@ -0,0 +1,74 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.keygen;
+
+import java.util.Arrays;
+import java.util.List;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.hudi.DataSourceUtils;
+import org.apache.hudi.DataSourceWriteOptions;
+import org.apache.hudi.common.model.HoodieKey;
+import org.apache.hudi.common.util.TypedProperties;
+import org.apache.hudi.exception.HoodieKeyException;
+
+/**
+ * Key generator for deletes using global indices. Global index deletes do not require partition value
+ * so this key generator avoids using partition value for generating HoodieKey.
+ */
+public class GlobalDeleteKeyGenerator extends KeyGenerator {
+
+  private static final String EMPTY_PARTITION = "";
+  private static final String NULL_RECORDKEY_PLACEHOLDER = "__null__";
+  private static final String EMPTY_RECORDKEY_PLACEHOLDER = "__empty__";
+
+  protected final List<String> recordKeyFields;
+
+  public GlobalDeleteKeyGenerator(TypedProperties config) {
+    super(config);
+    this.recordKeyFields = Arrays.asList(config.getString(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY()).split(","));
+  }
+
+  @Override
+  public HoodieKey getKey(GenericRecord record) {
 
 Review comment:
   @bschell : The only difference between GlobalDeleteKeyGenerator and 
ComplexKeyGenerator is that the former always creates an empty partition path, 
right? In that case, can we simply refactor the getKey() method in 
ComplexKeyGenerator and have GlobalDeleteKeyGenerator extend 
ComplexKeyGenerator, with the necessary changes to make it work for an empty 
partition path? The advantage is we keep all the logic related to nested-field 
handling in one place. Let me know your thoughts.
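
   Something along these lines (a rough sketch of the suggested refactor, not the PR's actual code; it assumes ComplexKeyGenerator's getKey() stays usable from a subclass):
   ```java
   public class GlobalDeleteKeyGenerator extends ComplexKeyGenerator {
   
     private static final String EMPTY_PARTITION = "";
   
     public GlobalDeleteKeyGenerator(TypedProperties config) {
       super(config);
     }
   
     @Override
     public HoodieKey getKey(GenericRecord record) {
       // Reuse ComplexKeyGenerator's nested-field/record-key handling, but always
       // emit an empty partition path, since global-index deletes don't need one.
       return new HoodieKey(super.getKey(record).getRecordKey(), EMPTY_PARTITION);
     }
   }
   ```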




[GitHub] [incubator-hudi] bvaradar commented on a change in pull request #1194: [HUDI-326] Add support to delete records with only record_key

2020-01-13 Thread GitBox
bvaradar commented on a change in pull request #1194: [HUDI-326] Add support to 
delete records with only record_key
URL: https://github.com/apache/incubator-hudi/pull/1194#discussion_r366088953
 
 

 ##
 File path: hudi-spark/src/main/java/org/apache/hudi/keygen/ComplexKeyGenerator.java
 ##
 @@ -16,8 +16,10 @@
  * limitations under the License.
  */
 
-package org.apache.hudi;
+package org.apache.hudi.keygen;
 
 Review comment:
   This is a backwards-incompatible change. Users would have custom key 
generators configured via DataSourceWriteOptions.KEYGENERATOR_CLASS_OPT_KEY().
   
   It makes sense to move to a separate package, but we need to call out the 
change in the release notes. Please open a tracking ticket to update the 
release notes for this.
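
   To illustrate the incompatibility (a hypothetical job config; `df` and `basePath` are assumed):
   ```java
   // An existing job pinned to the old fully-qualified class name breaks once
   // the class moves to the org.apache.hudi.keygen package.
   df.write().format("org.apache.hudi")
       .option(DataSourceWriteOptions.KEYGENERATOR_CLASS_OPT_KEY(),
           "org.apache.hudi.ComplexKeyGenerator") // must become org.apache.hudi.keygen.ComplexKeyGenerator
       .save(basePath);
   ```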




[GitHub] [incubator-hudi] bvaradar commented on issue #1109: [HUDI-238] - Migrating to Scala 2.12

2020-01-13 Thread GitBox
bvaradar commented on issue #1109: [HUDI-238] - Migrating to Scala 2.12
URL: https://github.com/apache/incubator-hudi/pull/1109#issuecomment-573923932
 
 
   
   > Sure. I will send another PR. Currently our work only supports 2.12, but I 
can try to see if it is possible to support both 2.11 and 2.12.
   
   @zhedoubushishi : Is your change different from what is being done as part 
of this PR? Anyway, it would help if you can open a WIP PR so we can cross-check 
it against this PR to see if we are missing anything here. 
   
   Also @zhedoubushishi @ezhux : I see this info on Stack Overflow about building 
both 2.11 and 2.12 versions of packages: https://stackoverflow.com/a/46785150. 
Can you check if this model would work for Hudi? We would need to change the pom 
for hudi-spark and its dependents: hudi-spark-bundle and hudi-utilities-bundle.




[GitHub] [incubator-hudi] bvaradar commented on issue #1109: [HUDI-238] - Migrating to Scala 2.12

2020-01-13 Thread GitBox
bvaradar commented on issue #1109: [HUDI-238] - Migrating to Scala 2.12
URL: https://github.com/apache/incubator-hudi/pull/1109#issuecomment-573920277
 
 
   @ezhux : just saw that you have pushed some changes. Let us know when you 
want us to review the code. 




[jira] [Created] (HUDI-529) Enable cobertura coverage reporting

2020-01-13 Thread Prashant Wason (Jira)
Prashant Wason created HUDI-529:
---

 Summary: Enable cobertura coverage reporting
 Key: HUDI-529
 URL: https://issues.apache.org/jira/browse/HUDI-529
 Project: Apache Hudi (incubating)
  Issue Type: Improvement
Reporter: Prashant Wason


The Hudi project has code coverage enabled via the jacoco plugin. Jenkins has 
better support for coverage reporting using the Jenkins Cobertura plugin. 

This enhancement provides a way to convert the jacoco coverage report to 
cobertura format at the end of the unit test runs. 

 





[jira] [Created] (HUDI-528) Incremental Pull fails when latest commit is empty

2020-01-13 Thread Javier Vega (Jira)
Javier Vega created HUDI-528:


 Summary: Incremental Pull fails when latest commit is empty
 Key: HUDI-528
 URL: https://issues.apache.org/jira/browse/HUDI-528
 Project: Apache Hudi (incubating)
  Issue Type: Bug
  Components: Incremental Pull
Reporter: Javier Vega


When trying to create an incremental view of a dataset, an exception is thrown 
when the latest commit in the time range is empty. In order to determine the 
schema of the dataset, Hudi will grab the [latest commit file, parse it, and 
grab the first metadata file 
path|https://github.com/apache/incubator-hudi/blob/480fc7869d4d69e1219bf278fd9a37f27ac260f6/hudi-spark/src/main/scala/org/apache/hudi/IncrementalRelation.scala#L78-L80].
If the latest commit was empty though, the field which is used to determine 
file paths (partitionToWriteStats) will be empty, causing the following 
exception:
{code:java}
java.util.NoSuchElementException
  at java.util.HashMap$HashIterator.nextNode(HashMap.java:1447)
  at java.util.HashMap$ValueIterator.next(HashMap.java:1474)
  at org.apache.hudi.IncrementalRelation.<init>(IncrementalRelation.scala:80)
  at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:65)
  at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:46)
  at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:318)
  at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:223)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:211)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
{code}
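
A minimal illustration of the failure mode (not Hudi code): calling next() on an 
empty map's value iterator raises exactly the exception above.
{code:java}
import java.util.HashMap;
import java.util.Map;

public class EmptyCommitRepro {
  public static void main(String[] args) {
    // Stand-in for an empty commit's partitionToWriteStats.
    Map<String, Object> partitionToWriteStats = new HashMap<>();
    // Throws java.util.NoSuchElementException, matching the trace above.
    partitionToWriteStats.values().iterator().next();
  }
}
{code}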





[GitHub] [incubator-hudi] garyli1019 commented on issue #1128: [HUDI-453] Fix throw failed to archive commits error when writing data to MOR/COW table

2020-01-13 Thread GitBox
garyli1019 commented on issue #1128: [HUDI-453] Fix throw failed to archive 
commits error when writing data to MOR/COW table
URL: https://github.com/apache/incubator-hudi/pull/1128#issuecomment-573880940
 
 
   Thanks for the discussion. I have been running into the same issue, and 
manually removing all `xxx.clean.requested` files worked!




[GitHub] [incubator-hudi] gfn9cho removed a comment on issue #894: Getting java.lang.NoSuchMethodError while doing Hive sync

2020-01-13 Thread GitBox
gfn9cho removed a comment on issue #894: Getting java.lang.NoSuchMethodError 
while doing Hive sync
URL: https://github.com/apache/incubator-hudi/issues/894#issuecomment-573861890
 
 
   Hi, I am using EMR 5.28 with built in hudi support.
   I was able to use hudi through spark-shell. However, when running a spark 
application as mentioned in the issue, I am getting the below error. Any 
pointers on how to resolve the conflict.
   my build.sbt is pretty much similar..
   `
   Caused by: java.lang.NoSuchMethodError: org.apache.http.conn.ssl.SSLConnectionSocketFactory.<init>(Ljavax/net/ssl/SSLContext;Ljavax/net/ssl/HostnameVerifier;)V
   at com.amazonaws.http.conn.ssl.SdkTLSSocketFactory.<init>(SdkTLSSocketFactory.java:58)
   at com.amazonaws.http.apache.client.impl.ApacheConnectionManagerFactory.getPreferredSocketFactory(ApacheConnectionManagerFactory.java:93)
   at com.amazonaws.http.apache.client.impl.ApacheConnectionManagerFactory.create(ApacheConnectionManagerFactory.java:66)
   at com.amazonaws.http.apache.client.impl.ApacheConnectionManagerFactory.create(ApacheConnectionManagerFactory.java:59)
   at com.amazonaws.http.apache.client.impl.ApacheHttpClientFactory.create(ApacheHttpClientFactory.java:50)
   at com.amazonaws.http.apache.client.impl.ApacheHttpClientFactory.create(ApacheHttpClientFactory.java:38)
   at com.amazonaws.http.AmazonHttpClient.<init>(AmazonHttpClient.java:324)
   at com.amazonaws.http.AmazonHttpClient.<init>(AmazonHttpClient.java:308)
   at com.amazonaws.AmazonWebServiceClient.<init>(AmazonWebServiceClient.java:237)
   at com.amazonaws.AmazonWebServiceClient.<init>(AmazonWebServiceClient.java:223)
   at com.amazonaws.services.glue.AWSGlueClient.<init>(AWSGlueClient.java:177)
   at com.amazonaws.services.glue.AWSGlueClient.<init>(AWSGlueClient.java:163)
   at com.amazonaws.services.glue.AWSGlueClientBuilder.build(AWSGlueClientBuilder.java:61)
   at com.amazonaws.services.glue.AWSGlueClientBuilder.build(AWSGlueClientBuilder.java:27)
   at com.amazonaws.client.builder.AwsSyncClientBuilder.build(AwsSyncClientBuilder.java:46)
   at com.amazonaws.glue.catalog.metastore.AWSGlueClientFactory.newClient(AWSGlueClientFactory.java:72)
   at com.amazonaws.glue.catalog.metastore.AWSCatalogMetastoreClient.<init>(AWSCatalogMetastoreClient.java:146)
   at com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory.createMetaStoreClient(AWSGlueDataCatalogHiveClientFactory.java:16)
   at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3007)
   at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3042)
   at org.apache.hadoop.hive.ql.metadata.Hive.getAllDatabases(Hive.java:1235)
   at org.apache.hadoop.hive.ql.metadata.Hive.reloadFunctions(Hive.java:175)
   at org.apache.hadoop.hive.ql.metadata.Hive.<init>(Hive.java:167)
   at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:503)
   at org.apache.spark.sql.hive.client.HiveClientImpl.newState(HiveClientImpl.scala:183)
   at org.apache.spark.sql.hive.client.HiveClientImpl.<init>(HiveClientImpl.scala:117)
   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
   at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
   at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
   at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
   at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:271)
   at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:384)
   at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:286)
   at org.apache.spark.sql.hive.HiveExternalCatalog.client$lzycompute(HiveExternalCatalog.scala:66)
   at org.apache.spark.sql.hive.HiveExternalCatalog.client(HiveExternalCatalog.scala:65)
   at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$databaseExists$1.apply$mcZ$sp(HiveExternalCatalog.scala:215)
   at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$databaseExists$1.apply(HiveExternalCatalog.scala:215)
   at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$databaseExists$1.apply(HiveExternalCatalog.scala:215)
   at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)`
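
   A quick way to confirm that this kind of `NoSuchMethodError` is a classpath conflict (the `SSLConnectionSocketFactory(SSLContext, HostnameVerifier)` constructor only exists in httpclient 4.4+, so an older httpclient bundled into the application jar shadows the one the AWS SDK needs) is to print which jar the JVM actually loaded the class from. A minimal, hypothetical diagnostic sketch (class name and approach are illustrative, not from the issue):
   ```java
   // Prints the jar that provides SSLConnectionSocketFactory at runtime.
   // If it points at an old httpclient inside the assembly jar rather than
   // the EMR-provided one, shading or aligning the httpclient version in
   // the build should resolve the conflict.
   public class WhichHttpClient {
     public static void main(String[] args) throws Exception {
       Class<?> c = Class.forName("org.apache.http.conn.ssl.SSLConnectionSocketFactory");
       System.out.println(c.getProtectionDomain().getCodeSource().getLocation());
     }
   }
   ```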


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] ezhux commented on a change in pull request #1109: [HUDI-238] - Migrating to Scala 2.12

2020-01-13 Thread GitBox
ezhux commented on a change in pull request #1109: [HUDI-238] - Migrating to 
Scala 2.12
URL: https://github.com/apache/incubator-hudi/pull/1109#discussion_r366029872
 
 

 ##
 File path: hudi-utilities/pom.xml
 ##
 @@ -28,14 +28,52 @@
 
   
 ${project.parent.basedir}
+2.0.0
+2.12
   
 
   
+
+
 
 Review comment:
   removed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] TadasSugintasYields commented on a change in pull request #1109: [HUDI-238] - Migrating to Scala 2.12

2020-01-13 Thread GitBox
TadasSugintasYields commented on a change in pull request #1109: [HUDI-238] - 
Migrating to Scala 2.12
URL: https://github.com/apache/incubator-hudi/pull/1109#discussion_r366029309
 
 

 ##
 File path: hudi-utilities/pom.xml
 ##
 @@ -28,14 +28,52 @@
 
   
 ${project.parent.basedir}
+2.0.0
+2.12
   
 
   
+    <pluginManagement>
+      <plugins>
+        <plugin>
+          <groupId>net.alchim31.maven</groupId>
+          <artifactId>scala-maven-plugin</artifactId>
+          <version>${scala-maven-plugin.version}</version>
+        </plugin>
+        <plugin>
+          <groupId>org.apache.maven.plugins</groupId>
+          <artifactId>maven-compiler-plugin</artifactId>
+        </plugin>
+      </plugins>
+    </pluginManagement>
 
     <plugins>
       <plugin>
         <groupId>org.jacoco</groupId>
         <artifactId>jacoco-maven-plugin</artifactId>
       </plugin>
+      <plugin>
+        <groupId>net.alchim31.maven</groupId>
+        <artifactId>scala-maven-plugin</artifactId>
+        <executions>
+          <execution>
+            <id>scala-compile-first</id>
+            <phase>process-resources</phase>
+            <goals>
+              <goal>add-source</goal>
+              <goal>compile</goal>
+            </goals>
+          </execution>
+          <execution>
+            <id>scala-test-compile</id>
+            <phase>process-test-resources</phase>
+            <goals>
+              <goal>testCompile</goal>
+            </goals>
+          </execution>
+        </executions>
+      </plugin>
 
 Review comment:
   finally got it working, thanks.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Commented] (HUDI-527) Fix warning in project compilation

2020-01-13 Thread Prashant Wason (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17014641#comment-17014641
 ] 

Prashant Wason commented on HUDI-527:
-

[WARNING] The POM for org.jamon:jamon-runtime:jar:2.3.1 is missing, no 
dependency information available

[INFO] --- maven-remote-resources-plugin:1.5:process (process-resource-bundles) 
@ hudi-hadoop-mr ---
[WARNING] Missing POM for org.jamon:jamon-runtime:jar:2.3.1
[WARNING] Missing POM for org.jamon:jamon-runtime:jar:2.3.1

 

 

[INFO] --- scala-maven-plugin:3.3.1:compile (scala-compile-first) @ hudi-spark 
---
[WARNING] Missing POM for org.jamon:jamon-runtime:jar:2.3.1
[WARNING] Missing POM for org.jamon:jamon-runtime:jar:2.3.1
[WARNING] Expected all dependencies to require Scala version: 2.11.8
[WARNING] org.apache.hudi:hudi-spark:0.5.1-SNAPSHOT requires scala version: 
2.11.8
[WARNING] com.fasterxml.jackson.module:jackson-module-scala_2.11:2.6.7 requires 
scala version: 2.11.8
[WARNING] org.scala-lang:scala-reflect:2.11.8 requires scala version: 2.11.8
[WARNING] com.twitter:chill_2.11:0.9.3 requires scala version: 2.11.12
[WARNING] Multiple versions of scala libraries detected!

 

 

[INFO] Compiling 16 source files to 
/home/pwason/uber/incubator-hudi/hudi-spark/target/classes at 1578943931369
[WARNING] 
/home/pwason/uber/incubator-hudi/hudi-spark/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala:84:
 warning: non-variable type argument 
org.apache.hudi.common.model.HoodieRecordPayload[Nothing] in type pattern 
org.apache.hudi.HoodieWriteClient[org.apache.hudi.common.model.HoodieRecordPayload[Nothing]]
 is unchecked since it is eliminated by erasure
[WARNING] val (writeStatuses, writeClient: 
HoodieWriteClient[HoodieRecordPayload[Nothing]]) =
[WARNING] ^
[WARNING] one warning found

 

[INFO] --- scala-maven-plugin:3.3.1:compile (scala-compile-first) @ hudi-cli ---
[WARNING] Expected all dependencies to require Scala version: 2.11.8
[WARNING] org.apache.hudi:hudi-cli:0.5.1-SNAPSHOT requires scala version: 2.11.8
[WARNING] org.apache.hudi:hudi-spark:0.5.1-SNAPSHOT requires scala version: 
2.11.8
[WARNING] com.fasterxml.jackson.module:jackson-module-scala_2.11:2.6.7 requires 
scala version: 2.11.8
[WARNING] org.scala-lang:scala-reflect:2.11.8 requires scala version: 2.11.8
[WARNING] org.apache.spark:spark-tags_2.11:2.4.4 requires scala version: 2.11.12
[WARNING] Multiple versions of scala libraries detected!

 

> Fix warning in project compilation
> --
>
> Key: HUDI-527
> URL: https://issues.apache.org/jira/browse/HUDI-527
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Code Cleanup
>Reporter: Prashant Wason
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> "mvn compile" issues various warnings. 
> This is a task to look into those warnings and fix them if required.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-527) Fix warning in project compilation

2020-01-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-527:

Labels: pull-request-available  (was: )

> Fix warning in project compilation
> --
>
> Key: HUDI-527
> URL: https://issues.apache.org/jira/browse/HUDI-527
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Code Cleanup
>Reporter: Prashant Wason
>Priority: Minor
>  Labels: pull-request-available
>
> "mvn compile" issues various warnings. 
> This is a task to look into those warnings and fix them if required.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-hudi] prashantwason opened a new pull request #1221: [HUDI-527] scalastyle-maven-plugin moved to pluginManagement as it is only used in hoodie-spark and hoodie-cli modules

2020-01-13 Thread GitBox
prashantwason opened a new pull request #1221: [HUDI-527] 
scalastyle-maven-plugin moved to pluginManagement as it is only used in 
hoodie-spark and hoodie-cli modules
URL: https://github.com/apache/incubator-hudi/pull/1221
 
 
   ## What is the purpose of the pull request
   
   This fixes scalastyle-maven-plugin warnings as well as unnecessary plugin 
invocation for most of the modules which do not have scala code.
   
   ## Brief change log
   - Scala code is used in only hudi-cli and hudi-spark modules
   - scalastyle-maven-plugin has been moved from the root pom.xml plugins section 
to pluginManagement
   - plugin entries have been added in hudi-cli/pom.xml and hudi-spark/pom.xml
   
   This ensures that the scalastyle-maven-plugin will only be executed for the 
modules which have scala code.
   
   ## Verify this pull request
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   mvn clean package 
   
   ## Committer checklist
   
- [ *] Has a corresponding JIRA in PR title & commit

- [ *] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] zhedoubushishi commented on issue #1109: [HUDI-238] - Migrating to Scala 2.12

2020-01-13 Thread GitBox
zhedoubushishi commented on issue #1109: [HUDI-238] - Migrating to Scala 2.12
URL: https://github.com/apache/incubator-hudi/pull/1109#issuecomment-573847670
 
 
   > @zhedoubushishi : As you had mentioned that AWS EMR has internally made it 
possible to package hudi jars using scala 2.12, can you shepherd this PR ? This 
is one of the critical PRs to be fixed before next week (deadline end of week).
   > 
   > I also have a question here : Has AWS EMR migrated the scala compile 
version to 2.12 or are you supporting both 2.11 and 2.12 ? It looks like 
spark-2.4.4 (which is used for compiling Hudi) has both 2.11 and 2.12 packaging 
support. So, wondering if we can support both 2.11 and 2.12 hudi package 
generation. Let us know.
   
   Sure. I will send another PR. Currently our work only supports 2.12, but I 
can try to see if it is possible to support  both 2.11 and 2.12.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Commented] (HUDI-527) Fix warning in project compilation

2020-01-13 Thread Prashant Wason (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17014616#comment-17014616
 ] 

Prashant Wason commented on HUDI-527:
-

scalastyle-maven-plugin emits missing file warnings if there are no scala files 
present in a module. Hence, it should only be enabled for modules which contain 
scala files.

[INFO] --- scalastyle-maven-plugin:1.0.0:check (default) @ hoodie ---
[WARNING] sourceDirectory is not specified or does not exist 
value=/home/pwason/uber/hoodie_oss/src/main/scala
[WARNING] testSourceDirectory is not specified or does not exist 
value=/home/pwason/uber/hoodie_oss/src/test/scala

> Fix warning in project compilation
> --
>
> Key: HUDI-527
> URL: https://issues.apache.org/jira/browse/HUDI-527
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Code Cleanup
>Reporter: Prashant Wason
>Priority: Minor
>
> "mvn compile" issues various warnings. 
> This is a task to look into those warnings and fix them if required.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-527) Fix warning in project compilation

2020-01-13 Thread Prashant Wason (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Wason updated HUDI-527:

Status: In Progress  (was: Open)

> Fix warning in project compilation
> --
>
> Key: HUDI-527
> URL: https://issues.apache.org/jira/browse/HUDI-527
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Code Cleanup
>Reporter: Prashant Wason
>Priority: Minor
>
> "mvn compile" issues various warnings. 
> This is a task to look into those warnings and fix them if required.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-527) Fix warning in project compilation

2020-01-13 Thread Prashant Wason (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Wason updated HUDI-527:

Status: Open  (was: New)

> Fix warning in project compilation
> --
>
> Key: HUDI-527
> URL: https://issues.apache.org/jira/browse/HUDI-527
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Code Cleanup
>Reporter: Prashant Wason
>Priority: Minor
>
> "mvn compile" issues various warnings. 
> This is a task to look into those warnings and fix them if required.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HUDI-527) Fix warning in project compilation

2020-01-13 Thread Prashant Wason (Jira)
Prashant Wason created HUDI-527:
---

 Summary: Fix warning in project compilation
 Key: HUDI-527
 URL: https://issues.apache.org/jira/browse/HUDI-527
 Project: Apache Hudi (incubating)
  Issue Type: Improvement
  Components: Code Cleanup
Reporter: Prashant Wason


"mvn compile" issues various warnings. 

This is a task to look into those warnings and fix them if required.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-hudi] n3nash commented on issue #1220: [HUDI-397] Normalize log print statement

2020-01-13 Thread GitBox
n3nash commented on issue #1220: [HUDI-397] Normalize log print statement
URL: https://github.com/apache/incubator-hudi/pull/1220#issuecomment-573811899
 
 
   @wangxianghu Thanks for the PR, does this exhaustively take care of all the 
logs in the test-suite module?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[incubator-hudi] branch master updated: [HUDI-526] fix the HoodieAppendHandle

2020-01-13 Thread nagarwal
This is an automated email from the ASF dual-hosted git repository.

nagarwal pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new c1f8aca  [HUDI-526] fix the HoodieAppendHandle
c1f8aca is described below

commit c1f8acab344fa632f1cce6268d2fc765c45e8b22
Author: liujianhui 
AuthorDate: Mon Jan 13 19:16:32 2020 +0800

[HUDI-526] fix the HoodieAppendHandle
---
 hudi-client/src/main/java/org/apache/hudi/io/HoodieAppendHandle.java | 5 +
 1 file changed, 5 insertions(+)

diff --git 
a/hudi-client/src/main/java/org/apache/hudi/io/HoodieAppendHandle.java 
b/hudi-client/src/main/java/org/apache/hudi/io/HoodieAppendHandle.java
index edf01ce..e2dbf64 100644
--- a/hudi-client/src/main/java/org/apache/hudi/io/HoodieAppendHandle.java
+++ b/hudi-client/src/main/java/org/apache/hudi/io/HoodieAppendHandle.java
@@ -23,6 +23,7 @@ import org.apache.hudi.common.model.FileSlice;
 import org.apache.hudi.common.model.HoodieDeltaWriteStat;
 import org.apache.hudi.common.model.HoodieKey;
 import org.apache.hudi.common.model.HoodieLogFile;
+import org.apache.hudi.common.model.HoodiePartitionMetadata;
 import org.apache.hudi.common.model.HoodieRecord;
 import org.apache.hudi.common.model.HoodieRecordLocation;
 import org.apache.hudi.common.model.HoodieRecordPayload;
@@ -132,6 +133,10 @@ public class HoodieAppendHandle<T extends HoodieRecordPayload> extends HoodieWriteHandle<T>
   writeStatus.getStat().setFileId(fileId);
   averageRecordSize = SizeEstimator.estimate(record);
   try {
+    // save hoodie partition meta in the partition path
+    HoodiePartitionMetadata partitionMetadata = new HoodiePartitionMetadata(fs, baseInstantTime,
+        new Path(config.getBasePath()), FSUtils.getPartitionPath(config.getBasePath(), partitionPath));
+    partitionMetadata.trySave(TaskContext.getPartitionId());
     this.writer = createLogWriter(fileSlice, baseInstantTime);
     this.currentLogFile = writer.getLogFile();
     ((HoodieDeltaWriteStat) writeStatus.getStat()).setLogVersion(currentLogFile.getLogVersion());



[GitHub] [incubator-hudi] n3nash merged pull request #1218: [HUDI-526] add parition meta file in HoodieAppendHandle

2020-01-13 Thread GitBox
n3nash merged pull request #1218: [HUDI-526] add parition meta file in 
HoodieAppendHandle
URL: https://github.com/apache/incubator-hudi/pull/1218
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] n3nash commented on issue #1216: [HUDI-525] lack of insert info in delta_commit inflight

2020-01-13 Thread GitBox
n3nash commented on issue #1216: [HUDI-525] lack of insert info in delta_commit 
inflight
URL: https://github.com/apache/incubator-hudi/pull/1216#issuecomment-573809598
 
 
   @liujianhuiouc What functionality are we going to enhance by adding this 
information to the inflight workload profile metadata?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] xushiyan commented on issue #1187: [HUDI-499] Allow update partition path with GLOBAL_BLOOM

2020-01-13 Thread GitBox
xushiyan commented on issue #1187: [HUDI-499] Allow update partition path with 
GLOBAL_BLOOM
URL: https://github.com/apache/incubator-hudi/pull/1187#issuecomment-573781552
 
 
   @nsivabalan Thank you for the thorough review! I'll try to address these in 
the next few days.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] hmatu commented on issue #1220: [HUDI-397] Normalize log print statement

2020-01-13 Thread GitBox
hmatu commented on issue #1220: [HUDI-397] Normalize log print statement
URL: https://github.com/apache/incubator-hudi/pull/1220#issuecomment-573730094
 
 
   Please use `git config --list` to check whether your email is right or not.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] hmatu commented on issue #1220: [HUDI-397] Normalize log print statement

2020-01-13 Thread GitBox
hmatu commented on issue #1220: [HUDI-397] Normalize log print statement
URL: https://github.com/apache/incubator-hudi/pull/1220#issuecomment-573729421
 
 
   If you make further changes, you can keep committing to the same branch; there 
is no need to create a new PR, as with 
https://github.com/apache/incubator-hudi/pull/1219 and 
https://github.com/apache/incubator-hudi/pull/1217.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] wangxianghu opened a new pull request #1220: [HUDI-397] Normalize log print statement

2020-01-13 Thread GitBox
wangxianghu opened a new pull request #1220: [HUDI-397] Normalize log print 
statement
URL: https://github.com/apache/incubator-hudi/pull/1220
 
 
   ## What is the purpose of the pull request
   
   *Redo hudi-test-suite log statements using SLF4J*
   *Normalize log print statement*
   
   ## Brief change log
   
   *Redo hudi-test-suite log statements using SLF4J*
   *Normalize log print statement*
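
   For illustration only (hypothetical class and message, not taken from the PR): the normalization replaces `String.format`-style messages with SLF4J parameterized logging, so the message is only rendered when the log level is enabled.
   ```java
   import org.slf4j.Logger;
   import org.slf4j.LoggerFactory;

   public class InsertNodeExample { // hypothetical class name
     private static final Logger LOG = LoggerFactory.getLogger(InsertNodeExample.class);

     void execute(String name) {
       // Before: log.info(String.format("----- inserting input data %s -----", name));
       // After: the {} placeholder defers formatting until INFO is enabled.
       LOG.info("----- inserting input data {} -----", name);
     }
   }
   ```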
   
   ## Verify this pull request
   
   This pull request should be covered by existing tests.
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] wangxianghu closed pull request #1219: [HUDI-397] Normalize log print statement

2020-01-13 Thread GitBox
wangxianghu closed pull request #1219: [HUDI-397] Normalize log print statement
URL: https://github.com/apache/incubator-hudi/pull/1219
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] wangxianghu opened a new pull request #1219: [HUDI-397] Normalize log print statement

2020-01-13 Thread GitBox
wangxianghu opened a new pull request #1219: [HUDI-397] Normalize log print 
statement
URL: https://github.com/apache/incubator-hudi/pull/1219
 
 
   ## What is the purpose of the pull request
   
   *Normalize log print statement*
   *Redo hudi-test-suite log statements using SLF4J*
   
   ## Brief change log
   
   *Normalize log print statement*
   *Redo hudi-test-suite log statements using SLF4J*
   
   ## Verify this pull request
   
   This pull requestshould be covered by existing tests, such as *(please 
describe tests)*.
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] wangxianghu closed pull request #1217: [HUDI-397] Normalize log print statement

2020-01-13 Thread GitBox
wangxianghu closed pull request #1217: [HUDI-397] Normalize log print statement
URL: https://github.com/apache/incubator-hudi/pull/1217
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Closed] (HUDI-517) compact error when hoodie.compact.inline is true

2020-01-13 Thread liujianhui (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liujianhui closed HUDI-517.
---
Resolution: Fixed

> compact error when hoodie.compact.inline is true
> 
>
> Key: HUDI-517
> URL: https://issues.apache.org/jira/browse/HUDI-517
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: Compaction
>Reporter: liujianhui
>Priority: Minor
>
> # set the property hoodie.compact.inline as true
>  # the duration of the write process is 1 second
>  # the instant time of the compaction is the same as the commit instant time
>  
> ```
> java.lang.IllegalArgumentException: Following instants have timestamps >= 
> compactionInstant (20200110171526) Instants 
> :[[20200110171526__deltacommit__COMPLETED]]
>  at com.google.common.base.Preconditions.checkArgument(Preconditions.java:92)
>  at 
> org.apache.hudi.HoodieWriteClient.scheduleCompactionAtInstant(HoodieWriteClient.java:1043)
>  at 
> org.apache.hudi.HoodieWriteClient.scheduleCompaction(HoodieWriteClient.java:1018)
>  at 
> org.apache.hudi.HoodieWriteClient.forceCompact(HoodieWriteClient.java:1292)
>  at org.apache.hudi.HoodieWriteClient.commit(HoodieWriteClient.java:510)
>  at org.apache.hudi.HoodieWriteClient.commit(HoodieWriteClient.java:479)
>  at org.apache.hudi.HoodieWriteClient.commit(HoodieWriteClient.java:470)
>  at 
> org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:152)
>  at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:91)
>  at 
> org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
>  at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
>  at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
>  at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
>  at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
>  at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
>  at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
>  at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>  at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
>  at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
>  at 
> org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
>  at 
> org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
>  at 
> org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
>  at 
> org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
>  at 
> org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
>  at 
> org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
>  at 
> org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
>  at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:676)
>  at 
> org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:285)
>  at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:271)
> ```
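
For illustration (not part of the report): Hudi instant times such as `20200110171526` are second-granularity timestamps (`yyyyMMddHHmmss`), so a delta commit that completes within the same second as the inline compaction is scheduled can receive an identical instant string, which trips the `timestamps >= compactionInstant` precondition above. A minimal sketch of the collision, under that assumption:

```java
import java.text.SimpleDateFormat;
import java.util.Date;

public class InstantCollision {
  public static void main(String[] args) {
    // Same format as the instants in the stack trace above.
    SimpleDateFormat fmt = new SimpleDateFormat("yyyyMMddHHmmss");
    String deltaCommitInstant = fmt.format(new Date());
    String compactionInstant = fmt.format(new Date()); // generated within the same second
    // Equal strings mean the compaction instant is not strictly newer.
    System.out.println(deltaCommitInstant.equals(compactionInstant)); // typically true
  }
}
```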



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HUDI-518) compact error when hoodie.compact.inline is true

2020-01-13 Thread liujianhui (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liujianhui resolved HUDI-518.
-
Resolution: Fixed

> compact error when hoodie.compact.inline is true
> 
>
> Key: HUDI-518
> URL: https://issues.apache.org/jira/browse/HUDI-518
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: Compaction, Writer Core
>Reporter: liujianhui
>Priority: Minor
>
> # set the property hoodie.compact.inline as true
>  # the duration of the write process is 1 second
>  # the instant time of the compaction is the same as the commit instant time
>  
> {code}
> java.lang.IllegalArgumentException: Following instants have timestamps >= 
> compactionInstant (20200110171526) Instants 
> :[[20200110171526__deltacommit__COMPLETED]]
>  at com.google.common.base.Preconditions.checkArgument(Preconditions.java:92)
>  at 
> org.apache.hudi.HoodieWriteClient.scheduleCompactionAtInstant(HoodieWriteClient.java:1043)
>  at 
> org.apache.hudi.HoodieWriteClient.scheduleCompaction(HoodieWriteClient.java:1018)
>  at 
> org.apache.hudi.HoodieWriteClient.forceCompact(HoodieWriteClient.java:1292)
>  at org.apache.hudi.HoodieWriteClient.commit(HoodieWriteClient.java:510)
>  at org.apache.hudi.HoodieWriteClient.commit(HoodieWriteClient.java:479)
>  at org.apache.hudi.HoodieWriteClient.commit(HoodieWriteClient.java:470)
>  at 
> org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:152)
>  at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:91)
>  at 
> org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
>  at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
>  at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
>  at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
>  at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
>  at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
>  at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
>  at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>  at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
>  at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
>  at 
> org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
>  at 
> org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
>  at 
> org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
>  at 
> org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
>  at 
> org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
>  at 
> org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
>  at 
> org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
>  at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:676)
>  at 
> org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:285)
>  at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:271)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-518) compact error when hoodie.compact.inline is true

2020-01-13 Thread liujianhui (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liujianhui updated HUDI-518:

Status: Open  (was: New)

> compact error when hoodie.compact.inline is true
> 
>
> Key: HUDI-518
> URL: https://issues.apache.org/jira/browse/HUDI-518
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: Compaction, Writer Core
>Reporter: liujianhui
>Priority: Minor
>
> # set the property [hoodie.compact.inline|http://hoodie.compact.inline/] as 
> true
>  # the duration of the write process is 1 second
>  # the instant time of the compact is same to the commit instant time
>  
> {code}
> java.lang.IllegalArgumentException: Following instants have timestamps >= 
> compactionInstant (20200110171526) Instants 
> :[[20200110171526__deltacommit__COMPLETED]]
>  at com.google.common.base.Preconditions.checkArgument(Preconditions.java:92)
>  at 
> org.apache.hudi.HoodieWriteClient.scheduleCompactionAtInstant(HoodieWriteClient.java:1043)
>  at 
> org.apache.hudi.HoodieWriteClient.scheduleCompaction(HoodieWriteClient.java:1018)
>  at 
> org.apache.hudi.HoodieWriteClient.forceCompact(HoodieWriteClient.java:1292)
>  at org.apache.hudi.HoodieWriteClient.commit(HoodieWriteClient.java:510)
>  at org.apache.hudi.HoodieWriteClient.commit(HoodieWriteClient.java:479)
>  at org.apache.hudi.HoodieWriteClient.commit(HoodieWriteClient.java:470)
>  at 
> org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:152)
>  at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:91)
>  at 
> org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
>  at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
>  at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
>  at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
>  at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
>  at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
>  at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
>  at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>  at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
>  at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
>  at 
> org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
>  at 
> org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
>  at 
> org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
>  at 
> org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
>  at 
> org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
>  at 
> org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
>  at 
> org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
>  at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:676)
>  at 
> org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:285)
>  at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:271)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-hudi] wangxianghu opened a new pull request #1217: [HUDI-397] Normalize log print statement

2020-01-13 Thread GitBox
wangxianghu opened a new pull request #1217: [HUDI-397] Normalize log print 
statement
URL: https://github.com/apache/incubator-hudi/pull/1217
 
 
   ## What is the purpose of the pull request
   
   *Normalize log print statement*
   *Redo hudi-test-suite log statements using SLF4J*
   
   ## Brief change log
   
   *Normalize log print statement*
   *Redo hudi-test-suite log statements using SLF4J*
   
   ## Verify this pull request
   
   This pull request should be covered by existing tests.
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Updated] (HUDI-517) compact error when hoodie.compact.inline is true

2020-01-13 Thread liujianhui (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liujianhui updated HUDI-517:

Status: Open  (was: New)

> compact error when hoodie.compact.inline is true
> 
>
> Key: HUDI-517
> URL: https://issues.apache.org/jira/browse/HUDI-517
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: Compaction
>Reporter: liujianhui
>Priority: Minor
>
> # set the property hoodie.compact.inline as true
>  # the duration of the write process is 1 second
>  # the instant time of the compaction is the same as the commit instant time
>  
> ```
> java.lang.IllegalArgumentException: Following instants have timestamps >= 
> compactionInstant (20200110171526) Instants 
> :[[20200110171526__deltacommit__COMPLETED]]
>  at com.google.common.base.Preconditions.checkArgument(Preconditions.java:92)
>  at 
> org.apache.hudi.HoodieWriteClient.scheduleCompactionAtInstant(HoodieWriteClient.java:1043)
>  at 
> org.apache.hudi.HoodieWriteClient.scheduleCompaction(HoodieWriteClient.java:1018)
>  at 
> org.apache.hudi.HoodieWriteClient.forceCompact(HoodieWriteClient.java:1292)
>  at org.apache.hudi.HoodieWriteClient.commit(HoodieWriteClient.java:510)
>  at org.apache.hudi.HoodieWriteClient.commit(HoodieWriteClient.java:479)
>  at org.apache.hudi.HoodieWriteClient.commit(HoodieWriteClient.java:470)
>  at 
> org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:152)
>  at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:91)
>  at 
> org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
>  at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
>  at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
>  at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
>  at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
>  at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
>  at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
>  at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>  at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
>  at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
>  at 
> org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
>  at 
> org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
>  at 
> org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
>  at 
> org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
>  at 
> org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
>  at 
> org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
>  at 
> org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
>  at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:676)
>  at 
> org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:285)
>  at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:271)
> ```



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-526) inline compact not work

2020-01-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-526:

Labels: pull-request-available  (was: )

> inline compact not work
> ---
>
> Key: HUDI-526
> URL: https://issues.apache.org/jira/browse/HUDI-526
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: Compaction
>Reporter: liujianhui
>Priority: Minor
>  Labels: pull-request-available
>
> hoodie.compact.inline set as true
> hoodie.index.type set as INMEMORY
>  
> compaction does not occur after delta commit
> {code}
> 20/01/13 16:43:43 INFO HoodieMergeOnReadTable: Checking if compaction needs 
> to be run on file:///tmp/hudi_cow_table_read
> 20/01/13 16:43:43 INFO HoodieMergeOnReadTable: Compacting merge on read table 
> file:///tmp/hudi_cow_table_read
> 20/01/13 16:43:43 INFO FileSystemViewManager: Creating InMemory based view 
> for basePath file:///tmp/hudi_cow_table_read
> 20/01/13 16:43:43 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient 
> from file:///tmp/hudi_cow_table_read
> 20/01/13 16:43:43 INFO FSUtils: Hadoop Configuration: fs.defaultFS: 
> [file:///], Config:[Configuration: core-default.xml, core-site.xml, 
> mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, 
> hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml], FileSystem: 
> [org.apache.hadoop.fs.LocalFileSystem@6a24b9e2]
> 20/01/13 16:43:43 INFO HoodieTableConfig: Loading table properties from 
> file:/tmp/hudi_cow_table_read/.hoodie/hoodie.properties
> 20/01/13 16:43:43 INFO HoodieTableMetaClient: Finished Loading Table of type 
> MERGE_ON_READ(version=org.apache.hudi.common.model.TimelineLayoutVersion@20) 
> from file:///tmp/hudi_cow_table_read
> 20/01/13 16:43:43 INFO HoodieTableMetaClient: Loading Active commit timeline 
> for file:///tmp/hudi_cow_table_read
> 20/01/13 16:43:43 INFO HoodieActiveTimeline: Loaded instants 
> [[20200109181330__deltacommit__COMPLETED], 
> [2020011017__deltacommit__COMPLETED], 
> [20200110171526__deltacommit__COMPLETED], 
> [20200113105844__deltacommit__COMPLETED], 
> [20200113145851__deltacommit__COMPLETED], 
> [20200113155502__deltacommit__COMPLETED], 
> [20200113164342__deltacommit__COMPLETED]]
> 20/01/13 16:43:43 INFO HoodieRealtimeTableCompactor: Compacting 
> file:///tmp/hudi_cow_table_read with commit 20200113164343
> 20/01/13 16:43:43 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient 
> from file:///tmp/hudi_cow_table_read
> 20/01/13 16:43:43 INFO FSUtils: Hadoop Configuration: fs.defaultFS: 
> [file:///], Config:[Configuration: core-default.xml, core-site.xml, 
> mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, 
> hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml], FileSystem: 
> [org.apache.hadoop.fs.LocalFileSystem@6a24b9e2]
> 20/01/13 16:43:43 INFO HoodieTableConfig: Loading table properties from 
> file:/tmp/hudi_cow_table_read/.hoodie/hoodie.properties
> 20/01/13 16:43:43 INFO HoodieTableMetaClient: Finished Loading Table of type 
> MERGE_ON_READ(version=org.apache.hudi.common.model.TimelineLayoutVersion@20) 
> from file:///tmp/hudi_cow_table_read
> 20/01/13 16:43:43 INFO HoodieTableMetaClient: Loading Active commit timeline 
> for file:///tmp/hudi_cow_table_read
> 20/01/13 16:43:43 INFO HoodieActiveTimeline: Loaded instants 
> [[20200109181330__deltacommit__COMPLETED], 
> [2020011017__deltacommit__COMPLETED], 
> [20200110171526__deltacommit__COMPLETED], 
> [20200113105844__deltacommit__COMPLETED], 
> [20200113145851__deltacommit__COMPLETED], 
> [20200113155502__deltacommit__COMPLETED], 
> [20200113164342__deltacommit__COMPLETED]]
> {code} 
> no compaction instant is recorded in the .hoodie path



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-hudi] wangxianghu closed pull request #1217: [HUDI-397] Normalize log print statement

2020-01-13 Thread GitBox
wangxianghu closed pull request #1217: [HUDI-397] Normalize log print statement
URL: https://github.com/apache/incubator-hudi/pull/1217
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] liujianhuiouc opened a new pull request #1218: [HUDI-526] add parition meta file in HoodieAppendHandle

2020-01-13 Thread GitBox
liujianhuiouc opened a new pull request #1218: [HUDI-526] add parition meta 
file in HoodieAppendHandle
URL: https://github.com/apache/incubator-hudi/pull/1218
 
 
   compaction only retrieves partition paths that have a partition meta file at 
that path
   
   so this change adds the partition meta file in the partition path
   ```
   -rw-r--r--   1 liujianhui  wheel2576  1 10 17:00 
.5e8241a6-7844-4c8e-8428-966438424640-0_20200109181330.log.2_2-149-210
   -rw-r--r--   1 liujianhui  wheel2886  1 10 17:15 
.5e8241a6-7844-4c8e-8428-966438424640-0_20200109181330.log.3_1-160-226
   -rw-r--r--   1 liujianhui  wheel2571  1 13 10:58 
.5e8241a6-7844-4c8e-8428-966438424640-0_20200109181330.log.4_1-7-12
   -rw-r--r--   1 liujianhui  wheel2572  1 13 14:58 
.5e8241a6-7844-4c8e-8428-966438424640-0_20200109181330.log.5_1-7-12
   -rw-r--r--   1 liujianhui  wheel2572  1 13 15:55 
.5e8241a6-7844-4c8e-8428-966438424640-0_20200109181330.log.6_1-7-12
   -rw-r--r--   1 liujianhui  wheel2576  1 13 16:43 
.5e8241a6-7844-4c8e-8428-966438424640-0_20200109181330.log.7_1-18-29
   -rw-r--r--   1 liujianhui  wheel2571  1 13 19:09 
.5e8241a6-7844-4c8e-8428-966438424640-0_20200109181330.log.8_1-7-12
   -rw-r--r--   1 liujianhui  wheel  93  1 13 19:09 
.hoodie_partition_metadata
   -rw-r--r--   1 liujianhui  wheel  439809  1 13 19:09 
5e8241a6-7844-4c8e-8428-966438424640-0_0-12-20_20200113190955.parquet
   ```
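
   For context, a hedged sketch of why the meta file matters (path and check are illustrative; per the description above, compaction-side partition discovery only considers directories carrying the `.hoodie_partition_metadata` marker shown in the listing):
   ```java
   import org.apache.hadoop.conf.Configuration;
   import org.apache.hadoop.fs.FileSystem;
   import org.apache.hadoop.fs.Path;

   public class PartitionCheck {
     public static void main(String[] args) throws Exception {
       FileSystem fs = FileSystem.get(new Configuration());
       Path partition = new Path("/tmp/hudi_cow_table_read"); // placeholder partition path
       // Without this marker file, log files written by HoodieAppendHandle
       // would sit in a directory that discovery does not treat as a partition.
       boolean isHudiPartition = fs.exists(new Path(partition, ".hoodie_partition_metadata"));
       System.out.println(isHudiPartition);
     }
   }
   ```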
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Updated] (HUDI-397) Normalize log print statement

2020-01-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-397:

Labels: pull-request-available  (was: )

> Normalize log print statement
> -
>
> Key: HUDI-397
> URL: https://issues.apache.org/jira/browse/HUDI-397
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>  Components: Testing
>Reporter: vinoyang
>Assignee: wangxianghu
>Priority: Major
>  Labels: pull-request-available
>
> In the test suite module, there are many logging statements that look like this 
> pattern:
> {code:java}
> log.info(String.format("- inserting input data %s 
> --", this.getName()));
> {code}
> IMO, it's not a good design. We need to refactor it.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-hudi] wangxianghu opened a new pull request #1217: [HUDI-397] Normalize log print statement

2020-01-13 Thread GitBox
wangxianghu opened a new pull request #1217: [HUDI-397] Normalize log print 
statement
URL: https://github.com/apache/incubator-hudi/pull/1217
 
 
   ## What is the purpose of the pull request
   
   *Normalize log print statement*
   *Redo hudi-test-suite log statements using SLF4J*
   
   ## Brief change log
   
   *Normalize log print statement*
   *Redo hudi-test-suite log statements using SLF4J*
   
   ## Verify this pull request
   
   This pull request should be covered by existing tests.
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Updated] (HUDI-397) Normalize log print statement

2020-01-13 Thread wangxianghu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wangxianghu updated HUDI-397:
-
Status: In Progress  (was: Open)

> Normalize log print statement
> -
>
> Key: HUDI-397
> URL: https://issues.apache.org/jira/browse/HUDI-397
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>  Components: Testing
>Reporter: vinoyang
>Assignee: wangxianghu
>Priority: Major
>
> In the test suite module, there are many logging statements that look like this 
> pattern:
> {code:java}
> log.info(String.format("- inserting input data %s 
> --", this.getName()));
> {code}
> IMO, it's not a good design. We need to refactor it.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-397) Normalize log print statement

2020-01-13 Thread wangxianghu (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17014226#comment-17014226
 ] 

wangxianghu commented on HUDI-397:
--

Hi [~jotarada], 3 days have passed since you asked for this issue, and I haven't 
received any reply from you. I am not sure whether you are still focused on this 
issue. So, in order to fix this issue in time, I picked it up again. Sorry for 
the inconvenience; I am sure you can find another issue to practice on.

> Normalize log print statement
> -
>
> Key: HUDI-397
> URL: https://issues.apache.org/jira/browse/HUDI-397
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>  Components: Testing
>Reporter: vinoyang
>Assignee: wangxianghu
>Priority: Major
>
> In the test suite module, there are many logging statements that look like this 
> pattern:
> {code:java}
> log.info(String.format("- inserting input data %s 
> --", this.getName()));
> {code}
> IMO, it's not a good design. We need to refactor it.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HUDI-526) inline compact not work

2020-01-13 Thread liujianhui (Jira)
liujianhui created HUDI-526:
---

 Summary: inline compact not work
 Key: HUDI-526
 URL: https://issues.apache.org/jira/browse/HUDI-526
 Project: Apache Hudi (incubating)
  Issue Type: Bug
  Components: Compaction
Reporter: liujianhui


hoodie.compact.inline set as true

hoodie.index.type set as INMEMORY

 

compaction does not occur after delta commit

{code}

20/01/13 16:43:43 INFO HoodieMergeOnReadTable: Checking if compaction needs to 
be run on file:///tmp/hudi_cow_table_read
20/01/13 16:43:43 INFO HoodieMergeOnReadTable: Compacting merge on read table 
file:///tmp/hudi_cow_table_read
20/01/13 16:43:43 INFO FileSystemViewManager: Creating InMemory based view for 
basePath file:///tmp/hudi_cow_table_read
20/01/13 16:43:43 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient 
from file:///tmp/hudi_cow_table_read
20/01/13 16:43:43 INFO FSUtils: Hadoop Configuration: fs.defaultFS: [file:///], 
Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, 
mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, 
hdfs-site.xml, __spark_hadoop_conf__.xml], FileSystem: 
[org.apache.hadoop.fs.LocalFileSystem@6a24b9e2]
20/01/13 16:43:43 INFO HoodieTableConfig: Loading table properties from 
file:/tmp/hudi_cow_table_read/.hoodie/hoodie.properties
20/01/13 16:43:43 INFO HoodieTableMetaClient: Finished Loading Table of type 
MERGE_ON_READ(version=org.apache.hudi.common.model.TimelineLayoutVersion@20) 
from file:///tmp/hudi_cow_table_read
20/01/13 16:43:43 INFO HoodieTableMetaClient: Loading Active commit timeline 
for file:///tmp/hudi_cow_table_read
20/01/13 16:43:43 INFO HoodieActiveTimeline: Loaded instants 
[[20200109181330__deltacommit__COMPLETED], 
[2020011017__deltacommit__COMPLETED], 
[20200110171526__deltacommit__COMPLETED], 
[20200113105844__deltacommit__COMPLETED], 
[20200113145851__deltacommit__COMPLETED], 
[20200113155502__deltacommit__COMPLETED], 
[20200113164342__deltacommit__COMPLETED]]
20/01/13 16:43:43 INFO HoodieRealtimeTableCompactor: Compacting 
file:///tmp/hudi_cow_table_read with commit 20200113164343
20/01/13 16:43:43 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient 
from file:///tmp/hudi_cow_table_read
20/01/13 16:43:43 INFO FSUtils: Hadoop Configuration: fs.defaultFS: [file:///], 
Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, 
mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, 
hdfs-site.xml, __spark_hadoop_conf__.xml], FileSystem: 
[org.apache.hadoop.fs.LocalFileSystem@6a24b9e2]
20/01/13 16:43:43 INFO HoodieTableConfig: Loading table properties from 
file:/tmp/hudi_cow_table_read/.hoodie/hoodie.properties
20/01/13 16:43:43 INFO HoodieTableMetaClient: Finished Loading Table of type 
MERGE_ON_READ(version=org.apache.hudi.common.model.TimelineLayoutVersion@20) 
from file:///tmp/hudi_cow_table_read
20/01/13 16:43:43 INFO HoodieTableMetaClient: Loading Active commit timeline 
for file:///tmp/hudi_cow_table_read
20/01/13 16:43:43 INFO HoodieActiveTimeline: Loaded instants 
[[20200109181330__deltacommit__COMPLETED], 
[2020011017__deltacommit__COMPLETED], 
[20200110171526__deltacommit__COMPLETED], 
[20200113105844__deltacommit__COMPLETED], 
[20200113145851__deltacommit__COMPLETED], 
[20200113155502__deltacommit__COMPLETED], 
[20200113164342__deltacommit__COMPLETED]]

{code} 

No compaction instant is recorded in the .hoodie path.
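For context, inline compaction on a merge-on-read table is driven by the write
config. Below is a minimal sketch of enabling it through the client API (builder
names from the Hudi write client; the delta-commit trigger threshold is
version-dependent, so it is set explicitly here, and all values are
illustrative, not the reporter's exact setup):

{code}
import org.apache.hudi.config.HoodieCompactionConfig;
import org.apache.hudi.config.HoodieWriteConfig;

// Enable inline compaction and compact after every delta commit,
// so the trigger condition is easy to hit.
HoodieWriteConfig writeConfig = HoodieWriteConfig.newBuilder()
    .withPath("file:///tmp/hudi_cow_table_read")
    .withCompactionConfig(HoodieCompactionConfig.newBuilder()
        .withInlineCompaction(true)                  // hoodie.compact.inline = true
        .withMaxNumDeltaCommitsBeforeCompaction(1)   // compact after each delta commit
        .build())
    .build();
{code}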



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-hudi] leesf commented on issue #1216: [HUDI-525]

2020-01-13 Thread GitBox
leesf commented on issue #1216: [HUDI-525]
URL: https://github.com/apache/incubator-hudi/pull/1216#issuecomment-573590408
 
 
   Thanks for opening the contribution @liujianhuiouc! Would you please change
the title and follow the guide at
https://hudi.apache.org/contributing.html#life-of-a-contributor?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] OpenOpened closed pull request #1200: [HUDI-514] A schema provider to get metadata through Jdbc

2020-01-13 Thread GitBox
OpenOpened closed pull request #1200: [HUDI-514] A schema provider to get 
metadata through Jdbc
URL: https://github.com/apache/incubator-hudi/pull/1200
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] OpenOpened opened a new pull request #1200: [HUDI-514] A schema provider to get metadata through Jdbc

2020-01-13 Thread GitBox
OpenOpened opened a new pull request #1200: [HUDI-514] A schema provider to get 
metadata through Jdbc
URL: https://github.com/apache/incubator-hudi/pull/1200
 
 
   ## What is the purpose of the pull request
   
   In our production environment we usually need to synchronize data from
MySQL and, at the same time, obtain the schema from the database, so I
submitted this PR. It adds a schema provider that obtains metadata through
JDBC and, by design, calls Spark's JDBC-related methods. This also keeps the
schema uniform between reading historical data via Spark JDBC and
synchronizing that data with the delta streamer.
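   For illustration of the underlying idea (the connection details below are
made up, and this is a sketch rather than the code added by this PR): Spark's
JDBC source can resolve the table schema, which can then be converted to an
Avro schema for use by the delta streamer.

   ```
   import org.apache.spark.sql.SparkSession;
   import org.apache.spark.sql.types.StructType;

   // Let Spark's JDBC data source resolve the relational schema. No action is
   // executed on the DataFrame, so only metadata is fetched here.
   SparkSession spark = SparkSession.builder().appName("jdbc-schema").getOrCreate();
   StructType structType = spark.read()
       .format("jdbc")
       .option("url", "jdbc:mysql://host:3306/mydb")
       .option("dbtable", "my_table")
       .option("user", "reader")
       .option("password", "secret")
       .load()
       .schema();
   ```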
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Updated] (HUDI-525) inserts info miss in delta_commit_inflight meta file

2020-01-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-525:

Labels: pull-request-available  (was: )

> inserts info miss in delta_commit_inflight meta file
> 
>
> Key: HUDI-525
> URL: https://issues.apache.org/jira/browse/HUDI-525
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>Reporter: liujianhui
>Priority: Minor
>  Labels: pull-request-available
>
> should add the insert info in WorkloadStat
> {code}
> private void saveWorkloadProfileMetadataToInflight(WorkloadProfile profile,
>     HoodieTable table, String commitTime) throws HoodieCommitException {
>   try {
>     HoodieCommitMetadata metadata = new HoodieCommitMetadata();
>     profile.getPartitionPaths().forEach(path -> {
>       WorkloadStat partitionStat = profile.getWorkloadStat(path.toString());
>       partitionStat.getUpdateLocationToCount().forEach((key, value) -> {
>         HoodieWriteStat writeStat = new HoodieWriteStat();
>         writeStat.setFileId(key);
>         // TODO : Write baseCommitTime is possible here ?
>         writeStat.setPrevCommit(value.getKey());
>         writeStat.setNumUpdateWrites(value.getValue());
>         metadata.addWriteStat(path.toString(), writeStat);
>       });
>     });
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-hudi] liujianhuiouc opened a new pull request #1216: [HUDI-525]

2020-01-13 Thread GitBox
liujianhuiouc opened a new pull request #1216: [HUDI-525]
URL: https://github.com/apache/incubator-hudi/pull/1216
 
 
   Add the insert info, with the number of inserted records, to the
HoodieCommitMetadata. Because the file id is unknown at this point, it is set
to an empty string.
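   A minimal sketch of the shape of that change (accessor names are assumed
from the snippet quoted in HUDI-525, not copied from the actual diff):

   ```
   // Record the insert count in the inflight commit metadata; the file id is
   // left empty because no file has been assigned yet at this stage.
   HoodieWriteStat insertStat = new HoodieWriteStat();
   insertStat.setFileId("");
   insertStat.setNumInserts(partitionStat.getNumInserts());
   metadata.addWriteStat(path.toString(), insertStat);
   ```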
   
Testing result:
   ```
   
 "partitionToWriteStats" : {
   "americas/brazil/sao_paulo" : [ {
 "fileId" : "",
 "path" : null,
 "prevCommit" : null,
 "numWrites" : 0,
 "numDeletes" : 0,
 "numUpdateWrites" : 0,
 "numInserts" : 3,
 "totalWriteBytes" : 0,
 "totalWriteErrors" : 0,
 "tempPath" : null,
 "partitionPath" : null,
 "totalLogRecords" : 0,
 "totalLogFilesCompacted" : 0,
 "totalLogSizeCompacted" : 0,
 "totalUpdatedRecordsCompacted" : 0,
 "totalLogBlocks" : 0,
 "totalCorruptLogBlock" : 0,
 "totalRollbackBlocks" : 0,
 "fileSizeInBytes" : 0
   } ],
   ```


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] zhedoubushishi commented on a change in pull request #1109: [HUDI-238] - Migrating to Scala 2.12

2020-01-13 Thread GitBox
zhedoubushishi commented on a change in pull request #1109: [HUDI-238] - 
Migrating to Scala 2.12
URL: https://github.com/apache/incubator-hudi/pull/1109#discussion_r365680930
 
 

 ##
 File path: hudi-utilities/pom.xml
 ##
 @@ -28,14 +28,52 @@
 
   <properties>
     <main.basedir>${project.parent.basedir}</main.basedir>
+    <kafka.version>2.0.0</kafka.version>
+    <scala.binary.version>2.12</scala.binary.version>
   </properties>
 
   <build>
+
+    <pluginManagement>
+      <plugins>
+        <plugin>
+          <groupId>net.alchim31.maven</groupId>
+          <artifactId>scala-maven-plugin</artifactId>
+          <version>${scala-maven-plugin.version}</version>
+        </plugin>
+        <plugin>
+          <groupId>org.apache.maven.plugins</groupId>
+          <artifactId>maven-compiler-plugin</artifactId>
+        </plugin>
+      </plugins>
+    </pluginManagement>
+
     <plugins>
       <plugin>
         <groupId>org.jacoco</groupId>
         <artifactId>jacoco-maven-plugin</artifactId>
       </plugin>
+      <plugin>
+        <groupId>net.alchim31.maven</groupId>
+        <artifactId>scala-maven-plugin</artifactId>
+        <executions>
+          <execution>
+            <id>scala-compile-first</id>
+            <phase>process-resources</phase>
+            <goals>
+              <goal>add-source</goal>
+              <goal>compile</goal>
+            </goals>
+          </execution>
+          <execution>
+            <id>scala-test-compile</id>
+            <phase>process-test-resources</phase>
+            <goals>
+              <goal>testCompile</goal>
+            </goals>
+          </execution>
+        </executions>
+      </plugin>
     </plugins>
 Review comment:
   > hi @zhedoubushishi, I couldn't find a way to use
> `spark-streaming-kafka-0-10_2.12`, because this jar does not include the test
> classes. That is why I copied and adapted those 3 Scala files
> (`KafkaTestUtils.scala`, `ShutdownHookManager.scala` and `Utils.scala`). As far
> as my Maven knowledge goes, the only way to use
> `https://github.com/apache/spark/blob/master/external/kafka-0-10/src/test/scala/org/apache/spark/streaming/kafka010/KafkaTestUtils.scala`
> would be to build a test-jar as described here:
> https://maven.apache.org/plugins/maven-jar-plugin/examples/create-test-jar.html.
> I can't find such a jar anywhere.
   > Do you have other suggestions, or maybe I'm missing something?
   
   Try adding this snippet to hudi-utilities/pom.xml. It works for me.
   
   ```
   <dependency>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-streaming-kafka-0-10_2.12</artifactId>
     <version>${spark.version}</version>
     <classifier>tests</classifier>
   </dependency>
   ```
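   For what it's worth, the `tests` classifier is what resolves the module's
published test-jar (the kind of artifact the maven-jar-plugin page linked above
describes building), so the copied Scala test utilities should no longer be
needed.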


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Commented] (HUDI-295) Do one-time cleanup of Hudi git history

2020-01-13 Thread Pratyaksh Sharma (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17014104#comment-17014104
 ] 

Pratyaksh Sharma commented on HUDI-295:
---

[~vbalaji] A few of my commits are not getting counted in my contributions
([https://github.com/apache/incubator-hudi/graphs/contributors]), in the sense
that the number of commits shown there does not match my actual number of
commits ([https://github.com/apache/incubator-hudi/commits?author=pratyakshsharma]).
The reason is that those commits were made with a different email id
(pratyakshsharma@.local), which GitHub does not take into account when
counting contributions.

So whenever we update this git history, I would like to get my email id changed
for those few commits. I contacted GitHub support, and they have already shared
a script with me for doing this.

Please let me know in case of any questions. 

> Do one-time cleanup of Hudi git history
> ---
>
> Key: HUDI-295
> URL: https://issues.apache.org/jira/browse/HUDI-295
> Project: Apache Hudi (incubating)
>  Issue Type: Task
>  Components: Docs
>Reporter: Vinoth Chandar
>Assignee: Vinoth Chandar
>Priority: Major
> Fix For: 0.5.1
>
>
> https://lists.apache.org/thread.html/dc6eb516e248088dac1a2b5c9690383dfe2eb3912f76bbe9dd763c2b@



--
This message was sent by Atlassian Jira
(v8.3.4#803005)