[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1200: [HUDI-514] A schema provider to get metadata through Jdbc
vinothchandar commented on a change in pull request #1200: [HUDI-514] A schema provider to get metadata through Jdbc
URL: https://github.com/apache/incubator-hudi/pull/1200#discussion_r366183041

## File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/UtilHelpers.java

    @@ -236,4 +250,57 @@ public static TypedProperties readConfig(InputStream in) throws IOException {
        defaults.load(in);
        return defaults;
      }

      /***
       * call spark function get the schema through jdbc.
       * @param options
       * @return
       * @throws Exception
       */
      public static Schema getSchema(Map options) throws Exception {
        scala.collection.immutable.Map ioptions = toScalaImmutableMap(options);
        JDBCOptions jdbcOptions = new JDBCOptions(ioptions);
        Connection conn = JdbcUtils.createConnectionFactory(jdbcOptions).apply();
        String url = jdbcOptions.url();
        String table = jdbcOptions.tableOrQuery();
        JdbcOptionsInWrite jdbcOptionsInWrite = new JdbcOptionsInWrite(ioptions);
        boolean tableExists = JdbcUtils.tableExists(conn, jdbcOptionsInWrite);
        if (tableExists) {
          JdbcDialect dialect = JdbcDialects.get(url);
          try {
            PreparedStatement statement = conn.prepareStatement(dialect.getSchemaQuery(table));
            try {
              statement.setQueryTimeout(Integer.parseInt(options.get("timeout")));
              ResultSet rs = statement.executeQuery();
              try {
                StructType structType;
                if (Boolean.parseBoolean(ioptions.get("nullable").get())) {
                  structType = JdbcUtils.getSchema(rs, dialect, true);
                } else {
                  structType = JdbcUtils.getSchema(rs, dialect, false);
                }
                return AvroConversionUtils.convertStructTypeToAvroSchema(structType, table, "hoodie." + table);
              } finally {
                rs.close();
              }
            } finally {
              statement.close();
            }
          } finally {
            conn.close();
          }
        } else {
          throw new HoodieException(String.format("%s table not exists!", table));
        }
      }

Review comment: change to `table does not exist!`?

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
With regards, Apache Git Services
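The nested try/finally chain in the quoted hunk can be collapsed with try-with-resources, which closes resources in reverse declaration order automatically. A minimal, hedged sketch of the pattern — `Resource` is a stand-in for `Connection`/`PreparedStatement`/`ResultSet`, since the real JDBC objects need a live database:

```java
import java.util.ArrayList;
import java.util.List;

public class TryWithResourcesSketch {

    // Stand-in for a JDBC resource (Connection, PreparedStatement, ResultSet).
    static class Resource implements AutoCloseable {
        final String name;
        final List<String> log;

        Resource(String name, List<String> log) {
            this.name = name;
            this.log = log;
        }

        @Override
        public void close() {
            this.log.add("closed " + this.name);
        }
    }

    static List<String> run() {
        List<String> log = new ArrayList<>();
        // Resources declared in the header are closed in reverse order when the
        // block exits, normally or via exception -- no explicit finally needed.
        try (Resource conn = new Resource("conn", log);
             Resource stmt = new Resource("stmt", log);
             Resource rs = new Resource("rs", log)) {
            log.add("read schema");
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(run());
        // prints [read schema, closed rs, closed stmt, closed conn]
    }
}
```

The same shape applied to the hunk would wrap `conn`, `statement`, and `rs` in one try header and drop the three finally blocks.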
[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1200: [HUDI-514] A schema provider to get metadata through Jdbc
vinothchandar commented on a change in pull request #1200: [HUDI-514] A schema provider to get metadata through Jdbc
URL: https://github.com/apache/incubator-hudi/pull/1200#discussion_r366183326

## File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/UtilHelpers.java

    @@ -236,4 +250,57 @@ (the quoted hunk is the same `getSchema` addition; the comment is on its tail)

      @SuppressWarnings("unchecked")
      private static scala.collection.immutable.Map toScalaImmutableMap(java.util.Map javaMap) {

Review comment: import the java collection classes? `Map`, `List`, `ArrayList`?
[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1200: [HUDI-514] A schema provider to get metadata through Jdbc
vinothchandar commented on a change in pull request #1200: [HUDI-514] A schema provider to get metadata through Jdbc
URL: https://github.com/apache/incubator-hudi/pull/1200#discussion_r366184290

## File path: hudi-utilities/src/test/resources/delta-streamer-config/source-jdbc.avsc

    @@ -0,0 +1,59 @@ +/*

Review comment: any reason why the existing `source.avsc` won't work for you? Like to avoid creating new schema if possible
[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1200: [HUDI-514] A schema provider to get metadata through Jdbc
vinothchandar commented on a change in pull request #1200: [HUDI-514] A schema provider to get metadata through Jdbc
URL: https://github.com/apache/incubator-hudi/pull/1200#discussion_r366182135

## File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/UtilHelpers.java

    @@ -236,4 +250,57 @@ public static TypedProperties readConfig(InputStream in) throws IOException {
        defaults.load(in);
        return defaults;
      }

      /***
       * call spark function get the schema through jdbc.
       * @param options
       * @return
       * @throws Exception
       */
      public static Schema getSchema(Map options) throws Exception {

Review comment: rename to `getJDBCSchema`?
[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1200: [HUDI-514] A schema provider to get metadata through Jdbc
vinothchandar commented on a change in pull request #1200: [HUDI-514] A schema provider to get metadata through Jdbc
URL: https://github.com/apache/incubator-hudi/pull/1200#discussion_r366183554

## File path: hudi-utilities/src/test/java/org/apache/hudi/utilities/TestHoodieDeltaStreamer.java

    @@ -511,6 +524,22 @@ public void testNullSchemaProvider() throws Exception { } }

      @Test
      public void testJdbcbasedSchemaProvider() throws Exception {

Review comment: can we create a separate test class for this? given you are only testing the schema provider?
[GitHub] [incubator-hudi] zhedoubushishi opened a new pull request #1226: [HUDI-238] Make Hudi support Scala 2.12
zhedoubushishi opened a new pull request #1226: [HUDI-238] Make Hudi support Scala 2.12
URL: https://github.com/apache/incubator-hudi/pull/1226

## *Tips*
- *Thank you very much for contributing to Apache Hudi.*
- *Please review https://hudi.apache.org/contributing.html before opening a pull request.*

## What is the purpose of the pull request

*(For example: This pull request adds quick-start document.)*

## Brief change log

Most ideas of this PR are pretty similar to https://github.com/apache/incubator-hudi/pull/1109. This PR is also compatible with Scala 2.12. You can build it with:

```
mvn clean install -Dscala.version=2.12.10 -Dscala.binary.version=2.12
```

Here are some major differences from https://github.com/apache/incubator-hudi/pull/1109:
- Updated kafka-source.properties & kafka-source.properties.
- The parameter `ConsumerConfig.GROUP_ID_CONFIG` is defined in `TestKafkaSource.java` rather than in `KafkaOffsetGen.java`, because this config should be decided by the client side, not the Hudi side.
- For `AvroKafkaSource.java`, `KafkaAvroDeserializer.class` needs to be set:

```
props.put("key.deserializer", StringDeserializer.class);
props.put("value.deserializer", KafkaAvroDeserializer.class);
```

## Verify this pull request

This pull request is already covered by existing tests.

## Committer checklist
- [x] Has a corresponding JIRA in PR title & commit
- [x] Commit message is descriptive of the change
- [ ] CI is green
- [ ] Necessary doc changes done or have another open PR
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
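The deserializer wiring described above can be sketched in plain Java. This is a hedged illustration, not the PR's code: class-name strings stand in for the `StringDeserializer.class` / `KafkaAvroDeserializer.class` literals (which require the Kafka client and Confluent serializer jars on the classpath), and `group.id` is passed in by the caller, matching the PR's point that `ConsumerConfig.GROUP_ID_CONFIG` belongs on the client side:

```java
import java.util.Properties;

public class KafkaAvroSourceProps {

    // Builds consumer properties for an Avro Kafka source. The class names are
    // the standard Kafka/Confluent ones referenced in the PR description.
    static Properties avroConsumerProps(String groupId) {
        Properties props = new Properties();
        // group.id comes from the caller (client side), not from KafkaOffsetGen
        props.put("group.id", groupId);
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "io.confluent.kafka.serializers.KafkaAvroDeserializer");
        return props;
    }

    public static void main(String[] args) {
        Properties props = avroConsumerProps("hudi-delta-streamer"); // hypothetical group id
        System.out.println(props.getProperty("value.deserializer"));
        // prints io.confluent.kafka.serializers.KafkaAvroDeserializer
    }
}
```

With the real jars on the classpath, the two `put` calls would take the class literals directly, as shown in the PR snippet.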
[GitHub] [incubator-hudi] OpenOpened commented on issue #1200: [HUDI-514] A schema provider to get metadata through Jdbc
OpenOpened commented on issue #1200: [HUDI-514] A schema provider to get metadata through Jdbc
URL: https://github.com/apache/incubator-hudi/pull/1200#issuecomment-574033535

@vinothchandar Please review. The main changes:
1. Added test cases
2. All logic is implemented in Java
3. The JDBC code logic references Spark 2.4.4 and Spark 3.x
[jira] [Updated] (HUDI-531) Add java doc for hudi test suite general classes
[ https://issues.apache.org/jira/browse/HUDI-531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

vinoyang updated HUDI-531:
--------------------------
Description: Currently, the general classes (under src/main dir) has no java docs. We should add doc for those classes. (was: Currently, the general classes (under src dir) has no java docs. We should add doc for those classes.)

> Add java doc for hudi test suite general classes
> ------------------------------------------------
>
>          Key: HUDI-531
>          URL: https://issues.apache.org/jira/browse/HUDI-531
>      Project: Apache Hudi (incubating)
>   Issue Type: Sub-task
>   Components: Testing
>     Reporter: vinoyang
>     Assignee: wangxianghu
>     Priority: Major
>
> Currently, the general classes (under src/main dir) has no java docs. We should add doc for those classes.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HUDI-532) Add java doc for hudi test suite test classes
vinoyang created HUDI-532:
--------------------------

     Summary: Add java doc for hudi test suite test classes
         Key: HUDI-532
         URL: https://issues.apache.org/jira/browse/HUDI-532
     Project: Apache Hudi (incubating)
  Issue Type: Sub-task
    Reporter: vinoyang
    Assignee: wangxianghu

Currently, the test classes (under test/java dir) has no java docs. We should add more doc for those classes.
[jira] [Created] (HUDI-531) Add java doc for hudi test suite general classes
vinoyang created HUDI-531:
--------------------------

     Summary: Add java doc for hudi test suite general classes
         Key: HUDI-531
         URL: https://issues.apache.org/jira/browse/HUDI-531
     Project: Apache Hudi (incubating)
  Issue Type: Sub-task
  Components: Testing
    Reporter: vinoyang
    Assignee: wangxianghu

Currently, the general classes (under src dir) has no java docs. We should add doc for those classes.
[GitHub] [incubator-hudi] vinothchandar commented on issue #1157: [HUDI-332]Add operation type (insert/upsert/bulkinsert/delete) to HoodieCommitMetadata
vinothchandar commented on issue #1157: [HUDI-332]Add operation type (insert/upsert/bulkinsert/delete) to HoodieCommitMetadata URL: https://github.com/apache/incubator-hudi/pull/1157#issuecomment-574027065 @hddong thanks! avro works based on field positions, so reordering them was my concern. Thanks for addressing this. over to @bvaradar
[GitHub] [incubator-hudi] vinothchandar commented on issue #1208: [HUDI-304] Bring back spotless plugin
vinothchandar commented on issue #1208: [HUDI-304] Bring back spotless plugin
URL: https://github.com/apache/incubator-hudi/pull/1208#issuecomment-574024869

> document that developers could use checkstyle.xml file in style folder in checkstyle plugin and things will go well

I was able to use checkstyle to format in IntelliJ. This is fine, but we should clearly document this. Maybe file a JIRA? On import order, we can take a second stab maybe down the line? Again, filing a JIRA would be great for tracking.

On this PR, my concern was that we are reformatting again due to the 120 character limit? I was trying to see if we can avoid it. @leesf could you explain why 100+ files are being touched in this PR? If these were all checkstyle failures, then master would be broken, right? I am just trying to understand what code really changed here, given we are close to a release.
[GitHub] [incubator-hudi] bhasudha commented on issue #1225: [MINOR] Adding util methods to assist in adding deletion support to Quick Start
bhasudha commented on issue #1225: [MINOR] Adding util methods to assist in adding deletion support to Quick Start URL: https://github.com/apache/incubator-hudi/pull/1225#issuecomment-574024580 @nsivabalan will merge this once you are able to verify this method with quickstart steps.
[jira] [Updated] (HUDI-503) Add hudi test suite documentation into the README file of the test suite module
[ https://issues.apache.org/jira/browse/HUDI-503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

vinoyang updated HUDI-503:
--------------------------
Status: Open (was: New)

> Add hudi test suite documentation into the README file of the test suite module
> -------------------------------------------------------------------------------
>
>          Key: HUDI-503
>          URL: https://issues.apache.org/jira/browse/HUDI-503
>      Project: Apache Hudi (incubating)
>   Issue Type: Sub-task
>   Components: Docs
>     Reporter: vinoyang
>     Assignee: vinoyang
>     Priority: Major
>       Labels: pull-request-available
>   Time Spent: 10m
>   Remaining Estimate: 0h
[jira] [Updated] (HUDI-503) Add hudi test suite documentation into the README file of the test suite module
[ https://issues.apache.org/jira/browse/HUDI-503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

vinoyang updated HUDI-503:
--------------------------
Status: In Progress (was: Open)

> Add hudi test suite documentation into the README file of the test suite module
> -------------------------------------------------------------------------------
>
>          Key: HUDI-503
>          URL: https://issues.apache.org/jira/browse/HUDI-503
>      Project: Apache Hudi (incubating)
>   Issue Type: Sub-task
>   Components: Docs
>     Reporter: vinoyang
>     Assignee: vinoyang
>     Priority: Major
>       Labels: pull-request-available
>   Time Spent: 10m
>   Remaining Estimate: 0h
[GitHub] [incubator-hudi] vinothchandar commented on issue #1223: [HUDI-530] Fix conversion of Spark struct type to Avro schema
vinothchandar commented on issue #1223: [HUDI-530] Fix conversion of Spark struct type to Avro schema URL: https://github.com/apache/incubator-hudi/pull/1223#issuecomment-574015931 @umehrot2 is this in any way related to the quickstart breakage that @nsivabalan reported?
[GitHub] [incubator-hudi] vinothchandar commented on issue #1225: Adding util methods to assist in adding deletion support to Quick Start
vinothchandar commented on issue #1225: Adding util methods to assist in adding deletion support to Quick Start URL: https://github.com/apache/incubator-hudi/pull/1225#issuecomment-574005188 @nsivabalan can you add a `[MINOR]` prefix to your commit and PR?
[GitHub] [incubator-hudi] OpenOpened closed pull request #1200: [HUDI-514] A schema provider to get metadata through Jdbc
OpenOpened closed pull request #1200: [HUDI-514] A schema provider to get metadata through Jdbc URL: https://github.com/apache/incubator-hudi/pull/1200
[GitHub] [incubator-hudi] OpenOpened opened a new pull request #1200: [HUDI-514] A schema provider to get metadata through Jdbc
OpenOpened opened a new pull request #1200: [HUDI-514] A schema provider to get metadata through Jdbc
URL: https://github.com/apache/incubator-hudi/pull/1200

## What is the purpose of the pull request

In our production environment we usually need to synchronize data from MySQL and, at the same time, obtain the schema from the database, so I submitted this PR. It adds a schema provider that obtains metadata through JDBC and, by design, calls the Spark JDBC methods. This keeps the schema consistent whether you read historical data through Spark JDBC or synchronize data with the delta streamer.
[GitHub] [incubator-hudi] nsivabalan opened a new pull request #1225: Adding util methods to assist in adding deletion support to Quick Start
nsivabalan opened a new pull request #1225: Adding util methods to assist in adding deletion support to Quick Start
URL: https://github.com/apache/incubator-hudi/pull/1225

Adding util methods to assist in adding deletion support to Quick Start.

## Verify this pull request

Latest master has issues with the spark-avro dependency, so I couldn't verify. But this is not production code; it is only used in the Quick Start. This pull request is a trivial rework / code cleanup without any test coverage.

## Committer checklist
- [ ] Has a corresponding JIRA in PR title & commit
- [ ] Commit message is descriptive of the change
- [ ] CI is green
- [ ] Necessary doc changes done or have another open PR
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
[GitHub] [incubator-hudi] yanghua commented on issue #1100: [HUDI-289] Implement a test suite to support long running test for Hudi writing and querying end-end
yanghua commented on issue #1100: [HUDI-289] Implement a test suite to support long running test for Hudi writing and querying end-end
URL: https://github.com/apache/incubator-hudi/pull/1100#issuecomment-573978162

@n3nash This PR had conflicts, so I have rebased.
[jira] [Updated] (HUDI-523) Upgrade Hudi to Spark DataSource V2
[ https://issues.apache.org/jira/browse/HUDI-523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

hong dongdong updated HUDI-523:
-------------------------------
Description: May be need spark3 (was: As spark upgrade to 2.4, we can upgrade to datasource api v2 now.)

> Upgrade Hudi to Spark DataSource V2
> -----------------------------------
>
>          Key: HUDI-523
>          URL: https://issues.apache.org/jira/browse/HUDI-523
>      Project: Apache Hudi (incubating)
>   Issue Type: Improvement
>   Components: Writer Core
>     Reporter: hong dongdong
>     Priority: Major
>
> May be need spark3
[incubator-hudi] branch hudi_test_suite_refactor updated (09c34a0 -> 3dc85eb)
This is an automated email from the ASF dual-hosted git repository.

vinoyang pushed a change to branch hudi_test_suite_refactor
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git.

 omit 09c34a0 [HUDI-442] Fix TestComplexKeyGenerator#testSingleValueKeyGenerator and testMultipleValueKeyGenerator NPE
 omit 66463ff [MINOR] Fix compile error about the deletion of HoodieActiveTimeline#createNewCommitTime
 omit 1d2ecbc [HUDI-391] Rename module name from hudi-bench to hudi-test-suite and fix some checkstyle issues (#1102)
 omit 9b55d37 [HUDI-394] Provide a basic implementation of test suite
 add 8172197 Fix Error: java.lang.IllegalArgumentException: Can not create a Path from an empty string in HoodieCopyOnWrite#deleteFilesFunc (#1126)
 add 4b1b3fc [MINOR] Set info servity for ImportOrder temporarily (#1127)
 add 41f3677 [MINOR] fix typo
 add dd06660 [MINOR] fix typo
 add 94aec96 [minor] Fix few typos in the java docs (#1132)
 add 9c4217a [HUDI-389] Fixing Index look up to return right partitions for a given key along with fileId with Global Bloom (#1091)
 add 8affdf8 [HUDI-416] Improve hint information for cli (#1110)
 add 3c811ec [MINOR] fix typos
 add def18a5 [MINOR] optimize hudi timeline service (#1137)
 add 842eabb [HUDI-470] Fix NPE when print result via hudi-cli (#1138)
 add f20a130 [MINOR] typo fix (#1142)
 add 01c25d6 [MINOR] Update the java doc of HoodieTableType (#1148)
 add 58c5bed [HUDI-453] Fix throw failed to archive commits error when writing data to MOR/COW table
 add 179837e Fix checkstyle
 add 2f25416 Skip setting commit metadata
 add 8440482 Fix empty content clean plan
 add e4ea7a2 Update comment
 add 2a823f3 [MINOR]: alter some wrong params which bring fatal exception
 add ab6ae5c [HUDI-482] Fix missing @Override annotation on methods (#1156)
 add e637d9e [HUDI-455] Redo hudi-client log statements using SLF4J (#1145)
 add bb90ded [MINOR] Fix out of limits for results
 add 36c0e6b [MINOR] Fix out of limits for results
 add 74b00d1 trigger rebuild
 add 619f501 Clean up code
 add add4b1e Merge pull request #1143 from BigDataArtisans/outoflimit
 add 47c1f74 [HUDI-343]: Create a DOAP file for Hudi
 add 98c0d8c Merge pull request #1160 from smarthi/HUDI-343
 add dde21e7 [HUDI-402]: code clean up in test cases
 add e1e5fe3 [MINOR] Fix error usage of String.format (#1169)
 add ff1113f [HUDI-492]Fix show env all in hudi-cli
 add 290278f [HUDI-118]: Options provided for passing properties to Cleaner, compactor and importer commands
 add a733f4e [MINOR] Optimize hudi-cli module (#1136)
 add 726ae47 [MINOR]Optimize hudi-client module (#1139)
 add 7031445 [HUDI-377] Adding Delete() support to DeltaStreamer (#1073)
 add 28ccf8c [HUDI-484] Fix NPE when reading IncrementalPull.sqltemplate in HiveIncrementalPuller (#1167)
 add b9fab0b Revert "[HUDI-455] Redo hudi-client log statements using SLF4J (#1145)" (#1181)
 add 2d5b79d [HUDI-438] Merge duplicated code fragment in HoodieSparkSqlWriter (#1114)
 add 8f935e7 [HUDI-406]: added default partition path in TimestampBasedKeyGenerator
 add c78092d [HUDI-501] Execute docker/setup_demo.sh in any directory
 add 75c3f63 [HUDI-405] Remove HIVE_ASSUME_DATE_PARTITION_OPT_KEY config from DataSource
 add b5df672 [HUDI-464] Use Hive Exec Core for tests (#1125)
 add 8306f74 [HUDI-417] Refactor HoodieWriteClient so that commit logic can be shareable by both bootstrap and normal write operations (#1166)
 add 9706f65 [HUDI-508] Standardizing on "Table" instead of "Dataset" across code (#1197)
 add 9884972 [MINOR] Remove old jekyll config file (#1198)
 add aba8387 Update deprecated HBase API
 add 480fc78 [HUDI-319] Add a new maven profile to generate unified Javadoc for all Java and Scala classes (#1195)
 add d09eacd [HUDI-25] Optimize HoodieInputformat.listStatus() for faster Hive incremental queries on Hoodie
 add 5af3dc6 [HUDI-331]Fix java docs for all public apis in HoodieWriteClient (#)
 add 3c90d25 [HUDI-114]: added option to overwrite payload implementation in hoodie.properties file
 add 04afac9 [HUDI-248] CLI doesn't allow rolling back a Delta commit
 add b95367d [HUDI-469] Fix: HoodieCommitMetadata only show first commit insert rows.
 add e103165 [CLEAN] replace utf-8 constant with StandardCharsets.UTF_8
 add 017ee8e [MINOR] Fix partition typo (#1209)
 add d9675c4 [HUDI-522] Use the same version jcommander uniformly (#1214)
 add ad50008 [HUDI-91][HUDI-12]Migrate to spark 2.4.4, migrate to spark-avro library instead of databricks-avro, add support for Decimal/Date types
 add 971c7d4 [HUDI-322] DeltaSteamer should pick checkpoints off only deltacommits for MOR tables
 add a44c61b [HUDI-502] provide a custom time zone
[jira] [Assigned] (HUDI-523) Upgrade Hudi to Spark DataSource V2
[ https://issues.apache.org/jira/browse/HUDI-523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

hong dongdong reassigned HUDI-523:
----------------------------------
Assignee: (was: hong dongdong)

> Upgrade Hudi to Spark DataSource V2
> -----------------------------------
>
>          Key: HUDI-523
>          URL: https://issues.apache.org/jira/browse/HUDI-523
>      Project: Apache Hudi (incubating)
>   Issue Type: Improvement
>   Components: Writer Core
>     Reporter: hong dongdong
>     Priority: Major
>
> As spark upgrade to 2.4, we can upgrade to datasource api v2 now.
[GitHub] [incubator-hudi] wangxianghu opened a new pull request #1224: [HUDI-397] Normalize log print statement
wangxianghu opened a new pull request #1224: [HUDI-397] Normalize log print statement
URL: https://github.com/apache/incubator-hudi/pull/1224

## What is the purpose of the pull request

*Normalize log print statement*
*Redo hudi-test-suite log statements using SLF4J*

## Brief change log

*Normalize log print statement*
*Redo hudi-test-suite log statements using SLF4J*

## Verify this pull request

This pull request should be covered by existing tests.

## Committer checklist
- [ ] Has a corresponding JIRA in PR title & commit
- [ ] Commit message is descriptive of the change
- [ ] CI is green
- [ ] Necessary doc changes done or have another open PR
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
[incubator-hudi] branch master updated (c1f8aca -> fd8f1c7)
This is an automated email from the ASF dual-hosted git repository.

vinoth pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git.

 from c1f8aca [HUDI-526] fix the HoodieAppendHandle
 add  fd8f1c7 [MINOR] Reuse random object (#1222)

No new revisions were added by this update.

Summary of changes:
 .../java/org/apache/hudi/io/strategy/TestHoodieCompactionStrategy.java | 3 ++-
 .../test/java/org/apache/hudi/utilities/TestHoodieDeltaStreamer.java   | 3 ++-
 2 files changed, 4 insertions(+), 2 deletions(-)
[GitHub] [incubator-hudi] vinothchandar merged pull request #1222: [MINOR] Reuse random object
vinothchandar merged pull request #1222: [MINOR] Reuse random object URL: https://github.com/apache/incubator-hudi/pull/1222
[GitHub] [incubator-hudi] wangxianghu commented on issue #1220: [HUDI-397] Normalize log print statement
wangxianghu commented on issue #1220: [HUDI-397] Normalize log print statement URL: https://github.com/apache/incubator-hudi/pull/1220#issuecomment-573958871 @hmatu Rolled back. It closed automatically.
[GitHub] [incubator-hudi] hmatu commented on issue #1220: [HUDI-397] Normalize log print statement
hmatu commented on issue #1220: [HUDI-397] Normalize log print statement URL: https://github.com/apache/incubator-hudi/pull/1220#issuecomment-573957902 Close again?
[GitHub] [incubator-hudi] wangxianghu closed pull request #1220: [HUDI-397] Normalize log print statement
wangxianghu closed pull request #1220: [HUDI-397] Normalize log print statement URL: https://github.com/apache/incubator-hudi/pull/1220
[GitHub] [incubator-hudi] wangxianghu commented on issue #1220: [HUDI-397] Normalize log print statement
wangxianghu commented on issue #1220: [HUDI-397] Normalize log print statement URL: https://github.com/apache/incubator-hudi/pull/1220#issuecomment-573955655 Hi @n3nash, this PR covers all the logs in the test-suite module. Besides, I found the wrong email was used; I will fix it.
[GitHub] [incubator-hudi] wangxianghu commented on issue #1220: [HUDI-397] Normalize log print statement
wangxianghu commented on issue #1220: [HUDI-397] Normalize log print statement URL: https://github.com/apache/incubator-hudi/pull/1220#issuecomment-573953742 Hi @hmatu, thanks for your advice; I will pay attention next time.
[GitHub] [incubator-hudi] umehrot2 commented on issue #1223: [HUDI-530] Fix conversion of Spark struct type to Avro schema
umehrot2 commented on issue #1223: [HUDI-530] Fix conversion of Spark struct type to Avro schema URL: https://github.com/apache/incubator-hudi/pull/1223#issuecomment-573951118 @vinothchandar @bvaradar The migration to `spark-avro` has introduced this issue, which was earlier reported for EMR in https://github.com/apache/incubator-hudi/issues/1034, as we were already using `spark-avro` internally. Let's try to get this in before code freeze if possible.
[jira] [Updated] (HUDI-530) Datasource Writer throws error on resolving struct fields
[ https://issues.apache.org/jira/browse/HUDI-530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-530: Labels: pull-request-available (was: ) > Datasource Writer throws error on resolving struct fields > - > > Key: HUDI-530 > URL: https://issues.apache.org/jira/browse/HUDI-530 > Project: Apache Hudi (incubating) > Issue Type: Bug > Components: Spark Integration >Reporter: Udit Mehrotra >Assignee: Udit Mehrotra >Priority: Major > Labels: pull-request-available > > The issue was reported in > [https://github.com/apache/incubator-hudi/issues/1034] . With migration of > Hudi to spark 2.4.4 and using Spark's native spark-avro module, this issue > now exists in Hudi master. > > Thus, struct fields will not work as of now. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [incubator-hudi] umehrot2 opened a new pull request #1223: [HUDI-530] Fix conversion of Spark struct type to Avro schema
umehrot2 opened a new pull request #1223: [HUDI-530] Fix conversion of Spark struct type to Avro schema URL: https://github.com/apache/incubator-hudi/pull/1223

## *Tips*
- *Thank you very much for contributing to Apache Hudi.*
- *Please review https://hudi.apache.org/contributing.html before opening a pull request.*

## What is the purpose of the pull request

With the migration of Hudi to `spark 2.4.4` and to the `native spark-avro` module, there is an issue with the conversion of struct fields, because spark-avro handles Avro schema conversion differently from databricks-avro. This was reported earlier for EMR in https://github.com/apache/incubator-hudi/issues/1034 and now exists in Hudi master as well.

The issue is that `spark-avro` names the `Avro namespace` differently from `databricks-avro` while converting the schema to an Avro schema. For example, suppose the data is:

```
List("{ \"deviceId\": \"a\", \"eventType\": \"uditevent1\", \"eventTimeMilli\": 1574297893836, \"location\": { \"latitude\": 2.5, \"longitude\": 3.5 }}");
```

`databricks-avro` used to convert it to an Avro schema in which the namespace of the `location` struct field contains the field name:

```json
{
  "type" : "record",
  "name" : "hudi_issue_1034_dec30_01_record",
  "namespace" : "hoodie.hudi_issue_1034_dec30_01",
  "fields" : [ {
    "name" : "deviceId",
    "type" : [ "string", "null" ]
  }, {
    "name" : "eventTimeMilli",
    "type" : [ "long", "null" ]
  }, {
    "name" : "location",
    "type" : [ {
      "type" : "record",
      "name" : "location",
      "namespace" : "hoodie.hudi_issue_1034_dec30_01.location",
      "fields" : [ {
        "name" : "latitude",
        "type" : [ "double", "null" ]
      }, {
        "name" : "longitude",
        "type" : [ "double", "null" ]
      } ]
    }, "null" ]
  } ]
}
```

`spark-avro` now converts the same data to the following, using the enclosing `record name` in the namespace instead:

```json
{
  "type" : "record",
  "name" : "hudi_issue_1034_dec31_01_record",
  "namespace" : "hoodie.hudi_issue_1034_dec31_01",
  "fields" : [ {
    "name" : "deviceId",
    "type" : [ "string", "null" ]
  }, {
    "name" : "eventTimeMilli",
    "type" : [ "long", "null" ]
  }, {
    "name" : "location",
    "type" : [ {
      "type" : "record",
      "name" : "location",
      "namespace" : "hoodie.hudi_issue_1034_dec31_01.hudi_issue_1034_dec31_01_record",
      "fields" : [ {
        "name" : "latitude",
        "type" : [ "double", "null" ]
      }, {
        "name" : "longitude",
        "type" : [ "double", "null" ]
      } ]
    }, "null" ]
  } ]
}
```

This PR fixes the above issue now that we have migrated to spark-avro.

## Brief change log

- Fix conversion of Spark struct type to Avro schema
- Modify the schema of the data used in unit tests and integration tests to include struct-type data, so that any issue with struct types is caught earlier

## Verify this pull request

This PR modifies the schema of the data used across unit tests and certain integration tests to include a struct field. From now on, unit/integration tests will catch any issue with struct fields.

## Committer checklist

- [ ] Has a corresponding JIRA in PR title & commit
- [ ] Commit message is descriptive of the change
- [ ] CI is green
- [ ] Necessary doc changes done or have another open PR
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
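The namespace difference described above can be modeled as two string rules. This is a simplified sketch, not the actual spark-avro or databricks-avro code; the function names are illustrative:

```python
# Simplified model of the nested-record namespace rules described in the PR text.
# These helpers are illustrative assumptions, not the real schema converters.

def databricks_child_namespace(parent_namespace: str, field_name: str) -> str:
    """databricks-avro: nested record namespace = parent namespace + field name."""
    return f"{parent_namespace}.{field_name}"

def spark_child_namespace(parent_namespace: str, parent_record_name: str) -> str:
    """spark-avro (2.4.x): nested record namespace = parent namespace + enclosing record name."""
    return f"{parent_namespace}.{parent_record_name}"

# Reproducing the two namespaces from the schemas quoted above:
old_ns = databricks_child_namespace("hoodie.hudi_issue_1034_dec30_01", "location")
new_ns = spark_child_namespace("hoodie.hudi_issue_1034_dec31_01",
                               "hudi_issue_1034_dec31_01_record")
print(old_ns)  # hoodie.hudi_issue_1034_dec30_01.location
print(new_ns)  # hoodie.hudi_issue_1034_dec31_01.hudi_issue_1034_dec31_01_record
```

Any reader that round-trips records written under the old namespace convention against a schema generated under the new one will see a resolution mismatch, which is the failure mode the PR targets.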
[GitHub] [incubator-hudi] lamber-ken opened a new pull request #1222: [MINOR] Reuse random object
lamber-ken opened a new pull request #1222: [MINOR] Reuse random object URL: https://github.com/apache/incubator-hudi/pull/1222

## What is the purpose of the pull request

Reuse the random object.

## Brief change log

- *Reuse the random object.*

## Verify this pull request

This pull request is code cleanup without any test coverage.

## Committer checklist

- [ ] Has a corresponding JIRA in PR title & commit
- [ ] Commit message is descriptive of the change
- [ ] CI is green
- [ ] Necessary doc changes done or have another open PR
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
[jira] [Created] (HUDI-530) Datasource Writer throws error on resolving struct fields
Udit Mehrotra created HUDI-530: -- Summary: Datasource Writer throws error on resolving struct fields Key: HUDI-530 URL: https://issues.apache.org/jira/browse/HUDI-530 Project: Apache Hudi (incubating) Issue Type: Bug Components: Spark Integration Reporter: Udit Mehrotra The issue was reported in [https://github.com/apache/incubator-hudi/issues/1034]. With the migration of Hudi to spark 2.4.4 and the use of Spark's native spark-avro module, this issue now exists in Hudi master. Thus, struct fields will not work as of now.
[jira] [Assigned] (HUDI-530) Datasource Writer throws error on resolving struct fields
[ https://issues.apache.org/jira/browse/HUDI-530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Udit Mehrotra reassigned HUDI-530: -- Assignee: Udit Mehrotra > Datasource Writer throws error on resolving struct fields > - > > Key: HUDI-530 > URL: https://issues.apache.org/jira/browse/HUDI-530 > Project: Apache Hudi (incubating) > Issue Type: Bug > Components: Spark Integration >Reporter: Udit Mehrotra >Assignee: Udit Mehrotra >Priority: Major > > The issue was reported in > [https://github.com/apache/incubator-hudi/issues/1034] . With migration of > Hudi to spark 2.4.4 and using Spark's native spark-avro module, this issue > now exists in Hudi master. > > Thus, struct fields will not work as of now.
[GitHub] [incubator-hudi] bvaradar commented on a change in pull request #1194: [HUDI-326] Add support to delete records with only record_key
bvaradar commented on a change in pull request #1194: [HUDI-326] Add support to delete records with only record_key URL: https://github.com/apache/incubator-hudi/pull/1194#discussion_r366091392

## File path: hudi-spark/src/main/java/org/apache/hudi/keygen/GlobalDeleteKeyGenerator.java

```diff
@@ -0,0 +1,74 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.keygen;
+
+import java.util.Arrays;
+import java.util.List;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.hudi.DataSourceUtils;
+import org.apache.hudi.DataSourceWriteOptions;
+import org.apache.hudi.common.model.HoodieKey;
+import org.apache.hudi.common.util.TypedProperties;
+import org.apache.hudi.exception.HoodieKeyException;
+
+/**
+ * Key generator for deletes using global indices. Global index deletes do not require a partition value,
+ * so this key generator avoids using the partition value when generating the HoodieKey.
+ */
+public class GlobalDeleteKeyGenerator extends KeyGenerator {
+
+  private static final String EMPTY_PARTITION = "";
+  private static final String NULL_RECORDKEY_PLACEHOLDER = "__null__";
+  private static final String EMPTY_RECORDKEY_PLACEHOLDER = "__empty__";
+
+  protected final List<String> recordKeyFields;
+
+  public GlobalDeleteKeyGenerator(TypedProperties config) {
+    super(config);
+    this.recordKeyFields = Arrays.asList(config.getString(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY()).split(","));
+  }
+
+  @Override
+  public HoodieKey getKey(GenericRecord record) {
```

Review comment: @bschell : The only difference between GlobalDeleteKeyGenerator and ComplexKeyGenerator is that the former always creates an empty partition path, right? In that case, can we simply refactor the getKey() method in ComplexKeyGenerator and have GlobalDeleteKeyGenerator extend ComplexKeyGenerator, with the changes necessary to make it work for an empty partition path? The advantage is that all the logic related to nested-field handling stays in one place. Let me know your thoughts.
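The refactor suggested in the review above can be sketched roughly as follows. This is a Python toy model, not Hudi's actual Java code: the class names and placeholder constants mirror the diff, but the method bodies are simplified assumptions.

```python
# Toy model of the suggested refactor: ComplexKeyGenerator owns the record-key
# logic, and GlobalDeleteKeyGenerator reuses it while always emitting an empty
# partition path. Names mirror the diff; bodies are simplified for illustration.

EMPTY_PARTITION = ""
NULL_RECORDKEY_PLACEHOLDER = "__null__"
EMPTY_RECORDKEY_PLACEHOLDER = "__empty__"

class ComplexKeyGenerator:
    def __init__(self, record_key_fields):
        self.record_key_fields = record_key_fields

    def get_record_key(self, record: dict) -> str:
        # Shared logic: join "field:value" pairs, substituting placeholders
        # for null and empty values.
        parts = []
        for field in self.record_key_fields:
            value = record.get(field)
            if value is None:
                parts.append(f"{field}:{NULL_RECORDKEY_PLACEHOLDER}")
            elif value == "":
                parts.append(f"{field}:{EMPTY_RECORDKEY_PLACEHOLDER}")
            else:
                parts.append(f"{field}:{value}")
        return ",".join(parts)

    def get_partition_path(self, record: dict) -> str:
        # Stand-in for the real partition-path extraction.
        return record.get("partition", "default")

    def get_key(self, record: dict):
        return (self.get_record_key(record), self.get_partition_path(record))

class GlobalDeleteKeyGenerator(ComplexKeyGenerator):
    # Only the partition path differs: global-index deletes never need one.
    def get_partition_path(self, record: dict) -> str:
        return EMPTY_PARTITION

key = GlobalDeleteKeyGenerator(["id"]).get_key({"id": "a", "partition": "2020/01"})
print(key)  # ('id:a', '')
```

The design point is exactly the reviewer's: by overriding only `get_partition_path`, all nested-field and placeholder handling lives in one class.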
[GitHub] [incubator-hudi] bvaradar commented on a change in pull request #1194: [HUDI-326] Add support to delete records with only record_key
bvaradar commented on a change in pull request #1194: [HUDI-326] Add support to delete records with only record_key URL: https://github.com/apache/incubator-hudi/pull/1194#discussion_r366088953

## File path: hudi-spark/src/main/java/org/apache/hudi/keygen/ComplexKeyGenerator.java

```diff
@@ -16,8 +16,10 @@
  * limitations under the License.
  */

-package org.apache.hudi;
+package org.apache.hudi.keygen;
```

Review comment: This is a backwards-incompatible change: users may have custom key generators configured via `DataSourceWriteOptions.KEYGENERATOR_CLASS_OPT_KEY()`. It makes sense to move to a separate package, but we need to call out the change in the release notes. Please open a tracking ticket to update the release notes for this.
[GitHub] [incubator-hudi] bvaradar commented on issue #1109: [HUDI-238] - Migrating to Scala 2.12
bvaradar commented on issue #1109: [HUDI-238] - Migrating to Scala 2.12 URL: https://github.com/apache/incubator-hudi/pull/1109#issuecomment-573923932 > Sure. I will send another PR. Currently our work only supports 2.12, but I can try to see if it is possible to support both 2.11 and 2.12. @zhedoubushishi : Is your change different from what is being done as part of this PR? Anyway, it would help if you can open a WIP PR so we can cross-check it with this PR to see if we are missing anything here. Also @zhedoubushishi @ezhux : I found this Stack Overflow answer on building both 2.11 and 2.12 versions of packages: https://stackoverflow.com/a/46785150. Can you check whether this model would work for Hudi? We would need to change the pom for hudi-spark and its dependents: hudi-spark-bundle and hudi-utilities-bundle.
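The cross-build model referenced in that Stack Overflow answer amounts to one Maven profile per Scala line, with the binary version exposed as a property. The profile ids, property names, and patch versions below are assumptions for illustration, not Hudi's actual pom:

```xml
<!-- Hypothetical per-Scala-line build profiles; ids, property names, and
     patch versions here are illustrative, not taken from Hudi's pom. -->
<profiles>
  <profile>
    <id>scala-2.11</id>
    <activation>
      <activeByDefault>true</activeByDefault>
    </activation>
    <properties>
      <scala.version>2.11.12</scala.version>
      <scala.binary.version>2.11</scala.binary.version>
    </properties>
  </profile>
  <profile>
    <id>scala-2.12</id>
    <properties>
      <scala.version>2.12.10</scala.version>
      <scala.binary.version>2.12</scala.binary.version>
    </properties>
  </profile>
</profiles>
```

A build for one line would then be selected with, e.g., `mvn clean install -Pscala-2.12`, with `${scala.binary.version}` spliced into the artifactIds of hudi-spark and the bundle modules; note that Maven warns about property expressions inside `artifactId`, which is one trade-off of this model.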
[GitHub] [incubator-hudi] bvaradar commented on issue #1109: [HUDI-238] - Migrating to Scala 2.12
bvaradar commented on issue #1109: [HUDI-238] - Migrating to Scala 2.12 URL: https://github.com/apache/incubator-hudi/pull/1109#issuecomment-573920277 @ezhux : I just saw that you have pushed some changes. Let us know when you want us to review the code.
[jira] [Created] (HUDI-529) Enable cobertura coverage reporting
Prashant Wason created HUDI-529: --- Summary: Enable cobertura coverage reporting Key: HUDI-529 URL: https://issues.apache.org/jira/browse/HUDI-529 Project: Apache Hudi (incubating) Issue Type: Improvement Reporter: Prashant Wason The Hudi project has code coverage enabled via the jacoco plugin. Jenkins has better support for coverage reporting using the Jenkins Cobertura plugin. This enhancement provides a way to convert the jacoco coverage report to cobertura format at the end of the unit test runs.
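At its core, a jacoco-to-cobertura conversion re-expresses JaCoCo's missed/covered counters as Cobertura's coverage rates. A minimal Python sketch under assumed, simplified report structure (a real converter must also emit Cobertura's package/class/line tree, which is omitted here):

```python
# Sketch: derive Cobertura-style rates from JaCoCo XML counters.
# The sample report below is invented for illustration.
import xml.etree.ElementTree as ET

JACOCO_SAMPLE = """<report name="hudi">
  <counter type="LINE" missed="40" covered="160"/>
  <counter type="BRANCH" missed="10" covered="30"/>
</report>"""

def cobertura_rates(jacoco_xml: str) -> dict:
    """Map each JaCoCo counter to the covered/total ratio Cobertura reports."""
    root = ET.fromstring(jacoco_xml)
    rates = {}
    for counter in root.findall("counter"):
        missed = int(counter.get("missed"))
        covered = int(counter.get("covered"))
        # Cobertura expresses coverage as covered / total, e.g. line-rate="0.8"
        rates[counter.get("type").lower() + "-rate"] = covered / (missed + covered)
    return rates

print(cobertura_rates(JACOCO_SAMPLE))  # {'line-rate': 0.8, 'branch-rate': 0.75}
```

Running such a step after the unit tests would let Jenkins' Cobertura plugin consume the converted report directly.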
[jira] [Created] (HUDI-528) Incremental Pull fails when latest commit is empty
Javier Vega created HUDI-528: Summary: Incremental Pull fails when latest commit is empty Key: HUDI-528 URL: https://issues.apache.org/jira/browse/HUDI-528 Project: Apache Hudi (incubating) Issue Type: Bug Components: Incremental Pull Reporter: Javier Vega When trying to create an incremental view of a dataset, an exception is thrown when the latest commit in the time range is empty. In order to determine the schema of the dataset, Hudi will grab the [latest commit file, parse it, and grab the first metadata file path|https://github.com/apache/incubator-hudi/blob/480fc7869d4d69e1219bf278fd9a37f27ac260f6/hudi-spark/src/main/scala/org/apache/hudi/IncrementalRelation.scala#L78-L80]. If the latest commit was empty though, the field which is used to determine file paths (partitionToWriteStats) will be empty, causing the following exception:
{code:java}
java.util.NoSuchElementException
    at java.util.HashMap$HashIterator.nextNode(HashMap.java:1447)
    at java.util.HashMap$ValueIterator.next(HashMap.java:1474)
    at org.apache.hudi.IncrementalRelation.<init>(IncrementalRelation.scala:80)
    at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:65)
    at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:46)
    at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:318)
    at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:223)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:211)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
{code}
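A guard along these lines would avoid the `NoSuchElementException`. This is an illustrative Python model; the real fix belongs in `IncrementalRelation.scala`, and the function and error names below are assumptions:

```python
# Toy model of the failure mode: reading the first write-stat path out of the
# latest commit's metadata fails if partitionToWriteStats is empty.

class EmptyCommitError(Exception):
    """Assumed error type; stands in for a graceful fallback in the real code."""

def first_write_stat_path(commit_metadata: dict) -> str:
    partition_to_write_stats = commit_metadata.get("partitionToWriteStats", {})
    # Without this check, taking next() on an empty map's iterator is the
    # java.util.NoSuchElementException in the stack trace above.
    if not partition_to_write_stats:
        raise EmptyCommitError(
            "latest commit has no write stats; fall back to an earlier commit for schema")
    first_partition_stats = next(iter(partition_to_write_stats.values()))
    return first_partition_stats[0]["path"]
```

One reasonable behavior behind the guard is to walk back to the most recent non-empty commit instead of failing the whole incremental read.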
[GitHub] [incubator-hudi] garyli1019 commented on issue #1128: [HUDI-453] Fix throw failed to archive commits error when writing data to MOR/COW table
garyli1019 commented on issue #1128: [HUDI-453] Fix throw failed to archive commits error when writing data to MOR/COW table URL: https://github.com/apache/incubator-hudi/pull/1128#issuecomment-573880940 Thanks for the discussion. I have been running into the same issue, and manually removing all `xxx.clean.requested` files worked!
[GitHub] [incubator-hudi] gfn9cho removed a comment on issue #894: Getting java.lang.NoSuchMethodError while doing Hive sync
gfn9cho removed a comment on issue #894: Getting java.lang.NoSuchMethodError while doing Hive sync URL: https://github.com/apache/incubator-hudi/issues/894#issuecomment-573861890 (The removed comment's body was identical to gfn9cho's comment below.)
[GitHub] [incubator-hudi] ezhux commented on a change in pull request #1109: [HUDI-238] - Migrating to Scala 2.12
ezhux commented on a change in pull request #1109: [HUDI-238] - Migrating to Scala 2.12 URL: https://github.com/apache/incubator-hudi/pull/1109#discussion_r366029872 ## File path: hudi-utilities/pom.xml ## @@ -28,14 +28,52 @@ ${project.parent.basedir} +2.0.0 +2.12 + + Review comment: removed.
[GitHub] [incubator-hudi] TadasSugintasYields commented on a change in pull request #1109: [HUDI-238] - Migrating to Scala 2.12
TadasSugintasYields commented on a change in pull request #1109: [HUDI-238] - Migrating to Scala 2.12 URL: https://github.com/apache/incubator-hudi/pull/1109#discussion_r366029309 ## File path: hudi-utilities/pom.xml ## @@ -28,14 +28,52 @@ ${project.parent.basedir} +2.0.0 +2.12 + + + + + net.alchim31.maven + scala-maven-plugin + ${scala-maven-plugin.version} + + + org.apache.maven.plugins + maven-compiler-plugin + + + + org.jacoco jacoco-maven-plugin + +net.alchim31.maven +scala-maven-plugin + + +scala-compile-first +process-resources + + add-source + compile + + + +scala-test-compile +process-test-resources + + testCompile + + + + Review comment: finally got it working, thanks.
[GitHub] [incubator-hudi] gfn9cho commented on issue #894: Getting java.lang.NoSuchMethodError while doing Hive sync
gfn9cho commented on issue #894: Getting java.lang.NoSuchMethodError while doing Hive sync URL: https://github.com/apache/incubator-hudi/issues/894#issuecomment-573861890 Hi, I am using EMR 5.28 with built-in Hudi support. I was able to use Hudi through spark-shell. However, when running a Spark application as mentioned in the issue, I am getting the error below. Any pointers on how to resolve the conflict? My build.sbt is pretty much similar.

```
Caused by: java.lang.NoSuchMethodError: org.apache.http.conn.ssl.SSLConnectionSocketFactory.<init>(Ljavax/net/ssl/SSLContext;Ljavax/net/ssl/HostnameVerifier;)V
    at com.amazonaws.http.conn.ssl.SdkTLSSocketFactory.<init>(SdkTLSSocketFactory.java:58)
    at com.amazonaws.http.apache.client.impl.ApacheConnectionManagerFactory.getPreferredSocketFactory(ApacheConnectionManagerFactory.java:93)
    at com.amazonaws.http.apache.client.impl.ApacheConnectionManagerFactory.create(ApacheConnectionManagerFactory.java:66)
    at com.amazonaws.http.apache.client.impl.ApacheConnectionManagerFactory.create(ApacheConnectionManagerFactory.java:59)
    at com.amazonaws.http.apache.client.impl.ApacheHttpClientFactory.create(ApacheHttpClientFactory.java:50)
    at com.amazonaws.http.apache.client.impl.ApacheHttpClientFactory.create(ApacheHttpClientFactory.java:38)
    at com.amazonaws.http.AmazonHttpClient.<init>(AmazonHttpClient.java:324)
    at com.amazonaws.http.AmazonHttpClient.<init>(AmazonHttpClient.java:308)
    at com.amazonaws.AmazonWebServiceClient.<init>(AmazonWebServiceClient.java:237)
    at com.amazonaws.AmazonWebServiceClient.<init>(AmazonWebServiceClient.java:223)
    at com.amazonaws.services.glue.AWSGlueClient.<init>(AWSGlueClient.java:177)
    at com.amazonaws.services.glue.AWSGlueClient.<init>(AWSGlueClient.java:163)
    at com.amazonaws.services.glue.AWSGlueClientBuilder.build(AWSGlueClientBuilder.java:61)
    at com.amazonaws.services.glue.AWSGlueClientBuilder.build(AWSGlueClientBuilder.java:27)
    at com.amazonaws.client.builder.AwsSyncClientBuilder.build(AwsSyncClientBuilder.java:46)
    at com.amazonaws.glue.catalog.metastore.AWSGlueClientFactory.newClient(AWSGlueClientFactory.java:72)
    at com.amazonaws.glue.catalog.metastore.AWSCatalogMetastoreClient.<init>(AWSCatalogMetastoreClient.java:146)
    at com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory.createMetaStoreClient(AWSGlueDataCatalogHiveClientFactory.java:16)
    at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3007)
    at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3042)
    at org.apache.hadoop.hive.ql.metadata.Hive.getAllDatabases(Hive.java:1235)
    at org.apache.hadoop.hive.ql.metadata.Hive.reloadFunctions(Hive.java:175)
    at org.apache.hadoop.hive.ql.metadata.Hive.<init>(Hive.java:167)
    at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:503)
    at org.apache.spark.sql.hive.client.HiveClientImpl.newState(HiveClientImpl.scala:183)
    at org.apache.spark.sql.hive.client.HiveClientImpl.<init>(HiveClientImpl.scala:117)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:271)
    at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:384)
    at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:286)
    at org.apache.spark.sql.hive.HiveExternalCatalog.client$lzycompute(HiveExternalCatalog.scala:66)
    at org.apache.spark.sql.hive.HiveExternalCatalog.client(HiveExternalCatalog.scala:65)
    at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$databaseExists$1.apply$mcZ$sp(HiveExternalCatalog.scala:215)
    at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$databaseExists$1.apply(HiveExternalCatalog.scala:215)
    at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$databaseExists$1.apply(HiveExternalCatalog.scala:215)
    at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
```
[jira] [Commented] (HUDI-527) Fix warning in project compilation
[ https://issues.apache.org/jira/browse/HUDI-527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17014641#comment-17014641 ] Prashant Wason commented on HUDI-527:

[WARNING] The POM for org.jamon:jamon-runtime:jar:2.3.1 is missing, no dependency information available
[INFO] --- maven-remote-resources-plugin:1.5:process (process-resource-bundles) @ hudi-hadoop-mr ---
[WARNING] Missing POM for org.jamon:jamon-runtime:jar:2.3.1
[WARNING] Missing POM for org.jamon:jamon-runtime:jar:2.3.1
[INFO] --- scala-maven-plugin:3.3.1:compile (scala-compile-first) @ hudi-spark ---
[WARNING] Missing POM for org.jamon:jamon-runtime:jar:2.3.1
[WARNING] Missing POM for org.jamon:jamon-runtime:jar:2.3.1
[WARNING] Expected all dependencies to require Scala version: 2.11.8
[WARNING] org.apache.hudi:hudi-spark:0.5.1-SNAPSHOT requires scala version: 2.11.8
[WARNING] com.fasterxml.jackson.module:jackson-module-scala_2.11:2.6.7 requires scala version: 2.11.8
[WARNING] org.scala-lang:scala-reflect:2.11.8 requires scala version: 2.11.8
[WARNING] com.twitter:chill_2.11:0.9.3 requires scala version: 2.11.12
[WARNING] Multiple versions of scala libraries detected!
[INFO] Compiling 16 source files to /home/pwason/uber/incubator-hudi/hudi-spark/target/classes at 1578943931369
[WARNING] /home/pwason/uber/incubator-hudi/hudi-spark/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala:84: warning: non-variable type argument org.apache.hudi.common.model.HoodieRecordPayload[Nothing] in type pattern org.apache.hudi.HoodieWriteClient[org.apache.hudi.common.model.HoodieRecordPayload[Nothing]] is unchecked since it is eliminated by erasure
[WARNING] val (writeStatuses, writeClient: HoodieWriteClient[HoodieRecordPayload[Nothing]]) =
[WARNING] ^
[WARNING] one warning found
[INFO] --- scala-maven-plugin:3.3.1:compile (scala-compile-first) @ hudi-cli ---
[WARNING] Expected all dependencies to require Scala version: 2.11.8
[WARNING] org.apache.hudi:hudi-cli:0.5.1-SNAPSHOT requires scala version: 2.11.8
[WARNING] org.apache.hudi:hudi-spark:0.5.1-SNAPSHOT requires scala version: 2.11.8
[WARNING] com.fasterxml.jackson.module:jackson-module-scala_2.11:2.6.7 requires scala version: 2.11.8
[WARNING] org.scala-lang:scala-reflect:2.11.8 requires scala version: 2.11.8
[WARNING] org.apache.spark:spark-tags_2.11:2.4.4 requires scala version: 2.11.12
[WARNING] Multiple versions of scala libraries detected!

> Fix warning in project compilation > -- > > Key: HUDI-527 > URL: https://issues.apache.org/jira/browse/HUDI-527 > Project: Apache Hudi (incubating) > Issue Type: Improvement > Components: Code Cleanup >Reporter: Prashant Wason >Priority: Minor > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > "mvn compile" issues various warnings. > This is a task to look into those warnings and fix them if required.
[jira] [Updated] (HUDI-527) Fix warning in project compilation
[ https://issues.apache.org/jira/browse/HUDI-527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-527: Labels: pull-request-available (was: ) > Fix warning in project compilation > -- > > Key: HUDI-527 > URL: https://issues.apache.org/jira/browse/HUDI-527 > Project: Apache Hudi (incubating) > Issue Type: Improvement > Components: Code Cleanup >Reporter: Prashant Wason >Priority: Minor > Labels: pull-request-available > > "mvn compile" issues various warnings. > This is a task to look into those warnings and fix them if required.
[GitHub] [incubator-hudi] prashantwason opened a new pull request #1221: [HUDI-527] scalastyle-maven-plugin moved to pluginManagement as it is only used in hoodie-spark and hoodie-cli modules
prashantwason opened a new pull request #1221: [HUDI-527] scalastyle-maven-plugin moved to pluginManagement as it is only used in hoodie-spark and hoodie-cli modules
URL: https://github.com/apache/incubator-hudi/pull/1221

## What is the purpose of the pull request
This fixes scalastyle-maven-plugin warnings as well as unnecessary plugin invocation for most of the modules, which do not have Scala code.

## Brief change log
- Scala code is used only in the hudi-cli and hudi-spark modules
- scalastyle-maven-plugin has been moved from the root pom.xml into its pluginManagement section
- entries have been added in hudi-cli/pom.xml and hudi-spark/pom.xml

This ensures that the scalastyle-maven-plugin will only be executed for the modules which have Scala code.

## Verify this pull request
This pull request is a trivial rework / code cleanup without any test coverage. Verified with `mvn clean package`.

## Committer checklist
- [x] Has a corresponding JIRA in PR title & commit
- [x] Commit message is descriptive of the change
- [ ] CI is green
- [ ] Necessary doc changes done or have another open PR
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

With regards,
Apache Git Services
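The change described above can be sketched with standard Maven conventions. This is an illustrative fragment, not the actual PR diff; the plugin version matches the one seen in the build log (scalastyle-maven-plugin 1.0.0), but the exact configuration is an assumption.

```xml
<!-- Root pom.xml: declare the plugin under pluginManagement so it is
     configured centrally but NOT executed in every module. -->
<build>
  <pluginManagement>
    <plugins>
      <plugin>
        <groupId>org.scalastyle</groupId>
        <artifactId>scalastyle-maven-plugin</artifactId>
        <version>1.0.0</version>
      </plugin>
    </plugins>
  </pluginManagement>
</build>

<!-- hudi-spark/pom.xml and hudi-cli/pom.xml: reference the plugin to opt
     in; version and configuration are inherited from pluginManagement. -->
<build>
  <plugins>
    <plugin>
      <groupId>org.scalastyle</groupId>
      <artifactId>scalastyle-maven-plugin</artifactId>
    </plugin>
  </plugins>
</build>
```

With this layout, modules without Scala sources never trigger the plugin, which removes the "sourceDirectory is not specified or does not exist" warnings.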
[GitHub] [incubator-hudi] zhedoubushishi commented on issue #1109: [HUDI-238] - Migrating to Scala 2.12
zhedoubushishi commented on issue #1109: [HUDI-238] - Migrating to Scala 2.12
URL: https://github.com/apache/incubator-hudi/pull/1109#issuecomment-573847670

> @zhedoubushishi : As you had mentioned that AWS EMR has internally made it possible to package hudi jars using scala 2.12, can you shepherd this PR ? This is one of the critical PRs to be fixed before next week (deadline end of week).
>
> I also have a question here : Has AWS EMR migrated the scala compile version to 2.12 or are you supporting both 2.11 and 2.12 ? It looks like spark-2.4.4 (which is used for compiling Hudi) has both 2.11 and 2.12 packaging support. So, wondering if we can support both 2.11 and 2.12 hudi package generation. Let us know.

Sure. I will send another PR. Currently our work only supports 2.12, but I can try to see if it is possible to support both 2.11 and 2.12.
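Supporting both Scala lines from a single build is commonly done with Maven profiles that parameterize the Scala version and the `_2.11`/`_2.12` artifact suffix, similar to Spark's own build. A sketch under that assumption — this is not the actual Hudi change, and the property names are illustrative:

```xml
<!-- Illustrative cross-build profiles; activate with
     `mvn package -Pscala-2.12`. Dependencies would then reference
     artifacts as e.g. spark-core_${scala.binary.version}. -->
<profiles>
  <profile>
    <id>scala-2.11</id>
    <properties>
      <scala.version>2.11.12</scala.version>
      <scala.binary.version>2.11</scala.binary.version>
    </properties>
  </profile>
  <profile>
    <id>scala-2.12</id>
    <properties>
      <scala.version>2.12.10</scala.version>
      <scala.binary.version>2.12</scala.binary.version>
    </properties>
  </profile>
</profiles>
```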
[jira] [Commented] (HUDI-527) Fix warning in project compilation
[ https://issues.apache.org/jira/browse/HUDI-527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17014616#comment-17014616 ] Prashant Wason commented on HUDI-527:

scalastyle-maven-plugin emits missing file warnings if there are no scala files present in a module. Hence, it should only be enabled for modules which contain scala files.

[INFO] --- scalastyle-maven-plugin:1.0.0:check (default) @ hoodie ---
[WARNING] sourceDirectory is not specified or does not exist value=/home/pwason/uber/hoodie_oss/src/main/scala
[WARNING] testSourceDirectory is not specified or does not exist value=/home/pwason/uber/hoodie_oss/src/test/scala
[jira] [Updated] (HUDI-527) Fix warning in project compilation
[ https://issues.apache.org/jira/browse/HUDI-527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prashant Wason updated HUDI-527: Status: In Progress (was: Open)
[jira] [Updated] (HUDI-527) Fix warning in project compilation
[ https://issues.apache.org/jira/browse/HUDI-527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prashant Wason updated HUDI-527: Status: Open (was: New)
[jira] [Created] (HUDI-527) Fix warning in project compilation
Prashant Wason created HUDI-527:
-------------------------------
             Summary: Fix warning in project compilation
                 Key: HUDI-527
                 URL: https://issues.apache.org/jira/browse/HUDI-527
             Project: Apache Hudi (incubating)
          Issue Type: Improvement
          Components: Code Cleanup
            Reporter: Prashant Wason

"mvn compile" issues various warnings. This is a task to look into those warnings and fix them if required.
[GitHub] [incubator-hudi] n3nash commented on issue #1220: [HUDI-397] Normalize log print statement
n3nash commented on issue #1220: [HUDI-397] Normalize log print statement URL: https://github.com/apache/incubator-hudi/pull/1220#issuecomment-573811899 @wangxianghu Thanks for the PR, does this exhaustively take care of all the logs in the test-suite module?
[incubator-hudi] branch master updated: [HUDI-526] fix the HoodieAppendHandle
This is an automated email from the ASF dual-hosted git repository. nagarwal pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git

The following commit(s) were added to refs/heads/master by this push:
     new c1f8aca  [HUDI-526] fix the HoodieAppendHandle
c1f8aca is described below

commit c1f8acab344fa632f1cce6268d2fc765c45e8b22
Author: liujianhui
AuthorDate: Mon Jan 13 19:16:32 2020 +0800

    [HUDI-526] fix the HoodieAppendHandle
---
 hudi-client/src/main/java/org/apache/hudi/io/HoodieAppendHandle.java | 5 +
 1 file changed, 5 insertions(+)

diff --git a/hudi-client/src/main/java/org/apache/hudi/io/HoodieAppendHandle.java b/hudi-client/src/main/java/org/apache/hudi/io/HoodieAppendHandle.java
index edf01ce..e2dbf64 100644
--- a/hudi-client/src/main/java/org/apache/hudi/io/HoodieAppendHandle.java
+++ b/hudi-client/src/main/java/org/apache/hudi/io/HoodieAppendHandle.java
@@ -23,6 +23,7 @@ import org.apache.hudi.common.model.FileSlice;
 import org.apache.hudi.common.model.HoodieDeltaWriteStat;
 import org.apache.hudi.common.model.HoodieKey;
 import org.apache.hudi.common.model.HoodieLogFile;
+import org.apache.hudi.common.model.HoodiePartitionMetadata;
 import org.apache.hudi.common.model.HoodieRecord;
 import org.apache.hudi.common.model.HoodieRecordLocation;
 import org.apache.hudi.common.model.HoodieRecordPayload;
@@ -132,6 +133,10 @@ public class HoodieAppendHandle extends HoodieWri
     writeStatus.getStat().setFileId(fileId);
     averageRecordSize = SizeEstimator.estimate(record);
     try {
+      // save hoodie partition meta in the partition path
+      HoodiePartitionMetadata partitionMetadata = new HoodiePartitionMetadata(fs, baseInstantTime,
+          new Path(config.getBasePath()), FSUtils.getPartitionPath(config.getBasePath(), partitionPath));
+      partitionMetadata.trySave(TaskContext.getPartitionId());
       this.writer = createLogWriter(fileSlice, baseInstantTime);
       this.currentLogFile = writer.getLogFile();
       ((HoodieDeltaWriteStat) writeStatus.getStat()).setLogVersion(currentLogFile.getLogVersion());
[GitHub] [incubator-hudi] n3nash merged pull request #1218: [HUDI-526] add parition meta file in HoodieAppendHandle
n3nash merged pull request #1218: [HUDI-526] add parition meta file in HoodieAppendHandle URL: https://github.com/apache/incubator-hudi/pull/1218
[GitHub] [incubator-hudi] n3nash commented on issue #1216: [HUDI-525] lack of insert info in delta_commit inflight
n3nash commented on issue #1216: [HUDI-525] lack of insert info in delta_commit inflight URL: https://github.com/apache/incubator-hudi/pull/1216#issuecomment-573809598 @liujianhuiouc What functionality are we going to enhance by adding this information to the inflight workload profile metadata?
[GitHub] [incubator-hudi] xushiyan commented on issue #1187: [HUDI-499] Allow update partition path with GLOBAL_BLOOM
xushiyan commented on issue #1187: [HUDI-499] Allow update partition path with GLOBAL_BLOOM URL: https://github.com/apache/incubator-hudi/pull/1187#issuecomment-573781552 @nsivabalan Thank you for the thorough review! I'll try to address these in the next few days.
[GitHub] [incubator-hudi] hmatu commented on issue #1220: [HUDI-397] Normalize log print statement
hmatu commented on issue #1220: [HUDI-397] Normalize log print statement URL: https://github.com/apache/incubator-hudi/pull/1220#issuecomment-573730094 Please use `git config --list` to check whether your email is right or not.
[GitHub] [incubator-hudi] hmatu commented on issue #1220: [HUDI-397] Normalize log print statement
hmatu commented on issue #1220: [HUDI-397] Normalize log print statement URL: https://github.com/apache/incubator-hudi/pull/1220#issuecomment-573729421 If you make further changes, you can keep committing to the same branch; there is no need to create a new PR, like https://github.com/apache/incubator-hudi/pull/1219 https://github.com/apache/incubator-hudi/pull/1217
[GitHub] [incubator-hudi] wangxianghu opened a new pull request #1220: [HUDI-397] Normalize log print statement
wangxianghu opened a new pull request #1220: [HUDI-397] Normalize log print statement URL: https://github.com/apache/incubator-hudi/pull/1220 ## What is the purpose of the pull request *Redo hudi-test-suite log statements using SLF4J* *Normalize log print statement* ## Brief change log *Redo hudi-test-suite log statements using SLF4J* *Normalize log print statement* ## Verify this pull request This pull request should be covered by existing tests. ## Committer checklist - [ ] Has a corresponding JIRA in PR title & commit - [ ] Commit message is descriptive of the change - [ ] CI is green - [ ] Necessary doc changes done or have another open PR - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
[GitHub] [incubator-hudi] wangxianghu closed pull request #1219: [HUDI-397] Normalize log print statement
wangxianghu closed pull request #1219: [HUDI-397] Normalize log print statement URL: https://github.com/apache/incubator-hudi/pull/1219
[GitHub] [incubator-hudi] wangxianghu opened a new pull request #1219: [HUDI-397] Normalize log print statement
wangxianghu opened a new pull request #1219: [HUDI-397] Normalize log print statement URL: https://github.com/apache/incubator-hudi/pull/1219 ## What is the purpose of the pull request *Normalize log print statement* *Redo hudi-test-suite log statements using SLF4J* ## Brief change log *Normalize log print statement* *Redo hudi-test-suite log statements using SLF4J* ## Verify this pull request This pull request should be covered by existing tests, such as *(please describe tests)*. ## Committer checklist - [ ] Has a corresponding JIRA in PR title & commit - [ ] Commit message is descriptive of the change - [ ] CI is green - [ ] Necessary doc changes done or have another open PR - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
[GitHub] [incubator-hudi] wangxianghu closed pull request #1217: [HUDI-397] Normalize log print statement
wangxianghu closed pull request #1217: [HUDI-397] Normalize log print statement URL: https://github.com/apache/incubator-hudi/pull/1217
[jira] [Closed] (HUDI-517) compact error when hoodie.compact.inline is true
[ https://issues.apache.org/jira/browse/HUDI-517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liujianhui closed HUDI-517. --- Resolution: Fixed > compact error when hoodie.compact.inline is true > > > Key: HUDI-517 > URL: https://issues.apache.org/jira/browse/HUDI-517 > Project: Apache Hudi (incubating) > Issue Type: Bug > Components: Compaction >Reporter: liujianhui >Priority: Minor > > # set the property [hoodie.compact.inline|http://hoodie.compact.inline/] as > true > # the duration of the write process is 1 second > # the instant time of the compact is same to the commit instant time > > ``` > java.lang.IllegalArgumentException: Following instants have timestamps >= > compactionInstant (20200110171526) Instants > :[[20200110171526__deltacommit__COMPLETED]] > at com.google.common.base.Preconditions.checkArgument(Preconditions.java:92) > at > org.apache.hudi.HoodieWriteClient.scheduleCompactionAtInstant(HoodieWriteClient.java:1043) > at > org.apache.hudi.HoodieWriteClient.scheduleCompaction(HoodieWriteClient.java:1018) > at > org.apache.hudi.HoodieWriteClient.forceCompact(HoodieWriteClient.java:1292) > at org.apache.hudi.HoodieWriteClient.commit(HoodieWriteClient.java:510) > at org.apache.hudi.HoodieWriteClient.commit(HoodieWriteClient.java:479) > at org.apache.hudi.HoodieWriteClient.commit(HoodieWriteClient.java:470) > at > org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:152) > at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:91) > at > org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86) > at > 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152) > at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127) > at > org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80) > at > org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80) > at > org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676) > at > org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676) > at > org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78) > at > org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125) > at > org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73) > at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:676) > at > org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:285) > at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:271) > ```
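The IllegalArgumentException in the report above is raised by a Guava Preconditions check while scheduling compaction. The rule it enforces can be sketched as follows — a simplified stand-in, not Hudi's actual implementation. A new compaction instant must be strictly greater than every completed instant, and because instant times are fixed-width timestamp strings (e.g. `20200110171526`), plain lexicographic comparison is chronological:

```java
import java.util.List;
import java.util.stream.Collectors;

// Sketch only: mimics the precondition in scheduleCompactionAtInstant
// that produced the stack trace above. If a write completes within the
// same second, the delta commit and the proposed compaction instant can
// share a timestamp, so the >= filter below finds a conflict.
public class CompactionInstantCheck {
    public static void validate(String compactionInstant, List<String> completedInstants) {
        List<String> conflicting = completedInstants.stream()
                .filter(t -> t.compareTo(compactionInstant) >= 0)
                .collect(Collectors.toList());
        if (!conflicting.isEmpty()) {
            throw new IllegalArgumentException(String.format(
                    "Following instants have timestamps >= compactionInstant (%s) Instants :%s",
                    compactionInstant, conflicting));
        }
    }

    public static void main(String[] args) {
        // Strictly greater instant passes, matching the one-second-later
        // compaction instant seen in the HUDI-526 logs.
        validate("20200113164343", List.of("20200113164342"));
        System.out.println("ok");
    }
}
```

This also explains why the bug only shows up for very fast writes: when the delta commit takes longer than a second, the generated compaction instant is already strictly greater.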
[jira] [Resolved] (HUDI-518) compact error when hoodie.compact.inline is true
[ https://issues.apache.org/jira/browse/HUDI-518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liujianhui resolved HUDI-518. - Resolution: Fixed > compact error when hoodie.compact.inline is true > > > Key: HUDI-518 > URL: https://issues.apache.org/jira/browse/HUDI-518 > Project: Apache Hudi (incubating) > Issue Type: Bug > Components: Compaction, Writer Core >Reporter: liujianhui >Priority: Minor > > # set the property [hoodie.compact.inline|http://hoodie.compact.inline/] as > true > # the duration of the write process is 1 second > # the instant time of the compact is same to the commit instant time > > {code} > java.lang.IllegalArgumentException: Following instants have timestamps >= > compactionInstant (20200110171526) Instants > :[[20200110171526__deltacommit__COMPLETED]] > at com.google.common.base.Preconditions.checkArgument(Preconditions.java:92) > at > org.apache.hudi.HoodieWriteClient.scheduleCompactionAtInstant(HoodieWriteClient.java:1043) > at > org.apache.hudi.HoodieWriteClient.scheduleCompaction(HoodieWriteClient.java:1018) > at > org.apache.hudi.HoodieWriteClient.forceCompact(HoodieWriteClient.java:1292) > at org.apache.hudi.HoodieWriteClient.commit(HoodieWriteClient.java:510) > at org.apache.hudi.HoodieWriteClient.commit(HoodieWriteClient.java:479) > at org.apache.hudi.HoodieWriteClient.commit(HoodieWriteClient.java:470) > at > org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:152) > at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:91) > at > org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86) > at > 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152) > at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127) > at > org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80) > at > org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80) > at > org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676) > at > org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676) > at > org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78) > at > org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125) > at > org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73) > at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:676) > at > org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:285) > at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:271) > {code}
[jira] [Updated] (HUDI-518) compact error when hoodie.compact.inline is true
[ https://issues.apache.org/jira/browse/HUDI-518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liujianhui updated HUDI-518: Status: Open (was: New)
[GitHub] [incubator-hudi] wangxianghu opened a new pull request #1217: [HUDI-397] Normalize log print statement
wangxianghu opened a new pull request #1217: [HUDI-397] Normalize log print statement URL: https://github.com/apache/incubator-hudi/pull/1217 ## What is the purpose of the pull request *Normalize log print statement* *Redo hudi-test-suite log statements using SLF4J* ## Brief change log *Normalize log print statement* *Redo hudi-test-suite log statements using SLF4J* ## Verify this pull request This pull request should be covered by existing tests. ## Committer checklist - [ ] Has a corresponding JIRA in PR title & commit - [ ] Commit message is descriptive of the change - [ ] CI is green - [ ] Necessary doc changes done or have another open PR - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
[jira] [Updated] (HUDI-517) compact error when hoodie.compact.inline is true
[ https://issues.apache.org/jira/browse/HUDI-517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liujianhui updated HUDI-517: Status: Open (was: New)
[jira] [Updated] (HUDI-526) inline compact not work
[ https://issues.apache.org/jira/browse/HUDI-526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-526: Labels: pull-request-available (was: ) > inline compact not work > --- > > Key: HUDI-526 > URL: https://issues.apache.org/jira/browse/HUDI-526 > Project: Apache Hudi (incubating) > Issue Type: Bug > Components: Compaction >Reporter: liujianhui >Priority: Minor > Labels: pull-request-available > > hoodie.compact.inline set as true > hoodie.index.type set as INMEMEORY > > compact not occur after dela commit > {code} > 20/01/13 16:43:43 INFO HoodieMergeOnReadTable: Checking if compaction needs > to be run on file:///tmp/hudi_cow_table_read > 20/01/13 16:43:43 INFO HoodieMergeOnReadTable: Compacting merge on read table > file:///tmp/hudi_cow_table_read > 20/01/13 16:43:43 INFO FileSystemViewManager: Creating InMemory based view > for basePath file:///tmp/hudi_cow_table_read > 20/01/13 16:43:43 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient > from file:///tmp/hudi_cow_table_read > 20/01/13 16:43:43 INFO FSUtils: Hadoop Configuration: fs.defaultFS: > [file:///], Config:[Configuration: core-default.xml, core-site.xml, > mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, > hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml], FileSystem: > [org.apache.hadoop.fs.LocalFileSystem@6a24b9e2] > 20/01/13 16:43:43 INFO HoodieTableConfig: Loading table properties from > file:/tmp/hudi_cow_table_read/.hoodie/hoodie.properties > 20/01/13 16:43:43 INFO HoodieTableMetaClient: Finished Loading Table of type > MERGE_ON_READ(version=org.apache.hudi.common.model.TimelineLayoutVersion@20) > from file:///tmp/hudi_cow_table_read > 20/01/13 16:43:43 INFO HoodieTableMetaClient: Loading Active commit timeline > for file:///tmp/hudi_cow_table_read > 20/01/13 16:43:43 INFO HoodieActiveTimeline: Loaded instants > [[20200109181330__deltacommit__COMPLETED], > [2020011017__deltacommit__COMPLETED], > 
[20200110171526__deltacommit__COMPLETED], > [20200113105844__deltacommit__COMPLETED], > [20200113145851__deltacommit__COMPLETED], > [20200113155502__deltacommit__COMPLETED], > [20200113164342__deltacommit__COMPLETED]] > 20/01/13 16:43:43 INFO HoodieRealtimeTableCompactor: Compacting > file:///tmp/hudi_cow_table_read with commit 20200113164343 > 20/01/13 16:43:43 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient > from file:///tmp/hudi_cow_table_read > 20/01/13 16:43:43 INFO FSUtils: Hadoop Configuration: fs.defaultFS: > [file:///], Config:[Configuration: core-default.xml, core-site.xml, > mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, > hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml], FileSystem: > [org.apache.hadoop.fs.LocalFileSystem@6a24b9e2] > 20/01/13 16:43:43 INFO HoodieTableConfig: Loading table properties from > file:/tmp/hudi_cow_table_read/.hoodie/hoodie.properties > 20/01/13 16:43:43 INFO HoodieTableMetaClient: Finished Loading Table of type > MERGE_ON_READ(version=org.apache.hudi.common.model.TimelineLayoutVersion@20) > from file:///tmp/hudi_cow_table_read > 20/01/13 16:43:43 INFO HoodieTableMetaClient: Loading Active commit timeline > for file:///tmp/hudi_cow_table_read > 20/01/13 16:43:43 INFO HoodieActiveTimeline: Loaded instants > [[20200109181330__deltacommit__COMPLETED], > [2020011017__deltacommit__COMPLETED], > [20200110171526__deltacommit__COMPLETED], > [20200113105844__deltacommit__COMPLETED], > [20200113145851__deltacommit__COMPLETED], > [20200113155502__deltacommit__COMPLETED], > [20200113164342__deltacommit__COMPLETED]] > {code} > no compaction time is recorded in the .hoodie path -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [incubator-hudi] wangxianghu closed pull request #1217: [HUDI-397] Normalize log print statement
wangxianghu closed pull request #1217: [HUDI-397] Normalize log print statement URL: https://github.com/apache/incubator-hudi/pull/1217 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [incubator-hudi] liujianhuiouc opened a new pull request #1218: [HUDI-526] add partition meta file in HoodieAppendHandle
liujianhuiouc opened a new pull request #1218: [HUDI-526] add partition meta file in HoodieAppendHandle URL: https://github.com/apache/incubator-hudi/pull/1218 Compaction will only retrieve partition paths that have a partition meta file, so this change adds the partition meta file in the partition path. ``` -rw-r--r-- 1 liujianhui wheel2576 1 10 17:00 .5e8241a6-7844-4c8e-8428-966438424640-0_20200109181330.log.2_2-149-210 -rw-r--r-- 1 liujianhui wheel2886 1 10 17:15 .5e8241a6-7844-4c8e-8428-966438424640-0_20200109181330.log.3_1-160-226 -rw-r--r-- 1 liujianhui wheel2571 1 13 10:58 .5e8241a6-7844-4c8e-8428-966438424640-0_20200109181330.log.4_1-7-12 -rw-r--r-- 1 liujianhui wheel2572 1 13 14:58 .5e8241a6-7844-4c8e-8428-966438424640-0_20200109181330.log.5_1-7-12 -rw-r--r-- 1 liujianhui wheel2572 1 13 15:55 .5e8241a6-7844-4c8e-8428-966438424640-0_20200109181330.log.6_1-7-12 -rw-r--r-- 1 liujianhui wheel2576 1 13 16:43 .5e8241a6-7844-4c8e-8428-966438424640-0_20200109181330.log.7_1-18-29 -rw-r--r-- 1 liujianhui wheel2571 1 13 19:09 .5e8241a6-7844-4c8e-8428-966438424640-0_20200109181330.log.8_1-7-12 -rw-r--r-- 1 liujianhui wheel 93 1 13 19:09 .hoodie_partition_metadata -rw-r--r-- 1 liujianhui wheel 439809 1 13 19:09 5e8241a6-7844-4c8e-8428-966438424640-0_0-12-20_20200113190955.parquet ```
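The fix described above — compaction only visits partitions that carry a `.hoodie_partition_metadata` marker, so the append handle must create one — can be sketched as follows. This is an illustrative Python sketch only: the marker file name matches the listing above, but the function name and file contents are assumptions, not Hudi's Java code.

```python
import os

PARTITION_META_FILE = ".hoodie_partition_metadata"

def ensure_partition_metadata(partition_path, commit_time, depth):
    """Create the partition metadata marker if it is missing, so that
    compaction (which scans only partitions containing this marker)
    will pick up log files written by the append handle."""
    os.makedirs(partition_path, exist_ok=True)
    meta_path = os.path.join(partition_path, PARTITION_META_FILE)
    if not os.path.exists(meta_path):
        # Written once by the first writer to the partition; later
        # writers leave the existing marker untouched (idempotent).
        with open(meta_path, "w") as f:
            f.write("commitTime=%s\n" % commit_time)
            f.write("partitionDepth=%d\n" % depth)
    return meta_path
```

Calling it from the write path for every handle is safe because the marker is only created on first write.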
[jira] [Updated] (HUDI-397) Normalize log print statement
[ https://issues.apache.org/jira/browse/HUDI-397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-397: Labels: pull-request-available (was: ) > Normalize log print statement > - > > Key: HUDI-397 > URL: https://issues.apache.org/jira/browse/HUDI-397 > Project: Apache Hudi (incubating) > Issue Type: Sub-task > Components: Testing >Reporter: vinoyang >Assignee: wangxianghu >Priority: Major > Labels: pull-request-available > > In the test suite module, there are many logging statements that look like this > pattern: > {code:java} > log.info(String.format("- inserting input data %s > --", this.getName())); > {code} > IMO, it's not a good design. We need to refactor it. > >
[GitHub] [incubator-hudi] wangxianghu opened a new pull request #1217: [HUDI-397] Normalize log print statement
wangxianghu opened a new pull request #1217: [HUDI-397] Normalize log print statement URL: https://github.com/apache/incubator-hudi/pull/1217 ## What is the purpose of the pull request *Normalize log print statement* *Redo hudi-test-suite log statements using SLF4J* ## Brief change log *Normalize log print statement* *Redo hudi-test-suite log statements using SLF4J* ## Verify this pull request This pull request should be covered by existing tests. ## Committer checklist - [ ] Has a corresponding JIRA in PR title & commit - [ ] Commit message is descriptive of the change - [ ] CI is green - [ ] Necessary doc changes done or have another open PR - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
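The change being described — replacing eagerly formatted `log.info(String.format(...))` calls with SLF4J's parameterized `log.info("... {}", name)` style — is about deferring message construction until the log level is known to be enabled. The same idea can be shown with a small Python sketch (illustrative only, using Python's `logging` module as a stand-in for SLF4J):

```python
import logging

class Name:
    """Tracks how many times it is actually rendered to a string."""
    def __init__(self, value):
        self.value = value
        self.renders = 0
    def __str__(self):
        self.renders += 1
        return self.value

logging.basicConfig(level=logging.WARNING)  # INFO is disabled
log = logging.getLogger("hudi-test-suite")

name = Name("insert-input-data")

# Eager style (like log.info(String.format("... %s ...", name)) in Java):
# the message string is built even though INFO is filtered out.
log.info("----- inserting input data %s -----" % name)

# Parameterized style (like SLF4J's log.info("... {} ...", name)):
# formatting is deferred, so nothing is rendered when INFO is disabled.
log.info("----- inserting input data %s -----", name)
```

With INFO disabled, `name.renders` ends up at 1: only the eager call paid the formatting cost, which is the performance argument for the SLF4J style.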
[jira] [Updated] (HUDI-397) Normalize log print statement
[ https://issues.apache.org/jira/browse/HUDI-397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wangxianghu updated HUDI-397: - Status: In Progress (was: Open) > Normalize log print statement > - > > Key: HUDI-397 > URL: https://issues.apache.org/jira/browse/HUDI-397 > Project: Apache Hudi (incubating) > Issue Type: Sub-task > Components: Testing >Reporter: vinoyang >Assignee: wangxianghu >Priority: Major > > In the test suite module, there are many logging statements that look like this > pattern: > {code:java} > log.info(String.format("- inserting input data %s > --", this.getName())); > {code} > IMO, it's not a good design. We need to refactor it. > >
[jira] [Commented] (HUDI-397) Normalize log print statement
[ https://issues.apache.org/jira/browse/HUDI-397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17014226#comment-17014226 ] wangxianghu commented on HUDI-397: -- Hi [~jotarada], 3 days have passed since you asked for this issue, and I haven't received any reply from you. I am not sure whether you are still focused on this issue. So, in order to fix this issue in time, I picked it up again. Sorry for the inconvenience; I am sure you can find another issue to practice on. > Normalize log print statement > - > > Key: HUDI-397 > URL: https://issues.apache.org/jira/browse/HUDI-397 > Project: Apache Hudi (incubating) > Issue Type: Sub-task > Components: Testing >Reporter: vinoyang >Assignee: wangxianghu >Priority: Major > > In the test suite module, there are many logging statements that look like this > pattern: > {code:java} > log.info(String.format("- inserting input data %s > --", this.getName())); > {code} > IMO, it's not a good design. We need to refactor it. > >
[jira] [Created] (HUDI-526) inline compaction does not work
liujianhui created HUDI-526: --- Summary: inline compaction does not work Key: HUDI-526 URL: https://issues.apache.org/jira/browse/HUDI-526 Project: Apache Hudi (incubating) Issue Type: Bug Components: Compaction Reporter: liujianhui hoodie.compact.inline set as true hoodie.index.type set as INMEMORY compaction does not occur after delta commit {code} 20/01/13 16:43:43 INFO HoodieMergeOnReadTable: Checking if compaction needs to be run on file:///tmp/hudi_cow_table_read 20/01/13 16:43:43 INFO HoodieMergeOnReadTable: Compacting merge on read table file:///tmp/hudi_cow_table_read 20/01/13 16:43:43 INFO FileSystemViewManager: Creating InMemory based view for basePath file:///tmp/hudi_cow_table_read 20/01/13 16:43:43 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient from file:///tmp/hudi_cow_table_read 20/01/13 16:43:43 INFO FSUtils: Hadoop Configuration: fs.defaultFS: [file:///], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml], FileSystem: [org.apache.hadoop.fs.LocalFileSystem@6a24b9e2] 20/01/13 16:43:43 INFO HoodieTableConfig: Loading table properties from file:/tmp/hudi_cow_table_read/.hoodie/hoodie.properties 20/01/13 16:43:43 INFO HoodieTableMetaClient: Finished Loading Table of type MERGE_ON_READ(version=org.apache.hudi.common.model.TimelineLayoutVersion@20) from file:///tmp/hudi_cow_table_read 20/01/13 16:43:43 INFO HoodieTableMetaClient: Loading Active commit timeline for file:///tmp/hudi_cow_table_read 20/01/13 16:43:43 INFO HoodieActiveTimeline: Loaded instants [[20200109181330__deltacommit__COMPLETED], [2020011017__deltacommit__COMPLETED], [20200110171526__deltacommit__COMPLETED], [20200113105844__deltacommit__COMPLETED], [20200113145851__deltacommit__COMPLETED], [20200113155502__deltacommit__COMPLETED], [20200113164342__deltacommit__COMPLETED]] 20/01/13 16:43:43 INFO HoodieRealtimeTableCompactor: Compacting 
file:///tmp/hudi_cow_table_read with commit 20200113164343 20/01/13 16:43:43 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient from file:///tmp/hudi_cow_table_read 20/01/13 16:43:43 INFO FSUtils: Hadoop Configuration: fs.defaultFS: [file:///], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml], FileSystem: [org.apache.hadoop.fs.LocalFileSystem@6a24b9e2] 20/01/13 16:43:43 INFO HoodieTableConfig: Loading table properties from file:/tmp/hudi_cow_table_read/.hoodie/hoodie.properties 20/01/13 16:43:43 INFO HoodieTableMetaClient: Finished Loading Table of type MERGE_ON_READ(version=org.apache.hudi.common.model.TimelineLayoutVersion@20) from file:///tmp/hudi_cow_table_read 20/01/13 16:43:43 INFO HoodieTableMetaClient: Loading Active commit timeline for file:///tmp/hudi_cow_table_read 20/01/13 16:43:43 INFO HoodieActiveTimeline: Loaded instants [[20200109181330__deltacommit__COMPLETED], [2020011017__deltacommit__COMPLETED], [20200110171526__deltacommit__COMPLETED], [20200113105844__deltacommit__COMPLETED], [20200113145851__deltacommit__COMPLETED], [20200113155502__deltacommit__COMPLETED], [20200113164342__deltacommit__COMPLETED]] {code} no compaction time is recorded in the .hoodie path
[GitHub] [incubator-hudi] leesf commented on issue #1216: [HUDI-525]
leesf commented on issue #1216: [HUDI-525] URL: https://github.com/apache/incubator-hudi/pull/1216#issuecomment-573590408 Thanks for opening the contribution @liujianhuiouc ! would you please change the title and follow the guide https://hudi.apache.org/contributing.html#life-of-a-contributor?
[GitHub] [incubator-hudi] OpenOpened closed pull request #1200: [HUDI-514] A schema provider to get metadata through Jdbc
OpenOpened closed pull request #1200: [HUDI-514] A schema provider to get metadata through Jdbc URL: https://github.com/apache/incubator-hudi/pull/1200
[GitHub] [incubator-hudi] OpenOpened opened a new pull request #1200: [HUDI-514] A schema provider to get metadata through Jdbc
OpenOpened opened a new pull request #1200: [HUDI-514] A schema provider to get metadata through Jdbc URL: https://github.com/apache/incubator-hudi/pull/1200 ## What is the purpose of the pull request In our production environment, we usually need to synchronize data from MySQL, and at the same time we need to get the schema from the database, so I submitted this PR. It adds a schema provider that obtains metadata through JDBC by calling Spark's JDBC-related methods. This also ensures uniformity of the schema, for example between reading historical data through Spark JDBC and synchronizing data with the delta streamer.
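The idea behind the PR — open a database connection, read the table's column metadata, and convert it into an Avro record schema (with namespace `hoodie.<table>`, as the discussed `UtilHelpers.getSchema` does via Spark's `JdbcUtils` and `AvroConversionUtils`) — can be sketched with Python's built-in `sqlite3` standing in for JDBC. This is an illustrative sketch under assumptions: the type mapping is a toy, and none of these names come from the PR itself.

```python
import sqlite3

def schema_via_connection(conn, table):
    """Read column names and declared types from the database and build
    an Avro-style record schema with nullable fields (toy type mapping)."""
    type_map = {"INTEGER": "long", "TEXT": "string", "REAL": "double"}
    # PRAGMA table_info rows: (cid, name, declared_type, notnull, default, pk)
    cols = conn.execute("PRAGMA table_info(%s)" % table).fetchall()
    fields = [
        {"name": name, "type": ["null", type_map.get(decl.upper(), "string")]}
        for (_cid, name, decl, _notnull, _default, _pk) in cols
    ]
    return {"type": "record", "name": table,
            "namespace": "hoodie." + table, "fields": fields}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trips (id INTEGER, rider TEXT, fare REAL)")
schema = schema_via_connection(conn, "trips")
```

The benefit is the one the PR description claims: the ingestion job and any historical reads derive their schema from the same source of truth, the database itself.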
[jira] [Updated] (HUDI-525) insert info missing in delta_commit_inflight meta file
[ https://issues.apache.org/jira/browse/HUDI-525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-525: Labels: pull-request-available (was: ) > insert info missing in delta_commit_inflight meta file > > > Key: HUDI-525 > URL: https://issues.apache.org/jira/browse/HUDI-525 > Project: Apache Hudi (incubating) > Issue Type: Bug >Reporter: liujianhui >Priority: Minor > Labels: pull-request-available > > should add insert info in WorkInfoStat > {code} > private void saveWorkloadProfileMetadataToInflight(WorkloadProfile profile, > HoodieTable table, String commitTime) > throws HoodieCommitException { > try { > HoodieCommitMetadata metadata = new HoodieCommitMetadata(); > profile.getPartitionPaths().forEach(path -> { > WorkloadStat partitionStat = profile.getWorkloadStat(path.toString()); > partitionStat.getUpdateLocationToCount().forEach((key, value) -> { > HoodieWriteStat writeStat = new HoodieWriteStat(); > writeStat.setFileId(key); > // TODO : Write baseCommitTime is possible here ? > writeStat.setPrevCommit(value.getKey()); > writeStat.setNumUpdateWrites(value.getValue()); > metadata.addWriteStat(path.toString(), writeStat); > }); > }); > {code}
[GitHub] [incubator-hudi] liujianhuiouc opened a new pull request #1216: [HUDI-525]
liujianhuiouc opened a new pull request #1216: [HUDI-525] URL: https://github.com/apache/incubator-hudi/pull/1216 Add insert info with the number of inserted records to the HoodieCommitMeta. Because the file id is unknown, it is set as an empty string. Testing result: ``` "partitionToWriteStats" : { "americas/brazil/sao_paulo" : [ { "fileId" : "", "path" : null, "prevCommit" : null, "numWrites" : 0, "numDeletes" : 0, "numUpdateWrites" : 0, "numInserts" : 3, "totalWriteBytes" : 0, "totalWriteErrors" : 0, "tempPath" : null, "partitionPath" : null, "totalLogRecords" : 0, "totalLogFilesCompacted" : 0, "totalLogSizeCompacted" : 0, "totalUpdatedRecordsCompacted" : 0, "totalLogBlocks" : 0, "totalCorruptLogBlock" : 0, "totalRollbackBlocks" : 0, "fileSizeInBytes" : 0 } ], ```
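The change above can be sketched as follows: for each partition the workload profile contributes one write stat per updated file, plus (with this PR) one extra stat whose `fileId` is empty, carrying the insert count. This is illustrative Python only; the field names mirror the JSON shown above, but the function name and profile shape are assumptions, not the Java code.

```python
def inflight_metadata(profile):
    """Build partitionToWriteStats: one stat per updated file, plus one
    stat with an empty fileId recording numInserts for the partition."""
    partition_to_stats = {}
    for partition, stat in profile.items():
        stats = []
        for file_id, (prev_commit, updates) in stat.get("updates", {}).items():
            stats.append({"fileId": file_id, "prevCommit": prev_commit,
                          "numUpdateWrites": updates, "numInserts": 0})
        if stat.get("inserts", 0) > 0:
            # No data file exists yet for pure inserts, hence the empty fileId.
            stats.append({"fileId": "", "prevCommit": None,
                          "numUpdateWrites": 0, "numInserts": stat["inserts"]})
        partition_to_stats[partition] = stats
    return {"partitionToWriteStats": partition_to_stats}
```

A partition with only 3 inserts then yields exactly one stat with `"fileId": ""` and `"numInserts": 3`, matching the testing result quoted above.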
[GitHub] [incubator-hudi] zhedoubushishi commented on a change in pull request #1109: [HUDI-238] - Migrating to Scala 2.12
zhedoubushishi commented on a change in pull request #1109: [HUDI-238] - Migrating to Scala 2.12 URL: https://github.com/apache/incubator-hudi/pull/1109#discussion_r365680930 ## File path: hudi-utilities/pom.xml ## @@ -28,14 +28,52 @@ ${project.parent.basedir} +2.0.0 +2.12 + + + + + net.alchim31.maven + scala-maven-plugin + ${scala-maven-plugin.version} + + + org.apache.maven.plugins + maven-compiler-plugin + + + + org.jacoco jacoco-maven-plugin + +net.alchim31.maven +scala-maven-plugin + + +scala-compile-first +process-resources + + add-source + compile + + + +scala-test-compile +process-test-resources + + testCompile + + + + Review comment: > hi @zhedoubushishi, I couldn't find a way to use `spark-streaming-kafka-0-10_2.12`, because this jar does not include test classes. This is the reason why I copied and adapted those 3 Scala files (`KafkaTestUtils.scala`, `ShutdownHookManager.scala` and `Utils.scala`). As far as my maven knowledge goes, the only way to use `https://github.com/apache/spark/blob/master/external/kafka-0-10/src/test/scala/org/apache/spark/streaming/kafka010/KafkaTestUtils.scala` would be to build a test-jar as described here: https://maven.apache.org/plugins/maven-jar-plugin/examples/create-test-jar.html. I can't find such jar anywhere. > Do you have other suggestions or maybe I'm missing something? Try appending this code in the hudi-utilities/pom.xml. It works for me.
```
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-streaming-kafka-0-10_2.12</artifactId>
  <version>${spark.version}</version>
  <classifier>tests</classifier>
</dependency>
```
[jira] [Commented] (HUDI-295) Do one-time cleanup of Hudi git history
[ https://issues.apache.org/jira/browse/HUDI-295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17014104#comment-17014104 ] Pratyaksh Sharma commented on HUDI-295: --- [~vbalaji] A few of my commits are not getting counted in my contributions ([https://github.com/apache/incubator-hudi/graphs/contributors]), in the sense that the number of commits does not match my actual number of commits here ([https://github.com/apache/incubator-hudi/commits?author=pratyakshsharma]). The reason is that those commits were committed with a different mail id (pratyakshsharma@.local), which GitHub does not take into account when counting contributions. So whenever we update this git history, I would like to get my email id changed for a few commits. I contacted GitHub support and they have already shared a script with me for doing the same. Please let me know in case of any questions. > Do one-time cleanup of Hudi git history > --- > > Key: HUDI-295 > URL: https://issues.apache.org/jira/browse/HUDI-295 > Project: Apache Hudi (incubating) > Issue Type: Task > Components: Docs >Reporter: Vinoth Chandar >Assignee: Vinoth Chandar >Priority: Major > Fix For: 0.5.1 > > > https://lists.apache.org/thread.html/dc6eb516e248088dac1a2b5c9690383dfe2eb3912f76bbe9dd763c2b@