[GitHub] [incubator-hudi] eisig edited a comment on issue #789: HoodieMergeOnReadTable rollback hangs
eisig edited a comment on issue #789: HoodieMergeOnReadTable rollback hangs URL: https://github.com/apache/incubator-hudi/issues/789#issuecomment-512675011 The timeline does not seem to work as expected. I reran the demo (https://hudi.apache.org/docker_demo.html), and at Step 6(a) the ro and rt views return the same result. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [incubator-hudi] eisig commented on issue #789: HoodieMergeOnReadTable rollback hangs
eisig commented on issue #789: HoodieMergeOnReadTable rollback hangs URL: https://github.com/apache/incubator-hudi/issues/789#issuecomment-512675011 The timeline does not seem to work as expected. I reran the demo (https://hudi.apache.org/docker_demo.html), and at Step 6(a) the ro and rt views are the same.
[GitHub] [incubator-hudi] thesuperzapper commented on a change in pull request #780: Fixes HUDI-172 : Cleanup Maven POM/Classpath
thesuperzapper commented on a change in pull request #780: Fixes HUDI-172 : Cleanup Maven POM/Classpath URL: https://github.com/apache/incubator-hudi/pull/780#discussion_r304743409 ## File path: hoodie-integ-test/pom.xml ## @@ -6,35 +6,62 @@ 0.4.8-SNAPSHOT ../pom.xml - hoodie-integ-test 4.0.0 + + hoodie-integ-test + - - org.glassfish.jersey.connectors - jersey-apache-connector - 2.17 - + + org.glassfish.jersey.core jersey-server - 2.17 + + + org.glassfish.jersey.connectors + jersey-apache-connector org.glassfish.jersey.containers jersey-container-servlet-core - 2.17 + + + Review comment: There are a lot of dependencies. But in general my approach has been to ensure we get the versions of packages which we specify in the base pom. (Unless there is some issue which requires precedence, in which case I have usually left a comment)
[GitHub] [incubator-hudi] thesuperzapper commented on a change in pull request #780: Fixes HUDI-172 : Cleanup Maven POM/Classpath
thesuperzapper commented on a change in pull request #780: Fixes HUDI-172 : Cleanup Maven POM/Classpath URL: https://github.com/apache/incubator-hudi/pull/780#discussion_r304743065 ## File path: hoodie-hadoop-mr/pom.xml ## @@ -118,11 +127,6 @@ - Review comment: I think this was a mistake.
[GitHub] [incubator-hudi] garyli1019 opened a new pull request #795: HUDI-171 delete tmp file after split merge failure
garyli1019 opened a new pull request #795: HUDI-171 delete tmp file after split merge failure URL: https://github.com/apache/incubator-hudi/pull/795
[GitHub] [incubator-hudi] cdmikechen commented on issue #757: spark-hoodie-bundle using hive-serde to sync hive table(Hive2.3.5)
cdmikechen commented on issue #757: spark-hoodie-bundle using hive-serde to sync hive table(Hive2.3.5) URL: https://github.com/apache/incubator-hudi/issues/757#issuecomment-512618204 @xuFabius Did you hit this error when running on the cluster, or when testing in your local IDE?
[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #780: Fixes HUDI-172 : Cleanup Maven POM/Classpath
vinothchandar commented on a change in pull request #780: Fixes HUDI-172 : Cleanup Maven POM/Classpath URL: https://github.com/apache/incubator-hudi/pull/780#discussion_r304692830 ## File path: hoodie-integ-test/pom.xml ## @@ -43,69 +70,47 @@ test-jar test - - org.awaitility - awaitility - 3.1.2 - test - com.uber.hoodie hoodie-spark ${project.version} tests test-jar test - - - org.glassfish.** - * - - - - - com.google.guava - guava - 20.0 - test + + com.fasterxml.jackson.core jackson-annotations - 2.6.4 test com.fasterxml.jackson.core jackson-databind - 2.6.4 test com.fasterxml.jackson.datatype jackson-datatype-guava - 2.9.4 + ${fasterxml.version} Review comment: can't we inherit from parent like usual?
[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #780: Fixes HUDI-172 : Cleanup Maven POM/Classpath
vinothchandar commented on a change in pull request #780: Fixes HUDI-172 : Cleanup Maven POM/Classpath URL: https://github.com/apache/incubator-hudi/pull/780#discussion_r302762896 ## File path: hoodie-cli/pom.xml ## @@ -159,67 +176,51 @@ spark-sql_2.11 + - com.jakewharton.fliptables - fliptables - 1.0.2 + commons-dbcp + commons-dbcp - log4j - log4j - ${log4j.version} + org.springframework.shell + spring-shell + ${spring.shell.version} - com.uber.hoodie - hoodie-hive - ${project.version} + de.vandermeer + asciitable + 0.2.5 - com.uber.hoodie - hoodie-client - ${project.version} + com.jakewharton.fliptables + fliptables + 1.0.2 + + joda-time + joda-time Review comment: removed version
[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #780: Fixes HUDI-172 : Cleanup Maven POM/Classpath
vinothchandar commented on a change in pull request #780: Fixes HUDI-172 : Cleanup Maven POM/Classpath URL: https://github.com/apache/incubator-hudi/pull/780#discussion_r302763536 ## File path: hoodie-client/pom.xml ## @@ -78,6 +79,71 @@ hoodie-timeline-service ${project.version} + + + + log4j + log4j + + + + + org.apache.parquet + parquet-avro + + + org.apache.parquet Review comment: removed already
[GitHub] [incubator-hudi] vinothchandar edited a comment on issue #714: Performance Comparison of HoodieDeltaStreamer and DataSourceAPI
vinothchandar edited a comment on issue #714: Performance Comparison of HoodieDeltaStreamer and DataSourceAPI URL: https://github.com/apache/incubator-hudi/issues/714#issuecomment-512610891 @NetsanetGeb What time works for you? Are you on Slack? We can coordinate 1-1 there. Next week works for me. In the meantime, I am trying to establish a baseline using the TestDataGenerator/DeltaStreamer.
[GitHub] [incubator-hudi] vinothchandar commented on issue #714: Performance Comparison of HoodieDeltaStreamer and DataSourceAPI
vinothchandar commented on issue #714: Performance Comparison of HoodieDeltaStreamer and DataSourceAPI URL: https://github.com/apache/incubator-hudi/issues/714#issuecomment-512610891 @NetsanetGeb What time works for you? Are you on Slack? We can coordinate 1-1 there. In the meantime, I am trying to establish a baseline using the TestDataGenerator/DeltaStreamer.
[GitHub] [incubator-hudi] bhasudha commented on issue #689: [HUDI-25] Optimize HoodieInputFormat.listStatus for faster Hive Incremental queries
bhasudha commented on issue #689: [HUDI-25] Optimize HoodieInputFormat.listStatus for faster Hive Incremental queries URL: https://github.com/apache/incubator-hudi/pull/689#issuecomment-512528018 I was able to cross-verify the query results between the current HoodieInputFormat and this new HoodieInputFormat for a few Uber production tables using Spark. I ran different snapshot queries on MOR tables, including count(*), group-bys, joins, etc. The query latencies were also comparable. I can't test incremental queries yet without changing the jar in the Hive Metastore; I will do that next. My plan is to test it in staging and then gradually roll it out to production. @n3nash @vinothchandar ^^
[GitHub] [incubator-hudi] bhasudha commented on a change in pull request #689: [HUDI-25] Optimize HoodieInputFormat.listStatus for faster Hive Incremental queries
bhasudha commented on a change in pull request #689: [HUDI-25] Optimize HoodieInputFormat.listStatus for faster Hive Incremental queries URL: https://github.com/apache/incubator-hudi/pull/689#discussion_r304590801 ## File path: hoodie-hadoop-mr/src/test/java/com/uber/hoodie/hadoop/HoodieInputFormatTest.java ## @@ -209,6 +231,22 @@ public void testPredicatePushDown() throws IOException { commit2, 2, 10); } + @Test + public void testgetIncrementalTableNames() throws IOException { Review comment: done!
[GitHub] [incubator-hudi] garyli1019 commented on issue #768: No Space Left On Device for upsert
garyli1019 commented on issue #768: No Space Left On Device for upsert URL: https://github.com/apache/incubator-hudi/issues/768#issuecomment-512514758 https://issues.apache.org/jira/browse/HUDI-171 @vinothchandar In my cluster setup, none of the Spark shuffle services use `/tmp`, so I think those files are left behind by Hudi. Example of a file left in `/tmp`: `-rw-r- 1 u_ops 168M Jul 14 18:10 d7b2a7a3-5706-4ffd-90cb-70c6650ef1e4` I think we can find a way to predict the file size before actually writing to tmp. It will be difficult to go back to the worker node to delete those files after the job has failed.
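One way to avoid leaving such files behind is to delete the temp file at the point of failure, while the failing task is still running. The following is a minimal sketch of that idea using only the JDK; `TempFileCleanupSketch`, `TempWriter` and `writeOrCleanUp` are hypothetical names, not Hudi classes, and this is not the change made in PR #795. It does not help when the JVM itself is killed before the `finally` block runs.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

/** Hypothetical helper, not part of Hudi: removes its temp file when the write fails. */
final class TempFileCleanupSketch {

  /** A write step that may throw, for example when the disk fills up. */
  interface TempWriter {
    void writeTo(Path tmp) throws IOException;
  }

  /**
   * Creates a temp file under dir, runs the writer against it, and returns the
   * path on success. On any failure the temp file is deleted before the
   * exception propagates to the caller.
   */
  static Path writeOrCleanUp(Path dir, TempWriter writer) throws IOException {
    Path tmp = Files.createTempFile(dir, "merge-", ".tmp");
    boolean succeeded = false;
    try {
      writer.writeTo(tmp);
      succeeded = true;
      return tmp;
    } finally {
      if (!succeeded) {
        Files.deleteIfExists(tmp);
      }
    }
  }

  public static void main(String[] args) throws IOException {
    Path dir = Files.createTempDirectory("tmp-cleanup-sketch");
    try {
      // Simulate the failure mode from the issue title.
      writeOrCleanUp(dir, tmp -> { throw new IOException("No space left on device"); });
    } catch (IOException expected) {
      // The failed write has already removed its temp file.
    }
    try (Stream<Path> entries = Files.list(dir)) {
      System.out.println("files left behind: " + entries.count());
    }
  }
}
```

Cleaning up inside the task avoids having to return to the worker node after the job has failed, which the comment above notes is difficult.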
[GitHub] [incubator-hudi] vinothchandar merged pull request #794: Update writing_data for operations/deletes
vinothchandar merged pull request #794: Update writing_data for operations/deletes URL: https://github.com/apache/incubator-hudi/pull/794
[GitHub] [incubator-hudi] bhasudha commented on issue #764: Hoodie 0.4.7: Error upserting bucketType UPDATE for partition #, No value present
bhasudha commented on issue #764: Hoodie 0.4.7: Error upserting bucketType UPDATE for partition #, No value present URL: https://github.com/apache/incubator-hudi/issues/764#issuecomment-51525 With PR [775](https://github.com/apache/incubator-hudi/pull/775) this issue seems to have been fixed. I was able to reproduce this error before the fix. After applying PR 775, I could no longer reproduce it. @amaranathv can you test this PR for the empty path exception?
[GitHub] [incubator-hudi] vinothchandar opened a new pull request #794: Update writing_data for operations/deletes
vinothchandar opened a new pull request #794: Update writing_data for operations/deletes URL: https://github.com/apache/incubator-hudi/pull/794
- provided guidance for upsert vs insert vs bulk_insert
- provided guidance for soft deletes vs hard deletes
[GitHub] [incubator-hudi] bhasudha commented on issue #793: Allow HoodieWrapperFileSystem to wrap other proxy file-system implementations with no getScheme implementation
bhasudha commented on issue #793: Allow HoodieWrapperFileSystem to wrap other proxy file-system implementations with no getScheme implementation URL: https://github.com/apache/incubator-hudi/pull/793#issuecomment-512213882 Looks good to me. @bvaradar It looks like, with PR [700](https://github.com/apache/incubator-hudi/pull/700), HoodieWrapperFileSystem is now added to HoodieTableMetaClient. Since this touches the query side, do we need to test other query engines for the same issue as well?
[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #793: Allow HoodieWrapperFileSystem to wrap other proxy file-system implementations with no getScheme implementation
vinothchandar commented on a change in pull request #793: Allow HoodieWrapperFileSystem to wrap other proxy file-system implementations with no getScheme implementation URL: https://github.com/apache/incubator-hudi/pull/793#discussion_r304344989
## File path: hoodie-common/src/main/java/com/uber/hoodie/common/io/storage/HoodieWrapperFileSystem.java ##
@@ -121,13 +121,15 @@ public void initialize(URI uri, Configuration conf) throws IOException {
     // Remove 'hoodie-' prefix from path
     if (path.toString().startsWith(HOODIE_SCHEME_PREFIX)) {
       path = new Path(path.toString().replace(HOODIE_SCHEME_PREFIX, ""));
+      this.uri = path.toUri();
+    } else {
+      this.uri = uri;
     }
     this.fileSystem = FSUtils.getFs(path.toString(), conf);
     // Do not need to explicitly initialize the default filesystem, its done already in the above
Review comment: do we clean up lines 127-129?
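The branch under review picks which URI the wrapper reports: the input with the wrapper prefix removed from its scheme, or the input unchanged. Below is a standalone sketch of that decision using only `java.net.URI`. `SchemePrefixSketch` and `resolveWrappedUri` are hypothetical names, and the value of `HOODIE_SCHEME_PREFIX` is an assumption inferred from the "Remove 'hoodie-' prefix" comment in the diff. Unlike the diff, which calls `String.replace`, this sketch strips only the leading occurrence.

```java
import java.net.URI;

/** Hypothetical stand-in for the reviewed branch; not the actual HoodieWrapperFileSystem code. */
final class SchemePrefixSketch {

  // Assumed value, inferred from the "Remove 'hoodie-' prefix" comment in the diff.
  static final String HOODIE_SCHEME_PREFIX = "hoodie-";

  /**
   * Returns the URI the wrapper should report: the input with the wrapper
   * prefix stripped from its scheme, or the input itself when there is no
   * such prefix.
   */
  static URI resolveWrappedUri(URI uri) {
    String text = uri.toString();
    if (text.startsWith(HOODIE_SCHEME_PREFIX)) {
      // Strip only the leading prefix; String.replace would also rewrite
      // any later "hoodie-" that happens to appear inside the path.
      return URI.create(text.substring(HOODIE_SCHEME_PREFIX.length()));
    }
    return uri;
  }

  public static void main(String[] args) {
    System.out.println(resolveWrappedUri(URI.create("hoodie-hdfs://nn:8020/data/hoodie-table")));
    System.out.println(resolveWrappedUri(URI.create("s3a://bucket/data")));
  }
}
```

Keeping the unprefixed URI in one place means a wrapped file system that has no `getScheme` implementation never needs to be asked for its scheme.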
[GitHub] [incubator-hudi] vinothchandar commented on issue #793: Allow HoodieWrapperFileSystem to wrap other proxy file-system implementations with no getScheme implementation
vinothchandar commented on issue #793: Allow HoodieWrapperFileSystem to wrap other proxy file-system implementations with no getScheme implementation URL: https://github.com/apache/incubator-hudi/pull/793#issuecomment-512203745 @bhasudha can you also please review this
[GitHub] [incubator-hudi] vinothchandar edited a comment on issue #789: HoodieMergeOnReadTable rollback hangs
vinothchandar edited a comment on issue #789: HoodieMergeOnReadTable rollback hangs URL: https://github.com/apache/incubator-hudi/issues/789#issuecomment-512201473 In (2), it seems like you are not seeing the delta commit data reflected? Again, how do we reconcile this with the duplicates you were reporting on #779? Do you think they are related? What query engine are you using in (2) to query the rt table? (I noticed some issues in Spark SQL when running the demo; I'm trying to see if that's related.)
[GitHub] [incubator-hudi] vinothchandar commented on issue #779: HoodieDeltaStreamer may insert duplicate record?
vinothchandar commented on issue #779: HoodieDeltaStreamer may insert duplicate record? URL: https://github.com/apache/incubator-hudi/issues/779#issuecomment-512202751
> here are multiple log files with the file id = c87d3580-86fe-40f9-8f6c-7c95cc91caa6 but I don't see a corresponding parquet file
We have to drill into what's causing this. At the moment, we don't index the log files, so we expect only updates to go there; if inserts end up there, it will result in duplicates.
[GitHub] [incubator-hudi] vinothchandar commented on issue #789: HoodieMergeOnReadTable rollback hangs
vinothchandar commented on issue #789: HoodieMergeOnReadTable rollback hangs URL: https://github.com/apache/incubator-hudi/issues/789#issuecomment-512201473 In (2), it seems like you are not seeing the delta commit data reflected? Again, how do we reconcile this with the duplicates you were reporting on #779? Do you think they are related?
[incubator-hudi] branch master updated: Fixing default value for avro 1.7 which assumes NULL value instead of a jsonnode that is null (#792)
This is an automated email from the ASF dual-hosted git repository.
vinoth pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git
The following commit(s) were added to refs/heads/master by this push:
new 6efa163 Fixing default value for avro 1.7 which assumes NULL value instead of a jsonnode that is null (#792)
6efa163 is described below
commit 6efa16317c0f0f13798d739d9615dda24bf91bcf
Author: n3nash
AuthorDate: Wed Jul 17 03:25:54 2019 -0700
    Fixing default value for avro 1.7 which assumes NULL value instead of a jsonnode that is null (#792)
---
 .../java/com/uber/hoodie/common/util/HoodieAvroUtils.java 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/hoodie-common/src/main/java/com/uber/hoodie/common/util/HoodieAvroUtils.java b/hoodie-common/src/main/java/com/uber/hoodie/common/util/HoodieAvroUtils.java
index 9b34fab..3a9443e 100644
--- a/hoodie-common/src/main/java/com/uber/hoodie/common/util/HoodieAvroUtils.java
+++ b/hoodie-common/src/main/java/com/uber/hoodie/common/util/HoodieAvroUtils.java
@@ -102,15 +102,15 @@ public class HoodieAvroUtils {
     List parentFields = new ArrayList<>();
     Schema.Field commitTimeField = new Schema.Field(HoodieRecord.COMMIT_TIME_METADATA_FIELD,
-        METADATA_FIELD_SCHEMA, "", null);
+        METADATA_FIELD_SCHEMA, "", NullNode.getInstance());
     Schema.Field commitSeqnoField = new Schema.Field(HoodieRecord.COMMIT_SEQNO_METADATA_FIELD,
-        METADATA_FIELD_SCHEMA, "", null);
+        METADATA_FIELD_SCHEMA, "", NullNode.getInstance());
     Schema.Field recordKeyField = new Schema.Field(HoodieRecord.RECORD_KEY_METADATA_FIELD,
-        METADATA_FIELD_SCHEMA, "", null);
+        METADATA_FIELD_SCHEMA, "", NullNode.getInstance());
     Schema.Field partitionPathField = new Schema.Field(HoodieRecord.PARTITION_PATH_METADATA_FIELD,
-        METADATA_FIELD_SCHEMA, "", null);
+        METADATA_FIELD_SCHEMA, "", NullNode.getInstance());
     Schema.Field fileNameField = new Schema.Field(HoodieRecord.FILENAME_METADATA_FIELD,
-        METADATA_FIELD_SCHEMA, "", null);
+        METADATA_FIELD_SCHEMA, "", NullNode.getInstance());
     parentFields.add(commitTimeField);
     parentFields.add(commitSeqnoField);
@@ -119,7 +119,7 @@ public class HoodieAvroUtils {
     parentFields.add(fileNameField);
     for (Schema.Field field : schema.getFields()) {
       if (!isMetadataField(field.name())) {
-        Schema.Field newField = new Schema.Field(field.name(), field.schema(), field.doc(), null);
+        Schema.Field newField = new Schema.Field(field.name(), field.schema(), field.doc(), field.defaultValue());
         for (Map.Entry prop : field.getJsonProps().entrySet()) {
           newField.addProp(prop.getKey(), prop.getValue());
         }
@@ -135,7 +135,7 @@ public class HoodieAvroUtils {
   private static Schema initRecordKeySchema() {
     Schema.Field recordKeyField = new Schema.Field(HoodieRecord.RECORD_KEY_METADATA_FIELD,
-        METADATA_FIELD_SCHEMA, "", null);
+        METADATA_FIELD_SCHEMA, "", NullNode.getInstance());
     Schema recordKeySchema = Schema.createRecord("HoodieRecordKey", "", "", false);
     recordKeySchema.setFields(Arrays.asList(recordKeyField));
     return recordKeySchema;
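The commit above turns on the difference between a Java `null` default (no default declared) and a JSON null node (a default whose value is null). The toy model below illustrates that distinction with plain JDK classes. It is not Avro's API: `NullDefaultSketch`, `Field`, `JSON_NULL` and `resolveMissing` are hypothetical names, and the behaviour shown (a reader fails on a missing field unless a default was declared) is a simplified assumption about how such defaults are consumed.

```java
/** Toy model of field defaults; it mimics the idea only and is not Avro's API. */
final class NullDefaultSketch {

  /** Hypothetical sentinel playing the role NullNode.getInstance() has in the diff. */
  static final Object JSON_NULL = new Object();

  static final class Field {
    final String name;
    // Java null means "no default declared"; JSON_NULL means "default is null".
    final Object defaultValue;

    Field(String name, Object defaultValue) {
      this.name = name;
      this.defaultValue = defaultValue;
    }

    boolean hasDefault() {
      return defaultValue != null;
    }
  }

  /** Value to use when a record being read lacks this field. */
  static Object resolveMissing(Field field) {
    if (!field.hasDefault()) {
      throw new IllegalStateException("No default declared for field " + field.name);
    }
    return field.defaultValue == JSON_NULL ? null : field.defaultValue;
  }

  public static void main(String[] args) {
    Field declaredNull = new Field("commit_time", JSON_NULL);
    Field undeclared = new Field("commit_time", null);
    System.out.println(resolveMissing(declaredNull));
    try {
      resolveMissing(undeclared);
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

In this model, passing the sentinel is what makes a field optional for older records, which matches the intent stated in the commit title.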
[GitHub] [incubator-hudi] vinothchandar merged pull request #792: Fixing default value for avro 1.7 which assumes NULL value instead of a jsonnode that is null
vinothchandar merged pull request #792: Fixing default value for avro 1.7 which assumes NULL value instead of a jsonnode that is null URL: https://github.com/apache/incubator-hudi/pull/792
[GitHub] [incubator-hudi] NetsanetGeb commented on issue #714: Performance Comparison of HoodieDeltaStreamer and DataSourceAPI
NetsanetGeb commented on issue #714: Performance Comparison of HoodieDeltaStreamer and DataSourceAPI URL: https://github.com/apache/incubator-hudi/issues/714#issuecomment-512140516 Yes, you can extract data from [IPUMS USA](https://usa.ipums.org/usa/) to run the workload locally. I am not allowed to share the files I downloaded from there. Instead, you can extract the dataset from their site by specifying the column fields you want in CSV format and then convert it to JSON to use JSON as the source class. I am also glad to do a video call at a time that's convenient for both of us, maybe on a weekend or next week, to debug it together. Thanks,