[GitHub] [incubator-hudi] luke-zhu opened a new pull request #818: Fix typo in hoodie-presto-bundle

2019-07-31 Thread GitBox
luke-zhu opened a new pull request #818: Fix typo in hoodie-presto-bundle URL: https://github.com/apache/incubator-hudi/pull/818 I was getting some java import error involving Jackson which was resolved after I removed https://mvnrepository.com/artifact/com.uber.hoodie/hoodie-presto-bundle

[GitHub] [incubator-hudi] anchalkataria commented on issue #796: Error hive sync via delta streamer

2019-07-31 Thread GitBox
anchalkataria commented on issue #796: Error hive sync via delta streamer URL: https://github.com/apache/incubator-hudi/issues/796#issuecomment-517127576 > @anchalkataria we have some leads on the null issue. we expect it to be fixed on master soon.. > > on your original registration

[GitHub] [incubator-hudi] vinothchandar commented on issue #714: Performance Comparison of HoodieDeltaStreamer and DataSourceAPI

2019-07-31 Thread GitBox
vinothchandar commented on issue #714: Performance Comparison of HoodieDeltaStreamer and DataSourceAPI URL: https://github.com/apache/incubator-hudi/issues/714#issuecomment-517125538 @NetsanetGeb 2 comes from the configs you are setting? hoodie.upsert.shuffle.parallelism & hoodie.inser

[GitHub] [incubator-hudi] vinothchandar commented on issue #774: Matching question of the version in Spark and Hive2

2019-07-31 Thread GitBox
vinothchandar commented on issue #774: Matching question of the version in Spark and Hive2 URL: https://github.com/apache/incubator-hudi/issues/774#issuecomment-517124662 @cdmikechen can we have a call or can you write up how we can take a fresh look at the hive sync aspects? It definitel

[GitHub] [incubator-hudi] vinothchandar commented on issue #789: Demo : Unexpected result in some queries

2019-07-31 Thread GitBox
vinothchandar commented on issue #789: Demo : Unexpected result in some queries URL: https://github.com/apache/incubator-hudi/issues/789#issuecomment-517124343 @n3nash is debugging the join issue, which seems different? This

[GitHub] [incubator-hudi] vinothchandar commented on issue #796: Error hive sync via delta streamer

2019-07-31 Thread GitBox
vinothchandar commented on issue #796: Error hive sync via delta streamer URL: https://github.com/apache/incubator-hudi/issues/796#issuecomment-517124229 @n3nash can you paste the error you got hive syncing on the apache hive 2.x servers if any?

[GitHub] [incubator-hudi] vinothchandar commented on issue #796: Not able to use S3 as storage for Hudi dataset

2019-07-31 Thread GitBox
vinothchandar commented on issue #796: Not able to use S3 as storage for Hudi dataset URL: https://github.com/apache/incubator-hudi/issues/796#issuecomment-517123959 @anchalkataria we have some leads on the null issue. we expect it to be fixed on master soon.. on your original regi

[GitHub] [incubator-hudi] vinothchandar commented on issue #800: Performance tuning

2019-07-31 Thread GitBox
vinothchandar commented on issue #800: Performance tuning URL: https://github.com/apache/incubator-hudi/issues/800#issuecomment-517123657 hi.. any updates? This is an automated message from the Apache Git Service. To respond

[GitHub] [incubator-hudi] vinothchandar commented on issue #801: How to customize schema

2019-07-31 Thread GitBox
vinothchandar commented on issue #801: How to customize schema URL: https://github.com/apache/incubator-hudi/issues/801#issuecomment-517123554 Closing. Reopen new issues on JIRA or mailing list as needed

[GitHub] [incubator-hudi] vinothchandar closed issue #801: How to customize schema

2019-07-31 Thread GitBox
vinothchandar closed issue #801: How to customize schema URL: https://github.com/apache/incubator-hudi/issues/801

[GitHub] [incubator-hudi] cdmikechen opened a new issue #817: spark-submit with userClassPathFirst config error

2019-07-31 Thread GitBox
cdmikechen opened a new issue #817: spark-submit with userClassPathFirst config error URL: https://github.com/apache/incubator-hudi/issues/817 When I used spark-submit to run some codes like that(spark 2.4.3 and scala 2.11.12): ``` ../bin/spark-submit --master yarn --class xxx.xxx.Ma

[GitHub] [incubator-hudi] jackwang2 commented on issue #764: Hoodie 0.4.7: Error upserting bucketType UPDATE for partition #, No value present

2019-07-31 Thread GitBox
jackwang2 commented on issue #764: Hoodie 0.4.7: Error upserting bucketType UPDATE for partition #, No value present URL: https://github.com/apache/incubator-hudi/issues/764#issuecomment-517089256 @n3nash No, I didn't. The main logic is for just global deduplication, and code is pasted

[GitHub] [incubator-hudi] n3nash commented on issue #764: Hoodie 0.4.7: Error upserting bucketType UPDATE for partition #, No value present

2019-07-31 Thread GitBox
n3nash commented on issue #764: Hoodie 0.4.7: Error upserting bucketType UPDATE for partition #, No value present URL: https://github.com/apache/incubator-hudi/issues/764#issuecomment-517076737 It looks like the "Not an Avro data file" exception is thrown when there is a 0 byte stream rea

[GitHub] [incubator-hudi] n3nash edited a comment on issue #764: Hoodie 0.4.7: Error upserting bucketType UPDATE for partition #, No value present

2019-07-31 Thread GitBox
n3nash edited a comment on issue #764: Hoodie 0.4.7: Error upserting bucketType UPDATE for partition #, No value present URL: https://github.com/apache/incubator-hudi/issues/764#issuecomment-517076737 It looks like the "Not an Avro data file" exception is thrown when there is a 0 byte str

[GitHub] [incubator-hudi] tweise commented on issue #816: [HUDI-121] Add lresende signing key to KEYS file

2019-07-31 Thread GitBox
tweise commented on issue #816: [HUDI-121] Add lresende signing key to KEYS file URL: https://github.com/apache/incubator-hudi/pull/816#issuecomment-517066426 The KEYS file needs to be added to the dist area, not here.

[GitHub] [incubator-hudi] bvaradar commented on a change in pull request #816: [HUDI-121] Add lresende signing key to KEYS file

2019-07-31 Thread GitBox
bvaradar commented on a change in pull request #816: [HUDI-121] Add lresende signing key to KEYS file URL: https://github.com/apache/incubator-hudi/pull/816#discussion_r309469510 ## File path: KEYS ## @@ -126,3 +126,286 @@ txTq7YpleWQhcz9+9Fruu7jA+l1pSUJSR0+DZegBOq+zWIHcZ

[GitHub] [incubator-hudi] lresende opened a new pull request #816: [HUDI-121] Add lresende signing key to KEYS file

2019-07-31 Thread GitBox
lresende opened a new pull request #816: [HUDI-121] Add lresende signing key to KEYS file URL: https://github.com/apache/incubator-hudi/pull/816

svn commit: r35085 - /dev/incubator/hudi/

2019-07-31 Thread lresende
Author: lresende Date: Wed Jul 31 22:45:37 2019 New Revision: 35085 Log: Adding release staging directory for Hudi Added: dev/incubator/hudi/

svn commit: r35084 - /release/incubator/hudi/

2019-07-31 Thread lresende
Author: lresende Date: Wed Jul 31 22:44:48 2019 New Revision: 35084 Log: Adding release directory for Hudi Added: release/incubator/hudi/

[GitHub] [incubator-hudi] vinothchandar commented on issue #803: [WIP] [Not For Merging] Demo automation with pom dep order fixes from PR-780

2019-07-31 Thread GitBox
vinothchandar commented on issue #803: [WIP] [Not For Merging] Demo automation with pom dep order fixes from PR-780 URL: https://github.com/apache/incubator-hudi/pull/803#issuecomment-517043325 have this code and #780 both testing in `pom-bundle-cleanup` branch.. Closing this. Will open a

[GitHub] [incubator-hudi] vinothchandar closed pull request #803: [WIP] [Not For Merging] Demo automation with pom dep order fixes from PR-780

2019-07-31 Thread GitBox
vinothchandar closed pull request #803: [WIP] [Not For Merging] Demo automation with pom dep order fixes from PR-780 URL: https://github.com/apache/incubator-hudi/pull/803

[GitHub] [incubator-hudi] vinothchandar commented on issue #815: HUDI-186 Fix formatting for new content in Writing Data page. Update website to reflect new apache links

2019-07-31 Thread GitBox
vinothchandar commented on issue #815: HUDI-186 Fix formatting for new content in Writing Data page. Update website to reflect new apache links URL: https://github.com/apache/incubator-hudi/pull/815#issuecomment-517042646 No. have not seen them.. its auto generated content, so may be it ref

[GitHub] [incubator-hudi] vinothchandar merged pull request #815: HUDI-186 Fix formatting for new content in Writing Data page. Update website to reflect new apache links

2019-07-31 Thread GitBox
vinothchandar merged pull request #815: HUDI-186 Fix formatting for new content in Writing Data page. Update website to reflect new apache links URL: https://github.com/apache/incubator-hudi/pull/815

[incubator-hudi] branch asf-site updated: Fix formatting for new content in Writing Data page. Update hudi.incubator.apache.org website (#815)

2019-07-31 Thread vinoth
This is an automated email from the ASF dual-hosted git repository. vinoth pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git The following commit(s) were added to refs/heads/asf-site by this push: new 3190d6d Fix formatting for new cont

[GitHub] [incubator-hudi] n3nash commented on issue #814: Fix for realtime queries

2019-07-31 Thread GitBox
n3nash commented on issue #814: Fix for realtime queries URL: https://github.com/apache/incubator-hudi/pull/814#issuecomment-517026353 On another note, I debugged the issue with join queries as reported here : https://github.com/apache/incubator-hudi/issues/789 and found weird results (not

[GitHub] [incubator-hudi] n3nash commented on issue #814: Fix for realtime queries

2019-07-31 Thread GitBox
n3nash commented on issue #814: Fix for realtime queries URL: https://github.com/apache/incubator-hudi/pull/814#issuecomment-517025307 The code being removed was added to make Hive on Spark work. Due to a bug in Hive, Hive on Spark does not work seamlessly with RT tables. > Issue wi

[GitHub] [incubator-hudi] bvaradar commented on issue #815: HUDI-186 Fix formatting for new content in Writing Data page. Update website to reflect new apache links

2019-07-31 Thread GitBox
bvaradar commented on issue #815: HUDI-186 Fix formatting for new content in Writing Data page. Update website to reflect new apache links URL: https://github.com/apache/incubator-hudi/pull/815#issuecomment-517022364 @vinothchandar @n3nash : Fixed some doc formatting in Writing Data page an

[GitHub] [incubator-hudi] bvaradar opened a new pull request #815: HUDI-186 Fix formatting for new content in Writing Data page. Update website to reflect new apache links

2019-07-31 Thread GitBox
bvaradar opened a new pull request #815: HUDI-186 Fix formatting for new content in Writing Data page. Update website to reflect new apache links URL: https://github.com/apache/incubator-hudi/pull/815 HUDI-186 Fix formatting for new content in Writing Data page. Update website to reflect n

[GitHub] [incubator-hudi] n3nash opened a new pull request #814: Fix for realtime queries

2019-07-31 Thread GitBox
n3nash opened a new pull request #814: Fix for realtime queries URL: https://github.com/apache/incubator-hudi/pull/814 - Fix realtime queries by removing COLUMN_ID and COLUMN_NAME cache in inputformat - These variables were cached to make Hive on Spark work -

[incubator-hudi] branch asf-site updated: HUDI-186 : Add missing Apache Links in hudi site

2019-07-31 Thread nagarwal
This is an automated email from the ASF dual-hosted git repository. nagarwal pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git The following commit(s) were added to refs/heads/asf-site by this push: new 018802c HUDI-186 : Add missing Ap

[GitHub] [incubator-hudi] n3nash merged pull request #813: HUDI-186 : Add missing Apache Links in hudi site

2019-07-31 Thread GitBox
n3nash merged pull request #813: HUDI-186 : Add missing Apache Links in hudi site URL: https://github.com/apache/incubator-hudi/pull/813

[GitHub] [incubator-hudi] bvaradar commented on issue #813: HUDI-186 : Add missing Apache Links in hudi site

2019-07-31 Thread GitBox
bvaradar commented on issue #813: HUDI-186 : Add missing Apache Links in hudi site URL: https://github.com/apache/incubator-hudi/pull/813#issuecomment-517005269 @vinothchandar @n3nash : Checked by running the website locally. Needed for making website checks all green.

[GitHub] [incubator-hudi] bvaradar opened a new pull request #813: HUDI-186 : Add missing Apache Links in hudi site

2019-07-31 Thread GitBox
bvaradar opened a new pull request #813: HUDI-186 : Add missing Apache Links in hudi site URL: https://github.com/apache/incubator-hudi/pull/813

[GitHub] [incubator-hudi] n3nash commented on issue #812: KryoException: Unable to find class

2019-07-31 Thread GitBox
n3nash commented on issue #812: KryoException: Unable to find class URL: https://github.com/apache/incubator-hudi/issues/812#issuecomment-516965553 This looks more like a spark issue. So whenever spark shuffles data, if you choose kryo for serialization, one has to register java objects wit

[GitHub] [incubator-hudi] NetsanetGeb edited a comment on issue #714: Performance Comparison of HoodieDeltaStreamer and DataSourceAPI

2019-07-31 Thread GitBox
NetsanetGeb edited a comment on issue #714: Performance Comparison of HoodieDeltaStreamer and DataSourceAPI URL: https://github.com/apache/incubator-hudi/issues/714#issuecomment-516753477 After i used hoodie 0.4.6 version, the performance improved and now its taking 4 minutes. ![p

[GitHub] [incubator-hudi] NetsanetGeb edited a comment on issue #714: Performance Comparison of HoodieDeltaStreamer and DataSourceAPI

2019-07-31 Thread GitBox
NetsanetGeb edited a comment on issue #714: Performance Comparison of HoodieDeltaStreamer and DataSourceAPI URL: https://github.com/apache/incubator-hudi/issues/714#issuecomment-516753477 After i used hoodie 0.4.6 version, the performance improved and now its taking 4 minutes. ![p

[GitHub] [incubator-hudi] NetsanetGeb edited a comment on issue #714: Performance Comparison of HoodieDeltaStreamer and DataSourceAPI

2019-07-31 Thread GitBox
NetsanetGeb edited a comment on issue #714: Performance Comparison of HoodieDeltaStreamer and DataSourceAPI URL: https://github.com/apache/incubator-hudi/issues/714#issuecomment-516753477 After i used hoodie 0.4.6 version, the performance improved and now its taking 4 minutes. ![p

[GitHub] [incubator-hudi] NetsanetGeb edited a comment on issue #714: Performance Comparison of HoodieDeltaStreamer and DataSourceAPI

2019-07-31 Thread GitBox
NetsanetGeb edited a comment on issue #714: Performance Comparison of HoodieDeltaStreamer and DataSourceAPI URL: https://github.com/apache/incubator-hudi/issues/714#issuecomment-516753477 After i used hoodie 0.4.6 version, the performance improved and now its taking 4 minutes. ![p

[GitHub] [incubator-hudi] NetsanetGeb edited a comment on issue #714: Performance Comparison of HoodieDeltaStreamer and DataSourceAPI

2019-07-31 Thread GitBox
NetsanetGeb edited a comment on issue #714: Performance Comparison of HoodieDeltaStreamer and DataSourceAPI URL: https://github.com/apache/incubator-hudi/issues/714#issuecomment-516753477 After i used hoodie 0.4.6 version, the performance improved and now its taking 4 minutes. ![p

[GitHub] [incubator-hudi] NetsanetGeb edited a comment on issue #714: Performance Comparison of HoodieDeltaStreamer and DataSourceAPI

2019-07-31 Thread GitBox
NetsanetGeb edited a comment on issue #714: Performance Comparison of HoodieDeltaStreamer and DataSourceAPI URL: https://github.com/apache/incubator-hudi/issues/714#issuecomment-516753477 After i used hoodie 0.4.6 version, the performance improved and now its taking 4 minutes. ![p

[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #805: fix DeltaStreamer writeConfig

2019-07-31 Thread GitBox
vinothchandar commented on a change in pull request #805: fix DeltaStreamer writeConfig URL: https://github.com/apache/incubator-hudi/pull/805#discussion_r309183898 ## File path: hoodie-utilities/src/main/java/com/uber/hoodie/utilities/deltastreamer/DeltaSync.java ## @@ -

[GitHub] [incubator-hudi] bhasudha commented on issue #789: Demo : Unexpected result in some queries

2019-07-31 Thread GitBox
bhasudha commented on issue #789: Demo : Unexpected result in some queries URL: https://github.com/apache/incubator-hudi/issues/789#issuecomment-516820770 Oh this could also be causing the join issue then?

[GitHub] [incubator-hudi] vinothchandar commented on issue #789: Demo : Unexpected result in some queries

2019-07-31 Thread GitBox
vinothchandar commented on issue #789: Demo : Unexpected result in some queries URL: https://github.com/apache/incubator-hudi/issues/789#issuecomment-516813828 With disabling of these static variables introduced, I can get the query to work now.. ``` diff --git a/hoodie-hadoop-m

[GitHub] [incubator-hudi] vinothchandar commented on issue #789: Demo : Unexpected result in some queries

2019-07-31 Thread GitBox
vinothchandar commented on issue #789: Demo : Unexpected result in some queries URL: https://github.com/apache/incubator-hudi/issues/789#issuecomment-516807719 I can reproduce this actually and what I see is that the user columns are removed off the column projection list, before being passe

[GitHub] [incubator-hudi] vinothchandar commented on issue #811: HUDI-182 : Adding HoodieCombineHiveInputFormat for COW tables

2019-07-31 Thread GitBox
vinothchandar commented on issue #811: HUDI-182 : Adding HoodieCombineHiveInputFormat for COW tables URL: https://github.com/apache/incubator-hudi/pull/811#issuecomment-516801178 @n3nash I think checkstyle is failing? what Hive version does this correspond to? can we also document that in

[GitHub] [incubator-hudi] NetsanetGeb edited a comment on issue #714: Performance Comparison of HoodieDeltaStreamer and DataSourceAPI

2019-07-31 Thread GitBox
NetsanetGeb edited a comment on issue #714: Performance Comparison of HoodieDeltaStreamer and DataSourceAPI URL: https://github.com/apache/incubator-hudi/issues/714#issuecomment-516753477 After i used hoodie 0.4.6 version, the performance improved and now its taking 4 minutes. ![p

[GitHub] [incubator-hudi] NetsanetGeb edited a comment on issue #714: Performance Comparison of HoodieDeltaStreamer and DataSourceAPI

2019-07-31 Thread GitBox
NetsanetGeb edited a comment on issue #714: Performance Comparison of HoodieDeltaStreamer and DataSourceAPI URL: https://github.com/apache/incubator-hudi/issues/714#issuecomment-516753477 After i used hoodie 0.4.6 version, the performance improved and now its taking 4 minutes. ![p

[GitHub] [incubator-hudi] NetsanetGeb edited a comment on issue #714: Performance Comparison of HoodieDeltaStreamer and DataSourceAPI

2019-07-31 Thread GitBox
NetsanetGeb edited a comment on issue #714: Performance Comparison of HoodieDeltaStreamer and DataSourceAPI URL: https://github.com/apache/incubator-hudi/issues/714#issuecomment-516753477 After i used hoodie 0.4.6 version, the performance improved and now its taking 4 minutes. ![p

[GitHub] [incubator-hudi] NetsanetGeb edited a comment on issue #714: Performance Comparison of HoodieDeltaStreamer and DataSourceAPI

2019-07-31 Thread GitBox
NetsanetGeb edited a comment on issue #714: Performance Comparison of HoodieDeltaStreamer and DataSourceAPI URL: https://github.com/apache/incubator-hudi/issues/714#issuecomment-516753477 After i used hoodie 0.4.6 version, the performance improved and now its taking 4 minutes. ![p

[GitHub] [incubator-hudi] NetsanetGeb edited a comment on issue #714: Performance Comparison of HoodieDeltaStreamer and DataSourceAPI

2019-07-31 Thread GitBox
NetsanetGeb edited a comment on issue #714: Performance Comparison of HoodieDeltaStreamer and DataSourceAPI URL: https://github.com/apache/incubator-hudi/issues/714#issuecomment-516753477 After i used hoodie 0.4.6 version, the performance improved and now its taking 4 minutes. ![p

[GitHub] [incubator-hudi] NetsanetGeb commented on issue #714: Performance Comparison of HoodieDeltaStreamer and DataSourceAPI

2019-07-31 Thread GitBox
NetsanetGeb commented on issue #714: Performance Comparison of HoodieDeltaStreamer and DataSourceAPI URL: https://github.com/apache/incubator-hudi/issues/714#issuecomment-516753477 After i used hoodie 0.4.6 version, the performance improved and now its taking 4 minutes. ![per2](https:

[GitHub] [incubator-hudi] arw357 opened a new issue #812: KryoException: Unable to find class

2019-07-31 Thread GitBox
result com.esotericsoftware.kryo.KryoException: Unable to find class: hdfs://namenode:8020/test/20190731-091411-373/1564557251826_551/converted/A/4/2c5790b6-eb12-4c15-a84a-f287d9cd9984_1_20190731091435.parquetA/4 Serialization trace: deletePathPatterns (com.uber.hoodie.table.HoodieCopyOnWrit