[GitHub] [incubator-hudi] wannaberich edited a comment on issue #869: Hudi Spark error when spark bundle jar is added to spark's classpath

2019-09-10 Thread GitBox
wannaberich edited a comment on issue #869: Hudi Spark error when spark bundle jar is added to spark's classpath URL: https://github.com/apache/incubator-hudi/issues/869#issuecomment-530228890 @cdmikechen I just tried to run some jobs on dataproc (GCP) and got the same exceptions. Then I m

[GitHub] [incubator-hudi] wannaberich commented on issue #869: Hudi Spark error when spark bundle jar is added to spark's classpath

2019-09-10 Thread GitBox
wannaberich commented on issue #869: Hudi Spark error when spark bundle jar is added to spark's classpath URL: https://github.com/apache/incubator-hudi/issues/869#issuecomment-530228890 @cdmikechen I just tried to run some jobs and got the same exceptions. Then I moved bundles from %SPARK_

[GitHub] [incubator-hudi] simonqin commented on issue #883: can we set table.consume.mode equal incremental or latest with presto session?

2019-09-10 Thread GitBox
simonqin commented on issue #883: can we set table.consume.mode equal incremental or latest with presto session? URL: https://github.com/apache/incubator-hudi/issues/883#issuecomment-530190198 > It could work since it calls the input format anyway ultimately. But don't know of anyone whos t

[GitHub] [incubator-hudi] yihua commented on issue #884: [HUDI-240] Translate Use Cases page

2019-09-10 Thread GitBox
yihua commented on issue #884: [HUDI-240] Translate Use Cases page URL: https://github.com/apache/incubator-hudi/pull/884#issuecomment-530176750 @leesf I'll take a look in a day, as I'm traveling :) This is an automated messag

[GitHub] [incubator-hudi] ggeligible commented on issue #143: Tracking ticket for folks to be added to slack group

2019-09-10 Thread GitBox
ggeligible commented on issue #143: Tracking ticket for folks to be added to slack group URL: https://github.com/apache/incubator-hudi/issues/143#issuecomment-530140064 Hello @vinothchandar , Could you please add kate...@eligible.com as well. It would be really helpful if you could

[GitHub] [incubator-hudi] ankitdimania-eligible commented on issue #143: Tracking ticket for folks to be added to slack group

2019-09-10 Thread GitBox
ankitdimania-eligible commented on issue #143: Tracking ticket for folks to be added to slack group URL: https://github.com/apache/incubator-hudi/issues/143#issuecomment-530138249 Hi @vinothchandar, can you please add ankitdima...@eligible.com to the slack group. thank you

[GitHub] [incubator-hudi] ggeligible commented on issue #143: Tracking ticket for folks to be added to slack group

2019-09-10 Thread GitBox
ggeligible commented on issue #143: Tracking ticket for folks to be added to slack group URL: https://github.com/apache/incubator-hudi/issues/143#issuecomment-530131990 Hello @vinothchandar, Could you please add g...@eligible.com to the slack group as well. Thank you.

[GitHub] [incubator-hudi] thesuperzapper closed pull request #638: [HUDI-12] Upgrade to Spark 2.4, Avro 1.8.2, Parquet 1.10.0...

2019-09-10 Thread GitBox
thesuperzapper closed pull request #638: [HUDI-12] Upgrade to Spark 2.4, Avro 1.8.2, Parquet 1.10.0... URL: https://github.com/apache/incubator-hudi/pull/638

[GitHub] [incubator-hudi] thesuperzapper commented on issue #638: [HUDI-12] Upgrade to Spark 2.4, Avro 1.8.2, Parquet 1.10.0...

2019-09-10 Thread GitBox
thesuperzapper commented on issue #638: [HUDI-12] Upgrade to Spark 2.4, Avro 1.8.2, Parquet 1.10.0... URL: https://github.com/apache/incubator-hudi/pull/638#issuecomment-530116810 @vinothchandar I don't have much time over the next few weeks, so feel free to run ahead. This PR is n

[GitHub] [incubator-hudi] vinothchandar commented on issue #638: [HUDI-12] Upgrade to Spark 2.4, Avro 1.8.2, Parquet 1.10.0...

2019-09-10 Thread GitBox
vinothchandar commented on issue #638: [HUDI-12] Upgrade to Spark 2.4, Avro 1.8.2, Parquet 1.10.0... URL: https://github.com/apache/incubator-hudi/pull/638#issuecomment-530110560 #873 is also up and close to merging. @thesuperzapper still interested in driving this?

[GitHub] [incubator-hudi] rorra commented on issue #143: Tracking ticket for folks to be added to slack group

2019-09-10 Thread GitBox
rorra commented on issue #143: Tracking ticket for folks to be added to slack group URL: https://github.com/apache/incubator-hudi/issues/143#issuecomment-530076263 Can I be added to the slack group? ro...@rorra.com.ar Thank you

[GitHub] [incubator-hudi] rod-eligible removed a comment on issue #143: Tracking ticket for folks to be added to slack group

2019-09-10 Thread GitBox
rod-eligible removed a comment on issue #143: Tracking ticket for folks to be added to slack group URL: https://github.com/apache/incubator-hudi/issues/143#issuecomment-530075729 Can I be added to the slack group: ro...@rorra.com.ar Thank you

[GitHub] [incubator-hudi] rod-eligible commented on issue #143: Tracking ticket for folks to be added to slack group

2019-09-10 Thread GitBox
rod-eligible commented on issue #143: Tracking ticket for folks to be added to slack group URL: https://github.com/apache/incubator-hudi/issues/143#issuecomment-530075729 Can I be added to the slack group: ro...@rorra.com.ar Thank you

[jira] [Created] (HUDI-242) Support Seamless bootstrap of legacy datasets to Hudi

2019-09-10 Thread BALAJI VARADARAJAN (Jira)
BALAJI VARADARAJAN created HUDI-242: --- Summary: Support Seamless bootstrap of legacy datasets to Hudi Key: HUDI-242 URL: https://issues.apache.org/jira/browse/HUDI-242 Project: Apache Hudi (incubating

[incubator-hudi] branch pom-bundle-cleanup updated (83b7bc6 -> 5fad0ae)

2019-09-10 Thread vinoth
This is an automated email from the ASF dual-hosted git repository. vinoth pushed a change to branch pom-bundle-cleanup in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git. discard 83b7bc6 [HUDI-143] Excluding javax.* from utilities and spark bundles omit ffe67ac [HUDI-15

[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #873: [HUDI-159] Redesigning bundles for lighter-weight integrations

2019-09-10 Thread GitBox
vinothchandar commented on a change in pull request #873: [HUDI-159] Redesigning bundles for lighter-weight integrations URL: https://github.com/apache/incubator-hudi/pull/873#discussion_r322902762 ## File path: hudi-hive/src/main/java/org/apache/hudi/hive/HoodieHiveClient.java ###

[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #873: [HUDI-159] Redesigning bundles for lighter-weight integrations

2019-09-10 Thread GitBox
vinothchandar commented on a change in pull request #873: [HUDI-159] Redesigning bundles for lighter-weight integrations URL: https://github.com/apache/incubator-hudi/pull/873#discussion_r322900645 ## File path: hudi-common/src/main/java/org/apache/hudi/common/util/FileIOUtils.java

[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #873: [HUDI-159] Redesigning bundles for lighter-weight integrations

2019-09-10 Thread GitBox
vinothchandar commented on a change in pull request #873: [HUDI-159] Redesigning bundles for lighter-weight integrations URL: https://github.com/apache/incubator-hudi/pull/873#discussion_r322900301 ## File path: hudi-cli/src/main/java/org/apache/hudi/cli/utils/HiveUtil.java ###

[GitHub] [incubator-hudi] tooptoop4 edited a comment on issue #845: how to store null value in columns?

2019-09-10 Thread GitBox
tooptoop4 edited a comment on issue #845: how to store null value in columns? URL: https://github.com/apache/incubator-hudi/issues/845#issuecomment-530057096 @Gowthamsb12 I want to be able to have nulls in rows to signify no data, as this has a different meaning from rows with 0, i.e. source file

[GitHub] [incubator-hudi] tooptoop4 commented on issue #845: how to store null value in columns?

2019-09-10 Thread GitBox
tooptoop4 commented on issue #845: how to store null value in columns? URL: https://github.com/apache/incubator-hudi/issues/845#issuecomment-530057096 @Gowthamsb12 I want to be able to have nulls in rows to signify no data, as this has a different meaning from rows with 0, i.e. source file may hav

[GitHub] [incubator-hudi] bhasudha commented on issue #638: [HUDI-12] Upgrade to Spark 2.4, Avro 1.8.2, Parquet 1.10.0...

2019-09-10 Thread GitBox
bhasudha commented on issue #638: [HUDI-12] Upgrade to Spark 2.4, Avro 1.8.2, Parquet 1.10.0... URL: https://github.com/apache/incubator-hudi/pull/638#issuecomment-530047896 +1 testing across spark versions would be useful On Thu, Aug 29, 2019 at 9:52 AM vinoth chandar wrote:

[GitHub] [incubator-hudi] vinothchandar commented on issue #879: Hive Sync Error when creating a table with partition

2019-09-10 Thread GitBox
vinothchandar commented on issue #879: Hive Sync Error when creating a table with partition URL: https://github.com/apache/incubator-hudi/issues/879#issuecomment-530027773 I am a bit confused. Can you paste/update the full stack trace without replacing text? I don't think the sql is adding p

[jira] [Created] (HUDI-241) Track per column level statistics for each file

2019-09-10 Thread BALAJI VARADARAJAN (Jira)
BALAJI VARADARAJAN created HUDI-241: --- Summary: Track per column level statistics for each file Key: HUDI-241 URL: https://issues.apache.org/jira/browse/HUDI-241 Project: Apache Hudi (incubating)

[jira] [Updated] (HUDI-62) Add metrics around IOHandle times #297

2019-09-10 Thread BALAJI VARADARAJAN (Jira)
[ https://issues.apache.org/jira/browse/HUDI-62?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BALAJI VARADARAJAN updated HUDI-62: --- Labels: pull-request-available realtime-data-lakes (was: pull-request-available) > Add metrics

[jira] [Updated] (HUDI-86) Add indexing support to the log file format

2019-09-10 Thread BALAJI VARADARAJAN (Jira)
[ https://issues.apache.org/jira/browse/HUDI-86?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BALAJI VARADARAJAN updated HUDI-86: --- Labels: realtime-data-lakes (was: ) > Add indexing support to the log file format > ---

[jira] [Updated] (HUDI-106) Dynamically tune bloom filter entries

2019-09-10 Thread BALAJI VARADARAJAN (Jira)
[ https://issues.apache.org/jira/browse/HUDI-106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BALAJI VARADARAJAN updated HUDI-106: Labels: realtime-data-lakes (was: ) > Dynamically tune bloom filter entries > --

[jira] [Updated] (HUDI-90) Explore ways of indexing record keys in addition to bloom filters

2019-09-10 Thread BALAJI VARADARAJAN (Jira)
[ https://issues.apache.org/jira/browse/HUDI-90?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BALAJI VARADARAJAN updated HUDI-90: --- Labels: realtime-data-lakes (was: ) > Explore ways of indexing record keys in addition to bloom

[jira] [Updated] (HUDI-56) Dynamically configure the number of entries in BloomFilter index based on size of the record #70

2019-09-10 Thread BALAJI VARADARAJAN (Jira)
[ https://issues.apache.org/jira/browse/HUDI-56?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BALAJI VARADARAJAN updated HUDI-56: --- Labels: realtime-data-lakes (was: ) > Dynamically configure the number of entries in BloomFilte

[jira] [Updated] (HUDI-84) Benchmark write/read paths on Hudi vs non-Hudi datasets

2019-09-10 Thread BALAJI VARADARAJAN (Jira)
[ https://issues.apache.org/jira/browse/HUDI-84?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BALAJI VARADARAJAN updated HUDI-84: --- Labels: realtime-data-lakes (was: ) > Benchmark write/read paths on Hudi vs non-Hudi datasets >

[jira] [Commented] (HUDI-233) Redo log statements using SLF4J

2019-09-10 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16926772#comment-16926772 ] Vinoth Chandar commented on HUDI-233: - Renamed..!  > Redo log statements using SLF4J

[jira] [Updated] (HUDI-233) Redo log statements using SLF4J

2019-09-10 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-233: Summary: Redo log statements using SLF4J (was: Redo log statements using {} variable substitution)

[jira] [Comment Edited] (HUDI-238) Make separate release for hudi spark/scala based packages for scala 2.12

2019-09-10 Thread Davis Eric Broda (Jira)
[ https://issues.apache.org/jira/browse/HUDI-238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16926712#comment-16926712 ] Davis Eric Broda edited comment on HUDI-238 at 9/10/19 2:58 PM: -

[jira] [Commented] (HUDI-238) Make separate release for hudi spark/scala based packages for scala 2.12

2019-09-10 Thread Davis Eric Broda (Jira)
[ https://issues.apache.org/jira/browse/HUDI-238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16926712#comment-16926712 ] Davis Eric Broda commented on HUDI-238: --- My first instinct was to swap out the kafka

[GitHub] [incubator-hudi] leesf commented on issue #884: [HUDI-240] Translate Use Cases page

2019-09-10 Thread GitBox
leesf commented on issue #884: [HUDI-240] Translate Use Cases page URL: https://github.com/apache/incubator-hudi/pull/884#issuecomment-529968367 @yihua PTAL when you have time. Appreciate!

[GitHub] [incubator-hudi] leesf commented on a change in pull request #884: [HUDI-240] Translate Use Cases page

2019-09-10 Thread GitBox
leesf commented on a change in pull request #884: [HUDI-240] Translate Use Cases page URL: https://github.com/apache/incubator-hudi/pull/884#discussion_r322783732 ## File path: docs/use_cases.cn.md ## @@ -4,73 +4,65 @@ keywords: hudi, data ingestion, etl, real time, use ca

[jira] [Updated] (HUDI-240) Translate Use Cases page

2019-09-10 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-240: Labels: pull-request-available (was: ) > Translate Use Cases page > > >

[GitHub] [incubator-hudi] leesf opened a new pull request #884: [HUDI-240] Translate Use Cases page

2019-09-10 Thread GitBox
leesf opened a new pull request #884: [HUDI-240] Translate Use Cases page URL: https://github.com/apache/incubator-hudi/pull/884 see jira [HUDI-240](https://jira.apache.org/jira/browse/HUDI-240)

[jira] [Commented] (HUDI-238) Make separate release for hudi spark/scala based packages for scala 2.12

2019-09-10 Thread Davis Eric Broda (Jira)
[ https://issues.apache.org/jira/browse/HUDI-238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16926667#comment-16926667 ] Davis Eric Broda commented on HUDI-238: --- Swapping out the kafka doesn't actually look

[GitHub] [incubator-hudi] cdmikechen edited a comment on issue #869: Hudi Spark error when spark bundle jar is added to spark's classpath

2019-09-10 Thread GitBox
cdmikechen edited a comment on issue #869: Hudi Spark error when spark bundle jar is added to spark's classpath URL: https://github.com/apache/incubator-hudi/issues/869#issuecomment-529839338 It always happens when sparksession is started in a JVM environment and tasks are submitted to spa

[jira] [Commented] (HUDI-238) Make separate release for hudi spark/scala based packages for scala 2.12

2019-09-10 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16926652#comment-16926652 ] Vinoth Chandar commented on HUDI-238: - Okay.. 0.8 is very old anyway. We can try upgrad

[GitHub] [incubator-hudi] vinothchandar commented on issue #881: Cross compilation for scala 2.12.x ?

2019-09-10 Thread GitBox
vinothchandar commented on issue #881: Cross compilation for scala 2.12.x ? URL: https://github.com/apache/incubator-hudi/issues/881#issuecomment-529937422 Yes, let's continue on the Jira!

[GitHub] [incubator-hudi] vinothchandar closed issue #881: Cross compilation for scala 2.12.x ?

2019-09-10 Thread GitBox
vinothchandar closed issue #881: Cross compilation for scala 2.12.x ? URL: https://github.com/apache/incubator-hudi/issues/881

[GitHub] [incubator-hudi] vinothchandar commented on issue #883: can we set table.consume.mode equal incremental or latest with presto session?

2019-09-10 Thread GitBox
vinothchandar commented on issue #883: can we set table.consume.mode equal incremental or latest with presto session? URL: https://github.com/apache/incubator-hudi/issues/883#issuecomment-529936416 It could work since it calls the input format anyway ultimately. But don't know of anyone who

[GitHub] [incubator-hudi] Gowthamsb12 commented on issue #845: how to store null value in columns?

2019-09-10 Thread GitBox
Gowthamsb12 commented on issue #845: how to store null value in columns? URL: https://github.com/apache/incubator-hudi/issues/845#issuecomment-529880460 I faced the same issue due to "double and null" values in my Hoodie partition column. Fix: For example : 1.joinedDF.col("STAR

[GitHub] [incubator-hudi] cdmikechen commented on issue #869: Hudi Spark error when spark bundle jar is added to spark's classpath

2019-09-10 Thread GitBox
cdmikechen commented on issue #869: Hudi Spark error when spark bundle jar is added to spark's classpath URL: https://github.com/apache/incubator-hudi/issues/869#issuecomment-529851992 look some comment in `org.apache.spark.serializer.JavaSerializer` or `org.apache.spark.serializer.Seriali

[GitHub] [incubator-hudi] cdmikechen edited a comment on issue #869: Hudi Spark error when spark bundle jar is added to spark's classpath

2019-09-10 Thread GitBox
cdmikechen edited a comment on issue #869: Hudi Spark error when spark bundle jar is added to spark's classpath URL: https://github.com/apache/incubator-hudi/issues/869#issuecomment-529851992 looking some comment in `org.apache.spark.serializer.JavaSerializer` or `org.apache.spark.serializ

[GitHub] [incubator-hudi] yanghua commented on issue #871: [HUDI-217] Provide a unified resource management class to standardize the resource allocation and release for hudi client test cases

2019-09-10 Thread GitBox
yanghua commented on issue #871: [HUDI-217] Provide a unified resource management class to standardize the resource allocation and release for hudi client test cases URL: https://github.com/apache/incubator-hudi/pull/871#issuecomment-529846305 cc @vinothchandar

[GitHub] [incubator-hudi] cdmikechen edited a comment on issue #869: Hudi Spark error when spark bundle jar is added to spark's classpath

2019-09-10 Thread GitBox
cdmikechen edited a comment on issue #869: Hudi Spark error when spark bundle jar is added to spark's classpath URL: https://github.com/apache/incubator-hudi/issues/869#issuecomment-529839338 It always happens when sparksession is started in a JVM environment and tasks are submitted to spa

[GitHub] [incubator-hudi] cdmikechen commented on issue #869: Hudi Spark error when spark bundle jar is added to spark's classpath

2019-09-10 Thread GitBox
cdmikechen commented on issue #869: Hudi Spark error when spark bundle jar is added to spark's classpath URL: https://github.com/apache/incubator-hudi/issues/869#issuecomment-529839338 It always happens when sparksession is started in a JVM environment and tasks are submitted to spark. I h

[GitHub] [incubator-hudi] yanghua commented on issue #871: [HUDI-217] Provide a unified resource management class to standardize the resource allocation and release for hudi client test cases

2019-09-10 Thread GitBox
yanghua commented on issue #871: [HUDI-217] Provide a unified resource management class to standardize the resource allocation and release for hudi client test cases URL: https://github.com/apache/incubator-hudi/pull/871#issuecomment-529810656 @vinothchandar I also find the subclasses of `

[GitHub] [incubator-hudi] yanghua commented on a change in pull request #871: [HUDI-217] Provide a unified resource management class to standardize the resource allocation and release for hudi client

2019-09-10 Thread GitBox
yanghua commented on a change in pull request #871: [HUDI-217] Provide a unified resource management class to standardize the resource allocation and release for hudi client test cases URL: https://github.com/apache/incubator-hudi/pull/871#discussion_r322589108 ## File path: hudi-