[GitHub] [incubator-hudi] lamber-ken commented on pull request #1402: [HUDI-407] Adding Simple Index

2020-05-02 Thread GitBox
lamber-ken commented on pull request #1402: URL: https://github.com/apache/incubator-hudi/pull/1402#issuecomment-623052515 hi @lamber-ken, go ahead [HUDI-622](https://github.com/apache/incubator-hudi/pull/1343) This is an

Build failed in Jenkins: hudi-snapshot-deployment-0.5 #266

2020-05-02 Thread Apache Jenkins Server
See Changes: -- [...truncated 2.38 KB...] /home/jenkins/tools/maven/apache-maven-3.5.4/conf: logging settings.xml toolchains.xml

[GitHub] [incubator-hudi] nsivabalan edited a comment on pull request #1402: [HUDI-407] Adding Simple Index

2020-05-02 Thread GitBox
nsivabalan edited a comment on pull request #1402: URL: https://github.com/apache/incubator-hudi/pull/1402#issuecomment-623006320 @lamber-ken : looks like visibleForTesting is part of illegal import. So, do you know of a way to test private methods?
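
On the question of testing private methods without `@VisibleForTesting`: a minimal sketch, assuming plain Java reflection is acceptable in test code. `KeyBucketer` and `bucketFor` are hypothetical names used only for illustration; they are not part of Hudi or of PR #1402, and the thread does not say which approach was adopted.

```java
import java.lang.reflect.Method;

public class PrivateMethodReflectionExample {

    // Hypothetical class under test; stands in for any class with a private helper.
    static class KeyBucketer {
        private int bucketFor(String recordKey, int numBuckets) {
            // floorMod keeps the result in [0, numBuckets) even for negative hash codes.
            return Math.floorMod(recordKey.hashCode(), numBuckets);
        }
    }

    // Looks up a private instance method by name and parameter types, then invokes it.
    static Object invokePrivate(Object target, String name, Class<?>[] paramTypes, Object... args)
            throws Exception {
        Method method = target.getClass().getDeclaredMethod(name, paramTypes);
        method.setAccessible(true); // bypass the private access check for this Method object
        return method.invoke(target, args);
    }

    public static void main(String[] args) throws Exception {
        KeyBucketer bucketer = new KeyBucketer();
        Integer bucket = (Integer) invokePrivate(
            bucketer, "bucketFor", new Class<?>[] {String.class, int.class}, "key-1", 4);
        System.out.println("bucket = " + bucket);
    }
}
```

A common alternative is to make the method package-private and keep the test in the same package, which avoids reflection entirely.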

[GitHub] [incubator-hudi] nandini57 commented on issue #1582: [SUPPORT] PreCombineAndUpdate in Payload

2020-05-02 Thread GitBox
nandini57 commented on issue #1582: URL: https://github.com/apache/incubator-hudi/issues/1582#issuecomment-623024459 I did think about this, but our schemas are heavily nested and contain more than 5000 cols even for a very decent one. Need to think more around it. If I rethink the

[GitHub] [incubator-hudi] bvaradar commented on issue #1582: [SUPPORT] PreCombineAndUpdate in Payload

2020-05-02 Thread GitBox
bvaradar commented on issue #1582: URL: https://github.com/apache/incubator-hudi/issues/1582#issuecomment-623022854 Thanks for the details. One of the primary contracts within Hudi is the uniqueness of record key within partition/dataset. Instead, can you materialize the grouping within

[GitHub] [incubator-hudi] WTa-hash edited a comment on issue #1581: [SUPPORT] Hive Metastore not in sync with Hudi Dataset using DataSource API

2020-05-02 Thread GitBox
WTa-hash edited a comment on issue #1581: URL: https://github.com/apache/incubator-hudi/issues/1581#issuecomment-623013007 [log_INITIAL.log](https://github.com/apache/incubator-hudi/files/4568939/log_INITIAL.log)

[GitHub] [incubator-hudi] WTa-hash removed a comment on issue #1581: [SUPPORT] Hive Metastore not in sync with Hudi Dataset using DataSource API

2020-05-02 Thread GitBox
WTa-hash removed a comment on issue #1581: URL: https://github.com/apache/incubator-hudi/issues/1581#issuecomment-622486217 This issue is also present in Hudi-0.6.0-SNAPSHOT (as of 2020-MAY-01). Hudi DeltaStreamer (run-once mode) seems to update the Hudi metastore with no issues

[GitHub] [incubator-hudi] WTa-hash commented on issue #1581: [SUPPORT] Hive Metastore not in sync with Hudi Dataset using DataSource API

2020-05-02 Thread GitBox
WTa-hash commented on issue #1581: URL: https://github.com/apache/incubator-hudi/issues/1581#issuecomment-623013007 [log_INITIAL.log](https://github.com/apache/incubator-hudi/files/4568939/log_INITIAL.log)

[GitHub] [incubator-hudi] nsivabalan commented on pull request #1402: [HUDI-407] Adding Simple Index

2020-05-02 Thread GitBox
nsivabalan commented on pull request #1402: URL: https://github.com/apache/incubator-hudi/pull/1402#issuecomment-623006320 @lamber-ken : looks like visibleForTesting is part of illegal import. So, do you know of a way to test private methods?

[GitHub] [incubator-hudi] bhasudha commented on a change in pull request #1578: Add Hudi changes for Presto MOR query support

2020-05-02 Thread GitBox
bhasudha commented on a change in pull request #1578: URL: https://github.com/apache/incubator-hudi/pull/1578#discussion_r418977289 ## File path: packaging/hudi-presto-bundle/pom.xml ## @@ -128,5 +133,18 @@ hudi-hadoop-mr-bundle ${project.version} + + +

[GitHub] [incubator-hudi] jvaesteves opened a new issue #1585: [SUPPORT] Delete Hudi commit history

2020-05-02 Thread GitBox
jvaesteves opened a new issue #1585: URL: https://github.com/apache/incubator-hudi/issues/1585 Hello everyone, I am currently testing Hudi as a deduplication mechanism for a streaming project, and it is working pretty well. But as I do not have any update to any row, keeping previous

[GitHub] [incubator-hudi] bhasudha commented on pull request #1578: Add Hudi changes for Presto MOR query support

2020-05-02 Thread GitBox
bhasudha commented on pull request #1578: URL: https://github.com/apache/incubator-hudi/pull/1578#issuecomment-622972166 > @bschell Thanks for your contribution. It would be better to file a jira issue to track this work. More details, please refer to Hudi's contribution guide:

[GitHub] [incubator-hudi] lamber-ken commented on pull request #1526: [HUDI-783] Add pyspark example in quickstart

2020-05-02 Thread GitBox
lamber-ken commented on pull request #1526: URL: https://github.com/apache/incubator-hudi/pull/1526#issuecomment-622971831 Thanks @EdwinGuo @vingov, LGTM, left two minor comments.

[jira] [Updated] (HUDI-783) Add official python support to create hudi datasets using pyspark

2020-05-02 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-783: Labels: features pull-request-available (was: features) > Add official python support to create

[GitHub] [incubator-hudi] lamber-ken commented on a change in pull request #1526: [HUDI-783] Add pyspark example in quickstart

2020-05-02 Thread GitBox
lamber-ken commented on a change in pull request #1526: URL: https://github.com/apache/incubator-hudi/pull/1526#discussion_r418972065 ## File path: docs/_docs/1_1_quick_start_guide.md ## @@ -204,6 +213,224 @@ spark.sql("select uuid, partitionPath from

[GitHub] [incubator-hudi] nandini57 commented on issue #1582: [SUPPORT] PreCombineAndUpdate in Payload

2020-05-02 Thread GitBox
nandini57 commented on issue #1582: URL: https://github.com/apache/incubator-hudi/issues/1582#issuecomment-622970820 My apologies. Let me try to explain. If I don't upsert the data with each batch where applicable, when I query back the table, it will have duplicates as batch "n" need to

[GitHub] [incubator-hudi] lamber-ken commented on pull request #1402: [HUDI-407] Adding Simple Index

2020-05-02 Thread GitBox
lamber-ken commented on pull request #1402: URL: https://github.com/apache/incubator-hudi/pull/1402#issuecomment-622962674 hi @nsivabalan, CI reports some checkstyle errors; you can use the following command in your local env: `mvn checkstyle:check`

[GitHub] [incubator-hudi] lamber-ken commented on a change in pull request #1469: [HUDI-686] Implement BloomIndexV2 that does not depend on memory caching

2020-05-02 Thread GitBox
lamber-ken commented on a change in pull request #1469: URL: https://github.com/apache/incubator-hudi/pull/1469#discussion_r418965126 ## File path: hudi-client/src/test/java/org/apache/hudi/index/bloom/TestHoodieBloomIndexV2.java ## @@ -0,0 +1,235 @@ +/* + * Licensed to the

[GitHub] [incubator-hudi] lamber-ken commented on a change in pull request #1469: [HUDI-686] Implement BloomIndexV2 that does not depend on memory caching

2020-05-02 Thread GitBox
lamber-ken commented on a change in pull request #1469: URL: https://github.com/apache/incubator-hudi/pull/1469#discussion_r418965087 ## File path: hudi-common/src/main/java/org/apache/hudi/common/util/HoodieTimer.java ## @@ -69,4 +76,13 @@ public long endTimer() { }

[GitHub] [incubator-hudi] bhasudha commented on pull request #1580: adding check for table name for Append Save mode HUDI-852

2020-05-02 Thread GitBox
bhasudha commented on pull request #1580: URL: https://github.com/apache/incubator-hudi/pull/1580#issuecomment-622961553 @AakashPradeep Also can you change the PR title to start with [HUDI-852]?

[GitHub] [incubator-hudi] bhasudha commented on a change in pull request #1580: adding check for table name for Append Save mode HUDI-852

2020-05-02 Thread GitBox
bhasudha commented on a change in pull request #1580: URL: https://github.com/apache/incubator-hudi/pull/1580#discussion_r418964519 ## File path: hudi-spark/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala ## @@ -118,6 +118,12 @@ private[hudi] object

[GitHub] [incubator-hudi] xushiyan edited a comment on pull request #1584: fix schema provider issue

2020-05-02 Thread GitBox
xushiyan edited a comment on pull request #1584: URL: https://github.com/apache/incubator-hudi/pull/1584#issuecomment-622926124 @vinothchandar please kindly verify the described scenario above. If this makes sense, I may add a test case for it. In case I misunderstood the logic or use

[GitHub] [incubator-hudi] xushiyan commented on pull request #1584: fix schema provider issue

2020-05-02 Thread GitBox
xushiyan commented on pull request #1584: URL: https://github.com/apache/incubator-hudi/pull/1584#issuecomment-622926124 @vinothchandar please kindly verify the described scenario above. In case I misunderstood some logic or use case, I'll close the PR.

[GitHub] [incubator-hudi] xushiyan opened a new pull request #1584: fix schema provider issue

2020-05-02 Thread GitBox
xushiyan opened a new pull request #1584: URL: https://github.com/apache/incubator-hudi/pull/1584 When no new data is fetched after reading from row source (like parquet), schema provider cannot be inferred and calling `getSchemaProvider()` results in HoodieException and requires schema

[GitHub] [incubator-hudi] bvaradar commented on issue #1555: [SUPPORT] Meet java.lang.IllegalAccessError: class org.apache.hadoop.hdfs.web.HftpFileSystem

2020-05-02 Thread GitBox
bvaradar commented on issue #1555: URL: https://github.com/apache/incubator-hudi/issues/1555#issuecomment-622896431 @allenzhg : Please reopen if you are still having the issue.

[GitHub] [incubator-hudi] bvaradar commented on issue #1583: Small File Issue

2020-05-02 Thread GitBox
bvaradar commented on issue #1583: URL: https://github.com/apache/incubator-hudi/issues/1583#issuecomment-622894674 @selvarajperiyasamy : It looks like your per-record size is really small. Hudi uses the previous commit's statistics to guess future record sizes. For the very first commit, it
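
To make the "previous commit's statistics" idea concrete, here is a rough arithmetic sketch. It is not Hudi's actual code: the method names and the sample numbers are assumptions chosen for illustration, and it covers only the case where a previous commit exists.

```java
public class RecordSizeEstimateSketch {

    // Average bytes per record, derived from totals of a previous commit (hypothetical inputs).
    static long averageRecordSize(long totalBytesWritten, long totalRecordsWritten) {
        if (totalRecordsWritten <= 0) {
            throw new IllegalArgumentException("previous commit must contain at least one record");
        }
        // Clamp to 1 so a tiny average never leads to division by zero downstream.
        return Math.max(1L, totalBytesWritten / totalRecordsWritten);
    }

    // How many records of the estimated size fit into one file of the target size.
    static long recordsPerFile(long targetFileSizeBytes, long avgRecordSizeBytes) {
        return targetFileSizeBytes / avgRecordSizeBytes;
    }

    public static void main(String[] args) {
        // Illustrative numbers only: 50 MiB written for 1,000,000 records, 120 MiB target file.
        long avg = averageRecordSize(50L * 1024 * 1024, 1_000_000L);
        long perFile = recordsPerFile(120L * 1024 * 1024, avg);
        System.out.println("avg record size = " + avg + " bytes, records per file = " + perFile);
    }
}
```

The smaller the average record, the more records are packed into each file, so an estimate that is far from the real record size produces files well above or below the target size.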

[GitHub] [incubator-hudi] bvaradar commented on issue #1582: [SUPPORT] PreCombineAndUpdate in Payload

2020-05-02 Thread GitBox
bvaradar commented on issue #1582: URL: https://github.com/apache/incubator-hudi/issues/1582#issuecomment-622848990 @nandini57 : The flag is for internal hudi logic to preserve old record when hudi is not able to create a valid updated record to write. I am not sure I am following

[GitHub] [incubator-hudi] bvaradar commented on issue #1581: [SUPPORT] Hive Metastore not in sync with Hudi Dataset using DataSource API

2020-05-02 Thread GitBox
bvaradar commented on issue #1581: URL: https://github.com/apache/incubator-hudi/issues/1581#issuecomment-622818282 @WTa-hash : Both deltastreamer and Spark DS are essentially using the same code to sync to hive. Can you enable INFO logging for spark data-source write (both first and last) and

[GitHub] [incubator-hudi] bvaradar commented on issue #1579: [SUPPORT] https://github.com/YotpoLtd/metorikku/issues/290

2020-05-02 Thread GitBox
bvaradar commented on issue #1579: URL: https://github.com/apache/incubator-hudi/issues/1579#issuecomment-622764285 @tooptoop4 : Did you try the glob pattern as described in https://hudi.apache.org/docs/quick-start-guide.html#query-data For 0.4.6, it should be something like :

[GitHub] [incubator-hudi] lamber-ken commented on a change in pull request #1469: [HUDI-686] Implement BloomIndexV2 that does not depend on memory caching

2020-05-02 Thread GitBox
lamber-ken commented on a change in pull request #1469: URL: https://github.com/apache/incubator-hudi/pull/1469#discussion_r418922163 ## File path: hudi-client/src/main/java/org/apache/hudi/index/bloom/HoodieBloomIndexV2.java ## @@ -0,0 +1,321 @@ +/* + * Licensed to the

[GitHub] [incubator-hudi] bvaradar commented on issue #1571: [SUPPORT] Hudi IllegalArgumentException Wrong FS

2020-05-02 Thread GitBox
bvaradar commented on issue #1571: URL: https://github.com/apache/incubator-hudi/issues/1571#issuecomment-622723355 @dh376 : I couldn't tell from the description how the path changed. Can you turn on debug log level in spark shell and attach the entire output? BTW, it looks like you

[GitHub] [incubator-hudi] bvaradar commented on issue #1564: update hudi meta in hive with no partition

2020-05-02 Thread GitBox
bvaradar commented on issue #1564: URL: https://github.com/apache/incubator-hudi/issues/1564#issuecomment-622681815 Closing this ticket. Please reopen if you are still having issues.