[GitHub] [incubator-hudi] pratyakshsharma edited a comment on pull request #1566: [HUDI-603]: DeltaStreamer can now fetch schema before every run in continuous mode

2020-04-28 Thread GitBox
pratyakshsharma edited a comment on pull request #1566: URL: https://github.com/apache/incubator-hudi/pull/1566#issuecomment-620661941 > I think it is not enough. Example: when consuming from kafka, schema might change midway. Example: we are reading 1 messages, schema will be fetched

[GitHub] [incubator-hudi] pratyakshsharma commented on pull request #1566: [HUDI-603]: DeltaStreamer can now fetch schema before every run in continuous mode

2020-04-28 Thread GitBox
pratyakshsharma commented on pull request #1566: URL: https://github.com/apache/incubator-hudi/pull/1566#issuecomment-620679844 > if we redid the schema provider implementations such that the schema is read each time from SR (schema registry) Do you have some plan around how to

[GitHub] [incubator-hudi] pratyakshsharma commented on a change in pull request #1566: [HUDI-603]: DeltaStreamer can now fetch schema before every run in continuous mode

2020-04-28 Thread GitBox
pratyakshsharma commented on a change in pull request #1566: URL: https://github.com/apache/incubator-hudi/pull/1566#discussion_r416694602 ## File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/DeltaSync.java ## @@ -162,18 +162,23 @@ public

[GitHub] [incubator-hudi] vinothchandar commented on pull request #1469: [HUDI-686] Implement BloomIndexV2 that does not depend on memory caching

2020-04-28 Thread GitBox
vinothchandar commented on pull request #1469: URL: https://github.com/apache/incubator-hudi/pull/1469#issuecomment-620688721 @lamber-ken the fetchRecordLocation() API and global indexing is not implemented..Do you plan to work on them as well?

[GitHub] [incubator-hudi] pratyakshsharma commented on a change in pull request #1565: [HUDI-73]: implemented vanilla AvroKafkaSource

2020-04-28 Thread GitBox
pratyakshsharma commented on a change in pull request #1565: URL: https://github.com/apache/incubator-hudi/pull/1565#discussion_r416667228 ## File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/serde/AbstractHoodieKafkaAvroDeserializer.java ## @@ -0,0 +1,97 @@

[GitHub] [incubator-hudi] yanghua commented on pull request #1100: [HUDI-289] Implement a test suite to support long running test for Hudi writing and querying end-end

2020-04-28 Thread GitBox
yanghua commented on pull request #1100: URL: https://github.com/apache/incubator-hudi/pull/1100#issuecomment-620691500 > @yanghua can you go through this PR and approve it ? Thanks for the great work. Will review it tomorrow.

[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1402: [HUDI-407] Adding Simple Index

2020-04-28 Thread GitBox
vinothchandar commented on a change in pull request #1402: URL: https://github.com/apache/incubator-hudi/pull/1402#discussion_r416354734 ## File path: hudi-client/src/main/java/org/apache/hudi/index/HoodieIndex.java ## @@ -128,9 +133,26 @@ protected

[jira] [Closed] (HUDI-810) Migrate HoodieClientTestHarness to JUnit 5

2020-04-28 Thread vinoyang (Jira)
[ https://issues.apache.org/jira/browse/HUDI-810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] vinoyang closed HUDI-810. Resolution: Done. Done via master branch: 06dae30297ea02ab122c9029a54f7927e8212039 > Migrate

[GitHub] [incubator-hudi] yanghua commented on a change in pull request #1522: [HUDI-702]Add test for HoodieLogFileCommand

2020-04-28 Thread GitBox
yanghua commented on a change in pull request #1522: URL: https://github.com/apache/incubator-hudi/pull/1522#discussion_r416723032 ## File path: hudi-cli/src/test/java/org/apache/hudi/cli/commands/TestHoodieLogFileCommand.java ## @@ -0,0 +1,220 @@ +/* + * Licensed to the

[GitHub] [incubator-hudi] pratyakshsharma commented on pull request #1566: [HUDI-603]: DeltaStreamer can now fetch schema before every run in continuous mode

2020-04-28 Thread GitBox
pratyakshsharma commented on pull request #1566: URL: https://github.com/apache/incubator-hudi/pull/1566#issuecomment-620661941 > I think it is not enough. Example: when consuming from kafka, schema might change midway. Example: we are reading 1 messages, schema will be fetched on

[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1151: [WIP][HUDI-476] Add hudi-examples module

2020-04-28 Thread GitBox
vinothchandar commented on a change in pull request #1151: URL: https://github.com/apache/incubator-hudi/pull/1151#discussion_r416731242 ## File path: hudi-examples/src/main/scripts/delta-streamer-cluster ## @@ -0,0 +1,33 @@ +#!/usr/bin/env bash + +# Licensed to the Apache

[GitHub] [incubator-hudi] vinothchandar commented on issue #1569: [SUPPORT] Audit Feature In A PartitionPath

2020-04-28 Thread GitBox
vinothchandar commented on issue #1569: URL: https://github.com/apache/incubator-hudi/issues/1569#issuecomment-620776648 @nandini57 It just has to be ordered; increasing or decreasing does not matter, and it can be non-contiguous.
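Assuming the ordering requirement above refers to pruning candidate files by the min/max record keys each data file records (the idea behind Hudi's bloom index range pruning), a minimal Java sketch of why ordered keys help. `KeyRangePruning`, `FileKeyRange` and `candidateFiles` are hypothetical names for illustration, not Hudi classes.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of min/max record-key range pruning; not Hudi's actual classes. */
public class KeyRangePruning {

    /** Min and max record key kept for one data file (hypothetical type). */
    static final class FileKeyRange {
        final String fileId;
        final String minKey;
        final String maxKey;

        FileKeyRange(String fileId, String minKey, String maxKey) {
            this.fileId = fileId;
            this.minKey = minKey;
            this.maxKey = maxKey;
        }

        /** A key outside [minKey, maxKey] (lexicographic order) cannot be in this file. */
        boolean mayContain(String key) {
            return key.compareTo(minKey) >= 0 && key.compareTo(maxKey) <= 0;
        }
    }

    /** Files whose key range could hold the key; only these still need a bloom filter check. */
    static List<String> candidateFiles(List<FileKeyRange> files, String key) {
        List<String> candidates = new ArrayList<>();
        for (FileKeyRange f : files) {
            if (f.mayContain(key)) {
                candidates.add(f.fileId);
            }
        }
        return candidates;
    }

    public static void main(String[] args) {
        // Ordered keys: each file covers a narrow, disjoint range, so only one file survives.
        List<FileKeyRange> ordered = List.of(
            new FileKeyRange("f1", "k0001", "k0100"),
            new FileKeyRange("f2", "k0101", "k0200"),
            new FileKeyRange("f3", "k0201", "k0300"));
        System.out.println(candidateFiles(ordered, "k0150")); // [f2]

        // Hash-like keys: every file spans almost the whole key space, so nothing is pruned.
        List<FileKeyRange> hashed = List.of(
            new FileKeyRange("f1", "0a1f", "fe93"),
            new FileKeyRange("f2", "02c4", "ff10"),
            new FileKeyRange("f3", "0011", "fd7e"));
        System.out.println(candidateFiles(hashed, "7b2e")); // [f1, f2, f3]
    }
}
```

Reversing the key order would leave the file ranges just as disjoint, and gaps between ranges cost nothing, which is consistent with "increasing or decreasing does not matter" and "can be non-contiguous".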

[GitHub] [incubator-hudi] vinothchandar commented on issue #1552: Time taken for upserting hudi table is increasing with increase in number of partitions

2020-04-28 Thread GitBox
vinothchandar commented on issue #1552: URL: https://github.com/apache/incubator-hudi/issues/1552#issuecomment-620781382 @harshi2506 On master, there is no `CopyOnWriteLazyInsertIterable`

[GitHub] [incubator-hudi] nandini57 commented on issue #1569: [SUPPORT] Audit Feature In A PartitionPath

2020-04-28 Thread GitBox
nandini57 commented on issue #1569: URL: https://github.com/apache/incubator-hudi/issues/1569#issuecomment-620786548 Great, thanks Vinoth. Is MurmurHash of my business keys a good choice then?

[GitHub] [incubator-hudi] nandini57 edited a comment on issue #1569: [SUPPORT] Audit Feature In A PartitionPath

2020-04-28 Thread GitBox
nandini57 edited a comment on issue #1569: URL: https://github.com/apache/incubator-hudi/issues/1569#issuecomment-620749380 Thanks Balaji. Yesterday, I changed the parameter to retain 40 commits and changed the _hoodie_record_key to include my business batch id column along with one

[GitHub] [incubator-hudi] bvaradar commented on issue #1569: [SUPPORT] Audit Feature In A PartitionPath

2020-04-28 Thread GitBox
bvaradar commented on issue #1569: URL: https://github.com/apache/incubator-hudi/issues/1569#issuecomment-620806109 @nandini57, You can prefix with a timestamp like "" to get ordering benefits. From your description, it looks like you essentially want the table to be a log of all
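bvaradar's suggestion can be sketched in Java: a fixed-width UTC timestamp prefix makes the lexicographic order of record keys follow event time, and because each change gets its own key, every version of a business key is kept as a separate record (the "keep all record changes" use case). The key layout, the `_` separator and the names `TimestampPrefixedKey`/`recordKey` are assumptions for illustration, not a Hudi key generator.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

/** Hypothetical sketch: record keys prefixed with a sortable timestamp. */
public class TimestampPrefixedKey {

    // Assumed layout: fixed-width UTC timestamp, so string order equals time order.
    private static final DateTimeFormatter PREFIX_FORMAT =
        DateTimeFormatter.ofPattern("yyyyMMddHHmmssSSS").withZone(ZoneOffset.UTC);

    /** Builds "<timestamp>_<businessKey>"; the separator is an arbitrary choice. */
    static String recordKey(Instant eventTime, String businessKey) {
        return PREFIX_FORMAT.format(eventTime) + "_" + businessKey;
    }

    public static void main(String[] args) {
        String k1 = recordKey(Instant.parse("2020-04-28T09:40:14Z"), "acct-42");
        String k2 = recordKey(Instant.parse("2020-04-28T09:40:15Z"), "acct-07");
        System.out.println(k1);                   // 20200428094014000_acct-42
        System.out.println(k1.compareTo(k2) < 0); // true: the earlier event sorts first
    }
}
```

The trade-off of this design is that the same business key no longer maps to a single record, so it suits an append-style audit log rather than a table that must later be upserted by business key.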

[GitHub] [incubator-hudi] vinothchandar commented on issue #1228: No FileSystem for scheme: abfss

2020-04-28 Thread GitBox
vinothchandar commented on issue #1228: URL: https://github.com/apache/incubator-hudi/issues/1228#issuecomment-620775308 Closing in favor of JIRA

[GitHub] [incubator-hudi] vinothchandar commented on issue #1531: run example

2020-04-28 Thread GitBox
vinothchandar commented on issue #1531: URL: https://github.com/apache/incubator-hudi/issues/1531#issuecomment-620775850 could you try master or 0.5.2?

[GitHub] [incubator-hudi] nandini57 commented on issue #1569: [SUPPORT] Audit Feature In A PartitionPath

2020-04-28 Thread GitBox
nandini57 commented on issue #1569: URL: https://github.com/apache/incubator-hudi/issues/1569#issuecomment-620749380 Thanks Balaji. Yesterday, I changed the parameter to retain 40 commits and changed the record key to include my business batch id column along with one of the other

[GitHub] [incubator-hudi] bvaradar commented on issue #1531: run example

2020-04-28 Thread GitBox
bvaradar commented on issue #1531: URL: https://github.com/apache/incubator-hudi/issues/1531#issuecomment-620798845 @c-f-cooper : You can use any of 0.5.1/0.5.2 or master to see the fix.

[GitHub] [incubator-hudi] bvaradar commented on issue #1564: update hudi meta in hive with no partition

2020-04-28 Thread GitBox
bvaradar commented on issue #1564: URL: https://github.com/apache/incubator-hudi/issues/1564#issuecomment-620808367 @zhangxia1030 : IIUC, you are seeing issues when hive-syncing non-partition table. Please look at this issue

[jira] [Updated] (HUDI-843) Support different time units in TimestampBasedKeyGenerator

2020-04-28 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-843: Labels: pull-request-available (was: ) > Support different time units in TimestampBasedKeyGenerator

[GitHub] [incubator-hudi] afilipchik commented on pull request #1541: [HUDI-843] Add ability to specify time unit for TimestampBasedKeyGenerator

2020-04-28 Thread GitBox
afilipchik commented on pull request #1541: URL: https://github.com/apache/incubator-hudi/pull/1541#issuecomment-620719213 addressed comments

[jira] [Created] (HUDI-844) Store Avro schema string as first-level entity in commit metadata

2020-04-28 Thread Balaji Varadarajan (Jira)
Balaji Varadarajan created HUDI-844: Summary: Store Avro schema string as first-level entity in commit metadata Key: HUDI-844 URL: https://issues.apache.org/jira/browse/HUDI-844 Project: Apache

[GitHub] [incubator-hudi] bvaradar commented on a change in pull request #1559: [HUDI-838] Support schema from HoodieCommitMetadata for HiveSync

2020-04-28 Thread GitBox
bvaradar commented on a change in pull request #1559: URL: https://github.com/apache/incubator-hudi/pull/1559#discussion_r416891281 ## File path: hudi-common/src/main/java/org/apache/hudi/common/table/TableSchemaResolver.java ## @@ -145,23 +146,37 @@ public MessageType

[GitHub] [incubator-hudi] bvaradar commented on pull request #1524: [HUDI-801] Adding a way to post process schema after it is fetched

2020-04-28 Thread GitBox
bvaradar commented on pull request #1524: URL: https://github.com/apache/incubator-hudi/pull/1524#issuecomment-620846353 @afilipchik : Doesn't look like the rebase worked fine as I see other folk's commits in the PR. Can you rebase again ?

[GitHub] [incubator-hudi] lamber-ken edited a comment on issue #1564: update hudi meta in hive with no partition

2020-04-28 Thread GitBox
lamber-ken edited a comment on issue #1564: URL: https://github.com/apache/incubator-hudi/issues/1564#issuecomment-620876765 > @lamber-ken : Will take care of this ticket. no problem

[GitHub] [incubator-hudi] lamber-ken commented on issue #1564: update hudi meta in hive with no partition

2020-04-28 Thread GitBox
lamber-ken commented on issue #1564: URL: https://github.com/apache/incubator-hudi/issues/1564#issuecomment-620876765 > @lamber-ken : Will take care of this ticket.

[GitHub] [incubator-hudi] lamber-ken commented on issue #1552: Time taken for upserting hudi table is increasing with increase in number of partitions

2020-04-28 Thread GitBox
lamber-ken commented on issue #1552: URL: https://github.com/apache/incubator-hudi/issues/1552#issuecomment-620893610 BUG status | bug | status | way | | upsert long time first time | fixed | upgrade version (0.5.0 to master) | |

[GitHub] [incubator-hudi] nandini57 commented on issue #1569: [SUPPORT] Audit Feature In A PartitionPath

2020-04-28 Thread GitBox
nandini57 commented on issue #1569: URL: https://github.com/apache/incubator-hudi/issues/1569#issuecomment-620824966 Yes, so far the requirement is to keep all record changes. In the future, we may need to upsert as well. Thanks guys for the help!

[GitHub] [incubator-hudi] bvaradar commented on a change in pull request #1532: [HUDI-794]: implemented optional use of --config-folder option in HoodieDeltaStreamer

2020-04-28 Thread GitBox
bvaradar commented on a change in pull request #1532: URL: https://github.com/apache/incubator-hudi/pull/1532#discussion_r416909810 ## File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/DeltaStreamerUtility.java ## @@ -0,0 +1,128 @@ +/* + * Licensed to the

[GitHub] [incubator-hudi] bvaradar commented on issue #1564: update hudi meta in hive with no partition

2020-04-28 Thread GitBox
bvaradar commented on issue #1564: URL: https://github.com/apache/incubator-hudi/issues/1564#issuecomment-620814862 @lamber-ken : Will take care of this ticket.

[GitHub] [incubator-hudi] bvaradar commented on a change in pull request #1567: [HUDI-840]Clean blank file created by HoodieLogFormatWriter

2020-04-28 Thread GitBox
bvaradar commented on a change in pull request #1567: URL: https://github.com/apache/incubator-hudi/pull/1567#discussion_r416923914 ## File path: hudi-common/src/main/java/org/apache/hudi/common/table/log/HoodieLogFormatWriter.java ## @@ -210,6 +210,14 @@ public void close()

[GitHub] [incubator-hudi] bvaradar commented on a change in pull request #1516: [HUDI-784] Adressing issue with log reader on GCS

2020-04-28 Thread GitBox
bvaradar commented on a change in pull request #1516: URL: https://github.com/apache/incubator-hudi/pull/1516#discussion_r416935621 ## File path: hudi-common/src/main/java/org/apache/hudi/common/table/log/HoodieLogFileReader.java ## @@ -79,6 +79,11 @@ this.inputStream

[GitHub] [incubator-hudi] lamber-ken commented on pull request #1469: [HUDI-686] Implement BloomIndexV2 that does not depend on memory caching

2020-04-28 Thread GitBox
lamber-ken commented on pull request #1469: URL: https://github.com/apache/incubator-hudi/pull/1469#issuecomment-620880356 @vinothchandar I think the work can be finished this week. : ) > the fetchRecordLocation() API and global indexing is not implemented..Do you plan to work on

[GitHub] [incubator-hudi] lamber-ken edited a comment on pull request #1469: [HUDI-686] Implement BloomIndexV2 that does not depend on memory caching

2020-04-28 Thread GitBox
lamber-ken edited a comment on pull request #1469: URL: https://github.com/apache/incubator-hudi/pull/1469#issuecomment-620880356 @vinothchandar I think the work can be finished this week, will ping you when finished : ) > the fetchRecordLocation() API and global indexing is not

[GitHub] [incubator-hudi] lamber-ken commented on issue #1563: [SUPPORT] When I package according to the package command in GitHub, I always report an error, such as

2020-04-28 Thread GitBox
lamber-ken commented on issue #1563: URL: https://github.com/apache/incubator-hudi/issues/1563#issuecomment-620894265 hello, what's your maven version? @GSHF

[GitHub] [incubator-hudi] lamber-ken edited a comment on issue #1563: [SUPPORT] When I package according to the package command in GitHub, I always report an error, such as

2020-04-28 Thread GitBox
lamber-ken edited a comment on issue #1563: URL: https://github.com/apache/incubator-hudi/issues/1563#issuecomment-620894265 hello, what's your maven version? @GSHF I tested mvn from `3.3.9` to `3.5.3`, all worked fine. > in idea of Windows version. Unix system is

[jira] [Created] (HUDI-845) Allow parallel writing and move the pending rollback work into cleaner

2020-04-28 Thread Vinoth Chandar (Jira)
Vinoth Chandar created HUDI-845: Summary: Allow parallel writing and move the pending rollback work into cleaner Key: HUDI-845 URL: https://issues.apache.org/jira/browse/HUDI-845 Project: Apache Hudi

[incubator-hudi] branch master updated: [HUDI-814] Migrate hudi-client tests to JUnit 5 (#1570)

2020-04-28 Thread vinoyang
This is an automated email from the ASF dual-hosted git repository. vinoyang pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git The following commit(s) were added to refs/heads/master by this push: new 69b1630 [HUDI-814] Migrate

[jira] [Closed] (HUDI-814) Migrate hudi-client tests to JUnit 5

2020-04-28 Thread vinoyang (Jira)
[ https://issues.apache.org/jira/browse/HUDI-814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] vinoyang closed HUDI-814. Resolution: Done. Done via master branch: 69b16309c8c46f831c8b9be42de8b2e29c74f03e > Migrate hudi-client tests to

[GitHub] [incubator-hudi] bvaradar commented on a change in pull request #1559: [HUDI-838] Support schema from HoodieCommitMetadata for HiveSync

2020-04-28 Thread GitBox
bvaradar commented on a change in pull request #1559: URL: https://github.com/apache/incubator-hudi/pull/1559#discussion_r416879633 ## File path: hudi-common/src/main/java/org/apache/hudi/common/table/TableSchemaResolver.java ## @@ -145,23 +146,37 @@ public MessageType

[jira] [Updated] (HUDI-845) Allow parallel writing and move the pending rollback work into cleaner

2020-04-28 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-845: Description: Things to think about  * Commit time has to be unique across writers  * Parallel

[jira] [Commented] (HUDI-842) Implementation plan for RFC 15 (File Listing and Query Planning Improvements))

2020-04-28 Thread Prashant Wason (Jira)
[ https://issues.apache.org/jira/browse/HUDI-842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17094944#comment-17094944 ] Prashant Wason commented on HUDI-842: - [~vinoth]  This is the master ticket for the RFC 15. >

[jira] [Closed] (HUDI-827) Translation error

2020-04-28 Thread leesf (Jira)
[ https://issues.apache.org/jira/browse/HUDI-827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf closed HUDI-827. > Translation error > Key: HUDI-827 > URL:

[GitHub] [incubator-hudi] nsivabalan commented on a change in pull request #1402: [HUDI-407] Adding Simple Index

2020-04-28 Thread GitBox
nsivabalan commented on a change in pull request #1402: URL: https://github.com/apache/incubator-hudi/pull/1402#discussion_r417046515 ## File path: hudi-client/src/main/java/org/apache/hudi/index/HoodieGlobalSimpleIndex.java ## @@ -0,0 +1,157 @@ +/* + * Licensed to the Apache

[GitHub] [incubator-hudi] umehrot2 commented on a change in pull request #1559: [HUDI-838] Support schema from HoodieCommitMetadata for HiveSync

2020-04-28 Thread GitBox
umehrot2 commented on a change in pull request #1559: URL: https://github.com/apache/incubator-hudi/pull/1559#discussion_r417020450 ## File path: hudi-common/src/main/java/org/apache/hudi/common/table/TableSchemaResolver.java ## @@ -178,6 +193,17 @@ public Schema

[GitHub] [incubator-hudi] umehrot2 commented on a change in pull request #1559: [HUDI-838] Support schema from HoodieCommitMetadata for HiveSync

2020-04-28 Thread GitBox
umehrot2 commented on a change in pull request #1559: URL: https://github.com/apache/incubator-hudi/pull/1559#discussion_r417020636 ## File path: hudi-common/src/main/java/org/apache/hudi/common/table/TableSchemaResolver.java ## @@ -145,23 +146,37 @@ public MessageType

[jira] [Updated] (HUDI-812) Migrate hudi-common tests to JUnit 5

2020-04-28 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-812: Status: In Progress (was: Open) > Migrate hudi-common tests to JUnit 5

[jira] [Updated] (HUDI-813) Migrate hudi-utilities tests to JUnit 5

2020-04-28 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-813: Status: In Progress (was: Open) > Migrate hudi-utilities tests to JUnit 5

[GitHub] [incubator-hudi] xushiyan commented on pull request #1570: [HUDI-814] Migrate hudi-client tests to JUnit 5

2020-04-28 Thread GitBox
xushiyan commented on pull request #1570: URL: https://github.com/apache/incubator-hudi/pull/1570#issuecomment-620965031 @yanghua This PR migrates all remaining test cases in hudi-client that are not subclasses of HoodieClientTestHarness. It is ready for review.

[jira] [Updated] (HUDI-812) Migrate hudi-common tests to JUnit 5

2020-04-28 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-812: Status: Open (was: New) > Migrate hudi-common tests to JUnit 5

[jira] [Updated] (HUDI-813) Migrate hudi-utilities tests to JUnit 5

2020-04-28 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-813: Status: Open (was: New) > Migrate hudi-utilities tests to JUnit 5

[GitHub] [incubator-hudi] nsivabalan commented on pull request #1469: [HUDI-686] Implement BloomIndexV2 that does not depend on memory caching

2020-04-28 Thread GitBox
nsivabalan commented on pull request #1469: URL: https://github.com/apache/incubator-hudi/pull/1469#issuecomment-620971906 > @vinothchandar I think the work can be finished this week, will ping you when finished : ) > > > the fetchRecordLocation() API and global indexing is not

[GitHub] [incubator-hudi] umehrot2 commented on a change in pull request #1559: [HUDI-838] Support schema from HoodieCommitMetadata for HiveSync

2020-04-28 Thread GitBox
umehrot2 commented on a change in pull request #1559: URL: https://github.com/apache/incubator-hudi/pull/1559#discussion_r417019322 ## File path: hudi-common/src/main/java/org/apache/hudi/common/table/TableSchemaResolver.java ## @@ -145,23 +146,37 @@ public MessageType

[GitHub] [incubator-hudi] umehrot2 commented on a change in pull request #1559: [HUDI-838] Support schema from HoodieCommitMetadata for HiveSync

2020-04-28 Thread GitBox
umehrot2 commented on a change in pull request #1559: URL: https://github.com/apache/incubator-hudi/pull/1559#discussion_r417019457 ## File path: hudi-common/src/main/java/org/apache/hudi/common/table/TableSchemaResolver.java ## @@ -145,23 +146,37 @@ public MessageType

[jira] [Assigned] (HUDI-558) Introduce ability to compress bloom filters while storing in parquet

2020-04-28 Thread liwei (Jira)
[ https://issues.apache.org/jira/browse/HUDI-558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liwei reassigned HUDI-558: Assignee: liwei > Introduce ability to compress bloom filters while storing in parquet

[GitHub] [incubator-hudi] nsivabalan commented on a change in pull request #1402: [HUDI-407] Adding Simple Index

2020-04-28 Thread GitBox
nsivabalan commented on a change in pull request #1402: URL: https://github.com/apache/incubator-hudi/pull/1402#discussion_r417046732 ## File path: hudi-client/src/main/java/org/apache/hudi/index/HoodieGlobalSimpleIndex.java ## @@ -0,0 +1,157 @@ +/* + * Licensed to the Apache

[GitHub] [incubator-hudi] xushiyan opened a new pull request #1570: [HUDI-814] Migrate hudi-client tests to JUnit 5

2020-04-28 Thread GitBox
xushiyan opened a new pull request #1570: URL: https://github.com/apache/incubator-hudi/pull/1570 Migrate the test cases in hudi-client to JUnit 5. Follows #1553 ### Migration status (after merging) | Package | JUnit 5 lib | API migration | Restructure packages | |

[jira] [Updated] (HUDI-814) Migrate hudi-client tests to JUnit 5

2020-04-28 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-814: Labels: pull-request-available (was: ) > Migrate hudi-client tests to JUnit 5

[GitHub] [incubator-hudi] hddong commented on a change in pull request #1567: [HUDI-840]Clean blank file created by HoodieLogFormatWriter

2020-04-28 Thread GitBox
hddong commented on a change in pull request #1567: URL: https://github.com/apache/incubator-hudi/pull/1567#discussion_r417047280 ## File path: hudi-common/src/main/java/org/apache/hudi/common/table/log/HoodieLogFormatWriter.java ## @@ -210,6 +210,14 @@ public void close()

Build failed in Jenkins: hudi-snapshot-deployment-0.5 #262

2020-04-28 Thread Apache Jenkins Server
See Changes: -- [...truncated 2.36 KB...] /home/jenkins/tools/maven/apache-maven-3.5.4/conf: logging settings.xml toolchains.xml

[GitHub] [incubator-hudi] afilipchik commented on a change in pull request #1565: [HUDI-73]: implemented vanilla AvroKafkaSource

2020-04-28 Thread GitBox
afilipchik commented on a change in pull request #1565: URL: https://github.com/apache/incubator-hudi/pull/1565#discussion_r416344601 ## File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/serde/AbstractHoodieKafkaAvroDeserializer.java ## @@ -0,0 +1,97 @@ +/* +

[GitHub] [incubator-hudi] pratyakshsharma commented on a change in pull request #1558: [HUDI-796]: added deduping logic for upserts case

2020-04-28 Thread GitBox
pratyakshsharma commented on a change in pull request #1558: URL: https://github.com/apache/incubator-hudi/pull/1558#discussion_r416462293 ## File path: hudi-cli/src/main/java/org/apache/hudi/cli/commands/RepairsCommand.java ## @@ -64,11 +64,15 @@ public String deduplicate(

[GitHub] [incubator-hudi] yanghua commented on a change in pull request #1522: [HUDI-702]Add test for HoodieLogFileCommand

2020-04-28 Thread GitBox
yanghua commented on a change in pull request #1522: URL: https://github.com/apache/incubator-hudi/pull/1522#discussion_r416402996 ## File path: hudi-cli/src/main/java/org/apache/hudi/cli/commands/HoodieLogFileCommand.java ## @@ -173,7 +176,11 @@ public String

[GitHub] [incubator-hudi] pratyakshsharma commented on a change in pull request #1558: [HUDI-796]: added deduping logic for upserts case

2020-04-28 Thread GitBox
pratyakshsharma commented on a change in pull request #1558: URL: https://github.com/apache/incubator-hudi/pull/1558#discussion_r416469107 ## File path: hudi-cli/src/main/scala/org/apache/hudi/cli/DedupeSparkJob.scala ## @@ -103,24 +105,51 @@ class DedupeSparkJob(basePath:

[GitHub] [incubator-hudi] hddong commented on a change in pull request #1522: [HUDI-702]Add test for HoodieLogFileCommand

2020-04-28 Thread GitBox
hddong commented on a change in pull request #1522: URL: https://github.com/apache/incubator-hudi/pull/1522#discussion_r416482802 ## File path: hudi-cli/src/main/java/org/apache/hudi/cli/commands/HoodieLogFileCommand.java ## @@ -173,7 +176,11 @@ public String

[GitHub] [incubator-hudi] afilipchik commented on a change in pull request #1562: [HUDI-837]: implemented custom deserializer for AvroKafkaSource

2020-04-28 Thread GitBox
afilipchik commented on a change in pull request #1562: URL: https://github.com/apache/incubator-hudi/pull/1562#discussion_r416350367 ## File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/serde/HoodieAvroKafkaDeserializer.java ## @@ -0,0 +1,78 @@ +/* +

[GitHub] [incubator-hudi] afilipchik commented on a change in pull request #1516: [HUDI-784] Adressing issue with log reader on GCS

2020-04-28 Thread GitBox
afilipchik commented on a change in pull request #1516: URL: https://github.com/apache/incubator-hudi/pull/1516#discussion_r411560192 ## File path: hudi-common/src/main/java/org/apache/hudi/common/table/log/HoodieLogFileReader.java ## @@ -79,6 +79,7 @@ this.inputStream

[GitHub] [incubator-hudi] afilipchik commented on a change in pull request #1518: [HUDI-723] Register avro schema if infered from SQL transformation

2020-04-28 Thread GitBox
afilipchik commented on a change in pull request #1518: URL: https://github.com/apache/incubator-hudi/pull/1518#discussion_r416381330 ## File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/DeltaSync.java ## @@ -460,8 +471,17 @@ private void

[GitHub] [incubator-hudi] bhasudha commented on issue #1568: [SUPPORT] java.lang.reflect.InvocationTargetException when upsert

2020-04-28 Thread GitBox
bhasudha commented on issue #1568: URL: https://github.com/apache/incubator-hudi/issues/1568#issuecomment-620407768 @tieke1121 are you setting these configs ``` --hiveconf hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat \ --hiveconf hive.stats.autogather=false

[jira] [Created] (HUDI-843) Support different time units in TimestampBasedKeyGenerator

2020-04-28 Thread Alexander Filipchik (Jira)
Alexander Filipchik created HUDI-843: Summary: Support different time units in TimestampBasedKeyGenerator Key: HUDI-843 URL: https://issues.apache.org/jira/browse/HUDI-843 Project: Apache Hudi
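The HUDI-843 summary asks for the key generator's timestamp field to be interpretable in different time units. A minimal Java sketch of the idea, assuming the generator ultimately turns an epoch value into a date partition path: `TimeUnitPartitionPath` and `partitionPath` are hypothetical names, the `yyyy/MM/dd` output layout is an assumption, and the actual configuration keys added in #1541 are not shown in this thread.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch: derive a date partition path from an epoch value in a configurable unit. */
public class TimeUnitPartitionPath {

    // Assumed output layout: yyyy/MM/dd in UTC.
    private static final DateTimeFormatter OUTPUT_FORMAT =
        DateTimeFormatter.ofPattern("yyyy/MM/dd").withZone(ZoneOffset.UTC);

    static String partitionPath(long epochValue, TimeUnit unit) {
        // Normalize to milliseconds first; TimeUnit.toMillis saturates instead of overflowing.
        long epochMillis = unit.toMillis(epochValue);
        return OUTPUT_FORMAT.format(Instant.ofEpochMilli(epochMillis));
    }

    public static void main(String[] args) {
        // The same instant, 2020-04-28T00:00:00Z, expressed in three different units.
        System.out.println(partitionPath(1588032000L, TimeUnit.SECONDS));         // 2020/04/28
        System.out.println(partitionPath(1588032000000L, TimeUnit.MILLISECONDS)); // 2020/04/28
        System.out.println(partitionPath(18380L, TimeUnit.DAYS));                 // 2020/04/28
    }
}
```

Normalizing through `java.util.concurrent.TimeUnit` keeps the unit handling in one place, so supporting another unit means only choosing a different enum constant.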

[GitHub] [incubator-hudi] pratyakshsharma commented on a change in pull request #1565: [HUDI-73]: implemented vanilla AvroKafkaSource

2020-04-28 Thread GitBox
pratyakshsharma commented on a change in pull request #1565: URL: https://github.com/apache/incubator-hudi/pull/1565#discussion_r416503101 ## File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/serde/AbstractHoodieKafkaAvroDeserializer.java ## @@ -0,0 +1,97 @@

[GitHub] [incubator-hudi] pratyakshsharma commented on a change in pull request #1565: [HUDI-73]: implemented vanilla AvroKafkaSource

2020-04-28 Thread GitBox
pratyakshsharma commented on a change in pull request #1565: URL: https://github.com/apache/incubator-hudi/pull/1565#discussion_r416489207 ## File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/serde/AbstractHoodieKafkaAvroDeserializer.java ## @@ -0,0 +1,97 @@

[GitHub] [incubator-hudi] pratyakshsharma commented on a change in pull request #1565: [HUDI-73]: implemented vanilla AvroKafkaSource

2020-04-28 Thread GitBox
pratyakshsharma commented on a change in pull request #1565: URL: https://github.com/apache/incubator-hudi/pull/1565#discussion_r416489455 ## File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/serde/AbstractHoodieKafkaAvroDeserializer.java ## @@ -0,0 +1,97 @@

[GitHub] [incubator-hudi] harshi2506 commented on issue #1552: Time taken for upserting hudi table is increasing with increase in number of partitions

2020-04-28 Thread GitBox
harshi2506 commented on issue #1552: URL: https://github.com/apache/incubator-hudi/issues/1552#issuecomment-620512620 @vinothchandar, I tried building the jar from the master branch and loaded a snapshot; it fails every time saying ``` 20/04/28 09:40:14 WARN TaskSetManager: Lost

[GitHub] [incubator-hudi] pratyakshsharma commented on a change in pull request #1565: [HUDI-73]: implemented vanilla AvroKafkaSource

2020-04-28 Thread GitBox
pratyakshsharma commented on a change in pull request #1565: URL: https://github.com/apache/incubator-hudi/pull/1565#discussion_r416505404 ## File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/serde/AbstractHoodieKafkaAvroDeserializer.java ## @@ -0,0 +1,97 @@

[GitHub] [incubator-hudi] pratyakshsharma commented on a change in pull request #1562: [HUDI-837]: implemented custom deserializer for AvroKafkaSource

2020-04-28 Thread GitBox
pratyakshsharma commented on a change in pull request #1562: URL: https://github.com/apache/incubator-hudi/pull/1562#discussion_r416527261 ## File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/serde/HoodieAvroKafkaDeserializer.java ## @@ -0,0 +1,78 @@

[GitHub] [incubator-hudi] lamber-ken commented on a change in pull request #1512: [HUDI-763] Add hoodie.table.base.file.format option to hoodie.properties file

2020-04-28 Thread GitBox
lamber-ken commented on a change in pull request #1512: URL: https://github.com/apache/incubator-hudi/pull/1512#discussion_r416452010 ## File path: hudi-spark/src/main/scala/org/apache/hudi/DataSourceOptions.scala ## @@ -142,6 +143,16 @@ object DataSourceWriteOptions { val

[GitHub] [incubator-hudi] pratyakshsharma commented on a change in pull request #1562: [HUDI-837]: implemented custom deserializer for AvroKafkaSource

2020-04-28 Thread GitBox
pratyakshsharma commented on a change in pull request #1562: URL: https://github.com/apache/incubator-hudi/pull/1562#discussion_r416563158 ## File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/serde/HoodieAvroKafkaDeserializer.java ## @@ -0,0 +1,78 @@