[GitHub] [hudi] rahulpoptani commented on issue #2180: [SUPPORT] Unable to read MERGE ON READ table with Snapshot option using Databricks.

2020-10-19 Thread GitBox
rahulpoptani commented on issue #2180: URL: https://github.com/apache/hudi/issues/2180#issuecomment-712585912 I used a different environment where I used Spark 2.4.5 with Scala 2.12 and I was able to successfully perform Insert/Upsert/Deletes and Read on Merge-On-Read table type.

[GitHub] [hudi] SteNicholas commented on pull request #2111: [HUDI-1234] Insert new records regardless of small file when using insert operation

2020-10-19 Thread GitBox
SteNicholas commented on pull request #2111: URL: https://github.com/apache/hudi/pull/2111#issuecomment-712566489 > @SteNicholas @leesf : Does this essentially mean we no longer support small file handling for "inserts" ? > If user doesn't essentially care about duplicates, I agree that

[GitHub] [hudi] bvaradar commented on pull request #2111: [HUDI-1234] Insert new records regardless of small file when using insert operation

2020-10-19 Thread GitBox
bvaradar commented on pull request #2111: URL: https://github.com/apache/hudi/pull/2111#issuecomment-712564198 @SteNicholas @leesf : Does this essentially mean we no longer support small file handling for "inserts" ? If user doesn't essentially care about duplicates, I agree that we

[GitHub] [hudi] lw309637554 commented on pull request #2127: [HUDI-284] add more test for UpdateSchemaEvolution

2020-10-19 Thread GitBox
lw309637554 commented on pull request #2127: URL: https://github.com/apache/hudi/pull/2127#issuecomment-712536085 > @lw309637554 looks like comments from @pratyakshsharma were addressed. sorry about the delay. merging now. Thank you @lw309637554 for adding the cases! Thanks

[GitHub] [hudi] vinothchandar commented on pull request #1760: [HUDI-1040] Update apis for spark3 compatibility

2020-10-19 Thread GitBox
vinothchandar commented on pull request #1760: URL: https://github.com/apache/hudi/pull/1760#issuecomment-712505539 >we want to make Hudi compile with spark 2 and then run with spark3? this was the intention. but as @bschell pointed out some classes have changed and we need to make

[jira] [Commented] (HUDI-303) Avro schema case sensitivity testing

2020-10-19 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17217177#comment-17217177 ] Vinoth Chandar commented on HUDI-303: [~309637554] this task is about exploring all possibilities and

[jira] [Updated] (HUDI-1321) Support properties for metadata table via a properties.file

2020-10-19 Thread Prashant Wason (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prashant Wason updated HUDI-1321: Status: Open (was: New) > Support properties for metadata table via a properties.file >

[jira] [Updated] (HUDI-1321) Support properties for metadata table via a properties.file

2020-10-19 Thread Prashant Wason (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prashant Wason updated HUDI-1321: Status: In Progress (was: Open) > Support properties for metadata table via a properties.file >

[GitHub] [hudi] umehrot2 commented on issue #2057: [SUPPORT] AWSDmsAvroPayload not processing Deletes correctly + IOException when reading log file

2020-10-19 Thread GitBox
umehrot2 commented on issue #2057: URL: https://github.com/apache/hudi/issues/2057#issuecomment-712454567 > @umehrot2 It looks like for 0.6.0 where this issue is fixed, @WTa-hash is seeing the exception `java.lang.NoSuchMethodError:

[GitHub] [hudi] umehrot2 commented on issue #2057: [SUPPORT] AWSDmsAvroPayload not processing Deletes correctly + IOException when reading log file

2020-10-19 Thread GitBox
umehrot2 commented on issue #2057: URL: https://github.com/apache/hudi/issues/2057#issuecomment-712453460 > > @umehrot2 Could the IOException be due to #2089 ? > > I'm not entirely sure if it's related to this issue as the steps to reproduce are different, but the thing I see in

[GitHub] [hudi] prashantwason commented on pull request #2189: Some more updates to the rfc-15 implementation

2020-10-19 Thread GitBox
prashantwason commented on pull request #2189: URL: https://github.com/apache/hudi/pull/2189#issuecomment-712449097 @vinothchandar Some more updates from my side. PTAL. This is an automated message from the Apache Git

[GitHub] [hudi] prashantwason opened a new pull request #2189: Some more updates to the rfc-15 implementation

2020-10-19 Thread GitBox
prashantwason opened a new pull request #2189: URL: https://github.com/apache/hudi/pull/2189 ## Brief change log Please see individual commits and the tagged JIRA items for details. ## Committer checklist - [ ] Has a corresponding JIRA in PR title & commit - [

[GitHub] [hudi] umehrot2 merged pull request #2185: [HUDI-1345] Remove Hbase and htrace relocation from utilities bundle

2020-10-19 Thread GitBox
umehrot2 merged pull request #2185: URL: https://github.com/apache/hudi/pull/2185 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

[hudi] branch master updated: [HUDI-1345] Remove Hbase and htrace relocation from utilities bundle (#2185)

2020-10-19 Thread uditme
This is an automated email from the ASF dual-hosted git repository. uditme pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new 6490b02 [HUDI-1345] Remove Hbase and htrace

[GitHub] [hudi] ashishmgofficial edited a comment on issue #2149: Help with Reading Kafka topic written using Debezium Connector - Deltastreamer

2020-10-19 Thread GitBox
ashishmgofficial edited a comment on issue #2149: URL: https://github.com/apache/hudi/issues/2149#issuecomment-712404711 Not sure if this is gonna be of any help but attaching the latest logs. I can see these messages towards the end ``` at

[GitHub] [hudi] ashishmgofficial edited a comment on issue #2149: Help with Reading Kafka topic written using Debezium Connector - Deltastreamer

2020-10-19 Thread GitBox
ashishmgofficial edited a comment on issue #2149: URL: https://github.com/apache/hudi/issues/2149#issuecomment-712404711 Not sure if this is gonna be of any help but attaching the latest logs. I can see these messages towards the end ``` at

[GitHub] [hudi] ashishmgofficial commented on issue #2149: Help with Reading Kafka topic written using Debezium Connector - Deltastreamer

2020-10-19 Thread GitBox
ashishmgofficial commented on issue #2149: URL: https://github.com/apache/hudi/issues/2149#issuecomment-712404711 Not sure if this is gonna be of any help but attaching the latest logs. I can see these messages towards the end ``` at

[GitHub] [hudi] zhedoubushishi commented on pull request #1760: [HUDI-1040] Update apis for spark3 compatibility

2020-10-19 Thread GitBox
zhedoubushishi commented on pull request #1760: URL: https://github.com/apache/hudi/pull/1760#issuecomment-712391147 @bschell @vinothchandar to make clear, just wondering what is the exact goal for this pr? Do we want to make Hudi support both compile & run with spark 3 or we want to make

[GitHub] [hudi] ashishmgofficial commented on issue #2149: Help with Reading Kafka topic written using Debezium Connector - Deltastreamer

2020-10-19 Thread GitBox
ashishmgofficial commented on issue #2149: URL: https://github.com/apache/hudi/issues/2149#issuecomment-712377184 @bvaradar I can provide all the SQL's in Postgres which I'm using to reproduce this though : ``` DROP TABLE public.motor_crash_violation_incidents; CREATE

[GitHub] [hudi] bvaradar closed issue #2108: [SUPPORT]Submit rollback -->Pending job --> kill YARN --> lost data

2020-10-19 Thread GitBox
bvaradar closed issue #2108: URL: https://github.com/apache/hudi/issues/2108 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[jira] [Commented] (HUDI-1340) Not able to query real time table when rows contains nested elements

2020-10-19 Thread Balaji Varadarajan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17216913#comment-17216913 ] Balaji Varadarajan commented on HUDI-1340: [~bdighe]: Did you use --conf

[GitHub] [hudi] bvaradar commented on issue #2108: [SUPPORT]Submit rollback -->Pending job --> kill YARN --> lost data

2020-10-19 Thread GitBox
bvaradar commented on issue #2108: URL: https://github.com/apache/hudi/issues/2108#issuecomment-712304151 Closing this due to inactivity. This is an automated message from the Apache Git Service. To respond to the message,

[jira] [Updated] (HUDI-1340) Not able to query real time table when rows contains nested elements

2020-10-19 Thread Balaji Varadarajan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Balaji Varadarajan updated HUDI-1340: Status: Open (was: New) > Not able to query real time table when rows contains nested

[GitHub] [hudi] bvaradar commented on issue #2162: [SUPPORT] Deltastreamer transform cannot add fields

2020-10-19 Thread GitBox
bvaradar commented on issue #2162: URL: https://github.com/apache/hudi/issues/2162#issuecomment-712289095 @liujinhui1994 : Adding > Can work, but if the default value is not null, it will not work > { > "name": "adnetDesc", > "type": ["null", "long"], > "default": -1

[GitHub] [hudi] xushiyan merged pull request #2127: [HUDI-284] add more test for UpdateSchemaEvolution

2020-10-19 Thread GitBox
xushiyan merged pull request #2127: URL: https://github.com/apache/hudi/pull/2127 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

[hudi] branch master updated: [HUDI-284] add more test for UpdateSchemaEvolution (#2127)

2020-10-19 Thread xushiyan
This is an automated email from the ASF dual-hosted git repository. xushiyan pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new 4d80e1e [HUDI-284] add more test for

[GitHub] [hudi] xushiyan commented on pull request #2127: [HUDI-284] add more test for UpdateSchemaEvolution

2020-10-19 Thread GitBox
xushiyan commented on pull request #2127: URL: https://github.com/apache/hudi/pull/2127#issuecomment-712208378 @lw309637554 looks like comments from @pratyakshsharma were addressed. sorry about the delay. merging now. Thank you @lw309637554 for adding the cases!

[jira] [Commented] (HUDI-303) Avro schema case sensitivity testing

2020-10-19 Thread liwei (Jira)
[ https://issues.apache.org/jira/browse/HUDI-303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17216732#comment-17216732 ] liwei commented on HUDI-303: [~uditme], [~vinoth] what do you think about this :D > Avro schema case

[jira] [Reopened] (HUDI-303) Avro schema case sensitivity testing

2020-10-19 Thread liwei (Jira)
[ https://issues.apache.org/jira/browse/HUDI-303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liwei reopened HUDI-303: > Avro schema case sensitivity testing > > > Key: HUDI-303 >

[jira] [Commented] (HUDI-303) Avro schema case sensitivity testing

2020-10-19 Thread liwei (Jira)
[ https://issues.apache.org/jira/browse/HUDI-303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17216728#comment-17216728 ] liwei commented on HUDI-303: I do not think this should be fixed, because Hive meta columns are case insensitive. if

[jira] [Assigned] (HUDI-303) Avro schema case sensitivity testing

2020-10-19 Thread liwei (Jira)
[ https://issues.apache.org/jira/browse/HUDI-303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liwei reassigned HUDI-303: Assignee: liwei (was: Udit Mehrotra) > Avro schema case sensitivity testing >

[jira] [Resolved] (HUDI-303) Avro schema case sensitivity testing

2020-10-19 Thread liwei (Jira)
[ https://issues.apache.org/jira/browse/HUDI-303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liwei resolved HUDI-303. Resolution: Fixed > Avro schema case sensitivity testing > > >

[GitHub] [hudi] lw309637554 commented on pull request #2127: [HUDI-284] add more test for UpdateSchemaEvolution

2020-10-19 Thread GitBox
lw309637554 commented on pull request #2127: URL: https://github.com/apache/hudi/pull/2127#issuecomment-712104176 @pratyakshsharma @xushiyan @vinothchandar hello, Is there anything that needs to be fixed? This is an

[GitHub] [hudi] liujinhui1994 commented on issue #2162: [SUPPORT] Deltastreamer transform cannot add fields

2020-10-19 Thread GitBox
liujinhui1994 commented on issue #2162: URL: https://github.com/apache/hudi/issues/2162#issuecomment-711759479 Sorry for the late reply. I have passed the verification in the production environment, and I am currently writing unit tests

[GitHub] [hudi] liujinhui1994 commented on issue #2162: [SUPPORT] Deltastreamer transform cannot add fields

2020-10-19 Thread GitBox
liujinhui1994 commented on issue #2162: URL: https://github.com/apache/hudi/issues/2162#issuecomment-711757869 Can work, but if the default value is not null, it will not work { "name": "adnetDesc", "type": ["null", "long"], "default": -1 } @bvaradar
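
The behavior reported in this comment is consistent with a rule in the Avro specification as it stood at the time: the default of a union-typed field must match the first branch of the union, so `["null", "long"]` accepts only a `null` default, while `["long", "null"]` would accept `-1`. The sketch below illustrates that rule in plain Python. The helper `default_matches_first_branch` is hypothetical (it is not an Avro or Hudi API), it handles only primitive branches, and it is not a substitute for a real schema parser.

```python
# Hypothetical helper, NOT part of Avro or Hudi. It checks only the
# "default must match the first union branch" rule for primitive types.

def _is_int(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)

_PRIMITIVE_CHECKS = {
    "null": lambda v: v is None,
    "boolean": lambda v: isinstance(v, bool),
    "int": _is_int,
    "long": _is_int,
    "float": lambda v: _is_int(v) or isinstance(v, float),
    "double": lambda v: _is_int(v) or isinstance(v, float),
    "string": lambda v: isinstance(v, str),
}

def default_matches_first_branch(field):
    """Return True if the field's default fits the first branch of its type."""
    if "default" not in field:
        return True  # nothing to validate
    field_type = field["type"]
    # For a union (a list), only the first branch is considered.
    first = field_type[0] if isinstance(field_type, list) else field_type
    check = _PRIMITIVE_CHECKS.get(first)
    if check is None:
        raise ValueError(f"unsupported branch type in this sketch: {first!r}")
    return check(field["default"])

# The field quoted in the comment: first branch is "null", default is -1.
reported = {"name": "adnetDesc", "type": ["null", "long"], "default": -1}
# Assumed alternative: list "long" first so that -1 matches the first branch.
reordered = {"name": "adnetDesc", "type": ["long", "null"], "default": -1}

print(default_matches_first_branch(reported))
print(default_matches_first_branch(reordered))
```

Under this rule, either keeping `null` as the default or reordering the union so `long` comes first would satisfy the check; which of the two suits the Deltastreamer transform in the issue is not settled by this sketch.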

[GitHub] [hudi] KarthickAN commented on issue #2178: [SUPPORT] Hudi writing 10MB worth of org.apache.hudi.bloomfilter data in each of the parquet files produced

2020-10-19 Thread GitBox
KarthickAN commented on issue #2178: URL: https://github.com/apache/hudi/issues/2178#issuecomment-711645166 @nsivabalan @vinothchandar Thank you so much for all the explanations. If I think about it, having 10MB worth of index data may not be an issue as long as the file contains