Re: [VOTE] Release 0.6.0, release candidate #1

2020-08-21 Thread Shiyan Xu
I should have documented this (which I will do soon). When running from the terminal, could you please try running with a Maven profile, like `mvn -Punit-tests test` or `mvn -Pfunctional-tests test`? That should work. Best, Raymond On Fri, Aug 21, 2020 at 9:44 PM Gary Li wrote: > +1 (non binding) > -

Re: [VOTE] Release 0.6.0, release candidate #1

2020-08-21 Thread Gary Li
+1 (non binding) - Compiled successfully - Ran validation script successfully - Ran tests from IntelliJ successfully Seeing the same issue as Siva. The tests passed in the IDE. Best Regards, Gary Li On 8/21/20, 2:29 PM, "Sivabalan" wrote: +1 (non binding) - Compilation successful

Re: [DISCUSS] Codestyle: force multiline indentation

2020-08-21 Thread Nishith
+1 for spotless, automating the formatting will definitely help productivity and turn around time for PRs. -Nishith Sent from my iPhone > On Aug 21, 2020, at 11:53 AM, Sivabalan wrote: > > totally +1 for spotless. > > >> On Thu, Aug 20, 2020 at 8:53 AM leesf wrote: >> >> +1 on using mvn

Re: Incremental query on partition column

2020-08-21 Thread Balaji Varadarajan
Thanks for the detailed email, David. We discussed this in last week's community meeting, and Vinoth had ideas on how to implement it. This is something that can be supported by the timeline layout that Hudi has. It would be a new feature (new write operation) that basically appends the

Re: Null-value for required field Error

2020-08-21 Thread selvaraj periyasamy
Thanks Balaji. Could you please provide more info on how to get it done and pass it to Hudi? Thanks, Selva On Fri, Aug 21, 2020 at 12:33 PM Balaji Varadarajan wrote: > Hi Selvaraj, > Even though the incoming batch has non null values for the new column, > existing data do not have this

Re: Null-value for required field Error

2020-08-21 Thread Balaji Varadarajan
Hi Selvaraj, Even though the incoming batch has non-null values for the new column, the existing data does not have this column. So you need to make sure the Avro schema declares the new column as nullable and remains backwards compatible. Balaji.V On Friday, August 21, 2020, 10:06:40 AM PDT, selvaraj
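
A minimal sketch of the schema change Balaji describes, assuming the table's Avro schema is available as a plain JSON-style record. The record name `ExampleRecord`, the existing `id` field, the helper `add_nullable_field`, and the array-of-strings type for `rule_profile_id_list` are all hypothetical placeholders; only the column name comes from the thread. In Avro, a field is made optional by declaring it as a union whose first branch is `"null"` and giving it a `null` default, so records written before the column existed can still be read. How the resulting schema is handed to Hudi is not covered here.

```python
import json


def add_nullable_field(schema, name, avro_type):
    """Return a copy of an Avro record schema with one optional field appended.

    The new field is a union with "null" as its first branch and a null
    default, which is what lets readers fill it in for older records.
    """
    new_field = {"name": name, "type": ["null", avro_type], "default": None}
    return {**schema, "fields": list(schema["fields"]) + [new_field]}


# Hypothetical existing schema; the real table schema is not shown in the thread.
old_schema = {
    "type": "record",
    "name": "ExampleRecord",
    "fields": [{"name": "id", "type": "string"}],
}

# Assumed element type (array of strings) for the new column.
new_schema = add_nullable_field(
    old_schema,
    "rule_profile_id_list",
    {"type": "array", "items": "string"},
)

print(json.dumps(new_schema, indent=2))
```

The original schema dict is left unmodified, so both versions can be compared or kept side by side while testing compatibility.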

Re: [DISCUSS] Support for `_hoodie_record_key` as a virtual column

2020-08-21 Thread Abhishek Modi
@sivabalan the current plan is to only add this for hoodie_record_key. But I'm hoping to make the implementation general enough to add other columns as well going forward :) On Fri, Aug 21, 2020 at 11:49 AM Sivabalan wrote: > +1 for virtual record keys. Do you also propose to generalize this

Re: [DISCUSS] Codestyle: force multiline indentation

2020-08-21 Thread Sivabalan
totally +1 for spotless. On Thu, Aug 20, 2020 at 8:53 AM leesf wrote: > +1 on using mvn spotless:apply to fix the codestyle. > > On Wed, Aug 19, 2020 at 12:31 PM Bhavani Sudha wrote: > > > +1 on auto code formatting. I also think it should be okay to be even > more > > restrictive by failing builds when the

Re: [DISCUSS] Support for `_hoodie_record_key` as a virtual column

2020-08-21 Thread Sivabalan
+1 for virtual record keys. Do you also propose to generalize this for partition path as well? On Fri, Aug 21, 2020 at 4:20 AM Pratyaksh Sharma wrote: > This is a good option to have. :) > > On Thu, Aug 20, 2020 at 11:25 PM Vinoth Chandar wrote: > > > IIRC _hoodie_record_key was supposed to

Null-value for required field Error

2020-08-21 Thread selvaraj periyasamy
Hi, with version 0.5.0 of Hudi, I am using the COW table type, partitioned by mmdd format. We already have a table with Array type columns and data populated. We are now trying to add a new column ("rule_profile_id_list") in dataframes and, while trying to write, getting below

Re: [VOTE] Release 0.6.0, release candidate #1

2020-08-21 Thread Bhavani Sudha
Vino yang, I am working on the release blog. While the RC is in progress, the doc and site updates are happening this week. Thanks, Sudha On Fri, Aug 21, 2020 at 4:23 AM vino yang wrote: > +1 from my side > > I checked: > > - ran `mvn clean package` [OK] > - ran `mvn test` in my local [OK] >

Re: Re: [VOTE] Release 0.6.0, release candidate #1

2020-08-21 Thread Trevor Zhang
I checked: - ran `mvn clean package` [OK] - ran `mvn test` in my local [OK] On Fri, Aug 21, 2020 at 7:39 PM Trevor wrote: > > +1 > -- > Trevor > > > *From:* vino yang > *Date:* 2020-08-21 19:23 > *To:* dev > *Subject:* Re: [VOTE] Release 0.6.0, release candidate #1 > +1 from my

Re: [DISCUSS] Support for `_hoodie_record_key` as a virtual column

2020-08-21 Thread Pratyaksh Sharma
This is a good option to have. :) On Thu, Aug 20, 2020 at 11:25 PM Vinoth Chandar wrote: > IIRC _hoodie_record_key was supposed to be this standardized key field. :) > Anyways, it's good to provide this option to the user. > So +1 for an RFC/further discussion. > > To level set, I want to also share

Re: [Question] How to use Hudi for migrating a historical mysql table?

2020-08-21 Thread Pratyaksh Sharma
Hi Gurudatt, You can use Debezium for migrating historical data as well. Using Debezium will enable you to migrate existing as well as new data using DeltaStreamer. I have used it in my previous org for the same use case. On Fri, Aug 21, 2020 at 12:30 PM wowtua...@gmail.com wrote: > > You can

Re: [Question] How to use Hudi for migrating a historical mysql table?

2020-08-21 Thread wowtua...@gmail.com
You can use Kafka to subscribe to the MySQL binlog, then consume the historical data directly. For details, please refer to [1] [1] http://hudi.apache.org/docs/writing_data.html wowtua...@gmail.com From: Gurudatt Kulkarni Date: 2020-08-21 15:18 To: dev Subject: [Question] How to use Hudi for

[Question] How to use Hudi for migrating a historical mysql table?

2020-08-21 Thread Gurudatt Kulkarni
Hi All, I have a use case where there is historical data available in a MySQL table which is being populated by a Kafka topic. My plan is to create a Spark job that will migrate data from MySQL using Hudi Datasource. Once the migration of historical data is done from MySQL, use Deltastreamer to