Re: PR 7984 implements hash partitioning

2023-02-16 Thread Alexey Kudinkin
Thanks for your contribution, Lvhu! I think we should actually kick-start this effort with a small RFC outlining the proposed changes first, as this modifies the core read-flow for all Hudi tables and we want to make sure our approach there is rock-solid. On Thu, Feb 16, 2023 at 6:34 AM 吕虎

Re: [VOTE] Release 0.12.2, release candidate #1

2022-12-23 Thread Alexey Kudinkin
+1 (non-binding) [OK] Built successfully for Spark 2.4, 3.x [OK] Run Spark SQL tests On Fri, Dec 23, 2022 at 12:19 PM Y Ethan Guo wrote: > +1 non-binding > > [OK] checksums and signatures > [OK] ran release validation script > [OK] built successfully (Spark 2.4, 3.3) > [OK] Spark 3.3.1

Re: RFC-46 Status Update

2022-12-06 Thread Alexey Kudinkin
t and ask RM to > cherry-pick the commits until Friday. Sorry, I also don't want to drag > RFC-46 anymore, open to hear others thoughts. > > On Mon, 5 Dec 2022 at 09:56, Alexey Kudinkin wrote: > > > Hey, everyone! > > > > Long-awaited RFC-46 update: It took the c

Re: RFC-46 Status Update

2022-12-05 Thread Alexey Kudinkin
the code-freeze as soon as PR lands, but reserve 24h for us to have ample buffer just in case. Let me know if you have any questions or concerns with this plan Alexey, on behalf of RFC-46 team On Wed, Sep 28, 2022 at 9:17 AM Alexey Kudinkin wrote: > @Ken yes, that's the plan eventua

[RFC] RFC-64: New APIs to facilitate faster Query Engine integrations

2022-11-10 Thread Alexey Kudinkin
Hello, everyone! Recently we've been hard at work holistically evaluating how we can streamline our current integration model and enable faster turnaround time for building new Query Integrations. Unequivocally, we already have an impressive portfolio of integrations natively supporting Hudi

Re: [Discuss] SCD-2 Payload

2022-10-24 Thread Alexey Kudinkin
Hey, hey, Fengjian! With the landing of RFC-46 we'll be kick-starting the process of phasing out HoodieRecordPayload as an abstraction and migrating to the HoodieRecordMerger interface instead. I'd recommend basing your design considerations on the new HoodieRecordMerger interface instead of

Re: [ANNOUNCE] Apache Hudi 0.12.1 released

2022-10-19 Thread Alexey Kudinkin
Thanks Zhaojing for masterfully navigating this release! On Wed, Oct 19, 2022 at 7:46 AM Vinoth Chandar wrote: > Great job everyone! > > On Wed, Oct 19, 2022 at 07:11 zhaojing yu wrote: > > > The Apache Hudi team is pleased to announce the release of Apache Hudi > > 0.12.1. > > > > Apache Hudi

Re: [DISCUSS] Hudi data TTL

2022-10-18 Thread Alexey Kudinkin
That's a very interesting idea. Do you want to take a stab at writing a full proposal (in the form of an RFC) for it? On Tue, Oct 18, 2022 at 10:20 AM Bingeng Huang wrote: > Hi all, > > Do we have a plan to integrate data TTL into HUDI, so we don't have to > schedule an offline spark job to delete

[Action Required] Spark Bloom Index Metadata Regression in 0.12

2022-10-11 Thread Alexey Kudinkin
Hello, everyone! Recently a regression was discovered in the Hudi 0.12 release, related to Bloom Index metadata persisted within Parquet footers (HUDI-4992). The crux of the problem was that min/max statistics for the record keys were computed incorrectly

Re: [VOTE] Release 0.12.1, release candidate #1

2022-10-05 Thread Alexey Kudinkin
e validation script > - [OK] built successfully > - [OK] error injection tests > - [OK] table upgrade and downgrade tests > > On Tue, Oct 4, 2022 at 11:06 PM zhaojing yu wrote: > > > This commit has been reverted in version 0.12.1. > > > > Alex

Re: [VOTE] Release 0.12.1, release candidate #1

2022-10-04 Thread Alexey Kudinkin
-1 Unfortunately, we will have to revert commit 830e35c3f1d5663c9e96d36da4af67928e9d598b, as it introduces a performance regression that the author is currently working to address. On Tue, Oct 4, 2022 at 10:08 AM Sivabalan wrote: > Sorry about that. Raymond referred me to apache policy around

Re: [DISCUSS] Build tool upgrade

2022-10-03 Thread Alexey Kudinkin
I think the full project build is slowly gravitating towards 15min already (it’s about 12-14min on my 2021 Macbook). @Vinoth the most important thing that Maven couldn’t provide us with is local incremental builds. Currently you have to build the full dependency hierarchy of the project whenever you’re

Re: RFC-46 Status Update

2022-09-28 Thread Alexey Kudinkin
ating RowData records? > > > > — Ken > > > > > > > On Sep 27, 2022, at 2:08 PM, Alexey Kudinkin > wrote: > > > > > > Hello, everyone! > > > > > > As you might be aware, community has been very busy at work on RFC-46 > >

RFC-46 Status Update

2022-09-27 Thread Alexey Kudinkin
Hello, everyone! As you might be aware, the community has been very busy at work on RFC-46, aiming to bring a long-awaited, cutting-edge level of performance to Hudi by avoiding the use of Avro as an intermediate representation, instead relying on individual engines to host data in their own formats

Re: 0.12.1 release timeline

2022-09-27 Thread Alexey Kudinkin
> >> > > Hi Zhaojing, > >> > > > >> > > It would be good if we can land the following bootstrap fixes for > >> 0.12.1 > >> > > release. I'm working on getting them merged. > >> > > > >> > > HUD

Re: 0.12.1 release timeline

2022-09-20 Thread Alexey Kudinkin
There are also a few critical issues we want to address before cutting the 0.12.1 release: HUDI-4760, HUDI-3636, HUDI-4885, HUDI-2780

Survey around using Apache Orc in Hudi

2022-09-16 Thread Alexey Kudinkin
Hello, everyone! We have recently discovered that Apache Orc support is unfortunately broken in the Spark 3.x modules of Apache Hudi, due to the fact that Spark 3.x switched from the "nohive" flavor of Apache Orc to the conventional one (which

Re: [VOTE] Release 0.12.0, release candidate #1

2022-08-01 Thread Alexey Kudinkin
Hello, everyone! We've found that Orc support is broken for Spark >= 3.1, and we'd really like to make sure a fix for this makes it into 0.12. -1 from my end. On Sun, Jul 31, 2022 at 11:51 PM Danny Chan wrote: > Hi,

Re: [DISSCUSS][NEW FEATURE] Hudi Lake Manager

2022-04-21 Thread Alexey Kudinkin
Hey, folks! I feel there's quite a bit of confusion in this thread, so let's try to clear it up: my understanding (please correct me if I'm wrong) is that Lake Manager was referred to as a service in the same sense in which we call compaction, clustering and cleaning *table services*. So,

Re: [VOTE] Release 0.11.0, release candidate #2

2022-04-18 Thread Alexey Kudinkin
-1 Found a pretty substantial perf degradation in 0.11 RC2 as compared to a vanilla Parquet table in Spark (which is currently being addressed). More details can be found in HUDI-3891. On Mon, Apr 18, 2022 at 4:31 PM Y Ethan Guo wrote: > -1 > The

Re: [DISCUSS] New RFC to create LogCompaction action for MOR tables?

2022-03-21 Thread Alexey Kudinkin
Hello, everyone! @Surya, first of all, I wanted to say that I think this is a great proposal! > A new compaction strategy can be added, but we thought it might > complicate the existing logic and need to rely on some hacks, especially > since Compaction action writes to a base file and places a

Unbundling "spark-avro" dependency

2022-03-08 Thread Alexey Kudinkin
Hello, everyone! While working on HUDI-3549, we were surprised to discover that Hudi actually bundles the "spark-avro" dependency *by default*. This is problematic because "spark-avro" is tightly coupled with some of the other Spark components making up