Re: [DISCUSS] Hudi is the data lake platform

2021-08-04 Thread Vinoth Chandar
Folks, I have been digesting some feedback on what we show on the home page itself. While the blog explains the vision, it might be good to bubble up sub-areas that are more relevant to our users today, such as transactions, updates, and deletes. So, I have raised a PR moving stuff around. Now we lead

Re: [DISCUSS] Hudi is the data lake platform

2021-08-02 Thread Vinoth Chandar
Thanks! Will work on it this week. Also redoing some images based on feedback. On Fri, Jul 30, 2021 at 2:06 AM vino yang wrote: > +1 > > On Fri, Jul 30, 2021 at 1:47 AM Pratyaksh Sharma wrote: > > > Guess we should rebrand Hudi on README.md file as well - > > https://github.com/apache/hudi#readme? > > > >

Re: [DISCUSS] Hudi is the data lake platform

2021-07-30 Thread vino yang
+1 On Fri, Jul 30, 2021 at 1:47 AM Pratyaksh Sharma wrote: > Guess we should rebrand Hudi on README.md file as well - > https://github.com/apache/hudi#readme? > > This page still mentions the following - > > "Apache Hudi (pronounced Hoodie) stands for Hadoop Upserts Deletes and > Incrementals. Hudi manages

Re: [DISCUSS] Hudi is the data lake platform

2021-07-29 Thread Pratyaksh Sharma
Guess we should rebrand Hudi on README.md file as well - https://github.com/apache/hudi#readme? This page still mentions the following - "Apache Hudi (pronounced Hoodie) stands for Hadoop Upserts Deletes and Incrementals. Hudi manages the storage of large analytical datasets on DFS (Cloud

Re: [DISCUSS] Hudi is the data lake platform

2021-07-23 Thread Vinoth Chandar
Thanks Vino! Got a bunch of emoticons on the PR as well. Will land this Monday, giving it more time over the weekend as well. On Wed, Jul 21, 2021 at 7:36 PM vino yang wrote: > Thanks vc > > Very good blog, in-depth and forward-looking. Learned! > > Best, > Vino > > Vinoth Chandar

Re: [DISCUSS] Hudi is the data lake platform

2021-07-21 Thread vino yang
Thanks vc Very good blog, in-depth and forward-looking. Learned! Best, Vino On Thu, Jul 22, 2021 at 3:58 AM Vinoth Chandar wrote: > Expanding to users@ as well. > > Hi all, > > Since this discussion, I started to pen down a coherent strategy and convey > these ideas via a blog post. > I have also done my

Re: [DISCUSS] Hudi is the data lake platform

2021-07-21 Thread Vinoth Chandar
Expanding to users@ as well. Hi all, Since this discussion, I started to pen down a coherent strategy and convey these ideas via a blog post. I have also done my own research, talked to (ex)colleagues I respect to get their take and refine it. Here's a blog that hopefully explains this vision.

Re: [DISCUSS] Hudi is the data lake platform

2021-04-21 Thread wei li
+1, cannot agree more. *aux metadata* and the metatable can give Hudi large performance optimizations on the query side, and we can continue developing them. A cache service may be a necessary component in a cloud-native environment. On 2021/04/13 05:29:55, Vinoth Chandar wrote: > Hello all, > > Reading one

Re: [DISCUSS] Hudi is the data lake platform

2021-04-19 Thread Vinoth Chandar
Looks like we have consensus here! Will share the blog PR here once ready. Thanks all! On Fri, Apr 16, 2021 at 8:43 PM Sivabalan wrote: > totally +1 on clarifying Hudi's vision. > > On Wed, Apr 14, 2021 at 3:43 AM nishith agarwal > wrote: > > > +1 > > > > I also believe Hudi is a Data

Re: [DISCUSS] Hudi is the data lake platform

2021-04-16 Thread Sivabalan
totally +1 on clarifying Hudi's vision. On Wed, Apr 14, 2021 at 3:43 AM nishith agarwal wrote: > +1 > > I also believe Hudi is a Data Platform technology providing many different > functionalities to build modern data lakes, Hudi's table format being just > one of them. I've been using this

Re: [DISCUSS] Hudi is the data lake platform

2021-04-14 Thread nishith agarwal
+1 I also believe Hudi is a Data Platform technology providing many different functionalities to build modern data lakes, Hudi's table format being just one of them. I've been using this perspective in some of the conference talks already ;) With this rebranding (and hopefully some code/package

Re: [DISCUSS] Hudi is the data lake platform

2021-04-13 Thread Vinoth Chandar
Thanks everyone for the feedback, so far! On the incremental aspects, that's actually Hudi's core design differentiation. While I believe the ETL today is still largely batch oriented, the way forward for everyone's benefit is indeed - incremental processing. We have already taken a giant step

Re: [DISCUSS] Hudi is the data lake platform

2021-04-13 Thread Danny Chan
+1 for the vision. Personally, I find the incremental ETL part promising: with an engine like Apache Flink we can do intermediate aggregation in streaming style. Best, Danny Chan On Wed, Apr 14, 2021 at 9:52 AM leesf wrote: > +1. Cool and promising. > > On Wed, Apr 14, 2021 at 2:57 AM Mehrotra, Udit wrote: > > > Agree with the

Re: [DISCUSS] Hudi is the data lake platform

2021-04-13 Thread leesf
+1. Cool and promising. On Wed, Apr 14, 2021 at 2:57 AM Mehrotra, Udit wrote: > Agree with the rebranding Vinoth. Hudi is not just a "table format" and we > need to do justice to all the cool auxiliary features/services we have > built. > > Also, timeline metadata service in particular would be a really big

Re: [DISCUSS] Hudi is the data lake platform

2021-04-13 Thread Mehrotra, Udit
Agree with the rebranding Vinoth. Hudi is not just a "table format" and we need to do justice to all the cool auxiliary features/services we have built. Also, timeline metadata service in particular would be a really big win if we move towards something like that. On 4/13/21, 11:01 AM,

Re: [DISCUSS] Hudi is the data lake platform

2021-04-13 Thread Pratyaksh Sharma
Definitely we are doing much more than only ingesting and managing data over DFS. +1 from my side as well. :) On Tue, Apr 13, 2021 at 10:02 PM Susu Dong wrote: > I love this rebranding. Totally agree. +1 > > On Wed, Apr 14, 2021 at 1:25 AM Raymond Xu > wrote: > > > +1 The vision looks

Re: [DISCUSS] Hudi is the data lake platform

2021-04-13 Thread Susu Dong
I love this rebranding. Totally agree. +1 On Wed, Apr 14, 2021 at 1:25 AM Raymond Xu wrote: > +1 The vision looks fantastic. > > On Tue, Apr 13, 2021 at 7:45 AM Gary Li wrote: > > > Awesome summary of Hudi! +1 as well. > > > > Gary Li > > On 2021/04/13 14:13:24, Rubens Rodrigues > > wrote: >

Re: [DISCUSS] Hudi is the data lake platform

2021-04-13 Thread Raymond Xu
+1 The vision looks fantastic. On Tue, Apr 13, 2021 at 7:45 AM Gary Li wrote: > Awesome summary of Hudi! +1 as well. > > Gary Li > On 2021/04/13 14:13:24, Rubens Rodrigues > wrote: > > Excellent, I agree > > > > On Tue, Apr 13, 2021 at 07:23, vino yang > wrote: > > > > > +1 Excited by

Re: [DISCUSS] Hudi is the data lake platform

2021-04-13 Thread vbal...@apache.org
++1. The rewording makes total sense Balaji.V On Tuesday, April 13, 2021, 07:45:16 AM PDT, Gary Li wrote: Awesome summary of Hudi! +1 as well. Gary Li On 2021/04/13 14:13:24, Rubens Rodrigues wrote: > Excellent, I agree > > On Tue, Apr 13, 2021 at 07:23, vino yang wrote: >

Re: [DISCUSS] Hudi is the data lake platform

2021-04-13 Thread Gary Li
Awesome summary of Hudi! +1 as well. Gary Li On 2021/04/13 14:13:24, Rubens Rodrigues wrote: > Excellent, I agree > > On Tue, Apr 13, 2021 at 07:23, vino yang wrote: > > > +1 Excited by this new vision! > > > > Best, > > Vino > > > > On Tue, Apr 13, 2021 at 3:53 PM Dianjin Wang wrote: > > > > > +1

Re: [DISCUSS] Hudi is the data lake platform

2021-04-13 Thread Rubens Rodrigues
Excellent, I agree On Tue, Apr 13, 2021 at 07:23, vino yang wrote: > +1 Excited by this new vision! > > Best, > Vino > > On Tue, Apr 13, 2021 at 3:53 PM Dianjin Wang wrote: > > > +1 The new brand is straightforward, a better description of Hudi. > > > > Best, > > Dianjin Wang > > > > > > On Tue, Apr

Re: [DISCUSS] Hudi is the data lake platform

2021-04-13 Thread vino yang
+1 Excited by this new vision! Best, Vino On Tue, Apr 13, 2021 at 3:53 PM Dianjin Wang wrote: > +1 The new brand is straightforward, a better description of Hudi. > > Best, > Dianjin Wang > > > On Tue, Apr 13, 2021 at 1:41 PM Bhavani Sudha > wrote: > > > +1 . Cannot agree more. I think this makes total

Re: [DISCUSS] Hudi is the data lake platform

2021-04-13 Thread Dianjin Wang
+1 The new brand is straightforward, a better description of Hudi. Best, Dianjin Wang On Tue, Apr 13, 2021 at 1:41 PM Bhavani Sudha wrote: > +1 . Cannot agree more. I think this makes total sense and will provide for > a much better representation of the project. > > On Mon, Apr 12, 2021 at

Re: [DISCUSS] Hudi is the data lake platform

2021-04-12 Thread Bhavani Sudha
+1 . Cannot agree more. I think this makes total sense and will provide for a much better representation of the project. On Mon, Apr 12, 2021 at 10:30 PM Vinoth Chandar wrote: > Hello all, > > Reading one more article today, positioning Hudi, as just a table format, > made me wonder, if we have

[DISCUSS] Hudi is the data lake platform

2021-04-12 Thread Vinoth Chandar
Hello all, Reading one more article today positioning Hudi as just a table format made me wonder if we have done enough justice in explaining what we have built together here. I tend to think of Hudi as the data lake platform, which has the following components, of which - one is a table