Re: [DISCUSS] Hudi Reverse Streamer

2023-06-14 Thread Rajesh Mahindra
Great. We also need it for use cases that load data into warehouses, and
would love to help.

On Wed, Jun 14, 2023 at 9:06 AM Pratyaksh Sharma wrote:

> Hi,
>
> I missed this email earlier. Sure, let me start an RFC this week and we can
> take it from there.
>
> On Wed, Jun 14, 2023 at 9:20 PM Nicolas Paris wrote:
>
> > Hi, is there any RFC or ongoing effort on the reverse delta streamer? We
> > have a use case to do Hudi => Kafka and would enjoy building a more
> > general tool.
> >
> > However, we need an RFC as a basis to start this effort in the right way.
> >
> > On April 12, 2023 3:08:22 AM UTC, Vinoth Chandar <
> > mail.vinoth.chan...@gmail.com> wrote:
> > >Cool. Let's draw up an RFC for this. @pratyaksh - do you want to start
> > >one, given you expressed interest?
> > >
> > >On Mon, Apr 10, 2023 at 7:32 PM Léo Biscassi wrote:
> > >
> > >> +1
> > >> This would be great!
> > >>
> > >> Cheers,
> > >>
> > >> On Mon, Apr 3, 2023 at 3:00 PM Pratyaksh Sharma <
> > >> pratyaks...@gmail.com> wrote:
> > >>
> > >> > Hi Vinoth,
> > >> >
> > >> > I am aligned with the first reason that you mentioned. Better to have
> > >> > a separate tool to take care of this.
> > >> >
> > >> > On Mon, Apr 3, 2023 at 9:01 PM Vinoth Chandar <
> > >> > mail.vinoth.chan...@gmail.com> wrote:
> > >> >
> > >> > > +1
> > >> > >
> > >> > > I was thinking that we add a new utility and NOT extend DeltaStreamer
> > >> > > by adding a Sink interface, for the following reasons:
> > >> > >
> > >> > > - It will make it look like a generic Source => Sink ETL tool, which
> > >> > > is actually not our intention to support on Hudi. There are plenty
> > >> > > of good tools for that out there.
> > >> > > - The config management can get a bit hard to understand, since we
> > >> > > overload ingest and reverse ETL into a single tool. So break it off
> > >> > > at the use-case level?
> > >> > >
> > >> > > Thoughts?
> > >> > >
> > >> > > David: The PMC does not have control over that. Please see the
> > >> > > unsubscribe instructions here:
> > >> > > https://hudi.apache.org/community/get-involved
> > >> > > I'd love to keep this thread about the reverse streamer discussion,
> > >> > > so kindly fork another thread if you want to discuss unsubscribing.
> > >> > >
> > >> > > On Fri, Mar 31, 2023 at 1:47 AM Davidiam wrote:
> > >> > >
> > >> > > > Hello Vinoth,
> > >> > > >
> > >> > > > Can you please unsubscribe me? I have been trying to unsubscribe
> > >> > > > for months without success.
> > >> > > >
> > >> > > > Kind Regards,
> > >> > > > David
> > >> > > >
> > >> > > > 
> > >> > > > From: Vinoth Chandar 
> > >> > > > Sent: Friday, March 31, 2023 5:09:52 AM
> > >> > > > To: dev 
> > >> > > > Subject: [DISCUSS] Hudi Reverse Streamer
> > >> > > >
> > >> > > > Hi all,
> > >> > > >
> > >> > > > Any interest in building a reverse streaming tool that does the
> > >> > > > reverse of what the DeltaStreamer tool does? It would read a Hudi
> > >> > > > table incrementally (as its only source) and write the data out
> > >> > > > to a variety of sinks: Kafka, JDBC databases, DFS.
> > >> > > >
> > >> > > > This has come up many times with data warehouse users. Oftentimes,
> > >> > > > they want to use Hudi to speed up or reduce the cost of their data
> > >> > > > ingestion and ETL (using Spark/Flink), but want to move the derived
> > >> > > > data back into a data warehouse or an operational database for
> > >> > > > serving.
> > >> > > >
> > >> > > > What do you all think?
> > >> > > >
> > >> > > > Thanks
> > >> > > > Vinoth
> > >> > > >
> > >> > >
> > >> >
> > >>
> > >>
> > >> --
> > >> *Léo Biscassi*
> > >> Blog - https://leobiscassi.com
> > >>
> > >>
> >
>


-- 
Take Care,
Rajesh Mahindra
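The reverse streamer proposed in this thread (read a Hudi table incrementally as its only source, then write to Kafka, JDBC databases or DFS) comes down to a checkpointed pull-and-push loop. The following is a minimal, hypothetical Python sketch of that loop, not a Hudi API: `Sink`, `MemorySink`, `ReverseStreamer` and `fetch_since` are illustrative names. It assumes every row carries Hudi's `_hoodie_commit_time` meta column and that the commit instants of one table order correctly as strings; in a real tool, `fetch_since` would be backed by a Hudi incremental query.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Protocol

Row = Dict[str, object]


class Sink(Protocol):
    """Hypothetical sink interface; Kafka, JDBC or DFS writers would implement it."""

    def write(self, rows: List[Row]) -> None: ...


@dataclass
class MemorySink:
    """Toy sink that collects rows, standing in for a real Kafka/JDBC/DFS writer."""

    rows: List[Row] = field(default_factory=list)

    def write(self, rows: List[Row]) -> None:
        self.rows.extend(rows)


@dataclass
class ReverseStreamer:
    # Hypothetical: fetch_since(instant) returns rows committed strictly after
    # `instant` (all rows when instant is None), e.g. via an incremental query.
    fetch_since: Callable[[Optional[str]], List[Row]]
    sink: Sink
    checkpoint: Optional[str] = None  # last commit instant already delivered

    def run_once(self) -> int:
        rows = self.fetch_since(self.checkpoint)
        if not rows:
            return 0
        self.sink.write(rows)
        # Advance only after the sink write returned, so a failed write is
        # replayed on the next run (at-least-once) instead of being skipped.
        self.checkpoint = max(str(r["_hoodie_commit_time"]) for r in rows)
        return len(rows)


# Usage with an in-memory stand-in for a Hudi table's commit history.
table: List[Row] = [
    {"_hoodie_commit_time": "20230614090000", "_hoodie_record_key": "a", "v": 1},
    {"_hoodie_commit_time": "20230614091500", "_hoodie_record_key": "b", "v": 2},
]


def fetch_since(instant: Optional[str]) -> List[Row]:
    return [r for r in table if instant is None or str(r["_hoodie_commit_time"]) > instant]


sink = MemorySink()
streamer = ReverseStreamer(fetch_since, sink)
print(streamer.run_once())  # 2
table.append({"_hoodie_commit_time": "20230614093000", "_hoodie_record_key": "a", "v": 3})
print(streamer.run_once())  # 1
print(streamer.run_once())  # 0
```

Because the checkpoint moves only after a successful write, sinks should tolerate replays (for example, JDBC upserts keyed on the record key). The sketch deliberately leaves out schema evolution, retries and the ordering subtleties of concurrent writers.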


Re: [DISCUSS] Hudi Reverse Streamer

2023-06-14 Thread Pratyaksh Sharma
Hi,

I missed this email earlier. Sure, let me start an RFC this week and we can
take it from there.



Re: [DISCUSS] Hudi Reverse Streamer

2023-06-14 Thread Nicolas Paris
Hi, is there any RFC or ongoing effort on the reverse delta streamer? We have a
use case to do Hudi => Kafka and would enjoy building a more general tool.

However, we need an RFC as a basis to start this effort in the right way.
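For the Hudi => Kafka case mentioned above, one natural design choice is to key each Kafka message by Hudi's `_hoodie_record_key` meta column: Kafka's default partitioner assigns keyed messages to partitions by hashing the key, so all changes to one record land in the same partition and are consumed in order. The sketch below only builds the `(key, value)` byte pairs a producer client would send. `to_kafka_records` is a hypothetical helper, and stripping the meta columns and JSON-encoding the payload are assumptions for illustration, not an agreed format.

```python
import json
from typing import Dict, Iterable, List, Tuple

# Hudi's standard meta columns; this sketch drops them from the Kafka payload.
HOODIE_META_COLUMNS = (
    "_hoodie_commit_time",
    "_hoodie_commit_seqno",
    "_hoodie_record_key",
    "_hoodie_partition_path",
    "_hoodie_file_name",
)


def to_kafka_records(rows: Iterable[Dict[str, object]]) -> List[Tuple[bytes, bytes]]:
    """Hypothetical Hudi => Kafka mapping: key by record key, JSON-encode the rest."""
    records = []
    # Emit in commit order; sorted() is stable, so rows within one commit keep
    # their input order.
    for row in sorted(rows, key=lambda r: str(r["_hoodie_commit_time"])):
        key = str(row["_hoodie_record_key"]).encode("utf-8")
        payload = {k: v for k, v in row.items() if k not in HOODIE_META_COLUMNS}
        value = json.dumps(payload, sort_keys=True).encode("utf-8")
        records.append((key, value))
    return records


rows = [
    {"_hoodie_commit_time": "20230614091500", "_hoodie_record_key": "a", "amount": 5},
    {"_hoodie_commit_time": "20230614090000", "_hoodie_record_key": "a", "amount": 3},
]
for key, value in to_kafka_records(rows):
    print(key, value)
# b'a' b'{"amount": 3}'
# b'a' b'{"amount": 5}'
```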



Re: [DISCUSS] Should we support a service to manage all deltastreamer jobs?

2023-06-14 Thread Pratyaksh Sharma
Hi,

Personally, I am in favour of creating such a UI, where monitoring and
managing configurations are just a click away. That makes life a lot easier
for users. So +1 on the proposal.

I remember that work on this started a long time ago, around 2019. You can
check this RFC

for your reference. I am not sure why that work did not continue, though.


[DISCUSS] Should we support a service to manage all deltastreamer jobs?

2023-06-14 Thread 孔维
Hi, team,


Background:
More and more Hudi users ingest data through DeltaStreamer, which results in a
large number of DeltaStreamer jobs that need to be managed. In our company, we
also manage a large number of DeltaStreamer jobs ourselves, which involves a
lot of operations, maintenance and monitoring work.
If we could provide a DeltaStreamer service to create, manage and monitor all
jobs in a unified manner, it would greatly reduce the burden of managing
DeltaStreamer and at the same time lower the barrier to using it, which would
help promote its adoption.
In addition, since DeltaStreamer already supports configuration hot updates
[https://github.com/apache/hudi/pull/8807], the service could build on that
feature to apply configuration changes without restarting the job.


We hope to provide:
- A web UI that supports creating, managing and monitoring DeltaStreamer
jobs
- Timely configuration changes, using the configuration hot update capability


I don't know whether such a service is in line with the direction the
community is heading, and I look forward to your replies!


Best Regards
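At its core, the proposed service would need a central place where the UI stores each job's configuration, plus a way for a running job to pick up edits without restarting. A minimal, hypothetical Python sketch of that idea follows. `JobRegistry`, `RunningJob` and the `example.batch.size` key are made up for illustration, and version polling is an assumed mechanism, not a description of how the hot-update feature in the pull request linked above works.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

Config = Dict[str, str]


@dataclass
class JobRegistry:
    """Hypothetical central store that a management service or web UI writes to."""

    configs: Dict[str, Tuple[int, Config]] = field(default_factory=dict)

    def put(self, job_id: str, config: Config) -> int:
        # Every edit bumps the job's config version.
        version = self.configs.get(job_id, (0, {}))[0] + 1
        self.configs[job_id] = (version, dict(config))
        return version

    def get(self, job_id: str) -> Tuple[int, Config]:
        version, config = self.configs[job_id]
        return version, dict(config)


@dataclass
class RunningJob:
    """A long-running ingestion job that applies config edits between rounds."""

    job_id: str
    registry: JobRegistry
    version: int = 0
    config: Config = field(default_factory=dict)

    def maybe_reload(self) -> bool:
        version, config = self.registry.get(self.job_id)
        if version <= self.version:
            return False  # nothing new; keep the current config
        self.version, self.config = version, config
        return True

    def run_round(self) -> None:
        self.maybe_reload()  # pick up any edit before the next sync round
        # ... one ingestion round using self.config would run here ...


registry = JobRegistry()
registry.put("orders", {"example.batch.size": "1000"})
job = RunningJob("orders", registry)
job.run_round()
print(job.version, job.config)  # 1 {'example.batch.size': '1000'}
registry.put("orders", {"example.batch.size": "5000"})  # e.g. edited in the UI
job.run_round()
print(job.version, job.config)  # 2 {'example.batch.size': '5000'}
```

Polling a version number between sync rounds keeps the job loop simple; a push-based design, where the service notifies each job, would react faster but needs a channel back into every running job.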