Re: [DISCUSS] Hudi Reverse Streamer

2023-08-21 Thread Pratyaksh Sharma
Hi Vinoth,

I have raised a PR here - https://github.com/apache/hudi/pull/9492.
Let us continue the discussion there.


Re: [DISCUSS] Hudi Reverse Streamer

2023-08-16 Thread Vinoth Chandar
Hi Pratyaksh,

Are you still actively driving this?



Re: [DISCUSS] Hudi Reverse Streamer

2023-07-11 Thread Pratyaksh Sharma
Update: I will be raising the initial draft of the RFC in the next couple of
days.



Re: [DISCUSS] Hudi Reverse Streamer

2023-06-14 Thread Rajesh Mahindra
Great. We also need it for use cases of loading data into warehouses, and
would love to help.


-- 
Take Care,
Rajesh Mahindra


Re: [DISCUSS] Hudi Reverse Streamer

2023-06-14 Thread Pratyaksh Sharma
Hi,

I missed this email earlier. Sure, let me start an RFC this week, and we can
take it from there.



Re: [DISCUSS] Hudi Reverse Streamer

2023-06-14 Thread Nicolas Paris
Hi, are there any RFCs or ongoing efforts on the reverse DeltaStreamer? We have a use case
for Hudi => Kafka and would enjoy building a more general tool.

However, we need an RFC as a basis to start the effort in the right way.
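For a Hudi => Kafka use case like this one, an early design question is how each Hudi row maps onto a Kafka message. The sketch below is a minimal illustration under stated assumptions, not an existing Hudi API: `to_kafka_message` and the `is_deleted` flag are hypothetical names, while the `_hoodie_*` names are Hudi's standard meta columns. It uses the Hudi record key as the Kafka key and turns deleted rows into tombstones.

```python
import json
from typing import Any, Dict, Optional, Tuple

# Hudi's standard meta columns; a downstream Kafka consumer usually does not need them.
HUDI_META_COLUMNS = {
    "_hoodie_commit_time",
    "_hoodie_commit_seqno",
    "_hoodie_record_key",
    "_hoodie_partition_path",
    "_hoodie_file_name",
}


def to_kafka_message(record: Dict[str, Any]) -> Tuple[bytes, Optional[bytes]]:
    """Map one Hudi row to a (key, value) pair ready for a Kafka producer.

    The Hudi record key becomes the Kafka message key, so with the default
    key-hash partitioner every change to the same record lands in the same
    partition and keeps its order there. Rows carrying the hypothetical
    'is_deleted' flag become tombstones (value None), which a log-compacted
    topic treats as deletes.
    """
    key = str(record["_hoodie_record_key"]).encode("utf-8")
    if record.get("is_deleted"):
        return key, None
    payload = {
        name: value
        for name, value in record.items()
        if name not in HUDI_META_COLUMNS and name != "is_deleted"
    }
    # sort_keys gives a stable byte encoding for identical payloads.
    return key, json.dumps(payload, sort_keys=True).encode("utf-8")


if __name__ == "__main__":
    row = {"_hoodie_record_key": "order-1", "_hoodie_commit_time": "20230614092000000", "amount": 42}
    print(to_kafka_message(row))
    # (b'order-1', b'{"amount": 42}')
```

With kafka-python, for example, each pair could then be sent via `producer.send(topic, key=key, value=value)`. Keying by record key is what makes a compacted topic behave like a changelog of the Hudi table.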



Re: [DISCUSS] Hudi Reverse Streamer

2023-04-11 Thread Vinoth Chandar
Cool. Let's draw up an RFC for this. @pratyaksh, do you want to start one,
given you expressed interest?



Re: [DISCUSS] Hudi Reverse Streamer

2023-04-10 Thread Léo Biscassi
+1
This would be great!

Cheers,


-- 
*Léo Biscassi*
Blog - https://leobiscassi.com



Re: [DISCUSS] Hudi Reverse Streamer

2023-04-03 Thread Pratyaksh Sharma
Hi Vinoth,

I am aligned with the first reason that you mentioned. Better to have a
separate tool to take care of this.



Re: [DISCUSS] Hudi Reverse Streamer

2023-04-03 Thread Vinoth Chandar
+1

I was thinking that we add a new utility and NOT extend DeltaStreamer by
adding a Sink interface, for the following reasons

- It will make it look like a generic Source => Sink ETL tool, which is
actually not our intention to support on Hudi. There are plenty of good
tools for that out there.
- The config management can get a bit hard to understand, since we would
overload ingest and reverse ETL into a single tool. So break it off at the
use-case level?
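To make the config point concrete, here is a minimal sketch, assuming the standalone tool gets its own properties namespace that never overlaps with ingest settings. The prefix `hoodie.reversestreamer.`, the keys under it, and `ReverseStreamerConfig`/`parse_config` are all hypothetical names for illustration; none of them is an existing Hudi config or class.

```python
from dataclasses import dataclass
from typing import Dict

# Hypothetical namespace for a standalone reverse streamer (not a real Hudi prefix).
PREFIX = "hoodie.reversestreamer."
REQUIRED = ("source.table.path", "sink.type")


@dataclass(frozen=True)
class ReverseStreamerConfig:
    table_path: str
    sink_type: str
    sink_props: Dict[str, str]


def parse_config(props: Dict[str, str]) -> ReverseStreamerConfig:
    """Build a config from flat properties, ignoring keys outside our namespace."""
    # Keep only our keys and strip the prefix, so ingest-side settings never leak in.
    ours = {k[len(PREFIX):]: v for k, v in props.items() if k.startswith(PREFIX)}
    missing = [k for k in REQUIRED if k not in ours]
    if missing:
        raise ValueError(f"missing required keys: {[PREFIX + k for k in missing]}")
    # Everything under "sink." except the type selector is passed to the sink as-is.
    sink_props = {
        k[len("sink."):]: v
        for k, v in ours.items()
        if k.startswith("sink.") and k != "sink.type"
    }
    return ReverseStreamerConfig(
        table_path=ours["source.table.path"],
        sink_type=ours["sink.type"],
        sink_props=sink_props,
    )


cfg = parse_config({
    "hoodie.reversestreamer.source.table.path": "/data/hudi/orders_derived",
    "hoodie.reversestreamer.sink.type": "jdbc",
    "hoodie.reversestreamer.sink.jdbc.url": "jdbc:postgresql://db/serving",
    "some.other.tool.key": "ignored",
})
```

The design choice being illustrated: because the reverse tool reads only its own prefix, an ingest config and a reverse-ETL config can live side by side without either tool having to understand the other's keys.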

Thoughts?

David: the PMC does not have control over that. Please see the unsubscribe
instructions here: https://hudi.apache.org/community/get-involved
I'd love to keep this thread about the reverse streamer discussion, so kindly
fork another thread if you want to discuss unsubscribing.

On Fri, Mar 31, 2023 at 1:47 AM Davidiam  wrote:

> Hello Vinoth,
>
> Can you please unsubscribe me?  I have been trying to unsubscribe for
> months without success.
>
> Kind Regards,
> David
>
> Sent from Outlook for Android
> 
> From: Vinoth Chandar 
> Sent: Friday, March 31, 2023 5:09:52 AM
> To: dev 
> Subject: [DISCUSS] Hudi Reverse Streamer
>
> Hi all,
>
> Any interest in building a reverse streaming tool that does the reverse of
> what the DeltaStreamer tool does? It will read a Hudi table incrementally
> (Hudi as the only source) and write out the data to a variety of sinks:
> Kafka, JDBC databases, DFS.
>
> This has come up many times with data warehouse users. Oftentimes, they
> want to use Hudi to speed up or reduce costs on their data ingestion and
> ETL (using Spark/Flink), but want to move the derived data back into a data
> warehouse or an operational database for serving.
>
> What do you all think?
>
> Thanks
> Vinoth
>


Re: [DISCUSS] Hudi Reverse Streamer

2023-03-31 Thread Davidiam
Hello Vinoth,

Can you please unsubscribe me?  I have been trying to unsubscribe for months 
without success.

Kind Regards,
David

Sent from Outlook for Android

From: Vinoth Chandar 
Sent: Friday, March 31, 2023 5:09:52 AM
To: dev 
Subject: [DISCUSS] Hudi Reverse Streamer

Hi all,

Any interest in building a reverse streaming tool that does the reverse of
what the DeltaStreamer tool does? It will read a Hudi table incrementally
(Hudi as the only source) and write out the data to a variety of sinks:
Kafka, JDBC databases, DFS.

This has come up many times with data warehouse users. Oftentimes, they
want to use Hudi to speed up or reduce costs on their data ingestion and
ETL (using Spark/Flink), but want to move the derived data back into a data
warehouse or an operational database for serving.

What do you all think?

Thanks
Vinoth


Re: [DISCUSS] Hudi Reverse Streamer

2023-03-31 Thread Pratyaksh Sharma
+1 to this.

I can help drive some of this work.

On Fri, Mar 31, 2023 at 10:09 AM Prashant Wason 
wrote:

> Could be useful. It may also be useful for backup/replication scenarios
> (keeping a copy of the data in an alternate/cloud DC).
>
> HoodieDeltaStreamer already has the concept of "sources". The reverse
> direction can be implemented with an analogous "sink" concept.
>
> On Thu, Mar 30, 2023 at 8:12 PM Vinoth Chandar  wrote:
>
> > Essentially.
> >
> > Old architecture: (operational database) ==> some tool ==> (data
> > warehouse raw data) ==> SQL ETL ==> (data warehouse derived data)
> >
> > New architecture: (operational database) ==> Hudi DeltaStreamer ==> (Hudi
> > raw data) ==> Spark/Flink Hudi ETL ==> (Hudi derived data) ==> Hudi Reverse
> > Streamer ==> (Data Warehouse/Kafka/Operational Database)
> >
> > On Thu, Mar 30, 2023 at 8:09 PM Vinoth Chandar  wrote:
> >
> > > Hi all,
> > >
> > > Any interest in building a reverse streaming tool that does the reverse
> > > of what the DeltaStreamer tool does? It will read a Hudi table
> > > incrementally (Hudi as the only source) and write out the data to a
> > > variety of sinks: Kafka, JDBC databases, DFS.
> > >
> > > This has come up many times with data warehouse users. Oftentimes,
> > > they want to use Hudi to speed up or reduce costs on their data
> > > ingestion and ETL (using Spark/Flink), but want to move the derived data
> > > back into a data warehouse or an operational database for serving.
> > >
> > > What do you all think?
> > >
> > > Thanks
> > > Vinoth
> > >
> >
>


Re: [DISCUSS] Hudi Reverse Streamer

2023-03-30 Thread Prashant Wason
Could be useful. It may also be useful for backup/replication scenarios
(keeping a copy of the data in an alternate/cloud DC).

HoodieDeltaStreamer already has the concept of "sources". The reverse
direction can be implemented with an analogous "sink" concept.
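One way to picture a "sink" concept mirroring DeltaStreamer's sources: a small interface that accepts one batch plus the commit instant it covers, and reports the last instant it fully wrote so a restart can resume after it. This is a hypothetical sketch; `Sink`, `InMemorySink`, `make_sink` and their method names are illustrative and are not Hudi APIs. Real Kafka, JDBC or DFS sinks would be further implementations of the same interface.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Optional

Record = Dict[str, Any]


class Sink(ABC):
    """Hypothetical write-side counterpart to DeltaStreamer's source concept."""

    @abstractmethod
    def write(self, batch: List[Record], instant: str) -> None:
        """Persist one batch that covers commits up to and including `instant`."""

    @abstractmethod
    def committed_instant(self) -> Optional[str]:
        """Last instant fully written, so a restart can resume after it."""


class InMemorySink(Sink):
    """Toy sink standing in for Kafka, JDBC or DFS targets in tests."""

    def __init__(self) -> None:
        self.rows: List[Record] = []
        self._instant: Optional[str] = None

    def write(self, batch: List[Record], instant: str) -> None:
        self.rows.extend(batch)
        self._instant = instant  # advance only after the rows are stored

    def committed_instant(self) -> Optional[str]:
        return self._instant


def make_sink(kind: str) -> Sink:
    """Pick a sink implementation by name, as a config-driven tool might."""
    registry = {"memory": InMemorySink}
    if kind not in registry:
        raise ValueError(f"unknown sink type: {kind}")
    return registry[kind]()
```

Keeping the checkpoint on the sink side is one possible choice; the tool could equally store it alongside its own state, which is closer to how DeltaStreamer tracks source checkpoints.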

On Thu, Mar 30, 2023 at 8:12 PM Vinoth Chandar  wrote:

> Essentially.
>
> Old architecture: (operational database) ==> some tool ==> (data
> warehouse raw data) ==> SQL ETL ==> (data warehouse derived data)
>
> New architecture: (operational database) ==> Hudi DeltaStreamer ==> (Hudi
> raw data) ==> Spark/Flink Hudi ETL ==> (Hudi derived data) ==> Hudi Reverse
> Streamer ==> (Data Warehouse/Kafka/Operational Database)
>
> On Thu, Mar 30, 2023 at 8:09 PM Vinoth Chandar  wrote:
>
> > Hi all,
> >
> > Any interest in building a reverse streaming tool that does the reverse
> > of what the DeltaStreamer tool does? It will read a Hudi table
> > incrementally (Hudi as the only source) and write out the data to a
> > variety of sinks: Kafka, JDBC databases, DFS.
> >
> > This has come up many times with data warehouse users. Oftentimes, they
> > want to use Hudi to speed up or reduce costs on their data ingestion and
> > ETL (using Spark/Flink), but want to move the derived data back into a
> > data warehouse or an operational database for serving.
> >
> > What do you all think?
> >
> > Thanks
> > Vinoth
> >
>


Re: [DISCUSS] Hudi Reverse Streamer

2023-03-30 Thread Vinoth Chandar
Essentially.

Old architecture: (operational database) ==> some tool ==> (data
warehouse raw data) ==> SQL ETL ==> (data warehouse derived data)

New architecture: (operational database) ==> Hudi DeltaStreamer ==> (Hudi
raw data) ==> Spark/Flink Hudi ETL ==> (Hudi derived data) ==> Hudi Reverse
Streamer ==> (Data Warehouse/Kafka/Operational Database)
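For the last hop of the new architecture (Hudi derived data into an operational database), one plausible design is to upsert by Hudi's `_hoodie_record_key` meta column, so that re-sending a batch after a failure does not create duplicates, and to skip updates older than what the table already holds. The sketch below uses Python's built-in `sqlite3` as a stand-in for a JDBC database (its UPSERT syntax needs SQLite 3.24 or later); the table name `serving_orders`, its columns and `upsert_batch` are assumptions for illustration.

```python
import json
import sqlite3
from typing import Any, Dict, List

Record = Dict[str, Any]


def open_serving_db(path: str = ":memory:") -> sqlite3.Connection:
    """Create the (hypothetical) serving table if it does not exist yet."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS serving_orders ("
        " record_key TEXT PRIMARY KEY,"
        " commit_time TEXT NOT NULL,"
        " payload TEXT NOT NULL)"
    )
    return conn


def upsert_batch(conn: sqlite3.Connection, batch: List[Record]) -> None:
    """Upsert rows keyed on _hoodie_record_key; replaying a batch is idempotent."""
    rows = [
        (
            r["_hoodie_record_key"],
            r["_hoodie_commit_time"],
            # Store only the business columns, dropping Hudi meta columns.
            json.dumps(
                {k: v for k, v in r.items() if not k.startswith("_hoodie_")},
                sort_keys=True,
            ),
        )
        for r in batch
    ]
    with conn:  # one transaction per batch: commit on success, roll back on error
        conn.executemany(
            "INSERT INTO serving_orders (record_key, commit_time, payload) "
            "VALUES (?, ?, ?) "
            "ON CONFLICT(record_key) DO UPDATE SET "
            " commit_time = excluded.commit_time, payload = excluded.payload "
            # Ignore a stale replay that would overwrite a newer commit.
            "WHERE excluded.commit_time >= serving_orders.commit_time",
            rows,
        )
```

The guard on `commit_time` assumes instant times of equal width, so that string order matches time order.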

On Thu, Mar 30, 2023 at 8:09 PM Vinoth Chandar  wrote:

> Hi all,
>
> Any interest in building a reverse streaming tool that does the reverse
> of what the DeltaStreamer tool does? It will read a Hudi table incrementally
> (Hudi as the only source) and write out the data to a variety of sinks:
> Kafka, JDBC databases, DFS.
>
> This has come up many times with data warehouse users. Oftentimes, they
> want to use Hudi to speed up or reduce costs on their data ingestion and
> ETL (using Spark/Flink), but want to move the derived data back into a data
> warehouse or an operational database for serving.
>
> What do you all think?
>
> Thanks
> Vinoth
>
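The core loop of the proposal (read the Hudi table incrementally, write to a sink) can be sketched without Spark: keep a checkpoint holding the last commit instant delivered, pull only records whose `_hoodie_commit_time` is later than it, hand them to a sink, then advance the checkpoint. This roughly mirrors a Hudi Spark incremental query driven by `hoodie.datasource.query.type=incremental` and `hoodie.datasource.read.begin.instanttime` (where the begin instant is exclusive), but the in-memory table, the `deliver` callback, `pull_increment` and `run_once` are hypothetical stand-ins. It assumes instant times of equal width and ignores concurrent writers whose commits complete out of order.

```python
from typing import Any, Callable, Dict, List, Optional

Record = Dict[str, Any]


def pull_increment(table: List[Record], begin: Optional[str]) -> List[Record]:
    """Records committed strictly after `begin` (all records when begin is None)."""
    return [r for r in table if begin is None or r["_hoodie_commit_time"] > begin]


def run_once(
    table: List[Record],
    checkpoint: Optional[str],
    deliver: Callable[[List[Record]], None],
) -> Optional[str]:
    """Deliver one increment and return the new checkpoint.

    The checkpoint only moves after `deliver` returns, so a failure mid-write
    causes the same increment to be re-sent on the next run (at-least-once).
    """
    batch = pull_increment(table, checkpoint)
    if not batch:
        return checkpoint  # nothing new since the last run
    deliver(batch)
    return max(r["_hoodie_commit_time"] for r in batch)
```

Because delivery is at-least-once in this sketch, the sink side has to tolerate replays, for example by upserting on the record key rather than appending.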