Re: [DISCUSS] Hudi with Nifi

2019-09-24 Thread Vinoth Chandar
Sg, let's capture these discussions in the JIRA (a link to the discussion thread should suffice) and we can revisit them one by one. On Mon, Sep 23, 2019 at 8:31 PM Taher Koitawala wrote: > Sure Vinoth, I think we need to try this out and check how it fits together > and how deployable it is. > > On

Re: [DISCUSS] Hudi with Nifi

2019-09-23 Thread Taher Koitawala
Sure Vinoth, I think we need to try this out and check how it fits together and how deployable it is. On Sun, Sep 22, 2019, 7:01 PM Vinoth Chandar wrote: > See a lot of Spark Streaming receiver-based approach code there, which > makes me a bit worried about scalability. > > Nonetheless. API

Re: [DISCUSS] Hudi with Nifi

2019-09-21 Thread Taher Koitawala
Hi Vinoth, Nifi has the capability to pass data to a custom Spark job. However, that is done through a StreamingContext, and I'm not sure if we can build something on this. I'm trying to wrap my head around how to fit the StreamingContext into our existing code. Here is an example:

Re: [DISCUSS] Hudi with Nifi

2019-09-18 Thread Taher Koitawala
I think we will have to make a Nifi Processor. The Nifi processor should host everything we do with Spark to write data. We will have to scope out the work on this and on compactions. Regards, Taher Koitawala On Wed, Sep 18, 2019, 8:30 PM Suneel Marthi wrote: > Adding Nifi dev@ to this thread. > > >

Re: [DISCUSS] Hudi with Nifi

2019-09-18 Thread Suneel Marthi
Adding Nifi dev@ to this thread. On Wed, Sep 18, 2019 at 10:57 AM Vinoth Chandar wrote: > Not too familiar with Nifi myself. Would this still target a use case like > what Pratyaksh mentioned? > For delta streamer specifically, we are moving more and more towards > continuous mode, where >

Re: [DISCUSS] Hudi with Nifi

2019-09-18 Thread Vinoth Chandar
Not too familiar with Nifi myself. Would this still target a use case like what Pratyaksh mentioned? For delta streamer specifically, we are moving more and more towards continuous mode, where Hudi writing and compaction are managed by a single long-running Spark application. Would Nifi

Re: [DISCUSS] Hudi with Nifi

2019-09-18 Thread Taher Koitawala
That's another way of doing things. I want to know if someone has written something like PutParquet that can write data directly to Hudi. AFAIK no one has. That would be really powerful. On Wed, Sep 18, 2019, 1:37 PM Pratyaksh Sharma wrote: > Hi Taher, > > In the initial phase of our

Re: [DISCUSS] Hudi with Nifi

2019-09-18 Thread Pratyaksh Sharma
Hi Taher, In the initial phase of our CDC pipeline, we were using Hudi with Nifi. Nifi was used to read the MySQL binlog file and push that data to a Kafka topic. This topic was then consumed by DeltaStreamer. So Nifi was indirectly involved in that flow. On Wed, Sep 18, 2019

[DISCUSS] Hudi with Nifi

2019-09-17 Thread Taher Koitawala
Hi All, Just wanted to know whether anyone has tried to write data to Hudi with a Nifi flow? Perhaps just a CSV file on local disk to a Hudi dataset? If not, then let's try that! Regards, Taher Koitawala