Re: [DISCUSS] moving blog from cwiki to website

2020-04-21 Thread Bhavani Sudha Saktheeswaran
+1

On Tue, Apr 21, 2020 at 10:23 PM tison  wrote:

> Hi Vinoth,
>
> +1 for moving blogs.
>
> cwiki seems to belong to the developers' scope, and the first experience of users
> is more likely to be our website.
>
> Best,
> tison.
>
>
> On Wed, Apr 22, 2020 at 1:09 PM Vinoth Chandar wrote:
>
> > Hi community,
> >
> > How does everyone feel about moving the blogs we have on cwiki now over to
> > the site so they are easier to discover?
> >
> > Thanks
> > Vinoth
> >
>


Re: Apache Hudi on AWS EMR

2020-02-19 Thread Bhavani Sudha Saktheeswaran
Got it. Thanks Udit!

On Wed, Feb 19, 2020 at 2:12 PM Mehrotra, Udit 
wrote:

> Hi Sudha,
>
> Yes, EMR Presto has come with the Hudi jars present in the classpath since
> the 5.28.0 release. If you launch a cluster with Presto you should see it at:
>
> /usr/lib/presto/plugin/hive-hadoop2/hudi-presto-bundle.jar
>
> Thanks,
> Udit
>
>
> On 2/19/20, 1:53 PM, "Bhavani Sudha"  wrote:
>
> Hi Udit,
>
> Just a quick question on Presto EMR. Does EMR Presto support Hudi jars in
> its classpath?
>
> On Tue, Feb 18, 2020 at 12:03 PM Mehrotra, Udit
> 
> wrote:
>
> > The workaround provided by Gary can help with querying Hudi tables through
> > Athena for Copy On Write tables, by basically querying only the latest
> > commit files as standard parquet. It would definitely be worth documenting,
> > as several people have asked for it and I remember providing the same
> > suggestion on Slack earlier. I can add it if I have the perms.
> >
> > >> if I connect to the Hive catalog on EMR, which is able to provide the
> > >> Hudi views correctly, I should be able to get correct results on Athena
> >
> > As Vinoth mentioned, just connecting to the metastore is not enough. Athena
> > would still use its own Presto, which does not support Hudi.
> >
> > As for Hudi support for Athena:
> > Athena does use Presto, but it's their own custom version, and I don't
> > think they yet have the code that the Hudi folks contributed to Presto, i.e. the
> > split annotations etc. Also, they don’t have Hudi jars in the Presto classpath.
> > We are not sure of any timelines for this support, but I have heard that
> > work should start soon.
> >
> > Thanks,
> > Udit
> >
> > On 2/18/20, 11:27 AM, "Vinoth Chandar"  wrote:
> >
> > Thanks everyone for chiming in, especially Gary for the detailed workaround.
> > (Should we add this workaround to the FAQ? Food for thought.)
> >
> > >> if I connect to the Hive catalog on EMR, which is able to provide the
> > >> Hudi views correctly, I should be able to get correct results on Athena
> >
> > Knowing how the Presto/Hudi integration works, simply being able to read
> > from the Hive metastore is not enough. Presto has code to specially recognize
> > Hudi tables and does an additional filtering step, which lets it query the
> > data in there correctly. (Gary's workaround above keeps just 1 version
> > around for a given file (group).)
> >
> > On Mon, Feb 17, 2020 at 11:28 PM Gary Li <
> yanjia.gary...@gmail.com>
> > wrote:
> >
> > > > Hello, I don't have any experience working with Athena but I can share my
> > > > experience working with Impala. There is a workaround.
> > > > By setting Hudi config:
> > > >
> > > >    - hoodie.cleaner.policy=KEEP_LATEST_FILE_VERSIONS
> > > >    - hoodie.cleaner.fileversions.retained=1
> > > >
> > > > Your Hudi dataset will look the same as plain parquet files, so you can
> > > > create a table just like a regular parquet table. Hudi will write a new commit
> > > > first and then delete the older files that have two versions. You need to
> > > > refresh the table metadata store as soon as the Hudi upsert job finishes.
> > > > For Impala, it's simply REFRESH xxx. After Hudi has vacuumed the older
> > > > files and before the table metastore is refreshed, the table will be
> > > > unavailable for queries (1-5 mins in my case).
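
The two cleaner settings quoted above can be kept together as a plain options map. This is only a sketch: the keys and values come from the message, while the `to_properties` helper and the choice to render them as `key=value` lines are assumptions made for illustration and are not part of Hudi.

```python
# The two cleaner settings quoted in the message; nothing else here is Hudi API.
CLEANER_OPTIONS = {
    "hoodie.cleaner.policy": "KEEP_LATEST_FILE_VERSIONS",
    "hoodie.cleaner.fileversions.retained": "1",
}


def to_properties(options):
    """Render options as sorted key=value lines (hypothetical helper)."""
    return "\n".join(f"{key}={value}" for key, value in sorted(options.items()))


print(to_properties(CLEANER_OPTIONS))
```

The same dictionary could be handed to whatever mechanism a job already uses to pass Hudi write options; how that is wired up depends on the writer and is not shown here.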
> > >
> > > > How can we process S3 parquet files (hourly partitioned) through Apache
> > > > Hudi? Is there any streaming layer we need to introduce?
> > > > ---
> > > > Hudi DeltaStreamer supports parquet files. You can do a bulkInsert for the
> > > > first job and then use DeltaStreamer for the upsert job.
> > >
> > > > 3 - What should be the parquet file size and row group size for better
> > > > performance on querying Hudi Dataset?
> > > > --
> > > > That depends on the query engine you are using, and it should be documented
> > > > somewhere. For Impala, the optimal size for query performance is 256MB, but
> > > > a larger file size will make upserts more expensive. The size I personally
> > > > choose is 100MB to 128MB.
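
As a small worked example of the sizes mentioned above, the megabyte figures can be converted to bytes before being put into a writer config. This sketch assumes the limit is set through the `hoodie.parquet.max.file.size` key and that the value is expressed in bytes; the `file_size_options` helper is hypothetical.

```python
MB = 1024 * 1024


def file_size_options(target_mb):
    """Express a target parquet file size in bytes (hypothetical helper)."""
    # Assumption: the writer reads the limit, in bytes, from this key.
    return {"hoodie.parquet.max.file.size": str(target_mb * MB)}


# The sizes discussed in the message: 100MB, 128MB and Impala's 256MB.
for target_mb in (100, 128, 256):
    print(target_mb, file_size_options(target_mb))
```

Smaller targets keep upserts cheaper at the cost of more files per query, which is the trade-off described in the message.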
> > >
> > > Thanks,
> > > Gary
> > >
> > >
> > >
> > > On Mon, Feb 17, 2020 at 9:46 PM Dubey, Raghu
> > 
> > > wrote:
> > >
> > > > > Athena is indeed Presto inside, but there is a lot of custom code which has
> > > > > gone on top of Presto there.
> > > > > A couple of months back I tried running a Glue crawler to catalog a Hudi data
>  

Re: [DISCUSS] Jira integration in slack ?

2020-02-06 Thread Bhavani Sudha Saktheeswaran
I am looking at these Slack integrations to see if they would work:

- Zapier
- Pigeon bot
- MailClark
- IFTTT

If these don't work, I suppose we will have to build a JavaScript webhook
that is integrated with the Slack channel to send an email for new
threads.

Thanks,
Sudha


On Thu, Feb 6, 2020 at 10:12 PM Vinoth Chandar  wrote:

> Looks like everyone feels mirroring slack to ML seems good? It also helps
> us show real engagement numbers on the project.
>
> On Wed, Feb 5, 2020 at 3:25 PM leesf  wrote:
>
> > Hi sudha,
> >
> > Thanks for bringing this discussion up.
> >
> > +1, we could create issues from Slack if it is a bug or feature, and it is
> > convenient to do since Jira and Slack integrate very well.
> >
> > And +1 to the idea of sending threads (discussions or problems) to the ML,
> > since the threads are often highly valuable, and it will make them more
> > searchable for users.
> >
> > Best,
> > Leesf
> >
> > On Wed, Feb 5, 2020 at 10:32 PM vino yang wrote:
> >
> > > Hi Vinoth,
> > >
> > > >> I was actually thinking about a slack to mailing list (only for
> > > >> #general) integration, where threads are mirrored to the mailing list
> > > >> for ease of discovery.. Not sure if thats doable or even a good idea..
> > >
> > > +1 for this idea
> > >
> > > I personally think we should try our best to guide users to use the mailing
> > > list, which is a practice followed by other Apache open source projects.
> > > One of the great benefits of mailing lists is the archiving and
> > > accumulation of knowledge. Some threads will be indexed by search
> > > engines and will guide users who are looking for similar issues. In
> > > addition, as an Apache project, the ASF will count the activity of the mailing
> > > lists (dev, users), which will be very helpful for the successful graduation
> > > of the project.
> > >
> > > Best,
> > > Vino
> > >
> > > On Wed, Feb 5, 2020 at 12:38 PM Vinoth Chandar wrote:
> > >
> > > > Thanks for raising this!
> > > >
> > > > I think a lot of conversations on slack are just user support or questions.
> > > > Not sure if all of them would result in a valid JIRA..  So far, we have
> > > > created JIRAs when finding worthy issues there..
> > > >
> > > > I was actually thinking about a slack to mailing list (only for #general)
> > > > integration, where threads are mirrored to the mailing list for ease of
> > > > discovery.. Not sure if thats doable or even a good idea..
> > > >
> > > >
> > > >
> > > >
> > > > On Tue, Feb 4, 2020 at 2:38 PM Bhavani Sudha <
> bhavanisud...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > I was wondering if we should look into Slack Jira integration to capture
> > > > > activities in Slack. Oftentimes Slack has a lot of useful conversations
> > > > > that can be tracked in a Jira for future reference. Any ideas on how other
> > > > > projects handle this?
> > > > >
> > > > > Thanks,
> > > > > Sudha
> > > > >
> > > >
> > >
> >
>


Re: [DISCUSS] Rename the name of hudi-hadoop-mr module

2020-01-16 Thread Bhavani Sudha Saktheeswaran
+1 to generally renaming the packages. Since this is about renaming for the
purpose of making it user friendly, I am concerned if we make this as
hudi-query-bundle, users might get confused with other modules like
hudi-hive and hudi-spark. And inside packaging module, we further have
bundles specific to spark, hive and presto.

Any suggestions on how to rename broadly to avoid these confusions? Let me
also think and get back.

Thanks,
Sudha

On Wed, Jan 15, 2020 at 9:56 PM vino yang  wrote:

> Hi guys,
>
> I want to start a proposal about refactoring the naming of the
> "hudi-hadoop-mr" module.
>
> IMHO, this module name is not user-friendly and may confuse users,
> because it looks like it is about integrating with MapReduce (although
> I know it references the parquet-mr[1] project).
>
> Based on the purpose of this module (it contains the InputFormat implementations
> for the ReadOptimized, Incremental and Realtime views), I suggest that we
> rename it to "*hudi-query-common*". Then we can also
> rename "hudi-hadoop-mr-bundle" to "*hudi-query-bundle*".
>
> What do you think?
>
> Any thoughts and suggestions are welcome and appreciated.
>
> Best,
> Vino
>
> [1]:
> https://github.com/apache/parquet-mr
>


Re: [DISCUSS] Rework of new web site

2020-01-08 Thread Bhavani Sudha Saktheeswaran
Sorry for the late response. Just catching up on mailing list thread after
vacation.
@lamber-ken The new site looks cool. Thanks for the time and effort you
have put into this.

Thanks,
Sudha



On Tue, Jan 7, 2020 at 11:45 PM lamberken  wrote:

>
>
> Hi @Y Ethan Guo,
>
>
> Thanks, I've already been in touch with ApacheCN.
> http://hudi.apachecn.org is coming.
>
>
> Best,
> Lamber-Ken
>
> At 2020-01-08 15:21:51, "Y Ethan Guo"  wrote:
> >@lamber-ken
> >
> >Got it.  It would be great if the ApacheCN organization can also help
> >translation and promotion.
> >
> >The reason I'm asking about the Chinese docs is that the pages under
> >"Documentation" (e.g.,
> >http://hudi.apache.org/newsite-content/docs/writing_data.html) already have
> >the companion Chinese version on the old website (e.g.,
> >http://hudi.apache.org/cn/writing_data.html).  So if it's not hard to port
> >them to the new website, they are still useful for the users.
> >
> >Best,
> >- Ethan
> >
> >On Tue, Jan 7, 2020 at 11:05 PM lamberken  wrote:
> >
> >>
> >>
> >> Hi @Y Ethan Guo,
> >>
> >>
> >> Thank you very much for your advice, I'll consider adjusting the font
> >> size.
> >>
> >>
> >> For Chinese docs, I talked with @leesf about them before. Our
> >> initial aim is to help users learn Hudi quickly; we should not translate
> >> the whole site, as that doesn't work very well.
> >>
> >>
> >> We can discuss the Chinese docs in a new thread. BTW, we can work with the
> >> ApacheCN organization to translate and promote the Hudi project. The ApacheCN
> >> organization has already translated many popular projects, like Kafka,
> >> Flink, Spark, etc.
> >>
> >>
> >> ApacheCN & Projects
> >> https://github.com/apachecn
> >> https://docs.apachecn.org
> >> http://kafka.apachecn.org
> >> http://flink.apachecn.org
> >> http://storm.apachecn.org
> >> http://spark.apachecn.org
> >>
> >>
> >> Best,
> >> Lamber-Ken
> >>
> >>
> >>
> >> At 2020-01-08 14:05:41, "Y Ethan Guo"  wrote:
> >> >@lamber-ken,  Thanks for the great effort!  The new website looks slick,
> >> >with a much better browsing experience.
> >> >
> >> >One thing I noticed is that there seems to be no link to the Chinese
> >> >version of the docs on the new website.  Wondering where I can find them.
> >> >
> >> >Another minor thing is that the font size of the docs is bigger than the
> >> >old one, so it takes more scrolls to the end of the page.  IMHO, one point
> >> >smaller might be better.
> >> >
> >> >- Ethan
> >> >
> >> >On Tue, Jan 7, 2020 at 3:11 PM lamberken  wrote:
> >> >
> >> >>
> >> >>
> >> >> Hi Pratyaksh Sharma,
> >> >>
> >> >>
> >> >> Good catch!
> >> >>
> >> >> Best,
> >> >> Lamber-ken
> >> >>
> >> >>
> >> >>
> >> >>
> >> >>
> >> >> At 2020-01-07 21:50:54, "Pratyaksh Sharma" 
> >> wrote:
> >> >>
> >> >> Hi lamberken,
> >> >>
> >> >>
> >> >> Thank you for your efforts. The new website definitely looks a lot
> >> better.
> >> >>
> >> >>
> >> >> I found a minor issue. At the top where we are

Re: EMR + HUDI

2019-11-15 Thread Bhavani Sudha Saktheeswaran
This is great news. Kudos to all contributors.

On Fri, Nov 15, 2019 at 10:22 AM Vinoth Chandar  wrote:

> Hello all,
>
> In case you did not notice, AWS EMR now has Hudi support, which should make
> life easier for folks on AWS.
>
>
> https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-hudi.html
>
> Thanks to our wonderful contributors from AWS (Udit & team) for making it
> happen
>
> Thanks
> Vinoth
>


Re: [Discuss] Convenient time for weekly sync meeting

2019-11-12 Thread Bhavani Sudha Saktheeswaran
@Sivabalan Yes. Tuesday 9 - 10 pm PST still continues to be the 1st slot.

On Tue, Nov 12, 2019 at 10:33 AM Sivabalan  wrote:

> As we work out details for 2nd slot, did we narrow down the slot for 1st
> one? Do we have a meeting later today?
>
> On Mon, Nov 11, 2019 at 3:49 PM Vinoth Chandar  wrote:
>
> > yes. sounds good. As of now, its just Kabeer.@kabeer wdyt?
> > @nishith Personally, timing is an issue for me, if you are willing to
> > drive, please go ahead! I ll try to make it if possible
> >
> > On Mon, Nov 11, 2019 at 8:25 AM nishith agarwal 
> > wrote:
> >
> > > Vinoth,
> > >
> > > To meet midway, how about once in 3 weeks for Europe and other time zones?
> > > That works fine for me. In the interest of making the meetings useful for
> > > everyone, we can see how productive the meetings are and the % attendance
> > > for the initial few, and then maybe we can follow a process
> > > where we have an email conversation 3-4 days before the meeting to see
> > > if there are any open items to discuss. If not, we don't necessarily need
> > > the meeting. What do folks think?
> > >
> > > Thanks,
> > > Nishith
> > >
> > > On Sun, Nov 10, 2019 at 5:41 PM Vinoth Chandar 
> > wrote:
> > >
> > > > @kabeer I can additionally join a bi-weekly/monthly call that works for
> > > > Europe and other time zones. Weekly would be hard. Are any of the 2 people
> > > > we could not accommodate interested in this?
> > > >
> > > > On Sat, Nov 9, 2019 at 7:03 AM Kabeer Ahmed 
> > > wrote:
> > > >
> > > > > Dear Sudha
> > > > >
> > > > > It looks like those in Europe will either have to join an early call or
> > > > > follow the weekly minutes-of-the-meeting email. Looking at the poll, it is
> > > > > quite obvious that 9pm to 10pm PST wins the choice.
> > > > > Thank you so much for running the poll and reporting the stats.
> > > > > Kabeer.
> > > > >
> > > > > On Nov 8 2019, at 6:38 pm, Bhavani Sudha 
> > > > wrote:
> > > > > > Thank you all for the prompt response! I realized I didn't add my preferred
> > > > > > times. These are the times that work for me.
> > > > > >
> > > > > > Mon,Tue,Thu - 9pm - 11pm PST
> > > > > > Mon-Thu - 5 am - 6:30 am PST
> > > > > >
> > > > > > Here is the summary from responses:
> > > > > > - From the 11 responses received so far, 9 of 11 people (including all
> > > > > > committers) can attend 9-10 pm PST on Tuesday or Thursday.
> > > > > > - For the Tuesday/Thursday 9:30 pm - 10:30 pm slot, 8 of 11 (not all
> > > > > > committers) can attend the full meeting and 2 of the remaining 3 can join
> > > > > > either the first half or the second half.
> > > > > > - There is also a proposal about adding another meeting for covering the
> > > > > > US/EU times. We can try that, but I am not able to find an overlapping 1 hour
> > > > > > slot that would cover the people who cannot attend the first meeting.
> > > > > >
> > > > > >
> > > > > > Thanks,
> > > > > > Sudha
> > > > > >
> > > > > >
> > > > > > On Thu, Nov 7, 2019 at 5:24 AM Kabeer Ahmed <
> kab...@linuxmail.org>
> > > > > wrote:
> > > > > > > Dear Sudha
> > > > > > > Really appreciate the initiative to promptly start this thread.
> > My
> > > > > > > preferences are as below:
> > > > > > > Any weekday:
> > > > > > > 10PM PST to 11PM PST OR
> > > > > > >
> > > > > > > 10AM PST TO 2PM PST
> > > > > > > thank you
> > > > > > > On Nov 7 2019, at 6:46 am, Pratyaksh Sharma <
> > pratyaks...@gmail.com
> > > >
> > > > > wrote:
> > > > > > > > Interested.
> > > > > > > >
> > > > > > > > Timings:
> > > > > > > > Mon-Fri 6AM-7.30AM PST
> > > > > > > >
> > > > > > > > On Thu, Nov 7, 2019 at 11:33 AM Gurudatt Kulkarni <
> > > > > guruak...@gmail.com>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Interested.
> > > > > > > > > Mon-Thu 5AM-6:30AM PST
> > > > > > > > > Mon-Thu 9PM-10:30PM PST
> > > > > > > > >
> > > > > > > > > These timings work for me.
> > > > > > > > > On Thu, Nov 7, 2019 at 10:20 AM Gary Li <
> > > > yanjia.gary...@gmail.com>
> > > > > > > wrote:
> > > > > > > > > > Interested.
> > > > > > > > > > Mon-Thu 8 PM-11 PM PST.
> > > > > > > > > > It's very difficult to cover America, Europe, and Asia in
> > the
> > > > > same
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > meeting.
> > > > > > > > > > Maybe we can have US&EU and US&CN two sessions and make
> > them
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > > biweekly?
> > > > > > > > > >
> > > > > > > > > > On Wed, Nov 6, 2019 at 7:12 PM Taher Koitawala <
> > > > > taher...@gmail.com>
> > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi All,
> > > > > > > > > > > Mon-Thu 5AM-6:30AM PST
> > > > > > > > > > > Mon-Thu 9PM-10:30PM PST
> > > > > > > > > > >
> > > > > > > > > > > Works for me
> > > > > > > > > > > On Thu, Nov 7, 2019, 7:26 AM Nishith <
> > n3.nas...@gmail.com>
> > > > > wrote:
> > > > > > > > > > > > Following times work for me
> > > > > 

Re: [VOTE] Release 0.5.0-incubating, release candidate #6

2019-10-16 Thread Bhavani Sudha Saktheeswaran
+1 (non-binding)

Thanks,
Sudha

From: Suneel Marthi 
Sent: Wednesday, October 16, 2019 11:46 AM
To: dev@hudi.apache.org
Subject: Re: [VOTE] Release 0.5.0-incubating, release candidate #6

+1 binding

On Wed, Oct 16, 2019 at 2:28 PM Vinoth Chandar  wrote:

> +1 (Binding)
>
> https://gist.github.com/vinothchandar/b558d3a86ffe1e733c54d1305a44ec38 for
> checks
>
> On Wed, Oct 16, 2019 at 11:03 AM vbal...@apache.org 
> wrote:
>
> >
> > Forgot to mention that this release candidate addresses the licensing
> > concerns that came up during voting in general@incubator. The email
> > thread is at:
> > https://lists.apache.org/thread.html/02d40e3dbababc069c5210928aa4dd335c41ab1837d5a894954f5c9f@%3Cgeneral.incubator.apache.org%3E
> >
> > The PR which addresses it:
> > https://github.com/apache/incubator-hudi/pull/953
> >
> > Balaji.V
> >
> > On Wednesday, October 16, 2019, 10:20:35 AM PDT, vbal...@apache.org
> <
> > vbal...@apache.org> wrote:
> >
> > Hi everyone,
> >
> > We have a new release candidate for the first release of Apache
> > Hudi (incubating). The version is: 0.5.0-incubating-rc6. To run the automated
> > source release validation script, please follow the steps below:
> >
> > - If you have not checked out hudi, please do
> >   - git clone g...@github.com:apache/incubator-hudi.git;
> > - If you already have incubator-hudi, please do
> >   - git checkout master && git pull origin master
> > - cd incubator-hudi/scripts;
> > - ./release/validate_staged_release.sh --release=0.5.0 --rc_num=6
> >
> > To compile, run "mvn compile". To run unit tests, run "mvn test".
> >
> > Please review and vote on the release candidate #6 for the version 0.5.0, as
> > follows:
> > [ ] +1, Approve the release
> > [ ] 0, I don't feel strongly about it, but I'm okay with the release
> > [ ] -1, Do not approve the release (please provide specific comments)
> >
> > The complete staging area is available for your review, which includes:
> > - JIRA release notes [1]
> > - The official Apache source release and binary convenience releases to
> >   be deployed to dist.apache.org [2], which are signed with the key with
> >   fingerprint AF9BAF79D311A3D3288E583F24A499037262AAA4 [3],
> > - all artifacts to be deployed to the Maven Central Repository [4]
> > - source code tag "release-0.5.0-incubating-rc6" [5]
> >
> > The vote will be open for at least 72 hours.
> > It is adopted by majority approval, with at least 3 PMC affirmative
> > votes.
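
The adoption rule stated above can be sketched as a small tally function. This is one reading of "majority approval, with at least 3 PMC affirmative votes": it counts only binding votes toward both conditions, which is an assumption, and the `vote_passes` name and the tuple layout are hypothetical.

```python
def vote_passes(votes):
    """Apply the stated rule to (value, binding) pairs; value is +1, 0 or -1."""
    # Assumption: only binding (PMC) votes count toward the majority.
    binding_plus = sum(1 for value, binding in votes if binding and value == 1)
    binding_minus = sum(1 for value, binding in votes if binding and value == -1)
    return binding_plus >= 3 and binding_plus > binding_minus


# The three replies visible in this thread: two binding +1s, one non-binding +1.
replies_so_far = [(1, True), (1, True), (1, False)]
print(vote_passes(replies_so_far))
print(vote_passes(replies_so_far + [(1, True)]))
```

With only the replies shown here the threshold of three binding +1 votes is not yet met; one more binding +1 would satisfy this reading of the rule.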
> > - https://jira.apache.org/jira/secure/ReleaseNote.jspa?projectId=12322822&version=12346087
> > - https://dist.apache.org/repos/dist/dev/incubator/hudi/hudi-0.5.0-incubating-rc6/
> > - https://dist.apache.org/repos/dist/release/incubator/hudi/KEYS
> > - https://repository.apache.org/content/repositories/orgapachehudi-1006/
> > -
> >
> https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_apache_incubator-2

Re: [VOTE] Release 0.5.0-incubating, release candidate #5

2019-10-05 Thread Bhavani Sudha Saktheeswaran
+1 (non-binding)
- verified checksums and signatures [SUCCESS]
- verified RAT check [SUCCESS]
- built from source release (mvn clean install -DskipTests) [SUCCESS]
- ran local docker tests [SUCCESS]
- ran some IDE tests [SUCCESS]
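
The checksum part of the first check above can be sketched as follows. It is a minimal illustration and not the project's validation script: it assumes a SHA-512 digest is published next to the artifact and that the digest file starts with the bare hex string; both function names are hypothetical. Signature verification is a separate step and is not covered.

```python
import hashlib


def sha512_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large release archives need not fit in memory."""
    digest = hashlib.sha512()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(artifact_path, published_digest):
    """Compare the local digest with the published one, ignoring case."""
    # Assumption: the first whitespace-separated token is the hex digest.
    tokens = published_digest.split()
    return bool(tokens) and sha512_of(artifact_path) == tokens[0].lower()
```

A caller would read the published digest file as text and pass its contents as `published_digest`, together with the path of the downloaded source release.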

Thanks,
Sudha

On Sat, Oct 5, 2019 at 5:47 AM Gurudatt Kulkarni 
wrote:

> +1 (non-binding)
>
> Ran the script ./release/validate_staged_release.sh --release=0.5.0
> --rc_num=5
>
> Checksum Check of Source Release - [OK]
>
> Signature Check - [OK]
>
> No Binary Files in Source Release? - [OK]
>
> DISCLAIMER file exists ? [OK]
>
> License file exists ? [OK]
> Notice file exists ? [OK]
>
> Licensing Check Passed [OK]
>
> RAT Check Passed [OK]
>
> Regards,
> Gurudatt
>
>
> On Sat, Oct 5, 2019 at 10:52 AM leesf  wrote:
>
> > +1 (non-binding).
> > Since I got an exception (svn: E170013: Unable to connect to a repository
> > at URL 'https://dist.apache.org/repos/dist/dev/incubator/hudi') while
> > running ./release/validate_staged_release.sh --release=0.5.0
> > --rc_num=5, I checked it manually.
> >
> > - verified checksums and signatures - OK
> > - mvn test install - OK
> > - ran some tests in IDE - OK
> >
> > Best,
> > Leesf
> >
> > On Sat, Oct 5, 2019 at 9:15 AM vbal...@apache.org wrote:
> >
> > > Hi everyone,
> > >
> > > We have a new release candidate for the first release of Apache
> > > Hudi (incubating). The version is: 0.5.0-incubating-rc5. Please note that
> > > previous release candidates RC#3 and RC#4 were not sent for voting as we
> > > discovered compliance issues before we could send them for voting. These
> > > issues were subsequently fixed as part of PR-935 and PR-939, and RC#5 has
> > > been built.
> > >
> > > We also have a new release validation script available in master
> > > to automate the usual checks. To run this:
> > >    - If you have not checked out hudi, please do
> > >       - git clone g...@github.com:apache/incubator-hudi.git;
> > >    - If you already have hudi, please do
> > >       - git checkout master && git pull origin master
> > >    - cd incubator-hudi/scripts;
> > >    - ./release/validate_staged_release.sh --release=0.5.0 --rc_num=5
> > >
> > > Please review and vote on the release candidate #5 for the version 0.5.0,
> > > as follows:
> > > [ ] +1, Approve the release
> > > [ ] -1, Do not approve the release (please provide specific comments)
> > >
> > > The complete staging area is available for your review, which includes:
> > >    - JIRA release notes [1]
> > >    - The official Apache source release and binary convenience releases to
> > >      be deployed to dist.apache.org [2], which are signed with the key with
> > >      fingerprint AF9BAF79D311A3D3288E583F24A499037262AAA4 [3],
> > >    - all artifacts to be deployed to the Maven Central Repository [4]
> > >    - source code tag "release-0.5.0-incubating-rc5" [5]
> > >
> > > The vote will be open for at least 72 hours.
> > > Please cast your votes before *Oct. 9 2019, 19:00 PST*.
> > >
> > > It is adopted by majority approval, with at least 3 PMC affirmative
> > > votes.
> > >    - https://jira.apache.org/jira/secure/ReleaseNote.jspa?projectId=12322822&version=12346087
> > >    - https://dist.apache.org/repos/dist/dev/incubator/hudi/hudi-0.5.0-incubating-rc5/
> > >-
> https://urldefense.proofpoint.com/v2/url?u=https-3A__dist.apache.org_repos_dist_release_incubator_hudi_KEYS&d=DwIFaQ

Re: [DISCUSS] cleaning up git history from Notice/License changes

2019-10-03 Thread Bhavani Sudha Saktheeswaran
+1 . Thats a good idea.



On Thu, Oct 3, 2019 at 2:32 PM vbal...@apache.org 
wrote:

>
> +1 on both cleanups. This would keep the git history clean and consistent
> with contributions.
> Balaji.V
>
> On Thursday, October 3, 2019, 09:53:46 AM PDT, Vinoth Chandar <
> vin...@apache.org> wrote:
>
>  Folks,
>
> As we iterate across the RCs, we have added and removed to the
> NOTICE/LICENSE files a lot. Does anyone feel the need to clean up the
> history and do a one time force push? There is also an issue with github
> contribution stats not showing up everyone's commit (due to email changes
> etc). We could also tackle that
>
> thanks
> vinoth
>


Re: Hudi Parquet Storage Basic Question

2019-09-28 Thread Bhavani Sudha Saktheeswaran
Hi Umesh,

Let me try to answer this. At a very high level, if this table is of type
COPY_ON_WRITE another version of parquet file will be written with all keys
- a,b,c and d. However, if the table is of MERGE_ON_READ type then updates
are stored as avro files allowing for read-time reconciliation.

Depending on number of entries in your inputDF and file size configs, there
could be one or more parquet files produced instead of just 1 parquet file.
You can refer to documentation on File Management here -
https://hudi.apache.org/concepts.html#file-management and different storage
types here - https://hudi.apache.org/concepts.html#file-management

Hope this helps.

Thanks,
Sudha

On Fri, Sep 27, 2019 at 11:38 PM Umesh Kacha  wrote:

> Hi, I have a very basic question regarding how Hudi writes parquet files
> when it finds duplicates/updates/deletes in the daily feed data. Let's say
> we have the following dataframes
>
> val feedDay1DF = Seq(
>   Data("a", "0"),
>   Data("b", "1"),
>   Data("c", "2"),
>   Data("d", "3")
> ).toDF()
>
> I assume that when Hudi stores the above feedDay1DF as parquet, it writes (let's
> assume) just one parquet file with 4 records with keys a,b,c,d
>
> //c and d keys values changed
> val feedDay2DF = Seq(
>   Data("a", "0"),
>   Data("b", "1"),
>   Data("c", "200"),
>   Data("d", "300")
> ).toDF()
>
> Now when we try to store feedDay2DF, assume it will again store one more
> parquet file. The question is: will it store only the two updated records
> (keys c and d), or will it store all keys a,b,c,d in the parquet file? Please
> guide.
>
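
The difference described in the reply above can be sketched with a toy model. This is only an illustration, not Hudi code: plain dicts stand in for parquet base files and avro log files, every function name is hypothetical, and the day-2 batch is assumed to carry only the changed keys c and d.

```python
# Toy model only: dicts stand in for parquet base files and avro log files.
# None of these names are Hudi APIs; they are hypothetical helpers.

def cow_upsert(base_file, updates):
    """COPY_ON_WRITE: produce a new file version that holds every key."""
    new_version = dict(base_file)   # unchanged records (a, b) are copied over
    new_version.update(updates)     # changed records (c, d) are overwritten
    return new_version

def mor_upsert(log_files, updates):
    """MERGE_ON_READ: append the incoming records as one more log file."""
    return log_files + [dict(updates)]

def mor_read(base_file, log_files):
    """Reconcile the base file with its logs at read time; later logs win."""
    merged = dict(base_file)
    for log in log_files:
        merged.update(log)
    return merged

day1 = {"a": "0", "b": "1", "c": "2", "d": "3"}
day2_changes = {"c": "200", "d": "300"}   # assumed: only changed keys arrive

cow_v2 = cow_upsert(day1, day2_changes)
mor_logs = mor_upsert([], day2_changes)
mor_view = mor_read(day1, mor_logs)

print(sorted(cow_v2))       # keys present in the rewritten file version
print(mor_logs)             # what was appended under merge-on-read
print(mor_view == cow_v2)   # both table types expose the same latest view
```

In this model the copy-on-write version contains all four keys even though only two changed, while merge-on-read leaves the day-1 file untouched and pays the merge cost when reading.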


Re: FAQ page

2019-09-24 Thread Bhavani Sudha Saktheeswaran
This is really cool. Thanks for putting this page together Vinoth !


On Tue, Sep 24, 2019 at 7:39 AM Nishith  wrote:

> The FAQ looks awesome Vinoth! Answers most of the questions that folks are
> confused about.
> Hoping folks can contribute more as we uncover more frequently asked
> questions.
>
> - Nishith
>
> > On Sep 23, 2019, at 5:51 PM, vino yang  wrote:
> >
> > Thanks for your great work, Vinoth and Nishith. Will have a look soon.
> >
> > On 09/24/2019 07:51, vbal...@apache.org wrote:
> > +1 Awesome job Vinoth and Nishith for compiling the initial version of FAQ.
> > Agree on the idea of replying using FAQ.
> > Balaji.V
> >
> > On Monday, September 23, 2019, 04:41:03 PM PDT, Vinoth Chandar  wrote:
> > First version of the page is now fully completed.
> > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=113709185
> > Please try to use the FAQs when answering questions on ML and GH. It will
> > only get better if we manage this effectively and keep improving it.
> >
> > On Sun, Sep 15, 2019 at 9:41 PM Vinoth Chandar  wrote:
> > > Thanks! Will work this week to fill out most answers!
> > > Your help reviewing would also be much appreciated.
> > > Will keep this thread posted..
> > >
> > > On Tue, Sep 10, 2019 at 6:10 PM vino yang  wrote:
> > >
> > >> Hi Vinoth,
> > >>
> > >> Great job! Thanks for your efforts!
> > >> I think this page is good for users and developers to let them know Hudi
> > >> well.
> > >>
> > >> Best,
> > >> Vino
> > >>
> > >> On Wed, Sep 11, 2019 at 2:27 AM Vinoth Chandar  wrote:
> > >>
> > >> > Hi all,
> > >> >
> > >> > I wrote a list of questions based on mailing list conversations and
> > >> > issues.
> > >> >
> > >> > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=113709185
> > >> >
> > >> > While I am still working through answers, I thought this can be a good
> > >> > community driven process.
> > >> >
> > >> > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=113709185#Frequentlyaskedquestions(FAQ)-ContributingtoFAQ
> > >> >
> > >> > Please help by contributing answers or new questions if you can!
> > >> >
> > >> > thanks
> > >> > vinoth
> > >> >
>


Re: [PROPOSAL] Hudi Web UI

2019-09-21 Thread Bhavani Sudha Saktheeswaran
+1 for adding web ui. The web ui viz for table configs would be pretty
useful for easy debugging.




On Sat, Sep 21, 2019 at 7:35 PM Vinoth Chandar  wrote:

> +1 will take a look at the doc for specifics in a few days.
>
> On Sat, Sep 21, 2019 at 7:18 PM vino yang  wrote:
>
> > +1 to introduce Hudi web UI. Great suggestion!
> >
> > On 09/21/2019 12:24, Minh Pham wrote:
> > +1. I think an admin UI will help with reusability a lot.
> >
> > On Fri, Sep 20, 2019 at 8:32 PM Vinay Patil  wrote:
> > > Hi Taher,
> > >
> > > I really liked this idea, these details will be valuable to check out on
> > > Web UI.
> > >
> > > +1
> > >
> > > Regards,
> > > Vinay Patil
> > >
> > > On Fri, Sep 20, 2019 at 3:28 PM Taher Koitawala  wrote:
> > >
> > > > Hi All,
> > > > A HIP has been created by me proposing a Hudi Web UI with a lot of
> > > > helpful features. The major motivation of this proposal is that at the
> > > > moment, users have to depend on Spark's configuration to a metrics
> > > > collection system to see Hudi metrics.
> > > >
> > > > Also, a lot of metadata about Hudi that the user doesn't find
> > > > easily after creating tables will be put up on the UI. Please check the
> > > > document for a detailed explanation.
> > > >
> > > > https://docs.google.com/document/d/1oEjukuaK2ltqiD0sjVs5IUzvDzilWF0viwMASXTdEd4/edit?usp=sharing
> > > >
> > > > Regards,
> > > > Taher Koitawala
> > >
>


Re: [DISCUSS][VOTE] DynamoDB Streams support in Hudi

2019-09-21 Thread Bhavani Sudha Saktheeswaran
+1 to adding more connectors to DeltaStreamer and making them as much
pluggable modules as possible like Vino Yang suggested.


On Sat, Sep 21, 2019 at 7:12 PM vino yang  wrote:

> +1 to introduce these connectors. It's nice to see that Hudi's ecosystem
> is growing. As Hudi connects to more and more systems, it is necessary to
> introduce separate modules to place these connectors. This can lead to
> module relayout or code refactoring. Of course, all this needs to be
> discussed in more depth.
>
> Best,
> Vino
>
> On 09/21/2019 18:59, Vinay Patil wrote:
> Hi Taher,
>
> Basically this can be a proposal to support Kinesis, and DynamoDb
> stream support can be enabled by reusing this source code. Flink has
> provided support for DynamoDb Streams by reusing Kinesis Streams classes.
>
> Regards,
> Vinay Patil
>
> On Sat, Sep 21, 2019 at 4:26 PM Taher Koitawala <taher...@gmail.com> wrote:
>
> > That would be a great addition Vinay. How about adding Kinesis as well?
> >
> > Regards,
> > Taher Koitawala
> >
> > On Sat, Sep 21, 2019, 4:20 PM Vinay Patil  wrote:
> >
> > > Hi Team,
> > >
> > > The DynamoDb streams contain the CDC data when enabled on a DynamoDb
> > > table; we can add a source for DeltaStreamer which will enable us to read
> > > this data and write it back either to Hudi dataset or to another sink.
> > >
> > > Thoughts on adding this support in Hudi?
> > >
> > > Regards,
> > > Vinay Patil
> > >


Re: [VOTE] Release 0.5.0-incubating, release candidate #2

2019-09-17 Thread Bhavani Sudha Saktheeswaran
+1 (non-binding)
- verified checksums and signatures [SUCCESS]
- built from source release (mvn clean install -DskipTests) [SUCCESS]
- ran local docker tests [SUCCESS]
- ran some IDE tests [SUCCESS]

Thanks,
Sudha


From: Vinoth Chandar 
Sent: Tuesday, September 17, 2019 8:26 PM
To: dev@hudi.apache.org
Subject: Re: [VOTE] Release 0.5.0-incubating, release candidate #2

+1 binding

## CheckSum (OK)
$ shasum -a 512 hudi-0.5.0-incubating-rc2.src.tgz > sha512
$ diff sha512 hudi-0.5.0-incubating-rc2.src.tgz.sha512.txt | wc -l
0
## Tests (OK)
$ mvn clean install # passed!


## Signature (OK)
$ gpg --import hudi-0.5.0-incubating-rc2/KEYS
...
gpg: Total number processed: 5
gpg: imported: 5

$ gpg --verify hudi-0.5.0-incubating-rc2.src.tgz.asc.txt
hudi-0.5.0-incubating-rc2.src.tgz
gpg: Signature made Tue Sep 17 12:44:16 2019 PDT
gpg: using RSA key AF9BAF79D311A3D3288E583F24A499037262AAA4
gpg: Good signature from "Balaji Varadarajan " [unknown]
gpg: WARNING: This key is not certified with a trusted signature!
gpg: There is no indication that the signature belongs to the
owner.
Primary key fingerprint: AF9B AF79 D311 A3D3 288E 583F 24A4 9903 7262 AAA4


# Disclaimer exists (OK)
$ ls hudi-0.5.0-incubating-rc2/DISCLAIMER
hudi-0.5.0-incubating-rc2/DISCLAIMER

# Notice/License exists (OK)
$ ls hudi-0.5.0-incubating-rc2/NOTICE
hudi-0.5.0-incubating-rc2/NOTICE
$ ls hudi-0.5.0-incubating-rc2/LICENSE
hudi-0.5.0-incubating-rc2/LICENSE

# Already checked last time :
# 1. source files all have ASF license.
# 2. Tested rat plugin fails build if java/scala files don't have license.

On Tue, Sep 17, 2019 at 5:02 PM vbal...@apache.org 
wrote:

> Hi everyone,
>
> We have a new release candidate after addressing issues
> reported in the first release candidate (see email thread). The new version is:
> 0.5.0-incubating-rc2. Please review and vote on the release candidate #2
> for version 0.5.0, as follows:
> [ ] +1, Approve the release
> [ ] -1, Do not approve the release (please provide specific comments)
>
> The complete staging area is available for your review, which includes:
> - JIRA release notes [1]
> - The official Apache source release and binary convenience releases to
> be deployed to http://dist.apache.org [2], which are signed with the key with
> fingerprint AF9BAF79D311A3D3288E583F24A499037262AAA4 [3],
>
> - all artifacts to be deployed to the Maven Central Repository [4]
>
> - source code tag "release-0.5.0-incubating-rc2" [5]
>
> The vote will be open for at least 72 hours.
> Please cast your votes before *Sep. 20 2019, 21:00 UTC*.
>
> It is adopted by majority approval, with at least 3 PMC affirmative
> votes.
> - https://jira.apache.org/jira/secure/ReleaseNote.jspa?projectId=12322822&version=12346087
> - https://dist.apache.org/repos/dist/dev/incubator/hudi/hudi-0.5.0-incubating-rc2/
> - https://dist.apache.org/repos/dist/release/incubator/hudi/KEYS
> - https://repository.apache.org/content/repositories/orgapachehudi-1002/
> - https://github.com/apache/incubator-hudi/tree/release-0.5.0-incubating-rc2
>
> P.S.: As this is the first time the Hudi community will be performing
> release voting, you can look at
> https://lists.apache.org/thread.html/75e40ed5a6e0c3174728a0bcfe86cbcd99ae4778ebe94b839f0674cd@%3Cdev.flink.apache.org%3E
> for
> some understanding of validations community does to cast th

Help unblocking PR 896 to update site

2019-09-17 Thread Bhavani Sudha Saktheeswaran
I am trying to update the hudi site to reflect the latest doc changes since
last update. Since the paths to css and js scripts have been changed to
start with a '/' the styling is broken. I see this change coming from this
PR https://github.com/apache/incubator-hudi/pull/843 . For example
something like 
across multiple files.

Vino Yang,
I am not sure if this is an auto-generated IDE change. Would you be able to
help fix this? Once your change is in, I can rebase and update my PR.

Thanks,
Sudha


Re: [DISCUSS] [VOTE] JDBC incremental load with DeltaStreamer

2019-09-14 Thread Bhavani Sudha Saktheeswaran
+1 I think adding new sources to DeltaStreamer is really valuable.

Thanks,
Sudha

On Sat, Sep 14, 2019 at 7:52 AM vino yang  wrote:

> Hi Taher,
>
> IMO, it's a good supplement to Hudi.
>
> So +1 from my side.
>
> On Sat, Sep 14, 2019 at 10:23 PM Vinoth Chandar  wrote:
>
> > Hi Taher,
> >
> > I am fully onboard on this. This is such a frequently asked question and
> > having it all doable with a simple DeltaStreamer command would be really
> > powerful.
> >
> > +1
> >
> > - Vinoth
> >
> > On 2019/09/14 05:51:05, Taher Koitawala  wrote:
> > > Hi All,
> > > > Currently, we are trying to pull data incrementally from our RDBMS
> > > > sources. However, the way we are doing this with HUDI is to create a
> > > > spark table on top of the JDBC source using [1], which writes raw data
> > > > to an HDFS dir. We then use DeltaStreamer dfs-source to write that to a
> > > > HUDI upsert COPY_ON_WRITE table.
> > > >
> > > > However, I think it would be really helpful in such use cases
> > > > if DeltaStreamer had something like a JDBC-source instead of sqoop or
> > > > temp tables, and then we could leave that in a continuous mode with a
> > > > timestamp column and an interval which allows us to express how
> > > > frequently DeltaStreamer should check for new updates or inserts on
> > > > RDBMS.
> > > >
> > > > 1: CREATE TABLE mysql_temp_table
> > > > USING org.apache.spark.sql.jdbc
> > > > OPTIONS (
> > > >  url "jdbc:mysql://data.source.mysql.com:3306/database?user=mysql_user&password=password&zeroDateTimeBehavior=CONVERT_TO_NULL",
> > > >  dbtable "database.table_name",
> > > >  fetchSize "100",
> > > >  partitionColumn "contact_id", lowerBound "1",
> > > >  upperBound "2962429",
> > > >  numPartitions "62"
> > > > );
> > >
> > > Regards,
> > > Taher Koitawala
> > >
> >
>
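
The "timestamp column plus polling interval" idea from this thread can be sketched as a checkpointed query. This is a minimal illustration under stated assumptions, not DeltaStreamer code: it uses Python's built-in sqlite3 in place of MySQL, and the table `contacts`, the column `updated_at`, and the helper `pull_incremental` are all hypothetical names.

```python
import sqlite3

def pull_incremental(conn, table, ts_column, checkpoint):
    """Fetch rows whose timestamp column is newer than the last checkpoint.

    Returns (rows, new_checkpoint). Table and column names are interpolated
    into the SQL text, so they must come from trusted configuration.
    """
    query = (
        f"SELECT id, payload, {ts_column} FROM {table} "
        f"WHERE {ts_column} > ? ORDER BY {ts_column}"
    )
    rows = conn.execute(query, (checkpoint,)).fetchall()
    # Advance the checkpoint to the newest timestamp seen, if any.
    new_checkpoint = rows[-1][2] if rows else checkpoint
    return rows, new_checkpoint

# Stand-in for the RDBMS source.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE contacts (id INTEGER PRIMARY KEY, payload TEXT, updated_at INTEGER)"
)
conn.executemany(
    "INSERT INTO contacts VALUES (?, ?, ?)", [(1, "a", 100), (2, "b", 110)]
)

# First poll: everything is new relative to checkpoint 0.
first_batch, checkpoint = pull_incremental(conn, "contacts", "updated_at", 0)

# The source changes between polls: one update and one insert.
conn.execute("UPDATE contacts SET payload = 'a2', updated_at = 120 WHERE id = 1")
conn.execute("INSERT INTO contacts VALUES (3, 'c', 130)")

# Second poll: only rows touched after the previous checkpoint come back.
second_batch, checkpoint = pull_incremental(conn, "contacts", "updated_at", checkpoint)
print(first_batch, second_batch, checkpoint)
```

Each batch would then be handed to an upsert. Note that a strict `>` comparison can miss rows that share the checkpoint's exact timestamp, so a real source would need to handle ties.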


Re: [VOTE] Release 0.5.0-incubating, release candidate #1

2019-09-14 Thread Bhavani Sudha Saktheeswaran
+1 (non-binding) on other aspects.
- verified checksums and signatures [SUCCESS]
- built from source release (mvn clean install -DskipTests) [SUCCESS]
- ran local docker tests [SUCCESS]
- ran some IDE tests [SUCCESS]

Thanks,
Sudha


On Sat, Sep 14, 2019 at 6:46 PM Vinoth Chandar  wrote:

> -1 (binding)
>
> - Checksums & Signatures verify
> - Built the branch & tests pass
> - My own test jobs seem to work
>  - Checked pom for version
>  - NOTICE and LICENSE I think were updated right before RC was cut. Should
> be good to go
>  - Source files all have ASF license . Tested rat plugin fails build if
> java/scala files don't have license.
>
> But, checked other vote threads on general@incubator to understand any
> gaps [1] and have some concerns.
> Most discussions mention DISCLAIMER. Is this the disclaimer we have on
> site, or a separate file like this:
> https://github.com/apache/incubator-heron/blob/master/DISCLAIMER ?
> If the latter, I think we need to add it.
>
> Release manager, kindly take note if we need to do anything to handle these
> before the general vote
>
> P.S: Found this to be a great resource for verifying the package
>
> https://www.apache.org/info/verification.html
>
> On Sat, Sep 14, 2019 at 9:17 AM Prasanna Rajaperumal 
> wrote:
>
> > +1 (binding)
> >
> > Great job getting the RC out!
> >
> > - verified checksums
> > - verified signatures
> > - Built the branch and my tests pass
> >
> >
> > On 2019/09/14 13:19:25, leesf  wrote:
> > > +1 (non-binding)
> > >
> > > - verified checksums and signatures - OK
> > > - checked that all pom.xml files point to the same
> > > version(0.5.0-incubating-rc1) - OK
> > > - built from source(mvn clean install -DskipTests) - OK
> > > - ran some tests in IDE - OK
> > >
> > > Best,
> > > Leesf
> > >
> > > On Sat, Sep 14, 2019 at 6:32 AM vbal...@apache.org  wrote:
> > >
> > > > Hi everyone, We have prepared the first apache release candidate for
> > > > Apache Hudi (incubating). The version is: 0.5.0-incubating-rc1. Please
> > > > review and vote on the release candidate #1 for the version 0.5.0, as
> > > > follows:
> > > > [ ] +1, Approve the release
> > > > [ ] -1, Do not approve the release (please provide specific comments)
> > > > The complete staging area is available for your review, which includes:
> > > > - JIRA release notes [1]
> > > > - The official Apache source release and binary convenience releases to
> > > > be deployed to http://dist.apache.org [2], which are signed with the key with
> > > > fingerprint AF9BAF79D311A3D3288E583F24A499037262AAA4 [3],
> > > >
> > > > - all artifacts to be deployed to the Maven Central Repository [4]
> > > >
> > > > - source code tag "release-0.5.0-incubating-rc1" [5]
> > > >
> > > > The vote will be open for at least 72 hours.
> > > > Please cast your votes before *Sep. 18th 2019, 23:00 UTC*.
> > > >
> > > > It is adopted by majority approval, with at least 3 PMC affirmative
> > > > votes.
> > > > - https://jira.apache.org/jira/secure/ReleaseNote.jspa?projectId=12322822&version=12346087
> > > > - https://dist.apache.org/repos/dist/dev/hudi/hudi-0.5.0-incubating-rc1/
> > > > - https://dist.apache.org/repos/dist/release/hudi/KEYS
> > > > - https://repository.apache.org/content/repositories/orgapachehudi-1001/
> > > > - https://github.com/apache/incubator-hudi/tree/releas

Re: Apache Pulsar component for Hudi

2019-09-11 Thread Bhavani Sudha Saktheeswaran
+1 for integrating Apache Pulsar.

On Wed, Sep 11, 2019 at 8:58 PM taher koitawala  wrote:

> Should we file a jira? If everyone agrees?
>
> On Thu, Sep 12, 2019, 6:30 AM vino yang  wrote:
>
> > +1 to welcome Pulsar connector
> >
> > On Thu, Sep 12, 2019 at 6:57 AM Vinoth Chandar  wrote:
> >
> > > +1 Always welcome new sources. Any takers for a PulsarSource in
> > > DeltaStreamer?
> > >
> > > On Tue, Sep 10, 2019 at 4:33 AM taher koitawala 
> > > wrote:
> > >
> > > > Hi Vinoth,
> > > > Apache Pulsar is a pub/sub messaging system like Kafka;
> > > > however, it has a few more functions which make it different, like
> > > > serverless per-record ETL at the Pulsar level, Pulsar auto service
> > > > discovery, etc.
> > > > As Pulsar is picking up pace, should we bring that in as a component
> > > > in DeltaStreamer?
> > > >
> > > > Thanks,
> > > > Taher Koitawala
> > > >
> > >
> >
>


Re: Dropping support for Spark 2.2 and lower

2019-09-10 Thread Bhavani Sudha Saktheeswaran
+1 will be very useful.


On Tue, Sep 10, 2019 at 10:30 AM Kim Hammar  wrote:

> +1, we are on Spark 2.4
>
> On Tue, Sep 10, 2019 at 6:49 PM Shiyan Xu 
> wrote:
>
> > +1
> >
> > On Tue, Sep 10, 2019 at 7:16 AM Vinoth Chandar 
> wrote:
> >
> > > Hello all,
> > >
> > > I am trying to gauge what spark version everyone is on. We would like to
> > > move the spark version to 2.4 and simplify a whole bunch of stuff. Any
> > > objections? As a best effort, we can try to make 2.3 work reliably. Any
> > > objections?
> > >
> > > Note that if you are using the RDD based hudi-client primarily, this
> > > should not affect you per se.
> > >
> > > Thanks
> > > Vinoth
> > >
> >
>


Re: Help testing PR 873

2019-09-03 Thread Bhavani Sudha Saktheeswaran
That's really cool. Will update if I come across any issues.

Thanks,
Sudha

On Mon, Sep 2, 2019 at 5:04 PM Vinoth Chandar  wrote:

> Folks,
>
> Finally the redesigned bundles are up at
>
> https://github.com/apache/incubator-hudi/pull/873
>
> This reduces the number of classes in the bundles by 6.5x and thus reduces the
> probability of class/jar mismatches as well. Currently, all the integration
> tests and demo steps pass cleanly.
>
> I am looking into all issues aggregated at HUDI-159 and trying to test more
> combinations. If you could test this with your setups or other environments
> as well, report issues as subtasks in HUDI-159, and send PRs to this branch,
> that would be great!
>
>
> /thanks/vinoth
>


Re: Reg: Hudi Jira Ticket Conventions

2019-08-30 Thread Bhavani Sudha Saktheeswaran
+1 nice suggestion. We should definitely follow this.

-Sudha

On Thu, Aug 29, 2019 at 2:32 PM vbal...@apache.org 
wrote:

>  Yes, I just opened a new ticket (https://jira.apache.org/jira/browse/HUDI-228)
> for this.
> Balaji.V
>
>
>
>
> On Thursday, August 29, 2019, 09:45:57 AM PDT, Vinoth Chandar <
> vin...@apache.org> wrote:
>
>  Pratyaksh, I am fine with that. Please go ahead.
>
> (Balaji, correct me if I am wrong. I dont think there is a task yet for
> this?)
>
> On Thu, Aug 29, 2019 at 8:26 AM vino yang  wrote:
>
> > +1 for the conventions
> >
> > On Thu, Aug 29, 2019 at 3:03 PM Pratyaksh Sharma  wrote:
> >
> > > Hi Vinoth,
> > >
> > > I would like to take up this task.
> > >
> > > On Thu, Aug 29, 2019 at 8:49 AM Vinoth Chandar 
> > wrote:
> > >
> > > > +1. Can we add this to the contributing/community pages as well?
> > > >
> > > > On Wed, Aug 28, 2019 at 2:33 PM vbal...@apache.org <
> vbal...@apache.org
> > >
> > > > wrote:
> > > >
> > > > > To all contributors of Hudi:
> > > > > Dear folks,
> > > > > When filing or updating a JIRA for Apache Hudi, kindly make sure the
> > > > > issue type and versions (when resolving the ticket) are set correctly.
> > > > > Also, the summary needs to be descriptive enough to catch the essence
> > > > > of the problem/features. This greatly helps in generating release notes.
> > > > > Thanks,
> > > > > Balaji.V
> > > >
> > >
> >


[DISCUSS] Suggestion for Docs UI

2019-08-22 Thread Bhavani Sudha Saktheeswaran
Hi all,

I was going through the documentation and thought, in some places, tab view
(like this:
https://ci.apache.org/projects/flink/flink-docs-master/getting-started/tutorials/local_setup.html#read-the-code)
can be adopted where we showcase how each query engine (Hive, SparkSQL,
Presto) works. This would improve readability and also shorten the page
length. I am happy to work on it if we are okay with this change. Any
thoughts?


Thanks,
Sudha


Re: [DISCUSS] Hudi material and resources

2019-08-20 Thread Bhavani Sudha Saktheeswaran
+1 I think this is a great idea to showcase Hudi.

Thanks,
Sudha

On Tue, Aug 20, 2019 at 4:02 AM vino yang  wrote:

> Hi guys,
>
> I am going to give a talk about Hudi in a meetup.
> However, I can not find the material and resource of Hudi, for example,
> logo images with(or without) text and icons.
>
> IMHO, it would be good to put these files on the official web site. Here is
> a good example.[1]
>
> What do you think?
>
> Best,
> Vino
>
> [1]: https://flink.apache.org/material.html
>


Re: [QUESTION] May I ask if the Hudi contributor JIRA group can receive the notification email.

2019-08-06 Thread Bhavani Sudha Saktheeswaran
+1 I think it would be useful

On Tue, Aug 6, 2019 at 9:45 AM Vinoth Chandar  wrote:

> This is what I see on the Notification settings . This sort of explains
> it.. I think we need to raise a ticket to change the scheme.
>
> Does everyone find it useful to receive emails as described in the thread?
>
> > Notifications allow JIRA to send email notifications to specified people
> > regarding particular events in your project. They'll receive a separate
> > notification for each event.
> >
> > The notification scheme defines how the notifications are configured for
> > this project. To change the notifications, contact your JIRA administrator.
> > Scheme used by this project: Empty Scheme
>
>
>- Email: j...@apache.org
>
>
> On Tue, Aug 6, 2019 at 9:29 AM Vinoth Chandar  wrote:
>
> > I am an administrator.. Even I don't get any emails for watches, assigned
> > tickets. :)
> >
> > I would imagine such a thing would be configurable at the user level?
> > Have you checked it out?
> >
> > In parallel, let me poke around the settings and see if I find something.
> >
> > On Tue, Aug 6, 2019 at 4:54 AM vino yang  wrote:
> >
> >> Hi guys,
> >>
> >> I can't get a JIRA notification email from the Hudi community.
> >>
> >> Under normal circumstances, I should be able to receive notification
> >> emails
> >> from JIRA in at least three of the following situations:
> >>
> >>
> >>- Any activity on the issue created by me
> >>- Whenever someone pings me on any issue
> >>- Any activity on the issue which I have watched
> >>
> >>
> >> I can receive it in other communities, such as Flink, Kylin, Calcite.
> >> However, no emails have been received in the Hudi community. I have
> >> already
> >> been active on some issues, e.g. HUDI-195[1], HUDI-153[2].
> >>
> >> Given that I can receive mail in other communities, I would like to know
> >> if
> >> it was caused by a setting from the Hudi JIRA contributor group?
> >>
> >> Can anyone who only belongs to the contributor Jira group tell me if you
> >> can receive notification emails under these circumstances?
> >>
> >> Thank you very much.
> >>
> >> [1]: https://issues.apache.org/jira/browse/HUDI-195
> >> [2]: https://issues.apache.org/jira/browse/HUDI-153
> >>
> >> Best,
> >> Vino
> >>
> >
>


Re: [VOTE] Proposal to clone default JIRA workflow for Hudi project

2019-07-22 Thread Bhavani Sudha Saktheeswaran
+1

On Mon, Jul 22, 2019 at 11:12 AM Vinoth Chandar  wrote:

> Hello all,
>
> Pursuant to https://issues.apache.org/jira/browse/INFRA-18765, I would like
> to initiate a vote to clone the default workflow for Hudi. Specifically,
> this will allow us to make changes  like introducing new statuses, enabling
> anyone to change the ticket status etc and truly customize the JIRA
> experience based on the community's working style.
>
> Since we are new to this process, I will try to summarize
>
> https://www.apache.org/foundation/voting.html#binding-votes
>
> To vote, you can respond to this thread with 0, +1, -1 (or the other
> weights listed on the page). If you are part of PMC, then append the phrase
> " binding" to the end as well.
>
> @mentors , please correct me if I am missing anything. Again newbie :)
>
> Vote will close in 72 hours.
>
> /thanks/vinoth
>


Re: Request help testing new pom/bundles

2019-07-11 Thread Bhavani Sudha Saktheeswaran
I was able to successfully test a few Presto queries in the staging
environment. Looks good to me.

Thanks,
Sudha


Re: [DISCUSS] HIP-4: Faster Hive incremental pull queries

2019-05-20 Thread Bhavani Sudha Saktheeswaran
Created the HIP here - https://cwiki.apache.org/confluence/display/HUDI/HIP-4
Please share your thoughts.

Thanks,
Sudha

On Mon, May 20, 2019 at 11:31 AM Bhavani Sudha Saktheeswaran <
bhasu...@uber.com> wrote:

> Works now. Thanks Vinoth!
>
> -Sudha
>
> On Mon, May 20, 2019 at 11:05 AM Vinoth Chandar  wrote:
>
>> I just gave you wiki access. can you try again ?
>>
>> On Mon, May 20, 2019 at 10:53 AM Bhavani Sudha Saktheeswaran
>>  wrote:
>>
>> > Hi,
>> >
>> > I am trying to create a HIP in cwiki ( username: bhasudha) . Seems like
>> I
>> > need some access to create a HIP. Can you grant me permission ?
>> >
>> > Thanks,
>> > Sudha
>> >
>> > On Sun, May 19, 2019 at 5:04 PM Bhavani Sudha Saktheeswaran <
>> > bhasu...@uber.com> wrote:
>> >
>> > > Hello all,
>> > >
>> > > Hive Incremental queries on Hoodie currently suffer a limitation of
>> > > listing all partitions when a datestr is not present (lists .hoodie and
>> > > the partitions) and end up throwing away a lot of the files (since
>> > > `_hoodie_commit_time` column values filter out those files). This can be
>> > > very expensive and can impact query planning time and sometimes causes
>> > > timeouts as well if the table is large.
>> > > https://issues.apache.org/jira/browse/HUDI-25 tracks the issue.
>> > >
>> > > If we can leverage the timeline and partitions touched by the commits
>> > > involved in incremental pull, then we can avoid listing all partitions
>> > > and hence reduce the query planning time. I am planning to send a HIP to
>> > > discuss this further. Please share your thoughts.
>> > >
>> > > Thanks,
>> > > Sudha
>> > >
>> >
>>
>


Re: [DISCUSS] Faster Hive incremental pull queries

2019-05-20 Thread Bhavani Sudha Saktheeswaran
Works now. Thanks Vinoth!

-Sudha

On Mon, May 20, 2019 at 11:05 AM Vinoth Chandar  wrote:

> I just gave you wiki access. can you try again ?
>
> On Mon, May 20, 2019 at 10:53 AM Bhavani Sudha Saktheeswaran
>  wrote:
>
> > Hi,
> >
> > I am trying to create a HIP in cwiki (username: bhasudha). It seems I
> > need some access to create a HIP. Can you grant me permission?
> >
> > Thanks,
> > Sudha
> >
> > On Sun, May 19, 2019 at 5:04 PM Bhavani Sudha Saktheeswaran <
> > bhasu...@uber.com> wrote:
> >
> > > Hello all,
> > >
> > > Hive incremental queries on Hoodie currently suffer from a limitation:
> > > they list all partitions when a datestr is not present (.hoodie and the
> > > partitions) and end up throwing away a lot of the files, since the
> > > `_hoodie_commit_time` column values filter out those files. This can be
> > > very expensive, can impact query planning time, and sometimes causes
> > > timeouts as well if the table is large.
> > > https://issues.apache.org/jira/browse/HUDI-25 tracks the issue.
> > >
> > > If we can leverage the timeline and partitions touched by the commits
> > > involved in incremental pull, then we can avoid listing all partitions
> > > and hence reduce the query planning time. I am planning to send a HIP to
> > > discuss this further. Please share your thoughts.
> > >
> > > Thanks,
> > > Sudha
> > >
> >
>


Re: [DISCUSS] Faster Hive incremental pull queries

2019-05-20 Thread Bhavani Sudha Saktheeswaran
Hi,

I am trying to create a HIP in cwiki (username: bhasudha). It seems I
need some access to create a HIP. Can you grant me permission?

Thanks,
Sudha

On Sun, May 19, 2019 at 5:04 PM Bhavani Sudha Saktheeswaran <
bhasu...@uber.com> wrote:

> Hello all,
>
> Hive incremental queries on Hoodie currently suffer from a limitation:
> they list all partitions when a datestr is not present (.hoodie and the
> partitions) and end up throwing away a lot of the files, since the
> `_hoodie_commit_time` column values filter out those files. This can be
> very expensive, can impact query planning time, and sometimes causes
> timeouts as well if the table is large.
> https://issues.apache.org/jira/browse/HUDI-25 tracks the issue.
>
> If we can leverage the timeline and partitions touched by the commits
> involved in incremental pull, then we can avoid listing all partitions and
> hence reduce the query planning time. I am planning to send a HIP to
> discuss this further. Please share your thoughts.
>
> Thanks,
> Sudha
>


[DISCUSS] Faster Hive incremental pull queries

2019-05-19 Thread Bhavani Sudha Saktheeswaran
Hello all,

Hive incremental queries on Hoodie currently suffer from a limitation: they
list all partitions when a datestr is not present (.hoodie and the
partitions) and end up throwing away a lot of the files, since the
`_hoodie_commit_time` column values filter out those files. This can be very
expensive, can impact query planning time, and sometimes causes timeouts as
well if the table is large. https://issues.apache.org/jira/browse/HUDI-25
tracks the issue.

If we can leverage the timeline and partitions touched by the commits
involved in incremental pull, then we can avoid listing all partitions and
hence reduce the query planning time. I am planning to send a HIP to
discuss this further. Please share your thoughts.
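
To make the idea concrete, here is a minimal sketch. It is not Hudi's API; it
assumes each commit on the timeline records which partitions it wrote to. The
names `Commit`, `partitions_touched` and `partitions_for_incremental_pull` are
hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Set


@dataclass(frozen=True)
class Commit:
    """Hypothetical stand-in for one timeline entry (not a Hudi class)."""
    commit_time: str                    # sortable timestamp string
    partitions_touched: FrozenSet[str]  # partition paths written by this commit


def partitions_for_incremental_pull(
    timeline: Iterable[Commit], begin_time: str, max_commits: int
) -> Set[str]:
    """Union of partitions written by up to max_commits commits after begin_time."""
    newer = sorted(
        (c for c in timeline if c.commit_time > begin_time),
        key=lambda c: c.commit_time,
    )
    touched: Set[str] = set()
    for commit in newer[:max_commits]:
        touched |= commit.partitions_touched
    return touched


timeline = [
    Commit("20190517000000", frozenset({"2019/05/16"})),
    Commit("20190518000000", frozenset({"2019/05/17", "2019/05/18"})),
    Commit("20190519000000", frozenset({"2019/05/18"})),
]
# Only partitions written after the last-pulled commit need to be listed.
print(sorted(partitions_for_incremental_pull(timeline, "20190517000000", 10)))
```

With this shape, planning cost would scale with the number of partitions the
relevant commits touched rather than with the total number of partitions in
the table.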

Thanks,
Sudha


Re: NPE for Merge On Read use case in quickstart

2019-05-01 Thread Bhavani Sudha Saktheeswaran
Hi Tristan,
You might want to include "--schemaprovider-class
com.uber.hoodie.utilities.schema.FilebasedSchemaProvider" in the spark-submit
command. I also faced a similar issue when I tried the Docker demo. I
think there is a PR pending for the docs that includes this change.

Thanks,
Sudha

On Wed, May 1, 2019 at 1:33 PM Baker, Tristan 
wrote:

> Hi,
>
> Been working through the quickstart here:
> https://hudi.apache.org/docker_demo.html
>
> I get an NPE when running the merge on read spark job.
>
> Here’s the spark-submit command (copied from the quickstart instructions)
>
> https://gist.github.com/tcbakes/4a11cff217fb8a98205b4cc46cd29750
>
>
> Here’s the NPE:
>
> https://gist.github.com/tcbakes/021258638184ddcbde2b0320ec589fde
>
>
> I attached my debugger to the process and discovered that the
> schemaProvider is null on line 65 here:
>
>
> https://github.com/apache/incubator-hudi/blob/3a0044216cb2f707639d48e2869f4ee6f25cfc19/hoodie-utilities/src/main/java/com/uber/hoodie/utilities/deltastreamer/SourceFormatAdapter.java#L65
>
> The Copy On Write spark job/example works fine, but this one doesn’t.
>
> Any pointers?
>
> Thanks,
> Tristan
>


Re: Docker demo throwing NoClassDefFoundError on NoClassDefFoundError when using delta streamer to ingest data into COW dataset

2019-04-09 Thread Bhavani Sudha Saktheeswaran
Thanks Balaji. Verified that it is working! I'll send in the patch soon.
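
For anyone following along, a rough model of the inclusion-type filter Balaji
describes in the quoted message: only artifacts named in the include list end
up in the bundle, so a class shipped by an unlisted artifact is missing at
runtime. The mapping below is hypothetical, and placing KryoInstantiator in
chill-java is an assumption inferred from the fix in this thread.

```python
from typing import Dict, Iterable, Set

# Hypothetical mapping from Maven artifact to the classes it ships.
# Which artifact holds which class is an assumption for illustration.
ARTIFACT_CLASSES: Dict[str, Set[str]] = {
    "com.twitter:chill_2.11": {"com.twitter.chill.ScalaKryoInstantiator"},
    "com.twitter:chill-java": {"com.twitter.chill.KryoInstantiator"},
}


def bundled_classes(includes: Iterable[str]) -> Set[str]:
    """With an inclusion-type filter, only explicitly listed artifacts are bundled."""
    classes: Set[str] = set()
    for artifact in includes:
        classes |= ARTIFACT_CLASSES.get(artifact, set())
    return classes


before = bundled_classes(["com.twitter:chill_2.11"])
after = bundled_classes(["com.twitter:chill_2.11", "com.twitter:chill-java"])
print("com.twitter.chill.KryoInstantiator" in before)
print("com.twitter.chill.KryoInstantiator" in after)
```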

On Mon, Apr 8, 2019 at 2:06 PM Balaji Varadarajan
 wrote:

>
> Hi Sudha,
> It looks like the missing class is from a different jar which was not
> included in hoodie-utilities. Hoodie Utilities shading uses inclusion-type
> filter unlike shading for other bundles and hence this discrepancy.
> If you add the following line in hoodie-utilities pom, it should hopefully
> work.
>
>
>     <include>com.twitter:chill_2.11</include>
> +   <include>com.twitter:chill-java</include>
>
> Balaji.V
>
>     On Thursday, April 4, 2019, 10:58:56 PM PDT, Bhavani Sudha
> Saktheeswaran  wrote:
>
> Adding the hoodie-spark bundle and hoodie-utilities bundle to the Spark jars
> fixes this issue. I'll send a patch to fix this. Thanks everyone!
>
> -Sudha
>
>
>
> On Wed, Apr 3, 2019 at 3:19 PM Omkar Joshi  wrote:
>
> > Sudha,
> >
> > Check whether "mvn clean integration-test" passes for you.
> >
> > Thanks,
> > Omkar
> >
> > On Wed, Apr 3, 2019 at 2:57 PM Bhavani Sudha Saktheeswaran
> >  wrote:
> >
> > > Sure. Thanks! I'll update if I find anything!
> > >
> > > On Wed, Apr 3, 2019 at 2:54 PM Omkar Joshi 
> > wrote:
> > >
> > > > Hi Sudha,
> > > >
> > > > I haven't tried it via Docker. Let me try it sometime this week or
> > early
> > > > next week.
> > > >
> > > > On Wed, Apr 3, 2019 at 2:35 PM Bhavani Sudha Saktheeswaran
> > > >  wrote:
> > > >
> > > > > Hi Omkar,
> > > > >
> > > > > I am running the Docker demo using the instructions here -
> > > > > https://hudi.apache.org/docker_demo.html. I get this exception when
> > > > > doing Step 5: Upsert of data using Delta Streamer. Maybe the Docker
> > > > > setup is picking up an old version of the jars? You can reproduce it
> > > > > on master.
> > > > >
> > > > > Thanks,
> > > > > Sudha
> > > > >
> > > > > On Wed, Apr 3, 2019 at 11:43 AM om...@uber.com 
> > wrote:
> > > > >
> > > > > > Sudha,
> > > > > >
> > > > > > How are you using the Hudi library? Are you using the bundled jar
> > > > > > or something else?
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> packaging/hoodie-presto-bundle/target/hoodie-presto-bundle-0.4.6-SNAPSHOT.jar
> > > > > >
> > > > > > omkar-C02T60PVG8WL:hoodie omkar$ jar -tvf
> > > > > >
> > > > >
> > > >
> > >
> >
> packaging/hoodie-presto-bundle/target/hoodie-presto-bundle-0.4.6-SNAPSHOT.jar
> > > > > > | grep "KryoInstantiator"
> > > > > >569 Tue Mar 26 18:44:50 PDT 2019
> > > > > >
> > com/uber/hoodie/com/twitter/chill/ScalaKryoInstantiator$$anon$1.class
> > > > > >  1561 Tue Mar 26 18:44:50 PDT 2019
> > > > > >
> com/uber/hoodie/com/twitter/chill/EmptyScalaKryoInstantiator.class
> > > > > >  1953 Tue Mar 26 18:44:50 PDT 2019
> > > > > > com/uber/hoodie/com/twitter/chill/ScalaKryoInstantiator$.class
> > > > > >  1992 Tue Mar 26 18:44:50 PDT 2019
> > > > > > com/uber/hoodie/com/twitter/chill/ScalaKryoInstantiator.class
> > > > > >859 Tue Mar 26 18:44:52 PDT 2019
> > > > > > com/uber/hoodie/com/twitter/chill/KryoInstantiator$1.class
> > > > > >845 Tue Mar 26 18:44:52 PDT 2019
> > > > > > com/uber/hoodie/com/twitter/chill/KryoInstantiator$3.class
> > > > > >650 Tue Mar 26 18:44:52 PDT 2019
> > > > > >
> > > > >
> > > >
> > >
> >
> com/uber/hoodie/com/twitter/chill/config/ConfiguredInstantiator$CachedKryoInstantiator.class
> > > > > >  2107 Tue Mar 26 18:44:52 PDT 2019
> > > > > > com/uber/hoodie/com/twitter/chill/KryoInstantiator.class
> > > > > >863 Tue Mar 26 18:44:52 PDT 2019
> > > > > > com/uber/hoodie/com/twitter/chill/KryoInstantiator$4.class
> > > > > >958 Tue Mar 26 18:44:52 PDT 2019
> > > > > > com/uber/hoodie/com/twitter/chill/KryoInstantiator$2.class
> > > 

Re: Docker demo throwing NoClassDefFoundError on NoClassDefFoundError when using delta streamer to ingest data into COW dataset

2019-04-04 Thread Bhavani Sudha Saktheeswaran
Adding the hoodie-spark bundle and hoodie-utilities bundle to the Spark jars
fixes this issue. I'll send a patch to fix this. Thanks everyone!

-Sudha



On Wed, Apr 3, 2019 at 3:19 PM Omkar Joshi  wrote:

> Sudha,
>
> Check whether "mvn clean integration-test" passes for you.
>
> Thanks,
> Omkar
>
> On Wed, Apr 3, 2019 at 2:57 PM Bhavani Sudha Saktheeswaran
>  wrote:
>
> > Sure. Thanks! I'll update if I find anything!
> >
> > On Wed, Apr 3, 2019 at 2:54 PM Omkar Joshi 
> wrote:
> >
> > > Hi Sudha,
> > >
> > > I haven't tried it via Docker. Let me try it sometime this week or
> early
> > > next week.
> > >
> > > On Wed, Apr 3, 2019 at 2:35 PM Bhavani Sudha Saktheeswaran
> > >  wrote:
> > >
> > > > Hi Omkar,
> > > >
> > > > I am running the Docker demo using the instructions here -
> > > > https://hudi.apache.org/docker_demo.html. I get this exception when
> > > > doing Step 5: Upsert of data using Delta Streamer. Maybe the Docker
> > > > setup is picking up an old version of the jars? You can reproduce it
> > > > on master.
> > > >
> > > > Thanks,
> > > > Sudha
> > > >
> > > > On Wed, Apr 3, 2019 at 11:43 AM om...@uber.com 
> wrote:
> > > >
> > > > > Sudha,
> > > > >
> > > > > How are you using the Hudi library? Are you using the bundled jar
> > > > > or something else?
> > > > >
> > > > >
> > > > >
> > > >
> > >
> >
> packaging/hoodie-presto-bundle/target/hoodie-presto-bundle-0.4.6-SNAPSHOT.jar
> > > > >
> > > > > omkar-C02T60PVG8WL:hoodie omkar$ jar -tvf
> > > > >
> > > >
> > >
> >
> packaging/hoodie-presto-bundle/target/hoodie-presto-bundle-0.4.6-SNAPSHOT.jar
> > > > > | grep "KryoInstantiator"
> > > > >569 Tue Mar 26 18:44:50 PDT 2019
> > > > >
> com/uber/hoodie/com/twitter/chill/ScalaKryoInstantiator$$anon$1.class
> > > > >   1561 Tue Mar 26 18:44:50 PDT 2019
> > > > > com/uber/hoodie/com/twitter/chill/EmptyScalaKryoInstantiator.class
> > > > >   1953 Tue Mar 26 18:44:50 PDT 2019
> > > > > com/uber/hoodie/com/twitter/chill/ScalaKryoInstantiator$.class
> > > > >   1992 Tue Mar 26 18:44:50 PDT 2019
> > > > > com/uber/hoodie/com/twitter/chill/ScalaKryoInstantiator.class
> > > > >859 Tue Mar 26 18:44:52 PDT 2019
> > > > > com/uber/hoodie/com/twitter/chill/KryoInstantiator$1.class
> > > > >845 Tue Mar 26 18:44:52 PDT 2019
> > > > > com/uber/hoodie/com/twitter/chill/KryoInstantiator$3.class
> > > > >650 Tue Mar 26 18:44:52 PDT 2019
> > > > >
> > > >
> > >
> >
> com/uber/hoodie/com/twitter/chill/config/ConfiguredInstantiator$CachedKryoInstantiator.class
> > > > >   2107 Tue Mar 26 18:44:52 PDT 2019
> > > > > com/uber/hoodie/com/twitter/chill/KryoInstantiator.class
> > > > >863 Tue Mar 26 18:44:52 PDT 2019
> > > > > com/uber/hoodie/com/twitter/chill/KryoInstantiator$4.class
> > > > >958 Tue Mar 26 18:44:52 PDT 2019
> > > > > com/uber/hoodie/com/twitter/chill/KryoInstantiator$2.class
> > > > >920 Tue Mar 26 18:44:52 PDT 2019
> > > > > com/uber/hoodie/com/twitter/chill/KryoInstantiator$5.class
> > > > >975 Tue Mar 26 18:44:52 PDT 2019
> > > > > com/uber/hoodie/com/twitter/chill/KryoInstantiator$6.class
> > > > >
> > > > > On 2019/04/03 05:16:39, Bhavani Sudha Saktheeswaran
> > > > >  wrote:
> > > > > > Hi,
> > > > > >
> > > > > > > I am getting this error when trying to ingest the second batch of
> > > > > > > data (upserts) into a COW dataset. It looks like the
> > > > > > > KryoInstantiator is missing in the jars. Is this something that
> > > > > > > needs to be added to the classpath separately?
> > > > > >
> > > > > > 2019-04-02 21:36:23 ERROR HoodieCopyOnWriteTable:274 - Error
> > > upserting
> > > > > > bucketType UPDATE for partition :0
> > > > > > java.lang.NoClassDefFoundError:
> > > > &

Re: Docker demo throwing NoClassDefFoundError on NoClassDefFoundError when using delta streamer to ingest data into COW dataset

2019-04-03 Thread Bhavani Sudha Saktheeswaran
Sure. Thanks! I'll update if I find anything!

On Wed, Apr 3, 2019 at 2:54 PM Omkar Joshi  wrote:

> Hi Sudha,
>
> I haven't tried it via Docker. Let me try it sometime this week or early
> next week.
>
> On Wed, Apr 3, 2019 at 2:35 PM Bhavani Sudha Saktheeswaran
>  wrote:
>
> > Hi Omkar,
> >
> > I am running the Docker demo using the instructions here -
> > https://hudi.apache.org/docker_demo.html. I get this exception when
> > doing Step 5: Upsert of data using Delta Streamer. Maybe the Docker
> > setup is picking up an old version of the jars? You can reproduce it
> > on master.
> >
> > Thanks,
> > Sudha
> >
> > On Wed, Apr 3, 2019 at 11:43 AM om...@uber.com  wrote:
> >
> > > Sudha,
> > >
> > > How are you using the Hudi library? Are you using the bundled jar or
> > > something else?
> > >
> > >
> > >
> >
> packaging/hoodie-presto-bundle/target/hoodie-presto-bundle-0.4.6-SNAPSHOT.jar
> > >
> > > omkar-C02T60PVG8WL:hoodie omkar$ jar -tvf
> > >
> >
> packaging/hoodie-presto-bundle/target/hoodie-presto-bundle-0.4.6-SNAPSHOT.jar
> > > | grep "KryoInstantiator"
> > >569 Tue Mar 26 18:44:50 PDT 2019
> > > com/uber/hoodie/com/twitter/chill/ScalaKryoInstantiator$$anon$1.class
> > >   1561 Tue Mar 26 18:44:50 PDT 2019
> > > com/uber/hoodie/com/twitter/chill/EmptyScalaKryoInstantiator.class
> > >   1953 Tue Mar 26 18:44:50 PDT 2019
> > > com/uber/hoodie/com/twitter/chill/ScalaKryoInstantiator$.class
> > >   1992 Tue Mar 26 18:44:50 PDT 2019
> > > com/uber/hoodie/com/twitter/chill/ScalaKryoInstantiator.class
> > >859 Tue Mar 26 18:44:52 PDT 2019
> > > com/uber/hoodie/com/twitter/chill/KryoInstantiator$1.class
> > >845 Tue Mar 26 18:44:52 PDT 2019
> > > com/uber/hoodie/com/twitter/chill/KryoInstantiator$3.class
> > >650 Tue Mar 26 18:44:52 PDT 2019
> > >
> >
> com/uber/hoodie/com/twitter/chill/config/ConfiguredInstantiator$CachedKryoInstantiator.class
> > >   2107 Tue Mar 26 18:44:52 PDT 2019
> > > com/uber/hoodie/com/twitter/chill/KryoInstantiator.class
> > >863 Tue Mar 26 18:44:52 PDT 2019
> > > com/uber/hoodie/com/twitter/chill/KryoInstantiator$4.class
> > >958 Tue Mar 26 18:44:52 PDT 2019
> > > com/uber/hoodie/com/twitter/chill/KryoInstantiator$2.class
> > >920 Tue Mar 26 18:44:52 PDT 2019
> > > com/uber/hoodie/com/twitter/chill/KryoInstantiator$5.class
> > >975 Tue Mar 26 18:44:52 PDT 2019
> > > com/uber/hoodie/com/twitter/chill/KryoInstantiator$6.class
> > >
> > > On 2019/04/03 05:16:39, Bhavani Sudha Saktheeswaran
> > >  wrote:
> > > > Hi,
> > > >
> > > > I am getting this error when trying to ingest the second batch of
> > > > data (upserts) into a COW dataset. It looks like the KryoInstantiator
> > > > is missing in the jars. Is this something that needs to be added to
> > > > the classpath separately?
> > > >
> > > > 2019-04-02 21:36:23 ERROR HoodieCopyOnWriteTable:274 - Error
> upserting
> > > > bucketType UPDATE for partition :0
> > > > java.lang.NoClassDefFoundError:
> > > > com/uber/hoodie/com/twitter/chill/KryoInstantiator
> > > > at java.lang.ClassLoader.defineClass1(Native Method)
> > > > at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
> > > > at
> > > java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
> > > > ...
> > > > at
> > > >
> > >
> >
> com.uber.hoodie.common.util.SerializationUtils.serialize(SerializationUtils.java:50)
> > > > at
> > > >
> > >
> >
> com.uber.hoodie.common.util.collection.DiskBasedMap.put(DiskBasedMap.java:169)
> > > > at
> > > >
> > >
> >
> com.uber.hoodie.common.util.collection.ExternalSpillableMap.put(ExternalSpillableMap.java:169)
> > > > at
> > > >
> > >
> >
> com.uber.hoodie.common.util.collection.ExternalSpillableMap.put(ExternalSpillableMap.java:42)
> > > > at com.uber.hoodie.io.HoodieMergeHandle.init(HoodieMergeHandle.java:159)
> > > > at com.uber.hoodie.io.HoodieMergeHandle.<init>(HoodieMergeHandle.java:73)
> > > > at
> > > >
> > >
> >
> com.uber.hoodie.table.HoodieCopyOnWriteTable.getUpdateHandle(HoodieCopyOnWriteTable.java:230)
> > > > at
>

Re: Docker demo throwing NoClassDefFoundError on NoClassDefFoundError when using delta streamer to ingest data into COW dataset

2019-04-03 Thread Bhavani Sudha Saktheeswaran
Hi Omkar,

I am running the Docker demo using the instructions here -
https://hudi.apache.org/docker_demo.html. I get this exception when doing
Step 5: Upsert of data using Delta Streamer. Maybe the Docker setup is picking
up an old version of the jars? You can reproduce it on master.

Thanks,
Sudha

On Wed, Apr 3, 2019 at 11:43 AM om...@uber.com  wrote:

> Sudha,
>
> How are you using the Hudi library? Are you using the bundled jar or
> something else?
>
>
> packaging/hoodie-presto-bundle/target/hoodie-presto-bundle-0.4.6-SNAPSHOT.jar
>
> omkar-C02T60PVG8WL:hoodie omkar$ jar -tvf
> packaging/hoodie-presto-bundle/target/hoodie-presto-bundle-0.4.6-SNAPSHOT.jar
> | grep "KryoInstantiator"
>569 Tue Mar 26 18:44:50 PDT 2019
> com/uber/hoodie/com/twitter/chill/ScalaKryoInstantiator$$anon$1.class
>   1561 Tue Mar 26 18:44:50 PDT 2019
> com/uber/hoodie/com/twitter/chill/EmptyScalaKryoInstantiator.class
>   1953 Tue Mar 26 18:44:50 PDT 2019
> com/uber/hoodie/com/twitter/chill/ScalaKryoInstantiator$.class
>   1992 Tue Mar 26 18:44:50 PDT 2019
> com/uber/hoodie/com/twitter/chill/ScalaKryoInstantiator.class
>859 Tue Mar 26 18:44:52 PDT 2019
> com/uber/hoodie/com/twitter/chill/KryoInstantiator$1.class
>845 Tue Mar 26 18:44:52 PDT 2019
> com/uber/hoodie/com/twitter/chill/KryoInstantiator$3.class
>650 Tue Mar 26 18:44:52 PDT 2019
> com/uber/hoodie/com/twitter/chill/config/ConfiguredInstantiator$CachedKryoInstantiator.class
>   2107 Tue Mar 26 18:44:52 PDT 2019
> com/uber/hoodie/com/twitter/chill/KryoInstantiator.class
>863 Tue Mar 26 18:44:52 PDT 2019
> com/uber/hoodie/com/twitter/chill/KryoInstantiator$4.class
>958 Tue Mar 26 18:44:52 PDT 2019
> com/uber/hoodie/com/twitter/chill/KryoInstantiator$2.class
>920 Tue Mar 26 18:44:52 PDT 2019
> com/uber/hoodie/com/twitter/chill/KryoInstantiator$5.class
>975 Tue Mar 26 18:44:52 PDT 2019
> com/uber/hoodie/com/twitter/chill/KryoInstantiator$6.class
>
> On 2019/04/03 05:16:39, Bhavani Sudha Saktheeswaran
>  wrote:
> > Hi,
> >
> > I am getting this error when trying to ingest the second batch of data
> > (upserts) into a COW dataset. It looks like the KryoInstantiator is
> > missing in the jars. Is this something that needs to be added to the
> > classpath separately?
> >
> > 2019-04-02 21:36:23 ERROR HoodieCopyOnWriteTable:274 - Error upserting
> > bucketType UPDATE for partition :0
> > java.lang.NoClassDefFoundError:
> > com/uber/hoodie/com/twitter/chill/KryoInstantiator
> > at java.lang.ClassLoader.defineClass1(Native Method)
> > at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
> > at
> java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
> > ...
> > at
> >
> com.uber.hoodie.common.util.SerializationUtils.serialize(SerializationUtils.java:50)
> > at
> >
> com.uber.hoodie.common.util.collection.DiskBasedMap.put(DiskBasedMap.java:169)
> > at
> >
> com.uber.hoodie.common.util.collection.ExternalSpillableMap.put(ExternalSpillableMap.java:169)
> > at
> >
> com.uber.hoodie.common.util.collection.ExternalSpillableMap.put(ExternalSpillableMap.java:42)
> > at com.uber.hoodie.io.HoodieMergeHandle.init(HoodieMergeHandle.java:159)
> > at com.uber.hoodie.io.HoodieMergeHandle.<init>(HoodieMergeHandle.java:73)
> > at
> >
> com.uber.hoodie.table.HoodieCopyOnWriteTable.getUpdateHandle(HoodieCopyOnWriteTable.java:230)
> > at
> >
> com.uber.hoodie.table.HoodieCopyOnWriteTable.handleUpdate(HoodieCopyOnWriteTable.java:184)
> > at
> >
> com.uber.hoodie.table.HoodieCopyOnWriteTable.handleUpsertPartition(HoodieCopyOnWriteTable.java:267)
> > at
> >
> com.uber.hoodie.HoodieWriteClient.lambda$upsertRecordsInternal$7ef77fd$1(HoodieWriteClient.java:440)
> > at
> >
> org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:102)
> > at
> >
> org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:102)
> > at
> >
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$26.apply(RDD.scala:847)
> > at
> >
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$26.apply(RDD.scala:847)
> > at
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
> > at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
> > at
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
> > at org.apache.spark.rdd.RDD$$anonfun$7.apply(RDD.scala:337)

Docker demo throwing NoClassDefFoundError on NoClassDefFoundError when using delta streamer to ingest data into COW dataset

2019-04-02 Thread Bhavani Sudha Saktheeswaran
Hi,

I am getting this error when trying to ingest the second batch of data
(upserts) into a COW dataset. It looks like the KryoInstantiator is missing in
the jars. Is this something that needs to be added to the classpath separately?

2019-04-02 21:36:23 ERROR HoodieCopyOnWriteTable:274 - Error upserting bucketType UPDATE for partition :0
java.lang.NoClassDefFoundError: com/uber/hoodie/com/twitter/chill/KryoInstantiator
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
...
at com.uber.hoodie.common.util.SerializationUtils.serialize(SerializationUtils.java:50)
at com.uber.hoodie.common.util.collection.DiskBasedMap.put(DiskBasedMap.java:169)
at com.uber.hoodie.common.util.collection.ExternalSpillableMap.put(ExternalSpillableMap.java:169)
at com.uber.hoodie.common.util.collection.ExternalSpillableMap.put(ExternalSpillableMap.java:42)
at com.uber.hoodie.io.HoodieMergeHandle.init(HoodieMergeHandle.java:159)
at com.uber.hoodie.io.HoodieMergeHandle.<init>(HoodieMergeHandle.java:73)
at com.uber.hoodie.table.HoodieCopyOnWriteTable.getUpdateHandle(HoodieCopyOnWriteTable.java:230)
at com.uber.hoodie.table.HoodieCopyOnWriteTable.handleUpdate(HoodieCopyOnWriteTable.java:184)
at com.uber.hoodie.table.HoodieCopyOnWriteTable.handleUpsertPartition(HoodieCopyOnWriteTable.java:267)
at com.uber.hoodie.HoodieWriteClient.lambda$upsertRecordsInternal$7ef77fd$1(HoodieWriteClient.java:440)
at org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:102)
at org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:102)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$26.apply(RDD.scala:847)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$26.apply(RDD.scala:847)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD$$anonfun$7.apply(RDD.scala:337)
at org.apache.spark.rdd.RDD$$anonfun$7.apply(RDD.scala:335)
at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1109)
at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1083)
at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:1018)
at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1083)
at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:809)
at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:335)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:286)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:109)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassNotFoundException: com.uber.hoodie.com.twitter.chill.KryoInstantiator
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
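
One way to check whether a given jar actually ships the class named in the
trace is to list its entries. The sketch below does roughly what
`jar -tvf <jar> | grep KryoInstantiator` would do, using only Python's
standard `zipfile` module. The helper name `find_classes` is made up for this
example, and the jar path is whatever bundle is on the classpath.

```python
import zipfile
from typing import List


def find_classes(jar_path: str, needle: str) -> List[str]:
    """Return the .class entries in the jar whose path contains needle."""
    with zipfile.ZipFile(jar_path) as jar:
        return sorted(
            name
            for name in jar.namelist()
            if needle in name and name.endswith(".class")
        )


# Example call (the path is a placeholder, not a real location):
# find_classes("/path/to/bundle.jar", "KryoInstantiator")
```

An empty result for the jar that Spark loads would be consistent with the
ClassNotFoundException above.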

Thanks,
Sudha


Re: Hi

2019-03-08 Thread Bhavani Sudha Saktheeswaran
Thanks @vinoth. It's working fine now.

-Sudha



On Fri, Mar 8, 2019 at 6:28 PM Vinoth Chandar  wrote:

> @sudha  this should be fixed now.
>
> I assigned the tickets you owned on GH issues back to you again just now. If
> you can quickly test whether you have perms to reassign JIRA issues (e.g.
> https://issues.apache.org/jira/browse/HUDI-29), that would be awesome.
>
> On Thu, Mar 7, 2019 at 10:33 AM Vinoth Chandar  wrote:
>
> > Filed https://issues.apache.org/jira/browse/INFRA-17977
> >
> >
> >
> > On Wed, Mar 6, 2019 at 2:10 PM Thomas Weise  wrote:
> >
> >> The setup looks correct. User needs to be in the contributor role to be
> >> assigned a ticket.
> >>
> >> But I'm not able to assign either.
> >>
> >> Vinoth, please check with infra.
> >>
> >>
> >> On Wed, Mar 6, 2019 at 1:10 PM Bhavani Sudha Saktheeswaran
> >>  wrote:
> >>
> >> > Thank you!
> >> >
> >> > On Wed, Mar 6, 2019 at 10:54 AM Vinoth Chandar 
> >> wrote:
> >> >
> >> > > Thanks for reaching out. Added you. But it seems I still can't assign
> >> > > issues to you. For example, I am an admin and I still can't edit these
> >> > > permissions, which is why this is happening.
> >> > > @mentors, any suggestions?
> >> > >
> >> > > Permission: Assignable User (Users with this permission may be
> >> > > assigned to issues.)
> >> > > Granted to: project role (Committers, Administrators, PMC)
> >> > >
> >> > > On Wed, Mar 6, 2019 at 10:07 AM Bhavani Sudha Saktheeswaran
> >> > >  wrote:
> >> > >
> >> > > > Hey there,
> >> > > >
> >> > > > I have been working on some Presto related PRs and saw that
> recently
> >> > the
> >> > > > issues have been migrated from Github to Jira. Can you add me as a
> >> > > > contributor?
> >> > > >
> >> > > > Thanks,
> >> > > > Sudha
> >> > > >
> >> > >
> >> >
> >>
> >
>


Re: Hi

2019-03-06 Thread Bhavani Sudha Saktheeswaran
Thank you!

On Wed, Mar 6, 2019 at 10:54 AM Vinoth Chandar  wrote:

> Thanks for reaching out. Added you. But it seems I still can't assign
> issues to you. For example, I am an admin and I still can't edit these
> permissions, which is why this is happening.
> @mentors, any suggestions?
>
> Permission: Assignable User (Users with this permission may be assigned
> to issues.)
> Granted to: project role (Committers, Administrators, PMC)
>
> On Wed, Mar 6, 2019 at 10:07 AM Bhavani Sudha Saktheeswaran
>  wrote:
>
> > Hey there,
> >
> > I have been working on some Presto related PRs and saw that recently the
> > issues have been migrated from Github to Jira. Can you add me as a
> > contributor?
> >
> > Thanks,
> > Sudha
> >
>


Hi

2019-03-06 Thread Bhavani Sudha Saktheeswaran
Hey there,

I have been working on some Presto related PRs and saw that recently the
issues have been migrated from Github to Jira. Can you add me as a
contributor?

Thanks,
Sudha