Re: [DISCUSS] Next Apache Release(0.5.2)

2020-02-18 Thread Nicholas
+1 on vinoyang as the release manager. +1 on making a shorter 0.5.2 release. On 2020/02/19 00:55:34, leesf wrote: > +1 on vino to be RM, and will help him with the release as much as I can. > > On Wed, Feb 19, 2020 at 7:28 AM, nishith agarwal wrote: > > > +1 on minor release focusing on Apache compliance. > > +1 on Vino

Re:Re: [DISCUSS] Next Apache Release(0.5.2)

2020-02-18 Thread 蒋晓峰
+1 on vinoyang as the release manager. +1 on making a shorter 0.5.2 release. At 2020-02-19 08:55:34, "leesf" wrote: >+1 on vino to be RM, and will help him with the release as much as I can. > >On Wed, Feb 19, 2020 at 7:28 AM, nishith agarwal wrote: > >> +1 on minor release focusing on Apache compliance. >> +1 on Vino

Re: Discussion Thread: HUDI File Listing and Query Planning Improvements

2020-02-18 Thread leesf
+1 from me, query improvement will indeed make Hudi more advanced. On Wed, Feb 19, 2020 at 3:17 AM, Vinoth Chandar wrote: > +1 on this as well. Also happy to collaborate on the RFC itself and help it > make progress.. > > >> For a column in the dataset, min/max range per Parquet file can be > maintained >

Re: [DISCUSS] Next Apache Release(0.5.2)

2020-02-18 Thread leesf
+1 on vino to be RM, and will help him with the release as much as I can. On Wed, Feb 19, 2020 at 7:28 AM, nishith agarwal wrote: > +1 on minor release focusing on Apache compliance. > +1 on Vino yang to be Release Manager. > > -Nishith > > On Tue, Feb 18, 2020 at 11:53 AM vbal...@apache.org > wrote: > > > > > +1 on minor

Re: [DISCUSS] Next Apache Release(0.5.2)

2020-02-18 Thread nishith agarwal
+1 on minor release focusing on Apache compliance. +1 on Vino yang to be Release Manager. -Nishith On Tue, Feb 18, 2020 at 11:53 AM vbal...@apache.org wrote: > > +1 on minor release focusing on Apache compliance. > +1 on Vino yang to be Release Manager. > The compliance issues reported on

Re: Hudi on EMR syncing GLUE catalog issue

2020-02-18 Thread Mehrotra, Udit
Hi Igor, In the current implementation, Hudi submits queries such as creating tables and syncing partitions directly to the Hive server, rather than communicating with the metastore directly. Thus, when launching the EMR cluster, you should install Hive on the cluster as well. Also enable Glue

Re: Apache Hudi on AWS EMR

2020-02-18 Thread Mehrotra, Udit
The workaround provided by Gary can help with querying Hudi Copy On Write tables through Athena, by essentially querying only the latest commit files as standard Parquet. It would definitely be worth documenting, as several people have asked for it and I remember providing the same suggestion on
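Gary's exact steps are not included in this excerpt. As a rough illustration of what "querying only the latest commit files" means for a Copy On Write table, the Java sketch below keeps the newest base file per file ID from a partition listing. It assumes base file names of the form `<fileId>_<writeToken>_<instantTime>.parquet` and that every listed file belongs to a completed commit; a real workaround would also need to consult the `.hoodie` timeline to exclude in-flight or failed writes. The class and method names are hypothetical, not Hudi APIs.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: keep only the latest base file version per file ID. */
final class LatestFileVersions {

    private static final String SUFFIX = ".parquet";

    /** Assumes names like <fileId>_<writeToken>_<instantTime>.parquet. */
    static List<String> latestPerFileId(List<String> fileNames) {
        Map<String, String> latestInstant = new HashMap<>();
        Map<String, String> latestName = new HashMap<>();
        for (String name : fileNames) {
            if (!name.endsWith(SUFFIX)) {
                continue; // skip non-data files such as partition metadata
            }
            String[] parts = name.substring(0, name.length() - SUFFIX.length()).split("_");
            if (parts.length != 3) {
                continue; // skip names that do not match the assumed pattern
            }
            String fileId = parts[0];
            String instant = parts[2];
            String seen = latestInstant.get(fileId);
            // Assumed fixed-width yyyyMMddHHmmss instants, so string order is time order.
            if (seen == null || instant.compareTo(seen) > 0) {
                latestInstant.put(fileId, instant);
                latestName.put(fileId, name);
            }
        }
        List<String> result = new ArrayList<>(latestName.values());
        Collections.sort(result); // deterministic output order
        return result;
    }

    public static void main(String[] args) {
        List<String> listing = List.of(
                ".hoodie_partition_metadata",
                "fa-0_1-0-1_20200101000000.parquet",
                "fa-0_1-0-1_20200102000000.parquet",
                "fb-0_2-0-1_20200101000000.parquet");
        // prints [fa-0_1-0-1_20200102000000.parquet, fb-0_2-0-1_20200101000000.parquet]
        System.out.println(latestPerFileId(listing));
    }
}
```

The resulting file list is what a plain Parquet reader (such as Athena over an external table) would need to restrict itself to, so that older versions of the same file group are not double-counted.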

Re: [DISCUSS] Next Apache Release(0.5.2)

2020-02-18 Thread vbal...@apache.org
+1 on a minor release focusing on Apache compliance. +1 on Vino yang to be Release Manager. The compliance issues reported on the build process in https://lists.apache.org/list.html?gene...@incubator.apache.org:lte=1M:Hudi should also be looked into and be on the JIRA list (if not already).

Re: Apache Hudi on AWS EMR

2020-02-18 Thread Vinoth Chandar
Thanks everyone for chiming in, especially Gary for the detailed workaround. (Should we add this workaround to the FAQ? Food for thought.) >> if I connect to the Hive catalog on EMR, which is able to provide the Hudi views correctly, I should be able to get correct results on Athena Knowing how the Presto/Hudi

Re: [Help] Hudi NOTICE need more work

2020-02-18 Thread Vinoth Chandar
To get more context on the NOTICE, maybe Balaji can chime in? (He drove the current NOTICE file during the initial release.) I need to dig in more deeply before chiming in myself. On Tue, Feb 18, 2020 at 4:22 AM leesf wrote: > Hi all, > > During the voting process on rc1 0.5.1-incubating

Re: Discussion Thread: HUDI File Listing and Query Planning Improvements

2020-02-18 Thread Vinoth Chandar
+1 on this as well. Also happy to collaborate on the RFC itself and help it make progress.. >> For a column in the dataset, min/max range per Parquet file can be maintained This also (Nishith probably mentioned this) will help speed up the current bloom index's range checking.. On Tue, Feb 18,
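To make the min/max idea in this message concrete: if each Parquet file records the minimum and maximum value of a column, a lookup can skip any file whose range cannot contain the key, and a bloom-index-style check then only needs to run on the files that survive. The Java sketch below is a hypothetical illustration (the names `MinMaxPruning`, `FileRange`, and `candidateFiles` are not Hudi APIs) and assumes string keys compared lexicographically.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch (not a Hudi API): prune the files a key lookup must
 * check by comparing the key against each file's stored min/max column range.
 */
final class MinMaxPruning {

    /** Min/max statistics for one column of one Parquet file (illustrative). */
    static final class FileRange {
        final String fileName;
        final String minKey;
        final String maxKey;

        FileRange(String fileName, String minKey, String maxKey) {
            this.fileName = fileName;
            this.minKey = minKey;
            this.maxKey = maxKey;
        }
    }

    /** Returns the files whose [minKey, maxKey] range may contain the key. */
    static List<String> candidateFiles(List<FileRange> files, String key) {
        List<String> candidates = new ArrayList<>();
        for (FileRange f : files) {
            // Keys are compared lexicographically as Java strings (an assumption).
            boolean inRange = f.minKey.compareTo(key) <= 0 && key.compareTo(f.maxKey) <= 0;
            if (inRange) {
                candidates.add(f.fileName);
            }
        }
        return candidates;
    }

    public static void main(String[] args) {
        List<FileRange> files = List.of(
                new FileRange("f1.parquet", "a100", "a199"),
                new FileRange("f2.parquet", "a200", "a299"),
                new FileRange("f3.parquet", "a150", "a250"));
        // Only f1 and f3 can contain the key; f2 is skipped without being read.
        System.out.println(candidateFiles(files, "a180")); // prints [f1.parquet, f3.parquet]
    }
}
```

Ranges can overlap (f1 and f3 here), so pruning narrows the candidate set but does not by itself prove a key is present; that is why a membership check such as a bloom filter would still run on the remaining files.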

Re: [DISCUSS] Next Apache Release(0.5.2)

2020-02-18 Thread Vinoth Chandar
+1 on vinoyang as the release manager. +1 on making a shorter 0.5.2 release. My only suggestion is to give this release a concrete focus: "ensuring the Hudi release is fully Apache compliant" (so it will count towards graduation). If you all agree, off the top of my mind, we probably need to

Re: [DISCUSS] Adding common errors and solutions to FAQs

2020-02-18 Thread Vinoth Chandar
You have it now! Thanks for driving this, Pratyaksh! On Mon, Feb 17, 2020 at 5:51 AM Pratyaksh Sharma wrote: > I can help do the initial setup of the troubleshooting guide if no one else is > willing to drive this. Need perms for the same. > > On Mon, Feb 10, 2020 at 12:48 PM Vinoth Chandar

Re: Refactor and enhance Hudi Transformer

2020-02-18 Thread Vinoth Chandar
Thanks Hamid and Vinoyang for the great discussion. On Fri, Feb 14, 2020 at 5:18 AM vino yang wrote: > I have filed a Jira issue[1] to track this work. > > [1]: https://issues.apache.org/jira/browse/HUDI-613 > > On Thu, Feb 13, 2020 at 9:51 PM, vino yang wrote: > > > Hi hamid, > > > > Agree with your opinion.

Re: Discussion Thread: HUDI File Listing and Query Planning Improvements

2020-02-18 Thread vino yang
Hi Balajee, Big +1 for the RFC, good optimization mechanism. Best, Vino On Tue, Feb 18, 2020 at 1:27 PM, vbal...@apache.org wrote: > > Big +1 on the requirement. This would also help datasets using cloud > storage by avoiding costly listings there. Will look closely at the design > and implementation in RFC

Re: [DISCUSS] Next Apache Release(0.5.2)

2020-02-18 Thread vino yang
Hi Leesf, Thanks for kicking off the discussion. +1 for planning to release Hudi 0.5.2. Version 0.5.2 is a minor release ahead of 0.6, and releasing it sooner can fix some small problems. Since releasing Hudi 0.5.1, we have also fixed some bugs and developed some features. So it is

[DISCUSS] Next Apache Release(0.5.2)

2020-02-18 Thread leesf
Hello all, In the spirit of making Apache Hudi (incubating) releases at a regular cadence, we are starting this thread to kickstart the planning and preparatory work for the next release (0.5.2). As 0.5.2 is a minor release version and contains some features, bug fixes, code cleanup and some apache

Hudi on EMR syncing GLUE catalog issue

2020-02-18 Thread Igor Basko
Hi dear list, I'm trying to catalog Hudi files in the Glue catalog using the Hive sync tool, invoked through the Spark save function (not the standalone version). I've created an EMR cluster with only the Spark application (without Hive). I also added the following Hive metastore client factory class