Re: the best approach to contribute lots of safe non-breaking micro-fixes to clean up the whole code base

2024-02-27 Thread Vinoth Chandar
Thanks for starting this. I think refactoring without clear end goals could cause a bunch of thrashing. There are some active code restructuring efforts (see the storage abstraction, file group reader, etc.). Someone can speak up if they need some help there. But that said, there are plenty of

Re: Invitation to contribute to OneTable

2023-12-05 Thread Vinoth Chandar
Thanks for including the Hudi community! I am happy to participate and this project helps us expose Hudi data into all available engines out there. So excited for that. On Mon, Dec 4, 2023 at 1:28 PM Jesus Camacho Rodriguez wrote: > Hi All, > > We are reaching out regarding a new project in

Re: [VOTE] Release 1.0.0-beta1, release candidate #1

2023-11-13 Thread Vinoth Chandar
+1 (binding) On Sun, Nov 12, 2023 at 10:07 PM Y Ethan Guo wrote: > +1 (binding) > > - Source, bundle validation pass > - Ran Spark Quickstart (Datasource in Scala, SQL) on Spark 3.3 > - Ran long-running Hudi streamer jobs writing COW and MOR tables > > On Sat, Nov 11, 2023 at 12:24 AM sagar

Request for review : RFC-66 / Non Blocking Concurrency Control

2023-09-28 Thread Vinoth Chandar
Hi all, We have some promising results and a more finalized approach for newer concurrency control for Hudi 1.0. Please help review this rfc https://github.com/apache/hudi/pull/7907 Thanks Vinoth

Re: Calling for 0.12.4 release

2023-09-22 Thread Vinoth Chandar
+1 thanks Yue! On Thu, Sep 21, 2023 at 18:19 Danny Chan wrote: > Thanks Yue Zhang for the contribution ~ > > Best, > Danny > > On Sat, Sep 2, 2023 at 00:24 Y Ethan Guo wrote: > > > > Thanks, Yue Zhang, for volunteering to be the RM! > > > > On Thu, Aug 31, 2023 at 4:38 PM Yue Zhang > wrote: > > > > > Hi

Re: [VOTE] Release 0.14.0, release candidate #2

2023-09-14 Thread Vinoth Chandar
For all, link [2] should be https://dist.apache.org/repos/dist/dev/hudi/hudi-0.14.0-rc2/ On Wed, Sep 13, 2023 at 11:53 AM Prashant Wason wrote: > Hi everyone, > > Please review and vote on the *release candidate #2* for the version > 0.14.0, as follows: > > [ ] +1, Approve the release > > [ ]

Re: [DISCUSS] Multi-table transactions

2023-08-30 Thread Vinoth Chandar
+1 Reviewed the RFC. Looks like a promising direction to take. On Thu, Aug 24, 2023 at 9:26 AM sagar sumit wrote: > Hi devs, > > RFC-69 proposes some exciting features and in line with that vision, > I would like to propose support for multi-table transactions in Hudi. > > As the name suggests,

Re: [DISCUSS] Release Manager for 1.0

2023-08-16 Thread Vinoth Chandar
Awesome! that was easy. lets go! On Wed, Aug 16, 2023 at 5:32 AM sagar sumit wrote: > Hi Vinoth, > > 1.0 seems to be packed with exciting features. > I would be glad to volunteer as the release manager. > > Regards, > Sagar > > On Wed, Aug 16, 2023 at 5:24 PM Vinoth C

[DISCUSS] Release Manager for 1.0

2023-08-16 Thread Vinoth Chandar
Hi PMC/Committers, We are looking for a volunteer to act as release manager for the 1.0 release. https://cwiki.apache.org/confluence/display/HUDI/1.0+Execution+Planning Anyone interested? Thanks Vinoth

Re: DISCUSS Hudi 1.x plans

2023-08-16 Thread Vinoth Chandar
eft some feedback. > > On Wed, 10 May 2023 at 06:56, Vinoth Chandar wrote: > > > > All - the RFC is up here. Please comment on the PR or use the dev list to > > discuss ideas. > > https://github.com/apache/hudi/pull/8679/ > > > > On Mon, May 8, 2023 at 11:43

Re: [Feature Request] Support "faking" hudi commit time with the value of some field in the record

2023-08-16 Thread Vinoth Chandar
(sorry for the late reply) Hi - the commit time can be a logical time as well; a lot of tests work this way. Some table features (e.g., time-based cleaning) may not work, but those are more convenience features anyway. I assume the consumer would process all events at the required

Re: [DISCUSS] Hudi Reverse Streamer

2023-08-16 Thread Vinoth Chandar
> > use > > > > case to do hudi => Kafka and would enjoy building a more general > tool. > > > > > > > > However we need a rfc basis to start some effort in the right way > > > > > > > > On April 12, 2023 3:08:22 AM UTC, Vinot

Re: Record level index with not unique keys

2023-08-16 Thread Vinoth Chandar
Hi, yes, the indexing DAG can support this today, and even if not, it can be easily fixed. The main issue would be how we encode the mapping well. For example, if we want a map from user_id to all events that belong to the user, we need a different, scalable way of storing this mapping. I can organize this

Re: [DISCUSS] Should we support a service to manage all deltastreamer jobs?

2023-08-16 Thread Vinoth Chandar
+1 there are RFCs on table management services, but not specific to deltastreamer itself. Are you proposing building something specific to that? On Wed, Jun 14, 2023 at 8:26 AM Pratyaksh Sharma wrote: > Hi, > > Personally I am in favour of creating such a UI where monitoring and > managing

Re: About 0.14.0 Release Timeline

2023-06-21 Thread Vinoth Chandar
+1 from me. On Wed, Jun 21, 2023 at 8:35 AM Prashant Wason wrote: > Hello Everyone, > > I would like to start the discussion on the 0.14.0 release timeline. How > about Jun 30 for feature freeze and July 15 for creating the release > branch? > > > Thanks > Prashant Wason > RM for 0.14.0 >

Re: [Action required] Default Spark profile changed to 3.2

2023-06-02 Thread Vinoth Chandar
Hi, Just tried doing a mvn clean install -DskipTests, and the build failed. My local SPARK_HOME is pointing to spark 3.3 installation. Does that all matter now? Quite possible this is an issue with my setup, just flagging. Thanks Vinoth On Fri, May 26, 2023 at 8:30 AM Shiyan Xu wrote: > Hi

Re: [DISCUSSION] Simplify code structure for supporting multiple Spark versions in Hudi

2023-06-02 Thread Vinoth Chandar
This is a good topic, thanks for raising this. Overall, our reliance on Spark classes/APIs that are declared experimental is an issue on paper. But there are few other ways to get the right performance without relying on these. This has been the tricky issue IMO. Thoughts? I'll review the code

Re: [ANNOUNCE] Apache Hudi 0.13.1 released

2023-06-02 Thread Vinoth Chandar
Thanks for driving this! On Wed, May 31, 2023 at 10:00 Yue Zhang wrote: > The Apache Hudi team is pleased to announce the release of Apache Hudi > 0.13.1 > > Apache Hudi (pronounced Hoodie) stands for Hadoop Upserts Deletes and > Incrementals. Apache Hudi manages storage of large analytical

Re: DISCUSS Hudi 1.x plans

2023-05-10 Thread Vinoth Chandar
All - the RFC is up here. Please comment on the PR or use the dev list to discuss ideas. https://github.com/apache/hudi/pull/8679/ On Mon, May 8, 2023 at 11:43 PM Vinoth Chandar wrote: > I have claimed RFC-69, per our process. > > On Mon, May 8, 2023 at 9:19 PM Vinoth Chandar wrote

Re: Calling for 0.13.1 Release

2023-05-09 Thread Vinoth Chandar
Looks like the PR landed. On Thu, May 4, 2023 at 1:27 PM nicolas paris wrote: > Hi, any timeline for the 0.13.1 bugfix release ? > may that one be added to the prep branch > https://github.com/apache/hudi/pull/8432 > > > On Thu, 2023-03-09 at 11:21 -0600, Shiyan Xu wrote: > > thanks for

Re: DISCUSS Hudi 1.x plans

2023-05-08 Thread Vinoth Chandar
I have claimed RFC-69, per our process. On Mon, May 8, 2023 at 9:19 PM Vinoth Chandar wrote: > Hi all, > > I have been consolidating all our progress on Hudi and putting together a > proposal for Hudi 1.x vision and a concrete plan for the first version 1.0. > > Will plan

Re: Calling for 0.14.0 Release Manager

2023-05-08 Thread Vinoth Chandar
Great! Look forward to a fantastic 0.14 On Thu, May 4, 2023 at 2:07 PM Sivabalan wrote: > thanks! > > On Wed, 3 May 2023 at 13:40, Prashant Wason > wrote: > > > > I volunteer to drive the 0.14.0. > > > > Thanks > > Prashant > > > > > > On Wed, May 3, 2023 at 1:28 PM Sivabalan wrote: > > > > >

DISCUSS Hudi 1.x plans

2023-05-08 Thread Vinoth Chandar
Hi all, I have been consolidating all our progress on Hudi and putting together a proposal for Hudi 1.x vision and a concrete plan for the first version 1.0. Will plan to open up the RFC to gather ideas across the community in coming days. Thanks Vinoth

Re: [DISCUSS] Hudi Reverse Streamer

2023-04-11 Thread Vinoth Chandar
i Vinoth, > > > > I am aligned with the first reason that you mentioned. Better to have a > > separate tool to take care of this. > > > > On Mon, Apr 3, 2023 at 9:01 PM Vinoth Chandar < > > mail.vinoth.chan...@gmail.com> > > wrote: > > > &

Re: Re: Re: [DISCUSS] split source of kafka partition by count

2023-04-07 Thread Vinoth Chandar
feature, could you help review it? > > > BR, > Kong > > > > > At 2023-04-05 00:19:20, "Vinoth Chandar" wrote: > >Look forward to this! could really help backfill/rebootstrap scenarios. > > > >On Tue, Apr 4, 2023 at 9:18 AM Vinoth Chand

Re: Re: [DISCUSS] split source of kafka partition by count

2023-04-04 Thread Vinoth Chandar
Look forward to this! could really help backfill/rebootstrap scenarios. On Tue, Apr 4, 2023 at 9:18 AM Vinoth Chandar wrote: > Thinking out loud. > > 1. For insert operations, it should not matter anyway. > 2. For upsert etc, the preCombine would handle the ordering problems. >

Re: Re: [DISCUSS] split source of kafka partition by count

2023-04-04 Thread Vinoth Chandar
nd hudi does not require any guarantees > about the ordering of kafka events. > > > I already filed one JIRA[https://issues.apache.org/jira/browse/HUDI-6019], > could you help assign the JIRA to me? > > > > > > > > At 2023-04-03 23:27:13, "Vinoth Chanda

Re: What precombine field really is used for and its future?

2023-04-04 Thread Vinoth Chandar
This current thread is another example of a practical need for pre combine field. "[DISCUSS] split source of kafka partition by count" On Tue, Apr 4, 2023 at 7:31 AM Vinoth Chandar wrote: > Thanks for raising this issue. > > Love to use this opp to share more context on

Re: What precombine field really is used for and its future?

2023-04-04 Thread Vinoth Chandar
Thanks for raising this issue. Love to use this opportunity to share more context on why the preCombine field exists. - As you probably inferred already, we needed to eliminate duplicates while dealing with out-of-order data (e.g., database change records arriving in different orders from two
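
The deduplication idea in the message above can be sketched roughly as follows. This is only an illustration in Python, not Hudi's implementation: the function `precombine` and the field names `id` and `ts` are hypothetical, and it assumes that "latest" means the largest value of a single ordering field.

```python
from typing import Any, Dict, Iterable, List


def precombine(
    records: Iterable[Dict[str, Any]],
    key_field: str,
    ordering_field: str,
) -> List[Dict[str, Any]]:
    """Keep one record per key: the one with the largest ordering value.

    Hypothetical helper for illustration; arrival order does not affect
    which record wins unless two records tie on the ordering field, in
    which case the later arrival is kept.
    """
    latest: Dict[Any, Dict[str, Any]] = {}
    for record in records:
        key = record[key_field]
        current = latest.get(key)
        if current is None or record[ordering_field] >= current[ordering_field]:
            latest[key] = record
    return list(latest.values())


# Two change records for id=1 arrive out of order (newer one first).
batch = [
    {"id": 1, "ts": 20, "value": "new"},
    {"id": 1, "ts": 10, "value": "old"},
    {"id": 2, "ts": 5, "value": "only"},
]
deduped = precombine(batch, key_field="id", ordering_field="ts")
```

With this batch, the record with `ts` 20 survives for `id` 1 even though the older record arrived after it.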

Re: [DISCUSS] Hudi Reverse Streamer

2023-04-03 Thread Vinoth Chandar
without success. > > Kind Regards, > David > > Sent from Outlook for Android<https://aka.ms/AAb9ysg> > ________ > From: Vinoth Chandar > Sent: Friday, March 31, 2023 5:09:52 AM > To: dev > Subject: [DISCUSS] Hudi Reverse Streamer > > Hi a

Re: [DISCUSS] split source of kafka partition by count

2023-04-03 Thread Vinoth Chandar
Hi, Does your implementation read out offset ranges from Kafka partitions? which means - we can create multiple spark input partitions per Kafka partitions? if so, +1 for overall goals here. How does this affect ordering? Can you think about how/if Hudi write operations can handle potentially

Re: Re: Re: DISCUSS

2023-03-30 Thread Vinoth Chandar
, I am very happy to receive your reply. Here are some of my > thoughts。 > > At 2023-03-21 23:32:44, "Vinoth Chandar" wrote: > >>but when it is used for data expansion, it still involves the need to > >redistribute the data records of some data files, thus affect

Re: When using the HoodieDeltaStreamer, is there a corresponding parameter that can control the number of cycles? For example, if I cycle 5 times, I stop accessing data

2023-03-30 Thread Vinoth Chandar
I believe there is no control today. You could hack a precommit validator and call System.exit if you want ;) (ugly, I know) But maybe we could introduce some abstraction to do a check between loops? or allow users to plugin some logic to decide whether to continue or exit? Love to understand

Re: [DISCUSS] Hudi Reverse Streamer

2023-03-30 Thread Vinoth Chandar
i derived data) ==> Hudi Reverse Streamer ==> (Data Warehouse/Kafka/Operational Database) On Thu, Mar 30, 2023 at 8:09 PM Vinoth Chandar wrote: > Hi all, > > Any interest in building a reverse streaming tool, that does the reverse > of what the DeltaStreamer tool does? It will re

[DISCUSS] Hudi Reverse Streamer

2023-03-30 Thread Vinoth Chandar
Hi all, Any interest in building a reverse streaming tool, that does the reverse of what the DeltaStreamer tool does? It will read Hudi table incrementally (only source) and write out the data to a variety of sinks - Kafka, JDBC Databases, DFS. This has come up many times with data warehouse

Re: [BIG CHANGE] Switch logger from log4j2 to slf4j

2023-03-22 Thread Vinoth Chandar
+1 as long as we don't break logging/bundling across all the 7 odd engines Hudi is integrated into :) On Wed, Feb 22, 2023 at 12:41 AM Danny Chan wrote: > Many popular Apache projects use slf4j now to avoid unnecessary > conflicts, like the Apache Spark, Apache Flink,etc. slf4j is a bridge >

Re: Re: DISCUSS

2023-03-21 Thread Vinoth Chandar
to do without a new meta field? On Thu, Mar 16, 2023 at 2:22 AM 吕虎 wrote: > Hello, > I feel very honored that you are interested in my views. > > Here are some of my thoughts marked with blue font. > > At 2023-03-16 13:18:08, "Vinoth Chandar" wrote: > &

Re: About for 0.12.3 Release Timeline

2023-03-21 Thread Vinoth Chandar
Hi, Given there are some critical regressions set to go, I would prefer to scope down 0.12.3 to just the few PRs and get something out asap. Once everyone returns, we can drive a 0.12.4 on top? We can then take even till end of April Others, thoughts? On Mon, Mar 20, 2023 at 23:39 Forward Xu

Re: DISCUSS

2023-03-16 Thread Vinoth Chandar
Thanks for the proposal! Some first set of questions here. >You need to pre-select the number of buckets and use the hash function to determine which bucket a record belongs to. >when building the table according to the estimated amount of data, and it cannot be changed after building the table
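
The quoted limitation (a fixed, pre-selected bucket count) can be illustrated with a small Python sketch. It is an assumption-laden example, not Hudi's bucket index: `bucket_for` and `count_moved` are hypothetical helpers, and CRC32 modulo the bucket count stands in for whatever hash function a real implementation uses.

```python
import zlib
from typing import Iterable


def bucket_for(record_key: str, num_buckets: int) -> int:
    """Map a record key to a bucket with a stable hash (CRC32 is an assumption)."""
    if num_buckets <= 0:
        raise ValueError("num_buckets must be positive")
    return zlib.crc32(record_key.encode("utf-8")) % num_buckets


def count_moved(keys: Iterable[str], old_buckets: int, new_buckets: int) -> int:
    """Count keys whose bucket changes when the bucket count changes."""
    return sum(
        1
        for key in keys
        if bucket_for(key, old_buckets) != bucket_for(key, new_buckets)
    )


keys = [f"user-{i}" for i in range(1000)]
moved = count_moved(keys, old_buckets=4, new_buckets=8)
```

With plain modulo hashing, changing the bucket count generally reassigns many keys, which is why the count is hard to change once data has been written.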

Re: [DISCUSS] Build tool upgrade

2023-02-13 Thread Vinoth Chandar
This is cool! :) On Mon, Feb 13, 2023 at 2:02 PM Daniel Kaźmirski wrote: > Hi, > > I did try to add the mentioned extension to Hudi pom. Here are the results: > > Clean with cache extension disabled > mvn clean package -DskipTests -Dspark3.3 -Dscala-2.12 > -Dmaven.build.cache.enabled=false >

Re: [ANNOUNCE] Apache Hudi 0.12.1 released

2022-10-19 Thread Vinoth Chandar
Great job everyone! On Wed, Oct 19, 2022 at 07:11 zhaojing yu wrote: > The Apache Hudi team is pleased to announce the release of Apache Hudi > 0.12.1. > > Apache Hudi (pronounced Hoodie) stands for Hadoop Upserts Deletes > and Incrementals. Apache Hudi manages storage of large analytical >

Re: [DISCUSS] Hudi data TTL

2022-10-18 Thread Vinoth Chandar
+1 love to discuss this on a RFC proposal. On Tue, Oct 18, 2022 at 13:11 Alexey Kudinkin wrote: > That's a very interesting idea. > > Do you want to take a stab at writing a full proposal (in the form of RFC) > for it? > > On Tue, Oct 18, 2022 at 10:20 AM Bingeng Huang > wrote: > > > Hi all, >

Re: [DISCUSS] Build tool upgrade

2022-09-30 Thread Vinoth Chandar
Hi Raymond. This would be a large undertaking and a big change for everyone. What does the build time look like if we switch to gradle or bazel? And do we know why it takes 10 min to build and why is that not okay? Given we all use IDEs mostly anyway Thanks Vinoth On Fri, Sep 30, 2022 at 22:48

Re: 0.12.1 release timeline

2022-09-20 Thread Vinoth Chandar
> > > > Syncing non-partitioned table has bugs around partition parameters > > > <https://github.com/apache/hudi/pull/6525> > > > bootstrap bug fixes: https://github.com/apache/hudi/pull/6694 and > > > https://github.com/apache/hudi/pull/6676 > > > >

Re: 0.12.1 release timeline

2022-09-19 Thread Vinoth Chandar
tbh the RM can make this call. Whether or not 1 week is aggressive really depends on the scope of the release and what's left to land/test. Would it be useful to frame the discussion in that way? On Mon, Sep 19, 2022 at 1:25 PM zhaojing yu wrote: > Does anyone else have any suggestions? > We will

Re: [ANNOUNCE] Apache Hudi 0.12.0 released

2022-08-18 Thread Vinoth Chandar
Great job, Sagar! Huge congratulations to the entire community in getting this out! On Thu, Aug 18, 2022 at 10:45 PM sagar sumit wrote: > The Apache Hudi team is pleased to announce the release of Apache Hudi > 0.12.0. > > Apache Hudi (pronounced Hoodie) stands for Hadoop Upserts Deletes > and

Re: [VOTE] Release 0.12.0, release candidate #2

2022-08-14 Thread Vinoth Chandar
+1 (binding) On Sun, Aug 14, 2022 at 14:50 Bhavani Sudha wrote: > +1 (binding) > > > [OK] Build successfully all supported spark version > > [OK] Ran validation script > > [OK] Ran quickstart tests with spark 2.4 > > [OK] Ran some IDE tests > > > sudha[9:33:26] scripts %

Re: 0.12.0 Release Timeline

2022-08-10 Thread Vinoth Chandar
Hello. Any updates on RC2? :) On Sat, Aug 6, 2022 at 10:36 AM sagar sumit wrote: > Hi folks, > > Thanks for voting on RC1. > I will be preparing RC2 by Monday, 8th August end of day PST, > and I will send out a separate voting email for RC2. > > Regards, > Sagar > > On Fri, Jul 29, 2022 at

Re: [DISCUSS]: Integrate column stats index with all query engines

2022-08-10 Thread Vinoth Chandar
+1 for this. Suggested new reviewers on the RFC. https://github.com/apache/hudi/pull/6345/files#r943073339 On Wed, Aug 10, 2022 at 9:56 PM Pratyaksh Sharma wrote: > Hello community, > > With the introduction of multi modal index in Hudi, there is a lot of scope > for improvement on the

Re: Joining Slack workspace

2022-07-31 Thread Vinoth Chandar
Hi Siva, has the site link been fixed? Thanks Vinoth On Fri, Jul 29, 2022 at 11:06 AM Sivabalan wrote: > thanks for confirming. > > On Fri, 29 Jul 2022 at 09:35, Ken Krugler > wrote: > > > That worked, thanks! > > > > — Ken > > > > > On Jul 28, 2022, at 8:11 PM, Sivabalan wrote: > > > > > >

Re: 0.12.0 Release Timeline

2022-07-14 Thread Vinoth Chandar
+1 from me. On Thu, Jul 14, 2022 at 9:43 AM sagar sumit wrote: > Hi Folks, > > After some deliberation with the community and keeping the release blockers > < > https://github.com/apache/hudi/pulls?q=is%3Apr+is%3Aopen+label%3Apriority%3Ablocker > > > in > mind, > I am proposing a new date for

Re: native interface support

2022-07-12 Thread Vinoth Chandar
Hi all, Overall +1. Love to take this idea forward. At the very least, a good C++ API for HoodieTimeline and HoodieTableFileSystemView should be enough to get COW working end-end. One issue is the lack of a standard HFile reader in C++ and given a lot of our metadata layer uses the HFile format,

Re: [DISCUSS] Diagnostic reporter

2022-06-15 Thread Vinoth Chandar
+1 from me. It will be very useful if we can have something that can gather troubleshooting info easily. This part takes a while currently. On Mon, May 30, 2022 at 9:52 AM Shiyan Xu wrote: > Hi all, > > When troubleshooting Hudi jobs in users' environments, we always ask users > to share

Re: [DISCUSS] Hudi sync meetings for Chinese community

2022-05-26 Thread Vinoth Chandar
Great! Thanks for volunteering On Thu, May 26, 2022 at 02:09 Shiyan Xu wrote: > Awesome! looking forward to an initial proposal! > > On Thu, May 26, 2022 at 4:17 PM Shimin Yang wrote: > > > Hi Shiyan, I'm from bytedance data lake team, and our team would like to > > drive and host the hudi

Re: 0.11.1 release timeline

2022-05-24 Thread Vinoth Chandar
+1 as well. On Mon, May 23, 2022 at 23:59 Shiyan Xu wrote: > +1 on the timeline. So to clarify, basically 06/01 is to cut RC1 and it > will be released if all testing/checks pass, right? > > On Tue, May 24, 2022 at 12:59 PM Y Ethan Guo wrote: > > > Hi folks, > > > > As the RM for the 0.11.1

Re: [VOTE] Monthly Community Sync Time

2022-05-17 Thread Vinoth Chandar
+1 for changing. 9AM is my preference On Tue, May 17, 2022 at 1:20 PM Bhavani Sudha wrote: > Hi everyone, > > The Community sync happens last Wednesday of every month. Currently it is > scheduled at 7 am which is way too early for a lot of folks. Following are > proposed times for the meeting.

Re: [ANNOUNCE] Apache Hudi 0.11.0 released

2022-05-02 Thread Vinoth Chandar
+1 this was a very well coordinated release. Took tons of dedication. Thank you Raymond! On Mon, May 2, 2022 at 20:24 Forward Xu wrote: > Thank you raymond for your hard work and dedication. > > forwardxu > best > > On Tue, May 3, 2022 at 09:51 Shiyan Xu wrote: > > > The Apache Hudi team is pleased to

Re: Spark structured streaming and Spark SQL improvements

2022-04-27 Thread Vinoth Chandar
Thanks for the thoughtful note, Daniel! All of 1-3 look good to me. Yann/Raymond or other Spark usuals here, any thoughts on adding these for 0.12? In 0.12, we want to get schema evolution to GA. That's also a very useful suggestion. Tao (author for Schema evolution), any thoughts? On Mon, Apr 25,

Re: [DISSCUSS][NEW FEATURE] Hudi Lake Manager

2022-04-27 Thread Vinoth Chandar
I left my thoughts on the RFC https://github.com/apache/hudi/pull/4309 I just see this as a another deployment model where a centralized set of microservices take up scheduling, execution of Hudi's table services. +1 on thinking about sharding,locking and HA upfront. Thanks Vinoth On Thu, Apr

Re: [DISCUSS] hudi index improve

2022-04-27 Thread Vinoth Chandar
Hi all, This is a great discussion and nice to see how all of this is coming together. Penning down my thoughts. A) +1 on exposing INDEX syntax, we can start with Spark/Flink where we have full control on connectors and iterate faster. B) Do we need a manual refresh mode? Almost all databases

Re: [VOTE] Release 0.11.0, release candidate #3

2022-04-26 Thread Vinoth Chandar
+1 (binding) Ran RC checks. Passed On Sun, Apr 24, 2022 at 6:18 AM Shiyan Xu wrote: > Hi everyone, > > Please review and vote on the release candidate #3 for the version 0.11.0, > as follows: > > [ ] +1, Approve the release > > [ ] -1, Do not approve the release (please provide specific

Re: [DISCUSS] Hudi community sync time

2022-04-26 Thread Vinoth Chandar
+1 as well. Current PST times are pretty hard for many folks. On Sat, Apr 16, 2022 at 6:20 AM Gary Li wrote: > +1 for splitting into two sessions. The current schedule is challenging for > both US and Chinese folks. We can organize another session for the Chinese > timezone. > > Calling out for

Re: spark 3.2.1 built-in bloom filters

2022-04-04 Thread Vinoth Chandar
is if agreed > > On Wed, 2022-03-30 at 14:36 -0700, Vinoth Chandar wrote: > > Hi, > > > > I noticed that it finally landed. We actually began tracking that > > JIRA > > while initially writing Hudi at Uber.. Parquet + Bloom Filters has > > taken > > ju

Re: [ANNOUNCE] New Apache Hudi Committer - Zhaojing Yu

2022-03-31 Thread Vinoth Chandar
Congrats! On Thu, Mar 31, 2022 at 4:06 AM leesf wrote: > Congrats! > > On Thu, Mar 31, 2022 at 17:03 Vino Yang wrote: > > > Congrats! > > > > Best, > > Vino > > > > On Fri, Mar 25, 2022 at 19:11 Gary Li wrote: > > > > > > Congrats! > > > > > > Best, > > > Gary > > > > > > On Fri, Mar 25, 2022 at 4:07 PM Shiyan Xu > > >

Re: spark 3.2.1 built-in bloom filters

2022-03-30 Thread Vinoth Chandar
Hi, I noticed that it finally landed. We actually began tracking that JIRA while initially writing Hudi at Uber.. Parquet + Bloom Filters has taken just a few years :) I think we could switch out to reading the built-in bloom filters as well. it could make the footer reading lighter potentially.

Re: [DISCUSS] New RFC to support Lock-free concurrency control on Merge-on-read tables

2022-03-24 Thread Vinoth Chandar
+1. Love to be a co-author on the RFC, if you are open to it. On Mon, Mar 21, 2022 at 12:31 PM 冯健 wrote: > Hi team, > > The situation is Optimistic concurrency control(OCC) has some limitation > >- > >When conflicts do occur, they may waste massive resources during every >attempt

Re: 0.11.0 release timeline

2022-03-22 Thread Vinoth Chandar
+1 from me, as long as we don’t push it out more. On Tue, Mar 22, 2022 at 12:29 Raymond Xu wrote: > Ok Vinoth, thanks for highlighting this. BigQuery integration is an > important feature to add to 0.11.0. I also see some other inflight work > from the backlog. To accommodate this and other

Re: [DISCUSS] New RFC to create LogCompaction action for MOR tables?

2022-03-21 Thread Vinoth Chandar
+1 overall On Sat, Mar 19, 2022 at 5:02 PM Surya Prasanna wrote: > Hi Sagar, > Sorry for the delay in response. Thanks for the questions. > > 1. Trying to understand the main goal. Is it to balance the tradeoff > between read and write amplification for metadata table? Or is it purely to >

Re: Unbundling "spark-avro" dependency

2022-03-08 Thread Vinoth Chandar
Thanks Alexey. This was actually the case for a while now, I think. From what I can see, our quickstart for spark still suggests passing spark-avro in via --packages, but utilities bundle related examples are relying on the fact that this is pre-bundled. I do acknowledge that with recent Spark

Re: Next stop : Minor Or Major release?

2022-02-17 Thread Vinoth Chandar
+1 on B as well. same rationale as Raymond's. I think we have all major chunks landed or PRs up. Love to provide integration testing before the release. On Thu, Feb 17, 2022 at 4:25 PM Raymond Xu wrote: > I'm +1 to B. There are really awesome features planned for 0.11.0. Hoping > to see these

Re: [DISCUSS] Change data feed for spark sql

2022-02-14 Thread Vinoth Chandar
Hi all, I would love to not introduce new constructs like "timestamp", "snapshots". Hudi already has a clear notion of commit times that can unlock this. Can we just use this as an opportunity to standardize the incremental query's schema? In fact, don't we already have change feed with our

Re: [DISCUSS] Dropping Spark 3.0.x support in 0.11

2022-01-23 Thread Vinoth Chandar
+1 for this. The rate of API breakages across minor Spark versions is a bit untenable anyway. On Wed, Jan 12, 2022 at 1:22 AM Raymond Xu wrote: > Hi Chen, > > yes this is actually been worked on by liujinhui > https://issues.apache.org/jira/browse/HUDI-2370 > > and it's planned for 0.11 > > -- >

Re: [VOTE] Release 0.10.1, release candidate #2

2022-01-22 Thread Vinoth Chandar
n 2022 at 00:08, Vinoth Chandar wrote: > > > -1 > > > > The artifact version is wrong! It should be 0.10.*1* > > > > > >- hudi-0.10.0-rc2.src.tgz > >< > > > https://dist.apache.org/repos/dist/dev/hudi/hudi-0.10.0-rc2/hudi-0.10.0-rc2

Re: [VOTE] Release 0.10.1, release candidate #2

2022-01-21 Thread Vinoth Chandar
-1 The artifact version is wrong! It should be 0.10.*1* - hudi-0.10.0-rc2.src.tgz - hudi-0.10.0-rc2.src.tgz.asc

Re: [DISCUSS] New RFC? Add Call Procedure Command for spark sql

2022-01-10 Thread Vinoth Chandar
+1 please start a RFC On Fri, Jan 7, 2022 at 5:50 AM Forward Xu wrote: > Hi All, > > I want to add Call Procedure Command to spark sql, which will be very > useful to meet DDL and DML functions that cannot be handled. I can think of > the following 4 aspects: > - Commit management > - Metadata

Re: 0.10.1 Release timeline

2021-12-28 Thread Vinoth Chandar
When we say code freeze, does it mean that all commits that need to be part of 0.10.1 be in master by Jan 7? Thanks Vinoth On Tue, Dec 28, 2021 at 11:02 AM Sivabalan wrote: > Hi folks, > As agreed upon in another thread, wanted to propose a timeline for > 0.10.1 minor release(bug fix

Re: Regular minor/patch releases

2021-12-28 Thread Vinoth Chandar
week is when we can target 0.10.1. we might need atleast a month from > major release to have accrued some bug fixes and hence. > > Open to hear thoughts from the community. > > > > On Wed, Dec 15, 2021 at 2:15 PM Vinoth Chandar wrote: > > > Hi all, > > > &g

Re: Preparation for 0.10.1 minor release

2021-12-20 Thread Vinoth Chandar
Hi Siva, Can we use "fix version(s)" to track this? We can tag multiple fix versions with each JIRA. RM's work can be just skim the commits landing and mark candidates for 0.10.1, keep cherry-picking to a 0.10.1 feature branch.? Thanks Vinoth On Mon, Dec 20, 2021 at 8:42 AM Sivabalan wrote: >

PSA, PR merges slowed down due to flaky CI

2021-12-16 Thread Vinoth Chandar
Hi all, We have been fighting some flakiness in the CI. I think we were able to resolve the IT tests continuously failing in Azure due to memory issues. There are still some residual issues. We would like to take some time to resolve these before we push on with the PR backlog we have. Appreciate

Re: Regular minor/patch releases

2021-12-15 Thread Vinoth Chandar
here should only include the bug fixes, no > > > breaking change, no feature, it should not be a hard work i think. > > > > > > Best, > > > Danny > > > > > > On Tue, Dec 14, 2021 at 4:06 AM Sivabalan wrote: > > > > > > > +1 in general. but

Regular minor/patch releases

2021-12-13 Thread Vinoth Chandar
Hi all, In the past we had plans for minor releases [1], but invariably we end up doing major ones, which also deliver the bug fixes. The reason was the cost involved in doing a release. We have made some good progress towards regression/integration test, which prompts me to revive this. What

Re: [DISCUSS] Propose Consistent Hashing Indexing for Dynamic Bucket Number

2021-12-13 Thread Vinoth Chandar
+1 on the overall idea. I am wondering if we can layer this on top of Hash Index as a way for just expanding the number of buckets. While Split/Merge sounds great, IMO there is significant operational overhead to it. Most practical scenarios can be met with ability to expand with zero impact as

Re: [VOTE] Release 0.10.0, release candidate #3

2021-12-04 Thread Vinoth Chandar
+1 (binding) Ran the RC checks in [1] . This is a huge release, thanks everyone for all the hard work! [1] https://gist.github.com/vinothchandar/68b34f3051e41752ebffd6a3edeb042b On Sat, Dec 4, 2021 at 5:20 AM Danny Chan wrote: > Hi everyone, > > Please review and vote on the release candidate

Re: [DISCUSS] Move to Spark DataSource V2 API

2021-11-21 Thread Vinoth Chandar
Hi all, Sorry. Bit late to the party here. +1 on kicking this off and +1 on reusing the work Raymond has already kickstarted here. I think we are in a good position to roll with this approach. The biggest issue with V2 on the writing side remains the fact that we cannot really "shuffle" data

Re: [DISCUSS] Hudi 0.10.0 Release

2021-11-19 Thread Vinoth Chandar
5] Rolling Upgrade downgrade story for 0.10 & enabling >metadata (Owner: Manoj Govindassamy) >- [HUDI-2478] Handle failure mid-way during init buckets (Owner: Vinoth >Chandar) >- [HUDI-2480] FileSlice after pending compaction-requested instant-time >is

Re: [DISCUSS] RFC for Synchronous Metadata table for File listing

2021-11-12 Thread Vinoth Chandar
+1 on this. On Fri, Nov 5, 2021 at 9:17 AM Sivabalan wrote: > RFC-15 > < > https://cwiki.apache.org/confluence/display/HUDI/RFC+-+15%3A+HUDI+File+Listing+Improvements > > > made an attempt to boost performance of file listing by storing all file > information in metadata table. As we are

Re: [DISCUSS] Hudi Community Communication Updates

2021-11-10 Thread Vinoth Chandar
+1 for this. We will also archive all community activity on ASF infrastructure this way! On Wed, Nov 10, 2021 at 7:14 AM Pratyaksh Sharma wrote: > Hi Rajesh, > > I do not have any strong opinions for/against point #1. > > Point #2 definitely seems useful to me. > I hope messages from #general

Re: [DISCUSS] RFC-27 for Data skipping/column stats index rewrite -> github RFC

2021-11-08 Thread Vinoth Chandar
+1 from me. Please preserve the old page, since it has a bunch of discussions in the community as well On Fri, Nov 5, 2021 at 3:37 PM Sivabalan wrote: > Hey folks, > We have already put up RFC-27 data skipping/column stats index here > < >

Re: [DISCUSS] Trino Plugin for Hudi

2021-11-05 Thread Vinoth Chandar
Could we please kick off an RFC for this? On Thu, Nov 4, 2021 at 8:58 PM sagar sumit wrote: > I have created an umbrella JIRA to track this story: > https://issues.apache.org/jira/browse/HUDI-2687 > Please also join #trino-hudi-connector channel in Hudi Slack for more > discussion. > > Regards,

Re: Limitations of non unique keys

2021-11-05 Thread Vinoth Chandar
). Thats the direction > > > we are approaching. > > > > wow this is amazing. I haven't found yet RFC about this, nor ready to > > test PR. > > > > This answer my initial question: with the secondary indexes options > > comming, the hudi key shall be

Re: [DISCUSS] Metadata based bloom index

2021-11-05 Thread Vinoth Chandar
+1 on this. I think cloud storage throttling is more of an issue that causes degradations when tables are enormous. but this approach should nicely handle that as well On Fri, Nov 5, 2021 at 9:31 AM Manoj Govindassamy < manoj.govindass...@gmail.com> wrote: > Hi Hudi Community, > > Hudi has

Re: Release 0.10.0 planning

2021-11-05 Thread Vinoth Chandar
mitters) chime in > here > > for the proposed date. > > > > > > > > On Wed, Nov 3, 2021 at 4:11 AM Vinoth Chandar wrote: > > > > > Folks, may be good to push it by a week. Nov 26 can be the RC cut date. > > > > > > On Mon, Nov 1, 2021 at 7:41 PM V

Re: Limitations of non unique keys

2021-11-03 Thread Vinoth Chandar
> dataset > > > > I wonder if there will be trouble I am unaware of with such trick > > > > On Thu Oct 28, 2021 at 2:33 PM CEST, Vinoth Chandar wrote: > > > Hi, > > > > > > Are you asking if there are advantages to allowing duplicates or not >

Re: feature request/proposal: leverage bloom indexes for readingb

2021-11-03 Thread Vinoth Chandar
hen a full featured / documented hoodie client is maybe the best option > > > thought ? > > > On Thu Oct 28, 2021 at 2:34 PM CEST, Vinoth Chandar wrote: > > Sounds great! > > > > On Tue, Oct 26, 2021 at 7:26 AM Nicolas Paris > > wrote: > > > > &g

Re: Monthly or Bi-Monthly Dev meeting?

2021-11-03 Thread Vinoth Chandar
Hi all, I have shared some times and setup as discussed in this PR. https://github.com/apache/hudi/pull/3914 Thanks Vinoth On Fri, Oct 22, 2021 at 10:50 PM Pratyaksh Sharma wrote: > I can save them all on my external hard disk. :) > > On Fri, Oct 22, 2021 at 8:04 PM Vinoth Chand

Re: Release 0.10.0 planning

2021-11-03 Thread Vinoth Chandar
Folks, may be good to push it by a week. Nov 26 can be the RC cut date. On Mon, Nov 1, 2021 at 7:41 PM Vinoth Chandar wrote: > > Great! Is everyone good with the Nov 19 date? Love to at least do this > before Nov 26, before holidays kick in! > > > On Mon, Nov 1, 2021 at 7:36 PM

Re: Release 0.10.0 planning

2021-11-01 Thread Vinoth Chandar
Great! Is everyone good with the Nov 19 date? Love to at least do this before Nov 26, before holidays kick in! On Mon, Nov 1, 2021 at 7:36 PM Danny Chan wrote: > I can take that. > > Best, > Danny > > On Sat, Oct 30, 2021 at 6:07 AM Vinoth Chandar wrote: > > > Hi all, >

Re: Release 0.10.0 planning

2021-10-29 Thread Vinoth Chandar
Hi all, I propose we cut the RC for 0.10.0 by Nov 19. Any volunteers for release manager? Thanks Vinoth On Sun, Oct 17, 2021 at 10:45 AM Sivabalan wrote: > This release has a lot of exciting features lined up. Eagerly looking > forward to it. > > On Thu, Oct 14, 2021 at 1:

Re: New site/docs navigation

2021-10-28 Thread Vinoth Chandar
ho want to work together to translate? > Please contact me. > > > On Oct 28, 2021, at 20:35, Vinoth Chandar wrote: > > > > Hi all, > > > > https://github.com/apache/hudi/pull/3855 puts up a nice redesign of the > > content, that can showcase all of the Hudi capabilities. P

New site/docs navigation

2021-10-28 Thread Vinoth Chandar
Hi all, https://github.com/apache/hudi/pull/3855 puts up a nice redesign of the content that can showcase all of the Hudi capabilities. Please chime in and help merge the PR. As a follow-on, we can also fix the Chinese site docs after this? Thanks Vinoth
