+1, and we would discuss it further when design docs are available.
Best,
Leesf
Balaji Varadarajan wrote on Tue, Nov 12, 2019 at 4:17 PM:
> +1 on the exporter tool idea.
>
> On Mon, Nov 11, 2019 at 10:36 PM vino yang wrote:
>
> > Hi Shiyan,
> >
> > +1 for this proposal. Also, it looks like an exporter
[1] +1. `views` indeed confused me a lot.
[2] +1. `snapshot` is more reasonable.
[3] I don't feel strongly about renaming it. The current name `COPY_ON_WRITE`
is reasonable considering the cost to rename, and the behavior that a new
version parquet file will be created and seems to be copied from old
+1 on the exporter tool idea.
On Mon, Nov 11, 2019 at 10:36 PM vino yang wrote:
> Hi Shiyan,
>
> > +1 for this proposal. Also, it looks like an exporter tool.
>
> @Vinoth Chandar Any thoughts about where to place it?
>
> Best,
> Vino
>
> Vinoth Chandar wrote on Tue, Nov 12, 2019 at 8:58 AM:
>
> > We can
Agree with all 3 changes. The naming now looks more consistent than
earlier. +1 on them
Depending on whether we are renaming Input formats for (1) and (2) - this
could require some migration steps for
Balaji.V
On Mon, Nov 11, 2019 at 7:38 PM vino yang wrote:
> Hi Vinoth,
>
> Thanks for
+1. This would be a powerful feature which would open up use-cases
requiring repeatable query results.
Balaji.V
On Mon, Nov 11, 2019 at 8:12 AM nishith agarwal wrote:
> Folks,
>
> Starting a discussion thread for enabling time-travel for Hudi datasets.
> Please provide feedback on the RFC
Hi Shiyan,
+1 for this proposal. Also, it looks like an exporter tool.
@Vinoth Chandar Any thoughts about where to place it?
Best,
Vino
Vinoth Chandar wrote on Tue, Nov 12, 2019 at 8:58 AM:
> We can wait for others to chime in as well. :)
>
> On Mon, Nov 11, 2019 at 4:37 PM Shiyan Xu
> wrote:
>
> >
Hi Vinoth,
Thanks for bringing these proposals.
+1 on all three. Especially, big +1 on the third renaming proposal.
When I was a newbie, the "COPY_ON_WRITE" term confused me a lot. It easily
misleads users with the "copy" term and makes them compare it with the
`CopyOnWriteArrayList` data
[1] +1; "query" indeed sounds better
[2] +1 on the term "snapshot"; so basically we follow the convention that
when we say "snapshot", it means "give me the most up-to-date facts (lowest
data latency) even if it takes some query time"
[3] Though I agree with the renaming, I have a different
+1 on all three rename proposals. I think this would make the concepts
super easy to follow for new users.
If changing [3] seems to be a stretch, we should definitely do [1] & [2] at
the least IMO. I will be glad to help out on the renames to whatever extent
possible should the Hudi community
We can wait for others to chime in as well. :)
On Mon, Nov 11, 2019 at 4:37 PM Shiyan Xu
wrote:
> Yes, Vinoth, you're right that it is more of an exporter, which exports a
> snapshot from Hudi dataset.
>
> It should support MOR too; it shall just leverage on existing
> SnapshotCopier logic to
Yes, Vinoth, you're right that it is more of an exporter, which exports a
snapshot from Hudi dataset.
It should support MOR too; it shall just leverage the existing
SnapshotCopier logic to find the latest file slices.
So is it good to create a RFC for further discussion?
On Mon, Nov 11, 2019 at
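For context on the message above: picking the latest file slice per file group, as the SnapshotCopier logic is described to do, essentially means grouping files by file ID and keeping the one from the highest commit time. A minimal Java sketch of that selection, using an illustrative flat (fileId, commitTime, fileName) layout rather than Hudi's actual classes:

```java
import java.util.HashMap;
import java.util.Map;

public class LatestSliceSelector {

    // Given (fileId, commitTime, fileName) triples, keep only the file from
    // the latest commit for each file ID -- i.e. the "latest file slice".
    public static Map<String, String> latestSlices(String[][] files) {
        Map<String, String> latestCommit = new HashMap<>();
        Map<String, String> latestFile = new HashMap<>();
        for (String[] f : files) {
            String fileId = f[0], commitTime = f[1], fileName = f[2];
            String seen = latestCommit.get(fileId);
            // Lexicographic compare is valid only for fixed-width timestamp strings.
            if (seen == null || commitTime.compareTo(seen) > 0) {
                latestCommit.put(fileId, commitTime);
                latestFile.put(fileId, fileName);
            }
        }
        return latestFile;
    }

    public static void main(String[] args) {
        String[][] files = {
            {"f1", "20191111", "f1_20191111.parquet"},
            {"f1", "20191112", "f1_20191112.parquet"},
            {"f2", "20191110", "f2_20191110.parquet"},
        };
        // Only the 20191112 version of f1 survives for file group f1.
        System.out.println(latestSlices(files));
    }
}
```

The string comparison is safe here only because Hudi-style commit times are fixed-width sortable timestamps.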
What you suggest sounds more like an `Exporter` tool? I imagine you will
support MOR as well? +1 on the idea itself. It could be useful if a plain
parquet snapshot were generated as a backup.
On Mon, Nov 11, 2019 at 4:21 PM Shiyan Xu
wrote:
> Hi All,
>
> The existing SnapshotCopier under Hudi
Yes, sounds good. As of now, it's just Kabeer. @kabeer wdyt?
@nishith Personally, timing is an issue for me; if you are willing to
drive, please go ahead! I'll try to make it if possible.
On Mon, Nov 11, 2019 at 8:25 AM nishith agarwal wrote:
> Vinoth,
>
> To meet mid way, how about once in 3
Yep, you are correct that it is throwing the exception because of the
DataSourceUtils.getNestedFieldValAsString.
I can take up the work to fix this behavior if it is not intended. I'd also
like to add extra error messaging and validation because currently it is not
clear to users what the error
Vinoth,
To meet midway, how about once in 3 weeks for Europe and other time zones?
That works fine for me. In the interest of making the meetings useful for
everyone, we can see how productive the meetings are and what % attendance
they get for the initial few, and then maybe we can follow
That overlaps with my office hours. I will try to attend it in 9 PM to 10
PM PST slot only. :)
On Mon, Nov 11, 2019 at 6:07 PM Vinoth Chandar wrote:
> I can make early morning PST meetings.i.e before 6AM.
>
> On Sun, Nov 10, 2019 at 11:22 PM Pratyaksh Sharma
> wrote:
>
> > @Vinoth Chandar
Hi Brandon,
I contributed to the ComplexKeyGenerator some time back. I don't think
this is intended behavior. If you are getting an exception, is it because of
DataSourceUtils.getNestedFieldValAsString(record, recordKeyField)? I can't
think of any other reason why it should throw an exception.
I
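For readers following the ComplexKeyGenerator discussion above: the failure mode is a lookup of a (possibly nested) record-key field that is missing. A minimal sketch of such a dot-path lookup with the clearer error messages proposed above, using plain Java maps instead of Avro records; all names here are illustrative, not Hudi's actual API:

```java
import java.util.Map;

public class NestedFieldLookup {

    // Walk a dot-separated path (e.g. "address.city") through nested maps,
    // failing with a message that names the missing part -- the kind of
    // explicit validation proposed above. Plain maps stand in for Avro records.
    public static String getNestedFieldAsString(Map<?, ?> record, String path) {
        Object current = record;
        for (String part : path.split("\\.")) {
            if (!(current instanceof Map)) {
                throw new IllegalArgumentException(
                    "Cannot descend into non-record value at '" + part + "' of key field '" + path + "'");
            }
            current = ((Map<?, ?>) current).get(part);
            if (current == null) {
                throw new IllegalArgumentException(
                    "Part '" + part + "' of key field '" + path + "' not found in record");
            }
        }
        return current.toString();
    }

    public static void main(String[] args) {
        Map<String, Object> record = Map.of("address", Map.of("city", "SF"));
        System.out.println(getNestedFieldAsString(record, "address.city"));
    }
}
```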
Thanks for the quick response Balaji!
I think there is a lot here to continue with:
1. I did see that recent pull request for the delete API. I think collaborating
to support another delete API with just record key would be a great next step.
I'll begin looking into it. Additionally, the
Dear Sudha
It looks like it is going to be an early call for those in Europe, or they
can follow the weekly meeting-minutes email. Looking at the poll it is quite
obvious that 9pm to 10pm PST wins the choice.
Thank you so much for running the poll and reporting the stats.
Kabeer.
On Nov 8 2019,
Brandon,
Great initiative and thoughts. Thanks for writing detailed description on
what you are looking to achieve.
Here are some of my comments/thoughts:
1. HUDI-326 : There is some work that is happening in this direction.
But, we should be able to collaborate on this. Siva has opened
Thank you all for the prompt response! I realized I didn't add my preferred
times. These are the times that work for me.
Mon,Tue,Thu - 9pm - 11pm PST
Mon-Thu - 5 am - 6:30 am PST
Here is the summary from responses:
- From the 11 responses received so far, 9 of 11 people (including all
Dear Sudha
Really appreciate the initiative to promptly start this thread. My preferences
are as below:
Any weekday:
10PM PST to 11PM PST OR
10AM PST TO 2PM PST
thank you
On Nov 7 2019, at 6:46 am, Pratyaksh Sharma wrote:
> Interested.
>
> Timings:
> Mon-Fri 6AM-7.30AM PST
>
> On Thu, Nov 7,
Ok, that is a valid reason.
On Thu, Nov 7, 2019 at 2:03 AM Bhavani Sudha
wrote:
> Ah okay. That is a valid concern. Didn't think about admin management for
> Hive dbs.
>
> Thanks,
> Sudha
>
> On Wed, Nov 6, 2019 at 12:28 PM Balaji Varadarajan
> wrote:
>
> > I have a different opinion on this.
Interested.
Timings:
Mon-Fri 6AM-7.30AM PST
On Thu, Nov 7, 2019 at 11:33 AM Gurudatt Kulkarni
wrote:
> Interested.
>
> Mon-Thu 5AM-6:30AM PST
> Mon-Thu 9PM-10:30PM PST
>
> These timings work for me.
>
>
> On Thu, Nov 7, 2019 at 10:20 AM Gary Li wrote:
>
> > Interested.
> > Mon-Thu 8 PM-11
Interested.
Mon-Thu 5AM-6:30AM PST
Mon-Thu 9PM-10:30PM PST
These timings work for me.
On Thu, Nov 7, 2019 at 10:20 AM Gary Li wrote:
> Interested.
> Mon-Thu 8 PM-11 PM PST.
> It's very difficult to cover America, Europe, and Asia in the same meeting.
> Maybe we can have US and US two
Interested.
Mon-Thu 8 PM-11 PM PST.
It's very difficult to cover America, Europe, and Asia in the same meeting.
Maybe we can have US and US two sessions and make them biweekly?
On Wed, Nov 6, 2019 at 7:12 PM Taher Koitawala wrote:
> Hi All,
>Mon-Thu 5AM-6:30AM PST
>
Hi All,
Mon-Thu 5AM-6:30AM PST
Mon-Thu 9PM-10:30PM PST
Works for me
On Thu, Nov 7, 2019, 7:26 AM Nishith wrote:
> Following times work for me
>
> Evening : Mon-Thu, 9pm - 1am
>
> Unfortunately, can’t do mornings.
>
> Sent from my iPhone
>
> > On Nov 6, 2019, at 4:51 PM,
Following times work for me
Evening : Mon-Thu, 9pm - 1am
Unfortunately, can’t do mornings.
Sent from my iPhone
> On Nov 6, 2019, at 4:51 PM, Y. Ethan Guo wrote:
>
> I'm interested in attending each weekly meeting. My preferred times:
>
> Morning: Wed, Fri, 5AM - 7:30AM PT
> Evening: Mon -
Thanks Sudha. Interested.
Tue - Thu, 8:30PM - 10:00PM PST
Wed - Fri, 3:00AM - 4:30AM PST
Y. Ethan Guo wrote on Thu, Nov 7, 2019 at 8:52 AM:
> I'm interested in attending each weekly meeting. My preferred times:
>
> Morning: Wed, Fri, 5AM - 7:30AM PT
> Evening: Mon - Thu, 8PM - 11PM PT
>
>
> On Wed, Nov 6,
Thanks Sudha. The following times work for me :
Mon, Tue, Thursday - 9 p.m to 12 a.m PST
Wed - 5:00 to 6:00 am and 9:30 p.m to 12 a.m PST
On Wed, Nov 6, 2019 at 12:31 PM Vinoth Chandar wrote:
> Interested.
>
> Mon-Thu 5AM-6:30AM PST
> Mon-Thu 9PM-10:30PM PST
>
>
> On Wed, Nov 6, 2019 at
Ah okay. That is a valid concern. Didn't think about admin management for
Hive dbs.
Thanks,
Sudha
On Wed, Nov 6, 2019 at 12:28 PM Balaji Varadarajan
wrote:
> I have a different opinion on this. Usually, in production deployments
> (at least whatever I am aware of), the database is generally managed
Interested.
Mon-Thu 5AM-6:30AM PST
Mon-Thu 9PM-10:30PM PST
On Wed, Nov 6, 2019 at 12:28 PM Bhavani Sudha
wrote:
> Hello all,
>
> Currently the weekly sync meeting is scheduled to run on Tuesdays from 9pm
> PST to 10 pm PST. Given our users are from multiple time zones, we can try
> to see
I have a different opinion on this. Usually, in production deployments
(at least whatever I am aware of), the database is generally managed at the
org/group level. Privacy policies like ACLs are usually done at the database
level and would need first-level management by admins. With such a setup,
its
+1 I think we should create db if it does not exist.
On Tue, Nov 5, 2019 at 11:08 PM Pratyaksh Sharma
wrote:
> Hi,
>
> While doing hive sync using HiveSyncTool, we first check if the target
> table exists in hive. If not, we try to create it. However in this flow, if
> the database itself does
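The fix proposed in this thread amounts to issuing an idempotent create-database statement before creating the table. A hedged sketch of just the DDL construction; the class and method names are hypothetical, not HiveSyncTool's actual code:

```java
public class HiveDbSync {

    // Build the DDL a sync tool could issue before table creation.
    // CREATE DATABASE IF NOT EXISTS is idempotent, so no separate existence
    // check is required. Class and method names here are hypothetical.
    public static String createDatabaseDdl(String database) {
        if (database == null || !database.matches("[A-Za-z_][A-Za-z0-9_]*")) {
            throw new IllegalArgumentException("Invalid Hive database name: " + database);
        }
        return "CREATE DATABASE IF NOT EXISTS `" + database + "`";
    }

    public static void main(String[] args) {
        System.out.println(createDatabaseDdl("hudi_db"));
    }
}
```

Validating the name before splicing it into DDL also covers part of the extra validation discussed elsewhere in this thread.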
Thanks for the detailed design write-up Vinoth. I concur with the others on
option 2: defaulting indexing to off, and enabling it when we have enough
confidence in stability & performance. Although, I do think practically it
might be good to have the code in place for users who might revert to an
Thanks Vinoth for proposing a clean and extendable design. The overall design
looks great. Another rollout option is to only use consolidated log index for
index lookup if latest "valid" log block has been written in new format. If
that is not the case, we can revert to scanning previous log
I vote for the second option. Also, it can give us time to analyze how to
deal with backward compatibility. I'll take a look at the RFC later
tonight and get back.
On Sun, Oct 27, 2019 at 10:24 AM Vinoth Chandar wrote:
> One issue: I have some open questions myself
>
> Is it ok to assume log
One issue: I have some open questions myself.
Is it ok to assume the log will have old data block versions, followed by
new data block versions? For e.g., if we roll out new code, then revert
back, there could be an arbitrary mix of new and old data blocks. Handling
this might make the design/code fairly
Great!
On Wed, Oct 23, 2019 at 10:41 PM Vinay Patil
wrote:
> Thanks a lot Vinoth for opening this jira.
>
> Will start with the initial design and share the document.
>
> Regards,
> Vinay Patil
>
>
> On Mon, Oct 21, 2019 at 9:36 PM Balaji Varadarajan
> wrote:
>
> > +1. This is a much needed
Thanks a lot Vinoth for opening this jira.
Will start with the initial design and share the document.
Regards,
Vinay Patil
On Mon, Oct 21, 2019 at 9:36 PM Balaji Varadarajan
wrote:
> +1. This is a much needed and super useful feature for a lot of folks in
> the community.
>
> Balaji.V
Thanks all for the constructive comments! Will change the name in cWiki
On Tue, Oct 22, 2019 at 6:27 PM vino yang wrote:
> agree Vinoth, +1
>
> Vinoth Chandar wrote on Tue, Oct 22, 2019 at 8:31 PM:
>
> > Good point. Even for HIP we initially had gdoc as the starting point and
> > once ratified we planned to
agree Vinoth, +1
Vinoth Chandar wrote on Tue, Oct 22, 2019 at 8:31 PM:
> Good point. Even for HIP we initially had gdoc as the starting point and
> once ratified we planned to move it to cwiki. But practical issues like
> retaining formatting, porting over diagrams, version history between two
> things made
Good point. Even for HIP we initially had gdoc as the starting point and
once ratified we planned to move it to cwiki. But practical issues like
retaining formatting, porting over diagrams, version history between two
things made it cumbersome. So IMO single place is actually good. Wdyt?
On Tue,
+1 agree Thomas:
For some general ideas, we can write gdoc and open a "DISCUSS" ML thread.
Best,
Vino
Thomas Weise wrote on Tue, Oct 22, 2019 at 12:45 PM:
> Just in case that wasn't considered: Not every document needs to be on
> cwiki, it is perfectly fine to write up ideas that are not a formal "HIP"
>
Just in case that wasn't considered: Not every document needs to be on
cwiki, it is perfectly fine to write up ideas that are not a formal "HIP"
in gdocs or similar.
Thomas
On Mon, Oct 21, 2019 at 9:40 PM Nishith wrote:
> +1
>
> Encourages folks to read and write designs/ideas.
>
> Sent from
+1
Encourages folks to read and write designs/ideas.
Sent from my iPhone
> On Oct 21, 2019, at 6:30 PM, leesf wrote:
>
> +1
>
> Best,
> Leesf
>
> wrote on Tue, Oct 22, 2019 at 3:40 AM:
>
>> +1
>>
>> Balaji.V On Monday, October 21, 2019, 11:38:01 AM PDT, Y. Ethan Guo
>> wrote:
>>
>> +1 on RFC.
+1
Best,
Leesf
wrote on Tue, Oct 22, 2019 at 3:40 AM:
> +1
>
> Balaji.V On Monday, October 21, 2019, 11:38:01 AM PDT, Y. Ethan Guo
> wrote:
>
> +1 on RFC. It's good to have a few pages of RFC to get a quick look of an
> idea. It doesn't have to be a full standard like some IETF RFCs.
>
> On Mon,
+1
Balaji.V
On Monday, October 21, 2019, 11:38:01 AM PDT, Y. Ethan Guo
wrote:
+1 on RFC. It's good to have a few pages of RFC to get a quick look of an
idea. It doesn't have to be a full standard like some IETF RFCs.
On Mon, Oct 21, 2019 at 5:31 AM Taher Koitawala wrote:
> Agree
+1. This is a much needed and super useful feature for a lot of folks in the
community.
Balaji.V
On Monday, October 21, 2019, 7:08:30 AM PDT, Vinoth Chandar
wrote:
https://issues.apache.org/jira/browse/HUDI-310 tracks this. Love to get
this into the next release as much as possible
https://issues.apache.org/jira/browse/HUDI-310 tracks this. Love to get
this into the next release as much as possible :)
On Thu, Oct 17, 2019 at 10:16 PM Vinoth Chandar wrote:
> No problem. Having kinesis will get us a compelling story for cloud data
> ingestion
>
> On Thu, Oct 17, 2019 at
Agree Vinoth +1
Regards,
Taher Koitawala
On Mon, Oct 21, 2019, 5:49 PM Bhavani Sudha wrote:
> +1 on RFC. Makes sense to me.
>
>
> On Sun, Oct 20, 2019 at 8:29 PM Vinoth Chandar wrote:
>
> > Someone asked me this and got me thinking about it. While the HIP
> > process covers concrete proposals to
+1 on RFC. Makes sense to me.
On Sun, Oct 20, 2019 at 8:29 PM Vinoth Chandar wrote:
> Someone asked me this and got me thinking about it. While the HIP process
> covers concrete proposals to Hudi, sometimes we may need to just write up
> some ideas and solicit comments (e.g. HudiLink
>
>
No problem. Having kinesis will get us a compelling story for cloud data
ingestion
On Thu, Oct 17, 2019 at 8:38 PM Vinay Patil wrote:
> Hi Vinoth,
>
> Sorry to miss these, busy with on-call issues for the last couple of weeks.
>
> Will create a ticket for tracking this , I will be actively
Just wanted to bump this thread and see if anyone is actively working on
kinesis support
On Mon, Sep 23, 2019 at 11:51 AM Vinoth Chandar wrote:
> I think we are on the same page. Thanks for clarifying!
> Note on implementation: it would be great if we can reuse the spark
> streaming connector
https://issues.apache.org/jira/browse/HUDI-295 now tracks this
On Thu, Oct 3, 2019 at 5:45 PM leesf wrote:
> +1 on cleanup.
>
> Best,
> Leesf
>
> Bhavani Sudha Saktheeswaran wrote on Fri, Oct 4, 2019 at 5:53 AM:
>
> > +1. That's a good idea.
> >
> >
> >
> > On Thu, Oct 3, 2019 at 2:32 PM
+1 on cleanup.
Best,
Leesf
Bhavani Sudha Saktheeswaran wrote on Fri, Oct 4, 2019 at 5:53 AM:
> +1. That's a good idea.
>
>
>
> On Thu, Oct 3, 2019 at 2:32 PM vbal...@apache.org
> wrote:
>
> >
> > +1 on both cleanup. This would keep the git history clean and consistent
> > with contribution.
> > Balaji.V
+1. That's a good idea.
On Thu, Oct 3, 2019 at 2:32 PM vbal...@apache.org
wrote:
>
> +1 on both cleanup. This would keep the git history clean and consistent
> with contribution.
> Balaji.V
> On Thursday, October 3, 2019, 09:53:46 AM PDT, Vinoth Chandar <
> vin...@apache.org> wrote:
>
>
+1 on both cleanup. This would keep the git history clean and consistent with
contribution.
Balaji.V
On Thursday, October 3, 2019, 09:53:46 AM PDT, Vinoth Chandar
wrote:
Folks,
As we iterate across the RCs, we have added and removed to the
NOTICE/LICENSE files a lot. Does anyone feel
Based on some conversations I had with Flink folks including Hudi's very
own mentor Thomas, it seems future proof to look into supporting the Flink
streaming APIs. The batch apis IIUC will move towards converging with
Streaming APIs, which matches Hudi's model anyway
From Hudi's perspective,
Hi Vinoth,
IMHO we should stick to Spark for micro-batching for 2 reasons. 1: Ease of
use. 2: Performance: Flink batch is not as fast as Spark. Also, the rich
library of functions and the ease of integration that Spark has with Hive
etc. is not there in Flink batch.
Regards,
Taher
Hi Vino,
Agree with your suggestion. We all know that though Flink is streaming, we
can control how files get rolled out through checkpointing configurations.
With a bad config, small files get rolled out; with a good config, files are
properly sized.
Also I understand the concern of
Hi, a simple example: in the Hudi project, you can find many code snippets
like `spark.read().format().load()`. The load method can take any path,
especially DFS paths. While if we only want to use Flink streaming, there is
not a good way to read HDFS now. In addition, we also need to consider other
Hi Vino,
This is not a design for Hudi on Flink. This was simply a mock up of
tagLocations() spark cache to Flink state as Vinoth wanted to see.
As per the Flink batch and Streaming I am well aware of the batch and
Stream unification efforts of Flink. However I think that is still on
Hi Taher, as I mentioned in the previous mail, things may not be too easy by
just using the Flink state API. Copied here: "Hudi can connect with many
different Source/Sinks. Some file-based reads are not appropriate for Flink
Streaming." Although unifying Batch and Streaming is Flink's goal, it
Hi All,
Sample code to see how records tagging will be handled in
Flink is posted on [1]. The main class to run the same is MockHudi.java
with a sample path for checkpointing.
As of now this is just a sample to show how we should be caching in Flink
state with bare minimum configs.
Sg, let's capture these discussions in the JIRA (link to the discussion
thread should suffice) and we can revisit one by one..
On Mon, Sep 23, 2019 at 8:31 PM Taher Koitawala wrote:
> Sure Vinoth, I think we need to try this out and check how it fits together
> and how deployable it is.
>
> On
Sure Vinoth, I think we need to try this out and check how it fits together
and how deployable it is.
On Sun, Sep 22, 2019, 7:01 PM Vinoth Chandar wrote:
> See a lot of Spark Streaming receiver based approach code there, which
> makes me a bit worried about scalability.
>
> Nonetheless. API
+1 For now we can keep this in hudi-utilities itself IMO.
As for the connector, or DeltaStreamer Source to be specific, should we just
integrate with Kinesis? If DynamoDB will pump its changes into Kinesis
anyway, why should we be aware of DynamoDB directly?
Also we may need to rethink how we are going
+1 to adding more connectors to DeltaStreamer and making them as much
pluggable modules as possible like Vino Yang suggested.
On Sat, Sep 21, 2019 at 7:12 PM vino yang wrote:
> + 1 to introduce these connectors. It's nice to see that Hudi's ecosystem
> is growing. As Hudi connects to more and
Hi Taher,
I agree with this , if the state is becoming too large we should have an
option of storing it in external state like File System or RocksDb.
@Vinoth Chandar can the state of HoodieBloomIndex go
beyond 10-15 GB
Regards,
Vinay Patil
On Fri, Sep 20, 2019 at 11:37 AM Taher Koitawala
Hi Vinoth,
Nifi has the capability to pass data to a custom spark job.
However that is done through a StreamingContext, not sure if we can build
something on this. I'm trying to wrap my head around how to fit the
StreamingContext in our existing code.
Here is an example:
Hi Taher,
Basically this can be a proposal to support Kinesis, and DynamoDB stream
support can be enabled by reusing this source code.
Flink has provided support for DynamoDb Streams by reusing Kinesis Streams
classes.
Regards,
Vinay Patil
On Sat, Sep 21, 2019 at 4:26 PM Taher Koitawala wrote:
That would be a great addition Vinay. How about adding Kinesis as well?
Regards,
Taher Koitawala
On Sat, Sep 21, 2019, 4:20 PM Vinay Patil wrote:
> Hi Team,
>
> The DynamoDb streams contains the CDC data when enabled on a DynamoDb
> table, we can add a source for DeltaStreamer which will
Hey guys, any thoughts on the above idea? To handle HoodieBloomIndex with
HeapState, RocksDBState and FsState, but on Spark.
On Tue, Sep 17, 2019 at 1:41 PM Taher Koitawala wrote:
> Hi Vinoth,
>Having seen the doc and code. I understand the
> HoodieBloomIndex mainly caches key
I think we will have to make a Nifi processor. The Nifi processor should
host all that we do with Spark to write data. We will have to scope out the
work on this and compactions.
Regards,
Taher Koitawala
On Wed, Sep 18, 2019, 8:30 PM Suneel Marthi wrote:
> Adding Nifi dev@ to this thread.
>
>
>
Adding Nifi dev@ to this thread.
On Wed, Sep 18, 2019 at 10:57 AM Vinoth Chandar wrote:
> Not too familiar with Nifi myself. Would this still target a use-case like
> what Pratyaksh mentioned?
> For delta streamer specifically, we are moving more and more towards
> continuous mode, where
>
Not too familiar with Nifi myself. Would this still target a use-case like
what Pratyaksh mentioned?
For delta streamer specifically, we are moving more and more towards
continuous mode, where Hudi writing and compaction are managed by a single
long-running spark application.
Would Nifi
Would Nifi
That's another way of doing things. I want to know if someone wrote
something like PutParquet, which can directly write data to Hudi. AFAIK I
don't think anyone has.
That would really be powerful.
On Wed, Sep 18, 2019, 1:37 PM Pratyaksh Sharma
wrote:
> Hi Taher,
>
> In the initial phase of our
Hi Taher,
In the initial phase of our CDC pipeline, we were using Hudi with Nifi.
Nifi was being used to read Binlog file of mysql and to push that data to
some Kafka topic. This topic was then getting consumed by DeltaStreamer. So
Nifi was indirectly involved in that flow.
On Wed, Sep 18, 2019
Hi Vinoth,
Having seen the doc and code, I understand the HoodieBloomIndex mainly
caches key and partition path. Can we address how Flink does it? Like, have
HeapState where the user chooses to cache the index on heap, RocksDBState
where indexes are written to RocksDB, and finally
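The pluggable-state idea above (heap vs. RocksDB backends for the key-to-partition cache) can be sketched as a small interface with interchangeable implementations. Only a heap variant is shown; everything here is illustrative, not Hudi's or Flink's actual API:

```java
import java.util.HashMap;
import java.util.Map;

public class IndexStateSketch {

    // Minimal pluggable record-key -> partition-path state, loosely mirroring
    // the Flink-style heap/RocksDB backends discussed above. Only the heap
    // variant is sketched; a RocksDB-backed variant would implement the same
    // interface and spill to disk instead.
    public interface KeyLocationState {
        void put(String recordKey, String partitionPath);
        String get(String recordKey);  // null when the key is unseen
    }

    public static class HeapState implements KeyLocationState {
        private final Map<String, String> state = new HashMap<>();
        @Override public void put(String recordKey, String partitionPath) {
            state.put(recordKey, partitionPath);
        }
        @Override public String get(String recordKey) {
            return state.get(recordKey);
        }
    }

    public static void main(String[] args) {
        KeyLocationState index = new HeapState();
        index.put("uuid-1", "2019/11/12");
        System.out.println(index.get("uuid-1"));
    }
}
```

The interface is what makes the backend a user choice, which is the point raised in the message above.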
Alright then. Happy to take the lead here. But please give me a week or so,
to finish up the spark bundling and other jar issues.. Too much context
switching :)
On Mon, Sep 16, 2019 at 6:57 PM vino yang wrote:
> Hi guys,
>
> Currently, I am busy with HUDI-203[1] and other things.
>
> I agree
Hi guys,
Currently, I am busy with HUDI-203[1] and other things.
I agree with Vinoth that we should try to find a new solution to decouple
the dependency with the Spark RDD cache.
It's an excellent way to start this big work.
[1]: https://issues.apache.org/jira/browse/HUDI-203
+1 This is a pretty large undertaking. While the community is getting their
hands dirty and ramping up on Hudi internals, it would be productive if Vinoth
shepherds this
Balaji.V
On Monday, September 16, 2019, 11:30:44 AM PDT, Vinoth Chandar
wrote:
sg. :)
I will wait for others on
sg. :)
I will wait for others on this thread as well to chime in.
On Mon, Sep 16, 2019 at 11:27 AM Taher Koitawala wrote:
> Vinoth, I think right now given your experience with the project you should
> be scoping out what needs to be done to take us there. So +1 for giving you
> more work :)
>
Vinoth, I think right now given your experience with the project you should
be scoping out what needs to be done to take us there. So +1 for giving you
more work :)
We want to reach a point where we can start scoping out the addition of
Flink and Beam components within. Then I think it will be tremendous
I still feel the key thing here is reimplementing HoodieBloomIndex without
needing spark caching.
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=103093742#Design(non-global)
documents the spark DAG in detail.
If everyone feels, it's best for me to scope the work out, then happy
It should work like any other source, and none of the others are aware of
whether DeltaStreamer is running in continuous mode or not.
Simplistically, it just needs a config to denote an incremental field - say
`_last_modified_at` and we use that as a checkpoint to tail that table
by including a
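The checkpoint-by-incremental-field idea above boils down to query construction: each round resumes from the last observed value of the incremental column. A minimal sketch with hypothetical names; the column name `_last_modified_at` follows the example in the message:

```java
public class IncrementalPull {

    // Build the query a JDBC-style source could issue on each round, resuming
    // from the last checkpointed value of the incremental column. The column
    // name `_last_modified_at` follows the example in the message; everything
    // else is illustrative.
    public static String buildQuery(String table, String incrementalColumn, String lastCheckpoint) {
        String base = "SELECT * FROM " + table;
        if (lastCheckpoint == null) {
            return base; // first run: full pull
        }
        return base + " WHERE " + incrementalColumn + " > '" + lastCheckpoint + "'"
            + " ORDER BY " + incrementalColumn;
    }

    public static void main(String[] args) {
        System.out.println(buildQuery("source_table", "_last_modified_at", "2019-09-16 00:00:00"));
    }
}
```

After each round, the maximum `_last_modified_at` value seen would be stored as the next checkpoint, which is what makes repeated tailing of the table possible.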
Guys, I think we are slowing down on this again. We need to start planning
small tasks towards this. VC, please can you help fast-track this?
Regards,
Taher Koitawala
On Thu, Aug 15, 2019, 10:07 AM Vinoth Chandar wrote:
> Look forward to the analysis. A key class to read would be
>
Will this be the same implementation as session.read.jdbc("") and then
calling this code continuously, like how we run Hudi in continuous mode?
On Mon, Sep 16, 2019 at 9:09 PM Vinoth Chandar wrote:
> Thanks, Taher! Any takers for driving this? This is something I would be
> very interested
Thanks, Taher! Any takers for driving this? This is something I would be
very interested in getting involved with. Don't have the bandwidth atm :/
On Sun, Sep 15, 2019 at 11:15 PM Taher Koitawala wrote:
> Thank you all for your support. JIRA filed at
>
Thank you all for your support. JIRA filed at
https://issues.apache.org/jira/browse/HUDI-251
Regards,
Taher Koitawala
On Mon, Sep 16, 2019 at 11:34 AM Taher Koitawala wrote:
> Since everyone is fully onboard, I am creating a JIRA to track this.
>
> On Sun, Sep 15, 2019 at 9:47 AM
Since everyone is fully onboard, I am creating a JIRA to track this.
On Sun, Sep 15, 2019 at 9:47 AM vbal...@apache.org
wrote:
>
> +1. Agree with everyone's point. Go for it Taher !!
> Balaji.V
> On Saturday, September 14, 2019, 07:44:04 PM PDT, Bhavani
> Sudha Saktheeswaran wrote:
>
> +1 I
+1. Agree with everyone's point. Go for it Taher !!
Balaji.V
On Saturday, September 14, 2019, 07:44:04 PM PDT, Bhavani Sudha
Saktheeswaran wrote:
+1 I think adding new sources to DeltaStreamer is really valuable.
Thanks,
Sudha
On Sat, Sep 14, 2019 at 7:52 AM vino yang wrote:
> Hi
+1 I think adding new sources to DeltaStreamer is really valuable.
Thanks,
Sudha
On Sat, Sep 14, 2019 at 7:52 AM vino yang wrote:
> Hi Taher,
>
> IMO, it's a good supplement to Hudi.
>
> So +1 from my side.
>
> Vinoth Chandar wrote on Sat, Sep 14, 2019 at 10:23 PM:
>
> > Hi Taher,
> >
> > I am fully
Hi Taher,
I am fully onboard on this. This is such a frequently asked question and having
it all doable with a simple DeltaStreamer command would be really powerful.
+1
- Vinoth
On 2019/09/14 05:51:05, Taher Koitawala wrote:
> Hi All,
> Currently, we are trying to pull data
Thanks for driving this ! :)
On Sat, Aug 24, 2019 at 4:39 PM vino yang wrote:
> Hi guys,
>
> Glad to see that Hudi's doc has supported Jekyll-multiple-languages plugin.
> It's the precondition to contribute to the translation of Chinese docs.
> Thanks to Vinoth.
>
> Now, welcome to start our
Hi guys,
Glad to see that Hudi's doc has supported Jekyll-multiple-languages plugin.
It's the precondition to contribute to the translation of Chinese docs.
Thanks to Vinoth.
Now, you are welcome to start contributing. Please note: if you want
to contribute to Chinese documents, you should
+1, good idea
vbal...@apache.org wrote on Fri, Aug 23, 2019 at 3:46 AM:
>
> +1, I like the idea. It would also make the whole page modular.
> Balaji.V
> On Thursday, August 22, 2019, 12:40:11 PM PDT, Vinoth Chandar <
> vin...@apache.org> wrote:
>
> +1 I was thinking along similar lines for the demo page
>
>
+1, I like the idea. It would also make the whole page modular.
Balaji.V
On Thursday, August 22, 2019, 12:40:11 PM PDT, Vinoth Chandar
wrote:
+1 I was thinking along similar lines for the demo page
Our doc theme should already support this
+1 I was thinking along similar lines for the demo page
Our doc theme should already support this
https://idratherbewriting.com/documentation-theme-jekyll/mydoc_navtabs.html
On Thu, Aug 22, 2019 at 12:04 PM Bhavani Sudha Saktheeswaran
wrote:
> Hi all,
>
> I was going through the