Hi Pratyaksh,
The partitioning format is pluggable in Hudi.
1. For Hudi writing, you can simply use one of the several implementations of
org.apache.hudi.KeyGenerator, or write your own implementation to control
the partition path format. You can configure the partition path using
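As a rough illustration of the idea (this is a simplified stand-in, not the real org.apache.hudi.KeyGenerator interface; the class, method, and field names here are hypothetical), a key generator maps each incoming record to a record key plus a partition path:

```java
import java.util.Map;

// Simplified stand-in for a custom key generator: it derives a record
// key and a partition path from a record's fields. In real Hudi these
// field names would come from write configs; they are hard-coded here.
public class CustomKeyGenerator {
    private final String recordKeyField = "id";
    private final String partitionPathField = "region";

    public String getRecordKey(Map<String, String> record) {
        return record.get(recordKeyField);
    }

    // Controls the on-storage partition layout, e.g. "region=us/..."
    public String getPartitionPath(Map<String, String> record) {
        return partitionPathField + "=" + record.get(partitionPathField);
    }
}
```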
>> Currently Spark Streaming micro-batching fits well with Hudi, since it
amortizes the cost of indexing, workload profiling, etc. 1 Spark micro-batch
= 1 Hudi commit
With the per-record model in Flink, I am not sure how useful it will be to
support Hudi.. e.g., 1 input record cannot be 1 Hudi
Hi guys,
Thanks for agreeing with this proposal.
To Vinoth:
> I also suggest we add a new component in JIRA with a few volunteers to
help
review PRs that come in this area?
+1, yes, we really need a new component in JIRA.
Best,
Vino
Y. Ethan Guo wrote on Wed, Aug 14, 2019 at 2:22 AM:
> +1 I can also help
Hi Vinoth,
I have commented my mail id on the mentioned github issue.
Sure, I will update the documentation.
On Tue, Aug 13, 2019 at 11:41 PM Vinoth Chandar wrote:
> Hi Pratyaksh,
>
> We have pre-approved anyone with an @apache.org email and a few others.
> Typically,
+1 I can also help on the Chinese version of the docs.
On Tue, Aug 13, 2019 at 11:08 AM Vinoth Chandar wrote:
> +1 Thanks for starting this initiative, Vino.
>
> I also suggest we add a new component in JIRA with a few volunteers to help
> review PRs that come in this area?
>
> On Tue, Aug 13,
Done!
On Tue, Aug 13, 2019 at 6:44 AM leesf wrote:
> Hi,
>
> I want to contribute to Apache Hudi.
> Would you please give me the contributor permission?
> My JIRA ID is xleesf.
>
> leesf wrote on Tue, Aug 13, 2019 at 9:42 PM:
>
> > Hi,
> >
> > I want to contribute to Apache Calcite.
> > Would you please give
Hi Pratyaksh,
We have pre-approved anyone with an @apache.org email and a few others.
Typically, https://github.com/apache/incubator-hudi/issues/143 is used for
reporting the email to be added. Can you provide your email there and we
will add you in
P.S: I realize there is a documentation gap on
+1 Thanks for starting this initiative, Vino.
I also suggest we add a new component in JIRA with a few volunteers to help
review PRs that come in this area?
On Tue, Aug 13, 2019 at 9:02 AM Gary Li wrote:
> +1 This is a great idea. I think there is also some room for improvement
> for the
+1 This is a great idea. I think there is also some room for improvement
in the English version as well.
Some of my colleagues are very interested in Hudi, but they found the
documentation a little bit challenging to understand. Same for me when
I first started working on Hudi.
I am
Currently, Hudi has not gained much attention in China, partly because of the
lack of Chinese resources and documentation. I personally think we should add
more documentation and support multiple languages. Just as Flink has official
Chinese documentation[1], this could quickly let Chinese developers know
Leave a comment here:
https://github.com/apache/incubator-hudi/issues/143
And the team will get back to you shortly!
Shinray K.
On 8/13/19, 2:25 AM, "Pratyaksh Sharma" wrote:
Hi,
I was going through the pre-requisites here
Thanks! Def glad to get this done. Credits should go to Balaji :)
My 2c is that it's okay for the classes to remain as they are. There are
diminishing returns to doing that, and "Hoodie" is the suggested pronunciation
:) anyway.
On Sun, Aug 11, 2019 at 6:23 PM vino yang wrote:
> Hi Vinoth,
>
> Thanks
Hi,
I want to contribute to Apache Hudi.
Would you please give me the contributor permission?
My JIRA ID is xleesf.
leesf wrote on Tue, Aug 13, 2019 at 9:42 PM:
> Hi,
>
> I want to contribute to Apache Calcite.
> Would you please give me the contributor permission?
> My JIRA ID is xleesf.
>
Hi,
I want to contribute to Apache Calcite.
Would you please give me the contributor permission?
My JIRA ID is xleesf.
Hi Nick and Taher,
I just want to answer Nishith's question. Reference his old description
here:
> You can do a parallel investigation while we are deciding on the module
structure. You could be looking at all the patterns in Hudi's Spark APIs
usage (RDD/DataSource/SparkContext) and see if such
Hi,
I have been working on Hudi for some time and have an improvement suggestion.
When we build a CDC pipeline, the field used for partitioning is generally a date
(created_at), and the general format of created_at is yyyy-MM-dd HH:mm:ss.S. If
we have this field formatted as yyyy/MM/dd, then
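The conversion suggested above could be sketched like this (a hypothetical helper, not part of Hudi; the class and method names are illustrative), using java.time to reformat a created_at timestamp into a date-based partition path:

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper showing the proposed conversion: a CDC created_at
// value like "2019-08-13 21:42:07.5" becomes the partition path
// "2019/08/13".
public class PartitionPathFormat {
    private static final DateTimeFormatter CREATED_AT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.S");
    private static final DateTimeFormatter PARTITION =
            DateTimeFormatter.ofPattern("yyyy/MM/dd");

    public static String toPartitionPath(String createdAt) {
        return LocalDateTime.parse(createdAt, CREATED_AT).format(PARTITION);
    }
}
```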
Hi Vino,
According to what I've seen, Hudi has a lot of Spark components flowing
through it, like TaskContexts, JavaSparkContexts, etc. The main classes I
guess we should focus on are HoodieTable and the Hoodie write clients.
Also Vino, I don't think we should be providing Flink dataset
Hi all,
After doing some research, let me share my information:
- Limitation of computing engine capabilities: Hudi uses Spark's
RDD#persist, and Flink currently has no API to cache datasets. Maybe we can
only choose to use external storage, or not use a cache? For the use of
other