Hi team, for the Kafka source, when pulling data from Kafka, the default
parallelism is the number of Kafka partitions.
There are cases where this is a problem: when pulling a large amount of data
from Kafka (e.g. maxEvents=1), the number of Kafka partitions may not be
enough, and the pull will cost too much
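One way to decouple read parallelism from the partition count is to split each partition's offset range into smaller chunks before reading. This is only an illustrative sketch of the idea, not Hudi's actual source code; the function name and tuple layout are assumptions:

```python
# Sketch: split each Kafka partition's (from, until) offset range into
# smaller chunks so read parallelism is not capped by the partition count.

def split_offset_ranges(ranges, min_partitions):
    """ranges: list of (topic_partition, from_offset, until_offset) tuples."""
    total = sum(until - frm for _, frm, until in ranges)
    if total == 0 or min_partitions <= len(ranges):
        return list(ranges)
    # Pick a chunk size so the number of splits is roughly min_partitions.
    chunk = max(1, total // min_partitions)
    out = []
    for tp, frm, until in ranges:
        start = frm
        while start < until:
            end = min(start + chunk, until)
            out.append((tp, start, end))
            start = end
    return out
```

Each chunk can then be handed to a separate task, so two partitions with 150 total messages can still be read by, say, six tasks.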
>Does this affect ordering? Can you think about how/if Hudi write
>operations can handle potentially out-of-order events being read out?
>It feels like we can add a JIRA for this anyway.
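On the out-of-order question: Hudi resolves duplicates per record key using an ordering (precombine) field, so a later-arriving event with a smaller ordering value does not overwrite newer data. A minimal sketch of that semantics in plain Python, with assumed field names (`uuid`, `ts`) for illustration:

```python
# Sketch of precombine semantics: for each record key, keep only the
# record with the largest ordering-field value, so out-of-order input
# cannot clobber newer data. Field names here are illustrative.

def precombine(records, key_field="uuid", ordering_field="ts"):
    latest = {}
    for rec in records:
        k = rec[key_field]
        if k not in latest or rec[ordering_field] >= latest[k][ordering_field]:
            latest[k] = rec
    return list(latest.values())
```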
>
>
>
>On Thu, Mar 30, 2023 at 10:02 PM 孔维 <18701146...@163.com> wrote:
>
>> Hi te
>> y Kafka
>> specific logic or force use of special payloads etc. Thoughts?
>>
>> I assigned the jira to you and also made you a contributor. So in future,
>> you can self-assign.
>>
>> On Mon, Apr 3, 2023 at 7:08 PM 孔维 <18701146...@163.com> wrote:
>>
Hi, team,
Background:
More and more Hudi ingestion goes through DeltaStreamer, resulting in a large
number of DeltaStreamer jobs that need to be managed. In our company, we also
manage a large number of DeltaStreamer jobs ourselves, and there is a lot of
operations and maintenance management and
Hi team,
I am thinking about whether it is necessary to add a configuration hot-update
feature to DeltaStreamer.
In our company, Hudi is used as a platform. We provide DeltaStreamer (run in
continuous mode) to write to a large number of sources (including MySQL & TiDB)
as a long time
applicable. If not, proceed as usual.
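The hot-update idea can be sketched as a check between ingestion rounds: re-read the properties file only when its modification time has changed, otherwise proceed as usual. The class and file format below are assumptions for illustration, not DeltaStreamer APIs:

```python
import os

# Sketch of configuration hot update: between ingestion rounds, reload
# the properties file only if its mtime changed. Not a DeltaStreamer API.

class HotProps:
    def __init__(self, path):
        self.path = path
        self.mtime = None
        self.props = {}

    def maybe_reload(self):
        """Return True if the properties were reloaded this round."""
        mtime = os.path.getmtime(self.path)
        if mtime == self.mtime:
            return False  # unchanged: proceed as usual
        self.mtime = mtime
        with open(self.path) as f:
            self.props = dict(
                line.strip().split("=", 1)
                for line in f
                if "=" in line and not line.lstrip().startswith("#")
            )
        return True
```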
>It should not be hard to add the support.
>
>
>
>
>On Mon, 22 May 2023 at 00:05, 孔维 <18701146...@163.com> wrote:
>
>> Hi team,
>>
>> I am thinking about whether it is necessary to add the feature of
>> configuration ho
, "Danny Chan" wrote:
>The general direction looks good; for functionality that only compacts
>partial log files, does the existing log compaction match your needs?
>
>https://github.com/apache/hudi/blob/master/rfc/rfc-48/rfc-48.md
>
>Best,
>Danny
>
>孔维 <187
Background:
1. The data arrives roughly in event-time order.
2. When some users read the Hudi table, they may not need the immediately
complete data, but rather the full data before time T (e.g. daily snapshot data).
3. Reading the RT table is more time-consuming than reading the COW table
(RO
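The "full data before time T" read in point 2 amounts to picking the latest completed commit whose instant timestamp is <= T from the timeline and reading that snapshot (Hudi instants use the yyyyMMddHHmmss format). A minimal sketch of the instant selection, with an assumed function name:

```python
import bisect

# Sketch: given a sorted timeline of completed instant timestamps,
# pick the latest one at or before T (or None if T predates them all).

def snapshot_instant(timeline, t):
    """timeline: sorted list of instant timestamp strings (yyyyMMddHHmmss)."""
    i = bisect.bisect_right(timeline, t)
    return timeline[i - 1] if i else None
```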