[DISCUSS] split source of kafka partition by count

2023-03-30 Thread
Hi team, for the Kafka source, the default parallelism when pulling data from Kafka is the number of Kafka partitions. There are cases where a large amount of data is pulled from Kafka (e.g. maxEvents=1) but the number of Kafka partitions is too small, so the pull will cost too much
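To make the idea in the subject line concrete, here is a minimal sketch of splitting each partition's offset range into sub-ranges capped at a fixed event count, so that the number of read tasks is no longer tied to the number of partitions. It is not Hudi's implementation: `OffsetRange`, `split_by_count` and `max_per_split` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class OffsetRange:
    """Half-open offset range [from_offset, until_offset) in one Kafka partition.

    Hypothetical type for illustration only; not a Hudi or Kafka class.
    """
    topic: str
    partition: int
    from_offset: int
    until_offset: int

    def count(self) -> int:
        # Number of events covered by this range.
        return self.until_offset - self.from_offset


def split_by_count(ranges: List[OffsetRange], max_per_split: int) -> List[OffsetRange]:
    """Split every range into consecutive sub-ranges of at most max_per_split events."""
    if max_per_split <= 0:
        raise ValueError("max_per_split must be positive")
    splits: List[OffsetRange] = []
    for r in ranges:
        start = r.from_offset
        while start < r.until_offset:
            end = min(start + max_per_split, r.until_offset)
            splits.append(OffsetRange(r.topic, r.partition, start, end))
            start = end
    return splits


if __name__ == "__main__":
    pending = [OffsetRange("events", 0, 0, 250), OffsetRange("events", 1, 100, 150)]
    for s in split_by_count(pending, max_per_split=100):
        print(s)
```

With two partitions and a cap of 100 events, the 250-event partition becomes three read tasks and the 50-event partition stays as one. Sub-ranges of the same partition could then be read concurrently, which is why the effect on ordering is raised in the reply.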

Re:Re: [DISCUSS] split source of kafka partition by count

2023-04-03 Thread
> does this affect ordering? Can you think about how/if Hudi write operations can handle potentially out-of-order events being read out?
> It feels like we can add a JIRA for this anyway.
>
> On Thu, Mar 30, 2023 at 10:02 PM 孔维 <18701146...@163.com> wrote:
>
>> Hi te

Re:Re: Re: [DISCUSS] split source of kafka partition by count

2023-04-06 Thread
>> y Kafka specific logic or force use of special payloads etc. thoughts?
>>
>> I assigned the JIRA to you and also made you a contributor. So in future, you can self-assign.
>>
>> On Mon, Apr 3, 2023 at 7:08 PM 孔维 <18701146...@163.com> wrote:

[DISCUSS] Should we support a service to manage all deltastreamer jobs?

2023-06-14 Thread
Hi team, Background: More and more Hudi users rely on deltastreamer, resulting in a large number of deltastreamer jobs that need to be managed. In our company, we also manage a large number of deltastreamer jobs ourselves, and there is a lot of operation and maintenance management and

[DISCUSS] should deltastreamer support configuration hot update?

2023-05-22 Thread
Hi team, I am thinking about whether it is necessary to add a configuration hot-update feature to deltastreamer. In our company, Hudi is used as a platform. We provide deltastreamer (run in continuous mode) to write to a large number of sources (including MySQL & TiDB) as a long time

Re:Re: [DISCUSS] should deltastreamer support configuration hot update?

2023-05-24 Thread
> applicable. If not, proceed as usual.
> Should not be hard to add the support.
>
> On Mon, 22 May 2023 at 00:05, 孔维 <18701146...@163.com> wrote:
>
>> Hi team,
>>
>> I am thinking about whether it is necessary to add the feature of configuration ho

Re:Re: [Discussion] Support EventTimeBasedCompactionStrategy based on merging some log files

2023-12-04 Thread
, "Danny Chan" wrote:
> The general direction looks good. For functionality that only compacts partial log files, does the existing log compaction match your needs?
>
> https://github.com/apache/hudi/blob/master/rfc/rfc-48/rfc-48.md
>
> Best,
> Danny
>
> 孔维 <187

[Discussion] Support EventTimeBasedCompactionStrategy based on merging some log files

2023-11-27 Thread
Background:
1. The data arrives roughly in event time order
2. When some users read the Hudi table, they may not be concerned with the immediate full data, but with the full data before time T (e.g. daily snapshot data)
3. Reading the RT table will be more time-consuming than reading the COW table (RO