Re: [DISCUSS] Hudi data TTL

2022-10-18 Thread Jian Feng
On Tue, Oct 18, 2022 at 10:20 AM Bingeng Huang wrote:
> Hi all,
>
> Do we have a plan to integrate data TTL into Hudi, so that we don't have to schedule an offline Spark job to delete outdated data? We would just set a TTL config, and the writer or some offline service would delete old data as expected.

-- *Jian Feng,冯健* Shopee | Engineer | Data Infrastructure

Re: [DISCUSS] New RFC to support Lock-free concurrency control on Merge-on-read tables

2022-03-24 Thread Jian Feng
> n make sure they don't write data to the same log file (plan to create multiple marker files to achieve this). And with the log merge API (preCombine logic in the Payload class), data in log files can be read properly.
> - Since Hudi already has an index type like Bucket index, which can map keys to buckets in a consistent way, data duplicates can be eliminated.
>
> Thanks,
> Jian Feng

Re: [DISCUSS] Trino Plugin for Hudi

2021-10-20 Thread Jian Feng
> odb.io/blog/2020/08/04/prestodb-and-hudi
> [3] https://github.com/trinodb/trino/pull/9641
> [4] https://cwiki.apache.org/confluence/display/HUDI/RFC+-+33++Hudi+supports+more+comprehensive+Schema+Evolution
> [5] https://hudi.apache.org/blog/2021/07/21/streaming-data-lake-platform#timeline-metaserver
> [6] https://github.com/codope/trino/tree/hudi-plugin
> [7] https://trino.io/docs/current/develop/connectors.html

Re: [Phishing Risk] [External] is there a solution to solve the HBase data skew issue

2021-10-17 Thread Jian Feng
at 12:50 AM Vinoth Chandar wrote:
> Yeah, all the rate limiting code in HBaseIndex is a workaround for these large bulk writes.
>
> On Tue, Oct 5, 2021 at 11:16 AM Jian Feng wrote:
> > Actually, I hit this problem when bootstrapping a huge table, after changed

Re: [Phishing Risk] [External] [Delta Streamer] file name mismatch with meta when compaction running

2021-10-06 Thread Jian Feng
> ay provide more information. Great thanks to the author.
>
> https://mp.weixin.qq.com/s?__biz=MzIyMzQ0NjA0MQ===2247484306=1=1d853469159a600d82050c17e6a2a075=e81f56e4df68dff2da417109c4a971aef54f056bc0519558c58e23fe60b90dc6e4f8d7e92774=1688466117=zh_CN#rd
>
> On Wed, Oct 6, 20

[Delta Streamer] file name mismatch with meta when compaction running

2021-10-05 Thread Jian Feng
tried to recreate the table, it happens again

Re: [Phishing Risk] [External] is there a solution to solve the HBase data skew issue

2021-10-05 Thread Jian Feng
> n of the whole hudi project.
>
> On Mon, Oct 4, 2021, 11:29 PM wrote:
> > When I bootstrapped a huge HBase index table, I found that all keys have the prefix 'itemid:', which caused data skew: there are 100 region servers in HBase, but only one was handling the data. Is there any way to avoid this issue on the Hudi side?

is there a solution to solve the HBase data skew issue

2021-10-04 Thread Jian Feng
When I bootstrapped a huge HBase index table, I found that all keys have the prefix 'itemid:', which caused data skew: there are 100 region servers in HBase, but only one was handling the data. Is there any way to avoid this issue on the Hudi side?
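
To make the skew concrete: HBase assigns rows to regions by row-key range, so keys that all share the prefix 'itemid:' sort next to each other and can end up in a single region. One general mitigation, which is an assumption here and not something this thread confirms Hudi's HBase index does, is to prepend a stable hash-derived salt to each key. A minimal sketch follows; `salted_key` and `NUM_BUCKETS` are hypothetical names, and the bucket count of 100 simply mirrors the 100 region servers mentioned above.

```python
import hashlib
from collections import Counter

NUM_BUCKETS = 100  # assumption: one salt bucket per region server


def salted_key(record_key: str, num_buckets: int = NUM_BUCKETS) -> str:
    """Prefix the key with a stable bucket id derived from its MD5 hash."""
    digest = hashlib.md5(record_key.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % num_buckets
    return f"{bucket:02d}:{record_key}"


# Synthetic keys shaped like the ones in the report.
keys = [f"itemid:{i}" for i in range(10_000)]

# Count how many distinct two-character leading prefixes each scheme yields.
unsalted_prefixes = Counter(k[:2] for k in keys)
salted_prefixes = Counter(salted_key(k)[:2] for k in keys)
```

Without the salt, every key starts with the same characters, so range-based splitting has nothing to separate them on; with the salt, the leading characters spread across the buckets. The trade-off is that a reader must recompute the salt to look up a key, and ordered range scans over the original keys are no longer possible.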

How to read hudi files with Mapreduce?

2021-08-11 Thread Jian Feng
Hi all, can anyone give me a sample?

ingest avro nested array field error occurs

2021-07-22 Thread Jian Feng
Can anyone help look at this issue? https://github.com/apache/hudi/issues/3327 When ingesting data into a Hudi table, I found that if the Avro schema has an array field with a nested array field and no other fields, an error occurs, but if I add a dummy field, ingestion works fine.

what's the difference between append-only and insert in Flink streaming?

2021-07-10 Thread Jian Feng
I saw a PR here: https://github.com/apache/hudi/pull/3252