Re: RichCdcSinkBuilder with Iceberg catalog?

Kyle Weller Fri, 19 Jul 2024 11:11:51 -0700

I wonder if Apache XTable <https://xtable.apache.org/> is also a
viable option to consider? Data could still be written and stored natively
as Paimon, with the Iceberg manifest files generated asynchronously and
synced to an Iceberg catalog. It is working great between Iceberg, Hudi, and
Delta in production today. There may be some code in that project to
leverage, or adding a Paimon XTable interface would automatically unlock
omnidirectional translation across all 4 table formats, versus a one-by-one
integration.


On Fri, Jul 19, 2024 at 8:41 AM Andrew Otto <o...@wikimedia.org> wrote:

> > Another approach is to create a snapshot compatible way for Paimon to
> > generate Iceberg, which is what we are working on.
>
> Hi, just checking in!  How is this going? Thanks!
>
> On Mon, Jun 10, 2024 at 9:17 AM Andrew Otto <o...@wikimedia.org> wrote:
>
>> Awesome, I look forward to it!  Thank you!
>>
>> On Mon, Jun 10, 2024 at 2:35 AM Jingsong Li <jingsongl...@gmail.com>
>> wrote:
>>
>>> We are developing a prototype internally.
>>>
>>> It will take about 2 to 3 months.
>>>
>>> On Wed, May 29, 2024 at 9:46 PM Andrew Otto <o...@wikimedia.org> wrote:
>>>
>>>> > Another approach is to create a snapshot compatible way for Paimon
>>>> to generate Iceberg, which is what we are working on.
>>>>
>>>> Oh!  Very interesting.  Can you say more? And/or do you have links to
>>>> Jira or anything?
>>>>
>>>> Thanks for your response! :)
>>>>
>>>>
>>>>
>>>> On Wed, May 29, 2024 at 7:41 AM Jingsong Li <jingsongl...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi Andrew,
>>>>>
>>>>> It is difficult to move this mechanism to the Iceberg sink. The table
>>>>> structure change in Iceberg's design requires generating a new
>>>>> snapshot, which poses significant challenges to schema evolution.
>>>>>
>>>>> Another approach is to create a snapshot compatible way for Paimon to
>>>>> generate Iceberg, which is what we are working on.
>>>>>
>>>>> Best,
>>>>> Jingsong
>>>>>
>>>>> On Fri, May 24, 2024 at 8:11 PM Andrew Otto <o...@wikimedia.org>
>>>>> wrote:
>>>>> >
>>>>> > Hi!
>>>>> >
>>>>> > How coupled to Paimon catalogs and tables is the cdc part of
>>>>> Paimon?  RichCdcMultiplexRecord and related code seem incredibly useful
>>>>> even outside of the context of the Paimon table format.
>>>>> >
>>>>> > I'm asking because the database sync action feature is amazing.  At
>>>>> the Wikimedia Foundation, we are on an all-in journey with Iceberg.  I'm
>>>>> wondering how hard it would be to extract the CDC logic from Paimon and
>>>>> abstract the Sink bits.
>>>>> >
>>>>> > Could the table/database sync with schema evolution (without Flink
>>>>> job restarts!) potentially work with the Iceberg sink?
>>>>> >
>>>>> > Thanks!
>>>>> > -Andrew Otto
>>>>> >  Wikimedia Foundation
>>>>> >
>>>>> >
>>>>> >
>>>>> >
>>>>>
>>>>
