Hi Sean,

So do you mean that if I use those file formats, they will do the work of
CDC automatically, or would I have to handle it via code?

Hi Mich,

Not sure if I understood you. Let me try to explain my scenario. Suppose
there is an id "1" which was inserted today, so I transformed and ingested
it. Now suppose this user id is deleted from the source itself. How can I
then delete it from my transformed DB?
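
For example, assuming the full set of current source ids can still be read,
upstream deletes can be detected with a left anti-join. A minimal PySpark
sketch (the table path and JDBC details are placeholders, not real values):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("find-upstream-deletes").getOrCreate()

# Ids currently present at the source (placeholder JDBC connection).
source_ids = (spark.read.format("jdbc")
              .option("url", "jdbc:postgresql://source-host/source_db")
              .option("dbtable", "users")
              .load()
              .select("id"))

# Previously transformed and ingested data (placeholder path).
target = spark.read.parquet("/data/transformed/users")

# Rows still present in the target whose id no longer exists at the source
# were deleted upstream; these are the rows to remove downstream as well.
deleted_rows = target.join(source_ids, on="id", how="left_anti")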



On Thu, 27 Jan 2022, 22:44 Sean Owen, <sro...@gmail.com> wrote:

> This is what storage engines like Delta, Hudi, and Iceberg are for. No need
> to manage it manually or use a DBMS. These formats allow deletes, upserts,
> etc. of data, using Spark, on cloud storage.
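>
> For example, with Delta Lake the delete can be applied as part of a MERGE.
> A rough sketch (it assumes the delta-spark package is installed, an
> existing SparkSession "spark", and a change feed "changes" with an "op"
> column marking inserts/updates/deletes; names and paths are illustrative):
>
> from delta.tables import DeltaTable
>
> # The transformed table maintained on cloud storage (illustrative path).
> target = DeltaTable.forPath(spark, "/data/transformed/users")
>
> (target.alias("t")
>     .merge(changes.alias("s"), "t.id = s.id")
>     .whenMatchedDelete(condition="s.op = 'D'")        # propagate deletes
>     .whenMatchedUpdateAll(condition="s.op = 'U'")     # apply updates
>     .whenNotMatchedInsertAll(condition="s.op = 'I'")  # bring in new rows
>     .execute())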
>
> On Thu, Jan 27, 2022 at 10:56 AM Mich Talebzadeh <
> mich.talebza...@gmail.com> wrote:
>
>> Where is the ETL data stored?
>>
>>
>>
>> *But now the main problem is that when a record at the source is deleted,
>> it should be deleted in my final transformed record too.*
>>
>>
>> If your final sink (storage) is a data warehouse, the record should be
>> soft-flagged with an op_type (Insert/Update/Delete) and an op_time
>> (timestamp).
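>>
>> For example, the "current" rows can then be derived from the soft-flagged
>> history with a window function. A sketch (here "history" is assumed to be
>> the warehouse table holding every change with op_type and op_time):
>>
>> from pyspark.sql import functions as F, Window
>>
>> # Most recent operation per id; an id whose latest op_type is 'D' simply
>> # drops out of the current view while the full audit trail is kept.
>> w = Window.partitionBy("id").orderBy(F.col("op_time").desc())
>> current = (history
>>            .withColumn("rn", F.row_number().over(w))
>>            .filter((F.col("rn") == 1) & (F.col("op_type") != "D"))
>>            .drop("rn"))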
>>
>>
>>
>> HTH
>>
>>
>> On Thu, 27 Jan 2022 at 15:48, Sid Kal <flinkbyhe...@gmail.com> wrote:
>>
>>> I am using a Spark incremental approach to bring in the latest data
>>> every day. Everything works fine.
>>>
>>> But now the main problem is that when a record at the source is deleted,
>>> it should be deleted in my final transformed record too.
>>>
>>> How do I capture such changes and update my table too?
>>>
>>> Best regards,
>>> Sid
>>>
>>>
