Re: Delta with intelligent upsett

2019-11-02 Thread Burak Yavuz
You can just add the target partitioning filter to your MERGE or UPDATE
condition, e.g.

MERGE INTO target
USING source
ON target.key = source.key AND target.year = year(current_date())
...

Best,
Burak

On Thu, Oct 31, 2019, 10:15 PM ayan guha  wrote:

>
> Hi
>
> we have a scenario where we have a large table  ie 5-6B records. The table
> is repository of data from past N years. It is possible that some updates
> take place on the data and thus er are using Delta table.
>
> As part of the business process we know updates can happen only within M
> years of past records where M is much smaller than N. Eg the table can hold
> 20 yrs of data but we know updates can happen only for last year not before
> that.
>
> Is there some way to indicate this additional intelligence to Delta so it
> can look into only last years data while running a merge or update? It
> seems to be an obvious performance booster.
>
> Any thoughts?
> --
> Best Regards,
> Ayan Guha
> --
> Best Regards,
> Ayan Guha
>


Re: Delta with intelligent upsett

2019-11-01 Thread Roland Johann
If the dataset contains a column like changed_at/created_at you can use this as 
watermark and filter out rows that have changed_at/created_at before the 
watermark. 

Best Regards

Roland Johann
Software Developer/Data Engineer

phenetic GmbH
Lütticher Straße 10, 50674 Köln, Germany

Mobil: +49 172 365 26 46
Mail: roland.joh...@phenetic.io
Web: phenetic.io

Handelsregister: Amtsgericht Köln (HRB 92595)
Geschäftsführer: Roland Johann, Uwe Reimann



> Am 01.11.2019 um 07:52 schrieb Gourav Sengupta :
> 
> should not a where clause on the partition field help with that? I am 
> obviously missing something in the question.
> 
> Regards,
> Gourav
> 
> On Thu, Oct 31, 2019 at 9:15 PM ayan guha  > wrote:
> 
> Hi
> 
> we have a scenario where we have a large table  ie 5-6B records. The table is 
> repository of data from past N years. It is possible that some updates take 
> place on the data and thus er are using Delta table. 
> 
> As part of the business process we know updates can happen only within M 
> years of past records where M is much smaller than N. Eg the table can hold 
> 20 yrs of data but we know updates can happen only for last year not before 
> that. 
> 
> Is there some way to indicate this additional intelligence to Delta so it can 
> look into only last years data while running a merge or update? It seems to 
> be an obvious performance booster. 
> 
> Any thoughts?
> -- 
> Best Regards,
> Ayan Guha
> -- 
> Best Regards,
> Ayan Guha



Re: Delta with intelligent upsett

2019-11-01 Thread Gourav Sengupta
should not a where clause on the partition field help with that? I am
obviously missing something in the question.

Regards,
Gourav

On Thu, Oct 31, 2019 at 9:15 PM ayan guha  wrote:

>
> Hi
>
> we have a scenario where we have a large table  ie 5-6B records. The table
> is repository of data from past N years. It is possible that some updates
> take place on the data and thus er are using Delta table.
>
> As part of the business process we know updates can happen only within M
> years of past records where M is much smaller than N. Eg the table can hold
> 20 yrs of data but we know updates can happen only for last year not before
> that.
>
> Is there some way to indicate this additional intelligence to Delta so it
> can look into only last years data while running a merge or update? It
> seems to be an obvious performance booster.
>
> Any thoughts?
> --
> Best Regards,
> Ayan Guha
> --
> Best Regards,
> Ayan Guha
>