Re: Re: Re: QueryDatabaseTable - Deleted Records

2017-09-18 Thread Matt Burgess
Uwe,

Is there anything in the V$ARCHIVED_LOG table [1] in your source
database? If so you may be able to get some of that information from
there. Also is LogMiner [2] enabled on the database? That's another
way to be able to query the logs to get things like deletes.

In general, there has to be something in the system that knows when
something is deleted. That can be done in a few ways, with varying
amounts of success:

1) Cross-join the source table with the target table. This won't pick
up records that have been inserted, updated, then deleted since the
last time the join was performed. Plus the join is very costly for
large tables.
2) Change the schema of the tables to add a soft delete flag (as
mentioned in an earlier reply). This often cannot be done because the
main app needs a particular schema, or because the CDC user does not
have permission to do such things to the source DB
3) Intercept the calls that will change the DB. This is fragile
because you may not know if the call succeeds at the DB. Plus you may
not be able to change the architecture to put something in between the
users and the DB.
4) Interrogate the logs. This is IMO the only real way to do CDC, and
is how most CDC solutions operate. The CaptureChangeMySQL processor
reads the MySQL binary logs, and if/when the CaptureChangeOracle
processor is implemented, it will likely require LogMiner or something
to be enabled at the source to make the information available.  This
pattern can apply to all DBs that support such a thing, as long as
they can be interrogated via JDBC (SQL queries) or some client tool
(which hopefully for us has a Java port!)

Regards,
Matt

[1] 
https://docs.oracle.com/cd/B28359_01/server.111/b28320/dynviews_1016.htm#REFRN30011
[2] 
https://docs.oracle.com/cd/E11882_01/server.112/e22490/logminer.htm#SUTIL1559

On Mon, Sep 18, 2017 at 1:35 PM, Uwe Geercken <uwe.geerc...@web.de> wrote:
> Andrew,
>
> yes. we are doing the same for the oracle db which is quite old and does not
> provide this information.
>
> Anyway. Was just curious if somebody has a smarter solution. The blogs of
> Nifi and Kafka have really good samples of extracting data but none of them
> touches the topic of deletes.
>
> Rgds,
>
> Uwe
>
>
>
> Gesendet: Samstag, 16. September 2017 um 13:52 Uhr
> Von: "Andrew Grande" <apere...@gmail.com>
> An: users@nifi.apache.org
> Betreff: Re: Re: QueryDatabaseTable - Deleted Records
>
> As an interesting architectural approach we took eons ago, before NiFi, was
> to take daily snapshots of a full table. Every row would then be
> hashed/digested or in any other way uniquely identified and 2 datasets would
> be crossed and compared to find inserts/deletes/updates. It was involved,
> but worked.
>
> Andrew
>
>
> On Sat, Sep 16, 2017, 2:38 AM Uwe Geercken <uwe.geerc...@web.de> wrote:
>>
>> Bryan,
>>
>> yes, the change log would be possible. In my use case I have Oracle 11 as
>> the source - and I can not change the source easily (takes long - is
>> expensive).
>>
>> I was expecting this answer but wanted to make sure that I have not missed
>> anything. I will try to build my use case around something else then.
>>
>> Thanks for your response(s).
>>
>> Rgds,
>>
>> Uwe
>>
>> Gesendet: Freitag, 15. September 2017 um 16:15 Uhr
>> Von: "Bryan Bende" <bbe...@gmail.com>
>> An: users@nifi.apache.org
>> Betreff: Re: QueryDatabaseTable - Deleted Records
>> Uwe,
>>
>> Typically you need to process the change log of the database in this
>> case, which unfortunately usually becomes database specific.
>>
>> I believe we have a processor CaptureChangeMySQL that can process the
>> MySQL change log.
>>
>> -Bryan
>>
>>
>> On Tue, Sep 12, 2017 at 1:39 PM, Uwe Geercken <uwe.geerc...@web.de> wrote:
>> > Hello,
>> >
>> > apparently the QueryDatabaseTable processor catches changes made to the
>> > data
>> > of the source database - updates and inserts.
>> >
>> > Has anybody a good idea or strategy how to handle deletes in the source
>> > database? Of course one could flag a record as deleted instead of
>> > phisically
>> > deleting it. But this means changing the source system in many cases and
>> > that is sometimes not possible. And yes, if you process the change log
>> > (if
>> > available) of the source system that is also a good option.
>> >
>> > Would be greatful for any tips or a best practive of how you do it.
>> >
>> > Rgds,
>> >
>> > Uwe


Aw: Re: Re: QueryDatabaseTable - Deleted Records

2017-09-18 Thread Uwe Geercken

Andrew,

 

yes. we are doing the same for the oracle db which is quite old and does not provide this information.

 

Anyway. Was just curious if somebody has a smarter solution. The blogs of Nifi and Kafka have really good samples of extracting data but none of them touches the topic of deletes.

 

Rgds,

 

Uwe

 

 

 

Gesendet: Samstag, 16. September 2017 um 13:52 Uhr
Von: "Andrew Grande" <apere...@gmail.com>
An: users@nifi.apache.org
Betreff: Re: Re: QueryDatabaseTable - Deleted Records


As an interesting architectural approach we took eons ago, before NiFi, was to take daily snapshots of a full table. Every row would then be hashed/digested or in any other way uniquely identified and 2 datasets would be crossed and compared to find inserts/deletes/updates. It was involved, but worked.

Andrew
 


On Sat, Sep 16, 2017, 2:38 AM Uwe Geercken <uwe.geerc...@web.de> wrote:





Bryan,

 

yes, the change log would be possible. In my use case I have Oracle 11 as the source - and I can not change the source easily (takes long - is expensive).

 

I was expecting this answer but wanted to make sure that I have not missed anything. I will try to build my use case around something else then.

 

Thanks for your response(s).

 

Rgds,

 

Uwe

 

Gesendet: Freitag, 15. September 2017 um 16:15 Uhr
Von: "Bryan Bende" <bbe...@gmail.com>
An: users@nifi.apache.org
Betreff: Re: QueryDatabaseTable - Deleted Records











Uwe,

Typically you need to process the change log of the database in this
case, which unfortunately usually becomes database specific.

I believe we have a processor CaptureChangeMySQL that can process the
MySQL change log.

-Bryan


On Tue, Sep 12, 2017 at 1:39 PM, Uwe Geercken <uwe.geerc...@web.de> wrote:
> Hello,
>
> apparently the QueryDatabaseTable processor catches changes made to the data
> of the source database - updates and inserts.
>
> Has anybody a good idea or strategy how to handle deletes in the source
> database? Of course one could flag a record as deleted instead of phisically
> deleting it. But this means changing the source system in many cases and
> that is sometimes not possible. And yes, if you process the change log (if
> available) of the source system that is also a good option.
>
> Would be greatful for any tips or a best practive of how you do it.
>
> Rgds,
>
> Uwe













Re: Re: QueryDatabaseTable - Deleted Records

2017-09-16 Thread Andrew Grande
As an interesting architectural approach we took eons ago, before NiFi, was
to take daily snapshots of a full table. Every row would then be
hashed/digested or in any other way uniquely identified and 2 datasets
would be crossed and compared to find inserts/deletes/updates. It was
involved, but worked.

Andrew

On Sat, Sep 16, 2017, 2:38 AM Uwe Geercken  wrote:

> Bryan,
>
> yes, the change log would be possible. In my use case I have Oracle 11 as
> the source - and I can not change the source easily (takes long - is
> expensive).
>
> I was expecting this answer but wanted to make sure that I have not missed
> anything. I will try to build my use case around something else then.
>
> Thanks for your response(s).
>
> Rgds,
>
> Uwe
>
> *Gesendet:* Freitag, 15. September 2017 um 16:15 Uhr
> *Von:* "Bryan Bende" 
> *An:* users@nifi.apache.org
> *Betreff:* Re: QueryDatabaseTable - Deleted Records
> Uwe,
>
> Typically you need to process the change log of the database in this
> case, which unfortunately usually becomes database specific.
>
> I believe we have a processor CaptureChangeMySQL that can process the
> MySQL change log.
>
> -Bryan
>
>
> On Tue, Sep 12, 2017 at 1:39 PM, Uwe Geercken  wrote:
> > Hello,
> >
> > apparently the QueryDatabaseTable processor catches changes made to the
> data
> > of the source database - updates and inserts.
> >
> > Has anybody a good idea or strategy how to handle deletes in the source
> > database? Of course one could flag a record as deleted instead of
> phisically
> > deleting it. But this means changing the source system in many cases and
> > that is sometimes not possible. And yes, if you process the change log
> (if
> > available) of the source system that is also a good option.
> >
> > Would be greatful for any tips or a best practive of how you do it.
> >
> > Rgds,
> >
> > Uwe
>