Hi Boris,

The way we are exposing the data to the end users is through a Hive view on
top of the HBase tables. I guess a HiveQL script that does a select count(*)
on every table should do the job. I'm pretty sure it will be slow as well,
and there will be a lot of manual work when reconciling it with the output
from the Oracle side, but I don't see any other option. Maybe we'll do this
once a month or fortnightly, and we'll trust the error notifications from
NiFi to establish whether everything is working fine or not.

I agree with your comment that the counts won't match. We are using Kafka
for the data stream coming from GG, and I was wondering if there is a way to
tell how many messages have been read by the consumer and how many are still
waiting to be read? That way we could sort of justify the difference between
GG and HBase, but I guess that question is for another topic.
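
For the record, the per-table check I have in mind is something like the
sketch below (untested; the view names are placeholders for our actual Hive
views):

    -- Run against the Hive views sitting on top of the HBase tables and
    -- compare each result with SELECT COUNT(*) on the matching Oracle table.
    SELECT 'customer_view' AS table_name, COUNT(*) AS row_cnt FROM customer_view
    UNION ALL
    SELECT 'orders_view', COUNT(*) FROM orders_view;

On the Kafka question, the consumer-groups tool that ships with Kafka reports
per-partition offsets, and its LAG column is exactly the number of messages
the consumer has not read yet (broker address and group name below are
placeholders):

    kafka-consumer-groups.sh --bootstrap-server broker1:9092 \
      --describe --group nifi-gg-consumer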


On Tue, Jun 12, 2018 at 9:59 PM Boris Tyukin <bo...@boristyukin.com> wrote:

> I would be curious to hear how you end up doing it, Faisal. In my
> experience, taking row counts from HBase tables was painfully slow, and this
> was one of the reasons we decided to move to Apache Kudu. We tried five
> different ways of taking row counts with HBase, and every one of them was
> still painfully slow.
>
> Another issue you will encounter is that the counts will never match, since
> GG delivers changes in near real time. I heard about one creative way of
> using the Oracle Flashback feature to compare counts at a precise moment in
> time.
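>
> A flashback count looks something like this (a sketch only; the table name
> and timestamp are placeholders, and it assumes enough undo retention on the
> Oracle side):
>
>     -- Count rows as they existed at an exact moment, so both sides can be
>     -- compared at the same point in time.
>     SELECT COUNT(*)
>     FROM   orders AS OF TIMESTAMP
>            TO_TIMESTAMP('2018-06-12 21:00:00', 'YYYY-MM-DD HH24:MI:SS');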
>
> Boris
>
> On Mon, Jun 11, 2018 at 10:36 PM Faisal Durrani <te04.0...@gmail.com>
> wrote:
>
>> Hi Andrew,
>>
>> Thank you for your suggestion. We are using the Timestamp property of the
>> PutHBase processor to enforce the order. This timestamp is extracted from
>> the Golden Gate message as metadata. I agree with your approach of
>> creating an end-of-day file, which seems to be the most logical way of
>> doing the reconciliation.
>>
>> Thank you for the help.
>> Faisal
>>
>> On Mon, Jun 11, 2018 at 6:07 PM Andrew Psaltis <psaltis.and...@gmail.com>
>> wrote:
>>
>>> Hi Faisal,
>>> OK, so then a single partition in Kafka, and I assume that you are using
>>> the EnforceOrder processor [1] or another means to ensure the records are
>>> in order, so that updates happen after inserts and so forth. In that case,
>>> assuming everything is in order, the simplest approach is just to do a
>>> count at the end of the day, whatever that means for your business.
>>> However, oftentimes a simple count is not good enough, as you actually
>>> want to know that the tables reconcile. Often, in this case, there is a
>>> way to produce an end-of-day file or record for the source system. This
>>> will usually contain things like the record count, along with a sum of
>>> various columns, and possibly the query that was used to produce the sum
>>> or another validation. With this, you can then ensure that the tables do
>>> reconcile.
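>>>
>>> For example, a reconciliation query of that shape might look like the
>>> following (a sketch only; the orders table, amount column, and trade_date
>>> filter are hypothetical stand-ins for the real schema):
>>>
>>>     -- Run the same query against Oracle and against the Hive view over
>>>     -- HBase, then compare the two result rows.
>>>     SELECT COUNT(*)    AS row_cnt,
>>>            SUM(amount) AS amount_sum
>>>     FROM   orders
>>>     WHERE  trade_date = '2018-06-11';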
>>>
>>> Do you have the ability to create an "End of Day" file or something
>>> along those lines that you can use?
>>>
>>>
>>> [1]
>>> https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.6.0/org.apache.nifi.processors.standard.EnforceOrder/index.html
>>>
>>> Thanks,
>>> Andrew
>>>
>>>
>>>
>>> On Mon, Jun 11, 2018 at 11:11 AM Faisal Durrani <te04.0...@gmail.com>
>>> wrote:
>>>
>>>> Hi Andrew,
>>>> We are receiving the Golden Gate transactions from Kafka, which are
>>>> consumed in NiFi through the ConsumeKafka processor. Our data flow then
>>>> reduces the Golden Gate JSON message and sends the data to the target
>>>> table in HBase using the PutHBaseJSON processor.
>>>>
>>>> Thanks,
>>>> Faisal
>>>>
>>>> On Mon, Jun 11, 2018 at 12:01 PM Andrew Psaltis <
>>>> psaltis.and...@gmail.com> wrote:
>>>>
>>>>> Hi Faisal,
>>>>> There are various ways this can be handled, but it is going to depend
>>>>> on how you are receiving data from Oracle via Golden Gate. Are you
>>>>> using the HBase Handler, the HDFS Handler, a flat file, Kafka, or
>>>>> another means?
>>>>>
>>>>> Thanks,
>>>>> Andrew
>>>>>
>>>>> On Mon, Jun 11, 2018 at 9:43 AM Faisal Durrani <te04.0...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Is there a recommended way to ensure the row counts from tables in the
>>>>>> source (Oracle) are consistent with those of the target tables in HBase
>>>>>> (data lake)? We are using NiFi, which receives the Golden Gate messages
>>>>>> and then, using different processors, stores the transactions in HBase,
>>>>>> so essentially the tables in HBase should be in sync with the tables in
>>>>>> Oracle at all times. I am interested in knowing how teams ensure and
>>>>>> prove this. Do they take row counts from source and target every day,
>>>>>> match them, and say that they are in sync? I used the counter option in
>>>>>> NiFi, which maintained the record count received for each table, but I
>>>>>> guess that is not an optimized way to do it.
>>>>>>
>>>>>
