Hi Boris,

The way we are exposing the data to the end users is through a Hive view on top of the HBase tables. I guess a HiveQL script that does a select count(*) on every table should do the job. I'm pretty sure it will be slow as well, and there will be a lot of manual work when reconciling it with the output from the Oracle side, but I don't see any other option. Maybe we'll do this once a month or fortnightly, and otherwise trust the error notifications from NiFi to establish whether everything is working fine or not.

I agree with your comment that the counts won't match. We are using Kafka for the data stream coming from GG, and I was wondering if there is a way to tell how many messages have been read by the consumer and how many are still waiting to be read? That way we could at least justify the difference between GG and HBase, but I guess that's a question for another topic.
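For the record, here is the rough shape of what I had in mind, as a sketch only (untested; the table list, the HiveServer2 URL, and the consumer group name are placeholders, not our real values):

    #!/bin/bash
    # Sketch: count rows through the Hive views sitting on the HBase tables.
    # TABLES and the JDBC URL below are placeholders for our environment.
    TABLES="customers orders payments"

    for t in $TABLES; do
      # beeline -e runs a single HiveQL statement; COUNT(*) over an
      # HBase-backed view triggers a full scan, hence the slowness I expect.
      beeline -u "jdbc:hive2://hiveserver:10000/default" --silent=true \
        -e "SELECT '$t' AS tbl, COUNT(*) AS cnt FROM ${t}_view;"
    done

    # Kafka side of my question: --describe on the consumer group prints,
    # per partition, CURRENT-OFFSET (messages already read by the consumer)
    # and LAG (messages still waiting to be read).
    kafka-consumer-groups.sh --bootstrap-server broker:9092 \
      --describe --group nifi-gg-consumer

If that works, the LAG column would at least put a number on how far HBase is behind GG at the moment the counts are taken.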
On Tue, Jun 12, 2018 at 9:59 PM Boris Tyukin <bo...@boristyukin.com> wrote:

> I would be curious to hear how you end up doing it, Faisal. In my
> experience, taking row counts from HBase tables was painfully slow, and
> this was one of the reasons we decided to move to Apache Kudu. We tried
> five different ways of taking row counts with HBase, and it was still
> painfully slow.
>
> Another issue you will encounter is that the counts will never match,
> since GG delivers changes in near real time. I heard about one creative
> way of using the Oracle flashback feature to compare counts at a precise
> moment in time.
>
> Boris
>
> On Mon, Jun 11, 2018 at 10:36 PM Faisal Durrani <te04.0...@gmail.com>
> wrote:
>
>> Hi Andrew,
>>
>> Thank you for your suggestion. We are using the timestamp property of
>> the PutHBase processor to enforce the order. This timestamp is extracted
>> from the GoldenGate message as metadata. I agree with your approach of
>> creating an end-of-day file, which seems to be the most logical way of
>> doing the reconciliation.
>>
>> Thank you for the help.
>> Faisal
>>
>> On Mon, Jun 11, 2018 at 6:07 PM Andrew Psaltis <psaltis.and...@gmail.com>
>> wrote:
>>
>>> Hi Faisal,
>>> OK, so then a single partition in Kafka, and I assume that you are
>>> using the EnforceOrder processor [1] or another means to ensure the
>>> records are in order, so that updates happen after inserts and so
>>> forth. In that case, assuming everything is in order, the simplest
>>> approach is just to do a count at the end of the day, whatever that
>>> means for your business. However, often a simple count is not good
>>> enough, as you actually want to know that the tables reconcile. Often,
>>> in this case, there is a way to produce an end-of-day file or record
>>> for the source system. This will usually contain things like the record
>>> count, along with a sum of various columns, and possibly a query that
>>> is used to produce the sum or another validation. With this, you can
>>> then ensure that the tables do reconcile.
>>>
>>> Do you have the ability to create an "End of Day" file or something
>>> along those lines that you can use?
>>>
>>> [1]
>>> https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.6.0/org.apache.nifi.processors.standard.EnforceOrder/index.html
>>>
>>> Thanks,
>>> Andrew
>>>
>>> On Mon, Jun 11, 2018 at 11:11 AM Faisal Durrani <te04.0...@gmail.com>
>>> wrote:
>>>
>>>> Hi Andrew,
>>>> We are receiving the GoldenGate transactions from Kafka, which is
>>>> received in NiFi through the ConsumeKafka processor. Our data flow
>>>> then reduces the GoldenGate JSON message and sends the data to the
>>>> target table in HBase using the PutHBaseJSON processor.
>>>>
>>>> Thanks,
>>>> Faisal
>>>>
>>>> On Mon, Jun 11, 2018 at 12:01 PM Andrew Psaltis <
>>>> psaltis.and...@gmail.com> wrote:
>>>>
>>>>> Hi Faisal,
>>>>> There are various ways this can be handled, but it is going to
>>>>> depend on how you are receiving data from Oracle via GoldenGate. Are
>>>>> you using the HBase Handler, the HDFS Handler, a flat file, Kafka,
>>>>> or some other means?
>>>>>
>>>>> Thanks,
>>>>> Andrew
>>>>>
>>>>> On Mon, Jun 11, 2018 at 9:43 AM Faisal Durrani <te04.0...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Is there a recommended way to ensure the row counts from tables in
>>>>>> the source (Oracle) are consistent with those of the target tables
>>>>>> in HBase (data lake)? We are using NiFi, which receives the
>>>>>> GoldenGate messages and then, using different processors, stores
>>>>>> the transactions in HBase, so essentially the tables in HBase
>>>>>> should be in sync with the tables in Oracle at all times. I am
>>>>>> interested in knowing how teams ensure and prove this. Do they take
>>>>>> row counts from source and target every day, match them, and say
>>>>>> that it's synced? I used the counter option in NiFi, which
>>>>>> maintained the records received against each table, but I guess
>>>>>> that is not an optimized way to do it.
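Coming back to the flashback idea from the quoted thread above: as a sketch only (untested, and assuming enough UNDO retention on the Oracle side for flashback queries, plus HBase cell timestamps that carry the GG commit time, as we set them via the PutHBase timestamp property; table and credential names are placeholders), a point-in-time comparison could look roughly like this:

    #!/bin/bash
    # Pick the "precise moment" to compare both sides at.
    TS="2018-06-12 00:00:00"

    # Oracle side: a flashback query counts the rows exactly as of TS.
    sqlplus -s scott/tiger@ORCL <<EOF
    SELECT COUNT(*) FROM customers
      AS OF TIMESTAMP TO_TIMESTAMP('$TS', 'YYYY-MM-DD HH24:MI:SS');
    EOF

    # HBase side: the bundled RowCounter MapReduce job accepts a time range
    # in epoch millis, so we count only cells written up to the same moment.
    # This is only approximate for rows updated or deleted around TS.
    END_MS=$(date -d "$TS" +%s%3N)   # GNU date
    hbase org.apache.hadoop.hbase.mapreduce.RowCounter customers \
      --starttime=0 --endtime="$END_MS"

RowCounter is still a full scan, so it would be no faster than the Hive counts, but at least the two numbers would refer to the same instant rather than to a moving target.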