Re: Row counts - date lake

2018-06-12 Thread Faisal Durrani
Hi Boris, The way we are exposing the data to the end users is through a Hive view on top of the HBase tables. I guess a HiveQL script that runs select count(*) on every table should do the job. I'm pretty sure it will be slow as well, and there will be a lot of manual work when reconciling it

Re: Row counts - date lake

2018-06-12 Thread Boris Tyukin
I would be curious to hear how you end up doing it, Faisal. In my experience, taking row counts from HBase tables was painfully slow, and this was one of the reasons we decided to move to Apache Kudu. We tried 5 different ways of taking row counts with HBase and it was still painfully slow. Another

Re: Row counts - date lake

2018-06-11 Thread Faisal Durrani
Hi Andrew, Thank you for your suggestion. We are using the timestamp property of the PutHbase processor to enforce the order. This timestamp is extracted from the GoldenGate message as metadata. I agree with your approach of creating an end-of-day file, which seems to be the most logical way

Re: Row counts - date lake

2018-06-11 Thread Andrew Psaltis
Hi Faisal, OK, so then a single partition in Kafka, and I assume that you are using the EnforceOrder processor or another means to ensure the records are in order, so that updates happen after inserts and so forth. In that case, assuming everything is in order, the simplest approach is just to do a

Re: Row counts - date lake

2018-06-10 Thread Faisal Durrani
Hi Andrew, We receive the GoldenGate transactions from Kafka, which NiFi consumes through the ConsumeKafka processor. Our data flow then reduces the GoldenGate JSON message and sends the data to the target table in HBase using the PutHBaseJSON processor. Thanks, Faisal On Mon, Jun

Re: Row counts - date lake

2018-06-10 Thread Andrew Psaltis
Hi Faisal, There are various ways this can be handled, but it is going to depend on how you are receiving data from Oracle via GoldenGate. Are you using the HBase Handler, the HDFS Handler, a Flat File, Kafka, or another means? Thanks, Andrew On Mon, Jun 11, 2018 at 9:43 AM Faisal

Row counts - date lake

2018-06-10 Thread Faisal Durrani
Is there a recommended way to ensure the row counts from tables in the source (Oracle) are consistent with those of the target tables in HBase (data lake)? We are using NiFi, which receives the GoldenGate messages, and then by using different processors we store the transactions in HBase, so essentially