Thanks for your info. I am planning to implement a Pig UDF to do record
lookups. Kindly let me know if this is a good idea.
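For reference, such a lookup UDF could be sketched as a Pig Python (Jython) UDF along these lines. This is only an illustration: the function name `lookup`, the `ref.csv` file, and the "key,value" line format are assumptions, not an existing implementation, and the `@outputSchema` decorator that Pig requires is left commented out so the sketch runs as plain Python:

```python
# Hypothetical Jython UDF for Pig that looks up keys in a small reference
# file, cached once per task. In a Pig script it would be registered with
# something like: REGISTER 'lookup.py' USING jython AS lookupfns;

_cache = {}

def _load(path):
    # Read "key,value" lines into a dict (assumed file format).
    ref = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            k, _, v = line.partition(',')
            ref[k] = v
    return ref

# @outputSchema('value:chararray')  # Pig's decorator; commented out here
def lookup(key, path='ref.csv'):
    # Load the reference file on first use, then serve lookups from memory.
    if path not in _cache:
        _cache[path] = _load(path)
    return _cache[path].get(key)
```

Note that this pattern only suits reference data small enough to cache per task; for lookups against a live HBase table, a coprocessor or a connector (as discussed below) fits better.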

Best
Ayan

On Thu, Sep 3, 2015 at 2:55 PM, Jörn Franke <jornfra...@gmail.com> wrote:

>
> You may check whether it makes sense to write a coprocessor that does the
> upsert for you, if one does not exist already. Phoenix for HBase may
> already support this.
>
> Another alternative, if the records do not have a unique ID, is to put
> them into a text index engine such as Solr or Elasticsearch, which can do
> fast matching with relevance scores in this case.
>
>
> You can also use Spark or Pig here. However, I am not sure Spark is
> suitable for these one-row lookups; the same holds for Pig.
>
>
> Le mer. 2 sept. 2015 à 23:53, ayan guha <guha.a...@gmail.com> a écrit :
>
> Hello group
>
> I am trying to use Pig or Spark to achieve the following:
>
> 1. Write a batch process that reads records from a file.
> 2. Look up HBase to see whether each record exists. If it does, compare
> the incoming values with the stored values and update any fields that do
> not match; otherwise, create a new record.
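The compare-and-update decision in step 2 can be sketched as a pure function, independent of which engine or connector performs the I/O. This is a minimal illustration; the name `plan_upsert` and the tuple return shape are hypothetical, and `existing` is taken to be `None` when HBase has no row for the key:

```python
def plan_upsert(existing, incoming):
    # Decide what to write for one record: insert it whole when there is
    # no stored row, update only the differing fields, or do nothing.
    if existing is None:
        return 'insert', incoming
    changed = {k: v for k, v in incoming.items() if existing.get(k) != v}
    return ('update', changed) if changed else ('noop', {})
```

Keeping this decision separate from the HBase calls makes the logic easy to unit-test without a cluster.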
>
> My questions:
> 1. Is this a good use case for Pig or Spark?
> 2. Is there any way to read HBase for each incoming record in Pig without
> writing MapReduce code?
> 3. In the case of Spark, I think we would have to connect to HBase for
> every record. Is there any other way?
> 4. What is the best HBase connector that provides this functionality?
>
> Best
>
> Ayan
>
>
>


-- 
Best Regards,
Ayan Guha
