Ayan:
Please read this:
http://hbase.apache.org/book.html#cp
Cheers
On Thu, Sep 3, 2015 at 2:13 PM, ayan guha wrote:
Hi
Thanks for your comments. My driving point is that instead of loading the HBase
data entirely, I want to do record-by-record lookups, and that is best done in a
UDF or map function. I would also have loved to do it in Spark, but there is no
production cluster here yet :(
@Franke: I do not have enough competency on coprocessors. But I don't see how
it works here with Phoenix or an HBase coprocessor. Remember we are joining two
big data sets here: one is the big file in HDFS, and the other is the records
in HBase. The driving force comes from the Hadoop cluster.
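The file-driven, record-by-record lookup described above can be sketched as a batched pipeline. `multi_get` is a hypothetical stand-in for a real multi-row client call (for example happybase's `table.rows(keys)`), so the pattern below runs against a plain dict; buffering keys before the lookup is what keeps per-record round trips down:

```python
from itertools import islice

def batched_lookup(records, multi_get, batch_size=100):
    """Drive the join from the HDFS-file side: buffer incoming
    (key, fields) pairs and resolve each batch with one multi-row
    lookup.  `multi_get(keys)` must return one existing row (or None)
    per key, in order -- a stand-in for a real HBase client call."""
    it = iter(records)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            break
        rows = multi_get([key for key, _ in batch])
        for (key, fields), existing in zip(batch, rows):
            yield key, fields, existing
```

With a dict standing in for the HBase table, each yielded triple pairs the incoming record with whatever already exists for that key.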
On Thu, Sep 3, 2015 at 11:37 AM, Jörn Franke wrote:
If you use Pig or Spark, you increase the complexity significantly from an
operations-management perspective. Spark should be seen from a platform
perspective, where it makes sense. If you can do it directly with HBase/Phoenix,
or with only an HBase coprocessor, then that should be preferred. Otherwise you
pay more.
Yes, Ayan, your approach will work.
Or alternatively, use Spark and write a Scala/Java function which implements
logic similar to your Pig UDF.
Both approaches look similar.
Personally, I would go with the Spark solution; it will be slightly faster, and
easier if you already have a Spark cluster set up.
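A minimal sketch of what the Spark variant might look like, assuming a hypothetical `connect` factory in place of a real HBase client. Opening one connection per partition (not per record) is the usual `mapPartitions` idiom, and the per-partition body is plain Python, so it can be exercised without a cluster:

```python
def lookup_partition(part, connect):
    """Per-partition body for a Spark mapPartitions call: open one
    connection for the whole partition, look up each incoming key, and
    emit (key, incoming_fields, existing_row) triples.  `connect` is a
    hypothetical factory returning an object with get()/close()."""
    conn = connect()
    try:
        for key, fields in part:
            yield key, fields, conn.get(key)
    finally:
        conn.close()

# Wired into Spark this would look roughly like:
#   rdd.mapPartitions(lambda part: lookup_partition(part, make_connection))
```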
Thanks for your info. I am planning to implement a Pig UDF to do record
lookups. Kindly let me know if this is a good idea.
Best
Ayan
On Thu, Sep 3, 2015 at 2:55 PM, Jörn Franke wrote:
You may check if it makes sense to write a coprocessor doing an upsert for
you, if it does not exist already. Maybe phoenix for Hbase supports this
already.
Another alternative, if the records do not have a unique ID, is to put
them into a text index engine, such as Solr or Elasticsearch, which d
Hello group
I am trying to use Pig or Spark to achieve the following:
1. Write a batch process which will read from a file.
2. Look up HBase to see if the record exists. If so, compare the incoming
values with HBase and update the fields which do not match; otherwise create a
new record.
My qu
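The compare-and-update rule in step 2 could be sketched as a pure decision function (the names here are illustrative, not from any library); it separates deciding what to do from actually writing to HBase:

```python
def plan_upsert(incoming, existing):
    """Decide the action for one record: if no existing row, insert the
    incoming record as-is; otherwise update only the fields whose values
    differ from what is already stored, leaving matching fields alone."""
    if existing is None:
        return ("insert", dict(incoming))
    changed = {k: v for k, v in incoming.items() if existing.get(k) != v}
    return ("update", changed) if changed else ("noop", {})
```

The returned (action, fields) pair could then be turned into an HBase Put (or a Phoenix UPSERT) by whatever client code does the writing.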