Hi Andrew,

As part of a custom Pig Loader, we are putting together a PhoenixInputFormat and PhoenixRecordReader. Although these classes currently live in the Phoenix-Pig module, most of the code can be reused for a MapReduce job.
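On your question about a native HBase API call for splits: the region boundaries of the underlying table are exposed by HTable.getStartEndKeys(), and an InputFormat's getSplits() can turn each start/end pair into one contiguous range scan. Below is a minimal sketch of that lookup only; it is not the Phoenix-Pig code, and the class and method names are just for illustration.

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.util.Pair;

    public class RegionBoundarySplitter {

        /** Returns one [startKey, endKey) pair per region of the given table. */
        public static List<Pair<byte[], byte[]>> keyRanges(Configuration conf, String tableName)
                throws IOException {
            HTable table = new HTable(HBaseConfiguration.create(conf), tableName);
            try {
                // Native HBase call: start and end row keys of every region.
                Pair<byte[][], byte[][]> startEndKeys = table.getStartEndKeys();
                byte[][] startKeys = startEndKeys.getFirst();
                byte[][] endKeys = startEndKeys.getSecond();
                List<Pair<byte[], byte[]>> ranges = new ArrayList<Pair<byte[], byte[]>>();
                for (int i = 0; i < startKeys.length; i++) {
                    ranges.add(new Pair<byte[], byte[]>(startKeys[i], endKeys[i]));
                }
                return ranges;
            } finally {
                table.close();
            }
        }
    }

Each (start, end) pair would then be wrapped in an InputSplit whose record reader restricts the configured SELECT (or a raw Scan) to that key range, so every mapper reads exactly one region.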
Regards,
Ravi

On Fri, Apr 4, 2014 at 10:38 AM, alex kamil <[email protected]> wrote:

> You can create a custom function, for example:
> http://phoenix-hbase.blogspot.com/2013/04/how-to-add-your-own-built-in-function.html
>
> On Fri, Apr 4, 2014 at 12:47 AM, Andrew <[email protected]> wrote:
>
>> I am considering using Phoenix, but I know that I will want to transform
>> my data via MapReduce, e.g. UPSERT some core data, then go back over the
>> data set and "fill in" some additional columns (appropriately stored in
>> additional column groups).
>>
>> I think all I need to do is implement an InputFormat that takes a table
>> name (or, more generally, "select * from table where ..."). But in order
>> to define splits, I need to somehow discover key ranges so that I can
>> issue a series of contiguous range scans.
>>
>> Can you suggest how I might go about this in a general way? If I get
>> this right, I'll contribute the code. Otherwise I will need to use
>> external knowledge of my specific table data to partition the task. If
>> Phoenix had a LIMIT with a SKIP option plus a table ROWCOUNT, that would
>> also achieve the goal. Or is there some way to implement the InputFormat
>> via a native HBase API call, perhaps?
>>
>> Andrew.
>>
>> (MongoDB's InputFormat implementation calls an internal function on the
>> server to do this:
>> https://github.com/mongodb/mongo-hadoop/blob/master/core/src/main/java/com/mongodb/hadoop/splitter/StandaloneMongoSplitter.java)
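For the "fill in some additional columns" step quoted above: regardless of how the InputFormat lands, the map or reduce side can write back through the Phoenix JDBC driver. A minimal sketch follows; the table name, columns, values, and ZooKeeper quorum are placeholders, while UPSERT INTO and the jdbc:phoenix: URL are standard Phoenix.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;

    public class FillInColumns {
        public static void main(String[] args) throws SQLException {
            Connection conn = DriverManager.getConnection("jdbc:phoenix:zk-host:2181");
            try {
                PreparedStatement upsert = conn.prepareStatement(
                    "UPSERT INTO MY_TABLE (ID, DERIVED_COL) VALUES (?, ?)");
                upsert.setLong(1, 42L);
                upsert.setString(2, "computed-in-mr");
                upsert.executeUpdate();
                // Autocommit is off by default in Phoenix, so batched upserts
                // are flushed to HBase with commit().
                conn.commit();
            } finally {
                conn.close();
            }
        }
    }

In a real job you would open the connection once per task, batch many upserts per commit, and close the connection in the task's cleanup.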
