You can create a custom function (see, for example,
http://phoenix-hbase.blogspot.com/2013/04/how-to-add-your-own-built-in-function.html).
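
A minimal sketch of what such a built-in function looks like, following the
approach in that post. The names here (MyUpperFunction, MY_UPPER) are invented
for illustration, and the exact package and base-class APIs have shifted
between Phoenix releases, so treat the imports as approximate:

import java.util.List;

import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.phoenix.expression.Expression;
import org.apache.phoenix.expression.function.ScalarFunction;
import org.apache.phoenix.parse.FunctionParseNode.Argument;
import org.apache.phoenix.parse.FunctionParseNode.BuiltInFunction;
import org.apache.phoenix.schema.PDataType;
import org.apache.phoenix.schema.tuple.Tuple;

@BuiltInFunction(name = MyUpperFunction.NAME,
        args = { @Argument(allowedTypes = { PDataType.VARCHAR }) })
public class MyUpperFunction extends ScalarFunction {
    public static final String NAME = "MY_UPPER";

    public MyUpperFunction() {
    }

    public MyUpperFunction(List<Expression> children) {
        super(children);
    }

    @Override
    public boolean evaluate(Tuple tuple, ImmutableBytesWritable ptr) {
        // Evaluate the single child argument; return false if it isn't
        // available yet for this row.
        if (!children.get(0).evaluate(tuple, ptr)) {
            return false;
        }
        // VARCHARs are stored as UTF-8 bytes; upper-case the value and
        // hand the result back through ptr.
        String value = Bytes.toString(ptr.get(), ptr.getOffset(), ptr.getLength());
        ptr.set(Bytes.toBytes(value.toUpperCase()));
        return true;
    }

    @Override
    public PDataType getDataType() {
        return PDataType.VARCHAR;
    }

    @Override
    public String getName() {
        return NAME;
    }
}

As the post explains, built-in functions currently have to be compiled into
Phoenix itself and registered in the ExpressionType enum; there is no
pluggable CREATE FUNCTION mechanism yet.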


On Fri, Apr 4, 2014 at 12:47 AM, Andrew <[email protected]> wrote:

> I am considering using Phoenix, but I know that I will want to transform
> my data via MapReduce, e.g. UPSERT some core data, then go back over the
> data set and "fill in" some additional columns (appropriately stored in
> additional column families).
>
> I think all I need to do is write an InputFormat implementation that
> takes a table name (or, more generally, /SELECT * FROM table WHERE .../).
> But in order to define splits, I need to somehow discover key ranges so
> that I can issue a series of contiguous range scans.
>
> Can you suggest how I might go about this in a general way? If I get
> this right, I'll contribute the code. Otherwise I will need to use
> external knowledge of my specific table data to partition the task. If
> Phoenix had a LIMIT with a SKIP option, plus a table ROWCOUNT, that
> would also achieve the goal. Or is there some way to implement the
> InputFormat via a native HBase API call?
>
> Andrew.
>
> (MongoDB's InputFormat implementation calls an internal function on the
> server to do this:
> https://github.com/mongodb/mongo-hadoop/blob/master/core/src/main/java/com/mongodb/hadoop/splitter/StandaloneMongoSplitter.java)
>
>
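
On the splits question: a reasonable starting point is to derive one split
per region from the table's region boundaries, which is essentially what
HBase's own TableInputFormatBase does. Here is a minimal sketch. The
getRegionRanges helper and its class are hypothetical scaffolding, but
HTable.getStartEndKeys() is a real HBase client call that returns the
start/end row keys of each region:

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.util.Pair;

public class RegionRanges {
    /**
     * Returns one (startKey, endKey) pair per region of the given table.
     * Each pair can back one InputSplit issuing a contiguous range scan.
     */
    public static List<Pair<byte[], byte[]>> getRegionRanges(String tableName)
            throws IOException {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, tableName);
        try {
            // Parallel arrays of region start keys and end keys.
            Pair<byte[][], byte[][]> keys = table.getStartEndKeys();
            List<Pair<byte[], byte[]>> ranges =
                    new ArrayList<Pair<byte[], byte[]>>();
            for (int i = 0; i < keys.getFirst().length; i++) {
                ranges.add(new Pair<byte[], byte[]>(
                        keys.getFirst()[i], keys.getSecond()[i]));
            }
            return ranges;
        } finally {
            table.close();
        }
    }
}

Each (start, end) pair could then back one map task that issues a contiguous
range scan, e.g. a Phoenix SELECT with a row-key range in the WHERE clause,
so that tasks cover disjoint, region-aligned slices of the table.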
