Hi Andrew,

As part of a custom Pig Loader, we are putting together a PhoenixInputFormat and PhoenixRecordReader. Although these classes currently live in the Phoenix-Pig module, most of the code can be reused for a MapReduce job.
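On your question about a native HBase API call for splits: the region boundaries of the underlying table are exposed by HTable.getStartEndKeys(), and an InputFormat's getSplits() can turn each start/end pair into one contiguous range scan. Below is a minimal sketch of that lookup only; it is not the Phoenix-Pig code, and the class and method names are just for illustration.

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.util.Pair;

    public class RegionBoundarySplitter {

        /** Returns one [startKey, endKey) pair per region of the given table. */
        public static List<Pair<byte[], byte[]>> keyRanges(Configuration conf, String tableName)
                throws IOException {
            HTable table = new HTable(HBaseConfiguration.create(conf), tableName);
            try {
                // Native HBase call: start and end row keys of every region.
                Pair<byte[][], byte[][]> startEndKeys = table.getStartEndKeys();
                byte[][] startKeys = startEndKeys.getFirst();
                byte[][] endKeys = startEndKeys.getSecond();
                List<Pair<byte[], byte[]>> ranges = new ArrayList<Pair<byte[], byte[]>>();
                for (int i = 0; i < startKeys.length; i++) {
                    ranges.add(new Pair<byte[], byte[]>(startKeys[i], endKeys[i]));
                }
                return ranges;
            } finally {
                table.close();
            }
        }
    }

Each (start, end) pair would then be wrapped in an InputSplit whose record reader restricts the configured SELECT (or a raw Scan) to that key range, so every mapper reads exactly one region.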
Regards,
Ravi

On Fri, Apr 4, 2014 at 10:38 AM, alex kamil <[email protected]> wrote:

> You can create a custom function, for example:
> http://phoenix-hbase.blogspot.com/2013/04/how-to-add-your-own-built-in-function.html
>
> On Fri, Apr 4, 2014 at 12:47 AM, Andrew <[email protected]> wrote:
>
>> I am considering using Phoenix, but I know that I will want to transform
>> my data via MapReduce, e.g. UPSERT some core data, then go back over the
>> data set and "fill in" some additional columns (appropriately stored in
>> additional column groups).
>>
>> I think all I need to do is implement an InputFormat that takes a table
>> name (or, more generally, "select * from table where ..."). But in order
>> to define splits, I need to somehow discover key ranges so that I can
>> issue a series of contiguous range scans.
>>
>> Can you suggest how I might go about this in a general way? If I get
>> this right, I'll contribute the code. Otherwise I will need to use
>> external knowledge of my specific table data to partition the task. If
>> Phoenix had a LIMIT with a SKIP option plus a table ROWCOUNT, that would
>> also achieve the goal. Or is there some way to implement the InputFormat
>> via a native HBase API call, perhaps?
>>
>> Andrew.
>>
>> (MongoDB's InputFormat implementation calls an internal function on the
>> server to do this:
>> https://github.com/mongodb/mongo-hadoop/blob/master/core/src/main/java/com/mongodb/hadoop/splitter/StandaloneMongoSplitter.java)
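For the "fill in some additional columns" step quoted above: regardless of how the InputFormat lands, the map or reduce side can write back through the Phoenix JDBC driver. A minimal sketch follows; the table name, columns, values, and ZooKeeper quorum are placeholders, while UPSERT INTO and the jdbc:phoenix: URL are standard Phoenix.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;

    public class FillInColumns {
        public static void main(String[] args) throws SQLException {
            Connection conn = DriverManager.getConnection("jdbc:phoenix:zk-host:2181");
            try {
                PreparedStatement upsert = conn.prepareStatement(
                    "UPSERT INTO MY_TABLE (ID, DERIVED_COL) VALUES (?, ?)");
                upsert.setLong(1, 42L);
                upsert.setString(2, "computed-in-mr");
                upsert.executeUpdate();
                // Autocommit is off by default in Phoenix, so batched upserts
                // are flushed to HBase with commit().
                conn.commit();
            } finally {
                conn.close();
            }
        }
    }

In a real job you would open the connection once per task, batch many upserts per commit, and close the connection in the task's cleanup.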
