Hey Ravi,

Do you have a rough idea of when PhoenixPigLoader will be available for use?

On Fri, Apr 4, 2014 at 1:52 PM, Andrew <[email protected]> wrote:

> HI Ravi,
>
> That's helpful, thank you. Are these in the Github repo yet, so I can
> have a look to get an idea? (I don't see anything in
> phoenix-pig/src/main/java/org/apache/phoenix/pig/hadoop)
>
> Andrew.
>
>
> On 04/04/2014 15:54, Ravi Kiran wrote:
>
>> Hi Andrew,
>>
>> As part of a custom Pig Loader , we are coming up with a
>> PhoenixInputFormat, PhoenixRecordReader.. Though these classes are
>> currently within the Phoenix-Pig module, most of the code can be reused
>> for a MR job.
>>
>> Regards
>> Ravi
>>
>>
>> On Fri, Apr 4, 2014 at 10:38 AM, alex kamil <[email protected]> wrote:
>>
>>     you can create a custom function (for example
>>     http://phoenix-hbase.blogspot.com/2013/04/how-to-add-your-own-built-in-function.html
>>     )
>>
>>
>>     On Fri, Apr 4, 2014 at 12:47 AM, Andrew <[email protected]> wrote:
>>
>>         I am considering using Phoenix, but I know that I will want to
>>         transform my data via MapReduce, e.g. UPSERT some core data, then
>>         go back over the data set and "fill in" some additional columns
>>         (appropriately stored in additional column groups).
>>
>>         I think all I need to do is implement an InputFormat implementation
>>         that takes a table name (or more generally /select * from table
>>         where .../). But in order to define splits, I need to somehow
>>         discover key ranges so that I can issue a series of contiguous
>>         range scans.
>>
>>         Can you suggest how I might go about this in a general way... if I
>>         get this right then I'll contribute the code. Else I will need to
>>         use external knowledge of my specific table data to partition the
>>         task. If Phoenix had a LIMIT with a SKIP option plus a table
>>         ROWCOUNT, then that would also achieve the goal. Or is there some
>>         way to implement the InputFormat via a native HBase API call
>>         perhaps?
>>
>>         Andrew.
>>
>>         (MongoDB's InputFormat implementation, calls an internal function
>>         on the server to do this:
>>         https://github.com/mongodb/mongo-hadoop/blob/master/core/src/main/java/com/mongodb/hadoop/splitter/StandaloneMongoSplitter.java)
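
On the split question at the bottom of the thread, one possible approach is to use one contiguous scan range per HBase region. Below is a minimal sketch, assuming the region boundaries have already been fetched as parallel arrays of start and end keys (HBase's HTable.getStartEndKeys() returns them in that shape). The class and method names are hypothetical and are not part of Phoenix; the planned PhoenixInputFormat may well work differently.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not a Phoenix or HBase class.
public class KeyRangeSplits {

    /** One contiguous scan range. */
    public static final class KeyRange {
        final byte[] start; // inclusive; empty means "from the first row"
        final byte[] stop;  // exclusive; empty means "to the last row"

        KeyRange(byte[] start, byte[] stop) {
            this.start = start;
            this.stop = stop;
        }

        @Override
        public String toString() {
            return "[" + new String(start, StandardCharsets.UTF_8) + ", "
                       + new String(stop, StandardCharsets.UTF_8) + ")";
        }
    }

    /**
     * Builds one range per region from parallel arrays of region start
     * and end keys. Assumption: the caller obtained these arrays from
     * HBase beforehand; this sketch does not talk to a cluster.
     */
    public static List<KeyRange> splits(byte[][] startKeys, byte[][] endKeys) {
        if (startKeys.length != endKeys.length) {
            throw new IllegalArgumentException("start/end key arrays differ in length");
        }
        List<KeyRange> ranges = new ArrayList<>();
        for (int i = 0; i < startKeys.length; i++) {
            ranges.add(new KeyRange(startKeys[i], endKeys[i]));
        }
        return ranges;
    }

    private static byte[] b(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Three made-up regions: (-inf, g), [g, p), [p, +inf)
        byte[][] starts = { b(""), b("g"), b("p") };
        byte[][] ends   = { b("g"), b("p"), b("") };
        for (KeyRange r : splits(starts, ends)) {
            System.out.println(r);
        }
    }
}
```

Each KeyRange could then back one InputSplit, with the record reader issuing a range scan bounded by its start and stop keys. Region boundaries only give a coarse partition, so uneven regions would produce uneven map tasks.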
