Hi, I will definitely drop you a mail once the code is available, which isn't far away.
Regards
Ravi

On Sat, Apr 5, 2014 at 3:02 AM, Localhost shell <[email protected]> wrote:

> Hey Ravi,
>
> Do you have any rough idea of when PhoenixPigLoader will be available for
> use?
>
>
> On Fri, Apr 4, 2014 at 1:52 PM, Andrew <[email protected]> wrote:
>
>> Hi Ravi,
>>
>> That's helpful, thank you. Are these in the GitHub repo yet, so I can
>> have a look to get an idea? (I don't see anything in
>> phoenix-pig/src/main/java/org/apache/phoenix/pig/hadoop.)
>>
>> Andrew.
>>
>>
>> On 04/04/2014 15:54, Ravi Kiran wrote:
>>
>>> Hi Andrew,
>>>
>>> As part of a custom Pig Loader, we are coming up with a
>>> PhoenixInputFormat and a PhoenixRecordReader. Though these classes are
>>> currently within the Phoenix-Pig module, most of the code can be reused
>>> for an MR job.
>>>
>>> Regards
>>> Ravi
>>>
>>>
>>> On Fri, Apr 4, 2014 at 10:38 AM, alex kamil <[email protected]> wrote:
>>>
>>> You can create a custom function (for example
>>> http://phoenix-hbase.blogspot.com/2013/04/how-to-add-your-own-built-in-function.html).
>>>
>>>
>>> On Fri, Apr 4, 2014 at 12:47 AM, Andrew <[email protected]> wrote:
>>>
>>> I am considering using Phoenix, but I know that I will want to
>>> transform my data via MapReduce, e.g. UPSERT some core data, then go
>>> back over the data set and "fill in" some additional columns
>>> (appropriately stored in additional column groups).
>>>
>>> I think all I need to do is implement an InputFormat that takes a
>>> table name (or, more generally, /select * from table where .../).
>>> But in order to define splits, I need to somehow discover key ranges
>>> so that I can issue a series of contiguous range scans.
>>>
>>> Can you suggest how I might go about this in a general way? If I get
>>> this right then I'll contribute the code. Otherwise I will need to
>>> use external knowledge of my specific table data to partition the
>>> task. If Phoenix had a LIMIT with a SKIP option plus a table ROWCOUNT,
>>> then that would also achieve the goal. Or is there some way to
>>> implement the InputFormat via a native HBase API call, perhaps?
>>>
>>> Andrew.
>>>
>>> (MongoDB's InputFormat implementation calls an internal function on
>>> the server to do this:
>>> https://github.com/mongodb/mongo-hadoop/blob/master/core/src/main/java/com/mongodb/hadoop/splitter/StandaloneMongoSplitter.java)
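For reference, here is a minimal sketch of the approach Andrew raises at the end of the thread: deriving splits from the native HBase API rather than from Phoenix itself. This is not the Phoenix or phoenix-pig code; the RegionBoundarySplitter and KeyRange names are hypothetical, and only the HBase client calls (HTable, getStartEndKeys) are real API. Each region's start/end key pair becomes one contiguous range scan, i.e. one candidate input split.

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.util.Pair;

// Hypothetical helper: turns HBase region boundaries into key ranges that an
// InputFormat could expose as splits. Not part of Phoenix; illustration only.
public class RegionBoundarySplitter {

    // One contiguous key range, suitable for backing a single InputSplit.
    public static class KeyRange {
        public final byte[] startKey; // inclusive; empty array = table start
        public final byte[] endKey;   // exclusive; empty array = table end
        public KeyRange(byte[] startKey, byte[] endKey) {
            this.startKey = startKey;
            this.endKey = endKey;
        }
    }

    // Returns one key range per region of the underlying HBase table, so an
    // InputFormat can issue one contiguous range scan per split.
    public static List<KeyRange> splitsForTable(String tableName) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, tableName);
        try {
            Pair<byte[][], byte[][]> keys = table.getStartEndKeys();
            List<KeyRange> ranges = new ArrayList<KeyRange>();
            for (int i = 0; i < keys.getFirst().length; i++) {
                ranges.add(new KeyRange(keys.getFirst()[i], keys.getSecond()[i]));
            }
            return ranges;
        } finally {
            table.close();
        }
    }
}

A RecordReader built on top of this could either scan each range directly through HBase or append the split's key bounds to the Phoenix query as a WHERE clause restriction, which is the general pattern a PhoenixInputFormat/PhoenixRecordReader pair could follow.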
