Hi Andrew,
The moment I check it into git, I will drop a message to you. For now,
I am writing test cases.
Regards
On Sat, Apr 5, 2014 at 9:04 AM, Ravi Kiran <[email protected]> wrote:
> Hi Andrew,
> The moment I check it in, I will drop a message to you. For now, I am
> writing test cases.
>
> Regards
> Ravi
>
>
> On Sat, Apr 5, 2014 at 2:22 AM, Andrew <[email protected]> wrote:
>
>> Hi Ravi,
>>
>> That's helpful, thank you. Are these in the Github repo yet, so I can
>> have a look to get an idea? (I don't see anything in
>> phoenix-pig/src/main/java/org/apache/phoenix/pig/hadoop)
>>
>> Andrew.
>>
>>
>> On 04/04/2014 15:54, Ravi Kiran wrote:
>>
>>> Hi Andrew,
>>>
>>> As part of a custom Pig loader, we are coming up with a
>>> PhoenixInputFormat and a PhoenixRecordReader. Though these classes are
>>> currently within the phoenix-pig module, most of the code can be reused
>>> for an MR job.
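>>>
>>> Roughly, once the InputFormat is in place, a plain MR job should be able
>>> to wire it up along these lines (sketch only; the class names, key/value
>>> types and config key below are placeholders, not the final API):
>>>
>>>   import org.apache.hadoop.conf.Configuration;
>>>   import org.apache.hadoop.mapreduce.Job;
>>>
>>>   public class PhoenixMrDriver {
>>>     public static void main(String[] args) throws Exception {
>>>       Configuration conf = new Configuration();
>>>       // Placeholder key: however the query/table ends up being handed
>>>       // to the InputFormat.
>>>       conf.set("phoenix.select.query", "SELECT * FROM MY_TABLE");
>>>
>>>       Job job = Job.getInstance(conf, "phoenix-mr");
>>>       job.setJarByClass(PhoenixMrDriver.class);
>>>       // PhoenixInputFormat / PhoenixRecordReader from the phoenix-pig
>>>       // module; MyMapper is a placeholder for whatever mapper consumes
>>>       // the record type the reader emits.
>>>       job.setInputFormatClass(PhoenixInputFormat.class);
>>>       job.setMapperClass(MyMapper.class);
>>>       job.setNumReduceTasks(0);
>>>       System.exit(job.waitForCompletion(true) ? 0 : 1);
>>>     }
>>>   }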
>>>
>>> Regards
>>> Ravi
>>>
>>>
>>>
>>> On Fri, Apr 4, 2014 at 10:38 AM, alex kamil <[email protected]> wrote:
>>>
>>> you can create a custom function (for example
>>> http://phoenix-hbase.blogspot.com/2013/04/how-to-add-your-own-built-in-function.html)
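>>>
>>> The skeleton looks roughly like this (untested sketch from memory of the
>>> Phoenix API described in that post, so check names and signatures against
>>> your Phoenix version; MY_REVERSE is just an example, and note that a
>>> built-in function also has to be registered in Phoenix's ExpressionType
>>> enum and the jar rebuilt):
>>>
>>>   import java.util.List;
>>>
>>>   import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
>>>   import org.apache.hadoop.hbase.util.Bytes;
>>>   import org.apache.phoenix.expression.Expression;
>>>   import org.apache.phoenix.expression.function.ScalarFunction;
>>>   import org.apache.phoenix.parse.FunctionParseNode.Argument;
>>>   import org.apache.phoenix.parse.FunctionParseNode.BuiltInFunction;
>>>   import org.apache.phoenix.schema.PDataType;
>>>   import org.apache.phoenix.schema.tuple.Tuple;
>>>
>>>   // Example only: MY_REVERSE(VARCHAR) returns its argument reversed.
>>>   @BuiltInFunction(name = MyReverseFunction.NAME,
>>>       args = {@Argument(allowedTypes = {PDataType.VARCHAR})})
>>>   public class MyReverseFunction extends ScalarFunction {
>>>     public static final String NAME = "MY_REVERSE";
>>>
>>>     public MyReverseFunction() {
>>>     }
>>>
>>>     public MyReverseFunction(List<Expression> children) {
>>>       super(children);
>>>     }
>>>
>>>     @Override
>>>     public boolean evaluate(Tuple tuple, ImmutableBytesWritable ptr) {
>>>       // Evaluate the single VARCHAR argument into ptr.
>>>       if (!children.get(0).evaluate(tuple, ptr)) {
>>>         return false;
>>>       }
>>>       String arg = Bytes.toString(ptr.get(), ptr.getOffset(), ptr.getLength());
>>>       ptr.set(Bytes.toBytes(new StringBuilder(arg).reverse().toString()));
>>>       return true;
>>>     }
>>>
>>>     @Override
>>>     public PDataType getDataType() {
>>>       return PDataType.VARCHAR;
>>>     }
>>>
>>>     @Override
>>>     public String getName() {
>>>       return NAME;
>>>     }
>>>   }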
>>>
>>>
>>> On Fri, Apr 4, 2014 at 12:47 AM, Andrew <[email protected]> wrote:
>>>
>>> I am considering using Phoenix, but I know that I will want to transform
>>> my data via MapReduce, e.g. UPSERT some core data, then go back over the
>>> data set and "fill in" some additional columns (appropriately stored in
>>> additional column groups).
>>>
>>> I think all I need to do is write an InputFormat implementation that
>>> takes a table name (or more generally /select * from table where .../).
>>> But in order to define splits, I need to somehow discover key ranges so
>>> that I can issue a series of contiguous range scans.
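>>>
>>> Each split's record reader would then boil down to a bounded range query
>>> through the Phoenix JDBC driver, something like this (untested sketch;
>>> MY_TABLE, PK and the ZooKeeper quorum are placeholders for my schema):
>>>
>>>   import java.sql.Connection;
>>>   import java.sql.DriverManager;
>>>   import java.sql.PreparedStatement;
>>>   import java.sql.ResultSet;
>>>
>>>   public class RangeScanSketch {
>>>     // Read every row whose primary key falls in [startKey, endKey).
>>>     public static void scanRange(String startKey, String endKey)
>>>         throws Exception {
>>>       Connection conn = DriverManager.getConnection("jdbc:phoenix:zk-host");
>>>       try {
>>>         PreparedStatement stmt = conn.prepareStatement(
>>>             "SELECT * FROM MY_TABLE WHERE PK >= ? AND PK < ?");
>>>         stmt.setString(1, startKey);
>>>         stmt.setString(2, endKey);
>>>         ResultSet rs = stmt.executeQuery();
>>>         while (rs.next()) {
>>>           // hand each row to the map function / fill in the extra columns
>>>         }
>>>       } finally {
>>>         conn.close();
>>>       }
>>>     }
>>>   }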
>>>
>>> Can you suggest how I might go about this in a general way... if I get
>>> this right then I'll contribute the code. Otherwise I will need to use
>>> external knowledge of my specific table data to partition the task. If
>>> Phoenix had a LIMIT with a SKIP option plus a table ROWCOUNT, then that
>>> would also achieve the goal. Or is there some way to implement the
>>> InputFormat via a native HBase API call perhaps?
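>>>
>>> On the native HBase side, I imagine the splits could be derived from the
>>> region boundaries, roughly like this (untested sketch against the HBase
>>> client API; I believe this is more or less what HBase's own
>>> TableInputFormatBase does, each range then backing one InputSplit):
>>>
>>>   import java.io.IOException;
>>>   import java.util.ArrayList;
>>>   import java.util.List;
>>>
>>>   import org.apache.hadoop.conf.Configuration;
>>>   import org.apache.hadoop.hbase.HBaseConfiguration;
>>>   import org.apache.hadoop.hbase.client.HTable;
>>>   import org.apache.hadoop.hbase.util.Pair;
>>>
>>>   public class RegionSplitLister {
>>>     /** Returns one (startKey, endKey) pair per region of the table. */
>>>     public static List<Pair<byte[], byte[]>> regionRanges(String tableName)
>>>         throws IOException {
>>>       Configuration conf = HBaseConfiguration.create();
>>>       HTable table = new HTable(conf, tableName);
>>>       try {
>>>         Pair<byte[][], byte[][]> keys = table.getStartEndKeys();
>>>         List<Pair<byte[], byte[]>> ranges =
>>>             new ArrayList<Pair<byte[], byte[]>>();
>>>         for (int i = 0; i < keys.getFirst().length; i++) {
>>>           // An empty end key on the last region means "to end of table".
>>>           ranges.add(new Pair<byte[], byte[]>(keys.getFirst()[i],
>>>               keys.getSecond()[i]));
>>>         }
>>>         return ranges;
>>>       } finally {
>>>         table.close();
>>>       }
>>>     }
>>>   }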
>>>
>>> Andrew.
>>>
>>> (MongoDB's InputFormat implementation calls an internal function on the
>>> server to do this:
>>> https://github.com/mongodb/mongo-hadoop/blob/master/core/src/main/java/com/mongodb/hadoop/splitter/StandaloneMongoSplitter.java)
>>>
>>>
>>>
>>>
>>
>