Hi Gabriel, thanks a lot for the reply. I noticed myself afterwards that it does a rollback after every upsert and then extracts the KeyValues. Basically I am trying to replicate the same job in Spark, and I cannot understand where in the existing IndexTool source code it is guaranteed that the row keys written to the HFiles are in the correct order. I have been getting "Added a key not lexically larger than previous key" errors.
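To make the question concrete, this is roughly the shape of what I am attempting on the Spark side - only a sketch, where the kvs RDD, the Configuration and the output path stand in for my real pipeline, and assuming Kryo (or similar) is configured so that KeyValue and ImmutableBytesWritable survive the shuffle:

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.hbase.KeyValue
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.rdd.RDD

// HFile writers insist on strictly increasing keys, hence the
// "not lexically larger" error when a partition arrives unsorted.
implicit val rowKeyOrdering: Ordering[ImmutableBytesWritable] =
  Ordering.fromLessThan { (a, b) =>
    Bytes.compareTo(a.get, a.getOffset, a.getLength,
                    b.get, b.getOffset, b.getLength) < 0
  }

def writeHFiles(kvs: RDD[(ImmutableBytesWritable, KeyValue)],
                conf: Configuration, path: String): Unit = {
  // sortByKey range-partitions and sorts, so every partition (and
  // therefore every HFile) sees its row keys in increasing order.
  // Caveat: if one row carries several cells, those must also be
  // ordered by family/qualifier before reaching the writer.
  kvs.sortByKey().saveAsNewAPIHadoopFile(
    path,
    classOf[ImmutableBytesWritable],
    classOf[KeyValue],
    classOf[HFileOutputFormat2],
    conf)
}

Is an explicit total sort like this the right way to reproduce whatever ordering guarantee the MR job relies on?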
Thanks a lot!

On 15 September 2015 at 19:46, Gabriel Reid <[email protected]> wrote:

> The upsert statements in the MR jobs are used to convert data into the
> appropriate encoding for writing to an HFile -- the data doesn't actually
> get pushed to Phoenix from within the MR job. Instead, the created
> KeyValues are extracted from the "output" of the upsert statement, and the
> statement is rolled back within the MR job. The extracted KeyValues are
> then written to the HFile.
>
> - Gabriel
>
> On Tue, Sep 15, 2015 at 2:12 PM Yiannis Gkoufas <[email protected]>
> wrote:
>
>> Hi there,
>>
>> I was going through the code related to index creation via a MapReduce
>> job (IndexTool) and I have some questions.
>> If I am not mistaken, for a global secondary index Phoenix creates a new
>> HBase table whose row key is the indexed column value from the original
>> table, and it loads the column values listed in your INCLUDE clause.
>> In PhoenixIndexImportMapper I can see that an Upsert statement is
>> executed, but HFiles are also written.
>> My question is the following: why is the Upsert statement needed if the
>> table containing the secondary index will be populated from the HFiles
>> written?
>>
>> Thanks a lot
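P.S. For anyone finding this thread later, my reading of the extract-and-rollback mechanism Gabriel describes above, as a minimal sketch: the connection URL, table and upsert statement are invented, and I am going by the Phoenix 4.x API where getUncommittedDataIterator(conn) hands back the KeyValues a statement has staged but not yet committed.

import java.sql.DriverManager

import org.apache.phoenix.util.PhoenixRuntime

import scala.collection.JavaConverters._

// Auto-commit off: the upsert only stages mutations in the
// connection-local state; nothing reaches the server.
val conn = DriverManager.getConnection("jdbc:phoenix:localhost")
conn.setAutoCommit(false)
try {
  val stmt = conn.prepareStatement("UPSERT INTO MY_IDX (K, V) VALUES (?, ?)")
  stmt.setString(1, "row1")
  stmt.setString(2, "val1")
  stmt.executeUpdate()

  // Harvest the KeyValues the statement produced but never committed;
  // each entry pairs a table name with its staged KeyValues.
  for (pair <- PhoenixRuntime.getUncommittedDataIterator(conn).asScala;
       kv   <- pair.getSecond.asScala) {
    println(kv) // the MR mapper emits these to the HFile writer instead
  }
} finally {
  conn.rollback() // discard the staged mutations
  conn.close()
}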
