Hi Gabriel, thanks a lot for the reply. I noticed myself afterwards that it does a rollback after every upsert and then extracts the KeyValues. Basically I am trying to replicate the same job in Spark, and I cannot understand where in the existing IndexTool source code it is guaranteed that the row keys written to the HFiles are in the correct order. I have been getting "Added a key not lexically larger than previous key" errors.
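To make the question concrete, this is roughly the shape of what I am attempting on the Spark side - only a sketch, where the kvs RDD, the Configuration and the output path stand in for my real pipeline, and assuming Kryo (or similar) is configured so that KeyValue and ImmutableBytesWritable survive the shuffle:

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.hbase.KeyValue
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.rdd.RDD

// HFile writers insist on strictly increasing keys, hence the
// "not lexically larger" error when a partition arrives unsorted.
implicit val rowKeyOrdering: Ordering[ImmutableBytesWritable] =
  Ordering.fromLessThan { (a, b) =>
    Bytes.compareTo(a.get, a.getOffset, a.getLength,
                    b.get, b.getOffset, b.getLength) < 0
  }

def writeHFiles(kvs: RDD[(ImmutableBytesWritable, KeyValue)],
                conf: Configuration, path: String): Unit = {
  // sortByKey range-partitions and sorts, so every partition (and
  // therefore every HFile) sees its row keys in increasing order.
  // Caveat: if one row carries several cells, those must also be
  // ordered by family/qualifier before reaching the writer.
  kvs.sortByKey().saveAsNewAPIHadoopFile(
    path,
    classOf[ImmutableBytesWritable],
    classOf[KeyValue],
    classOf[HFileOutputFormat2],
    conf)
}

Is an explicit total sort like this the right way to reproduce whatever ordering guarantee the MR job relies on?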
Thanks a lot!

On 15 September 2015 at 19:46, Gabriel Reid <[email protected]> wrote:

> The upsert statements in the MR jobs are used to convert data into the
> appropriate encoding for writing to an HFile -- the data doesn't actually
> get pushed to Phoenix from within the MR job. Instead, the created
> KeyValues are extracted from the "output" of the upsert statement, and the
> statement is rolled back within the MR job. The extracted KeyValues are
> then written to the HFile.
>
> - Gabriel
>
> On Tue, Sep 15, 2015 at 2:12 PM Yiannis Gkoufas <[email protected]>
> wrote:
>
>> Hi there,
>>
>> I was going through the code related to index creation via a MapReduce
>> job (IndexTool) and I have some questions.
>> If I am not mistaken, for a global secondary index Phoenix creates a new
>> HBase table whose row key is the indexed column value from the original
>> table, and it loads the column values listed in your INCLUDE clause.
>> In PhoenixIndexImportMapper I can see that an Upsert statement is
>> executed, but HFiles are also written.
>> My question is the following: why is the Upsert statement needed if the
>> table containing the secondary index will be populated from the HFiles
>> written?
>>
>> Thanks a lot
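P.S. For anyone finding this thread later, my reading of the extract-and-rollback mechanism Gabriel describes above, as a minimal sketch: the connection URL, table and upsert statement are invented, and I am going by the Phoenix 4.x API where getUncommittedDataIterator(conn) hands back the KeyValues a statement has staged but not yet committed.

import java.sql.DriverManager

import org.apache.phoenix.util.PhoenixRuntime

import scala.collection.JavaConverters._

// Auto-commit off: the upsert only stages mutations in the
// connection-local state; nothing reaches the server.
val conn = DriverManager.getConnection("jdbc:phoenix:localhost")
conn.setAutoCommit(false)
try {
  val stmt = conn.prepareStatement("UPSERT INTO MY_IDX (K, V) VALUES (?, ?)")
  stmt.setString(1, "row1")
  stmt.setString(2, "val1")
  stmt.executeUpdate()

  // Harvest the KeyValues the statement produced but never committed;
  // each entry pairs a table name with its staged KeyValues.
  for (pair <- PhoenixRuntime.getUncommittedDataIterator(conn).asScala;
       kv   <- pair.getSecond.asScala) {
    println(kv) // the MR mapper emits these to the HFile writer instead
  }
} finally {
  conn.rollback() // discard the staged mutations
  conn.close()
}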
