[jira] [Commented] (PHOENIX-4641) Perform index maintenance on server-side for transactional local indexes
[ https://issues.apache.org/jira/browse/PHOENIX-4641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16395424#comment-16395424 ]

Ohad Shacham commented on PHOENIX-4641:
---

FYI [~jamestaylor]. I created OMID-93 and will create a pull request tomorrow.

> Perform index maintenance on server-side for transactional local indexes
>
>                 Key: PHOENIX-4641
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-4641
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: James Taylor
>            Priority: Major
>
> PHOENIX-4278 changed index maintenance for transactional tables to be
> performed on the client side. For local indexes, this is not ideal and not
> really necessary, as the updates to the indexes will all be local. By doing
> this on the client side, we'd incur extra overhead:
> - extra RPCs for updates to local index tables, separate from the RPCs for
>   data tables
> - related to this, more network bandwidth would be used
> - calculation on the client side to determine the region start key (it is
>   somewhat unclear whether there's a race condition with a split occurring
>   while this is being determined)
> - the updates to local indexes would no longer be row-level atomic with data
>   table HBase updates (though they'd be atomic because they're transactional)
>
> With Tephra, we can do the index maintenance on the server side without
> further changes. For Omid, it's more difficult since we must:
> - perform all writes
> - write to the commit table
> - write the shadow cells (which requires knowing the index updates)
>
> If there will already be an API to write the shadow cells (required for the
> initial population of local indexes), then perhaps we could piggyback on
> that. On the client side, we could do the following:
> - perform all writes
> - write to the commit table
> - perform the writes again, but with a flag set to indicate that only the
>   shadow cells need to be written (note we already have the
>   mutation.setAttribute(REPLAY_WRITES, REPLAY_ONLY_INDEX_WRITES) option that
>   will help with this)
>
> In this case, we'd execute the logic to compute the index updates twice, but
> on the plus side, we wouldn't incur the other overhead mentioned before.
>
> All in all, it's unclear if this is worth doing. It doesn't make a lot of
> sense to use local indexes for transactional tables, since one of the biggest
> benefits of local indexes, row-level atomicity between index and table rows,
> is already achieved more generally by transactions.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
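
The three-step client-side sequence proposed for Omid in the issue description can be sketched roughly as below. This is only an illustration under stated assumptions, not Phoenix or Omid code: `Mutation`, `serverApply`, `commit`, and the log strings are hypothetical stand-ins, and the two constants merely reuse the names quoted in the issue (their real types and values in Phoenix may differ).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: stand-in types, not the HBase, Phoenix, or Omid classes. */
public class ReplayCommitSketch {
    // Names taken from the issue text; the string values are placeholders.
    static final String REPLAY_WRITES = "REPLAY_WRITES";
    static final String REPLAY_ONLY_INDEX_WRITES = "REPLAY_ONLY_INDEX_WRITES";

    /** Minimal stand-in for a mutation that can carry attributes. */
    static class Mutation {
        final String row;
        final Map<String, String> attributes = new HashMap<>();

        Mutation(String row) {
            this.row = row;
        }

        void setAttribute(String name, String value) {
            attributes.put(name, value);
        }
    }

    /**
     * Hypothetical server side. It sees every batch, so it can compute the
     * local index updates itself; the replay flag tells it to write only
     * the shadow cells on the second pass.
     */
    static void serverApply(List<Mutation> batch, List<String> log) {
        for (Mutation m : batch) {
            boolean replay = REPLAY_ONLY_INDEX_WRITES.equals(m.attributes.get(REPLAY_WRITES));
            log.add((replay ? "shadow-cells:" : "data+index:") + m.row);
        }
    }

    /** Client-side sequence from the issue; returns the ordered actions taken. */
    static List<String> commit(List<Mutation> batch, long commitTimestamp) {
        List<String> log = new ArrayList<>();
        serverApply(batch, log);                     // 1. perform all writes
        log.add("commit-table:" + commitTimestamp);  // 2. write to the commit table
        for (Mutation m : batch) {                   // 3. flag the same mutations for replay
            m.setAttribute(REPLAY_WRITES, REPLAY_ONLY_INDEX_WRITES);
        }
        serverApply(batch, log);                     //    and send them again
        return log;
    }

    public static void main(String[] args) {
        List<Mutation> batch = new ArrayList<>();
        batch.add(new Mutation("row1"));
        batch.add(new Mutation("row2"));
        for (String action : commit(batch, 42L)) {
            System.out.println(action);
        }
    }
}
```

The sketch makes the trade-off noted in the description visible: the server handles each mutation twice, so the index-update logic runs twice, but no separate client-to-index RPCs are needed.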
[jira] [Commented] (PHOENIX-4641) Perform index maintenance on server-side for transactional local indexes
[ https://issues.apache.org/jira/browse/PHOENIX-4641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16388464#comment-16388464 ]

Ohad Shacham commented on PHOENIX-4641:
---

I can easily build such an API. I assume it will get a mutation list and return a list of puts?
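
One possible shape for the API discussed in this comment (a list of mutations in, a list of puts out) is sketched below. Everything here is hypothetical: `Cell`, `shadowCellsFor`, and `SHADOW_SUFFIX` are illustrative names, the suffix is a placeholder rather than Omid's actual qualifier encoding, and the sketch assumes a shadow cell simply pairs each written cell with the transaction's commit timestamp. The API actually proposed in OMID-93 may look quite different.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: stand-in types, not the HBase or Omid classes. */
public class ShadowCellApiSketch {
    // Placeholder marker; the real shadow-cell qualifier encoding is defined by Omid.
    static final String SHADOW_SUFFIX = "#shadow";

    /** Stand-in for a single cell written by an index update. */
    static class Cell {
        final String row;
        final String qualifier;
        final long timestamp;  // assumed to be the transaction's start timestamp
        final String value;

        Cell(String row, String qualifier, long timestamp, String value) {
            this.row = row;
            this.qualifier = qualifier;
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    /**
     * Hypothetical API: for each index update, produce one extra cell at the
     * same row and timestamp whose value records the commit timestamp.
     */
    static List<Cell> shadowCellsFor(List<Cell> indexUpdates, long commitTimestamp) {
        List<Cell> puts = new ArrayList<>();
        for (Cell c : indexUpdates) {
            puts.add(new Cell(c.row, c.qualifier + SHADOW_SUFFIX, c.timestamp,
                    Long.toString(commitTimestamp)));
        }
        return puts;
    }

    public static void main(String[] args) {
        List<Cell> updates = new ArrayList<>();
        updates.add(new Cell("idxRow1", "0:C1", 100L, "v1"));
        for (Cell c : shadowCellsFor(updates, 105L)) {
            System.out.println(c.row + " " + c.qualifier + " @" + c.timestamp + " = " + c.value);
        }
    }
}
```

With an API of this kind, the server-side index code could compute the index updates once more during the replay pass and hand them to the generator, instead of the client having to know the index rows.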
[jira] [Commented] (PHOENIX-4641) Perform index maintenance on server-side for transactional local indexes
[ https://issues.apache.org/jira/browse/PHOENIX-4641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16388175#comment-16388175 ]

James Taylor commented on PHOENIX-4641:
---

FYI, [~ohads]. Will there be an API that will generate the shadow cells given the index updates? If so, this would be pretty easy to implement.