[jira] [Comment Edited] (HBASE-14221) Reduce the number of time row comparison is done in a Scan
    [ https://issues.apache.org/jira/browse/HBASE-14221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15082635#comment-15082635 ]

ramkrishna.s.vasudevan edited comment on HBASE-14221 at 1/5/16 8:31 AM:
------------------------------------------------------------------------

Patch that avoids compareRows in the StoreScanner layer alone. Previous patches also focused on the HRegion layer.
Now in a test like TestMultiColumnScanner we avoid around 2k - 3k compareRows comparisons with this patch.
The idea is to set the matcher's curCell to null whenever the matcher says SEEK_NEXT_ROW or INCLUDE_AND_SEEK_NEXT_ROW. This is because we are sure that the seek would have fetched the next row, so the next cell will in any case belong to the next row and the current next() call should come out with DONE.
[~larsh]
What do you think of this patch?

was (Author: ram_krish):
Patch that avoid compareRows in StoreScanner layer alone. Previous patches were focussing on HRegion layer also.
Now in a test like TestMultiColumnScanner we avoid around 2k - 3k compreRows comparisons with this patch.
The idea is to make the matcher's curCell to null when we ever the Matcher says SEEK_NEXT_ROW or INCLUDE_AND_SEEK_NEXT_ROW. This is because we are sure that the seek would have fetch the next row and so the next cell will any way be the next row so the current next() call should come out with a DONE call.
[~larsh]
What do you think of this patch?

> Reduce the number of time row comparison is done in a Scan
> ----------------------------------------------------------
>
>                 Key: HBASE-14221
>                 URL: https://issues.apache.org/jira/browse/HBASE-14221
>             Project: HBase
>          Issue Type: Sub-task
>          Components: Scanners
>            Reporter: ramkrishna.s.vasudevan
>            Assignee: ramkrishna.s.vasudevan
>             Fix For: 2.0.0
>
>         Attachments: 14221-0.98-takeALook.txt, HBASE-14221.patch,
> HBASE-14221_1.patch, HBASE-14221_1.patch, HBASE-14221_6.patch,
> HBASE-14221_9.patch, withmatchingRowspatch.png, withoutmatchingRowspatch.png
>
>
> When we tried to do some profiling with the PE tool, we found this.
> Currently we do row comparisons in 3 places in a simple Scan case.
> 1) ScanQueryMatcher
> {code}
>     int ret = this.rowComparator.compareRows(curCell, cell);
>     if (!this.isReversed) {
>       if (ret <= -1) {
>         return MatchCode.DONE;
>       } else if (ret >= 1) {
>         // could optimize this, if necessary?
>         // Could also be called SEEK_TO_CURRENT_ROW, but this
>         // should be rare/never happens.
>         return MatchCode.SEEK_NEXT_ROW;
>       }
>     } else {
>       if (ret <= -1) {
>         return MatchCode.SEEK_NEXT_ROW;
>       } else if (ret >= 1) {
>         return MatchCode.DONE;
>       }
>     }
> {code}
> 2) In StoreScanner next() while starting to scan the row
> {code}
>     if (!scannerContext.hasAnyLimit(LimitScope.BETWEEN_CELLS) ||
>         matcher.curCell == null ||
>         isNewRow || !CellUtil.matchingRow(peeked, matcher.curCell)) {
>       this.countPerRow = 0;
>       matcher.setToNewRow(peeked);
>     }
> {code}
> Particularly to see if we are in a new row.
> 3) In HRegion
> {code}
>     scannerContext.setKeepProgress(true);
>     heap.next(results, scannerContext);
>     scannerContext.setKeepProgress(tmpKeepProgress);
>     nextKv = heap.peek();
>     moreCellsInRow = moreCellsInRow(nextKv, currentRowCell);
> {code}
> Here again there are cases where we need to be careful in the MultiCF case. I was
> trying to solve this for the MultiCF case, but it has a lot of cases to
> solve. But at least for a single CF case I think these comparisons can be
> reduced.
> So for a single CF case in the SQM we are able to find if we have crossed a
> row using the code pasted above in SQM. That comparison is definitely needed.
> Now in case of a single CF the HRegion is going to have only one element in
> the heap, and so the 3rd comparison can surely be avoided if
> StoreScanner.next() ended because of a MatchCode.DONE returned by the SQM.
> Coming to the 2nd compareRows that we do in StoreScanner.next() - even that
> can be avoided if we know that the previous next() call ended because of a new
> row. Doing all this I found that compareRows in the profiler, which was
> 19%, got reduced to 13%. Initially we can solve for the single CF case, which can
> be extended to MultiCF cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
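
To make the StoreScanner-layer idea above concrete, here is a minimal, self-contained sketch. It is not HBase code: `Cell`, `compareRows` and `scan` are hypothetical stand-ins, a single column family is assumed, and `curCell` only plays the role of `matcher.curCell`. It counts row comparisons with and without clearing `curCell` once the matcher has already seen the row boundary.

```java
import java.util.List;

/** Hypothetical, simplified stand-ins; none of these are real HBase classes. */
class RowCompareSketch {
    /** A cell reduced to a row key and a qualifier. */
    static final class Cell {
        final String row;
        final String qualifier;
        Cell(String row, String qualifier) {
            this.row = row;
            this.qualifier = qualifier;
        }
    }

    /** Counts every row comparison, standing in for profiler samples. */
    static int compareCount = 0;

    static int compareRows(Cell a, Cell b) {
        compareCount++;
        return a.row.compareTo(b.row);
    }

    /**
     * Scans sorted cells row by row and returns the number of row comparisons.
     * When skipRecheck is true, curCell is cleared as soon as the matcher loop
     * reports the row boundary, so the null check at the start of the next
     * call short-circuits the matchingRow-style comparison.
     */
    static int scan(List<Cell> cells, boolean skipRecheck) {
        compareCount = 0;
        Cell curCell = null; // plays the role of matcher.curCell
        int i = 0;
        while (i < cells.size()) {
            // Start of one next() call: decide whether we are on a new row.
            Cell peeked = cells.get(i);
            if (curCell == null || compareRows(peeked, curCell) != 0) {
                curCell = peeked; // stands in for matcher.setToNewRow(peeked)
            }
            // Matcher loop: consume cells until the row changes (DONE).
            while (i < cells.size()) {
                if (compareRows(curCell, cells.get(i)) != 0) {
                    if (skipRecheck) {
                        curCell = null; // boundary already known, skip the recheck
                    }
                    break;
                }
                i++;
            }
        }
        return compareCount;
    }
}
```

In this toy model the saving is exactly one comparison per row boundary. The real scanner also has to handle partial-row next() calls (the BETWEEN_CELLS limit in the quoted snippet) and multiple column families, which the sketch ignores.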
[jira] [Comment Edited] (HBASE-14221) Reduce the number of time row comparison is done in a Scan
    [ https://issues.apache.org/jira/browse/HBASE-14221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15084988#comment-15084988 ]

ramkrishna.s.vasudevan edited comment on HBASE-14221 at 1/6/16 5:16 AM:
------------------------------------------------------------------------

Pushed to master. Thanks for the reviews and comments over here [~lhofhansl].
Should this be pushed to 1.0 branches as well?

was (Author: ram_krish):
Pushed to master. Thanks for the reviews and comments over there [~lhofhansl].
Should this be pushed to 1.0 branches as well?

> Reduce the number of time row comparison is done in a Scan
> ----------------------------------------------------------
>
>                 Key: HBASE-14221
>                 URL: https://issues.apache.org/jira/browse/HBASE-14221
>             Project: HBase
>          Issue Type: Sub-task
>          Components: Scanners
>            Reporter: ramkrishna.s.vasudevan
>            Assignee: ramkrishna.s.vasudevan
>             Fix For: 2.0.0
>
>         Attachments: 14221-0.98-takeALook.txt, HBASE-14221.patch,
> HBASE-14221_1.patch, HBASE-14221_1.patch, HBASE-14221_6.patch,
> HBASE-14221_9.patch, withmatchingRowspatch.png, withoutmatchingRowspatch.png
[jira] [Comment Edited] (HBASE-14221) Reduce the number of time row comparison is done in a Scan
    [ https://issues.apache.org/jira/browse/HBASE-14221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14956224#comment-14956224 ]

Lars Hofhansl edited comment on HBASE-14221 at 10/14/15 4:18 AM:
-----------------------------------------------------------------

I think [~mcorgan]'s KeyValueScannerHeap is worth exploring still (see later on that jira). It beats PriorityQueue in every test, and since it is our implementation we can further tweak it down the road.
Matt's MIA unfortunately, but I plan to test some more with it.

(And I have some awesome database guys sitting less than 30 feet from me, and they came up with a strikingly similar scanner approach for their LSM-based database)

was (Author: lhofhansl):
I think [~mcorgan] KeyValueScannerHeap is worth exploring still (see later on that jira). It's beats PriorityQueue in every test, and since it is our implementation we can further tweak it down the road.
Matt's MIA unfortunately, but I plan to test some more with it.

(And I have some awesome database guys sitting less than 30 feet form me, and they come up with a striking similar scanner approach for their LSM based database)

> Reduce the number of time row comparison is done in a Scan
> ----------------------------------------------------------
>
>                 Key: HBASE-14221
>                 URL: https://issues.apache.org/jira/browse/HBASE-14221
>             Project: HBase
>          Issue Type: Sub-task
>          Components: Scanners
>            Reporter: ramkrishna.s.vasudevan
>            Assignee: ramkrishna.s.vasudevan
>             Fix For: 2.0.0
>
>         Attachments: 14221-0.98-takeALook.txt, HBASE-14221.patch,
> HBASE-14221_1.patch, HBASE-14221_1.patch, HBASE-14221_6.patch,
> withmatchingRowspatch.png, withoutmatchingRowspatch.png
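
For readers unfamiliar with the baseline mentioned in the comment above, the following is a rough sketch of a `java.util.PriorityQueue`-based merge over sorted inputs, the general shape of a scanner heap. It is an illustration only: `MergeHeapSketch` and `Source` are hypothetical names, row keys are plain strings, and nothing here reflects the internals of KeyValueScannerHeap, which this thread does not describe.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical sketch of a PriorityQueue-based merge; not HBase's heap code. */
class MergeHeapSketch {
    /** One sorted input with a one-element lookahead, like a scanner's peek(). */
    static final class Source {
        private final Iterator<String> it;
        private String head;

        Source(List<String> sorted) {
            this.it = sorted.iterator();
            this.head = it.hasNext() ? it.next() : null;
        }

        String peek() {
            return head;
        }

        String next() {
            String out = head;
            head = it.hasNext() ? it.next() : null;
            return out;
        }
    }

    /** Merges already-sorted inputs into one sorted list. */
    static List<String> merge(List<List<String>> inputs) {
        // Order sources by the key they currently expose through peek().
        PriorityQueue<Source> heap =
            new PriorityQueue<>((a, b) -> a.peek().compareTo(b.peek()));
        for (List<String> in : inputs) {
            Source s = new Source(in);
            if (s.peek() != null) {
                heap.add(s);
            }
        }
        List<String> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            Source top = heap.poll();
            out.add(top.next());
            if (top.peek() != null) {
                heap.add(top); // each re-insert costs O(log k) comparisons
            }
        }
        return out;
    }
}
```

Every emitted key pays for a poll and a re-insert, each of which runs several comparisons, so a purpose-built heap has room to do better, for example by keeping the current top source outside the queue while it still holds the smallest key.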
[jira] [Comment Edited] (HBASE-14221) Reduce the number of time row comparison is done in a Scan
    [ https://issues.apache.org/jira/browse/HBASE-14221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14946327#comment-14946327 ]

Lars Hofhansl edited comment on HBASE-14221 at 10/7/15 6:08 AM:
----------------------------------------------------------------

[~ram_krish], take a look at the "-takeALook" sample. That's what I mean. I let the SQM decide when a new row is found (it's better encapsulation, and it's doing the comparison there anyway).

Haven't tested it beyond running TestScanner and TestAtomicOperation, which both still pass.

(I am not suggesting we use my patch, it's just easier to explain what I mean by having it in a patch rather than describing it in words).

was (Author: lhofhansl):
[~ram_krish], take a look at the "-takeALook" sample. That's what I mean. I let the SQM decide when a new row is found (it's better encapsulation, and it's doing the comparison there anyway).

Haven't tested in beyond running TestScanner and TestAtomicOperation, which both still pass.

> Reduce the number of time row comparison is done in a Scan
> ----------------------------------------------------------
>
>                 Key: HBASE-14221
>                 URL: https://issues.apache.org/jira/browse/HBASE-14221
>             Project: HBase
>          Issue Type: Sub-task
>          Components: Scanners
>            Reporter: ramkrishna.s.vasudevan
>            Assignee: ramkrishna.s.vasudevan
>             Fix For: 2.0.0
>
>         Attachments: 14221-0.98-takeALook.txt, HBASE-14221.patch,
> HBASE-14221_1.patch, HBASE-14221_1.patch, HBASE-14221_6.patch,
> withmatchingRowspatch.png, withoutmatchingRowspatch.png
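
The suggestion above, letting the SQM be the single place that decides when a new row is found, can be pictured with a small sketch. It is not the "-takeALook" patch, whose contents are not shown in this thread. `RowBoundarySketch` is a hypothetical class whose enum values borrow the names from the quoted snippet, and returning `INCLUDE` for equal rows is a simplification (the real matcher goes on to check columns and versions).

```java
/** Hypothetical sketch; names mirror the quoted snippet but this is not HBase code. */
class RowBoundarySketch {
    enum MatchCode { INCLUDE, SEEK_NEXT_ROW, DONE }

    /**
     * Maps the result of one row comparison to a match code.
     * cmp is compare(currentRow, incomingRow); reversed marks a reverse scan.
     */
    static MatchCode classify(int cmp, boolean reversed) {
        if (cmp == 0) {
            return MatchCode.INCLUDE; // same row: simplified, no column checks here
        }
        // Forward scans advance to larger rows, reverse scans to smaller ones.
        boolean incomingIsAhead = reversed ? cmp > 0 : cmp < 0;
        return incomingIsAhead ? MatchCode.DONE : MatchCode.SEEK_NEXT_ROW;
    }

    /** Performs the single row comparison and reports the outcome to the caller. */
    static MatchCode match(String currentRow, String incomingRow, boolean reversed) {
        return classify(currentRow.compareTo(incomingRow), reversed);
    }
}
```

A caller that trusts the returned `DONE` does not need to compare rows again to learn that the row has changed, which is the encapsulation argument made in the comment.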