[jira] [Commented] (HBASE-4433) avoid extra next (potentially a seek) if done with column/row

2013-11-27 Thread Lars Hofhansl (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-4433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13834297#comment-13834297
]

Lars Hofhansl commented on HBASE-4433:
--

Reseek was also dramatically improved by HBASE-9915 when a block encoder is
used.

avoid extra next (potentially a seek) if done with column/row
-

Key: HBASE-4433
URL: https://issues.apache.org/jira/browse/HBASE-4433
Project: HBase
Issue Type: Improvement
Reporter: Kannan Muthukkaruppan
Assignee: Kannan Muthukkaruppan
Fix For: 0.92.0

[Noticed this in 0.89, but it is quite likely true of trunk as well.]
When we are done with the requested column(s), the code still does an extra
next() call before it realizes that it is actually done. This extra next()
call could result in an unnecessary extra block load. This is likely to be
especially bad for CFs whose KVs are large blobs, where each KV may occupy a
block of its own, so the next() can often load a new, unrelated block
unnecessarily.
--
For the simple case of reading, say, the top-most column of a row in a single
file, where each column (KV) occupies a block of its own, it seems that we
are reading 3 blocks instead of 1!
I am working on a simple patch, and with it the number of seeks is down to
2.
[There is still an extra seek left. I think there were two levels of
extra/unnecessary next() calls that we were making without confirming that
the next was actually needed. One is at the StoreScanner/ScanQueryMatcher
level, which this diff avoids. I think the other is at hfs.next() (at the
StoreFile scanner level), which happens whenever an HFile scanner serves out
data, and perhaps that is the additional seek we need to avoid. But I want
to tackle this optimization first, as the two issues seem unrelated.]
--
The basic idea of the patch I am working on/testing is as follows. The
ExplicitColumnTracker currently returns INCLUDE to the ScanQueryMatcher if
the KV needs to be included, and only on the next call, if it is done, does
it return the appropriate SEEK_NEXT_COL or SEEK_NEXT_ROW hint. For the cases
where the ExplicitColumnTracker already knows it is done with a particular
column/row, the patch attempts to combine the INCLUDE code and the done hint
into a single match code: INCLUDE_AND_SEEK_NEXT_COL or INCLUDE_AND_SEEK_NEXT_ROW.
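A minimal Java sketch of the idea above, for illustration only. `MatchCode` and `SimpleExplicitColumnTracker` are hypothetical, simplified stand-ins, not HBase's actual ScanQueryMatcher.MatchCode or ExplicitColumnTracker: the tracker assumes sorted requested qualifiers and a single version per column, and ignores timestamps, deletes and seek-hint keys. It only shows how the "done" signal can be folded into the INCLUDE answer instead of surfacing one call (and one next()) later.

```java
import java.util.List;

// Simplified stand-in for the match codes discussed above (not HBase's actual enum).
enum MatchCode {
    INCLUDE, SEEK_NEXT_COL, SEEK_NEXT_ROW, INCLUDE_AND_SEEK_NEXT_COL, INCLUDE_AND_SEEK_NEXT_ROW
}

// Hypothetical, heavily simplified explicit column tracker: requested qualifiers are
// sorted and each column is served at most one version. Call reset() at each new row.
class SimpleExplicitColumnTracker {
    private final List<String> columns;
    private final boolean combineHints; // true models the patched behaviour
    private int index = 0;              // next requested column still to be found

    SimpleExplicitColumnTracker(List<String> sortedColumns, boolean combineHints) {
        this.columns = sortedColumns;
        this.combineHints = combineHints;
    }

    void reset() {
        index = 0;
    }

    MatchCode checkColumn(String qualifier) {
        // Requested columns sorting before this qualifier are absent from the row; skip them.
        while (index < columns.size() && qualifier.compareTo(columns.get(index)) > 0) {
            index++;
        }
        if (index == columns.size()) {
            return MatchCode.SEEK_NEXT_ROW; // nothing left to find in this row
        }
        if (qualifier.compareTo(columns.get(index)) < 0) {
            // Not requested, or an older version of a column that was already included.
            return MatchCode.SEEK_NEXT_COL;
        }
        index++; // this KV is the requested column; consume it
        if (!combineHints) {
            return MatchCode.INCLUDE; // the "done" hint only shows up on the next call
        }
        return index == columns.size()
                ? MatchCode.INCLUDE_AND_SEEK_NEXT_ROW
                : MatchCode.INCLUDE_AND_SEEK_NEXT_COL;
    }
}
```

For the single-column case in the description, the un-patched tracker answers INCLUDE and needs one more KV (one more next(), possibly a block load) before it can say SEEK_NEXT_ROW, whereas the combined code answers INCLUDE_AND_SEEK_NEXT_ROW right away.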

--
This message was sent by Atlassian JIRA
(v6.1#6144)

[jira] [Commented] (HBASE-4433) avoid extra next (potentially a seek) if done with column/row

2013-03-05 Thread ramkrishna.s.vasudevan (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-4433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13593210#comment-13593210
]

ramkrishna.s.vasudevan commented on HBASE-4433:
---

bq. B.t.w., how do I modify an existing comment? I can't find a way to do it,
yet it seems some people can modify their comments.
You need admin access for that.
Your points above make sense. I was going through the code, hence the
doubt.

[jira] [Commented] (HBASE-4433) avoid extra next (potentially a seek) if done with column/row

2013-03-04 Thread Raymond Liu (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-4433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13592938#comment-13592938
]

Raymond Liu commented on HBASE-4433:

To figure out how much overhead the seek adds, I read some more code. My
table is major compacted, and it seems that in this situation the lazy seek
approach doesn't help, since only one scanner is involved. Still, each time,
this scanner goes through a lazy seek, is added to the heap, sorted, and
polled out again for a second, real seek. This introduces one extra lazy
seek and the construction of a second fake key for the seek. The best path
would be to do a direct seek, without the lazy seek, when only one
StoreFileScanner is involved (or one StoreFileScanner plus one
MemStoreScanner?). I tweaked the code a little to find out how much this
impacts the result: the scan time drops from 260s to 240s for
include_and_seek, though that is still far from the 190s for
include-then-seek, since one seek is still involved, and a seek is more
expensive than a next.
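To make the single-scanner overhead concrete, here is a toy Java model, assuming string keys and an in-memory "file"; `ToyFileScanner`, `ToySeekCounter` and `lazySeekWorthwhile` are hypothetical names, not HBase classes. A lazy seek only records the target as a fake key and defers the real seek until that scanner reaches the top of the heap. The model deliberately leaves out the details that let the real lazy seek skip older files, so it illustrates only the cost side when one scanner is involved.

```java
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

// Toy scanner over one sorted "file" of string keys (hypothetical; not HBase's StoreFileScanner).
class ToyFileScanner {
    final List<String> keys;
    String current;       // key the heap orders on; may be a fake key after a lazy seek
    boolean realSeekDone; // false while positioned on a fake key

    ToyFileScanner(String... sortedKeys) {
        keys = Arrays.asList(sortedKeys);
    }

    // Real seek: position on the first key >= target (the I/O we want to minimize).
    void realSeek(String target) {
        int pos = 0;
        while (pos < keys.size() && keys.get(pos).compareTo(target) < 0) {
            pos++;
        }
        current = pos < keys.size() ? keys.get(pos) : null;
        realSeekDone = true;
    }

    // Lazy seek: only record the target as a fake key, deferring the real seek.
    void lazySeek(String target) {
        current = target;
        realSeekDone = false;
    }
}

class ToySeekCounter {
    int realSeeks, fakeKeys, heapOps;

    // Returns the smallest key >= target across all scanners, counting the work done.
    String seek(List<ToyFileScanner> scanners, String target, boolean lazy) {
        PriorityQueue<ToyFileScanner> heap =
                new PriorityQueue<>((a, b) -> a.current.compareTo(b.current));
        for (ToyFileScanner s : scanners) {
            if (lazy) {
                s.lazySeek(target);
                fakeKeys++;
            } else {
                s.realSeek(target);
                realSeeks++;
            }
            if (s.current != null) {
                heap.add(s);
                heapOps++;
            }
        }
        // Poll until the top scanner sits on a real key, doing deferred seeks on the way.
        while (!heap.isEmpty()) {
            ToyFileScanner top = heap.poll();
            heapOps++;
            if (top.realSeekDone) {
                return top.current;
            }
            top.realSeek(target);
            realSeeks++;
            if (top.current != null) {
                heap.add(top);
                heapOps++;
            }
        }
        return null;
    }

    // The rule suggested in the comment (hypothetical): lazy seek only pays off with several files.
    static boolean lazySeekWorthwhile(int storeFileScanners) {
        return storeFileScanners > 1;
    }
}
```

With a single scanner over keys r1, r3, r5, seeking to r2 costs one real seek either way, but the lazy path additionally builds one fake key and performs two extra heap operations, which is the kind of per-seek overhead measured above.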

However, I find it hard to get this right if you want to switch from lazy
seek to non-lazy seek later. I'm reading more code to find a solution.

[jira] [Commented] (HBASE-4433) avoid extra next (potentially a seek) if done with column/row

2013-03-04 Thread ramkrishna.s.vasudevan (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-4433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13593022#comment-13593022
]

ramkrishna.s.vasudevan commented on HBASE-4433:
---

Nice findings, Liu. As Lars pointed out, we can work on improvements here:
add some intelligence, or some mathematics, to figure out which path to take
under what conditions.

[jira] [Commented] (HBASE-4433) avoid extra next (potentially a seek) if done with column/row

2013-03-04 Thread ramkrishna.s.vasudevan (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-4433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13593129#comment-13593129
]

ramkrishna.s.vasudevan commented on HBASE-4433:
---

bq. The lazy seek approaching doesn't help. since there are only 1 scanner
involved.
Can you elaborate on this? Basically, lazy seek helps reduce the number of
HFiles that need to be seeked, right?

[jira] [Commented] (HBASE-4433) avoid extra next (potentially a seek) if done with column/row

2013-03-04 Thread Raymond Liu (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-4433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13593145#comment-13593145
]

Raymond Liu commented on HBASE-4433:

Right, lazy seek tries to avoid seeks in older HFiles when possible. In my
case, however, there is only one HFile, since a major compaction has been
done. Also, during a scan, a StoreFileScanner can be closed when it is done,
so sooner or later only one StoreFileScanner will remain.

There are various other situations, too. Say you need to scan all versions
of the data: in this case a lazy seek just postpones the real seek but does
not reduce the number of real seeks.

In both cases, lazy seek adds overhead.

Of course, when there are many HFiles with different versions of rows and
you just want the first version, lazy seek does help.

[jira] [Commented] (HBASE-4433) avoid extra next (potentially a seek) if done with column/row

2013-03-04 Thread Raymond Liu (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-4433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13593147#comment-13593147
]

Raymond Liu commented on HBASE-4433:

Also, when only one version of each row exists, no matter how many HFiles
you have, a sequential scan always needs to scan all the HFiles row by row,
so lazy seek doesn't let you skip any real seeks. And in many cases, like
Hive on top of HBase or a bulk-loaded read-only table, I think it's quite
normal for a row to have only one version.

B.t.w., how do I modify an existing comment? I can't find a way to do it,
yet it seems some people can modify their comments.

[jira] [Commented] (HBASE-4433) avoid extra next (potentially a seek) if done with column/row

2013-03-01 Thread Lars Hofhansl (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-4433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13590820#comment-13590820
]

Lars Hofhansl commented on HBASE-4433:
--

Thanks, Raymond. It seems there's room for improvement in many scenarios.
I'll also do some tests.

[jira] [Commented] (HBASE-4433) avoid extra next (potentially a seek) if done with column/row

2013-02-28 Thread Raymond Liu (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-4433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13589333#comment-13589333
]

Raymond Liu commented on HBASE-4433:

I ran another test on the same 200G, 18-column table, this time scanning
every other column.
With the include-then-seek approach, the sequence is
c1 - next c2 - seek c3 - next c4 - seek c5 ...
With the include_and_seek approach, it is c1 - seek c3 - seek c5 ...
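The two operation sequences above can be generated mechanically. This hypothetical helper (`ScanPlan` is an illustrative name, not HBase code) assumes one version per column and that every `stride`-th column of a row is requested, and stops at the last requested column rather than modelling the move to the next row.

```java
// Hypothetical helper reproducing the per-row operation sequences described above,
// assuming one version per column and that columns c1, c1+stride, ... are requested.
class ScanPlan {
    static String plan(int totalColumns, int stride, boolean includeAndSeek) {
        StringBuilder ops = new StringBuilder("c1");
        int lastRequested = 1 + ((totalColumns - 1) / stride) * stride;
        for (int col = 1; col < lastRequested; col += stride) {
            int nextRequested = col + stride;
            if (includeAndSeek) {
                // INCLUDE_AND_SEEK_NEXT_COL: jump straight to the next requested column.
                ops.append(" - seek c").append(nextRequested);
            } else {
                // Plain INCLUDE: next() first lands on the adjacent column ...
                ops.append(" - next c").append(col + 1);
                // ... and only if that column is not wanted do we seek.
                if (col + 1 != nextRequested) {
                    ops.append(" - seek c").append(nextRequested);
                }
            }
        }
        return ops.toString();
    }

    // Counts occurrences of an operation name ("next" or "seek") in a plan.
    static int count(String plan, String op) {
        int n = 0;
        for (int i = plan.indexOf(op); i >= 0; i = plan.indexOf(op, i + 1)) {
            n++;
        }
        return n;
    }
}
```

For the 18-column table with every other column requested, include-then-seek issues 8 next() calls and 8 seeks per row up to c17, while include_and_seek issues the same 8 seeks and no next() calls.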

So an extra next is involved for each seek op, which is the worst case for
the include-then-seek approach. Yet in my test, the two approaches show no
noticeable performance difference: both take around 207s. For the previous
best case (c1 - next c2 - next c3 vs. c1 - seek c2 - seek c3), the times
were 190s vs. 250s.

So, if the next() op does not involve an extra block load, I think this is
acceptable.
An extra block load only happens when the next column is in the next block
and fully occupies that block. This should be rare: either the column is
huge (in which case the default block size should perhaps be adjusted), or
the history versions are huge (in which case it only happens when the
current KV happens to be the very last KV in the current block and the next
block is fully occupied by history versions).

Also, the wildcard column tracker now uses the include_and_seek approach
when the max versions limit is reached.

[jira] [Commented] (HBASE-4433) avoid extra next (potentially a seek) if done with column/row

2013-02-28 Thread Kannan Muthukkaruppan (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-4433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13589705#comment-13589705
]

Kannan Muthukkaruppan commented on HBASE-4433:
--

Sorry for missing this thread. I will post a more detailed reply when I am
at a computer. In a later JIRA we made a seek really cheap if it is to a key
within the same block: there is no need for a log(n) walk through the index
if the key we are seeking to is in the same block.

[jira] [Commented] (HBASE-4433) avoid extra next (potentially a seek) if done with column/row

2013-02-28 Thread Lars Hofhansl (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-4433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13589722#comment-13589722
]

Lars Hofhansl commented on HBASE-4433:
--

Thanks Kannan. Looks like something we should get into the 0.94/0.95/trunk
branches as well (assuming from Raymond's numbers that this change is only
in the FB branch).

[jira] [Commented] (HBASE-4433) avoid extra next (potentially a seek) if done with column/row

2013-02-28 Thread Liyin Tang (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-4433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13589763#comment-13589763
]

Liyin Tang commented on HBASE-4433:
---

Hi Lars, the JIRA Kannan mentioned is [HBASE-5987] HFileBlockIndex
improvements. By looking ahead at the next indexed key, the HBase internal
reader knows whether to keep scanning the current data block or to look up
the index. This avoids additional index-lookup overhead when multiple
requests are sequentially scanning the HFile data block.

Actually, we have a list of JIRAs in our FB internal HBase release. Do you
know a proper place where we could share this work with more hbase-dev folks?

[jira] [Commented] (HBASE-4433) avoid extra next (potentially a seek) if done with column/row

2013-02-28 Thread Kannan Muthukkaruppan (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-4433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13589767#comment-13589767
]

Kannan Muthukkaruppan commented on HBASE-4433:
--

The relevant JIRA that addresses this issue is HBASE-5987.

Basically, whenever we go down an index, we also look ahead and keep the
start key of the next block in the HFileScanner state. When we need to
reseek to a key, we do a quick check to see whether the key is in the same
block (i.e. is less than the start key of the next block). If it is, the
reseek doesn't need to consult the index again and can simply march along
in the same block to find the key; otherwise, it uses the index to find the
block it needs to go to.
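The next-indexed-key check described above can be sketched as follows, assuming a toy in-memory file whose blocks are sorted, non-overlapping arrays of string keys. `ToyBlockReader` is a hypothetical name and this is not the HBASE-5987 code; it only shows the decision: if the target sorts before the first key of the following block, march forward within the current block, otherwise binary-search the block index.

```java
import java.util.List;

// Toy HFile reader (hypothetical): blocks of sorted string keys, non-overlapping and in order.
// It remembers the first key of the following block ("next indexed key") so that a reseek
// can tell whether the target is still inside the current block.
class ToyBlockReader {
    private final List<String[]> blocks;
    private int blockIdx = 0; // current block
    private int pos = 0;      // position inside the current block
    int indexLookups = 0;     // how often the block index had to be consulted

    ToyBlockReader(List<String[]> blocks) {
        this.blocks = blocks;
    }

    private String nextIndexedKey() {
        return blockIdx + 1 < blocks.size() ? blocks.get(blockIdx + 1)[0] : null;
    }

    // Moves forward to the first key >= target; returns it, or null past the end of the file.
    String reseek(String target) {
        String nextIndexedKey = nextIndexedKey();
        if (nextIndexedKey != null && target.compareTo(nextIndexedKey) >= 0) {
            // Target is beyond this block: binary-search the index for the last block
            // whose first key is <= target.
            indexLookups++;
            int lo = blockIdx + 1, hi = blocks.size() - 1;
            while (lo < hi) {
                int mid = (lo + hi + 1) >>> 1;
                if (blocks.get(mid)[0].compareTo(target) <= 0) {
                    lo = mid;
                } else {
                    hi = mid - 1;
                }
            }
            blockIdx = lo;
            pos = 0;
        }
        // Target is in the current block (or in the gap before the next one): march forward.
        String[] block = blocks.get(blockIdx);
        while (pos < block.length && block[pos].compareTo(target) < 0) {
            pos++;
        }
        if (pos == block.length) {
            if (blockIdx + 1 == blocks.size()) {
                return null; // past the last key of the file
            }
            blockIdx++; // the answer is the first key of the next block
            pos = 0;
        }
        return blocks.get(blockIdx)[pos];
    }
}
```

Reseeking to keys in the same block, or into the gap just before the next block, never touches the index; only a jump to or past the next block's first key does.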

Looks like this was fixed in 0.95. Raymond: Which version are you trying this
with?
---

[jira] [Commented] (HBASE-4433) avoid extra next (potentially a seek) if done with column/row

2013-02-28 Thread Ted Yu (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-4433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13589776#comment-13589776
]

Ted Yu commented on HBASE-4433:
---

HBASE-5987 has been ported to 0.94 through HBASE-6032, meaning the
improvement is in 0.94.3.

[jira] [Commented] (HBASE-4433) avoid extra next (potentially a seek) if done with column/row

2013-02-28 Thread Lars Hofhansl (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-4433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13589784#comment-13589784
]

Lars Hofhansl commented on HBASE-4433:
--

Thanks Liyin, Kannan, and Ted :)

[~colorant] Which version of HBase did you use for your tests?

[jira] [Commented] (HBASE-4433) avoid extra next (potentially a seek) if done with column/row

2013-02-28 Thread Raymond Liu (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-4433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13590138#comment-13590138
]

Raymond Liu commented on HBASE-4433:

Hi, I did this test on 0.94.1, but I had already ported HBASE-6032 onto it;
without that patch, the difference is even larger.

So this is not about the index key issue.
I think the overhead is that a fake key needs to be constructed for a seek
operation, and the seek op itself is still slightly more expensive than a
get op.

[jira] [Commented] (HBASE-4433) avoid extra next (potentially a seek) if done with column/row

2013-02-28 Thread Raymond Liu (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-4433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13590194#comment-13590194
]

Raymond Liu commented on HBASE-4433:

Anyway, to make sure no other issue affects the result, I ran the same test
again on 0.94.5, with similar results.

[jira] [Commented] (HBASE-4433) avoid extra next (potentially a seek) if done with column/row

2013-02-27 Thread Raymond Liu (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-4433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13588107#comment-13588107
]

Raymond Liu commented on HBASE-4433:

I have an issue related to this one. Consider a table whose rows do not
have multiple versions, i.e. each row has only a single version. A next
operation then reads in the next column's KeyValue and matches the next
column without a seek. In this case, the next() actually saves time and
improves performance. Scanning a 200G table in my test, next instead of seek
was about 30% faster: 190s vs. 250s.

So I think this behavior might need to be treated differently in different
situations, since a read-only table with one version per row is also a very
typical case, and for it this patch actually makes performance worse.

[jira] [Commented] (HBASE-4433) avoid extra next (potentially a seek) if done with column/row

2013-02-27 Thread ramkrishna.s.vasudevan (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-4433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13588122#comment-13588122
]

ramkrishna.s.vasudevan commented on HBASE-4433:
---

Reading the description of the JIRA, I understand it was done mainly for
large blobs. Hence they chose to seek instead of calling next(), so that an
unnecessary block load does not happen.
Your case, by contrast, is a plain one where you just need the next column.

Any suggestions on how to go about this? Can we have some configuration?

[jira] [Commented] (HBASE-4433) avoid extra next (potentially a seek) if done with column/row

2013-02-27 Thread Raymond Liu (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-4433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13589068#comment-13589068
]

Raymond Liu commented on HBASE-4433:

I am wondering whether we might add a conf to let the user choose the
strategy: include_and_seek, or separate include/seek. However, the effect of
this kind of setting might not be easy for an end user to figure out, and
whether a table has many history versions depends entirely on how the table
is used. It would be better to have some automatic selection mechanism.

If a table is mainly used in a write-once/read-many mode, only the user
knows it; I don't know whether there is any way for HBase itself to find
this out.

However, if a table is configured with MAX VERSIONS set to 1, etc., then in
most cases I guess it is safe for the column tracker to use the separate
include/seek approach.

[jira] [Commented] (HBASE-4433) avoid extra next (potentially a seek) if done with column/row

2013-02-27 Thread ramkrishna.s.vasudevan (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-4433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13589206#comment-13589206
]

ramkrishna.s.vasudevan commented on HBASE-4433:
---

I agree with you, Raymond, on the part that the end user cannot figure it
out. But having a config knob will at least help in understanding the
behaviour of the application and then deciding on the include/seek
mechanism. A knob will also spare users from having to change and recompile
the code. Just saying.
But there is still a chance that
bq.When we are done with the requested column(s) the code still does an extra
next() call before it realizes that it is actually done. This extra next() call
could potentially result in an unnecessary extra block load
This may happen.

[jira] [Commented] (HBASE-4433) avoid extra next (potentially a seek) if done with column/row

2013-02-27 Thread Raymond Liu (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-4433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13589221#comment-13589221
]

Raymond Liu commented on HBASE-4433:

You are right, there is a chance that an extra next() will be called. For a
large KV that occupies a whole block, this might load an unnecessary extra
block. In most cases, though, if a single KV is not that big, the next block
needs to be loaded anyway, even for SEEK_NEXT_COL; SEEK_NEXT_ROW might not
need it if the row has many columns and spans multiple blocks.

And if it is not an extra-big KV, then for columns with many history
versions this extra next might not cost much even when a seek is actually
needed, since it saves part of the seek time by already having moved past
that KV. Anyway, real cases are needed to verify the performance impact.

And yes, I agree with you: if we can't tell which mechanism should be used,
a configuration option is very useful.

[jira] [Commented] (HBASE-4433) avoid extra next (potentially a seek) if done with column/row

2013-02-27 Thread Lars Hofhansl (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-4433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13589224#comment-13589224
]

Lars Hofhansl commented on HBASE-4433:
--

Interesting! This is almost impossible to get right automatically, I think.
Even with MAX_VERSIONS=1 there might be a bunch of versions, in which case
INCLUDE_AND_SEEK_* is better.

Could use the size of the KV as a guidepost. If MAX_VERSIONS * size is
larger than the HFile block size (64k by default), we could do
INCLUDE_AND_SEEK; otherwise do INCLUDE followed by SEEK (if needed).

(Just made this up, but we can probably use some heuristic like this)
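Lars's heuristic, together with the configuration knob discussed earlier in the thread, might look roughly like this. Everything here is hypothetical: `SeekStrategy`, the `override` parameter standing in for a config setting, and `typicalKvSizeBytes`, which assumes some per-store estimate of KV size is available. It also does not handle the caveat that a column can hold more versions than MAX_VERSIONS until a compaction runs.

```java
// Hypothetical heuristic sketch, not HBase code: decide whether a column tracker should answer
// INCLUDE_AND_SEEK_* or a plain INCLUDE (followed by a seek hint only if needed).
class SeekStrategy {
    static final int DEFAULT_BLOCK_SIZE = 64 * 1024; // HFile default block size, 64 KB

    enum Choice { INCLUDE_AND_SEEK, INCLUDE_THEN_SEEK }

    // `override` models a user-facing config knob (null means "decide automatically").
    // `typicalKvSizeBytes` is an assumed per-store estimate; where it comes from is left open.
    static Choice choose(Choice override, int maxVersions, long typicalKvSizeBytes, int blockSizeBytes) {
        if (override != null) {
            return override;
        }
        // If all versions of one column likely span more than a block, an extra next()
        // is likely to load a block we do not need, so seek right away.
        long bytesPerColumn = (long) maxVersions * typicalKvSizeBytes;
        return bytesPerColumn > blockSizeBytes ? Choice.INCLUDE_AND_SEEK : Choice.INCLUDE_THEN_SEEK;
    }
}
```

For example, with MAX_VERSIONS=1 and 100-byte KVs the sketch picks INCLUDE then SEEK, while 3 versions of a 30,000-byte KV (90,000 bytes, more than the 64 KB default block) pick INCLUDE_AND_SEEK.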

[jira] [Commented] (HBASE-4433) avoid extra next (potentially a seek) if done with column/row

2011-09-27 Thread Hudson (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13115353#comment-13115353
 ] 

Hudson commented on HBASE-4433:
---

Integrated in HBase-TRUNK #2261 (See 
[https://builds.apache.org/job/HBase-TRUNK/2261/])
HBASE-4433  avoid extra next (potentially a seek) if done with column/row 
(kannan via jgray)

jgray :
Files :
* /hbase/trunk/CHANGES.txt
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestExplicitColumnTracker.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestQueryMatcher.java


[jira] [Commented] (HBASE-4433) avoid extra next (potentially a seek) if done with column/row

2011-09-27 Thread Andrew Purtell (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-4433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13115743#comment-13115743
]

Andrew Purtell commented on HBASE-4433:
---

According to my tests, this is safe to do on the 0.92 and 0.90 branches as
well. This change should be applied there.

[jira] [Commented] (HBASE-4433) avoid extra next (potentially a seek) if done with column/row

2011-09-27 Thread Jonathan Gray (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-4433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13115867#comment-13115867
]

Jonathan Gray commented on HBASE-4433:
--

Is this not strictly an improvement/feature? It seems like it doesn't belong
in stable branches :)

[jira] [Commented] (HBASE-4433) avoid extra next (potentially a seek) if done with column/row

2011-09-27 Thread Hudson (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13115938#comment-13115938
 ] 

Hudson commented on HBASE-4433:
---

Integrated in HBase-0.92 #23 (See 
[https://builds.apache.org/job/HBase-0.92/23/])
HBASE-4433: avoid extra next (potentially a seek) if done with column/row
HBASE-4433: avoid extra next (potentially a seek) if done with column/row

stack : 
Files : 
* /hbase/branches/0.92/CHANGES.txt

stack : 
Files : 
* /hbase/branches/0.92/CHANGES.txt
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java
* /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java
* /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/regionserver/TestExplicitColumnTracker.java
* /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/regionserver/TestQueryMatcher.java


avoid extra next (potentially a seek) if done with column/row
-

Key: HBASE-4433
URL: https://issues.apache.org/jira/browse/HBASE-4433
Project: HBase
Issue Type: Improvement
Reporter: Kannan Muthukkaruppan
Assignee: Kannan Muthukkaruppan
Fix For: 0.92.0


[Noticed this in 89, but quite likely true of trunk as well.]
When we are done with the requested column(s), the code still does an extra
next() call before it realizes that it is actually done. This extra next()
call could result in an unnecessary extra block load. This is likely to be
especially bad for CFs where the KVs are large blobs and each KV may occupy a
block of its own, so the next() can often load a new, unrelated block
unnecessarily.
--
For the simple case of reading, say, the top-most column in a row in a single
file, where each column (KV) occupies, say, a block of its own, it seems that
we are reading 3 blocks instead of 1!
With the simple patch I am working on, the number of seeks is down to 2.
[There is still one extra seek left. I think there were two levels of
unnecessary next() calls that we were making without first confirming that
the next was needed. One is at the StoreScanner/ScanQueryMatcher level, which
this diff avoids. I think the other is at hfs.next() (at the storefile scanner
level), which happens whenever an HFile scanner serves out data; perhaps that
is the additional seek we need to avoid. But I want to tackle this
optimization first, as the two issues seem unrelated.]
--
The basic idea of the patch I am working on/testing is as follows. The
ExplicitColumnTracker currently returns INCLUDE to the ScanQueryMatcher if
the KV needs to be included. Then, if it is done, it returns the appropriate
SEEK_NEXT_COL or SEEK_NEXT_ROW hint, but only on the next call. For cases
where the ExplicitColumnTracker knows it is done with a particular
column/row, the patch attempts to combine the INCLUDE code and the done hint
into a single match code: INCLUDE_AND_SEEK_NEXT_COL or
INCLUDE_AND_SEEK_NEXT_ROW.
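The effect of folding the done hint into the INCLUDE code can be sketched with a toy model. This is a hypothetical, heavily simplified Java model, not HBase code. The class, enum, and method names are illustrative only. The store holds a single row in which every cell sits in its own block, and versions and timestamps are ignored. StoreFile-scanner-level prefetch (the other extra seek mentioned above) is not modeled, so the absolute counts will not match the figures quoted in the description.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/**
 * Hypothetical toy model (not HBase code): a single-row store in which every
 * cell (KV) sits in its own block. Counts the blocks loaded for an
 * explicit-column query, with and without a combined INCLUDE_AND_SEEK_NEXT_ROW
 * match code. Versions, timestamps and storefile-level prefetch are ignored.
 */
class ColumnTrackerSketch {

    // Illustrative match codes, loosely named after those discussed in HBASE-4433.
    enum MatchCode { INCLUDE, SKIP, SEEK_NEXT_ROW, INCLUDE_AND_SEEK_NEXT_ROW }

    /** Decides what to do with a column, given the sorted set of requested columns. */
    static MatchCode match(String column, TreeSet<String> wanted, boolean combineHints) {
        if (wanted.contains(column)) {
            // On the last requested column the tracker already knows the row is done.
            boolean isLast = column.equals(wanted.last());
            return (combineHints && isLast) ? MatchCode.INCLUDE_AND_SEEK_NEXT_ROW : MatchCode.INCLUDE;
        }
        // Past every requested column: nothing else in this row is needed.
        if (column.compareTo(wanted.last()) > 0) {
            return MatchCode.SEEK_NEXT_ROW;
        }
        return MatchCode.SKIP;
    }

    /** Walks the row's cells in sorted order and returns the number of blocks loaded. */
    static int blocksRead(List<String> rowColumns, Set<String> requested, boolean combineHints) {
        TreeSet<String> wanted = new TreeSet<>(requested);
        if (wanted.isEmpty()) {
            return 0;                        // nothing requested, nothing to read
        }
        int blocks = 0;
        for (String column : rowColumns) {   // in this model each next() lands on a new block
            blocks++;                        // load the block holding this cell
            MatchCode code = match(column, wanted, combineHints);
            if (code == MatchCode.SEEK_NEXT_ROW || code == MatchCode.INCLUDE_AND_SEEK_NEXT_ROW) {
                break;                       // single-row model: the query is finished
            }
        }
        return blocks;
    }

    public static void main(String[] args) {
        List<String> row = List.of("c0", "c1", "c2", "c3");
        Set<String> query = Set.of("c0");    // read only the top-most column
        System.out.println("INCLUDE, then hint on next call: " + blocksRead(row, query, false));
        System.out.println("combined INCLUDE_AND_SEEK_NEXT_ROW: " + blocksRead(row, query, true));
    }
}
```

In this model, the only difference for the single-column query is the final next() that lands on c1 just to discover that the row is finished. With the combined code, the scanner stops on c0.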


[jira] [Commented] (HBASE-4433) avoid extra next (potentially a seek) if done with column/row

2011-09-26 Thread Kannan Muthukkaruppan (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-4433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13115113#comment-13115113
]

Kannan Muthukkaruppan commented on HBASE-4433:
--

Ping for code review.

The test suite ran clean.

avoid extra next (potentially a seek) if done with column/row
-

Key: HBASE-4433
URL: https://issues.apache.org/jira/browse/HBASE-4433
Project: HBase
Issue Type: Improvement
Reporter: Kannan Muthukkaruppan
Assignee: Kannan Muthukkaruppan

[jira] [Commented] (HBASE-4433) avoid extra next (potentially a seek) if done with column/row

2011-09-26 Thread Ted Yu (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-4433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13115191#comment-13115191
]

Ted Yu commented on HBASE-4433:
---

+1 on patch.
Nice work.

avoid extra next (potentially a seek) if done with column/row
-

Key: HBASE-4433
URL: https://issues.apache.org/jira/browse/HBASE-4433
Project: HBase
Issue Type: Improvement
Reporter: Kannan Muthukkaruppan
Assignee: Kannan Muthukkaruppan

[jira] [Commented] (HBASE-4433) avoid extra next (potentially a seek) if done with column/row

2011-09-24 Thread jirapos...@reviews.apache.org (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-4433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13114113#comment-13114113
]

jirapos...@reviews.apache.org commented on HBASE-4433:
--

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2044/
---

Review request for Michael Stack, Jonathan Gray and Mikhail Bautin.

Summary
---

Avoids extra next() calls (each potentially a seek) once we are done with each
requested column.

This addresses bug HBASE-4433.
https://issues.apache.org/jira/browse/HBASE-4433

Diffs
-

http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java
1175286

http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java
1175286

http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java
1175286

http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java
1175286

http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestExplicitColumnTracker.java
1175286

http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestQueryMatcher.java
1175286

Diff: https://reviews.apache.org/r/2044/diff

Testing
---

Ran TestBlocksRead/TestExplicitColumnTracker/TestQueryMatcher. Running the full
suite now.

Thanks,

Kannan

avoid extra next (potentially a seek) if done with column/row
-

Key: HBASE-4433
URL: https://issues.apache.org/jira/browse/HBASE-4433
Project: HBase
Issue Type: Improvement
Reporter: Kannan Muthukkaruppan
Assignee: Kannan Muthukkaruppan


[jira] [Commented] (HBASE-4433) avoid extra next (potentially a seek) if done with column/row

2011-09-19 Thread Jonathan Gray (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-4433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13108110#comment-13108110
]

Jonathan Gray commented on HBASE-4433:
--

Good stuff. I think the first iteration of the ColumnTracker had the
INCLUDE_AND_* primitives, but it was simplified. It would be pretty cool to
write up a unit test that creates single-KV-sized blocks, so you could run
various queries and see the number of blocks accessed. That would be
especially nice for catching regressions in the future.
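Single-KV-sized blocks make such a test sensitive: every unnecessary next() then shows up as one extra block. With several KVs per block, the extra next() usually stays inside a block that is already loaded, so the regression goes unnoticed. The hypothetical Java helper below illustrates only that counting idea. Its names are made up for illustration, and it does not reproduce TestBlocksRead.java from the patch's file list.

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical test helper (not HBase's TestBlocksRead): counts the distinct
 * blocks touched when a forward scan visits cells by index, for a given
 * number of KVs per block.
 */
class BlockCountingHarness {
    private final int kvsPerBlock;
    private final Set<Integer> blocksTouched = new HashSet<>();

    BlockCountingHarness(int kvsPerBlock) {
        if (kvsPerBlock <= 0) {
            throw new IllegalArgumentException("kvsPerBlock must be positive");
        }
        this.kvsPerBlock = kvsPerBlock;
    }

    /** Simulates positioning the scanner on cell cellIndex; records the block it lives in. */
    void visit(int cellIndex) {
        blocksTouched.add(cellIndex / kvsPerBlock);
    }

    int blocksRead() {
        return blocksTouched.size();
    }

    /** Visits cells 0..cellsVisited-1 in order and returns the number of distinct blocks read. */
    static int blocksForScan(int kvsPerBlock, int cellsVisited) {
        BlockCountingHarness harness = new BlockCountingHarness(kvsPerBlock);
        for (int i = 0; i < cellsVisited; i++) {
            harness.visit(i);
        }
        return harness.blocksRead();
    }

    public static void main(String[] args) {
        // Reading one cell plus one unnecessary next() visits 2 cells instead of 1.
        System.out.println("single-KV blocks, with extra next: " + blocksForScan(1, 2));
        System.out.println("single-KV blocks, without extra next: " + blocksForScan(1, 1));
        System.out.println("4 KVs per block, with extra next: " + blocksForScan(4, 2));
    }
}
```

With four KVs per block, the extra next() costs nothing measurable here. That is why the suggestion above asks for single-KV-sized blocks.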

avoid extra next (potentially a seek) if done with column/row
-

Key: HBASE-4433
URL: https://issues.apache.org/jira/browse/HBASE-4433
Project: HBase
Issue Type: Bug
Reporter: Kannan Muthukkaruppan
Assignee: Kannan Muthukkaruppan

