[jira] [Commented] (HBASE-6618) Implement FuzzyRowFilter with ranges support

2014-04-10 Thread Igor Kuzmitshov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13965203#comment-13965203
 ] 

Igor Kuzmitshov commented on HBASE-6618:


[~alexb], you are right about keeping the mask separate; somehow I forgot that 
? can be a “normal byte”, sorry.

I have just checked the other filters: they all seem quite low-level and take 
byte arrays as constructor parameters. It makes sense to use byte arrays as 
parameters to be consistent, but adding a builder could be nice as well.

For me, the biggest “inconvenience” of constructing a FuzzyRowFilter 
(especially in the HBase shell) is not the byte arrays themselves, but the 
Lists of Pairs (or Triples) of byte arrays. I would add a simpler constructor 
for one rule (I guess one rule would be enough quite often) and a separate 
method to add rules:
{code}
FuzzyRowFilter(byte[] fuzzyInfo, byte[] lowerBytes, byte[] upperBytes)
void addRule(byte[] fuzzyInfo, byte[] lowerBytes, byte[] upperBytes)
{code}
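
A minimal sketch of how the two signatures above could fit together. The class name FuzzyRuleSetSketch, the nested Rule type and the length check are assumptions made for illustration; none of this is taken from the patch or from the real FuzzyRowFilter.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical stand-in for the proposed API; this is not the HBase FuzzyRowFilter class. */
final class FuzzyRuleSetSketch {

  /** One rule: a mask plus lower and upper bound bytes, all of the same length. */
  static final class Rule {
    final byte[] fuzzyInfo;
    final byte[] lowerBytes;
    final byte[] upperBytes;

    Rule(byte[] fuzzyInfo, byte[] lowerBytes, byte[] upperBytes) {
      if (fuzzyInfo.length != lowerBytes.length || fuzzyInfo.length != upperBytes.length) {
        throw new IllegalArgumentException("mask and bounds must have the same length");
      }
      // Defensive copies so later changes to the caller's arrays do not alter the rule.
      this.fuzzyInfo = fuzzyInfo.clone();
      this.lowerBytes = lowerBytes.clone();
      this.upperBytes = upperBytes.clone();
    }
  }

  private final List<Rule> rules = new ArrayList<>();

  /** Convenience constructor for the common single-rule case. */
  FuzzyRuleSetSketch(byte[] fuzzyInfo, byte[] lowerBytes, byte[] upperBytes) {
    addRule(fuzzyInfo, lowerBytes, upperBytes);
  }

  /** Adds one more rule without the caller building a List of Triples. */
  void addRule(byte[] fuzzyInfo, byte[] lowerBytes, byte[] upperBytes) {
    rules.add(new Rule(fuzzyInfo, lowerBytes, upperBytes));
  }

  List<Rule> rules() {
    return Collections.unmodifiableList(rules);
  }
}
```

With this shape a single-rule filter needs three arrays and no collection types, which is the part that is awkward to type in the shell.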

 Implement FuzzyRowFilter with ranges support
 

 Key: HBASE-6618
 URL: https://issues.apache.org/jira/browse/HBASE-6618
 Project: HBase
  Issue Type: New Feature
  Components: Filters
Reporter: Alex Baranau
Assignee: Alex Baranau
Priority: Minor
 Fix For: 0.99.0

 Attachments: HBASE-6618-algo-desc-bits.png, HBASE-6618-algo.patch, 
 HBASE-6618.patch, HBASE-6618_2.path, HBASE-6618_3.path, HBASE-6618_4.patch, 
 HBASE-6618_5.patch


 Apart from the current ability to specify a fuzzy row filter, e.g. for the 
 userId_actionId format as ????_0004 (where 0004 is the actionId), it would be 
 great to also be able to specify a fuzzy range, e.g. ????_0004, 
 ..., ????_0099.
 See initial discussion here: http://search-hadoop.com/m/WVLJdX0Z65
 Note: it is currently possible to provide multiple fuzzy row rules to the 
 existing FuzzyRowFilter, but when the range is big (contains 
 thousands of values) this is not efficient.
 The filter should perform efficient fast-forwarding during the scan (this is 
 what distinguishes it from a regex row filter).
 While such functionality may seem like a proper fit for a custom filter (i.e. 
 one not included in the standard filter set), it looks like the filter may be 
 very reusable. We may judge based on the implementation that will hopefully 
 be added.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-6618) Implement FuzzyRowFilter with ranges support

2014-04-10 Thread Alex Baranau (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13965545#comment-13965545
 ] 

Alex Baranau commented on HBASE-6618:
-

Totally agree on overloading the ctor; will add that. I will also see whether 
a Builder makes sense: it'd help with these lists of pairs/triples.

Thank you for looking at the patch, [~kuzmiigo]!



[jira] [Commented] (HBASE-6618) Implement FuzzyRowFilter with ranges support

2014-04-09 Thread Alex Baranau (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13964508#comment-13964508
 ] 

Alex Baranau commented on HBASE-6618:
-

Making it more usable with a better API for constructing the filter makes a 
lot of sense to me. It is not as simple as just defining a human-readable 
string, since the bytes might not be human-readable. I'm thinking about 
builder-like construction of the filter as an addition to the raw definition; 
that should be helpful. Will share thoughts/changes very soon if I figure out 
a good way to do it.

bq. There are many notes to heed when using FuzzyRowFilter; could we do these 
checks inside FuzzyRowFilter?

not sure I got the question, sorry: which checks?



[jira] [Commented] (HBASE-6618) Implement FuzzyRowFilter with ranges support

2014-04-09 Thread Igor Kuzmitshov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13964536#comment-13964536
 ] 

Igor Kuzmitshov commented on HBASE-6618:


Using (human-readable) strings instead of byte arrays seems possible when 
non-printable bytes are given in \x00 format (widely used in HBase) and 
conversions are done with toBytesBinary() and toStringBinary() of 
org.apache.hadoop.hbase.util.Bytes. Example: from ??a\x00 to ??c\x1F.
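
To illustrate the \x00 notation, here is a simplified, self-contained stand-in for the two conversions. It is only a sketch: the class BinaryStringSketch is hypothetical, and the real Bytes.toBytesBinary() and Bytes.toStringBinary() may treat some characters differently (for example, which bytes count as printable).

```java
import java.io.ByteArrayOutputStream;

/** Simplified stand-in for the \xNN string notation; not the HBase Bytes class. */
final class BinaryStringSketch {

  /** Turns each "\xNN" escape into one byte; every other character becomes its ASCII byte. */
  static byte[] toBytesBinary(String in) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    int i = 0;
    while (i < in.length()) {
      char c = in.charAt(i);
      boolean isEscape = c == '\\' && i + 3 < in.length() && in.charAt(i + 1) == 'x'
          && Character.digit(in.charAt(i + 2), 16) >= 0
          && Character.digit(in.charAt(i + 3), 16) >= 0;
      if (isEscape) {
        out.write(Integer.parseInt(in.substring(i + 2, i + 4), 16));
        i += 4;
      } else {
        out.write(c); // only the low 8 bits are written
        i++;
      }
    }
    return out.toByteArray();
  }

  /** Prints printable ASCII (except backslash) as-is and everything else as \xNN. */
  static String toStringBinary(byte[] bytes) {
    StringBuilder sb = new StringBuilder();
    for (byte b : bytes) {
      int v = b & 0xFF;
      if (v >= 0x20 && v < 0x7F && v != '\\') {
        sb.append((char) v);
      } else {
        sb.append(String.format("\\x%02X", v));
      }
    }
    return sb.toString();
  }
}
```

Under these assumptions the lower bound ??a\x00 parses to four bytes ('?', '?', 'a', 0), and the four bytes ('?', '?', 'c', 0x1F) print back as ??c\x1F.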



[jira] [Commented] (HBASE-6618) Implement FuzzyRowFilter with ranges support

2014-04-09 Thread Alex Baranau (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13964558#comment-13964558
 ] 

Alex Baranau commented on HBASE-6618:
-

Right. I mean it still won't be human-friendly, though... I was thinking more 
of something like this:

{code}
new FuzzyRowFilter.Builder()
  .any(length) // meaning ???? for 4
  // builder will check that length of those is the same
  .range(range_start_bytes, range_end_bytes)
  .any(length)
  .fixed(couple_fixed_bytes)
  .build();
{code}

We may also add overloads that accept strings, if that makes sense, so that 
e.g. ???(11-88)??AAA could be built with:

{code}
new FuzzyRowFilter.Builder()
  .any(3)
  .range("11", "88")
  .any(2)
  .fixed("AAA")
  .build();
{code}
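
A rough sketch of what such a builder could produce, under stated assumptions. The class FuzzyRuleBuilderSketch is hypothetical and its build() returns the three raw arrays rather than a filter. The mask convention (1 for an "any" position, 0 for a constrained one) and the zero bytes at "any" positions are assumptions of this sketch, not confirmed behaviour of the patch.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical builder that assembles mask, lower and upper arrays segment by segment. */
final class FuzzyRuleBuilderSketch {
  private final ByteArrayOutputStream mask = new ByteArrayOutputStream();
  private final ByteArrayOutputStream lower = new ByteArrayOutputStream();
  private final ByteArrayOutputStream upper = new ByteArrayOutputStream();

  /** Appends `length` positions that match any byte (mask 1, bounds zeroed). */
  FuzzyRuleBuilderSketch any(int length) {
    for (int i = 0; i < length; i++) {
      mask.write(1);
      lower.write(0);
      upper.write(0);
    }
    return this;
  }

  /** Appends a ranged segment; both bounds must have the same length. */
  FuzzyRuleBuilderSketch range(byte[] start, byte[] end) {
    if (start.length != end.length) {
      throw new IllegalArgumentException("range bounds must have the same length");
    }
    for (int i = 0; i < start.length; i++) {
      mask.write(0);
      lower.write(start[i]);
      upper.write(end[i]);
    }
    return this;
  }

  /** String overload; assumes plain ASCII input. */
  FuzzyRuleBuilderSketch range(String start, String end) {
    return range(ascii(start), ascii(end));
  }

  /** A fixed segment is modelled as a range whose bounds are equal. */
  FuzzyRuleBuilderSketch fixed(byte[] bytes) {
    return range(bytes, bytes);
  }

  FuzzyRuleBuilderSketch fixed(String s) {
    return fixed(ascii(s));
  }

  /** Returns {mask, lowerBytes, upperBytes}. */
  byte[][] build() {
    return new byte[][] {mask.toByteArray(), lower.toByteArray(), upper.toByteArray()};
  }

  private static byte[] ascii(String s) {
    return s.getBytes(StandardCharsets.US_ASCII);
  }
}
```

One limitation of this sketch: it does not record segment boundaries, so a range segment placed directly next to a fixed segment is indistinguishable from one longer range. A real builder would likely need to keep that information.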

thoughts?



[jira] [Commented] (HBASE-6618) Implement FuzzyRowFilter with ranges support

2014-04-09 Thread Alex Baranau (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13964560#comment-13964560
 ] 

Alex Baranau commented on HBASE-6618:
-

with human-readable and ? inside - you have to somehow define how to put ? 
if I want it as normal byte...



[jira] [Commented] (HBASE-6618) Implement FuzzyRowFilter with ranges support

2014-04-09 Thread chunhui shen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13964897#comment-13964897
 ] 

chunhui shen commented on HBASE-6618:
-

bq.you have to somehow define how to put ? if I want it as normal byte.
We could use '\' before '?' to define the normal byte '?'

In my view, the user could construct a FuzzyRowFilter from the readable 
String directly, 
e.g. ???11??AA\x00??\?
Use Bytes.toBytesBinary to convert the string to bytes, then parse the bytes: 
if the byte is '?', mark it as a non-fixed byte; if the byte is '\', skip it 
and treat the next byte as a normal byte; and so on.

Of course, if the user wants '\x00' to be 4 literal bytes, the above goes 
wrong. For this case, we should also support constructing a FuzzyRowFilter 
from the readable byte array.
For example, ???11??AA\x00??\? = 
byte[0]='?'
byte[1]='?'
byte[2]='?'
byte[3]='1'
byte[4]='1'
byte[5]='?'
byte[6]='?'
byte[7]='A'
byte[8]='A'
byte[9]=0
byte[10]='?'
byte[11]='?'
byte[12]='\'
byte[13]='?'

Correct me if something wrong :)
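
The parsing step described above could be sketched roughly as follows. The class FuzzyPatternParserSketch is hypothetical, the input is assumed to be the byte array after the \xNN conversion, and the mask convention (1 for non-fixed, 0 for fixed) is an assumption of this sketch.

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical parser for the '?' / '\' pattern notation discussed in this thread. */
final class FuzzyPatternParserSketch {

  /** Returns {mask, bytes}: mask 1 marks a non-fixed position, mask 0 a fixed one. */
  static byte[][] parse(byte[] pattern) {
    ByteArrayOutputStream mask = new ByteArrayOutputStream();
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    for (int i = 0; i < pattern.length; i++) {
      byte b = pattern[i];
      if (b == '\\' && i + 1 < pattern.length) {
        // Escape: drop the backslash and keep the next byte as a fixed byte.
        mask.write(0);
        bytes.write(pattern[++i]);
      } else if (b == '?') {
        // Unescaped '?': this position matches any byte.
        mask.write(1);
        bytes.write(0);
      } else {
        mask.write(0);
        bytes.write(b);
      }
    }
    return new byte[][] {mask.toByteArray(), bytes.toByteArray()};
  }
}
```

For the 14-byte example above this yields 13 positions, the last being a fixed '?'. The sketch also shows the weak spot of the format: a raw 0x5C byte that is meant as data is read as an escape unless it is itself escaped.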




[jira] [Commented] (HBASE-6618) Implement FuzzyRowFilter with ranges support

2014-04-09 Thread chunhui shen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13964901#comment-13964901
 ] 

chunhui shen commented on HBASE-6618:
-

bq.not sure I got the question, sorry: which checks?
{code}
+ * <p>
+ *   NOTE that currently no checks are performed to ensure that length of ranges lower bytes and
+ *   ranges upper bytes match mask length. Filter may work incorrectly or fail (with runtime
+ *   exceptions) if this is broken.
+ * </p>
+ *
+ * <p>
+ *   NOTE that currently no checks are performed to ensure that ranges are defined correctly (i.e.
+ *   lower value of each range is not greater than upper value). Filter may work incorrectly or fail
+ *   (with runtime exceptions) if this is broken.
+ * </p>
+ *
+ * <p>
+ *   NOTE that currently no checks are performed to ensure that at non-fixed positions in
+ *   ranges lower bytes and ranges upper bytes zeroes are set, but implementation may rely on this.
+ * </p>
{code}
I mean the above checks
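
A sketch of how these three checks might be done inside the filter. The class FuzzyRuleChecksSketch is hypothetical. Two things are assumptions of the sketch and may not match the patch: a mask byte of 0 marks a constrained position (any other value marks a non-fixed one), and each contiguous run of constrained positions is treated as one range compared as unsigned bytes.

```java
/** Hypothetical up-front validation of one (mask, lower, upper) rule. */
final class FuzzyRuleChecksSketch {

  static void validate(byte[] mask, byte[] lower, byte[] upper) {
    // Check 1: both bound arrays must match the mask length.
    if (lower.length != mask.length || upper.length != mask.length) {
      throw new IllegalArgumentException("lower and upper bytes must match mask length");
    }
    int i = 0;
    while (i < mask.length) {
      if (mask[i] != 0) {
        // Check 3: non-fixed positions must carry zeroes in both bounds.
        if (lower[i] != 0 || upper[i] != 0) {
          throw new IllegalArgumentException("non-zero bound at non-fixed position " + i);
        }
        i++;
        continue;
      }
      // Check 2: within a run of constrained positions, lower must not exceed upper.
      int start = i;
      while (i < mask.length && mask[i] == 0) {
        i++;
      }
      if (compareUnsigned(lower, upper, start, i) > 0) {
        throw new IllegalArgumentException("lower > upper in range starting at " + start);
      }
    }
  }

  /** Lexicographic comparison of a[from, to) and b[from, to) as unsigned bytes. */
  static int compareUnsigned(byte[] a, byte[] b, int from, int to) {
    for (int k = from; k < to; k++) {
      int diff = (a[k] & 0xFF) - (b[k] & 0xFF);
      if (diff != 0) {
        return diff;
      }
    }
    return 0;
  }
}
```

Failing fast with an IllegalArgumentException at construction time would replace the "may work incorrectly or fail" caveats with a clear error.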




[jira] [Commented] (HBASE-6618) Implement FuzzyRowFilter with ranges support

2014-04-09 Thread Alex Baranau (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13964919#comment-13964919
 ] 

Alex Baranau commented on HBASE-6618:
-

got it thanx.

bq. We could use '\' before '?' to define the normal byte '?'

and then \ before \ if we need \. And so on.

I mean, we can do that. But having Strings _with special chars_ where the impl 
expects _any_ bytes as normal input is at times not trivial. And if we really 
want that, we should put it in the API in some form of standard. Then we could 
have it everywhere, e.g. in Puts, etc. I am not sure we want to create a 
specific format for one filter.

What are your thoughts on the builder? It seems like it can help avoid all 
those special chars and still keep things very human-friendly. We can also 
allow Strings with \x notation, as I mentioned. The difference is that there 
is no pain with special chars, and the API guides users easily.



[jira] [Commented] (HBASE-6618) Implement FuzzyRowFilter with ranges support

2014-04-08 Thread chunhui shen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13963753#comment-13963753
 ] 

chunhui shen commented on HBASE-6618:
-

FuzzyRowFilter seems available only for advanced users.

Should we support creating FuzzyRowFilter with the Human-Readable String, e.g. 
_0004_??
There are many notes to heed when using FuzzyRowFilter; could we do these 
checks inside FuzzyRowFilter?





[jira] [Commented] (HBASE-6618) Implement FuzzyRowFilter with ranges support

2014-04-05 Thread Alex Baranau (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13961243#comment-13961243
 ] 

Alex Baranau commented on HBASE-6618:
-

[~kuzmiigo], heh - I found that satisfies() was indeed incorrect, i.e. as you 
described. Sorry I overlooked that. Fixed in the latest patch.

[~tedyu] I added more tests, including a test for compatibility with a client 
compiled against the old version of the code. I'd really ask you to check 
whether that was done correctly, though: I don't have a lot of experience with 
protobuf.

Thank you guys!



[jira] [Commented] (HBASE-6618) Implement FuzzyRowFilter with ranges support

2014-04-05 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13961265#comment-13961265
 ] 

Ted Yu commented on HBASE-6618:
---

Thanks for the quick turnaround.

[~apurtell]: do you want this in 0.98 ?



[jira] [Commented] (HBASE-6618) Implement FuzzyRowFilter with ranges support

2014-04-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13961272#comment-13961272
 ] 

Hadoop QA commented on HBASE-6618:
--

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12638881/HBASE-6618_5.patch
  against trunk revision .
  ATTACHMENT ID: 12638881

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 9 new 
or modified tests.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn site goal succeeds with this patch.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9206//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9206//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9206//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9206//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9206//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9206//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9206//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9206//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9206//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9206//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9206//console

This message is automatically generated.



[jira] [Commented] (HBASE-6618) Implement FuzzyRowFilter with ranges support

2014-04-04 Thread Alex Baranau (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13960916#comment-13960916
 ] 

Alex Baranau commented on HBASE-6618:
-

[~kuzmiigo] 

bq. I thought that the value in the fixed part is checked as whole, but the 
code actually checks its bytes in isolation, so the rule is actually 0(0 - 
9)(0 - 9)(1 - 9)

Not true: aa68 will satisfy the rule ??(53 - 97). I added a test specifically 
for that:

{code}
// Range
Assert.assertEquals(FuzzyRowFilter.SatisfiesCode.YES,
    FuzzyRowFilter.satisfies(
        new byte[]{1, 1, 6, 8},
        new Triple<byte[], byte[], byte[]>(
            new byte[]{0, 0, 1, 1}, // mask
            new byte[]{1, 1, 5, 6}, // lower bytes
            new byte[]{1, 1, 9, 7}))); // upper bytes
{code}
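
To make the distinction concrete, here is a small self-contained sketch contrasting the two readings of a range. The class RangeCheckSketch and both method names are hypothetical and are not the patch's satisfies(); the sketch only compares the ranged segment (positions 2 and 3 of the arrays in the test above).

```java
/** Hypothetical comparison of "range as a whole value" versus "each byte in isolation". */
final class RangeCheckSketch {

  /** True if row[from, to) lies between lower[from, to) and upper[from, to) as a whole. */
  static boolean inRangeWhole(byte[] row, byte[] lower, byte[] upper, int from, int to) {
    return compare(row, lower, from, to) >= 0 && compare(row, upper, from, to) <= 0;
  }

  /** True only if every single byte lies between the bounds at its own position. */
  static boolean inRangePerByte(byte[] row, byte[] lower, byte[] upper, int from, int to) {
    for (int k = from; k < to; k++) {
      int v = row[k] & 0xFF;
      if (v < (lower[k] & 0xFF) || v > (upper[k] & 0xFF)) {
        return false;
      }
    }
    return true;
  }

  /** Lexicographic comparison of a[from, to) and b[from, to) as unsigned bytes. */
  private static int compare(byte[] a, byte[] b, int from, int to) {
    for (int k = from; k < to; k++) {
      int diff = (a[k] & 0xFF) - (b[k] & 0xFF);
      if (diff != 0) {
        return diff;
      }
    }
    return 0;
  }
}
```

For the row {1, 1, 6, 8} with bounds {1, 1, 5, 6} and {1, 1, 9, 7}, the whole-value reading accepts the segment (6, 8), while the per-byte reading rejects it because 8 is greater than 7 at the last position.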



[jira] [Commented] (HBASE-6618) Implement FuzzyRowFilter with ranges support

2014-04-04 Thread Alex Baranau (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13960924#comment-13960924
 ] 

Alex Baranau commented on HBASE-6618:
-

Updated the patch and also uploaded it to Review Board at 
https://reviews.apache.org/r/8786. It is a very small change to fit the latest 
trunk. [~yuzhih...@gmail.com], if you have time by chance, I would very much 
appreciate a review. This version is much better and much more flexible than 
the current one. Thanks a lot in advance.

 Implement FuzzyRowFilter with ranges support
 

 Key: HBASE-6618
 URL: https://issues.apache.org/jira/browse/HBASE-6618
 Project: HBase
  Issue Type: New Feature
  Components: Filters
Reporter: Alex Baranau
Assignee: Alex Baranau
Priority: Minor
 Fix For: 0.99.0

 Attachments: HBASE-6618-algo-desc-bits.png, HBASE-6618-algo.patch, 
 HBASE-6618.patch, HBASE-6618_2.path, HBASE-6618_3.path, HBASE-6618_4.patch




[jira] [Commented] (HBASE-6618) Implement FuzzyRowFilter with ranges support

2014-04-04 Thread Alex Baranau (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13960927#comment-13960927
 ] 

Alex Baranau commented on HBASE-6618:
-

I mean better than the one currently available in HBase.

 Implement FuzzyRowFilter with ranges support
 

 Key: HBASE-6618
 URL: https://issues.apache.org/jira/browse/HBASE-6618
 Project: HBase
  Issue Type: New Feature
  Components: Filters
Reporter: Alex Baranau
Assignee: Alex Baranau
Priority: Minor
 Fix For: 0.99.0

 Attachments: HBASE-6618-algo-desc-bits.png, HBASE-6618-algo.patch, 
 HBASE-6618.patch, HBASE-6618_2.path, HBASE-6618_3.path, HBASE-6618_4.patch




[jira] [Commented] (HBASE-6618) Implement FuzzyRowFilter with ranges support

2014-04-04 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13960955#comment-13960955
 ] 

Ted Yu commented on HBASE-6618:
---

If a client compiled with FuzzyRowFilter before this change uses 
FuzzyRowFilter in a Scan, would the server side be able to handle it?

Please add more tests for the new rules. This would make the code more robust 
and help detect regressions.

 Implement FuzzyRowFilter with ranges support
 

 Key: HBASE-6618
 URL: https://issues.apache.org/jira/browse/HBASE-6618
 Project: HBase
  Issue Type: New Feature
  Components: Filters
Reporter: Alex Baranau
Assignee: Alex Baranau
Priority: Minor
 Fix For: 0.99.0

 Attachments: HBASE-6618-algo-desc-bits.png, HBASE-6618-algo.patch, 
 HBASE-6618.patch, HBASE-6618_2.path, HBASE-6618_3.path, HBASE-6618_4.patch




[jira] [Commented] (HBASE-6618) Implement FuzzyRowFilter with ranges support

2014-04-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13960963#comment-13960963
 ] 

Hadoop QA commented on HBASE-6618:
--

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12638828/HBASE-6618_4.patch
  against trunk revision .
  ATTACHMENT ID: 12638828

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 9 new 
or modified tests.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn site goal succeeds with this patch.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9204//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9204//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9204//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9204//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9204//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9204//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9204//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9204//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9204//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9204//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9204//console

This message is automatically generated.

 Implement FuzzyRowFilter with ranges support
 

 Key: HBASE-6618
 URL: https://issues.apache.org/jira/browse/HBASE-6618
 Project: HBase
  Issue Type: New Feature
  Components: Filters
Reporter: Alex Baranau
Assignee: Alex Baranau
Priority: Minor
 Fix For: 0.99.0

 Attachments: HBASE-6618-algo-desc-bits.png, HBASE-6618-algo.patch, 
 HBASE-6618.patch, HBASE-6618_2.path, HBASE-6618_3.path, HBASE-6618_4.patch




[jira] [Commented] (HBASE-6618) Implement FuzzyRowFilter with ranges support

2014-04-03 Thread Igor Kuzmitshov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13959055#comment-13959055
 ] 

Igor Kuzmitshov commented on HBASE-6618:


Please note that in the version proposed by me (aa68 should satisfy rule ??(53 
- 97)) it's not possible to have adjacent ranges in the rule: the high-level 
??(10-19)(00-30) and ??(1000-1930) will be written as the same range (key 
start, key end, mask): ??1000, ??1930, 11. This can be solved by using 
different values in the mask (it would be more convenient to use 0 for 
non-fixed bytes, 1 for range 1, 2 for range 2 and so on).
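To illustrate the mask idea above, here is a minimal sketch of how a mask with range ids could be split into separate ranges. It is not taken from any attached patch: the class `RangeMask`, the method `segments`, and the convention "0 = non-fixed, k > 0 = byte belongs to range k" are assumptions made only for this example.

```java
import java.util.ArrayList;
import java.util.List;

class RangeMask {
    /**
     * Hypothetical helper: returns {start, endExclusive, rangeId} for every
     * run of equal non-zero mask values. Zero marks a non-fixed byte.
     */
    static List<int[]> segments(byte[] mask) {
        List<int[]> out = new ArrayList<>();
        int i = 0;
        while (i < mask.length) {
            if (mask[i] == 0) { // non-fixed position, not part of any range
                i++;
                continue;
            }
            int start = i;
            byte id = mask[i];
            while (i < mask.length && mask[i] == id) {
                i++;
            }
            out.add(new int[] {start, i, id});
        }
        return out;
    }

    public static void main(String[] args) {
        // ??(10-19)(00-30): two adjacent 2-byte ranges
        System.out.println(segments(new byte[] {0, 0, 1, 1, 2, 2}).size());
        // ??(1000-1930): one 4-byte range
        System.out.println(segments(new byte[] {0, 0, 1, 1, 1, 1}).size());
    }
}
```

With distinct ids the two rules no longer collapse into the same (key start, key end, mask) triple, because the mask itself records where one range ends and the next begins.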

 Implement FuzzyRowFilter with ranges support
 

 Key: HBASE-6618
 URL: https://issues.apache.org/jira/browse/HBASE-6618
 Project: HBase
  Issue Type: New Feature
  Components: Filters
Reporter: Alex Baranau
Assignee: Alex Baranau
Priority: Minor
 Fix For: 0.99.0

 Attachments: HBASE-6618-algo-desc-bits.png, HBASE-6618-algo.patch, 
 HBASE-6618.patch, HBASE-6618_2.path, HBASE-6618_3.path




[jira] [Commented] (HBASE-6618) Implement FuzzyRowFilter with ranges support

2014-02-28 Thread Igor Kuzmitshov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13915732#comment-13915732
 ] 

Igor Kuzmitshov commented on HBASE-6618:


Looking at the description above that rule (0001 - 0999) means any 4 bytes 
followed by any 4 bytes value between 0001 and 0999, I thought that the value 
in the fixed part is checked as whole, but the code actually checks its bytes 
in isolation, so the rule is actually 0(0 - 9)(0 - 9)(1 - 9).

It's fine for ranges like this, but let's take another: ??(53 - 97). I would 
expect aa68 to satisfy the rule, but in the proposed implementation it doesn't 
(because bytes are checked in isolation and 8 is outside the range \[3, 7\]). 
Could you clarify if this is the intended behaviour?
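The two readings of the rule can be contrasted with a small sketch. This is not the code from the patch: `RangeSemantics`, `perByte` and `wholeValue` are hypothetical names, only the range part of the key is passed in, and `Arrays.compareUnsigned` assumes Java 9 or later.

```java
import java.util.Arrays;

class RangeSemantics {
    /** Each byte is checked on its own against [lo[i], hi[i]]. */
    static boolean perByte(byte[] v, byte[] lo, byte[] hi) {
        for (int i = 0; i < v.length; i++) {
            int b = v[i] & 0xFF; // compare as unsigned
            if (b < (lo[i] & 0xFF) || b > (hi[i] & 0xFF)) {
                return false;
            }
        }
        return true;
    }

    /** The bytes are compared as one value: lo <= v <= hi, lexicographically. */
    static boolean wholeValue(byte[] v, byte[] lo, byte[] hi) {
        return Arrays.compareUnsigned(lo, v) <= 0
            && Arrays.compareUnsigned(v, hi) <= 0;
    }

    public static void main(String[] args) {
        byte[] v = {'6', '8'};
        byte[] lo = {'5', '3'};
        byte[] hi = {'9', '7'};
        System.out.println("per byte:    " + perByte(v, lo, hi));
        System.out.println("whole value: " + wholeValue(v, lo, hi));
    }
}
```

For 68 against (53 - 97), the per-byte check rejects the key because 8 is outside \[3, 7\], while the whole-value check accepts it.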

If yes, i.e. aa68 should not satisfy rule ??(53 - 97):
It would be nice to make it more clear in the description that all bytes are 
checked in isolation and there are actually no n-bytes values. In this case, 
there is a bug: for rule ??(50 - 97) and value MM58 (where M is max byte \xFF), 
satisfies() returns SatisfiesCode.NO_NEXT because nextRowKeyCandidateExists is 
only updated for non-fixed positions. It should return NEXT_EXISTS, because 
MM60 should be the next key.
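A minimal sketch of why a next key exists in this case, assuming the per-byte reading and one \[lo, hi\] interval per position (non-fixed positions get \[0x00, 0xFF\]). `PerByteNextKey` and `nextKey` are hypothetical names; this is not the patch's getNextForFuzzyRule().

```java
import java.util.Arrays;

class PerByteNextKey {
    /**
     * Returns the smallest key >= row whose every byte lies in [lo[i], hi[i]],
     * or null if no such key exists. All arrays must have the same length.
     */
    static byte[] nextKey(byte[] row, byte[] lo, byte[] hi) {
        byte[] next = Arrays.copyOf(row, row.length);
        for (int i = 0; i < next.length; i++) {
            int b = next[i] & 0xFF;
            if (b < (lo[i] & 0xFF)) {
                // Too small: raise this byte and all later bytes to their lower bounds.
                for (int k = i; k < next.length; k++) {
                    next[k] = lo[k];
                }
                return next;
            }
            if (b > (hi[i] & 0xFF)) {
                // Too large: increment the nearest earlier byte that still has room,
                // then reset everything after it to the lower bounds.
                for (int j = i - 1; j >= 0; j--) {
                    if ((next[j] & 0xFF) < (hi[j] & 0xFF)) {
                        next[j]++;
                        for (int k = j + 1; k < next.length; k++) {
                            next[k] = lo[k];
                        }
                        return next;
                    }
                }
                return null; // every earlier byte is already at its maximum
            }
        }
        return next; // row already satisfies the rule
    }

    public static void main(String[] args) {
        byte m = (byte) 0xFF;
        byte[] row = {m, m, '5', '8'};
        byte[] lo = {0, 0, '5', '0'};
        byte[] hi = {m, m, '9', '7'};
        System.out.println(Arrays.toString(nextKey(row, lo, hi)));
    }
}
```

For MM58 the last byte is too large, the two M bytes have no room, but the byte 5 in the range part can still be incremented, which gives MM60. That is why positions inside the range have to be considered when deciding whether a next candidate exists.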

If no, i.e. aa68 should satisfy rule ??(53 - 97):
In this case, satisfies() should be fixed. I made a patch with the fix and can 
add it if needed. It also has a small optimisation when there is no need to 
check less significant bytes. For example: for range \[120, 500\] and key 345, 
it will compare the first byte (3) only, as it's clear that the whole value is 
in the range.
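The early-exit idea can be sketched as follows. This is an illustration under assumptions, not the patch mentioned above: `WholeValueRange` and `inRange` are hypothetical names, and all three arrays are assumed to have the same length.

```java
class WholeValueRange {
    /** Returns true if lower <= value <= upper, comparing bytes as unsigned. */
    static boolean inRange(byte[] value, byte[] lower, byte[] upper) {
        boolean aboveLower = false; // value already strictly greater than lower
        boolean belowUpper = false; // value already strictly smaller than upper
        for (int i = 0; i < value.length; i++) {
            int v = value[i] & 0xFF;
            if (!aboveLower) {
                int lo = lower[i] & 0xFF;
                if (v < lo) return false;
                if (v > lo) aboveLower = true;
            }
            if (!belowUpper) {
                int hi = upper[i] & 0xFF;
                if (v > hi) return false;
                if (v < hi) belowUpper = true;
            }
            if (aboveLower && belowUpper) {
                return true; // less significant bytes cannot change the result
            }
        }
        return true; // value equals one of the bounds on the remaining bytes
    }

    public static void main(String[] args) {
        byte[] lower = {'1', '2', '0'};
        byte[] upper = {'5', '0', '0'};
        System.out.println(inRange(new byte[] {'3', '4', '5'}, lower, upper));
    }
}
```

For range \[120, 500\] and key 345, the loop returns after the first byte, since 3 is strictly between 1 and 5.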

In any case, tests might include testing satisfies() with ranges (the current 
patch only adds tests for getNextForFuzzyRule() with ranges).

 Implement FuzzyRowFilter with ranges support
 

 Key: HBASE-6618
 URL: https://issues.apache.org/jira/browse/HBASE-6618
 Project: HBase
  Issue Type: New Feature
  Components: Filters
Reporter: Alex Baranau
Assignee: Alex Baranau
Priority: Minor
 Fix For: 0.99.0

 Attachments: HBASE-6618-algo-desc-bits.png, HBASE-6618-algo.patch, 
 HBASE-6618.patch, HBASE-6618_2.path, HBASE-6618_3.path





--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-6618) Implement FuzzyRowFilter with ranges support

2013-12-24 Thread Alex Baranau (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13856495#comment-13856495
 ] 

Alex Baranau commented on HBASE-6618:
-

Yeah, it looks like nobody has looked at the patch, even though I know others 
use it. Weird, and I don't know how to push anyone to review it.

I'm not sure the patch still fits the latest version. If any committer is still 
interested (I hope so) in taking a look and proceeding with the issue, I will 
check it and make sure it is good for the latest version.

 Implement FuzzyRowFilter with ranges support
 

 Key: HBASE-6618
 URL: https://issues.apache.org/jira/browse/HBASE-6618
 Project: HBase
  Issue Type: New Feature
  Components: Filters
Reporter: Alex Baranau
Assignee: Alex Baranau
Priority: Minor
 Fix For: 0.99.0

 Attachments: HBASE-6618-algo-desc-bits.png, HBASE-6618-algo.patch, 
 HBASE-6618.patch, HBASE-6618_2.path, HBASE-6618_3.path




[jira] [Commented] (HBASE-6618) Implement FuzzyRowFilter with ranges support

2013-12-24 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13856496#comment-13856496
 ] 

Ted Yu commented on HBASE-6618:
---

Alex:
I should have time to review.
Please update the patch. 

Thanks

 Implement FuzzyRowFilter with ranges support
 

 Key: HBASE-6618
 URL: https://issues.apache.org/jira/browse/HBASE-6618
 Project: HBase
  Issue Type: New Feature
  Components: Filters
Reporter: Alex Baranau
Assignee: Alex Baranau
Priority: Minor
 Fix For: 0.99.0

 Attachments: HBASE-6618-algo-desc-bits.png, HBASE-6618-algo.patch, 
 HBASE-6618.patch, HBASE-6618_2.path, HBASE-6618_3.path




[jira] [Commented] (HBASE-6618) Implement FuzzyRowFilter with ranges support

2013-03-07 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13596725#comment-13596725
 ] 

James Taylor commented on HBASE-6618:
-

Would it be possible to generalize this a bit further to handle variable-length 
key parts, assuming you know the terminator (both Phoenix and Orderly use a 
0-byte terminator)? With the work you've already done here to support ranges, 
you could support a good set of skip-scanning scenarios for multi-part row keys.

Take for example a fuzzy row filter expressed for a two part row key of VARCHAR 
+ byte[4] like this:

'foo%' (this would be for the VARCHAR key part - 'foo' followed by zero or 
more characters)
 [4000-6000) (this would be for the INT key part - 4000 inclusive to 6000 
exclusive)

In this case (as you've already pointed out), you can use the first row key as 
your guide. Let's say the first row key is ['foobar'][1000]. You could form a 
skip hint as ['foobar'][4000] (i.e. 'foobar' + new byte[] {0} + new byte[] 
{1,0,0,0}).
Then you'd let all values pass until you got to ['foobar'][6000], in which case 
you'd form your next skip hint.
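As a rough sketch of forming such a skip hint: the variable-length part, a 0-byte terminator, then the fixed-width lower bound. `SkipHint` and `build` are hypothetical names, and the fixed-width bytes are passed in already encoded, since the byte arrays in this thread are placeholders rather than a real INT encoding.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class SkipHint {
    /** Concatenates varcharPart + {0} + fixedPart into one row key hint. */
    static byte[] build(byte[] varcharPart, byte[] fixedPart) {
        byte[] hint = new byte[varcharPart.length + 1 + fixedPart.length];
        System.arraycopy(varcharPart, 0, hint, 0, varcharPart.length);
        hint[varcharPart.length] = 0; // terminator of the variable-length part
        System.arraycopy(fixedPart, 0, hint, varcharPart.length + 1, fixedPart.length);
        return hint;
    }

    public static void main(String[] args) {
        byte[] prefix = "foobar".getBytes(StandardCharsets.UTF_8);
        // Placeholder bytes standing in for the encoded lower bound 4000.
        byte[] lowerBound = {4, 0, 0, 0};
        System.out.println(Arrays.toString(build(prefix, lowerBound)));
    }
}
```

A filter could return a key built this way as its next-row hint whenever the current row's fixed-width part is below the range.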



 Implement FuzzyRowFilter with ranges support
 

 Key: HBASE-6618
 URL: https://issues.apache.org/jira/browse/HBASE-6618
 Project: HBase
  Issue Type: New Feature
  Components: Filters
Reporter: Alex Baranau
Assignee: Alex Baranau
Priority: Minor
 Attachments: HBASE-6618_2.path, HBASE-6618_3.path, 
 HBASE-6618-algo-desc-bits.png, HBASE-6618-algo.patch, HBASE-6618.patch



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-6618) Implement FuzzyRowFilter with ranges support

2013-03-07 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13596734#comment-13596734
 ] 

Ted Yu commented on HBASE-6618:
---

bq. skip hint as ['foobar'][4000] (i.e. 'foobar' + new byte[] {0} + new byte[] 
{1,0,0,0}).
Is there a typo above ?
Should it be:
{code}
'foobar' + new byte[] {0} + new byte[] {4,0,0,0}
{code}

 Implement FuzzyRowFilter with ranges support
 

 Key: HBASE-6618
 URL: https://issues.apache.org/jira/browse/HBASE-6618
 Project: HBase
  Issue Type: New Feature
  Components: Filters
Reporter: Alex Baranau
Assignee: Alex Baranau
Priority: Minor
 Attachments: HBASE-6618_2.path, HBASE-6618_3.path, 
 HBASE-6618-algo-desc-bits.png, HBASE-6618-algo.patch, HBASE-6618.patch




[jira] [Commented] (HBASE-6618) Implement FuzzyRowFilter with ranges support

2013-03-07 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13596741#comment-13596741
 ] 

James Taylor commented on HBASE-6618:
-

Yes, that's a typo. Thanks - you've got good eyes!

 Implement FuzzyRowFilter with ranges support
 

 Key: HBASE-6618
 URL: https://issues.apache.org/jira/browse/HBASE-6618
 Project: HBase
  Issue Type: New Feature
  Components: Filters
Reporter: Alex Baranau
Assignee: Alex Baranau
Priority: Minor
 Attachments: HBASE-6618_2.path, HBASE-6618_3.path, 
 HBASE-6618-algo-desc-bits.png, HBASE-6618-algo.patch, HBASE-6618.patch




[jira] [Commented] (HBASE-6618) Implement FuzzyRowFilter with ranges support

2013-01-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13543171#comment-13543171
 ] 

Hadoop QA commented on HBASE-6618:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12563115/HBASE-6618_3.path
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 6 new 
or modified tests.

{color:green}+1 hadoop2.0{color}.  The patch compiles against the hadoop 
2.0 profile.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 lineLengths{color}.  The patch introduces lines longer than 
100

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
   
org.apache.hadoop.hbase.regionserver.TestMetricsRegionServer
  org.apache.hadoop.hbase.ipc.TestRpcMetrics

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3826//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3826//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3826//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3826//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3826//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3826//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3826//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3826//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3826//console

This message is automatically generated.

 Implement FuzzyRowFilter with ranges support
 

 Key: HBASE-6618
 URL: https://issues.apache.org/jira/browse/HBASE-6618
 Project: HBase
  Issue Type: New Feature
  Components: Filters
Reporter: Alex Baranau
Assignee: Alex Baranau
Priority: Minor
 Attachments: HBASE-6618_2.path, HBASE-6618_3.path, 
 HBASE-6618-algo-desc-bits.png, HBASE-6618-algo.patch, HBASE-6618.patch




[jira] [Commented] (HBASE-6618) Implement FuzzyRowFilter with ranges support

2013-01-03 Thread Alex Baranau (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13543182#comment-13543182
 ] 

Alex Baranau commented on HBASE-6618:
-

bq. -1 lineLengths. The patch introduces lines longer than 100

I guess that's because of generated code (from *.proto). That should be OK, 
right?

 Implement FuzzyRowFilter with ranges support
 

 Key: HBASE-6618
 URL: https://issues.apache.org/jira/browse/HBASE-6618
 Project: HBase
  Issue Type: New Feature
  Components: Filters
Reporter: Alex Baranau
Assignee: Alex Baranau
Priority: Minor
 Attachments: HBASE-6618_2.path, HBASE-6618_3.path, 
 HBASE-6618-algo-desc-bits.png, HBASE-6618-algo.patch, HBASE-6618.patch




[jira] [Commented] (HBASE-6618) Implement FuzzyRowFilter with ranges support

2012-12-31 Thread Alex Baranau (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13541465#comment-13541465
 ] 

Alex Baranau commented on HBASE-6618:
-

Created https://reviews.apache.org/r/8786. I think I will add more unit-tests. 
Comments are very welcome!

 Implement FuzzyRowFilter with ranges support
 

 Key: HBASE-6618
 URL: https://issues.apache.org/jira/browse/HBASE-6618
 Project: HBase
  Issue Type: New Feature
  Components: Filters
Reporter: Alex Baranau
Priority: Minor
 Attachments: HBASE-6618-algo-desc-bits.png, HBASE-6618-algo.patch, 
 HBASE-6618.patch




[jira] [Commented] (HBASE-6618) Implement FuzzyRowFilter with ranges support

2012-12-31 Thread Anil Gupta (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13541468#comment-13541468
 ] 

Anil Gupta commented on HBASE-6618:
---

Hi Alex,

Actually, we decided not to use full-table scans for this type of query. Hence, 
I could not devote time to this. We are trying out a secondary index in HBase.
Sorry for the late update.

Wish you a Happy New Year!
~Anil Gupta
Software Engineer II, Intuit, Inc 

 Implement FuzzyRowFilter with ranges support
 

 Key: HBASE-6618
 URL: https://issues.apache.org/jira/browse/HBASE-6618
 Project: HBase
  Issue Type: New Feature
  Components: Filters
Reporter: Alex Baranau
Assignee: Alex Baranau
Priority: Minor
 Attachments: HBASE-6618-algo-desc-bits.png, HBASE-6618-algo.patch, 
 HBASE-6618.patch




[jira] [Commented] (HBASE-6618) Implement FuzzyRowFilter with ranges support

2012-12-31 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13541510#comment-13541510
 ] 

Hadoop QA commented on HBASE-6618:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12562810/HBASE-6618.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 6 new 
or modified tests.

{color:green}+1 hadoop2.0{color}.  The patch compiles against the hadoop 
2.0 profile.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 lineLengths{color}.  The patch introduces lines longer than 
100

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
   
org.apache.hadoop.hbase.regionserver.TestMetricsRegionServer
  org.apache.hadoop.hbase.filter.TestFuzzyRowFilter
  org.apache.hadoop.hbase.ipc.TestRpcMetrics

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3788//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3788//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3788//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3788//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3788//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3788//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3788//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3788//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3788//console

This message is automatically generated.

 Implement FuzzyRowFilter with ranges support
 

 Key: HBASE-6618
 URL: https://issues.apache.org/jira/browse/HBASE-6618
 Project: HBase
  Issue Type: New Feature
  Components: Filters
Reporter: Alex Baranau
Assignee: Alex Baranau
Priority: Minor
 Attachments: HBASE-6618-algo-desc-bits.png, HBASE-6618-algo.patch, 
 HBASE-6618.patch




[jira] [Commented] (HBASE-6618) Implement FuzzyRowFilter with ranges support

2012-12-31 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13541532#comment-13541532
 ] 

Hadoop QA commented on HBASE-6618:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12562822/HBASE-6618_2.path
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 6 new 
or modified tests.

{color:green}+1 hadoop2.0{color}.  The patch compiles against the hadoop 
2.0 profile.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 lineLengths{color}.  The patch introduces lines longer than 
100

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
   
org.apache.hadoop.hbase.regionserver.TestMetricsRegionServer
  org.apache.hadoop.hbase.ipc.TestRpcMetrics

 {color:red}-1 core zombie tests{color}.  There are 3 zombie test(s): 

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3789//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3789//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3789//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3789//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3789//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3789//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3789//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3789//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3789//console

This message is automatically generated.

 Implement FuzzyRowFilter with ranges support
 

 Key: HBASE-6618
 URL: https://issues.apache.org/jira/browse/HBASE-6618
 Project: HBase
  Issue Type: New Feature
  Components: Filters
Reporter: Alex Baranau
Assignee: Alex Baranau
Priority: Minor
 Attachments: HBASE-6618_2.path, HBASE-6618-algo-desc-bits.png, 
 HBASE-6618-algo.patch, HBASE-6618.patch




[jira] [Commented] (HBASE-6618) Implement FuzzyRowFilter with ranges support

2012-08-27 Thread Alex Baranau (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13442944#comment-13442944
 ] 

Alex Baranau commented on HBASE-6618:
-

Weird. I can open it. Anyhow, sent it to your email.

 Implement FuzzyRowFilter with ranges support
 

 Key: HBASE-6618
 URL: https://issues.apache.org/jira/browse/HBASE-6618
 Project: HBase
  Issue Type: New Feature
  Components: filters
Reporter: Alex Baranau
Priority: Minor
 Attachments: HBASE-6618-algo-desc-bits.png, HBASE-6618-algo.patch




[jira] [Commented] (HBASE-6618) Implement FuzzyRowFilter with ranges support

2012-08-26 Thread Anil Gupta (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13442276#comment-13442276
 ] 

Anil Gupta commented on HBASE-6618:
---

Hi Alex,

I am still unable to access the PNG file for the algorithm. Is there some 
problem with the JIRA system, or can you re-upload the image?

Thanks,
Anil Gupta
Software Engineer II, Intuit, Inc 

 Implement FuzzyRowFilter with ranges support
 

 Key: HBASE-6618
 URL: https://issues.apache.org/jira/browse/HBASE-6618
 Project: HBase
  Issue Type: New Feature
  Components: filters
Reporter: Alex Baranau
Priority: Minor
 Attachments: HBASE-6618-algo-desc-bits.png, HBASE-6618-algo.patch




[jira] [Commented] (HBASE-6618) Implement FuzzyRowFilter with ranges support

2012-08-23 Thread Alex Baranau (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13440242#comment-13440242
 ] 

Alex Baranau commented on HBASE-6618:
-

Ah, sorry, I haven't said anything about that. For toInc, we may not change it 
at every step, so a missing arrow means that nothing should be 
changed.

Thanks for checking it out!

One thing that I'm not 100% sure about: is it better to adjust the current 
FuzzyRowFilter and add this functionality to it, or to add a new filter? I'm leaning 
towards adjusting FuzzyRowFilter, as this new feature fits naturally in it. Thoughts?

 Implement FuzzyRowFilter with ranges support
 

 Key: HBASE-6618
 URL: https://issues.apache.org/jira/browse/HBASE-6618
 Project: HBase
  Issue Type: New Feature
  Components: filters
Reporter: Alex Baranau
Priority: Minor
 Attachments: HBASE-6618-algo-desc-bits.png, HBASE-6618-algo.patch






[jira] [Commented] (HBASE-6618) Implement FuzzyRowFilter with ranges support

2012-08-23 Thread Zhihong Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13440268#comment-13440268
 ] 

Zhihong Ted Yu commented on HBASE-6618:
---

Enhancing existing class is fine. 

 Implement FuzzyRowFilter with ranges support
 

 Key: HBASE-6618
 URL: https://issues.apache.org/jira/browse/HBASE-6618
 Project: HBase
  Issue Type: New Feature
  Components: filters
Reporter: Alex Baranau
Priority: Minor
 Attachments: HBASE-6618-algo-desc-bits.png, HBASE-6618-algo.patch






[jira] [Commented] (HBASE-6618) Implement FuzzyRowFilter with ranges support

2012-08-22 Thread Anil Gupta (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13439311#comment-13439311
 ] 

Anil Gupta commented on HBASE-6618:
---

Hi Alex,

I agree with your idea of a range-based fuzzy filter. However, I would like to take 
a phased approach in developing this.
In your proposal, the user can provide multiple fuzzy ranges in a single scan, 
i.e. any 4 bytes, any 6-byte value between _0001 and 0099, any 3 
bytes, any 4-byte value between _001 and _099.
Instead of the above, IMO let's first try to make a filter for either any 4 bytes, 
any 6-byte value between _0001 and 0099, any 3 bytes; or any 4 bytes, any 6-byte 
value between _0001 and 0099. Once we develop this, we can 
enhance it to use multiple fuzzy ranges. This is just my thought on how to approach 
developing this. Let me know your opinion.

Since this week, I have had to shift focus at work from HBase to Hive and HCatalog 
for another POC. So, I'll be squeezing time for this JIRA out of my work 
schedule. I'll start looking into the current implementation of FuzzyRowFilter 
to get an idea of the implementation.

Thanks,
Anil Gupta
Software Engineer II, Intuit, Inc 

 Implement FuzzyRowFilter with ranges support
 

 Key: HBASE-6618
 URL: https://issues.apache.org/jira/browse/HBASE-6618
 Project: HBase
  Issue Type: New Feature
  Components: filters
Reporter: Alex Baranau
Priority: Minor





[jira] [Commented] (HBASE-6618) Implement FuzzyRowFilter with ranges support

2012-08-22 Thread Zhihong Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13440039#comment-13440039
 ] 

Zhihong Ted Yu commented on HBASE-6618:
---

Thanks for the update, Alex.
I get your idea, though a few arrows seem to be missing (e.g. CCF is ?) in the 
diagram for toInc.

 Implement FuzzyRowFilter with ranges support
 

 Key: HBASE-6618
 URL: https://issues.apache.org/jira/browse/HBASE-6618
 Project: HBase
  Issue Type: New Feature
  Components: filters
Reporter: Alex Baranau
Priority: Minor
 Attachments: HBASE-6618-algo-desc-bits.png, HBASE-6618-algo.patch






[jira] [Commented] (HBASE-6618) Implement FuzzyRowFilter with ranges support

2012-08-20 Thread Alex Baranau (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13438134#comment-13438134
 ] 

Alex Baranau commented on HBASE-6618:
-

Just an idea. Maybe we should try to improve the existing FuzzyRowFilter by allowing 
each fuzzy rule to be specified with:
* fuzzy key start
* fuzzy key end (this is currently missing in FuzzyRowFilter)
* mask

This looks flexible enough to me. E.g. one could specify rule 
(_0001_-_0099_)???(_001-_099), i.e. any 4 bytesany 6 bytes value between 
_0001_ and _0099_any 3 bytesany 4 bytes value between _001 and 
_099 with this definition:
* _0001_???_001
* _0099_???_099  currently missing
* 00111

In this case any sequence of fixed positions is treated as one n-byte value.

--
Alternatively, such a fuzzy rule can be specified as a list of parts, each part 
being one of:
* n fuzzy bytes
* start/stop key part range (of the same length)

This might be closer to a human-readable definition, though the former one 
could be easier to deal with.

Anil, as you expressed willingness to work on this, what are your thoughts? Maybe 
you have something different in mind?

 Implement FuzzyRowFilter with ranges support
 

 Key: HBASE-6618
 URL: https://issues.apache.org/jira/browse/HBASE-6618
 Project: HBase
  Issue Type: New Feature
  Components: filters
Reporter: Alex Baranau
Priority: Minor





[jira] [Commented] (HBASE-6618) Implement FuzzyRowFilter with ranges support

2012-08-20 Thread Alex Baranau (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13438147#comment-13438147
 ] 

Alex Baranau commented on HBASE-6618:
-

Sorry for the spam. For some reason I cannot edit the comment, and JIRA broke the 
formatting of the text pieces in my previous comment (I should have checked 
that first, sorry). This is how it is supposed to look:

Just an idea. Maybe we should try to improve the existing FuzzyRowFilter by allowing 
each fuzzy rule to be specified with:
* fuzzy key start
* fuzzy key end (this is currently missing in FuzzyRowFilter)
* mask

This looks flexible enough to me. E.g. one could specify the rule ????(0001 - 
0999)???(001 - 099), i.e. any 4 bytes, a 4-byte value between 0001 and 
0999, any 3 bytes, a 3-byte value between 001 and 099, with this 
definition:
* ????0001???001
* ????0999???099 (currently missing)
* 111000

In this case any sequence of fixed positions is treated as one n-byte value.
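
A minimal sketch of how a row could be checked against such a start/end/mask rule, in plain Java without any HBase dependency. The class and method names are hypothetical and this is not the attached patch. It assumes the mask holds one byte per key position, with 1 marking a fuzzy position and 0 a fixed one (the convention of the existing FuzzyRowFilter fuzzy info), that both bounds are inclusive, and that rows shorter than the mask never match. Each run of fixed positions is compared as one unsigned n-byte value.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not the HBASE-6618 patch: matches a row key against
// a fuzzy rule given as lower key, upper key and a per-position mask.
class FuzzyRangeRuleSketch {

    static byte[] ascii(String s) {
        return s.getBytes(StandardCharsets.US_ASCII);
    }

    // Unsigned lexicographic comparison of a[from..to) with b[from..to).
    static int compareSegment(byte[] a, byte[] b, int from, int to) {
        for (int k = from; k < to; k++) {
            int diff = (a[k] & 0xff) - (b[k] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return 0;
    }

    // mask[i] != 0: position i is fuzzy (any byte); mask[i] == 0: position i is fixed.
    static boolean matches(byte[] row, byte[] mask, byte[] lower, byte[] upper) {
        if (row.length < mask.length) {
            return false; // assumption: short rows never match
        }
        int i = 0;
        while (i < mask.length) {
            if (mask[i] != 0) {
                i++; // fuzzy position, nothing to check
                continue;
            }
            int start = i;
            while (i < mask.length && mask[i] == 0) {
                i++; // extend the run of fixed positions
            }
            // The whole run must lie within [lower, upper] as one n-byte value.
            if (compareSegment(row, lower, start, i) < 0
                    || compareSegment(row, upper, start, i) > 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Mask written out in full for the 14-byte example rule (an assumption).
        byte[] mask = {1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0};
        byte[] lower = ascii("????0001???001");
        byte[] upper = ascii("????0999???099");
        System.out.println(matches(ascii("abcd0500xyz050"), mask, lower, upper));
        System.out.println(matches(ascii("abcd1000xyz050"), mask, lower, upper));
    }
}
```

This only covers the yes/no check. The fast-forwarding hint (computing the next row key that could match) is the harder part and is not attempted here.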

Alternatively, such a fuzzy rule can be specified as a list of parts, each part 
being one of:
* n fuzzy bytes
* start/stop key part range (of the same length)

This might be closer to a human-readable definition, though the former one 
could be easier to deal with.
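
The two representations can be related by a small conversion. The sketch below (plain Java, hypothetical names, not an HBase API) flattens a list of parts into a lower key, an upper key and a mask. It assumes 1 marks a fuzzy position in the mask, and it leaves fuzzy positions of the keys as zero bytes, since their value is ignored.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: a fuzzy rule as a list of parts, converted to the
// (lower key, upper key, mask) form.
class FuzzyRulePartsSketch {

    // Either n fuzzy bytes (lower == null) or an inclusive range of equal-length bounds.
    static final class Part {
        final byte[] lower;
        final byte[] upper;
        final int length;

        private Part(byte[] lower, byte[] upper, int length) {
            this.lower = lower;
            this.upper = upper;
            this.length = length;
        }

        static Part fuzzy(int n) {
            return new Part(null, null, n);
        }

        static Part range(String lower, String upper) {
            if (lower.length() != upper.length()) {
                throw new IllegalArgumentException("range bounds must have the same length");
            }
            return new Part(lower.getBytes(StandardCharsets.US_ASCII),
                    upper.getBytes(StandardCharsets.US_ASCII), lower.length());
        }
    }

    // Returns {lowerKey, upperKey, mask}; mask is 1 at fuzzy positions, 0 at fixed ones.
    static byte[][] toTriple(List<Part> parts) {
        int total = 0;
        for (Part p : parts) {
            total += p.length;
        }
        byte[] lower = new byte[total];
        byte[] upper = new byte[total];
        byte[] mask = new byte[total];
        int pos = 0;
        for (Part p : parts) {
            if (p.lower == null) {
                Arrays.fill(mask, pos, pos + p.length, (byte) 1);
            } else {
                System.arraycopy(p.lower, 0, lower, pos, p.length);
                System.arraycopy(p.upper, 0, upper, pos, p.length);
            }
            pos += p.length;
        }
        return new byte[][] {lower, upper, mask};
    }

    public static void main(String[] args) {
        byte[][] triple = toTriple(Arrays.asList(
                Part.fuzzy(4), Part.range("0001", "0999"),
                Part.fuzzy(3), Part.range("001", "099")));
        System.out.println(Arrays.toString(triple[2]));
    }
}
```

A parts list like this may be easier to write by hand, while the flattened form is closer to the byte-array constructor parameters that the existing filters use.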

Anil, as you expressed willingness to work on this, what are your thoughts? Maybe 
you have something different in mind?

 Implement FuzzyRowFilter with ranges support
 

 Key: HBASE-6618
 URL: https://issues.apache.org/jira/browse/HBASE-6618
 Project: HBase
  Issue Type: New Feature
  Components: filters
Reporter: Alex Baranau
Priority: Minor
