[jira] [Commented] (HBASE-6618) Implement FuzzyRowFilter with ranges support
[ https://issues.apache.org/jira/browse/HBASE-6618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13965203#comment-13965203 ] Igor Kuzmitshov commented on HBASE-6618: [~alexb], you are right about keeping the mask separate, somehow I forgot that ? can be a “normal byte”, sorry. I have just checked other Filters, it seems that all are quite low-level and use byte arrays as constructor parameters. It makes sense to use byte arrays as parameters to be consistent, but adding a builder could be nice as well. For me, the biggest “inconvenience” (especially when using HBase shell) of constructing a FuzzyRowFilter is not in byte arrays themselves, but in Lists of Pairs (or Triples) of byte arrays. I would add a simpler constructor for one rule (I guess one rule would be enough quite often) and a separate method to add rules: {code} FuzzyRowFilter(byte[] fuzzyInfo, byte[] lowerBytes, byte[] upperBytes) void addRule(byte[] fuzzyInfo, byte[] lowerBytes, byte[] upperBytes) {code} Implement FuzzyRowFilter with ranges support Key: HBASE-6618 URL: https://issues.apache.org/jira/browse/HBASE-6618 Project: HBase Issue Type: New Feature Components: Filters Reporter: Alex Baranau Assignee: Alex Baranau Priority: Minor Fix For: 0.99.0 Attachments: HBASE-6618-algo-desc-bits.png, HBASE-6618-algo.patch, HBASE-6618.patch, HBASE-6618_2.path, HBASE-6618_3.path, HBASE-6618_4.patch, HBASE-6618_5.patch Apart from current ability to specify fuzzy row filter e.g. for userId_actionId format as _0004 (where 0004 - actionId) it would be great to also have ability to specify the fuzzy range , e.g. _0004, ..., _0099. See initial discussion here: http://search-hadoop.com/m/WVLJdX0Z65 Note: currently it is possible to provide multiple fuzzy row rules to existing FuzzyRowFilter, but in case when the range is big (contains thousands of values) it is not efficient. Filter should perform efficient fast-forwarding during the scan (this is what distinguishes it from regex row filter). While such functionality may seem like a proper fit for custom filter (i.e. not including into standard filter set) it looks like the filter may be very re-useable. We may judge based on the implementation that will hopefully be added. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-6618) Implement FuzzyRowFilter with ranges support
[ https://issues.apache.org/jira/browse/HBASE-6618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13965545#comment-13965545 ] Alex Baranau commented on HBASE-6618: - totally agree on overloading ctor. Will add that. Also will see if Builder makes sense: it'd help with these lists of pairs/triples. Thank you for looking at the patch, [~kuzmiigo]! Implement FuzzyRowFilter with ranges support Key: HBASE-6618 URL: https://issues.apache.org/jira/browse/HBASE-6618 Project: HBase Issue Type: New Feature Components: Filters Reporter: Alex Baranau Assignee: Alex Baranau Priority: Minor Fix For: 0.99.0 Attachments: HBASE-6618-algo-desc-bits.png, HBASE-6618-algo.patch, HBASE-6618.patch, HBASE-6618_2.path, HBASE-6618_3.path, HBASE-6618_4.patch, HBASE-6618_5.patch Apart from current ability to specify fuzzy row filter e.g. for userId_actionId format as _0004 (where 0004 - actionId) it would be great to also have ability to specify the fuzzy range , e.g. _0004, ..., _0099. See initial discussion here: http://search-hadoop.com/m/WVLJdX0Z65 Note: currently it is possible to provide multiple fuzzy row rules to existing FuzzyRowFilter, but in case when the range is big (contains thousands of values) it is not efficient. Filter should perform efficient fast-forwarding during the scan (this is what distinguishes it from regex row filter). While such functionality may seem like a proper fit for custom filter (i.e. not including into standard filter set) it looks like the filter may be very re-useable. We may judge based on the implementation that will hopefully be added. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-6618) Implement FuzzyRowFilter with ranges support
[ https://issues.apache.org/jira/browse/HBASE-6618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13964508#comment-13964508 ] Alex Baranau commented on HBASE-6618: - Making it more useable with better API for construction of the filter makes a lot of sense to me. It is not that simple as just defining human-readable string as bytes might not be human readable. I'm thinking about builder-like construction of the filter, should be helpful. As an addition to raw definition. Will share thoughts/changes very soon if I figure out a good way for it. bq. There are many notion when using FuzzyRowFilter, could we do these checks in the internal of FuzzyRowFilter ? not sure I got the question, sorry: which checks? Implement FuzzyRowFilter with ranges support Key: HBASE-6618 URL: https://issues.apache.org/jira/browse/HBASE-6618 Project: HBase Issue Type: New Feature Components: Filters Reporter: Alex Baranau Assignee: Alex Baranau Priority: Minor Fix For: 0.99.0 Attachments: HBASE-6618-algo-desc-bits.png, HBASE-6618-algo.patch, HBASE-6618.patch, HBASE-6618_2.path, HBASE-6618_3.path, HBASE-6618_4.patch, HBASE-6618_5.patch Apart from current ability to specify fuzzy row filter e.g. for userId_actionId format as _0004 (where 0004 - actionId) it would be great to also have ability to specify the fuzzy range , e.g. _0004, ..., _0099. See initial discussion here: http://search-hadoop.com/m/WVLJdX0Z65 Note: currently it is possible to provide multiple fuzzy row rules to existing FuzzyRowFilter, but in case when the range is big (contains thousands of values) it is not efficient. Filter should perform efficient fast-forwarding during the scan (this is what distinguishes it from regex row filter). While such functionality may seem like a proper fit for custom filter (i.e. not including into standard filter set) it looks like the filter may be very re-useable. We may judge based on the implementation that will hopefully be added. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-6618) Implement FuzzyRowFilter with ranges support
[ https://issues.apache.org/jira/browse/HBASE-6618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13964536#comment-13964536 ] Igor Kuzmitshov commented on HBASE-6618: Using (human-readable) strings instead of byte arrays seems possible when non-printable bytes are given in \x00 format (widely used in HBase) and conversions are done with toBytesBinary() and toStringBinary() of org.apache.hadoop.hbase.util.Bytes. Example: from ??a\x00 to ??c\x1F. Implement FuzzyRowFilter with ranges support Key: HBASE-6618 URL: https://issues.apache.org/jira/browse/HBASE-6618 Project: HBase Issue Type: New Feature Components: Filters Reporter: Alex Baranau Assignee: Alex Baranau Priority: Minor Fix For: 0.99.0 Attachments: HBASE-6618-algo-desc-bits.png, HBASE-6618-algo.patch, HBASE-6618.patch, HBASE-6618_2.path, HBASE-6618_3.path, HBASE-6618_4.patch, HBASE-6618_5.patch Apart from current ability to specify fuzzy row filter e.g. for userId_actionId format as _0004 (where 0004 - actionId) it would be great to also have ability to specify the fuzzy range , e.g. _0004, ..., _0099. See initial discussion here: http://search-hadoop.com/m/WVLJdX0Z65 Note: currently it is possible to provide multiple fuzzy row rules to existing FuzzyRowFilter, but in case when the range is big (contains thousands of values) it is not efficient. Filter should perform efficient fast-forwarding during the scan (this is what distinguishes it from regex row filter). While such functionality may seem like a proper fit for custom filter (i.e. not including into standard filter set) it looks like the filter may be very re-useable. We may judge based on the implementation that will hopefully be added. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-6618) Implement FuzzyRowFilter with ranges support
[ https://issues.apache.org/jira/browse/HBASE-6618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13964558#comment-13964558 ] Alex Baranau commented on HBASE-6618: - right. I mean it won't be human-friendly though still... I thought more about smth like this: {code} new FuzzyRowFilter.Builder() .any(length) // meaning for 4 .range(range_start_bytes, range_end_bytes) // builder will check that length of those is the same .any(length) .fixed(couple_fixed_bytes) .build(); {code} We may also overload with allowing strings if makes sense. So that e.g. ???(11-88)??AAA could be built with: {code} new FuzzyRowFilter.Builder() .any(3) .range(11, 88) .any(2) .fixed(AAA) .build(); {code} thoughts? Implement FuzzyRowFilter with ranges support Key: HBASE-6618 URL: https://issues.apache.org/jira/browse/HBASE-6618 Project: HBase Issue Type: New Feature Components: Filters Reporter: Alex Baranau Assignee: Alex Baranau Priority: Minor Fix For: 0.99.0 Attachments: HBASE-6618-algo-desc-bits.png, HBASE-6618-algo.patch, HBASE-6618.patch, HBASE-6618_2.path, HBASE-6618_3.path, HBASE-6618_4.patch, HBASE-6618_5.patch Apart from current ability to specify fuzzy row filter e.g. for userId_actionId format as _0004 (where 0004 - actionId) it would be great to also have ability to specify the fuzzy range , e.g. _0004, ..., _0099. See initial discussion here: http://search-hadoop.com/m/WVLJdX0Z65 Note: currently it is possible to provide multiple fuzzy row rules to existing FuzzyRowFilter, but in case when the range is big (contains thousands of values) it is not efficient. Filter should perform efficient fast-forwarding during the scan (this is what distinguishes it from regex row filter). While such functionality may seem like a proper fit for custom filter (i.e. not including into standard filter set) it looks like the filter may be very re-useable. We may judge based on the implementation that will hopefully be added. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-6618) Implement FuzzyRowFilter with ranges support
[ https://issues.apache.org/jira/browse/HBASE-6618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13964560#comment-13964560 ] Alex Baranau commented on HBASE-6618: - with human-readable and ? inside - you have to somehow define how to put ? if I want it as normal byte... Implement FuzzyRowFilter with ranges support Key: HBASE-6618 URL: https://issues.apache.org/jira/browse/HBASE-6618 Project: HBase Issue Type: New Feature Components: Filters Reporter: Alex Baranau Assignee: Alex Baranau Priority: Minor Fix For: 0.99.0 Attachments: HBASE-6618-algo-desc-bits.png, HBASE-6618-algo.patch, HBASE-6618.patch, HBASE-6618_2.path, HBASE-6618_3.path, HBASE-6618_4.patch, HBASE-6618_5.patch Apart from current ability to specify fuzzy row filter e.g. for userId_actionId format as _0004 (where 0004 - actionId) it would be great to also have ability to specify the fuzzy range , e.g. _0004, ..., _0099. See initial discussion here: http://search-hadoop.com/m/WVLJdX0Z65 Note: currently it is possible to provide multiple fuzzy row rules to existing FuzzyRowFilter, but in case when the range is big (contains thousands of values) it is not efficient. Filter should perform efficient fast-forwarding during the scan (this is what distinguishes it from regex row filter). While such functionality may seem like a proper fit for custom filter (i.e. not including into standard filter set) it looks like the filter may be very re-useable. We may judge based on the implementation that will hopefully be added. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-6618) Implement FuzzyRowFilter with ranges support
[ https://issues.apache.org/jira/browse/HBASE-6618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13964897#comment-13964897 ] chunhui shen commented on HBASE-6618: - bq.you have to somehow define how to put ? if I want it as normal byte. We could use '\' before '?' to define the normal byte '?' As my consideration, user could construct FuzzyRowFilter with the readable String directly. e.g ???11??AA\x00??\? Using Bytes.toBytesBinary to convert the string to bytes, then parse the bytes, if the byte is '?', mark it as non-fixed byte, if the byte is '\', skip it and see the next byte, and so on Of course, if user want to make '\x00' as 4 bytes, the above seems wrong. For this case, we should also support constructing FuzzyRowFilter with the readable byte array. For example, ???11??AA\x00??\? = byte[0]='?' byte[1]='?' byte[2]='?' byte[3]='1' byte[4]='1' byte[5]='?' byte[6]='?' byte[7]='A' byte[8]='A' byte[9]=0 byte[10]='?' byte[11]='?' byte[12]='\' byte[13]='?' Correct me if something wrong :) Implement FuzzyRowFilter with ranges support Key: HBASE-6618 URL: https://issues.apache.org/jira/browse/HBASE-6618 Project: HBase Issue Type: New Feature Components: Filters Reporter: Alex Baranau Assignee: Alex Baranau Priority: Minor Fix For: 0.99.0 Attachments: HBASE-6618-algo-desc-bits.png, HBASE-6618-algo.patch, HBASE-6618.patch, HBASE-6618_2.path, HBASE-6618_3.path, HBASE-6618_4.patch, HBASE-6618_5.patch Apart from current ability to specify fuzzy row filter e.g. for userId_actionId format as _0004 (where 0004 - actionId) it would be great to also have ability to specify the fuzzy range , e.g. _0004, ..., _0099. See initial discussion here: http://search-hadoop.com/m/WVLJdX0Z65 Note: currently it is possible to provide multiple fuzzy row rules to existing FuzzyRowFilter, but in case when the range is big (contains thousands of values) it is not efficient. Filter should perform efficient fast-forwarding during the scan (this is what distinguishes it from regex row filter). While such functionality may seem like a proper fit for custom filter (i.e. not including into standard filter set) it looks like the filter may be very re-useable. We may judge based on the implementation that will hopefully be added. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-6618) Implement FuzzyRowFilter with ranges support
[ https://issues.apache.org/jira/browse/HBASE-6618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13964901#comment-13964901 ] chunhui shen commented on HBASE-6618: - bq.not sure I got the question, sorry: which checks? {code} + * p + * NOTE that currently no checks are performed to ensure that length of ranges lower bytes and + * ranges upper bytes match mask length. Filter may work incorrectly or fail (with runtime + * exceptions) if this is broken. + * /p + * + * p + * NOTE that currently no checks are performed to ensure that ranges are defined correctly (i.e. + * lower value of each range is not greater than upper value). Filter may work incorrectly or fail + * (with runtime exceptions) if this is broken. + * /p + * + * p + * NOTE that currently no checks are performed to ensure that at non-fixed positions in + * ranges lower bytes and ranges upper bytes zeroes are set, but implementation may rely on this. + * /p {code} I mean the above checks Implement FuzzyRowFilter with ranges support Key: HBASE-6618 URL: https://issues.apache.org/jira/browse/HBASE-6618 Project: HBase Issue Type: New Feature Components: Filters Reporter: Alex Baranau Assignee: Alex Baranau Priority: Minor Fix For: 0.99.0 Attachments: HBASE-6618-algo-desc-bits.png, HBASE-6618-algo.patch, HBASE-6618.patch, HBASE-6618_2.path, HBASE-6618_3.path, HBASE-6618_4.patch, HBASE-6618_5.patch Apart from current ability to specify fuzzy row filter e.g. for userId_actionId format as _0004 (where 0004 - actionId) it would be great to also have ability to specify the fuzzy range , e.g. _0004, ..., _0099. See initial discussion here: http://search-hadoop.com/m/WVLJdX0Z65 Note: currently it is possible to provide multiple fuzzy row rules to existing FuzzyRowFilter, but in case when the range is big (contains thousands of values) it is not efficient. Filter should perform efficient fast-forwarding during the scan (this is what distinguishes it from regex row filter). While such functionality may seem like a proper fit for custom filter (i.e. not including into standard filter set) it looks like the filter may be very re-useable. We may judge based on the implementation that will hopefully be added. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-6618) Implement FuzzyRowFilter with ranges support
[ https://issues.apache.org/jira/browse/HBASE-6618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13964919#comment-13964919 ] Alex Baranau commented on HBASE-6618: - got it thanx. bq. We could use '\' before '?' to define the normal byte '?' and then \ before \ if we need \. And so on. I mean we can do that. Having Strings _with special chars_ where impl expects _any_ bytes as normal input at times not trivial. And if we really want that we would make it in API in some from of standard. And then, we could have it everywhere, e.g. in Puts, etc. I am not sure we want to create specific format for one filter.. What are your thoughts on builder? seems like can help to avoid all those special chars and still keep it very human-friendly. We can allow also Strings as I mentioned with \x notation. But the difference is that no pain with special chars and easy guiding API for users... Implement FuzzyRowFilter with ranges support Key: HBASE-6618 URL: https://issues.apache.org/jira/browse/HBASE-6618 Project: HBase Issue Type: New Feature Components: Filters Reporter: Alex Baranau Assignee: Alex Baranau Priority: Minor Fix For: 0.99.0 Attachments: HBASE-6618-algo-desc-bits.png, HBASE-6618-algo.patch, HBASE-6618.patch, HBASE-6618_2.path, HBASE-6618_3.path, HBASE-6618_4.patch, HBASE-6618_5.patch Apart from current ability to specify fuzzy row filter e.g. for userId_actionId format as _0004 (where 0004 - actionId) it would be great to also have ability to specify the fuzzy range , e.g. _0004, ..., _0099. See initial discussion here: http://search-hadoop.com/m/WVLJdX0Z65 Note: currently it is possible to provide multiple fuzzy row rules to existing FuzzyRowFilter, but in case when the range is big (contains thousands of values) it is not efficient. Filter should perform efficient fast-forwarding during the scan (this is what distinguishes it from regex row filter). While such functionality may seem like a proper fit for custom filter (i.e. not including into standard filter set) it looks like the filter may be very re-useable. We may judge based on the implementation that will hopefully be added. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-6618) Implement FuzzyRowFilter with ranges support
[ https://issues.apache.org/jira/browse/HBASE-6618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13963753#comment-13963753 ] chunhui shen commented on HBASE-6618: - FuzzyRowFilter seems available only for advanced users. Should we support creating FuzzyRowFilter with the Human-Readable String, e.g. _0004_?? There are many notion when using FuzzyRowFilter, could we do these checks in the internal of FuzzyRowFilter ? Implement FuzzyRowFilter with ranges support Key: HBASE-6618 URL: https://issues.apache.org/jira/browse/HBASE-6618 Project: HBase Issue Type: New Feature Components: Filters Reporter: Alex Baranau Assignee: Alex Baranau Priority: Minor Fix For: 0.99.0 Attachments: HBASE-6618-algo-desc-bits.png, HBASE-6618-algo.patch, HBASE-6618.patch, HBASE-6618_2.path, HBASE-6618_3.path, HBASE-6618_4.patch, HBASE-6618_5.patch Apart from current ability to specify fuzzy row filter e.g. for userId_actionId format as _0004 (where 0004 - actionId) it would be great to also have ability to specify the fuzzy range , e.g. _0004, ..., _0099. See initial discussion here: http://search-hadoop.com/m/WVLJdX0Z65 Note: currently it is possible to provide multiple fuzzy row rules to existing FuzzyRowFilter, but in case when the range is big (contains thousands of values) it is not efficient. Filter should perform efficient fast-forwarding during the scan (this is what distinguishes it from regex row filter). While such functionality may seem like a proper fit for custom filter (i.e. not including into standard filter set) it looks like the filter may be very re-useable. We may judge based on the implementation that will hopefully be added. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-6618) Implement FuzzyRowFilter with ranges support
[ https://issues.apache.org/jira/browse/HBASE-6618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13961243#comment-13961243 ] Alex Baranau commented on HBASE-6618: - [~kuzmiigo], heh - in satisfies() found that it was incorrect. I.e. as you described. Sorry I overlooked that. Fixed in latest patch. [~tedyu] added more tests and also test for compat with client compiled against old version of code. Though I'd really ask you to check if that was done correctly: don't have a lot of experience with protobuf. Thank you guys! Implement FuzzyRowFilter with ranges support Key: HBASE-6618 URL: https://issues.apache.org/jira/browse/HBASE-6618 Project: HBase Issue Type: New Feature Components: Filters Reporter: Alex Baranau Assignee: Alex Baranau Priority: Minor Fix For: 0.99.0 Attachments: HBASE-6618-algo-desc-bits.png, HBASE-6618-algo.patch, HBASE-6618.patch, HBASE-6618_2.path, HBASE-6618_3.path, HBASE-6618_4.patch, HBASE-6618_5.patch Apart from current ability to specify fuzzy row filter e.g. for userId_actionId format as _0004 (where 0004 - actionId) it would be great to also have ability to specify the fuzzy range , e.g. _0004, ..., _0099. See initial discussion here: http://search-hadoop.com/m/WVLJdX0Z65 Note: currently it is possible to provide multiple fuzzy row rules to existing FuzzyRowFilter, but in case when the range is big (contains thousands of values) it is not efficient. Filter should perform efficient fast-forwarding during the scan (this is what distinguishes it from regex row filter). While such functionality may seem like a proper fit for custom filter (i.e. not including into standard filter set) it looks like the filter may be very re-useable. We may judge based on the implementation that will hopefully be added. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-6618) Implement FuzzyRowFilter with ranges support
[ https://issues.apache.org/jira/browse/HBASE-6618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13961265#comment-13961265 ] Ted Yu commented on HBASE-6618: --- Thanks for the quick turnaround. [~apurtell]: do you want this in 0.98 ? Implement FuzzyRowFilter with ranges support Key: HBASE-6618 URL: https://issues.apache.org/jira/browse/HBASE-6618 Project: HBase Issue Type: New Feature Components: Filters Reporter: Alex Baranau Assignee: Alex Baranau Priority: Minor Fix For: 0.99.0 Attachments: HBASE-6618-algo-desc-bits.png, HBASE-6618-algo.patch, HBASE-6618.patch, HBASE-6618_2.path, HBASE-6618_3.path, HBASE-6618_4.patch, HBASE-6618_5.patch Apart from current ability to specify fuzzy row filter e.g. for userId_actionId format as _0004 (where 0004 - actionId) it would be great to also have ability to specify the fuzzy range , e.g. _0004, ..., _0099. See initial discussion here: http://search-hadoop.com/m/WVLJdX0Z65 Note: currently it is possible to provide multiple fuzzy row rules to existing FuzzyRowFilter, but in case when the range is big (contains thousands of values) it is not efficient. Filter should perform efficient fast-forwarding during the scan (this is what distinguishes it from regex row filter). While such functionality may seem like a proper fit for custom filter (i.e. not including into standard filter set) it looks like the filter may be very re-useable. We may judge based on the implementation that will hopefully be added. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-6618) Implement FuzzyRowFilter with ranges support
[ https://issues.apache.org/jira/browse/HBASE-6618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13961272#comment-13961272 ] Hadoop QA commented on HBASE-6618: -- {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12638881/HBASE-6618_5.patch against trunk revision . ATTACHMENT ID: 12638881 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 9 new or modified tests. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:green}+1 site{color}. The mvn site goal succeeds with this patch. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/9206//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9206//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9206//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9206//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9206//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9206//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9206//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9206//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9206//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9206//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/9206//console This message is automatically generated. Implement FuzzyRowFilter with ranges support Key: HBASE-6618 URL: https://issues.apache.org/jira/browse/HBASE-6618 Project: HBase Issue Type: New Feature Components: Filters Reporter: Alex Baranau Assignee: Alex Baranau Priority: Minor Fix For: 0.99.0 Attachments: HBASE-6618-algo-desc-bits.png, HBASE-6618-algo.patch, HBASE-6618.patch, HBASE-6618_2.path, HBASE-6618_3.path, HBASE-6618_4.patch, HBASE-6618_5.patch Apart from current ability to specify fuzzy row filter e.g. for userId_actionId format as _0004 (where 0004 - actionId) it would be great to also have ability to specify the fuzzy range , e.g. _0004, ..., _0099. See initial discussion here: http://search-hadoop.com/m/WVLJdX0Z65 Note: currently it is possible to provide multiple fuzzy row rules to existing FuzzyRowFilter, but in case when the range is big (contains thousands of values) it is not efficient. Filter should perform efficient fast-forwarding during the scan (this is what distinguishes it from regex row filter). While such functionality may seem like a proper fit for custom filter (i.e. not including into standard filter set) it looks like the filter may be very re-useable. We may judge based on the implementation that will hopefully be added. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-6618) Implement FuzzyRowFilter with ranges support
[ https://issues.apache.org/jira/browse/HBASE-6618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13960916#comment-13960916 ] Alex Baranau commented on HBASE-6618: - [~kuzmiigo] bq. I thought that the value in the fixed part is checked as whole, but the code actually checks its bytes in isolation, so the rule is actually 0(0 - 9)(0 - 9)(1 - 9) not true. aa68 will satisfy the rule ??(53 - 97). Added a test specifically for that: {code} // Range Assert.assertEquals(FuzzyRowFilter.SatisfiesCode.YES, FuzzyRowFilter.satisfies( new byte[]{1, 1, 6, 8}, new Triplebyte[], byte[], byte[]( new byte[]{0, 0, 1, 1}, // mask new byte[]{1, 1, 5, 6}, // upper bytes new byte[]{1, 1, 9, 7}))); // lower bytes {code} Implement FuzzyRowFilter with ranges support Key: HBASE-6618 URL: https://issues.apache.org/jira/browse/HBASE-6618 Project: HBase Issue Type: New Feature Components: Filters Reporter: Alex Baranau Assignee: Alex Baranau Priority: Minor Fix For: 0.99.0 Attachments: HBASE-6618-algo-desc-bits.png, HBASE-6618-algo.patch, HBASE-6618.patch, HBASE-6618_2.path, HBASE-6618_3.path Apart from current ability to specify fuzzy row filter e.g. for userId_actionId format as _0004 (where 0004 - actionId) it would be great to also have ability to specify the fuzzy range , e.g. _0004, ..., _0099. See initial discussion here: http://search-hadoop.com/m/WVLJdX0Z65 Note: currently it is possible to provide multiple fuzzy row rules to existing FuzzyRowFilter, but in case when the range is big (contains thousands of values) it is not efficient. Filter should perform efficient fast-forwarding during the scan (this is what distinguishes it from regex row filter). While such functionality may seem like a proper fit for custom filter (i.e. not including into standard filter set) it looks like the filter may be very re-useable. We may judge based on the implementation that will hopefully be added. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-6618) Implement FuzzyRowFilter with ranges support
[ https://issues.apache.org/jira/browse/HBASE-6618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13960924#comment-13960924 ] Alex Baranau commented on HBASE-6618: - Updated patch, also uploaded to review board at https://reviews.apache.org/r/8786. Very small change to fit latest trunk. [~yuzhih...@gmail.com] if you have time by chance - very much appreciate if you can review. This version is much better, much more flexible than current one.. Thank you a lot in advance Implement FuzzyRowFilter with ranges support Key: HBASE-6618 URL: https://issues.apache.org/jira/browse/HBASE-6618 Project: HBase Issue Type: New Feature Components: Filters Reporter: Alex Baranau Assignee: Alex Baranau Priority: Minor Fix For: 0.99.0 Attachments: HBASE-6618-algo-desc-bits.png, HBASE-6618-algo.patch, HBASE-6618.patch, HBASE-6618_2.path, HBASE-6618_3.path, HBASE-6618_4.patch Apart from current ability to specify fuzzy row filter e.g. for userId_actionId format as _0004 (where 0004 - actionId) it would be great to also have ability to specify the fuzzy range , e.g. _0004, ..., _0099. See initial discussion here: http://search-hadoop.com/m/WVLJdX0Z65 Note: currently it is possible to provide multiple fuzzy row rules to existing FuzzyRowFilter, but in case when the range is big (contains thousands of values) it is not efficient. Filter should perform efficient fast-forwarding during the scan (this is what distinguishes it from regex row filter). While such functionality may seem like a proper fit for custom filter (i.e. not including into standard filter set) it looks like the filter may be very re-useable. We may judge based on the implementation that will hopefully be added. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-6618) Implement FuzzyRowFilter with ranges support
[ https://issues.apache.org/jira/browse/HBASE-6618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13960927#comment-13960927 ] Alex Baranau commented on HBASE-6618: - I mean than the one available in HBase currently Implement FuzzyRowFilter with ranges support Key: HBASE-6618 URL: https://issues.apache.org/jira/browse/HBASE-6618 Project: HBase Issue Type: New Feature Components: Filters Reporter: Alex Baranau Assignee: Alex Baranau Priority: Minor Fix For: 0.99.0 Attachments: HBASE-6618-algo-desc-bits.png, HBASE-6618-algo.patch, HBASE-6618.patch, HBASE-6618_2.path, HBASE-6618_3.path, HBASE-6618_4.patch Apart from current ability to specify fuzzy row filter e.g. for userId_actionId format as _0004 (where 0004 - actionId) it would be great to also have ability to specify the fuzzy range , e.g. _0004, ..., _0099. See initial discussion here: http://search-hadoop.com/m/WVLJdX0Z65 Note: currently it is possible to provide multiple fuzzy row rules to existing FuzzyRowFilter, but in case when the range is big (contains thousands of values) it is not efficient. Filter should perform efficient fast-forwarding during the scan (this is what distinguishes it from regex row filter). While such functionality may seem like a proper fit for custom filter (i.e. not including into standard filter set) it looks like the filter may be very re-useable. We may judge based on the implementation that will hopefully be added. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-6618) Implement FuzzyRowFilter with ranges support
[ https://issues.apache.org/jira/browse/HBASE-6618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13960955#comment-13960955 ] Ted Yu commented on HBASE-6618: --- If a client, compiled with FuzzyRowFilter before this change, uses FuzzyRowFilter in Scan, would server side be able to handle ? Please add more tests for the new rules. This would make code more robust and help detect regression. Implement FuzzyRowFilter with ranges support Key: HBASE-6618 URL: https://issues.apache.org/jira/browse/HBASE-6618 Project: HBase Issue Type: New Feature Components: Filters Reporter: Alex Baranau Assignee: Alex Baranau Priority: Minor Fix For: 0.99.0 Attachments: HBASE-6618-algo-desc-bits.png, HBASE-6618-algo.patch, HBASE-6618.patch, HBASE-6618_2.path, HBASE-6618_3.path, HBASE-6618_4.patch Apart from current ability to specify fuzzy row filter e.g. for userId_actionId format as _0004 (where 0004 - actionId) it would be great to also have ability to specify the fuzzy range , e.g. _0004, ..., _0099. See initial discussion here: http://search-hadoop.com/m/WVLJdX0Z65 Note: currently it is possible to provide multiple fuzzy row rules to existing FuzzyRowFilter, but in case when the range is big (contains thousands of values) it is not efficient. Filter should perform efficient fast-forwarding during the scan (this is what distinguishes it from regex row filter). While such functionality may seem like a proper fit for custom filter (i.e. not including into standard filter set) it looks like the filter may be very re-useable. We may judge based on the implementation that will hopefully be added. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-6618) Implement FuzzyRowFilter with ranges support
[ https://issues.apache.org/jira/browse/HBASE-6618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13960963#comment-13960963 ] Hadoop QA commented on HBASE-6618: -- {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12638828/HBASE-6618_4.patch against trunk revision . ATTACHMENT ID: 12638828 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 9 new or modified tests. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:green}+1 site{color}. The mvn site goal succeeds with this patch. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/9204//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9204//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9204//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9204//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9204//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9204//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9204//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9204//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9204//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9204//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/9204//console This message is automatically generated. Implement FuzzyRowFilter with ranges support Key: HBASE-6618 URL: https://issues.apache.org/jira/browse/HBASE-6618 Project: HBase Issue Type: New Feature Components: Filters Reporter: Alex Baranau Assignee: Alex Baranau Priority: Minor Fix For: 0.99.0 Attachments: HBASE-6618-algo-desc-bits.png, HBASE-6618-algo.patch, HBASE-6618.patch, HBASE-6618_2.path, HBASE-6618_3.path, HBASE-6618_4.patch Apart from current ability to specify fuzzy row filter e.g. for userId_actionId format as _0004 (where 0004 - actionId) it would be great to also have ability to specify the fuzzy range , e.g. _0004, ..., _0099. See initial discussion here: http://search-hadoop.com/m/WVLJdX0Z65 Note: currently it is possible to provide multiple fuzzy row rules to existing FuzzyRowFilter, but in case when the range is big (contains thousands of values) it is not efficient. Filter should perform efficient fast-forwarding during the scan (this is what distinguishes it from regex row filter). While such functionality may seem like a proper fit for custom filter (i.e. not including into standard filter set) it looks like the filter may be very re-useable. We may judge based on the implementation that will hopefully be added. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-6618) Implement FuzzyRowFilter with ranges support
[ https://issues.apache.org/jira/browse/HBASE-6618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13959055#comment-13959055 ] Igor Kuzmitshov commented on HBASE-6618: Please note that in the version proposed by me (aa68 should satisfy rule ??(53 - 97)) it's not possible to have adjacent ranges in the rule: the high-level ??(10-19)(00-30) and ??(1000-1930) will be written as the same range (key start, key end, mask): ??1000, ??1930, 11. This can be solved by using different values in the mask (it would be more convenient to use 0 for non-fixed bytes, 1 for range 1, 2 for range 2 and so on). Implement FuzzyRowFilter with ranges support Key: HBASE-6618 URL: https://issues.apache.org/jira/browse/HBASE-6618 Project: HBase Issue Type: New Feature Components: Filters Reporter: Alex Baranau Assignee: Alex Baranau Priority: Minor Fix For: 0.99.0 Attachments: HBASE-6618-algo-desc-bits.png, HBASE-6618-algo.patch, HBASE-6618.patch, HBASE-6618_2.path, HBASE-6618_3.path Apart from current ability to specify fuzzy row filter e.g. for userId_actionId format as _0004 (where 0004 - actionId) it would be great to also have ability to specify the fuzzy range , e.g. _0004, ..., _0099. See initial discussion here: http://search-hadoop.com/m/WVLJdX0Z65 Note: currently it is possible to provide multiple fuzzy row rules to existing FuzzyRowFilter, but in case when the range is big (contains thousands of values) it is not efficient. Filter should perform efficient fast-forwarding during the scan (this is what distinguishes it from regex row filter). While such functionality may seem like a proper fit for custom filter (i.e. not including into standard filter set) it looks like the filter may be very re-useable. We may judge based on the implementation that will hopefully be added. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-6618) Implement FuzzyRowFilter with ranges support
[ https://issues.apache.org/jira/browse/HBASE-6618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13915732#comment-13915732 ] Igor Kuzmitshov commented on HBASE-6618: Looking at the description above that rule (0001 - 0999) means any 4 bytesany 4 bytes value between 0001 and 0999, I thought that the value in the fixed part is checked as whole, but the code actually checks its bytes in isolation, so the rule is actually 0(0 - 9)(0 - 9)(1 - 9). It's fine for ranges like this, but let's take another: ??(53 - 97). I would expect aa68 to satisfy the rule, but in the proposed implementation it doesn't (because bytes are checked in isolation and 8 is outside the range \[3, 7\]). Could you clarify if this is the intended behaviour? If yes, i.e. aa68 should not satisfy rule ??(53 - 97): It would be nice to make it more clear in the description that all bytes are checked in isolation and there are actually no n-bytes values. In this case, there is a bug: for rule ??(50 - 97) and value MM58 (where M is max byte \xFF), satisfies() returns SatisfiesCode.NO_NEXT because nextRowKeyCandidateExists is only updated for non-fixed positions. It should return NEXT_EXISTS, because MM60 should be the next key. If no, i.e. aa68 should satisfy rule ??(53 - 97): In this case, satisfy() should be fixed. I made a patch with the fix and can add it if needed. It also has a small optimisation when there is no need to check less significant bytes. For example: for range \[120, 500\] and key 345, it will compare the first byte (3) only, as it's clear that the whole value is in the range. In any case, tests might include testing satisfy() with ranges (the current patch only adds tests for getNextForFuzzyRule() with ranges). Implement FuzzyRowFilter with ranges support Key: HBASE-6618 URL: https://issues.apache.org/jira/browse/HBASE-6618 Project: HBase Issue Type: New Feature Components: Filters Reporter: Alex Baranau Assignee: Alex Baranau Priority: Minor Fix For: 0.99.0 Attachments: HBASE-6618-algo-desc-bits.png, HBASE-6618-algo.patch, HBASE-6618.patch, HBASE-6618_2.path, HBASE-6618_3.path Apart from current ability to specify fuzzy row filter e.g. for userId_actionId format as _0004 (where 0004 - actionId) it would be great to also have ability to specify the fuzzy range , e.g. _0004, ..., _0099. See initial discussion here: http://search-hadoop.com/m/WVLJdX0Z65 Note: currently it is possible to provide multiple fuzzy row rules to existing FuzzyRowFilter, but in case when the range is big (contains thousands of values) it is not efficient. Filter should perform efficient fast-forwarding during the scan (this is what distinguishes it from regex row filter). While such functionality may seem like a proper fit for custom filter (i.e. not including into standard filter set) it looks like the filter may be very re-useable. We may judge based on the implementation that will hopefully be added. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-6618) Implement FuzzyRowFilter with ranges support
[ https://issues.apache.org/jira/browse/HBASE-6618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13856495#comment-13856495 ] Alex Baranau commented on HBASE-6618: - Yeah, looks like nobody looked at the patch, even though I know others use it (patch). Weird and don't know how to push anyone to do that. Not sure if patch fits latest version. If there's still an interest from any committer (I hope so) to take a look and proceed with the issue, I will take a look at it and make sure it is good for latest version. Implement FuzzyRowFilter with ranges support Key: HBASE-6618 URL: https://issues.apache.org/jira/browse/HBASE-6618 Project: HBase Issue Type: New Feature Components: Filters Reporter: Alex Baranau Assignee: Alex Baranau Priority: Minor Fix For: 0.99.0 Attachments: HBASE-6618-algo-desc-bits.png, HBASE-6618-algo.patch, HBASE-6618.patch, HBASE-6618_2.path, HBASE-6618_3.path Apart from current ability to specify fuzzy row filter e.g. for userId_actionId format as _0004 (where 0004 - actionId) it would be great to also have ability to specify the fuzzy range , e.g. _0004, ..., _0099. See initial discussion here: http://search-hadoop.com/m/WVLJdX0Z65 Note: currently it is possible to provide multiple fuzzy row rules to existing FuzzyRowFilter, but in case when the range is big (contains thousands of values) it is not efficient. Filter should perform efficient fast-forwarding during the scan (this is what distinguishes it from regex row filter). While such functionality may seem like a proper fit for custom filter (i.e. not including into standard filter set) it looks like the filter may be very re-useable. We may judge based on the implementation that will hopefully be added. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-6618) Implement FuzzyRowFilter with ranges support
[ https://issues.apache.org/jira/browse/HBASE-6618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13856496#comment-13856496 ] Ted Yu commented on HBASE-6618: --- Alex: I should have time to review. Please update the patch. Thanks Implement FuzzyRowFilter with ranges support Key: HBASE-6618 URL: https://issues.apache.org/jira/browse/HBASE-6618 Project: HBase Issue Type: New Feature Components: Filters Reporter: Alex Baranau Assignee: Alex Baranau Priority: Minor Fix For: 0.99.0 Attachments: HBASE-6618-algo-desc-bits.png, HBASE-6618-algo.patch, HBASE-6618.patch, HBASE-6618_2.path, HBASE-6618_3.path Apart from current ability to specify fuzzy row filter e.g. for userId_actionId format as _0004 (where 0004 - actionId) it would be great to also have ability to specify the fuzzy range , e.g. _0004, ..., _0099. See initial discussion here: http://search-hadoop.com/m/WVLJdX0Z65 Note: currently it is possible to provide multiple fuzzy row rules to existing FuzzyRowFilter, but in case when the range is big (contains thousands of values) it is not efficient. Filter should perform efficient fast-forwarding during the scan (this is what distinguishes it from regex row filter). While such functionality may seem like a proper fit for custom filter (i.e. not including into standard filter set) it looks like the filter may be very re-useable. We may judge based on the implementation that will hopefully be added. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-6618) Implement FuzzyRowFilter with ranges support
[ https://issues.apache.org/jira/browse/HBASE-6618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13596725#comment-13596725 ] James Taylor commented on HBASE-6618: - Would it be possible to generalize this a bit further to handle variable length key parts assuming you know the terminator (both Phoenix and Orderly, use a 0 byte terminator)? With the work you've already done here to support ranges, you could support a good set of skip scanning scenarios for multi-part row keys. Take for example a fuzzy row filter expressed for a two part row key of VARCHAR + byte[4] like this: 'foo%' (this would be for the VARCHAR key part - 'foo' followed by zero or more characters) [4000-6000) (this would be for the INT key part - 4000 inclusive to 6000 exclusive) In this case (as you've already pointed out), you can use the first row key as your guide. Let's say the first row key is ['foobar'][1000]. You could form a skip hint as ['foobar'][4000] (i.e. 'foobar' + new byte[] {0} + new byte[] {1,0,0,0}). Then you'd let all values pass until you got to ['foobar'][6000], in which case you'd form your next skip hint. Implement FuzzyRowFilter with ranges support Key: HBASE-6618 URL: https://issues.apache.org/jira/browse/HBASE-6618 Project: HBase Issue Type: New Feature Components: Filters Reporter: Alex Baranau Assignee: Alex Baranau Priority: Minor Attachments: HBASE-6618_2.path, HBASE-6618_3.path, HBASE-6618-algo-desc-bits.png, HBASE-6618-algo.patch, HBASE-6618.patch Apart from current ability to specify fuzzy row filter e.g. for userId_actionId format as _0004 (where 0004 - actionId) it would be great to also have ability to specify the fuzzy range , e.g. _0004, ..., _0099. See initial discussion here: http://search-hadoop.com/m/WVLJdX0Z65 Note: currently it is possible to provide multiple fuzzy row rules to existing FuzzyRowFilter, but in case when the range is big (contains thousands of values) it is not efficient. Filter should perform efficient fast-forwarding during the scan (this is what distinguishes it from regex row filter). While such functionality may seem like a proper fit for custom filter (i.e. not including into standard filter set) it looks like the filter may be very re-useable. We may judge based on the implementation that will hopefully be added. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6618) Implement FuzzyRowFilter with ranges support
[ https://issues.apache.org/jira/browse/HBASE-6618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13596734#comment-13596734 ] Ted Yu commented on HBASE-6618: --- bq. skip hint as ['foobar'][4000] (i.e. 'foobar' + new byte[] {0} + new byte[] {1,0,0,0}). Is there a typo above ? Should it be: {code} 'foobar' + new byte[] {0} + new byte[] {4,0,0,0} {code} Implement FuzzyRowFilter with ranges support Key: HBASE-6618 URL: https://issues.apache.org/jira/browse/HBASE-6618 Project: HBase Issue Type: New Feature Components: Filters Reporter: Alex Baranau Assignee: Alex Baranau Priority: Minor Attachments: HBASE-6618_2.path, HBASE-6618_3.path, HBASE-6618-algo-desc-bits.png, HBASE-6618-algo.patch, HBASE-6618.patch Apart from current ability to specify fuzzy row filter e.g. for userId_actionId format as _0004 (where 0004 - actionId) it would be great to also have ability to specify the fuzzy range , e.g. _0004, ..., _0099. See initial discussion here: http://search-hadoop.com/m/WVLJdX0Z65 Note: currently it is possible to provide multiple fuzzy row rules to existing FuzzyRowFilter, but in case when the range is big (contains thousands of values) it is not efficient. Filter should perform efficient fast-forwarding during the scan (this is what distinguishes it from regex row filter). While such functionality may seem like a proper fit for custom filter (i.e. not including into standard filter set) it looks like the filter may be very re-useable. We may judge based on the implementation that will hopefully be added. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6618) Implement FuzzyRowFilter with ranges support
[ https://issues.apache.org/jira/browse/HBASE-6618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13596741#comment-13596741 ] James Taylor commented on HBASE-6618: - Yes, that's a typo. Thanks - you've got good eyes! Implement FuzzyRowFilter with ranges support Key: HBASE-6618 URL: https://issues.apache.org/jira/browse/HBASE-6618 Project: HBase Issue Type: New Feature Components: Filters Reporter: Alex Baranau Assignee: Alex Baranau Priority: Minor Attachments: HBASE-6618_2.path, HBASE-6618_3.path, HBASE-6618-algo-desc-bits.png, HBASE-6618-algo.patch, HBASE-6618.patch Apart from current ability to specify fuzzy row filter e.g. for userId_actionId format as _0004 (where 0004 - actionId) it would be great to also have ability to specify the fuzzy range , e.g. _0004, ..., _0099. See initial discussion here: http://search-hadoop.com/m/WVLJdX0Z65 Note: currently it is possible to provide multiple fuzzy row rules to existing FuzzyRowFilter, but in case when the range is big (contains thousands of values) it is not efficient. Filter should perform efficient fast-forwarding during the scan (this is what distinguishes it from regex row filter). While such functionality may seem like a proper fit for custom filter (i.e. not including into standard filter set) it looks like the filter may be very re-useable. We may judge based on the implementation that will hopefully be added. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6618) Implement FuzzyRowFilter with ranges support
[ https://issues.apache.org/jira/browse/HBASE-6618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13543171#comment-13543171 ] Hadoop QA commented on HBASE-6618: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12563115/HBASE-6618_3.path against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 6 new or modified tests. {color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop 2.0 profile. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 lineLengths{color}. The patch introduces lines longer than 100 {color:red}-1 core tests{color}. The patch failed these unit tests: org.apache.hadoop.hbase.regionserver.TestMetricsRegionServer org.apache.hadoop.hbase.ipc.TestRpcMetrics Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/3826//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3826//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3826//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3826//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3826//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3826//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3826//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3826//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/3826//console This message is automatically generated. Implement FuzzyRowFilter with ranges support Key: HBASE-6618 URL: https://issues.apache.org/jira/browse/HBASE-6618 Project: HBase Issue Type: New Feature Components: Filters Reporter: Alex Baranau Assignee: Alex Baranau Priority: Minor Attachments: HBASE-6618_2.path, HBASE-6618_3.path, HBASE-6618-algo-desc-bits.png, HBASE-6618-algo.patch, HBASE-6618.patch Apart from current ability to specify fuzzy row filter e.g. for userId_actionId format as _0004 (where 0004 - actionId) it would be great to also have ability to specify the fuzzy range , e.g. _0004, ..., _0099. See initial discussion here: http://search-hadoop.com/m/WVLJdX0Z65 Note: currently it is possible to provide multiple fuzzy row rules to existing FuzzyRowFilter, but in case when the range is big (contains thousands of values) it is not efficient. Filter should perform efficient fast-forwarding during the scan (this is what distinguishes it from regex row filter). While such functionality may seem like a proper fit for custom filter (i.e. not including into standard filter set) it looks like the filter may be very re-useable. We may judge based on the implementation that will hopefully be added. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6618) Implement FuzzyRowFilter with ranges support
[ https://issues.apache.org/jira/browse/HBASE-6618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13543182#comment-13543182 ] Alex Baranau commented on HBASE-6618: - bq. -1 lineLengths. The patch introduces lines longer than 100 I guess that's because of generated code (from *.proto). That should be OK, right? Implement FuzzyRowFilter with ranges support Key: HBASE-6618 URL: https://issues.apache.org/jira/browse/HBASE-6618 Project: HBase Issue Type: New Feature Components: Filters Reporter: Alex Baranau Assignee: Alex Baranau Priority: Minor Attachments: HBASE-6618_2.path, HBASE-6618_3.path, HBASE-6618-algo-desc-bits.png, HBASE-6618-algo.patch, HBASE-6618.patch Apart from current ability to specify fuzzy row filter e.g. for userId_actionId format as _0004 (where 0004 - actionId) it would be great to also have ability to specify the fuzzy range , e.g. _0004, ..., _0099. See initial discussion here: http://search-hadoop.com/m/WVLJdX0Z65 Note: currently it is possible to provide multiple fuzzy row rules to existing FuzzyRowFilter, but in case when the range is big (contains thousands of values) it is not efficient. Filter should perform efficient fast-forwarding during the scan (this is what distinguishes it from regex row filter). While such functionality may seem like a proper fit for custom filter (i.e. not including into standard filter set) it looks like the filter may be very re-useable. We may judge based on the implementation that will hopefully be added. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6618) Implement FuzzyRowFilter with ranges support
[ https://issues.apache.org/jira/browse/HBASE-6618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13541465#comment-13541465 ] Alex Baranau commented on HBASE-6618: - Created https://reviews.apache.org/r/8786. I think I will add more unit-tests. Comments are very welcome! Implement FuzzyRowFilter with ranges support Key: HBASE-6618 URL: https://issues.apache.org/jira/browse/HBASE-6618 Project: HBase Issue Type: New Feature Components: Filters Reporter: Alex Baranau Priority: Minor Attachments: HBASE-6618-algo-desc-bits.png, HBASE-6618-algo.patch, HBASE-6618.patch Apart from current ability to specify fuzzy row filter e.g. for userId_actionId format as _0004 (where 0004 - actionId) it would be great to also have ability to specify the fuzzy range , e.g. _0004, ..., _0099. See initial discussion here: http://search-hadoop.com/m/WVLJdX0Z65 Note: currently it is possible to provide multiple fuzzy row rules to existing FuzzyRowFilter, but in case when the range is big (contains thousands of values) it is not efficient. Filter should perform efficient fast-forwarding during the scan (this is what distinguishes it from regex row filter). While such functionality may seem like a proper fit for custom filter (i.e. not including into standard filter set) it looks like the filter may be very re-useable. We may judge based on the implementation that will hopefully be added. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6618) Implement FuzzyRowFilter with ranges support
[ https://issues.apache.org/jira/browse/HBASE-6618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13541468#comment-13541468 ] Anil Gupta commented on HBASE-6618: --- Hi Alex, Actually, we decided not to use full-table scans for this type of query. Hence, i could not devote time on this. We are trying out Secondary Index in HBase. Sorry, for the late update. Wish you a Happy New Year! ~Anil Gupta Software Engineer II, Intuit, Inc Implement FuzzyRowFilter with ranges support Key: HBASE-6618 URL: https://issues.apache.org/jira/browse/HBASE-6618 Project: HBase Issue Type: New Feature Components: Filters Reporter: Alex Baranau Assignee: Alex Baranau Priority: Minor Attachments: HBASE-6618-algo-desc-bits.png, HBASE-6618-algo.patch, HBASE-6618.patch Apart from current ability to specify fuzzy row filter e.g. for userId_actionId format as _0004 (where 0004 - actionId) it would be great to also have ability to specify the fuzzy range , e.g. _0004, ..., _0099. See initial discussion here: http://search-hadoop.com/m/WVLJdX0Z65 Note: currently it is possible to provide multiple fuzzy row rules to existing FuzzyRowFilter, but in case when the range is big (contains thousands of values) it is not efficient. Filter should perform efficient fast-forwarding during the scan (this is what distinguishes it from regex row filter). While such functionality may seem like a proper fit for custom filter (i.e. not including into standard filter set) it looks like the filter may be very re-useable. We may judge based on the implementation that will hopefully be added. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6618) Implement FuzzyRowFilter with ranges support
[ https://issues.apache.org/jira/browse/HBASE-6618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13541510#comment-13541510 ] Hadoop QA commented on HBASE-6618: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12562810/HBASE-6618.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 6 new or modified tests. {color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop 2.0 profile. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 lineLengths{color}. The patch introduces lines longer than 100 {color:red}-1 core tests{color}. The patch failed these unit tests: org.apache.hadoop.hbase.regionserver.TestMetricsRegionServer org.apache.hadoop.hbase.filter.TestFuzzyRowFilter org.apache.hadoop.hbase.ipc.TestRpcMetrics Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/3788//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3788//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3788//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3788//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3788//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3788//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3788//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3788//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/3788//console This message is automatically generated. Implement FuzzyRowFilter with ranges support Key: HBASE-6618 URL: https://issues.apache.org/jira/browse/HBASE-6618 Project: HBase Issue Type: New Feature Components: Filters Reporter: Alex Baranau Assignee: Alex Baranau Priority: Minor Attachments: HBASE-6618-algo-desc-bits.png, HBASE-6618-algo.patch, HBASE-6618.patch Apart from current ability to specify fuzzy row filter e.g. for userId_actionId format as _0004 (where 0004 - actionId) it would be great to also have ability to specify the fuzzy range , e.g. _0004, ..., _0099. See initial discussion here: http://search-hadoop.com/m/WVLJdX0Z65 Note: currently it is possible to provide multiple fuzzy row rules to existing FuzzyRowFilter, but in case when the range is big (contains thousands of values) it is not efficient. Filter should perform efficient fast-forwarding during the scan (this is what distinguishes it from regex row filter). While such functionality may seem like a proper fit for custom filter (i.e. not including into standard filter set) it looks like the filter may be very re-useable. We may judge based on the implementation that will hopefully be added. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6618) Implement FuzzyRowFilter with ranges support
[ https://issues.apache.org/jira/browse/HBASE-6618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13541532#comment-13541532 ] Hadoop QA commented on HBASE-6618: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12562822/HBASE-6618_2.path against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 6 new or modified tests. {color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop 2.0 profile. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 lineLengths{color}. The patch introduces lines longer than 100 {color:red}-1 core tests{color}. The patch failed these unit tests: org.apache.hadoop.hbase.regionserver.TestMetricsRegionServer org.apache.hadoop.hbase.ipc.TestRpcMetrics {color:red}-1 core zombie tests{color}. There are 3 zombie test(s): Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/3789//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3789//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3789//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3789//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3789//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3789//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3789//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3789//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/3789//console This message is automatically generated. Implement FuzzyRowFilter with ranges support Key: HBASE-6618 URL: https://issues.apache.org/jira/browse/HBASE-6618 Project: HBase Issue Type: New Feature Components: Filters Reporter: Alex Baranau Assignee: Alex Baranau Priority: Minor Attachments: HBASE-6618_2.path, HBASE-6618-algo-desc-bits.png, HBASE-6618-algo.patch, HBASE-6618.patch Apart from current ability to specify fuzzy row filter e.g. for userId_actionId format as _0004 (where 0004 - actionId) it would be great to also have ability to specify the fuzzy range , e.g. _0004, ..., _0099. See initial discussion here: http://search-hadoop.com/m/WVLJdX0Z65 Note: currently it is possible to provide multiple fuzzy row rules to existing FuzzyRowFilter, but in case when the range is big (contains thousands of values) it is not efficient. Filter should perform efficient fast-forwarding during the scan (this is what distinguishes it from regex row filter). While such functionality may seem like a proper fit for custom filter (i.e. not including into standard filter set) it looks like the filter may be very re-useable. We may judge based on the implementation that will hopefully be added. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6618) Implement FuzzyRowFilter with ranges support
[ https://issues.apache.org/jira/browse/HBASE-6618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13442944#comment-13442944 ] Alex Baranau commented on HBASE-6618: - Weird. I can open it. Anyhow, sent it to your email. Implement FuzzyRowFilter with ranges support Key: HBASE-6618 URL: https://issues.apache.org/jira/browse/HBASE-6618 Project: HBase Issue Type: New Feature Components: filters Reporter: Alex Baranau Priority: Minor Attachments: HBASE-6618-algo-desc-bits.png, HBASE-6618-algo.patch Apart from current ability to specify fuzzy row filter e.g. for userId_actionId format as _0004 (where 0004 - actionId) it would be great to also have ability to specify the fuzzy range , e.g. _0004, ..., _0099. See initial discussion here: http://search-hadoop.com/m/WVLJdX0Z65 Note: currently it is possible to provide multiple fuzzy row rules to existing FuzzyRowFilter, but in case when the range is big (contains thousands of values) it is not efficient. Filter should perform efficient fast-forwarding during the scan (this is what distinguishes it from regex row filter). While such functionality may seem like a proper fit for custom filter (i.e. not including into standard filter set) it looks like the filter may be very re-useable. We may judge based on the implementation that will hopefully be added. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6618) Implement FuzzyRowFilter with ranges support
[ https://issues.apache.org/jira/browse/HBASE-6618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13442276#comment-13442276 ] Anil Gupta commented on HBASE-6618: --- Hi Alex, I am still unable to access the png file for the algorithm. Is there some problem with JIRA system? or Can you re-upload the image? Thanks, Thanks, Anil Gupta Software Engineer II, Intuit, Inc Implement FuzzyRowFilter with ranges support Key: HBASE-6618 URL: https://issues.apache.org/jira/browse/HBASE-6618 Project: HBase Issue Type: New Feature Components: filters Reporter: Alex Baranau Priority: Minor Attachments: HBASE-6618-algo-desc-bits.png, HBASE-6618-algo.patch Apart from current ability to specify fuzzy row filter e.g. for userId_actionId format as _0004 (where 0004 - actionId) it would be great to also have ability to specify the fuzzy range , e.g. _0004, ..., _0099. See initial discussion here: http://search-hadoop.com/m/WVLJdX0Z65 Note: currently it is possible to provide multiple fuzzy row rules to existing FuzzyRowFilter, but in case when the range is big (contains thousands of values) it is not efficient. Filter should perform efficient fast-forwarding during the scan (this is what distinguishes it from regex row filter). While such functionality may seem like a proper fit for custom filter (i.e. not including into standard filter set) it looks like the filter may be very re-useable. We may judge based on the implementation that will hopefully be added. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6618) Implement FuzzyRowFilter with ranges support
[ https://issues.apache.org/jira/browse/HBASE-6618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13440242#comment-13440242 ] Alex Baranau commented on HBASE-6618: - Ah, sorry, haven't said anything about that. For toInc - we may not change it at every step, so if there's a missing arrow, that means nothing should be changed. Thanx for checking out! One thing that I'm not 100% sure about - is it better to adjust current FuzzyRowFilter and this functionality to it or add new. I'm leaning towards adjusting FuzzyRowFilter as this new feature fits naturally in it. Thoughts? Implement FuzzyRowFilter with ranges support Key: HBASE-6618 URL: https://issues.apache.org/jira/browse/HBASE-6618 Project: HBase Issue Type: New Feature Components: filters Reporter: Alex Baranau Priority: Minor Attachments: HBASE-6618-algo-desc-bits.png, HBASE-6618-algo.patch Apart from current ability to specify fuzzy row filter e.g. for userId_actionId format as _0004 (where 0004 - actionId) it would be great to also have ability to specify the fuzzy range , e.g. _0004, ..., _0099. See initial discussion here: http://search-hadoop.com/m/WVLJdX0Z65 Note: currently it is possible to provide multiple fuzzy row rules to existing FuzzyRowFilter, but in case when the range is big (contains thousands of values) it is not efficient. Filter should perform efficient fast-forwarding during the scan (this is what distinguishes it from regex row filter). While such functionality may seem like a proper fit for custom filter (i.e. not including into standard filter set) it looks like the filter may be very re-useable. We may judge based on the implementation that will hopefully be added. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6618) Implement FuzzyRowFilter with ranges support
[ https://issues.apache.org/jira/browse/HBASE-6618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13440268#comment-13440268 ] Zhihong Ted Yu commented on HBASE-6618: --- Enhancing existing class is fine. Implement FuzzyRowFilter with ranges support Key: HBASE-6618 URL: https://issues.apache.org/jira/browse/HBASE-6618 Project: HBase Issue Type: New Feature Components: filters Reporter: Alex Baranau Priority: Minor Attachments: HBASE-6618-algo-desc-bits.png, HBASE-6618-algo.patch Apart from current ability to specify fuzzy row filter e.g. for userId_actionId format as _0004 (where 0004 - actionId) it would be great to also have ability to specify the fuzzy range , e.g. _0004, ..., _0099. See initial discussion here: http://search-hadoop.com/m/WVLJdX0Z65 Note: currently it is possible to provide multiple fuzzy row rules to existing FuzzyRowFilter, but in case when the range is big (contains thousands of values) it is not efficient. Filter should perform efficient fast-forwarding during the scan (this is what distinguishes it from regex row filter). While such functionality may seem like a proper fit for custom filter (i.e. not including into standard filter set) it looks like the filter may be very re-useable. We may judge based on the implementation that will hopefully be added. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6618) Implement FuzzyRowFilter with ranges support
[ https://issues.apache.org/jira/browse/HBASE-6618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13439311#comment-13439311 ] Anil Gupta commented on HBASE-6618: --- Hi Alex, I agree with you idea of RangeBased Fuzzy Filter. However, I would like to take a phased approach in developing this: In your proposal, the user can provide multiple fuzzy ranges in a single scan. i.e. any 4 bytesany 6 bytes value between _0001 and 0099any 3 bytesany 4 bytes value between _001 and _099 Instead of the above, IMO lets try to make a filter for any 4 bytesany 6 bytes value between _0001 and 0099any 3 bytes or any 4 bytesany 6 bytes value between _0001 and 0099. Once we develop this then we can enhance it to use multiple fuzzy ranges. This is just my thought/approach of developing this. Let me know your opinion. From this week, at work I had to shift focus from HBase to Hive and HCatalog for another POC. So, I'll be squeezing time for this JIRA out of work schedule. I'll start looking into the current implementation of FuzzyRowFilter to get idea about implementation. Thanks, Anil Gupta Software Engineer II, Intuit, Inc Implement FuzzyRowFilter with ranges support Key: HBASE-6618 URL: https://issues.apache.org/jira/browse/HBASE-6618 Project: HBase Issue Type: New Feature Components: filters Reporter: Alex Baranau Priority: Minor Apart from current ability to specify fuzzy row filter e.g. for userId_actionId format as _0004 (where 0004 - actionId) it would be great to also have ability to specify the fuzzy range , e.g. _0004, ..., _0099. See initial discussion here: http://search-hadoop.com/m/WVLJdX0Z65 Note: currently it is possible to provide multiple fuzzy row rules to existing FuzzyRowFilter, but in case when the range is big (contains thousands of values) it is not efficient. Filter should perform efficient fast-forwarding during the scan (this is what distinguishes it from regex row filter). While such functionality may seem like a proper fit for custom filter (i.e. not including into standard filter set) it looks like the filter may be very re-useable. We may judge based on the implementation that will hopefully be added. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6618) Implement FuzzyRowFilter with ranges support
[ https://issues.apache.org/jira/browse/HBASE-6618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13440039#comment-13440039 ] Zhihong Ted Yu commented on HBASE-6618: --- Thanks for the update, Alex. I get your idea, though a few arrows seem to be missing (e.g. CCF is ?) in the diagram for toInc. Implement FuzzyRowFilter with ranges support Key: HBASE-6618 URL: https://issues.apache.org/jira/browse/HBASE-6618 Project: HBase Issue Type: New Feature Components: filters Reporter: Alex Baranau Priority: Minor Attachments: HBASE-6618-algo-desc-bits.png, HBASE-6618-algo.patch Apart from current ability to specify fuzzy row filter e.g. for userId_actionId format as _0004 (where 0004 - actionId) it would be great to also have ability to specify the fuzzy range , e.g. _0004, ..., _0099. See initial discussion here: http://search-hadoop.com/m/WVLJdX0Z65 Note: currently it is possible to provide multiple fuzzy row rules to existing FuzzyRowFilter, but in case when the range is big (contains thousands of values) it is not efficient. Filter should perform efficient fast-forwarding during the scan (this is what distinguishes it from regex row filter). While such functionality may seem like a proper fit for custom filter (i.e. not including into standard filter set) it looks like the filter may be very re-useable. We may judge based on the implementation that will hopefully be added. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6618) Implement FuzzyRowFilter with ranges support
[ https://issues.apache.org/jira/browse/HBASE-6618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13438134#comment-13438134 ] Alex Baranau commented on HBASE-6618: - Just an idea. May be we should try improve existing FuzzyRowFilter by allowing to specify each fuzzy rule with: * fuzzy key start * fuzzy key end this is currently missing in FuzzyRowFilter * mask This looks flexible enough to me. E.g. one could specify rule (_0001_-_0099_)???(_001-_099), i.e. any 4 bytesany 6 bytes value between _0001_ and _0099_any 3 bytesany 4 bytes value between _001 and _099 with this definition: * _0001_???_001 * _0099_???_099 currently missing * 00111 In this case any sequence of fixed positions treated as one n-bytes value. -- Alternatively, such fuzzy rule can be specified as list of parts, each part being one of: * n fuzzy bytes * start/stop key part range (of the same length) This might be closer to human-readable definition, though the former one could be easier to deal with. Anil, as you expressed willing to work on this, what are your thoughts? May be you have smth different in your mind? Implement FuzzyRowFilter with ranges support Key: HBASE-6618 URL: https://issues.apache.org/jira/browse/HBASE-6618 Project: HBase Issue Type: New Feature Components: filters Reporter: Alex Baranau Priority: Minor Apart from current ability to specify fuzzy row filter e.g. for userId_actionId format as _0004 (where 0004 - actionId) it would be great to also have ability to specify the fuzzy range , e.g. _0004, ..., _0099. See initial discussion here: http://search-hadoop.com/m/WVLJdX0Z65 Note: currently it is possible to provide multiple fuzzy row rules to existing FuzzyRowFilter, but in case when the range is big (contains thousands of values) it is not efficient. Filter should perform efficient fast-forwarding during the scan (this is what distinguishes it from regex row filter). While such functionality may seem like a proper fit for custom filter (i.e. not including into standard filter set) it looks like the filter may be very re-useable. We may judge based on the implementation that will hopefully be added. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6618) Implement FuzzyRowFilter with ranges support
[ https://issues.apache.org/jira/browse/HBASE-6618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13438147#comment-13438147 ] Alex Baranau commented on HBASE-6618: - Sorry for the spam, for some reason I cannot edit the comment and JIRA broke formatting for the text pieces of my previous comment (I should have checked that first, sorry). This is how it supposed to look: Just an idea. May be we should try improve existing FuzzyRowFilter by allowing to specify each fuzzy rule with: * fuzzy key start * fuzzy key end this is currently missing in FuzzyRowFilter * mask This looks flexible enough to me. E.g. one could specify rule ?\?\??(0001 - 0999)???(001 - 099), i.e. any 4 bytesany 4 bytes value between 0001 and 0999any 3 bytesany 3 bytes value between 001 and 099 with this definition: * ?\?\??0001???001 * ?\?\??0999???099 currently missing * 111000 In this case any sequence of fixed positions treated as one n-bytes value. Alternatively, such fuzzy rule can be specified as list of parts, each part being one of: * n fuzzy bytes * start/stop key part range (of the same length) This might be closer to human-readable definition, though the former one could be easier to deal with. Anil, as you expressed willing to work on this, what are your thoughts? May be you have smth different in your mind? Implement FuzzyRowFilter with ranges support Key: HBASE-6618 URL: https://issues.apache.org/jira/browse/HBASE-6618 Project: HBase Issue Type: New Feature Components: filters Reporter: Alex Baranau Priority: Minor Apart from current ability to specify fuzzy row filter e.g. for userId_actionId format as _0004 (where 0004 - actionId) it would be great to also have ability to specify the fuzzy range , e.g. _0004, ..., _0099. See initial discussion here: http://search-hadoop.com/m/WVLJdX0Z65 Note: currently it is possible to provide multiple fuzzy row rules to existing FuzzyRowFilter, but in case when the range is big (contains thousands of values) it is not efficient. Filter should perform efficient fast-forwarding during the scan (this is what distinguishes it from regex row filter). While such functionality may seem like a proper fit for custom filter (i.e. not including into standard filter set) it looks like the filter may be very re-useable. We may judge based on the implementation that will hopefully be added. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira