[jira] [Updated] (HIVE-4642) Implement vectorized RLIKE and REGEXP filter expressions
[ https://issues.apache.org/jira/browse/HIVE-4642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-4642: --- Resolution: Fixed Fix Version/s: 0.13.0 Status: Resolved (was: Patch Available) Committed to trunk. Thanks, Teddy! Implement vectorized RLIKE and REGEXP filter expressions Key: HIVE-4642 URL: https://issues.apache.org/jira/browse/HIVE-4642 Project: Hive Issue Type: Sub-task Reporter: Eric Hanson Assignee: Teddy Choi Fix For: 0.13.0 Attachments: HIVE-4642-1.patch, HIVE-4642.2.patch, HIVE-4642.3.patch.txt, HIVE-4642.4.patch.txt, HIVE-4642.5.patch.txt, HIVE-4642.6.patch.txt, HIVE-4642.7.patch.txt, HIVE-4642.8.patch.txt, HIVE-4642.8-vectorization.patch, Hive-Vectorized-Query-Execution-Design-rev10.docx See title. I will add more details next week. The goal is (a) make this work correctly and (b) optimize it as well as possible, at least for the common cases. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HIVE-4642) Implement vectorized RLIKE and REGEXP filter expressions
[ https://issues.apache.org/jira/browse/HIVE-4642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi updated HIVE-4642: - Attachment: HIVE-4642.8.patch.txt Implement vectorized RLIKE and REGEXP filter expressions Key: HIVE-4642 URL: https://issues.apache.org/jira/browse/HIVE-4642 Project: Hive Issue Type: Sub-task Reporter: Eric Hanson Assignee: Teddy Choi Attachments: HIVE-4642-1.patch, HIVE-4642.2.patch, HIVE-4642.3.patch.txt, HIVE-4642.4.patch.txt, HIVE-4642.5.patch.txt, HIVE-4642.6.patch.txt, HIVE-4642.7.patch.txt, HIVE-4642.8.patch.txt, Hive-Vectorized-Query-Execution-Design-rev10.docx See title. I will add more details next week. The goal is (a) make this work correctly and (b) optimize it as well as possible, at least for the common cases. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4642) Implement vectorized RLIKE and REGEXP filter expressions
[ https://issues.apache.org/jira/browse/HIVE-4642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi updated HIVE-4642: - Attachment: HIVE-4642.8-vectorization.patch Implement vectorized RLIKE and REGEXP filter expressions Key: HIVE-4642 URL: https://issues.apache.org/jira/browse/HIVE-4642 Project: Hive Issue Type: Sub-task Reporter: Eric Hanson Assignee: Teddy Choi Attachments: HIVE-4642-1.patch, HIVE-4642.2.patch, HIVE-4642.3.patch.txt, HIVE-4642.4.patch.txt, HIVE-4642.5.patch.txt, HIVE-4642.6.patch.txt, HIVE-4642.7.patch.txt, HIVE-4642.8.patch.txt, HIVE-4642.8-vectorization.patch, Hive-Vectorized-Query-Execution-Design-rev10.docx See title. I will add more details next week. The goal is (a) make this work correctly and (b) optimize it as well as possible, at least for the common cases.
[jira] [Updated] (HIVE-4642) Implement vectorized RLIKE and REGEXP filter expressions
[ https://issues.apache.org/jira/browse/HIVE-4642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi updated HIVE-4642: - Attachment: HIVE-4642.7.patch.txt

I have attached a rebased version of the last patch. The problem was that plan serialization does not use setter/getter methods, so the checker member variable was never assigned after deserialization. It is now assigned in the evaluate() method. The patch passes the tests without any misleading errors. I hope this will be the last patch. :P

Implement vectorized RLIKE and REGEXP filter expressions Key: HIVE-4642 URL: https://issues.apache.org/jira/browse/HIVE-4642 Project: Hive Issue Type: Sub-task Reporter: Eric Hanson Assignee: Teddy Choi Attachments: HIVE-4642-1.patch, HIVE-4642.2.patch, HIVE-4642.3.patch.txt, HIVE-4642.4.patch.txt, HIVE-4642.5.patch.txt, HIVE-4642.6.patch.txt, HIVE-4642.7.patch.txt, Hive-Vectorized-Query-Execution-Design-rev10.docx See title. I will add more details next week. The goal is (a) make this work correctly and (b) optimize it as well as possible, at least for the common cases.
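The fix described in this comment can be illustrated with a minimal sketch. It is not the code from the patch: the class name, the nested Checker interface, and the use of java.io object serialization as a stand-in for Hive's plan serialization are all assumptions made for illustration. The point it shows is that a field which the serializer does not restore has to be rebuilt lazily on first use in evaluate().

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.regex.Pattern;

/** Hypothetical filter; only the lazy-initialization idea comes from the thread. */
final class LazyCheckerFilterSketch implements Serializable {
    private static final long serialVersionUID = 1L;

    /** Hypothetical stand-in for the "checker" member mentioned in the comment. */
    interface Checker {
        boolean check(String value);
    }

    private final String pattern;     // serialized with the plan
    private transient Checker checker; // NOT serialized: null after deserialization

    LazyCheckerFilterSketch(String pattern) {
        this.pattern = pattern;
    }

    /** Rebuilds the checker on first use instead of relying on a setter being called. */
    boolean evaluate(String value) {
        if (checker == null) {
            Pattern compiled = Pattern.compile(pattern);
            // find() gives partial-match semantics; chosen here as an assumption.
            checker = v -> compiled.matcher(v).find();
        }
        return checker.check(value);
    }

    /** Serializes and deserializes the filter, bypassing constructors and setters. */
    static LazyCheckerFilterSketch roundTrip(LazyCheckerFilterSketch filter) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(filter);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (LazyCheckerFilterSketch) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

In this sketch a deserialized copy starts with a null checker and still evaluates correctly, because evaluate() recreates it from the serialized pattern string.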
[jira] [Updated] (HIVE-4642) Implement vectorized RLIKE and REGEXP filter expressions
[ https://issues.apache.org/jira/browse/HIVE-4642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi updated HIVE-4642: - Status: Patch Available (was: Open) Implement vectorized RLIKE and REGEXP filter expressions Key: HIVE-4642 URL: https://issues.apache.org/jira/browse/HIVE-4642 Project: Hive Issue Type: Sub-task Reporter: Eric Hanson Assignee: Teddy Choi Attachments: HIVE-4642-1.patch, HIVE-4642.2.patch, HIVE-4642.3.patch.txt, HIVE-4642.4.patch.txt, HIVE-4642.5.patch.txt, HIVE-4642.6.patch.txt, Hive-Vectorized-Query-Execution-Design-rev10.docx See title. I will add more details next week. The goal is (a) make this work correctly and (b) optimize it as well as possible, at least for the common cases.
[jira] [Updated] (HIVE-4642) Implement vectorized RLIKE and REGEXP filter expressions
[ https://issues.apache.org/jira/browse/HIVE-4642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi updated HIVE-4642: - Attachment: HIVE-4642.6.patch.txt

Added support for serialization.

Implement vectorized RLIKE and REGEXP filter expressions Key: HIVE-4642 URL: https://issues.apache.org/jira/browse/HIVE-4642 Project: Hive Issue Type: Sub-task Reporter: Eric Hanson Assignee: Teddy Choi Attachments: HIVE-4642-1.patch, HIVE-4642.2.patch, HIVE-4642.3.patch.txt, HIVE-4642.4.patch.txt, HIVE-4642.5.patch.txt, HIVE-4642.6.patch.txt, Hive-Vectorized-Query-Execution-Design-rev10.docx See title. I will add more details next week. The goal is (a) make this work correctly and (b) optimize it as well as possible, at least for the common cases.
[jira] [Updated] (HIVE-4642) Implement vectorized RLIKE and REGEXP filter expressions
[ https://issues.apache.org/jira/browse/HIVE-4642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi updated HIVE-4642: - Status: Open (was: Patch Available) Implement vectorized RLIKE and REGEXP filter expressions Key: HIVE-4642 URL: https://issues.apache.org/jira/browse/HIVE-4642 Project: Hive Issue Type: Sub-task Reporter: Eric Hanson Assignee: Teddy Choi Attachments: HIVE-4642-1.patch, HIVE-4642.2.patch, HIVE-4642.3.patch.txt, HIVE-4642.4.patch.txt, HIVE-4642.5.patch.txt, Hive-Vectorized-Query-Execution-Design-rev10.docx See title. I will add more details next week. The goal is (a) make this work correctly and (b) optimize it as well as possible, at least for the common cases.
[jira] [Updated] (HIVE-4642) Implement vectorized RLIKE and REGEXP filter expressions
[ https://issues.apache.org/jira/browse/HIVE-4642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi updated HIVE-4642: - Attachment: HIVE-4642.5.patch.txt

I uploaded the 4th patch with incorrect contents. This 5th patch corrects it.

Implement vectorized RLIKE and REGEXP filter expressions Key: HIVE-4642 URL: https://issues.apache.org/jira/browse/HIVE-4642 Project: Hive Issue Type: Sub-task Reporter: Eric Hanson Assignee: Teddy Choi Attachments: HIVE-4642-1.patch, HIVE-4642.2.patch, HIVE-4642.3.patch.txt, HIVE-4642.4.patch.txt, HIVE-4642.5.patch.txt, Hive-Vectorized-Query-Execution-Design-rev10.docx See title. I will add more details next week. The goal is (a) make this work correctly and (b) optimize it as well as possible, at least for the common cases.
[jira] [Updated] (HIVE-4642) Implement vectorized RLIKE and REGEXP filter expressions
[ https://issues.apache.org/jira/browse/HIVE-4642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi updated HIVE-4642: - Attachment: HIVE-4642.4.patch.txt

The 4th patch contains the following changes.
- Added code to AbstractFilterStringColLikeStringScalar.java to evaluate child expressions.
- Removed misleading comments.

Implement vectorized RLIKE and REGEXP filter expressions Key: HIVE-4642 URL: https://issues.apache.org/jira/browse/HIVE-4642 Project: Hive Issue Type: Sub-task Reporter: Eric Hanson Assignee: Teddy Choi Attachments: HIVE-4642-1.patch, HIVE-4642.2.patch, HIVE-4642.3.patch.txt, HIVE-4642.4.patch.txt, Hive-Vectorized-Query-Execution-Design-rev10.docx See title. I will add more details next week. The goal is (a) make this work correctly and (b) optimize it as well as possible, at least for the common cases.
[jira] [Updated] (HIVE-4642) Implement vectorized RLIKE and REGEXP filter expressions
[ https://issues.apache.org/jira/browse/HIVE-4642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi updated HIVE-4642: - Attachment: Hive-Vectorized-Query-Execution-Design-rev10.docx

I wrote the "LIKE and REGEXP expressions" section under the Filter operator in the design document. The added text follows.

{quote}
Filter condition expressions

LIKE and REGEXP expressions:

LIKE and REGEXP expressions find strings that fit a pattern. They compile the pattern on creation and match strings on evaluation. Both kinds of expression use the Java regular expression package. REGEXP expressions use the package as is. LIKE expressions have a different grammar, so they need conversion: “%” is converted to “.*” and “_” is converted to “.”.

The AbstractFilterStringColLikeStringScalar class defines the common behavior. The FilterStringColLikeStringScalar and FilterStringColRegExpStringScalar classes implement the differences.

Some simple patterns are used frequently, such as prefix match, suffix match, middle match, exact match, and phone numbers. There are optimized implementations for them. They evaluate on byte arrays directly to avoid the cost of UTF-8 decoding.
{quote}

This file was edited in Word for Mac 2011, so it may have incompatibilities.

Implement vectorized RLIKE and REGEXP filter expressions Key: HIVE-4642 URL: https://issues.apache.org/jira/browse/HIVE-4642 Project: Hive Issue Type: Sub-task Reporter: Eric Hanson Assignee: Teddy Choi Attachments: HIVE-4642-1.patch, HIVE-4642.2.patch, HIVE-4642.3.patch.txt, Hive-Vectorized-Query-Execution-Design-rev10.docx See title. I will add more details next week. The goal is (a) make this work correctly and (b) optimize it as well as possible, at least for the common cases.
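The LIKE-to-regex conversion described in the quoted design text can be sketched as follows. This is an illustration under stated assumptions, not the code from the patch: the class and method names are hypothetical, LIKE escape characters are not handled, and quoting literal runs with Pattern.quote (so that characters such as "." in the LIKE pattern stay literal) is an addition the quoted text does not mention.

```java
import java.util.regex.Pattern;

/** Hypothetical helper illustrating the "%" -> ".*" and "_" -> "." conversion. */
final class LikeToRegexSketch {
    private LikeToRegexSketch() {}

    /** Converts a LIKE pattern to a Java regular expression string. */
    static String likeToRegex(String like) {
        StringBuilder regex = new StringBuilder();
        StringBuilder literal = new StringBuilder();
        for (int i = 0; i < like.length(); i++) {
            char c = like.charAt(i);
            if (c == '%' || c == '_') {
                flushLiteral(literal, regex);
                regex.append(c == '%' ? ".*" : ".");
            } else {
                literal.append(c);
            }
        }
        flushLiteral(literal, regex);
        return regex.toString();
    }

    /** Appends the pending literal run in quoted form (assumption: no LIKE escapes). */
    private static void flushLiteral(StringBuilder literal, StringBuilder regex) {
        if (literal.length() > 0) {
            regex.append(Pattern.quote(literal.toString()));
            literal.setLength(0);
        }
    }

    /** Whole-string match, with DOTALL so wildcards also match line terminators. */
    static boolean like(String value, String likePattern) {
        return Pattern.compile(likeToRegex(likePattern), Pattern.DOTALL)
                      .matcher(value)
                      .matches();
    }
}
```

A real expression would compile the pattern once on creation, as the design text says, rather than on every call as the like() convenience method does here.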
[jira] [Updated] (HIVE-4642) Implement vectorized RLIKE and REGEXP filter expressions
[ https://issues.apache.org/jira/browse/HIVE-4642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi updated HIVE-4642: - Attachment: HIVE-4642.3.patch.txt Implement vectorized RLIKE and REGEXP filter expressions Key: HIVE-4642 URL: https://issues.apache.org/jira/browse/HIVE-4642 Project: Hive Issue Type: Sub-task Reporter: Eric Hanson Assignee: Teddy Choi Attachments: HIVE-4642-1.patch, HIVE-4642.2.patch, HIVE-4642.3.patch.txt See title. I will add more details next week. The goal is (a) make this work correctly and (b) optimize it as well as possible, at least for the common cases.
[jira] [Updated] (HIVE-4642) Implement vectorized RLIKE and REGEXP filter expressions
[ https://issues.apache.org/jira/browse/HIVE-4642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi updated HIVE-4642: - Status: Patch Available (was: In Progress) Implement vectorized RLIKE and REGEXP filter expressions Key: HIVE-4642 URL: https://issues.apache.org/jira/browse/HIVE-4642 Project: Hive Issue Type: Sub-task Reporter: Eric Hanson Assignee: Teddy Choi Attachments: HIVE-4642-1.patch, HIVE-4642.2.patch, HIVE-4642.3.patch.txt See title. I will add more details next week. The goal is (a) make this work correctly and (b) optimize it as well as possible, at least for the common cases.
[jira] [Updated] (HIVE-4642) Implement vectorized RLIKE and REGEXP filter expressions
[ https://issues.apache.org/jira/browse/HIVE-4642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi updated HIVE-4642: - Attachment: HIVE-4642.2.patch

After HIVE-4548 was applied, the previous patch no longer applies to the vectorization branch, because both patches change FilterStringColLikeStringScalar. This patch applies to the vectorization branch.

Implement vectorized RLIKE and REGEXP filter expressions Key: HIVE-4642 URL: https://issues.apache.org/jira/browse/HIVE-4642 Project: Hive Issue Type: Sub-task Reporter: Eric Hanson Assignee: Teddy Choi Attachments: HIVE-4642-1.patch, HIVE-4642.2.patch See title. I will add more details next week. The goal is (a) make this work correctly and (b) optimize it as well as possible, at least for the common cases.
[jira] [Updated] (HIVE-4642) Implement vectorized RLIKE and REGEXP filter expressions
[ https://issues.apache.org/jira/browse/HIVE-4642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi updated HIVE-4642: - Attachment: HIVE-4642-1.patch

I wrote draft code. It needs more comments, tests, and refactoring. I agree that FA (finite automaton) generation would be a heavy job, so I didn't implement it. Common phone number patterns are covered by a simple fixed automaton. I will add more simple automata. There are already hard-coded decisions, and more will come, so I introduced an interface that generalizes them. It may reduce performance a little.

Class hierarchy:

AbstractFilterStringColLikeStringScalar
  + FilterStringColLikeStringScalar
  + FilterStringColRegExpStringScalar

AbstractFilter...#Checker
  + AbstractFilter...#BeginChecker
  + AbstractFilter...#EndChecker
  + AbstractFilter...#MiddleChecker
  + AbstractFilter...#NoneChecker
  + AbstractFilter...#AnyCharChecker
  + AbstractFilter...#ComplexChecker
  + FilterStringColRegExpStringScalar#PhoneNumberChecker

AbstractFilter...#CheckerFactory
  + Filter...Like...#LikeBeginCheckerFactory
  + Filter...Like...#LikeEndCheckerFactory
  + Filter...Like...#LikeMiddleCheckerFactory
  + Filter...Like...#LikeNoneCheckerFactory
  + Filter...Like...#LikeAnyCharCheckerFactory
  + Filter...Like...#LikeComplexCheckerFactory
  + Filter...RegExp...#RegExpBeginCheckerFactory
  + Filter...RegExp...#RegExpEndCheckerFactory
  + Filter...RegExp...#RegExpMiddleCheckerFactory
  + Filter...RegExp...#RegExpNoneCheckerFactory
  + Filter...RegExp...#RegExpAnyCharCheckerFactory
  + Filter...RegExp...#RegExpComplexCheckerFactory
  + Filter...RegExp...#RegExpPhoneNumberCheckerFactory

Implement vectorized RLIKE and REGEXP filter expressions Key: HIVE-4642 URL: https://issues.apache.org/jira/browse/HIVE-4642 Project: Hive Issue Type: Sub-task Reporter: Eric Hanson Assignee: Teddy Choi Attachments: HIVE-4642-1.patch See title. I will add more details next week. The goal is (a) make this work correctly and (b) optimize it as well as possible, at least for the common cases.
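The checker idea in this comment, one interface with cheap byte-level implementations for simple patterns and a regex fallback for everything else, might look roughly like the sketch below. The names Checker, BeginChecker, EndChecker and ComplexChecker echo the class hierarchy in the comment, but every method signature and body here is an assumption, and the single forLike() selection method replaces the per-pattern factory classes purely for brevity.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

/** Hypothetical sketch; names echo the thread, bodies are assumptions. */
final class CheckerSketch {
    private CheckerSketch() {}

    /** Decides whether the bytes in [start, start + length) match. */
    interface Checker {
        boolean check(byte[] bytes, int start, int length);
    }

    /** Prefix match ("abc%") compared directly on UTF-8 bytes, no decoding. */
    static final class BeginChecker implements Checker {
        private final byte[] prefix;
        BeginChecker(String prefix) { this.prefix = prefix.getBytes(StandardCharsets.UTF_8); }
        @Override public boolean check(byte[] bytes, int start, int length) {
            return length >= prefix.length && regionEquals(bytes, start, prefix);
        }
    }

    /** Suffix match ("%abc") compared directly on UTF-8 bytes. */
    static final class EndChecker implements Checker {
        private final byte[] suffix;
        EndChecker(String suffix) { this.suffix = suffix.getBytes(StandardCharsets.UTF_8); }
        @Override public boolean check(byte[] bytes, int start, int length) {
            return length >= suffix.length
                && regionEquals(bytes, start + length - suffix.length, suffix);
        }
    }

    /** Fallback: decode to a String and use java.util.regex. */
    static final class ComplexChecker implements Checker {
        private final Pattern pattern;
        ComplexChecker(String regex) { this.pattern = Pattern.compile(regex, Pattern.DOTALL); }
        @Override public boolean check(byte[] bytes, int start, int length) {
            String s = new String(bytes, start, length, StandardCharsets.UTF_8);
            return pattern.matcher(s).matches();
        }
    }

    /** Picks the cheapest checker that can handle the LIKE pattern. */
    static Checker forLike(String like) {
        if (like.endsWith("%") && isLiteral(like.substring(0, like.length() - 1))) {
            return new BeginChecker(like.substring(0, like.length() - 1));
        }
        if (like.startsWith("%") && isLiteral(like.substring(1))) {
            return new EndChecker(like.substring(1));
        }
        return new ComplexChecker(likeToRegex(like));
    }

    /** Convenience wrapper for trying a checker on a whole string. */
    static boolean matches(Checker checker, String value) {
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        return checker.check(bytes, 0, bytes.length);
    }

    private static boolean isLiteral(String s) {
        return !s.isEmpty() && s.indexOf('%') < 0 && s.indexOf('_') < 0;
    }

    private static boolean regionEquals(byte[] bytes, int offset, byte[] expected) {
        for (int i = 0; i < expected.length; i++) {
            if (bytes[offset + i] != expected[i]) return false;
        }
        return true;
    }

    /** "%" -> ".*", "_" -> ".", literal runs quoted (LIKE escapes not handled). */
    private static String likeToRegex(String like) {
        StringBuilder regex = new StringBuilder();
        StringBuilder literal = new StringBuilder();
        for (int i = 0; i <= like.length(); i++) {
            char c = i < like.length() ? like.charAt(i) : '%'; // sentinel flushes the tail
            if (c == '%' || c == '_') {
                if (literal.length() > 0) {
                    regex.append(Pattern.quote(literal.toString()));
                    literal.setLength(0);
                }
                if (i < like.length()) regex.append(c == '%' ? ".*" : ".");
            } else {
                literal.append(c);
            }
        }
        return regex.toString();
    }
}
```

Comparing raw bytes works for prefix and suffix patterns because a valid UTF-8 string starts or ends with another valid UTF-8 string exactly when its byte sequence does. The interface call adds one level of indirection per row, which is the small performance cost the comment anticipates.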