[jira] [Commented] (LUCENE-4947) Java implementation (and improvement) of Levenshtein associated lexicon automata
[ https://issues.apache.org/jira/browse/LUCENE-4947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13647554#comment-13647554 ]

Kevin Lawson commented on LUCENE-4947:

I had some free time yesterday and updated LevenshteinAutomaton to include transpositions in its edit distance determination methods. The included tests have also been updated to accommodate the modifications and ensure correct functionality.

How do I go about submitting the updated code? Push an update to my github? Simply attach the new archive here? I'm not sure where the code donation/acceptance process is at now, so I'm unsure of how to do this.

Java implementation (and improvement) of Levenshtein associated lexicon automata
--
Key: LUCENE-4947
URL: https://issues.apache.org/jira/browse/LUCENE-4947
Project: Lucene - Core
Issue Type: Improvement
Affects Versions: 4.0-ALPHA, 4.0-BETA, 4.0, 4.1, 4.2, 4.2.1
Reporter: Kevin Lawson
Attachments: LevenshteinAutomaton-master.zip, MDAG-master.zip

I was encouraged by Mike McCandless to open an issue concerning this after I contacted him privately about it. Thanks Mike!

I'd like to submit my Java implementation of the Levenshtein Automaton as a homogenous replacement for the current heterogenous, multi-component implementation in Lucene.

Benefits of upgrading include
- Reduced code complexity
- Better performance from components that were previously implemented in Python
- Support for on-the-fly dictionary-automaton manipulation (if you wish to use my dictionary-automaton implementation)

The code for all the components is well structured, easy to follow, and extensively commented. It has also been fully tested for correct functionality and performance.

The Levenshtein automaton implementation (along with the required MDAG reference) can be found in my LevenshteinAutomaton Java library here: https://github.com/klawson88/LevenshteinAutomaton.

The minimalistic directed acyclic graph (MDAG) which the automaton code uses to store and step through word sets can be found here: https://github.com/klawson88/MDAG

*Transpositions aren't currently implemented. I hope the comment-filled, editing-friendly code, combined with the fact that the section in the Mihov paper detailing transpositions is only 2 pages, makes adding the functionality trivial.

*As a result of support for on-the-fly manipulation, the MDAG (dictionary-automaton) creation process incurs a slight speed penalty. In order to have the best of both worlds, I'd recommend the addition of a constructor which only takes sorted input. (The complete, easy-to-follow pseudo-code for the simple procedure can be found in the first article I linked under the references section in the MDAG repository.)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
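For readers unfamiliar with what "including transpositions" changes in an edit distance, here is a minimal sketch. It is not Kevin's automaton code and not the construction from the Mihov paper; it is the plain dynamic-programming "optimal string alignment" distance, in which swapping two adjacent characters costs one edit instead of two. The class and method names are hypothetical and chosen only for illustration.

```java
/**
 * Illustrative sketch only: a table-based edit distance that counts an
 * adjacent transposition as a single edit (optimal string alignment).
 * This is NOT the Levenshtein automaton from the donated library.
 */
final class TranspositionDistanceSketch {

    /** Returns the number of insertions, deletions, substitutions and adjacent swaps needed. */
    static int distance(String a, String b) {
        int n = a.length();
        int m = b.length();
        int[][] d = new int[n + 1][m + 1];

        // Turning a prefix into the empty string costs one deletion per character.
        for (int i = 0; i <= n; i++) {
            d[i][0] = i;
        }
        // Building a prefix from the empty string costs one insertion per character.
        for (int j = 0; j <= m; j++) {
            d[0][j] = j;
        }

        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                int substitution = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int best = Math.min(
                        Math.min(d[i - 1][j] + 1,        // deletion
                                 d[i][j - 1] + 1),       // insertion
                        d[i - 1][j - 1] + substitution); // match or substitution

                // The extra case added by transposition support:
                // the last two characters of each prefix are swapped.
                if (i > 1 && j > 1
                        && a.charAt(i - 1) == b.charAt(j - 2)
                        && a.charAt(i - 2) == b.charAt(j - 1)) {
                    best = Math.min(best, d[i - 2][j - 2] + 1);
                }
                d[i][j] = best;
            }
        }
        return d[n][m];
    }

    public static void main(String[] args) {
        // "lucnee" differs from "lucene" by one adjacent swap.
        System.out.println(distance("lucene", "lucnee"));
    }
}
```

Without the transposition branch, the same pair would need two edits, which is why a fuzzy search with a maximum distance of 1 behaves differently once transpositions are supported.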
[jira] [Commented] (LUCENE-4947) Java implementation (and improvement) of Levenshtein associated lexicon automata
[ https://issues.apache.org/jira/browse/LUCENE-4947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13643668#comment-13643668 ]

Steve Rowe commented on LUCENE-4947:

The Lucene PMC received notification today from the Apache Secretary that Kevin's code grant and ICLA paperwork have been received and recorded.

Now that we have Kevin's code grant and ICLA, we can start verifying headers/licensing. Since it's not clear where this software will go, I don't think it makes sense to create a branch yet. People doing the header/license verification/modification can just attach modified tarballs to this issue if necessary.
[jira] [Commented] (LUCENE-4947) Java implementation (and improvement) of Levenshtein associated lexicon automata
[ https://issues.apache.org/jira/browse/LUCENE-4947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13642918#comment-13642918 ]

Kevin Lawson commented on LUCENE-4947:

bq. Kevin Lawson, did you send the code grant to legal-arch...@apache.org in addition to sending it to secret...@apache.org? This is mentioned as a requirement in step 3 of the process section in http://incubator.apache.org/ip-clearance/ip-clearance-template.html.

Ah, I definitely overlooked that. Sent! Was done a couple of hours ago, but I figured I'd notify you here.
[jira] [Commented] (LUCENE-4947) Java implementation (and improvement) of Levenshtein associated lexicon automata
[ https://issues.apache.org/jira/browse/LUCENE-4947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13642945#comment-13642945 ]

Steve Rowe commented on LUCENE-4947:

bq. Ah, I definitely overlooked that. Sent! Was done a couple of hours ago, but I figured I'd notify you here.

[~klawson88], I'm concerned that the Lucene PMC has not yet received notification from the Apache Secretary of either your code grant or your ICLA. We got notification earlier today for SooMyung Lee's paperwork (see LUCENE-4956), which was sent to secret...@apache.org after yours. Is it possible that you sent the paperwork to the wrong address, or that there was some other mail snafu?
[jira] [Commented] (LUCENE-4947) Java implementation (and improvement) of Levenshtein associated lexicon automata
[ https://issues.apache.org/jira/browse/LUCENE-4947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13643001#comment-13643001 ]

Kevin Lawson commented on LUCENE-4947:

The address I sent it to is indeed correct (secret...@apache.org). Maybe it's something on their end (spam filter)? I've just re-sent it.
[jira] [Commented] (LUCENE-4947) Java implementation (and improvement) of Levenshtein associated lexicon automata
[ https://issues.apache.org/jira/browse/LUCENE-4947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13642161#comment-13642161 ]

Steve Rowe commented on LUCENE-4947:

bq. Just updating the thread to notify everyone that I've just e-mailed the ICA and code grant documents (and their GPG-related files) to secret...@apache.org.

I monitor commits to the ICLA and code grants record files, and neither the ICLA nor the code grant document has been recorded yet. I'll post on this issue once the code grant has been recorded.

[~klawson88], did you send the code grant to legal-arch...@apache.org in addition to sending it to secret...@apache.org? This is mentioned as a requirement in step 3 of the process section in [http://incubator.apache.org/ip-clearance/ip-clearance-template.html].
[jira] [Commented] (LUCENE-4947) Java implementation (and improvement) of Levenshtein associated lexicon automata
[ https://issues.apache.org/jira/browse/LUCENE-4947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13640499#comment-13640499 ]

Steve Rowe commented on LUCENE-4947:

bq. I assumed you guys would simply grab the tarballs from the GitHub links I posted.

Okay, cool, I think that will work.

Do you intend for these two projects to live on after they've been incorporated into Lucene? If so, then I'll fork them on Github and start making license header changes - ALv2; Kevin, do you give your consent for me to change the license headers in all files to point to ALv2?

If you don't intend for these two projects to continue life separately from Lucene, then I think it will make sense for you to do the license changes in-place yourself, Kevin. Alternatively, you could grant write access to someone else to do the work. Please let us know.

I have started the IP clearance form. It's online now at [http://incubator.apache.org/ip-clearance/lucene-levenshtein-automaton-mdag.html].
[jira] [Commented] (LUCENE-4947) Java implementation (and improvement) of Levenshtein associated lexicon automata
[ https://issues.apache.org/jira/browse/LUCENE-4947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13640551#comment-13640551 ]

Christian Moen commented on LUCENE-4947:

Kevin, I think it's best that you do the license change yourself and that we don't have any active role in making the change since you are the only person entitled to make the change.

This change can be done by using the below header on all the source code and other relevant text files:

{noformat}
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
{noformat}

After this has been done, please make a tarball and attach it to this JIRA and indicate that this is the code you wish to grant and also inform us about the MD5 hash of the tarball. (This will go into the IP-clearance document and will be used to identify the codebase.)

It's a good idea to also use this MD5 hash as part of Exhibit A in the [software-grant.txt|http://www.apache.org/licenses/software-grant.txt] agreement unless you have signed and submitted this already. (If you donate the code yourself by attaching it to the JIRA as described above, I believe the hashes not being part of Exhibit A is acceptable.)

Please feel free to add your comments, Steve.
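The MD5 hash requested above can be produced with any standard checksum tool. As a rough sketch in Java, the language of the donated code, the following uses the JDK's `java.security.MessageDigest`. The class name, method names, and command-line usage are assumptions made for illustration; nothing here is prescribed by the IP clearance process.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Illustrative sketch only: prints the MD5 hash of a file such as a donation tarball. */
final class TarballDigestSketch {

    /** Returns the MD5 digest of the given bytes as lowercase hexadecimal. */
    static String md5Hex(byte[] data) {
        MessageDigest md5;
        try {
            md5 = MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform implementation is required to provide MD5.
            throw new IllegalStateException("MD5 is not available", e);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md5.digest(data)) {
            // Mask to an unsigned value so each byte prints as exactly two hex digits.
            hex.append(String.format("%02x", b & 0xff));
        }
        return hex.toString();
    }

    /** Reads the whole file into memory, which is fine for small source archives. */
    static String md5HexOfFile(String path) throws IOException {
        return md5Hex(Files.readAllBytes(Paths.get(path)));
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java TarballDigestSketch <path-to-archive>");
            return;
        }
        System.out.println(md5HexOfFile(args[0]));
    }
}
```

Whoever verifies the donation can run the same computation on the attached archive and compare the result with the value recorded in the IP clearance document.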
[jira] [Commented] (LUCENE-4947) Java implementation (and improvement) of Levenshtein associated lexicon automata
[ https://issues.apache.org/jira/browse/LUCENE-4947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13640586#comment-13640586 ]

Steve Rowe commented on LUCENE-4947:

bq. I think it's best that you do the license change yourself and that we don't have any active role in making the change since you are the only person entitled to make the change.

+1

bq. After this has been done, please make a tarball and attach it to this JIRA and indicate that this is the code you wish to grant and also inform us about the MD5 hash of the tarball. (This will go into the IP-clearance document and will be used to identify the codebase.)

I and Dawid had been advocating using Github for this, but I agree with Christian: a tarball attached to this issue by you, [~klawson88], will remove all ambiguity about what is being donated and by whom. Also, Github is not under ASF control, and in the future if that business goes under, the ASF will lose the history of this donation.
[jira] [Commented] (LUCENE-4947) Java implementation (and improvement) of Levenshtein associated lexicon automata
[ https://issues.apache.org/jira/browse/LUCENE-4947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13639089#comment-13639089 ]

Kevin Lawson commented on LUCENE-4947:

Just updating the thread to notify everyone that I've just e-mailed the ICA and code grant documents (and their GPG-related files) to secret...@apache.org.

Is there anything else that needs to be done on my part?
[jira] [Commented] (LUCENE-4947) Java implementation (and improvement) of Levenshtein associated lexicon automata
[ https://issues.apache.org/jira/browse/LUCENE-4947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13639486#comment-13639486 ] Steve Rowe commented on LUCENE-4947: bq. Is there anything else that needs to be done on my part? Hi [~klawson88], What did you send to secret...@apache.org? We need to know exactly what it is you're donating, so that we can start the vetting and header modification process (assuming you have not yet done any header modification yourself). This is generally done by transferring a compressed tarball to a public place, to provide access to those who want to inspect it and to those who will do the header modification. But maybe this could be done via Git hash(es)? I looked at a bunch of the existing examples of this process at http://incubator.apache.org/ip-clearance/, and all of them go the compressed tarball route. Steve
[jira] [Commented] (LUCENE-4947) Java implementation (and improvement) of Levenshtein associated lexicon automata
[ https://issues.apache.org/jira/browse/LUCENE-4947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13639492#comment-13639492 ] Dawid Weiss commented on LUCENE-4947: - Would it be possible to change the licensing directly on github and then make an import from a git revision? It's traceable after all (with parent etc) and it would make the process simpler I think (?).
[jira] [Commented] (LUCENE-4947) Java implementation (and improvement) of Levenshtein associated lexicon automata
[ https://issues.apache.org/jira/browse/LUCENE-4947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13639533#comment-13639533 ] Steve Rowe commented on LUCENE-4947: bq. Would it be possible to change the licensing directly on github and then make an import from a git revision? It's traceable after all (with parent etc) and it would make the process simpler I think (?). I think it's possible. In his paperwork Kevin would have had to refer to the Git revision, but he wrote that he sent GPG-related files, which doesn't seem compatible to me: detached signatures will AFAIK have been generated against a local copy, not the Git repository, and so may not be reproducible by third parties (e.g. are Git checkouts bit-for-bit identical on Linux vs. Windows?). I think the intent of the signature is just integrity/identity: a way to refer to the exact bits being donated. Git hash(es) should serve the same function.
[jira] [Commented] (LUCENE-4947) Java implementation (and improvement) of Levenshtein associated lexicon automata
[ https://issues.apache.org/jira/browse/LUCENE-4947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13639550#comment-13639550 ] Dawid Weiss commented on LUCENE-4947: - Git checkouts may not be bit-for-bit identical (they may vary if you have custom EOL treatment in .gitattributes, for example) but a git revision hash uniquely identifies a revision in a repository (and this, to my understanding, cannot be easily forged since it includes all parent revisions). So if you git clone/export a given hash then it's essentially the same as if somebody sent you a tarball with file checksums? I don't know anything about the legal aspects (if it makes a difference whether you're pulling from an ASL-licensed copy compared to the author taking the initiative).
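To make the point about Git hashes concrete, here is a minimal sketch of how Git derives the identifier of a file's content (a "blob"), assuming the default SHA-1 object format: the hash is computed over a small header (`blob <length>` followed by a NUL byte) plus the stored bytes. The class and method names are made up for illustration, and this covers blobs only, not trees or commits.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical illustration only; assumes a repository using SHA-1 object names.
final class GitBlobHashSketch {

    // Returns the hex object id Git assigns to a blob with the given content.
    static String blobHash(byte[] content) {
        MessageDigest sha1;
        try {
            sha1 = MessageDigest.getInstance("SHA-1");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 not available", e);
        }
        // Header: object type, a space, the content length in decimal, then a NUL byte.
        sha1.update(("blob " + content.length + "\0").getBytes(StandardCharsets.US_ASCII));
        sha1.update(content);

        StringBuilder hex = new StringBuilder();
        for (byte b : sha1.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        // The empty blob has a well-known id: e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
        System.out.println(blobHash(new byte[0]));
        // The same text with different line endings is different content, hence a different id.
        System.out.println(blobHash("hello\n".getBytes(StandardCharsets.UTF_8)));
        System.out.println(blobHash("hello\r\n".getBytes(StandardCharsets.UTF_8)));
    }
}
```

The identifier depends only on the bytes stored in the repository, so EOL conversion applied at checkout time changes the working copy but not the hash. A commit id is computed in a similar way over the commit object, which names its tree and its parent commits, which is why a revision hash pins down the whole history behind it.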
[jira] [Commented] (LUCENE-4947) Java implementation (and improvement) of Levenshtein associated lexicon automata
[ https://issues.apache.org/jira/browse/LUCENE-4947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13639571#comment-13639571 ] Steve Rowe commented on LUCENE-4947: bq. Git checkouts may not be bit-for-bit identical Right, I mentioned this because Kevin wrote that he had sent GPG-related files to secretary@a.o, by which I assume he means detached signature(s), and if a checkout is not bit-for-bit identical, we will have to ignore those signatures, since they won't be reproducible, and that may have procedural implications: can't move beyond the step where you have to verify the signature. Of course, if the Git hash(es) are sufficient (and I agree with you, Dawid, that they seem to be), then it should be fine to ignore the signature(s) Kevin sent.
[jira] [Commented] (LUCENE-4947) Java implementation (and improvement) of Levenshtein associated lexicon automata
[ https://issues.apache.org/jira/browse/LUCENE-4947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13639702#comment-13639702 ] Kevin Lawson commented on LUCENE-4947: -- Sorry for the ambiguity. When I wrote "and their GPG-related files", I meant the signatures for the ICLA and software grant documents. The documents made no mention of signing the tarballs to be donated (the ip-clearance form only mentions it conditionally in step 5). I assumed you guys would simply grab the tarballs from the GitHub links I posted.
[jira] [Commented] (LUCENE-4947) Java implementation (and improvement) of Levenshtein associated lexicon automata
[ https://issues.apache.org/jira/browse/LUCENE-4947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13637797#comment-13637797 ] Dawid Weiss commented on LUCENE-4947: - Cool but this may be a showstopper: MDAG is licensed under the GNU General Public License (version 3).
[jira] [Commented] (LUCENE-4947) Java implementation (and improvement) of Levenshtein associated lexicon automata
[ https://issues.apache.org/jira/browse/LUCENE-4947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13637943#comment-13637943 ] Christian Moen commented on LUCENE-4947: Thanks a lot for wishing to submit code! It's not possible to include your code in Lucene if it has a GPL license. Quite frankly, I don't even think Lucene committers can have a look at it to consider it for inclusion while it is under a GPL license. If you have written all the code or otherwise own all copyrights, would you mind switching to Apache License 2.0? That way, I at least think it would be possible to have a close look to see if this is a good fit for Lucene.
[jira] [Commented] (LUCENE-4947) Java implementation (and improvement) of Levenshtein associated lexicon automata
[ https://issues.apache.org/jira/browse/LUCENE-4947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13637945#comment-13637945 ] Kevin Lawson commented on LUCENE-4947: -- Ah. It slipped my mind that the result of combining GPL and Apache licensed code must be GPL. I'm fully committed to providing it to the Lucene project under the Apache License (version 2) if possible, or if it comes down to it, changing the license entirely. If the former is possible, consider this post one that formally licenses both LevenshteinAutomaton and MDAG to the Lucene development team under the Apache License (version 2), regardless of any licenses that may be included with either project, and any license notices contained in any of their constituent files. If the former is not possible, I suppose I could push up versions of each project licensed appropriately to the general public. Just tell me what action(s) need to be taken from here.
[jira] [Commented] (LUCENE-4947) Java implementation (and improvement) of Levenshtein associated lexicon automata
[ https://issues.apache.org/jira/browse/LUCENE-4947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13637989#comment-13637989 ] Simon Willnauer commented on LUCENE-4947: - bq. I'm fully committed providing it to the Lucene project under the Apache License (version 2) if possible, or if it comes down to it, changing the license entirely. Do you own all the copyrights for this GPL licensed product? I haven't looked into it yet, but what we need to do here in a nutshell is:
* issue a code grant: http://www.apache.org/licenses/software-grant.txt
* have an ICLA on file, which is good for future contributions too: http://www.apache.org/licenses/icla.txt
* run through IP clearance: http://incubator.apache.org/ip-clearance/index.html
Usually the PMC Chair helps here a lot, but just FYI this is roughly what you need to go through.
[jira] [Commented] (LUCENE-4947) Java implementation (and improvement) of Levenshtein associated lexicon automata
[ https://issues.apache.org/jira/browse/LUCENE-4947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13638020#comment-13638020 ]

Uwe Schindler commented on LUCENE-4947:

Hi,

My comments:

Can this tool be used to replace the code generator that creates the large int[] arrays? If not, the actual implementation of the Levenshtein automaton might be smaller, but there are 2 drawbacks:
- License issues with the external library (this seems to be solved already).
- The far bigger problem: Levenshtein is not the only automaton used in Lucene queries. Fuzzy, Wildcard and Regex are all based on AutomatonTermsEnum, which uses the *Brics library* for the automaton API/representation. If we bring in this new MDAG code, either the whole automaton code in Lucene needs to be ported over, or we have code duplication (and Fuzzy no longer uses the Brics automaton library).

Alternatively, can the code be ported away from MDAG to the Brics automaton, so it can interact with Lucene's automaton library? If not, we can no longer easily combine wildcard/prefix/fuzzy queries.
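The concern about combining wildcard/prefix/fuzzy queries comes down to all automata sharing one representation, so that they can be run (or intersected) together. A minimal sketch of that idea follows, assuming a hypothetical Dfa interface invented for this illustration; it is not the Brics or Lucene API, and the two toy automata (a prefix matcher and an exact-length matcher) merely stand in for real query automata.

```java
/** Hypothetical sketch of running two automata as a product; not the Brics or Lucene API. */
final class AutomatonProduct {

    /** Minimal DFA interface (an assumption for this sketch). State -1 is the dead state. */
    interface Dfa {
        int start();
        int step(int state, char c);
        boolean accepts(int state);
    }

    /** Accepts every string that starts with the given prefix (like the query "fo*"). */
    static Dfa prefix(final String prefix) {
        return new Dfa() {
            public int start() { return 0; }
            public int step(int s, char c) {
                if (s < 0) return -1;
                if (s == prefix.length()) return s;        // prefix consumed, any suffix is fine
                return prefix.charAt(s) == c ? s + 1 : -1; // mismatch inside the prefix
            }
            public boolean accepts(int s) { return s == prefix.length(); }
        };
    }

    /** Accepts every string of exactly n characters (like the wildcard query "????"). */
    static Dfa exactLength(final int n) {
        return new Dfa() {
            public int start() { return 0; }
            public int step(int s, char c) { return (s < 0 || s >= n) ? -1 : s + 1; }
            public boolean accepts(int s) { return s == n; }
        };
    }

    /** Runs both automata in lockstep, i.e. simulates their product (intersection) on the fly. */
    static boolean matchesBoth(Dfa a, Dfa b, String input) {
        int sa = a.start();
        int sb = b.start();
        for (char c : input.toCharArray()) {
            sa = a.step(sa, c);
            sb = b.step(sb, c);
            if (sa < 0 || sb < 0) return false; // either automaton is dead
        }
        return a.accepts(sa) && b.accepts(sb);
    }

    public static void main(String[] args) {
        Dfa both1 = prefix("fo");
        Dfa both2 = exactLength(4);
        System.out.println(matchesBoth(both1, both2, "food"));
        System.out.println(matchesBoth(both1, both2, "flood"));
    }
}
```

Because both automata implement the same interface, combining them takes a few lines. An automaton that lives in a different representation would first need an adapter or a port, which is the cost raised above.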
[jira] [Commented] (LUCENE-4947) Java implementation (and improvement) of Levenshtein associated lexicon automata
[ https://issues.apache.org/jira/browse/LUCENE-4947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13638050#comment-13638050 ]

Kevin Lawson commented on LUCENE-4947:

bq. Alternatively, can the code be ported away from MDAG to the Brics automaton, so it can interact with Lucene's Automaton library? If this is not the case, we can no longer easily combine wildcards/prefix/fuzzy anymore.

Of course! If you take a look at the tableFuzzySearch() method [here|https://github.com/klawson88/LevenshteinAutomaton/blob/master/final/src/com/BoxOfC/LevenshteinAutomaton/LevenshteinAutomaton.java], you'll see that it takes an MDAG (which is equivalent in structure to the automata implemented in Brics) and simply transitions it in step with the LevenshteinAutomaton. The method can easily be modified to accept a Brics automaton, which I'm assuming has methods that implement typical automaton actions (namely transitioning and accept-state determination).

The main reason one might want to consider using MDAG is that libraries implementing the data structure (which is more widely known as a DAWG) typically only support creation from sorted input (and thus do not allow modification). I believe Brics is [no exception|http://www.brics.dk/automaton/doc/index.html?dk/brics/automaton/Automaton.html]. My MDAG library supports unsorted input and run-time modification of the structure. (The minor drawback of this was addressed in the original post.)
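The idea of transitioning a dictionary automaton in step with a Levenshtein automaton can be sketched roughly as follows. This is not the tableFuzzySearch() code from the linked repository, and it does not use the MDAG or Brics APIs. It makes two simplifying assumptions: the dictionary is a plain, unminimized trie (the hypothetical Node class), and the Levenshtein state is a dynamic-programming row of edit distances rather than a precomputed transition table. Transpositions are not handled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of lockstep fuzzy search; not the MDAG, Brics, or Lucene API. */
final class LockstepFuzzySearch {

    /** Minimal dictionary automaton: a plain trie standing in for an MDAG/DAWG. */
    static final class Node {
        final Map<Character, Node> next = new TreeMap<>();
        boolean accept;
    }

    static Node buildTrie(String... words) {
        Node root = new Node();
        for (String w : words) {
            Node n = root;
            for (char c : w.toCharArray()) {
                n = n.next.computeIfAbsent(c, k -> new Node());
            }
            n.accept = true;
        }
        return root;
    }

    /** Returns all dictionary words within maxEdits insertions, deletions, or substitutions of query. */
    static List<String> search(Node root, String query, int maxEdits) {
        int[] row = new int[query.length() + 1];
        for (int i = 0; i < row.length; i++) row[i] = i; // distances from the empty prefix
        List<String> hits = new ArrayList<>();
        walk(root, new StringBuilder(), row, query, maxEdits, hits);
        return hits;
    }

    /** Depth-first walk: each trie transition is paired with one Levenshtein transition. */
    private static void walk(Node node, StringBuilder prefix, int[] row,
                             String query, int maxEdits, List<String> hits) {
        if (node.accept && row[query.length()] <= maxEdits) hits.add(prefix.toString());
        for (Map.Entry<Character, Node> e : node.next.entrySet()) {
            int[] nextRow = step(row, e.getKey(), query);
            if (min(nextRow) <= maxEdits) {              // otherwise the Levenshtein side is dead
                prefix.append(e.getKey());
                walk(e.getValue(), prefix, nextRow, query, maxEdits, hits);
                prefix.setLength(prefix.length() - 1);
            }
        }
    }

    /** One Levenshtein transition: row[i] is the distance between the path so far and query[0..i). */
    static int[] step(int[] row, char c, String query) {
        int[] next = new int[row.length];
        next[0] = row[0] + 1;
        for (int i = 1; i < row.length; i++) {
            int substitute = row[i - 1] + (query.charAt(i - 1) == c ? 0 : 1);
            next[i] = Math.min(substitute, Math.min(row[i] + 1, next[i - 1] + 1));
        }
        return next;
    }

    private static int min(int[] values) {
        int m = values[0];
        for (int v : values) m = Math.min(m, v);
        return m;
    }

    public static void main(String[] args) {
        Node trie = buildTrie("food", "good", "foot", "fold", "flood", "bar");
        System.out.println(search(trie, "food", 1));
    }
}
```

The walk only needs two things from the dictionary side: following a labeled transition and testing for an accept state. Any automaton exposing those operations could, in principle, replace the trie here.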
[jira] [Commented] (LUCENE-4947) Java implementation (and improvement) of Levenshtein associated lexicon automata
[ https://issues.apache.org/jira/browse/LUCENE-4947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13638111#comment-13638111 ]

Kevin Lawson commented on LUCENE-4947:

bq. do you own all the copyrights for this GPL licensed product?

Yes. All the materials I'm submitting were created solely by me.

bq. I haven't looked into it yet but what we need to do here in a nutshell is...usually the PMC Chair helps here a lot but just FYI this is roughly what you need to go through.

Great. From the looks of it, I'd have no problem submitting those documents. Should I wait for the PMC Chair to come in here? Or can I just submit the grant and license agreement to secret...@apache.org now?
[jira] [Commented] (LUCENE-4947) Java implementation (and improvement) of Levenshtein associated lexicon automata
[ https://issues.apache.org/jira/browse/LUCENE-4947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13638127#comment-13638127 ]

Steve Rowe commented on LUCENE-4947:

{quote}
bq. I haven't looked into it yet but what we need to do here in a nutshell is...usually the PMC Chair helps here a lot but just FYI this is roughly what you need to go through.

Great. From the looks of it I'd have no problem submitting those documents. Should I wait for the PMC Chair to come in here? Or can I just submit the grant and license agreement to secret...@apache.org now?
{quote}

PMC Chair here - I've never shepherded one of these things before, so I need to get up to speed. I glanced through the links Simon sent (thanks Simon); nothing seems too difficult. I'll read more thoroughly and get back to you here.

One potential issue that will need to be resolved first: from my past experience, the threshold at which a code grant needs to be invoked seems fuzzy. My previous takeaway had been that the size of the contribution, both in number of files and in line count, is a consideration: only a couple of files, or only a couple hundred lines of code, don't warrant a code grant. I looked at the git repo you pointed to, Kevin, and it seems to have more than a couple of files and more than a couple hundred lines of code, so I'm pretty sure Simon's right: the code grant process will have to be invoked.
[jira] [Commented] (LUCENE-4947) Java implementation (and improvement) of Levenshtein associated lexicon automata
[ https://issues.apache.org/jira/browse/LUCENE-4947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13638128#comment-13638128 ]

Christian Moen commented on LUCENE-4947:

It sounds proper to do a code grant also because the software currently has a GPL license.

Thanks for following up, Steve.
[jira] [Commented] (LUCENE-4947) Java implementation (and improvement) of Levenshtein associated lexicon automata
[ https://issues.apache.org/jira/browse/LUCENE-4947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13638175#comment-13638175 ]

Steve Rowe commented on LUCENE-4947:

Kevin,

You can license your code under multiple licenses if you like. The simplest thing to do here is to license your code with a single license: the Apache License v2. For info on the Apache License v2, see [http://www.staging.apache.org/licenses/]. See also [http://www.apache.org/legal/resolved.html] for a list of licenses which can and cannot be included as dependencies in Apache products.

AFAICT, though, once you have signed and submitted the software grant and provided a tarball of the code, your grant is under the terms of the Apache License v2, and the license headers in files committed to the Lucene codebase will be modified to include a reference to the ALv2 and exclude any other license information.

FYI, attribution for individual contributions is located in a single file, lucene/CHANGES.txt; e.g. for trunk: [http://svn.apache.org/repos/asf/lucene/dev/trunk/lucene/CHANGES.txt]

I'll be filling out the XML version of this (HTML) IP clearance form: [http://incubator.apache.org/ip-clearance/ip-clearance-template.html] - I encourage you to take a look, Kevin.

Steve