subject:"\[jira\] \[Commented\] \(LUCENE\-4947\) Java implementation \(and improvement\) of Levenshtein associated lexicon automata"

[jira] [Commented] (LUCENE-4947) Java implementation (and improvement) of Levenshtein associated lexicon automata

2013-05-02 Thread Kevin Lawson (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-4947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13647554#comment-13647554
]

Kevin Lawson commented on LUCENE-4947:
--

I had some free time yesterday and updated LevenshteinAutomaton to include
transpositions in its edit distance determination methods. The included tests
have also been updated to accommodate the modifications and ensure correct
functionality.

How do I go about submitting the updated code? Push an update to my github?
Simply attach the new archive here? I'm not sure where the code
donation/acceptance process is at now, so I'm unsure of how to do this.

Java implementation (and improvement) of Levenshtein associated lexicon
automata
--

Key: LUCENE-4947
URL: https://issues.apache.org/jira/browse/LUCENE-4947
Project: Lucene - Core
Issue Type: Improvement
Affects Versions: 4.0-ALPHA, 4.0-BETA, 4.0, 4.1, 4.2, 4.2.1
Reporter: Kevin Lawson
Attachments: LevenshteinAutomaton-master.zip, MDAG-master.zip

I was encouraged by Mike McCandless to open an issue concerning this after I
contacted him privately about it. Thanks Mike!
I'd like to submit my Java implementation of the Levenshtein Automaton as a
homogenous replacement for the current heterogenous, multi-component
implementation in Lucene.
Benefits of upgrading include
- Reduced code complexity
- Better performance from components that were previously implemented in
Python
- Support for on-the-fly dictionary-automaton manipulation (if you wish to
use my dictionary-automaton implementation)
The code for all the components is well structured, easy to follow, and
extensively commented. It has also been fully tested for correct
functionality and performance.
The levenshtein automaton implementation (along with the required MDAG
reference) can be found in my LevenshteinAutomaton Java library here:
https://github.com/klawson88/LevenshteinAutomaton.
The minimalistic directed acyclic graph (MDAG) which the automaton code uses
to store and step through word sets can be found here:
https://github.com/klawson88/MDAG
*Transpositions aren't currently implemented. I hope the comment filled,
editing-friendly code combined with the fact that the section in the Mihov
paper detailing transpositions is only 2 pages makes adding the functionality
trivial.
*As a result of support for on-the-fly manipulation, the MDAG
(dictionary-automaton) creation process incurs a slight speed penalty. In
order to have the best of both worlds, i'd recommend the addition of a
constructor which only takes sorted input. The complete, easy to follow
pseudo-code for the simple procedure can be found in the first article I
linked under the references section in the MDAG repository)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-4947) Java implementation (and improvement) of Levenshtein associated lexicon automata

2013-04-27 Thread Steve Rowe (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-4947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13643668#comment-13643668
 ] 

Steve Rowe commented on LUCENE-4947:


The Lucene PMC received notification today from the Apache Secretary that 
Kevin's code grant and ICLA paperwork have been received and recorded.

Now that we have Kevin's code grant and ICLA, we can start verifying 
headers/licensing.  Since it's not clear where this software will go, I don't 
think it makes sense to create a branch yet.  People doing the header/license 
verification/modification can just attach modified tarballs to this issue if 
necessary.

 Java implementation (and improvement) of Levenshtein  associated lexicon 
 automata
 --

 Key: LUCENE-4947
 URL: https://issues.apache.org/jira/browse/LUCENE-4947
 Project: Lucene - Core
  Issue Type: Improvement
Affects Versions: 4.0-ALPHA, 4.0-BETA, 4.0, 4.1, 4.2, 4.2.1
Reporter: Kevin Lawson
 Attachments: LevenshteinAutomaton-master.zip, MDAG-master.zip


 I was encouraged by Mike McCandless to open an issue concerning this after I 
 contacted him privately about it. Thanks Mike!
 I'd like to submit my Java implementation of the Levenshtein Automaton as a 
 homogenous replacement for the current heterogenous, multi-component 
 implementation in Lucene.
 Benefits of upgrading include 
 - Reduced code complexity
 - Better performance from components that were previously implemented in 
 Python
 - Support for on-the-fly dictionary-automaton manipulation (if you wish to 
 use my dictionary-automaton implementation)
 The code for all the components is well structured, easy to follow, and 
 extensively commented. It has also been fully tested for correct 
 functionality and performance.
 The levenshtein automaton implementation (along with the required MDAG 
 reference) can be found in my LevenshteinAutomaton Java library here: 
 https://github.com/klawson88/LevenshteinAutomaton.
 The minimalistic directed acyclic graph (MDAG) which the automaton code uses 
 to store and step through word sets can be found here: 
 https://github.com/klawson88/MDAG
 *Transpositions aren't currently implemented. I hope the comment filled, 
 editing-friendly code combined with the fact that the section in the Mihov 
 paper detailing transpositions is only 2 pages makes adding the functionality 
 trivial.
 *As a result of support for on-the-fly manipulation, the MDAG 
 (dictionary-automaton) creation process incurs a slight speed penalty. In 
 order to have the best of both worlds, i'd recommend the addition of a 
 constructor which only takes sorted input. The complete, easy to follow 
 pseudo-code for the simple procedure can be found in the first article I 
 linked under the references section in the MDAG repository)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-4947) Java implementation (and improvement) of Levenshtein associated lexicon automata

2013-04-26 Thread Kevin Lawson (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-4947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13642918#comment-13642918
]

Kevin Lawson commented on LUCENE-4947:
--

bq. Kevin Lawson, did you send the code grant to legal-arch...@apache.org in
addition to sending it to secret...@apache.org? This is mentioned as a
requirement in step 3 of the process section in
http://incubator.apache.org/ip-clearance/ip-clearance-template.html.

Ah, I definitely overlooked that. Sent! Was done a couple of hours ago, but I
figured I'd notify you here.

Java implementation (and improvement) of Levenshtein associated lexicon
automata
--

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-4947) Java implementation (and improvement) of Levenshtein associated lexicon automata

2013-04-26 Thread Steve Rowe (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-4947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13642945#comment-13642945
]

Steve Rowe commented on LUCENE-4947:

bq. Ah, I definitely overlooked that. Sent! Was done a couple of hours ago, but
I figured I'd notify you here.

[~klawson88], I'm concerned that the Lucene PMC has not yet received
notification from the Apache Secretary of either your code grant or your ICLA.
We got notification earlier today for SooMyung Lee's paperwork (see
LUCENE-4956), which was sent to secret...@apache.org after yours.

Is it possible that you sent the paperwork to the wrong address, or that there
was some other mail snafu?

Java implementation (and improvement) of Levenshtein associated lexicon
automata
--

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-4947) Java implementation (and improvement) of Levenshtein associated lexicon automata

2013-04-26 Thread Kevin Lawson (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-4947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13643001#comment-13643001
 ] 

Kevin Lawson commented on LUCENE-4947:
--

The address I sent it to is indeed correct (secret...@apache.org). Maybe it's 
something on their end (spam filter)? I've just re-sent it.

 Java implementation (and improvement) of Levenshtein  associated lexicon 
 automata
 --

 Key: LUCENE-4947
 URL: https://issues.apache.org/jira/browse/LUCENE-4947
 Project: Lucene - Core
  Issue Type: Improvement
Affects Versions: 4.0-ALPHA, 4.0-BETA, 4.0, 4.1, 4.2, 4.2.1
Reporter: Kevin Lawson
 Attachments: LevenshteinAutomaton-master.zip, MDAG-master.zip


 I was encouraged by Mike McCandless to open an issue concerning this after I 
 contacted him privately about it. Thanks Mike!
 I'd like to submit my Java implementation of the Levenshtein Automaton as a 
 homogenous replacement for the current heterogenous, multi-component 
 implementation in Lucene.
 Benefits of upgrading include 
 - Reduced code complexity
 - Better performance from components that were previously implemented in 
 Python
 - Support for on-the-fly dictionary-automaton manipulation (if you wish to 
 use my dictionary-automaton implementation)
 The code for all the components is well structured, easy to follow, and 
 extensively commented. It has also been fully tested for correct 
 functionality and performance.
 The levenshtein automaton implementation (along with the required MDAG 
 reference) can be found in my LevenshteinAutomaton Java library here: 
 https://github.com/klawson88/LevenshteinAutomaton.
 The minimalistic directed acyclic graph (MDAG) which the automaton code uses 
 to store and step through word sets can be found here: 
 https://github.com/klawson88/MDAG
 *Transpositions aren't currently implemented. I hope the comment filled, 
 editing-friendly code combined with the fact that the section in the Mihov 
 paper detailing transpositions is only 2 pages makes adding the functionality 
 trivial.
 *As a result of support for on-the-fly manipulation, the MDAG 
 (dictionary-automaton) creation process incurs a slight speed penalty. In 
 order to have the best of both worlds, i'd recommend the addition of a 
 constructor which only takes sorted input. The complete, easy to follow 
 pseudo-code for the simple procedure can be found in the first article I 
 linked under the references section in the MDAG repository)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-4947) Java implementation (and improvement) of Levenshtein associated lexicon automata

2013-04-25 Thread Steve Rowe (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-4947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13642161#comment-13642161
 ] 

Steve Rowe commented on LUCENE-4947:


bq. Just updating the thread to notify everyone that I've just e-mailed the ICA 
and code grant documents (and their GPG-related files) to secret...@apache.org.

I monitor commits to the ICLA and code grants record files, and neither the 
ICLA nor the code grant document has been recorded yet.  I'll post on this 
issue once the code grant has been recorded.

[~klawson88], did you send the code grant to legal-arch...@apache.org in 
addition to sending it to secret...@apache.org?  This is mentioned as a 
requirement in step 3 of the process section in 
[http://incubator.apache.org/ip-clearance/ip-clearance-template.html].

 Java implementation (and improvement) of Levenshtein  associated lexicon 
 automata
 --

 Key: LUCENE-4947
 URL: https://issues.apache.org/jira/browse/LUCENE-4947
 Project: Lucene - Core
  Issue Type: Improvement
Affects Versions: 4.0-ALPHA, 4.0-BETA, 4.0, 4.1, 4.2, 4.2.1
Reporter: Kevin Lawson
 Attachments: LevenshteinAutomaton-master.zip, MDAG-master.zip


 I was encouraged by Mike McCandless to open an issue concerning this after I 
 contacted him privately about it. Thanks Mike!
 I'd like to submit my Java implementation of the Levenshtein Automaton as a 
 homogenous replacement for the current heterogenous, multi-component 
 implementation in Lucene.
 Benefits of upgrading include 
 - Reduced code complexity
 - Better performance from components that were previously implemented in 
 Python
 - Support for on-the-fly dictionary-automaton manipulation (if you wish to 
 use my dictionary-automaton implementation)
 The code for all the components is well structured, easy to follow, and 
 extensively commented. It has also been fully tested for correct 
 functionality and performance.
 The levenshtein automaton implementation (along with the required MDAG 
 reference) can be found in my LevenshteinAutomaton Java library here: 
 https://github.com/klawson88/LevenshteinAutomaton.
 The minimalistic directed acyclic graph (MDAG) which the automaton code uses 
 to store and step through word sets can be found here: 
 https://github.com/klawson88/MDAG
 *Transpositions aren't currently implemented. I hope the comment filled, 
 editing-friendly code combined with the fact that the section in the Mihov 
 paper detailing transpositions is only 2 pages makes adding the functionality 
 trivial.
 *As a result of support for on-the-fly manipulation, the MDAG 
 (dictionary-automaton) creation process incurs a slight speed penalty. In 
 order to have the best of both worlds, i'd recommend the addition of a 
 constructor which only takes sorted input. The complete, easy to follow 
 pseudo-code for the simple procedure can be found in the first article I 
 linked under the references section in the MDAG repository)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-4947) Java implementation (and improvement) of Levenshtein associated lexicon automata

2013-04-24 Thread Steve Rowe (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-4947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13640499#comment-13640499
]

Steve Rowe commented on LUCENE-4947:

bq. I assumed you guys would simply grab the tarballs from the GitHub links I
posted.

Okay, cool, I think that will work.

Do you intend for these two projects to live on after they've been incorporated
into Lucene? If so, then I'll fork them on Github and start making license
header changes - ALv2; Kevin, do you give your consent for me to change the
license headers in all files to point to ALv2?

If you don't intend for these two projects to continue life separately from
Lucene, then I think it will make sense for you to do the license changes
in-place yourself, Kevin. Alternatively, you could grant write access to
someone else to do the work. Please let us know.

I have started the IP clearance form. It's online now at
[http://incubator.apache.org/ip-clearance/lucene-levenshtein-automaton-mdag.html].

Java implementation (and improvement) of Levenshtein associated lexicon
automata
--

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-4947) Java implementation (and improvement) of Levenshtein associated lexicon automata

2013-04-24 Thread Christian Moen (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-4947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13640551#comment-13640551
 ] 

Christian Moen commented on LUCENE-4947:


Kevin,

I think it's best that you do the license change yourself and that we don't 
have any active role in making the change since you are the only person 
entitled to make the change.

This change can be done by using the below header on all the source code and 
other relevant text files:

{noformat}
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the License); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an AS IS BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
{noformat}

After this has been done, please make a tarball and attach it to this JIRA and 
indicate that this is the code you wish to grant and also inform us about the 
MD5 hash of the tarball.  (This will go into the IP-clearance document and will 
be used to identify the codebase.)

It's a good idea to also use this MD5 hash as part of Exhibit A in the 
[software-grant.txt|http://www.apache.org/licenses/software-grant.txt] 
agreement unless you have signed and submitted this already.  (If you donate 
the code yourself by attaching it to the JIRA as described above, I believe the 
hashes not being part of Exhibit A is acceptable.)

Please feel free to add your comments, Steve.


 Java implementation (and improvement) of Levenshtein  associated lexicon 
 automata
 --

 Key: LUCENE-4947
 URL: https://issues.apache.org/jira/browse/LUCENE-4947
 Project: Lucene - Core
  Issue Type: Improvement
Affects Versions: 4.0-ALPHA, 4.0-BETA, 4.0, 4.1, 4.2, 4.2.1
Reporter: Kevin Lawson

 I was encouraged by Mike McCandless to open an issue concerning this after I 
 contacted him privately about it. Thanks Mike!
 I'd like to submit my Java implementation of the Levenshtein Automaton as a 
 homogenous replacement for the current heterogenous, multi-component 
 implementation in Lucene.
 Benefits of upgrading include 
 - Reduced code complexity
 - Better performance from components that were previously implemented in 
 Python
 - Support for on-the-fly dictionary-automaton manipulation (if you wish to 
 use my dictionary-automaton implementation)
 The code for all the components is well structured, easy to follow, and 
 extensively commented. It has also been fully tested for correct 
 functionality and performance.
 The levenshtein automaton implementation (along with the required MDAG 
 reference) can be found in my LevenshteinAutomaton Java library here: 
 https://github.com/klawson88/LevenshteinAutomaton.
 The minimalistic directed acyclic graph (MDAG) which the automaton code uses 
 to store and step through word sets can be found here: 
 https://github.com/klawson88/MDAG
 *Transpositions aren't currently implemented. I hope the comment filled, 
 editing-friendly code combined with the fact that the section in the Mihov 
 paper detailing transpositions is only 2 pages makes adding the functionality 
 trivial.
 *As a result of support for on-the-fly manipulation, the MDAG 
 (dictionary-automaton) creation process incurs a slight speed penalty. In 
 order to have the best of both worlds, i'd recommend the addition of a 
 constructor which only takes sorted input. The complete, easy to follow 
 pseudo-code for the simple procedure can be found in the first article I 
 linked under the references section in the MDAG repository)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-4947) Java implementation (and improvement) of Levenshtein associated lexicon automata

2013-04-24 Thread Steve Rowe (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-4947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13640586#comment-13640586
 ] 

Steve Rowe commented on LUCENE-4947:


bq. I think it's best that you do the license change yourself and that we don't 
have any active role in making the change since you are the only person 
entitled to make the change.

+1

bq. After this has been done, please make a tarball and attach it to this JIRA 
and indicate that this is the code you wish to grant and also inform us about 
the MD5 hash of the tarball. (This will go into the IP-clearance document and 
will be used to identify the codebase.)

I and Dawid had been advocating using Github for this, but I agree with 
Christian: a tarball attached to this issue by you, [~klawson88], will remove 
all ambiguity about what is being donated and by whom.  Also, Github is not 
under ASF control, and in the future if that business goes under, the ASF will 
lose the history of this donation. 


 Java implementation (and improvement) of Levenshtein  associated lexicon 
 automata
 --

 Key: LUCENE-4947
 URL: https://issues.apache.org/jira/browse/LUCENE-4947
 Project: Lucene - Core
  Issue Type: Improvement
Affects Versions: 4.0-ALPHA, 4.0-BETA, 4.0, 4.1, 4.2, 4.2.1
Reporter: Kevin Lawson

 I was encouraged by Mike McCandless to open an issue concerning this after I 
 contacted him privately about it. Thanks Mike!
 I'd like to submit my Java implementation of the Levenshtein Automaton as a 
 homogenous replacement for the current heterogenous, multi-component 
 implementation in Lucene.
 Benefits of upgrading include 
 - Reduced code complexity
 - Better performance from components that were previously implemented in 
 Python
 - Support for on-the-fly dictionary-automaton manipulation (if you wish to 
 use my dictionary-automaton implementation)
 The code for all the components is well structured, easy to follow, and 
 extensively commented. It has also been fully tested for correct 
 functionality and performance.
 The levenshtein automaton implementation (along with the required MDAG 
 reference) can be found in my LevenshteinAutomaton Java library here: 
 https://github.com/klawson88/LevenshteinAutomaton.
 The minimalistic directed acyclic graph (MDAG) which the automaton code uses 
 to store and step through word sets can be found here: 
 https://github.com/klawson88/MDAG
 *Transpositions aren't currently implemented. I hope the comment filled, 
 editing-friendly code combined with the fact that the section in the Mihov 
 paper detailing transpositions is only 2 pages makes adding the functionality 
 trivial.
 *As a result of support for on-the-fly manipulation, the MDAG 
 (dictionary-automaton) creation process incurs a slight speed penalty. In 
 order to have the best of both worlds, i'd recommend the addition of a 
 constructor which only takes sorted input. The complete, easy to follow 
 pseudo-code for the simple procedure can be found in the first article I 
 linked under the references section in the MDAG repository)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-4947) Java implementation (and improvement) of Levenshtein associated lexicon automata

2013-04-23 Thread Kevin Lawson (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-4947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13639089#comment-13639089
 ] 

Kevin Lawson commented on LUCENE-4947:
--

Just updating the thread to notify everyone that I've just e-mailed the ICA and 
code grant documents (and their GPG-related files) to secret...@apache.org. 

Is there anything else that needs to be done on my part?

 Java implementation (and improvement) of Levenshtein  associated lexicon 
 automata
 --

 Key: LUCENE-4947
 URL: https://issues.apache.org/jira/browse/LUCENE-4947
 Project: Lucene - Core
  Issue Type: Improvement
Affects Versions: 4.0-ALPHA, 4.0-BETA, 4.0, 4.1, 4.2, 4.2.1
Reporter: Kevin Lawson

 I was encouraged by Mike McCandless to open an issue concerning this after I 
 contacted him privately about it. Thanks Mike!
 I'd like to submit my Java implementation of the Levenshtein Automaton as a 
 homogenous replacement for the current heterogenous, multi-component 
 implementation in Lucene.
 Benefits of upgrading include 
 - Reduced code complexity
 - Better performance from components that were previously implemented in 
 Python
 - Support for on-the-fly dictionary-automaton manipulation (if you wish to 
 use my dictionary-automaton implementation)
 The code for all the components is well structured, easy to follow, and 
 extensively commented. It has also been fully tested for correct 
 functionality and performance.
 The levenshtein automaton implementation (along with the required MDAG 
 reference) can be found in my LevenshteinAutomaton Java library here: 
 https://github.com/klawson88/LevenshteinAutomaton.
 The minimalistic directed acyclic graph (MDAG) which the automaton code uses 
 to store and step through word sets can be found here: 
 https://github.com/klawson88/MDAG
 *Transpositions aren't currently implemented. I hope the comment filled, 
 editing-friendly code combined with the fact that the section in the Mihov 
 paper detailing transpositions is only 2 pages makes adding the functionality 
 trivial.
 *As a result of support for on-the-fly manipulation, the MDAG 
 (dictionary-automaton) creation process incurs a slight speed penalty. In 
 order to have the best of both worlds, i'd recommend the addition of a 
 constructor which only takes sorted input. The complete, easy to follow 
 pseudo-code for the simple procedure can be found in the first article I 
 linked under the references section in the MDAG repository)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-4947) Java implementation (and improvement) of Levenshtein associated lexicon automata

2013-04-23 Thread Steve Rowe (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-4947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13639486#comment-13639486
 ] 

Steve Rowe commented on LUCENE-4947:


bq. Is there anything else that needs to be done on my part?

Hi [~klawson88],

What did you send to secret...@apache.org?  We need to know exactly what it is 
you're donating, so that we can start the vetting and header modification 
process (assuming you have not yet done any header modification yourself).

This is generally done by transferring a compressed tarball to a public place, 
to provide access to those who wants to inspect it, and those who will do the 
header modification.  But maybe this could be done via Git hash(es)?  I looked 
at a bunch of the existing examples of this process 
http://incubator.apache.org/ip-clearance/, and all of them go the compressed 
tarball route.

Steve

 Java implementation (and improvement) of Levenshtein  associated lexicon 
 automata
 --

 Key: LUCENE-4947
 URL: https://issues.apache.org/jira/browse/LUCENE-4947
 Project: Lucene - Core
  Issue Type: Improvement
Affects Versions: 4.0-ALPHA, 4.0-BETA, 4.0, 4.1, 4.2, 4.2.1
Reporter: Kevin Lawson

 I was encouraged by Mike McCandless to open an issue concerning this after I 
 contacted him privately about it. Thanks Mike!
 I'd like to submit my Java implementation of the Levenshtein Automaton as a 
 homogenous replacement for the current heterogenous, multi-component 
 implementation in Lucene.
 Benefits of upgrading include 
 - Reduced code complexity
 - Better performance from components that were previously implemented in 
 Python
 - Support for on-the-fly dictionary-automaton manipulation (if you wish to 
 use my dictionary-automaton implementation)
 The code for all the components is well structured, easy to follow, and 
 extensively commented. It has also been fully tested for correct 
 functionality and performance.
 The levenshtein automaton implementation (along with the required MDAG 
 reference) can be found in my LevenshteinAutomaton Java library here: 
 https://github.com/klawson88/LevenshteinAutomaton.
 The minimalistic directed acyclic graph (MDAG) which the automaton code uses 
 to store and step through word sets can be found here: 
 https://github.com/klawson88/MDAG
 *Transpositions aren't currently implemented. I hope the comment filled, 
 editing-friendly code combined with the fact that the section in the Mihov 
 paper detailing transpositions is only 2 pages makes adding the functionality 
 trivial.
 *As a result of support for on-the-fly manipulation, the MDAG 
 (dictionary-automaton) creation process incurs a slight speed penalty. In 
 order to have the best of both worlds, i'd recommend the addition of a 
 constructor which only takes sorted input. The complete, easy to follow 
 pseudo-code for the simple procedure can be found in the first article I 
 linked under the references section in the MDAG repository)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-4947) Java implementation (and improvement) of Levenshtein associated lexicon automata

2013-04-23 Thread Dawid Weiss (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-4947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13639492#comment-13639492
 ] 

Dawid Weiss commented on LUCENE-4947:
-

Would it be possible to change the licensing directly on github and then make 
an import from a git revision? It's traceable after all (with parent etc) and 
it would make the process simpler I think (?).

 Java implementation (and improvement) of Levenshtein  associated lexicon 
 automata
 --

 Key: LUCENE-4947
 URL: https://issues.apache.org/jira/browse/LUCENE-4947
 Project: Lucene - Core
  Issue Type: Improvement
Affects Versions: 4.0-ALPHA, 4.0-BETA, 4.0, 4.1, 4.2, 4.2.1
Reporter: Kevin Lawson

 I was encouraged by Mike McCandless to open an issue concerning this after I 
 contacted him privately about it. Thanks Mike!
 I'd like to submit my Java implementation of the Levenshtein Automaton as a 
 homogenous replacement for the current heterogenous, multi-component 
 implementation in Lucene.
 Benefits of upgrading include 
 - Reduced code complexity
 - Better performance from components that were previously implemented in 
 Python
 - Support for on-the-fly dictionary-automaton manipulation (if you wish to 
 use my dictionary-automaton implementation)
 The code for all the components is well structured, easy to follow, and 
 extensively commented. It has also been fully tested for correct 
 functionality and performance.
 The levenshtein automaton implementation (along with the required MDAG 
 reference) can be found in my LevenshteinAutomaton Java library here: 
 https://github.com/klawson88/LevenshteinAutomaton.
 The minimalistic directed acyclic graph (MDAG) which the automaton code uses 
 to store and step through word sets can be found here: 
 https://github.com/klawson88/MDAG
 *Transpositions aren't currently implemented. I hope the comment filled, 
 editing-friendly code combined with the fact that the section in the Mihov 
 paper detailing transpositions is only 2 pages makes adding the functionality 
 trivial.
 *As a result of support for on-the-fly manipulation, the MDAG 
 (dictionary-automaton) creation process incurs a slight speed penalty. In 
 order to have the best of both worlds, i'd recommend the addition of a 
 constructor which only takes sorted input. The complete, easy to follow 
 pseudo-code for the simple procedure can be found in the first article I 
 linked under the references section in the MDAG repository)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-4947) Java implementation (and improvement) of Levenshtein associated lexicon automata

2013-04-23 Thread Steve Rowe (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-4947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13639533#comment-13639533
]

Steve Rowe commented on LUCENE-4947:

bq. Would it be possible to change the licensing directly on github and then
make an import from a git revision? It's traceable after all (with parent etc)
and it would make the process simpler I think (?).

I think it's possible.

Kevin in his paperwork will have had to have referred to the git revision, but
he wrote he sent GPG-related files, which doesn't seem compatible to me, since
detached signatures will AFAIK have been generated against a local copy, not
the Git repository, and so may not be reproducible by third parties (e.g. are
Git checkouts bit-for-bit identical on Linux vs. Windows?).

I think the intent of the signature is just integrity/identity: a way to refer
to the exact bits being donated. Git hash(es) should serve the same function.

Java implementation (and improvement) of Levenshtein associated lexicon
automata
--

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-4947) Java implementation (and improvement) of Levenshtein associated lexicon automata

2013-04-23 Thread Dawid Weiss (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-4947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13639550#comment-13639550
 ] 

Dawid Weiss commented on LUCENE-4947:
-

Git checkouts may not be bit-for-bit identical (they may vary if you have 
custom EOL treatment in .gitattributes, for example) but a git revision hash 
uniquely identifies a revision in a repository (and this, to my understanding, 
cannot be easily forged since it includes all parent revisions). So if you git 
clone/export a given hash then it's essentially the same as if somebody sent 
you a tarball with file checksums?

I don't know anything about the legal aspects (if it makes a difference whether 
you're pulling from an ASL-licensed copy compared to the author taking the 
initiative).

 Java implementation (and improvement) of Levenshtein  associated lexicon 
 automata
 --

 Key: LUCENE-4947
 URL: https://issues.apache.org/jira/browse/LUCENE-4947
 Project: Lucene - Core
  Issue Type: Improvement
Affects Versions: 4.0-ALPHA, 4.0-BETA, 4.0, 4.1, 4.2, 4.2.1
Reporter: Kevin Lawson

 I was encouraged by Mike McCandless to open an issue concerning this after I 
 contacted him privately about it. Thanks Mike!
 I'd like to submit my Java implementation of the Levenshtein Automaton as a 
 homogenous replacement for the current heterogenous, multi-component 
 implementation in Lucene.
 Benefits of upgrading include 
 - Reduced code complexity
 - Better performance from components that were previously implemented in 
 Python
 - Support for on-the-fly dictionary-automaton manipulation (if you wish to 
 use my dictionary-automaton implementation)
 The code for all the components is well structured, easy to follow, and 
 extensively commented. It has also been fully tested for correct 
 functionality and performance.
 The levenshtein automaton implementation (along with the required MDAG 
 reference) can be found in my LevenshteinAutomaton Java library here: 
 https://github.com/klawson88/LevenshteinAutomaton.
 The minimalistic directed acyclic graph (MDAG) which the automaton code uses 
 to store and step through word sets can be found here: 
 https://github.com/klawson88/MDAG
 *Transpositions aren't currently implemented. I hope the comment filled, 
 editing-friendly code combined with the fact that the section in the Mihov 
 paper detailing transpositions is only 2 pages makes adding the functionality 
 trivial.
 *As a result of support for on-the-fly manipulation, the MDAG 
 (dictionary-automaton) creation process incurs a slight speed penalty. In 
 order to have the best of both worlds, i'd recommend the addition of a 
 constructor which only takes sorted input. The complete, easy to follow 
 pseudo-code for the simple procedure can be found in the first article I 
 linked under the references section in the MDAG repository)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-4947) Java implementation (and improvement) of Levenshtein associated lexicon automata

2013-04-23 Thread Steve Rowe (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-4947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13639571#comment-13639571
 ] 

Steve Rowe commented on LUCENE-4947:


bq. Git checkouts may not be bit-for-bit identical

Right, I mentioned this because Kevin wrote that he had sent GPG-related 
files to secretary@a.o, by which I assume he means detached signature(s), and 
if a checkout is not bit-for-bit identical, we will have to ignore those 
signatures, since they won't be reproducible, and that may have procedural 
implications: can't move beyond the step where you have to verify the signature.

Of course, if the Git hash(es) are sufficient (and I agree with you, Dawid, 
that they seem to be), then it should be fine to ignore the signature(s) Kevin 
sent.


 Java implementation (and improvement) of Levenshtein  associated lexicon 
 automata
 --

 Key: LUCENE-4947
 URL: https://issues.apache.org/jira/browse/LUCENE-4947
 Project: Lucene - Core
  Issue Type: Improvement
Affects Versions: 4.0-ALPHA, 4.0-BETA, 4.0, 4.1, 4.2, 4.2.1
Reporter: Kevin Lawson

 I was encouraged by Mike McCandless to open an issue concerning this after I 
 contacted him privately about it. Thanks Mike!
 I'd like to submit my Java implementation of the Levenshtein Automaton as a 
 homogenous replacement for the current heterogenous, multi-component 
 implementation in Lucene.
 Benefits of upgrading include 
 - Reduced code complexity
 - Better performance from components that were previously implemented in 
 Python
 - Support for on-the-fly dictionary-automaton manipulation (if you wish to 
 use my dictionary-automaton implementation)
 The code for all the components is well structured, easy to follow, and 
 extensively commented. It has also been fully tested for correct 
 functionality and performance.
 The levenshtein automaton implementation (along with the required MDAG 
 reference) can be found in my LevenshteinAutomaton Java library here: 
 https://github.com/klawson88/LevenshteinAutomaton.
 The minimalistic directed acyclic graph (MDAG) which the automaton code uses 
 to store and step through word sets can be found here: 
 https://github.com/klawson88/MDAG
 *Transpositions aren't currently implemented. I hope the comment filled, 
 editing-friendly code combined with the fact that the section in the Mihov 
 paper detailing transpositions is only 2 pages makes adding the functionality 
 trivial.
 *As a result of support for on-the-fly manipulation, the MDAG 
 (dictionary-automaton) creation process incurs a slight speed penalty. In 
 order to have the best of both worlds, i'd recommend the addition of a 
 constructor which only takes sorted input. The complete, easy to follow 
 pseudo-code for the simple procedure can be found in the first article I 
 linked under the references section in the MDAG repository)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-4947) Java implementation (and improvement) of Levenshtein associated lexicon automata

2013-04-23 Thread Kevin Lawson (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-4947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13639702#comment-13639702
 ] 

Kevin Lawson commented on LUCENE-4947:
--

Sorry for the ambiguity. 

When I wrote and GP-related files, I meant the signatures for the ICA and 
software grant documents. The documents made no mention of signing the tarballs 
to be donated (the ip-clearance form only mentions it conditionally in step 5).

I assumed you guys would simply grab the tarballs from the GitHub links I 
posted.



 Java implementation (and improvement) of Levenshtein  associated lexicon 
 automata
 --

 Key: LUCENE-4947
 URL: https://issues.apache.org/jira/browse/LUCENE-4947
 Project: Lucene - Core
  Issue Type: Improvement
Affects Versions: 4.0-ALPHA, 4.0-BETA, 4.0, 4.1, 4.2, 4.2.1
Reporter: Kevin Lawson

 I was encouraged by Mike McCandless to open an issue concerning this after I 
 contacted him privately about it. Thanks Mike!
 I'd like to submit my Java implementation of the Levenshtein Automaton as a 
 homogenous replacement for the current heterogenous, multi-component 
 implementation in Lucene.
 Benefits of upgrading include 
 - Reduced code complexity
 - Better performance from components that were previously implemented in 
 Python
 - Support for on-the-fly dictionary-automaton manipulation (if you wish to 
 use my dictionary-automaton implementation)
 The code for all the components is well structured, easy to follow, and 
 extensively commented. It has also been fully tested for correct 
 functionality and performance.
 The levenshtein automaton implementation (along with the required MDAG 
 reference) can be found in my LevenshteinAutomaton Java library here: 
 https://github.com/klawson88/LevenshteinAutomaton.
 The minimalistic directed acyclic graph (MDAG) which the automaton code uses 
 to store and step through word sets can be found here: 
 https://github.com/klawson88/MDAG
 *Transpositions aren't currently implemented. I hope the comment filled, 
 editing-friendly code combined with the fact that the section in the Mihov 
 paper detailing transpositions is only 2 pages makes adding the functionality 
 trivial.
 *As a result of support for on-the-fly manipulation, the MDAG 
 (dictionary-automaton) creation process incurs a slight speed penalty. In 
 order to have the best of both worlds, i'd recommend the addition of a 
 constructor which only takes sorted input. The complete, easy to follow 
 pseudo-code for the simple procedure can be found in the first article I 
 linked under the references section in the MDAG repository)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-4947) Java implementation (and improvement) of Levenshtein associated lexicon automata

2013-04-22 Thread Dawid Weiss (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-4947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13637797#comment-13637797
 ] 

Dawid Weiss commented on LUCENE-4947:
-

Cool but this may be a showstopper: MDAG is licensed under the GNU General 
Public License (version 3).

 Java implementation (and improvement) of Levenshtein  associated lexicon 
 automata
 --

 Key: LUCENE-4947
 URL: https://issues.apache.org/jira/browse/LUCENE-4947
 Project: Lucene - Core
  Issue Type: Improvement
Affects Versions: 4.0-ALPHA, 4.0-BETA, 4.0, 4.1, 4.2, 4.2.1
Reporter: Kevin Lawson

 I was encouraged by Mike McCandless to open an issue concerning this after I 
 contacted him privately about it. Thanks Mike!
 I'd like to submit my Java implementation of the Levenshtein Automaton as a 
 homogenous replacement for the current heterogenous, multi-component 
 implementation in Lucene.
 Benefits of upgrading include 
 - Reduced code complexity
 - Better performance from components that were previously implemented in 
 Python
 - Support for on-the-fly dictionary-automaton manipulation (if you wish to 
 use my dictionary-automaton implementation)
 The code for all the components is well structured, easy to follow, and 
 extensively commented. It has also been fully tested for correct 
 functionality and performance.
 The levenshtein automaton implementation (along with the required MDAG 
 reference) can be found in my LevenshteinAutomaton Java library here: 
 https://github.com/klawson88/LevenshteinAutomaton.
 The minimalistic directed acyclic graph (MDAG) which the automaton code uses 
 to store and step through word sets can be found here: 
 https://github.com/klawson88/MDAG
 *Transpositions aren't currently implemented. I hope the comment filled, 
 editing-friendly code combined with the fact that the section in the Mihov 
 paper detailing transpositions is only 2 pages makes adding the functionality 
 trivial.
 *As a result of support for on-the-fly manipulation, the MDAG 
 (dictionary-automaton) creation process incurs a slight speed penalty. In 
 order to have the best of both worlds, i'd recommend the addition of a 
 constructor which only takes sorted input. The complete, easy to follow 
 pseudo-code for the simple procedure can be found in the first article I 
 linked under the references section in the MDAG repository)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-4947) Java implementation (and improvement) of Levenshtein associated lexicon automata

2013-04-22 Thread Christian Moen (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-4947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13637943#comment-13637943
 ] 

Christian Moen commented on LUCENE-4947:


Thanks a lot for wishing to submit code!

It's not possible to include your code in Lucene if it has a GPL license.  
Quite frankly, I don't think even think Lucene committers can even have a look 
at it to consider it for inclusion with a GPL license.

If you have written all the code or otherwise own all copyrights, would you 
mind switching to Apache License 2.0?  That way, I at least think it would be 
possible to have a close look to see if this is a good fit for Lucene.

 Java implementation (and improvement) of Levenshtein  associated lexicon 
 automata
 --

 Key: LUCENE-4947
 URL: https://issues.apache.org/jira/browse/LUCENE-4947
 Project: Lucene - Core
  Issue Type: Improvement
Affects Versions: 4.0-ALPHA, 4.0-BETA, 4.0, 4.1, 4.2, 4.2.1
Reporter: Kevin Lawson

 I was encouraged by Mike McCandless to open an issue concerning this after I 
 contacted him privately about it. Thanks Mike!
 I'd like to submit my Java implementation of the Levenshtein Automaton as a 
 homogenous replacement for the current heterogenous, multi-component 
 implementation in Lucene.
 Benefits of upgrading include 
 - Reduced code complexity
 - Better performance from components that were previously implemented in 
 Python
 - Support for on-the-fly dictionary-automaton manipulation (if you wish to 
 use my dictionary-automaton implementation)
 The code for all the components is well structured, easy to follow, and 
 extensively commented. It has also been fully tested for correct 
 functionality and performance.
 The levenshtein automaton implementation (along with the required MDAG 
 reference) can be found in my LevenshteinAutomaton Java library here: 
 https://github.com/klawson88/LevenshteinAutomaton.
 The minimalistic directed acyclic graph (MDAG) which the automaton code uses 
 to store and step through word sets can be found here: 
 https://github.com/klawson88/MDAG
 *Transpositions aren't currently implemented. I hope the comment filled, 
 editing-friendly code combined with the fact that the section in the Mihov 
 paper detailing transpositions is only 2 pages makes adding the functionality 
 trivial.
 *As a result of support for on-the-fly manipulation, the MDAG 
 (dictionary-automaton) creation process incurs a slight speed penalty. In 
 order to have the best of both worlds, i'd recommend the addition of a 
 constructor which only takes sorted input. The complete, easy to follow 
 pseudo-code for the simple procedure can be found in the first article I 
 linked under the references section in the MDAG repository)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-4947) Java implementation (and improvement) of Levenshtein associated lexicon automata

2013-04-22 Thread Kevin Lawson (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-4947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13637945#comment-13637945
 ] 

Kevin Lawson commented on LUCENE-4947:
--

Ah. It slipped my mind that the result of combining GPL and Apache licensed 
code must be GPL.

I'm fully committed providing it to the Lucene project under the Apache License 
(version 2) if possible, or if it comes down to it, changing the license 
entirely. 

If the former is possible, consider this post one that formally licenses both 
LevenshteinAutomaton and MDAG, to the Lucene development team under Apache 
License (version 2), irregardless of any licenses that may be included with 
either project, and any license notices contained in any of their constituent 
files.

If the former is not possible, I suppose I could push up versions of each 
project licensed appropriately to the general public. 

Just tell me what action(s) need to be taken from here.

 Java implementation (and improvement) of Levenshtein  associated lexicon 
 automata
 --

 Key: LUCENE-4947
 URL: https://issues.apache.org/jira/browse/LUCENE-4947
 Project: Lucene - Core
  Issue Type: Improvement
Affects Versions: 4.0-ALPHA, 4.0-BETA, 4.0, 4.1, 4.2, 4.2.1
Reporter: Kevin Lawson

 I was encouraged by Mike McCandless to open an issue concerning this after I 
 contacted him privately about it. Thanks Mike!
 I'd like to submit my Java implementation of the Levenshtein Automaton as a 
 homogenous replacement for the current heterogenous, multi-component 
 implementation in Lucene.
 Benefits of upgrading include 
 - Reduced code complexity
 - Better performance from components that were previously implemented in 
 Python
 - Support for on-the-fly dictionary-automaton manipulation (if you wish to 
 use my dictionary-automaton implementation)
 The code for all the components is well structured, easy to follow, and 
 extensively commented. It has also been fully tested for correct 
 functionality and performance.
 The levenshtein automaton implementation (along with the required MDAG 
 reference) can be found in my LevenshteinAutomaton Java library here: 
 https://github.com/klawson88/LevenshteinAutomaton.
 The minimalistic directed acyclic graph (MDAG) which the automaton code uses 
 to store and step through word sets can be found here: 
 https://github.com/klawson88/MDAG
 *Transpositions aren't currently implemented. I hope the comment filled, 
 editing-friendly code combined with the fact that the section in the Mihov 
 paper detailing transpositions is only 2 pages makes adding the functionality 
 trivial.
 *As a result of support for on-the-fly manipulation, the MDAG 
 (dictionary-automaton) creation process incurs a slight speed penalty. In 
 order to have the best of both worlds, i'd recommend the addition of a 
 constructor which only takes sorted input. The complete, easy to follow 
 pseudo-code for the simple procedure can be found in the first article I 
 linked under the references section in the MDAG repository)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-4947) Java implementation (and improvement) of Levenshtein associated lexicon automata

2013-04-22 Thread Simon Willnauer (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-4947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13637989#comment-13637989
 ] 

Simon Willnauer commented on LUCENE-4947:
-

bq. I'm fully committed providing it to the Lucene project under the Apache 
License (version 2) if possible, or if it comes down to it, changing the 
license entirely.

do you own all the copyrights for this GPL licensed product? I haven't looked 
into it yet but what we need to do here in a nutshell is:
 * issue a code grant http://www.apache.org/licenses/software-grant.txt
 * having received a http://www.apache.org/licenses/icla.txt which is good for 
future contributions...
 * run through IP clearance http://incubator.apache.org/ip-clearance/index.html

usually the PMC Chair helps here a lot but just FYI this is roughly what you 
need to go through.

 Java implementation (and improvement) of Levenshtein  associated lexicon 
 automata
 --

 Key: LUCENE-4947
 URL: https://issues.apache.org/jira/browse/LUCENE-4947
 Project: Lucene - Core
  Issue Type: Improvement
Affects Versions: 4.0-ALPHA, 4.0-BETA, 4.0, 4.1, 4.2, 4.2.1
Reporter: Kevin Lawson

 I was encouraged by Mike McCandless to open an issue concerning this after I 
 contacted him privately about it. Thanks Mike!
 I'd like to submit my Java implementation of the Levenshtein Automaton as a 
 homogenous replacement for the current heterogenous, multi-component 
 implementation in Lucene.
 Benefits of upgrading include 
 - Reduced code complexity
 - Better performance from components that were previously implemented in 
 Python
 - Support for on-the-fly dictionary-automaton manipulation (if you wish to 
 use my dictionary-automaton implementation)
 The code for all the components is well structured, easy to follow, and 
 extensively commented. It has also been fully tested for correct 
 functionality and performance.
 The levenshtein automaton implementation (along with the required MDAG 
 reference) can be found in my LevenshteinAutomaton Java library here: 
 https://github.com/klawson88/LevenshteinAutomaton.
 The minimalistic directed acyclic graph (MDAG) which the automaton code uses 
 to store and step through word sets can be found here: 
 https://github.com/klawson88/MDAG
 *Transpositions aren't currently implemented. I hope the comment filled, 
 editing-friendly code combined with the fact that the section in the Mihov 
 paper detailing transpositions is only 2 pages makes adding the functionality 
 trivial.
 *As a result of support for on-the-fly manipulation, the MDAG 
 (dictionary-automaton) creation process incurs a slight speed penalty. In 
 order to have the best of both worlds, i'd recommend the addition of a 
 constructor which only takes sorted input. The complete, easy to follow 
 pseudo-code for the simple procedure can be found in the first article I 
 linked under the references section in the MDAG repository)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-4947) Java implementation (and improvement) of Levenshtein associated lexicon automata

2013-04-22 Thread Uwe Schindler (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-4947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13638020#comment-13638020
 ] 

Uwe Schindler commented on LUCENE-4947:
---

Hi,
My comments:
Can this tool be used to replace the Code-Generator that creates the large 
int[] arrays? If this is not the case, the actual implementation of the 
Levensthein Automaton might be smaller, but there 2 backsides:
- Licene issues with the external library (seems to be solved already)
- The far bigger problem is: Levensthein is not the only automaton used in 
Lucene Queries. Fuzzy, Wildcard and Regex are all based on AutomatonTermsEnum 
that uses the *Bricks library* for the automaton API/representation. If we 
bring in here this new MDAC stuff, the whole atomaton code inside needs to be 
ported over or we have code duplication (and Fuzzy does no longer use the 
Bricks Automaton lib).

Alternatively, can the code be ported away from MDAC to Bricks-automaton, so it 
can interact with Lucene's Automaton library? If this is not the case, we can 
no longer easily combine wildcards/prefix/fuzzy anymore.



 Java implementation (and improvement) of Levenshtein  associated lexicon 
 automata
 --

 Key: LUCENE-4947
 URL: https://issues.apache.org/jira/browse/LUCENE-4947
 Project: Lucene - Core
  Issue Type: Improvement
Affects Versions: 4.0-ALPHA, 4.0-BETA, 4.0, 4.1, 4.2, 4.2.1
Reporter: Kevin Lawson

 I was encouraged by Mike McCandless to open an issue concerning this after I 
 contacted him privately about it. Thanks Mike!
 I'd like to submit my Java implementation of the Levenshtein Automaton as a 
 homogenous replacement for the current heterogenous, multi-component 
 implementation in Lucene.
 Benefits of upgrading include 
 - Reduced code complexity
 - Better performance from components that were previously implemented in 
 Python
 - Support for on-the-fly dictionary-automaton manipulation (if you wish to 
 use my dictionary-automaton implementation)
 The code for all the components is well structured, easy to follow, and 
 extensively commented. It has also been fully tested for correct 
 functionality and performance.
 The levenshtein automaton implementation (along with the required MDAG 
 reference) can be found in my LevenshteinAutomaton Java library here: 
 https://github.com/klawson88/LevenshteinAutomaton.
 The minimalistic directed acyclic graph (MDAG) which the automaton code uses 
 to store and step through word sets can be found here: 
 https://github.com/klawson88/MDAG
 *Transpositions aren't currently implemented. I hope the comment filled, 
 editing-friendly code combined with the fact that the section in the Mihov 
 paper detailing transpositions is only 2 pages makes adding the functionality 
 trivial.
 *As a result of support for on-the-fly manipulation, the MDAG 
 (dictionary-automaton) creation process incurs a slight speed penalty. In 
 order to have the best of both worlds, i'd recommend the addition of a 
 constructor which only takes sorted input. The complete, easy to follow 
 pseudo-code for the simple procedure can be found in the first article I 
 linked under the references section in the MDAG repository)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-4947) Java implementation (and improvement) of Levenshtein associated lexicon automata

2013-04-22 Thread Kevin Lawson (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-4947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13638050#comment-13638050
]

Kevin Lawson commented on LUCENE-4947:
--

bq. Alternatively, can the code be ported away from MDAC to Bricks-automaton,
so it can interact with Lucene's Automaton library? If this is not the case, we
can no longer easily combine wildcards/prefix/fuzzy anymore.

Of course! If you take a look at the tableFuzzySearch() method
[here|https://github.com/klawson88/LevenshteinAutomaton/blob/master/final/src/com/BoxOfC/LevenshteinAutomaton/LevenshteinAutomaton.java],
you'll see that it takes an MDAG (which is equivalent in structure to the
automatons implemented in Brics) and simply transitions it in step with the
LevenshteinAutomaton. The method can be modified easily to accept a Brics
automaton, which i'm assuming has methods that implement typical automaton
actions (namely transitioning and accept state determination).

The main reason one might want to consider using MDAG is that typically
libraries that implement the data structure (which is more widely known as a
DAWG) only support creation with sorted input (and thus, do not allow for
modification). I believe Brics is [no
exception|http://www.brics.dk/automaton/doc/index.html?dk/brics/automaton/Automaton.html].
My MDAG library supports unsorted input and run-time modification of the
structure. (The minor drawback concerning this has been addressed in the
original post).

Java implementation (and improvement) of Levenshtein associated lexicon
automata
--

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-4947) Java implementation (and improvement) of Levenshtein associated lexicon automata

2013-04-22 Thread Kevin Lawson (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-4947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13638111#comment-13638111
]

Kevin Lawson commented on LUCENE-4947:
--

bq. do you own all the copyrights for this GPL licensed product?

Yes. All the materials I'm submitting were created solely by me.

bq. I haven't looked into it yet but what we need to do here in a nutshell
is...usually the PMC Chair helps here a lot but just FYI this is roughly what
you need to go through.

Great. From the looks of it I'd have no problem submitting those documents.
Should I wait for the PMC Chair to come in here? Or can I just submit the grant
and license agreement to secret...@apache.org now?

Java implementation (and improvement) of Levenshtein associated lexicon
automata
--

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-4947) Java implementation (and improvement) of Levenshtein associated lexicon automata

2013-04-22 Thread Steve Rowe (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-4947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13638127#comment-13638127
]

Steve Rowe commented on LUCENE-4947:

{quote}
bq. I haven't looked into it yet but what we need to do here in a nutshell
is...usually the PMC Chair helps here a lot but just FYI this is roughly what
you need to go through.
Great. From the looks of it I'd have no problem submitting those documents.
Should I wait for the PMC Chair to come in here? Or can I just submit the grant
and license agreement to secret...@apache.org now?
{quote}

PMC Chair here - I've never shepherded one of these things before, so I need to
get up to speed. I glanced through the links Simon sent (thanks Simon),
nothing seems too difficult. I'll read more thoroughly and get back to you
here.

One potential issue that will need to be resolved first: from my past
experience, the threshold at which code grants need to be invoked seems fuzzy
to me: my previous takeaway had been that the quantity of the contribution,
both in number of files and in line count, is a consideration: only a couple of
files, or only a couple hundred lines of code, don't warrant a code grant. I
looked at the git repo you pointed to, Kevin, and it seems to have more than a
couple of files, and more than a couple hundred lines of code, so I'm pretty
sure Simon's right, the code grant process will have to be invoked.

Java implementation (and improvement) of Levenshtein associated lexicon
automata
--

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-4947) Java implementation (and improvement) of Levenshtein associated lexicon automata

2013-04-22 Thread Christian Moen (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-4947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13638128#comment-13638128
 ] 

Christian Moen commented on LUCENE-4947:


It sounds proper to do a code grant also because the software currently has a 
GPL license.  Thanks for following up, Steve.

 Java implementation (and improvement) of Levenshtein  associated lexicon 
 automata
 --

 Key: LUCENE-4947
 URL: https://issues.apache.org/jira/browse/LUCENE-4947
 Project: Lucene - Core
  Issue Type: Improvement
Affects Versions: 4.0-ALPHA, 4.0-BETA, 4.0, 4.1, 4.2, 4.2.1
Reporter: Kevin Lawson

 I was encouraged by Mike McCandless to open an issue concerning this after I 
 contacted him privately about it. Thanks Mike!
 I'd like to submit my Java implementation of the Levenshtein Automaton as a 
 homogenous replacement for the current heterogenous, multi-component 
 implementation in Lucene.
 Benefits of upgrading include 
 - Reduced code complexity
 - Better performance from components that were previously implemented in 
 Python
 - Support for on-the-fly dictionary-automaton manipulation (if you wish to 
 use my dictionary-automaton implementation)
 The code for all the components is well structured, easy to follow, and 
 extensively commented. It has also been fully tested for correct 
 functionality and performance.
 The levenshtein automaton implementation (along with the required MDAG 
 reference) can be found in my LevenshteinAutomaton Java library here: 
 https://github.com/klawson88/LevenshteinAutomaton.
 The minimalistic directed acyclic graph (MDAG) which the automaton code uses 
 to store and step through word sets can be found here: 
 https://github.com/klawson88/MDAG
 *Transpositions aren't currently implemented. I hope the comment filled, 
 editing-friendly code combined with the fact that the section in the Mihov 
 paper detailing transpositions is only 2 pages makes adding the functionality 
 trivial.
 *As a result of support for on-the-fly manipulation, the MDAG 
 (dictionary-automaton) creation process incurs a slight speed penalty. In 
 order to have the best of both worlds, i'd recommend the addition of a 
 constructor which only takes sorted input. The complete, easy to follow 
 pseudo-code for the simple procedure can be found in the first article I 
 linked under the references section in the MDAG repository)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-4947) Java implementation (and improvement) of Levenshtein associated lexicon automata

2013-04-22 Thread Steve Rowe (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-4947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13638175#comment-13638175
 ] 

Steve Rowe commented on LUCENE-4947:


Kevin,

You can license your code under multiple licenses if you like.  The simplest 
thing to do here is to license your code with a single license: the Apache 
License v2.

For info on the Apache License v2, see 
[http://www.staging.apache.org/licenses/].  See also 
[http://www.apache.org/legal/resolved.html] for a list of licenses which can 
and cannot be included as dependencies in Apache products.  AFAICT, though, 
once you have signed and submitted the software grant and provided a tarball of 
the code, your grant is under the terms of the Apache License v2, and the 
license headers in files committed to the Lucene codebase will be modified to 
include a reference to the ALv2, and exclude any other license information.

FYI, attribution for individual contributions is located in a single file: 
lucene/CHANGES.txt, e.g. for trunk: 
[http://svn.apache.org/repos/asf/lucene/dev/trunk/lucene/CHANGES.txt]

I'll be filling out the XML version of this (HTML) IP clearance form: 
[http://incubator.apache.org/ip-clearance/ip-clearance-template.html] - I 
encourage you to take a look, Kevin.

Steve


 Java implementation (and improvement) of Levenshtein  associated lexicon 
 automata
 --

 Key: LUCENE-4947
 URL: https://issues.apache.org/jira/browse/LUCENE-4947
 Project: Lucene - Core
  Issue Type: Improvement
Affects Versions: 4.0-ALPHA, 4.0-BETA, 4.0, 4.1, 4.2, 4.2.1
Reporter: Kevin Lawson

 I was encouraged by Mike McCandless to open an issue concerning this after I 
 contacted him privately about it. Thanks Mike!
 I'd like to submit my Java implementation of the Levenshtein Automaton as a 
 homogenous replacement for the current heterogenous, multi-component 
 implementation in Lucene.
 Benefits of upgrading include 
 - Reduced code complexity
 - Better performance from components that were previously implemented in 
 Python
 - Support for on-the-fly dictionary-automaton manipulation (if you wish to 
 use my dictionary-automaton implementation)
 The code for all the components is well structured, easy to follow, and 
 extensively commented. It has also been fully tested for correct 
 functionality and performance.
 The levenshtein automaton implementation (along with the required MDAG 
 reference) can be found in my LevenshteinAutomaton Java library here: 
 https://github.com/klawson88/LevenshteinAutomaton.
 The minimalistic directed acyclic graph (MDAG) which the automaton code uses 
 to store and step through word sets can be found here: 
 https://github.com/klawson88/MDAG
 *Transpositions aren't currently implemented. I hope the comment filled, 
 editing-friendly code combined with the fact that the section in the Mihov 
 paper detailing transpositions is only 2 pages makes adding the functionality 
 trivial.
 *As a result of support for on-the-fly manipulation, the MDAG 
 (dictionary-automaton) creation process incurs a slight speed penalty. In 
 order to have the best of both worlds, i'd recommend the addition of a 
 constructor which only takes sorted input. The complete, easy to follow 
 pseudo-code for the simple procedure can be found in the first article I 
 linked under the references section in the MDAG repository)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-4947) Java implementation (and improvement) of Levenshtein associated lexicon automata

[jira] [Commented] (LUCENE-4947) Java implementation (and improvement) of Levenshtein associated lexicon automata

[jira] [Commented] (LUCENE-4947) Java implementation (and improvement) of Levenshtein associated lexicon automata

[jira] [Commented] (LUCENE-4947) Java implementation (and improvement) of Levenshtein associated lexicon automata

[jira] [Commented] (LUCENE-4947) Java implementation (and improvement) of Levenshtein associated lexicon automata

[jira] [Commented] (LUCENE-4947) Java implementation (and improvement) of Levenshtein associated lexicon automata

[jira] [Commented] (LUCENE-4947) Java implementation (and improvement) of Levenshtein associated lexicon automata

[jira] [Commented] (LUCENE-4947) Java implementation (and improvement) of Levenshtein associated lexicon automata

[jira] [Commented] (LUCENE-4947) Java implementation (and improvement) of Levenshtein associated lexicon automata

[jira] [Commented] (LUCENE-4947) Java implementation (and improvement) of Levenshtein associated lexicon automata

[jira] [Commented] (LUCENE-4947) Java implementation (and improvement) of Levenshtein associated lexicon automata

[jira] [Commented] (LUCENE-4947) Java implementation (and improvement) of Levenshtein associated lexicon automata

[jira] [Commented] (LUCENE-4947) Java implementation (and improvement) of Levenshtein associated lexicon automata

[jira] [Commented] (LUCENE-4947) Java implementation (and improvement) of Levenshtein associated lexicon automata

[jira] [Commented] (LUCENE-4947) Java implementation (and improvement) of Levenshtein associated lexicon automata

[jira] [Commented] (LUCENE-4947) Java implementation (and improvement) of Levenshtein associated lexicon automata

[jira] [Commented] (LUCENE-4947) Java implementation (and improvement) of Levenshtein associated lexicon automata

[jira] [Commented] (LUCENE-4947) Java implementation (and improvement) of Levenshtein associated lexicon automata

[jira] [Commented] (LUCENE-4947) Java implementation (and improvement) of Levenshtein associated lexicon automata

[jira] [Commented] (LUCENE-4947) Java implementation (and improvement) of Levenshtein associated lexicon automata

[jira] [Commented] (LUCENE-4947) Java implementation (and improvement) of Levenshtein associated lexicon automata

[jira] [Commented] (LUCENE-4947) Java implementation (and improvement) of Levenshtein associated lexicon automata

[jira] [Commented] (LUCENE-4947) Java implementation (and improvement) of Levenshtein associated lexicon automata

[jira] [Commented] (LUCENE-4947) Java implementation (and improvement) of Levenshtein associated lexicon automata

[jira] [Commented] (LUCENE-4947) Java implementation (and improvement) of Levenshtein associated lexicon automata

[jira] [Commented] (LUCENE-4947) Java implementation (and improvement) of Levenshtein associated lexicon automata

26 matches

Site Navigation

Mail list logo

Footer information