Re: Use of Unicode data in Lucene
: I can implement the functionality just using the data tables from the Unicode
: Consortium, including http://www.unicode.org/reports/tr39, but there's still
: the issue of the Unicode data license and its compatibility with Apache 2.0.
:
: Does anybody know whether http://www.unicode.org/copyright.html creates an
: issue? What's the process for vetting a license? Or is this something I should
: be posting to a different list?

The authoritative docs to be familiar with are...

http://www.apache.org/legal/3party.html
http://www.apache.org/legal/resolved.html

...but it's not clear to me exactly where the Unicode copyright/licensing rules fall in the spectrum.

The best place to ask questions about license compatibility issues is legal-disc...@apache (I'm pretty sure Ken already found that out since he posted there, just mentioning it for anyone else who might be interested).

-Hoss

-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org
Use of Unicode data in Lucene
Hi all,

I've started working on something similar to https://issues.apache.org/jira/browse/LUCENE-1343, which is about creating a better (more universal) normalizer for words that look the same.

I'd like to avoid the dependency on ICU4J, which (I think) would otherwise prevent the code from being part of the core - due to license issues, it would have to languish in contrib.

I can implement the functionality just using the data tables from the Unicode Consortium, including http://www.unicode.org/reports/tr39, but there's still the issue of the Unicode data license and its compatibility with Apache 2.0.

Does anybody know whether http://www.unicode.org/copyright.html creates an issue? What's the process for vetting a license? Or is this something I should be posting to a different list?

Thanks,

-- Ken

--
Ken Krugler
+1 530-210-6378
Re: Use of Unicode data in Lucene
Ken,

Just my opinion here... I work with a lot of multilingual data with Lucene. I can't imagine many serious real-world applications doing things such as search that wouldn't need ICU for something anyway... even if it's not the Lucene piece requiring it...

I hope this doesn't discourage you from doing what you are trying to do... just my opinion. Maybe the JDK will catch up sometime soon and this won't be an issue for long.

On Wed, Feb 25, 2009 at 3:22 PM, Ken Krugler kkrugler_li...@transpac.com wrote:

> Hi all,
>
> I've started working on something similar to https://issues.apache.org/jira/browse/LUCENE-1343, which is about creating a better (more universal) normalizer for words that look the same.
>
> I'd like to avoid the dependency on ICU4J, which (I think) would otherwise prevent the code from being part of the core - due to license issues, it would have to languish in contrib.
>
> I can implement the functionality just using the data tables from the Unicode Consortium, including http://www.unicode.org/reports/tr39, but there's still the issue of the Unicode data license and its compatibility with Apache 2.0.
>
> Does anybody know whether http://www.unicode.org/copyright.html creates an issue? What's the process for vetting a license? Or is this something I should be posting to a different list?
>
> Thanks,
>
> -- Ken
>
> --
> Ken Krugler
> +1 530-210-6378

--
Robert Muir
rcm...@gmail.com