Re: Contributing the Korean Analyzer

2013-04-24 Thread Christian Moen
Hello SooMyung,

Thanks a lot!  It will be great to get Korean supported out-of-the-box in 
Lucene/Solr.

In terms of process, I'll leave this to Steve Rowe, PMC Chair, to comment on, 
but a code grant process sounds likely.

I see that the code itself is under the Apache License 2.0, but could you 
elaborate on where the dictionaries originate from and what licensing terms 
apply to them?

Many thanks,


Christian Moen

On Apr 24, 2013, at 2:05 PM, smlee0...@gmail.com wrote:

 Hello,
 
 I developed the Korean Analyzer and have been distributing it since 2008.
 Many people who use Lucene with Korean text rely on it.
 
 I posted it on SourceForge (http://sourceforge.net/projects/lucenekorean).
 Here is the CVS address:
 d:pserver:anonym...@lucenekorean.cvs.sourceforge.net:/cvsroot/lucenekorean
 
 The Korean Analyzer consists of a Korean morphological analyzer, a Korean 
 dictionary, and a Korean filter.
 People who use Lucene with Korean usually think of the CJK Analyzer first, 
 but the CJK Analyzer is a poor fit for Korean.
 
 Korean has language-specific characteristics, and its morphemes need to be 
 analyzed when extracting index keywords.
 The Korean Analyzer solves this problem with its Korean morphological analyzer.
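
A minimal sketch of the contrast described above, assuming a bigram-style CJK analyzer that indexes overlapping two-character pairs. The function names and the tiny particle list are hypothetical and are not taken from the Korean Analyzer itself; a real morphological analyzer relies on a full dictionary and grammar rules.

```python
from typing import List


def cjk_bigrams(text: str) -> List[str]:
    """Return overlapping character bigrams, as a bigram-based analyzer would."""
    if len(text) < 2:
        return [text] if text else []
    return [text[i:i + 2] for i in range(len(text) - 1)]


# Hypothetical particle list, for illustration only.
PARTICLES = ("에서", "에게", "은", "는", "이", "가", "을", "를")


def strip_particle(word: str) -> str:
    """Toy stemmer: drop one trailing particle if a non-empty stem remains."""
    # Try longer particles first so "에서" wins over any shorter suffix.
    for particle in sorted(PARTICLES, key=len, reverse=True):
        if word.endswith(particle) and len(word) > len(particle):
            return word[: -len(particle)]
    return word
```

For the word 학교에서 ("at school"), the bigram approach also emits 교에, which straddles the noun and its particle and is not a meaningful index term, while stripping the particle leaves only the noun 학교.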
 
 The Korean Analyzer can also split compound nouns.
 
 Now I would like to contribute the Korean Analyzer to the Lucene project.
 Please let me know how to contribute it.
 
 If you want to check the source code, please visit the SourceForge CVS 
 repository.
 
 Best regards.
 -- 
 
 SooMyung Lee
 Director of Research Center
 Argonet co. ltd,
 
 Manager of Lucene Korean Analyzer
 http://korlucene.naver.com
 
 Contact: +82-10-6480-5710



Re: Contributing the Korean Analyzer

2013-04-24 Thread Christian Moen
Hello Soomyung,

Thanks a lot for this.  This is very good news.

Let's await the PMC Chair's suggestion on next steps.  See LUCENE-3305 to get 
an idea of how the process went for Japanese.

If the process goes well, I'm happy to see how I can set aside some time after 
Lucene Revolution to work on integrating this.

Best regards,

Christian Moen
アティリカ株式会社
http://www.atilika.com

On Apr 24, 2013, at 7:40 PM, 이수명 smlee0...@gmail.com wrote:

 Hello Christian.
 
 Thanks for your reply.
 I'm happy to hear about a code grant process.
 
 To build the dictionaries, I collected the words themselves and their 
 features from books and the internet.
 I then organized all of the information I had collected to build the Korean 
 morphological analyzer.
 The dictionaries are therefore my own work.
 
 I think it is enough to attach a file (a license notice) that describes 
 where the dictionaries originate from and their license (Apache License 
 2.0).
 
 If that is not enough, please leave me a message with some guidance.
 
 Thanks.
 
 Soomyung Lee
 
 





Re: Contributing the Korean Analyzer

2013-04-24 Thread Steve Rowe
Hi Soomyung,

I agree with Christian, this sounds fantastic!

First, we need to know a couple things:

1. Are you the only author of the code?  We need to get agreement from all 
contributors.  (When I browse CVS on the SourceForge site, the only author I 
see is smlee0818, which I assume is you.)

2. Do you need permission from your employer to make this donation?  If so, 
we'll need your employer to submit a Corporate CLA (Contributor License 
Agreement)[1] before we can accept the donation.

To get started, the first step is creating a Lucene JIRA issue here: 
https://issues.apache.org/jira/browse/LUCENE - you'll need to create an ASF 
JIRA account first if you don't already have one: click the "Log In" link at 
the top right of the page, then click the "Sign up" link where it says "Not a 
member? Sign up for an account."

Once you've created a JIRA issue, you should make a compressed tarball of 
everything you want to contribute - as far as I can tell, this is everything in 
the lucenekorean sourceforge project in CVS under modules kr.dictionary, 
kr.analysis.4x, and kr.morph - and then attach it to the JIRA issue, with 
the MD5 hash for the tarball in the comment that you provide when you attach 
the tarball to the issue.
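
As a rough sketch of the packaging step above (not an official ASF tool), the Python snippet below builds a gzip-compressed tarball from a directory and computes its MD5 hash. The function names are hypothetical, and any directory or archive names passed in are placeholders for wherever the CVS modules were checked out.

```python
import hashlib
import tarfile
from pathlib import Path


def make_tarball(source_dir: str, archive_path: str) -> str:
    """Pack source_dir into a gzip-compressed tarball and return its path."""
    source = Path(source_dir)
    with tarfile.open(archive_path, "w:gz") as tar:
        # arcname stores entries relative to the directory's own name,
        # so the archive does not embed the absolute local path.
        tar.add(str(source), arcname=source.name)
    return archive_path


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Calling `make_tarball` on the checkout directory and then `md5_of_file` on the resulting archive gives the hash to paste into the JIRA comment. The hash should be computed on the exact file that is attached, since re-creating the archive can change its bytes.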

Once you've created the JIRA issue and attached your contribution, we can make 
progress on further steps that need to be taken: you should submit an 
individual CLA[2] and a code grant[3], and I (in my role as Lucene PMC chair) 
will be managing the IP clearance process[4][5].

See http://wiki.apache.org/lucene-java/HowToContribute for more information 
about contributing.

I look forward to working with you on this - thank you for contributing!

Steve

[1] http://www.apache.org/licenses/cla-corporate.txt
[2] http://www.apache.org/licenses/icla.txt
[3] http://www.apache.org/licenses/software-grant.txt
[4] http://incubator.apache.org/ip-clearance/index.html
[5] http://incubator.apache.org/ip-clearance/ip-clearance-template.html
