2011/6/22 Jörn Kottmann <[email protected]>:
> On 6/11/11 5:06 PM, Olivier Grisel wrote:
>>
>> 2011/6/11 Grant Ingersoll<[email protected]>:
>>>
>>> I signed off on the BR, but a couple of questions:
>>>
>>> What do we need to do on the IP front?  Is that really a blocker for
>>> graduation?
>>>
>>> Also, I don't think the regression tests are a blocker for graduation.
>>>
>>> I did add that we need to find some more candidates for
>>> contributions/committership, which I do think is a blocker.
>>
>> I am willing to be a new candidate for committership if the opennlp
>> devs judge that the corpus-refiner tooling introduced in the other
>> thread would fit somewhere in the project (probably as a new
>> maven artifact).
>>
>> I assume that Hannes might be interested as well.
>
> Nice, let's open a new thread and talk a bit more about this contribution.
> Is it something new you want to work on, or are you talking about
> contributing an existing code base?

There is indeed a proof of concept here:

  https://github.com/ogrisel/bbuzz-semantic-hackathon/tree/master/corpus-refiner/

Currently there is only a basic command line interface. I plan to work
on a Swing version too, and Hannes has started working on an HTML /
JavaScript frontend.

I think the existing corpus refiner needs to be able to store the
validations / corrections in a separate file or database (e.g. Derby),
plus another tool that takes an OpenNLP-formatted corpus and a
validation DB and generates a new version of the corpus file.
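To make the second tool a bit more concrete, here is a minimal sketch of the "corpus + validation DB -> new corpus" step. It assumes a one-sample-per-line corpus (as in the OpenNLP name finder training format) and that validations are keyed by line number; the in-memory Map and Set stand in for a Derby table, and the class and method names (CorpusPatcher, apply) are hypothetical, not an existing OpenNLP API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical sketch: applies stored validations / corrections to a
 * one-sample-per-line corpus. The Map and Set stand in for a validation
 * DB (e.g. a Derby table keyed by line number).
 */
public final class CorpusPatcher {

    private CorpusPatcher() {}

    /**
     * Builds a new corpus: indices in {@code corrections} get the corrected
     * text, indices in {@code rejected} are dropped, and every other line
     * (validated as-is or not yet reviewed) is copied unchanged.
     */
    public static List<String> apply(List<String> corpus,
                                     Map<Integer, String> corrections,
                                     Set<Integer> rejected) {
        List<String> result = new ArrayList<>(corpus.size());
        for (int i = 0; i < corpus.size(); i++) {
            if (rejected.contains(i)) {
                continue; // sample marked invalid by the annotator: leave it out
            }
            // use the stored correction if there is one, else keep the original line
            result.add(corrections.getOrDefault(i, corpus.get(i)));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> corpus = List.of(
            "<START:person> John <END> lives in Paris .",
            "Mary works in London .",
            "garbled sample");
        // hypothetical validation records: fix line 1, reject line 2
        Map<Integer, String> corrections = Map.of(
            1, "<START:person> Mary <END> works in London .");
        Set<Integer> rejected = Set.of(2);
        apply(corpus, corrections, rejected).forEach(System.out::println);
    }
}
```

Keying records by line number keeps the sketch short but breaks as soon as the source corpus is edited or reordered; keying on a hash of the original sample text would probably be a more robust choice for a real validation DB.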

I could also contribute my Pig scripts and UDFs from
https://github.com/ogrisel/pignlproc [1], but I feel that Spark [2]
will soon be mature enough to rewrite them in Scala. Spark just lacks
an efficient JOIN/COGROUP operation to be able to do so, but this will
probably be the case soon [3]. So I suggest that we wait before
considering a contribution of the pignlproc code base to OpenNLP.

[1] https://github.com/ogrisel/pignlproc
[2] http://www.spark-project.org/
[3] https://github.com/mesos/spark/issues/4

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
