Re: French model for POS tagging with OpenNLP

2012-01-20 Thread Jason Baldridge
I've added you.

+1 to adding others from the team. Just send your github accounts and I'll
add you.

On Fri, Jan 20, 2012 at 7:58 AM, Jörn Kottmann wrote:

> On 1/20/12 2:46 PM, Jason Baldridge wrote:
>
>> Great! I'm thinking we should have some (perhaps minimal) documentation
>> for each model (or set of models). E.g. how to obtain the data, how to
>> support the format, etc.
>>
>> Let me know your github account and I'll add you.
>>
>
> My github user is "kottmann".
>
> We should add anyone else in the team who is interested.
>
> Jörn
>



-- 
Jason Baldridge
Associate Professor, Department of Linguistics
The University of Texas at Austin
http://www.jasonbaldridge.com
http://twitter.com/jasonbaldridge


Re: French model for POS tagging with OpenNLP

2012-01-20 Thread Jörn Kottmann

On 1/20/12 2:46 PM, Jason Baldridge wrote:

Great! I'm thinking we should have some (perhaps minimal) documentation for
each model (or set of models). E.g. how to obtain the data, how to support
the format, etc.

Let me know your github account and I'll add you.


My github user is "kottmann".

We should add anyone else in the team who is interested.

Jörn


Re: French model for POS tagging with OpenNLP

2012-01-20 Thread Jason Baldridge
Great! I'm thinking we should have some (perhaps minimal) documentation for
each model (or set of models). E.g. how to obtain the data, how to support
the format, etc.

Let me know your github account and I'll add you.

On Fri, Jan 20, 2012 at 4:03 AM, Jörn Kottmann wrote:

> On 1/19/12 11:46 PM, Jason Baldridge wrote:
>
>> To begin the process of getting models organized and associated with clear
>> permissions, I've started the following GitHub repo:
>>
>> https://github.com/utcompling/OpenNLP-Models
>>
>
> Jason, that is great, I would like to contribute and
> provide updates for all the models on the SourceForge site.
>
> Jörn
>



-- 
Jason Baldridge
Associate Professor, Department of Linguistics
The University of Texas at Austin
http://www.jasonbaldridge.com
http://twitter.com/jasonbaldridge


Re: French model for POS tagging with OpenNLP

2012-01-20 Thread Jörn Kottmann

The best way to support the French Treebank is to
add format support for it directly to OpenNLP.

See this jira:
https://issues.apache.org/jira/browse/OPENNLP-342

The French Treebank can be obtained for free and with
format support you can retrain the OpenNLP models yourself.
Retraining can be useful for various reasons, e.g. different feature
generation, mixing in custom private data, or a new, incompatible version
of OpenNLP.
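As a rough illustration of what retraining needs as input: the OpenNLP POS tagger trainer reads plain text with one sentence per line, each token written as word_tag. The sketch below is only an assumption about how such a file could be produced once (word, tag) pairs have been extracted from a treebank. It does not parse the French Treebank itself, the function names are hypothetical, and the tags in the example are illustrative rather than taken from the corpus.

```python
def to_opennlp_pos_line(tagged_sentence):
    """Render one sentence as whitespace-separated word_tag tokens.

    tagged_sentence is a list of (word, tag) pairs that the caller has
    already extracted from a treebank (hypothetical input format).
    """
    parts = []
    for word, tag in tagged_sentence:
        # Tokens are separated by whitespace in the training file, so a
        # multiword token would be split apart. Joining its pieces with
        # underscores is one possible choice, not an OpenNLP requirement.
        word = "_".join(word.split())
        parts.append(f"{word}_{tag}")
    return " ".join(parts)


def to_training_text(sentences):
    """Join several tagged sentences into training text, one per line."""
    return "\n".join(to_opennlp_pos_line(s) for s in sentences)


if __name__ == "__main__":
    # Illustrative sentence; the tag names are assumptions.
    sample = [[("Le", "DET"), ("chat", "NC"), ("dort", "V"), (".", "PONCT")]]
    print(to_training_text(sample))
```

A file written this way could then be handed to OpenNLP's POSTaggerTrainer command-line tool, provided the language code and encoding options match the data.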

Jörn

On 1/19/12 11:28 PM, Nicolas Hernandez wrote:

Hi Robert

We used (and still use) the French Treebank (Paris 7 Abeille) for building
machine learning models for (pre)processing French and some of them for
OpenNLP.
I say 'still use' because the French Treebank is not always consistent and
we are trying "to correct it" in some way.

About the release of the models.
Right now, due to an unclear corpus license, the models we build are only
available for research purposes.
We are trying to see if we can release them under the Apache License.
This effort is under way.

To download them.
We do not yet have a dedicated web page for downloading the models we built
so far (even if you may find some of them already present on the web...).
If you are interested, I can send them to you.

Best

On Thu, Jan 19, 2012 at 11:08 PM, Jason Baldridge wrote:


Unfortunately, there is no data I'm aware of for training models for
French. There are efforts underway to get multilingual annotations going on
unrestricted texts, but they are still in the sandbox. Help with those
would be welcome!

On Thu, Jan 19, 2012 at 10:27 AM, Robert VISEUR wrote:

Hi,

We are currently using OpenNLP for POS tagging tasks (with news articles).
Some of the articles are in French, and I see there is no French POS
tagging model in the common OpenNLP package. Do you know of a public French
model for POS tagging in OpenNLP?

Thanks,
Best regards,
Robert.




--
Jason Baldridge
Associate Professor, Department of Linguistics
The University of Texas at Austin
http://www.jasonbaldridge.com
http://twitter.com/jasonbaldridge


Re: French model for POS tagging with OpenNLP

2012-01-20 Thread Jörn Kottmann

On 1/19/12 11:46 PM, Jason Baldridge wrote:

To begin the process of getting models organized and associated with clear
permissions, I've started the following GitHub repo:

https://github.com/utcompling/OpenNLP-Models


Jason, that is great, I would like to contribute and
provide updates for all the models on the SourceForge site.

Jörn


Re: French model for POS tagging with OpenNLP

2012-01-19 Thread Jason Baldridge
That's great to hear. I thought the French Treebank licensing was pretty
clear about how artifacts that could be trained on it could be used. Please
keep us informed about the French data situation!

FWIW, while I very much want to see the creation of unrestricted data with
unrestricted annotations, I implore anyone who does find any models that
have been trained on a restricted corpus like that to only use them in
accordance with the wishes of the copyright holders. It's not just the
legally correct thing to do, but also the morally correct thing to do.

To begin the process of getting models organized and associated with clear
permissions, I've started the following GitHub repo:

https://github.com/utcompling/OpenNLP-Models

Nothing there yet, but there will be Norwegian models fairly soon. Any help
with getting more languages in there, or help with getting things set up in
general is most welcome!

-Jason

On Thu, Jan 19, 2012 at 4:28 PM, Nicolas Hernandez <
nicolas.hernan...@gmail.com> wrote:

> Hi Robert
>
> We used (and still use) the French Treebank (Paris 7 Abeille) for building
> machine learning models for (pre)processing French and some of them for
> OpenNLP.
> I say 'still use' because the French Treebank is not always consistent and
> we are trying "to correct it" in some way.
>
> About the release of the models.
> Right now, due to an unclear corpus license, the models we build are only
> available for research purposes.
> We are trying to see if we can release them under the Apache License.
> This effort is under way.
>
> To download them.
> We do not yet have a dedicated web page for downloading the models we
> built so far (even if you may find some of them already present on the
> web...).
> If you are interested, I can send them to you.
>
> Best
>
> On Thu, Jan 19, 2012 at 11:08 PM, Jason Baldridge <
> jasonbaldri...@gmail.com> wrote:
>
>> Unfortunately, there is no data I'm aware of for training models for
>> French. There are efforts underway to get multilingual annotations going
>> on unrestricted texts, but they are still in the sandbox. Help with those
>> would be welcome!
>>
>> On Thu, Jan 19, 2012 at 10:27 AM, Robert VISEUR wrote:
>>
>> > Hi,
>> >
>> > We are currently using OpenNLP for POS tagging tasks (with news
>> > articles).
>> > Some of the articles are in French, and I see there is no French POS
>> > tagging model in the common OpenNLP package. Do you know of a public
>> > French model for POS tagging in OpenNLP?
>> >
>> > Thanks,
>> > Best regards,
>> > Robert.
>> >
>>
>>
>>
>> --
>> Jason Baldridge
>> Associate Professor, Department of Linguistics
>> The University of Texas at Austin
>> http://www.jasonbaldridge.com
>> http://twitter.com/jasonbaldridge
>>
>
>
>
> --
> Dr. Nicolas Hernandez
> Associate Professor (Maître de Conférences)
> Université de Nantes - LINA CNRS
> http://enicolashernandez.blogspot.com
> http://www.univ-nantes.fr/hernandez-n
> +33 (0)2 51 12 53 94
> +33 (0)2 40 30 60 67
>
>


-- 
Jason Baldridge
Associate Professor, Department of Linguistics
The University of Texas at Austin
http://www.jasonbaldridge.com
http://twitter.com/jasonbaldridge


Re: French model for POS tagging with OpenNLP

2012-01-19 Thread Nicolas Hernandez
Hi Robert

We used (and still use) the French Treebank (Paris 7 Abeille) for building
machine learning models for (pre)processing French and some of them for
OpenNLP.
I say 'still use' because the French Treebank is not always consistent and
we are trying "to correct it" in some way.

About the release of the models.
Right now, due to an unclear corpus license, the models we build are only
available for research purposes.
We are trying to see if we can release them under the Apache License.
This effort is under way.

To download them.
We do not yet have a dedicated web page for downloading the models we built
so far (even if you may find some of them already present on the web...).
If you are interested, I can send them to you.

Best

On Thu, Jan 19, 2012 at 11:08 PM, Jason Baldridge wrote:

> Unfortunately, there is no data I'm aware of for training models for
> French. There are efforts underway to get multilingual annotations going on
> unrestricted texts, but they are still in the sandbox. Help with those
> would be welcome!
>
> On Thu, Jan 19, 2012 at 10:27 AM, Robert VISEUR wrote:
>
> > Hi,
> >
> > We are currently using OpenNLP for POS tagging tasks (with news articles).
> > Some of the articles are in French, and I see there is no French POS
> > tagging model in the common OpenNLP package. Do you know of a public
> > French model for POS tagging in OpenNLP?
> >
> > Thanks,
> > Best regards,
> > Robert.
> >
>
>
>
> --
> Jason Baldridge
> Associate Professor, Department of Linguistics
> The University of Texas at Austin
> http://www.jasonbaldridge.com
> http://twitter.com/jasonbaldridge
>



-- 
Dr. Nicolas Hernandez
Associate Professor (Maître de Conférences)
Université de Nantes - LINA CNRS
http://enicolashernandez.blogspot.com
http://www.univ-nantes.fr/hernandez-n
+33 (0)2 51 12 53 94
+33 (0)2 40 30 60 67


Re: French model for POS tagging with OpenNLP

2012-01-19 Thread Jason Baldridge
Unfortunately, there is no data I'm aware of for training models for
French. There are efforts underway to get multilingual annotations going on
unrestricted texts, but they are still in the sandbox. Help with those
would be welcome!

On Thu, Jan 19, 2012 at 10:27 AM, Robert VISEUR wrote:

> Hi,
>
> We are currently using OpenNLP for POS tagging tasks (with news articles).
> Some of the articles are in French, and I see there is no French POS
> tagging model in the common OpenNLP package. Do you know of a public
> French model for POS tagging in OpenNLP?
>
> Thanks,
> Best regards,
> Robert.
>



-- 
Jason Baldridge
Associate Professor, Department of Linguistics
The University of Texas at Austin
http://www.jasonbaldridge.com
http://twitter.com/jasonbaldridge