Re: Regarding performance of OpenNLP entity extraction models

2015-03-16 Thread Joern Kottmann
Hello,

I don't have any numbers for you. The performance depends highly on the
model you are using, the configured feature generation and the number of
features in your training data.

To get a reliable number you will probably have to run a test on your own
machines. Modern CPUs have multiple cores, so you can scale throughput by
running the same process once per core.

Other things which might limit your throughput are the way you read the
text data and store the results.
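
A minimal Java sketch of the kind of test described above, measuring
documents/second and MB/second over an in-memory list of documents. Everything
in it is an assumption for illustration: the class ThroughputBenchmark, the
stand-in extractor countCapitalized (it is not an OpenNLP API) and the use of
worker threads to approximate "one process per core". For a real measurement
the supplier would return something that wraps your own models, and reading
the input and storing the results should be timed as well.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Supplier;
import java.util.function.ToIntFunction;

public class ThroughputBenchmark {

    /** Totals collected over one benchmark run. */
    public static final class Result {
        public final long documents;
        public final long bytes;
        public final long entities;
        public final double seconds;

        Result(long documents, long bytes, long entities, double seconds) {
            this.documents = documents;
            this.bytes = bytes;
            this.entities = entities;
            this.seconds = seconds;
        }

        public double docsPerSecond() { return documents / seconds; }

        public double megabytesPerSecond() { return bytes / (1024.0 * 1024.0) / seconds; }
    }

    /** Hypothetical stand-in extractor: counts tokens that start with an upper-case letter. */
    public static int countCapitalized(String text) {
        int n = 0;
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty() && Character.isUpperCase(token.charAt(0))) n++;
        }
        return n;
    }

    /**
     * Splits the documents across `threads` workers. Each worker asks the supplier
     * for its own extractor, so extractors that are not thread-safe are never shared.
     */
    public static Result run(List<String> docs,
                             Supplier<ToIntFunction<String>> extractors,
                             int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            long start = System.nanoTime();
            List<Future<long[]>> futures = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                final int offset = t;
                Callable<long[]> task = () -> {
                    ToIntFunction<String> extractor = extractors.get();
                    long count = 0, bytes = 0, entities = 0;
                    // Worker t handles documents t, t + threads, t + 2 * threads, ...
                    for (int i = offset; i < docs.size(); i += threads) {
                        String doc = docs.get(i);
                        bytes += doc.getBytes(StandardCharsets.UTF_8).length;
                        entities += extractor.applyAsInt(doc);
                        count++;
                    }
                    return new long[] {count, bytes, entities};
                };
                futures.add(pool.submit(task));
            }
            long count = 0, bytes = 0, entities = 0;
            for (Future<long[]> f : futures) {
                long[] part = f.get();
                count += part[0];
                bytes += part[1];
                entities += part[2];
            }
            // Guard against a zero interval on very small inputs.
            long elapsedNanos = Math.max(System.nanoTime() - start, 1L);
            return new Result(count, bytes, entities, elapsedNanos / 1e9);
        } catch (Exception e) {
            throw new IllegalStateException("benchmark failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("Alice met Bob in Paris", "no names here");
        int cores = Runtime.getRuntime().availableProcessors();
        Result r = run(docs, () -> ThroughputBenchmark::countCapitalized, cores);
        System.out.printf("%.1f docs/s, %.3f MB/s%n", r.docsPerSecond(), r.megabytesPerSecond());
    }
}
```

Numbers from a tiny in-memory sample like this are not meaningful; use a large,
representative set of your own documents and repeat the run a few times so the
JVM has warmed up before you record a figure.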

Jörn

On Mon, 2015-03-16 at 19:04 +0530, Anuj Chopra wrote:
 Hi,
 I wanted some information regarding the performance of OpenNLP entity
 extraction models in documents/second and MB/second.
 Currently I am using person, location, organisation and money extraction
 models.
 If possible, please also tell me the speeds when a combination of models is used.
 Thank you
 -anuj chopra





Re: Student looking to contribute toward OpenNLP

2015-03-16 Thread Rohit Shinde
Okay, I have no problem with that. I'll look over some other issues.

In the meantime, I think I would like to work on medical de-identification.
How would I go about starting this work? What all would I need to know?

On Mon, Mar 16, 2015 at 7:15 PM, Joern Kottmann kottm...@gmail.com wrote:

 Hello,

 thanks for your interest in OpenNLP. We already have a lot of candidates
 for those GSOC issues.

 You are welcome to suggest something you would like to work on here on
 the dev list, create an issue for it and contribute some code to solve
 it.

 The best way to get started is probably to look for an existing issue
 that sounds like something you can tackle and send us a patch for it.

 Another good way to get started is to add support for a new corpus to
 OpenNLP. This teaches you many of the basics of how to train the
 components.

 HTH,
 Jörn

 On Mon, 2015-03-16 at 09:34 +0530, Rohit Shinde wrote:
  Hello everyone,
 
  I still haven't got a reply to my previous email and I would really
  appreciate a reply to that.
 
  I would like to contribute as soon as possible.
 
  Thank you.




Re: Student looking to contribute toward OpenNLP

2015-03-16 Thread Joern Kottmann
Hello,

thanks for your interest in OpenNLP. We already have a lot of candidates
for those GSOC issues.

You are welcome to suggest something you would like to work on here on
the dev list, create an issue for it and contribute some code to solve
it.

The best way to get started is probably to look for an existing issue
that sounds like something you can tackle and send us a patch for it.

Another good way to get started is to add support for a new corpus to
OpenNLP. This teaches you many of the basics of how to train the
components.

HTH,
Jörn

On Mon, 2015-03-16 at 09:34 +0530, Rohit Shinde wrote:
 Hello everyone,
 
 I still haven't got a reply to my previous email and I would really
 appreciate a reply to that.
 
 I would like to contribute as soon as possible.
 
 Thank you.





Regarding performance of OpenNLP entity extraction models

2015-03-16 Thread Anuj Chopra
Hi,
I wanted some information regarding the performance of OpenNLP entity
extraction models in documents/second and MB/second.
Currently I am using person, location, organisation and money extraction
models.
If possible, please also tell me the speeds when a combination of models is used.
Thank you
-anuj chopra


Re: Student looking to contribute toward OpenNLP

2015-03-16 Thread Rohit Shinde
I would certainly like to get involved in this then.

I looked over the paper and its results were highly positive. So does this
mean that we would be implementing their model that gave such good results?

Also, I was looking at the OpenNLP issues on the JIRA page and I really
liked this one: https://issues.apache.org/jira/browse/OPENNLP-757

Could you tell me more about that issue? Could I work on it if possible?

I don't mind working on either project.

On Mon, Mar 16, 2015 at 11:59 AM, andy mcmurry mcmurry.a...@gmail.com
wrote:

 OpenNLP is a standard library used by many Apache NLP projects. The clinical
 text engine (ctakes.apache.org) is one such use of OpenNLP. There is a
 medical data privacy engine (de-identification) that provides the medical
 concept recognition and privacy features described in the paper. We used it
 to conduct some medical studies.

 Dev list committers: I'm speaking up because this potential student is
 looking for a project and hasn't yet found one. We could certainly use the
 help if Rohit is interested.
 On Mar 15, 2015 10:13 PM, Rohit Shinde rohit.shinde12...@gmail.com
 wrote:

  Could you please elaborate a bit more on this? I didn't really get this.
  What exactly is de-identification?
 
  And what do you mean by apache sandbox?
 
  Thank you.
 
  On Mon, Mar 16, 2015 at 10:21 AM, andy mcmurry mcmurry.a...@gmail.com
  wrote:
 
   How about a project based on OpenNLP that is still in the Apache sandbox?
  
   http://www.biomedcentral.com/1472-6947/13/112
   Hello everyone,
  
   I still haven't got a reply to my previous email and I would really
   appreciate a reply to that.
  
   I would like to contribute as soon as possible.
  
   Thank you.
  
 



Re: Student looking to contribute toward OpenNLP

2015-03-16 Thread andy mcmurry
OpenNLP is a standard library used by many Apache NLP projects. The clinical
text engine (ctakes.apache.org) is one such use of OpenNLP. There is a
medical data privacy engine (de-identification) that provides the medical
concept recognition and privacy features described in the paper. We used it
to conduct some medical studies.

Dev list committers: I'm speaking up because this potential student is
looking for a project and hasn't yet found one. We could certainly use the
help if Rohit is interested.
On Mar 15, 2015 10:13 PM, Rohit Shinde rohit.shinde12...@gmail.com
wrote:

 Could you please elaborate a bit more on this? I didn't really get this.
 What exactly is de-identification?

 And what do you mean by apache sandbox?

 Thank you.

 On Mon, Mar 16, 2015 at 10:21 AM, andy mcmurry mcmurry.a...@gmail.com
 wrote:

  How about a project based on OpenNLP that is still in the Apache sandbox?
 
  http://www.biomedcentral.com/1472-6947/13/112
  Hello everyone,
 
  I still haven't got a reply to my previous email and I would really
  appreciate a reply to that.
 
  I would like to contribute as soon as possible.
 
  Thank you.