Re: Regarding performance of OpenNLP entity extraction models
Hello,

I don't have any numbers for you. Performance depends heavily on the model you are using, the configured feature generation, and the number of features in your training data. To get a reliable number you will probably have to run a test on your own machines.

Modern CPUs have multiple cores, so you can run the same process once per core. Other things that might limit your throughput are the way you read the text data and store the results.

Jörn

On Mon, 2015-03-16 at 19:04 +0530, Anuj Chopra wrote:

Hi, I wanted some information regarding the performance of the OpenNLP entity extraction models in documents/second and MB/second. Currently I am using the person, location, organisation and money extraction models. If possible, please also give the speeds when a combination of models is used.

Thank you
-anuj chopra
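The advice above to run a test on your own machines can be sketched roughly as follows. This is only a minimal timing harness, not anything shipped with OpenNLP: `measure_throughput` and the `extract` callable are hypothetical names, and `extract` stands in for whatever code actually invokes your person, location, organisation and money models. Reading the input and storing the results are deliberately left outside the timed loop, since those can limit throughput on their own.

```python
import time
from typing import Callable, Iterable, Tuple


def measure_throughput(
    docs: Iterable[str],
    extract: Callable[[str], object],
) -> Tuple[float, float]:
    """Return (documents per second, megabytes per second) for `extract`.

    `extract` is a hypothetical stand-in for the real entity extraction call.
    """
    # Materialize the input and compute its size before timing starts,
    # so that I/O and bookkeeping do not distort the measurement.
    doc_list = list(docs)
    total_bytes = sum(len(doc.encode("utf-8")) for doc in doc_list)

    start = time.perf_counter()
    for doc in doc_list:
        extract(doc)
    # Guard against a zero interval on very small inputs.
    elapsed = max(time.perf_counter() - start, 1e-9)

    docs_per_second = len(doc_list) / elapsed
    megabytes_per_second = total_bytes / 1e6 / elapsed
    return docs_per_second, megabytes_per_second
```

To compare a single model against a combination of models, pass a different `extract` callable for each configuration and run it over the same documents. To estimate per-core scaling, start one copy of the measurement per core and add up the results.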
Re: Student looking to contribute toward OpenNLP
Okay, I have no problem with that. I'll look over some other issues. In the meantime, I think I would like to work on medical de-identification. How would I go about starting this work? What would I need to know?

On Mon, Mar 16, 2015 at 7:15 PM, Joern Kottmann kottm...@gmail.com wrote:

Hello,

thanks for your interest in OpenNLP. We already have a lot of candidates for those GSOC issues.

You are welcome to suggest something you would like to work on here on the dev list, create an issue for it and contribute some code to solve it.

The best way to get started is probably to look for an existing issue that you think you can tackle and send us a patch for it. Another good way to get started is to add support for a new corpus to OpenNLP. This teaches you many of the basics of how to train the components.

HTH,
Jörn

On Mon, 2015-03-16 at 09:34 +0530, Rohit Shinde wrote:

Hello everyone, I still haven't got a reply to my previous email and I would really appreciate a reply to that. I would like to contribute as soon as possible. Thank you.
Re: Student looking to contribute toward OpenNLP
Hello,

thanks for your interest in OpenNLP. We already have a lot of candidates for those GSOC issues.

You are welcome to suggest something you would like to work on here on the dev list, create an issue for it and contribute some code to solve it.

The best way to get started is probably to look for an existing issue that you think you can tackle and send us a patch for it. Another good way to get started is to add support for a new corpus to OpenNLP. This teaches you many of the basics of how to train the components.

HTH,
Jörn

On Mon, 2015-03-16 at 09:34 +0530, Rohit Shinde wrote:

Hello everyone, I still haven't got a reply to my previous email and I would really appreciate a reply to that. I would like to contribute as soon as possible. Thank you.
Regarding performance of OpenNLP entity extraction models
Hi, I wanted some information regarding the performance of the OpenNLP entity extraction models in documents/second and MB/second. Currently I am using the person, location, organisation and money extraction models. If possible, please also give the speeds when a combination of models is used.

Thank you
-anuj chopra
Re: Student looking to contribute toward OpenNLP
I would certainly like to get involved in this, then. I looked over the paper and its results were highly positive. Does this mean that we would be implementing the model that gave such good results?

Also, I was looking at the OpenNLP issues on the JIRA page and I really liked this one: https://issues.apache.org/jira/browse/OPENNLP-757

Could you tell me more about that issue? Could I work on it, if possible? I don't mind working on either project.

On Mon, Mar 16, 2015 at 11:59 AM, andy mcmurry mcmurry.a...@gmail.com wrote:

OpenNLP is a standard library used by many Apache NLP projects. The clinical text engine (ctakes.apache.org) is one such use of OpenNLP. There is a medical data privacy engine (de-identification) that does medical concept recognition and implements the privacy features described in the paper. We used it to conduct some medical studies.

Dev list committers: I'm speaking up because this potential student is looking for a project and hasn't yet found one. We could certainly use the help if Rohit is interested.

On Mar 15, 2015 10:13 PM, Rohit Shinde rohit.shinde12...@gmail.com wrote:

Could you please elaborate a bit more on this? I didn't really get this. What exactly is de-identification? And what do you mean by Apache sandbox? Thank you.

On Mon, Mar 16, 2015 at 10:21 AM, andy mcmurry mcmurry.a...@gmail.com wrote:

How about a project based on OpenNLP that is still in the Apache sandbox? http://www.biomedcentral.com/1472-6947/13/112

Hello everyone, I still haven't got a reply to my previous email and I would really appreciate a reply to that. I would like to contribute as soon as possible. Thank you.
Re: Student looking to contribute toward OpenNLP
OpenNLP is a standard library used by many Apache NLP projects. The clinical text engine (ctakes.apache.org) is one such use of OpenNLP. There is a medical data privacy engine (de-identification) that does medical concept recognition and implements the privacy features described in the paper. We used it to conduct some medical studies.

Dev list committers: I'm speaking up because this potential student is looking for a project and hasn't yet found one. We could certainly use the help if Rohit is interested.

On Mar 15, 2015 10:13 PM, Rohit Shinde rohit.shinde12...@gmail.com wrote:

Could you please elaborate a bit more on this? I didn't really get this. What exactly is de-identification? And what do you mean by Apache sandbox? Thank you.

On Mon, Mar 16, 2015 at 10:21 AM, andy mcmurry mcmurry.a...@gmail.com wrote:

How about a project based on OpenNLP that is still in the Apache sandbox? http://www.biomedcentral.com/1472-6947/13/112

Hello everyone, I still haven't got a reply to my previous email and I would really appreciate a reply to that. I would like to contribute as soon as possible. Thank you.