Re: [VOTE] Apache OpenNLP 1.8.0 Release Candidate 2

2017-05-16 Thread Richard Eckart de Castilho
Hi William,

> On 16.05.2017, at 14:35, William Colen wrote:
> 
> I cloned the DKPro code and tried Rodrigo's proposed changes. Your test passes
> with them.

cool :) 

Would you like to contribute the changes to DKPro Core?

Cheers,

-- Richard


[GitHub] opennlp pull request #202: OPENNLP-1061 Add functionality to DictionaryLemma...

2017-05-16 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/opennlp/pull/202


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] opennlp pull request #203: OPENNLP-1062: Add lemmatizer eval tests

2017-05-16 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/opennlp/pull/203




Re: [VOTE] Apache OpenNLP 1.8.0 Release Candidate 2

2017-05-16 Thread William Colen
Hi Richard,

I cloned the DKPro code and tried Rodrigo's proposed changes. Your test passes
with them.

Thank you
William

2017-05-15 18:51 GMT-03:00 Rodrigo Agerri:

> Hello Richard,
>
> I have tried with various corpora, including GUM, but I cannot reproduce
> that error.
>
> https://github.com/apache/opennlp/commit/8a3b3b537a30b14c4ffb5eb32ffa41d5027bddad
>
> Please note that commit O-904 changed (broke) the lemmatizer API
> substantially to make it uniform between DictionaryLemmatizer and the
> LemmatizerME (e.g., doing the decoding of lemmas internally and so on) so
> that this line for tagging with the LemmatizerME is not required:
>
> https://github.com/dkpro/dkpro-core/blob/89f144a63b214cd584b3cd0e6c499dff6cbcd9ca/dkpro-core-opennlp-asl/src/main/java/de/tudarmstadt/ukp/dkpro/core/opennlp/OpenNlpLemmatizer.java#L135
>
> Also, that commit changed the LemmaSampleStream and LemmaSample classes, so
> it is possible that this is affecting the following class:
>
> https://github.com/dkpro/dkpro-core/blob/89f144a63b214cd584b3cd0e6c499dff6cbcd9ca/dkpro-core-opennlp-asl/src/main/java/de/tudarmstadt/ukp/dkpro/core/opennlp/internal/CasLemmaSampleStream.java
>
> If I understand the logic of this class correctly, as it stands it will take
> an already encoded SES and try to encode it again?
>
> Could you please take a look and see if that could be the problem?
>
> Cheers,
>
> Rodrigo
>
> On Mon, May 15, 2017 at 6:21 PM, Richard Eckart de Castilho <r...@apache.org>
> wrote:
>
> > > On 15.05.2017, at 16:35, Joern Kottmann wrote:
> > >
> > > Richard, I believe I found the problem with the parser, would you mind
> > > taking a look?
> > >
> > > This PR should fix it:
> > > https://github.com/apache/opennlp/pull/199
> >
> > The parser test works nicely with the PR.
> >
> > The lemmatizer test still behaves strangely.
> >
> > Cheers,
> >
> > -- Richard
> >
> >
>
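
The double-encoding concern raised in the quoted message can be illustrated with a small, self-contained sketch. Note the assumptions: the "strip N characters, append suffix" label format below is a deliberately simplified stand-in, not OpenNLP's actual SES format, and the class and method names are hypothetical. It only shows why feeding an already encoded label back into the encoder would produce training labels that no longer decode to the lemma.

```java
// Simplified stand-in for a lemma edit-script encoding.
// This is NOT OpenNLP's actual SES format; names here are hypothetical.
final class LemmaLabelSketch {

  // Encode the lemma relative to the word as
  // "<number of chars to strip from the end>|<suffix to append>".
  static String encode(String word, String lemma) {
    int common = 0;
    int max = Math.min(word.length(), lemma.length());
    while (common < max && word.charAt(common) == lemma.charAt(common)) {
      common++;
    }
    return (word.length() - common) + "|" + lemma.substring(common);
  }

  // Apply a label produced by encode() to the word to recover the lemma.
  static String decode(String word, String label) {
    int sep = label.indexOf('|');
    int strip = Integer.parseInt(label.substring(0, sep));
    return word.substring(0, word.length() - strip) + label.substring(sep + 1);
  }

  public static void main(String[] args) {
    String word = "running";
    String label = encode(word, "run");

    // Mistake under discussion: the label is passed in as if it were a lemma.
    String doubleEncoded = encode(word, label);

    System.out.println(decode(word, label));         // the lemma
    System.out.println(decode(word, doubleEncoded)); // the label, not the lemma
  }
}
```

In this sketch, decoding the doubly encoded label yields the first label rather than the lemma, which is the kind of symptom one would expect if a sample stream encodes data that the library now encodes internally.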


[GitHub] opennlp pull request #202: OPENNLP-1061 Add functionality to DictionaryLemma...

2017-05-16 Thread ragerri
GitHub user ragerri opened a pull request:

https://github.com/apache/opennlp/pull/202

OPENNLP-1061 Add functionality to DictionaryLemmatizer to output several lemmas for a given word postag pair

Thank you for contributing to Apache OpenNLP.

In order to streamline the review of the contribution we ask you
to ensure the following steps have been taken:

### For all changes:
- [X ] Is there a JIRA ticket associated with this PR? Is it referenced 
 in the commit message?

- [ X] Does your PR title start with OPENNLP-XXXX where XXXX is the JIRA 
number you are trying to resolve? Pay particular attention to the hyphen "-" 
character.

- [ X] Has your PR been rebased against the latest commit within the target 
branch (typically master)?

- [ X] Is your initial contribution a single, squashed commit?

### For code changes:
- [ X] Have you ensured that the full suite of tests is executed via mvn 
clean install at the root opennlp folder?
- [ X] Have you written or updated unit tests to verify your changes?
- [ ] If adding new dependencies to the code, are these dependencies 
licensed in a way that is compatible for inclusion under [ASF 
2.0](http://www.apache.org/legal/resolved.html#category-a)? 
- [ ] If applicable, have you updated the LICENSE file, including the main 
LICENSE file in opennlp folder?
- [ ] If applicable, have you updated the NOTICE file, including the main 
NOTICE file found in opennlp folder?

### For documentation related changes:
- [ ] Have you ensured that format looks appropriate for the output in 
which it is rendered?

### Note:
Please ensure that once the PR is submitted, you check travis-ci for build 
issues and submit an update to your PR as soon as possible.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ragerri/opennlp opennlp-1061

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/opennlp/pull/202.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #202


commit f9753001dd1380c58467ae19e3294912b81f
Author: Rodrigo Agerri 
Date:   2017-05-16T10:35:22Z

OPENNLP-1061 Add functionality to DictionaryLemmatizer to output several 
lemmas for a given word postag pair
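
As a rough illustration of what "several lemmas for a given word postag pair" means, here is a minimal, self-contained sketch of a dictionary keyed on the (word, postag) pair that keeps every lemma it has seen. It is not the code from this pull request: the class name, the method names, the tab-joined key, the "V" tag, and the choice to return an empty list for unknown pairs are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; not the DictionaryLemmatizer implementation.
final class MultiLemmaDictionarySketch {

  // Maps "word<TAB>postag" to all lemmas registered for that pair.
  private final Map<String, List<String>> entries = new HashMap<>();

  private static String key(String word, String postag) {
    return word + "\t" + postag;
  }

  // Register one more lemma for the pair; earlier lemmas are kept.
  void add(String word, String postag, String lemma) {
    entries.computeIfAbsent(key(word, postag), k -> new ArrayList<>()).add(lemma);
  }

  // Return every lemma known for the pair (empty list if none; an assumption).
  List<String> lemmatize(String word, String postag) {
    return entries.getOrDefault(key(word, postag), Collections.emptyList());
  }

  public static void main(String[] args) {
    MultiLemmaDictionarySketch dict = new MultiLemmaDictionarySketch();
    // Spanish "fui" is a past form of both "ser" and "ir".
    dict.add("fui", "V", "ser");
    dict.add("fui", "V", "ir");
    System.out.println(dict.lemmatize("fui", "V"));
  }
}
```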






Parser too slow

2017-05-16 Thread ajay kumar
Hello,

I am doing some text mining experiments with thousands of English documents.

While all the OpenNLP components work fine, the parser gets stuck on a few
files and does not complete the parsing even after days of waiting.

Is there any tuning that can be done to skip such files or boost the performance?
I have attached one such file.
Any help is appreciated.

Thank you
Choose category 1080p hd ( 1 ) 1100mm ( 1 ) 110th anniversay ( 1 ) 13 mp camera 
( 1 ) 13mp camera ( 2 ) 15z ultrabook ( 1 ) 190exp ( 1 ) 2 guns ( 1 ) 20 
megapixel ( 1 ) 20.7 megapixel ( 1 ) 2000 watts ( 1 ) 2012 amg customer ( 1 ) 
2012 calendar ( 1 ) 2012 ford year ( 1 ) 2012 zeighest ( 1 ) 2013 avalon ( 1 ) 
2013 beetle ( 1 ) 2013 bmw r1200 ( 1 ) 2013 bmw z4 ( 1 ) 2013 buick enclave ( 7 
) 2013 buick encore ( 3 ) 2013 buick verano ( 1 ) 2013 c4 picasso ( 1 ) 2013 
chevrolet silverado ( 1 ) 2013 chevrolet traverse ( 3 ) 2013 civic ( 1 ) 2013 
civic si ( 1 ) 2013 civic si coupe ( 1 ) 2013 cr v ( 1 ) 2013 dodge challenger 
( 1 ) 2013 dodge charger ( 1 ) 2013 ford escape ( 1 ) 2013 ford explorer ( 1 ) 
2013 ford focus ( 2 ) 2013 gmc acadia ( 1 ) 2013 gmc terrain ( 4 ) 2013 honda 
accord ( 2 ) 2013 honda cbr600rr ( 1 ) 2013 honda civic ( 3 ) 2013 honda civic 
si ( 1 ) 2013 honda crv ( 1 ) 2013 honda odyssey ( 1 ) 2013 hyundai azera ( 1 ) 
2013 hyundai elantra ( 1 ) 2013 hyundai equus ( 3 ) 2013 hyundai genesis ( 4 ) 
2013 hyundai santa fe ( 1 ) 2013 hyundai sonata ( 2 ) 2013 hyundai veloster ( 1 
) 2013 international van ( 1 ) 2013 jetta tdi ( 1 ) 2013 kia optima ( 1 ) 2013 
kia sorento ( 2 ) 2013 lexus gs ( 1 ) 2013 malibu ( 1 ) 2013 mazda2 ( 2 ) 2013 
model ( 9 ) 2013 nissan nv ( 2 ) 2013 passat ( 1 ) 2013 pathfinder ( 1 ) 2013 
raw4 ( 1 ) 2013 rt281 flr ( 1 ) 2013 santa fe ( 4 ) 2013 shot show ( 4 ) 2013 
sonata ( 1 ) 2013 sprinter ( 1 ) 2013 subaru impreza ( 1 ) 2013 superbike ( 2 ) 
2013 toyota avalon ( 5 ) 2013 toyota camry ( 1 ) 2013 toyota rav4 ( 1 ) 2013 
volkswagen ( 1 ) 2013 volkswagen beetle ( 2 ) 2013 volkswagen jetta ( 2 ) 2013 
volkswagen passat ( 1 ) 2013 xv crosstrek ( 1 ) 2014 audi models ( 1 ) 2014 
audi r8 ( 1 ) 2014 audi rs7 ( 1 ) 2014 buick lacrosse ( 1 ) 2014 cadillac cts ( 
2 ) 2014 cadillac elr ( 2 ) 2014 chevrolet corvette ( 4 ) 2014 chevrolet 
silverado ( 5 ) 2014 chevrolet ss ( 1 ) 2014 chevy silverado ( 1 ) 2014 
corvette ( 1 ) 2014 corvette stingray ( 1 ) 2014 cts sedan ( 1 ) 2014 fiesta ( 
2 ) 2014 ford transit ( 1 ) 2014 gmc sierra ( 5 ) 2014 highlander ( 1 ) 2014 
honda accord ( 2 ) 2014 honda odyssey ( 4 ) 2014 hyundai sonata ( 1 ) 2014 
kawasaki z1000 ( 1 ) 2014 kia carenza ( 1 ) 2014 kia forte ( 3 ) 2014 kia 
sorento ( 3 ) 2014 lexus is ( 3 ) 2014 mazda6 ( 3 ) 2014 mitsubishi outlander ( 
1 ) 2014 model ( 1 ) 2014 models ( 3 ) 2014 odyssey ( 1 ) 2014 pioneer ( 2 ) 
2014 porsche cayman ( 1 ) 2014 raider scl ( 1 ) 2014 silverado ( 4 ) 2014 spark 
ev ( 1 ) 2014 stingray convertible ( 1 ) 2014 subaru forester ( 2 ) 2014 toyota 
( 1 ) 2014 transit connect ( 1 ) 2014 volvo models ( 1 ) 2014 yamaha fz1 ( 1 ) 
2015 ATS Coupe ( 1 ) 2015 audi a3 ( 1 ) 2015 chevrolet suburban ( 2 ) 2015 
chevrolet tahoe ( 3 ) 2015 Chrysler 300 ( 1 ) 2015 Commemorative Edition ( 1 ) 
2015 corvette z06 ( 1 ) 2015 ford f 150 ( 1 ) 2015 golf tdi ( 1 ) 2015 Honda 
CRV ( 1 ) 2015 kia k900 ( 1 ) 2015 YZF-R1 ( 1 ) 2016 Kia Sorento ( 1 ) 2016 
Nissan TITAN ( 1 ) 2016 Toyota Mirai ( 1 ) 208 hybrid fe ( 1 ) 208dbk 15x ( 1 ) 
21 inch ( 1 ) 23m touch monitor ( 1 ) 24 megapixel ( 1 ) 27 desktop ( 1 ) 27 
inch ( 1 ) 300 ( 1 ) 300 cl ( 1 ) 3000 lumens ( 1 ) 3d aerobatic airplane ( 1 ) 
3d airplane ( 1 ) 3d audio technology ( 1 ) 3d biplane ( 1 ) 3d blu ray ( 1 ) 
3d creations ( 1 ) 3d hdmi ( 1 ) 3d hdtvs ( 1 ) 3D mapping video ( 1 ) 3d 
navigation ( 1 ) 3d object ( 1 ) 3d printing ( 2 ) 3d printing support ( 1 ) 3d 
projector ( 1 ) 3d supernatural ( 1 ) 3d surround ( 1 ) 3d systems cube ( 1 ) 
3dtv player ( 2 ) 3g 4g connectivity ( 1 ) 3g option ( 1 ) 3G Phone ( 1 ) 3g 
service ( 1 ) 3mode pack ( 1 ) 3tb drive ( 1 ) 4 black flag ( 1 ) 4 wheel drive 
( 1 ) 41 megapixel ( 3 ) 41 megapixel sensor ( 1 ) 4540 mfp printer ( 1 ) 458 
challenge ( 1 ) 458 speciale ( 2 ) 4600 series ( 1 ) 47 ronin ( 1 ) 4g 
connectivity ( 2 ) 4g enabled ( 1 ) 4k camcorder ( 1 ) 4k capture ( 1 ) 4k 
input ( 1 ) 4k ultra hd ( 6 ) 4matic ( 1 ) 4matic all wheel ( 1 ) 4x4 
capability testing ( 1 ) 4x4 design cues ( 1 ) 5 ( 1 ) 5 inch ( 1 ) 5.1 channel 
( 2 ) 500 vario ( 1 ) 5000 series ( 1 ) 50th anniversary ( 1 ) 5600 series ( 1 
) 5mp camera ( 1 ) 6 inch ( 1 ) 6400 dpi ( 1 ) 6500 dpi ( 1 ) 6600 series ( 1 ) 
6x6 showcar ( 1 ) 7 wonders of ( 3 ) 7.2 channel ( 1 ) 700 rp avendator ( 1 ) 
737 max ( 1 ) 737 winglet ( 1 ) 737max ( 1 ) 8 passengers ( 1 ) 84 inch 
television ( 2 ) 8mp camera ( 1 ) 911 carrera ( 2 ) 911 Targa 4S ( 1 ) 911 
turbo ( 1 ) 92 compact l ( 1 ) 9xr battery ( 1 ) a 45 amg ( 2 ) a world first ( 
1 ) a1 quattro ( 1 ) a3 sedan ( 1 ) a320 jetliner ( 1 ) a320ceo ( 1 
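
One generic way to skip inputs like the one above is to cap the number of tokens handed to the parser and to time-box each call. The attached sample looks like one long category listing rather than ordinary sentences, so a length cap alone may already filter it out. The sketch below is plain JDK code under stated assumptions: the class name, method names, and thresholds are hypothetical, and the `Function` argument stands in for whatever parse call is actually used. `Future.cancel(true)` only interrupts the worker thread, and whether the OpenNLP parser reacts to interruption is not something this sketch assumes, so the worker is a daemon thread that cannot keep the JVM alive.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Function;

// Hypothetical sketch; the task passed in stands in for the real parse call.
final class TimeBoxedParsingSketch {

  // Cheap pre-filter: true if the whitespace-separated token count exceeds the cap.
  static boolean tooLongToParse(String sentence, int maxTokens) {
    return sentence.trim().split("\\s+").length > maxTokens;
  }

  // Run the task on the input; return the fallback if it fails or exceeds the limit.
  static <I, O> O runWithTimeout(Function<I, O> task, I input, long timeoutMillis, O fallback) {
    ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
      Thread t = new Thread(r);
      t.setDaemon(true); // a stuck worker must not block JVM shutdown
      return t;
    });
    Callable<O> call = () -> task.apply(input);
    Future<O> future = executor.submit(call);
    try {
      return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
    } catch (TimeoutException e) {
      future.cancel(true); // best effort: only interrupts the worker
      return fallback;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return fallback;
    } catch (ExecutionException e) {
      return fallback;
    } finally {
      executor.shutdownNow();
    }
  }
}
```

A caller could first check `tooLongToParse(sentence, 100)` and, for sentences that pass, wrap the parse call in `runWithTimeout(...)` with a limit of a few seconds, logging every skipped input for later inspection. Both thresholds are illustrative and would need tuning on the actual corpus.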

[GitHub] opennlp pull request #201: Opennlp 1060

2017-05-16 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/opennlp/pull/201

