[jira] [Commented] (OPENNLP-1449) Revise log levels in OpenNLP
[ https://issues.apache.org/jira/browse/OPENNLP-1449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17694398#comment-17694398 ] ASF GitHub Bot commented on OPENNLP-1449: - rzo1 opened a new pull request, #509: URL: https://github.com/apache/opennlp/pull/509 Thank you for contributing to Apache OpenNLP. In order to streamline the review of the contribution we ask you to ensure the following steps have been taken:
### For all changes:
- [x] Is there a JIRA ticket associated with this PR? Is it referenced in the commit message?
- [x] Does your PR title start with OPENNLP-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.
- [x] Has your PR been rebased against the latest commit within the target branch (typically main)?
- [x] Is your initial contribution a single, squashed commit?
### For code changes:
- [x] Have you ensured that the full suite of tests is executed via mvn clean install at the root opennlp folder?
- [ ] Have you written or updated unit tests to verify your changes?
- [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)?
- [ ] If applicable, have you updated the LICENSE file, including the main LICENSE file in opennlp folder?
- [ ] If applicable, have you updated the NOTICE file, including the main NOTICE file found in opennlp folder?
### For documentation related changes:
- [ ] Have you ensured that format looks appropriate for the output in which it is rendered?
### Note: Adjusted log messages according to review comments by @mawiesne in https://github.com/apache/opennlp/pull/492 > Revise log levels in OpenNLP > > > Key: OPENNLP-1449 > URL: https://issues.apache.org/jira/browse/OPENNLP-1449 > Project: OpenNLP > Issue Type: Sub-task >Reporter: Richard Zowalla >Assignee: Richard Zowalla >Priority: Major > > After introducing slf4j in OpenNLP, we should discuss the log levels. > We have a lot of different variants in the code base: > - System.err > - System.out > - "[WARN] ..." via System.err / System.out.println > - "[ERROR] ..." via System.err / System.out.println > Goal is to define / discuss proper log levels as provided by SLF4J (trace, > debug, etc.) -- This message was sent by Atlassian Jira (v8.20.10#820010)
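A minimal sketch of what migrating the legacy "[WARN] ..." / "[ERROR] ..." console output to leveled logging could look like. To stay dependency-free it uses the JDK's built-in `System.Logger` (Java 9+) as a stand-in for SLF4J; with SLF4J the equivalent calls would be `LoggerFactory.getLogger(...)` and `logger.warn(...)` / `logger.error(...)`. The class name, the helper methods, and the prefix-to-level mapping (unprefixed output treated as INFO) are hypothetical illustrations, not the levels agreed on in this ticket or the code in PR #509.

```java
import java.lang.System.Logger;
import java.lang.System.Logger.Level;

public class LegacyLogMigration {

    // JDK logger used as a dependency-free stand-in for an SLF4J logger.
    private static final Logger LOG = System.getLogger(LegacyLogMigration.class.getName());

    // Hypothetical policy: map the legacy "[ERROR]" / "[WARN]" prefixes to the
    // matching logger levels and treat unprefixed console output as INFO.
    static Level levelFor(String message) {
        if (message.startsWith("[ERROR]")) {
            return Level.ERROR;
        }
        if (message.startsWith("[WARN]")) {
            return Level.WARNING;
        }
        return Level.INFO;
    }

    // Drops the legacy prefix, since the logging backend prints the level itself.
    static String stripPrefix(String message) {
        return message.replaceFirst("^\\[(ERROR|WARN)\\]\\s*", "");
    }

    // Replacement for a System.err.println / System.out.println call site.
    static void log(String legacyMessage) {
        LOG.log(levelFor(legacyMessage), stripPrefix(legacyMessage));
    }

    public static void main(String[] args) {
        // Before: System.err.println("[WARN] Model file not found, using defaults");
        log("[WARN] Model file not found, using defaults");
        // Before: System.out.println("Training finished");
        log("Training finished");
    }
}
```

The benefit over `System.err` is that the level, not the call site, decides whether a message is shown, so users can silence or enable output through the logging configuration.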
[jira] [Assigned] (OPENNLP-1449) Revise log levels in OpenNLP
[ https://issues.apache.org/jira/browse/OPENNLP-1449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Zowalla reassigned OPENNLP-1449: Assignee: Richard Zowalla > Revise log levels in OpenNLP > > > Key: OPENNLP-1449 > URL: https://issues.apache.org/jira/browse/OPENNLP-1449 > Project: OpenNLP > Issue Type: Sub-task >Reporter: Richard Zowalla >Assignee: Richard Zowalla >Priority: Major > > After introducing slf4j in OpenNLP, we should discuss the log levels. > We have a lot of different variants in the code base: > - System.err > - System.out > - "[WARN] ..." via System.err / System.out.println > - "[ERROR] ..." via System.err / System.out.println > Goal is to define / discuss proper log levels as provided by SLF4J (trace, > debug, etc.) -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Closed] (OPENNLP-1448) Introduce SLF4J in OpenNLP
[ https://issues.apache.org/jira/browse/OPENNLP-1448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Martin Wiesner closed OPENNLP-1448. --- Resolution: Fixed > Introduce SLF4J in OpenNLP > -- > > Key: OPENNLP-1448 > URL: https://issues.apache.org/jira/browse/OPENNLP-1448 > Project: OpenNLP > Issue Type: Sub-task >Affects Versions: 2.0.0, 2.1.0, 2.1.1 >Reporter: Richard Zowalla >Assignee: Richard Zowalla >Priority: Major > Fix For: 2.1.2 > > > This will be the first step regarding OPENNLP-1447. > Goal is to replace System.err / System.out calls with logger output, which is > configurable. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (OPENNLP-1448) Introduce SLF4J in OpenNLP
[ https://issues.apache.org/jira/browse/OPENNLP-1448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Martin Wiesner updated OPENNLP-1448: Fix Version/s: 2.1.2 Affects Version/s: 2.1.0 2.0.0 2.1.1 > Introduce SLF4J in OpenNLP > -- > > Key: OPENNLP-1448 > URL: https://issues.apache.org/jira/browse/OPENNLP-1448 > Project: OpenNLP > Issue Type: Sub-task >Affects Versions: 2.0.0, 2.1.0, 2.1.1 >Reporter: Richard Zowalla >Assignee: Richard Zowalla >Priority: Major > Fix For: 2.1.2 > > > This will be the first step regarding OPENNLP-1447. > Goal is to replace System.err / System.out calls with logger output, which is > configurable. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (OPENNLP-1448) Introduce SLF4J in OpenNLP
[ https://issues.apache.org/jira/browse/OPENNLP-1448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17694383#comment-17694383 ] ASF GitHub Bot commented on OPENNLP-1448: - mawiesne merged PR #492: URL: https://github.com/apache/opennlp/pull/492 > Introduce SLF4J in OpenNLP > -- > > Key: OPENNLP-1448 > URL: https://issues.apache.org/jira/browse/OPENNLP-1448 > Project: OpenNLP > Issue Type: Sub-task >Reporter: Richard Zowalla >Assignee: Richard Zowalla >Priority: Major > > This will be the first step regarding OPENNLP-1447. > Goal is to replace System.err / System.out calls with logger output, which is > configurable. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (OPENNLP-1163) Sentence detector doesn't spot abbreviations next to punctuation
[ https://issues.apache.org/jira/browse/OPENNLP-1163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17694161#comment-17694161 ] Richard Zowalla commented on OPENNLP-1163: -- Browser state issue. Sorry for the noise. > Sentence detector doesn't spot abbreviations next to punctuation > > > Key: OPENNLP-1163 > URL: https://issues.apache.org/jira/browse/OPENNLP-1163 > Project: OpenNLP > Issue Type: Bug > Components: Sentence Detector >Affects Versions: 1.8.3 > Environment: Reproduced on Windows 10 >Reporter: Gabriele Vaccari >Priority: Critical > Labels: abbreviation, sentence-detector > Attachments: it-abbr.txt, out.txt, test.txt, training-set.txt > > > The Sentence Detector trained with an abbreviations list (see attachment) > fails to spot them within a text if they are preceded by a punctuation mark. > In Italian, words starting with a vowel may be preceded by an article plus > apostrophe sign (single quote). Example: L'ARTICOLO (the article). The term > ARTICOLO, especially in legal text, is frequently abbreviated to ART. > Repro steps: > 1) add the "art." abbreviation in the abbreviations XML file (enclosed, > ctrl+F "art.", case insensitive) > 2) train a model for the Italian language (training set enclosed) with the > following command: > opennlp SentenceDetectorTrainer -abbDict "it-abbr.txt" -lang it -model > it-sen.bin -data training-set.txt -encoding UTF-8 > 3) run the model against a test text with the following command: > opennlp SentenceDetector it-sen.bin < test.txt > Even though the abbreviation "art." was included in the XML file, the > sentence detector breaks the sentence on instances of this abbreviation > preceded by article and apostrophe (e.g. nell'art., dall'art., dell'art.). > See also the enclosed output file out.txt, lines 6-7, 12-13, 13-14 and 16-17. > The issue isn't observed if the apostrophe (single quote) is replaced by a > space character. > -- This message was sent by Atlassian Jira (v8.20.10#820010)
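The reporter notes that the problem disappears when the apostrophe is replaced by a space. That is consistent with, though not proof of, an abbreviation lookup that compares the whole space-delimited token ("dall'art.") against the dictionary entry ("art."). The sketch below illustrates that hypothesis with a hypothetical, simplified check; it is not OpenNLP's sentence detector code, and the elision-aware variant is only one possible workaround.

```java
import java.util.Locale;
import java.util.Set;

public class AbbreviationCheck {

    // Hypothetical abbreviation dictionary, lower-cased for case-insensitive lookup.
    private static final Set<String> ABBREVIATIONS = Set.of("art.");

    // Returns the space-delimited token that ends at position end (exclusive).
    static String tokenEndingAt(String text, int end) {
        int start = text.lastIndexOf(' ', end - 1) + 1;
        return text.substring(start, end);
    }

    // Naive check: the whole token must be a dictionary entry.
    static boolean isAbbreviationNaive(String token) {
        return ABBREVIATIONS.contains(token.toLowerCase(Locale.ROOT));
    }

    // Variant: also try the part after the last apostrophe (elided Italian article),
    // accepting both the ASCII apostrophe and the typographic one (U+2019).
    static boolean isAbbreviationElisionAware(String token) {
        if (isAbbreviationNaive(token)) {
            return true;
        }
        int apos = Math.max(token.lastIndexOf('\''), token.lastIndexOf('\u2019'));
        return apos >= 0 && isAbbreviationNaive(token.substring(apos + 1));
    }

    public static void main(String[] args) {
        String text = "Come previsto dall'art. 5 della legge.";
        int end = text.indexOf("art.") + "art.".length(); // just after the period
        String token = tokenEndingAt(text, end);
        System.out.println(token);                             // dall'art.
        System.out.println(isAbbreviationNaive(token));        // false
        System.out.println(isAbbreviationElisionAware(token)); // true
    }
}
```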
[jira] [Updated] (OPENNLP-1163) Sentence detector doesn't spot abbreviations next to punctuation
[ https://issues.apache.org/jira/browse/OPENNLP-1163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Zowalla updated OPENNLP-1163: - Fix Version/s: (was: 2.1.2) > Sentence detector doesn't spot abbreviations next to punctuation > > > Key: OPENNLP-1163 > URL: https://issues.apache.org/jira/browse/OPENNLP-1163 > Project: OpenNLP > Issue Type: Bug > Components: Sentence Detector >Affects Versions: 1.8.3 > Environment: Reproduced on Windows 10 >Reporter: Gabriele Vaccari >Priority: Critical > Labels: abbreviation, sentence-detector > Attachments: it-abbr.txt, out.txt, test.txt, training-set.txt > > > The Sentence Detector trained with an abbreviations list (see attachment) > fails to spot them within a text if they are preceded by a punctuation mark. > In Italian, words starting with a vowel may be preceded by an article plus > apostrophe sign (single quote). Example: L'ARTICOLO (the article). The term > ARTICOLO, especially in legal text, is frequently abbreviated to ART. > Repro steps: > 1) add the "art." abbreviation in the abbreviations XML file (enclosed, > ctrl+F "art.", case insensitive) > 2) train a model for the Italian language (training set enclosed) with the > following command: > opennlp SentenceDetectorTrainer -abbDict "it-abbr.txt" -lang it -model > it-sen.bin -data training-set.txt -encoding UTF-8 > 3) run the model against a test text with the following command: > opennlp SentenceDetector it-sen.bin < test.txt > Even though the abbreviation "art." was included in the XML file, the > sentence detector breaks the sentence on instances of this abbreviation > preceded by article and apostrophe (e.g. nell'art., dall'art., dell'art.). > See also the enclosed output file out.txt, lines 6-7, 12-13, 13-14 and 16-17. > The issue isn't observed if the apostrophe (single quote) is replaced by a > space character. > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Reopened] (OPENNLP-1163) Sentence detector doesn't spot abbreviations next to punctuation
[ https://issues.apache.org/jira/browse/OPENNLP-1163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Zowalla reopened OPENNLP-1163: -- Assignee: (was: Martin Wiesner) > Sentence detector doesn't spot abbreviations next to punctuation > > > Key: OPENNLP-1163 > URL: https://issues.apache.org/jira/browse/OPENNLP-1163 > Project: OpenNLP > Issue Type: Bug > Components: Sentence Detector >Affects Versions: 1.8.3 > Environment: Reproduced on Windows 10 >Reporter: Gabriele Vaccari >Priority: Critical > Labels: abbreviation, sentence-detector > Fix For: 2.1.2 > > Attachments: it-abbr.txt, out.txt, test.txt, training-set.txt > > > The Sentence Detector trained with an abbreviations list (see attachment) > fails to spot them within a text if they are preceded by a punctuation mark. > In Italian, words starting with a vowel may be preceded by an article plus > apostrophe sign (single quote). Example: L'ARTICOLO (the article). The term > ARTICOLO, especially in legal text, is frequently abbreviated to ART. > Repro steps: > 1) add the "art." abbreviation in the abbreviations XML file (enclosed, > ctrl+F "art.", case insensitive) > 2) train a model for the Italian language (training set enclosed) with the > following command: > opennlp SentenceDetectorTrainer -abbDict "it-abbr.txt" -lang it -model > it-sen.bin -data training-set.txt -encoding UTF-8 > 3) run the model against a test text with the following command: > opennlp SentenceDetector it-sen.bin < test.txt > Even though the abbreviation "art." was included in the XML file, the > sentence detector breaks the sentence on instances of this abbreviation > preceded by article and apostrophe (e.g. nell'art., dall'art., dell'art.). > See also the enclosed output file out.txt, lines 6-7, 12-13, 13-14 and 16-17. > The issue isn't observed if the apostrophe (single quote) is replaced by a > space character. > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Closed] (OPENNLP-1163) Sentence detector doesn't spot abbreviations next to punctuation
[ https://issues.apache.org/jira/browse/OPENNLP-1163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Zowalla closed OPENNLP-1163. Fix Version/s: 2.1.2 Assignee: Martin Wiesner Resolution: Won't Fix > Sentence detector doesn't spot abbreviations next to punctuation > > > Key: OPENNLP-1163 > URL: https://issues.apache.org/jira/browse/OPENNLP-1163 > Project: OpenNLP > Issue Type: Bug > Components: Sentence Detector >Affects Versions: 1.8.3 > Environment: Reproduced on Windows 10 >Reporter: Gabriele Vaccari >Assignee: Martin Wiesner >Priority: Critical > Labels: abbreviation, sentence-detector > Fix For: 2.1.2 > > Attachments: it-abbr.txt, out.txt, test.txt, training-set.txt > > > The Sentence Detector trained with an abbreviations list (see attachment) > fails to spot them within a text if they are preceded by a punctuation mark. > In Italian, words starting with a vowel may be preceded by an article plus > apostrophe sign (single quote). Example: L'ARTICOLO (the article). The term > ARTICOLO, especially in legal text, is frequently abbreviated to ART. > Repro steps: > 1) add the "art." abbreviation in the abbreviations XML file (enclosed, > ctrl+F "art.", case insensitive) > 2) train a model for the Italian language (training set enclosed) with the > following command: > opennlp SentenceDetectorTrainer -abbDict "it-abbr.txt" -lang it -model > it-sen.bin -data training-set.txt -encoding UTF-8 > 3) run the model against a test text with the following command: > opennlp SentenceDetector it-sen.bin < test.txt > Even though the abbreviation "art." was included in the XML file, the > sentence detector breaks the sentence on instances of this abbreviation > preceded by article and apostrophe (e.g. nell'art., dall'art., dell'art.). > See also the enclosed output file out.txt, lines 6-7, 12-13, 13-14 and 16-17. > The issue isn't observed if the apostrophe (single quote) is replaced by a > space character. 
> -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Closed] (OPENNLP-1229) stem function giving wrong output
[ https://issues.apache.org/jira/browse/OPENNLP-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Zowalla closed OPENNLP-1229. Fix Version/s: 2.1.2 Resolution: Won't Fix > stem function giving wrong output > - > > Key: OPENNLP-1229 > URL: https://issues.apache.org/jira/browse/OPENNLP-1229 > Project: OpenNLP > Issue Type: Bug > Components: Stemmer > Environment: Ubuntu-18.04, JDK-8 >Reporter: Divya Rani >Assignee: Martin Wiesner >Priority: Minor > Fix For: 2.1.2 > > > As OpenNLP uses PorterStemmer for stemming, PorterStemmer seems to be > stemming "this" -> "thi". -- This message was sent by Atlassian Jira (v8.20.10#820010)
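The "this" -> "thi" output matches the published Porter algorithm: its step 1a removes a trailing "s" unless it is part of "ss", with no check on word length, and stems are not required to be dictionary words. The ticket does not state why it was closed as Won't Fix. The sketch below reproduces only step 1a in a hypothetical standalone class (not OpenNLP's PorterStemmer) to show the behavior, assuming lower-case input.

```java
public class PorterStep1a {

    // Step 1a of the Porter (1980) suffix-stripping algorithm. The longest
    // matching suffix wins and only that one rule is applied:
    //   SSES -> SS, IES -> I, SS -> SS, S -> (removed)
    static String step1a(String word) {
        if (word.endsWith("sses")) {
            return word.substring(0, word.length() - 2);
        }
        if (word.endsWith("ies")) {
            return word.substring(0, word.length() - 2);
        }
        if (word.endsWith("ss")) {
            return word;
        }
        if (word.endsWith("s")) {
            return word.substring(0, word.length() - 1);
        }
        return word;
    }

    public static void main(String[] args) {
        for (String w : new String[] {"caresses", "ponies", "caress", "cats", "this"}) {
            // caresses -> caress, ponies -> poni, caress -> caress, cats -> cat, this -> thi
            System.out.println(w + " -> " + step1a(w));
        }
    }
}
```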
[jira] [Commented] (OPENNLP-1229) stem function giving wrong output
[ https://issues.apache.org/jira/browse/OPENNLP-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17694159#comment-17694159 ] ASF GitHub Bot commented on OPENNLP-1229: - rzo1 merged PR #507: URL: https://github.com/apache/opennlp/pull/507 > stem function giving wrong output > - > > Key: OPENNLP-1229 > URL: https://issues.apache.org/jira/browse/OPENNLP-1229 > Project: OpenNLP > Issue Type: Bug > Components: Stemmer > Environment: Ubuntu-18.04, JDK-8 >Reporter: Divya Rani >Assignee: Martin Wiesner >Priority: Minor > > As OpenNLP uses PorterStemmer for stemming, PorterStemmer seems to be > stemming "this" -> "thi". -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (OPENNLP-1473) Add .asf.yaml
[ https://issues.apache.org/jira/browse/OPENNLP-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bruno P. Kinoshita resolved OPENNLP-1473. - Resolution: Fixed > Add .asf.yaml > - > > Key: OPENNLP-1473 > URL: https://issues.apache.org/jira/browse/OPENNLP-1473 > Project: OpenNLP > Issue Type: Task > Components: Documentation >Reporter: Bruno P. Kinoshita >Assignee: Bruno P. Kinoshita >Priority: Trivial > Fix For: 2.1.2 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (OPENNLP-1473) Add .asf.yaml
[ https://issues.apache.org/jira/browse/OPENNLP-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17694064#comment-17694064 ] ASF GitHub Bot commented on OPENNLP-1473: - kinow merged PR #508: URL: https://github.com/apache/opennlp/pull/508 > Add .asf.yaml > - > > Key: OPENNLP-1473 > URL: https://issues.apache.org/jira/browse/OPENNLP-1473 > Project: OpenNLP > Issue Type: Task > Components: Documentation >Reporter: Bruno P. Kinoshita >Assignee: Bruno P. Kinoshita >Priority: Trivial > Fix For: 2.1.2 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (OPENNLP-141) Tokenizers alpha numeric optimization only recognizes a-z as alpha chars
[ https://issues.apache.org/jira/browse/OPENNLP-141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17694010#comment-17694010 ] ASF GitHub Bot commented on OPENNLP-141: mawiesne commented on PR #506: URL: https://github.com/apache/opennlp/pull/506#issuecomment-1446297075 > > > Couldn't find a good reference for Italian or Spanish > > > > > > So better create a separate Jira issue for this and consider it an improvement? This way, we could verify and/or ask the community for support before making too many assumptions. > > Good idea! Done: https://issues.apache.org/jira/browse/OPENNLP-1474 Thanks Bruno! > Tokenizers alpha numeric optimization only recognizes a-z as alpha chars > > > Key: OPENNLP-141 > URL: https://issues.apache.org/jira/browse/OPENNLP-141 > Project: OpenNLP > Issue Type: Bug > Components: Tokenizer >Affects Versions: tools-1.5.0-sourceforge >Reporter: Jörn Kottmann >Assignee: Martin Wiesner >Priority: Minor > > The Tokenizer has an optimization which skips tokens which are only made of > numerics or alpha chars. In foreign languages the alpha chars contain umlauts > and other letters which are not included in the a-z range. -- This message was sent by Atlassian Jira (v8.20.10#820010)
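A minimal sketch of the problem described in OPENNLP-141: an ASCII-only character class rejects tokens such as "Müller" or "Straße" that are purely alphabetic in German, while Java's Unicode property classes `\p{L}` (letters) and `\p{Nd}` (decimal digits) accept them. The class name and both patterns are illustrative assumptions; the ticket does not quote the exact check used by OpenNLP's tokenizer.

```java
import java.util.regex.Pattern;

public class AlphaNumericCheck {

    // ASCII-only check: letters a-z / A-Z and digits 0-9 (assumed pattern,
    // in the spirit of the optimization described in OPENNLP-141).
    private static final Pattern ASCII_ALNUM = Pattern.compile("^[A-Za-z0-9]+$");

    // Unicode-aware check: any letter (\p{L}) or decimal digit (\p{Nd}).
    // Assumes precomposed (NFC) input, since combining marks are not matched.
    private static final Pattern UNICODE_ALNUM = Pattern.compile("^[\\p{L}\\p{Nd}]+$");

    static boolean isAsciiAlnum(String token) {
        return ASCII_ALNUM.matcher(token).matches();
    }

    static boolean isUnicodeAlnum(String token) {
        return UNICODE_ALNUM.matcher(token).matches();
    }

    public static void main(String[] args) {
        // Unicode escapes keep the source file ASCII-only (u-umlaut and sharp s).
        String[] tokens = {"Haus", "M\u00fcller", "Stra\u00dfe", "42", "don't"};
        for (String t : tokens) {
            System.out.printf("%s ascii=%b unicode=%b%n", t, isAsciiAlnum(t), isUnicodeAlnum(t));
        }
    }
}
```

Characters such as the apostrophe in "don't" are rejected by both patterns, so only the letter range changes between the two checks.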
[jira] [Assigned] (OPENNLP-1461) Create OpenNLP on Jenkins
[ https://issues.apache.org/jira/browse/OPENNLP-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Zowalla reassigned OPENNLP-1461: Assignee: Bruno P. Kinoshita (was: Jeff Zemerick) > Create OpenNLP on Jenkins > - > > Key: OPENNLP-1461 > URL: https://issues.apache.org/jira/browse/OPENNLP-1461 > Project: OpenNLP > Issue Type: Sub-task >Reporter: Richard Zowalla >Assignee: Bruno P. Kinoshita >Priority: Major > > Create OpenNLP on https://ci-builds.apache.org/ > It might just be possible for you after logging in, _or_ it requires an INFRA > ticket. Didn't find any documentation about it. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (OPENNLP-1461) Create OpenNLP on Jenkins
[ https://issues.apache.org/jira/browse/OPENNLP-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17693987#comment-17693987 ] Richard Zowalla commented on OPENNLP-1461: -- https://issues.apache.org/jira/browse/INFRA-24253 > Create OpenNLP on Jenkins > - > > Key: OPENNLP-1461 > URL: https://issues.apache.org/jira/browse/OPENNLP-1461 > Project: OpenNLP > Issue Type: Sub-task >Reporter: Richard Zowalla >Assignee: Bruno P. Kinoshita >Priority: Major > > Create OpenNLP on https://ci-builds.apache.org/ > It might just be possible for you after logging in, _or_ it requires an INFRA > ticket. Didn't find any documentation about it. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (OPENNLP-1470) Modernize maven.yml to fix deprecation warnings
[ https://issues.apache.org/jira/browse/OPENNLP-1470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Zowalla updated OPENNLP-1470: - Fix Version/s: 2.1.2 > Modernize maven.yml to fix deprecation warnings > --- > > Key: OPENNLP-1470 > URL: https://issues.apache.org/jira/browse/OPENNLP-1470 > Project: OpenNLP > Issue Type: Task > Components: Build, Packaging and Test >Affects Versions: 2.1.0, 2.1.1 >Reporter: Martin Wiesner >Assignee: Martin Wiesner >Priority: Trivial > Fix For: 2.1.2 > > > In the build output of GH actions we find warnings like this: > {quote}Warning: The `save-state` command is deprecated and will be disabled > soon. Please upgrade to using Environment Files. For more information see: > [https://github.blog/changelog/2022-10-11-github-actions-deprecating-save-state-and-set-output-commands/] > {quote} > and > {quote}Warning: The `set-output` command is deprecated and will be disabled > soon. Please upgrade to using Environment Files. For more information see: > [https://github.blog/changelog/2022-10-11-github-actions-deprecating-save-state-and-set-output-commands/] > {quote} > Therefore, _maven.yml_ needs some modernizations. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Closed] (OPENNLP-1470) Modernize maven.yml to fix deprecation warnings
[ https://issues.apache.org/jira/browse/OPENNLP-1470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Zowalla closed OPENNLP-1470. > Modernize maven.yml to fix deprecation warnings > --- > > Key: OPENNLP-1470 > URL: https://issues.apache.org/jira/browse/OPENNLP-1470 > Project: OpenNLP > Issue Type: Task > Components: Build, Packaging and Test >Affects Versions: 2.1.0, 2.1.1 >Reporter: Martin Wiesner >Assignee: Martin Wiesner >Priority: Trivial > Fix For: 2.1.2 > > > In the build output of GH actions we find warnings like this: > {quote}Warning: The `save-state` command is deprecated and will be disabled > soon. Please upgrade to using Environment Files. For more information see: > [https://github.blog/changelog/2022-10-11-github-actions-deprecating-save-state-and-set-output-commands/] > {quote} > and > {quote}Warning: The `set-output` command is deprecated and will be disabled > soon. Please upgrade to using Environment Files. For more information see: > [https://github.blog/changelog/2022-10-11-github-actions-deprecating-save-state-and-set-output-commands/] > {quote} > Therefore, _maven.yml_ needs some modernizations. -- This message was sent by Atlassian Jira (v8.20.10#820010)