Re: [VOTE] UIMA-AS 2.10.3 RC1

2018-03-19 Thread Jaroslaw Cwiklik
- Built from source - OK
- Spot checked signatures - OK
- Issues fixed - OK
- Checked README, RELEASE_NOTES, LICENSE and NOTICE - OK
- Ran runRemoteAsyncAE.sh - OK
- Ran UIMA-AS extended tests, which include a new test that validates
deployment of an async aggregate with a delegate that uses a JMS Service
Descriptor - OK
- Created a deployment descriptor that uses a JMS Service Descriptor to connect
to Room Number Annotator. Launched two instances of it to test the ClientID bug
found in UIMA-AS v2.10.2 - OK
- Imported uima examples into eclipse (oxygen) and tested UIMA Deploy AS
Service.launch and UIMA Run Remote Async AE.launch - OK
- Installed UIMA-AS eclipse plugins - OK
- Documentation - OK

[x] +1 OK to release

Jerry

On Fri, Mar 16, 2018 at 3:04 PM, Jerry Cwiklik wrote:

> Hi,
>
> the UIMA-AS 2.10.3 release candidate 1 is ready for voting.
>
> This version contains the following fixes:
>
> - Modified client code to assign unique ClientID to broker connection
> which fixes InvalidClientIDException
> - Fixed ClassCastException when async aggregate initializes delegate with
> JMS Service Descriptor
> - Fixed broken classpath and logging for UIMA-AS run configurations
> - Fixed eclipse update site pom to copy *.jar.pack.gz files to plugins
> folder
>
> The list of changes in Jira:
> http://issues.apache.org/jira/issues/?jql=fixVersion%20%3D%202.10.3AS%20AND%20project%20%3D%20UIMA
>
> The source and binary zip/tars are staged to
> http://dist.apache.org/repos/dist/dev/uima/uima-as/2.10.3/RC1/
>
> The eclipse update subsite is here:
> http://dist.apache.org/repos/dist/dev/uima/eclipse-update-site/uima-as/
>
> The Maven artifacts are here:
> http://repository.apache.org/content/repositories/orgapacheuima-1191
>
> The SVN tags are here:
> http://svn.apache.org/repos/asf/uima/uima-as/tags/uima-as-2.10.3/
>
> See http://uima.apache.org/testing-builds.html for suggestions on how to
> test release candidates.
>
> Please vote on release:
>
> [ ] +1 OK to release
> [ ] 0   Don't care
> [ ] -1 Not OK to release, because ...
>
> --
> Jerry Cwiklik
> Apache UIMA
>
>


[jira] [Updated] (UIMA-5752) Problem with matching items in MarkTable with whitespacers visible

2018-03-19 Thread Jasper Huzen (JIRA)

 [ 
https://issues.apache.org/jira/browse/UIMA-5752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jasper Huzen updated UIMA-5752:
---
Description: 
The change / fix in UIMA-4556 causes some problems when using a CSV file with
whitespace.

When we have a dictionary with whitespace between words and

>> Param PARAM_DICT_REMOVE_WS is TRUE:

When WS are visible in the token stream:
 - words with spaces are not recognized (as expected).

When WS are NOT visible in the token stream:
 - all items in the dictionary will be recognized
 - all items will also be recognized if you add whitespace between words. For
example: IlikeRUTA, Ilike Ruta, I like Ruta all result in the same match.

>> Param PARAM_DICT_REMOVE_WS is FALSE:

When WS are visible in the token stream:
 - not all entries in the dictionary will be recognized

When WS are NOT visible in the token stream:
 - also not all entries in the dictionary will be recognized

The cause of the problem is that the value for ignoring whitespace is always
true (hardcoded).
{code:java}
private IBooleanExpression ignoreWS = new SimpleBooleanExpression(true);
{code}
This is not correct: if whitespace is significant and you want to keep it,
that won't work. The matcher should use the value set in the
PARAM_DICT_REMOVE_WS parameter or the value set via the setIgnoreWS method.
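
A minimal sketch of the requested behaviour, assuming a simplified matcher that is not the actual Ruta class: the class DictionaryMatcher and its methods are hypothetical names, and a plain boolean stands in for IBooleanExpression. The point is only that the whitespace flag comes from the caller instead of being fixed to true.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical stand-in for a dictionary matcher; NOT the actual Ruta class.
public class DictionaryMatcher {
    private final Set<String> entries = new LinkedHashSet<>();

    // Supplied by the caller (e.g. from PARAM_DICT_REMOVE_WS) instead of
    // being initialised to a hardcoded true.
    private boolean ignoreWS;

    public DictionaryMatcher(boolean ignoreWS) {
        this.ignoreWS = ignoreWS;
    }

    public void setIgnoreWS(boolean ignoreWS) {
        this.ignoreWS = ignoreWS;
    }

    public void addEntry(String entry) {
        entries.add(entry);
    }

    // Strips all whitespace only when the flag asks for it.
    private String normalize(String text) {
        return ignoreWS ? text.replaceAll("\\s+", "") : text;
    }

    // True if the candidate equals some entry under the current flag.
    public boolean matches(String candidate) {
        String key = normalize(candidate);
        for (String entry : entries) {
            if (normalize(entry).equals(key)) {
                return true;
            }
        }
        return false;
    }
}
```

With the flag set to false, "I like Ruta" matches only itself. With the flag set to true, "IlikeRuta" matches the same entry as well.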

-I attached a patch to fix this issue.-

I'm working on a patch.

  was:
The change / fix in UIMA-4556 causes some problems when using a CSV file with
whitespace.

When we have a dictionary with whitespace between words and

>> Param PARAM_DICT_REMOVE_WS is TRUE:

When WS are visible in the token stream:
 - words with spaces are not recognized (as expected).

When WS are NOT visible in the token stream:
 - all items in the dictionary will be recognized
 - all items will also be recognized if you add whitespace between words. For
example: IlikeRUTA, Ilike Ruta, I like Ruta all result in the same match.

>> Param PARAM_DICT_REMOVE_WS is FALSE:

When WS are visible in the token stream:
 - not all entries in the dictionary will be recognized

When WS are NOT visible in the token stream:
 - also not all entries in the dictionary will be recognized



The cause of the problem is that the value for ignoring whitespace is always
true (hardcoded).
{code:java}
private IBooleanExpression ignoreWS = new SimpleBooleanExpression(true);
{code}
This is not correct: if whitespace is significant and you want to keep it,
that won't work. The matcher should use the value set in the
PARAM_DICT_REMOVE_WS parameter or the value set via the setIgnoreWS method.

I attached a patch to fix this issue.


> Problem with matching items in MarkTable with whitespacers visible
> --
>
> Key: UIMA-5752
> URL: https://issues.apache.org/jira/browse/UIMA-5752
> Project: UIMA
>  Issue Type: Bug
>  Components: Ruta
>Affects Versions: 2.6.1ruta
>Reporter: Jasper Huzen
>Priority: Major
>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (UIMA-5752) Problem with matching items in MarkTable with whitespacers visible

2018-03-19 Thread Jasper Huzen (JIRA)

[ 
https://issues.apache.org/jira/browse/UIMA-5752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16405055#comment-16405055
 ] 

Jasper Huzen edited comment on UIMA-5752 at 3/19/18 8:16 PM:
-

Patch removed because it was not complete. 


was (Author: feaster83):
Patch is not complete ---> MarkFastAction and others should also be fixed. (I
will look into that)

> Problem with matching items in MarkTable with whitespacers visible
> --
>
> Key: UIMA-5752
> URL: https://issues.apache.org/jira/browse/UIMA-5752
> Project: UIMA
>  Issue Type: Bug
>  Components: Ruta
>Affects Versions: 2.6.1ruta
>Reporter: Jasper Huzen
>Priority: Major
>





[jira] [Updated] (UIMA-5752) Problem with matching items in MarkTable with whitespacers visible

2018-03-19 Thread Jasper Huzen (JIRA)

 [ 
https://issues.apache.org/jira/browse/UIMA-5752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jasper Huzen updated UIMA-5752:
---
Attachment: (was: UIMA-5752.patch)

> Problem with matching items in MarkTable with whitespacers visible
> --
>
> Key: UIMA-5752
> URL: https://issues.apache.org/jira/browse/UIMA-5752
> Project: UIMA
>  Issue Type: Bug
>  Components: Ruta
>Affects Versions: 2.6.1ruta
>Reporter: Jasper Huzen
>Priority: Major
>





[jira] [Comment Edited] (UIMA-5752) Problem with matching items in MarkTable with whitespacers visible

2018-03-19 Thread Jasper Huzen (JIRA)

[ 
https://issues.apache.org/jira/browse/UIMA-5752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16405055#comment-16405055
 ] 

Jasper Huzen edited comment on UIMA-5752 at 3/19/18 4:21 PM:
-

Patch is not complete ---> MarkFastAction and others should also be fixed. (I
will look into that)


was (Author: feaster83):
Patch is not complete ---> MarkFastAction should also be fixed. 

> Problem with matching items in MarkTable with whitespacers visible
> --
>
> Key: UIMA-5752
> URL: https://issues.apache.org/jira/browse/UIMA-5752
> Project: UIMA
>  Issue Type: Bug
>  Components: Ruta
>Affects Versions: 2.6.1ruta
>Reporter: Jasper Huzen
>Priority: Major
> Attachments: UIMA-5752.patch
>
>





[jira] [Commented] (UIMA-5752) Problem with matching items in MarkTable with whitespacers visible

2018-03-19 Thread Jasper Huzen (JIRA)

[ 
https://issues.apache.org/jira/browse/UIMA-5752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16405055#comment-16405055
 ] 

Jasper Huzen commented on UIMA-5752:


Patch is not complete ---> MarkFastAction should also be fixed. 

> Problem with matching items in MarkTable with whitespacers visible
> --
>
> Key: UIMA-5752
> URL: https://issues.apache.org/jira/browse/UIMA-5752
> Project: UIMA
>  Issue Type: Bug
>  Components: Ruta
>Affects Versions: 2.6.1ruta
>Reporter: Jasper Huzen
>Priority: Major
> Attachments: UIMA-5752.patch
>
>





[jira] [Updated] (UIMA-5752) Problem with matching items in MarkTable with whitespacers visible

2018-03-19 Thread Jasper Huzen (JIRA)

 [ 
https://issues.apache.org/jira/browse/UIMA-5752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jasper Huzen updated UIMA-5752:
---
Attachment: UIMA-5752.patch

> Problem with matching items in MarkTable with whitespacers visible
> --
>
> Key: UIMA-5752
> URL: https://issues.apache.org/jira/browse/UIMA-5752
> Project: UIMA
>  Issue Type: Bug
>  Components: Ruta
>Affects Versions: 2.6.1ruta
>Reporter: Jasper Huzen
>Priority: Major
> Attachments: UIMA-5752.patch
>
>





[jira] [Created] (UIMA-5752) Problem with matching items in MarkTable with whitespacers visible

2018-03-19 Thread Jasper Huzen (JIRA)
Jasper Huzen created UIMA-5752:
--

 Summary: Problem with matching items in MarkTable with 
whitespacers visible
 Key: UIMA-5752
 URL: https://issues.apache.org/jira/browse/UIMA-5752
 Project: UIMA
  Issue Type: Bug
  Components: Ruta
Affects Versions: 2.6.1ruta
Reporter: Jasper Huzen


The change / fix in UIMA-4556 causes some problems when using a CSV file with
whitespace.

When we have a dictionary with whitespace between words and
 * Param PARAM_DICT_REMOVE_WS is TRUE:

When WS are visible in the token stream:
- words with spaces are not recognized (as expected).

When WS are NOT visible in the token stream:
- all items in the dictionary will be recognized
- all items will also be recognized if you add whitespace between words. For
example: IlikeRUTA, Ilike Ruta, I like Ruta all result in the same match.
 * Param PARAM_DICT_REMOVE_WS is FALSE:

When WS are visible in the token stream:
- not all entries in the dictionary will be recognized

When WS are NOT visible in the token stream:
- also not all entries in the dictionary will be recognized
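
The collapsing described above for PARAM_DICT_REMOVE_WS = TRUE can be illustrated with a small sketch. It is not taken from the Ruta sources: the class WhitespaceKeyDemo and the method key are hypothetical, and lower-casing is an assumption made only so that the three example spellings compare equal.

```java
import java.util.Locale;

// Illustration only; class and method names are hypothetical.
public class WhitespaceKeyDemo {

    // Builds the comparison key for a dictionary entry or a text span.
    // removeWs mirrors the intent of PARAM_DICT_REMOVE_WS. Lower-casing is
    // an assumption, used so that "RUTA" and "Ruta" compare equal.
    public static String key(String text, boolean removeWs) {
        String k = removeWs ? text.replaceAll("\\s+", "") : text;
        return k.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        String[] variants = {"IlikeRUTA", "Ilike Ruta", "I like Ruta"};
        for (String v : variants) {
            // Left: key with whitespace removed. Right: key with whitespace kept.
            System.out.println(key(v, true) + " | " + key(v, false));
        }
    }
}
```

With whitespace removed, all three spellings produce the same key, which is why they all result in the same match. With whitespace kept, the keys stay distinct.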


The cause of the problem is that the value for ignoring whitespace is always
true (hardcoded).
{code:java}
private IBooleanExpression ignoreWS = new SimpleBooleanExpression(true);
{code}
This is not correct: if whitespace is significant and you want to keep it,
that won't work. The matcher should use the value set in the
PARAM_DICT_REMOVE_WS parameter or the value set via the setIgnoreWS method.

I attached a patch to fix this issue.





[jira] [Issue Comment Deleted] (UIMA-5723) MARKTABLE fails to assign feature for single word entry in first CSV column

2018-03-19 Thread Jasper Huzen (JIRA)

 [ 
https://issues.apache.org/jira/browse/UIMA-5723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jasper Huzen updated UIMA-5723:
---
Comment: was deleted

(was: The change / fix in UIMA-4556 causes some problems when using a CSV file
with whitespace.

When param PARAM_DICT_REMOVE_WS is set to TRUE and WS are not visible in
the token stream:
- all items in the dictionary will be recognized
- all items will also be recognized if you add whitespace between words. For
example: IlikeRUTA, Ilike Ruta, I like Ruta all result in the same match.

If whitespace is visible, words with spaces won't be recognized.

The cause of the problem is that the value for ignoring whitespace is
hardcoded to true:
{code:java}
private IBooleanExpression ignoreWS = new SimpleBooleanExpression(true);
{code}

This is not correct: if whitespace is significant and you want to keep it,
that won't work. The matcher should use the value set in the
PARAM_DICT_REMOVE_WS parameter or the value set via the setIgnoreWS method.

I attached a patch to fix this issue. [^UIMA-5723.patch])

> MARKTABLE fails to assign feature for single word entry in first CSV column
> ---
>
> Key: UIMA-5723
> URL: https://issues.apache.org/jira/browse/UIMA-5723
> Project: UIMA
>  Issue Type: Bug
>  Components: Ruta
>Affects Versions: 2.6.1ruta
>Reporter: Andreas Thiel
>Assignee: Peter Klügl
>Priority: Major
>
> When using Ruta's MARKTABLE action with a CSV file {{nl_law_names.csv}} like 
> this
> {code:xml}
> WAZ;WAZELF
> Wet arbeidsongeschiktheidsverzekering zelfstandigen;WAZELF
> {code}
> and corresponding Ruta script containing these lines
> {code:java}
> WORDTABLE LawNameTable = 'nl_law_names.csv';
> Document{->MARKTABLE(WetNaam, 1, LawNameTable, "WetIdentifier" = 2)};
> {code}
> it seems that the text {{WAZ}} is detected, but the {{WetIdentifier}} feature 
> of the resulting annotation is not filled by the string following the 
> semicolon. Instead, it remains empty.
> (Note: _WetNaam_ annotation is defined elsewhere via type system description)
> In contrast, the fully written name {{Wet arbeidsongeschiktheidsverzekering
> zelfstandigen}} is detected and processed as expected with feature
> WetIdentifier = WAZELF after annotating.
> Could it be that problems arise when only a single word (i.e. no spaces or 
> uppercase letters following lowercase chars) is present in the first column 
> in the CSV file? Or is it a matter of configuration?
> We also experimented with the optional arguments of MARKTABLE regarding
> uppercase/lowercase distinction, but to no avail.
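
The column mapping the reporter expects from the CSV above can be sketched in plain Java. This illustrates the expected result only and is not how MARKTABLE is implemented in Ruta; the class CsvTableLookup and the method load are hypothetical names.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustration only; not the Ruta MARKTABLE implementation.
public class CsvTableLookup {

    // Maps column 1 (the text to mark) to column 2 (the feature value),
    // using ';' as the column separator as in nl_law_names.csv.
    public static Map<String, String> load(List<String> lines) {
        Map<String, String> table = new LinkedHashMap<>();
        for (String line : lines) {
            String[] cols = line.split(";", -1);
            if (cols.length >= 2) {
                table.put(cols[0].trim(), cols[1].trim());
            }
        }
        return table;
    }

    public static void main(String[] args) {
        Map<String, String> table = load(Arrays.asList(
                "WAZ;WAZELF",
                "Wet arbeidsongeschiktheidsverzekering zelfstandigen;WAZELF"));
        // Both the single-word and the multi-word entry should yield the
        // value of the second column.
        System.out.println(table.get("WAZ"));
        System.out.println(table.get("Wet arbeidsongeschiktheidsverzekering zelfstandigen"));
    }
}
```

Under this reading, the single-word entry WAZ and the fully written name should both end up with WetIdentifier = WAZELF.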





[jira] [Updated] (UIMA-5723) MARKTABLE fails to assign feature for single word entry in first CSV column

2018-03-19 Thread Jasper Huzen (JIRA)

 [ 
https://issues.apache.org/jira/browse/UIMA-5723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jasper Huzen updated UIMA-5723:
---
Attachment: (was: UIMA-5723.patch)

> MARKTABLE fails to assign feature for single word entry in first CSV column
> ---
>
> Key: UIMA-5723
> URL: https://issues.apache.org/jira/browse/UIMA-5723
> Project: UIMA
>  Issue Type: Bug
>  Components: Ruta
>Affects Versions: 2.6.1ruta
>Reporter: Andreas Thiel
>Assignee: Peter Klügl
>Priority: Major
>





[jira] [Commented] (UIMA-5723) MARKTABLE fails to assign feature for single word entry in first CSV column

2018-03-19 Thread Jasper Huzen (JIRA)

[ 
https://issues.apache.org/jira/browse/UIMA-5723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16404876#comment-16404876
 ] 

Jasper Huzen commented on UIMA-5723:


The change / fix in UIMA-4556 causes some problems when using a CSV file with
whitespace.

When param PARAM_DICT_REMOVE_WS is set to TRUE and WS are not visible in
the token stream:
- all items in the dictionary will be recognized
- all items will also be recognized if you add whitespace between words. For
example: IlikeRUTA, Ilike Ruta, I like Ruta all result in the same match.

If whitespace is visible, words with spaces won't be recognized.

The cause of the problem is that the value for ignoring whitespace is
hardcoded to true:
{code:java}
private IBooleanExpression ignoreWS = new SimpleBooleanExpression(true);
{code}

This is not correct: if whitespace is significant and you want to keep it,
that won't work. The matcher should use the value set in the
PARAM_DICT_REMOVE_WS parameter or the value set via the setIgnoreWS method.

I attached a patch to fix this issue. [^UIMA-5723.patch]

> MARKTABLE fails to assign feature for single word entry in first CSV column
> ---
>
> Key: UIMA-5723
> URL: https://issues.apache.org/jira/browse/UIMA-5723
> Project: UIMA
>  Issue Type: Bug
>  Components: Ruta
>Affects Versions: 2.6.1ruta
>Reporter: Andreas Thiel
>Assignee: Peter Klügl
>Priority: Major
> Attachments: UIMA-5723.patch
>
>





[jira] [Updated] (UIMA-5723) MARKTABLE fails to assign feature for single word entry in first CSV column

2018-03-19 Thread Jasper Huzen (JIRA)

 [ 
https://issues.apache.org/jira/browse/UIMA-5723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jasper Huzen updated UIMA-5723:
---
Attachment: UIMA-5723.patch

> MARKTABLE fails to assign feature for single word entry in first CSV column
> ---
>
> Key: UIMA-5723
> URL: https://issues.apache.org/jira/browse/UIMA-5723
> Project: UIMA
>  Issue Type: Bug
>  Components: Ruta
>Affects Versions: 2.6.1ruta
>Reporter: Andreas Thiel
>Assignee: Peter Klügl
>Priority: Major
> Attachments: UIMA-5723.patch
>
>


