Re: UIMAv3 & WebAnno - success !

2018-02-04 Thread Richard Eckart de Castilho
On 24.01.2018, at 19:26, Richard Eckart de Castilho  wrote:
> 
> On 23.01.2018, at 04:33, Marshall Schor  wrote:
>> 
>> I manually updated the DocumentMetaData.java JCas class to the current style,
>> and now, I can load a document in the Annotation view, without errors :-).
>> 
>> I'll commit these changes tomorrow.
> 
> Cool! Looking forward to trying this out!

I finally got to trying out the latest UIMA v3 SNAPSHOT (i.e. 1 commit beyond
the latest RC) with WebAnno. Seems it works nicely now.

I tried adding and deleting various annotations with no error - so it looks
like the UIMAv3 CasCompleteSerializer now also has nicely stable IDs!

Great job!

Thanks for taking into account this use-case!

Best,

-- Richard

Re: UIMAv3 & WebAnno - success !

2018-01-27 Thread Richard Eckart de Castilho
On 24.01.2018, at 19:26, Richard Eckart de Castilho  wrote:
> 
> On 23.01.2018, at 04:33, Marshall Schor  wrote:
>> 
>> I manually updated the DocumentMetaData.java JCas class to the current style,
>> and now, I can load a document in the Annotation view, without errors :-).
>> 
>> I'll commit these changes tomorrow.
> 
> Cool! Looking forward to trying this out!

Just wanted to let you know that I am trying out the latest UIMAv3 SNAPSHOT.
Looks like I have to regenerate additional DKPro Core classes to be compatible
with the post-alpha code.

I'll post updates when I get ahead.

Cheers,

-- Richard

Re: UIMAv3 & WebAnno - back to debugging JCas feature setup

2018-01-24 Thread Marshall Schor
strike that - I put in https://issues.apache.org/jira/browse/UIMA-5708
to fix this.

-Marshall


On 1/24/2018 5:23 PM, Marshall Schor wrote:
> I think this already happens (except for this one unusual case).
>
> I'm probably not going to fix the unusual case due to other priorities... :-)
>
> -Marshall
>
>
> On 1/24/2018 1:25 PM, Richard Eckart de Castilho wrote:
>> On 23.01.2018, at 03:48, Marshall Schor  wrote:
>>> I'm going to try to do that, but it's taking a very long time to download 
>>> the
>>> dkpro uima-v3 project...
>> Would it be a good idea to add version information to generated JCas
>> classes and have UIMA check it and raise an error? Similar to when
>> a JVM tries to load a class file it is not compatible with.
>>
>> Cheers,
>>
>> -- Richard
>



Re: UIMAv3 & WebAnno - back to debugging JCas feature setup

2018-01-24 Thread Marshall Schor
I think this already happens (except for this one unusual case).

I'm probably not going to fix the unusual case due to other priorities... :-)

-Marshall


On 1/24/2018 1:25 PM, Richard Eckart de Castilho wrote:
> On 23.01.2018, at 03:48, Marshall Schor  wrote:
>> I'm going to try to do that, but it's taking a very long time to download the
>> dkpro uima-v3 project...
> Would it be a good idea to add version information to generated JCas
> classes and have UIMA check it and raise an error? Similar to when
> a JVM tries to load a class file it is not compatible with.
>
> Cheers,
>
> -- Richard



Re: UIMAv3 & WebAnno - success !

2018-01-24 Thread Richard Eckart de Castilho
On 23.01.2018, at 04:33, Marshall Schor  wrote:
> 
> I manually updated the DocumentMetaData.java JCas class to the current style,
> and now, I can load a document in the Annotation view, without errors :-).
> 
> I'll commit these changes tomorrow.

Cool! Looking forward to trying this out!

-- Richard

Re: UIMAv3 & WebAnno - back to debugging JCas feature setup

2018-01-24 Thread Richard Eckart de Castilho
On 23.01.2018, at 03:48, Marshall Schor  wrote:
> 
> I'm going to try to do that, but it's taking a very long time to download the
> dkpro uima-v3 project...

Would it be a good idea to add version information to generated JCas
classes and have UIMA check it and raise an error? Similar to when
a JVM tries to load a class file it is not compatible with.
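
A rough sketch of what such a check could look like, loosely modelled on the
JVM's class-file version check. Everything in it is hypothetical: the
GENERATED_FOR field, the SUPPORTED constant and the JCasVersionCheck class are
illustrative names only, not part of UIMA or of what JCasGen emits.

```java
import java.lang.reflect.Field;

// Hypothetical sketch: none of these names exist in UIMA.
final class JCasVersionCheck {
    // Layout version the (imaginary) runtime understands.
    static final int SUPPORTED = 3;

    // Stand-in for a generated JCas class stamped by the current generator.
    static final class CurrentStyleType {
        public static final int GENERATED_FOR = 3;
    }

    // Stand-in for a JCas class produced by an older (alpha) generator.
    static final class AlphaStyleType {
        public static final int GENERATED_FOR = 2;
    }

    /** Throws if the class has no stamp or was generated for another layout version. */
    static void check(Class<?> jcasClass) {
        int found;
        try {
            Field stamp = jcasClass.getField("GENERATED_FOR");
            found = stamp.getInt(null); // static field, so no instance is needed
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(
                    jcasClass.getName() + " carries no version stamp; regenerate it", e);
        }
        if (found != SUPPORTED) {
            throw new IllegalStateException(jcasClass.getName() + " was generated for layout "
                    + found + " but the runtime expects " + SUPPORTED);
        }
    }
}
```

Failing at load time with a message that names the class would point straight
at the stale generated source instead of surfacing later as an offset mismatch.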

Cheers,

-- Richard

Re: UIMAv3 & WebAnno - success !

2018-01-22 Thread Marshall Schor
I manually updated the DocumentMetaData.java JCas class to the current style,
and now, I can load a document in the Annotation view, without errors :-).

I'll commit these changes tomorrow.

-Marshall

On 1/22/2018 9:48 PM, Marshall Schor wrote:
> This bug was due to a wrong design - actually adding JCas implied features 
> into
> the type system. This breaks various "binary" serialization/deserialization
> schemes, which require an exact match between the type system and the 
> serialized
> form.
>
> This is now fixed.
>
> The next problem is a housekeeping one: the DKPro being used has a uima v3
> alpha version of the JCas class for DocumentMetaData.
>
> This needs regenerating for the current design.
>
> I'm going to try to do that, but it's taking a very long time to download the
> dkpro uima-v3 project...
>
> -Marshall
>
>
> On 1/18/2018 5:47 PM, Marshall Schor wrote:
>> found an incredibly stupid bug in the code that was supposed to add the extra
>> JCas supplied features.
>>
>> Guess a better test case is needed!
>>
>> After fixing that, the next bug is a cas complete deserialization issue...
>>
>> investigating...
>>
>> -Marshall
>>
>>
>> On 1/18/2018 2:06 PM, Marshall Schor wrote:
>>> Got to the point where I'm getting a JCas feature offset incompatibility -
>>> starting debug...
>>>
>>> -Marshall
>>>
>



Re: UIMAv3 & WebAnno - back to debugging JCas feature setup

2018-01-22 Thread Marshall Schor
This bug was due to a wrong design - actually adding JCas implied features into
the type system. This breaks various "binary" serialization/deserialization
schemes, which require an exact match between the type system and the serialized
form.

This is now fixed.

The next problem is a housekeeping one: the DKPro being used has a uima v3 alpha
version of the JCas class for DocumentMetaData.

This needs regenerating for the current design.

I'm going to try to do that, but it's taking a very long time to download the
dkpro uima-v3 project...

-Marshall


On 1/18/2018 5:47 PM, Marshall Schor wrote:
> found an incredibly stupid bug in the code that was supposed to add the extra
> JCas supplied features.
>
> Guess a better test case is needed!
>
> After fixing that, the next bug is a cas complete deserialization issue...
>
> investigating...
>
> -Marshall
>
>
> On 1/18/2018 2:06 PM, Marshall Schor wrote:
>> Got to the point where I'm getting a JCas feature offset incompatibility -
>> starting debug...
>>
>> -Marshall
>>
>



Re: UIMAv3 & WebAnno - bugs in v3?

2018-01-18 Thread Marshall Schor
yes, looks like a bug.  There is feature validation code, but it only checks if
the feature is appropriate for the type, not whether the feature's range is
appropriate for the caller.

Added Jira https://issues.apache.org/jira/browse/UIMA-5706

-Marshall


On 1/18/2018 5:22 PM, Richard Eckart de Castilho wrote:
> On 18.01.2018, at 22:52, Richard Eckart de Castilho  wrote:
>> On 18.01.2018, at 20:06, Marshall Schor  wrote:
>>> Got to the point where I'm getting a JCas feature offset incompatibility -
>>> starting debug...
>> Meanwhile, I'm working off the things you found...
> It seems that UIMAv3 allows code such as this:
>
>   FeatureStructure fsVal =
>       aFS.getFeatureValue(aFS.getType().getFeatureByBaseName(aFeatureName));
>
> where the aFeatureName is e.g. "end" (i.e. a non-FS feature). UIMAv3 seems to
> simply return null in this case.
>
> UIMAv2 had thrown an exception in this case.
>
> Bug?
>
> Cheers,
>
> -- Richard
>
>



Re: UIMAv3 & WebAnno - back to debugging JCas feature setup

2018-01-18 Thread Marshall Schor
There's always a non-official way, dependent on the version (that can change). 
See the Sofa.class method setLocalSofaData.

It, in turn, calls a method which checks if it's been set (and if so, throws).

But you "could" just call the method it would have called... 
(   _setStringValueNcWj(wrapGetIntCatchException(_FH_sofaString), aString);   )

The DocumentAnnotation singleton needs to be updated as a side effect (the end
value changes if the sofa length changes).

We could make some kind of official thing here, if it is warranted.

-Marshall


On 1/18/2018 4:52 PM, Richard Eckart de Castilho wrote:
> On 18.01.2018, at 20:06, Marshall Schor  wrote:
>> Got to the point where I'm getting a JCas feature offset incompatibility -
>> starting debug...
> Great :)
>
> Meanwhile, I'm working off the things you found...
>
> Btw. trying to run the WebAnno UIMAv3 build from the command line, I
> found that the tests of the WebAnno remote API fail now. The reason
> seems to be that you blocked a sneaky little way that I used to change
> the SOFA string even after the CAS has been locked down:
>
> org.apache.uima.cas.CASRuntimeException: Can''t use standard set methods with SofaFS features.
>   at org.apache.uima.jcas.cas.Sofa.setStringValue(Sofa.java:267) ~[classes/:na]
>   at org.apache.uima.cas.impl.CASImpl.ll_setStringValue(CASImpl.java:3291) ~[classes/:na]
>   at de.tudarmstadt.ukp.clarin.webanno.webapp.remoteapi.RemoteApiController2.forceSetFeatureValue(RemoteApiController2.java:1066) ~[classes/:na]
>   at de.tudarmstadt.ukp.clarin.webanno.webapp.remoteapi.RemoteApiController2.createCompatibleCas(RemoteApiController2.java:1003) ~[classes/:na]
>   at de.tudarmstadt.ukp.clarin.webanno.webapp.remoteapi.RemoteApiController2.annotationsCreate(RemoteApiController2.java:703) ~[classes/:na]
>
> The code triggering this is here (for your entertainment):
>
> // Just in case we really had to chomp off a trailing line break from the
> // annotation CAS, make sure we copy over the proper text from the initial CAS
> // NOT AT HOME THIS YOU SHOULD TRY
> // SETTING THE SOFA STRING FORCEFULLY FOLLOWING THE DARK SIDE IS!
> forceSetFeatureValue(annotationCas.getSofa(), CAS.FEATURE_BASE_NAME_SOFASTRING,
>         initialCas.getDocumentText());
>
> So Master Schor... you defeated me.
>
> I introduced this because it can happen that an annotation file uploaded
> through the remote API might have a trailing line break while the
> corresponding reference document that is already in WebAnno does not (or vice
> versa) - and I am here trying to fix this situation to ensure that the sofa
> strings are equal.
>
> Assuming you had to do that, how would you patch the sofa string?
>
> Cheers,
>
> -- Richard 



Re: UIMAv3 & WebAnno - back to debugging JCas feature setup

2018-01-18 Thread Marshall Schor
found an incredibly stupid bug in the code that was supposed to add the extra
JCas supplied features.

Guess a better test case is needed!

After fixing that, the next bug is a cas complete deserialization issue...

investigating...

-Marshall


On 1/18/2018 2:06 PM, Marshall Schor wrote:
> Got to the point where I'm getting a JCas feature offset incompatibility -
> starting debug...
>
> -Marshall
>



Re: UIMAv3 & WebAnno

2018-01-18 Thread Marshall Schor
I didn't know about skipTests; thanks!

-Marshall


On 1/18/2018 5:01 PM, Richard Eckart de Castilho wrote:
> On 18.01.2018, at 19:22, Marshall Schor  wrote:
>> Could build from command line with -Dmaven.skip.tests  except for one 
>> failure:
>>
>> webanno-ui-curation has a "dependency" even if tests are being skipped on
>> webanno-curation test jar.
>>
>> Workaround: I built webanno-curation with testing, then resumed the other 
>> build
>> (without testing).
> That should work if you use "-DskipTests" instead of "-Dmaven.skip.tests".
> The former only skips the unit test. The latter skips the entire test phase
> including the compilation and packaging of the test code/artifacts.
>
> Cheers,
>
> -- Richard



Re: UIMAv3 & WebAnno - bugs in v3?

2018-01-18 Thread Richard Eckart de Castilho
On 18.01.2018, at 22:52, Richard Eckart de Castilho  wrote:
> 
> On 18.01.2018, at 20:06, Marshall Schor  wrote:
>> 
>> Got to the point where I'm getting a JCas feature offset incompatibility -
>> starting debug...
> 
> Meanwhile, I'm working off the things you found...

It seems that UIMAv3 allows code such as this:

  FeatureStructure fsVal =
      aFS.getFeatureValue(aFS.getType().getFeatureByBaseName(aFeatureName));

where the aFeatureName is e.g. "end" (i.e. a non-FS feature). UIMAv3 seems to
simply return null in this case.

UIMAv2 had thrown an exception in this case.

Bug?
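
To make the difference concrete, here is a toy model of the two behaviours.
Feat and Fs are hypothetical stand-ins written for this illustration, not UIMA
classes, and the lenient variant is only a guess at what "returns null" amounts
to, not a copy of the v3 implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: Feat and Fs are hypothetical stand-ins, not UIMA classes.
final class FeatureRangeGuard {
    /** A feature with a base name and a flag saying whether its range is primitive. */
    static final class Feat {
        final String baseName;
        final boolean primitiveRange;

        Feat(String baseName, boolean primitiveRange) {
            this.baseName = baseName;
            this.primitiveRange = primitiveRange;
        }
    }

    /** A feature structure reduced to a name-to-value map. */
    static final class Fs {
        final Map<String, Object> values = new HashMap<>();
    }

    /** Lenient lookup: quietly yields null when the stored value is not a feature structure. */
    static Fs lenientGetFsValue(Fs fs, Feat f) {
        Object v = fs.values.get(f.baseName);
        return (v instanceof Fs) ? (Fs) v : null;
    }

    /** Strict lookup: rejects any feature whose range is not a feature structure. */
    static Fs strictGetFsValue(Fs fs, Feat f) {
        if (f.primitiveRange) {
            throw new IllegalArgumentException(
                    "feature '" + f.baseName + "' has a primitive range, not an FS range");
        }
        return (Fs) fs.values.get(f.baseName);
    }
}
```

With an "end" feature holding an int, the lenient call returns null while the
strict call throws, which is the v2-style behaviour that callers may have been
relying on to catch mistakes early.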

Cheers,

-- Richard



Re: UIMAv3 & WebAnno

2018-01-18 Thread Richard Eckart de Castilho
On 18.01.2018, at 19:22, Marshall Schor  wrote:
> 
> Could build from command line with -Dmaven.skip.tests  except for one failure:
> 
> webanno-ui-curation has a "dependency" even if tests are being skipped on
> webanno-curation test jar.
> 
> Workaround: I built webanno-curation with testing, then resumed the other 
> build
> (without testing).

That should work if you use "-DskipTests" instead of "-Dmaven.skip.tests".
The former only skips running the unit tests. The latter skips the entire test phase
including the compilation and packaging of the test code/artifacts.

Cheers,

-- Richard

Re: UIMAv3 & WebAnno - back to debugging JCas feature setup

2018-01-18 Thread Richard Eckart de Castilho
On 18.01.2018, at 20:06, Marshall Schor  wrote:
> 
> Got to the point where I'm getting a JCas feature offset incompatibility -
> starting debug...

Great :)

Meanwhile, I'm working off the things you found...

Btw. trying to run the WebAnno UIMAv3 build from the command line, I
found that the tests of the WebAnno remote API fail now. The reason
seems to be that you blocked a sneaky little way that I used to change
the SOFA string even after the CAS has been locked down:

org.apache.uima.cas.CASRuntimeException: Can''t use standard set methods with SofaFS features.
  at org.apache.uima.jcas.cas.Sofa.setStringValue(Sofa.java:267) ~[classes/:na]
  at org.apache.uima.cas.impl.CASImpl.ll_setStringValue(CASImpl.java:3291) ~[classes/:na]
  at de.tudarmstadt.ukp.clarin.webanno.webapp.remoteapi.RemoteApiController2.forceSetFeatureValue(RemoteApiController2.java:1066) ~[classes/:na]
  at de.tudarmstadt.ukp.clarin.webanno.webapp.remoteapi.RemoteApiController2.createCompatibleCas(RemoteApiController2.java:1003) ~[classes/:na]
  at de.tudarmstadt.ukp.clarin.webanno.webapp.remoteapi.RemoteApiController2.annotationsCreate(RemoteApiController2.java:703) ~[classes/:na]

The code triggering this is here (for your entertainment):

// Just in case we really had to chomp off a trailing line break from the
// annotation CAS, make sure we copy over the proper text from the initial CAS
// NOT AT HOME THIS YOU SHOULD TRY
// SETTING THE SOFA STRING FORCEFULLY FOLLOWING THE DARK SIDE IS!
forceSetFeatureValue(annotationCas.getSofa(), CAS.FEATURE_BASE_NAME_SOFASTRING,
        initialCas.getDocumentText());

So Master Schor... you defeated me.

I introduced this because it can happen that an annotation file uploaded
through the remote API might have a trailing line break while the
corresponding reference document that is already in WebAnno does not (or vice
versa) - and I am here trying to fix this situation to ensure that the sofa
strings are equal.

Assuming you had to do that, how would you patch the sofa string?
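
One option that avoids touching the sofa after the fact, assuming it is
acceptable to treat two texts as equal when they differ only in trailing line
breaks, is to normalize both strings before they are compared or set. The
class and method names below are made up for this sketch and are not WebAnno
code.

```java
// Illustrative helper; the names are invented for this sketch.
final class SofaTextCompare {
    /** Removes every trailing '\r' and '\n'; all other characters are left untouched. */
    static String chompTrailingLineBreaks(String text) {
        int end = text.length();
        while (end > 0 && (text.charAt(end - 1) == '\n' || text.charAt(end - 1) == '\r')) {
            end--;
        }
        return text.substring(0, end);
    }

    /** True if the two texts are identical once trailing line breaks are ignored. */
    static boolean sameModuloTrailingLineBreaks(String a, String b) {
        return chompTrailingLineBreaks(a).equals(chompTrailingLineBreaks(b));
    }
}
```

Only the tail is trimmed, so a line break in the middle of the text still
counts as a real difference.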

Cheers,

-- Richard 

Re: UIMAv3 & WebAnno

2018-01-18 Thread Richard Eckart de Castilho
On 18.01.2018, at 17:47, Marshall Schor  wrote:
> 
> How would you recommend fixing this? 
>  - update the reference file to include the identifier (might be many other
> cascading changes needed?)
>  - remove the "identifier" feature from the
> de.tudarmstadt.ukp.dkpro.core.api.ner.type.NamedEntity ?
> (but probably will break something, assuming it was added for a reason! :-) )

The identifier feature was added after DKPro Core 1.7.0 and it is to be used
for entity linking (i.e. connecting a named entity with some canonical concept
URI).

I am updating all the test reference data to include the identifier feature.

Actually, the better approach would be to use a dedicated type system for the
TSV test suite - but that entails even more refactoring at the moment, so I
am pushing that off to the future.

-- Richard

Re: UIMAv3 & WebAnno - back to debugging JCas feature setup

2018-01-18 Thread Marshall Schor
Got to the point where I'm getting a JCas feature offset incompatibility -
starting debug...

-Marshall


Re: UIMAv3 & WebAnno - small UX issue

2018-01-18 Thread Marshall Schor
When webanno is started (not headless), after some time it puts up a little
window with a "Shutdown" button.

On Windows, the box and the button are "selected".

This has an unfortunate consequence.  If while waiting for webanno to
initialize, you have another window open (like an email composition window) and
are typing, then what happens is that in the middle of typing some word, the
WebAnno dialog box/button appear and become selected, and the next keystroke
goes to that window / button, essentially "pushing" (by accident) the shutdown
button, which then shuts things down!

-Marshall



Re: UIMAv3 & WebAnno

2018-01-18 Thread Marshall Schor
Could build from command line with -Dmaven.skip.tests  except for one failure:

webanno-ui-curation has a "dependency" even if tests are being skipped on
webanno-curation test jar.

Workaround: I built webanno-curation with testing, then resumed the other build
(without testing).

-Marshall


On 1/18/2018 10:50 AM, Marshall Schor wrote:
> Fixed this problem. 
>
> It was due to many places in webAnno where the code fragment:
>
> feature.toString()  was expected to return the fully-qualified feature name.
>
> But v3 augmented the feature "toString" to provide more information about a 
> feature.
>
> Fix was to change all occurrences of feature.toString() to feature.getName().
>
> Now the webanno-io-tsv first test succeeds, although it still is reporting 
> lots
> of updating of Annotation "end" values while the annotation is in the index
> (UIMA is recovering these, one at a time).
>
> on to the next problem...
>
>
> On 1/17/2018 4:52 PM, Marshall Schor wrote:
>> I changed the testcase for WebAnnoTsv2ReaderWriterTest to turn off the
>> exception, to move on to the next issue :-)
>>
>> Next issue: the first runPipeLine() in that same test now fails, saying:
>>
>> Caused by: java.io.IOException: Target file
>> [target\test-output\WebAnnoTsv2ReaderWriterTest-test\example2.tsv] already
>> exists and overwriting not enabled.
>>     at
>> de.tudarmstadt.ukp.dkpro.core.api.io.JCasFileWriter_ImplBase.getOutputStream(JCasFileWriter_ImplBase.java:230)
>>     at
>> de.tudarmstadt.ukp.dkpro.core.api.io.JCasFileWriter_ImplBase.getOutputStream(JCasFileWriter_ImplBase.java:155)
>>     at
>> de.tudarmstadt.ukp.clarin.webanno.tsv.WebannoTsv2Writer.process(WebannoTsv2Writer.java:101)
>>     ... 38 more
>>
>> I got around that by erasing the target/ directory, then doing a
>> maven-update-project to cause an Eclipse rebuild of the project. Now when I run
>> it I get beyond the above error.  The next error is:
>>
>> java.io.IOException: example2.tsv This is not a valid TSV File. check this 
>> line:
>> 1-1    Ms.    Sofa
>>     at
>> de.tudarmstadt.ukp.clarin.webanno.tsv.WebannoTsv2Reader.setAnnotations(WebannoTsv2Reader.java:159)
>>     at
>> de.tudarmstadt.ukp.clarin.webanno.tsv.WebannoTsv2Reader.convertToCas(WebannoTsv2Reader.java:78)
>>     at
>> de.tudarmstadt.ukp.clarin.webanno.tsv.WebannoTsv2Reader.getNext(WebannoTsv2Reader.java:547)
>>     at
>> de.tudarmstadt.ukp.dkpro.core.api.io.JCasResourceCollectionReader_ImplBase.getNext(JCasResourceCollectionReader_ImplBase.java:36)
>>     at
>> de.tudarmstadt.ukp.clarin.webanno.tsv.WebAnnoTsv2ReaderWriterTest.test(WebAnnoTsv2ReaderWriterTest.java:81)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>     at 
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>     at
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>     at java.lang.reflect.Method.invoke(Method.java:498)
>>     at
>> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>>     at
>> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>>     at
>> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>>     at
>> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>>     at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
>>     at org.junit.rules.RunRules.evaluate(RunRules.java:20)
>>     at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>>     at
>> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>>     at
>> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>>     at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>>     at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>>     at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>>     at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>>     at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>>     at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>>     at
>> org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:86)
>>     at
>> org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
>>     at
>> org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:538)
>>     at
>> org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:760)
>>     at
>> org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:460)
>>     at
>> org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:206)
>>
>> The file in question has these as its first few lines:
>>
>>  # de.tudarmstadt.ukp.dkpro.core.api.ner.type.NamedEntity | sofa | begin | 
>> end |
>> value | identifier # 

Re: UIMAv3 & WebAnno

2018-01-18 Thread Marshall Schor
found one other:  aFeature.toString()  needed changing to aFeature.getName()

-M


On 1/18/2018 10:50 AM, Marshall Schor wrote:
> Fixed this problem. 
>
> It was due to many places in webAnno where the code fragment:
>
> feature.toString()  was expected to return the fully-qualified feature name.
>
> But v3 augmented the feature "toString" to provide more information about a 
> feature.
>
> Fix was to change all occurrences of feature.toString() to feature.getName().
>
> Now the webanno-io-tsv first test succeeds, although it still is reporting 
> lots
> of updating of Annotation "end" values while the annotation is in the index
> (UIMA is recovering these, one at a time).
>
> on to the next problem...
>
>
> On 1/17/2018 4:52 PM, Marshall Schor wrote:
>> I changed the testcase for WebAnnoTsv2ReaderWriterTest to turn off the
>> exception, to move on to the next issue :-)
>>
>> Next issue: the first runPipeLine() in that same test now fails, saying:
>>
>> Caused by: java.io.IOException: Target file
>> [target\test-output\WebAnnoTsv2ReaderWriterTest-test\example2.tsv] already
>> exists and overwriting not enabled.
>>     at
>> de.tudarmstadt.ukp.dkpro.core.api.io.JCasFileWriter_ImplBase.getOutputStream(JCasFileWriter_ImplBase.java:230)
>>     at
>> de.tudarmstadt.ukp.dkpro.core.api.io.JCasFileWriter_ImplBase.getOutputStream(JCasFileWriter_ImplBase.java:155)
>>     at
>> de.tudarmstadt.ukp.clarin.webanno.tsv.WebannoTsv2Writer.process(WebannoTsv2Writer.java:101)
>>     ... 38 more
>>
>> I got around that by erasing the target/ directory, then doing a
>> maven-update-project to cause an Eclipse rebuild of the project. Now when I run
>> it I get beyond the above error.  The next error is:
>>
>> java.io.IOException: example2.tsv This is not a valid TSV File. check this 
>> line:
>> 1-1    Ms.    Sofa
>>     at
>> de.tudarmstadt.ukp.clarin.webanno.tsv.WebannoTsv2Reader.setAnnotations(WebannoTsv2Reader.java:159)
>>     at
>> de.tudarmstadt.ukp.clarin.webanno.tsv.WebannoTsv2Reader.convertToCas(WebannoTsv2Reader.java:78)
>>     at
>> de.tudarmstadt.ukp.clarin.webanno.tsv.WebannoTsv2Reader.getNext(WebannoTsv2Reader.java:547)
>>     at
>> de.tudarmstadt.ukp.dkpro.core.api.io.JCasResourceCollectionReader_ImplBase.getNext(JCasResourceCollectionReader_ImplBase.java:36)
>>     at
>> de.tudarmstadt.ukp.clarin.webanno.tsv.WebAnnoTsv2ReaderWriterTest.test(WebAnnoTsv2ReaderWriterTest.java:81)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>     at 
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>     at
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>     at java.lang.reflect.Method.invoke(Method.java:498)
>>     at
>> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>>     at
>> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>>     at
>> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>>     at
>> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>>     at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
>>     at org.junit.rules.RunRules.evaluate(RunRules.java:20)
>>     at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>>     at
>> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>>     at
>> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>>     at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>>     at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>>     at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>>     at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>>     at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>>     at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>>     at
>> org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:86)
>>     at
>> org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
>>     at
>> org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:538)
>>     at
>> org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:760)
>>     at
>> org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:460)
>>     at
>> org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:206)
>>
>> The file in question has these as its first few lines:
>>
>>  # de.tudarmstadt.ukp.dkpro.core.api.ner.type.NamedEntity | sofa | begin | 
>> end |
>> value | identifier # de.tudarmstadt.ukp.dkpro.core.api.lexmorph.type.pos.POS 
>> |
>> sofa | begin | end | PosValue | coarseValue #
>> de.tudarmstadt.ukp.dkpro.core.api.syntax.type.dependency.Dependency | sofa |
>> begin | end | DependencyType | flavor |
>> 

Re: UIMAv3 & WebAnno

2018-01-18 Thread Marshall Schor
Next problem:  The type in webAnno v3:
de.tudarmstadt.ukp.dkpro.core.api.ner.type.NamedEntity
has features
  sofa
  begin
  end
  value
  identifier  <<< Surprise! not in the "reference" compare files.

It looks like "identifier" was added for v3?

It makes tests which compare tsv files where the reference doesn't have it, 
fail.

The miscompare:
  actual:
#FORMAT=WebAnno TSV 3.2
#T_SP=de.tudarmstadt.ukp.dkpro.core.api.ner.type.NamedEntity|value|identifier

  expected:
#FORMAT=WebAnno TSV 3.2
#T_SP=de.tudarmstadt.ukp.dkpro.core.api.ner.type.NamedEntity|value

  You can see the "identifier" is missing from the "reference" file.
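
A small helper can make this kind of miscompare easier to spot than eyeballing
two files. The sketch below assumes the header layout visible in the two lines
above (a type name followed by "|"-separated feature names after the "=") and
nothing more about the TSV format; TsvHeaderDiff is an invented name, not part
of WebAnno.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative helper; assumes headers shaped like "#T_SP=<type>|<feature>|<feature>".
final class TsvHeaderDiff {
    /** Returns the feature names of a header line, i.e. everything after the type name. */
    static List<String> features(String headerLine) {
        String spec = headerLine.substring(headerLine.indexOf('=') + 1);
        String[] parts = spec.split("\\|");
        // parts[0] is the type name; the remaining entries are feature names.
        return new ArrayList<>(Arrays.asList(parts).subList(1, parts.length));
    }

    /** Features that appear in the actual header but not in the expected one. */
    static List<String> unexpectedFeatures(String actual, String expected) {
        List<String> extra = features(actual);
        extra.removeAll(features(expected));
        return extra;
    }
}
```

Run against the two header lines above, the only unexpected feature reported
is "identifier".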

How would you recommend fixing this? 
 - update the reference file to include the identifier (might be many other
cascading changes needed?)
 - remove the "identifier" feature from the
de.tudarmstadt.ukp.dkpro.core.api.ner.type.NamedEntity ?
    (but probably will break something, assuming it was added for a reason! :-) )

-Marshall


On 1/18/2018 10:50 AM, Marshall Schor wrote:
> Fixed this problem. 
>
> It was due to many places in webAnno where the code fragment:
>
> feature.toString()  was expected to return the fully-qualified feature name.
>
> But v3 augmented the feature "toString" to provide more information about a 
> feature.
>
> Fix was to change all occurrences of feature.toString() to feature.getName().
>
> Now the webanno-io-tsv first test succeeds, although it still is reporting 
> lots
> of updating of Annotation "end" values while the annotation is in the index
> (UIMA is recovering these, one at a time).
>
> on to the next problem...
>
>
> On 1/17/2018 4:52 PM, Marshall Schor wrote:
>> I changed the testcase for WebAnnoTsv2ReaderWriterTest to turn off the
>> exception, to move on to the next issue :-)
>>
>> Next issue: the first runPipeLine() in that same test now fails, saying:
>>
>> Caused by: java.io.IOException: Target file
>> [target\test-output\WebAnnoTsv2ReaderWriterTest-test\example2.tsv] already
>> exists and overwriting not enabled.
>>     at
>> de.tudarmstadt.ukp.dkpro.core.api.io.JCasFileWriter_ImplBase.getOutputStream(JCasFileWriter_ImplBase.java:230)
>>     at
>> de.tudarmstadt.ukp.dkpro.core.api.io.JCasFileWriter_ImplBase.getOutputStream(JCasFileWriter_ImplBase.java:155)
>>     at
>> de.tudarmstadt.ukp.clarin.webanno.tsv.WebannoTsv2Writer.process(WebannoTsv2Writer.java:101)
>>     ... 38 more
>>
>> I got around that by erasing the target/ directory, then doing a
>> maven-update-project to cause an Eclipse rebuild of the project. Now when I run
>> it I get beyond the above error.  The next error is:
>>
>> java.io.IOException: example2.tsv This is not a valid TSV File. check this 
>> line:
>> 1-1    Ms.    Sofa
>>     at
>> de.tudarmstadt.ukp.clarin.webanno.tsv.WebannoTsv2Reader.setAnnotations(WebannoTsv2Reader.java:159)
>>     at
>> de.tudarmstadt.ukp.clarin.webanno.tsv.WebannoTsv2Reader.convertToCas(WebannoTsv2Reader.java:78)
>>     at
>> de.tudarmstadt.ukp.clarin.webanno.tsv.WebannoTsv2Reader.getNext(WebannoTsv2Reader.java:547)
>>     at
>> de.tudarmstadt.ukp.dkpro.core.api.io.JCasResourceCollectionReader_ImplBase.getNext(JCasResourceCollectionReader_ImplBase.java:36)
>>     at
>> de.tudarmstadt.ukp.clarin.webanno.tsv.WebAnnoTsv2ReaderWriterTest.test(WebAnnoTsv2ReaderWriterTest.java:81)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>     at 
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>     at
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>     at java.lang.reflect.Method.invoke(Method.java:498)
>>     at
>> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>>     at
>> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>>     at
>> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>>     at
>> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>>     at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
>>     at org.junit.rules.RunRules.evaluate(RunRules.java:20)
>>     at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>>     at
>> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>>     at
>> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>>     at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>>     at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>>     at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>>     at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>>     at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>>     at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>>     at
>> org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:86)
>>     at
>> 

Re: UIMAv3 & WebAnno

2018-01-18 Thread Marshall Schor
Fixed this problem. 

It was due to many places in WebAnno where the code fragment
feature.toString() was expected to return the fully-qualified feature name.

But v3 augmented the feature "toString" to provide more information about a
feature.

The fix was to change all occurrences of feature.toString() to feature.getName().
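
To illustrate why this change matters, here is a minimal self-contained sketch.
FeatureStandIn is a hypothetical stand-in and not the UIMA Feature class; the
toString() format shown is made up. It only mimics the situation described
above: getName() stays the fully-qualified name, while toString() carries extra
detail and so no longer works as a lookup key.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a feature object; NOT the UIMA Feature class. */
final class FeatureStandIn {
    private final String typeName;
    private final String shortName;

    FeatureStandIn(String typeName, String shortName) {
        this.typeName = typeName;
        this.shortName = shortName;
    }

    /** Stable identifier in the "type:feature" form. */
    String getName() {
        return typeName + ":" + shortName;
    }

    /** Debug-oriented text; the format is invented here and may change at any time. */
    @Override
    public String toString() {
        return "Feature[name=" + getName() + ", debugInfo=...]";
    }
}

final class FeatureKeyDemo {
    /** Looks up a label keyed by the fully-qualified feature name. */
    static String lookup(Map<String, String> labels, FeatureStandIn feature) {
        // Robust: the key does not depend on how toString() is formatted.
        return labels.get(feature.getName());
    }

    public static void main(String[] args) {
        Map<String, String> labels = new HashMap<>();
        labels.put("uima.tcas.Annotation:end", "end");
        FeatureStandIn end = new FeatureStandIn("uima.tcas.Annotation", "end");
        System.out.println(lookup(labels, end));        // key built from getName()
        System.out.println(labels.get(end.toString())); // misses: toString() is not the key
    }
}
```

Code that uses the string as a map key or writes it into a file header breaks
as soon as toString() changes, which is why an explicit name accessor is the
safer choice.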

Now the webanno-io-tsv first test succeeds, although it still is reporting lots
of updating of Annotation "end" values while the annotation is in the index
(UIMA is recovering these, one at a time).

on to the next problem...


On 1/17/2018 4:52 PM, Marshall Schor wrote:
> I changed the testcase for WebAnnoTsv2ReaderWriterTest to turn off the
> exception, to move on to the next issue :-)
>
> Next issue: the first runPipeLine() in that same test now fails, saying:
>
> Caused by: java.io.IOException: Target file
> [target\test-output\WebAnnoTsv2ReaderWriterTest-test\example2.tsv] already
> exists and overwriting not enabled.
>     at
> de.tudarmstadt.ukp.dkpro.core.api.io.JCasFileWriter_ImplBase.getOutputStream(JCasFileWriter_ImplBase.java:230)
>     at
> de.tudarmstadt.ukp.dkpro.core.api.io.JCasFileWriter_ImplBase.getOutputStream(JCasFileWriter_ImplBase.java:155)
>     at
> de.tudarmstadt.ukp.clarin.webanno.tsv.WebannoTsv2Writer.process(WebannoTsv2Writer.java:101)
>     ... 38 more
>
> I got around that by erasing the target/ directory, then doing a
> maven-update-project to cause an Eclipse rebuild of the project. Now when I
> run it, I get beyond the above error.  The next error is:
>
> java.io.IOException: example2.tsv This is not a valid TSV File. check this 
> line:
> 1-1    Ms.    Sofa
>     at
> de.tudarmstadt.ukp.clarin.webanno.tsv.WebannoTsv2Reader.setAnnotations(WebannoTsv2Reader.java:159)
>     at
> de.tudarmstadt.ukp.clarin.webanno.tsv.WebannoTsv2Reader.convertToCas(WebannoTsv2Reader.java:78)
>     at
> de.tudarmstadt.ukp.clarin.webanno.tsv.WebannoTsv2Reader.getNext(WebannoTsv2Reader.java:547)
>     at
> de.tudarmstadt.ukp.dkpro.core.api.io.JCasResourceCollectionReader_ImplBase.getNext(JCasResourceCollectionReader_ImplBase.java:36)
>     at
> de.tudarmstadt.ukp.clarin.webanno.tsv.WebAnnoTsv2ReaderWriterTest.test(WebAnnoTsv2ReaderWriterTest.java:81)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>     at
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>     at
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>     at
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>     at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
>     at org.junit.rules.RunRules.evaluate(RunRules.java:20)
>     at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>     at
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>     at
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>     at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>     at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>     at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>     at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>     at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>     at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>     at
> org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:86)
>     at
> org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
>     at
> org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:538)
>     at
> org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:760)
>     at
> org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:460)
>     at
> org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:206)
>
> The file in question has these as its first few lines:
>
>  # de.tudarmstadt.ukp.dkpro.core.api.ner.type.NamedEntity | sofa | begin | 
> end |
> value | identifier # de.tudarmstadt.ukp.dkpro.core.api.lexmorph.type.pos.POS |
> sofa | begin | end | PosValue | coarseValue #
> de.tudarmstadt.ukp.dkpro.core.api.syntax.type.dependency.Dependency | sofa |
> begin | end | DependencyType | flavor |
> AttachTo=de.tudarmstadt.ukp.dkpro.core.api.lexmorph.type.pos.POS
> #id=1
> #text=Ms. Haag plays Elianti .
> 1-1    Ms.    Sofa
>    sofaNum: 1
>    sofaID: "_InitialView"
>    mimeType: "text"
>    sofaArray: 
>    sofaString: "Ms. Haag plays Elianti .
> Rolls-Royce Motor 

Re: UIMAv3 & WebAnno

2018-01-17 Thread Marshall Schor
no apologies needed; we're all kind of on the bleeding edge testing uv3 :-)

Glad to help out if I find small fixes for webanno :-)

Cheers. -Marshall


On 1/17/2018 4:37 PM, Richard Eckart de Castilho wrote:
>> On 17.01.2018, at 22:12, Marshall Schor  wrote:
>>
>> I put in an exclude for the slf4j-log4j12, and went to the next issue:
>>
>> Tests in webanno-io-tsv fail.  The first one is failing here:
>> WebAnnoTsv2ReaderWriterTest, line  65 (runPipeLine(reader, writer).
>>
>> It fails because it's updating an "end" value for an annotation that's 
>> already
>> in the index, causing the message which follows.
>> UIMA normally recovers from these things, but a global flag was configured:
>> "uima.exception_when_fs_update_corrupts_index".
>>
>> System.getProperty("uima.exception_when_fs_update_corrupts_index")
>>  (java.lang.String) true
>>
>> I can't see where this is being set, though.  Any ideas?  Is the updating of 
>> the
>> annotation:end while the item is indexed, the way it is designed to work?
> This is set in DkproTestContext which is included in many tests as a JUnit 
> @Rule.
>
> Again, I likely didn't notice this because I only did an Eclipse build on this
> branch, not a Maven build. The presently maintained versions of WebAnno still
> depend on DKPro Core 1.7.0 which did not set this property - so we never hit
> this so far.
>
> Again, I'll have a look at it.
>
> Sorry, I sure hoped you'd have a smoother experience doing this build. But
> you know... you're a bit beyond the bleeding edge on the WebAnno and DKPro 
> Core
> branches that use UIMAv3 ;)
>
> Cheers,
>
> -- Richard



Re: UIMAv3 & WebAnno

2018-01-17 Thread Marshall Schor
I changed the testcase for WebAnnoTsv2ReaderWriterTest to turn off the
exception, to move on to the next issue :-)
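
For reference, one way a test could switch such a flag off for a limited scope
is sketched below. This is an assumption-laden sketch, not the change actually
made to the testcase: SystemPropertyOverride and withProperty are hypothetical
names, and only the property key comes from this thread. Note also that a
library which reads the property once at class-initialization time will not
notice a later change, so the override would have to happen before that point.

```java
import java.util.function.Supplier;

final class SystemPropertyOverride {
    /** Property key quoted in this thread. */
    static final String FLAG = "uima.exception_when_fs_update_corrupts_index";

    /** Runs the action with key=value, then restores whatever was set before. */
    static <T> T withProperty(String key, String value, Supplier<T> action) {
        String previous = System.getProperty(key);
        System.setProperty(key, value);
        try {
            return action.get();
        } finally {
            if (previous == null) {
                System.clearProperty(key);
            } else {
                System.setProperty(key, previous);
            }
        }
    }

    public static void main(String[] args) {
        String seen = withProperty(FLAG, "false", () -> System.getProperty(FLAG));
        System.out.println("inside: " + seen);
        System.out.println("after:  " + System.getProperty(FLAG));
    }
}
```

Restoring the previous value in a finally block keeps one test from leaking
its setting into the tests that run after it.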

Next issue: the first runPipeline() in that same test now fails, saying:

Caused by: java.io.IOException: Target file
[target\test-output\WebAnnoTsv2ReaderWriterTest-test\example2.tsv] already
exists and overwriting not enabled.
    at
de.tudarmstadt.ukp.dkpro.core.api.io.JCasFileWriter_ImplBase.getOutputStream(JCasFileWriter_ImplBase.java:230)
    at
de.tudarmstadt.ukp.dkpro.core.api.io.JCasFileWriter_ImplBase.getOutputStream(JCasFileWriter_ImplBase.java:155)
    at
de.tudarmstadt.ukp.clarin.webanno.tsv.WebannoTsv2Writer.process(WebannoTsv2Writer.java:101)
    ... 38 more

I got around that by erasing the target/ directory, then doing a
maven-update-project to cause an Eclipse rebuild of the project. Now when I run
it, I get beyond the above error.  The next error is:
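
As an aside, the manual cleanup step could also be done from test setup code.
The following is only a sketch: CleanOutputDir and deleteRecursively are
hypothetical helpers that are not part of WebAnno or DKPro Core, and the demo
in main uses a temporary directory rather than the real target/ folder.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

final class CleanOutputDir {
    /** Deletes dir and everything below it; does nothing if dir does not exist. */
    static void deleteRecursively(Path dir) {
        if (!Files.exists(dir)) {
            return;
        }
        try (Stream<Path> paths = Files.walk(dir)) {
            // Reverse order visits children before their parent directory.
            paths.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Demo on a throwaway directory containing one stale output file.
        Path dir = Files.createTempDirectory("test-output");
        Files.createFile(dir.resolve("example2.tsv"));
        deleteRecursively(dir);
        System.out.println("exists after cleanup: " + Files.exists(dir));
    }
}
```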

java.io.IOException: example2.tsv This is not a valid TSV File. check this line:
1-1    Ms.    Sofa
    at
de.tudarmstadt.ukp.clarin.webanno.tsv.WebannoTsv2Reader.setAnnotations(WebannoTsv2Reader.java:159)
    at
de.tudarmstadt.ukp.clarin.webanno.tsv.WebannoTsv2Reader.convertToCas(WebannoTsv2Reader.java:78)
    at
de.tudarmstadt.ukp.clarin.webanno.tsv.WebannoTsv2Reader.getNext(WebannoTsv2Reader.java:547)
    at
de.tudarmstadt.ukp.dkpro.core.api.io.JCasResourceCollectionReader_ImplBase.getNext(JCasResourceCollectionReader_ImplBase.java:36)
    at
de.tudarmstadt.ukp.clarin.webanno.tsv.WebAnnoTsv2ReaderWriterTest.test(WebAnnoTsv2ReaderWriterTest.java:81)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
    at
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
    at
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
    at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
    at org.junit.rules.RunRules.evaluate(RunRules.java:20)
    at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
    at
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
    at
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
    at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
    at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
    at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
    at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
    at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
    at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
    at
org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:86)
    at
org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
    at
org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:538)
    at
org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:760)
    at
org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:460)
    at
org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:206)

The file in question has these as its first few lines:

 # de.tudarmstadt.ukp.dkpro.core.api.ner.type.NamedEntity | sofa | begin | end |
value | identifier # de.tudarmstadt.ukp.dkpro.core.api.lexmorph.type.pos.POS |
sofa | begin | end | PosValue | coarseValue #
de.tudarmstadt.ukp.dkpro.core.api.syntax.type.dependency.Dependency | sofa |
begin | end | DependencyType | flavor |
AttachTo=de.tudarmstadt.ukp.dkpro.core.api.lexmorph.type.pos.POS
#id=1
#text=Ms. Haag plays Elianti .
1-1    Ms.    Sofa
   sofaNum: 1
   sofaID: "_InitialView"
   mimeType: "text"
   sofaArray: 
   sofaString: "Ms. Haag plays Elianti .
Rolls-Royce Motor Cars Inc. said it expects its U.S. sa..."
   sofaURI:     0    3    B-PER   
B-de.tudarmstadt.ukp.dkpro.core.api.ner.type.NamedEntity_    NNP   
de.tudarmstadt.ukp.dkpro.core.api.lexmorph.type.pos.POS_    Sofa
   sofaNum: 1
   sofaID: "_InitialView"
   mimeType: "text"
   sofaArray: 
   sofaString: "Ms. Haag plays Elianti .
Rolls-Royce Motor Cars Inc. said it expects its U.S. sa..."
   sofaURI:     0    14    SUBJ   
de.tudarmstadt.ukp.dkpro.core.api.syntax.type.dependency.Dependency_    1-3   
1-2    Haag    Sofa
   sofaNum: 1
   sofaID: "_InitialView"
   mimeType: "text"
   sofaArray: 
   sofaString: "Ms. Haag plays Elianti .
Rolls-Royce Motor Cars Inc. said it expects its U.S. sa..."
   sofaURI:     4    8    I-PER   

Re: UIMAv3 & WebAnno

2018-01-17 Thread Richard Eckart de Castilho

> On 17.01.2018, at 22:12, Marshall Schor  wrote:
> 
> I put in an exclude for the slf4j-log4j12, and went to the next issue:
> 
> Tests in webanno-io-tsv fail.  The first one is failing here:
> WebAnnoTsv2ReaderWriterTest, line  65 (runPipeLine(reader, writer).
> 
> It fails because it's updating an "end" value for an annotation that's already
> in the index, causing the message which follows.
> UIMA normally recovers from these things, but a global flag was configured:
> "uima.exception_when_fs_update_corrupts_index".
> 
> System.getProperty("uima.exception_when_fs_update_corrupts_index")
>  (java.lang.String) true
> 
> I can't see where this is being set, though.  Any ideas?  Is the updating of 
> the
> annotation:end while the item is indexed, the way it is designed to work?

This is set in DkproTestContext which is included in many tests as a JUnit 
@Rule.

Again, I likely didn't notice this because I only did an Eclipse build on this
branch, not a Maven build. The presently maintained versions of WebAnno still
depend on DKPro Core 1.7.0 which did not set this property - so we never hit
this so far.

Again, I'll have a look at it.

Sorry, I sure hoped you'd have a smoother experience doing this build. But
you know... you're a bit beyond the bleeding edge on the WebAnno and DKPro Core
branches that use UIMAv3 ;)

Cheers,

-- Richard

Re: UIMAv3 & WebAnno

2018-01-17 Thread Richard Eckart de Castilho
On 17.01.2018, at 21:24, Marshall Schor  wrote:
> 
> first build issue: Building webanno-io-tsv, caused by slf4j-log4j12 (the very
> old log4j).

Hm. I'll have a look at it. No idea how I missed that on my machine. Probably
I didn't do a Maven build, only an Eclipse build.

-- Richard

Re: UIMAv3 & WebAnno

2018-01-17 Thread Richard Eckart de Castilho

> On 17.01.2018, at 20:36, Marshall Schor  wrote:
> 
> I got to the end of your instructions OK, I think.
> 
> It showed me a page, after "import" action, with "details" tab, having
> name/project type/description/script direction/ 
> The "Console" (I'm using eclipse "run" to run this) has a whole bunch of
> messages about imports, tag creation, etc.  No error messages I think, so far.

Good.

> pushing the "save" button on the "details" tab gives error = hibernate 
> exception
> - constraint violation exception, could not execute statement,
> Caused by: caused by ... (top one) org.hsqldb.HsqlException: integrity
> constraint violation: unique constraint or index violation;
> UK3K75VVU7MEVYVVB5MAY5LJ8K7 table: PROJECT
> at org.hsqldb.error.Error.error(Unknown Source) ~[hsqldb-2.4.0.jar:2.4.0]

Hm, I need to look into that. Actually, there is no need to press save
immediately after import - but it should also not cause an exception. Probably
there is a bug which causes WebAnno to try to save the project *again* under
the same name (which is not allowed).

> Switched to the Annotation page, saw 3 documents - 2 in blue, the middle one 
> in
> red.
> tried opening Mockingjay2.tsv.  Got a popup : Do you want to leave this site? 
> I
> chose Leave, and it then said:
> Internal Error, with a link "return to home page".
> Console shows more errors, the top caused by seems to be in hibernate , eg
> Caused by: org.hibernate.TransientObjectException: object references an 
> unsaved
> transient instance - save the transient instance before flushing:
> de.tudarmstadt.ukp.clarin.webanno.model.Project
> 
> Maybe I need to configure hibernate? 

No, nothing should need to be configured by you. I'll try reproducing and 
fixing this.

Thanks for these reports!

-- Richard

Re: UIMAv3 & WebAnno

2018-01-17 Thread Marshall Schor
I put in an exclude for the slf4j-log4j12, and went to the next issue:

Tests in webanno-io-tsv fail.  The first one is failing here:
WebAnnoTsv2ReaderWriterTest, line 65 (runPipeline(reader, writer)).

It fails because it's updating an "end" value for an annotation that's already
in the index, causing the message which follows.
UIMA normally recovers from these things, but a global flag was configured:
"uima.exception_when_fs_update_corrupts_index".

System.getProperty("uima.exception_when_fs_update_corrupts_index")
     (java.lang.String) true

I can't see where this is being set, though.  Any ideas?  Is the updating of the
annotation:end while the item is indexed, the way it is designed to work?
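
To make the underlying hazard concrete, here is a small sketch using only the
JDK. It is an analogy, not UIMA code: Span is a hypothetical stand-in for an
annotation, and a TreeSet plays the role of a sorted index keyed on begin and
end. Changing a key field while the object sits in a sorted collection can
leave it in the wrong position, so the general-purpose pattern is to remove
the object, modify it, and add it back. Whether and how UIMA does this
internally is not shown here.

```java
import java.util.Comparator;
import java.util.TreeSet;

final class IndexedSpanDemo {
    /** Hypothetical stand-in for an annotation with mutable offsets. */
    static final class Span {
        int begin;
        int end;

        Span(int begin, int end) {
            this.begin = begin;
            this.end = end;
        }
    }

    /** Creates a sorted "index" keyed on begin, then end. */
    static TreeSet<Span> newIndex() {
        return new TreeSet<>(
            Comparator.comparingInt((Span s) -> s.begin).thenComparingInt(s -> s.end));
    }

    /** Safe update: take the span out of the index, change the key, put it back. */
    static void setEndSafely(TreeSet<Span> index, Span span, int newEnd) {
        boolean wasIndexed = index.remove(span); // must happen before the key changes
        span.end = newEnd;
        if (wasIndexed) {
            index.add(span); // re-inserted at the position matching the new key
        }
    }

    public static void main(String[] args) {
        TreeSet<Span> index = newIndex();
        Span a = new Span(0, 3);
        Span b = new Span(0, 5);
        index.add(a);
        index.add(b);
        setEndSafely(index, a, 9);
        System.out.println("still indexed: " + index.contains(a));
        System.out.println("now sorted last: " + (index.last() == a));
    }
}
```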

-Marshall
=== test =
2018-01-17 15:52:35 INFO WebannoTsv2Reader - Scanning
[file:/C:/au/gitClones/webanno/webanno-io-tsv/src/test/resources/tsv2/]
2018-01-17 15:52:35 INFO WebannoTsv2Reader - Found [1] resources to be read
2018-01-17 15:54:31 INFO WebannoTsv2Reader - 0 of 1:
file:/C:/au/gitClones/webanno/webanno-io-tsv/src/test/resources/tsv2/example2.tsv
2018-01-17 15:54:31 WARN uima - While FS was in the index, the feature
"uima.tcas.Annotation:end", which is used as a key in one or more indexes, was
modified
 FS = "NamedEntity
   sofa: _InitialView
   begin: 0
   end: 3
   value: "PER"
   identifier: "
java.lang.Throwable
    at 
org.apache.uima.cas.impl.CASImpl.featModWhileInIndexReport(CASImpl.java:2985)
    at 
org.apache.uima.cas.impl.CASImpl.featModWhileInIndexReport(CASImpl.java:2977)
    at
org.apache.uima.cas.impl.CASImpl.checkForInvalidFeatureSetting(CASImpl.java:2865)
    at 
org.apache.uima.cas.impl.CASImpl.setWithCheckAndJournal(CASImpl.java:1828)
    at
org.apache.uima.cas.impl.FeatureStructureImplC._setIntValueNfcCJ(FeatureStructureImplC.java:684)
    at
org.apache.uima.cas.impl.FeatureStructureImplC._setIntValueNfc(FeatureStructureImplC.java:460)
    at org.apache.uima.jcas.tcas.Annotation.setEnd(Annotation.java:123)
    at
de.tudarmstadt.ukp.clarin.webanno.tsv.WebannoTsv2Reader.createSpanAnnotation(WebannoTsv2Reader.java:506)
    at
de.tudarmstadt.ukp.clarin.webanno.tsv.WebannoTsv2Reader.setAnnotations(WebannoTsv2Reader.java:176)
    at
de.tudarmstadt.ukp.clarin.webanno.tsv.WebannoTsv2Reader.convertToCas(WebannoTsv2Reader.java:78)
    at
de.tudarmstadt.ukp.clarin.webanno.tsv.WebannoTsv2Reader.getNext(WebannoTsv2Reader.java:547)
    at
de.tudarmstadt.ukp.dkpro.core.api.io.JCasResourceCollectionReader_ImplBase.getNext(JCasResourceCollectionReader_ImplBase.java:36)
    at
org.apache.uima.fit.pipeline.SimplePipeline.runPipeline(SimplePipeline.java:100)
    at
de.tudarmstadt.ukp.clarin.webanno.tsv.WebAnnoTsv2ReaderWriterTest.test(WebAnnoTsv2ReaderWriterTest.java:65)


Re: UIMAv3 & WebAnno

2018-01-17 Thread Marshall Schor
first build issue: Building webanno-io-tsv, caused by slf4j-log4j12 (the very
old log4j).

I see the webanno pom has two "excludes" for slf4j-log4j12.
But running dependency:tree on webanno-io-tsv shows
...dkpro.core:...testing-asl has this dependency.  May I exclude it?

More details below:

test failure,
Caused by: java.lang.IllegalStateException: Detected both log4j-over-slf4j.jar
AND bound slf4j-log4j12.jar on the class path, preempting StackOverflowError.
See also http://www.slf4j.org/codes.html#log4jDelegationLoop for more details.
    at
de.tudarmstadt.ukp.clarin.webanno.tsv.internal.tsv3x.Tsv3XSchemaAnalyzerTest.testAnalyze(Tsv3XSchemaAnalyzerTest.java:33)

The manual for slf4j says:


log4j-over-slf4j.jar and slf4j-log4j12.jar cannot be present simultaneously

The presence of /slf4j-log4j12.jar/, that is the log4j binding for SLF4J, will
force all SLF4J calls to be delegated to log4j. The presence
of /log4j-over-slf4j.jar/ will in turn delegate all log4j API calls to their
SLF4J equivalents. If both are present simultaneously, slf4j calls will be
delegated to log4j, and log4j calls redirected to SLF4j, resulting in an endless
loop.
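
A quick way to check for this combination from code is to probe the class path
for one class from each jar. The sketch below is hedged accordingly:
LoggingBridgeCheck is a hypothetical helper, and the two marker class names
are my assumption about what each jar contains, so please verify them against
the jars actually in use.

```java
final class LoggingBridgeCheck {
    // Assumed to live in slf4j-log4j12.jar (the log4j binding for SLF4J).
    static final String BINDING_MARKER = "org.slf4j.impl.Log4jLoggerFactory";
    // Assumed to live in log4j-over-slf4j.jar (the log4j-to-SLF4J bridge).
    static final String BRIDGE_MARKER = "org.apache.log4j.Log4jLoggerFactory";

    /** True if the named class can be located, without initializing it. */
    static boolean isPresent(String className) {
        try {
            Class.forName(className, false, LoggingBridgeCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    /** True if both marker classes are visible, i.e. the loop condition applies. */
    static boolean hasDelegationLoop() {
        return isPresent(BINDING_MARKER) && isPresent(BRIDGE_MARKER);
    }

    public static void main(String[] args) {
        System.out.println("binding present: " + isPresent(BINDING_MARKER));
        System.out.println("bridge present:  " + isPresent(BRIDGE_MARKER));
        System.out.println("loop risk:       " + hasDelegationLoop());
    }
}
```

Running dependency:tree, as above, remains the way to find out which
dependency drags the unwanted jar in; this check only confirms the symptom.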



Re: UIMAv3 & WebAnno

2018-01-17 Thread Marshall Schor
perhaps best to ignore this message...

I never "built" it from the command line, so although Eclipse showed no errors,
it probably wasn't built.

Now building it. 

-Marshall



On 1/17/2018 2:36 PM, Marshall Schor wrote:
> I got to the end of your instructions OK, I think.
>
> It showed me a page, after "import" action, with "details" tab, having
> name/project type/description/script direction/ 
> The "Console" (I'm using eclipse "run" to run this) has a whole bunch of
> messages about imports, tag creation, etc.  No error messages I think, so far.
>
> pushing the "save" button on the "details" tab gives error = hibernate 
> exception
> - constraint violation exception, could not execute statement,
> Caused by: caused by ... (top one) org.hsqldb.HsqlException: integrity
> constraint violation: unique constraint or index violation;
> UK3K75VVU7MEVYVVB5MAY5LJ8K7 table: PROJECT
>     at org.hsqldb.error.Error.error(Unknown Source) ~[hsqldb-2.4.0.jar:2.4.0]
>
> Switched to the Annotation page, saw 3 documents - 2 in blue, the middle one 
> in
> red.
> tried opening Mockingjay2.tsv.  Got a popup : Do you want to leave this site? 
> I
> chose Leave, and it then said:
> Internal Error, with a link "return to home page".
> Console shows more errors, the top caused by seems to be in hibernate , eg
> Caused by: org.hibernate.TransientObjectException: object references an 
> unsaved
> transient instance - save the transient instance before flushing:
> de.tudarmstadt.ukp.clarin.webanno.model.Project
>
> Maybe I need to configure hibernate? 
> -Marshall
>
>
> On 1/16/2018 5:35 PM, Richard Eckart de Castilho wrote:
>> On 16.01.2018, at 20:08, Marshall Schor  wrote:
>>> Project build error: Non-resolvable import POM: Could not find artifact
>>> de.tudarmstadt.ukp.dkpro.core:de.tudarmstadt.ukp.dkpro.core-asl:pom:2.0.0-SNAPSHOT
>>>
>>> Is it reasonable for me to try to build and test this myself, or does doing 
>>> that
>>> entail needing a lot of infrastructure setup, etc. that would be 
>>> unreasonable to
>>> try to do?
>> Thanks for trying to have a look at this!
>>
>> It should be pretty straight-forward to run WebAnno without much 
>> infrastructure
>> (unless there are bugs). I'd say it is worth a try.
>>
>> Before you try, please pull the latest changes from the WebAnno 
>> "feature/4.0.x/issue800-uima-v3" branch:
>> - I have just (hopefully) fixed a problem that prevented starting up with 
>> the embedded database.
>> - I have also added a repository declaration for the DKPro Core 
>> 2.0.0-SNAPSHOT artifacts
>>
>> The easiest way to run it is to locate the class
>> "de.tudarmstadt.ukp.clarin.webanno.webapp.WebAnno"
>> in the "webanno-webapp" module and run that as a Java application.
>> WebAnno will start up using
>> an embedded database and an embedded web server. Once it has started, you
>> should be able to access it
>> at http://localhost:8080 and should be able to log in as user "admin" with 
>> password "admin".
>>
>> You can download a demo project here: 
>> https://webanno.github.io/webanno/examples/demo-en.zip
>> When you download it, make sure that your browser does not automatically 
>> extract the archive.
>>
>> This ZIP includes serialized CASes amongst other data that makes up a 
>> WebAnno project.
>>
>> To import it, go to the "Project" page in WebAnno and under "Import 
>> project", activate the 
>> "create missing users" checkbox and choose the demo-en.zip file from your
>> local disk drive. 
>>
>> After the import, go back to the main page via the "Home" link and then 
>> choose "Annotation".
>> On the annotation page, choose a document from the project you have just
>> imported.
>>
>> Most likely at this point, you should see if there are any CAS loading 
>> problems.
>>
>> I hope that you reach this point without trouble and am curious what might lie
>> beyond it.
>>
>> If you don't have the leisure to try it out: I have it on my todo list... 
>> but it might
>> take until end of the month until I actually get to it. Sorry for being a 
>> bit slow.
>>
>> Cheers,
>>
>> -- Richard
>



Re: UIMAv3 & WebAnno

2018-01-17 Thread Marshall Schor
I got to the end of your instructions OK, I think.

It showed me a page, after "import" action, with "details" tab, having
name/project type/description/script direction/ 
The "Console" (I'm using eclipse "run" to run this) has a whole bunch of
messages about imports, tag creation, etc.  No error messages I think, so far.

pushing the "save" button on the "details" tab gives error = hibernate exception
- constraint violation exception, could not execute statement,
Caused by: caused by ... (top one) org.hsqldb.HsqlException: integrity
constraint violation: unique constraint or index violation;
UK3K75VVU7MEVYVVB5MAY5LJ8K7 table: PROJECT
    at org.hsqldb.error.Error.error(Unknown Source) ~[hsqldb-2.4.0.jar:2.4.0]

Switched to the Annotation page, saw 3 documents - 2 in blue, the middle one in
red.
tried opening Mockingjay2.tsv.  Got a popup : Do you want to leave this site? I
chose Leave, and it then said:
Internal Error, with a link "return to home page".
Console shows more errors, the top caused by seems to be in hibernate , eg
Caused by: org.hibernate.TransientObjectException: object references an unsaved
transient instance - save the transient instance before flushing:
de.tudarmstadt.ukp.clarin.webanno.model.Project

Maybe I need to configure hibernate? 
-Marshall


On 1/16/2018 5:35 PM, Richard Eckart de Castilho wrote:
> On 16.01.2018, at 20:08, Marshall Schor  wrote:
>> Project build error: Non-resolvable import POM: Could not find artifact
>> de.tudarmstadt.ukp.dkpro.core:de.tudarmstadt.ukp.dkpro.core-asl:pom:2.0.0-SNAPSHOT
>>
>> Is it reasonable for me to try to build and test this myself, or does doing 
>> that
>> entail needing a lot of infrastructure setup, etc. that would be 
>> unreasonable to
>> try to do?
> Thanks for trying to have a look at this!
>
> It should be pretty straight-forward to run WebAnno without much 
> infrastructure
> (unless there are bugs). I'd say it is worth a try.
>
> Before you try, please pull the latest changes from the WebAnno 
> "feature/4.0.x/issue800-uima-v3" branch:
> - I have just (hopefully) fixed a problem that prevented starting up with the 
> embedded database.
> - I have also added a repository declaration for the DKPro Core 
> 2.0.0-SNAPSHOT artifacts
>
> The easiest way to run it is to locate the class
> "de.tudarmstadt.ukp.clarin.webanno.webapp.WebAnno"
> in the "webanno-webapp" module and run that as a Java application.
> WebAnno will start up using
> an embedded database and an embedded web server. Once it has started, you
> should be able to access it
> at http://localhost:8080 and should be able to log in as user "admin" with 
> password "admin".
>
> You can download a demo project here: 
> https://webanno.github.io/webanno/examples/demo-en.zip
> When you download it, make sure that your browser does not automatically 
> extract the archive.
>
> This ZIP includes serialized CASes amongst other data that makes up a WebAnno 
> project.
>
> To import it, go to the "Project" page in WebAnno and under "Import project", 
> activate the 
> "create missing users" checkbox and choose the demo-en.zip file from your
> local disk drive. 
>
> After the import, go back to the main page via the "Home" link and then 
> choose "Annotation".
> On the annotation page, choose a document from the project you have just
> imported.
>
> Most likely at this point, you should see if there are any CAS loading 
> problems.
>
> I hope that you reach this point without trouble and am curious what might lie
> beyond it.
>
> If you don't have the leisure to try it out: I have it on my todo list... but 
> it might
> take until end of the month until I actually get to it. Sorry for being a bit 
> slow.
>
> Cheers,
>
> -- Richard



Re: UIMAv3 & WebAnno

2018-01-16 Thread Richard Eckart de Castilho
On 16.01.2018, at 20:08, Marshall Schor  wrote:
> 
> Project build error: Non-resolvable import POM: Could not find artifact
> de.tudarmstadt.ukp.dkpro.core:de.tudarmstadt.ukp.dkpro.core-asl:pom:2.0.0-SNAPSHOT
> 
> Is it reasonable for me to try to build and test this myself, or does doing 
> that
> entail needing a lot of infrastructure setup, etc. that would be unreasonable 
> to
> try to do?

Thanks for trying to have a look at this!

It should be pretty straight-forward to run WebAnno without much infrastructure
(unless there are bugs). I'd say it is worth a try.

Before you try, please pull the latest changes from the WebAnno 
"feature/4.0.x/issue800-uima-v3" branch:
- I have just (hopefully) fixed a problem that prevented starting up with the 
embedded database.
- I have also added a repository declaration for the DKPro Core 2.0.0-SNAPSHOT 
artifacts

The easiest way to run it is to locate the class
"de.tudarmstadt.ukp.clarin.webanno.webapp.WebAnno"
in the "webanno-webapp" module and run that as a Java application. WebAnno
will start up using an embedded database and an embedded web server. Once it
has started, you should be able to access it at http://localhost:8080 and
log in as user "admin" with password "admin".

You can download a demo project here: 
https://webanno.github.io/webanno/examples/demo-en.zip
When you download it, make sure that your browser does not automatically 
extract the archive.

This ZIP includes serialized CASes amongst other data that makes up a WebAnno 
project.

To import it, go to the "Project" page in WebAnno and under "Import project",
activate the "create missing users" checkbox and choose the demo-en.zip file
from your local disk drive.

After the import, go back to the main page via the "Home" link and then choose
"Annotation". On the annotation page, choose a document from the project you
have just imported.

Most likely at this point, you should see if there are any CAS loading problems.

I hope that you reach this point without trouble, and I am curious what might
lie beyond it.

If you don't have the leisure to try it out: I have it on my todo list... but
it might take until the end of the month until I actually get to it. Sorry for
being a bit slow.

Cheers,

-- Richard

Re: UIMAv3 & WebAnno

2018-01-16 Thread Marshall Schor
I thought I'd try downloading the branch of webanno labeled uv3 and seeing if I
could build it.
It seems to need some SNAPSHOT repos which I don't have set up for access, e.g.:

Project build error: Non-resolvable import POM: Could not find artifact
de.tudarmstadt.ukp.dkpro.core:de.tudarmstadt.ukp.dkpro.core-asl:pom:2.0.0-SNAPSHOT

Is it reasonable for me to try to build and test this myself, or does doing that
entail needing a lot of infrastructure setup, etc. that would be unreasonable to
try to do?

Cheers. -Marshall

On 1/3/2018 6:16 PM, Richard Eckart de Castilho wrote:
> Hi again,
>
> I have once again switched my local environment to a UIMA v3 mode:
>
> - UIMA SDK v3 (3.0.1-beta-SNAPSHOT v3 branch)
> - uimaFIT (3.0.0-SNAPSHOT v3 branch)
> - DKPro Core (2.0.x branch)
> - WebAnno (feature/issue1115-uimav3 branch)
>
> Last time, I ran into trouble because the IDs loaded from serialized CAS 
> files were no longer accessible.
> I programmatically set "uima.default_v2_id_references" to "true" during 
> startup now to avoid that.
>
>
> But what seems to be happening even before getting there is that I again run
> into JCas <-> Type System problems.
> When a user opens a document for annotation in WebAnno, WebAnno loads the 
> serialized CAS (CasCompleteSerializer),
> serializes the CAS into a byte array (compressed form 6), creates a new CAS 
> with the current type system definition,
> and deserializes the data again into that CAS. The idea is that the lenient 
> loading of the compressed form 6 allows
>
>   a) new types / features to be added in that way
>   b) unreachable FSes to be garbage collected
>
> So, it is not an uncommon case here that the data stored with the
> CasCompleteSerializer used a different type system than the CAS into which it
> is loaded - and in fact it can be the case that the data stored with the
> CasCompleteSerializer had used different JCas wrappers at the time than what
> is available at the time of loading the data again. Afaik there should be no
> truly incompatible changes in the type system though - i.e. only new features /
> types were added; no features were removed. Still, I get a lot of this type of
> error:
>
>> org.apache.uima.cas.CASRuntimeException: The JCas cannot be initialized.  
>> The following errors occurred: 
>> In JCAS class 
>> "de.tudarmstadt.ukp.dkpro.core.api.lexmorph.type.morph.MorphologicalFeatures",
>>  UIMA field 
>> "de.tudarmstadt.ukp.dkpro.core.api.lexmorph.type.morph.MorphologicalFeatures:verbForm"
>>  was set up when this class was previously loaded and initialized, to have 
>> an adjusted offset of "-1" but now the feature has a different adjusted 
>> offset of "5"; this may be due to something else other than type system 
>> commit actions loading and initializing the JCas class, or to having a 
>> different non-compatible type system for this class, trying to use a common 
>> JCas cover class, which is not supported. 
>>  
>>  at org.apache.uima.cas.impl.FSClassRegistry.reportErrors(FSClassRegistry.java:870) ~[classes/:?]
>>  at org.apache.uima.cas.impl.FSClassRegistry.loadJCasForTSandClassLoader(FSClassRegistry.java:342) ~[classes/:?]
>>  at org.apache.uima.cas.impl.FSClassRegistry.getGeneratorsForClassLoader(FSClassRegistry.java:904) ~[classes/:?]
>>  at org.apache.uima.cas.impl.TypeSystemImpl.getGeneratorsForClassLoader(TypeSystemImpl.java:2651) ~[classes/:?]
>>  at org.apache.uima.cas.impl.TypeSystemImpl.commit(TypeSystemImpl.java:1393) ~[classes/:?]
>>  at org.apache.uima.cas.impl.CASImpl.commitTypeSystem(CASImpl.java:1607) ~[classes/:?]
>>  at org.apache.uima.util.CasCreationUtils.doCreateCas(CasCreationUtils.java:614) ~[classes/:?]
>>  at org.apache.uima.util.CasCreationUtils.createCas(CasCreationUtils.java:362) ~[classes/:?]
>>  at org.apache.uima.util.CasCreationUtils.createCas(CasCreationUtils.java:313) ~[classes/:?]
>>  at org.apache.uima.fit.factory.JCasFactory.createJCas(JCasFactory.java:147) ~[classes/:?]
>>  at de.tudarmstadt.ukp.clarin.webanno.api.dao.AnnotationSchemaServiceImpl.upgradeCas(AnnotationSchemaServiceImpl.java:640) ~[classes/:?]
> I have the feeling that this is what happens:
>
> 1) a CasCompleteSerialized-CAS is loaded - it was created at a time when the 
> MorphologicalFeatures did not yet have a feature called "verbForm".
> 2) I create a new JCas, now using a type system description where 
> MorphologicalFeatures includes the "verbForm" feature
>
> At step 2, the above error seems to be triggered. I actually do not even get 
> to the point where I would temporarily serialize into form 6 and back. The 
> code already crashes when trying to set up the target task with the updated 
> type system.
>
> Any ideas?
>
> Cheers,
>
> -- Richard



Re: UIMAv3 & WebAnno

2018-01-16 Thread Marshall Schor
I found a set of edge cases this change exposed, involved in properly setting up
the JCas classes and type systems.

These are now fixed under Jira https://issues.apache.org/jira/browse/UIMA-5704

UIMA v3 builds OK (for me) in both eclipse and command line.
(Command line differs in the sequencing of the test cases).

This version now supports merging extra features from JCas classes into type
systems (and also has implemented the new semi-built-in Int2FS map type).

So, it might work nicely with WebAnno; if not, I'm hoping to see the next set of
bugs soon!

Cheers. -Marshall


On 1/12/2018 1:47 PM, Marshall Schor wrote:
> oops, some build errors.  (Strange - it built fine in Eclipse, but fails in
> Jenkins).  Working on these
>
>
> On 1/11/2018 2:02 PM, Marshall Schor wrote:
>> A new version of v3 is now committed, with changes to support some use cases
>> where different type systems are loaded into an existing UIMA instance (using
>> the same class loader) where some user-defined JCas classes are already loaded.
>>
>> The new version supports the case where the JCas defines some additional
>> features for a type and the type system being loaded has a subset of those, by
>> adding any features only defined in the JCas class to the type system, just as
>> it is being committed.
>>
>> Not all possible use cases are handled, but the hope is that common ones are.
>> Those not handled will produce error messages when the type system is
>> committed, if the new type system can't be made to conform to the already loaded
>> JCas classes.
>>
>> It would be great if this could be tested with WebAnno, to see if it covers the
>> use cases in that platform.
>>
>> -Marshall
>>
>>
>



Re: UIMAv3 & WebAnno

2018-01-12 Thread Marshall Schor
oops, some build errors.  (Strange - it built fine in Eclipse, but fails in
Jenkins).  Working on these


On 1/11/2018 2:02 PM, Marshall Schor wrote:
> A new version of v3 is now committed, with changes to support some use cases
> where different type systems are loaded into an existing UIMA instance (using
> the same class loader) where some user-defined JCas classes are already loaded.
>
> The new version supports the case where the JCas defines some additional
> features for a type and the type system being loaded has a subset of those, by
> adding any features only defined in the JCas class to the type system, just as
> it is being committed.
>
> Not all possible use cases are handled, but the hope is that common ones are.
> Those not handled will produce error messages when the type system is
> committed, if the new type system can't be made to conform to the already loaded
> JCas classes.
>
> It would be great if this could be tested with WebAnno, to see if it covers the
> use cases in that platform.
>
> -Marshall
>
>



Re: UIMAv3 & WebAnno

2018-01-11 Thread Marshall Schor
A new version of v3 is now committed, with changes to support some use cases
where different type systems are loaded into an existing UIMA instance (using
the same class loader) where some user-defined JCas classes are already loaded.

The new version supports the case where the JCas defines some additional
features for a type and the type system being loaded has a subset of those, by
adding any features only defined in the JCas class to the type system, just as
it is being committed.

Not all possible use cases are handled, but the hope is that common ones are. 
Those not handled will produce error messages when the type system is 
committed, if the new type system can't be made to conform to the already loaded
JCas classes.
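
To make the commit-time augmentation concrete, here is a minimal sketch of the
idea in plain Java. It is not the UIMA implementation: the class and method
names are hypothetical, features are modelled as a simple name -> range map, and
the range mismatch check is only one assumed example of a type system that
"can't be made to conform".

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical illustration only; none of these names are UIMA APIs. */
class AugmentSketch {

    /**
     * Returns the features of a type (name -> range type name) after adding any
     * feature that only the JCas class declares. Throws if both sides declare
     * the same feature with different ranges.
     */
    static Map<String, String> augment(Map<String, String> typeSystemFeatures,
                                       Map<String, String> jcasFeatures) {
        Map<String, String> merged = new LinkedHashMap<>(typeSystemFeatures);
        for (Map.Entry<String, String> e : jcasFeatures.entrySet()) {
            // putIfAbsent returns the existing range, or null if the feature was new
            String existing = merged.putIfAbsent(e.getKey(), e.getValue());
            if (existing != null && !existing.equals(e.getValue())) {
                throw new IllegalStateException("Feature " + e.getKey() + " has range "
                        + existing + " in the type system but " + e.getValue()
                        + " in the JCas class");
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, String> typeSystem = new LinkedHashMap<>();
        typeSystem.put("f1", "uima.cas.String");

        Map<String, String> jcas = new LinkedHashMap<>();
        jcas.put("f1", "uima.cas.String");
        jcas.put("f2", "uima.cas.Integer");

        // The type system being committed only knows f1; f2 comes from the JCas class.
        System.out.println(augment(typeSystem, jcas).keySet());
    }
}
```

In this sketch the JCas-only features are simply appended, which says nothing
yet about which slot they end up in.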

It would be great if this could be tested with WebAnno, to see if it covers the
use cases in that platform.

-Marshall



Re: Design choices for changing type systems with loaded JCas classes [was Re: UIMAv3 & WebAnno}

2018-01-11 Thread Marshall Schor
Hi,

Now I understand what you meant by JCas first :-).

I see how it could solve the problem for a single type, in isolation (without
consideration for super/subtypes).

However, if the type had subtypes, then things break down.  This is because of
the following constraints.
Assume type T and subtype of T: TS1 and another subtype of T: TS2

The nature of inheritance requires that TS1 and TS2 both contain all the
features of T.
Because instances of TS2 and TS1 could be cast to T, the JCas for T could be
used to retrieve the (common-to-TS1) features.
This constrains the "offsets" of those to be the same in TS1 and TS2.
This in turn, implies that the offsets for T come before the feature slots for
TS1 and TS2 (there could be different numbers of those features in TS1 and TS2).

So building such a type system would work nicely, so far.

Now consider a new type system with type T' (named the same as T, but having an
extra feature, not in the JCas).
TS1 and TS2 now have an extra feature being inherited, which must occupy the
same slot.
But the offsets for TS1 and TS2's own features were assigned, already, following
the features for the old definition for T.

So now things are no longer working. 
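
A small sketch of the argument above, assuming a purely sequential slot
assignment that starts at 0 (the real offsets also account for built-in
features). The class, the feature names a, b, c, x, y, z and the method are
made up for illustration and are not UIMA code.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of sequential slot assignment; not UIMA code. */
class SubtypeOffsetSketch {

    /** Inherited features get the first slots, the type's own features follow. */
    static Map<String, Integer> assign(List<String> inherited, List<String> own) {
        Map<String, Integer> offsets = new LinkedHashMap<>();
        for (String f : inherited) {
            offsets.put(f, offsets.size());
        }
        for (String f : own) {
            offsets.put(f, offsets.size());
        }
        return offsets;
    }

    public static void main(String[] args) {
        List<String> t = List.of("a", "b");           // features of T
        List<String> tPrime = List.of("a", "b", "c"); // T' adds one feature

        System.out.println("TS1 under T : " + assign(t, List.of("x")));
        System.out.println("TS2 under T : " + assign(t, List.of("y", "z")));
        // With T', the inherited feature c wants the slot that x already had.
        System.out.println("TS1 under T': " + assign(tPrime, List.of("x")));
    }
}
```

Under T, the own feature x of TS1 sits in slot 2; under T' the new inherited
feature c takes slot 2 and x moves to slot 3, which is exactly the mismatch an
already loaded JCas class for TS1 cannot tolerate.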

-Marshall




On 1/10/2018 3:15 PM, Richard Eckart de Castilho wrote:
>> Some use cases with comments:
>>
>> 1) Type T loaded with features f1, f2, f3,  JCas loaded with f1, f2, f3
>> Followed by: Type T loaded with features f1, f3.
>>
>> This causes at the 2nd Type T commit time, the augmentation of type T with
>> feature f2.
>> But, the (current) impl just does an "addFeature" API call.  The result is 
>> that
>> without extra work, the features in the type system will be ordered as f1, 
>> f3,
>> f2.  And the assigned offsets could be different. 
>>
>> To fix this, the algorithm which assigns offsets will need to see if the
>> corresponding JCas class (if any) has offsets already assigned, and try to 
>> use
>> those.
> This is why I suggested to use "JCas first": the order of the features should 
> be
> defined by the JCas (i.e. they come first) while features defined in other 
> TSDs
> get appended after that.
>
>> 2) Type T having supertype TS; Type T has 1 feature, f1, JCas for Type T has 
>> 1
>> feature f1.  TS has no features, no JCas for TS or JCas for TS has no 
>> features. 
>> Followed by: Type TS is loaded, having one feature (not in the JCas if there 
>> is
>> one for TS).
>>
>> This causes the features for type T (which includes all the features of its
>> supertype), to have offsets shifted down.
>> For example if T has feature f1 with offset "3",  it would now have offset 
>> "4"
>> (accounting for the space taken by the TS feature).
> I believe this could also be resolved by using "JCas first": first all the 
> slots
> for features defined in any of the JCas classes in the inheritance hierarchy
> are assigned and afterwards the features define in other TSDs are appended.
>
> I believe that by using "JCas first", the slots for the JCas class features
> are always fixed, independent of what other TSDs they are combined with.
>
> Does "JCas first" now sound more sensible?
>
> ... or maybe I am misunderstanding something basic (which is entirely 
> possible).
>
> Cheers,
>
> -- Richard
>
>



Re: Design choices for changing type systems with loaded JCas classes [was Re: UIMAv3 & WebAnno}

2018-01-10 Thread Richard Eckart de Castilho
> Some use cases with comments:
> 
> 1) Type T loaded with features f1, f2, f3,  JCas loaded with f1, f2, f3
> Followed by: Type T loaded with features f1, f3.
> 
> This causes at the 2nd Type T commit time, the augmentation of type T with
> feature f2.
> But, the (current) impl just does an "addFeature" API call.  The result is 
> that
> without extra work, the features in the type system will be ordered as f1, f3,
> f2.  And the assigned offsets could be different. 
> 
> To fix this, the algorithm which assigns offsets will need to see if the
> corresponding JCas class (if any) has offsets already assigned, and try to use
> those.

This is why I suggested to use "JCas first": the order of the features should be
defined by the JCas (i.e. they come first) while features defined in other TSDs
get appended after that.

> 2) Type T having supertype TS; Type T has 1 feature, f1, JCas for Type T has 1
> feature f1.  TS has no features, no JCas for TS or JCas for TS has no 
> features. 
> Followed by: Type TS is loaded, having one feature (not in the JCas if there 
> is
> one for TS).
> 
> This causes the features for type T (which includes all the features of its
> supertype), to have offsets shifted down.
> For example if T has feature f1 with offset "3",  it would now have offset "4"
> (accounting for the space taken by the TS feature).

I believe this could also be resolved by using "JCas first": first all the slots
for features defined in any of the JCas classes in the inheritance hierarchy
are assigned and afterwards the features defined in other TSDs are appended.

I believe that by using "JCas first", the slots for the JCas class features
are always fixed, independent of what other TSDs they are combined with.
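
To illustrate what I mean by "JCas first" for a single type, here is a minimal
sketch. It is only a model of the ordering rule, with hypothetical names, and
it deliberately ignores supertypes and subtypes.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical model of the "JCas first" ordering for one type; not UIMA code. */
class JCasFirstSketch {

    /** JCas features keep their order and come first; TSD-only features are appended. */
    static List<String> order(List<String> jcasFeatures, List<String> tsdFeatures) {
        Set<String> ordered = new LinkedHashSet<>(jcasFeatures);
        ordered.addAll(tsdFeatures); // features already present keep their position
        return new ArrayList<>(ordered);
    }

    public static void main(String[] args) {
        List<String> jcas = List.of("f1", "f2", "f3");
        // Use case 1: the second TSD lacks f2, the JCas order is still preserved.
        System.out.println(order(jcas, List.of("f1", "f3")));
        // A TSD with an extra feature f4 gets it appended after the JCas features.
        System.out.println(order(jcas, List.of("f1", "f3", "f4")));
    }
}
```

The position of each JCas feature in the resulting list depends only on the
JCas class, not on which TSD is loaded.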

Does "JCas first" now sound more sensible?

... or maybe I am misunderstanding something basic (which is entirely possible).

Cheers,

-- Richard



Re: Design choices for changing type systems with loaded JCas classes [was Re: UIMAv3 & WebAnno}

2018-01-10 Thread Marshall Schor
Another "failure" use case:

- Load type T, with feature f1. 
- Load JCas for type T with feature f2. 
--  (merged type T has f1, f2, used to assign offsets)

Next, load type T with features f1, and f3.
- At commit, this would be "merged" with the JCas, to give f1, f3, and f2.
- But f2 already has an offset assigned, which would break the existing assignment
algorithm (which assigns sequentially in the order of the feature structures).

To attempt to overcome this in some cases, an algorithm would be needed which
attempted to assign offsets, constrained by any existing offsets present in
any/all of the JCas Classes for this type and its supertypes.
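
A rough sketch of what such a constrained assignment could look like for one
type, assuming offsets start at 0 and ignoring supertypes. This is not the
existing algorithm; all names are hypothetical.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of offset assignment constrained by loaded JCas classes. */
class ConstrainedOffsetSketch {

    /**
     * Keeps every offset already fixed ("pinned") by a loaded JCas class and gives
     * the remaining features the lowest free slots, in declaration order.
     */
    static Map<String, Integer> assign(List<String> features, Map<String, Integer> pinned) {
        Map<String, Integer> result = new LinkedHashMap<>();
        Set<Integer> used = new HashSet<>();
        for (String f : features) {
            Integer slot = pinned.get(f);
            if (slot != null) {
                if (!used.add(slot)) {
                    throw new IllegalStateException("Two features are pinned to slot " + slot);
                }
                result.put(f, slot);
            }
        }
        int next = 0;
        for (String f : features) {
            if (result.containsKey(f)) {
                continue; // already placed at its pinned offset
            }
            while (used.contains(next)) {
                next++;
            }
            used.add(next);
            result.put(f, next);
        }
        return result;
    }

    public static void main(String[] args) {
        // First load: merged T had f1, f2, so the JCas class fixed f2 at slot 1.
        Map<String, Integer> pinned = Map.of("f2", 1);
        // Second load: merged T is f1, f3, f2; f3 has to go around the pinned slot.
        System.out.println(assign(List.of("f1", "f3", "f2"), pinned));
    }
}
```

For the case above this keeps f2 in slot 1 and places f1 in slot 0 and f3 in
slot 2. It can leave gaps, and it does not address the supertype constraints.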

-Marshall

On 1/10/2018 10:42 AM, Marshall Schor wrote:
> The initial implementation requires features in the type system have an 
> ordering
> that is consistent with what got assigned when the JCas was loaded.
>
> Some use cases with comments:
>
> 1) Type T loaded with features f1, f2, f3,  JCas loaded with f1, f2, f3
> Followed by: Type T loaded with features f1, f3.
>
> This causes at the 2nd Type T commit time, the augmentation of type T with
> feature f2.
> But, the (current) impl just does an "addFeature" API call.  The result is 
> that
> without extra work, the features in the type system will be ordered as f1, f3,
> f2.  And the assigned offsets could be different. 
>
> To fix this, the algorithm which assigns offsets will need to see if the
> corresponding JCas class (if any) has offsets already assigned, and try to use
> those.
>
> 2) Type T having supertype TS; Type T has 1 feature, f1, JCas for Type T has 1
> feature f1.  TS has no features, no JCas for TS or JCas for TS has no 
> features. 
> Followed by: Type TS is loaded, having one feature (not in the JCas if there 
> is
> one for TS).
>
> This causes the features for type T (which includes all the features of its
> supertype), to have offsets shifted down.
> For example if T has feature f1 with offset "3",  it would now have offset "4"
> (accounting for the space taken by the TS feature).
>
> 
>
> Because of these issues, I'm wondering if it's really worth the time and
> complexity to implement this "partial" solution, given that there are 
> "complete"
> solutions of the following form:
>
> 1) Require users doing this kind of operation to first load a "merged" type
> system, creating a maximal-featured version (at least for all types / 
> supertypes
> having user-defined JCas classes) over all type systems that will be 
> processed,
> and use that to load (for the first time) the JCas classes.  When subset type
> systems are loaded subsequently by the application, they might cause failures
> (see supertype example use-case above).  To get around that, the application
> would need to change to always use the maximal type system for all loaded 
> CASs. 
> Some deserializations allow deserializing a CAS with a subset-type-system 
> into a
> CAS with a maximal type system.
>
> 2) Require users who want to have different type systems to load them using
> different class loaders (for the JCas classes).   This should work for all 
> cases.
> ==
>
> 2 questions for the user community:
>
> A) Does the user community think this enhancement is of sufficient value, with
> all of its limitations, to be worth doing?  I could go either way on this ,
> personally.
>
> B) Is the extra work to figure out a mapping for case 1 at the top (arranging
> the ordering of features to attempt to preserve the fixed values for the 
> loaded
> JCas offsets) worth doing?  (If not done, it would still be "checked", and 
> users
> would know a situation arose needed them to fix).
> My feeling is that this is not worth the effort for the few cases it might 
> enable.
>
> -Marshall
>
> On 1/9/2018 4:53 PM, Marshall Schor wrote:
>> I did an initial implementation, ignoring Pear files.
>>
>> I think the "feature expansion" when loading PEAR-classpath specified JCas
>> classes can't reasonably be done (because by the time you lazily get around 
>> to
>> loading these, the type system is committed).
>>
>> So, I plan to have the pear loading path operate like before, with no feature
>> expansion.
>>
>> I kind of doubt this will be a real issue in actual practice (he said 
>> hopefully
>> :-) ).
>>
>> Still need to fix up some test cases, but it's looking promising...
>>
>> -Marshall
>>
>>
>> On 1/8/2018 2:47 PM, Marshall Schor wrote:
>>> In working out the details, the following difficulty emerges:
>>>
>>> In the general case, a pipeline is associated with a class loader (used to 
>>> load
>>> JCas classes).
>>> When the pipeline contains "PEARs", each pear can specify it's own class 
>>> loader,
>>> and therefore, it's own set of JCas classes.
>>>
>>> So, at type system commit time, with this proposal, it would be necessary to
>>> find all of the class loaders that Pears might be using.  This 
>>> unfortunately is
>>> not possible in general, because the 

Re: Design choices for changing type systems with loaded JCas classes [was Re: UIMAv3 & WebAnno}

2018-01-10 Thread Marshall Schor
The initial implementation requires that features in the type system have an
ordering that is consistent with what got assigned when the JCas was loaded.

Some use cases with comments:

1) Type T loaded with features f1, f2, f3,  JCas loaded with f1, f2, f3
Followed by: Type T loaded with features f1, f3.

At the time of the 2nd commit of type T, this causes type T to be augmented with
feature f2.
But the (current) impl just does an "addFeature" API call.  The result is that,
without extra work, the features in the type system will be ordered as f1, f3,
f2, and the assigned offsets could be different.

To fix this, the algorithm which assigns offsets will need to see if the
corresponding JCas class (if any) has offsets already assigned, and try to use
those.

2) Type T having supertype TS; Type T has 1 feature, f1, JCas for Type T has 1
feature f1.  TS has no features, no JCas for TS or JCas for TS has no features. 
Followed by: Type TS is loaded, having one feature (not in the JCas if there is
one for TS).

This causes the features for type T (which includes all the features of its
supertype), to have offsets shifted down.
For example if T has feature f1 with offset "3",  it would now have offset "4"
(accounting for the space taken by the TS feature).



Because of these issues, I'm wondering if it's really worth the time and
complexity to implement this "partial" solution, given that there are "complete"
solutions of the following form:

1) Require users doing this kind of operation to first load a "merged" type
system, creating a maximal-featured version (at least for all types / supertypes
having user-defined JCas classes) over all type systems that will be processed,
and use that to load (for the first time) the JCas classes.  When subset type
systems are loaded subsequently by the application, they might cause failures
(see supertype example use-case above).  To get around that, the application
would need to change to always use the maximal type system for all loaded CASs. 
Some deserializations allow deserializing a CAS with a subset-type-system into a
CAS with a maximal type system.

2) Require users who want to have different type systems to load them using
different class loaders (for the JCas classes).   This should work for all cases.
==

2 questions for the user community:

A) Does the user community think this enhancement is of sufficient value, with
all of its limitations, to be worth doing?  I could go either way on this,
personally.

B) Is the extra work to figure out a mapping for case 1 at the top (arranging
the ordering of features to attempt to preserve the fixed values for the loaded
JCas offsets) worth doing?  (If not done, it would still be "checked", and users
would know that a situation arose which they need to fix.)
My feeling is that this is not worth the effort for the few cases it might enable.

-Marshall

On 1/9/2018 4:53 PM, Marshall Schor wrote:
> I did an initial implementation, ignoring Pear files.
>
> I think the "feature expansion" when loading PEAR-classpath specified JCas
> classes can't reasonably be done (because by the time you lazily get around to
> loading these, the type system is committed).
>
> So, I plan to have the pear loading path operate like before, with no feature
> expansion.
>
> I kind of doubt this will be a real issue in actual practice (he said 
> hopefully
> :-) ).
>
> Still need to fix up some test cases, but it's looking promising...
>
> -Marshall
>
>
> On 1/8/2018 2:47 PM, Marshall Schor wrote:
>> In working out the details, the following difficulty emerges:
>>
>> In the general case, a pipeline is associated with a class loader (used to 
>> load
>> JCas classes).
>> When the pipeline contains "PEARs", each pear can specify it's own class 
>> loader,
>> and therefore, it's own set of JCas classes.
>>
>> So, at type system commit time, with this proposal, it would be necessary to
>> find all of the class loaders that Pears might be using.  This unfortunately 
>> is
>> not possible in general, because the Pears are associated with a particular
>> pipeline, and you can load a type system and create a CAS without referring 
>> to a
>> particular pipeline. 
>>
>> In the current implementation, the presence of a Pear in the pipeline is
>> discovered (if and) when the pear is entered for the first time, and at that
>> time (lazily) the loading of that Pear's JCas classes happens.
>>
>> Various limitations are possible, I suppose (e.g., not allowing a Pear 
>> version
>> of JCas class to have new features, for example).
>>
>> Still thinking about this...
>>
>> -Marshall
>>
>>
>> On 1/8/2018 10:16 AM, Marshall Schor wrote:
>>> After a lot of thought, here's a proposal, along the lines Richard suggests:
>>>
>>> The basic idea is to have the JCas classes, if they exist for some type, 
>>> augment
>>> that type with features defined only in the JCas class.
>>>
>>> This augmentation would be done 

Re: Design choices for changing type systems with loaded JCas classes [was Re: UIMAv3 & WebAnno}

2018-01-09 Thread Marshall Schor
I did an initial implementation, ignoring Pear files.

I think the "feature expansion" when loading PEAR-classpath specified JCas
classes can't reasonably be done (because by the time you lazily get around to
loading these, the type system is committed).

So, I plan to have the pear loading path operate like before, with no feature
expansion.

I kind of doubt this will be a real issue in actual practice (he said hopefully
:-) ).

Still need to fix up some test cases, but it's looking promising...

-Marshall


On 1/8/2018 2:47 PM, Marshall Schor wrote:
> In working out the details, the following difficulty emerges:
>
> In the general case, a pipeline is associated with a class loader (used to 
> load
> JCas classes).
> When the pipeline contains "PEARs", each pear can specify it's own class 
> loader,
> and therefore, it's own set of JCas classes.
>
> So, at type system commit time, with this proposal, it would be necessary to
> find all of the class loaders that Pears might be using.  This unfortunately 
> is
> not possible in general, because the Pears are associated with a particular
> pipeline, and you can load a type system and create a CAS without referring 
> to a
> particular pipeline. 
>
> In the current implementation, the presence of a Pear in the pipeline is
> discovered (if and) when the pear is entered for the first time, and at that
> time (lazily) the loading of that Pear's JCas classes happens.
>
> Various limitations are possible, I suppose (e.g., not allowing a Pear version
> of JCas class to have new features, for example).
>
> Still thinking about this...
>
> -Marshall
>
>
> On 1/8/2018 10:16 AM, Marshall Schor wrote:
>> After a lot of thought, here's a proposal, along the lines Richard suggests:
>>
>> The basic idea is to have the JCas classes, if they exist for some type, 
>> augment
>> that type with features defined only in the JCas class.
>>
>> This augmentation would be done at type system commit time, and would really
>> modify the type system being committed to have the extra features.  Because the
>> type system would be modified to include these extra features, the Feature
>> Structures made with these "augmented" types would be larger (because they would
>> have slots for these features).  This ensures that subtypes' features won't
>> overlap / collide with the expanded features.
>>
>> I'll work out the details, and see if I can make this change.
>>
>> -Marshall
>>
>>
>> On 1/5/2018 2:05 PM, Richard Eckart de Castilho wrote:
>>> On 05.01.2018, at 17:16, Marshall Schor  wrote:
>>>> Based on Web Annot's use case, I'm thinking thorough alternatives.
>>> "WebAnno" ;)
>>>
>>>> One way to support this would be to have the user code tell the UIMA framework
>>>> that no reachable instances of JCas classes exist; the user would be responsible
>>>> for guaranteeing this.
>>> There may be no way for the user code to know if this is the case or not or 
>>> to 
>>> enforce this to be the case. 
>>>
>>>> The other choice would be to not support this (because of the inherent dangers)
>>>> and instead require users having multiple type systems with JCas classes
>>>> specifying features only in some versions of those type systems, first load the
>>>> JCas classes with the feature-maximal versions of the types.
>>>>
>>>> I think I favor the 2nd approach, as it is much safer.
>>>>
>>>> What do others think we should do?
>>> The current line of thinking seems to assume that:
>>>
>>> 1) a type system definition is loaded (maybe from an XML file)
>>> 2) a CAS is created using the TSD
>>> 3) the JCas classes are loaded and are initialized according to the TSD
>>>
>>> The suggestion to "first load a feature-maximal version of the types" seems
>>> to be following that line. I.e. the TSD loaded in 1) should cover all
>>> the features also covered by the JCas classes.
>>>
>>> How about a slightly different approach:
>>>
>>> 1) a type system definition is loaded (maybe from an XML file)
>>> 1a) the JCas classes are loaded and their definitions are merged with the
>>> TSD
>>> 2) a CAS is created using the merged TSD
>>> 3) the JCas classes are initialized with the now feature-maximal type system
>>>
>>> An error would/should be thrown if in step 1a the JCas classes
>>> and the TSD are inherently incompatible. 
>>>
>>> In this case, the JCas classes would be an additional source of type system
>>> information. Thinking this further, one could even initialize a CAS without
>>> providing any TSD, simply by having UIMA inspect the available JCas classes
>>> (e.g. through classpath scanning or by providing the framework with a list
>>> of classes). To complete this, the JCas classes could be enhanced with
>>> Java annotations to carry any information included in TSDs which is 
>>> currently
>>> not included in a machine-readable way in the JCas classes, e.g. type and
>>> feature description text. As such, a set of suitably annotated JCas classes
>>> could 

Re: Design choices for changing type systems with loaded JCas classes [was Re: UIMAv3 & WebAnno}

2018-01-08 Thread Marshall Schor


On 1/8/2018 1:31 PM, Richard Eckart de Castilho wrote:
> On 08.01.2018, at 16:16, Marshall Schor  wrote:
>> After a lot of thought, here's a proposal, along the lines Richard suggests:
>>
>> The basic idea is to have the JCas classes, if they exist for some type, 
>> augment
>> that type with features defined only in the JCas class.
>>
>> This augmentation would be done at type system commit time, and would really
>> modify the type system being committed to have the extra features.  Because the
>> type system would be modified to include these extra features, the Feature
>> Structures made with these "augmented" types would be larger (because they would
>> have slots for these features).  This ensures that subtypes' features won't
>> overlap / collide with the expanded features.
>>
>> I'll work out the details, and see if I can make this change.
> After some thought, I believe the problem with the availability and ordering of
> features can be sidestepped if we consider the JCas classes as a canonical source
> for type system definitions.

I'm not sure we need to say which is the canonical source and which is the
augmenting source.  In the proposal, both are used. 

Note that in both V2 and V3, JCas class definitions are optional.
In V3, the "built-in" ones are always present, and used. 

It is perfectly OK (and is often done, for cases where the type system is not
knowable at "compile time", for example, for general purpose annotators designed
to "discover" at run time the type system in use, etc.), for JCas types *not* to
exist that correspond to user-defined types.


>
> JCas classes represent a pretty strong and rigid contract on the type system and
> there can only be one set of them available through a particular classloader
> at any given time.
> XML TSDs on the other hand are comparatively flexible and a dime a dozen. Arbitrary
> numbers of them can be merged and used to initialize a CAS.
>
> So my suggestion would be: when using the JCas API, then JCas classes are 
> treated
> as the canonical source for the type system definition. 

I believe to make things work, both the type system definition and the JCas
definition(s) need to be used.  I'm missing what the "canonical" part does.  It
might be something that gives "priority" to two different definitions that
conflict, but the current code instead is treating that as an error which needs
to be resolved (e.g., you can't have a feature with a range of "uima.cas.String"
in the type definition, and that same feature having a range of
"uima.cas.Integer" in the JCas.) 

I think many users use a combination of JCas APIs and pure CAS APIs.  They use
the JCas APIs for common things like annotator begin/end, but write general
purpose annotators that work with arbitrary subtypes of these, where the type is
unknown at compile time, and therefore cannot have a custom JCas class definition.

> They define which types
> exist, which parent types they have, and what is the order of the features. If
> a user provides additional TSDs when initializing a CAS, then these are merged
> on top of the definitions sourced from the JCas classes. In this way, features
> defined in JCas classes can never be missing and they always have a defined 
> order,
> irrespective of the presence of any other TSDs. 
See my other note mentioning issues around Pears - essentially multiple class
loaders per pipeline.
> If any additional features are
> defined in TSDs, then they need to be accessed through the CAS API anyway. I believe
> there would also be no issues with subtypes in this "JCas first" scenario.
I'm not seeing there's any difference in a "JCas first", versus "consider both"
approach.
>
> This approach would also avoid that accessing features defined in JCas but not
> defined in an XML TSD would trigger an error, since the features are defined
> via their presence in the JCas class.

I think this suggestion is the same as what I was proposing (except for calling
out one of the sources as "first"). 
I don't think it matters which is "first" - the type system description or the
JCas version. 
The proposal uses both of these, and if a feature is defined in both, it is
required to be the same.
>
> A potential downside is, that users who initialize CAS with a small XML TSD 
> but
> who have rich JCas classes on the classpath might end up with more memory 
> usage
> than they asked for - I assume that would rarely happen. 
I agree.  I mentioned that in the "proposal".
> This could be mitigated
> by only initializing JCas classes if their types are actually defined in the
> user-provided TSD at initialization time. 
Good point.  It is often true by default, because if the JCas class is not
referenced in loaded code, it won't be loaded if there's no type definition
corresponding to it.  This would happen in the
implementation anyway, because the code that triggers the JCas loading and
augmentation of the type system, is type-system-commit, which 

Re: Design choices for changing type systems with loaded JCas classes [was Re: UIMAv3 & WebAnno}

2018-01-08 Thread Marshall Schor
In working out the details, the following difficulty emerges:

In the general case, a pipeline is associated with a class loader (used to load
JCas classes).
When the pipeline contains "PEARs", each Pear can specify its own class loader,
and therefore, its own set of JCas classes.

So, at type system commit time, with this proposal, it would be necessary to
find all of the class loaders that Pears might be using.  This unfortunately is
not possible in general, because the Pears are associated with a particular
pipeline, and you can load a type system and create a CAS without referring to a
particular pipeline. 

In the current implementation, the presence of a Pear in the pipeline is
discovered (if and) when the pear is entered for the first time, and at that
time (lazily) the loading of that Pear's JCas classes happens.
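
The lazy behaviour described above can be sketched in plain Java, without any
UIMA dependency. This is only an illustration of "load on first entry", not the
framework's code: LazyPearJCasRegistry, enter() and loadCount() are hypothetical
names, and the key stands in for whatever identifies a Pear's class loader.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical illustration only; this is not how UIMA is implemented.
final class LazyPearJCasRegistry<K, V> {
    private final Map<K, V> loaded = new HashMap<>();
    private final Function<K, V> loader;
    private int loadCount;

    // The loader is assumed to return a non-null result for every key.
    LazyPearJCasRegistry(Function<K, V> loader) {
        this.loader = loader;
    }

    // Called whenever a Pear is entered; loads its JCas classes on first entry only.
    V enter(K pearKey) {
        V classes = loaded.get(pearKey);
        if (classes == null) {
            classes = loader.apply(pearKey);
            loaded.put(pearKey, classes);
            loadCount++;
        }
        return classes;
    }

    // Number of times the loader actually ran.
    int loadCount() {
        return loadCount;
    }

    public static void main(String[] args) {
        LazyPearJCasRegistry<String, String> registry =
                new LazyPearJCasRegistry<>(pear -> "JCas classes of " + pear);
        registry.enter("pearA");
        registry.enter("pearA"); // second entry reuses the earlier load
        System.out.println(registry.loadCount());
    }
}
```

The point of the sketch is that nothing is known about a Pear until it is first
entered, which is why its JCas classes cannot be consulted at type system commit
time. The sketch is not thread-safe.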

Various limitations are possible, I suppose (e.g., not allowing a Pear version
of a JCas class to have new features).

Still thinking about this...

-Marshall


On 1/8/2018 10:16 AM, Marshall Schor wrote:
> After a lot of thought, here's a proposal, along the lines Richard suggests:
>
> The basic idea is to have the JCas classes, if they exist for some type, 
> augment
> that type with features defined only in the JCas class.
>
> This augmentation would be done at type system commit time, and would really
> modify the type system being committed to have the extra features.  Because 
> the
> type system would be modified to include these extra features, the Feature
> Structures made with these "augmented" types would be larger (because they 
> would
> have slots for these features).  This ensures that subtypes' features won't
> overlap / collide with the expanded features.
>
> I'll work out the details, and see if I can make this change.
>
> -Marshall
>
>
> On 1/5/2018 2:05 PM, Richard Eckart de Castilho wrote:
>> On 05.01.2018, at 17:16, Marshall Schor  wrote:
>>> Based on Web Annot's use case, I'm thinking thorough alternatives.
>> "WebAnno" ;)
>>
>>> One way to support this would be to have the user code tell the UIMA 
>>> framework
>>> that no reachable instances of JCas classes exist; the user would be 
>>> responsible
>>> for guaranteeing this.
>> There may be no way for the user code to know if this is the case or not or 
>> to 
>> enforce this to be the case. 
>>
>>> The other choice would be to not support this (because of the inherent 
>>> dangers)
>>> and instead require users having multiple type systems with JCas classes
>>> specifying features only in some versions of those type systems, first load 
>>> the
>>> JCas classes with the feature-maximal versions of the types.
>>>
>>> I think I favor the 2nd approach, as it is much safer. 
>>>
>>> What do others think we should do?
>> The current line of thinking seems to assume that:
>>
>> 1) a type system definition is loaded (maybe from an XML file)
>> 2) a CAS is created using the TSD
>> 3) the JCas classes are loaded and are initialized according to the TSD
>>
>> The suggestion to "first load a feature-maximal version of the types" seems
>> to be following that line. I.e. the TSD loaded in 1) should cover all
>> the features also covered by the JCas classes.
>>
>> How about a slightly different approach:
>>
>> 1) a type system definition is loaded (maybe from an XML file)
>> 1a) the JCas classes are loaded and their definitions are merged with the
>> TSD
>> 2) a CAS is created using the merged TSD
>> 3) the JCas classes are initialized with the now feature-maximal type system
>>
>> An error would/should be thrown if in step 1a the JCas classes
>> and the TSD are inherently incompatible. 
>>
>> In this case, the JCas classes would be an additional source of type system
>> information. Thinking this further, one could even initialize a CAS without
>> providing any TSD, simply by having UIMA inspect the available JCas classes
>> (e.g. through classpath scanning or by providing the framework with a list
>> of classes). To complete this, the JCas classes could be enhanced with
>> Java annotations to carry any information included in TSDs which is currently
>> not included in a machine-readable way in the JCas classes, e.g. type and
>> feature description text. As such, a set of suitably annotated JCas classes
>> could be converted to a TSD XML and vice versa.
>>
>> The above assumes that JCas classes are loaded and initialized eagerly, but 
>> probably it could be adapted to a situation where the classes are loaded 
>> lazily.
>>
>> Cheers,
>>
>> -- Richard
>>
>>
>



Re: Design choices for changing type systems with loaded JCas classes [was Re: UIMAv3 & WebAnno}

2018-01-08 Thread Richard Eckart de Castilho
On 07.01.2018, at 22:40, Marshall Schor  wrote:
> 
> I agree this could be an issue if you need to pass same-named but
> differently-defined JCas classes among these different type systems.  If this 
> is
> the case, I'd be curious about the semantics - I would guess that only the
> "common" part of the JCas class (common to all different type systems) would 
> be
> being accessed.  If that's true, I'm wondering if a better approach (but
> certainly more work) is to refactor so that this common part is a different
> common (unchanging) type?

That sounds spooky :) As far as I know, that's not territory I have ventured
into so far.

Cheers,

-- Richard

Re: Design choices for changing type systems with loaded JCas classes [was Re: UIMAv3 & WebAnno}

2018-01-08 Thread Richard Eckart de Castilho
On 08.01.2018, at 16:16, Marshall Schor  wrote:
> 
> After a lot of thought, here's a proposal, along the lines Richard suggests:
> 
> The basic idea is to have the JCas classes, if they exist for some type, 
> augment
> that type with features defined only in the JCas class.
> 
> This augmentation would be done at type system commit time, and would really
> modify the type system being committed to have the extra features.  Because 
> the
> type system would be modified to include these extra features, the Feature
> Structures made with these "augmented" types would be larger (because they 
> would
> have slots for these features).  This ensures that subtypes' features won't
> overlap / collide with the expanded features.
> 
> I'll work out the details, and see if I can make this change.

After some thought, I believe the problem with the availability and ordering of
features can be sidestepped if we consider the JCas classes as a canonical
source for type system definitions.

JCas classes represent a pretty strong and rigid contract on the type system, and
there can only be one set of them available through the classloader at any given
time. XML TSDs, on the other hand, are comparatively flexible and a dime a dozen.
Arbitrary numbers of them can be merged and used to initialize a CAS.

So my suggestion would be: when using the JCas API, JCas classes are treated
as the canonical source for the type system definition. They define which types
exist, which parent types they have, and what the order of the features is. If
a user provides additional TSDs when initializing a CAS, then these are merged
on top of the definitions sourced from the JCas classes. In this way, features
defined in JCas classes can never be missing and they always have a defined
order, irrespective of the presence of any other TSDs. If any additional features
are defined in TSDs, then they need to be accessed through the CAS API anyway. I
believe there would also be no issues with subtypes in this "JCas first" scenario.

This approach would also avoid that accessing features defined in JCas but not
defined in an XML TSD would trigger an error, since the features are defined
via their presence in the JCas class.
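
To make the "JCas first" ordering concrete, here is a minimal sketch in plain
Java with no UIMA dependency. JCasFirstOrder and featureOrder() are hypothetical
names, and features are modelled as bare strings, so this only illustrates the
ordering idea and not an actual implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the "JCas first" ordering; not UIMA code.
final class JCasFirstOrder {

    // JCas features keep their order and come first; features that appear only
    // in the TSDs are appended afterwards in the order they are encountered.
    static List<String> featureOrder(List<String> jcasFeatures, List<String> tsdFeatures) {
        List<String> order = new ArrayList<>(jcasFeatures);
        for (String feature : tsdFeatures) {
            if (!order.contains(feature)) {
                order.add(feature);
            }
        }
        return order;
    }

    public static void main(String[] args) {
        // The TSD lists the shared features in a different order and adds "extra".
        System.out.println(featureOrder(List.of("f1", "f2", "f3"), List.of("f3", "extra", "f1")));
    }
}
```

However the TSD orders its features, the JCas-defined ones keep the positions
given by the JCas class, and TSD-only features end up behind them.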

A potential downside is, that users who initialize CAS with a small XML TSD but
who have rich JCas classes on the classpath might end up with more memory usage
than they asked for - I assume that would rarely happen. This could be mitigated
by only initializing JCas classes if their types are actually defined in the
user-provided TSD at initialization time. Finally, users who really do not want
to have any JCas classes affect their CASes could maybe entirely disable JCas
for a given CAS instance - I thought years ago, I had seen an option somewhere
to do that, but I don't find it at the moment.

What do you think?

Cheers,

-- Richard

Re: Design choices for changing type systems with loaded JCas classes [was Re: UIMAv3 & WebAnno}

2018-01-08 Thread Marshall Schor
After a lot of thought, here's a proposal, along the lines Richard suggests:

The basic idea is to have the JCas classes, if they exist for some type, augment
that type with features defined only in the JCas class.

This augmentation would be done at type system commit time, and would really
modify the type system being committed to have the extra features.  Because the
type system would be modified to include these extra features, the Feature
Structures made with these "augmented" types would be larger (because they would
have slots for these features).  This ensures that subtypes' features won't
overlap / collide with the expanded features.
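
A rough sketch of why augmenting at commit time avoids the overlap, in plain
Java with no UIMA dependency. The names CommitTimeAugmentation, augment() and
subtypeSlots() are made up for illustration, as is the example of a type T
declared with f1, f2, a JCas class for T with f1..f4, and a subtype Ts with
s1, s2; slots are numbered from 0.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of commit-time augmentation; not UIMA code.
final class CommitTimeAugmentation {

    // Appends the features found only in the JCas class to the declared features.
    static List<String> augment(List<String> declared, List<String> jcasFeatures) {
        List<String> result = new ArrayList<>(declared);
        for (String feature : jcasFeatures) {
            if (!result.contains(feature)) {
                result.add(feature);
            }
        }
        return result;
    }

    // Slot numbers for a subtype: inherited features first, then its own.
    // Feature names are assumed to be distinct.
    static Map<String, Integer> subtypeSlots(List<String> inherited, List<String> own) {
        Map<String, Integer> slots = new LinkedHashMap<>();
        for (String feature : inherited) {
            slots.put(feature, slots.size());
        }
        for (String feature : own) {
            slots.put(feature, slots.size());
        }
        return slots;
    }

    public static void main(String[] args) {
        List<String> augmentedT = augment(List.of("f1", "f2"), List.of("f1", "f2", "f3", "f4"));
        System.out.println(subtypeSlots(augmentedT, List.of("s1", "s2")));
    }
}
```

Because f3 and f4 are part of T before the subtype is laid out, s1 and s2 are
placed after them instead of on top of them.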

I'll work out the details, and see if I can make this change.

-Marshall


On 1/5/2018 2:05 PM, Richard Eckart de Castilho wrote:
> On 05.01.2018, at 17:16, Marshall Schor  wrote:
>> Based on Web Annot's use case, I'm thinking thorough alternatives.
> "WebAnno" ;)
>
>> One way to support this would be to have the user code tell the UIMA 
>> framework
>> that no reachable instances of JCas classes exist; the user would be 
>> responsible
>> for guaranteeing this.
> There may be no way for the user code to know if this is the case or not or 
> to 
> enforce this to be the case. 
>
>> The other choice would be to not support this (because of the inherent 
>> dangers)
>> and instead require users having multiple type systems with JCas classes
>> specifying features only in some versions of those type systems, first load 
>> the
>> JCas classes with the feature-maximal versions of the types.
>>
>> I think I favor the 2nd approach, as it is much safer. 
>>
>> What do others think we should do?
> The current line of thinking seems to assume that:
>
> 1) a type system definition is loaded (maybe from an XML file)
> 2) a CAS is created using the TSD
> 3) the JCas classes are loaded and are initialized according to the TSD
>
> The suggestion to "first load a feature-maximal version of the types" seems
> to be following that line. I.e. the TSD loaded in 1) should cover all
> the features also covered by the JCas classes.
>
> How about a slightly different approach:
>
> 1) a type system definition is loaded (maybe from an XML file)
> 1a) the JCas classes are loaded and their definitions are merged with the
> TSD
> 2) a CAS is created using the merged TSD
> 3) the JCas classes are initialized with the now feature-maximal type system
>
> An error would/should be thrown if in step 1a the JCas classes
> and the TSD are inherently incompatible. 
>
> In this case, the JCas classes would be an additional source of type system
> information. Thinking this further, one could even initialize a CAS without
> providing any TSD, simply by having UIMA inspect the available JCas classes
> (e.g. through classpath scanning or by providing the framework with a list
> of classes). To complete this, the JCas classes could be enhanced with
> Java annotations to carry any information included in TSDs which is currently
> not included in a machine-readable way in the JCas classes, e.g. type and
> feature description text. As such, a set of suitably annotated JCas classes
> could be converted to a TSD XML and vice versa.
>
> The above assumes that JCas classes are loaded and initialized eagerly, but 
> probably it could be adapted to a situation where the classes are loaded 
> lazily.
>
> Cheers,
>
> -- Richard
>
>



Re: Design choices for changing type systems with loaded JCas classes [was Re: UIMAv3 & WebAnno}

2018-01-07 Thread Marshall Schor
Hi,

I think that the answer depends on the meaning of the question :-)

It *is* supported if the question is:

  - step 1, 2, 3, 4 + then starting up a new pipeline in a new class loader.

It is *not* supported, (even in v2) if the question is:

  - step 1, 2, 3, 4 where the JCas classes were loaded with the first type
system description, and then,
    in the same class loader, another type system with the 2nd order is used to
create another CAS.



I thought of another use case which also breaks the assumption I was trying to
exploit in the "fix" where feature offsets get initialized to the next
sequential slot(s).

Suppose Type T is defined with slots f1, f2, at first, but a later type system
has f1, f2, f3, and f4.
Suppose Type Ts is a subtype of Type T, with slots s1, s2.

Now if type T (f1, f2) is loaded, and the JCas class has f1, f2, f3, and f4,
features f3 and f4 are assigned to the next 2 sequential slots (which may not
exist at all, unless a subtype occupies them).  But if that type system includes
Ts, then slot s1 will overlay f3, and s2 will overlay f4.

So a further restriction to make this work is that the Types with extra JCas
slots can't have subtypes...
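
The overlap can be reproduced with a small plain-Java model (no UIMA
dependency). SubtypeOverlapDemo, layout() and collisions() are hypothetical
names; slots are numbered from 0, and the type system's features for T are
assumed to be a leading prefix of the JCas class's features.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of slot assignment; not UIMA code.
final class SubtypeOverlapDemo {

    // Consecutive slot numbers: the supertype's features first, then the subtype's.
    static Map<String, Integer> layout(List<String> superFeatures, List<String> subFeatures) {
        Map<String, Integer> slots = new LinkedHashMap<>();
        for (String feature : superFeatures) {
            slots.put(feature, slots.size());
        }
        for (String feature : subFeatures) {
            slots.put(feature, slots.size());
        }
        return slots;
    }

    // Lists JCas-only features whose pre-assigned "next sequential" slot is
    // already occupied by a subtype feature in the actual type system.
    static List<String> collisions(List<String> typeSystemT, List<String> jcasT, List<String> subFeatures) {
        Map<String, Integer> actual = layout(typeSystemT, subFeatures);
        List<String> result = new ArrayList<>();
        for (int slot = typeSystemT.size(); slot < jcasT.size(); slot++) {
            for (Map.Entry<String, Integer> entry : actual.entrySet()) {
                if (entry.getValue() == slot) {
                    result.add(jcasT.get(slot) + " collides with " + entry.getKey());
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(collisions(
                List.of("f1", "f2"), List.of("f1", "f2", "f3", "f4"), List.of("s1", "s2")));
    }
}
```

With T (f1, f2), a JCas class with f1..f4 and a subtype Ts (s1, s2), the model
reports f3 against s1 and f4 against s2; without a subtype, nothing collides.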

This use case affects the work-around proposed earlier, where you arrange the
maximal feature type system definition to be loaded first, and the JCas is
loaded against that.   If subsequently different type systems are loaded with
both fewer features for T and subtypes Ts, that same  "overlap" problem happens.

===
So, I think that with a lot of restrictions, some form of this can be made to
work, but I certainly agree that it's fragile.
===
I'm wondering how hard it would be to have the different type systems run with
different JCas class loaders.  This probably could be done by only specifying
new class loaders for the "UIMA Extension Class Loaders", which are used to load
the JCas classes and the annotators.

I agree this could be an issue if you need to pass same-named but
differently-defined JCas classes among these different type systems.  If this is
the case, I'd be curious about the semantics - I would guess that only the
"common" part of the JCas class (common to all different type systems) would be
being accessed.  If that's true, I'm wondering if a better approach (but
certainly more work) is to refactor so that this common part is a different
common (unchanging) type?

-Marshall

On 1/7/2018 2:22 PM, Richard Eckart de Castilho wrote:
> On 07.01.2018, at 06:06, Marshall Schor  wrote:
>> This only works when the different type definitions for type T keep the 
>> slots in
>> the same order, with no omissions.
>> (The JCas initialization for a new type system checks this.)  For example,
>> valid definitions for type T would be
>>   T with features (none)
>>   T with features f1
>>   T with features f1, f2
>>   T with features f1, f2, f3
>>   T with features f1, f2, f3, f4
>>   T with features f1, f2, f3, f4, f5 ...
>>
>> You could not have T with features f2, f3  (skipping f1).  And the feature
>> definitions would need to be in this exact order (it would not work for T 
>> with
>> features f2, f1).
>>
>> Does this cover the case(s) in WebAnno?
> It might. But it still looks like a rather fragile construction that
> might better be avoided. What does the order of the features depend on?
> Alphanumeric sorting? Order in the JCas class / TSD XML?
>
> Let's say I do this: 
>
> - define a TSD in XML with features in the order f1, f2, f3
> - generate JCas classes from that XML
> - refactor the XML for some reason, reordering the features into f3, f2, f1
> - *not* regenerate the JCas classes (since actually the information content
>   didn't change and the order of fields in Java classes normally don't matter
>   anyway)
>
> Is such a thing supported in v3? Was it supported in v2?
>
> Cheers,
>
> -- Richard
>
>
>



Re: Design choices for changing type systems with loaded JCas classes [was Re: UIMAv3 & WebAnno}

2018-01-07 Thread Richard Eckart de Castilho
On 07.01.2018, at 06:06, Marshall Schor  wrote:
> 
> This only works when the different type definitions for type T keep the slots 
> in
> the same order, with no omissions.
> (The JCas initialization for a new type system checks this.)  For example,
> valid definitions for type T would be
>   T with features (none)
>   T with features f1
>   T with features f1, f2
>   T with features f1, f2, f3
>   T with features f1, f2, f3, f4
>   T with features f1, f2, f3, f4, f5 ...
> 
> You could not have T with features f2, f3  (skipping f1).  And the feature
> definitions would need to be in this exact order (it would not work for T with
> features f2, f1).
> 
> Does this cover the case(s) in WebAnno?

It might. But it still looks like a rather fragile construction that
might better be avoided. What does the order of the features depend on?
Alphanumeric sorting? Order in the JCas class / TSD XML?

Let's say I do this: 

- define a TSD in XML with features in the order f1, f2, f3
- generate JCas classes from that XML
- refactor the XML for some reason, reordering the features into f3, f2, f1
- *not* regenerate the JCas classes (since actually the information content
  didn't change and the order of fields in Java classes normally don't matter
  anyway)

Is such a thing supported in v3? Was it supported in v2?

Cheers,

-- Richard




Re: Design choices for changing type systems with loaded JCas classes [was Re: UIMAv3 & WebAnno}

2018-01-06 Thread Marshall Schor
Let's not give up yet :-)

I'm thinking of an approach now which would cover some of the cases, perhaps
including WebAnno's.

It would work like this
  (follows your earlier idea that "extra" features in JCas be pre-setup
   to work if and when a type system defining those features is used
   with this JCas definition):

1) Some type system (perhaps not having all the features for a type specified)
would be created.
2) A CAS would be instantiated from this, causing the associated JCas classes to
be loaded

Let's assume those JCas classes include class T with features f1, f2, f3, and 
f4.
If the loaded type system only had feature f1 and f2, the code right now reports
that f3 and f4 aren't in the type system, but continues.  It also initializes
the feature offsets for those in the JCas class to -1, so (accidental) refs to
these throw exceptions.

Now, instead of that, let's suppose the feature offsets get initialized to the
next sequential slot(s).  Accessing these with this type system at run time
would still throw errors, as it should, because the slot arrays are only
allocated for the number of slots defined in the type.

However, when a different type system defining f1, f2, f3 and f4 for type T is
subsequently in use, the loaded JCas class would fit just right.

This only works when the different type definitions for type T keep the slots in
the same order, with no omissions.
(The JCas initialization for a new type system checks this.)  For example, valid
definitions for type T would be
  T with features (none)
  T with features f1
  T with features f1, f2
  T with features f1, f2, f3
  T with features f1, f2, f3, f4
  T with features f1, f2, f3, f4, f5 ...

You could not have T with features f2, f3  (skipping f1).  And the feature
definitions would need to be in this exact order (it would not work for T with
features f2, f1).
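
The "same order, no omissions" rule boils down to a prefix check. Here is a
minimal sketch in plain Java with no UIMA dependency; FeaturePrefixCheck and
isCompatible() are hypothetical names, and this is not the check the framework
actually performs, just the rule as stated above.

```java
import java.util.List;

// Hypothetical helper illustrating the rule; not part of UIMA.
final class FeaturePrefixCheck {

    // True if, over the positions both lists have, the type system's features
    // for T match the feature order the JCas class was bound to. Either list
    // may be longer than the other, but neither may skip or reorder a feature.
    static boolean isCompatible(List<String> jcasOrder, List<String> typeSystemOrder) {
        int common = Math.min(jcasOrder.size(), typeSystemOrder.size());
        return jcasOrder.subList(0, common).equals(typeSystemOrder.subList(0, common));
    }

    public static void main(String[] args) {
        List<String> jcas = List.of("f1", "f2", "f3", "f4");
        System.out.println(isCompatible(jcas, List.of("f1", "f2"))); // leading prefix
        System.out.println(isCompatible(jcas, List.of("f2", "f3"))); // skips f1
        System.out.println(isCompatible(jcas, List.of("f2", "f1"))); // reordered
    }
}
```

Only the first call reports a compatible definition; the other two correspond
to the "skipping f1" and "f2, f1" cases.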

Does this cover the case(s) in WebAnno?

-
Re: giving up on using JCas with WebAnno.  The JCas was always a somewhat more
"static" / "compile-time" description of types, than the non-JCas APIs, which
had generic methods which had arguments like Type(s) and Feature(s).  For
applications where the app really had no idea about the types, the JCas probably
is not a good fit. 

There's a hybrid approach - using the JCas for the "top of the type hierarchy" -
so for example, if you had subtypes of Annotation (not known at compile time for
the app), you could assign instances of those to Annotation, and then use the
getBegin() etc. APIs, while also using the non-JCas APIs to access the rest of
the features (as needed), based on what types are actually being used.  In this
case the app has no knowledge (at compile time) of the types (beyond the fact
that some of them are subtypes of Annotation).

-Marshall
 
On 1/6/2018 9:18 AM, Richard Eckart de Castilho wrote:
> On 06.01.2018, at 00:10, Marshall Schor  wrote:
>> Here's the specifics:  if the maximal-type system for type T has features f1,
>> f2, f3, f4, f5, and the JCas class defines all these features, then the load 
>> of
>> that JCas class will bind those.
>>
>> A subsequent switch to a type system with f1, f2, will work.
>>
>> But a subsequent switch to a type system with f1, f3 won't work (because the
>> offset for f3 is set to, say "3", and the length of the feature slots 
>> allocated
>> is only 2.
>>
>> To work around these, the application needs to use class loader isolation to
>> force reloading of the JCas classes.
> Hm, I am wondering if this problem is actually already present in v2 and for
> some reason I just never hit it.
>
> I am not sure if/how I could sensibly use classloader isolation in WebAnno.
> JCases are passed around all the time and operations on them happen in many
> locations in the code. And not only that - FeatureStructures extracted from
> the JCas are also passed around a lot (although always limited to a single
> web request so that the FSes do not become invalid).
>
> It would be very tricky to determine when to reload JCas classes. Reloading
> the JCas classes every time an operation of the JCas is executed probably
> would introduce a serious overhead - and it would (probably) break the
> FeatureStructures that have already been extracted from a CAS and are
> being passed around.
>
> Maybe I could try isolating those places where legacy data is loaded...
>
> I suppose, the easiest and safest would be to give up on using JCas
> entirely in WebAnno and use only the CAS API - which might also actually
> be slower than JCas in UIMAv3. I might end up manually writing wrapper
> classes for certain annotation types that internally use the CAS API.
>
> Best,
>
> -- Richard



Re: Design choices for changing type systems with loaded JCas classes [was Re: UIMAv3 & WebAnno}

2018-01-06 Thread Richard Eckart de Castilho
On 06.01.2018, at 00:10, Marshall Schor  wrote:
> 
> Here's the specifics:  if the maximal-type system for type T has features f1,
> f2, f3, f4, f5, and the JCas class defines all these features, then the load 
> of
> that JCas class will bind those.
> 
> A subsequent switch to a type system with f1, f2, will work.
> 
> But a subsequent switch to a type system with f1, f3 won't work (because the
> offset for f3 is set to, say "3", and the length of the feature slots 
> allocated
> is only 2.
> 
> To work around these, the application needs to use class loader isolation to
> force reloading of the JCas classes.

Hm, I am wondering if this problem is actually already present in v2 and for
some reason I just never hit it.

I am not sure if/how I could sensibly use classloader isolation in WebAnno.
JCases are passed around all the time and operations on them happen in many
locations in the code. And not only that - FeatureStructures extracted from
the JCas are also passed around a lot (although always limited to a single
web request so that the FSes do not become invalid).

It would be very tricky to determine when to reload JCas classes. Reloading
the JCas classes every time an operation of the JCas is executed probably
would introduce a serious overhead - and it would (probably) break the
FeatureStructures that have already been extracted from a CAS and are
being passed around.

Maybe I could try isolating those places where legacy data is loaded...

I suppose, the easiest and safest would be to give up on using JCas
entirely in WebAnno and use only the CAS API - which might also actually
be slower than JCas in UIMAv3. I might end up manually writing wrapper
classes for certain annotation types that internally use the CAS API.

Best,

-- Richard

Re: Design choices for changing type systems with loaded JCas classes [was Re: UIMAv3 & WebAnno}

2018-01-05 Thread Marshall Schor
I perhaps overstated the problem.

Here's the specifics:  if the maximal-type system for type T has features f1,
f2, f3, f4, f5, and the JCas class defines all these features, then the load of
that JCas class will bind those.

A subsequent switch to a type system with f1, f2, will work.

But a subsequent switch to a type system with f1, f3 won't work (because the
offset for f3 is set to, say "3", and the length of the feature slots allocated
is only 2).

To work around these, the application needs to use class loader isolation to
force reloading of the JCas classes.
(Or, as mentioned earlier, we could have an API to recalculate the offsets, but
that could really break things if instances of that JCas class exist (perhaps
running in other threads) expecting the other definitions.)

So I think any client code doing type system switching, and using JCas classes
where the features are changing, needs to be aware and handle this (currently).

A fix along the lines suggested where the JCas classes augment the type system,
could help, but it would still be exposed to edge cases like this:

TypeSystem 1 defines type T with f1; JCas augments this with f100.

Later a type system is loaded which defines type T with features f1, f2, and
f100 (in that order). The JCas for T has bound f100 to (say) slot #2, which is
where this type system has put "f2" - so it has the wrong binding for f100.
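
Both failure modes in this note (an offset past the allocated slots, and an
offset that lands on a different feature) can be shown with a tiny plain-Java
model. StaleBindingDemo and featureAtSlot() are hypothetical names, there is no
UIMA dependency, and slots are numbered from 0 here, so the numbers are one
lower than the "say" values used in the text.

```java
import java.util.List;

// Hypothetical model of a stale JCas binding; not UIMA code.
final class StaleBindingDemo {

    // Returns the feature that a previously bound slot number addresses in a
    // later type system, or null if the slot lies beyond the allocated slots.
    static String featureAtSlot(int boundSlot, List<String> laterFeatures) {
        return boundSlot < laterFeatures.size() ? laterFeatures.get(boundSlot) : null;
    }

    public static void main(String[] args) {
        // f3 was bound to slot 2 when T had f1..f5; the later T has only f1, f3.
        System.out.println(featureAtSlot(2, List.of("f1", "f3")));
        // f100 was bound to slot 1 (right after f1); the later T has f1, f2, f100.
        System.out.println(featureAtSlot(1, List.of("f1", "f2", "f100")));
    }
}
```

The first lookup finds no slot at all, and the second finds f2 where the JCas
class expects f100.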

-Marshall


On 1/5/2018 4:25 PM, Richard Eckart de Castilho wrote:
> On 05.01.2018, at 21:29, Marshall Schor  wrote:
>> So, the only work-around I see at the moment is to use class loader 
>> isolation,
>> reloading the JCas classes each time.
> Is this something that client code would have to be aware of / to trigger?
>
> Cheers,
>
> -- Richard



Re: Design choices for changing type systems with loaded JCas classes [was Re: UIMAv3 & WebAnno}

2018-01-05 Thread Richard Eckart de Castilho
On 05.01.2018, at 21:29, Marshall Schor  wrote:
> 
> So, the only work-around I see at the moment is to use class loader isolation,
> reloading the JCas classes each time.

Is this something that client code would have to be aware of / to trigger?

Cheers,

-- Richard

Re: Design choices for changing type systems with loaded JCas classes [was Re: UIMAv3 & WebAnno}

2018-01-05 Thread Marshall Schor
Hi Richard,

This sounds like an interesting idea - to have the JCas classes (perhaps
augmented with some annotations) serve as additional type definitions.

I believe that the current implementation won't work even if the maximal-feature
definition was loaded first.  This is because the offsets would be computed for
that definition, and then when a subsequent type system was loaded (with fewer
features), the feature offsets would remain the same (for the maximal Features)
but the arrays they would be indexing into would be smaller (the size of the
loaded type system).

So, the only work-around I see at the moment is to use class loader isolation,
reloading the JCas classes each time. 

I'll look more into how to design a fix around your idea (merging the JCas
imputed type info).

Cheers. -Marshall


On 1/5/2018 2:05 PM, Richard Eckart de Castilho wrote:
> On 05.01.2018, at 17:16, Marshall Schor  wrote:
>> Based on Web Annot's use case, I'm thinking thorough alternatives.
> "WebAnno" ;)
>
>> One way to support this would be to have the user code tell the UIMA 
>> framework
>> that no reachable instances of JCas classes exist; the user would be 
>> responsible
>> for guaranteeing this.
> There may be no way for the user code to know if this is the case or not or 
> to 
> enforce this to be the case. 
>
>> The other choice would be to not support this (because of the inherent 
>> dangers)
>> and instead require users having multiple type systems with JCas classes
>> specifying features only in some versions of those type systems, first load 
>> the
>> JCas classes with the feature-maximal versions of the types.
>>
>> I think I favor the 2nd approach, as it is much safer. 
>>
>> What do others think we should do?
> The current line of thinking seems to assume that:
>
> 1) a type system definition is loaded (maybe from an XML file)
> 2) a CAS is created using the TSD
> 3) the JCas classes are loaded and are initialized according to the TSD
>
> The suggestion to "first load a feature-maximal version of the types" seems
> to be following that line. I.e. the TSD loaded in 1) should cover all
> the features also covered by the JCas classes.
>
> How about a slightly different approach:
>
> 1) a type system definition is loaded (maybe from an XML file)
> 1a) the JCas classes are loaded and their definitions are merged with the
> TSD
> 2) a CAS is created using the merged TSD
> 3) the JCas classes are initialized with the now feature-maximal type system
>
> An error would/should be thrown if in step 1a the JCas classes
> and the TSD are inherently incompatible. 
>
> In this case, the JCas classes would be an additional source of type system
> information. Thinking this further, one could even initialize a CAS without
> providing any TSD, simply by having UIMA inspect the available JCas classes
> (e.g. through classpath scanning or by providing the framework with a list
> of classes). To complete this, the JCas classes could be enhanced with
> Java annotations to carry any information included in TSDs which is currently
> not included in a machine-readable way in the JCas classes, e.g. type and
> feature description text. As such, a set of suitably annotated JCas classes
> could be converted to a TSD XML and vice versa.
>
> The above assumes that JCas classes are loaded and initialized eagerly, but 
> probably it could be adapted to a situation where the classes are loaded 
> lazily.
>
> Cheers,
>
> -- Richard
>
>



Re: Design choices for changing type systems with loaded JCas classes [was Re: UIMAv3 & WebAnno}

2018-01-05 Thread Richard Eckart de Castilho
On 05.01.2018, at 17:16, Marshall Schor  wrote:
> 
> Based on Web Annot's use case, I'm thinking thorough alternatives.

"WebAnno" ;)

> One way to support this would be to have the user code tell the UIMA framework
> that no reachable instances of JCas classes exist; the user would be 
> responsible
> for guaranteeing this.

There may be no way for the user code to know if this is the case or not or to 
enforce this to be the case. 

> The other choice would be to not support this (because of the inherent 
> dangers)
> and instead require users having multiple type systems with JCas classes
> specifying features only in some versions of those type systems, first load 
> the
> JCas classes with the feature-maximal versions of the types.
> 
> I think I favor the 2nd approach, as it is much safer. 
> 
> What do others think we should do?

The current line of thinking seems to assume that:

1) a type system definition is loaded (maybe from an XML file)
2) a CAS is created using the TSD
3) the JCas classes are loaded and are initialized according to the TSD

The suggestion to "first load a feature-maximal version of the types" seems
to be following that line. I.e. the TSD loaded in 1) should cover all
the features also covered by the JCas classes.

How about a slightly different approach:

1) a type system definition is loaded (maybe from an XML file)
1a) the JCas classes are loaded and their definitions are merged with the
TSD
2) a CAS is created using the merged TSD
3) the JCas classes are initialized with the now feature-maximal type system

An error would/should be thrown if in step 1a the JCas classes
and the TSD are inherently incompatible. 
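
Step 1a could look roughly like the following sketch, written in plain Java
without any UIMA dependency. TypeDefinitionMerge and merge() are hypothetical
names; a type is reduced to a map from feature name to range type name, and
"incompatible" is assumed here to mean "same feature, different range type".

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of merging JCas-derived features into a TSD; not UIMA code.
final class TypeDefinitionMerge {

    // Merges feature name -> range type name. TSD features come first, JCas-only
    // features are appended, and a feature defined in both must agree.
    static Map<String, String> merge(Map<String, String> tsd, Map<String, String> jcas) {
        Map<String, String> merged = new LinkedHashMap<>(tsd);
        for (Map.Entry<String, String> entry : jcas.entrySet()) {
            String existing = merged.putIfAbsent(entry.getKey(), entry.getValue());
            if (existing != null && !existing.equals(entry.getValue())) {
                throw new IllegalStateException("Feature " + entry.getKey()
                        + " has range " + existing + " in the TSD but "
                        + entry.getValue() + " in the JCas class");
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, String> tsd = new LinkedHashMap<>();
        tsd.put("begin", "uima.cas.Integer");
        Map<String, String> jcas = new LinkedHashMap<>();
        jcas.put("begin", "uima.cas.Integer");
        jcas.put("pos", "uima.cas.String");
        System.out.println(merge(tsd, jcas));
    }
}
```

The merged map is what would be used to create the CAS in step 2, so that the
JCas classes find every feature they expect in step 3.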

In this case, the JCas classes would be an additional source of type system
information. Thinking this further, one could even initialize a CAS without
providing any TSD, simply by having UIMA inspect the available JCas classes
(e.g. through classpath scanning or by providing the framework with a list
of classes). To complete this, the JCas classes could be enhanced with
Java annotations to carry any information included in TSDs which is currently
not included in a machine-readable way in the JCas classes, e.g. type and
feature description text. As such, a set of suitably annotated JCas classes
could be converted to a TSD XML and vice versa.
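As a rough illustration of the annotation idea, here is a sketch under some
loud assumptions: the annotations @TypeInfo and @FeatureInfo are hypothetical
(UIMA does not define them), TokenSketch is not a real JCas class, and the
generated XML is only a fragment without supertype or XML escaping. It just
shows that description text on the class could be read back via reflection
and turned into something TSD-like.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

// Hypothetical annotations carrying what a TSD knows but a JCas class does not.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface TypeInfo {
    String name();
    String description();
}

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface FeatureInfo {
    String name();
    String range();
    String description();
}

// Stand-in for an annotated JCas class (not generated by JCasGen).
@TypeInfo(name = "example.Token", description = "A token")
class TokenSketch {
    @FeatureInfo(name = "morph", range = "uima.cas.String",
                 description = "Morphological features")
    public String getMorph() {
        return null;
    }
}

class TsdFromAnnotationsSketch {

    /** Builds a TSD-like XML fragment for one annotated class. */
    static String toXml(Class<?> c) {
        TypeInfo t = c.getAnnotation(TypeInfo.class);
        StringBuilder sb = new StringBuilder();
        sb.append("<typeDescription>\n");
        sb.append("  <name>").append(t.name()).append("</name>\n");
        sb.append("  <description>").append(t.description())
          .append("</description>\n");
        sb.append("  <features>\n");
        for (Method m : c.getDeclaredMethods()) {
            FeatureInfo f = m.getAnnotation(FeatureInfo.class);
            if (f == null) {
                continue; // not a feature getter
            }
            sb.append("    <featureDescription>\n");
            sb.append("      <name>").append(f.name()).append("</name>\n");
            sb.append("      <description>").append(f.description())
              .append("</description>\n");
            sb.append("      <rangeTypeName>").append(f.range())
              .append("</rangeTypeName>\n");
            sb.append("    </featureDescription>\n");
        }
        sb.append("  </features>\n");
        sb.append("</typeDescription>\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(toXml(TokenSketch.class));
    }
}
```

The reverse direction (TSD XML to annotated classes) would be a matter for the
class generator and is not sketched here.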

The above assumes that JCas classes are loaded and initialized eagerly, but 
probably it could be adapted to a situation where the classes are loaded lazily.

Cheers,

-- Richard



Re: UIMAv3 & WebAnno

2018-01-05 Thread Richard Eckart de Castilho
Hi,

> On 04.01.2018, at 22:18, Marshall Schor  wrote:
> 
> Hi Richard,
> 
> Here's one idea:  Since I thought this had been fixed a while ago, and you
> seemed (previously) to get beyond this point, I'm wondering if the build you
> did for "trunk" somehow got mixed up levels.  I see the build is from
> 3.0.1-beta-SNAPSHOT v3 branch - but I'm guessing that's some local folder you
> have (didn't see it in
> https://svn.apache.org/repos/asf/uima/uv3/uimaj-v3/branches)?

I built from https://svn.apache.org/repos/asf/uima/uv3/uimaj-v3/trunk
which has the version "3.0.1-beta-SNAPSHOT" in the pom.xml.

> I'm going to try to set up a test case to see if I can reproduce this.  What
> I'm planning to do is to have two type systems, T1, and T2, where T1 has a
> type with no features, and T2 has the same type with a feature.
> 
> I'll make a JCas class which has the type defined with the feature.
> 
> Then I'll create a CAS with T1, and confirm the JCas class loads and has the
> feature offset for the feature set to -1 (to cause a runtime exception if
> referenced).
> 
> Then I'll create a CAS with T2.  If this test case matches what's happening
> for you, that should trigger the exception you see.

That sounds like it should be able to reproduce the problem.

I could imagine this issue to appear in scenarios where JCas classes exist
and CASes with (slightly) different type systems are deserialized from disk
and the type system of the in-memory CAS is reinitialized from the file
stored on disk - so not necessarily a problem limited to WebAnno.

From my perspective as a UIMA v2 user, the JCas classes are a convenience that
allows for type-safe access to the CAS. But at least in v2, it seems to be
absolutely possible to use JCas classes even with CASes that have been
initialized with a slightly different type system, i.e. with more or fewer
features than the JCas class actually offers. If there are more features,
I can always access them through the CAS API. If there are fewer features,
then the getters/setters in the JCas class for these features would fail - but
only when I actually try to call them. I think at some point, UIMAv2 started to
log warnings if a JCas class had getters/setters for features that were not
actually present in the type system...

> JCas Type "de.tudarmstadt.ukp.dkpro.core.api.segmentation.type.Token" 
> implements getters and setters for feature "morph", but the type system 
> doesn't define that feature.


... but other than that, everything still worked fine since I didn't actually
access these features - and wherever they are accessed, they will already have
been added to the type system of the respective CAS.
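To spell out the v2-style behaviour I mean, here is a small plain-Java model.
It is only a sketch of what I observe as a user, not how UIMA v2 implements
it; LenientCoverSketch and its methods are invented names. Missing features
produce a warning when the cover class is bound to a type system, and an error
only when the missing feature is actually accessed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy stand-in for a JCas cover class bound to the type system of one CAS.
class LenientCoverSketch {
    private final Set<String> featuresInTypeSystem;
    final List<String> warnings = new ArrayList<>();

    LenientCoverSketch(Set<String> featuresInTypeSystem,
                       String... coverFeatures) {
        this.featuresInTypeSystem = featuresInTypeSystem;
        for (String f : coverFeatures) {
            if (!featuresInTypeSystem.contains(f)) {
                // Only a warning at setup time, nothing fails yet.
                warnings.add("cover class implements getters and setters for"
                        + " feature \"" + f
                        + "\", but the type system doesn't define it");
            }
        }
    }

    /** Fails only if the requested feature is missing from the type system. */
    String get(String feature) {
        if (!featuresInTypeSystem.contains(feature)) {
            throw new IllegalStateException(
                    "feature \"" + feature + "\" is not in the type system");
        }
        return "<value of " + feature + ">";
    }

    public static void main(String[] args) {
        Set<String> typeSystem = new HashSet<>(Arrays.asList("pos"));
        LenientCoverSketch token =
                new LenientCoverSketch(typeSystem, "pos", "morph");
        System.out.println(token.warnings);   // one warning, for "morph"
        System.out.println(token.get("pos")); // works
        // token.get("morph") would throw, but only when it is called.
    }
}
```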

So to summarize - in the UIMAv2 environment that WebAnno creates, we have:

- one set of JCas classes that usually partially overlaps with the
  types/features with which the CASes are initialized
- at any time, any number of CAS instances, potentially each with a different
  type system may exist in memory
- a particular CAS instance is (usually) accessed only from a single thread
  (this might change soon as asynchronous events are being introduced)
- for some types/features, the JCas classes are used for access, for other
  types/features the CAS API is used
- when a CAS is passed around, it is usually passed around as a JCas object
  and jcas.getCas() is called when the CAS API should be used

Now, on the other hand, one might argue that this "wild" mixing of JCas classes
with type systems deviating from the one from which the JCas classes were
originally created is a bad practice. Personally, I found it convenient because
at least for some types/features, I could use a convenient Java-like type-safe
access. The alternative would be to completely stop using JCas (at least in
WebAnno) and work only via the CAS API.

I could try a workaround that creates a CAS at application startup with the
type system from which the JCas classes were built in order to initialize the
JCas class registry. But I have no idea whether that would fix the issue, or
whether I would have to do that on every thread that is spawned or if it would
be sufficient to do it once on any arbitrary thread... it doesn't sound like a
particularly attractive solution though.

Cheers,

-- Richard



Design choices for changing type systems with loaded JCas classes [was Re: UIMAv3 & WebAnno]

2018-01-05 Thread Marshall Schor
Based on Web Annot's use case, I'm thinking through alternatives.

The issue is a sequence related one, with type systems changing:

1) a type system is created, defining type T, with no features
2) a CAS is produced with that type system, causing JCas classes to be looked-up
and loaded

At that time, the JCas class is set up with feature offset constants,
corresponding to how the features are laid out in the type system.
Type T has 0 features, but the JCas type for T might define some.  This is
allowed, but the feature offsets in the JCas class are set to -1, which will
cause a runtime exception if the feature is used (e.g., someone calls
myJCasType.getMyFeature()).

So far - all OK (provided the non-existent features are not accessed, of course).
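As a rough picture of steps 1 and 2, here is a plain-Java toy. It is a sketch
only, not the actual JCas class layout: OffsetSketch and its members are
made-up names, and a feature structure is reduced to an Object array. The
point is just that the offset is fixed once, that a missing feature yields -1,
and that nothing fails until the getter is called.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Toy stand-in for a JCas class for type T that defines a feature "f1".
class OffsetSketch {
    // Plays the role of the feature offset constant; set once at setup.
    final int f1Offset;

    OffsetSketch(List<String> featuresOfTInTypeSystem) {
        // indexOf yields -1 when the type system has no such feature.
        this.f1Offset = featuresOfTInTypeSystem.indexOf("f1");
    }

    Object getF1(Object[] slots) {
        if (f1Offset < 0) {
            throw new IllegalStateException(
                    "feature f1 is not defined in the type system");
        }
        return slots[f1Offset];
    }

    public static void main(String[] args) {
        // Type system where T has no features: setup succeeds, offset is -1.
        OffsetSketch without = new OffsetSketch(Collections.<String>emptyList());
        System.out.println(without.f1Offset);

        // Type system where T has f1: the getter works.
        OffsetSketch with = new OffsetSketch(Arrays.asList("f1"));
        System.out.println(with.getF1(new Object[] {"value"}));
    }
}
```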

3) a different type system is created, defining type T with feature f1
4) a CAS is produced with that type system.

Currently this throws an error when the JCas classes are checked for conformance
with the type system. 
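The whole sequence 1) to 4) can be modeled in a few lines of plain Java. This
is a simplified sketch and not the FSClassRegistry code; the class name
CoverClassRegistrySketch and the commit method are invented for the example.
It records the offsets at the first commit and complains when a later type
system would give a different offset.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of a once-only cover class setup for a single type T.
class CoverClassRegistrySketch {
    private final List<String> coverFeatures;
    // feature name -> offset recorded when the cover class was first set up
    private final Map<String, Integer> recorded = new HashMap<>();

    CoverClassRegistrySketch(List<String> coverFeatures) {
        this.coverFeatures = coverFeatures;
    }

    /** Called whenever a type system defining type T is committed. */
    void commit(List<String> featuresOfT) {
        for (String f : coverFeatures) {
            int offset = featuresOfT.indexOf(f); // -1 if T lacks the feature
            Integer previous = recorded.putIfAbsent(f, offset);
            if (previous != null && previous != offset) {
                throw new IllegalStateException("feature " + f
                        + " was set up with offset " + previous
                        + " but now has offset " + offset);
            }
        }
    }

    public static void main(String[] args) {
        CoverClassRegistrySketch registry =
                new CoverClassRegistrySketch(Arrays.asList("f1"));
        // Steps 1-2: T has no features, offset -1 is recorded for f1.
        registry.commit(Collections.<String>emptyList());
        try {
            // Steps 3-4: T now has f1, which no longer matches the record.
            registry.commit(Arrays.asList("f1"));
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```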

A mechanism in the JCas already allows the JCas classes to be "updated" for
feature changes.
However, once the JCas class is loaded with a valid type system, no updating is
allowed (for the reasons given below).

The consequence of this is that a user wanting to use one set of JCas classes
with multiple different type systems must use the maximal-feature version of the
type systems (which could be created using a type system merge operation, for
example), as the *first* type system that causes the JCas classes to be loaded.

We could modify the JCas setup to allow updating an already loaded/initialized
JCas class for a new type system with different features present. But it seems
that this has some difficulties; it would only be a valid transformation if no
instances of the JCas class exist.  There's currently no way to tell if this is
true.  In the general case, the JCas classes could be being used for multiple
independent pipelines, running in multiple threads; so even if one CAS was
"reset", others might not be; and furthermore, POJO code could be holding on to
some JCas instances.

One way to support this would be to have the user code tell the UIMA framework
that no reachable instances of JCas classes exist; the user would be responsible
for guaranteeing this.

The other choice would be to not support this (because of the inherent dangers)
and instead require users who have multiple type systems, with JCas classes
specifying features present only in some versions of those type systems, to
first load the JCas classes with the feature-maximal versions of the types.

I think I favor the 2nd approach, as it is much safer. 

What do others think we should do?

-Marshall


Re: UIMAv3 & WebAnno

2018-01-05 Thread Marshall Schor
I think I may have found the issue.

Working on a proper fix...

-Marshall

On 1/3/2018 6:16 PM, Richard Eckart de Castilho wrote:
> Hi again,
>
> I have once again switched my local environment to a UIMA v3 mode:
>
> - UIMA SDK v3 (3.0.1-beta-SNAPSHOT v3 branch)
> - uimaFIT (3.0.0-SNAPSHOT v3 branch)
> - DKPro Core (2.0.x branch)
> - WebAnno (feature/issue1115-uimav3 branch)
>
> Last time, I ran into trouble because the IDs loaded from serialized CAS 
> files were no longer accessible.
> I programmatically set "uima.default_v2_id_references" to "true" during 
> startup now to avoid that.
>
>
> But what seems to be happening even before getting there is that I again run
> into JCas <-> Type System problems.
> When a user opens a document for annotation in WebAnno, WebAnno loads the
> serialized CAS (CasCompleteSerializer), serializes the CAS into a byte array
> (compressed form 6), creates a new CAS with the current type system
> definition, and deserializes the data again into that CAS. The idea is that
> the lenient loading of the compressed form 6 allows
>
>   a) new types / features to be added in that way
>   b) unreachable FSes to be garbage collected
>
> So, it is not an uncommon case here that the data stored with the
> CasCompleteSerializer used a different type system than the CAS into which it
> is loaded - and in fact it can be the case that the data stored with the
> CasCompleteSerializer had used different JCas wrappers at the time than what
> is available at the time of loading the data again. Afaik there should be no
> truly incompatible changes in the type system though - i.e. only new
> features / types were added; no features were removed. Still, I get a lot of
> this type of error:
>
>> org.apache.uima.cas.CASRuntimeException: The JCas cannot be initialized.
>> The following errors occurred:
>> In JCAS class
>> "de.tudarmstadt.ukp.dkpro.core.api.lexmorph.type.morph.MorphologicalFeatures",
>> UIMA field
>> "de.tudarmstadt.ukp.dkpro.core.api.lexmorph.type.morph.MorphologicalFeatures:verbForm"
>> was set up when this class was previously loaded and initialized, to have
>> an adjusted offset of "-1" but now the feature has a different adjusted
>> offset of "5"; this may be due to something else other than type system
>> commit actions loading and initializing the JCas class, or to having a
>> different non-compatible type system for this class, trying to use a common
>> JCas cover class, which is not supported.
>>
>>  at org.apache.uima.cas.impl.FSClassRegistry.reportErrors(FSClassRegistry.java:870) ~[classes/:?]
>>  at org.apache.uima.cas.impl.FSClassRegistry.loadJCasForTSandClassLoader(FSClassRegistry.java:342) ~[classes/:?]
>>  at org.apache.uima.cas.impl.FSClassRegistry.getGeneratorsForClassLoader(FSClassRegistry.java:904) ~[classes/:?]
>>  at org.apache.uima.cas.impl.TypeSystemImpl.getGeneratorsForClassLoader(TypeSystemImpl.java:2651) ~[classes/:?]
>>  at org.apache.uima.cas.impl.TypeSystemImpl.commit(TypeSystemImpl.java:1393) ~[classes/:?]
>>  at org.apache.uima.cas.impl.CASImpl.commitTypeSystem(CASImpl.java:1607) ~[classes/:?]
>>  at org.apache.uima.util.CasCreationUtils.doCreateCas(CasCreationUtils.java:614) ~[classes/:?]
>>  at org.apache.uima.util.CasCreationUtils.createCas(CasCreationUtils.java:362) ~[classes/:?]
>>  at org.apache.uima.util.CasCreationUtils.createCas(CasCreationUtils.java:313) ~[classes/:?]
>>  at org.apache.uima.fit.factory.JCasFactory.createJCas(JCasFactory.java:147) ~[classes/:?]
>>  at de.tudarmstadt.ukp.clarin.webanno.api.dao.AnnotationSchemaServiceImpl.upgradeCas(AnnotationSchemaServiceImpl.java:640) ~[classes/:?]
>
> I have the feeling that this is what happens:
>
> 1) a CasCompleteSerialized-CAS is loaded - it was created at a time when the 
> MorphologicalFeatures did not yet have a feature called "verbForm".
> 2) I create a new JCas, now using a type system description where 
> MorphologicalFeatures includes the "verbForm" feature
>
> At step 2, the above error seems to be triggered. I actually do not even get 
> to the point where I would temporarily serialize into form 6 and back. The 
> code already crashes when trying to set up the target task with the updated 
> type system.
>
> Any ideas?
>
> Cheers,
>
> -- Richard



Re: UIMAv3 & WebAnno

2018-01-04 Thread Marshall Schor
Hi Richard,

Here's one idea:  Since I thought this had been fixed a while ago, and you
seemed (previously) to get beyond this point, I'm wondering if the build you did
for "trunk" somehow got mixed up levels.  I see the build is from
3.0.1-beta-SNAPSHOT v3 branch - but I'm guessing that's some local folder you
have ( didn't see it in
https://svn.apache.org/repos/asf/uima/uv3/uimaj-v3/branches)?

Also, the technique for garbage collecting by using compressed serialization
needs a fixup - if you have set the global mode to the v2Refs, then the
serializations include the non-reachables. You can easily work around this by
surrounding the serialize code with a try-with-resources to turn off the v2
mode.
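The shape of that workaround, sketched in plain Java without any UIMA calls:
V2ModeSketch, withV2IdRefs and serialize are made-up stand-ins (the actual
UIMA method to use is not shown here), and the "serialization" is just a
string. The sketch only shows how a try-with-resources block can switch the
mode off for the duration of the serialize call and restore it afterwards.

```java
// Toy model of a scoped mode switch; not the UIMA flag or API.
class V2ModeSketch {
    // Stand-in for the global "v2 ID references" mode.
    static boolean v2IdRefs = true;

    /** AutoCloseable whose close() does not throw a checked exception. */
    interface Restore extends AutoCloseable {
        @Override
        void close();
    }

    /** Sets the mode and returns a handle that restores the previous value. */
    static Restore withV2IdRefs(boolean enabled) {
        final boolean previous = v2IdRefs;
        v2IdRefs = enabled;
        return () -> v2IdRefs = previous;
    }

    /** Pretend serialization: what gets written depends on the mode. */
    static String serialize() {
        return v2IdRefs ? "reachable and unreachable FSes"
                        : "reachable FSes only";
    }

    public static void main(String[] args) {
        try (Restore r = withV2IdRefs(false)) {
            // Inside the block the v2 mode is off, so the non-reachables
            // are dropped from the serialized form.
            System.out.println(serialize());
        }
        // After the block the previous mode is back.
        System.out.println(serialize());
    }
}
```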

I'm going to try to set up a test case to see if I can reproduce this.  What I'm
planning to do is to have two type systems, T1, and T2, where T1 has a type with
no features, and T2 has the same type with a feature.

I'll make a JCas class which has the type defined with the feature.

Then I'll create a CAS with T1, and confirm the JCas class loads and has the
feature offset for the feature set to -1 (to cause a runtime exception if
referenced).

Then I'll create a CAS with T2.  If this test case matches what's happening for
you, that should trigger the exception you see. 

If I've misunderstood what's going on and you think this is not the right test,
please let me know :-).

-Marshall



Re: UIMAv3 & WebAnno

2018-01-04 Thread Marshall Schor
Thanks for your testing!  I'm looking into this, more later.
-Marshall
