Re: [VOTE] Retire UIMA C++ SDK

2022-12-08 Thread Richard Eckart de Castilho
Hi Pablo,

> On 8. Dec 2022, at 00:15, Pablo Duboue  wrote:
> 
> Python has attracted most of the newcomers' mind share in NLP. UIMA C++ can
> get us in the Python game and it is a great way to bring back stand-off
> annotations into NLP, something we have lost with newer toolkits.
> 
> If possible, I'd like to try Eddie's task list and if I can get it to work,
> step in as a maintainer for UIMA C++. If it takes Eddie 1-2 weeks of work,
> I ask for a month's time, then I'll come back and report.

Cool! If you need anything, let me know! Happy to help :)

As I said, I have not followed C++ development very closely, but I believe
there are newer tools these days, such as CMake, and also easier ways of
integrating with Java, such as JavaCPP or maybe JNA. Over the years, the
feedback from people I talked with about UIMA C++ was generally that it was
quite a rocky road. If we want to get into the Python game, where people
are used to a simple "pip install", the road probably needs a good paving.

Cheers,

-- Richard

Re: [VOTE] Retire UIMA C++ SDK

2022-12-08 Thread Eddie Epstein
It would be useful to understand in what roles uimacpp is still needed.
Historically, the uimacpp code predated the existence of uimaj.
See
https://web.archive.org/web/20060312040720id_/http://researchweb.watson.ibm.com/journal/sj/433/gotz.pdf

Uimacpp evolved along with uimaj to include new functionality such as
multiple CAS views, xmi and binary CAS serialization, and UIMA-AS service
interfaces. But uimacpp never supported CAS multipliers and development
basically stopped around 2010. Improvements in CAS indexing and the newer
compressed binary formats were never supported.  Uimacpp support of UIMA-AS
services was the most useful to us because the larger native C/C++
analytics would simply not run correctly thru the JNI; perhaps those JNI
problems are fixed in newer Java releases.

Python, tcl and perl support are all based on the swig interface work
originally done by Jeff Sorensen. That python interface is C-like rather
than python-like. Back in 2008 Edward Loper implemented a python native CAS
and xmi serialization, but after a fair amount of work it still had
problems deserializing large and complex XmiCas files. As far as I know,
that code was never donated or made public.

What functionality is needed now ... Just a standalone uimacpp driver? Just
uimacpp thru the JNI which would enable use of uimacpp in any scenario
where uimaj is used? Just the native uimacpp service wrapper compatible
with UIMA-AS?

Eddie


Re: [VOTE] Retire UIMA C++ SDK

2022-12-08 Thread Eddie Epstein
PS ... Hi Pablo ... very generous of you to volunteer to help!
Best,
Eddie


Re: [VOTE] Retire UIMA C++ SDK

2022-12-08 Thread Richard Eckart de Castilho
On 8. Dec 2022, at 19:37, Eddie Epstein  wrote:
> 
> Python, tcl and perl support are all based on the swig interface work
> originally done by Jeff Sorensen. That python interface is C-like rather
> than python-like. Back in 2008 Edward Loper implemented a python native CAS
> and xmi serialization, but after a fair amount of work it still had
> problems deserializing large and complex XmiCas files. As far as I know,
> that code was never donated or made public.

Because Edward Loper's code was not available, eventually DKPro Cassis
was built in 2018 by Jan-Christoph Klie and myself. It is available
under the Apache License 2.0.

  https://github.com/dkpro/dkpro-cassis

I believe that DKPro Cassis has full support for creating, deserializing,
manipulating and serializing CAS XMI files by now. It also has support for
the new (experimental) UIMA CAS JSON format.

  https://github.com/apache/uima-uimaj-io-jsoncas

If anybody finds any problems with DKPro Cassis, please open an issue
in its tracker at GitHub.

As far as I know, DKPro Cassis has found some friends in the UIMA
community who are actively using it.

Cheers,

-- Richard
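
For readers less familiar with the stand-off annotation model discussed in
this thread, here is a minimal sketch in plain Python. It is not the DKPro
Cassis API and not the UIMA CAS API: the names Document, Annotation, add,
select and covered_text are hypothetical and chosen only for illustration.
The idea it shows is that the text is never modified; annotations live
beside it and point into it with begin/end character offsets, so several
layers of annotation can overlap freely.

```python
import re
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Annotation:
    """A stand-off annotation: a type name plus character offsets."""
    type_name: str
    begin: int
    end: int


@dataclass
class Document:
    """Hypothetical container: immutable text plus annotations beside it."""
    text: str
    annotations: List[Annotation] = field(default_factory=list)

    def add(self, type_name: str, begin: int, end: int) -> Annotation:
        # Offsets must fall inside the text; the text itself is untouched.
        if not 0 <= begin <= end <= len(self.text):
            raise ValueError(f"offsets out of range: {begin}..{end}")
        annotation = Annotation(type_name, begin, end)
        self.annotations.append(annotation)
        return annotation

    def select(self, type_name: str) -> List[Annotation]:
        # Return annotations of one type in document order.
        matching = [a for a in self.annotations if a.type_name == type_name]
        return sorted(matching, key=lambda a: (a.begin, a.end))

    def covered_text(self, annotation: Annotation) -> str:
        # Recover the annotated span by slicing the original text.
        return self.text[annotation.begin:annotation.end]


doc = Document("UIMA C++ predates UIMA Java.")

# Layer 1: whitespace-delimited tokens, with offsets taken from the regex.
for match in re.finditer(r"\S+", doc.text):
    doc.add("Token", match.start(), match.end())

# Layer 2: a span that overlaps two tokens; no markup is inserted in the text.
framework = doc.add("Framework", 0, 8)

tokens = [doc.covered_text(a) for a in doc.select("Token")]
```

Because annotations only reference offsets, adding or removing a layer never
changes the text or the other layers. A real CAS adds a type system, feature
structures and indexes on top of this basic idea.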