Re: [VOTE] Retire UIMA C++ SDK
Hi Pablo,

> On 8. Dec 2022, at 00:15, Pablo Duboue wrote:
>
> Python has attracted most of the newcomers mind share in NLP. UIMA C++ can
> get us in the Python game and it is a great way to bring back stand-off
> annotations into NLP, something we have lost with newer toolkits.
>
> If possible, I'd like to try Eddie's task list and if I can get it to work,
> step in as a maintainer for UIMA C++. If it takes Eddie 1-2 weeks of work,
> I ask for a month time, then I'll come back and report.

Cool! If you need anything, let me know! Happy to help :)

As I said, I have not followed C++ development very closely, but I believe
that there are new tools these days like CMake and also easier ways of
integrating with Java like JavaCPP or maybe JNA. Over the years, the feedback
from the people I talked with about UIMA C++ was generally that it was quite
a rocky road. If we want to get into the Python game, where people are used
to simple "pip install" stuff, the road probably needs a good paving.

Cheers,

-- Richard
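For readers unfamiliar with the term, the stand-off annotation idea mentioned above can be illustrated with a minimal sketch in plain Python. This is not the UIMA CAS API or any existing library; the names `Annotation`, `Document`, `add`, `select` and `covered_text` are hypothetical and chosen only for illustration. The point is that the text is never modified: each annotation only stores character offsets and a type name.

```python
from dataclasses import dataclass, field


@dataclass
class Annotation:
    """A typed span that refers to the text by character offsets only."""
    begin: int
    end: int
    type_name: str
    features: dict = field(default_factory=dict)


@dataclass
class Document:
    """Hypothetical container: immutable text plus a separate annotation list."""
    text: str
    annotations: list = field(default_factory=list)

    def add(self, begin, end, type_name, **features):
        # Reject spans that do not fit inside the text.
        if not 0 <= begin <= end <= len(self.text):
            raise ValueError(f"span [{begin}, {end}) is outside the text")
        annotation = Annotation(begin, end, type_name, features)
        self.annotations.append(annotation)
        return annotation

    def select(self, type_name):
        # Return annotations of one type in text order.
        matches = [a for a in self.annotations if a.type_name == type_name]
        return sorted(matches, key=lambda a: (a.begin, a.end))

    def covered_text(self, annotation):
        # The annotated string is recovered from the offsets on demand.
        return self.text[annotation.begin:annotation.end]


doc = Document("UIMA keeps annotations apart from the text.")
doc.add(0, 4, "Token", pos="NNP")
doc.add(5, 10, "Token", pos="VBZ")
doc.add(0, 4, "NamedEntity", value="PROJECT")

for token in doc.select("Token"):
    print(token.begin, token.end, doc.covered_text(token), token.features)
```

Because annotations live beside the text rather than inside it, several overlapping layers (here a token and a named entity over the same span) can coexist without conflicting markup.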
Re: [VOTE] Retire UIMA C++ SDK
It would be useful to understand which roles uimacpp is still needed for.
Historically, the uimacpp code predates uimaj. See
https://web.archive.org/web/20060312040720id_/http://researchweb.watson.ibm.com/journal/sj/433/gotz.pdf

Uimacpp evolved along with uimaj to include new functionality such as
multiple CAS views, XMI and binary CAS serialization, and UIMA-AS service
interfaces. But uimacpp never supported CAS multipliers, and development
basically stopped around 2010. Improvements in CAS indexing and the newer
compressed binary formats were never supported. Uimacpp support for UIMA-AS
services was the most useful to us because the larger native C/C++
analytics would simply not run correctly through the JNI; perhaps those JNI
problems are fixed in newer Java releases.

Python, Tcl and Perl support are all based on the SWIG interface work
originally done by Jeff Sorensen. That Python interface is C-like rather
than Python-like. Back in 2008 Edward Loper implemented a native Python CAS
and XMI serialization, but after a fair amount of work it still had
problems deserializing large and complex XmiCas files. As far as I know,
that code was never donated or made public.

What functionality is needed now? Just a standalone uimacpp driver?
Just uimacpp through the JNI, which would enable use of uimacpp in any
scenario where uimaj is used? Just the native uimacpp service wrapper
compatible with UIMA-AS?

Eddie
Re: [VOTE] Retire UIMA C++ SDK
PS ... Hi Pablo ... very generous of you to volunteer to help!

Best,
Eddie
Re: [VOTE] Retire UIMA C++ SDK
On 8. Dec 2022, at 19:37, Eddie Epstein wrote:
>
> Python, tcl and perl support are all based on the swig interface work
> originally done by Jeff Sorensen. That python interface is C-like rather
> than python-like. Back in 2008 Edward Loper implemented a python native CAS
> and xmi serialization, but after a fair amount of work it still had
> problems deserializing large and complex XmiCas files. As far as I know,
> that code was never donated or made public.

Because Edward Loper's code was not available, Jan-Christoph Klie and I
eventually built DKPro Cassis in 2018. It is available under the Apache
License 2.0.

https://github.com/dkpro/dkpro-cassis

I believe that DKPro Cassis by now has full support for creating,
deserializing, manipulating and serializing CAS XMI files. It also supports
the new (experimental) UIMA CAS JSON format.

https://github.com/apache/uima-uimaj-io-jsoncas

If anybody finds any problems with DKPro Cassis, please open an issue in its
tracker on GitHub. As far as I know, DKPro Cassis has found some friends in
the UIMA community who are actively using it.

Cheers,

-- Richard