Bringing ClearTK to UIMAv3?
Hi folks, I have set up a branch of ClearTK building against UIMA v3.3.0. It compiles, but some tests fail when building from the command line. When I ran at least one of the failing tests in Eclipse, it worked there. I didn't investigate further. Is this of interest to anybody? Would somebody like to help finish this upgrade and get the tests to work? https://github.com/ClearTK/cleartk/pull/443 Cheers, -- Richard
Re: Apache cTAKES GitHub mirror is stuck in 2019 [EXTERNAL]
Hi all, > On 6. Jun 2022, at 16:09, Finan, Sean > wrote: > > Hi Kean, > > Thank you for the suggestion and the link. I am really glad that people are > interested in this GitHub topic and taking it seriously. It would be great > if we could make it happen. > > While definitely a possibility, the git LFS paradigm is something that I > would like to avoid. > > Like keeping our models on SVN, it would also require separating models from > code into two different repos, e.g. github and bitbucket. As opposed to > bitbucket, the apache svn repos are long established, familiar to and > supported by the apache infrastructure team. The same goes for the apache > foundation use of github. I like being able to lean on the apache infra team > for help. GitHub does have support for LFS [1]. What I do not know is whether the ASF's GitHub plan allows us to use it and, if so, whether there is a volume limit. We would have to ask INFRA about that. The use of Git and GitHub is well supported by the INFRA team. For example, there is self-service for creating and managing repos. [2] There is also the `.asf.yaml` mechanism for configuring GitHub repos and hooking them up with the ASF infrastructure, including mailing lists, website publishing, etc. [3] > The apache Jenkins servers are linked to the svn repos, making continuous > integration easy - on the rare occasion when somebody does change something > in a model repo. While I expect anybody savvy enough to work on models to > also have the knowhow and wherewithal to work with a separate svn repo, I > don't want them to need to get out to jenkins and manually kick off snapshot > builds. Jenkins also supports GitHub very well [4]. For example, in UIMA, we just drop a `Jenkinsfile` [5,6] configuration file into each repo and Jenkins picks it up; it even gives us support for pull requests [7]. I'm happy to help you set that up for cTAKES as well.
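For illustration, a minimal `.asf.yaml` might look roughly like the following. This is a hedged sketch: the keys shown (GitHub metadata and notification routing) are real `.asf.yaml` features, but the concrete values and mailing-list addresses are made-up placeholders - check the INFRA documentation [3] before relying on them.

```yaml
# .asf.yaml - committed to the root of the repository (sketch; values are placeholders)
github:
  description: "Apache cTAKES - clinical Text Analysis and Knowledge Extraction System"
  labels:
    - nlp
    - uima
notifications:
  commits: commits@ctakes.apache.org
  issues: dev@ctakes.apache.org
```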
> Probably most important is the requirement of the client user to have the LFS > command line client. I think that there are enough hoops stuck in front of > getting ctakes installed/checked out/cloned/etc. and it seems to me that one > of the biggest reasons to use github is to make things easier for absolute > newbies to just pull down code and experiment. It is indeed an additional hoop to jump through, but installing LFS is a one-time action. Chances are that people may already have it set up because they use it in other repos. > Keeping the models on a separate svn repo would mean that they aren't checked > out as code, but would be put in the .m2 maven area when a user runs maven > compile. While the total footprint of full ctakes would still be the same > size, it would essentially make the code directory smaller and initial > downloads/checkouts would be faster. Plus, if done properly maybe it could > "clean up" all of those nearly identically named modules in my intellij > project window and I'd stop clicking on the wrong one when I've had too much > coffee. Nowadays, I fear that people may not have svn installed anymore ;) So requiring svn to download models and drop them into .m2 might be an inconvenience. If the models lived in a Maven repository and could be dragged in as a normal dependency, that would seem most convenient. Cheers, -- Richard [1] https://docs.github.com/en/repositories/working-with-files/managing-large-files/configuring-git-large-file-storage [2] https://gitbox.apache.org [3] https://s.apache.org/asfyaml [4] https://builds.apache.org/job/UIMA/ [5] https://github.com/apache/uima-uimaj/blob/main/Jenkinsfile [6] https://github.com/apache/uima-build-jenkins-shared-library [7] https://builds.apache.org/job/UIMA/job/uima-uimaj/view/change-requests/
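To make the last point concrete: if model artifacts were published to a Maven repository, pulling them in would be a plain dependency declaration. The coordinates below are hypothetical - no such artifact exists at the time of writing:

```xml
<!-- Hypothetical model artifact pulled in like any other dependency -->
<dependency>
  <groupId>org.apache.ctakes</groupId>
  <artifactId>ctakes-pos-tagger-models</artifactId> <!-- made-up artifactId -->
  <version>1.0.0</version>
</dependency>
```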
Re: Apache cTAKES GitHub mirror is stuck in 2019 [EXTERNAL]
On 2. Jun 2022, at 14:22, Finan, Sean wrote: > > I don't know much about how this is done. If anybody out there has knowledge > or experience that they can pass on, please share. When we did this for UIMA, the steps were documented here: https://uima.apache.org/convert-to-git.html Not 100% sure if this is still the way to go - INFRA may know more. Basically, if the Git(Hub) mirror is working properly, then at some point you can tell Infra to make it the main repo and to put SVN into read-only. But first, the Git(Hub) mirror needs to be up-to-date. I'm hanging out on the ASF slack e.g. in the ComDev channel - feel free to ping me there. Cheers, -- Richard
Apache cTAKES GitHub mirror is stuck in 2019
Hi, it appears that the GitHub mirror of Apache cTAKES may be stuck. When I check the svn log of https://svn.apache.org/repos/asf/ctakes/trunk/, I can see activity as recent as May 2022. However, on GitHub, I can only see stale branches: https://github.com/apache/ctakes/branches Wouldn't it be good if the GitHub mirror were kept up-to-date? Best, -- Richard
End of the road for UIMAv2 - please upgrade to UIMAv3
On 17. Aug 2021, at 22:08, Finan, Sean wrote: > > If you absolutely require uima 3 for some reason then I don't think that I > can help you. You may want to ask the uima lists about mixing versions or > equivalent v2 solutions for your goals. Besides connecting pipes through remote services, there is no way to combine UIMAv2 and UIMAv3. Work on UIMAv2 has fully stopped. UIMAv2 is very likely not going to get any more updates and bug fixes. A very last uimaFIT 2.6.0 might still make it, but that's likely it. I would strongly recommend that you upgrade to v3 as soon as possible. If you have any trouble doing so, please let me know. The easiest way is via the Apache UIMA users mailing list. Best, -- Richard (Apache UIMA PMC Chair)
Re: uimafit version and commit messages
The uimaFIT releases mainly fix bugs and add smaller features. It should be pretty safe to update to 2.4.0. I didn't face any problems upgrading e.g. DKPro Core, which is pretty large and uses uimaFIT intensively. Cheers, -- Richard (atm maintaining uimaFIT) > On 28.11.2017, at 01:45, David Kincaid wrote: > > Thanks for upgrading uimafit. I was thinking of giving it a try myself when > I had a chance. I see that you upgraded to 2.3.0, but the most recent > version is 2.4.0. Was there a problem with 2.4.0? > > - Dave
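For reference, uimaFIT 2.x lives under the `org.apache.uima` group ID (the 1.x line used `org.uimafit`), so the upgrade is usually just a coordinate and version bump in the POM:

```xml
<dependency>
  <groupId>org.apache.uima</groupId>
  <artifactId>uimafit-core</artifactId>
  <version>2.4.0</version>
</dependency>
```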
Re: Travis for testing
On 19.08.2017, at 16:34, Andrey Kurdumov wrote: > > Given the fact that cTakes is available over GitHub ( > https://github.com/apache/ctakes) I am interested in configuring Travis > to run the existing test suite of cTakes. > > That gives clear visibility of the workflow, and this investment in the > infrastructure could help other people start faster. The ASF runs a Jenkins server. It includes the necessary plugins to build pull requests and to update the build status on GitHub. Also, the "Embeddable Build Status" plugin is available, which can provide you with a "badge" that indicates the build status. Travis offers a great *free* service to the OSS community, in particular to smaller projects, to help them get started with proper development infrastructure. But since the ASF has proper development infrastructure run on its own resources, there is no need to make use of this free service - IMHO we should leave the free resources to others who do not have their own build infrastructure. Cheers, -- Richard
Re: jcas to json error
On 07.06.2017, at 07:33, Kumar, Avanish wrote: > > Exception in thread "main" java.lang.NoSuchMethodError: > org.apache.uima.cas.impl.TypeImpl.getSuperType()Lorg/apache/uima/cas/impl/TypeImpl; Could it be that you are mixing JARs from different versions of UIMA, e.g. uimaj-core 2.7.0 with uimaj-json 2.10.0 or something like that? Cheers, -- Richard
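A common way to rule this out is to run `mvn dependency:tree` and check that all `org.apache.uima` artifacts resolve to the same version, then pin them centrally in the parent POM. A sketch (the version number is only an example):

```xml
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>org.apache.uima</groupId>
      <artifactId>uimaj-core</artifactId>
      <version>2.10.0</version>
    </dependency>
    <dependency>
      <groupId>org.apache.uima</groupId>
      <artifactId>uimaj-json</artifactId>
      <version>2.10.0</version>
    </dependency>
  </dependencies>
</dependencyManagement>
```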
Re: UIMA 3.0.0
> On 14.02.2017, at 17:35, David Kincaid wrote: > > I see there is a new release of UIMA in the works and it's labeled as > 3.0.0. That jump seems to imply significant changes/updates. Is anyone in > the ctakes community close enough to the UIMA project to know if there is > anything beneficial to ctakes in there? Has anyone been bold enough to try > ctakes with the newest version? Hi all, please mind that this is UIMA 3.0.0 *ALPHA*. This is meant for early access to the new architecture so we (the UIMA project) can get feedback on things that break, and possibly on things that can still be improved (in incompatible ways), before we move on to BETA and eventually to a general availability release. The main thing that is changing with UIMA v3 is the internal management of the CAS. In v2, UIMA used its own memory management, similar to the way it is implemented in the UIMA C++ version. With UIMA v3, this changes radically: the CAS and the feature structures in the CAS are now proper Java objects subject to Java garbage collection. Preliminary testing indicates that this change can yield some quite significant performance improvements. The completely rewritten CAS also means that JCas classes need to be regenerated to be compatible with v3. This is probably the most significant breaking change. There is also a completely new API to retrieve annotations from the CAS, inspired by the uimaFIT (J)CasUtil methods as well as the Java Streams API. It would be great if you find the time to have a look at UIMA v3. We're happy to hear any feedback you might have and to help you overcome any rough parts you might hit - just leave a post on the d...@uima.apache.org mailing list :) Cheers, -- Richard
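To give a flavor of the new retrieval API mentioned above, a v3 `select` call looks roughly like this. This is a sketch, not verbatim from the UIMA documentation; `Token` and `sentence` are placeholder names for an annotation type and instance from some type system:

```groovy
// UIMA v3 select API - sketch; Token and sentence are placeholders
cas.select(Token.class)
   .coveredBy(sentence)
   .forEach { token -> println token.coveredText }

// The selection can also be consumed as a list or Java stream:
def tokens = cas.select(Token.class).asList()
```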
Karma for Jira
Hi all, could somebody please add me to the cTAKES project in Jira such that I can assign issues to myself? Best, -- Richard
Re: Welcome Richard Eckart de Castilho as a cTAKES committer
Hi all, cool, thanks! :) Looking forward to helping out with cleaning up some aspects of the cTAKES codebase. Best, -- Richard > On 27.05.2016, at 21:23, Pei Chen <chen...@apache.org> wrote: > > The Apache cTAKES PMC is pleased to introduce Richard Eckart de > Castilho as a new committer. We are very happy with the sustained > growth of the project and look forward to continued contributions from > the community and adding to the ranks of the cTAKES committers. > > --Pei
cTAKES dirty on checkout
Hi all, when checking out the sources of cTAKES from SVN with Eclipse, most of the projects are dirty because the Eclipse settings (.classpath and jdt.core.prefs) are in SVN. The particular difference is that on my machine, the projects are configured to use Java 8, while in SVN, they are configured for Java 7. The parent POM of cTAKES declares Java 8 (compiler source/target 1.8). Since the Eclipse files in SVN are at least outdated, maybe it would be a good idea to drop the .classpath and jdt prefs files from SVN and prevent them from being committed? Cheers, -- Richard
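For clarity, the Java-8 declaration in the parent POM presumably amounts to the standard Maven compiler properties:

```xml
<properties>
  <maven.compiler.source>1.8</maven.compiler.source>
  <maven.compiler.target>1.8</maven.compiler.target>
</properties>
```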
Re: Combining Knowledge- and Data-driven Methods for De-identification of Clinical Narratives
As far as I know, you can convert as long as ALL original authors / copyright holders agree to the conversion. Only the original authors may assign new licenses to their work. You might also want to double check that the codebase doesn't contain any copy/pasted code from third-party sources. As a third party, you cannot convert GPL code to ASL. Mind, I am not a lawyer. If you need more advice, post to legal-discuss@asf. Cheers, -- Richard On 08.10.2015, at 21:32, andy mcmurry wrote: > caution: I'm not sure you can convert GPL3 to ASL2 - > anyone know for sure? > > On Thu, Oct 8, 2015 at 12:03 PM, Chen, Pei > wrote: > >> This is great news! >>> What is the current status and procedure? Is there an explicit >> contribution to cTAKES? Is there an ICLA? What about the license of the >> sourceforge project? >> Jira has been opened to track this: >> https://issues.apache.org/jira/browse/CTAKES-384 >> >> 1) Azad, would you be willing to switch licenses? I believe it's >> currently GNU3 -> ASL 2.0? >> 2) Create a project/module in cTAKES sandbox for this >> 3) Export/Import sourceforge and attach the code to the Jira initially. >> One of the current cTAKES committers can commit it to the repo (Until folks >> can commit directly to the ctakes repo going forward.) >> >> -Original Message- >> From: Peter Klügl [mailto:peter.klu...@averbis.com] >> Sent: Thursday, October 08, 2015 8:06 AM >> To: dev@ctakes.apache.org >> Subject: Re: Combining Knowledge- and Data-driven Methods for >> De-identification of Clinical Narratives >> >> Hi, >> >> I can offer my help here if required. >> >> I have experience in translating JAPE rules to UIMA Ruta and have already >> worked with clinical notes, e.g., also concerning deidentification. >> >> The problem is that I can only invest a few hours in the next two weeks. >> I will have more time next month or even more next year. >> >> What is the current status and procedure? Is there an explicit >> contribution to cTAKES? Is there an ICLA? 
What about the license of the >> sourceforge project? >> >> Best, >> >> Peter >> >> On 01.10.2015 at 16:20, Pei Chen wrote: >>> Hi Azad, >>> This is awesome news. Thanks for adding in the code that was >>> referenced by the paper. I'll create a Jira to track that we need to port >>> it over to UIMA/Ruta. >>> >>> In the meantime, the link is at: >>> http://sourceforge.net/p/clinical-deid/code/ci/master/tree/ >> for those who may be interested in helping out... >>> >>> --Pei >>> >>> Hello Pei, >>> >>> I hope all is well. >>> >>> I have now uploaded the source code for cDeid >>> (http://sourceforge.net/p/clinical-deid/code/ci/master/tree/); I have tried to make the code as portable and modular as possible with >> some trade-off for performance. This should help with porting the code to >> cTAKES/UIMA. >>> >>> Once you let the community know I will try to get involved to help >>> with translating JAPE to RUTA, etc. >>> >>> Best, >>> Azad
Re: ytex DBconsumer and groovy parser
Hi John, there is actually no grand difference between analysis engines and consumers. By default, a UIMA runtime may create multiple instances of an analysis engine and run them in parallel (if the runtime supports that), but a consumer must see all data going through the pipeline, so there can only be one instance. The default value of the "allow multiple instances" flag is the only real difference. Basically, any analysis engine that only reads annotations from the CAS but does not add/change anything is a consumer. Consequently, a consumer can be added anywhere in the pipeline, not only at the end (I sometimes do that to see intermediate results). If a component has the allow-multiple-instances flag set to false (which is usually what you want for a consumer), then runtimes may react to that differently. E.g. the Collection Processing Engine (CPE) will single-thread all components (analysis engines or consumers) after it hits the first component with allow multiple instances set to false (which is typically a consumer). So to make optimal use of the CPE's multi-threading capabilities, such components should be towards the end of the CPE pipeline. I believe UIMA has a Java interface declaration and base classes for CasConsumers - I haven't used these in years. The uimaFIT API doesn't even support these, because everything can also be (and is, within uimaFIT) nicely modeled using analysis engines and the allow-multiple-instances flag. Cheers, -- Richard On 02.07.2014, at 04:01, Masanz, James J. masanz.ja...@mayo.edu wrote: Hi John, Not positive this is the line you are referring to, but there is a line in cTAKES_clinical_pipeline.groovy (which is not in sandbox, btw) that has a comment about createAnalysisEngineDescription expecting the name to not end in .xml even though the filename actually does. I am guessing the comment you see is trying to say the same thing. 
cTAKES_clinical_pipeline.groovy is in ctakes-core/scripts/groovy. In that script, line 321 is where the writer is specified. There is no separately defined consumer in the same sense that the CPE GUI has consumers that are separate from annotators. The script just uses the last annotator as a consumer, and the convention is AFAIK to call them writers in this case. Hope that helps, -- James -Original Message- From: John Green [mailto:john.travis.gr...@gmail.com] Sent: Tuesday, July 01, 2014 7:15 PM To: dev@ctakes.apache.org Subject: ytex DBconsumer and groovy parser If someone has a free minute, which, judging from my own life, is probably not the case - where in the groovy scripts in sandbox do you define the consumer to use? There is one comment that says don't put the .xml here, then there is a path to the dictionary ae. I'm working by ssh from the hospital a lot in my free time in the ICU and running gui CPEs isn't gonna cut it. Apropos the ytex dbconsumer - I should be able to just tack this on to the end of the ytex aggregate pipeline? I'm probably still asking very naive questions, but to date I still haven't had the time to dive into UIMA's base very well, so I apologize. My goal is to run the full ytex pipeline from the command line with the ytex dbconsumer ... Thanks for everyone's patience, John
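The "allow multiple instances" flag discussed above corresponds to the `multipleDeploymentAllowed` operational property in a component's XML descriptor; a typical consumer-style declaration looks like this:

```xml
<operationalProperties>
  <modifiesCas>false</modifiesCas>
  <multipleDeploymentAllowed>false</multipleDeploymentAllowed>
  <outputsNewCASes>false</outputsNewCASes>
</operationalProperties>
```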
Re: suggestion for default pipelines
It would be nice if uimaFIT provided a Maven plugin to automatically generate descriptors for aggregates. Maybe if we come up with a convention for factories (e.g. a class with static methods that do not take any parameters and that return descriptors, or methods that bear a specific Java annotation such as @AutoGenerateDescriptor), it should be possible to implement such a Maven plugin. Cheers, -- Richard On 16.04.2014, at 05:21, Steven Bethard steven.beth...@gmail.com wrote: +1. And note that once you have a descriptor, you can generate the XML, so we should arrange to replace the current XML descriptors with ones generated automatically from the uimaFIT code. That should reduce some synchronization problems where the Java code was changed but the XML descriptor was not. Steve On Tue, Apr 15, 2014 at 8:52 AM, Miller, Timothy timothy.mil...@childrens.harvard.edu wrote: The discussion in the other thread with Abraham Tom gave me an idea I wanted to float to the list. We have been using some UIMAFit pipeline builders in the temporal project that maybe could be moved into clinical-pipeline. For example, look at this file: http://svn.apache.org/viewvc/ctakes/trunk/ctakes-temporal/src/main/java/org/apache/ctakes/temporal/pipelines/TemporalExtractionPipeline_ImplBase.java?view=markup with the static methods getPreprocessorAggregateBuilder() and getLightweightPreprocessorAggregateBuilder() [no umls]. So my idea would be to create a class in clinical-pipeline (CTakesPipelines) with static methods for some standard pipelines (to return AnalysisEngineDescriptions instead of AggregateBuilders?): getStandardUMLSPipeline() -- builds the pipeline currently in AggregatePlaintextUMLSProcessor.xml getFullPipeline() -- same as above but with SRL, constituency parsing, etc., every component in ctakes We could then potentially merge our entry points -- I think Abraham's experience points out that this is currently confusing, as well as probably not implemented optimally. 
For example, either ClinicalPipelineWithUmls or BagOfCUIsGenerator would use that static method to run a uimafit-style pipeline. Maybe we can slowly deprecate our xml descriptors too unless people feel strongly about keeping those around. Another benefit is that the cTAKES API is then trivial -- if you import ctakes into your pom file getting a UIMA pipeline is one UimaFit call: builder.add(CTAKESPipelines.getStandardUMLSPipeline()); I think this would actually be pretty easy to implement, but hoping to get some feedback on whether this is a good direction. Tim
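The envisioned entry point would then reduce to a few uimaFIT calls. This is a sketch of the *proposed* API - `CTakesPipelines` and `getStandardUMLSPipeline()` are names suggested in this thread, not existing classes, and `reader` stands for any collection reader description:

```groovy
// Hypothetical usage of the proposed pipeline factory (names from this thread)
def builder = new AggregateBuilder()
builder.add(CTakesPipelines.getStandardUMLSPipeline())
SimplePipeline.runPipeline(reader, builder.createAggregateDescription())
```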
Re: Represent your project at ApacheCon
Hi Andy, you might find this interesting: https://github.com/jimpil/clojuima -- Richard On 29.01.2014, at 12:24, andy mcmurry mcmurry.a...@gmail.com wrote: I'm hoping to attend. If I were to present, it would be on using NLP to find evidence of DNA mutations causing disease. Interesting topic (for me at least) but I'm not sure if ApacheCON would be into it. PS: the radio silence is because I've been working on both the VM and a wrapper for REST that runs in the JVM (clojure). Clojure, having its origins in LISP, is a better fit for serious NLP work than Groovy; however, Groovy is probably easier for a novice to understand (which is important). These days everyone understands REST, so the idea of providing a VM with REST support for NLP services is highly attractive (to me at least). Does this interest anyone else? --AndyMC
How are cTAKES resources distributed via Maven Central?
Hi, I was looking into whether and how cTAKES distributes resources via Maven Central. I found some artifacts, but I am actually quite a bit confused now. There are the component-res artifacts [1], like ctakes-pos-tagger-res. These have a JAR and a sources-JAR. The JAR is practically empty, but the sources-JAR appears to contain the actual resources. Is there a special reason for this? Additionally, there is the ctakes-resources-distribution, which is distributed as a bin.zip via Maven Central. It appears to contain UMLS data. Has it been replaced by ctakes-resources-umls2011ab in 3.1.1? The ctakes-resources-umls2011ab JAR actually contains data, contrary to the component-res JARs mentioned above. Why is there data in this JAR, but not in the component-res JARs? /me scratching head… Please enlighten me :) Cheers, -- Richard [1] http://search.maven.org/#search%7Cga%7C1%7Cctakes%20res
Re: scala and groovy
); And those four lines still result in the following:

Resolving dependency: org.cleartk#cleartk-util;0.9.2 {default=[default]}
Preparing to download artifact org.cleartk#cleartk-util;0.9.2!cleartk-util.jar
Preparing to download artifact org.apache.uima#uimaj-core;2.4.0!uimaj-core.jar
Preparing to download artifact org.uimafit#uimafit;1.4.0!uimafit.jar
Preparing to download artifact args4j#args4j;2.0.16!args4j.jar
Preparing to download artifact com.google.guava#guava;13.0!guava.jar
Preparing to download artifact com.carrotsearch#hppc;0.4.1!hppc.jar
Preparing to download artifact commons-io#commons-io;2.4!commons-io.jar
Preparing to download artifact commons-lang#commons-lang;2.4!commons-lang.jar
Preparing to download artifact org.apache.uima#uimaj-tools;2.4.0!uimaj-tools.jar
Preparing to download artifact org.springframework#spring-core;3.1.0.RELEASE!spring-core.jar
Preparing to download artifact org.springframework#spring-context;3.1.0.RELEASE!spring-context.jar
Preparing to download artifact org.apache.uima#uimaj-cpe;2.4.0!uimaj-cpe.jar
Preparing to download artifact org.apache.uima#uimaj-document-annotation;2.4.0!uimaj-document-annotation.jar
Preparing to download artifact org.apache.uima#uimaj-adapter-vinci;2.4.0!uimaj-adapter-vinci.jar
Preparing to download artifact org.apache.uima#jVinci;2.4.0!jVinci.jar
Preparing to download artifact org.springframework#spring-asm;3.1.0.RELEASE!spring-asm.jar
Preparing to download artifact commons-logging#commons-logging;1.1.1!commons-logging.jar
Preparing to download artifact org.springframework#spring-aop;3.1.0.RELEASE!spring-aop.jar
Preparing to download artifact org.springframework#spring-beans;3.1.0.RELEASE!spring-beans.jar
Preparing to download artifact org.springframework#spring-expression;3.1.0.RELEASE!spring-expression.jar
Preparing to download artifact aopalliance#aopalliance;1.0!aopalliance.jar

org.codehaus.groovy.control.MultipleCompilationErrorsException: startup failed: General error during conversion: Error 
grabbing Grapes -- [download failed: org.springframework#spring-asm;3.1.0.RELEASE!spring-asm.jar] java.lang.RuntimeException: Error grabbing Grapes -- [download failed: org.springframework#spring-asm;3.1.0.RELEASE!spring-asm.jar] I tried deleting .groovy/grapes/org.springframework but get the same error I don't see this as being friendly for new users if downloading dependencies is not so simple. -Original Message- From: dev-return-2317-Masanz.James=mayo@ctakes.apache.org [mailto:dev-return-2317-Masanz.James=mayo@ctakes.apache.org] On Behalf Of Richard Eckart de Castilho Sent: Friday, December 13, 2013 12:16 PM To: dev@ctakes.apache.org Subject: Re: scala and groovy On 13.12.2013, at 15:27, Steven Bethard steven.beth...@gmail.com wrote: P.S. I've stayed out of this whole Groovy thing because we (at ClearTK) had some bad experiences with Groovy in the past. Mainly with Groovy scripts getting out of sync with the rest of the code base, just like XML descriptors, though perhaps the IDEs and Maven are better now and that's no longer a problem? But this whole grape thing instead of standard Maven isn't changing my mind. Not that I planned to switch away from Scala for my scripting anyway, but... I heard and read about your bad experiences with Groovy. I believe that the IDEs got somewhat better at handling Groovy. However, I think a difference needs to be made depending on the use case. Some people use the XML files as a format to exchange pipelines with each other. However, alone, these files are not of much use. One benefit of using Groovy as a pipeline-exchange format is, that it can actually get all its dependencies itself via Grape. The Groovy script is quite self-contained (although it relies on the Maven infrastructure for downloading its dependencies). Another is, that thanks to uimaFIT, the Groovy code is much less verbose than the XML descriptors. At the UKP Lab, we also use Groovy sometimes for high-level experiment logic. 
For us, it is a good compromise between inflexible and verbose XML files and flexible but verbose Java code. Groovy is flexible and concise, and the IDE support is meanwhile reasonable. Mind that the IDE support for Grapes (at least in Eclipse) is poor: Grapes cause the IDE to become quite unresponsive, as the artifact resolution is not well integrated into the IDE. So here is my summarized opinion on when to use or not to use Groovy: == Examples / Exchange == In order to get quick results for new users and to showcase the capabilities of a component collection such as DKPro Core or cTAKES, I think the Groovy scripts are a convenient vehicle. At DKPro Core, we also packaged all the resources (models) as Maven artifacts, which gives us an additional edge over the manual downloading currently happening in the cTAKES Groovy prototypes. == High-level experiment orchestration == Groovy
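A typical header of such a self-contained Groovy pipeline script might look like this. This is an illustrative sketch: the coordinates match the uimaFIT 1.x / cTAKES 3.x era discussed in this thread, and Grape resolves them from Maven repositories on first run:

```groovy
// Illustrative script header - Grape fetches the dependencies on first run
@Grab(group='org.uimafit', module='uimafit', version='1.4.0')
@Grab(group='org.apache.ctakes', module='ctakes-core', version='3.1.1')
import org.uimafit.pipeline.SimplePipeline
```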
Re: scala and groovy
I can understand your reservations. However, they appear to be similar to the reservations that some people have against using Maven (which also automatically downloads stuff, although for developers) or using web services (e.g. the UMLS service used by cTAKES). A Groovy script is certainly no replacement for a full download, for all the reasons that you are describing. I think it can be a supplement for those who do not want to start out with the full download. It may be possible to combine both approaches, though. E.g. use the same script in a scenario which does auto-downloading and in a scenario where the user has downloaded a distribution. In the second case, the distribution would have to come with proper configuration files to point the artifact resolution mechanism at the folders to which the distribution has been downloaded. It sounds reasonable, but it is probably much less straightforward than it sounds. But eventually, that is part of the idea: that you can trade convenience (auto-downloads) for control (pre-downloaded artifacts). I believe the script approach also shows where resource handling could be improved, e.g. by distributing certain resources as Maven artifacts and/or incorporating the ability to automatically download resources directly in analysis engines. IMHO, there shouldn't be any code which explicitly downloads resources. In DKPro Core, we support both. If a resource is available on the classpath (e.g. by virtue of being a Maven dependency, by being referred to by a @Grab, or by having been downloaded as part of a distribution), it is used from there. Otherwise, our AEs try to automatically download the resource from our Maven repository (unless this is explicitly disabled). In my experience, using technologies like Maven or Grapes in a corporate environment should be supplemented by a private artifact repository run by the corporation, e.g. 
to reduce network issues when talking to external repositories, or to distribute proprietary artifacts (resources, analysis components, or other libraries). Corporate users should then use this repository as a general proxy to access any artifacts. E.g. at the UKP Lab, we run such an internal repository. All our users get all artifacts through there - it caches everything anybody ever used, so we can even continue to use artifacts should the remote repository be down temporarily or permanently, or if artifacts got deleted. We trust in the Maven infrastructure, but we like to have control over the artifacts. Some stuff, like the Groovy scripts, we do only as a service to newbies or for doing small things, e.g. simple conversion pipelines. They are the result of trying to provide some usable examples for people who have reservations against installing Eclipse, setting up Maven, etc. And they appear to be less intimidating than Java to people who know e.g. Python, because they are directly executable and quite readable. I'm not perfectly happy with them, because there is still stuff that is too technical, e.g. all the import statements. Eventually, a similar technology would be nice which consists only of the pipeline declaration (no @Grabs, no imports), but still functions in the same way (including auto-downloads). But that is - just as the pre-deploy scenario - future work ;) Anyway, I would also like to thank you for experimenting with the idea and testing its implications in a corporate environment! -- Richard On 13.12.2013, at 19:46, Masanz, James J. masanz.ja...@mayo.edu wrote: Thanks Richard for doing all that testing. But the idea that we cannot easily get at what is causing the issue, together with the fact that Tim was able to reproduce one of my issues [1], leads me to question using dynamic downloading of anything for our users. 
I would prefer to see a single download that a user extracts from, which I see having the following advantages:

- no mysterious suspected network issues
- user can be told how much space will be taken up
- user has easy control over where things will be put (rather than having to configure where grapes will be stored, if the user does not want them under their home directory)

That's my 2 cents. Yes, I am behind a firewall. And in fact I am VPN'd in to work. But I suspect some of our users do that too. [1] http://markmail.org/message/lgo7eyruotl7nnix -- James -Original Message- From: dev-return-2322-Masanz.James=mayo@ctakes.apache.org [mailto:dev-return-2322-Masanz.James=mayo@ctakes.apache.org] On Behalf Of Richard Eckart de Castilho Sent: Friday, December 13, 2013 3:36 PM To: dev@ctakes.apache.org Subject: Re: scala and groovy Hi James, I enabled info on the grape resolving using export JAVA_OPTS="-Dgroovy.grape.report.downloads=true $JAVA_OPTS" Then I tried your script three times. 1) First, I just ran without any changes to my system (custom grapeConfig.xml which avoids using .m2
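For corporate setups as discussed above, Grape can be pointed at an internal proxy via `~/.groovy/grapeConfig.xml`, which is an Ivy settings file. A sketch, patterned after Grape's default configuration (the proxy URL `repo.example.com` is a placeholder):

```xml
<!-- ~/.groovy/grapeConfig.xml - sketch; repo.example.com is a placeholder -->
<ivysettings>
  <settings defaultResolver="downloadGrapes"/>
  <resolvers>
    <chain name="downloadGrapes">
      <filesystem name="cachedGrapes">
        <ivy pattern="${user.home}/.groovy/grapes/[organisation]/[module]/ivy-[revision].xml"/>
        <artifact pattern="${user.home}/.groovy/grapes/[organisation]/[module]/[type]s/[artifact]-[revision](-[classifier]).[ext]"/>
      </filesystem>
      <ibiblio name="corp-proxy" root="https://repo.example.com/maven2/" m2compatible="true"/>
    </chain>
  </resolvers>
</ivysettings>
```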
Re: cTAKES Groovy...
Might be a temporary network problem. The artifact is on Maven Central: http://search.maven.org/#artifactdetails%7Cedu.mit.findstruct%7Cfindstructapi%7C0.0.1%7Cjar -- Richard On 12.12.2013, at 15:01, Masanz, James J. masanz.ja...@mayo.edu wrote: The story continues: The @GrabResolver line from Richard did the trick for jwnl. But I cleared my .groovy/grapes and .m2/repository and tried running parser.groovy and get the following: org.codehaus.groovy.control.MultipleCompilationErrorsException: startup failed: General error during conversion: Error grabbing Grapes -- [download failed: edu.mit.findstruct#findstructapi;0.0.1!findstructapi.jar] java.lang.RuntimeException: Error grabbing Grapes -- [download failed: edu.mit.findstruct#findstructapi;0.0.1!findstructapi.jar] FYI. I will take a look but if anyone has any hints, don't be shy -Original Message- From: dev-return-2299-Masanz.James=mayo@ctakes.apache.org [mailto:dev-return-2299-Masanz.James=mayo@ctakes.apache.org] On Behalf Of Finan, Sean Sent: Friday, December 06, 2013 2:38 PM To: dev@ctakes.apache.org Subject: RE: cTAKES Groovy... Good stuff - Thanks Richard -Original Message- From: Masanz, James J. [mailto:masanz.ja...@mayo.edu] Sent: Friday, December 06, 2013 3:30 PM To: 'dev@ctakes.apache.org' Subject: RE: cTAKES Groovy... Thanks Richard! That did the trick I'll create a JIRA and update the script including adding a comment that that @GrabResolver is only needed for pre-OpenNLP 1.5.3 and should be removed when we upgrade to 1.5.3+. and I'll update CTAKES-191 Update Apache OpenNLP dependency to 1.5.3 with a reminder to update the script. Trunk of cTAKES still uses 1.5.2-incubating -Original Message- From: dev-return-2297-Masanz.James=mayo@ctakes.apache.org [mailto:dev-return-2297-Masanz.James=mayo@ctakes.apache.org] On Behalf Of Richard Eckart de Castilho Sent: Friday, December 06, 2013 2:12 PM To: dev@ctakes.apache.org Subject: Re: cTAKES Groovy... On 06.12.2013, at 18:01, Masanz, James J. 
masanz.ja...@mayo.edu wrote: I have not solved my issues on my ubuntu server yet, where I get: Error grabbing Grapes -- [unresolved dependency: jwnl#jwnl;1.3.3: not found] This has also already been fixed in OpenNLP 1.5.3, so there must be some dependency on OpenNLP 1.5.(1|2)-incubating. Anyway, you should be able to fix it by adding this to the beginning of your Groovy script, in front of the Grapes: @GrabResolver(name='opennlp.sf.net', root='http://opennlp.sourceforge.net/maven2') -- Richard
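Putting Richard's one-liner in context, the head of a Grape-based script would then look roughly like this (a sketch; the @Grab coordinates are illustrative, not the exact ones from the cTAKES scripts):

```groovy
// The @GrabResolver must come before the @Grab annotations so that Grape
// knows about the extra repository when it resolves the dependencies below.
@GrabResolver(name='opennlp.sf.net', root='http://opennlp.sourceforge.net/maven2')
@Grab(group='org.apache.opennlp', module='opennlp-tools', version='1.5.2-incubating')
import opennlp.tools.postag.POSTaggerME

// ... rest of the pipeline script ...
```

Once cTAKES moves to OpenNLP 1.5.3+ (whose dependencies are all on Maven Central), the resolver line can simply be deleted.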
Re: cTAKES Groovy...
I believe that Grape (like Maven) caches failures. It might be necessary to delete any cached info on that artifact from your local grape repository before you try again. By the way, there you might (or might not) be able to find additional information on why the download failed. Check out the trouble-shooting section in the DKPro Core Groovy recipe page: http://code.google.com/p/dkpro-core-asl/wiki/DKProGroovyCookbook#Trouble_shooting on cache flushing and on enabling verbose information on Grape downloads. -- Richard On 12.12.2013, at 15:22, Masanz, James J. masanz.ja...@mayo.edu wrote: Shouldn't be firewall - other grapes download fine. I created a short groovy script to just grab findstructapi - I copy/pasted the @Grab line from the Groovy Grape section of http://search.maven.org/#artifactdetails%7Cedu.mit.findstruct%7Cfindstructapi%7C0.0.1%7Cjar And I still get org.codehaus.groovy.control.MultipleCompilationErrorsException: startup failed: General error during conversion: Error grabbing Grapes -- [download failed: edu.mit.findstruct#findstructapi;0.0.1!findstructapi.jar] java.lang.RuntimeException: Error grabbing Grapes -- [download failed: edu.mit.findstruct#findstructapi;0.0.1!findstructapi.jar] Very odd. My script is simply: #!/usr/bin/env groovy @Grab(group='edu.mit.findstruct', module='findstructapi', version='0.0.1') import java.io.File; if (args.length < 1) { System.out.println("Please specify input directory"); System.exit(1); } System.out.println("Input parm is: " + args[0]); System.exit(0); -Original Message- From: dev-return-2305-Masanz.James=mayo@ctakes.apache.org [mailto:dev-return-2305-Masanz.James=mayo@ctakes.apache.org] On Behalf Of William Karl Thompson Sent: Thursday, December 12, 2013 11:06 AM To: dev@ctakes.apache.org Subject: RE: cTAKES Groovy... Seems unlikely to be the source of your problem, but could it be a firewall issue?
-Original Message- From: Richard Eckart de Castilho [mailto:r...@apache.org] Sent: Thursday, December 12, 2013 11:04 AM To: dev@ctakes.apache.org Subject: Re: cTAKES Groovy... Might be a temporary network problem. The artifact is on Maven Central: http://search.maven.org/#artifactdetails%7Cedu.mit.findstruct%7Cfindstructapi%7C0.0.1%7Cjar -- Richard On 12.12.2013, at 15:01, Masanz, James J. masanz.ja...@mayo.edu wrote: The story continues: The @GrabResolver line from Richard did the trick for jwnl. But I cleared my .groovy/grapes and .m2/repository and tried running parser.groovy and get the following: org.codehaus.groovy.control.MultipleCompilationErrorsException: startup failed: General error during conversion: Error grabbing Grapes -- [download failed: edu.mit.findstruct#findstructapi;0.0.1!findstructapi.jar] java.lang.RuntimeException: Error grabbing Grapes -- [download failed: edu.mit.findstruct#findstructapi;0.0.1!findstructapi.jar] FYI. I will take a look but if anyone has any hints, don't be shy -Original Message- From: dev-return-2299-Masanz.James=mayo@ctakes.apache.org [mailto:dev-return-2299-Masanz.James=mayo@ctakes.apache.org] On Behalf Of Finan, Sean Sent: Friday, December 06, 2013 2:38 PM To: dev@ctakes.apache.org Subject: RE: cTAKES Groovy... Good stuff - Thanks Richard -Original Message- From: Masanz, James J. [mailto:masanz.ja...@mayo.edu] Sent: Friday, December 06, 2013 3:30 PM To: 'dev@ctakes.apache.org' Subject: RE: cTAKES Groovy... Thanks Richard! That did the trick I'll create a JIRA and update the script including adding a comment that that @GrabResolver is only needed for pre-OpenNLP 1.5.3 and should be removed when we upgrade to 1.5.3+. and I'll update CTAKES-191 Update Apache OpenNLP dependency to 1.5.3 with a reminder to update the script. 
Trunk of cTAKES still uses 1.5.2-incubating -Original Message- From: dev-return-2297-Masanz.James=mayo@ctakes.apache.org [mailto:dev-return-2297-Masanz.James=mayo@ctakes.apache.org] On Behalf Of Richard Eckart de Castilho Sent: Friday, December 06, 2013 2:12 PM To: dev@ctakes.apache.org Subject: Re: cTAKES Groovy... On 06.12.2013, at 18:01, Masanz, James J. masanz.ja...@mayo.edu wrote: I have not solved my issues on my ubuntu server yet where Error grabbing Grapes -- [unresolved dependency: jwnl#jwnl;1.3.3: not found] This has also already been fixed in OpenNLP 1.5.3, so there must be some dependency on OpenNLP 1.5.(1|2)-incubating. Anyway, you should be able to fix it by adding this to the beginning of your Groovy script, in front of the Grapes: @GrabResolver(name='opennlp.sf.net', root='http://opennlp.sourceforge.net/maven2') -- Richard
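In concrete terms, the cache flushing Richard recommends amounts to deleting the failing artifact's entry from the local grape cache and re-running with verbose download reporting. A sketch, assuming Grape's default cache location under the home directory:

```shell
# Grape caches resolved (and failed) artifacts under ~/.groovy/grapes,
# one subdirectory per group id. Deleting the entry for the failing
# artifact forces a fresh resolution attempt on the next run.
GRAPE_CACHE="${HOME}/.groovy/grapes"
rm -rf "${GRAPE_CACHE}/edu.mit.findstruct"

# Enable verbose download reporting (the same flag used elsewhere in
# this thread) before re-running the script:
export JAVA_OPTS="-Dgroovy.grape.report.downloads=true ${JAVA_OPTS}"
```

After this, re-running the script either succeeds or at least prints which repository and URL the failed download was attempted from.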
Re: cTAKES Groovy...
I tried with the small script (buh): export JAVA_OPTS="-Dgroovy.grape.report.downloads=true $JAVA_OPTS" HighFire-6:~ bluefire$ ./buh Resolving dependency: edu.mit.findstruct#findstructapi;0.0.1 {default=[default]} Preparing to download artifact edu.mit.findstruct#findstructapi;0.0.1!findstructapi.jar Downloaded 13 Kbytes in 2326ms: [SUCCESSFUL ] edu.mit.findstruct#findstructapi;0.0.1!findstructapi.jar (2311ms) Please specify input directory Looks ok to me. -- Richard On 12.12.2013, at 15:54, Tim Miller timothy.mil...@childrens.harvard.edu wrote: I was able to replicate the error after removing the findstruct directories from my .groovy and .m2 repositories. On 12/12/2013 12:22 PM, Masanz, James J. wrote: Shouldn't be firewall - other grapes download fine. I created a short groovy script to just grab findstructapi - I copy/pasted the @Grab line from the Groovy Grape section of http://search.maven.org/#artifactdetails%7Cedu.mit.findstruct%7Cfindstructapi%7C0.0.1%7Cjar And I still get org.codehaus.groovy.control.MultipleCompilationErrorsException: startup failed: General error during conversion: Error grabbing Grapes -- [download failed: edu.mit.findstruct#findstructapi;0.0.1!findstructapi.jar] java.lang.RuntimeException: Error grabbing Grapes -- [download failed: edu.mit.findstruct#findstructapi;0.0.1!findstructapi.jar] Very odd. My script is simply: #!/usr/bin/env groovy @Grab(group='edu.mit.findstruct', module='findstructapi', version='0.0.1') import java.io.File; if (args.length < 1) { System.out.println("Please specify input directory"); System.exit(1); } System.out.println("Input parm is: " + args[0]); System.exit(0);
Re: cTAKES user interface
Maven allows you to do marvelous things on the CLI, provided you throw in an additional component: Groovy. We did some amazing self-contained Groovy scripts with uimaFIT and DKPro Core which you might find interesting: http://code.google.com/p/dkpro-core-asl/wiki/DKProGroovyCookbook -- Richard On 29.10.2013, at 23:09, Miller, Timothy timothy.mil...@childrens.harvard.edu wrote: I think this is also an area where Maven integration was a small step backwards (I greatly appreciate the steps forward it allowed). I used to run stuff from the command line and in scripts more often, but it's slightly less straightforward setting up the classpath with Maven -- before, you could put a simple "java -cp lib/*.jar <class name>" in a script; now I'm not sure how to go about it using Maven. I'm sure there's a way, but I am afraid of falling down the Maven rabbit hole. Tim On Oct 29, 2013, at 5:53 PM, Chen, Pei wrote: +1 Pan, the short answer is yes - it can be done in the CLI. The problem is that most of us who are already familiar with the nitty gritty are probably doing this with some sort of custom scripts or solution. Cc'ing the dev group to get a fresh perspective; not sure what the easiest approach would be -- run the CPE via command line with default input/output directories, or running a Driver Main Class as part of the examples. --Pei
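For the classpath question Tim raises, Maven itself can do the wiring; two common recipes (the plugin goals are standard Maven plugins, but the main class name below is a placeholder, and both must be run from inside the project checkout where the pom.xml lives):

```shell
# Option 1: let the exec plugin compute the classpath and run a main class.
mvn exec:java -Dexec.mainClass="org.example.DriverMain" -Dexec.args="input/"

# Option 2: dump the dependency classpath to a file once, then use plain
# java -cp in scripts, much like the old "java -cp lib/*.jar" days.
mvn dependency:build-classpath -Dmdep.outputFile=cp.txt
java -cp "target/classes:$(cat cp.txt)" org.example.DriverMain input/
```

Option 2 only needs Maven once per dependency change; afterwards the script is an ordinary java invocation again.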
Re: CTAKES-248- include original covered text of NEs which can't be recovered post if NE is from a disjoint span
What benefit would it have to store a string with some separation character (which may mean that the separation character in the elements may need to be escaped), over using a feature of type FSArray (of Token) pointing to the original segments? Not sure if that is what Karthik meant when referring to fetching the matched atom. -- Richard On 02.10.2013, at 01:46, Karthik Sarma ksa...@ksarma.com wrote: Hmm, couldn't you just fetch the matched atom and use that? Should be the same information (without, I suppose, the original ordering and split). -- Karthik Sarma UCLA Medical Scientist Training Program Class of 20?? Member, UCLA Medical Imaging Informatics Lab Member, CA Delegation to the House of Delegates of the American Medical Association ksa...@ksarma.com gchat: ksa...@gmail.com linkedin: www.linkedin.com/in/ksarma On Tue, Oct 1, 2013 at 12:37 PM, Masanz, James J. masanz.ja...@mayo.edu wrote: Yes, this would help address that multiple permutations example. The new getOriginalText method would return something like Acute|Disease. Right now I'm thinking of just using a vertical bar as the delimiter, to start with at least, but I think it should be configurable. -Original Message- From: dev-return-2067-Masanz.James=mayo@ctakes.apache.org [mailto: dev-return-2067-Masanz.James=mayo@ctakes.apache.org] On Behalf Of Chen, Pei Sent: Tuesday, October 01, 2013 9:38 AM To: dev@ctakes.apache.org Subject: CTAKES-248- include original covered text of NEs which can't be recovered post if NE is from a disjoint span This sounds pretty cool. James, will this address the multiple permutations lookup example: "Acute alcoholic liver disease"? There is a CUI, C0001314: Acute Disease, but if you call getCoveredText() on the UMLSConcept, you would actually get the same "Acute alcoholic liver disease" instead of "Acute Disease". So, there is a new field called getOriginalText() that returns what matched the hit?
-Original Message- From: james-mas...@apache.org [mailto:james-mas...@apache.org] Sent: Monday, September 30, 2013 5:49 PM To: comm...@ctakes.apache.org Subject: svn commit: r1527792 - /ctakes/trunk/ctakes-type-system/src/main/resources/org/apache/ctakes/typesystem/types/TypeSystem.xml Author: james-masanz Date: Mon Sep 30 21:48:01 2013 New Revision: 1527792 URL: http://svn.apache.org/r1527792 Log: CTAKES-248 - for named entities, since the annotation just has the begin and end offset, it is requested to have a way to get the original covered text (especially for disjoint spans) so it is possible to know which words in the covered text were actually used in the matching to the dictionary entry Modified: ctakes/trunk/ctakes-type-system/src/main/resources/org/apache/ctakes/typesystem/types/TypeSystem.xml Modified: ctakes/trunk/ctakes-type-system/src/main/resources/org/apache/ctakes/typesystem/types/TypeSystem.xml URL: http://svn.apache.org/viewvc/ctakes/trunk/ctakes-type-system/src/main/resources/org/apache/ctakes/typesystem/types/TypeSystem.xml?rev=1527792&r1=1527791&r2=1527792&view=diff == Binary files - no diff available.
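Richard's escaping concern can be seen in a toy sketch (not cTAKES code, just an illustration of the trade-off): joining the matched words with a delimiter is lossy whenever a word can itself contain the delimiter, whereas keeping the segments as a list (the analogue of an FSArray feature) preserves them exactly.

```python
def join_with_delimiter(words, sep="|"):
    """Flatten matched words into one string, as getOriginalText() would."""
    return sep.join(words)

matched = ["Acute", "Disease"]
flat = join_with_delimiter(matched)  # "Acute|Disease"

# Round-tripping works only while no word contains the delimiter:
assert flat.split("|") == matched

# With a word that does contain the delimiter, the split comes back
# wrong unless the separator is escaped first:
tricky = ["A|B", "C"]
assert join_with_delimiter(tricky).split("|") != tricky  # yields ["A", "B", "C"]

# Keeping the segments as a list sidesteps escaping entirely:
as_array = list(tricky)
assert as_array == tricky
```

This is why the configurable delimiter James proposes would need either an escaping rule or the guarantee that the delimiter never appears in token text.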
Re: ClearNLP POSTagger
Hi, did you train new models for the ClearNLP/OpenNLP tools? (Maybe I would know if I had followed the past discussion on models more closely…) Cheers, -- Richard On 08.04.2013, at 18:15, Chen, Pei pei.c...@childrens.harvard.edu wrote: Hi, While working on the Dependency Parser/SRL labeler, we also have a POSTagger from ClearNLP. It is fairly simple and I have the code ready (also trained on the same data as the dep parser - MiPaq/SHARP) to be checked in. What do folks think? We can include both Analysis Engines in the ctakes-pos-tagger project. But should we leave the current OpenNLP in the default pipeline, or default to the latest? The ClearNLP POS tagger shows more robust results on unknown words by generalizing lexical features. You can find the reference in this paper: Fast and Robust Part-of-Speech Tagging Using Dynamic Model Selection, Jinho D. Choi, Martha Palmer, Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (ACL'12), 363-367, Jeju, Korea, 2012. [1] It also uses AdaGrad for machine learning, which is a more advanced learning algorithm than the maximum entropy used by OpenNLP. [1] http://aclweb.org/anthology-new/P/P12/P12-2071.pdf -- --- Richard Eckart de Castilho Technical Lead Ubiquitous Knowledge Processing Lab (UKP-TUD) FB 20 Computer Science Department Technische Universität Darmstadt Hochschulstr. 10, D-64289 Darmstadt, Germany phone [+49] (0)6151 16-7477, fax -5455, room S2/02/B117 eck...@ukp.informatik.tu-darmstadt.de www.ukp.tu-darmstadt.de Web Research at TU Darmstadt (WeRC) www.werc.tu-darmstadt.de ---