Steve Bethard wrote:
I spent some time writing a script for diff-ing CASes
I urge anyone interested in comparing cTakes CASes / output to use this type of
approach. Comparison of program output is a post-process task, and unless
absolutely necessary, code to juggle data and metadata belongs
Hi Kim,
One might want to compare the sentence detector that uses end-of-line characters
as sentence splitters with one that does not. Such a change in sentence
splitting would affect not only the sentence type discoveries but also
practically every type that follows.
Another might want to
The option Sean mentioned of writing your own custom consumer (without the UIMA
id that is causing your issues) should meet these needs I believe.
Britt Fitch
Wired Informatics
265 Franklin St Ste 1702
Boston, MA 02110
http://wiredinformatics.com
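The custom-consumer option mentioned above can be illustrated with a minimal sketch. It is not UIMA or cTakes code: `Annotation`, `uima_id`, and `canonical_lines` are hypothetical names, and the type names are only illustrative. The idea is simply that the writer leaves out the internal ID and emits annotations in a stable order, so two runs that differ only in IDs or ordering produce identical output.

```python
from typing import Iterable, List, NamedTuple


class Annotation(NamedTuple):
    """Hypothetical flattened view of one annotation (not a UIMA class)."""
    uima_id: int    # internal ID; may differ from run to run
    type_name: str
    begin: int
    end: int
    text: str


def canonical_lines(annotations: Iterable[Annotation]) -> List[str]:
    """Render annotations without the internal ID, sorted by offsets and type."""
    rows = sorted((a.begin, a.end, a.type_name, a.text) for a in annotations)
    return [f"{b}\t{e}\t{t}\t{x}" for b, e, t, x in rows]


# Two made-up runs over "Patient has fever." that differ only in IDs and order.
run_a = [
    Annotation(12, "Sentence", 0, 18, "Patient has fever."),
    Annotation(47, "SignSymptomMention", 12, 17, "fever"),
]
run_b = [
    Annotation(301, "SignSymptomMention", 12, 17, "fever"),
    Annotation(9, "Sentence", 0, 18, "Patient has fever."),
]

print(canonical_lines(run_a) == canonical_lines(run_b))
```

Files written this way can be compared with any ordinary text diff tool, without touching the pipeline code itself.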
Hi Sean,
Well, of course that makes plenty of sense. When testing different cTakes
configurations, you would expect different output. In our testing we've
found several cases where running with the same configuration outputs
different data under different moons. Having consistent results helps us
know
I think we may really prefer the first method. Since changing the code
doesn't appear to have any adverse consequences, we would really like to
move forward with this approach.
Kim Ebert
1.801.669.7342
Perfect Search Corp
http://www.perfectsearchcorp.com/
On 10/07/2014
FWIW, I agree with Sean that comparing should be a post-processing step and
trying to get UIMA internal IDs to match on subsequent runs is not worth
opening the code for.
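A minimal sketch of comparison as a post-processing step, under stated assumptions: each run has already been exported to a list of plain dictionaries with `id`, `type`, `begin`, `end`, and `text` keys. That record layout and the `diff_runs` helper are hypothetical and are not part of cTakes or UIMA. The comparison key leaves out `id`, so differing internal IDs never show up as differences.

```python
from collections import Counter


def diff_runs(old, new):
    """Return (only_in_old, only_in_new), ignoring each record's 'id' field."""
    def key(record):
        return (record["type"], record["begin"], record["end"], record["text"])

    old_counts = Counter(key(r) for r in old)
    new_counts = Counter(key(r) for r in new)
    # Counter subtraction keeps only positive counts, i.e. the unmatched records.
    only_old = sorted((old_counts - new_counts).elements())
    only_new = sorted((new_counts - old_counts).elements())
    return only_old, only_new


# Made-up output for the text "Fever\nNo cough.": one run splits sentences
# on end-of-line characters, the other does not.
with_eol = [
    {"id": 1, "type": "Sentence", "begin": 0, "end": 5, "text": "Fever"},
    {"id": 2, "type": "Sentence", "begin": 6, "end": 15, "text": "No cough."},
    {"id": 3, "type": "WordToken", "begin": 0, "end": 5, "text": "Fever"},
]
without_eol = [
    {"id": 70, "type": "WordToken", "begin": 0, "end": 5, "text": "Fever"},
    {"id": 71, "type": "Sentence", "begin": 0, "end": 15, "text": "Fever\nNo cough."},
]

only_old, only_new = diff_runs(with_eol, without_eol)
print(only_old)
print(only_new)
```

The `WordToken` record matches across runs even though its ID changed, so only the sentence differences are reported.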
-Original Message-
From: Kim Ebert [mailto:kim.eb...@perfectsearchcorp.com]
Sent: Tuesday, October 07, 2014 10:56
I think changing the code raises at least some concerns of affecting others,
while adding a custom consumer raises zero. Given how easy it is to write a
custom consumer, that is my vote.
Britt Fitch
Wired Informatics
265 Franklin St Ste 1702
Boston, MA 02110
I think it would be helpful actually, as digging deeper into the issue
has highlighted to me a few places in the code that actually cause
inconsistent results to be returned when running the same document
through multiple times. I think having the code base be predictable will
make it easier to
It concerns me a bit that making the code return consistent results would
be so concerning. This should be the default mode of operation.
Kim Ebert
1.801.669.7342
Perfect Search Corp
http://www.perfectsearchcorp.com/
On 10/07/2014 09:59 AM, britt fitch wrote:
I think changing the code raises at
Jay,
I agree. This does lead to reproducible unit tests, which helps us out
in the long term.
Kim Ebert
1.801.669.7342
Perfect Search Corp
http://www.perfectsearchcorp.com/
On 10/06/2014 05:38 PM, jay vyas wrote:
I'm not a cTakes expert by any means, but in general, I like that idea
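The reproducibility point above could be checked with a small repeat-run helper. This is only a sketch: `distinct_outputs`, `toy_pipeline`, and `flaky_pipeline` are hypothetical stand-ins rather than cTakes code, and the flaky variant is deliberately contrived so that the check has something to catch.

```python
import itertools
import re


def distinct_outputs(process, document, runs=3):
    """Run process(document) several times and collect the distinct results."""
    seen = []
    for _ in range(runs):
        result = process(document)
        if result not in seen:
            seen.append(result)
    return seen


def toy_pipeline(text):
    """Stand-in for a real pipeline: whitespace tokens with their offsets."""
    return [(m.start(), m.end(), m.group()) for m in re.finditer(r"\S+", text)]


_calls = itertools.count()


def flaky_pipeline(text):
    """Deliberately unstable stand-in: drops a different token on each call."""
    tokens = toy_pipeline(text)
    skip = next(_calls) % len(tokens)
    return tokens[:skip] + tokens[skip + 1:]


stable = distinct_outputs(toy_pipeline, "Patient has fever.")
unstable = distinct_outputs(flaky_pipeline, "Patient has fever.")
print(len(stable), len(unstable))
```

A unit test would assert that exactly one distinct output is seen. Repeating runs inside one process will not reveal variation that only appears between separate processes or machines, so a check like this is a lower bound rather than a guarantee.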
Hi Kim,
It concerns me a bit that making the code return consistent results would be so
concerning.
Could you please clarify what you mean by consistent results? Do you mean
ordering and IDs or are you talking about actual type values not matching?
This should be the default mode of
Hi Sean,
No, you're not a jerk. These are things worth considering, and I
understand your concerns with touching various points of the codebase.
I'll talk with our group over here and see where we want to go. We are
really interested in cTakes behaving well, so we are usually pretty
careful in
Hi Sean,
Yes, I mean actual type values not matching.
Kim Ebert
1.801.669.7342
Perfect Search Corp
http://www.perfectsearchcorp.com/
On 10/07/2014 10:46 AM, Finan, Sean wrote:
Hi Kim,
It concerns me a bit that making the code return consistent results would be
so concerning.
Could you
I did not intend to step on anyone's toes.
One of the reasons I proposed the changes was to try to make it extremely
obvious when there are significant differences in output from the cTakes
pipeline when running the same document again, and once identified, make it
easier to identify the source of
I'm just about sapped on this topic. What comes below is my final writing.
Kim wrote:
Yes, I mean actual type values not matching.
Ok, this is a very serious problem and should have nothing to do with ordering
and/or IDs. I repeat: this should have nothing to do with ordering or ids.
Hi Bruce,
Could you send the record over that you are seeing this on?
Thanks,
Kim Ebert
1.801.669.7342
Perfect Search Corp
http://www.perfectsearchcorp.com/
On 10/07/2014 11:20 AM, Bruce Tietjen wrote:
I did not intend to step on anyone's toes.
One of the reasons I proposed the changes was
Hi Kim,
Great catch!
I think that by now this thread may be discarded by most as spam. So, I'm back
(apologies - I know that you are tired of me by now).
I checked the code that you pointed to ... I really dislike looking at older
cTakes code because I'm filled with an overwhelming urge to
Hi Sean,
Alright, it seems that rather than doing the sorted approach, we want to
manage these individually. I'll create tickets on all of the items we
have found so far. This is just one example. Then maybe we can move our
discussion of how to solve each one to discussions around that ticket