On 26/07/2011 09:06, Jörn Kottmann wrote:
On 7/25/11 11:07 PM, mark meiklejohn wrote:
Hi,
I'm moving from 1.3.1 to 1.5.1. I can get 1.5.1 up and running
fine with the examples. However, there are some features missing and
I'm wondering how I can go about incorporating/instantiating them.
Typically, I used the TreebankParser as it gives me a nice structure to
traverse, but that seems to have gone AWOL or has been replaced by the
POSModels.
Do you need to parse a sentence, or do you only want to do
part-of-speech tagging? If you only do pos tagging you should
only use the pos tagger, because it is much faster.
I agree it is much faster but I need the full parse.
First off, I'm looking to use the 'tagdict' that was with 1.3.1 and its
case-insensitive mode. The reason is that I have no control over the
input that I will be processing.
So it is entirely possible that the input I receive could be
all in capitals, e.g. "I NEED OPENNLP TO BE ABLE TO PROCESS IN CASE
INSENSITIVE MODE". In this case 1.5.1 typically tags the
majority of these tokens as NNPs, as would 1.3.1, which is no good. But since
1.3.1 could process in case-insensitive mode, it gave me a better parse
structure for it.
Now I can't just reduce everything to lower case as it comes through,
as this may have knock-on effects. So is there a way to achieve what I
want?
If someone knows how to go about instantiating what I'm looking for, an
example would be greatly appreciated.
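
To make the request concrete, here is a minimal sketch of what a
case-insensitive tag dictionary lookup means. It is not OpenNLP's
POSDictionary and does not use any OpenNLP API; the class name
CaseInsensitiveTagDictionary, its methods, and the sample entries are all
hypothetical. The idea it illustrates is that only the lookup key is
lower-cased, so the token itself passes through unchanged.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration only; this is NOT OpenNLP's POSDictionary.
class CaseInsensitiveTagDictionary {
    private final Map<String, List<String>> tagsByWord = new HashMap<>();
    private final boolean caseSensitive;

    CaseInsensitiveTagDictionary(boolean caseSensitive) {
        this.caseSensitive = caseSensitive;
    }

    // Normalize the lookup key only; the caller's token is never modified.
    private String key(String word) {
        return caseSensitive ? word : word.toLowerCase(Locale.ROOT);
    }

    // Register the tags a word may take (sample tags are assumptions).
    void put(String word, String... tags) {
        tagsByWord.put(key(word), Arrays.asList(tags));
    }

    // Returns the allowed tags for a word, or null if the word is unknown.
    List<String> getTags(String word) {
        return tagsByWord.get(key(word));
    }

    public static void main(String[] args) {
        CaseInsensitiveTagDictionary dict = new CaseInsensitiveTagDictionary(false);
        dict.put("need", "VBP", "NN");
        dict.put("process", "VB", "NN");
        // All-caps tokens still find their lower-case dictionary entries.
        for (String token : "I NEED OPENNLP TO PROCESS".split(" ")) {
            System.out.println(token + " -> " + dict.getTags(token));
        }
    }
}
```

With the flag set to false, "NEED" and "need" resolve to the same entry,
so an all-caps sentence is not forced to fall back to NNP for every known
word. With the flag set to true, only an exact-case match is found.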
Just had a look at the code. Looks like the case sensitive flag doesn't
work correctly with the pos dictionary we currently have.
It is not possible to set it to false.
Do you want to open a jira?
I'll raise an issue through Jira.
It should be fixed for 1.5.2, which will be released soon.
Jörn