Re: readable POS tags

2014-05-07 Thread Daniel Naber
On 2014-04-08 14:44, Daniel Naber wrote: I have now added a branch (readable-pos-tags) for this, simply because the changes are getting so complex. It's still incomplete and buggy. As you may have noticed, I did some work in this branch. You can see it at

Re: readable POS tags

2014-05-07 Thread Daniel Naber
On 2014-05-07 19:07, Marcin Miłkowski wrote: unification. I still don't get why German doesn't use it for disambiguation, for example. Maybe because nobody has seen an urgent need for that yet. I don't work that much on the German rules, but I'm generally okay with the way they work.

Re: readable POS tags

2014-04-08 Thread Marcin Miłkowski
On 2014-04-07 23:01, Daniel Naber wrote: On 2014-03-25 09:35, Daniel Naber wrote: I've written an overview of how we could use readable POS tags in LT: http://wiki.languagetool.org/readable-part-of-speech-tags I'm writing a prototype implementation of this for English now. But

Re: readable POS tags

2014-04-08 Thread Daniel Naber
On 2014-04-08 08:43, Marcin Miłkowski wrote: Internally, we now have information like this: postag=VBD, pos=verb, tense=past (etc.). But the disambiguation only works on the old tag? I guess I will need to resolve VBD here so the action works on both the old and the new representation? I
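
A minimal sketch of the idea raised above, in Python rather than LanguageTool's Java: each reading keeps the old tag (postag=VBD) next to the interpreted features (pos=verb, tense=past), so a rule can match on either representation. The names `TAG_FEATURES`, `Reading`, `analyze` and `matches` are hypothetical and are not LanguageTool API; only the VBD example comes from the thread.

```python
import re
from dataclasses import dataclass, field

# Hypothetical lookup table; only the VBD entry is taken from the thread.
TAG_FEATURES = {
    "VBD": {"pos": "verb", "tense": "past"},
}


@dataclass
class Reading:
    """One token reading that carries both representations."""
    postag: str                                   # old, cryptic tag
    features: dict = field(default_factory=dict)  # new, readable features


def analyze(postag):
    """Build a reading, resolving the old tag into features when it is known."""
    return Reading(postag, dict(TAG_FEATURES.get(postag, {})))


def matches(reading, postag_regex=None, **wanted):
    """Match on the old tag (as a regex), on readable features, or on both."""
    if postag_regex is not None and not re.fullmatch(postag_regex, reading.postag):
        return False
    return all(reading.features.get(key) == value for key, value in wanted.items())


reading = analyze("VBD")
print(matches(reading, postag_regex="VB.*"))       # old-style rule
print(matches(reading, pos="verb", tense="past"))  # new-style rule
```

Under this assumption an existing rule written against `VB.*` and a new rule written against `pos=verb` would select the same reading, which is one way the old and new representations could coexist.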

Re: readable POS tags

2014-04-07 Thread Daniel Naber
On 2014-03-25 09:35, Daniel Naber wrote: I've written an overview of how we could use readable POS tags in LT: http://wiki.languagetool.org/readable-part-of-speech-tags I'm writing a prototype implementation of this for English now. But there's one point where I'm stuck. Maybe I'm

Re: readable POS tags

2014-03-26 Thread Daniel Naber
On 2014-03-25 21:59, Dominique Pellé wrote: power compared to using regexp. Power users know regexp well as they are used in many programs so they don't have to learn something new. Power users also like the conciseness of regexp. As you said, the old way of matching will still be there for

Re: readable POS tags

2014-03-26 Thread Dave Pawson
On 26 March 2014 10:49, Daniel Naber daniel.na...@languagetool.org wrote: On 2014-03-25 21:59, Dominique Pellé wrote: power compared to using regexp. Power users know regexp well as they are used in many programs so they don't have to learn something new. Power users also like the

Re: readable POS tags

2014-03-26 Thread Daniel Naber
On 2014-03-25 14:24, Marcin Miłkowski wrote: So instead of just adding the POS tag we get from Morfologik to our AnalyzedToken object as a string, we interpret it and store something like pos = preposition, case = accusative. Is that what you mean? Exactly. Any ideas on how the VBP tag

RE: readable POS tags

2014-03-26 Thread Mike Unwalla
I agree that backward compatibility is important. Without backward compatibility, the proposed change means that the content of disambiguation files and grammar files must be changed. That is a huge task. Even if you develop a utility that lets people convert files to the new format, there

Re: readable POS tags

2014-03-26 Thread Marcin Miłkowski
On 2014-03-26 13:51, Daniel Naber wrote: On 2014-03-25 14:24, Marcin Miłkowski wrote: So instead of just adding the POS tag we get from Morfologik to our AnalyzedToken object as a string, we interpret it and store something like pos = preposition, case = accusative. Is that what you

Re: readable POS tags

2014-03-26 Thread Marcin Miłkowski
On 2014-03-26 15:20, Mike Unwalla wrote: I agree that backward compatibility is important. Without backward compatibility, the proposed change means that the content of disambiguation files and grammar files must be changed. That is a huge task. Even if you develop a utility that lets

Re: readable POS tags

2014-03-26 Thread Daniel Naber
On 2014-03-26 17:49, Marcin Miłkowski wrote: No, that would be horrible, as this is not an improvement. The problem is not that tags are cryptic and short; That's also a problem, but not so much for power users and for everybody else we will be able to solve that in the user interface (i.e.

Re: readable POS tags

2014-03-25 Thread Dave Pawson
On 25 March 2014 08:35, Daniel Naber daniel.na...@languagetool.org wrote: Hi, I've written an overview of how we could use readable POS tags in LT: http://wiki.languagetool.org/readable-part-of-speech-tags The core part however - what these new POS tags actually look like - is still

Re: readable POS tags

2014-03-25 Thread Daniel Naber
On 2014-03-25 11:07, Marcin Miłkowski wrote: For all I can see, no HashMaps are required at all, just a consistent way of understanding the values in class members. So instead of just adding the POS tag we get from Morfologik to our AnalyzedToken object as a string, we interpret it and store
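
To illustrate the point about class members instead of HashMaps, here is a rough sketch in Python (not LanguageTool's Java code) of a reading whose fields hold interpreted values such as pos = preposition and case = accusative. The colon-separated tag codes and the names `Pos`, `Case`, `TokenReading` and `interpret` are assumptions made for the example, not the actual Morfologik tag set or the AnalyzedToken API.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Pos(Enum):
    PREPOSITION = "preposition"
    VERB = "verb"


class Case(Enum):
    ACCUSATIVE = "accusative"
    DATIVE = "dative"


# Hypothetical tag codes, for illustration only.
POS_CODES = {"prep": Pos.PREPOSITION, "verb": Pos.VERB}
CASE_CODES = {"acc": Case.ACCUSATIVE, "dat": Case.DATIVE}


@dataclass
class TokenReading:
    """Interpreted reading: plain typed members rather than a generic map."""
    raw_tag: str                             # original tag string, kept as-is
    pos: Optional[Pos] = None
    grammatical_case: Optional[Case] = None


def interpret(raw_tag):
    """Split the raw tag and fill in the members that can be recognized."""
    reading = TokenReading(raw_tag)
    for part in raw_tag.split(":"):
        if part in POS_CODES:
            reading.pos = POS_CODES[part]
        elif part in CASE_CODES:
            reading.grammatical_case = CASE_CODES[part]
    return reading


reading = interpret("prep:acc")
print(reading.pos.value, reading.grammatical_case.value)
```

Keeping the raw tag alongside the typed members means existing code that compares tag strings would keep working while new code reads the members.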

Re: readable POS tags

2014-03-25 Thread Marcin Miłkowski
On 2014-03-25 13:29, Daniel Naber wrote: On 2014-03-25 11:07, Marcin Miłkowski wrote: For all I can see, no HashMaps are required at all, just a consistent way of understanding the values in class members. So instead of just adding the POS tag we get from Morfologik to our AnalyzedToken

Re: readable POS tags

2014-03-25 Thread Dominique Pellé
Daniel Naber wrote: Hi, I've written an overview of how we could use readable POS tags in LT: http://wiki.languagetool.org/readable-part-of-speech-tags The core part however - what these new POS tags actually look like - is still missing. Any input on the overview and ideas about that