Gazs, wordBreaker is what you're looking for. Unfortunately, working with strongly case-marked languages like Hungarian involves a number of issues, only the first of which is writing a custom wordBreaker to "de-affix" the arguments. There's an explanation here:
http://mitcho.com/blog/projects/in-case-of-case/

mitcho

> Hello,
>
> I'm toying around with creating a Hungarian language parser for
> Ubiquity, but I have a big problem: how can I tell Ubiquity that not
> only is Hungarian left-branching, the suffixes (which show the roles)
> are glued to the end of the words (...which sometimes assimilate as
> well, but that's a later problem).
>
> There are two ways I thought I could make it work with Ubiquity. The
> wordBreaker function from the Japanese parser seems unfortunately too
> rigid (it mercilessly chops off everything that looks like a suffix).
> The other function that seemed like it could work was the
> normalizeArgument found in romance language parsers, but I couldn't
> make it work. Would this be what I'm looking for?
>
> Thanks for any help,
> Gazs
>
> --
>
> You received this message because you are subscribed to the Google Groups
> "ubiquity-firefox" group.
> To post to this group, send email to [email protected].
> To unsubscribe from this group, send email to
> [email protected].
> For more options, visit this group at
> http://groups.google.com/group/ubiquity-firefox?hl=en.

--
mitcho (Michael 芳貴 Erlewine)
[email protected]
http://mitcho.com/
linguist, coder, teacher
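To make the "de-affixing" idea a bit more concrete, here is a rough, standalone sketch of a less rigid word breaker. It does not reproduce Ubiquity's actual parser interface: the function names, the suffix list, the minimum stem length, and the use of a zero-width space as the separator are all assumptions made for illustration. It only splits a suffix off the end of a word and leaves very short words alone, rather than chopping at everything that looks like a suffix.

```javascript
// Hypothetical, deliberately incomplete list of Hungarian case suffixes.
// Vowel harmony is covered only by listing variants; assimilation is ignored.
const CASE_SUFFIXES = [
  'ban', 'ben', 'nak', 'nek', 'val', 'vel',
  'hoz', 'hez', 'höz', 'ból', 'ből', 'ra', 're'
];

// Try longer suffixes first so a short one never shadows a longer match.
const SORTED_SUFFIXES = [...CASE_SUFFIXES].sort((a, b) => b.length - a.length);

// Assumed threshold: a stem must keep at least this many characters.
const MIN_STEM_LENGTH = 2;

// Assumed separator: a zero-width space between stem and suffix.
const ZWSP = '\u200b';

// Split one word into stem + separator + suffix, or return it unchanged.
function breakWord(word) {
  const lower = word.toLowerCase();
  for (const suffix of SORTED_SUFFIXES) {
    const stemLength = word.length - suffix.length;
    if (stemLength >= MIN_STEM_LENGTH && lower.endsWith(suffix)) {
      return word.slice(0, stemLength) + ZWSP + word.slice(stemLength);
    }
  }
  return word;
}

// Apply breakWord to every whitespace-separated token, keeping the spacing.
function hungarianWordBreaker(input) {
  return input
    .split(/(\s+)/)
    .map(token => (/^\s*$/.test(token) ? token : breakWord(token)))
    .join('');
}
```

Under these assumptions, `hungarianWordBreaker('a házban')` separates `ház` from `ban`, while a bare `ra` is left untouched. Stems that merely happen to end in a suffix-like string would still be split, so a real version would probably need a lexicon check or some feedback from the parser's scoring.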
