On Wed, 19 Feb 2003, [EMAIL PROTECTED] wrote:

> My problem is that these rules assume that the lemma can't contain
> any of the characters [+:_]. Unfortunately, sometimes in the input,
> I find cases where it happens:
>
>   |50:50:42_MC| or |+50dollar:23_FO|
>
> where the lemmas are (resp.) "50:50" and "+50dollar".
>
> However, if I change the Lemma rule to
>
>   Lemma: /[^\\|\\s\\"]+/i { $item[1] }
>
> I then never match any Morpho, Position or POSpeech anymore, as what
> they contain perfectly fits into the requirements for the Lemma

I thought of a few possibilities, but your input is just too ambiguous.
Can you define your input a little better? For instance, can there be
a ':' after the one that ends a Lemma? Your current syntax implies that
there can be a ':' in POSpeech, which would make the parsing you want
impossible.

If possible, I would simply replace the 3 different separators with ','
(or some other special character that won't occur in your input) and
use '|' for the group boundaries. Two special characters are enough;
I'm not sure why you need 4, especially when some of them can occur in
your input.

Your input would then look like this:

  |50:50,42,,MC||record2||record3||...|

or, with newlines replacing '|' and '|' replacing ',':

  50:50|42||MC
  record2
  record3
  ...

HTH
Ted
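
For illustration only, here is a minimal sketch of reading the
newline-delimited layout suggested above. It is written in Python rather
than as a Parse::RecDescent grammar. The function name parse_records and
the second sample record are hypothetical, and the sketch assumes that no
field ever contains '|' or a newline:

```python
def parse_records(text, field_sep="|"):
    """Split newline-delimited records into lists of fields.

    Assumes (hypothetically) that field_sep and newlines never occur
    inside a field, which is the whole point of the proposed format.
    """
    records = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines between records
        # Empty fields are preserved, so positions stay meaningful.
        records.append(line.split(field_sep))
    return records


# First record is the example from the message; the second is an
# assumed conversion of |+50dollar:23_FO| into the same layout.
sample = "50:50|42||MC\n+50dollar|23||FO\n"

for fields in parse_records(sample):
    print(fields)
```

Because the lemma is simply the first field, it may contain ':', '+' or
'_' freely; only the separator itself has to be kept out of the data.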