I'm trying to run count.pl on a directory of Unicode documents (a sample document is attached) using Perl 5 (v5.18.2). The output is a list of digits and punctuation marks, with no Unicode words at all:

2732
.<>1589
:<>626
2<>19
!<>17
10<>16
4<>14
13<>13
12<>13
20<>12
9<>11
15<>11
3<>10
5<>10

Is it possible to make count.pl tokenize the input files by spaces only?
There is a --token option which may be useful, but I don't know how to use it.