> >> > > So, with this improvement considering ASCII-only words a special case,
> >> > > libunistring really beats them all.
> >> > >
> >> >
> >> > yeah libunistring looks like good stuff - I must check the source!
> >> >
> >> > I still note you need to apply word filtering rules on words beginning
> >> > with numbers or symbols - Im sure thats easy to do?
> >> >
> >>
> >> Probably words starting with symbols other than underscore can be
> >> avoided. BTW, Why underscore not?
> >
> > we only allowed underscore as some function names start with underscore
> > in source files
> >
> >>
> >> And regarding filtering numbers, is this something we want to do?
> >> There's a bugreport regarding this:
> >> https://bugzilla.gnome.org/show_bug.cgi?id=503366
> >
> > most numbers are junk - especially in source files and would bloat up
> > the index.
> >
> > we used to have an option where if a number was longer than x characters
> > we would accept it (on the grounds it was a telephone number and
> > therefore actually useful - im not sure if this preference is still
> > available or used)
>
> An interesting limitation of that is the convention of writing numbers
> like this (012) 345 6789.
>
Yes, you are quite right. The best option is then probably to make the
number filtering configurable, and disabled by default. There are probably
many use cases we are not aware of that need full numbers to be parsed,
and making that configurable is not much work...

_______________________________________________
tracker-list mailing list
[email protected]
http://mail.gnome.org/mailman/listinfo/tracker-list
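
To make the rules discussed in this thread concrete, here is a rough
sketch in Python of how such a filter could look. It is only an
illustration, not Tracker code: FilterConfig, keep_word and both option
names are hypothetical, and the threshold of 6 is an arbitrary stand-in
for the "x characters" mentioned above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FilterConfig:
    # Hypothetical option names, not actual Tracker settings.
    filter_numbers: bool = False      # number filtering disabled by default
    number_length_threshold: int = 6  # assumed value for "x characters"


def keep_word(word: str, config: FilterConfig = FilterConfig()) -> bool:
    """Return True if the word should go into the index."""
    if not word:
        return False
    first = word[0]
    # Letters and a leading underscore are always accepted; the underscore
    # is allowed because function names in source files may start with it.
    if first == "_" or first.isalpha():
        return True
    # Words starting with a digit are only filtered when the option is on,
    # and even then numbers longer than the threshold are kept.
    if first.isdigit():
        if not config.filter_numbers:
            return True
        return len(word) > config.number_length_threshold
    # Words starting with any other symbol are dropped.
    return False
```

With filtering switched on, a number written as (012) 345 6789 would
reach the filter as the short pieces 012, 345 and 6789, each of which is
under the threshold and gets dropped. That is the limitation pointed out
above, and one more reason to leave the filter off unless asked for.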
