Laurent Aguerreche wrote:
> On Wednesday 15 November 2006 at 18:05 +0100, Javier Arantegui wrote:
>> Hi,
>>
>> I have several documents in which "Ziegler-Nichols" appears. I can find
>> the documents by searching for "ziegler-nichols", but I cannot find
>> anything if I search for "ziegler" or "nichols". Is there any way to
>> find them by searching for "Ziegler"?
>>
>> I'm using tracker-search-tool (0.5.1) and tracker (0.5.1).
>
> Currently it is not possible, and that is not normal... I do not know whether
> QDBM (which stores file names associated with keywords) can be set to
> split a string like "ziegler-nichols" into "ziegler" and "nichols"
> automatically for searching, or whether we need to split strings ourselves.

We would need to do this ourselves, as QDBM is just a hash table. I'm not sure we should, though: underscores and hyphens are currently not treated as word breaks. It would be possible to do both, that is, index the hyphenated term as well as its individual parts. I will look into this, as we need to do it for filenames anyhow.

> What I also dislike about libstemmer (which aims to "reduce" strings to
> their stems, to ignore plurals for instance) is that it does not ignore
> accented characters, so if I have a file which contains "éléphant",
> then "élephant" or "elephant" will not be found. "éléphant" is the
> correct spelling, but French people very often miss
> some accents or add superfluous ones... and it is the same problem in
> other languages.

I am surprised that is happening, because we normalize all UTF-8 strings before stemming. (Stemming can be turned off, or set to French by setting the language code to "fr" in the config file, but I am not sure the stemmer is the cause.) Obviously, misspelt words or incorrect accents will be problematic, but I am not sure how to get around that.

--
Mr Jamie McCracken
http://jamiemcc.livejournal.com/
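
To make the two ideas in this thread concrete, here is a rough sketch in Python (not tracker's actual C code, and not anything QDBM or libstemmer provides). It indexes a hyphenated or underscored term both whole and as its parts, and it folds accents by decomposing each character and dropping the combining marks. Note that Unicode normalization on its own keeps the accents, so the stripping step is a separate, deliberate choice. The function names fold_accents and index_terms are made up for this illustration.

```python
import re
import unicodedata


def fold_accents(text):
    """Lowercase the text and drop combining marks (hypothetical helper)."""
    # NFKD splits "é" into "e" plus a combining acute accent (U+0301).
    decomposed = unicodedata.normalize("NFKD", text.lower())
    # Keep only characters that are not combining marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def index_terms(text):
    """Return the set of terms to index for a piece of text (hypothetical helper)."""
    terms = set()
    # A word is a run of letters/digits, optionally joined by "-" or "_".
    for word in re.findall(r"[^\W_]+(?:[-_][^\W_]+)*", fold_accents(text)):
        terms.add(word)                         # the joined form
        terms.update(re.split(r"[-_]", word))   # each individual part
    return terms
```

With this, index_terms("Ziegler-Nichols") yields "ziegler-nichols", "ziegler" and "nichols", so a search for any of the three would hit the document, provided the query is passed through fold_accents too. The cost is a larger index, and accent folding can merge words that differ only by an accent, so it would probably need to be a config option.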
