Hi Mildred,

> There is the special function fts:offsets that returns the offsets
> where the query was found in a text property.
>
> I was wondering, how can I later on take this offset and select an
> extract to display where the text is found? I first thought that the
> offset was a character offset and tried to split using that, but then
> I noticed that didn't work. It's probably a word offset.
>
> Do you know how I could split the text into words? Is there a
> known algorithm (regular expression?). Or is there a special SPARQL
> function that can return the list of words in the text, with all
> punctuation and non-words removed?
>
> Or better yet, is it possible to have an offset in bytes or characters?
I think you have asked *all* the right questions :-)

FTS, even though it is probably the best-known feature of Tracker on the desktop, has been just a side feature for the core developers. It can even be disabled at build time, which is what we do in our MeeGo Harmattan builds. This paragraph is the excuse to justify why we haven't given it more support lately :-)

As you say, and IIRC, fts:offsets returns the index of the words in the text, not counting words that are not indexed (i.e. shorter than the minimum length) and not counting punctuation. That is definitely not a good thing if you want to use fts:offsets, because to find the matching word you need exactly the same list of words as the one produced by the Unicode FTS parsers.

Currently there is no way of retrieving that list of words, and in any case it would be quite costly to expose an API that returns it (costly in performance if we build the list each time it is requested, or costly in memory if we pregenerate the list and keep it in memory).

So the best solution would really be to return the byte index of the words as found in the value of the properties. Note that this byte index would not be the byte index of the word in the original file, as the extraction depends on the file type.

I believe Ottela also had some other concerns regarding fts:offsets, for example when working with multi-valued properties.

Any help in improving FTS to make fts:offsets work better would be highly appreciated.

-- 
Aleksander
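
As a rough illustration of the difference between a word index and a byte index, here is a minimal Python sketch. It is not Tracker's parser: the `\w+` regular expression, the `MIN_WORD_LENGTH` value and the function names `indexed_words` and `snippet` are all assumptions made for the example, and a word index computed this way is only useful if the tokenizer happens to agree with the real FTS parser, which is exactly the problem described above. Whether fts:offsets counts from 0 or 1 is also not checked here; the sketch assumes 0.

```python
import re

# Hypothetical minimum word length; the value Tracker actually uses may differ.
MIN_WORD_LENGTH = 3


def indexed_words(text, min_len=MIN_WORD_LENGTH):
    """Return (word, byte_start, byte_end) for each word this simplified
    tokenizer keeps. Byte positions refer to the UTF-8 encoding of text."""
    words = []
    for match in re.finditer(r"\w+", text):
        word = match.group()
        if len(word) < min_len:
            # Short words are skipped, so they do not consume a word index.
            continue
        byte_start = len(text[:match.start()].encode("utf-8"))
        byte_end = byte_start + len(word.encode("utf-8"))
        words.append((word, byte_start, byte_end))
    return words


def snippet(text, word_index, context=20):
    """Return the text around the word at word_index (0-based, assumed),
    with up to `context` bytes on each side."""
    _, start, end = indexed_words(text)[word_index]
    raw = text.encode("utf-8")
    lo = max(0, start - context)
    hi = min(len(raw), end + context)
    # A cut may land inside a multi-byte character; drop the broken bytes.
    return raw[lo:hi].decode("utf-8", errors="ignore")


if __name__ == "__main__":
    sample = "Café au lait is a fine drink, is it not?"
    for entry in indexed_words(sample):
        print(entry)
    print(snippet(sample, 1, context=8))
```

With a byte index returned directly by the query, only the slicing step in `snippet` would be needed, and the client would no longer have to reproduce the tokenizer.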
