My current project needed to tokenize the text in HTML without the tags. The easy solution for us was to license a library from Chilkat that supports text extraction, pull the plain text out with it, and then tokenize that. I'm on my phone at the moment, but I can supply more details later if desired.
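
In the meantime, here is a rough sketch of the same "extract the text first, then tokenize" idea using only Python's standard library. To be clear, this is not the Chilkat API and not what we run in production. The names TextExtractor, extract_text, build_index and search are made up for illustration, and it assumes your SQLite build has FTS4 compiled in.

```python
import sqlite3
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text nodes; ignores tags, attributes, <script> and <style>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Join chunks with spaces so adjacent elements do not fuse into one
        # word, then collapse runs of whitespace.
        return " ".join(" ".join(self._chunks).split())


def extract_text(html):
    """Return the visible text of an HTML fragment (hypothetical helper)."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


def build_index(pages):
    """Index (url, html) pairs in an in-memory FTS4 table.

    Assumes FTS4 is available; raises sqlite3.OperationalError otherwise.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE VIRTUAL TABLE docs USING fts4(url, body)")
    conn.executemany(
        "INSERT INTO docs (url, body) VALUES (?, ?)",
        ((url, extract_text(html)) for url, html in pages),
    )
    return conn


def search(conn, term):
    """Return the urls whose extracted text matches the FTS query."""
    rows = conn.execute(
        "SELECT url FROM docs WHERE body MATCH ? ORDER BY url", (term,)
    )
    return [row[0] for row in rows]
```

Because only the extracted text is stored in the body column, a query for "hello" finds the page while a query for "div" or "class" does not. One trade-off of joining chunks with a space is that inline markup inside a word (for example wo<b>rd</b>) gets split into two tokens.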
SDR

On Feb 13, 2014 1:02 PM, "David King" <dk...@ketralnis.com> wrote:

> > New to Sqlite, anybody knows is there a HTML tokenizer for full text
> > search,
> > Or do I need to implement my own?
>
> There isn't an HTML tokeniser. But the default tokeniser considers
> punctuation like <> to be word breaks so it may already work for you with
> the down side that things like <div class="foo">Hello!</div> will consider
> "div", "class", "foo", and "hello" as words. (Rather than the just "hello"
> that you may be after)
>
> _______________________________________________
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users