My current project needed to tokenize the text in HTML documents without the
tags. The easy solution for us was to license a library from Chilkat that
supports text extraction, and then tokenize the extracted text. I'm on my phone
at the moment but can supply more details later if desired.
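The Chilkat library isn't shown here. As a stand-in, here is a minimal sketch of the same "extract the text first, then let FTS tokenize it" approach using Python's standard-library html.parser. It assumes the Python sqlite3 build has FTS4 enabled. The table name `docs` and the helper names `TextExtractor`, `html_to_text` and `matches` are hypothetical, chosen only for illustration.

```python
import sqlite3
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects character data only, dropping tags, attributes, and
    the bodies of <script>/<style> elements (hypothetical helper)."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)  # decode &amp; etc. into text
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def html_to_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    # Join with spaces so text from adjacent elements doesn't run together.
    return " ".join(parser.parts)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE docs USING fts4(body)")
page = '<div class="foo">Hello!</div><script>var x = 1;</script>'
# Index the extracted text, not the raw markup.
conn.execute("INSERT INTO docs(body) VALUES (?)", (html_to_text(page),))


def matches(term):
    """Number of rows in docs whose body matches the FTS query term."""
    return conn.execute(
        "SELECT count(*) FROM docs WHERE docs MATCH ?", (term,)
    ).fetchone()[0]


print(matches("hello"))  # 1
print(matches("div"))    # 0: tag names never reach the tokenizer
```

A regex-based tag stripper would be shorter, but a real parser handles attributes containing `>`, entities, and script bodies more reliably.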

SDR
On Feb 13, 2014 1:02 PM, "David King" <dk...@ketralnis.com> wrote:

> > New to SQLite. Does anybody know whether there is an HTML tokenizer for
> > full-text search, or do I need to implement my own?
>
> There isn't an HTML tokeniser. But the default tokeniser treats punctuation
> such as < and > as word breaks, so it may already work for you, with the
> downside that something like <div class="foo">Hello!</div> will be indexed
> as the words "div", "class", "foo", and "hello" (rather than just "hello",
> which is what you may be after).
>
> _______________________________________________
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
>
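The behaviour described in the quoted reply can be checked with a small sketch. It assumes a Python sqlite3 build with FTS4 enabled, and it uses a hypothetical table named `raw`. Because no `tokenize=` option is given, FTS4 falls back to its default "simple" tokenizer, which treats non-alphanumeric ASCII characters (including `<`, `>`, `=` and quotes) as separators. As a result, tag and attribute names are indexed as ordinary words.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# No tokenize= option, so FTS4 uses its default "simple" tokenizer.
conn.execute("CREATE VIRTUAL TABLE raw USING fts4(body)")
# Index the raw markup as-is, without stripping tags first.
conn.execute("INSERT INTO raw(body) VALUES (?)", ('<div class="foo">Hello!</div>',))

# Count matching rows for each word that appears in the markup.
counts = {
    term: conn.execute(
        "SELECT count(*) FROM raw WHERE raw MATCH ?", (term,)
    ).fetchone()[0]
    for term in ("hello", "div", "class", "foo")
}
print(counts)  # {'hello': 1, 'div': 1, 'class': 1, 'foo': 1}
```

Searching for "hello" works as hoped, but so do searches for the markup words, which is the downside the reply points out.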