Re: [ADMIN] full-text search question

2008-06-18 Thread Andrew Sullivan
On Wed, Jun 18, 2008 at 02:49:48PM +0200, Sabbiolina wrote: > www.google.com is only treated as a unique word? Why not produce multiple > tokens like www.google.com, www, ., google, ., com? (obviously www and . can > be nulled or stopworded). You wouldn't want to get the token ".". It's not a t

Re: [ADMIN] full-text search question

2008-06-18 Thread Oleg Bartunov
Sabbiolina, you have two options: 1. Write your very own parser 2. Write a dictionary that breaks a host into parts. Fortunately, you can use our dict_regex dictionary (http://vo.astronet.ru/arxiv/dict_regex.html) instead of 2. Oleg On Wed, 18 Jun 2008, Sabbiolina wrote: Hello, I've seen that
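
The second option above, a dictionary that breaks a host into parts, can be sketched outside the database as follows. This is only an illustration in Python, not the dict_regex interface or PostgreSQL's dictionary API: the function name `split_host` and the stopword set `HOST_STOPWORDS` are hypothetical, and dropping "www" is an assumption taken from the original question.

```python
# Hypothetical stopword set for host parts; not taken from any
# PostgreSQL configuration.
HOST_STOPWORDS = {"www"}


def split_host(token: str) -> list[str]:
    """Return the full host token followed by its dot-separated parts.

    Empty parts and stopworded parts are dropped, so the "." separators
    never become tokens of their own.
    """
    host = token.lower()
    lexemes = [host]  # keep the whole host so exact-host searches still match
    for part in host.split("."):
        if part and part not in HOST_STOPWORDS and part not in lexemes:
            lexemes.append(part)
    return lexemes


print(split_host("www.google.com"))  # ['www.google.com', 'google', 'com']
```

With the parts indexed alongside the full host, a search for "google" would have something to match, which is the behaviour asked for in the original question.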

[ADMIN] full-text search question

2008-06-18 Thread Sabbiolina
Hello, I've seen that the default parser for the full-text search can identify e-mail addresses, hosts, URLs… but I have a serious problem with it: Suppose I index the following sentence "the search engine I use the most is www.google.com" and then search for "google": no result is found. Instead