on Fri Jul 06 2007, David Abrahams <dave-UB3wUj7V41K5azolltMz9laTQe2KTcn/-AT-public.gmane.org> wrote:
> on Fri Jul 06 2007, David Abrahams
> <dave-UB3wUj7V41K5azolltMz9laTQe2KTcn/-AT-public.gmane.org> wrote:
>
>> on Fri Jul 06 2007, skip-AT-pobox.com wrote:
>>
>>> Try these two settings
>>>
>>> x-pick_apart_urls:True
>>> x-lookup_ip:True
>>>
>>> and see if they help.
>
> Oh, and these go in the [Tokenizer] section, right?
>
>> Well, they sure make training slow to a crawl!
>> Is there any effective way of cacheing those DNS lookups?
>
> I did eventually find the lookup_ip_cache option, but frankly the
> results are disappointing. I would have expected one slow round in my
> train-to-exhaustion regime and then all following rounds to go very
> quickly, but that doesn't appear to be the case. The first round took
> 18.5 minutes and it doesn't look like the 2nd round is going to be
> much faster. Oh, and right now the dnscache file is 414 bytes long
> and is full of stuff that mostly doesn't look like it has any
> relevance to dns lookup. I realize I shouldn't expect to be able to
> read a pickle by eye, but there is one string in there that looks like
> a domain name so I expect to see the others.

Well, I eventually got training to finish, but I don't notice any
improvement in accuracy. It may even have gotten worse; I've had a
few false negatives since enabling those options, and in general I
*never* see those.

-- 
Dave Abrahams
Boost Consulting
http://www.boost-consulting.com

The Astoria Seminar ==> http://www.astoriaseminar.com