Hi all,

I have been tweaking the crawler's settings and have scheduled it to crawl continuously. I am also taking automated backups from the Solr server.

Furthermore, I found out that some sites serve an infinite number of sub-domains. For instance, The Pirate Bay does this:

rss.rss.rss.rss.rss...rss.rss.rss.uj3wazyk5u4hnvtk.onion

As a result, the search results are currently cluttered:

https://ahmia.fi/search/?q=the+pirate+bay

The crawler rebuilds the index continuously and is no longer allowed to follow this kind of problematic sub-domain chain, so the problem will resolve itself within a week.

Greetings,
Juha

_______________________________________________
tor-reports mailing list
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-reports
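For readers who want to see what such a rule could look like, here is a minimal sketch of a check that rejects sub-domain chains like the one in the message above. It is not the crawler's actual code: the function name `is_subdomain_chain` and both thresholds are assumptions chosen for illustration, and it assumes every URL is an absolute http(s) URL on a `.onion` host.

```python
from urllib.parse import urlsplit

# Hypothetical limits, not taken from the crawler's real configuration.
MAX_SUBDOMAIN_LABELS = 3   # labels allowed in front of "<address>.onion"
MAX_LABEL_REPEATS = 2      # identical consecutive labels allowed


def is_subdomain_chain(url, max_labels=MAX_SUBDOMAIN_LABELS,
                       max_repeats=MAX_LABEL_REPEATS):
    """Return True if the URL's host looks like a runaway sub-domain chain."""
    # hostname is None when the URL has no scheme/netloc; treat that as "no chain".
    host = urlsplit(url).hostname or ""
    labels = host.lower().split(".")

    # Drop the last two labels: the onion address itself and the "onion" TLD.
    sub_labels = labels[:-2]

    # Rule 1: too many sub-domain labels overall.
    if len(sub_labels) > max_labels:
        return True

    # Rule 2: the same label repeated too many times in a row (rss.rss.rss...).
    longest_run = 0
    current_run = 0
    previous = None
    for label in sub_labels:
        current_run = current_run + 1 if label == previous else 1
        longest_run = max(longest_run, current_run)
        previous = label
    return longest_run > max_repeats
```

A crawler could call this on each extracted link and skip the link when it returns True. The two rules are deliberately simple; a real deployment would probably tune the limits against the sites it actually indexes.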
