Author: Alexander Barkov
Email: b...@mnogosearch.org
Message:
> I added those Disallow lines so that both apps can crawl the same
> number of URLs (approximately 140):
>
> Disallow */basket-villeurbanne/author/*
> Disallow *?p=*
> Disallow */feed
>
> As it seems that Mnogosearch can manage a
Author: rafikCyc
Email:
Message:
I added those Disallow lines so that both apps can crawl the same
number of URLs (approximately 140):
Disallow */basket-villeurbanne/author/*
Disallow *?p=*
Disallow */feed
It seems that Mnogosearch can manage a robots.txt, but not the
meta robots tag.
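A rough sketch of how wildcard Disallow patterns like the three above filter a URL list (this only approximates the matching behavior; it is not mnoGoSearch's actual code, and the sample URLs are made up for the site in question):

```python
import re

# The Disallow patterns quoted above, with '*' as a wildcard.
DISALLOW = ["*/basket-villeurbanne/author/*", "*?p=*", "*/feed"]

def pattern_to_regex(pat):
    # Escape everything literally, then turn each '*' back into '.*'.
    return re.compile(re.escape(pat).replace(r"\*", ".*") + "$")

RULES = [pattern_to_regex(p) for p in DISALLOW]

def is_allowed(url):
    # A URL is crawlable only if it matches none of the Disallow patterns.
    return not any(r.match(url) for r in RULES)

print(is_allowed("http://www.asbuers.com/feed"))            # False
print(is_allowed("http://www.asbuers.com/news/post-1"))     # True
```

Note that `*?p=*` is translated with the `?` escaped as a literal question mark, so it only blocks query strings like `?p=2`, which seems to be the intent.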
Author: Alexander Barkov
Email: b...@mnogosearch.org
Message:
> Here is the site: http://www.asbuers.com/
After crawling this site with mnoGoSearch, I did the following:
# Extracted the list of all documents found (478 documents)
mysql -uroot -N --database=tmp --execute="SELECT url FROM url"
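Once each tool's URL list is dumped to a text file (one URL per line, as the mysql command above produces), a few lines of Python can show where the two crawls diverge. This is just a convenience sketch, not part of either tool:

```python
def load_urls(path):
    # One URL per line, e.g. the output of the mysql --execute dump above.
    with open(path) as f:
        return {line.strip() for line in f if line.strip()}

def diff_urls(a, b):
    # URLs found by one crawler but not the other.
    return sorted(a - b), sorted(b - a)
```

Calling `diff_urls(load_urls("a.txt"), load_urls("b.txt"))` (file names hypothetical) makes it easy to see which Disallow rule, if any, explains a count mismatch.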
Author: rafikCyc
Email:
Message:
Thank you for the reply.
Well, you're right...
With -p0 it no longer has the 1-second limit.
But it remains very slow, though.
--
I just did a quick speed test on a small site (500 documents):
Mnogosearch vs. Screaming Frog.
The results:
Mnogosearch: 3.2 urls /
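For context, a crawl rate translates into total crawl time with simple arithmetic; the helper below is only an illustration, using the 300 000-document site size mentioned elsewhere in the thread:

```python
def crawl_hours(n_urls, urls_per_second):
    # Total wall-clock hours to fetch n_urls at a steady rate.
    return n_urls / urls_per_second / 3600

# At one URL per second (the indexer's default pause), 300 000 documents
# would take more than 83 hours:
print(round(crawl_hours(300_000, 1.0), 1))  # 83.3
```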
Author: Alexander Barkov
Email: b...@mnogosearch.org
Message:
> Hello,
>
> I've tried this:
>
> ./indexer -p 0
>
> but it doesn't work :(
> The indexer sleeps for at least one second after each URL.
With -p0 it does not add any delay between URLs.
I guess the bottleneck is in the connection,
Author: rafikCyc
Email:
Message:
Hello,
I've tried this:
./indexer -p 0
but it doesn't work :(
The indexer sleeps for at least one second after each URL.
It seems impossible to index faster than one second per URL.
To index 300 000 documents on my website, for example, the crawl takes 2 full