Hi guys.
I wonder whether anyone has seen the number of crawled documents differ between MCF0.4 and MCF0.5 when web crawling.

I crawled some portal sites on our intranet using MCF0.4 and MCF0.5. MCF0.4 crawled over 12000 documents, while MCF0.5 crawled only around half as many. I ran MCF0.4 on PostgreSQL and MCF0.5 on MySQL. I hope changing the DB does not affect the crawling results.

MCF0.4:
- Crawled count: over 12000
- Solr3.5
- PostgreSQL 9.1.3
- Tomcat6
- Max Hop on Links: 15
- Max Hop on Redirects: 10
- Include only hosts matching seeds: Checked
- org.apache.manifoldcf.crawler.threads: 50
- org.apache.manifoldcf.database.maxhandles: 100

MCF0.5:
- Crawled count: around 6000
- Solr3.5
- MySQL5.5
- Tomcat6
- Max Hop on Links: 15
- Max Hop on Redirects: 10
- Include only hosts matching seeds: Checked
- org.apache.manifoldcf.crawler.threads: 50
- org.apache.manifoldcf.database.maxhandles: 100

Does anyone have any ideas?
