Hi Nutch gurus,

I need to crawl two dynamically generated pages:

1. http://example.com
2. http://example.com?request_locale=es_US

The difference is that when the query parameter "request_locale" equals "es_US", Spanish content is loaded. We would like to crawl both URLs if possible. I have put both URLs in my seed.txt, but the logs show that only the first URL is being crawled, not the second. I modified regex-normalize.xml so that it does not strip out query parameters; it is given below. How do I configure Nutch to crawl both URLs?

<regex-normalize>

  <!-- removes session ids from urls (such as jsessionid and PHPSESSID) -->
  <regex>
    <pattern>(?i)(;?\b_?(l|j|bv_)?(sid|phpsessid|sessionid)=.*?)(\?|&amp;|#|$)</pattern>
    <substitution>$4</substitution>
  </regex>

  <!-- changes default pages into standard for /index.html, etc. into /
  <regex>
    <pattern>/((?i)index|default)\.((?i)js[pf]{1}?[afx]?|cgi|cfm|asp[x]?|[psx]?htm[l]?|php[3456]?)(\?|&amp;|#|$)</pattern>
    <substitution>/$3</substitution>
  </regex>
  -->

  <!-- removes interpage href anchors such as site.com#location -->
  <regex>
    <pattern>#.*?(\?|&amp;|$)</pattern>
    <substitution>$1</substitution>
  </regex>

  <!-- cleans ?&var=value into ?var=value -->
  <regex>
    <pattern>\?&amp;</pattern>
    <substitution>\?</substitution>
  </regex>

  <!-- cleans multiple sequential ampersands into a single ampersand -->
  <regex>
    <pattern>&amp;{2,}</pattern>
    <substitution>&amp;</substitution>
  </regex>

  <!-- removes trailing ? -->
  <regex>
    <pattern>[\?&amp;\.]$</pattern>
    <substitution></substitution>
  </regex>

  <!-- removes duplicate slashes -->
  <regex>
    <pattern>(?&lt;!:)/{2,}</pattern>
    <substitution>/</substitution>
  </regex>

</regex-normalize>

Kartik
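As a quick sanity check, the active rules in the regex-normalize.xml above can be replayed against both seed URLs outside Nutch. The sketch below is a minimal approximation, not Nutch's own code: it assumes the rules are applied one after another in file order using java.util.regex and Matcher.replaceAll, which may not match the RegexURLNormalizer plugin exactly. The class name NormalizeCheck is hypothetical, and the patterns are written as the regex engine would see them, with XML entities such as &amp; and &lt; decoded.

```java
import java.util.regex.Pattern;

public class NormalizeCheck {
    // Active rules from the regex-normalize.xml above, in file order, as
    // {pattern, substitution}. The commented-out index/default rule is skipped,
    // and XML entities (&amp;, &lt;) are already decoded.
    private static final String[][] RULES = {
        {"(?i)(;?\\b_?(l|j|bv_)?(sid|phpsessid|sessionid)=.*?)(\\?|&|#|$)", "$4"},
        {"#.*?(\\?|&|$)", "$1"},
        {"\\?&", "\\?"},
        {"&{2,}", "&"},
        {"[\\?&\\.]$", ""},
        {"(?<!:)/{2,}", "/"},
    };

    // Assumption: each rule is applied to the output of the previous one
    // with replaceAll, mirroring the order of the <regex> elements.
    public static String normalize(String url) {
        String result = url;
        for (String[] rule : RULES) {
            result = Pattern.compile(rule[0]).matcher(result).replaceAll(rule[1]);
        }
        return result;
    }

    public static void main(String[] args) {
        String[] seeds = {"http://example.com", "http://example.com?request_locale=es_US"};
        for (String seed : seeds) {
            // Prints each seed next to its normalized form.
            System.out.println(seed + " -> " + normalize(seed));
        }
    }
}
```

Under these assumptions both seeds come through unchanged and remain distinct, which would suggest the normalizer is not what drops the second URL; other stages, such as the URL filter configuration, may be worth checking next.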

