Just add a rule to your regex-normalize.xml:

    <!-- lowercase URLs -->
    <regex>
      <pattern>([A-Z]+)</pattern>
      <substitution>\L$1</substitution>
    </regex>

\L converts the matched group $1 to lowercase, see
http://jakarta.apache.org/oro/api/org/apache/oro/text/regex/Perl5Substitution.html

This is smarter (and faster) than writing one rule per letter:

    <regex>
      <pattern>A</pattern>
      <substitution>a</substitution>
    </regex>
    <regex>
      <pattern>B</pattern>
      <substitution>b</substitution>
    </regex>
    ...

Note that the rule lowercases every uppercase letter in the URL, including the path and the query string.

Of course, you could also write a URL normalizer plug-in. It could take into account that some servers are case-sensitive, i.e., they return a 404 for the lowercased URL.

Sebastian Nagel
Software developer, exorbyte GmbH

On 06/04/2011 04:12 PM, Marseld Dedgjonaj wrote:
Hello everyone,

I am using nutch-1.2 + Solr-1.3 to index a site. I see in my results that nutch-1.2 treats "www.mysite.com/default.aspx" and "www.mysite.com/DEFAULT.ASPX" as two different sites. Since the site is an ASPX site, its URLs should not be case-sensitive.

Any help or suggestion on how to make the crawl case-insensitive?

Thanks in advance.

Best regards,
Marseldi
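
For reference, the plug-in idea from the reply can be sketched in plain Java. This is only a rough illustration under stated assumptions: the class name `UrlLowercaser` and its `lowercase` method are hypothetical and are not part of Nutch's plug-in API, so the logic would still have to be wired into a real URL normalizer plug-in. It may also be useful if the normalizer in your Nutch version happens to be backed by `java.util.regex` rather than ORO, because `java.util.regex` replacement strings do not interpret `\L`. The scheme and host are always lowercased (they are case-insensitive anyway); the path is lowercased only on request, which leaves room for case-sensitive servers.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;

/** Hypothetical helper for illustration; not a Nutch class. */
class UrlLowercaser {

    /**
     * Lowercases scheme and host, and optionally the path.
     * Query string, fragment, and user info are left untouched.
     * URLs that cannot be parsed or have no host are returned unchanged.
     */
    static String lowercase(String url, boolean lowercasePath) {
        URI uri;
        try {
            uri = new URI(url);
        } catch (URISyntaxException e) {
            return url; // unparsable: leave it to other normalizers or filters
        }
        if (uri.getScheme() == null || uri.getHost() == null) {
            return url; // relative or opaque URL: nothing safe to lowercase
        }

        StringBuilder sb = new StringBuilder();
        sb.append(uri.getScheme().toLowerCase(Locale.ROOT)).append("://");
        if (uri.getRawUserInfo() != null) {
            sb.append(uri.getRawUserInfo()).append('@'); // may be case-sensitive
        }
        sb.append(uri.getHost().toLowerCase(Locale.ROOT));
        if (uri.getPort() != -1) {
            sb.append(':').append(uri.getPort());
        }

        String path = uri.getRawPath();
        if (path != null) {
            // Assumption: only set lowercasePath for hosts known to be
            // case-insensitive (e.g. the ASPX site from the question).
            sb.append(lowercasePath ? path.toLowerCase(Locale.ROOT) : path);
        }
        if (uri.getRawQuery() != null) {
            sb.append('?').append(uri.getRawQuery());
        }
        if (uri.getRawFragment() != null) {
            sb.append('#').append(uri.getRawFragment());
        }
        return sb.toString();
    }
}
```

Calling `UrlLowercaser.lowercase(url, true)` for URLs of the ASPX site would map both spellings of `default.aspx` to the same string, while passing `false` for other hosts keeps their paths as crawled.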

