Please reply to me directly as well, as I'm not on the nutch-dev list regularly.
I'm curious ... Googlebot, Yahoo Slurp, and now CazoodleBot (based on Nutch) are hitting our site at http://www.nines.org and I get all sorts of invalid links crawled.

Is our site doing something wrong in our markup? Or are all these crawlers flawed by hitting non-sensible URLs?

Thanks,
Erik

Begin forwarded message:

> From: Application Error <[EMAIL PROTECTED]>
> Date: July 10, 2007 6:31:45 AM EDT
> To: [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED]
> Subject: [Collex] application#index (ActionController::RoutingError) "no route found to match \"/nines/ escape(document.title) u,\" with {:method=>:get}"
>
> A ActionController::RoutingError occurred in application#index:
>
> no route found to match "/nines/ escape(document.title) u," with {:method=>:get}
> [RAILS_ROOT]/vendor/rails/actionpack/lib/action_controller/routing.rb:1292:in `recognize_path'
>
> -------------------------------
> Request:
> -------------------------------
>
> * URL: http://www.nines.org/nines/+escape(document.title)+u,
> * Parameters: {}
> * Rails root: /usr/local/patacriticism/production-web/current
>
> -------------------------------
> Session:
> -------------------------------
>
> * @data: {"flash"=>{}}
> * @new_session: true
> * @write_lock: true
> * @session_id: "7faa9cc7b50752112ed9a1b8aab220ec"
>
> -------------------------------
> Environment:
> -------------------------------
>
> * DOCUMENT_ROOT : /usr/local/patacriticism/production-web/current/public
> * FCGI_ROLE : RESPONDER
> * GATEWAY_INTERFACE : CGI/1.1
> * HTTP_ACCEPT_ENCODING: x-gzip, gzip
> * HTTP_HOST : www.nines.org
> * HTTP_USER_AGENT : CazoodleBot/Nutch-0.9-dev (CazoodleBot Crawler; http://www.cazoodle.com/cazoodlebot; [EMAIL PROTECTED])
> * PATH : /usr/jdk1.5.0_06/bin:/usr/local/nines/bin:/opt/csw/bin:/usr/local/bin:/uva/bin:/usr/bin:/opt/SUNWspro/bin:/usr/ucb:/usr/openwin/bin:/usr/dt/bin:/usr/ccs/bin:/gnu/bin:/contrib/bin:/home/loadl/bin:/usr/local/ant/bin
> * QUERY_STRING :
> * REDIRECT_STATUS : 200
> * REDIRECT_UNIQUE_ID : RpNgEYCPFU0AACruKkQ
> * REDIRECT_URL : /nines/+escape(document.title)+u,
> * REMOTE_ADDR : 72.36.94.103
> * REMOTE_PORT : 58936
> * REQUEST_METHOD : GET
> * REQUEST_URI : /nines/+escape(document.title)+u,
> * SCRIPT_FILENAME : /usr/local/patacriticism/production-web/current/public/dispatch.fcgi
> * SCRIPT_NAME : /dispatch.fcgi
> * SERVER_ADDR : 128.143.21.77
> * SERVER_ADMIN : [EMAIL PROTECTED]
> * SERVER_NAME : nines.org
> * SERVER_PORT : 80
> * SERVER_PROTOCOL : HTTP/1.0
> * SERVER_SIGNATURE : <ADDRESS>Apache/1.3.36 Server at nines.org Port 80</ADDRESS>
> * SERVER_SOFTWARE : Apache/1.3.36 (Unix) mod_fastcgi/2.4.2 PHP/4.3.3 mod_perl/1.29
> * TZ : US/Eastern
> * UNIQUE_ID : RpNgEYCPFU0AACruKkQ
>
> * Process: 10999
> * Server :
>
> -------------------------------
> Backtrace:
> -------------------------------
>
> [RAILS_ROOT]/vendor/rails/actionpack/lib/action_controller/routing.rb:1292:in `recognize_path'
> [RAILS_ROOT]/vendor/rails/actionpack/lib/action_controller/routing.rb:1282:in `recognize'
> [RAILS_ROOT]/vendor/rails/railties/lib/dispatcher.rb:40:in `dispatch'
> [RAILS_ROOT]/vendor/rails/railties/lib/fcgi_handler.rb:168:in `process_request'
> [RAILS_ROOT]/vendor/rails/railties/lib/fcgi_handler.rb:143:in `process_each_request!'
> [RAILS_ROOT]/vendor/rails/railties/lib/fcgi_handler.rb:109:in `with_signal_handler'
> [RAILS_ROOT]/vendor/rails/railties/lib/fcgi_handler.rb:142:in `process_each_request!'
> /opt/csw/lib/ruby/gems/1.8/gems/fcgi-0.8.7/lib/fcgi.rb:612:in `each_cgi'
> /opt/csw/lib/ruby/gems/1.8/gems/fcgi-0.8.7/lib/fcgi.rb:609:in `each'
> /opt/csw/lib/ruby/gems/1.8/gems/fcgi-0.8.7/lib/fcgi.rb:609:in `each_cgi'
> [RAILS_ROOT]/vendor/rails/railties/lib/fcgi_handler.rb:141:in `process_each_request!'
> [RAILS_ROOT]/vendor/rails/railties/lib/fcgi_handler.rb:55:in `process!'
> [RAILS_ROOT]/vendor/rails/railties/lib/fcgi_handler.rb:25:in `process!'
> [RAILS_ROOT]/public/dispatch.fcgi:24
