Check your URL filters, e.g. make sure you have removed the lines below, which are there by default:
# skip URLs containing certain characters as probable queries, etc.
-[?*!@=]

Julien

On 4 June 2013 22:30, Yves S. Garret <[email protected]> wrote:

> Got another issue. When I run my crawler over google search results, I see
> _nothing_ in my HBase table... why?
>
> This is what I'm trying to crawl:
>
> https://www.google.com/#output=search&sclient=psy-ab&q=xbox&oq=xbox&gs_l=hp.3..0l4.648.1180.0.1354.4.4.0.0.0.0.213.547.0j2j1.3.0...0.0...1c.1.15.psy-ab.jd107GllWZw&pbx=1&bav=on.2,or.r_cp.r_qf.&bvm=bv.47380653,d.eWU&fp=13d973d49a29d61d&biw=1280&bih=635
>
> Here are my logs:
> http://bin.cakephp.org/view/1619245280
>
> Here is my $NUTCH_HOME/conf/nutch-site.xml:
> http://bin.cakephp.org/view/1304119856
>
> And the output that I see when I run the crawler:
> http://bin.cakephp.org/view/260103467
>
> In nutch-site.xml, I have all of the needed plugin.includes, I believe...

--
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
http://twitter.com/digitalpebble
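To see why the default rule `-[?*!@=]` drops the Google URL, here is a minimal Python sketch of an ordered regex URL filter. It assumes the semantics described in the comments of Nutch's default regex filter file: rules are tried in order, the first one whose pattern is found anywhere in the URL decides (`+` accept, `-` reject), and a URL matching no rule is rejected. The `accepts` helper and the trailing `+.` catch-all rule are illustrative assumptions, not Nutch code, and the URL is a shortened form of the one above.

```python
import re

# Hypothetical, simplified model of an ordered regex URL filter:
# each rule is "+regex" (accept) or "-regex" (reject).
RULES = [
    "-[?*!@=]",  # the default rule quoted above: skip probable queries
    "+.",        # assumed catch-all rule: accept anything else
]


def accepts(url: str, rules: list[str]) -> bool:
    """Return True if the first rule whose regex is found in url is a '+' rule."""
    for rule in rules:
        sign, pattern = rule[0], rule[1:]
        if re.search(pattern, url):  # "found anywhere", like a partial match
            return sign == "+"
    return False  # no rule matched: the URL is ignored


google_url = "https://www.google.com/#output=search&sclient=psy-ab&q=xbox"

print(accepts(google_url, RULES))   # False: '=' matches [?*!@=]
print(accepts(google_url, ["+."]))  # True once the query rule is removed
```

Because the first matching rule wins, commenting out `-[?*!@=]` is enough for such URLs to reach the catch-all accept rule; any other rule placed before it would still be evaluated first.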

