I crawled on Windows and searched with the Tomcat installed on that Windows machine, and it works perfectly. Then I moved the same crawl directory and files to Linux and searched with the Tomcat installed on that Linux machine. It returns 0 hits. I have changed the searcher.dir property, and I think it is connecting, because the following statements are printed in the logs. Any idea?

INFO [TP-Processor1] (NutchBean.java:69) - creating new bean
INFO [TP-Processor1] (NutchBean.java:121) - opening indexes in /home/nutch-0.8/crawl/indexes
INFO [TP-Processor1] (Configuration.java:360) - found resource common-terms.utf8 at file:/usr/java/tomcat-5.5/webapps/ROOT/WEB-INF/classes/common-terms.utf8
INFO [TP-Processor1] (NutchBean.java:143) - opening segments in /home/nutch-0.8/crawl/segments
INFO [TP-Processor1] (SummarizerFactory.java:52) - Using the first summarizer extension found: Basic Summarizer
INFO [TP-Processor1] (NutchBean.java:154) - opening linkdb in /home/nutch-0.8/crawl/linkdb
INFO [TP-Processor1] (search_jsp.java:108) - query request from 192.168.1.64
INFO [TP-Processor1] (search_jsp.java:151) - query:
INFO [TP-Processor1] (search_jsp.java:152) - lang:
INFO [TP-Processor1] (NutchBean.java:247) - searching for 20 raw hits
INFO [TP-Processor1] (search_jsp.java:337) - total hits: 0

INFO [TP-Processor5] (search_jsp.java:108) - query request from 192.168.1.64
INFO [TP-Processor5] (search_jsp.java:151) - query: ads
INFO [TP-Processor5] (search_jsp.java:152) - lang: en
INFO [TP-Processor5] (NutchBean.java:247) - searching for 20 raw hits
INFO [TP-Processor5] (search_jsp.java:337) - total hits: 0

Sean Dean-3 wrote:
>
> Everything looks okay in terms of the files.
>
> When you copied everything over from windows, other then the operating
> system is there anything different with the software?
>
> Maybe you have an old windows style path somewhere (C:\Nutch\Crawl)? Also
> double check to see if your "searcher.dir" property inside your
> nutch-site.xml file is correct.
>
>
> ----- Original Message ----
> From: kan001 <[EMAIL PROTECTED]>
> To: [email protected]
> Sent: Monday, March 5, 2007 11:48:56 PM
> Subject: Re: [SOLVED] moving crawled db from windows to linux
>
>
> Thanks for the immediate reply.
>
> please find the result from du -h crawl/ command and the logs below:
> 32K crawl/crawldb/current/part-00000
> 36K crawl/crawldb/current
> 40K crawl/crawldb
> 120K crawl/index
> 128K crawl/indexes/part-00000
> 132K crawl/indexes
> 52K crawl/linkdb/current/part-00000
> 56K crawl/linkdb/current
> 60K crawl/linkdb
> 40K crawl/segments/20070228143239/content/part-00000
> 44K crawl/segments/20070228143239/content
> 20K crawl/segments/20070228143239/crawl_fetch/part-00000
> 24K crawl/segments/20070228143239/crawl_fetch
> 12K crawl/segments/20070228143239/crawl_generate
> 12K crawl/segments/20070228143239/crawl_parse
> 20K crawl/segments/20070228143239/parse_data/part-00000
> 24K crawl/segments/20070228143239/parse_data
> 24K crawl/segments/20070228143239/parse_text/part-00000
> 28K crawl/segments/20070228143239/parse_text
> 148K crawl/segments/20070228143239
> 136K crawl/segments/20070228143249/content/part-00000
> 140K crawl/segments/20070228143249/content
> 20K crawl/segments/20070228143249/crawl_fetch/part-00000
> 24K crawl/segments/20070228143249/crawl_fetch
> 12K crawl/segments/20070228143249/crawl_generate
> 28K crawl/segments/20070228143249/crawl_parse
> 32K crawl/segments/20070228143249/parse_data/part-00000
> 36K crawl/segments/20070228143249/parse_data
> 44K crawl/segments/20070228143249/parse_text/part-00000
> 48K crawl/segments/20070228143249/parse_text
> 292K crawl/segments/20070228143249
> 20K crawl/segments/20070228143327/content/part-00000
> 24K crawl/segments/20070228143327/content
> 20K crawl/segments/20070228143327/crawl_fetch/part-00000
> 24K crawl/segments/20070228143327/crawl_fetch
> 16K crawl/segments/20070228143327/crawl_generate
> 12K crawl/segments/20070228143327/crawl_parse
> 20K crawl/segments/20070228143327/parse_data/part-00000
> 24K crawl/segments/20070228143327/parse_data
> 20K crawl/segments/20070228143327/parse_text/part-00000
> 24K crawl/segments/20070228143327/parse_text
> 128K crawl/segments/20070228143327
> 20K crawl/segments/20070228143434/content/part-00000
> 24K crawl/segments/20070228143434/content
> 20K crawl/segments/20070228143434/crawl_fetch/part-00000
> 24K crawl/segments/20070228143434/crawl_fetch
> 16K crawl/segments/20070228143434/crawl_generate
> 12K crawl/segments/20070228143434/crawl_parse
> 20K crawl/segments/20070228143434/parse_data/part-00000
> 24K crawl/segments/20070228143434/parse_data
> 20K crawl/segments/20070228143434/parse_text/part-00000
> 24K crawl/segments/20070228143434/parse_text
> 128K crawl/segments/20070228143434
> 700K crawl/segments
> 1.1M crawl/
>
> INFO [TP-Processor1] (Configuration.java:397) - parsing jar:file:/usr/java/tomcat-5.5/webapps/ROOT/WEB-INF/lib/hadoop-0.4.0.jar!/hadoop-default.xml
> INFO [TP-Processor1] (Configuration.java:397) - parsing file:/usr/java/tomcat-5.5/webapps/ROOT/WEB-INF/classes/nutch-default.xml
> INFO [TP-Processor1] (Configuration.java:397) - parsing file:/usr/java/tomcat-5.5/webapps/ROOT/WEB-INF/classes/nutch-site.xml
> INFO [TP-Processor1] (Configuration.java:397) - parsing file:/usr/java/tomcat-5.5/webapps/ROOT/WEB-INF/classes/hadoop-site.xml
> INFO [TP-Processor1] (PluginManifestParser.java:81) - Plugins: looking in: /usr/java/tomcat-5.5/webapps/ROOT/WEB-INF/classes/plugins
> INFO [TP-Processor1] (PluginRepository.java:333) - Plugin Auto-activation mode: [true]
> INFO [TP-Processor1] (PluginRepository.java:334) - Registered Plugins:
> INFO [TP-Processor1] (PluginRepository.java:341) - CyberNeko HTML Parser (lib-nekohtml)
> INFO [TP-Processor1] (PluginRepository.java:341) - Site Query Filter (query-site)
> INFO [TP-Processor1] (PluginRepository.java:341) - Html Parse Plug-in (parse-html)
> INFO [TP-Processor1] (PluginRepository.java:341) - Regex URL Filter Framework (lib-regex-filter)
> INFO [TP-Processor1] (PluginRepository.java:341) - Basic Indexing Filter (index-basic)
> INFO [TP-Processor1] (PluginRepository.java:341) - Basic Summarizer Plug-in (summary-basic)
> INFO [TP-Processor1] (PluginRepository.java:341) - Text Parse Plug-in (parse-text)
> INFO [TP-Processor1] (PluginRepository.java:341) - JavaScript Parser (parse-js)
> INFO [TP-Processor1] (PluginRepository.java:341) - Regex URL Filter (urlfilter-regex)
> INFO [TP-Processor1] (PluginRepository.java:341) - Basic Query Filter (query-basic)
> INFO [TP-Processor1] (PluginRepository.java:341) - HTTP Framework (lib-http)
> INFO [TP-Processor1] (PluginRepository.java:341) - URL Query Filter (query-url)
> INFO [TP-Processor1] (PluginRepository.java:341) - Http Protocol Plug-in (protocol-http)
> INFO [TP-Processor1] (PluginRepository.java:341) - the nutch core extension points (nutch-extensionpoints)
> INFO [TP-Processor1] (PluginRepository.java:341) - OPIC Scoring Plug-in (scoring-opic)
> INFO [TP-Processor1] (PluginRepository.java:345) - Registered Extension-Points:
> INFO [TP-Processor1] (PluginRepository.java:352) - Nutch Summarizer (org.apache.nutch.searcher.Summarizer)
> INFO [TP-Processor1] (PluginRepository.java:352) - Nutch Scoring (org.apache.nutch.scoring.ScoringFilter)
> INFO [TP-Processor1] (PluginRepository.java:352) - Nutch Protocol (org.apache.nutch.protocol.Protocol)
> INFO [TP-Processor1] (PluginRepository.java:352) - Nutch URL Filter (org.apache.nutch.net.URLFilter)
> INFO [TP-Processor1] (PluginRepository.java:352) - HTML Parse Filter (org.apache.nutch.parse.HtmlParseFilter)
> INFO [TP-Processor1] (PluginRepository.java:352) - Nutch Online Search Results Clustering Plugin (org.apache.nutch.clustering.OnlineClusterer)
> INFO [TP-Processor1] (PluginRepository.java:352) - Nutch Indexing Filter (org.apache.nutch.indexer.IndexingFilter)
> INFO [TP-Processor1] (PluginRepository.java:352) - Nutch Content Parser (org.apache.nutch.parse.Parser)
> INFO [TP-Processor1] (PluginRepository.java:352) - Ontology Model Loader (org.apache.nutch.ontology.Ontology)
> INFO [TP-Processor1] (PluginRepository.java:352) - Nutch Analysis (org.apache.nutch.analysis.NutchAnalyzer)
> INFO [TP-Processor1] (PluginRepository.java:352) - Nutch Query Filter (org.apache.nutch.searcher.QueryFilter)
> INFO [TP-Processor1] (NutchBean.java:69) - creating new bean
> INFO [TP-Processor1] (NutchBean.java:121) - opening indexes in /home/nutch-0.8/crawl/indexes
> INFO [TP-Processor1] (Configuration.java:360) - found resource common-terms.utf8 at file:/usr/java/tomcat-5.5/webapps/ROOT/WEB-INF/classes/common-terms.utf8
> INFO [TP-Processor1] (NutchBean.java:143) - opening segments in /home/nutch-0.8/crawl/segments
> INFO [TP-Processor1] (SummarizerFactory.java:52) - Using the first summarizer extension found: Basic Summarizer
> INFO [TP-Processor1] (NutchBean.java:154) - opening linkdb in /home/nutch-0.8/crawl/linkdb
> INFO [TP-Processor1] (search_jsp.java:108) - query request from 192.168.1.64
> INFO [TP-Processor1] (search_jsp.java:151) - query:
> INFO [TP-Processor1] (search_jsp.java:152) - lang:
> INFO [TP-Processor1] (NutchBean.java:247) - searching for 20 raw hits
> INFO [TP-Processor1] (search_jsp.java:337) - total hits: 0
>
> INFO [TP-Processor5] (search_jsp.java:108) - query request from 192.168.1.64
> INFO [TP-Processor5] (search_jsp.java:151) - query: ads
> INFO [TP-Processor5] (search_jsp.java:152) - lang: en
> INFO [TP-Processor5] (NutchBean.java:247) - searching for 20 raw hits
> INFO [TP-Processor5] (search_jsp.java:337) - total hits: 0
>
>
>
>
>
> kan001 wrote:
>>
>> When I copied crawled db from windows to linux and trying to search
>> through tomcat in linux - it returns 0 hits.
>> But in windows its getting results from search screen. Any idea?? I have
>> given root permissions to the crawled db.
>> In the logs it is showing - oening segments.... But hits 0!!!
>>
>
> --
> View this message in context:
> http://www.nabble.com/moving-crawled-db-from-windows-to-linux-tf3350448.html#a9326034
> Sent from the Nutch - User mailing list archive at Nabble.com.
>

--
View this message in context: http://www.nabble.com/moving-crawled-db-from-windows-to-linux-tf3350448.html#a9335094
Sent from the Nutch - User mailing list archive at Nabble.com.

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
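
The two checks suggested in the quoted reply (look for a leftover Windows-style path, and confirm the "searcher.dir" value in nutch-site.xml) could be scripted roughly as follows. This is only a sketch: it assumes nutch-site.xml uses the usual configuration/property/name/value layout, and the helper names read_property and find_windows_paths and the inline SAMPLE text are made up for illustration; they are not part of Nutch.

```python
import re
import xml.etree.ElementTree as ET

# A drive letter, a colon, then a slash or backslash, e.g. C:\Nutch\Crawl.
# The leading \b keeps URL schemes such as file:/ or http:// from matching.
WINDOWS_PATH = re.compile(r"\b[A-Za-z]:[\\/][^\s<>]*")


def read_property(root, name):
    """Return the <value> text of the <property> whose <name> matches, or None."""
    for prop in root.iter("property"):
        if (prop.findtext("name") or "").strip() == name:
            return (prop.findtext("value") or "").strip()
    return None


def find_windows_paths(text):
    """Return every Windows-style path found in the given text."""
    return WINDOWS_PATH.findall(text)


# Hypothetical config content standing in for a real nutch-site.xml
# that still carries the path from the Windows machine.
SAMPLE = r"""<configuration>
  <property>
    <name>searcher.dir</name>
    <value>C:\Nutch\Crawl</value>
  </property>
</configuration>"""

root = ET.fromstring(SAMPLE)
print("searcher.dir =", read_property(root, "searcher.dir"))
for hit in find_windows_paths(SAMPLE):
    print("Windows-style path still present:", hit)
```

To run it against a real file, read the deployed nutch-site.xml (in the logs above it is loaded from /usr/java/tomcat-5.5/webapps/ROOT/WEB-INF/classes/) and pass ET.parse(path).getroot() to read_property. Going by the "opening indexes in /home/nutch-0.8/crawl/indexes" log line, the value on the Linux machine would be expected to be /home/nutch-0.8/crawl.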
