As my Linux server is a virtual dedicated server and it frequently runs into out-of-memory errors, I won't be able to do the fetch there right now. I would need to upgrade the server, or stop all the applications running on it, and then test. That will take time. That is why I was trying to fetch from Windows and move the crawled db into the Linux box.
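For what it's worth, this is roughly how I package and verify the crawl directory when moving it between boxes. The `crawl` directory below is a stand-in built on the spot just to make the sketch self-contained; in practice you would tar the real crawl dir on the Windows side and scp the tarball over.

```shell
# Sketch: package a crawl directory, move it, and verify nothing was
# corrupted in transit.
set -e
workdir=$(mktemp -d)
cd "$workdir"

# Stand-in for the crawl db produced on the Windows side.
mkdir -p crawl/crawldb/current/part-00000
printf 'dummy index data\n' > crawl/crawldb/current/part-00000/data

# Package with tar so the files survive the transfer byte-for-byte
# (dragging the raw files across a Windows share can mangle them).
tar -cf crawl.tar crawl
find crawl -type f | sort | xargs md5sum > before.md5

# --- on the destination box: unpack and re-run the same checksums ---
mkdir dest
tar -xf crawl.tar -C dest
(cd dest && md5sum -c ../before.md5)
echo "transfer verified"
```

If `md5sum -c` reports anything other than `OK` for every file, the problem is the transfer itself rather than Nutch or Tomcat.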
Thanks for the responses.

Sean Dean-3 wrote:
>
> For debugging purposes, could you re-fetch that segment or at least create
> a small new segment and fetch it under Linux?
>
> I want to see if you can get search results from it or not. It might help
> us determine if it's a problem with Nutch, or something else more specific.
>
>
> ----- Original Message ----
> From: kan001 <[EMAIL PROTECTED]>
> To: [email protected]
> Sent: Tuesday, March 6, 2007 11:05:04 AM
> Subject: Re: [SOLVED] moving crawled db from windows to linux
>
>
> I have crawled in Windows and searched with the Tomcat that is installed in
> Windows. It is working perfectly fine.
> Then I moved the same crawled directory and files to Linux and searched with
> the Tomcat that is installed on that Linux machine. It is giving 0 hits. I
> have changed the searcher.dir property and I think it is connecting, because
> in the logs the following statements have been printed... Any idea?
>
> INFO [TP-Processor1] (NutchBean.java:69) - creating new bean
> INFO [TP-Processor1] (NutchBean.java:121) - opening indexes in /home/nutch-0.8/crawl/indexes
> INFO [TP-Processor1] (Configuration.java:360) - found resource common-terms.utf8 at file:/usr/java/tomcat-5.5/webapps/ROOT/WEB-INF/classes/common-terms.utf8
> INFO [TP-Processor1] (NutchBean.java:143) - opening segments in /home/nutch-0.8/crawl/segments
> INFO [TP-Processor1] (SummarizerFactory.java:52) - Using the first summarizer extension found: Basic Summarizer
> INFO [TP-Processor1] (NutchBean.java:154) - opening linkdb in /home/nutch-0.8/crawl/linkdb
> INFO [TP-Processor1] (search_jsp.java:108) - query request from 192.168.1.64
> INFO [TP-Processor1] (search_jsp.java:151) - query:
> INFO [TP-Processor1] (search_jsp.java:152) - lang:
> INFO [TP-Processor1] (NutchBean.java:247) - searching for 20 raw hits
> INFO [TP-Processor1] (search_jsp.java:337) - total hits: 0
>
> INFO [TP-Processor5] (search_jsp.java:108) - query request from 192.168.1.64
> INFO [TP-Processor5] (search_jsp.java:151) - query: ads
> INFO [TP-Processor5] (search_jsp.java:152) - lang: en
> INFO [TP-Processor5] (NutchBean.java:247) - searching for 20 raw hits
> INFO [TP-Processor5] (search_jsp.java:337) - total hits: 0
>
>
> Sean Dean-3 wrote:
>>
>> Everything looks okay in terms of the files.
>>
>> When you copied everything over from Windows, other than the operating
>> system, is there anything different with the software?
>>
>> Maybe you have an old Windows-style path somewhere (C:\Nutch\Crawl)? Also
>> double-check to see if your "searcher.dir" property inside your
>> nutch-site.xml file is correct.
>>
>>
>> ----- Original Message ----
>> From: kan001 <[EMAIL PROTECTED]>
>> To: [email protected]
>> Sent: Monday, March 5, 2007 11:48:56 PM
>> Subject: Re: [SOLVED] moving crawled db from windows to linux
>>
>>
>> Thanks for the immediate reply.
>>
>> Please find the result of the `du -h crawl/` command and the logs below:
>> 32K   crawl/crawldb/current/part-00000
>> 36K   crawl/crawldb/current
>> 40K   crawl/crawldb
>> 120K  crawl/index
>> 128K  crawl/indexes/part-00000
>> 132K  crawl/indexes
>> 52K   crawl/linkdb/current/part-00000
>> 56K   crawl/linkdb/current
>> 60K   crawl/linkdb
>> 40K   crawl/segments/20070228143239/content/part-00000
>> 44K   crawl/segments/20070228143239/content
>> 20K   crawl/segments/20070228143239/crawl_fetch/part-00000
>> 24K   crawl/segments/20070228143239/crawl_fetch
>> 12K   crawl/segments/20070228143239/crawl_generate
>> 12K   crawl/segments/20070228143239/crawl_parse
>> 20K   crawl/segments/20070228143239/parse_data/part-00000
>> 24K   crawl/segments/20070228143239/parse_data
>> 24K   crawl/segments/20070228143239/parse_text/part-00000
>> 28K   crawl/segments/20070228143239/parse_text
>> 148K  crawl/segments/20070228143239
>> 136K  crawl/segments/20070228143249/content/part-00000
>> 140K  crawl/segments/20070228143249/content
>> 20K   crawl/segments/20070228143249/crawl_fetch/part-00000
>> 24K   crawl/segments/20070228143249/crawl_fetch
>> 12K   crawl/segments/20070228143249/crawl_generate
>> 28K   crawl/segments/20070228143249/crawl_parse
>> 32K   crawl/segments/20070228143249/parse_data/part-00000
>> 36K   crawl/segments/20070228143249/parse_data
>> 44K   crawl/segments/20070228143249/parse_text/part-00000
>> 48K   crawl/segments/20070228143249/parse_text
>> 292K  crawl/segments/20070228143249
>> 20K   crawl/segments/20070228143327/content/part-00000
>> 24K   crawl/segments/20070228143327/content
>> 20K   crawl/segments/20070228143327/crawl_fetch/part-00000
>> 24K   crawl/segments/20070228143327/crawl_fetch
>> 16K   crawl/segments/20070228143327/crawl_generate
>> 12K   crawl/segments/20070228143327/crawl_parse
>> 20K   crawl/segments/20070228143327/parse_data/part-00000
>> 24K   crawl/segments/20070228143327/parse_data
>> 20K   crawl/segments/20070228143327/parse_text/part-00000
>> 24K   crawl/segments/20070228143327/parse_text
>> 128K  crawl/segments/20070228143327
>> 20K   crawl/segments/20070228143434/content/part-00000
>> 24K   crawl/segments/20070228143434/content
>> 20K   crawl/segments/20070228143434/crawl_fetch/part-00000
>> 24K   crawl/segments/20070228143434/crawl_fetch
>> 16K   crawl/segments/20070228143434/crawl_generate
>> 12K   crawl/segments/20070228143434/crawl_parse
>> 20K   crawl/segments/20070228143434/parse_data/part-00000
>> 24K   crawl/segments/20070228143434/parse_data
>> 20K   crawl/segments/20070228143434/parse_text/part-00000
>> 24K   crawl/segments/20070228143434/parse_text
>> 128K  crawl/segments/20070228143434
>> 700K  crawl/segments
>> 1.1M  crawl/
>>
>> INFO [TP-Processor1] (Configuration.java:397) - parsing jar:file:/usr/java/tomcat-5.5/webapps/ROOT/WEB-INF/lib/hadoop-0.4.0.jar!/hadoop-default.xml
>> INFO [TP-Processor1] (Configuration.java:397) - parsing file:/usr/java/tomcat-5.5/webapps/ROOT/WEB-INF/classes/nutch-default.xml
>> INFO [TP-Processor1] (Configuration.java:397) - parsing file:/usr/java/tomcat-5.5/webapps/ROOT/WEB-INF/classes/nutch-site.xml
>> INFO [TP-Processor1] (Configuration.java:397) - parsing file:/usr/java/tomcat-5.5/webapps/ROOT/WEB-INF/classes/hadoop-site.xml
>> INFO [TP-Processor1] (PluginManifestParser.java:81) - Plugins: looking in: /usr/java/tomcat-5.5/webapps/ROOT/WEB-INF/classes/plugins
>> INFO [TP-Processor1] (PluginRepository.java:333) - Plugin Auto-activation mode: [true]
>> INFO [TP-Processor1] (PluginRepository.java:334) - Registered Plugins:
>> INFO [TP-Processor1] (PluginRepository.java:341) - CyberNeko HTML Parser (lib-nekohtml)
>> INFO [TP-Processor1] (PluginRepository.java:341) - Site Query Filter (query-site)
>> INFO [TP-Processor1] (PluginRepository.java:341) - Html Parse Plug-in (parse-html)
>> INFO [TP-Processor1] (PluginRepository.java:341) - Regex URL Filter Framework (lib-regex-filter)
>> INFO [TP-Processor1] (PluginRepository.java:341) - Basic Indexing Filter (index-basic)
>> INFO [TP-Processor1] (PluginRepository.java:341) - Basic Summarizer Plug-in (summary-basic)
>> INFO [TP-Processor1] (PluginRepository.java:341) - Text Parse Plug-in (parse-text)
>> INFO [TP-Processor1] (PluginRepository.java:341) - JavaScript Parser (parse-js)
>> INFO [TP-Processor1] (PluginRepository.java:341) - Regex URL Filter (urlfilter-regex)
>> INFO [TP-Processor1] (PluginRepository.java:341) - Basic Query Filter (query-basic)
>> INFO [TP-Processor1] (PluginRepository.java:341) - HTTP Framework (lib-http)
>> INFO [TP-Processor1] (PluginRepository.java:341) - URL Query Filter (query-url)
>> INFO [TP-Processor1] (PluginRepository.java:341) - Http Protocol Plug-in (protocol-http)
>> INFO [TP-Processor1] (PluginRepository.java:341) - the nutch core extension points (nutch-extensionpoints)
>> INFO [TP-Processor1] (PluginRepository.java:341) - OPIC Scoring Plug-in (scoring-opic)
>> INFO [TP-Processor1] (PluginRepository.java:345) - Registered Extension-Points:
>> INFO [TP-Processor1] (PluginRepository.java:352) - Nutch Summarizer (org.apache.nutch.searcher.Summarizer)
>> INFO [TP-Processor1] (PluginRepository.java:352) - Nutch Scoring (org.apache.nutch.scoring.ScoringFilter)
>> INFO [TP-Processor1] (PluginRepository.java:352) - Nutch Protocol (org.apache.nutch.protocol.Protocol)
>> INFO [TP-Processor1] (PluginRepository.java:352) - Nutch URL Filter (org.apache.nutch.net.URLFilter)
>> INFO [TP-Processor1] (PluginRepository.java:352) - HTML Parse Filter (org.apache.nutch.parse.HtmlParseFilter)
>> INFO [TP-Processor1] (PluginRepository.java:352) - Nutch Online Search Results Clustering Plugin (org.apache.nutch.clustering.OnlineClusterer)
>> INFO [TP-Processor1] (PluginRepository.java:352) - Nutch Indexing Filter (org.apache.nutch.indexer.IndexingFilter)
>> INFO [TP-Processor1] (PluginRepository.java:352) - Nutch Content Parser (org.apache.nutch.parse.Parser)
>> INFO [TP-Processor1] (PluginRepository.java:352) - Ontology Model Loader (org.apache.nutch.ontology.Ontology)
>> INFO [TP-Processor1] (PluginRepository.java:352) - Nutch Analysis (org.apache.nutch.analysis.NutchAnalyzer)
>> INFO [TP-Processor1] (PluginRepository.java:352) - Nutch Query Filter (org.apache.nutch.searcher.QueryFilter)
>> INFO [TP-Processor1] (NutchBean.java:69) - creating new bean
>> INFO [TP-Processor1] (NutchBean.java:121) - opening indexes in /home/nutch-0.8/crawl/indexes
>> INFO [TP-Processor1] (Configuration.java:360) - found resource common-terms.utf8 at file:/usr/java/tomcat-5.5/webapps/ROOT/WEB-INF/classes/common-terms.utf8
>> INFO [TP-Processor1] (NutchBean.java:143) - opening segments in /home/nutch-0.8/crawl/segments
>> INFO [TP-Processor1] (SummarizerFactory.java:52) - Using the first summarizer extension found: Basic Summarizer
>> INFO [TP-Processor1] (NutchBean.java:154) - opening linkdb in /home/nutch-0.8/crawl/linkdb
>> INFO [TP-Processor1] (search_jsp.java:108) - query request from 192.168.1.64
>> INFO [TP-Processor1] (search_jsp.java:151) - query:
>> INFO [TP-Processor1] (search_jsp.java:152) - lang:
>> INFO [TP-Processor1] (NutchBean.java:247) - searching for 20 raw hits
>> INFO [TP-Processor1] (search_jsp.java:337) - total hits: 0
>>
>> INFO [TP-Processor5] (search_jsp.java:108) - query request from 192.168.1.64
>> INFO [TP-Processor5] (search_jsp.java:151) - query: ads
>> INFO [TP-Processor5] (search_jsp.java:152) - lang: en
>> INFO [TP-Processor5] (NutchBean.java:247) - searching for 20 raw hits
>> INFO [TP-Processor5] (search_jsp.java:337) - total hits: 0
>>
>>
>> kan001 wrote:
>>>
>>> When I copied the crawled db from Windows to Linux and tried to search
>>> through Tomcat on Linux, it returned 0 hits.
>>> But on Windows it is getting results from the search screen. Any idea? I
>>> have given root permissions to the crawled db.
>>> In the logs it is showing "opening segments..." but hits 0!
>>>
>>
>> --
>> View this message in context: http://www.nabble.com/moving-crawled-db-from-windows-to-linux-tf3350448.html#a9326034
>> Sent from the Nutch - User mailing list archive at Nabble.com.
>>
>
> --
> View this message in context: http://www.nabble.com/moving-crawled-db-from-windows-to-linux-tf3350448.html#a9335094
> Sent from the Nutch - User mailing list archive at Nabble.com.
>

--
View this message in context: http://www.nabble.com/moving-crawled-db-from-windows-to-linux-tf3350448.html#a9343523
Sent from the Nutch - User mailing list archive at Nabble.com.

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
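For anyone hitting the same thing: the two checks Sean suggested (the `searcher.dir` value and leftover Windows-style paths) can be scripted. The `nutch-site.xml` written below is a made-up stand-in so the sketch is self-contained; in real use, point `$conf` at the copy under `/usr/java/tomcat-5.5/webapps/ROOT/WEB-INF/classes/nutch-site.xml` instead.

```shell
# Sketch: verify searcher.dir and scan for stray Windows paths after the move.
set -e
conf=$(mktemp -d)/nutch-site.xml

# Stand-in config; replace with the real nutch-site.xml under WEB-INF/classes.
cat > "$conf" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>searcher.dir</name>
    <value>/home/nutch-0.8/crawl</value>
  </property>
</configuration>
EOF

# Check 1: is searcher.dir set, and does it point at the Linux crawl path?
grep -A1 'searcher.dir' "$conf"

# Check 2: did an old Windows-style path (e.g. C:\Nutch\Crawl) survive the copy?
if grep -q '[A-Za-z]:\\' "$conf"; then
    echo "WARNING: Windows-style path found in $conf"
else
    echo "no Windows-style paths in $conf"
fi
```

If `searcher.dir` looks right and no Windows paths turn up, the next suspect is the crawl data itself rather than the configuration.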
