I will update you once I am done with that testing... just stuck there :(
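In the meantime, one quick sanity check along the lines Sean suggested below is to confirm that the searcher.dir value in nutch-site.xml on the Linux box does not still carry a Windows-style path. A rough sketch (the helper function is hypothetical, not part of Nutch):

```python
# Hypothetical sanity check: flag a searcher.dir value that still looks
# like a Windows path (e.g. C:\Nutch\Crawl) after the move to Linux.
import xml.etree.ElementTree as ET

def searcher_dir_ok(nutch_site_xml: str) -> bool:
    """Return True if searcher.dir is present and looks like a Unix path."""
    root = ET.fromstring(nutch_site_xml)
    for prop in root.iter("property"):
        if prop.findtext("name") == "searcher.dir":
            value = prop.findtext("value", default="")
            # Windows leftovers: a drive letter ("C:") or backslashes.
            return not (":" in value.split("/")[0] or "\\" in value)
    return False  # property missing entirely

conf = """<configuration>
  <property>
    <name>searcher.dir</name>
    <value>/home/nutch-0.8/crawl</value>
  </property>
</configuration>"""
print(searcher_dir_ok(conf))  # True for the Linux-style path above
```

This only checks the shape of the value, of course; the path must also actually exist and be readable by the Tomcat user.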
kan001 wrote:
>
> As my Linux server is a virtual dedicated server and it often runs into
> out-of-memory errors, I won't be able to do the fetch there right now. I
> need to upgrade the server, or stop all the applications running on it,
> and test. This will take time. That is why I was trying to fetch from
> Windows and move the crawled db into the Linux box.
>
> Thanks for the responses.
>
>
> Sean Dean-3 wrote:
>>
>> For debugging purposes, could you re-fetch that segment, or at least
>> create a small new segment and fetch it under Linux?
>>
>> I want to see if you can get search results from it or not. It might
>> help us determine whether it's a problem with Nutch, or something else
>> more specific.
>>
>>
>> ----- Original Message ----
>> From: kan001 <[EMAIL PROTECTED]>
>> To: [email protected]
>> Sent: Tuesday, March 6, 2007 11:05:04 AM
>> Subject: Re: [SOLVED] moving crawled db from windows to linux
>>
>>
>> I have crawled in Windows and searched with the Tomcat that is installed
>> in Windows. It is working perfectly fine.
>> Then I moved the same crawled directory and files to Linux and searched
>> with the Tomcat that is installed on that Linux machine. It is giving 0
>> hits. I have changed the searcher.dir property and I think it is
>> connecting, because the following statements have been printed in the
>> logs... Any idea??
>>
>> INFO [TP-Processor1] (NutchBean.java:69) - creating new bean
>> INFO [TP-Processor1] (NutchBean.java:121) - opening indexes in /home/nutch-0.8/crawl/indexes
>> INFO [TP-Processor1] (Configuration.java:360) - found resource common-terms.utf8 at file:/usr/java/tomcat-5.5/webapps/ROOT/WEB-INF/classes/common-terms.utf8
>> INFO [TP-Processor1] (NutchBean.java:143) - opening segments in /home/nutch-0.8/crawl/segments
>> INFO [TP-Processor1] (SummarizerFactory.java:52) - Using the first summarizer extension found: Basic Summarizer
>> INFO [TP-Processor1] (NutchBean.java:154) - opening linkdb in /home/nutch-0.8/crawl/linkdb
>> INFO [TP-Processor1] (search_jsp.java:108) - query request from 192.168.1.64
>> INFO [TP-Processor1] (search_jsp.java:151) - query:
>> INFO [TP-Processor1] (search_jsp.java:152) - lang:
>> INFO [TP-Processor1] (NutchBean.java:247) - searching for 20 raw hits
>> INFO [TP-Processor1] (search_jsp.java:337) - total hits: 0
>>
>> INFO [TP-Processor5] (search_jsp.java:108) - query request from 192.168.1.64
>> INFO [TP-Processor5] (search_jsp.java:151) - query: ads
>> INFO [TP-Processor5] (search_jsp.java:152) - lang: en
>> INFO [TP-Processor5] (NutchBean.java:247) - searching for 20 raw hits
>> INFO [TP-Processor5] (search_jsp.java:337) - total hits: 0
>>
>>
>> Sean Dean-3 wrote:
>>>
>>> Everything looks okay in terms of the files.
>>>
>>> When you copied everything over from Windows, other than the operating
>>> system, is there anything different with the software?
>>>
>>> Maybe you have an old Windows-style path somewhere (C:\Nutch\Crawl)?
>>> Also double-check that the "searcher.dir" property inside your
>>> nutch-site.xml file is correct.
>>>
>>>
>>> ----- Original Message ----
>>> From: kan001 <[EMAIL PROTECTED]>
>>> To: [email protected]
>>> Sent: Monday, March 5, 2007 11:48:56 PM
>>> Subject: Re: [SOLVED] moving crawled db from windows to linux
>>>
>>>
>>> Thanks for the immediate reply.
>>>
>>> Please find the output of `du -h crawl/` and the logs below:
>>>
>>> 32K   crawl/crawldb/current/part-00000
>>> 36K   crawl/crawldb/current
>>> 40K   crawl/crawldb
>>> 120K  crawl/index
>>> 128K  crawl/indexes/part-00000
>>> 132K  crawl/indexes
>>> 52K   crawl/linkdb/current/part-00000
>>> 56K   crawl/linkdb/current
>>> 60K   crawl/linkdb
>>> 40K   crawl/segments/20070228143239/content/part-00000
>>> 44K   crawl/segments/20070228143239/content
>>> 20K   crawl/segments/20070228143239/crawl_fetch/part-00000
>>> 24K   crawl/segments/20070228143239/crawl_fetch
>>> 12K   crawl/segments/20070228143239/crawl_generate
>>> 12K   crawl/segments/20070228143239/crawl_parse
>>> 20K   crawl/segments/20070228143239/parse_data/part-00000
>>> 24K   crawl/segments/20070228143239/parse_data
>>> 24K   crawl/segments/20070228143239/parse_text/part-00000
>>> 28K   crawl/segments/20070228143239/parse_text
>>> 148K  crawl/segments/20070228143239
>>> 136K  crawl/segments/20070228143249/content/part-00000
>>> 140K  crawl/segments/20070228143249/content
>>> 20K   crawl/segments/20070228143249/crawl_fetch/part-00000
>>> 24K   crawl/segments/20070228143249/crawl_fetch
>>> 12K   crawl/segments/20070228143249/crawl_generate
>>> 28K   crawl/segments/20070228143249/crawl_parse
>>> 32K   crawl/segments/20070228143249/parse_data/part-00000
>>> 36K   crawl/segments/20070228143249/parse_data
>>> 44K   crawl/segments/20070228143249/parse_text/part-00000
>>> 48K   crawl/segments/20070228143249/parse_text
>>> 292K  crawl/segments/20070228143249
>>> 20K   crawl/segments/20070228143327/content/part-00000
>>> 24K   crawl/segments/20070228143327/content
>>> 20K   crawl/segments/20070228143327/crawl_fetch/part-00000
>>> 24K   crawl/segments/20070228143327/crawl_fetch
>>> 16K   crawl/segments/20070228143327/crawl_generate
>>> 12K   crawl/segments/20070228143327/crawl_parse
>>> 20K   crawl/segments/20070228143327/parse_data/part-00000
>>> 24K   crawl/segments/20070228143327/parse_data
>>> 20K   crawl/segments/20070228143327/parse_text/part-00000
>>> 24K   crawl/segments/20070228143327/parse_text
>>> 128K  crawl/segments/20070228143327
>>> 20K   crawl/segments/20070228143434/content/part-00000
>>> 24K   crawl/segments/20070228143434/content
>>> 20K   crawl/segments/20070228143434/crawl_fetch/part-00000
>>> 24K   crawl/segments/20070228143434/crawl_fetch
>>> 16K   crawl/segments/20070228143434/crawl_generate
>>> 12K   crawl/segments/20070228143434/crawl_parse
>>> 20K   crawl/segments/20070228143434/parse_data/part-00000
>>> 24K   crawl/segments/20070228143434/parse_data
>>> 20K   crawl/segments/20070228143434/parse_text/part-00000
>>> 24K   crawl/segments/20070228143434/parse_text
>>> 128K  crawl/segments/20070228143434
>>> 700K  crawl/segments
>>> 1.1M  crawl/
>>>
>>> INFO [TP-Processor1] (Configuration.java:397) - parsing jar:file:/usr/java/tomcat-5.5/webapps/ROOT/WEB-INF/lib/hadoop-0.4.0.jar!/hadoop-default.xml
>>> INFO [TP-Processor1] (Configuration.java:397) - parsing file:/usr/java/tomcat-5.5/webapps/ROOT/WEB-INF/classes/nutch-default.xml
>>> INFO [TP-Processor1] (Configuration.java:397) - parsing file:/usr/java/tomcat-5.5/webapps/ROOT/WEB-INF/classes/nutch-site.xml
>>> INFO [TP-Processor1] (Configuration.java:397) - parsing file:/usr/java/tomcat-5.5/webapps/ROOT/WEB-INF/classes/hadoop-site.xml
>>> INFO [TP-Processor1] (PluginManifestParser.java:81) - Plugins: looking in: /usr/java/tomcat-5.5/webapps/ROOT/WEB-INF/classes/plugins
>>> INFO [TP-Processor1] (PluginRepository.java:333) - Plugin Auto-activation mode: [true]
>>> INFO [TP-Processor1] (PluginRepository.java:334) - Registered Plugins:
>>> INFO [TP-Processor1] (PluginRepository.java:341) - CyberNeko HTML Parser (lib-nekohtml)
>>> INFO [TP-Processor1] (PluginRepository.java:341) - Site Query Filter (query-site)
>>> INFO [TP-Processor1] (PluginRepository.java:341) - Html Parse Plug-in (parse-html)
>>> INFO [TP-Processor1] (PluginRepository.java:341) - Regex URL Filter Framework (lib-regex-filter)
>>> INFO [TP-Processor1] (PluginRepository.java:341) - Basic Indexing Filter (index-basic)
>>> INFO [TP-Processor1] (PluginRepository.java:341) - Basic Summarizer Plug-in (summary-basic)
>>> INFO [TP-Processor1] (PluginRepository.java:341) - Text Parse Plug-in (parse-text)
>>> INFO [TP-Processor1] (PluginRepository.java:341) - JavaScript Parser (parse-js)
>>> INFO [TP-Processor1] (PluginRepository.java:341) - Regex URL Filter (urlfilter-regex)
>>> INFO [TP-Processor1] (PluginRepository.java:341) - Basic Query Filter (query-basic)
>>> INFO [TP-Processor1] (PluginRepository.java:341) - HTTP Framework (lib-http)
>>> INFO [TP-Processor1] (PluginRepository.java:341) - URL Query Filter (query-url)
>>> INFO [TP-Processor1] (PluginRepository.java:341) - Http Protocol Plug-in (protocol-http)
>>> INFO [TP-Processor1] (PluginRepository.java:341) - the nutch core extension points (nutch-extensionpoints)
>>> INFO [TP-Processor1] (PluginRepository.java:341) - OPIC Scoring Plug-in (scoring-opic)
>>> INFO [TP-Processor1] (PluginRepository.java:345) - Registered Extension-Points:
>>> INFO [TP-Processor1] (PluginRepository.java:352) - Nutch Summarizer (org.apache.nutch.searcher.Summarizer)
>>> INFO [TP-Processor1] (PluginRepository.java:352) - Nutch Scoring (org.apache.nutch.scoring.ScoringFilter)
>>> INFO [TP-Processor1] (PluginRepository.java:352) - Nutch Protocol (org.apache.nutch.protocol.Protocol)
>>> INFO [TP-Processor1] (PluginRepository.java:352) - Nutch URL Filter (org.apache.nutch.net.URLFilter)
>>> INFO [TP-Processor1] (PluginRepository.java:352) - HTML Parse Filter (org.apache.nutch.parse.HtmlParseFilter)
>>> INFO [TP-Processor1] (PluginRepository.java:352) - Nutch Online Search Results Clustering Plugin (org.apache.nutch.clustering.OnlineClusterer)
>>> INFO [TP-Processor1] (PluginRepository.java:352) - Nutch Indexing Filter (org.apache.nutch.indexer.IndexingFilter)
>>> INFO [TP-Processor1] (PluginRepository.java:352) - Nutch Content Parser (org.apache.nutch.parse.Parser)
>>> INFO [TP-Processor1] (PluginRepository.java:352) - Ontology Model Loader (org.apache.nutch.ontology.Ontology)
>>> INFO [TP-Processor1] (PluginRepository.java:352) - Nutch Analysis (org.apache.nutch.analysis.NutchAnalyzer)
>>> INFO [TP-Processor1] (PluginRepository.java:352) - Nutch Query Filter (org.apache.nutch.searcher.QueryFilter)
>>> INFO [TP-Processor1] (NutchBean.java:69) - creating new bean
>>> INFO [TP-Processor1] (NutchBean.java:121) - opening indexes in /home/nutch-0.8/crawl/indexes
>>> INFO [TP-Processor1] (Configuration.java:360) - found resource common-terms.utf8 at file:/usr/java/tomcat-5.5/webapps/ROOT/WEB-INF/classes/common-terms.utf8
>>> INFO [TP-Processor1] (NutchBean.java:143) - opening segments in /home/nutch-0.8/crawl/segments
>>> INFO [TP-Processor1] (SummarizerFactory.java:52) - Using the first summarizer extension found: Basic Summarizer
>>> INFO [TP-Processor1] (NutchBean.java:154) - opening linkdb in /home/nutch-0.8/crawl/linkdb
>>> INFO [TP-Processor1] (search_jsp.java:108) - query request from 192.168.1.64
>>> INFO [TP-Processor1] (search_jsp.java:151) - query:
>>> INFO [TP-Processor1] (search_jsp.java:152) - lang:
>>> INFO [TP-Processor1] (NutchBean.java:247) - searching for 20 raw hits
>>> INFO [TP-Processor1] (search_jsp.java:337) - total hits: 0
>>>
>>> INFO [TP-Processor5] (search_jsp.java:108) - query request from 192.168.1.64
>>> INFO [TP-Processor5] (search_jsp.java:151) - query: ads
>>> INFO [TP-Processor5] (search_jsp.java:152) - lang: en
>>> INFO [TP-Processor5] (NutchBean.java:247) - searching for 20 raw hits
>>> INFO [TP-Processor5] (search_jsp.java:337) - total hits: 0
>>>
>>>
>>> kan001 wrote:
>>>>
>>>> When I copied the crawled db from Windows to Linux and tried to search
>>>> through Tomcat on Linux, it returned 0 hits.
>>>> But on Windows it is getting results from the search screen. Any
>>>> idea?? I have given root permissions to the crawled db.
>>>> In the logs it is showing "opening segments"... but hits 0!!!
>>>>
>>>
>>> --
>>> View this message in context:
>>> http://www.nabble.com/moving-crawled-db-from-windows-to-linux-tf3350448.html#a9326034
>>> Sent from the Nutch - User mailing list archive at Nabble.com.
>>
>

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
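For anyone hitting the same 0-hits symptom after copying a crawl directory between machines, a rough post-copy check that the on-disk layout shown in the `du -h` listing above survived the move might look like the sketch below (the helper is hypothetical, not part of Nutch; it assumes the Nutch 0.8 directory names):

```python
# Hypothetical post-copy check for a Nutch 0.8 crawl directory: report any
# of the expected subdirectories (crawldb, indexes, linkdb, and the
# per-segment parts) that are missing after the transfer.
import os

SEGMENT_PARTS = ["content", "crawl_fetch", "crawl_generate",
                 "crawl_parse", "parse_data", "parse_text"]

def missing_parts(crawl_dir: str) -> list:
    """Return relative paths absent from the copied crawl directory."""
    missing = []
    for d in ["crawldb/current", "indexes", "linkdb/current", "segments"]:
        if not os.path.isdir(os.path.join(crawl_dir, d)):
            missing.append(d)
    seg_root = os.path.join(crawl_dir, "segments")
    if os.path.isdir(seg_root):
        # Each timestamped segment should carry all six part directories.
        for seg in sorted(os.listdir(seg_root)):
            for part in SEGMENT_PARTS:
                rel = os.path.join("segments", seg, part)
                if not os.path.isdir(os.path.join(crawl_dir, rel)):
                    missing.append(rel)
    return missing

# Example: missing_parts("/home/nutch-0.8/crawl") should print [] when
# everything made it across intact.
```

An empty result only means the directories exist; it says nothing about file permissions or whether the index contents are readable by the Tomcat user, which would still need checking separately.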
