Hi Sahar,

Can you post your:
1. crawl-urlfilter
2. nutch-site.xml

Also, how are you running the program below?

I'm CC'ing nutch-user@ so the community can benefit from this thread.

Cheers,
Chris

On 1/20/10 1:42 PM, "sahar elkazaz" <saharelka...@hotmail.com> wrote:

Dear Sir,

I have followed all the steps in your article to run Nutch, and I use this Java program to access the segments:

package nutch;

import org.apache.hadoop.conf.Configuration;
import org.apache.nutch.searcher.Hit;
import org.apache.nutch.searcher.HitDetails;
import org.apache.nutch.searcher.Hits;
import org.apache.nutch.searcher.NutchBean;
import org.apache.nutch.searcher.Query;
import org.apache.nutch.searcher.Summary;
import org.apache.nutch.util.NutchConfiguration;

public class nutch {

  /** For debugging. */
  public static void main(String[] args) throws Exception {
    // Loads the Nutch configuration (nutch-default.xml, nutch-site.xml) from the classpath.
    Configuration conf = NutchConfiguration.create();
    NutchBean bean = new NutchBean(conf);

    Query query = Query.parse("animal", conf);
    Hits hits = bean.search(query, 10);
    System.out.println("Total hits: " + hits.getTotal());

    // Fetch details and summaries for at most the first 10 hits.
    int length = (int) Math.min(hits.getTotal(), 10);
    Hit[] show = hits.getHits(0, length);
    HitDetails[] details = bean.getDetails(show);
    Summary[] summaries = bean.getSummary(details, query);

    // Print the details and summary of every returned hit.
    for (int i = 0; i < summaries.length; i++) {
      System.out.println(" " + i + " " + details[i] + "\n" + summaries[i]);
    }
  }
}

I added the Nutch path to the classpath, but I receive these exceptions:

10/01/20 22:29:27 WARN fs.FileSystem: uri=file:///
javax.security.auth.login.LoginException: Login failed:
    at org.apache.hadoop.security.UnixUserGroupInformation.login(UnixUserGroupInformation.java:250)
    at org.apache.hadoop.security.UnixUserGroupInformation.login(UnixUserGroupInformation.java:275)
    at org.apache.hadoop.security.UnixUserGroupInformation.login(UnixUserGroupInformation.java:257)
    at org.apache.hadoop.security.UserGroupInformation.login(UserGroupInformation.java:67)
    at org.apache.hadoop.fs.FileSystem$Cache$Key.<init>(FileSystem.java:1438)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1376)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:215)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:120)
    at org.apache.nutch.searcher.NutchBean.<init>(NutchBean.java:89)
    at org.apache.nutch.searcher.NutchBean.<init>(NutchBean.java:77)
    at nutch.nutch.main(nutch.java:25)
10/01/20 22:29:28 WARN fs.FileSystem: uri=file:///
javax.security.auth.login.LoginException: Login failed:
    at org.apache.hadoop.security.UnixUserGroupInformation.login(UnixUserGroupInformation.java:250)
    at org.apache.hadoop.security.UnixUserGroupInformation.login(UnixUserGroupInformation.java:275)
    at org.apache.hadoop.security.UnixUserGroupInformation.login(UnixUserGroupInformation.java:257)
    at org.apache.hadoop.security.UserGroupInformation.login(UserGroupInformation.java:67)
    at org.apache.hadoop.fs.FileSystem$Cache$Key.<init>(FileSystem.java:1438)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1376)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:215)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:120)
    at org.apache.nutch.searcher.LuceneSearchBean.<init>(LuceneSearchBean.java:50)
    at org.apache.nutch.searcher.NutchBean.<init>(NutchBean.java:102)
    at org.apache.nutch.searcher.NutchBean.<init>(NutchBean.java:77)
    at nutch.nutch.main(nutch.java:25)
10/01/20 22:29:28 INFO searcher.SearchBean: opening indexes in crawl/indexes
10/01/20 22:29:28 WARN fs.FileSystem: uri=file:///
javax.security.auth.login.LoginException: Login failed:
    at org.apache.hadoop.security.UnixUserGroupInformation.login(UnixUserGroupInformation.java:250)
    at org.apache.hadoop.security.UnixUserGroupInformation.login(UnixUserGroupInformation.java:275)
    at org.apache.hadoop.security.UnixUserGroupInformation.login(UnixUserGroupInformation.java:257)
    at org.apache.hadoop.security.UserGroupInformation.login(UserGroupInformation.java:67)
    at org.apache.hadoop.fs.FileSystem$Cache$Key.<init>(FileSystem.java:1438)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1376)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:215)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:120)
    at org.apache.nutch.searcher.IndexSearcher.<init>(IndexSearcher.java:59)
    at org.apache.nutch.searcher.LuceneSearchBean.init(LuceneSearchBean.java:77)
    at org.apache.nutch.searcher.LuceneSearchBean.<init>(LuceneSearchBean.java:51)
    at org.apache.nutch.searcher.NutchBean.<init>(NutchBean.java:102)
    at org.apache.nutch.searcher.NutchBean.<init>(NutchBean.java:77)
    at nutch.nutch.main(nutch.java:25)
10/01/20 22:29:28 INFO plugin.PluginRepository: Plugins: looking in: D:\nutch-1.0\plugins
10/01/20 22:29:28 INFO plugin.PluginRepository: Plugin Auto-activation mode: [true]
10/01/20 22:29:28 INFO plugin.PluginRepository: Registered Plugins:
10/01/20 22:29:28 INFO plugin.PluginRepository:     the nutch core extension points (nutch-extensionpoints)
10/01/20 22:29:28 INFO plugin.PluginRepository:     Basic Query Filter (query-basic)
10/01/20 22:29:28 INFO plugin.PluginRepository:     Basic URL Normalizer (urlnormalizer-basic)
10/01/20 22:29:28 INFO plugin.PluginRepository:     Html Parse Plug-in (parse-html)
10/01/20 22:29:28 INFO plugin.PluginRepository:     Basic Indexing Filter (index-basic)
10/01/20 22:29:28 INFO plugin.PluginRepository:     Site Query Filter (query-site)
10/01/20 22:29:28 INFO plugin.PluginRepository:     Basic Summarizer Plug-in (summary-basic)
10/01/20 22:29:28 INFO plugin.PluginRepository:     HTTP Framework (lib-http)
10/01/20 22:29:28 INFO plugin.PluginRepository:     Text Parse Plug-in (parse-text)
10/01/20 22:29:28 INFO plugin.PluginRepository:     Pass-through URL Normalizer (urlnormalizer-pass)
10/01/20 22:29:28 INFO plugin.PluginRepository:     Regex URL Filter (urlfilter-regex)
10/01/20 22:29:28 INFO plugin.PluginRepository:     Http Protocol Plug-in (protocol-http)
10/01/20 22:29:28 INFO plugin.PluginRepository:     XML Response Writer Plug-in (response-xml)
10/01/20 22:29:28 INFO plugin.PluginRepository:     Regex URL Normalizer (urlnormalizer-regex)
10/01/20 22:29:28 INFO plugin.PluginRepository:     OPIC Scoring Plug-in (scoring-opic)
10/01/20 22:29:28 INFO plugin.PluginRepository:     CyberNeko HTML Parser (lib-nekohtml)
10/01/20 22:29:28 INFO plugin.PluginRepository:     Anchor Indexing Filter (index-anchor)
10/01/20 22:29:28 INFO plugin.PluginRepository:     JavaScript Parser (parse-js)
10/01/20 22:29:28 INFO plugin.PluginRepository:     URL Query Filter (query-url)
10/01/20 22:29:28 INFO plugin.PluginRepository:     Regex URL Filter Framework (lib-regex-filter)
10/01/20 22:29:28 INFO plugin.PluginRepository:     JSON Response Writer Plug-in (response-json)
10/01/20 22:29:28 INFO plugin.PluginRepository: Registered Extension-Points:
10/01/20 22:29:28 INFO plugin.PluginRepository:     Nutch Summarizer (org.apache.nutch.searcher.Summarizer)
10/01/20 22:29:28 INFO plugin.PluginRepository:     Nutch Protocol (org.apache.nutch.protocol.Protocol)
10/01/20 22:29:28 INFO plugin.PluginRepository:     Nutch Analysis (org.apache.nutch.analysis.NutchAnalyzer)
10/01/20 22:29:28 INFO plugin.PluginRepository:     Nutch Field Filter (org.apache.nutch.indexer.field.FieldFilter)
10/01/20 22:29:28 INFO plugin.PluginRepository:     HTML Parse Filter (org.apache.nutch.parse.HtmlParseFilter)
10/01/20 22:29:28 INFO plugin.PluginRepository:     Nutch Query Filter (org.apache.nutch.searcher.QueryFilter)
10/01/20 22:29:28 INFO plugin.PluginRepository:     Nutch Search Results Response Writer (org.apache.nutch.searcher.response.ResponseWriter)
10/01/20 22:29:28 INFO plugin.PluginRepository:     Nutch URL Normalizer (org.apache.nutch.net.URLNormalizer)
10/01/20 22:29:28 INFO plugin.PluginRepository:     Nutch URL Filter (org.apache.nutch.net.URLFilter)
10/01/20 22:29:28 INFO plugin.PluginRepository:     Nutch Online Search Results Clustering Plugin (org.apache.nutch.clustering.OnlineClusterer)
10/01/20 22:29:28 INFO plugin.PluginRepository:     Nutch Indexing Filter (org.apache.nutch.indexer.IndexingFilter)
10/01/20 22:29:28 INFO plugin.PluginRepository:     Nutch Content Parser (org.apache.nutch.parse.Parser)
10/01/20 22:29:28 INFO plugin.PluginRepository:     Nutch Scoring (org.apache.nutch.scoring.ScoringFilter)
10/01/20 22:29:28 INFO plugin.PluginRepository:     Ontology Model Loader (org.apache.nutch.ontology.Ontology)
10/01/20 22:29:28 INFO conf.Configuration: found resource common-terms.utf8 at file:/D:/nutch-1.0/conf/common-terms.utf8
10/01/20 22:29:28 WARN fs.FileSystem: uri=file:///
javax.security.auth.login.LoginException: Login failed:
    at org.apache.hadoop.security.UnixUserGroupInformation.login(UnixUserGroupInformation.java:250)
    at org.apache.hadoop.security.UnixUserGroupInformation.login(UnixUserGroupInformation.java:275)
    at org.apache.hadoop.security.UnixUserGroupInformation.login(UnixUserGroupInformation.java:257)
    at org.apache.hadoop.security.UserGroupInformation.login(UserGroupInformation.java:67)
    at org.apache.hadoop.fs.FileSystem$Cache$Key.<init>(FileSystem.java:1438)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1376)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:215)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:120)
    at org.apache.nutch.searcher.FetchedSegments.<init>(FetchedSegments.java:204)
    at org.apache.nutch.searcher.NutchBean.<init>(NutchBean.java:110)
    at org.apache.nutch.searcher.NutchBean.<init>(NutchBean.java:77)
    at nutch.nutch.main(nutch.java:25)
10/01/20 22:29:28 INFO searcher.SummarizerFactory: Using the first summarizer extension found: Basic Summarizer
10/01/20 22:29:28 WARN fs.FileSystem: uri=file:///
javax.security.auth.login.LoginException: Login failed:
    at org.apache.hadoop.security.UnixUserGroupInformation.login(UnixUserGroupInformation.java:250)
    at org.apache.hadoop.security.UnixUserGroupInformation.login(UnixUserGroupInformation.java:275)
    at org.apache.hadoop.security.UnixUserGroupInformation.login(UnixUserGroupInformation.java:257)
    at org.apache.hadoop.security.UserGroupInformation.login(UserGroupInformation.java:67)
    at org.apache.hadoop.fs.FileSystem$Cache$Key.<init>(FileSystem.java:1438)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1376)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:215)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:120)
    at org.apache.nutch.crawl.LinkDbReader.init(LinkDbReader.java:59)
    at org.apache.nutch.crawl.LinkDbReader.<init>(LinkDbReader.java:55)
    at org.apache.nutch.searcher.LinkDbInlinks.<init>(LinkDbInlinks.java:42)
    at org.apache.nutch.searcher.NutchBean.<init>(NutchBean.java:113)
    at org.apache.nutch.searcher.NutchBean.<init>(NutchBean.java:77)
    at nutch.nutch.main(nutch.java:25)
Total hits: 0

I am using NetBeans 6.8.

Best regards,
Sahar Elkazaz
Demonstrator and M.Sc. Student
Faculty of Electronic Engineering
Computer Science & Engineering Dept.
Menoufia Univ., Egypt
Email: saharelka...@hotmail.com

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.mattm...@jpl.nasa.gov
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
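The log in this thread shows NutchBean "opening indexes in crawl/indexes", which is a relative path, and the run ends with "Total hits: 0". When a program is launched from an IDE such as NetBeans, the working directory is typically the project folder rather than D:\nutch-1.0, so one thing worth ruling out is that the relative crawl directory is simply not where the JVM is looking. The stand-alone sketch below uses a hypothetical class, CrawlDirCheck, and only the JDK (no Nutch jars). It prints the working directory and reports which expected subdirectories are missing. Only "indexes" is confirmed by the log; "segments" and "linkdb" are assumptions based on the usual crawl layout. The sketch also assumes the local Hadoop filesystem resolves relative paths against the same working directory.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class CrawlDirCheck {

    // Assumed subdirectory names from the usual Nutch "crawl" layout;
    // only "crawl/indexes" is confirmed by the log in this thread.
    static final String[] SUBDIRS = {"indexes", "segments", "linkdb"};

    /** Returns the expected subdirectories that do not exist under crawlDir. */
    static List<String> missingSubdirs(File crawlDir) {
        List<String> missing = new ArrayList<String>();
        for (String sub : SUBDIRS) {
            if (!new File(crawlDir, sub).isDirectory()) {
                missing.add(sub);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // A relative path such as "crawl" is resolved against the JVM working
        // directory (the "user.dir" system property).
        File crawlDir = new File(args.length > 0 ? args[0] : "crawl").getAbsoluteFile();
        System.out.println("Working directory: " + System.getProperty("user.dir"));
        System.out.println("Checking: " + crawlDir);

        List<String> missing = missingSubdirs(crawlDir);
        if (missing.isEmpty()) {
            System.out.println("All expected subdirectories found.");
        } else {
            System.out.println("Missing: " + missing);
        }
    }
}
```

Run it from the same NetBeans project with no arguments to check the default relative "crawl" path, or pass an absolute path such as D:\nutch-1.0\crawl to compare.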