Hi Sahar,

Can you post your:


 1.  crawl-urlfilter
 2.  nutch-site.xml

Also, how are you running the program below?

I'm CC'ing nutch-user@ so the community can benefit from this thread.
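
In the meantime, one thing that may be worth ruling out: the log below shows the searcher opening indexes in crawl/indexes, which is a relative path, so under an IDE it is resolved against whatever working directory the IDE happens to use. Here is a minimal sketch of a check you could run from the same working directory. CrawlDirCheck and missingParts are hypothetical names (not part of Nutch), and the list of sub-directories is my assumption about what a Nutch 1.0 crawl directory contains.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of Nutch: checks that a crawl directory looks searchable. */
public class CrawlDirCheck {

    /** Returns the names of the expected sub-directories that are missing under crawlDir. */
    public static List<String> missingParts(Path crawlDir) {
        List<String> missing = new ArrayList<>();
        // Assumption: a crawl directory holds "indexes" (or a merged "index"),
        // plus "segments" and "linkdb".
        if (!Files.isDirectory(crawlDir.resolve("indexes"))
                && !Files.isDirectory(crawlDir.resolve("index"))) {
            missing.add("indexes");
        }
        for (String name : new String[] {"segments", "linkdb"}) {
            if (!Files.isDirectory(crawlDir.resolve(name))) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // "crawl" is resolved against the JVM's working directory,
        // the same way the relative path in the log would be.
        Path crawlDir = Paths.get(args.length > 0 ? args[0] : "crawl").toAbsolutePath();
        System.out.println("Looking in: " + crawlDir);
        List<String> missing = missingParts(crawlDir);
        System.out.println(missing.isEmpty()
                ? "Crawl directory looks complete."
                : "Missing: " + missing);
    }
}
```

If this reports missing directories, the program is probably being started from a different directory than the one the crawl was written to; that would be consistent with "Total hits: 0", though it is only a guess until we see the configuration.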

Cheers,
Chris



On 1/20/10 1:42 PM, "sahar elkazaz" <saharelka...@hotmail.com> wrote:


Dear Sir,

I have followed all the steps in your article to run Nutch

and I use this Java program to access the segments:

package nutch;

import org.apache.hadoop.conf.Configuration;
import org.apache.nutch.searcher.Hit;
import org.apache.nutch.searcher.HitDetails;
import org.apache.nutch.searcher.Hits;
import org.apache.nutch.searcher.NutchBean;
import org.apache.nutch.searcher.Query;
import org.apache.nutch.searcher.Summary;
import org.apache.nutch.util.NutchConfiguration;

public class nutch {
  /** For debugging. */
  public static void main(String[] args) throws Exception {
    // Load the Nutch configuration from the classpath.
    Configuration conf = NutchConfiguration.create();
    NutchBean bean = new NutchBean(conf);
    Query query = Query.parse("animal", conf);
    // Ask for at most 10 hits.
    Hits hits = bean.search(query, 10);
    System.out.println("Total hits: " + hits.getTotal());
    int length = (int) Math.min(hits.getTotal(), 10);
    Hit[] show = hits.getHits(0, length);
    HitDetails[] details = bean.getDetails(show);
    Summary[] summaries = bean.getSummary(details, query);
    // Print every hit, including the last one.
    for (int i = 0; i < summaries.length; i++) {
      System.out.println(" " + i + " " + details[i] + "\n" + summaries[i]);
    }
  }
}


I also added the Nutch path to the classpath,

but I receive these exceptions:
10/01/20 22:29:27 WARN fs.FileSystem: uri=file:///
javax.security.auth.login.LoginException: Login failed:
        at 
org.apache.hadoop.security.UnixUserGroupInformation.login(UnixUserGroupInformation.java:250)
        at 
org.apache.hadoop.security.UnixUserGroupInformation.login(UnixUserGroupInformation.java:275)
        at 
org.apache.hadoop.security.UnixUserGroupInformation.login(UnixUserGroupInformation.java:257)
        at 
org.apache.hadoop.security.UserGroupInformation.login(UserGroupInformation.java:67)
        at 
org.apache.hadoop.fs.FileSystem$Cache$Key.<init>(FileSystem.java:1438)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1376)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:215)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:120)
        at org.apache.nutch.searcher.NutchBean.<init>(NutchBean.java:89)
        at org.apache.nutch.searcher.NutchBean.<init>(NutchBean.java:77)
        at nutch.nutch.main(nutch.java:25)
10/01/20 22:29:28 WARN fs.FileSystem: uri=file:///
javax.security.auth.login.LoginException: Login failed:
        at 
org.apache.hadoop.security.UnixUserGroupInformation.login(UnixUserGroupInformation.java:250)
        at 
org.apache.hadoop.security.UnixUserGroupInformation.login(UnixUserGroupInformation.java:275)
        at 
org.apache.hadoop.security.UnixUserGroupInformation.login(UnixUserGroupInformation.java:257)
        at 
org.apache.hadoop.security.UserGroupInformation.login(UserGroupInformation.java:67)
        at 
org.apache.hadoop.fs.FileSystem$Cache$Key.<init>(FileSystem.java:1438)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1376)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:215)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:120)
        at 
org.apache.nutch.searcher.LuceneSearchBean.<init>(LuceneSearchBean.java:50)
        at org.apache.nutch.searcher.NutchBean.<init>(NutchBean.java:102)
        at org.apache.nutch.searcher.NutchBean.<init>(NutchBean.java:77)
        at nutch.nutch.main(nutch.java:25)
10/01/20 22:29:28 INFO searcher.SearchBean: opening indexes in crawl/indexes
10/01/20 22:29:28 WARN fs.FileSystem: uri=file:///
javax.security.auth.login.LoginException: Login failed:
        at 
org.apache.hadoop.security.UnixUserGroupInformation.login(UnixUserGroupInformation.java:250)
        at 
org.apache.hadoop.security.UnixUserGroupInformation.login(UnixUserGroupInformation.java:275)
        at 
org.apache.hadoop.security.UnixUserGroupInformation.login(UnixUserGroupInformation.java:257)
        at 
org.apache.hadoop.security.UserGroupInformation.login(UserGroupInformation.java:67)
        at 
org.apache.hadoop.fs.FileSystem$Cache$Key.<init>(FileSystem.java:1438)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1376)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:215)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:120)
        at org.apache.nutch.searcher.IndexSearcher.<init>(IndexSearcher.java:59)
        at 
org.apache.nutch.searcher.LuceneSearchBean.init(LuceneSearchBean.java:77)
        at 
org.apache.nutch.searcher.LuceneSearchBean.<init>(LuceneSearchBean.java:51)
        at org.apache.nutch.searcher.NutchBean.<init>(NutchBean.java:102)
        at org.apache.nutch.searcher.NutchBean.<init>(NutchBean.java:77)
        at nutch.nutch.main(nutch.java:25)
10/01/20 22:29:28 INFO plugin.PluginRepository: Plugins: looking in: 
D:\nutch-1.0\plugins
10/01/20 22:29:28 INFO plugin.PluginRepository: Plugin Auto-activation mode: 
[true]
10/01/20 22:29:28 INFO plugin.PluginRepository: Registered Plugins:
10/01/20 22:29:28 INFO plugin.PluginRepository:         the nutch core 
extension points (nutch-extensionpoints)
10/01/20 22:29:28 INFO plugin.PluginRepository:         Basic Query Filter 
(query-basic)
10/01/20 22:29:28 INFO plugin.PluginRepository:         Basic URL Normalizer 
(urlnormalizer-basic)
10/01/20 22:29:28 INFO plugin.PluginRepository:         Html Parse Plug-in 
(parse-html)
10/01/20 22:29:28 INFO plugin.PluginRepository:         Basic Indexing Filter 
(index-basic)
10/01/20 22:29:28 INFO plugin.PluginRepository:         Site Query Filter 
(query-site)
10/01/20 22:29:28 INFO plugin.PluginRepository:         Basic Summarizer 
Plug-in (summary-basic)
10/01/20 22:29:28 INFO plugin.PluginRepository:         HTTP Framework 
(lib-http)
10/01/20 22:29:28 INFO plugin.PluginRepository:         Text Parse Plug-in 
(parse-text)
10/01/20 22:29:28 INFO plugin.PluginRepository:         Pass-through URL 
Normalizer (urlnormalizer-pass)
10/01/20 22:29:28 INFO plugin.PluginRepository:         Regex URL Filter 
(urlfilter-regex)
10/01/20 22:29:28 INFO plugin.PluginRepository:         Http Protocol Plug-in 
(protocol-http)
10/01/20 22:29:28 INFO plugin.PluginRepository:         XML Response 
Writer Plug-in (response-xml)
10/01/20 22:29:28 INFO plugin.PluginRepository:         Regex URL Normalizer 
(urlnormalizer-regex)
10/01/20 22:29:28 INFO plugin.PluginRepository:         OPIC Scoring Plug-in 
(scoring-opic)
10/01/20 22:29:28 INFO plugin.PluginRepository:         CyberNeko HTML Parser 
(lib-nekohtml)
10/01/20 22:29:28 INFO plugin.PluginRepository:         Anchor Indexing Filter 
(index-anchor)
10/01/20 22:29:28 INFO plugin.PluginRepository:         JavaScript Parser 
(parse-js)
10/01/20 22:29:28 INFO plugin.PluginRepository:         URL Query Filter 
(query-url)
10/01/20 22:29:28 INFO plugin.PluginRepository:         Regex URL Filter 
Framework (lib-regex-filter)
10/01/20 22:29:28 INFO plugin.PluginRepository:         JSON Response Writer 
Plug-in (response-json)
10/01/20 22:29:28 INFO plugin.PluginRepository: Registered Extension-Points:
10/01/20 22:29:28 INFO plugin.PluginRepository:         Nutch Summarizer 
(org.apache.nutch.searcher.Summarizer)
10/01/20 22:29:28 INFO plugin.PluginRepository:         Nutch Protocol 
(org.apache.nutch.protocol.Protocol)
10/01/20 22:29:28 INFO plugin.PluginRepository:         Nutch Analysis 
(org.apache.nutch.analysis.NutchAnalyzer)
10/01/20 22:29:28 INFO plugin.PluginRepository:         Nutch Field Filter 
(org.apache.nutch.indexer.field.FieldFilter)
10/01/20 22:29:28 INFO plugin.PluginRepository:         HTML Parse Filter 
(org.apache.nutch.parse.HtmlParseFilter)
10/01/20 22:29:28 INFO plugin.PluginRepository:         Nutch Query Filter 
(org.apache.nutch.searcher.QueryFilter)
10/01/20 22:29:28 INFO plugin.PluginRepository:         Nutch Search Results 
Response Writer (org.apache.nutch.searcher.response.ResponseWriter)
10/01/20 22:29:28 INFO plugin.PluginRepository:         Nutch URL Normalizer 
(org.apache.nutch.net.URLNormalizer)
10/01/20 22:29:28 INFO plugin.PluginRepository:         Nutch URL Filter 
(org.apache.nutch.net.URLFilter)
10/01/20 22:29:28 INFO plugin.PluginRepository:         Nutch Online Search 
Results Clustering Plugin (org.apache.nutch.clustering.OnlineClusterer)
10/01/20 22:29:28 INFO plugin.PluginRepository:         Nutch Indexing Filter 
(org.apache.nutch.indexer.IndexingFilter)
10/01/20 22:29:28 INFO plugin.PluginRepository:         Nutch Content Parser 
(org.apache.nutch.parse.Parser)
10/01/20 22:29:28 INFO plugin.PluginRepository:         Nutch Scoring 
(org.apache.nutch.scoring.ScoringFilter)
10/01/20 22:29:28 INFO plugin.PluginRepository:         Ontology Model Loader 
(org.apache.nutch.ontology.Ontology)
10/01/20 22:29:28 INFO conf.Configuration: found resource common-terms.utf8 at 
file:/D:/nutch-1.0/conf/common-terms.utf8
10/01/20 22:29:28 WARN fs.FileSystem: uri=file:///
javax.security.auth.login.LoginException: Login failed:
        at 
org.apache.hadoop.security.UnixUserGroupInformation.login(UnixUserGroupInformation.java:250)
        at 
org.apache.hadoop.security.UnixUserGroupInformation.login(UnixUserGroupInformation.java:275)
        at 
org.apache.hadoop.security.UnixUserGroupInformation.login(UnixUserGroupInformation.java:257)
        at 
org.apache.hadoop.security.UserGroupInformation.login(UserGroupInformation.java:67)
        at 
org.apache.hadoop.fs.FileSystem$Cache$Key.<init>(FileSystem.java:1438)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1376)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:215)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:120)
        at 
org.apache.nutch.searcher.FetchedSegments.<init>(FetchedSegments.java:204)
        at org.apache.nutch.searcher.NutchBean.<init>(NutchBean.java:110)
        at org.apache.nutch.searcher.NutchBean.<init>(NutchBean.java:77)
        at nutch.nutch.main(nutch.java:25)
10/01/20 22:29:28 INFO searcher.SummarizerFactory: Using the first summarizer 
extension found: Basic Summarizer
10/01/20 22:29:28 WARN fs.FileSystem: uri=file:///
javax.security.auth.login.LoginException: Login failed:
        at 
org.apache.hadoop.security.UnixUserGroupInformation.login(UnixUserGroupInformation.java:250)
        at 
org.apache.hadoop.security.UnixUserGroupInformation.login(UnixUserGroupInformation.java:275)
        at 
org.apache.hadoop.security.UnixUserGroupInformation.login(UnixUserGroupInformation.java:257)
        at 
org.apache.hadoop.security.UserGroupInformation.login(UserGroupInformation.java:67)
        at 
org.apache.hadoop.fs.FileSystem$Cache$Key.<init>(FileSystem.java:1438)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1376)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:215)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:120)
        at org.apache.nutch.crawl.LinkDbReader.init(LinkDbReader.java:59)
        at org.apache.nutch.crawl.LinkDbReader.<init>(LinkDbReader.java:55)
        at org.apache.nutch.searcher.LinkDbInlinks.<init>(LinkDbInlinks.java:42)
        at org.apache.nutch.searcher.NutchBean.<init>(NutchBean.java:113)
        at org.apache.nutch.searcher.NutchBean.<init>(NutchBean.java:77)
        at nutch.nutch.main(nutch.java:25)
Total hits: 0

I am using NetBeans 6.8.

Best regards,
Sahar Elkazaz
Demonstrator and Ms.c Student.
Faculty of Electronic Engineering.
Computer Science & Engineering Dep.
Menoufia Univ.
Egypt.
Email: saharelka...@hotmail.com





++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.mattm...@jpl.nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
