and my hadoop.log reads

2012-11-28 16:28:21,735 WARN  mapred.FileOutputCommitter - Output path is
null in cleanup
2012-11-28 16:28:22,804 INFO  mapreduce.GoraRecordReader -
gora.buffer.read.limit = 10000
2012-11-28 16:28:25,804 INFO  mapreduce.GoraRecordWriter -
gora.buffer.write.limit = 10000
2012-11-28 16:28:25,805 INFO  crawl.FetchScheduleFactory - Using
FetchSchedule impl: org.apache.nutch.crawl.DefaultFetchSchedule
2012-11-28 16:28:25,805 INFO  crawl.AbstractFetchSchedule -
defaultInterval=2592000
2012-11-28 16:28:25,805 INFO  crawl.AbstractFetchSchedule -
maxInterval=7776000
2012-11-28 16:28:28,789 WARN  mapred.FileOutputCommitter - Output path is
null in cleanup
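
Looking at the stack trace quoted below (Configuration.set → Properties.setProperty → Hashtable.put), the usual trigger is a null value being handed to Configuration.set — for example a Solr URL that never made it into the job. A minimal sketch of the mechanism, using plain JDK classes rather than Nutch code (the property name here is only an illustration, not necessarily the exact key Nutch uses):

```java
import java.util.Properties;

// Minimal reproduction of the failure mode: Hadoop's Configuration.set(name, value)
// ultimately stores into a java.util.Properties, which is a Hashtable and
// rejects null keys and null values with a NullPointerException.
public class ConfigNpeSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        String solrUrl = null; // e.g. a -solr argument that was never parsed/forwarded
        try {
            // Illustrative key only; a null value throws from Hashtable.put
            props.setProperty("solr.server.url", solrUrl);
        } catch (NullPointerException e) {
            System.out.println("NPE from Hashtable.put on a null value");
        }
    }
}
```

If that matches, my first guess would be to check how the -solr URL is passed on the command line and whether the Crawler actually forwards it to the indexing step.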


On Wed, Nov 28, 2012 at 4:35 PM, Nicholas Roberts <
[email protected]> wrote:

> Correction, my mistake: I am getting a different NullPointerException
>
> Exception in thread "main" java.lang.NullPointerException
>         at java.util.Hashtable.put(Hashtable.java:411)
>         at java.util.Properties.setProperty(Properties.java:160)
>         at org.apache.hadoop.conf.Configuration.set(Configuration.java:438)
>         at org.apache.nutch.indexer.IndexerJob.createIndexJob(IndexerJob.java:128)
>         at org.apache.nutch.indexer.solr.SolrIndexerJob.run(SolrIndexerJob.java:44)
>         at org.apache.nutch.crawl.Crawler.runTool(Crawler.java:68)
>         at org.apache.nutch.crawl.Crawler.run(Crawler.java:192)
>         at org.apache.nutch.crawl.Crawler.run(Crawler.java:250)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>         at org.apache.nutch.crawl.Crawler.main(Crawler.java:257)
>
>
>
> On Wed, Nov 28, 2012 at 4:18 PM, Nicholas Roberts <
> [email protected]> wrote:
>
>> I am working from this tutorial and get a similar error
>> http://nlp.solutions.asia/?p=180
>>
>>
>> On Fri, Nov 2, 2012 at 1:13 PM, cocofan <[email protected]> wrote:
>>
>>> On 12-11-02 12:45 PM, Lewis John Mcgibbney wrote:
>>>
>>>> Hi,
>>>>
>>>> On Fri, Nov 2, 2012 at 5:36 PM, cocofan <[email protected]> wrote:
>>>>
>>>>  2012-11-01 14:46:52,027 ERROR security.UserGroupInformation -
>>>>> PriviledgedActionException as:cocofan
>>>>>
>>>> I've never seen this Exception before...honestly.
>>>>
>>>>> cause: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input
>>>>> path does not exist:
>>>>> file:/home/cocofan/Dropbox/project/apache-nutch-2.1/runtime/local/bin/urls
>>>>> 2012-11-01 14:46:52,027 ERROR crawl.InjectorJob - InjectorJob:
>>>>> org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path
>>>>> does not exist:
>>>> The rest seems pretty straightforward. You appear to be running
>>>> nutch from $NUTCH_HOME/runtime/local/bin with the following command:
>>>> ./nutch XYZ
>>>>
>>> I am running nutch from /runtime/local and I do have the urls
>>> directory in both /runtime/local/bin and /runtime/local (with the
>>> seed.txt file in both).
>>>
>>> The command I'm using is (from /runtime/local):
>>> ./bin/nutch crawl urls -solr http://localhost:8983/solr/ -depth 3 -topN 5
>>>
>>> Actually it seems to be a problem with hadoop, so I was wondering
>>> if I need to set a directory in a config file there?
>>>
>>>> Unless your urls directory is located in the ./bin directory (which I
>>>> doubt it is), you should come up one directory and run the command
>>>> from $NUTCH_HOME/runtime/local, e.g. ./bin/nutch XYZ
>>>>
>>>> Does this make sense? Please read the tutorial carefully and
>>>> thoroughly and it will work perfectly.
>>>>
>>>> hth
>>>>
>>>> Lewis
>>>>
>>>>
>>>
>>
>>
>>
>
>
>
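
For the "Input path does not exist" part earlier in the thread: the urls argument is resolved relative to the directory you launch nutch from, which is why running from runtime/local/bin fails even when a urls directory exists there too. A quick sketch of the layout (directory names taken from the thread; the actual crawl command is commented out since it needs a full Nutch install):

```shell
# Sketch of the layout the injector expects. The 'urls' argument is a
# relative path, so launch from runtime/local, not runtime/local/bin.
mkdir -p runtime/local/urls
echo "http://example.com/" > runtime/local/urls/seed.txt

cd runtime/local
ls urls/seed.txt   # this relative path is what the injector opens
# ./bin/nutch crawl urls -solr http://localhost:8983/solr/ -depth 3 -topN 5
```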


-- 
Nicholas Roberts
US 510-684-8264
http://Permaculture.TV
