+1
This is a ridiculous tmp size for a crawldb this small.
There is clearly something wrong.

On Friday, February 8, 2013, Tejas Patil <[email protected]> wrote:
> I don't think there is any such property. Maybe it's time for you to clean
> up /tmp :)
>
> Thanks,
> Tejas Patil
>
>
> On Fri, Feb 8, 2013 at 11:16 AM, Eyeris Rodriguez Rueda <[email protected]>
> wrote:
>
>> Hi Lewis and Tejas again.
>> I have set the hadoop.tmp.dir property, but Nutch is still consuming too
>> much space for me.
>> Is it possible to reduce the space Nutch uses in my tmp folder with some
>> property of the fetcher process? I always get an exception because the hard
>> disk is full. My crawldb is only 150 MB, but my tmp folder keeps growing
>> without control until it reaches 60 GB, and it fails at that point.
>> Please, any help.
>>
>>
>>
>>
>> ----- Original Message -----
>> From: "Eyeris Rodriguez Rueda" <[email protected]>
>> To: [email protected]
>> Sent: Friday, February 8, 2013 10:45:52
>> Subject: Re: Could not find any valid local directory for output/file.out
>>
>> Thanks a lot, Lewis and Tejas, you have been very helpful.
>> It works OK now; I have pointed it to another partition.
>> Problem solved.
>>
>>
>>
>>
>>
>> ----- Original Message -----
>> From: "Tejas Patil" <[email protected]>
>> To: [email protected]
>> Sent: Thursday, February 7, 2013 16:32:33
>> Subject: Re: Could not find any valid local directory for output/file.out
>>
>> On Thu, Feb 7, 2013 at 12:47 PM, Eyeris Rodriguez Rueda <[email protected]>
>> wrote:
>>
>> > Thanks to all for your replies.
>> > If I want to change the default location for Hadoop jobs (/tmp), where
>> > can I do that? My nutch-site.xml does not include anything pointing
>> > to /tmp.
>> >
>> Add this property to nutch-site.xml with an appropriate value:
>>
>> <property>
>>   <name>hadoop.tmp.dir</name>
>>   <value>XXXXXXXXXX</value>
>> </property>
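For example (the path here is hypothetical; any local directory on a partition with ample free space works), the filled-in property might look like:

```xml
<!-- Example value for hadoop.tmp.dir; /data/hadoop-tmp is a hypothetical
     path on a partition with plenty of free space. -->
<property>
  <name>hadoop.tmp.dir</name>
  <value>/data/hadoop-tmp</value>
</property>
```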
>>
>>
>>
>> > So I have read about Nutch and Hadoop, but I'm not sure I understand it
>> > all. Is it possible to use Nutch 1.5.1 in distributed mode?
>>
>> yes
>>
>>
>> > In this case, what do I need to do for that? I really appreciate your
>> > answer because I can't find good documentation for this topic.
>> >
>> For distributed mode, Nutch is called from runtime/deploy. The conf files
>> should be modified in runtime/local/conf, not in $NUTCH_HOME/conf.
>> So modify runtime/local/conf/nutch-site.xml to set http.agent.name
>> properly. I am assuming that the Hadoop setup is in place and the Hadoop
>> variables are exported. Now run the Nutch commands from runtime/deploy.
>>
>> Thanks,
>> Tejas Patil
>>
>> >
>> >
>> >
>> > ----- Original Message -----
>> > From: "Tejas Patil" <[email protected]>
>> > To: [email protected]
>> > Sent: Thursday, February 7, 2013 14:04:26
>> > Subject: Re: Could not find any valid local directory for output/file.out
>> >
>> > Nutch jobs are executed by Hadoop. "/tmp" is the default location used
>> > by Hadoop to store temporary data required for a job. If you don't
>> > override hadoop.tmp.dir in any config file, it will use /tmp by default.
>> > In your case, /tmp doesn't have ample space left, so better override
>> > that property and point it to some other location which has ample space.
>> >
>> > Thanks,
>> > Tejas Patil
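The advice above — pick a location with ample free space before pointing hadoop.tmp.dir at it — can be checked programmatically. A minimal sketch using only the Python standard library (the candidate paths are examples, not prescriptions):

```python
import shutil

def free_gib(path: str) -> float:
    """Free space, in GiB, on the filesystem containing `path`."""
    return shutil.disk_usage(path).free / 2**30

# Compare candidate locations before choosing one for hadoop.tmp.dir.
# The paths below are examples; use mounts that exist on your machine.
for candidate in ("/tmp", "/"):
    print(f"{candidate}: {free_gib(candidate):.1f} GiB free")
```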
>> >
>> >
>> > On Thu, Feb 7, 2013 at 10:38 AM, Eyeris Rodriguez Rueda <[email protected]>
>> > wrote:
>> >
>> > > Thanks, Lewis, for your answer.
>> > > My doubt is why /tmp keeps increasing while the crawl process is
>> > > running, and why Nutch uses that folder. I'm using Nutch 1.5.1 in
>> > > single (local) mode and my nutch-site.xml does not have the
>> > > hadoop.tmp.dir property. I need to reduce the space used by that
>> > > folder because I only have 40 GB for the Nutch machine and 50 GB for
>> > > the Solr machine. Please, some advice or explanation.

-- 
*Lewis*
