+1. This is a ridiculous amount of tmp space for a crawldb of minimal size. There is clearly something wrong.
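
For reference, the hadoop.tmp.dir override suggested further down this thread would look roughly like the following in nutch-site.xml; the path below is only an example, the point is to use a partition with plenty of free space:

<property>
  <name>hadoop.tmp.dir</name>
  <value>/data/hadoop-tmp</value>
  <description>Base for temporary files written by Hadoop jobs; /data/hadoop-tmp is just a placeholder path.</description>
</property>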
On Friday, February 8, 2013, Tejas Patil <[email protected]> wrote:
> I don't think there is any such property. Maybe it's time for you to clean up /tmp :)
>
> Thanks,
> Tejas Patil
>
> On Fri, Feb 8, 2013 at 11:16 AM, Eyeris Rodriguez Rueda <[email protected]> wrote:
>
>> Hi Lewis and Tejas again.
>> I have set the hadoop.tmp.dir property but Nutch is still consuming too much space for me.
>> Is it possible to reduce the space Nutch uses in my tmp folder with some property of the fetcher process? I always get an exception because the hard disk is full. My crawldb is only 150 MB, not more, but my tmp folder keeps growing without control until it reaches 60 GB, and it fails at that point.
>> Please, any help.
>>
>> ----- Original Message -----
>> From: "Eyeris Rodriguez Rueda" <[email protected]>
>> To: [email protected]
>> Sent: Friday, February 8, 2013 10:45:52
>> Subject: Re: Could not find any valid local directory for output/file.out
>>
>> Thanks a lot, Lewis and Tejas, you have been very helpful.
>> It works OK now; I have pointed it to another partition. Problem solved.
>>
>> ----- Original Message -----
>> From: "Tejas Patil" <[email protected]>
>> To: [email protected]
>> Sent: Thursday, February 7, 2013 16:32:33
>> Subject: Re: Could not find any valid local directory for output/file.out
>>
>> On Thu, Feb 7, 2013 at 12:47 PM, Eyeris Rodriguez Rueda <[email protected]> wrote:
>>
>> > Thanks to all for your replies.
>> > If I want to change the default location for the Hadoop jobs (/tmp), where can I do that? My nutch-site.xml does not include anything pointing to /tmp.
>> >
>> Add this property to nutch-site.xml with an appropriate value:
>>
>> <property>
>>   <name>hadoop.tmp.dir</name>
>>   <value>XXXXXXXXXX</value>
>> </property>
>>
>> > I have read about Nutch and Hadoop but I am not sure I understand it all. Is it possible to use Nutch 1.5.1 in distributed mode?
>>
>> Yes.
>>
>> > In that case, what do I need to do? I would really appreciate your answer because I can't find good documentation on this topic.
>> >
>> For distributed mode, Nutch is called from runtime/deploy. The conf files should be modified in runtime/local/conf, not in $NUTCH_HOME/conf.
>> So modify runtime/local/conf/nutch-site.xml to set http.agent.name properly. I am assuming that the Hadoop setup is in place and the Hadoop variables are exported. Now run the nutch commands from runtime/deploy.
>>
>> Thanks,
>> Tejas Patil
>>
>> > ----- Original Message -----
>> > From: "Tejas Patil" <[email protected]>
>> > To: [email protected]
>> > Sent: Thursday, February 7, 2013 14:04:26
>> > Subject: Re: Could not find any valid local directory for output/file.out
>> >
>> > Nutch jobs are executed by Hadoop. "/tmp" is the default location used by Hadoop to store the temporary data required for a job. If you don't override hadoop.tmp.dir in any config file, it will use /tmp by default. In your case, /tmp doesn't have enough space left, so it is better to override that property and point it to some other location which has ample space.
>> >
>> > Thanks,
>> > Tejas Patil
>> >
>> > On Thu, Feb 7, 2013 at 10:38 AM, Eyeris Rodriguez Rueda <[email protected]> wrote:
>> >
>> > > Thanks, Lewis, for your answer.
>> > > My question is why /tmp keeps growing while the crawl process is running, and why Nutch uses that folder. I am using Nutch 1.5.1 in single (local) mode and my nutch-site.xml does not have the hadoop.tmp.dir property. I need to reduce the space used in that folder because I only have 40 GB for the Nutch machine and 50 GB for the Solr machine. Please, some advice or explanation.

--
*Lewis*
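
For later reference, the distributed-mode setup described in the quoted thread amounts to setting http.agent.name in runtime/local/conf/nutch-site.xml and then running the nutch commands from runtime/deploy with the Hadoop environment variables exported. A minimal sketch, with a placeholder agent name:

<property>
  <name>http.agent.name</name>
  <value>MyNutchSpider</value>
  <description>Name sent in the HTTP User-Agent header; MyNutchSpider is just a placeholder.</description>
</property>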

