I'm using Ubuntu Server 12.04 only for Nutch, and I have assigned 40 GB to it. Is /tmp needed by the Nutch crawl process? Or can I set up a crontab job to delete the contents of /tmp without causing problems for the crawl?
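For illustration, a minimal sketch of the kind of scheduled cleanup asked about here. It is not a confirmed-safe procedure for Nutch: it assumes that files a running job still needs have been modified recently, so it only deletes regular files older than a threshold. The function name `remove_stale_files` and the 24-hour default are hypothetical choices, not anything taken from Nutch or Hadoop.

```python
import os
import time


def remove_stale_files(root, max_age_hours=24.0, now=None):
    """Delete regular files under root whose mtime is older than max_age_hours.

    Returns the list of paths that were removed. Directories are left in place.
    Assumption: anything a running job still needs was modified recently.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_hours * 3600
    removed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # Skip symlinks and special files; only plain files are deleted.
                if os.path.islink(path) or not os.path.isfile(path):
                    continue
                if os.path.getmtime(path) < cutoff:
                    os.remove(path)
                    removed.append(path)
            except OSError:
                # The file vanished or cannot be removed; leave it alone.
                continue
    return removed
```

A script calling `remove_stale_files` on the directory that hadoop.tmp.dir points to could then be scheduled with cron, ideally at a time when no crawl is running.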
----- Original message -----
From: "Tejas Patil" <[email protected]>
To: [email protected]
Sent: Friday, 8 February 2013 14:33:25
Subject: Re: Could not find any valid local directory for output/file.out

I don't think there is any such property. Maybe it's time for you to clean up /tmp :)

Thanks,
Tejas Patil

On Fri, Feb 8, 2013 at 11:16 AM, Eyeris Rodriguez Rueda <[email protected]> wrote:

> Hi Lewis and Tejas again.
> I have set the hadoop.tmp.dir property, but Nutch is still consuming too much
> space for me.
> Is it possible to reduce the space Nutch uses in my tmp folder with some
> property of the fetcher process? I always get an exception because the hard
> disk is full. My crawldb is only 150 MB, no more, but my tmp folder
> keeps growing without control until it reaches 60 GB, and the crawl fails at that point.
> Any help, please.
>
> ----- Original message -----
> From: "Eyeris Rodriguez Rueda" <[email protected]>
> To: [email protected]
> Sent: Friday, 8 February 2013 10:45:52
> Subject: Re: Could not find any valid local directory for output/file.out
>
> Thanks a lot, Lewis and Tejas, you have been very helpful.
> It works now: I pointed it to another partition.
> Problem solved.
>
> ----- Original message -----
> From: "Tejas Patil" <[email protected]>
> To: [email protected]
> Sent: Thursday, 7 February 2013 16:32:33
> Subject: Re: Could not find any valid local directory for output/file.out
>
> On Thu, Feb 7, 2013 at 12:47 PM, Eyeris Rodriguez Rueda <[email protected]> wrote:
>
> > Thanks to all for your replies.
> > If I want to change the default location for Hadoop jobs (/tmp), where can I
> > do that? My nutch-site.xml does not include anything pointing to /tmp.
>
> Add this property to nutch-site.xml with an appropriate value:
>
> <property>
> <name>hadoop.tmp.dir</name>
> <value>XXXXXXXXXX</value>
> </property>
>
> > I have read about Nutch and Hadoop, but I'm not sure I understand it at
> > all.
> > Is it possible to use Nutch 1.5.1 in distributed mode?
>
> Yes.
>
> > In that case, what do I need to do? I would really appreciate your answer,
> > because I can't find good documentation on this topic.
>
> For distributed mode, Nutch is called from runtime/deploy. The conf files
> should be modified in runtime/local/conf, not in $NUTCH_HOME/conf.
> So modify runtime/local/conf/nutch-site.xml to set http.agent.name
> properly. I am assuming that the Hadoop setup is in place and the Hadoop
> variables are exported. Now, run the nutch commands from runtime/deploy.
>
> Thanks,
> Tejas Patil
>
> > ----- Original message -----
> > From: "Tejas Patil" <[email protected]>
> > To: [email protected]
> > Sent: Thursday, 7 February 2013 14:04:26
> > Subject: Re: Could not find any valid local directory for output/file.out
> >
> > Nutch jobs are executed by Hadoop. "/tmp" is the default location used by
> > Hadoop to store temporary data required for a job. If you don't override
> > hadoop.tmp.dir in any config file, it will use /tmp by default. In your
> > case, /tmp doesn't have ample space left, so it is better to override that
> > property and point it to some other location which has ample space.
> >
> > Thanks,
> > Tejas Patil
> >
> > On Thu, Feb 7, 2013 at 10:38 AM, Eyeris Rodriguez Rueda <[email protected]> wrote:
> >
> > > Thanks, Lewis, for your answer.
> > > My question is why /tmp grows while the crawl process is running, and why
> > > Nutch uses that folder. I'm using Nutch 1.5.1 in single mode and my
> > > nutch-site.xml does not set the hadoop.tmp.dir property. I need to reduce
> > > the space used by that folder because I only have 40 GB for the Nutch
> > > machine and 50 GB for the Solr machine. Any advice or explanation is welcome.
> > > Thanks for your time.
> > >
> > > ----- Original message -----
> > > From: "Lewis John Mcgibbney" <[email protected]>
> > > To: [email protected]
> > > Sent: Thursday, 7 February 2013 13:06:11
> > > Subject: Re: Could not find any valid local directory for output/file.out
> > >
> > > Hi,
> > >
> > > https://wiki.apache.org/nutch/NutchGotchas#DiskErrorException_while_fetching
> > >
> > > On Thursday, February 7, 2013, Eyeris Rodriguez Rueda <[email protected]> wrote:
> > > > Hi all.
> > > > I have a problem when I run a crawl for a few hours or days. I'm using
> > > > Nutch 1.5.1 and Solr 3.6, but the crawl process fails and I don't know how
> > > > to fix this problem. I'm interested in running a crawl process without
> > > > limits, with 10 cycles or more, but I have a problem with hard disk space:
> > > > I have detected that /etc/tmp has 29 GB used, which is not good for me.
> > > > Can anybody help me or give me some advice on configuring Nutch to
> > > > complete at least one crawl process without problems?
> > > >
> > > > Here are some features of my environment:
> > > > RAM: 2 GB
> > > > CPU: quad-core (but I'm using only 2 cores)
> > > > Hard disk: 40 GB
> > > > Threads: 50
> > > > db.fetch.interval.default=2 days
> > > >
> > > > This is part of my log file when Nutch fails:
> > > >
> > > > ****************************************************************
> > > > 2013-02-06 18:45:25,961 INFO fetcher.Fetcher - fetching http://bibliodoc.uci.cu/TD/TD_03349_10.pdf
> > > > 2013-02-06 18:45:25,964 INFO fetcher.Fetcher - fetching http://bibliodoc.uci.cu/TD/TD_0442_07.pdf
> > > > 2013-02-06 18:45:25,977 INFO fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=49
> > > > 2013-02-06 18:45:26,109 INFO fetcher.Fetcher - -activeThreads=49, spinWaiting=39, fetchQueues.totalSize=0
> > > > 2013-02-06 18:45:26,180 INFO fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=48
> > > > 2013-02-06 18:45:26,331 INFO fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=47
> > > > 2013-02-06 18:45:26,331 INFO fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=46
> > > > 2013-02-06 18:45:26,331 INFO fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=44
> > > > 2013-02-06 18:45:26,331 INFO fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=45
> > > > 2013-02-06 18:45:26,331 INFO fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=40
> > > > 2013-02-06 18:45:26,331 INFO fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=39
> > > > 2013-02-06 18:45:26,332 INFO fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=38
> > > > 2013-02-06 18:45:26,332 INFO fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=37
> > > > 2013-02-06 18:45:26,332 INFO fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=36
> > > > 2013-02-06 18:45:26,332 INFO fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=35
> > > > 2013-02-06 18:45:26,332 INFO fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=34
> > > > 2013-02-06 18:45:26,333 INFO fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=33
> > > > 2013-02-06 18:45:26,333 INFO fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=32
> > > > 2013-02-06 18:45:26,333 INFO fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=31
> > > > 2013-02-06 18:45:26,333 INFO fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=30
> > > > 2013-02-06 18:45:26,333 INFO fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=29
> > > > 2013-02-06 18:45:26,333 INFO fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=28
> > > > 2013-02-06 18:45:26,334 INFO fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=27
> > > > 2013-02-06 18:45:26,334 INFO fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=26
> > > > 2013-02-06 18:45:26,334 INFO fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=25
> > > > 2013-02-06 18:45:26,334 INFO fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=24
> > > > 2013-02-06 18:45:26,334 INFO fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=23
> > > > 2013-02-06 18:45:26,335 INFO fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=22
> > > > 2013-02-06 18:45:26,335 INFO fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=21
> > > > 2013-02-06 18:45:26,335 INFO fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=20
> > > > 2013-02-06 18:45:26,335 INFO fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=19
> > > > 2013-02-06 18:45:26,335 INFO fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=18
> > > > 2013-02-06 18:45:26,331 INFO fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=41
> > > > 2013-02-06 18:45:26,336 INFO fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=17
> > > > 2013-02-06 18:45:26,336 INFO fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=15
> > > > 2013-02-06 18:45:26,336 INFO fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=13
> > > > 2013-02-06 18:45:26,336 INFO fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=12
> > > > 2013-02-06 18:45:26,331 INFO fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=42
> > > > 2013-02-06 18:45:26,331 INFO fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=43
> > > > 2013-02-06 18:45:26,336 INFO fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=9
> > > > 2013-02-06 18:45:26,336 INFO fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=10
> > > > 2013-02-06 18:45:26,336 INFO fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=11
> > > > 2013-02-06 18:45:26,336 INFO fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=14
> > > > 2013-02-06 18:45:26,336 INFO fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=16
> > > > 2013-02-06 18:45:26,404 INFO fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=8
> > > > 2013-02-06 18:45:26,630 INFO fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=7
> > > > 2013-02-06 18:45:27,069 INFO fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=6
> > > > 2013-02-06 18:45:27,110 INFO fetcher.Fetcher - -activeThreads=6, spinWaiting=0, fetchQueues.totalSize=0
> > > > 2013-02-06 18:45:27,129 INFO fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=5
> > > > 2013-02-06 18:45:28,110 INFO fetcher.Fetcher - -activeThreads=5, spinWaiting=0, fetchQueues.totalSize=0
> > > > 2013-02-06 18:45:28,502 INFO fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=4
> > > > 2013-02-06 18:45:29,111 INFO fetcher.Fetcher - -activeThreads=4, spinWaiting=0, fetchQueues.totalSize=0
> > > > 2013-02-06 18:45:30,123 INFO fetcher.Fetcher - -activeThreads=4, spinWaiting=0, fetchQueues.totalSize=0
> > > > 2013-02-06 18:45:31,127 INFO fetcher.Fetcher - -activeThreads=4, spinWaiting=0, fetchQueues.totalSize=0
> > > > 2013-02-06 18:45:31,187 INFO fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=3
> > > > 2013-02-06 18:45:32,171 INFO fetcher.Fetcher - -activeThreads=3, spinWaiting=0, fetchQueues.totalSize=0
> > > > 2013-02-06 18:45:32,206 INFO fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=2
> > > > 2013-02-06 18:45:33,173 INFO fetcher.Fetcher - -activeThreads=2, spinWaiting=0, fetchQueues.totalSize=0
> > > > 2013-02-06 18:45:34,173 INFO fetcher.Fetcher - -activeThreads=2, spinWaiting=0, fetchQueues.totalSize=0
> > > > 2013-02-06 18:45:34,205 INFO fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=1
> > > > 2013-02-06 18:45:34,457 INFO fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=0
> > > > 2013-02-06 18:45:35,174 INFO fetcher.Fetcher - -activeThreads=0, spinWaiting=0, fetchQueues.totalSize=0
> > > > 2013-02-06 18:45:35,174 INFO fetcher.Fetcher - -activeThreads=0
> > > > 2013-02-06 18:45:35,742 WARN mapred.LocalJobRunner - job_local_0015
> > > > org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for output/file.out
> > > >     at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:381)
> > > >     at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:146)
> > > >     at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:127)
> > > >     at org.apache.hadoop.mapred.MapOutputFile.getOutputFileForWrite(MapOutputFile.java:69)
> > > >     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.mergeParts(MapTask.java:1640)
> > > >     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1323)
> > > >     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:437)
> > > >     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
> > > >     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
> > >
> > > --
> > > *Lewis*
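The DiskErrorException in the log is reported when Hadoop cannot find a local directory to write map output into, which in this thread coincides with the disk being full. As a rough sketch, free space on the filesystem that holds the Hadoop temporary directory could be checked before starting each crawl cycle. The helper names `free_gb` and `has_room` and any threshold passed to them are hypothetical; nothing here is part of Nutch or Hadoop.

```python
import shutil


def free_gb(path):
    """Return the free space, in GiB, on the filesystem that contains path."""
    return shutil.disk_usage(path).free / 1024 ** 3


def has_room(path, min_free_gb):
    """Return True if at least min_free_gb GiB are free on path's filesystem."""
    return free_gb(path) >= min_free_gb
```

A wrapper script could call `has_room` with the directory that hadoop.tmp.dir points to and skip or postpone the crawl when it returns False, instead of letting the job fail partway through.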

