I have a lot of space left on /tmp. I don't have a separate partition for /tmp, just a directory on the root filesystem, and there is a lot of space left, close to 1.3 terabytes.
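For reference, figures like the ones below can be checked with something along these lines (a minimal sketch; it assumes Hadoop's default hadoop.tmp.dir of /tmp/hadoop-${user.name}, so /tmp/hadoop-root when running as root — adjust paths as needed):

```shell
# Free space on the filesystem that holds /tmp
df -h /tmp

# Size of Hadoop's local scratch directory (may not exist on every machine)
du -sh /tmp/hadoop-root 2>/dev/null || echo "/tmp/hadoop-root not present"
```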
Here is the df output:

               1.4T   55G  1.3T   5% /
tmpfs          3.8G     0  3.8G   0% /lib/init/rw
varrun         3.8G  120K  3.8G   1% /var/run
varlock        3.8G     0  3.8G   0% /var/lock
udev           3.8G  152K  3.8G   1% /dev
tmpfs          3.8G     0  3.8G   0% /dev/shm
lrm            3.8G  2.5M  3.8G   1% /lib/modules/2.6.28-15-server/volatile
/dev/sda5      228M   29M  187M  14% /boot
/dev/sr0       388K  388K     0 100% /media/cdrom0

I also noticed that the /tmp/hadoop-root directory is 6.8 GB.

I have attached the jstack output of the process that is doing the update below:

2009-11-02 19:11:54
Full thread dump Java HotSpot(TM) 64-Bit Server VM (14.2-b01 mixed mode):

"Attach Listener" daemon prio=10 tid=0x0000000041bb1000 nid=0xd3b waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"Comm thread for attempt_local_0001_r_000000_0" daemon prio=10 tid=0x00007f3ff4002800 nid=0x6b8f waiting on condition [0x00007f4000e97000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
        at java.lang.Thread.sleep(Native Method)
        at org.apache.hadoop.mapred.Task$1.run(Task.java:403)
        at java.lang.Thread.run(Thread.java:619)

"Thread-12" prio=10 tid=0x0000000041b37800 nid=0x25f3 runnable [0x00007f4000f98000]
   java.lang.Thread.State: RUNNABLE
        at java.lang.Byte.hashCode(Byte.java:394)
        at java.util.concurrent.ConcurrentHashMap.put(ConcurrentHashMap.java:882)
        at org.apache.hadoop.io.AbstractMapWritable.addToMap(AbstractMapWritable.java:78)
        - locked <0x00007f47ef4d9310> (a org.apache.hadoop.io.MapWritable)
        at org.apache.hadoop.io.AbstractMapWritable.<init>(AbstractMapWritable.java:128)
        at org.apache.hadoop.io.MapWritable.<init>(MapWritable.java:42)
        at org.apache.hadoop.io.MapWritable.<init>(MapWritable.java:52)
        at org.apache.nutch.crawl.CrawlDatum.set(CrawlDatum.java:321)
        at org.apache.nutch.crawl.CrawlDbReducer.reduce(CrawlDbReducer.java:73)
        at org.apache.nutch.crawl.CrawlDbReducer.reduce(CrawlDbReducer.java:35)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:436)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:170)

"Low Memory Detector" daemon prio=10 tid=0x00007f3ffc004000 nid=0x25d0 runnable [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"CompilerThread1" daemon prio=10 tid=0x00007f3ffc001000 nid=0x25cf waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"CompilerThread0" daemon prio=10 tid=0x00000000417be800 nid=0x25ce waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"Signal Dispatcher" daemon prio=10 tid=0x00000000417bc800 nid=0x25cd runnable [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"Finalizer" daemon prio=10 tid=0x000000004179e000 nid=0x25cc in Object.wait() [0x00007f40016f7000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x00007f400f63e6c0> (a java.lang.ref.ReferenceQueue$Lock)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:118)
        - locked <0x00007f400f63e6c0> (a java.lang.ref.ReferenceQueue$Lock)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:134)
        at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)

"Reference Handler" daemon prio=10 tid=0x0000000041797000 nid=0x25cb in Object.wait() [0x00007f40017f8000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x00007f400f63e6f8> (a java.lang.ref.Reference$Lock)
        at java.lang.Object.wait(Object.java:485)
        at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)
        - locked <0x00007f400f63e6f8> (a java.lang.ref.Reference$Lock)

"main" prio=10 tid=0x0000000041734000 nid=0x25c5 waiting on condition [0x00007f49d75c2000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
        at java.lang.Thread.sleep(Native Method)
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1152)
        at org.apache.nutch.crawl.CrawlDb.update(CrawlDb.java:94)
        at org.apache.nutch.crawl.CrawlDb.run(CrawlDb.java:189)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.nutch.crawl.CrawlDb.main(CrawlDb.java:150)

"VM Thread" prio=10 tid=0x0000000041790000 nid=0x25ca runnable

"GC task thread#0 (ParallelGC)" prio=10 tid=0x000000004173e000 nid=0x25c6 runnable

"GC task thread#1 (ParallelGC)" prio=10 tid=0x0000000041740000 nid=0x25c7 runnable

"GC task thread#2 (ParallelGC)" prio=10 tid=0x0000000041742000 nid=0x25c8 runnable

"GC task thread#3 (ParallelGC)" prio=10 tid=0x0000000041744000 nid=0x25c9 runnable

"VM Periodic Task Thread" prio=10 tid=0x00007f3ffc006800 nid=0x25d1 waiting on condition

JNI global references: 907

Any help related to this would be really helpful.

On Mon, Nov 2, 2009 at 3:56 PM, Julien Nioche <lists.digitalpeb...@gmail.com> wrote:
> Hi again
>
>> I know the process is not stuck, and the process is running, because I
>> turned on the Hadoop logs and I can see entries being written to them.
>> I'm not sure how to check whether the task is completely stuck or not.
>
> Run jps to identify the process id, then run *jstack <id>* several times to
> see if it is blocked at the same place.
>
> How much space do you have left on the partition where /tmp is mounted?
>
> J.
>
>> Below is a sample of the log as I'm sending this email. It has been on the
>> updatedb step for the last 19 days, and it has been generating
>> debug logs similar to this the whole time.
>>
>> Has anyone else had this same issue before?
>>
>> 2009-11-02 13:34:21,112 DEBUG mapred.Counters - Creating group org.apache.hadoop.mapred.Task$FileSystemCounter with bundle
>> 2009-11-02 13:34:21,112 DEBUG mapred.Counters - Adding LOCAL_READ
>> 2009-11-02 13:34:21,112 DEBUG mapred.Counters - Adding LOCAL_WRITE
>> 2009-11-02 13:34:21,112 DEBUG mapred.Counters - Creating group org.apache.hadoop.mapred.Task$Counter with bundle
>> 2009-11-02 13:34:21,112 DEBUG mapred.Counters - Adding COMBINE_OUTPUT_RECORDS
>> 2009-11-02 13:34:21,112 DEBUG mapred.Counters - Adding MAP_INPUT_RECORDS
>> 2009-11-02 13:34:21,113 DEBUG mapred.Counters - Adding MAP_OUTPUT_BYTES
>> 2009-11-02 13:34:21,113 DEBUG mapred.Counters - Adding MAP_INPUT_BYTES
>> 2009-11-02 13:34:21,113 DEBUG mapred.Counters - Adding MAP_OUTPUT_RECORDS
>> 2009-11-02 13:34:21,113 DEBUG mapred.Counters - Adding COMBINE_INPUT_RECORDS
>> 2009-11-02 13:34:21,643 INFO  mapred.JobClient - map 93% reduce 0%
>> 2009-11-02 13:34:22,121 INFO  mapred.MapTask - Spilling map output: record full = true
>> 2009-11-02 13:34:22,121 INFO  mapred.MapTask - bufstart = 10420198; bufend = 13893589; bufvoid = 99614720
>> 2009-11-02 13:34:22,121 INFO  mapred.MapTask - kvstart = 131070; kvend = 65533; length = 327680
>> 2009-11-02 13:34:22,427 INFO  mapred.MapTask - Finished spill 3
>> 2009-11-02 13:34:23,301 INFO  mapred.MapTask - Starting flush of map output
>> 2009-11-02 13:34:23,384 INFO  mapred.MapTask - Finished spill 4
>> 2009-11-02 13:34:23,390 DEBUG mapred.MapTask - MapId=attempt_local_0001_m_000003_0 Reducer=0 Spill=0 (0, 224, 228)
>> 2009-11-02 13:34:23,390 DEBUG mapred.MapTask - MapId=attempt_local_0001_m_000003_0 Reducer=0 Spill=1 (0, 242, 246)
>> 2009-11-02 13:34:23,390 DEBUG mapred.MapTask - MapId=attempt_local_0001_m_000003_0 Reducer=0 Spill=2 (0, 242, 246)
>> 2009-11-02 13:34:23,390 DEBUG mapred.MapTask - MapId=attempt_local_0001_m_000003_0 Reducer=0 Spill=3 (0, 242, 246)
>> 2009-11-02 13:34:23,390 DEBUG mapred.MapTask - MapId=attempt_local_0001_m_000003_0 Reducer=0 Spill=4 (0, 242, 246)
>> 2009-11-02 13:34:23,390 INFO  mapred.Merger - Merging 5 sorted segments
>> 2009-11-02 13:34:23,392 INFO  mapred.Merger - Down to the last merge-pass, with 5 segments left of total size: 1192 bytes
>> 2009-11-02 13:34:23,393 INFO  mapred.MapTask - Index: (0, 354, 358)
>> 2009-11-02 13:34:23,394 INFO  mapred.TaskRunner - Task:attempt_local_0001_m_000003_0 is done. And is in the process of commiting
>> 2009-11-02 13:34:23,395 DEBUG mapred.TaskRunner - attempt_local_0001_m_000003_0 Progress/ping thread exiting since it got interrupted
>> 2009-11-02 13:34:23,395 INFO  mapred.LocalJobRunner - file:/opt/tsweb/nutch-1.0/newHyperseekCrawl/db/current/part-00000/data:100663296+33554432
>> 2009-11-02 13:34:23,396 DEBUG mapred.Counters - Creating group org.apache.hadoop.mapred.Task$FileSystemCounter with bundle
>> 2009-11-02 13:34:23,396 DEBUG mapred.Counters - Adding LOCAL_READ
>> 2009-11-02 13:34:23,396 DEBUG mapred.Counters - Adding LOCAL_WRITE
>> 2009-11-02 13:34:23,396 DEBUG mapred.Counters - Creating group org.apache.hadoop.mapred.Task$Counter with bundle
>> 2009-11-02 13:34:23,396 DEBUG mapred.Counters - Adding COMBINE_OUTPUT_RECORDS
>> 2009-11-02 13:34:23,396 DEBUG mapred.Counters - Adding MAP_INPUT_RECORDS
>> 2009-11-02 13:34:23,396 DEBUG mapred.Counters - Adding MAP_OUTPUT_BYTES
>> 2009-11-02 13:34:23,396 DEBUG mapred.Counters - Adding MAP_INPUT_BYTES
>> 2009-11-02 13:34:23,396 DEBUG mapred.Counters - Adding MAP_OUTPUT_RECORDS
>> 2009-11-02 13:34:23,396 DEBUG mapred.Counters - Adding COMBINE_INPUT_RECORDS
>> 2009-11-02 13:34:23,397 INFO  mapred.TaskRunner - Task 'attempt_local_0001_m_000003_0' done.
>> 2009-11-02 13:34:23,397 DEBUG mapred.SortedRanges - currentIndex 0   0:0
>> 2009-11-02 13:34:23,397 DEBUG conf.Configuration - java.io.IOException: config(config)
>>         at org.apache.hadoop.conf.Configuration.<init>(Configuration.java:192)
>>         at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:139)
>>         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:132)
>>
>> 2009-11-02 13:34:23,398 DEBUG mapred.MapTask - Writing local split to /tmp/hadoop-root/mapred/local/localRunner/split.dta
>> 2009-11-02 13:34:23,451 DEBUG mapred.TaskRunner - attempt_local_0001_m_000004_0 Progress/ping thread started
>> 2009-11-02 13:34:23,452 INFO  mapred.MapTask - numReduceTasks: 1
>> 2009-11-02 13:34:23,453 INFO  mapred.MapTask - io.sort.mb = 100
>> 2009-11-02 13:34:23,644 INFO  mapred.JobClient - map 100% reduce 0%
>>
>> Mathan
>>
>> On Mon, Nov 2, 2009 at 4:11 AM, Andrzej Bialecki <a...@getopt.org> wrote:
>> > Kalaimathan Mahenthiran wrote:
>> >>
>> >> I forgot to add one detail...
>> >>
>> >> The segment I'm trying to run updatedb on has 1.3 million URLs fetched
>> >> and 1.08 million URLs parsed.
>> >>
>> >> Any help related to this would be appreciated.
>> >>
>> >> On Sun, Nov 1, 2009 at 11:53 PM, Kalaimathan Mahenthiran
>> >> <matha...@gmail.com> wrote:
>> >>>
>> >>> Hi everyone,
>> >>>
>> >>> I'm using Nutch 1.0. I have fetched successfully and am currently on the
>> >>> updatedb step. I'm running updatedb and it's taking very long, and I
>> >>> don't know why. I have a new machine with a quad-core processor and
>> >>> 8 GB of RAM.
>> >>>
>> >>> I believe this system is really good in terms of processing power, so I
>> >>> don't think processing power is the problem here. I noticed that almost
>> >>> all the RAM is getting used up, close to 7.7 GB, by the updatedb process.
>> >>> The computer is becoming really slow.
>> >>>
>> >>> The updatedb process has been running for the last 19 days continually
>> >>> with the message "merging segment data into db". Does anyone know why
>> >>> it's taking so long? Is there any configuration setting I can change to
>> >>> increase the speed of the updatedb process?
>> >
>> > First, this process normally takes just a few minutes, depending on the
>> > hardware, and not several days - so something is wrong.
>> >
>> > * Do you run this in "local" or pseudo-distributed mode (i.e. running a
>> > real jobtracker and tasktracker)? Try the pseudo-distributed mode,
>> > because then you can monitor the progress in the web UI.
>> >
>> > * How many reduce tasks do you have? With large updates it helps if you
>> > run more than 1 reducer, to split the final sorting.
>> >
>> > * If the task appears to be completely stuck, please generate a thread
>> > dump (kill -SIGQUIT) and see where it's stuck. This could be related to
>> > urlfilter-regex or urlnormalizer-regex - you can identify whether these
>> > are problematic by removing them from the config and re-running the
>> > operation.
>> >
>> > * Minor issue - when specifying the path names of segments and crawldb,
>> > do NOT append the trailing slash - it's not harmful in this particular
>> > case, but you could have a nasty surprise when doing e.g. copy / mv
>> > operations...
>> >
>> > --
>> > Best regards,
>> > Andrzej Bialecki     <><
>> >  ___. ___ ___ ___ _ _   __________________________________
>> > [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
>> > ___|||__||  \|  ||  |  Embedded Unix, System Integration
>> > http://www.sigram.com  Contact: info at sigram dot com
>> >
>
> --
> DigitalPebble Ltd
> http://www.digitalpebble.com
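[Editor's note: Julien's jps/jstack suggestion can be scripted as a quick repeatability check. This is a rough sketch, not part of the original thread; the main-class pattern, file names, and sleep interval are illustrative:]

```shell
# Sample the Java stacks of the running CrawlDb job a few times.
# If the interesting frames (e.g. the Thread-12 reducer above) are
# identical across samples, the task is likely stuck or looping there.
PID=$(jps -l 2>/dev/null | awk '/org.apache.nutch.crawl.CrawlDb/ {print $1}')
if [ -z "$PID" ]; then
  echo "CrawlDb JVM not found"
else
  for i in 1 2 3; do
    jstack "$PID" > "stack-$i.txt"
    sleep 10
  done
  # Compare the first and last samples, by eye or with diff
  diff stack-1.txt stack-3.txt || echo "stacks differ (threads are moving)"
fi
```

Andrzej's point about running more than one reducer would, in the Hadoop version Nutch 1.0 shipped with, be controlled by the mapred.reduce.tasks job property (only effective with a real jobtracker, not the LocalJobRunner, which always runs a single reducer).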