On Jun 10, 2011, at 7:50am, Marek Bachmann wrote:

> Thanks to you all,
>
> so to get it on one point: Is it possible to speed up the map / reduce task
> (whatever it exactly does) on a single quad core machine, and if so, does
> anyone know a resource where I can get a little documentation? :-)

Get "Hadoop: The Definitive Guide" by Tom White.

And then set up your machine to run in pseudo-distributed mode.

-- Ken

> Thank you once again.
>
> Greetings,
>
> Marek
>
> On 10.06.2011 16:26, Julien Nioche wrote:
>> Raymond,
>>
>>> Hadoop is using a map/reduce algorithm, the reduce phase is that phase which
>>> collects the results from parallel execution.
>>> It is inherently not possible to parallelize that phase.
>>>
>>
>> Sorry to contradict you Raymond but this is incorrect. You can specify the
>> number of reducers to use e.g.
>>
>> -D mapred.reduce.tasks=$numTasks
>>
>> but obviously this will work only in (pseudo)distributed mode i.e. with the
>> various Hadoop services running independently of Nutch
>>
>>>
>>> -Raymond-
>>>
>>> 2011/6/10 Marek Bachmann
>>>
>>>> Hello again,
>>>>
>>>> I noticed that the reduce phase only uses one CPU core. These processes
>>>> take a very long time, at 100 % usage but only on one core. Is there a
>>>> possibility to parallelise these processes on multiple cores on one local
>>>> machine? Could using Hadoop help in some way? I have no experience with
>>>> Hadoop at all. :-/
>>>>
>>>> 11/06/10 14:38:21 INFO mapred.JobClient: map 100% reduce 94%
>>>> 11/06/10 14:38:23 INFO mapred.LocalJobRunner: reduce> reduce
>>>> 11/06/10 14:38:26 INFO mapred.LocalJobRunner: reduce> reduce
>>>> 11/06/10 14:38:29 INFO mapred.LocalJobRunner: reduce> reduce
>>>> 11/06/10 14:38:32 INFO mapred.LocalJobRunner: reduce> reduce
>>>> 11/06/10 14:38:35 INFO mapred.LocalJobRunner: reduce> reduce
>>>> 11/06/10 14:38:38 INFO mapred.LocalJobRunner: reduce> reduce
>>>> 11/06/10 14:38:41 INFO mapred.LocalJobRunner: reduce> reduce
>>>> 11/06/10 14:38:44 INFO mapred.LocalJobRunner: reduce> reduce
>>>> 11/06/10 14:38:47 INFO mapred.LocalJobRunner: reduce> reduce
>>>> 11/06/10 14:38:50 INFO mapred.LocalJobRunner: reduce> reduce
>>>> 11/06/10 14:38:53 INFO mapred.LocalJobRunner: reduce> reduce
>>>> 11/06/10 14:38:56 INFO mapred.LocalJobRunner: reduce> reduce
>>>> 11/06/10 14:38:57 INFO mapred.JobClient: map 100% reduce 95%
>>>>
>>>> Here is a copy of top's output while running a reduce:
>>>>
>>>> top - 14:30:53 up 12 days, 33 min, 3 users, load average: 0.81, 0.38, 0.35
>>>> Tasks: 123 total, 1 running, 122 sleeping, 0 stopped, 0 zombie
>>>> Cpu(s): 25.1%us, 0.2%sy, 0.0%ni, 74.8%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
>>>> Mem: 8003904k total, 5762520k used, 2241384k free, 120180k buffers
>>>> Swap: 418808k total, 4k used, 418804k free, 3713236k cached
>>>>
>>>>   PID USER PR NI  VIRT  RES SHR S %CPU %MEM   TIME+ COMMAND
>>>> 25835 root 20  0 4371m 1.6g 10m S  101 21.3 5:18.69 java
>>>>
>>>> Thank you
>>>>
>>>
>>> --
>>> -MilleBii-
>>>

--------------------------
Ken Krugler
+1 530-210-6378
http://bixolabs.com
custom data mining solutions
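
To illustrate Julien's point that the reduce phase can be split across several reducers, here is a toy sketch in Python. It is not Hadoop or Nutch code: the function names (`partition`, `shuffle`, `reduce_partition`, `run_reduce`) are hypothetical, the reduce step is assumed to be a simple sum per key, and `zlib.crc32` stands in for the key hash that assigns each key to a reducer. Because every key lands in exactly one partition, the partitions can be reduced independently, for example one per core.

```python
from collections import defaultdict
from concurrent.futures import ProcessPoolExecutor
import zlib


def partition(key, num_reducers):
    # Stable stand-in for hash partitioning: the same key always maps
    # to the same reducer index in the range [0, num_reducers).
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def shuffle(pairs, num_reducers):
    # Group the mapped (key, value) pairs into one bucket per reducer.
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for key, value in pairs:
        buckets[partition(key, num_reducers)][key].append(value)
    return [dict(bucket) for bucket in buckets]


def reduce_partition(bucket):
    # Assumed reduce function: sum all values seen for each key.
    return {key: sum(values) for key, values in bucket.items()}


def run_reduce(pairs, num_reducers):
    # Reduce each bucket in its own worker process, then merge the
    # results. Keys never overlap between buckets, so a plain update
    # is enough to combine them.
    buckets = shuffle(pairs, num_reducers)
    with ProcessPoolExecutor(max_workers=num_reducers) as pool:
        partial_results = list(pool.map(reduce_partition, buckets))
    merged = {}
    for partial in partial_results:
        merged.update(partial)
    return merged


if __name__ == "__main__":
    mapped = [("a", 1), ("b", 1), ("a", 1), ("c", 1)]
    print(run_reduce(mapped, num_reducers=4))
```

With a single bucket (`num_reducers=1`) all the work falls to one worker, which matches the one busy core in the `top` output above; as Julien notes, raising the reducer count in Hadoop itself only takes effect in (pseudo)distributed mode.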

