Hi guys, I have successfully configured Nutch 2.x + HBase 0.94 + Hadoop 2.5, but I have a question about how Nutch runs its "phases" (fetch, parse, etc.). From the Nutch source code, I found that each phase is a set of Hadoop Map/Reduce tasks.
Is my understanding correct that each Map/Reduce task reads its input from and writes its output to HBase? Do these tasks also store their output in HDFS? For example, does the fetch job's map task store the URL content in HDFS, so that I can view it directly there?

Is it necessary to have a TaskTracker up and running to execute Map/Reduce tasks? I found that the execution completes successfully even when neither a TaskTracker nor a NodeManager is running. Here is my jps output for a successful run:

    breedish-mbp:~ zenind$ jps
    21664 HQuorumPeer
    19657 Launcher
    8974 NailgunRunner
    21706 HMaster
    20918 NameNode
    21808 HRegionServer
    21114 SecondaryNameNode
    21005 DataNode
    21847 Jps

Best Regards,
Dzmitry

