derevo wrote:
> hi,
> (2 servers, hadoop/nutch)
>
> I am trying to fetch my host, which serves txt files ( http://site.net/file_1.txt ),
> more than 150000 of them.
> When I start the fetch and look at the access.log on the target host, I see only
> one slave host doing the fetching (SLAVE_1).
> If I restart the fetch, the fetching slave is now (SLAVE_2).
>
> In the Task Tracker Status I see the same result.
The fetchlist is by default partitioned so that all URLs for the same host end up being fetched by a single node; see PartitionUrlByHost. To override this you would need to change the partitioner or stop using it (both would require source code changes).

-- Sami Siren
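[Editor's note: a minimal sketch of what a host-based partitioner looks like, to illustrate why a single-host crawl lands on one node. This is not Nutch's actual PartitionUrlByHost source, just an illustration written against the Hadoop mapreduce Partitioner API; the class name HostPartitionerSketch is made up.]

```java
import java.net.URL;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapreduce.Partitioner;

/**
 * Sketch of partitioning fetchlist entries by host: every URL whose host
 * hashes to the same value goes to the same partition, so one fetcher node
 * ends up with all URLs of that host.
 */
public class HostPartitionerSketch extends Partitioner<Text, Writable> {

  @Override
  public int getPartition(Text urlKey, Writable value, int numPartitions) {
    String host;
    try {
      // Partition on the host part of the URL only.
      host = new URL(urlKey.toString()).getHost();
    } catch (Exception e) {
      // Fall back to the whole URL string if it cannot be parsed.
      host = urlKey.toString();
    }
    // Same host -> same hash -> same partition -> same fetcher node.
    return (host.hashCode() & Integer.MAX_VALUE) % numPartitions;
  }
}
```

Since all 150000 URLs here share the host site.net, they hash to one partition, which is why only one slave shows up in the target's access.log. Keeping a host on a single node is a deliberate trade-off: it lets that node enforce per-host politeness (fetch delays, connection limits) locally, which is why overriding it requires changing the partitioner itself.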
