On Thu, Jun 27, 2013 at 3:38 AM, Sznajder ForMailingList < [email protected]> wrote:
> Hi
>
> I do not see the usage of "Segments" in nutch 2.x
>
> In addition, I do not see DB path .
>

"segments" and "crawldb" are 1.x notions: they are directories on the file system that hold the crawler's data (nothing more than Hadoop map files and sequence files). 2.x uses a datastore to hold the crawled data instead. A table is created in the datastore to hold all the information.

> In such condition, how can we two separate crawls, one starting from url1
> and the second from another seed, for example?
>

You could specify different crawlIDs. To be honest, I have never tried running multiple crawls at the same time with 2.x. It's not considered a good thing to do, as Julien mentioned in this thread:
http://lucene.472066.n3.nabble.com/Concurrently-running-multiple-nutch-crawls-td3166207.html

> Benjamin
>

