Hi all,

I'd like to run a MapReduce job over the latest crawled data.

Should the input path be crawldb/current/?
Should the InputFormatClass be SequenceFileInputFormat.class?
Should the KV pair be <Text, CrawlDatum>, where the Text key is the URL?

Thanks.
