For reference, for other readers:
 https://issues.apache.org/jira/browse/NUTCH-939


On Friday 26 November 2010 11:59:26 Claudio Martella wrote:
> Hello list,
> 
> I'm porting the recrawl script to use Hadoop (on an already existing Hadoop
> cluster). I attach my version.
> 
> What I found out is that Indexer and SolrIndexer want a list of
> segments. It's difficult to obtain the contents of a directory through
> HDFS (/craw/segments/* will be expanded by bash, and hadoop dfs -ls will
> return the contents with details such as permissions, owners and dates),
> so I wrote these little patches to add the -dir option, as in
> SegmentMerger and LinkDB. They are attached too.
> 
> They might be of interest to somebody else.
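
For readers without the patches, the listing problem described above can also be worked around in a script by keeping only the path column of the `hadoop dfs -ls` output. The following is a minimal sketch, not part of the attached patches. It assumes the usual eight-column listing (permissions, replication, owner, group, size, date, time, path) and paths without spaces. The function name, the `/crawl/segments` parent and the sample listing are hypothetical, for illustration only.

```python
def segment_paths(ls_output, parent="/crawl/segments"):
    """Return the directory paths directly listed under `parent`.

    `ls_output` is the text printed by `hadoop dfs -ls <parent>`.
    Assumes eight whitespace-separated columns with the path last,
    and that paths contain no spaces.
    """
    prefix = parent.rstrip("/") + "/"
    paths = []
    for line in ls_output.splitlines():
        fields = line.split()
        # Skip the "Found N items" header and any non-directory entry.
        if len(fields) < 8 or not fields[0].startswith("d"):
            continue
        path = fields[-1]
        if path.startswith(prefix):
            paths.append(path)
    return paths


# Hypothetical listing; owner, dates and segment names are made up.
SAMPLE_LISTING = """Found 2 items
drwxr-xr-x   - claudio supergroup          0 2010-11-26 11:40 /crawl/segments/20101126113000
drwxr-xr-x   - claudio supergroup          0 2010-11-26 11:55 /crawl/segments/20101126114500
"""

if __name__ == "__main__":
    # Print the segments space-separated, ready to pass as arguments.
    print(" ".join(segment_paths(SAMPLE_LISTING)))
```

The resulting list can then be passed as the segment arguments that Indexer and SolrIndexer expect. A -dir option avoids this parsing step entirely.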
