Wow, thanks for such a quick reply...

It is a simplification of how the data is passed along at index time.
The class MediaSOLRIndexWriter.java
<https://github.com/KIZI/IRAPI/blob/master/nutch-plugin/media-extractor/src/java/org/apache/nutch/indexwriter/media/MediaSOLRIndexWriter.java>
implements IndexWriter, so it must override
public void write(final NutchDocument doc) throws IOException.
There is a 1:M relation between a webpage and its internal media URLs, so
when the webpage is indexed, its media have to be indexed as well.

That was the easiest way to achieve it.
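
To illustrate the idea only, here is a rough, self-contained sketch. It is not the plugin's code and does not use the Nutch API: PageDoc, SimpleIndexWriter and MediaAwareWriter are hypothetical stand-ins for NutchDocument, IndexWriter and MediaSOLRIndexWriter, and the "index" is just an in-memory list. It only shows a single write() call emitting one entry for the page plus one entry per media URL.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MediaIndexSketch {

    /** Hypothetical stand-in for NutchDocument: a page URL plus its media URLs. */
    public static final class PageDoc {
        public final String url;
        public final List<String> mediaUrls;

        public PageDoc(String url, List<String> mediaUrls) {
            this.url = url;
            this.mediaUrls = mediaUrls;
        }
    }

    /** Hypothetical stand-in for the IndexWriter contract (one write per page). */
    public interface SimpleIndexWriter {
        void write(PageDoc doc) throws IOException;
    }

    /** Writes the page and, in the same call, every media URL that belongs to it. */
    public static final class MediaAwareWriter implements SimpleIndexWriter {
        // Assumption: a list of strings stands in for the real search index.
        public final List<String> indexed = new ArrayList<>();

        // This in-memory sketch does no I/O, so it does not declare IOException.
        @Override
        public void write(final PageDoc doc) {
            // One entry for the page itself.
            indexed.add("page:" + doc.url);
            // 1:M relation: one extra entry per media URL, linked to its parent page.
            for (String media : doc.mediaUrls) {
                indexed.add("media:" + media + " parent:" + doc.url);
            }
        }
    }

    public static void main(String[] args) {
        MediaAwareWriter writer = new MediaAwareWriter();
        writer.write(new PageDoc("http://example.org/",
                Arrays.asList("http://example.org/a.jpg", "http://example.org/b.mp4")));
        // Print everything that ended up in the stand-in index.
        for (String entry : writer.indexed) {
            System.out.println(entry);
        }
    }
}
```

The point of doing it inside write() is that the indexer only ever hands over the page document, so the media entries have to be derived and sent from there.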




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Nutch-media-extractor-plugin-proposal-tp4207382p4207402.html
Sent from the Nutch - User mailing list archive at Nabble.com.
