Wow, thanks for such a quick reply... It is a simplification of the data transfer at index time. The class MediaSOLRIndexWriter.java <https://github.com/KIZI/IRAPI/blob/master/nutch-plugin/media-extractor/src/java/org/apache/nutch/indexwriter/media/MediaSOLRIndexWriter.java> implements IndexWriter, so it must override public void write(final NutchDocument doc) throws IOException. There is a 1:M relation between a web page and its internal media URLs, so whenever the web page is indexed, its media have to be indexed as well.
That was the easiest way to achieve it.
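To illustrate the idea, here is a minimal sketch of a writer whose single write(...) call emits the page document plus one document per media URL. It assumes simplified stand-in types: Doc, IndexWriter, MediaAwareWriter, and the field names "id" and "parent" are hypothetical and are not the real Nutch classes or the actual MediaSOLRIndexWriter code. Documents are collected in an in-memory list instead of being sent to Solr.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified stand-ins for illustration only, NOT the real Nutch API.
class MediaIndexSketch {

    /** Stand-in for NutchDocument: named fields plus the media URLs found on the page. */
    static class Doc {
        final Map<String, String> fields = new LinkedHashMap<>();
        final List<String> mediaUrls = new ArrayList<>(); // hypothetical field
    }

    /** Stand-in for the IndexWriter contract: one write call per crawled page. */
    interface IndexWriter {
        void write(Doc doc) throws IOException;
    }

    /** Emits the page and then one document per media URL (the 1:M relation). */
    static class MediaAwareWriter implements IndexWriter {
        // Assumption: an in-memory list replaces the Solr client.
        final List<Doc> sent = new ArrayList<>();

        @Override
        public void write(final Doc doc) throws IOException {
            sent.add(doc); // the web page itself
            for (String url : doc.mediaUrls) {
                Doc media = new Doc();
                media.fields.put("id", url);
                // Link each media document back to the page it came from.
                media.fields.put("parent", doc.fields.get("id"));
                sent.add(media);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Doc page = new Doc();
        page.fields.put("id", "http://example.com/");
        page.mediaUrls.add("http://example.com/a.jpg");
        page.mediaUrls.add("http://example.com/b.mp4");

        MediaAwareWriter writer = new MediaAwareWriter();
        writer.write(page);
        for (Doc d : writer.sent) {
            System.out.println(d.fields);
        }
    }
}
```

Because the indexer only ever calls write(...) once per page, fanning out to the media documents inside that call keeps page and media in step without any extra indexing job.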

