Hi all. I'm using Nutch 1.9 and Solr 4.10 in my environment. I want the indexing process to skip every document whose title field (or some other field) is empty, so that those documents never reach Solr.
My first solution was to delete all documents with an empty title from Solr. This is not a good option for me, because I would need to run the cleanup query after every indexing job.

The second solution I thought of was to mark the field as required in schema.xml:

    <field name="title" type="text" stored="true" indexed="true" multiValued="true" required="true"/>

After doing that, I found that when Nutch sends a batch of 250 documents and one of them has an empty title, Solr rejects the request and Nutch throws a Job Failed exception. Because Solr does not allow a document without a title value to be indexed, nothing from that batch gets indexed.

Is there any way for Nutch to take the required option in schema.xml into account and remove such documents from the batch before sending it to Solr? Can anybody give me some advice or comments, or tell me the best way to exclude documents with an empty field before indexing?

Eyeris.
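P.S. To make the question more concrete, here is a rough sketch of the behavior I am looking for, written in plain Java. It does not use the Nutch or Solr APIs: the class RequiredFieldFilter, the methods hasValue and dropIncomplete, and the use of a Map for each document are all hypothetical names I made up for illustration. The idea is simply that documents missing a required field are dropped from the batch before it is sent, so one bad document does not make the whole batch fail.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: not part of Nutch or Solr.
public class RequiredFieldFilter {

    // True when the document has a non-blank value for the given field.
    static boolean hasValue(Map<String, String> doc, String field) {
        String value = doc.get(field);
        return value != null && !value.trim().isEmpty();
    }

    // Returns a new list holding only the documents in which every
    // required field has a value; the input batch is left unchanged.
    static List<Map<String, String>> dropIncomplete(
            List<Map<String, String>> batch, List<String> requiredFields) {
        List<Map<String, String>> kept = new ArrayList<>();
        for (Map<String, String> doc : batch) {
            boolean complete = true;
            for (String field : requiredFields) {
                if (!hasValue(doc, field)) {
                    complete = false;
                    break;
                }
            }
            if (complete) {
                kept.add(doc);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, String> withTitle = new HashMap<>();
        withTitle.put("url", "http://example.com/a");
        withTitle.put("title", "Page A");

        Map<String, String> emptyTitle = new HashMap<>();
        emptyTitle.put("url", "http://example.com/b");
        emptyTitle.put("title", "   ");

        List<Map<String, String>> batch = Arrays.asList(withTitle, emptyTitle);
        List<Map<String, String>> toIndex =
                dropIncomplete(batch, Arrays.asList("title"));

        // Only the documents in toIndex would be sent on to Solr.
        System.out.println("Documents kept: " + toIndex.size());
    }
}
```

What I do not know is where in Nutch a check like this should live, or whether something equivalent already exists.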

