Hi all.
I'm using Nutch 1.9 and Solr 4.10 in my environment.
I want to skip, during the indexing process, every document whose title field 
(or some other field) is empty, so that it never reaches Solr.

My first solution was to delete all documents with an empty title from Solr. This is 
not a good idea for me because I would need to run the cleanup query after every indexing run.
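
For reference, the cleanup I mean is roughly a delete-by-query sent to the core's /update handler. Below is a minimal sketch that only builds the XML update message; the class and method names are made up for illustration, and I am assuming that "no indexed value in title" covers my empty-title case and that the field name contains no special query characters.

```java
// Hypothetical helper (not part of Nutch or SolrJ): builds the XML body
// that would be POSTed to Solr's /update handler, followed by a commit.
final class SolrCleanup {

    // Matches every document that has no indexed value in the given field.
    // "*:*" selects all documents, "-field:[* TO *]" excludes those with a value.
    static String deleteMissingField(String field) {
        return "<delete><query>*:* -" + field + ":[* TO *]</query></delete>";
    }

    public static void main(String[] args) {
        // prints <delete><query>*:* -title:[* TO *]</query></delete>
        System.out.println(deleteMissingField("title"));
    }
}
```

The drawback stays the same: the bad documents are indexed first and only removed afterwards.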

The second solution I thought of was to mark the field as required in schema.xml:

<field name="title" type="text" stored="true" indexed="true" multiValued="true" 
required="true"/>

After doing that, I found that when Nutch tries to send a batch of 250 documents and 
one of them has an empty title, Solr rejects the request and Nutch throws a Job Failed 
exception. Because Solr does not allow a document without a title value to be indexed, 
the whole batch fails and nothing is indexed.

Is there any way for Nutch to honor the required option in schema.xml and remove such 
documents from the batch before sending it to Solr?
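
To make the behavior I am after concrete, here is a minimal sketch in plain Java. It does not use the Nutch or SolrJ API: documents are modeled as a map from field name to a list of values (title is multiValued in my schema), and the class and method names are hypothetical. The idea is simply to drop incomplete documents from a batch before it is sent.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Nutch: filters a batch of documents
// so that only those with every required field filled in remain.
final class RequiredFieldFilter {

    // True if the document has at least one non-blank value for the field.
    static boolean hasValue(Map<String, List<String>> doc, String field) {
        List<String> values = doc.get(field);
        if (values == null) {
            return false;
        }
        for (String v : values) {
            if (v != null && !v.trim().isEmpty()) {
                return true;
            }
        }
        return false;
    }

    // Returns a new list with the documents that have all required fields;
    // the input batch is left unchanged.
    static List<Map<String, List<String>>> keepComplete(
            List<Map<String, List<String>>> batch, List<String> requiredFields) {
        List<Map<String, List<String>>> kept = new ArrayList<>();
        for (Map<String, List<String>> doc : batch) {
            boolean complete = true;
            for (String field : requiredFields) {
                if (!hasValue(doc, field)) {
                    complete = false;
                    break;
                }
            }
            if (complete) {
                kept.add(doc);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, List<String>> good = new HashMap<>();
        good.put("title", Arrays.asList("Home page"));
        Map<String, List<String>> bad = new HashMap<>();
        bad.put("title", Arrays.asList(""));

        List<Map<String, List<String>>> batch = new ArrayList<>();
        batch.add(good);
        batch.add(bad);

        // Only the document with a real title survives the filter.
        System.out.println(keepComplete(batch, Arrays.asList("title")).size());
    }
}
```

In this sketch the list of required fields is passed in by hand; ideally it would come from the required="true" attributes in schema.xml.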

Can anybody give me advice or comments on this, or tell me the best 
way to exclude documents with an empty field before indexing?

Eyeris.
 
