In the past, setting rows=n with the full-import command stopped the DIH import at the number I passed in, but now this doesn't seem to be working. Here is the command I'm using:

    curl 'http://localhost:8983/solr/indexer/mediawiki?command=full-import&rows=100'

But when 100 docs are imported, the process keeps running. Here's the log output:

    Oct 8, 2009 5:23:32 PM org.apache.solr.handler.dataimport.DocBuilder buildDocument
    INFO: Indexing stopped at docCount = 100
    Oct 8, 2009 5:23:33 PM org.apache.solr.handler.dataimport.DocBuilder buildDocument
    INFO: Indexing stopped at docCount = 200
    Oct 8, 2009 5:23:35 PM org.apache.solr.handler.dataimport.DocBuilder buildDocument
    INFO: Indexing stopped at docCount = 300
    Oct 8, 2009 5:23:36 PM org.apache.solr.handler.dataimport.DocBuilder buildDocument
    INFO: Indexing stopped at docCount = 400
    Oct 8, 2009 5:23:38 PM org.apache.solr.handler.dataimport.DocBuilder buildDocument
    INFO: Indexing stopped at docCount = 500

and so on.

Running on the most recent nightly:

    1.4-dev 823366M - jayhill - 2009-10-08 17:31:22

I've used that exact URL in the past and the indexing stopped at the rows number as expected, but I last ran the command about two months ago, on a build from back in early July.

Here's the DIH config:

    <dataConfig>
      <dataSource name="dsFiles" type="FileDataSource" encoding="UTF-8"/>
      <document>
        <entity name="f"
                processor="FileListEntityProcessor"
                baseDir="/path/to/files"
                fileName=".*xml"
                recursive="true"
                rootEntity="false"
                dataSource="null">
          <entity name="wikixml"
                  processor="XPathEntityProcessor"
                  forEach="/mediawiki/page"
                  url="${f.fileAbsolutePath}"
                  dataSource="dsFiles"
                  onError="skip">
            <field column="id" xpath="/mediawiki/page/id"/>
            <field column="title" xpath="/mediawiki/page/title"/>
            <field column="contributor" xpath="/mediawiki/page/revision/contributor/username"/>
            <field column="comment" xpath="/mediawiki/page/revision/comment"/>
            <field column="text" xpath="/mediawiki/page/revision/text"/>
          </entity>
        </entity>
      </document>
    </dataConfig>

-Jay