This is more complex than you need. The Solr update handler can accept streamed data through the stream.url and stream.file parameters. You can just call solr/update with stream.url=http://your.machine/your.php.script and Solr will read from that URL as fast as it wants. There is no "parallel indexing" support, but you will find that indexing this way is generally disk-bound, not processor-bound.
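For illustration, here is a minimal Python sketch of how such a request URL could be put together. The Solr base URL (http://localhost:8983/solr), the commit=true parameter, and the helper name build_stream_update_url are assumptions made for this example, not something taken from the setup above. As far as I know, remote streaming also has to be enabled in solrconfig.xml (enableRemoteStreaming) before stream.url is honoured.

```python
from urllib.parse import urlencode


def build_stream_update_url(solr_base, source_url, commit=True):
    """Build a solr/update URL that asks Solr to pull documents from source_url.

    solr_base is an assumed base URL such as http://localhost:8983/solr;
    source_url is the script that emits the <add> XML.
    """
    # stream.url must be URL-encoded because it is itself a URL.
    params = {"stream.url": source_url}
    if commit:
        # Assumption: commit in the same request so the changes become visible.
        params["commit"] = "true"
    return solr_base.rstrip("/") + "/update?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical host names; replace with your own.
    print(build_stream_update_url(
        "http://localhost:8983/solr",
        "http://your.machine/your.php.script",
    ))
```

Fetching the printed URL (with curl, a browser, or urllib.request.urlopen) starts the update. Solr then opens the connection to the PHP script itself, so the client that triggers the request never has to hold the data.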

Good luck!

Dario Rigolin wrote:
I'm looking to index data in Solr using a PHP page that feeds the index.
In my application I have all docs already "converted" to a solr/add XML
document, and I need Solr to pull all changed documents into the
index. Looking at DIH, I decided to use URLDataSource and useSolrAddSchema=true
pointing to my application URL: getchangeddocstoindex.php.

But my PHP page could stream hundreds of megabytes (maybe a couple of gigabytes!).
Does anybody know whether I need to adapt connectionTimeout and readTimeout in any
way?

Looking at the URLDataSource documentation, it seems that it's possible to
implement a kind of chunking using a Transformer with $hasMore and $nextURL.

But with useSolrAddSchema I don't know how to set up a Transformer section.

My questions are:
1) Is there a size limit beyond which it's better to do chunking?
2) Is it possible to do chunking with useSolrAddSchema=true?

Thanks


Dario.
