This is more complex than you need. The Solr update handler can accept
streamed data via the stream.url and stream.file options. You can just
point solr/update at stream.url=http://your.machine/your.php.script and
Solr will read from that URL as fast as it wants.
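For example, something like this should work (a sketch only -- it assumes a
default Solr install on localhost:8983, and the feed URL is the placeholder
from your message; note the stream.url value must be URL-encoded):

    curl 'http://localhost:8983/solr/update?commit=true&stream.url=http%3A%2F%2Fyour.machine%2Fyour.php.script'

Solr fetches the URL itself and streams the <add> documents from it, so
nothing large ever passes through the client making the request.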
There is no "parallel indexing" support, but you will find that indexing
in this way is generally disk-bound, not processor-bound.
Good luck!
Dario Rigolin wrote:
I'm looking to index data in Solr using a PHP page that feeds the index.
In my application I already have all docs "converted" to a solr/add XML
document, and I need Solr to pull all changed documents into the index.
Looking at DIH, I decided to use URLDataSource with useSolrAddSchema=true,
pointing to my application URL: getchangeddocstoindex.php.
But my PHP page could stream hundreds of megabytes (maybe a couple of
gigabytes!). Does anybody know whether I need to adjust connectionTimeout
and readTimeout in any way?
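For reference, the timeouts in question are attributes of the URLDataSource
element in data-config.xml. A minimal sketch (the entity name, feed URL, and
timeout values below are illustrative assumptions, not recommendations; the
timeouts are in milliseconds):

    <dataConfig>
      <dataSource type="URLDataSource"
                  connectionTimeout="10000"
                  readTimeout="300000"/>
      <document>
        <entity name="changedDocs"
                processor="XPathEntityProcessor"
                useSolrAddSchema="true"
                url="http://your.machine/getchangeddocstoindex.php"/>
      </document>
    </dataConfig>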
Looking at the URLDataSource documentation, it seems it's possible to
implement a kind of chunking using a Transformer with $hasMore and $nextUrl.
But with useSolrAddSchema I don't know how to set up a Transformer section.
My questions are:
1) Is there a size limit above which it's better to chunk?
2) Is it possible to do chunking with useSolrAddSchema=true?
Thanks
Dario.