I'm trying to get off my JDBC data source and move to a streaming data source. I have successfully implemented a Node.js API that pushes items to my Solr index using /update/json, which is defined out of the box as:

<requestHandler name="/update" class="solr.UpdateRequestHandler" />
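For context, here is a minimal sketch of the kind of request such a Node.js service might build for /update/json. It is not the actual implementation: the helper name buildUpdateRequest, the base URL, and the 10-second commitWithin default are assumptions for illustration, and it relies on Solr accepting a top-level JSON array of documents as a batch of adds.

```typescript
// One document to index. Field names follow the schema fields mapped in
// data-config.xml (column_one, column_two, ...).
type SolrDoc = Record<string, string | number | boolean | string[]>;

interface UpdateRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Build (but do not send) an HTTP request for Solr's JSON update handler.
// baseUrl and collection are hypothetical placeholders for a real deployment.
function buildUpdateRequest(
  baseUrl: string,
  collection: string,
  docs: SolrDoc[],
  commitWithinMs: number = 10000
): UpdateRequest {
  // Strip trailing slashes so the path is not doubled up.
  const root = baseUrl.replace(/\/+$/, "");
  const path = `${root}/${encodeURIComponent(collection)}/update/json`;
  return {
    // commitWithin asks Solr to make the documents searchable within N ms.
    url: `${path}?commitWithin=${commitWithinMs}`,
    method: "POST",
    headers: { "Content-Type": "application/json" },
    // A JSON array of documents is sent as a single batch.
    body: JSON.stringify(docs),
  };
}

// Example: one document for COLLECTION1 on an assumed local Solr instance.
const exampleRequest = buildUpdateRequest(
  "http://localhost:8983/solr/",
  "COLLECTION1",
  [{ column_one: "a", column_two: "b" }]
);
```

The returned url, headers, and body can be handed to whatever HTTP client the service already uses. Keeping request construction separate from sending makes the payload easy to inspect or unit test.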
This process replaces the 'delta' import. We still have our /dataimport DataImportHandler, which handles our 'full import' over a JDBC connection. It looks like the following.

solrconfig.xml

<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">data-config.xml</str>
    <str name="clean">false</str>
  </lst>
</requestHandler>

data-config.xml (partial)

<dataSource jndiName="SOLR_EXTERNAL_TABLE" batchSize="2000" type="JdbcDataSource"/>
<document>
  <entity name="COLLECTION1"
          transformer="RegexTransformer, script:transformAddress, script:transformPublishFlag, script:transformSalesChannel, script:collectCustomerNames"
          query="select * from EXTERNAL_TABLE">
    <field column="Column1" name="column_one"/>
    <field column="Column2" name="column_two"/>
    <field column="Column3" name="column_three"/>
    <field column="Column4" name="column_four"/>
    <field column="Column5" name="column_five"/>
    <field column="Column6" name="column_six"/>
    <field column="Column7" name="column_seven"/>
  </entity>
</document>

I would really like to be able to just stream my indexing and ditch the JDBC import. I have a couple of questions.

1. Does the ContentStreamDataSource post out to an API, or does it wait for something to post to it?
2. Does ContentStreamDataSource have a JSON processor? I only see XPathEntityProcessor, which is for XML.
3. Is there a way to get the status of this stream?
   - Right now I can hit /COLLECTION2/dataimport?_=xxxxxxxx&command=status&indent=on&wt=json
   - It responds with:

{
  "responseHeader":{
    "status":0,
    "QTime":0},
  "initArgs":[
    "defaults",[
      "config","data-config.xml",
      "clean","false"]],
  "command":"status",
  "status":"idle",
  "importResponse":"",
  "statusMessages":{
    "Total Requests made to DataSource":"0",
    "Total Rows Fetched":"0",
    "Total Documents Processed":"0",
    "Total Documents Skipped":"0",
    "Time taken":"0:0:0.0"}
}

My gut was to implement it like this:

solrconfig.xml

<requestHandler name="/dataimportStream" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">stream-data-config.xml</str>
    <str name="clean">false</str>
  </lst>
</requestHandler>

stream-data-config.xml

<dataSource name="jsonStream" type="ContentStreamDataSource"/>
<document>
  <entity name="CONTRACTS"
          transformer="RegexTransformer, script:transformAddress, script:transformPublishFlag, script:transformSalesChannel, script:collectCustomerNames"
          stream="true"
          dataSource="jsonStream">
    <field column="Column1" name="column_one"/>
    <field column="Column2" name="column_two"/>
    <field column="Column3" name="column_three"/>
    <field column="Column4" name="column_four"/>
    <field column="Column5" name="column_five"/>
    <field column="Column6" name="column_six"/>
    <field column="Column7" name="column_seven"/>
  </entity>
</document>

I think I might be crossing some streams here on how this all works. Any advice is appreciated.

Thanks,
Nate

--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html