I'm trying to move off my JDBC data source to a streaming data source.
I have implemented a Node.js API that pushes items to my Solr index
using /update/json, which is defined out of the box as:
 
<requestHandler name="/update" class="solr.UpdateRequestHandler" />

This process replaces our 'delta' import.
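
For reference, here is a minimal TypeScript sketch of the kind of request such a Node.js client sends. It assumes Solr at http://localhost:8983/solr, a collection named COLLECTION1, an `id` uniqueKey and a 10-second `commitWithin`; all of these are placeholders, not my real setup. The JSON update handler accepts a plain JSON array of documents as the POST body.

```typescript
// Hypothetical document shape: field names mirror the data-config mapping
// (column_one, column_two, ...); "id" is assumed to be the schema's uniqueKey.
interface SolrDoc {
  id: string;
  [field: string]: string | number | boolean | string[];
}

interface UpdateRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Builds (but does not send) a POST to the JSON update handler.
function buildUpdateRequest(
  solrBase: string,
  collection: string,
  docs: SolrDoc[],
  commitWithinMs = 10_000,
): UpdateRequest {
  const url = new URL(`${solrBase}/${collection}/update/json`);
  // commitWithin asks Solr to make the docs searchable within N ms
  // instead of forcing a hard commit on every request.
  url.searchParams.set("commitWithin", String(commitWithinMs));
  return {
    url: url.toString(),
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(docs),
    },
  };
}

// Usage (Node 18+ ships a global fetch):
// const req = buildUpdateRequest("http://localhost:8983/solr", "COLLECTION1", docs);
// const res = await fetch(req.url, req.init);
```

Keeping request building separate from sending makes the payload easy to inspect and test without a running Solr.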

We still have our /dataimport DataImportHandler, which handles our 'full
import' over a JDBC connection. It looks like the following:

solrconfig.xml

<requestHandler name="/dataimport"
                class="org.apache.solr.handler.dataimport.DataImportHandler">
    <lst name="defaults">
        <str name="config">data-config.xml</str>
        <str name="clean">false</str>
    </lst>
</requestHandler>
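
A minimal TypeScript sketch of kicking off this JDBC full import from the same Node.js code. The Solr base URL and the collection name COLLECTION1 are hypothetical. The `command` values and the `clean`/`commit` parameters are standard DataImportHandler request parameters, and a parameter passed on the request overrides the handler's `defaults` for that request only.

```typescript
// DataImportHandler commands.
type DihCommand = "full-import" | "delta-import" | "status" | "reload-config" | "abort";

// Builds the URL for a DIH command. Extra params (e.g. clean, commit, entity)
// override the <lst name="defaults"> values for this request only.
function dihCommandUrl(
  solrBase: string,
  collection: string,
  handlerPath: string,
  command: DihCommand,
  extra: Record<string, string> = {},
): string {
  const url = new URL(`${solrBase}/${collection}${handlerPath}`);
  url.searchParams.set("command", command);
  url.searchParams.set("wt", "json");
  for (const [key, value] of Object.entries(extra)) {
    url.searchParams.set(key, value);
  }
  return url.toString();
}

// Usage:
// await fetch(dihCommandUrl("http://localhost:8983/solr", "COLLECTION1",
//   "/dataimport", "full-import", { commit: "true" }));
```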

data-config.xml (partial)

 <dataSource jndiName="SOLR_EXTERNAL_TABLE" batchSize="2000"
             type="JdbcDataSource"/>

    <document>
        <entity name="COLLECTION1"
                transformer="RegexTransformer, script:transformAddress, script:transformPublishFlag, script:transformSalesChannel, script:collectCustomerNames"
                query="select * from EXTERNAL_TABLE">
         <field column="Column1" name="column_one"/>
         <field column="Column2" name="column_two"/>
         <field column="Column3" name="column_three"/>
         <field column="Column4" name="column_four"/>
         <field column="Column5" name="column_five"/>
         <field column="Column6" name="column_six"/>
         <field column="Column7" name="column_seven"/>
        </entity>
    </document>
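
Since the Node.js API now has to produce the same documents this entity produced, here is a TypeScript sketch of the column-to-field renaming that the `<field>` elements perform. It is hypothetical: rows are assumed to be plain objects keyed by column name, and the `script:` transformers are not reproduced because their code is not shown here.

```typescript
// Mirrors the <field column="..." name="..."/> elements above.
const COLUMN_TO_FIELD: Readonly<Record<string, string>> = {
  Column1: "column_one",
  Column2: "column_two",
  Column3: "column_three",
  Column4: "column_four",
  Column5: "column_five",
  Column6: "column_six",
  Column7: "column_seven",
};

type Row = Record<string, unknown>;

// Renames mapped columns and drops everything else. This is a simplification:
// DIH would also index unmapped columns whose names match a schema field.
// null/undefined values are skipped rather than sent as JSON null.
function mapRow(row: Row): Record<string, unknown> {
  const doc: Record<string, unknown> = {};
  for (const [column, value] of Object.entries(row)) {
    const field = COLUMN_TO_FIELD[column];
    if (field !== undefined && value !== null && value !== undefined) {
      doc[field] = value;
    }
  }
  return doc;
}
```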


I would really like to stream my indexing and ditch the JDBC data source
entirely. I have a few questions:

1. Does ContentStreamDataSource call out to an API, or does it wait for
something to be posted to it?
2. Is there a JSON entity processor that works with ContentStreamDataSource?
I only see XPathEntityProcessor, which is for XML.
3. Is there a way to get the status of this stream?
      - Right now I can hit
        /COLLECTION2/dataimport?_=xxxxxxxx&command=status&indent=on&wt=json
      - It responds with:
{
  "responseHeader":{
    "status":0,
    "QTime":0},
  "initArgs":[
    "defaults",[
      "config","data-config.xml",
      "clean","false"]],
  "command":"status",
  "status":"idle",
  "importResponse":"",
  "statusMessages":{
    "Total Requests made to DataSource":"0",
    "Total Rows Fetched":"0",
    "Total Documents Processed":"0",
    "Total Documents Skipped":"0",
    "Time taken":"0:0:0.0"}
}
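
Polling that status endpoint from Node.js is straightforward. Here is a TypeScript sketch that pulls out the counters shown in the sample response above. It relies only on the keys visible in that sample, plus the assumption that `status` reads `busy` while an import is running; it is not a complete model of the response.

```typescript
// The parts of the DIH status response used below (taken from the sample).
interface DihStatusResponse {
  status: string; // "idle" or "busy"
  statusMessages?: Record<string, string>;
}

interface ImportProgress {
  busy: boolean;
  documentsProcessed: number;
  rowsFetched: number;
  timeTaken: string;
}

function summarizeStatus(json: string): ImportProgress {
  const res = JSON.parse(json) as DihStatusResponse;
  const msgs: Record<string, string | undefined> = res.statusMessages ?? {};
  // Counters come back as strings, so convert them explicitly.
  const num = (key: string): number => Number(msgs[key] ?? "0");
  return {
    busy: res.status === "busy",
    documentsProcessed: num("Total Documents Processed"),
    rowsFetched: num("Total Rows Fetched"),
    timeTaken: msgs["Time taken"] ?? "",
  };
}
```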


My gut instinct was to implement it like this:

solrconfig.xml

    <requestHandler name="/dataimportStream"
                    class="org.apache.solr.handler.dataimport.DataImportHandler">
        <lst name="defaults">
            <str name="config">stream-data-config.xml</str>
            <str name="clean">false</str>
        </lst>
    </requestHandler>

stream-data-config.xml


  <dataSource name="jsonStream" type="ContentStreamDataSource"/>
 
    <document>
        <entity name="CONTRACTS"
                dataSource="jsonStream"
                stream="true"
                transformer="RegexTransformer, script:transformAddress, script:transformPublishFlag, script:transformSalesChannel, script:collectCustomerNames">
         <field column="Column1" name="column_one"/>
         <field column="Column2" name="column_two"/>
         <field column="Column3" name="column_three"/>
         <field column="Column4" name="column_four"/>
         <field column="Column5" name="column_five"/>
         <field column="Column6" name="column_six"/>
         <field column="Column7" name="column_seven"/>
        </entity>
    </document>
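
Assuming ContentStreamDataSource reads the body of the request that triggers the import (an assumption on my part, not something I have verified), the Node.js side would look something like this TypeScript sketch. The handler path matches the /dataimportStream definition above; the Solr base URL, collection name, content type and payload are hypothetical.

```typescript
interface StreamImportRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Hypothetical: the records travel in the POST body of the DIH command itself,
// which is what ContentStreamDataSource would then read.
function buildStreamImport(
  solrBase: string,
  collection: string,
  payload: string,
  contentType = "application/xml",
): StreamImportRequest {
  const url = new URL(`${solrBase}/${collection}/dataimportStream`);
  url.searchParams.set("command", "full-import");
  // clean=false already comes from the handler defaults; commit makes results visible.
  url.searchParams.set("commit", "true");
  return {
    url: url.toString(),
    init: { method: "POST", headers: { "Content-Type": contentType }, body: payload },
  };
}

// Usage:
// const req = buildStreamImport("http://localhost:8983/solr", "COLLECTION1", xmlString);
// await fetch(req.url, req.init);
```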


I think I might be crossing some streams on how this all works. Any
advice is appreciated.

Thanks,

Nate

--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
