Hello:

I'm trying to make use of FieldReaderDataSource so that I can read a (Oracle) 
database CLOB, and then use XPathEntityProcessor to derive Solr field values 
via xpath notation.

For an extra bit of fun, the CLOB itself is base64 encoded and gzip'd.  I 
created a transformer of my own to take care of the decoding and decompression, 
and that seems to work.  I patterned the new transformer after the existing 
ones (Solr 3.1 trunk).  Anyway, I can see my own debug output in catalina.out:

------------- Processing field: {toWrite=false, clob=true, column=SUMMARY_XML, boost=1.0, gzip64=true}
------------- Updated field: SUMMARY_XML to type: java.lang.String value: '<node id="ING:2ylbg" name="LOC677213" type="gene"><synonym-list><synonym name="LOC677213"/></synonym-list><macromolecule-list><macromolecule id="677213" source="EG" species="MM" name="similar to U2AF homology motif (UHM) kinase 1" summary=""/></macromolecule-list><member-of></member-of><molecular-function></molecular-function><biological-process></biological-process><cellular-component></cellular-component><pathway-list></pathway-list><protein-family><term name="unknown"/></protein-family><subcellular-location></subcellular-location><top-findings></top-findings><additional-findings></additional-findings><reference-list finding-count="0"></reference-list><copyright>&#169;2000-2010  Ingenuity Systems, Inc. All rights reserved.</copyright></node>'
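
For reference, the decoding step inside such a transformer can be sketched roughly as follows. This is not the actual SolrDihGzip64Transformer, only a minimal standalone illustration: the class name Gzip64Codec, the UTF-8 charset, and the use of java.util.Base64 (Java 8 or later) are all assumptions, and the encode() helper is there only so the round trip can be tried out.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper; not part of Solr or of the transformer described above.
final class Gzip64Codec {

    // Base64-decode the input, gunzip the bytes, and return the text (UTF-8 assumed).
    static String decode(String base64Gzip) throws IOException {
        // The MIME decoder tolerates line breaks inside the base64 text.
        byte[] compressed = Base64.getMimeDecoder().decode(base64Gzip);
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed));
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    // Inverse of decode(): gzip the text, then base64-encode the compressed bytes.
    static String encode(String text) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return Base64.getEncoder().encodeToString(out.toByteArray());
    }
}
```

Calling Gzip64Codec.decode(Gzip64Codec.encode(xml)) should give back the original XML string, which is the same replacement the transformer performs on the SUMMARY_XML column.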

So, the transformer replaces the original CLOB extracted by ClobTransformer 
with a String representing the decoded result. I then want to feed this XML 
string to XPathEntityProcessor.  So, in my DIH data config file:

<dataConfig>
    <dataSource
        name="ipsDb"
        type="JdbcDataSource"
        driver="oracle.jdbc.driver.OracleDriver"
        url="jdbc:oracle:thin:@(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=ueipa1rac1-vip)(PORT=1537))(ADDRESS=(PROTOCOL=TCP)(HOST=ueipa1rac2-vip)(PORT=1537))(sdu=8760)(LOAD_BALANCE=yes)(CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=EIPS1R)))"
        user="user"
        password="password"
    />

    <datasource
        name="fieldSource"
        type="FieldReaderDataSource"
    />

    <document>
        <entity
            rootEntity="false"
            name="ipsNode"
            dataSource="ipsDb"
            query="select SUMMARY_XML from IPS_NODE where ROWNUM &lt; 10"
            transformer="ClobTransformer,com.ingenuity.isec.util.SolrDihGzip64Transformer">

            <field column="SUMMARY_XML" clob="true" gzip64="true"/>

            <entity
                name="node"
                dataSource="fieldSource"
                dataField="ipsNode.SUMMARY_XML"
                processor="XPathEntityProcessor"
                forEach="/node">

                <field column="n_id" xpath="/node/@id"/>
                <field column="n_name" xpath="/node/@name"/>
                ...
            </entity>
        </entity>
    </document>
</dataConfig>
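
As a sanity check outside of DIH, the two xpath expressions above can be evaluated against the decoded XML with the JDK's own javax.xml.xpath. This is only a sketch: the class name NodeXPathCheck is hypothetical, and since XPathEntityProcessor uses its own streaming XPath implementation, this confirms only that the expressions match the document, not how DIH will evaluate them.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical helper for checking the xpath expressions outside of Solr.
final class NodeXPathCheck {

    // Evaluate one XPath expression against an XML string and return the result as text.
    static String eval(String xml, String expr) throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate(expr, new InputSource(new StringReader(xml)));
    }
}
```

Fed a trimmed copy of the XML from the debug output, eval(xml, "/node/@id") and eval(xml, "/node/@name") should return the id and name attribute values of the node element.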

Basically, I'm trying to specify the (former CLOB, now String) SUMMARY_XML 
field as the data field for the FieldReaderDataSource. I can see it has the 
ability to simply return a StringReader() for String fields, rather than having 
to deal with a Clob itself. So, I figured FieldReaderDataSource would be happy 
with that and it would supply XPathEntityProcessor with XML contained in the 
field's value.
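
To make that expectation concrete, here is a small standalone sketch of the behaviour I am relying on: a String field value is wrapped in a StringReader, while a Clob would be read through its character stream. This is not the Solr source; the class name FieldValueReaders and the method toReader() are made up for illustration, and returning null for other types is an assumption.

```java
import java.io.Reader;
import java.io.StringReader;
import java.sql.Clob;
import java.sql.SQLException;

// Hypothetical illustration only; not FieldReaderDataSource itself.
final class FieldValueReaders {

    // Turn a resolved field value into a Reader that an XML parser could consume.
    static Reader toReader(Object fieldValue) throws SQLException {
        if (fieldValue instanceof String) {
            // Already decoded text, as produced by the custom transformer.
            return new StringReader((String) fieldValue);
        }
        if (fieldValue instanceof Clob) {
            // Raw CLOB straight from JDBC.
            return ((Clob) fieldValue).getCharacterStream();
        }
        // Assumption: anything else is treated as unreadable.
        return null;
    }
}
```

With the transformers in place, SUMMARY_XML should take the String branch.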

But, when I do a full import, I see this:

Mar 4, 2011 9:10:26 AM org.apache.solr.handler.dataimport.DataImporter doFullImport
INFO: Starting Full Import
Mar 4, 2011 9:10:26 AM org.apache.solr.core.SolrCore execute
INFO: [ing-nodes] webapp=/solr path=/select params={clean=false&commit=true&command=full-import&qt=/dataimport-ips} status=0 QTime=31
Mar 4, 2011 9:10:26 AM org.apache.solr.handler.dataimport.SolrWriter readIndexerProperties
WARNING: Unable to read: dataimport-ips.properties
Mar 4, 2011 9:10:26 AM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
INFO: Creating a connection for entity ipsNode with URL: jdbc:oracle:thin:@(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=ueipa1rac1-vip)(PORT=1537))(ADDRESS=(PROTOCOL=TCP)(HOST=ueipa1rac2-vip)(PORT=1537))(sdu=8760)(LOAD_BALANCE=yes)(CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=EIPS1R)))
Mar 4, 2011 9:10:28 AM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
INFO: Time taken for getConnection(): 1838
Mar 4, 2011 9:10:28 AM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
INFO: Creating a connection for entity node with URL: jdbc:oracle:thin:@(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=ueipa1rac1-vip)(PORT=1537))(ADDRESS=(PROTOCOL=TCP)(HOST=ueipa1rac2-vip)(PORT=1537))(sdu=8760)(LOAD_BALANCE=yes)(CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=EIPS1R)))
Mar 4, 2011 9:10:29 AM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
INFO: Time taken for getConnection(): 1110
Mar 4, 2011 9:10:29 AM org.apache.solr.handler.dataimport.DocBuilder buildDocument
SEVERE: Exception while processing: ipsNode document : null
org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to execute query: null Processing Document # 1
        at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:72)
        at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.<init>(JdbcDataSource.java:253)
        at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:210)
        at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:39)
        at org.apache.solr.handler.dataimport.XPathEntityProcessor.initQuery(XPathEntityProcessor.java:262)
        at org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow(XPathEntityProcessor.java:203)
        at org.apache.solr.handler.dataimport.XPathEntityProcessor.nextRow(XPathEntityProcessor.java:183)
        at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:237)
        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:586)
        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:612)
        at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:266)
        at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:185)
        at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:335)
        at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:393)
        at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:374)
Caused by: java.sql.SQLException: SQL statement to execute cannot be empty or null
        at oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:112)
        at oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:146)
        at oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:208)
        at oracle.jdbc.driver.OracleSql.initialize(OracleSql.java:112)
        at oracle.jdbc.driver.OracleStatement.executeInternal(OracleStatement.java:1683)
        at oracle.jdbc.driver.OracleStatement.execute(OracleStatement.java:1662)
        at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.<init>(JdbcDataSource.java:246)
        ... 13 more

It looks like XPathEntityProcessor is not using the FieldReaderDataSource I 
configured for the "node" entity. Instead it creates another JDBC connection 
for the "node" entity and the stack trace indicates 
XPathEntityProcessor.initQuery() invokes the getData() method of that data 
source rather than the FieldReaderDataSource.

I see in initQuery():

  private void initQuery(String s) {
    Reader data = null;
    try {
      final List<Map<String, Object>> rows = new ArrayList<Map<String, Object>>();
      try {
        data = dataSource.getData(s);
...

dataSource is set up as:

  @Override
  @SuppressWarnings("unchecked")
  public void init(Context context) {
    super.init(context);
    if (xpathReader == null)
      initXpathReader();
    pk = context.getEntityAttribute("pk");
    dataSource = context.getDataSource();
    rowIterator = null;
  }

I'm not sure how all of these DIH components work, but it seems 
context.getDataSource() must be returning the JDBC data source configured for 
the outer entity (ipsNode), not the FieldReaderDataSource configured for the inner 
entity (node) where I'm making use of XPathEntityProcessor.

What am I missing conceptually?  I've found a few references to the very 
same problem, and I think I'm following the same pattern.

Thanks for any insights you can share,

Jeff
--
Jeff Schmidt
535 Consulting
j...@535consulting.com
http://www.535consulting.com
