Hello: I'm trying to make use of FieldReaderDataSource so that I can read an Oracle database CLOB, and then use XPathEntityProcessor to derive Solr field values via XPath notation.
For an extra bit of fun, the CLOB itself is Base64-encoded and gzip'd. I created a transformer of my own to take care of the encoding and compression, and that seems to work. I patterned the new transformer after the existing ones (Solr 3.1 trunk). Anyway, I can see my own debug output in catalina.out:

------------- Processing field: {toWrite=false, clob=true, column=SUMMARY_XML, boost=1.0, gzip64=true}
------------- Updated field: SUMMARY_XML to type: java.lang.String value: '<node id="ING:2ylbg" name="LOC677213" type="gene"><synonym-list><synonym name="LOC677213"/></synonym-list><macromolecule-list><macromolecule id="677213" source="EG" species="MM" name="similar to U2AF homology motif (UHM) kinase 1" summary=""/></macromolecule-list><member-of></member-of><molecular-function></molecular-function><biological-process></biological-process><cellular-component></cellular-component><pathway-list></pathway-list><protein-family><term name="unknown"/></protein-family><subcellular-location></subcellular-location><top-findings></top-findings><additional-findings></additional-findings><reference-list finding-count="0"></reference-list><copyright>©2000-2010 Ingenuity Systems, Inc. All rights reserved.</copyright></node>'

So, the transformer replaces the original CLOB extracted by ClobTransformer with a String representing the decoded result. I then want to feed this XML string to XPathEntityProcessor.
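For reference, the decoding step described above boils down to a Base64 decode followed by a gunzip. The following is only a minimal sketch of that step, not the actual SolrDihGzip64Transformer: the class name Gzip64Sketch and its methods are hypothetical, UTF-8 is an assumed character set, and java.util.Base64 requires Java 8 or later (an older JVM would need a different Base64 library).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical helper; not the actual SolrDihGzip64Transformer. */
final class Gzip64Sketch {

    /** Base64-decode, gunzip, then interpret the bytes as UTF-8 (assumed charset). */
    static String decode(String base64Gzip) {
        // The MIME decoder tolerates line breaks inside the Base64 text.
        byte[] compressed = Base64.getMimeDecoder().decode(base64Gzip);
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed));
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Inverse of decode(); included only so the sketch can be exercised end to end. */
    static String encode(String text) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        // Closing the GZIP stream writes the trailer before the bytes are read back.
        try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return Base64.getEncoder().encodeToString(bytes.toByteArray());
    }

    public static void main(String[] args) {
        String xml = "<node id=\"ING:2ylbg\" name=\"LOC677213\" type=\"gene\"></node>";
        // Round trip: the decoded text should match the original XML.
        System.out.println(decode(encode(xml)));
    }
}
```

In a DIH transformer, something like decode() would presumably be applied to the String that ClobTransformer leaves in the row, with the result written back under the same column name.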
So, in my DIH data config file:

<dataConfig>
  <dataSource name="ipsDb" type="JdbcDataSource"
              driver="oracle.jdbc.driver.OracleDriver"
              url="jdbc:oracle:thin:@(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=ueipa1rac1-vip)(PORT=1537))(ADDRESS=(PROTOCOL=TCP)(HOST=ueipa1rac2-vip)(PORT=1537))(sdu=8760)(LOAD_BALANCE=yes)(CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=EIPS1R)))"
              user="user" password="password" />
  <datasource name="fieldSource" type="FieldReaderDataSource" />
  <document>
    <entity rootEntity="false" name="ipsNode" dataSource="ipsDb"
            query="select SUMMARY_XML from IPS_NODE where ROWNUM < 10"
            transformer="ClobTransformer,com.ingenuity.isec.util.SolrDihGzip64Transformer">
      <field column="SUMMARY_XML" clob="true" gzip64="true"/>
      <entity name="node" dataSource="fieldSource" dataField="ipsNode.SUMMARY_XML"
              processor="XPathEntityProcessor" forEach="/node">
        <field column="n_id" xpath="/node/@id"/>
        <field column="n_name" xpath="/node/@name"/>
        ...
      </entity>
    </entity>
  </document>
</dataConfig>

Basically, I'm trying to specify the (former CLOB, now String) SUMMARY_XML field as the data field for the FieldReaderDataSource. I can see it has the ability to simply return a StringReader() for String fields, rather than have to deal with a Clob itself. So, I figured FieldReaderDataSource would be happy with that and it would supply XPathEntityProcessor with XML contained in the field's value.
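The expectation above, that a String field value can be handed to the XML parser as a Reader, can be sketched as follows. This is a simplified illustration and not the FieldReaderDataSource source: FieldReaderSketch, readerFor, and readAll are hypothetical names, and the parent entity's row is modelled as a plain Map.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.sql.Clob;
import java.sql.SQLException;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration; not the real FieldReaderDataSource. */
final class FieldReaderSketch {

    /** Returns a Reader over the named column of a row, or null if the column is absent. */
    static Reader readerFor(Map<String, Object> row, String column) {
        Object value = row.get(column);
        if (value == null) {
            return null;
        }
        if (value instanceof String) {
            // Already a String (e.g. after a transformer ran): no Clob handling needed.
            return new StringReader((String) value);
        }
        if (value instanceof Clob) {
            try {
                return ((Clob) value).getCharacterStream();
            } catch (SQLException e) {
                throw new IllegalStateException("Unable to read CLOB column " + column, e);
            }
        }
        // Assumed fallback for other types: use their string form.
        return new StringReader(value.toString());
    }

    /** Drains a Reader into a String; used here only to show what a parser would see. */
    static String readAll(Reader reader) {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[1024];
        try {
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, Object> row = new HashMap<>();
        row.put("SUMMARY_XML", "<node id=\"ING:2ylbg\" name=\"LOC677213\"/>");
        System.out.println(readAll(readerFor(row, "SUMMARY_XML")));
    }
}
```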
But, when I do a full import, I see this:

Mar 4, 2011 9:10:26 AM org.apache.solr.handler.dataimport.DataImporter doFullImport
INFO: Starting Full Import
Mar 4, 2011 9:10:26 AM org.apache.solr.core.SolrCore execute
INFO: [ing-nodes] webapp=/solr path=/select params={clean=false&commit=true&command=full-import&qt=/dataimport-ips} status=0 QTime=31
Mar 4, 2011 9:10:26 AM org.apache.solr.handler.dataimport.SolrWriter readIndexerProperties
WARNING: Unable to read: dataimport-ips.properties
Mar 4, 2011 9:10:26 AM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
INFO: Creating a connection for entity ipsNode with URL: jdbc:oracle:thin:@(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=ueipa1rac1-vip)(PORT=1537))(ADDRESS=(PROTOCOL=TCP)(HOST=ueipa1rac2-vip)(PORT=1537))(sdu=8760)(LOAD_BALANCE=yes)(CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=EIPS1R)))
Mar 4, 2011 9:10:28 AM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
INFO: Time taken for getConnection(): 1838
Mar 4, 2011 9:10:28 AM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
INFO: Creating a connection for entity node with URL: jdbc:oracle:thin:@(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=ueipa1rac1-vip)(PORT=1537))(ADDRESS=(PROTOCOL=TCP)(HOST=ueipa1rac2-vip)(PORT=1537))(sdu=8760)(LOAD_BALANCE=yes)(CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=EIPS1R)))
Mar 4, 2011 9:10:29 AM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
INFO: Time taken for getConnection(): 1110
Mar 4, 2011 9:10:29 AM org.apache.solr.handler.dataimport.DocBuilder buildDocument
SEVERE: Exception while processing: ipsNode document : null
org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to execute query: null Processing Document # 1
    at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:72)
    at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.<init>(JdbcDataSource.java:253)
    at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:210)
    at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:39)
    at org.apache.solr.handler.dataimport.XPathEntityProcessor.initQuery(XPathEntityProcessor.java:262)
    at org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow(XPathEntityProcessor.java:203)
    at org.apache.solr.handler.dataimport.XPathEntityProcessor.nextRow(XPathEntityProcessor.java:183)
    at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:237)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:586)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:612)
    at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:266)
    at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:185)
    at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:335)
    at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:393)
    at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:374)
Caused by: java.sql.SQLException: SQL statement to execute cannot be empty or null
    at oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:112)
    at oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:146)
    at oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:208)
    at oracle.jdbc.driver.OracleSql.initialize(OracleSql.java:112)
    at oracle.jdbc.driver.OracleStatement.executeInternal(OracleStatement.java:1683)
    at oracle.jdbc.driver.OracleStatement.execute(OracleStatement.java:1662)
    at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.<init>(JdbcDataSource.java:246)
    ... 13 more

It looks like XPathEntityProcessor is not using the FieldReaderDataSource I configured for the "node" entity.
Instead, it creates another JDBC connection for the "node" entity, and the stack trace indicates that XPathEntityProcessor.initQuery() invokes the getData() method of that data source rather than the FieldReaderDataSource. I see in initQuery():

    private void initQuery(String s) {
      Reader data = null;
      try {
        final List<Map<String, Object>> rows = new ArrayList<Map<String, Object>>();
        try {
          data = dataSource.getData(s);
          ...

dataSource is set up as:

    @Override
    @SuppressWarnings("unchecked")
    public void init(Context context) {
      super.init(context);
      if (xpathReader == null)
        initXpathReader();
      pk = context.getEntityAttribute("pk");
      dataSource = context.getDataSource();
      rowIterator = null;
    }

I'm not sure how all of these DIH components work, but it seems context.getDataSource() must be returning the JDBC data source configured for the outer entity (ipsNode), not the FieldReaderDataSource configured for the inner entity (node) where I'm making use of XPathEntityProcessor.

What am I missing conceptually? I've found a few references to the very same problem, and I think I'm following the same pattern.

Thanks for any insights you can share,

Jeff
--
Jeff Schmidt
535 Consulting
j...@535consulting.com
http://www.535consulting.com