Okay.

Here is my data-config file:

<?xml version="1.0" encoding="UTF-8" ?>
<dataConfig>
  <dataSource name="db" driver="oracle.jdbc.driver.OracleDriver"
              url="jdbc:oracle:thin:@//1.2.3.4:1/d11gr21" user="aaaa" password="aaaa" />
  <dataSource name="dastream" type="FieldStreamDataSource"/>
  <document>
    <entity name="messages" pk="X_MSG_PK"
            query="select * from table1"
            dataSource="db">
      <field column="X_MSG_PK" name="id" />
      <entity name="message"
              transformer="ClobTransformer"
              dataSource="dastream"
              processor="TikaEntityProcessor"
              dataField="messages.MESSAGE"
              format="text">
        <field column="text" name="mxMsg" clob="true"/>
      </entity>
    </entity>
  </document>
</dataConfig>

----------------------------------------------------------------------------

Solr.log file:

INFO  - 2014-02-25 17:33:40.023; org.apache.solr.core.SolrCore; [CHESS_CORE] webapp=/solr path=/admin/mbeans params={cat=QUERYHANDLER&_=1393329819994&wt=json} status=0 QTime=1
INFO  - 2014-02-25 17:33:40.094; org.apache.solr.core.SolrCore; [CHESS_CORE] webapp=/solr path=/admin/mbeans params={cat=QUERYHANDLER&_=1393329820083&wt=json} status=0 QTime=0
INFO  - 2014-02-25 17:33:40.117; org.apache.solr.core.SolrCore; [CHESS_CORE] webapp=/solr path=/dataimport params={indent=true&command=status&_=1393329820089&wt=json} status=0 QTime=16
INFO  - 2014-02-25 17:33:40.131; org.apache.solr.core.SolrCore; [CHESS_CORE] webapp=/solr path=/dataimport params={indent=true&command=show-config&_=1393329820084} status=0 QTime=29
INFO  - 2014-02-25 17:33:42.026; org.apache.solr.handler.dataimport.DataImporter; Loading DIH Configuration: /dataconfig/data-config.xml
INFO  - 2014-02-25 17:33:42.031; org.apache.solr.handler.dataimport.DataImporter; Data Configuration loaded successfully
INFO  - 2014-02-25 17:33:42.033; org.apache.solr.core.SolrCore; [CHESS_CORE] webapp=/solr path=/dataimport params={optimize=false&indent=true&clean=true&commit=true&verbose=false&command=full-import&debug=false&wt=json} status=0 QTime=8
INFO  - 2014-02-25 17:33:42.035; org.apache.solr.handler.dataimport.DataImporter; Starting Full Import
INFO  - 2014-02-25 17:33:42.043; org.apache.solr.core.SolrCore; [CHESS_CORE] webapp=/solr path=/dataimport params={indent=true&command=status&_=1393329822040&wt=json} status=0 QTime=0
INFO  - 2014-02-25 17:33:42.064; org.apache.solr.handler.dataimport.SimplePropertiesWriter; Read dataimport.properties
INFO  - 2014-02-25 17:33:42.092; org.apache.solr.search.SolrIndexSearcher; Opening Searcher@2a858a73 realtime
INFO  - 2014-02-25 17:33:42.093; org.apache.solr.handler.dataimport.JdbcDataSource$1; Creating a connection for entity messages with URL: jdbc:oracle:thin:@//172.16.29.92:1521/d11gr21
INFO  - 2014-02-25 17:33:42.113; org.apache.solr.handler.dataimport.JdbcDataSource$1; Time taken for getConnection(): 19
INFO  - 2014-02-25 17:33:42.564; org.apache.solr.handler.dataimport.DocBuilder; Import completed successfully
INFO  - 2014-02-25 17:33:42.564; org.apache.solr.update.DirectUpdateHandler2; start commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
INFO  - 2014-02-25 17:33:42.867; org.apache.solr.core.SolrDeletionPolicy; SolrDeletionPolicy.onCommit: commits: num=2
        commit{dir=NRTCachingDirectory(org.apache.lucene.store.MMapDirectory@C:\solr-4.5.1\example\multicore\CHESS_CORE\data\index lockFactory=org.apache.lucene.store.NativeFSLockFactory@2c6d8073; maxCacheMB=48.0 maxMergeSizeMB=4.0),segFN=segments_l,generation=21}
        commit{dir=NRTCachingDirectory(org.apache.lucene.store.MMapDirectory@C:\solr-4.5.1\example\multicore\CHESS_CORE\data\index lockFactory=org.apache.lucene.store.NativeFSLockFactory@2c6d8073; maxCacheMB=48.0 maxMergeSizeMB=4.0),segFN=segments_m,generation=22}
INFO  - 2014-02-25 17:33:42.868; org.apache.solr.core.SolrDeletionPolicy; newest commit generation = 22
INFO  - 2014-02-25 17:33:42.882; org.apache.solr.search.SolrIndexSearcher; Opening Searcher@558ea0cc main
INFO  - 2014-02-25 17:33:42.886; org.apache.solr.core.QuerySenderListener; QuerySenderListener sending requests to Searcher@558ea0cc main{StandardDirectoryReader(segments_m:55:nrt _d(4.5.1):C80)}
INFO  - 2014-02-25 17:33:42.889; org.apache.solr.core.QuerySenderListener; QuerySenderListener done.
INFO  - 2014-02-25 17:33:42.889; org.apache.solr.core.SolrCore; [CHESS_CORE] Registered new searcher Searcher@558ea0cc main{StandardDirectoryReader(segments_m:55:nrt _d(4.5.1):C80)}
INFO  - 2014-02-25 17:33:42.893; org.apache.solr.update.DirectUpdateHandler2; end_commit_flush
INFO  - 2014-02-25 17:33:42.899; org.apache.solr.handler.dataimport.SimplePropertiesWriter; Read dataimport.properties
INFO  - 2014-02-25 17:33:42.901; org.apache.solr.handler.dataimport.SimplePropertiesWriter; Wrote last indexed time to dataimport.properties
INFO  - 2014-02-25 17:33:42.905; org.apache.solr.handler.dataimport.DocBuilder; Time taken = 0:0:0.839
INFO  - 2014-02-25 17:33:42.905; org.apache.solr.update.processor.LogUpdateProcessor; [CHESS_CORE] webapp=/solr path=/dataimport params={optimize=false&indent=true&clean=true&commit=true&verbose=false&command=full-import&debug=false&wt=json} status=0 QTime=8 {deleteByQuery=*:* (-1461012211508969472),add=[2158 (1461012211583418368), 2265 (1461012211591806976), 2225 (1461012211597049856), 2241 (1461012211602292736), 2276 (1461012211607535616), 2277 (1461012211612778496), 2302 (1461012211619069952), 4558 (1461012211624312832), 2144 (1461012211629555712), 2145 (1461012211635847168), ... (80 adds)],commit=} 0 8
INFO  - 2014-02-25 17:33:47.623; org.apache.solr.core.SolrCore; [CHESS_CORE] webapp=/solr path=/dataimport params={indent=true&command=status&_=1393329827620&wt=json} status=0 QTime=1


----------------------------------------------------------------------------

Part of the query result screen:

"docs": [
      {
        "id": "2158",
        "mxMsg": [
          ""
        ],
        "_version_": 1461012211583418400
      },
      {
        "id": "2265",
        "mxMsg": [
          ""
        ],
        "_version_": 1461012211591807000
      },

----------------------------------------------------------------------------

As you can see, 'id' is indexed properly, but 'mxMsg' is empty.
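
One way to count how many documents are affected is to script the check against the JSON response. The snippet below is only a minimal sketch under stated assumptions: the function name `ids_with_empty_field` is hypothetical, the sample JSON repeats the two documents from the query result, and the enclosing `response` object is assumed to follow the usual Solr JSON layout.

```python
import json

# Sample response trimmed to the two documents from the query result.
# The "response" wrapper is assumed to match the usual Solr JSON layout.
SAMPLE_RESPONSE = """
{
  "response": {
    "docs": [
      {"id": "2158", "mxMsg": [""], "_version_": 1461012211583418400},
      {"id": "2265", "mxMsg": [""], "_version_": 1461012211591807000}
    ]
  }
}
"""


def ids_with_empty_field(response_text, field="mxMsg"):
    """Return ids of documents whose `field` is missing or holds only blank strings."""
    docs = json.loads(response_text)["response"]["docs"]
    empty_ids = []
    for doc in docs:
        values = doc.get(field, [])
        # A single-valued field comes back as a plain string, not a list.
        if isinstance(values, str):
            values = [values]
        if not any(value.strip() for value in values):
            empty_ids.append(doc["id"])
    return empty_ids


if __name__ == "__main__":
    print(ids_with_empty_field(SAMPLE_RESPONSE))
```

Running the same function on the full query response would show whether every one of the 80 imported documents has an empty 'mxMsg' or only some of them.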

----------------------------------------------------------------------------

Now, please suggest how I can get data into the 'mxMsg' field. The binary
data is stored in the DB as BLOB type.

Please note: the same configuration works fine when the BLOB column holds
XML data; in that case 'mxMsg' displays the data.
 


Please help,

Looking forward,

Chandan


-----Original Message-----
From: Gora Mohanty [mailto:g...@mimirtech.com] 
Sent: Tuesday, February 25, 2014 4:35 PM
To: solr-user@lucene.apache.org
Subject: Re: Can not index raw binary data stored in Database in BLOB format.

On 25 February 2014 14:54, Chandan khatua <chand...@nrifintech.com> wrote:
> Hi Gora,
>
> The column type in DB is BLOB. It only stores binary data.
>
> If I do not use TikaEntityProcessor, then the following exception occurs:
[...]

It is difficult to follow what you are doing when you say one thing, and
seem to do another. You say above that you are not using TikaEntityProcessor
but your DIH data configuration file shows that you are. Please start with
one configuration, and show us the *exact* files in use, and the error from
the Solr logs.

Regards,
Gora
