Try replacing the inner entity with something like

<entity name="message"
        dataSource="dastream"
        processor="TikaEntityProcessor"
        dataField="messages.MESSAGE"
        format="xml">
    <field column="text" name="mxMsg"/>
</entity>

This assumes that you get the blob from a column named "MESSAGE" in the outer entity ("messages").

On Mon, Feb 24, 2014 at 11:51 AM, Chandan khatua <chand...@nrifintech.com> wrote:

> Hi Raymond!
>
> I have a data-config.xml like the one below:
>
> <?xml version="1.0" encoding="UTF-8" ?>
> <dataConfig>
>   <dataSource name="db" driver="oracle.jdbc.driver.OracleDriver"
>               url="jdbc:oracle:thin:@//x.x.x.x:x/d11gr21" user="x" password="x"/>
>   <dataSource name="dastream" type="FieldStreamDataSource" />
>   <document>
>     <entity
>         name="messages" pk=" PK" transformer='DateFormatTransformer'
>         query="select * from table1"
>         dataSource="db">
>       <field column =" PK" name ="id" />
>       <field column="last_modified" dateTimeFormat="YYYY-MM-DD HH24:MI:SS" locale="en" />
>       <entity
>           name="message"
>           dataSource="dastream"
>           processor="TikaEntityProcessor"
>           url="message"
>           dataField="db.MESSAGE"
>           format="text">
>         <field column="text" name="mxMsg" blob="true"/>
>       </entity>
>     </entity>
>   </document>
> </dataConfig>
>
> This looks similar to your configuration. When XML data is stored in the
> BLOB in the database, indexing is done. But when binary data is stored in
> the BLOB, indexing is NOT done.
> Please help.
>
> Thanking you,
> -Chandan
>
>
> -----Original Message-----
> From: Raymond Wiker [mailto:rwi...@gmail.com]
> Sent: Monday, February 24, 2014 4:06 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Can not index raw binary data stored in Database in BLOB format.
>
> I've done something like this; the key was to use a FieldStreamDataSource to
> read from the BLOB field.
>
> Something like
>
> <datasource name="main" ...>
> <dataSource type="FieldStreamDataSource" name="fieldstream"/>
>
> then
>
> <entity name="tika" processor="TikaEntityProcessor"
>         dataField="main.BLOB" dataSource="fieldstream" format="xml">
>   <field column="Author" meta="true" name="..."/>
>   <field column="title" meta="true" name="title"/>
>   <field column="text" name="content"/>
>   <field column="content_type" name="content_type" meta="true"/>
>   <field column="last_modified" name="last_modified" meta="true"/>
> </entity>
>
> ...
>
>
> On Mon, Feb 24, 2014 at 11:04 AM, Chandan khatua <chand...@nrifintech.com> wrote:
>
> > Hi Gora!
> >
> > Your concern was "What is the type of the column used to store the
> > binary data in Oracle?"
> > The column type is BLOB in the DB. The column can also hold rich text files.
> >
> > Regards,
> > Chandan
> >
> >
> > -----Original Message-----
> > From: Gora Mohanty [mailto:g...@mimirtech.com]
> > Sent: Monday, February 24, 2014 3:02 PM
> > To: solr-user@lucene.apache.org
> > Subject: Re: Can not index raw binary data stored in Database in BLOB format.
> >
> > On 24 February 2014 12:51, Chandan khatua <chand...@nrifintech.com> wrote:
> > > Hi,
> > >
> > > We have raw binary data stored in the database (not word, excel, xml etc.
> > > files) in BLOB.
> > >
> > > We are trying to index using TikaEntityProcessor but nothing seems
> > > to get indexed.
> > >
> > > But the same configuration works when xml/word/excel files are
> > > stored in the BLOB field.
> >
> > Please start by reviewing
> > http://wiki.apache.org/solr/DataImportHandler as the above seems quite
> > confused. Why are you using TikaEntityProcessor if the data in the DB
> > are not richtext files?
> >
> > What is the type of the column used to store the binary data in
> > Oracle? You might be able to convert it with a ClobTransformer. Please
> > see http://wiki.apache.org/solr/DataImportHandler#ClobTransformer
> >
> > http://wiki.apache.org/solr/DataImportHandlerFaq#Blob_values_in_my_table_are_added_to_the_Solr_document_as_object_strings_like_B.401f23c5
> >
> > Regards,
> > Gora