Try replacing the inner entity with something like

<entity name="message"
        dataSource="dastream"
        processor="TikaEntityProcessor"
        dataField="messages.MESSAGE"
        format="xml">
  <field column="text" name="mxMsg"/>
</entity>

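This assumes that you get the blob from a column named "MESSAGE" in the
outer entity ("messages"). Note that the prefix in dataField is the name of
the outer entity, not the name of the data source ("db").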
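
For reference, here is a rough sketch of how the whole data-config.xml might
look with that change applied. It is untested. The connection details are the
placeholders from your mail, I have left out the DateFormatTransformer and the
last_modified field to keep it short, and I am assuming the primary key column
is called "PK" without the leading space.

```xml
<?xml version="1.0" encoding="UTF-8" ?>
<dataConfig>
  <!-- JDBC source for the outer query; connection details are placeholders -->
  <dataSource name="db" driver="oracle.jdbc.driver.OracleDriver"
              url="jdbc:oracle:thin:@//x.x.x.x:x/d11gr21" user="x" password="x"/>
  <!-- Streams the content of one column of the outer entity's current row -->
  <dataSource name="dastream" type="FieldStreamDataSource"/>
  <document>
    <!-- Assumption: the key column is named PK (no leading space) -->
    <entity name="messages" pk="PK" query="select * from table1" dataSource="db">
      <field column="PK" name="id"/>
      <!-- dataField is outerEntityName.COLUMN, here messages.MESSAGE -->
      <entity name="message"
              dataSource="dastream"
              processor="TikaEntityProcessor"
              dataField="messages.MESSAGE"
              format="xml">
        <field column="text" name="mxMsg"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```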


On Mon, Feb 24, 2014 at 11:51 AM, Chandan khatua <chand...@nrifintech.com> wrote:

> Hi Raymond !
>
> I have a data-config.xml like the one below:
>
> <?xml version="1.0" encoding="UTF-8" ?>
> <dataConfig>
> <dataSource name="db" driver="oracle.jdbc.driver.OracleDriver"
> url="jdbc:oracle:thin:@//x.x.x.x:x/d11gr21" user="x" password="x"/>
>  <dataSource name="dastream" type="FieldStreamDataSource" />
>  <document>
>   <entity
>       name="messages" pk=" PK" transformer='DateFormatTransformer'
>       query="select * from table1"
>       dataSource="db">
>          <field column =" PK" name ="id" />
>          <field column="last_modified"  dateTimeFormat="YYYY-MM-DD HH24:MI:SS" locale="en" />
>     <entity
>         name="message"
>         dataSource="dastream"
>         processor="TikaEntityProcessor"
>         url="message"
>         dataField="db.MESSAGE"
>                 format="text"
>         >
>
>         <field column="text" name="mxMsg" blob="true"/>
>       </entity>
>     </entity>
>
>
>  </document>
> </dataConfig>
>
>
>
> This looks similar to your configuration. When the BLOB in the database
> holds XML data, indexing works. But when the BLOB holds binary data,
> indexing is NOT done.
> Please help.
>
> Thanking you,
> -Chandan
>
>
> -----Original Message-----
> From: Raymond Wiker [mailto:rwi...@gmail.com]
> Sent: Monday, February 24, 2014 4:06 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Can not index raw binary data stored in Database in BLOB
> format.
>
> I've done something like this; the key was to use a FieldStreamDataSource to
> read from the BLOB field.
>
> Something like
>
> <dataSource name="main" ...>
> <dataSource type="FieldStreamDataSource" name="fieldstream"/>
>
> then
>
>       <entity name="tika" processor="TikaEntityProcessor"
>               dataField="main.BLOB" dataSource="fieldstream" format="xml">
>         <field column="Author" meta="true" name="..."/>
>         <field column="title" meta="true" name="title"/>
>         <field column="text" name="content"/>
>         <field column="content_type" name="content_type" meta="true"/>
>         <field column="last_modified" name="last_modified" meta="true"/>
>       </entity>
>
> ...
>
>
>
>
> On Mon, Feb 24, 2014 at 11:04 AM, Chandan khatua
> <chand...@nrifintech.com> wrote:
>
> > Hi Gora !
> >
> > Your concern was "What is the type of the column used to store the
> > binary data in Oracle?"
> > The column type is BLOB in the DB. The column can also hold rich text files.
> >
> > Regards,
> > Chandan
> >
> >
> > -----Original Message-----
> > From: Gora Mohanty [mailto:g...@mimirtech.com]
> > Sent: Monday, February 24, 2014 3:02 PM
> > To: solr-user@lucene.apache.org
> > Subject: Re: Can not index raw binary data stored in Database in BLOB
> > format.
> >
> > On 24 February 2014 12:51, Chandan khatua <chand...@nrifintech.com> wrote:
> > > Hi,
> > >
> > >
> > >
> > > We have raw binary data stored in the database (not Word, Excel, XML etc.
> > > files) in a BLOB.
> > >
> > > We are trying to index using TikaEntityProcessor but nothing seems
> > > to get indexed.
> > >
> > > But the same configuration works when xml/word/excel files are
> > > stored in the BLOB field.
> >
> > Please start by reviewing
> > http://wiki.apache.org/solr/DataImportHandler as the above seems quite
> > confused. Why are you using TikaEntityProcessor if the data in the DB
> > are not richtext files?
> >
> > What is the type of the column used to store the binary data in
> > Oracle? You might be able to convert it with a ClobTransformer. Please
> > see http://wiki.apache.org/solr/DataImportHandler#ClobTransformer
> >
> > http://wiki.apache.org/solr/DataImportHandlerFaq#Blob_values_in_my_table_are_added_to_the_Solr_document_as_object_strings_like_B.401f23c5
> >
> > Regards,
> > Gora
> >
> >
>
>
