Re: Indexing PDF files in SqlBase database

2019-04-03 Thread Arunas Spurga
Yes, I understand the reasons for putting this work on a client rather than
using Solr directly, and that may well be my next task.
But first I need to finish my current task: indexing PDF files stored in a
SqlBase database. The PDF files are pretty simple, sometimes only a few dozen
lines of text.

Regards,

Aruna

On Wed, Apr 3, 2019 at 5:03 PM Erick Erickson wrote:

> For a lot of reasons, I greatly prefer to put this work on a client rather
> than use Solr directly. Here’s a place to get started: it connects to a DB
> and also scans a local file directory for docs to push through (local) Tika
> and index. So you should be able to modify it relatively easily to get the
> data from SqlBase, read the associated PDF, combine the two and send to
> Solr.
>
> https://lucidworks.com/2012/02/14/indexing-with-solrj/
>
> The code itself is a bit old, but illustrates the process.
>
> Best,
> Erick
>
> > On Apr 2, 2019, at 11:46 PM, Arunas Spurga wrote:
> >
> > Hello,
> >
> > I have been given the task of indexing, in Solr 7.7.1, PDF files that are
> > stored in a SqlBase database. I have done half the job: I can index all
> > the table fields and search in them, except for the field that holds the
> > PDF file content. As I am totally new to Solr, I have spent a lot of time
> > trying, without success, to understand how to make it extract and index
> > the field with the PDF content. I need some help.
> >
> > Regards,
> >
> > Aruna
> >
> > In solrconfig.xml I have:
> >
> > <lib dir="${solr.install.dir:../../../..}/contrib/dataimporthandler/lib" regex=".*\.jar" />
> > <lib ... regex="solr-dataimporthandler-.*\.jar" />
> > <lib ... regex=".*\.jar" />
> > <lib ... regex="solr-cell-\d.*\.jar" />
> >
> > <requestHandler ... startup="lazy" class="solr.extraction.ExtractingRequestHandler">
> >   <lst name="defaults">
> >     <str ...>true</str>
> >     <str name="fmap.meta">ignored_</str>
> >     <str name="fmap.content">_text_</str>
> >   </lst>
> > </requestHandler>
> >
> > <requestHandler ... class="org.apache.solr.handler.dataimport.DataImportHandler">
> >   <lst name="defaults">
> >     <str ...>db-data-config.xml</str>
> >   </lst>
> > </requestHandler>
> >
> > db-data-config.xml:
> >
> > <dataConfig>
> >   <dataSource type="JdbcDataSource"
> >               driver="jdbc.unify.sqlbase.SqlbaseDriver"
> >               url="jdbc:sqlbase://localhost:2155/PDFDOCS"
> >               user="sysadm" password="sysadm" />
> >   <document>
> >     <entity name="PDFDOCUMENTS" query="select ID, PDOCUMENT, UNIT from SYSADM.DOCS">
> >       <... name="PDF" />
> >     </entity>
> >   </document>
> > </dataConfig>
>
>


Indexing PDF files in SqlBase database

2019-04-03 Thread Arunas Spurga
Hello,

I have been given the task of indexing, in Solr 7.7.1, PDF files that are
stored in a SqlBase database. I have done half the job: I can index all the
table fields and search in them, except for the field that holds the PDF file
content. As I am totally new to Solr, I have spent a lot of time trying,
without success, to understand how to make it extract and index the field
with the PDF content. I need some help.

Regards,

Aruna

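In solrconfig.xml I have:

<lib dir="${solr.install.dir:../../../..}/contrib/dataimporthandler/lib" regex=".*\.jar" />
<lib ... regex="solr-dataimporthandler-.*\.jar" />
<lib ... regex=".*\.jar" />
<lib ... regex="solr-cell-\d.*\.jar" />

<requestHandler ... startup="lazy" class="solr.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <str ...>true</str>
    <str name="fmap.meta">ignored_</str>
    <str name="fmap.content">_text_</str>
  </lst>
</requestHandler>

<requestHandler ... class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str ...>db-data-config.xml</str>
  </lst>
</requestHandler>

db-data-config.xml:

<dataConfig>
  <dataSource type="JdbcDataSource"
              driver="jdbc.unify.sqlbase.SqlbaseDriver"
              url="jdbc:sqlbase://localhost:2155/PDFDOCS"
              user="sysadm" password="sysadm" />
  <document>
    <entity name="PDFDOCUMENTS" query="select ID, PDOCUMENT, UNIT from SYSADM.DOCS">
      <... name="PDF" />
    </entity>
  </document>
</dataConfig>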
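
For the question of how to make the Data Import Handler extract text from the PDOCUMENT column, one approach that is often used is a nested entity that passes the binary column to Tika. The db-data-config.xml below is only a sketch under stated assumptions, not a tested configuration. The data source names `db` and `pdfStream`, the Solr field names `id` and `unit`, and the choice of `_text_` as the target field are illustrative. It also assumes that the SqlBase JDBC driver returns PDOCUMENT as a `java.sql.Blob` or `byte[]`, which is what `FieldStreamDataSource` works with.

```xml
<dataConfig>
  <!-- JDBC source: connection values copied from the thread; "db" is an assumed name. -->
  <dataSource name="db" type="JdbcDataSource"
              driver="jdbc.unify.sqlbase.SqlbaseDriver"
              url="jdbc:sqlbase://localhost:2155/PDFDOCS"
              user="sysadm" password="sysadm" />
  <!-- Streams the bytes of a parent-entity column; "pdfStream" is an assumed name. -->
  <dataSource name="pdfStream" type="FieldStreamDataSource" />
  <document>
    <entity name="PDFDOCUMENTS" dataSource="db"
            query="select ID, PDOCUMENT, UNIT from SYSADM.DOCS">
      <!-- Assumed Solr field names; adjust to the actual schema. -->
      <field column="ID" name="id" />
      <field column="UNIT" name="unit" />
      <!-- Child entity: Tika parses the PDF bytes held in PDOCUMENT. -->
      <entity name="PDF" dataSource="pdfStream"
              processor="TikaEntityProcessor"
              dataField="PDFDOCUMENTS.PDOCUMENT"
              format="text">
        <!-- "text" is the column TikaEntityProcessor uses for the extracted body. -->
        <field column="text" name="_text_" />
      </entity>
    </entity>
  </document>
</dataConfig>
```

`TikaEntityProcessor` ships in the solr-dataimporthandler-extras jar and needs the Tika jars from contrib/extraction/lib, so the `<lib>` directives in solrconfig.xml have to cover both. In the default configset `_text_` is indexed but not stored, so the extracted text would be searchable but not returned in results; a separate stored field would be needed to display it. The exact attribute names are worth checking against the Data Import Handler section of the reference guide for the Solr version in use.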