Yes, I understand the reasons for putting this work on a client rather than
using Solr directly, and that should probably be my next task.
But first I need to finish my current task: indexing PDF files stored in a
SqlBase database. The PDF files are pretty simple, sometimes only a few dozen
lines of text.
Regards,
Aruna
On Wed, Apr 3, 2019 at 5:03 PM Erick Erickson wrote:
> For a lot of reasons, I greatly prefer to put this work on a client rather
> than use Solr directly. Here’s a place to get started: it connects to a DB
> and also scans a local file directory for docs to push through (local) Tika
> and index. So you should be able to modify it relatively easily to get the
> data from SqlBase, read the associated PDF, combine the two and send to
> Solr.
>
> https://lucidworks.com/2012/02/14/indexing-with-solrj/
>
> The code itself is a bit old, but illustrates the process.
>
> Best,
> Erick
>
> > On Apr 2, 2019, at 11:46 PM, Arunas Spurga wrote:
> >
> > Hello,
> >
> > I have a task to index, in Solr 7.7.1, PDF files which are stored in a
> > SqlBase database. I have done half the job: I can index all the table
> > fields and search in them, except for the field in which the PDF file
> > content is stored. As I am totally new to Solr, I have spent a lot of
> > time unsuccessfully trying to understand how to make Solr extract and
> > index the field with the PDF content. I need some help.
> >
> > Regards,
> >
> > Aruna
> >
> > In solrconfig.xml I have:
> >
> > <lib dir="${solr.install.dir:../../../..}/contrib/dataimporthandler/lib" regex=".*\.jar" />
> > <lib dir="..." regex="solr-dataimporthandler-.*\.jar" />
> > <lib dir="..." regex=".*\.jar" />
> > <lib dir="..." regex="solr-cell-\d.*\.jar" />
> >
> > <requestHandler name="..." startup="lazy"
> >                 class="solr.extraction.ExtractingRequestHandler">
> >   <lst name="defaults">
> >     <str name="...">true</str>
> >     <str name="fmap.meta">ignored_</str>
> >     <str name="fmap.content">_text_</str>
> >   </lst>
> > </requestHandler>
> >
> > <requestHandler name="..." class="org.apache.solr.handler.dataimport.DataImportHandler">
> >   <lst name="defaults">
> >     <str name="...">db-data-config.xml</str>
> >   </lst>
> > </requestHandler>
> >
> > db-data-config.xml:
> >
> > <dataConfig>
> >   <dataSource type="JdbcDataSource"
> >               driver="jdbc.unify.sqlbase.SqlbaseDriver"
> >               url="jdbc:sqlbase://localhost:2155/PDFDOCS"
> >               user="sysadm" password="sysadm" />
> >   <document>
> >     <entity name="PDFDOCUMENTS" query="select ID, PDOCUMENT, UNIT from SYSADM.DOCS">
> >       <field column="..." name="PDF" />
> >     </entity>
> >   </document>
> > </dataConfig>
>
>
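
For the DataImportHandler route asked about in the quoted question, one
commonly used pattern is to add a nested entity that runs
TikaEntityProcessor over the binary column, fed through a
FieldStreamDataSource. What follows is only a sketch under assumptions, not a
tested configuration for SqlBase: the data source names "db" and "pdfStream",
the nested entity name "pdfText", and the Solr field names "id" and "unit" are
made up for illustration. It also assumes that the SqlBase JDBC driver returns
PDOCUMENT as a java.sql.Blob or byte[], and that the
solr-dataimporthandler-extras jar and the Tika jars from contrib/extraction/lib
are loaded through <lib> directives in solrconfig.xml.

```xml
<dataConfig>
  <!-- JDBC connection; values copied from the quoted config. The name "db" is an assumption. -->
  <dataSource name="db" type="JdbcDataSource"
              driver="jdbc.unify.sqlbase.SqlbaseDriver"
              url="jdbc:sqlbase://localhost:2155/PDFDOCS"
              user="sysadm" password="sysadm" />
  <!-- Exposes a column of the parent row as a stream; the name "pdfStream" is an assumption. -->
  <dataSource name="pdfStream" type="FieldStreamDataSource" />
  <document>
    <entity name="PDFDOCUMENTS" dataSource="db"
            query="select ID, PDOCUMENT, UNIT from SYSADM.DOCS">
      <!-- Hypothetical Solr field names; adjust to the actual schema. -->
      <field column="ID" name="id" />
      <field column="UNIT" name="unit" />
      <!-- Tika parses the bytes of PDOCUMENT; dataField is "<parent entity>.<column>". -->
      <entity name="pdfText" dataSource="pdfStream"
              processor="TikaEntityProcessor"
              dataField="PDFDOCUMENTS.PDOCUMENT" format="text">
        <!-- TikaEntityProcessor emits the extracted body in the "text" column. -->
        <field column="text" name="PDF" />
      </entity>
    </entity>
  </document>
</dataConfig>
```

With this layout the PDOCUMENT column is not mapped to a Solr field directly;
only the text that Tika extracts from it ends up in the PDF field, which must
be defined as an indexed text field in the schema for searches to match.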