>>>> should run Tika separately as it's entirely possible for it to fail to
>>>> parse a PDF and crash - and if you're running it in DIH & Solr it then
>>>> brings down everything. Separate your PDF processing from your Solr
>>>> indexing.
>
memory footprint. For example, the
following will limit it to 2GB
> java -Xmx2048m -jar tika-server-1.24.jar
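
A minimal sketch of what calling a separately running Tika server could look like from Java, using only the JDK's built-in HTTP client (Java 11 or later). It assumes the server started with the command above is listening on its default port 9998, and that its /tika endpoint accepts a PUT of the raw document and returns plain text. The class and method names are hypothetical and do not come from this thread.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;

// Hypothetical client for a tika-server process running outside Solr.
class TikaServerClientSketch {

    // Assumption: tika-server runs locally on its default port (9998).
    static final URI TIKA_ENDPOINT = URI.create("http://localhost:9998/tika");

    // Build a PUT request that uploads the raw document bytes and asks for plain text back.
    static HttpRequest buildExtractRequest(URI endpoint, byte[] documentBytes) {
        return HttpRequest.newBuilder(endpoint)
                .timeout(Duration.ofSeconds(60)) // do not wait forever on a problem file
                .header("Accept", "text/plain")
                .PUT(HttpRequest.BodyPublishers.ofByteArray(documentBytes))
                .build();
    }

    // Send the document to Tika; any non-200 answer is reported as an IOException.
    static String extractText(HttpClient client, URI endpoint, byte[] documentBytes)
            throws IOException, InterruptedException {
        HttpResponse<String> response = client.send(
                buildExtractRequest(endpoint, documentBytes),
                HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("Tika returned HTTP " + response.statusCode());
        }
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java TikaServerClientSketch <file.pdf>");
            return;
        }
        HttpClient client = HttpClient.newHttpClient();
        try {
            byte[] pdf = Files.readAllBytes(Path.of(args[0]));
            System.out.println(extractText(client, TIKA_ENDPOINT, pdf));
        } catch (IOException e) {
            // A failed extraction affects only this one document.
            System.err.println("Extraction failed for " + args[0] + ": " + e.getMessage());
        }
    }
}
```

Because extraction happens in another JVM, a PDF that makes the parser fail shows up here as an IOException for that one document instead of taking Solr down with it. The extracted text can then be pushed to Solr by whatever indexing code you already have.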
- H
-----Original Message-----
From: Jan Høydahl [mailto:jan@cominvent.com]
Sent: August 26, 2020 6:19 AM
To: solr-user
Subject: [EXT] Re: PDF extraction using Tika
When I wor
a PDF
>>> and crash - and if you're running it in DIH & Solr it then brings down
>>> everything. Separate your PDF processing from your Solr indexing.
>>>
>>>
>>> Cheers
>>>
>>> Charlie
>>>
>>>>
>>>>
- and if you're running it in DIH & Solr it
then brings down everything. Separate your PDF processing from your
Solr indexing.
Cheers
Charlie
Thanks,
Srinivas Kashyap
-----Original Message-----
From: Alexandre Rafalovitch
Sent: 24 August 2020 20:54
To: solr-user
Subject: Re: PDF extraction using
Thanks Phil,
I will modify it according to the need.
Thanks,
Srinivas
-----Original Message-----
From: Phil Scadden
Sent: 26 August 2020 02:44
To: solr-user@lucene.apache.org
Subject: RE: PDF extraction using Tika
Code for solrj is going to be very dependent on your needs but the beating
Admin", password);
UpdateResponse ur = req.process(solr,"prindex");
req.commit(solr, "prindex");
-----Original Message-----
From: Srinivas Kashyap
Sent: Tuesday, 25 August 2020 17:04
To: solr-user@lucene.apache.org
Subject: RE: PDF extraction usi
PDF extraction using Tika
The issue seems to be more with a specific file and at the level way
below Solr's or possibly even Tika's:
Caused by: java.io.IOException: expected='>' actual='
' at offset 2383
at
org.apache.pdfbox.pdfparser.BaseParser.readExpectedChar(BaseParser.ja
Sent: 24 August 2020 20:54
To: solr-user
Subject: Re: PDF extraction using Tika
The issue seems to be more with a specific file and at the level way below
Solr's or possibly even Tika's:
Caused by: java.io.IOException: expected='>' actual='
' at offs
from PDF and pushes into solr?
Thanks,
Srinivas Kashyap
-----Original Message-----
From: Alexandre Rafalovitch
Sent: 24 August 2020 20:54
To: solr-user
Subject: Re: PDF extraction using Tika
The issue seems to be more with a specific file and at the level way
below Solr's or possibly even Tika's:
Caused by: java.io.IOException: expected='>' actual='
' at offset 2383
at
org.apache.pdfbox.pdfparser.BaseParser.readExpectedChar(BaseParser.java:1045)
Are you indexing the