Re: Tika Error work around?

2019-03-15 Thread Erick Erickson
I'm assuming here that you’re using DIH or the extracting request handler.

There are quite a number of reasons to run Tika outside Solr, chiefly so
you can handle exceptional cases as you see fit. See:
https://lucidworks.com/2012/02/14/indexing-with-solrj/
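
To illustrate what "handle exceptional cases as you see fit" can look like, here is a minimal sketch of a client-side loop that records a failing document and moves on instead of stopping the whole run. It is not taken from the linked article. The names TolerantIndexer, Extractor and Sink are hypothetical: in a real client, Extractor would presumably wrap a Tika parse call and Sink would send the extracted text to Solr through SolrJ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Client-side indexing loop that survives documents the extractor cannot handle. */
class TolerantIndexer {

    /** Hypothetical stand-in for a Tika parse call made in the client process. */
    interface Extractor {
        String extract(String docId) throws Exception;
    }

    /** Hypothetical stand-in for sending one extracted document to Solr. */
    interface Sink {
        void index(String docId, String text) throws Exception;
    }

    /** Processes every id and returns the failed ids mapped to their failure description. */
    static Map<String, String> indexAll(List<String> docIds, Extractor extractor, Sink sink) {
        Map<String, String> failures = new LinkedHashMap<>();
        for (String id : docIds) {
            try {
                sink.index(id, extractor.extract(id));
            } catch (Exception | LinkageError e) {
                // Record the failure and continue, so one bad document does not
                // stop the run. LinkageError covers NoClassDefFoundError.
                failures.put(id, e.toString());
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        List<String> indexed = new ArrayList<>();
        // Simulated extractor: the second document always fails to parse.
        Extractor extractor = id -> {
            if (id.equals("doc-2")) {
                throw new IllegalStateException("simulated parse failure");
            }
            return "text of " + id;
        };
        Map<String, String> failures = indexAll(
                Arrays.asList("doc-1", "doc-2", "doc-3"),
                extractor,
                (id, text) -> indexed.add(id));
        System.out.println("indexed=" + indexed + " failed=" + failures.keySet());
    }
}
```

The failure map can then be logged or retried separately, which is the kind of control that is harder to get when extraction runs inside Solr.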

Best,
Erick

> On Mar 14, 2019, at 7:39 PM, wclarke  wrote:
> 
> I am getting an error that stops Tika from fetching, processing, and committing
> when it reaches a specific language (Malayalam).  Is there a workaround?
> 
> 
> 
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html



Tika Error work around?

2019-03-15 Thread wclarke
I am getting an error that stops Tika from fetching, processing, and committing
when it reaches a specific language (Malayalam).  Is there a workaround?





Re: Tika error

2012-12-07 Thread Mattmann, Chris A (388J)
Hi Arkadi,

You may want to post this on the u...@tika.apache.org list -- it looks like
you are missing the universalchardet library from your Solr
Cell installation.
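
One way to test that guess is a small probe that asks a class loader whether it can find the class named in the error. The sketch below is only an illustration and ClassProbe is a hypothetical name. It reports what the JVM it runs in can see, so it would need to run with the same class path as the Solr webapp to say anything about that installation.

```java
/** Reports whether a class can be located by this program's class loader. */
class ClassProbe {

    static boolean isLoadable(String className) {
        try {
            // initialize=false: only locate the class, do not run static initializers.
            Class.forName(className, false, ClassProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults to the class named in the stack trace from this thread.
        String name = args.length > 0
                ? args[0]
                : "org.mozilla.universalchardet.CharsetListener";
        System.out.println(name + " loadable: " + isLoadable(name));
    }
}
```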

Cheers,
Chris

On 12/6/12 12:02 AM, Arkadi Colson ark...@smartbit.be wrote:

Does anybody have an idea?


Tika error

2012-12-06 Thread Arkadi Colson

Does anybody have an idea?

Dec 5, 2012 3:52:32 PM org.apache.solr.client.solrj.impl.HttpClientUtil createClient
INFO: Creating new http client, config:maxConnections=500&maxConnectionsPerHost=16
Dec 5, 2012 3:52:33 PM org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: [intradesk] webapp=/solr path=/update/extract params={literal.smsc_ssid=1499&commit=true&literal.id=1354722015&literal.smsc_date_edited=2012-11-05T15:09:47Z&literal.smsc_courseID=0&literal.smsc_date_created=2012-11-05T15:09:47Z&wt=json&literal.smsc_module=intradesk} {} 0 313
Dec 5, 2012 3:52:33 PM org.apache.solr.common.SolrException log
SEVERE: null:java.lang.RuntimeException: java.lang.NoClassDefFoundError: org/mozilla/universalchardet/CharsetListener
    at org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:469)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:297)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:99)
    at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:931)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:407)
    at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1004)
    at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:589)
    at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:310)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.NoClassDefFoundError: org/mozilla/universalchardet/CharsetListener
    at java.lang.ClassLoader.defineClass1(Native Method)
    at java.lang.ClassLoader.defineClassCond(ClassLoader.java:631)
    at java.lang.ClassLoader.defineClass(ClassLoader.java:615)
    at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:141)
    at org.apache.catalina.loader.WebappClassLoader.findClassInternal(WebappClassLoader.java:2904)
    at org.apache.catalina.loader.WebappClassLoader.findClass(WebappClassLoader.java:1173)
    at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1681)
    at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1559)
    at org.apache.tika.parser.txt.UniversalEncodingDetector.detect(UniversalEncodingDetector.java:40)
    at org.apache.tika.detect.AutoDetectReader.detect(AutoDetectReader.java:51)
    at org.apache.tika.detect.AutoDetectReader.<init>(AutoDetectReader.java:92)
    at org.apache.tika.detect.AutoDetectReader.<init>(AutoDetectReader.java:98)
    at org.apache.tika.parser.txt.TXTParser.parse(TXTParser.java:70)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
    at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:219)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
    at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:240)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1699)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:455)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:276)
    ... 15 more
Caused by: java.lang.ClassNotFoundException: org.mozilla.universalchardet.CharsetListener
    at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1714)
    at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1559)
    ... 38 more

Dec 6, 2012 7:58:02 AM org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: [intradesk] webapp=/solr path=/update/extract

Fwd: Tika error

2012-12-06 Thread Arkadi Colson

However, the Tomcat logs report:

INFO: Adding 'file:/opt/solr/contrib/extraction/lib/juniversalchardet-1.0.3.jar' to classloader
Dec 6, 2012 3:42:57 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
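
Since the log says the jar was added while the class still cannot be found, one thing that can be ruled out quickly is whether the jar file itself contains the class. The sketch below is a hypothetical helper (JarProbe is not part of Solr or Tika) that looks up the class entry in a jar using only the JDK. It does not explain which class loader ends up serving the request. The trace goes through Tomcat's WebappClassLoader while the log line comes from SolrResourceLoader, so the two may not be the same loader.

```java
import java.io.IOException;
import java.util.jar.JarFile;

/** Checks whether a jar file contains the .class entry for a given class name. */
class JarProbe {

    static boolean containsClass(String jarPath, String className) throws IOException {
        // A class a.b.C is stored in a jar as the entry a/b/C.class.
        String entry = className.replace('.', '/') + ".class";
        try (JarFile jar = new JarFile(jarPath)) {
            return jar.getJarEntry(entry) != null;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java JarProbe <jar-path> <class-name>");
            return;
        }
        System.out.println(containsClass(args[0], args[1]));
    }
}
```

It could be run against the jar path from the log line with org.mozilla.universalchardet.CharsetListener as the class name.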




Original Message
Subject: Tika error
Date: Thu, 06 Dec 2012 09:02:14 +0100
From: Arkadi Colson ark...@smartbit.be
Reply-To: ark...@smartbit.be
Organization: Smartbit bvba
To: solr-user@lucene.apache.org solr-user@lucene.apache.org



Does anybody have an idea?
