Here's the response:

>>>>>>
Karl -

There’s expandMacros=false, as covered here:
https://cwiki.apache.org/confluence/display/solr/Parameter+Substitution

But… what exactly is being sent to Solr? Is there some kind of “${…” being sent as a parameter? Just curious what’s getting you into this in the first place. But disabling probably is your most desired solution.

  Erik
<<<<<<

Karl
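(For reference: acting on Erik's suggestion per request would look roughly like the sketch below. This is only a sketch, assuming a SolrJ 5.x/6.x-style HttpSolrClient and the /update/extract handler; the URL, field names, and values are placeholders, not what the MCF Solr output connector actually sends.)

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;
import org.apache.solr.common.util.ContentStreamBase;

public class DisableMacroExpansion {
  public static void main(String[] args) throws Exception {
    // Placeholder URL; substitute the real collection.
    String solrUrl = "http://localhost:8983/solr/documentum_manifoldcf_stg";
    try (SolrClient client = new HttpSolrClient(solrUrl)) {
      ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update/extract");
      req.addContentStream(new ContentStreamBase.StringStream("document body goes here"));
      req.setParam("literal.id", "doc-1");
      // A metadata value like this is what Solr's MacroExpander would otherwise try to expand.
      req.setParam("literal.title", "Quarterly total ${amount}");
      // Disable parameter macro expansion for this request only.
      req.setParam("expandMacros", "false");
      client.request(req);
    }
  }
}

(Whether the same flag can instead be pinned server-side, e.g. as a handler invariant in solrconfig.xml, is something I haven't verified; treat the above purely as a per-request sketch.)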
On Wed, Jun 14, 2017 at 9:36 AM, Karl Wright <[email protected]> wrote:

> Here's the question I posted:
>
> >>>>>>
> Hi all,
>
> I've got a ManifoldCF user who is posting content to Solr using the MCF Solr output connector. This connector uses SolrJ under the covers -- a fairly recent version -- but also has overridden some classes to ensure that multipart form posts will be used for most content.
>
> The problem is that, for a specific document, the user is getting a StringIndexOutOfBounds exception in Solr, as follows:
>
> >>>>>>
> 2017-06-14T08:25:16,546 - ERROR [qtp862890654-69725:SolrException@148] -
> {collection=c:documentum_manifoldcf_stg, core=x:documentum_manifoldcf_stg_shard1_replica1, node_name=n:**********:8983_solr, replica=r:core_node1, shard=s:shard1} -
> java.lang.StringIndexOutOfBoundsException: String index out of range: -296
>         at java.lang.String.substring(String.java:1911)
>         at org.apache.solr.request.macro.MacroExpander._expand(MacroExpander.java:143)
>         at org.apache.solr.request.macro.MacroExpander.expand(MacroExpander.java:93)
>         at org.apache.solr.request.macro.MacroExpander.expand(MacroExpander.java:59)
>         at org.apache.solr.request.macro.MacroExpander.expand(MacroExpander.java:45)
>         at org.apache.solr.request.json.RequestUtil.processParams(RequestUtil.java:157)
>         at org.apache.solr.util.SolrPluginUtils.setDefaults(SolrPluginUtils.java:172)
>         at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:152)
>         at org.apache.solr.core.SolrCore.execute(SolrCore.java:2102)
>         at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:654)
>         at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:460)
>         at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:257)
>         at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:208)
>         at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
>         at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
>         at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
>         at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
>         at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
>         at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
>         at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
>         at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
>         at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
>         at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
>         at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
>         at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
>         at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
>         at org.eclipse.jetty.server.Server.handle(Server.java:499)
>         at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)
>         at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
>         at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
>         at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
>         at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
>         at java.lang.Thread.run(Thread.java:745)
> <<<<<<
>
> It looks worrisome to me that there's now possibly some kind of "macro expansion" that is being triggered within parameters being sent to Solr. Can anyone tell me either (a) how to disable this feature, or (b) how the MCF Solr output connector should escape parameters being posted so that Solr does not attempt any macro expansion? If the latter, I also need to know when this feature appeared, since obviously whether or not to do the escaping will depend on the precise version of the Solr instance involved.
>
> I'm also quite concerned that considerations of backwards compatibility may have been lost at some point with Solr, since heretofore I could count on older versions of SolrJ working with newer versions of Solr. Please clarify what the current policy is.
>
> Thanks,
> Karl
> <<<<<<
>
> On Wed, Jun 14, 2017 at 9:35 AM, Karl Wright <[email protected]> wrote:
>
>> I posted the pertinent question to the solr dev list. Let's see what they say.
>>
>> Thanks,
>> Karl
>>
>> On Wed, Jun 14, 2017 at 9:04 AM, Karl Wright <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> The exception in the solr.log should be reported as a Solr bug. It is not emanating from the Tika extractor (Solr Cell), but is in Solr itself.
>>>
>>> I wish there was an easy fix for this. The problem is *not* an empty stream; it's that Solr is attempting to do something with it that it shouldn't. MCF just gets back a 500 error from Solr, and we can't recover from that.
>>>
>>> >>>>>>
>>> https://**********/webtop/component/drl?versionLabel=CURRENT&objectId=091e8486805142f5 (500)
>>> <<<<<<
>>>
>>> Karl
>>>
>>> On Wed, Jun 14, 2017 at 8:29 AM, Tamizh Kumaran Thamizharasan <[email protected]> wrote:
>>>
>>>> Hi Karl,
>>>>
>>>> After configuring Solr to ignore Tika errors by adding the Tika transformer to the job, the behavior below is observed:
>>>>
>>>> 1) ManifoldCF fetches content from Documentum that has a null body and tries to push it to the output connector (Solr).
>>>>
>>>> 2) Solr cannot accept null as a value and throws a "Missing content stream" error.
>>>>
>>>> 3) Each agent thread in ManifoldCF gets held up internally with different r_object_ids that have no body content, and keeps retrying the push to Solr after each failure, but Solr cannot accept the content and throws the same error.
>>>>
>>>> 4) Over time, the ManifoldCF job stops with the error thrown by Solr.
>>>>
>>>> Please let me know if there is any configuration change that can help us resolve this issue.
>>>>
>>>> Please find attached the ManifoldCF error log, Solr error log, and agent log.
>>>>
>>>> Regards,
>>>> Tamizh Kumaran.
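(An aside on point 2 above: as Karl notes earlier in the thread, the macro-expansion exception is a Solr-side problem and not simply an empty stream, but if one wanted to guard against the "Missing content stream" error on the sending side, one option is simply to skip zero-length documents before posting. The sketch below is purely illustrative -- it is not how the MCF Solr output connector works -- and the r_object_id values are made up.)

public class EmptyContentGuard {

  // Hypothetical helper: decide whether a fetched Documentum object should be
  // posted to Solr at all, skipping the ones with no body content.
  static boolean shouldPost(String rObjectId, byte[] body) {
    if (body == null || body.length == 0) {
      System.out.println("Skipping " + rObjectId + ": empty body would trigger 'Missing content stream' in Solr");
      return false;
    }
    return true;
  }

  public static void main(String[] args) {
    shouldPost("0900000000000001", new byte[0]);                        // skipped
    shouldPost("0900000000000002", "some extracted text".getBytes());   // posted
  }
}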
>>>>
>>>> *From:* Karl Wright [mailto:[email protected]]
>>>> *Sent:* Tuesday, June 13, 2017 2:23 PM
>>>> *To:* [email protected]
>>>> *Cc:* Sharnel Merdeck Pereira; Sundarapandian Arumaidurai Vethasigamani
>>>> *Subject:* Re: ManifoldCF documentum indexing issue
>>>>
>>>> Hi Tamizh,
>>>>
>>>> The reported error is 'Error from server at http://localhost:8983/solr/documentum_manifoldcf_stg: String index out of range: -188'. The message seemingly indicates that the error was *received* from the Solr server for one specific document. ManifoldCF does not recognize the error as being innocuous, and therefore it will retry for a while until it eventually gives up and halts the job. However, I cannot find that exact text anywhere in the Solr output connector code, so I wonder if you transcribed it correctly?
>>>>
>>>> There should also be the following:
>>>>
>>>> (1) A record of the attempts in the manifoldcf.log file, with an MCF stack trace attached to each one;
>>>>
>>>> (2) Simple history records for that document that are of the type INGESTDOCUMENT;
>>>>
>>>> (3) Solr log entries that have a Solr stack trace.
>>>>
>>>> The last one is the one that would be the most helpful. It is possible that you are seeing a problem in Solr Cell (Tika) that is manifesting itself in this way. You can (and should) configure your Solr to ignore Tika errors.
>>>>
>>>> Thanks,
>>>> Karl
>>>>
>>>> On Tue, Jun 13, 2017 at 3:20 AM, Tamizh Kumaran Thamizharasan <[email protected]> wrote:
>>>>
>>>> Hi,
>>>>
>>>> ManifoldCF 2.7.1 is running in the multi-process ZooKeeper model and is integrated with PostgreSQL 9.3. The expected setup is to crawl the Documentum contents and push them to the Solr 5.3.2 output. The crawler-ui app is installed on Tomcat, and the startup script is pointed at the ManifoldCF properties.xml during server startup. ManifoldCF, the bundled ZooKeeper, and Tomcat run on the same host under Red Hat Enterprise Linux Server release 6.9 (Santiago). The database runs on a Windows box.
>>>>
>>>> ZooKeeper is integrated with the database through properties.xml and properties-global.xml.
>>>>
>>>> ZooKeeper and the Documentum-related processes (registry and server) are up, and the two agents (start-agents.sh and start-agents-2.sh) have been started; they spawn multiple threads to index the Documentum contents into Solr through ManifoldCF.
>>>>
>>>> The connection limits currently configured in ManifoldCF are as follows:
>>>>
>>>> Solr output max connections: 25
>>>> Documentum repository max connections: 25
>>>>
>>>> properties.xml:
>>>>
>>>> <property name="org.apache.manifoldcf.database.maxhandles" value="50"/>
>>>> <property name="org.apache.manifoldcf.crawler.threads" value="25"/>
>>>>
>>>> Total Documentum document count: 0.5 million
>>>>
>>>> After the job is started, it indexes some 20,000+ documents and then terminates with the below error on the ManifoldCF job:
>>>>
>>>> Error: Repeated service interruptions - failure processing document: Error from server at http://localhost:8983/solr/documentum_manifoldcf_stg: String index out of range: -188
>>>>
>>>> Please find attached the ManifoldCF error log and agent log.
>>>>
>>>> Please let me know your observations on the cause of the issue and on the configuration of the threads used for crawling.
>>>> Please share your thoughts.
>>>>
>>>> Regards,
>>>> Tamizh Kumaran
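(One last note, to answer Erik's question at the top of the thread about whether a literal "${…" sequence is being sent: a quick client-side scan over the metadata values being posted for the failing document would confirm it. The snippet below is a hypothetical diagnostic, not part of ManifoldCF or Solr; the field names and values are made up.)

import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MacroTriggerScan {
  public static void main(String[] args) {
    // Placeholder metadata, standing in for whatever would be posted as
    // literal.* parameters for one document.
    Map<String, List<String>> metadata = new LinkedHashMap<>();
    metadata.put("literal.title", Arrays.asList("Budget ${FY2017}"));
    metadata.put("literal.author", Arrays.asList("Tamizh"));

    for (Map.Entry<String, List<String>> entry : metadata.entrySet()) {
      for (String value : entry.getValue()) {
        if (value != null && value.contains("${")) {
          // Values like this are what Solr's MacroExpander tries to expand,
          // which is where the StringIndexOutOfBoundsException is thrown.
          System.out.println("Parameter " + entry.getKey() + " contains a macro trigger: " + value);
        }
      }
    }
  }
}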
