Thank you very much for replying. I know it's holiday season and you probably
have a million things to do!
OMG, it is working now that I am using the version of SolrUtils you pointed to.
I had previously focused on a version where it uses SystemDefaultHttpClient but
not as a static. It seems that making it static made a critical difference. So
this is awesome.
For the record, I would say I am using solrj 5.4.1, based on the presence of
the following files in my Nutch directories.
./apache-nutch-1.12/runtime/local/plugins/indexer-solr/solr-solrj-5.4.1.jar
./apache-nutch-1.12/build/plugins/indexer-solr/solr-solrj-5.4.1.jar
For httpclient, I have a lot of jars under the Nutch 1.12 directories:
./apache-nutch-1.12/runtime/local/lib/httpclient-4.3.5.jar
./apache-nutch-1.12/runtime/local/lib/commons-httpclient-3.1.jar
./apache-nutch-1.12/runtime/local/plugins/protocol-httpclient/protocol-httpclient.jar
./apache-nutch-1.12/runtime/local/plugins/indexer-solr/httpclient-4.4.1.jar
./apache-nutch-1.12/runtime/local/plugins/lib-htmlunit/httpclient-4.3.4.jar
./apache-nutch-1.12/runtime/local/plugins/lib-selenium/httpclient-4.5.1.jar
./apache-nutch-1.12/runtime/local/plugins/indexer-cloudsearch/httpclient-4.3.6.jar
./apache-nutch-1.12/build/protocol-httpclient/protocol-httpclient.jar
./apache-nutch-1.12/build/lib/httpclient-4.3.5.jar
./apache-nutch-1.12/build/lib/commons-httpclient-3.1.jar
./apache-nutch-1.12/build/plugins/protocol-httpclient/protocol-httpclient.jar
./apache-nutch-1.12/build/plugins/indexer-solr/httpclient-4.4.1.jar
./apache-nutch-1.12/build/plugins/lib-htmlunit/httpclient-4.3.4.jar
./apache-nutch-1.12/build/plugins/lib-selenium/httpclient-4.5.1.jar
./apache-nutch-1.12/build/plugins/indexer-cloudsearch/httpclient-4.3.6.jar
The hadoop directory has the following httpclient-related jars:
/posix/hadoop-2.7.2/share/hadoop/kms/tomcat/webapps/kms/WEB-INF/lib/httpclient-4.2.5.jar
/posix/hadoop-2.7.2/share/hadoop/httpfs/tomcat/webapps/webhdfs/WEB-INF/lib/httpclient-4.2.5.jar
/posix/hadoop-2.7.2/share/hadoop/tools/lib/httpclient-4.2.5.jar
/posix/hadoop-2.7.2/share/hadoop/tools/lib/commons-httpclient-3.1.jar
/posix/hadoop-2.7.2/share/hadoop/common/lib/httpclient-4.2.5.jar
/posix/hadoop-2.7.2/share/hadoop/common/lib/commons-httpclient-3.1.jar
Over on the Solr5 machine, we have:
./solr-5.4.1/dist/solrj-lib/httpclient-4.4.1.jar
./solr-5.4.1/server/solr-webapp/webapp/WEB-INF/lib/httpclient-4.4.1.jar
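In case it helps anyone repeat this inventory, here is a minimal sketch of a scanner that groups httpclient jars by the version in their file names. The class name HttpClientJarScan and its methods are hypothetical (not part of Nutch, Solr, or Hadoop), it assumes the version can be read from a name like httpclient-4.3.5.jar, and it deliberately skips commons-httpclient and protocol-httpclient jars. It sticks to Java 7 APIs.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of Nutch, Solr, or Hadoop.
class HttpClientJarScan {

    // Matches httpclient-<version>.jar only; commons-httpclient-3.1.jar and
    // protocol-httpclient.jar do not match because the whole name must fit.
    private static final Pattern JAR =
            Pattern.compile("^httpclient-(\\d+(?:\\.\\d+)*)\\.jar$");

    /** Returns the version embedded in the file name, or null if the name does not match. */
    static String versionOf(String fileName) {
        Matcher m = JAR.matcher(fileName);
        return m.matches() ? m.group(1) : null;
    }

    /** Walks each root directory and groups matching jar paths by version. */
    static Map<String, List<Path>> scan(List<Path> roots) throws IOException {
        final Map<String, List<Path>> byVersion = new TreeMap<>();
        for (Path root : roots) {
            Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
                @Override
                public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                    String version = versionOf(file.getFileName().toString());
                    if (version != null) {
                        List<Path> paths = byVersion.get(version);
                        if (paths == null) {
                            paths = new ArrayList<>();
                            byVersion.put(version, paths);
                        }
                        paths.add(file);
                    }
                    return FileVisitResult.CONTINUE;
                }
            });
        }
        return byVersion;
    }

    public static void main(String[] args) throws IOException {
        List<Path> roots = new ArrayList<>();
        for (String arg : args) {
            roots.add(Paths.get(arg));
        }
        for (Map.Entry<String, List<Path>> e : scan(roots).entrySet()) {
            System.out.println("httpclient " + e.getKey());
            for (Path p : e.getValue()) {
                System.out.println("  " + p);
            }
        }
    }
}
```

Run it with the directories to check as arguments, for example the Nutch and Hadoop install roots listed above. More than one version in the output only shows that several copies exist on disk; which one actually gets loaded depends on the classpath of the running job.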
thanks again
From: Furkan KAMACI <[email protected]>
To: Michael Coffey <[email protected]>
Cc: "[email protected]" <[email protected]>
Sent: Thursday, December 22, 2016 10:29 AM
Subject: Re: nutch 1.12 and Solr 5.4.1
Hi Michael,
The dependencies you sent are from the Ivy cache. I need to know the versions
of Solr and HTTP Client. Your problem is probably a jar mismatch between
Hadoop and Solr. Nutch 1.12 should work with Solr 5.4.1, as you can check
here:
https://github.com/apache/nutch/blob/release-1.12/src/plugin/indexer-solr/ivy.xml
So there may be a bug in Nutch. There is a workaround in the issue you mentioned:
https://issues.apache.org/jira/browse/NUTCH-2267 Could you apply it to
SolrUtils.java (
https://github.com/sjwoodard/nutch/blob/master/src/plugin/indexer-solr/src/java/org/apache/nutch/indexwriter/solr/SolrUtils.java)
and check again? If you still get that error, I can try to fix it.
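One way to check for such a mismatch, as a rough sketch only: print which jar each of the classes named in the error is loaded from. The class WhichJar and its locate method are hypothetical (not part of Nutch or Solr), and the result is only meaningful if it runs with the same classpath as the failing indexing job, which is an assumption you would have to arrange yourself.

```java
import java.security.CodeSource;

// Hypothetical diagnostic for illustration; not part of Nutch or Solr.
class WhichJar {

    /** Describes where the named class was loaded from, or why that is unknown. */
    static String locate(String className) {
        try {
            // initialize=false: find the class without running its static initializers.
            Class<?> c = Class.forName(className, false, WhichJar.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            if (src == null || src.getLocation() == null) {
                // Typical for classes that come from the JDK itself.
                return className + " <- bootstrap/unknown";
            }
            return className + " <- " + src.getLocation();
        } catch (ClassNotFoundException | LinkageError e) {
            return className + " <- not loadable (" + e + ")";
        }
    }

    public static void main(String[] args) {
        // Defaults are the classes named in the "Bad return type" error.
        String[] names = args.length > 0 ? args : new String[] {
            "org.apache.http.impl.client.CloseableHttpClient",
            "org.apache.http.impl.client.DefaultHttpClient",
            "org.apache.solr.client.solrj.impl.HttpClientUtil"
        };
        for (String name : names) {
            System.out.println(locate(name));
        }
    }
}
```

If the two org.apache.http classes resolve to different jars, or one of them cannot be loaded at all, that would be consistent with a jar mismatch on the classpath.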
Kind Regards,
Furkan KAMACI
On Thu, Dec 22, 2016 at 6:26 PM, Michael Coffey <[email protected]> wrote:
> Is it possible to get around this problem by using an older version of
> Solr or Nutch or both?
>
>
> ------------------------------
> *From:* Michael Coffey <[email protected]>
> *To:* "[email protected]" <[email protected]>; Furkan KAMACI <
> [email protected]>; Michael Coffey <[email protected]>
> *Sent:* Tuesday, December 20, 2016 8:41 PM
> *Subject:* Re: nutch 1.12 and Solr 5.4.1
>
> This should work, shouldn't it? But it is not working. I am using Nutch
> 1.12 with the recommended version of Solr (5.4.1) and Hadoop 2.7.2. I
> haven't changed any Java code, but I get a low-level Java error when trying
> to write to the index. Is this not a tested configuration? Based on web
> searching, I know that others have had similar problems, going back several
> months, but I haven't seen any solutions. I did try a couple of variations
> on the patch posted for NUTCH-2267 (a slightly different manifestation) and
> that did not help. I notice that the 2267 patch has been reverted in the
> master branch.
> I am willing to work on some Java code, if necessary, to help resolve
> this. At this point, I don't know what to try next, other than switching to
> ElasticSearch.
>
> From: Michael Coffey <[email protected]>
>
> To: "[email protected]" <[email protected]>; Furkan KAMACI <
> [email protected]>; Michael Coffey <[email protected]>
> Sent: Monday, December 19, 2016 7:13 PM
> Subject: Re: nutch 1.12 and Solr 5.4.1
>
> Some additional info: I am using solr.server.type=http, not cloud. I have
> tried plugin.includes with protocol-http and also with protocol-httpclient.
> My current settings are listed below. Also, I am using Hadoop 2.7.2, in
> case that matters.
> <property>
> <name>plugin.includes</name>
> <value>protocol-http|urlfilter-regex|parse-(html|tika)|index-(basic|anchor)|indexer-solr|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
> </property>
>
> <property>
> <name>solr.server.type</name>
> <value>http</value>
> <description>
> Specifies the SolrServer implementation to use. This is a string value
> of one of the following 'cloud', 'concurrent', 'http' or 'lb'.
> The values represent CloudSolrServer, ConcurrentUpdateSolrServer,
> HttpSolrServer or LBHttpSolrServer respectively.
> </description>
> </property>
>
> <property>
> <name>solr.server.url</name>
> <value>http://solr5-00:8983/solr/nutch-0</value>
> <description>
> Defines the Solr URL into which data should be indexed using the
> indexer-solr plugin.
> </description>
> </property>
>
> From: Michael Coffey <[email protected]>
> To: Furkan KAMACI <[email protected]>; "[email protected]" <
> [email protected]>
> Sent: Monday, December 19, 2016 5:10 PM
> Subject: Re: nutch 1.12 and Solr 5.4.1
>
> I'm not sure how to do that. According to a find command, I have more than
> one solrj on the nutch machine:
> ./hadass/apache-nutch-1.12/runtime/local/plugins/indexer-solr/solr-solrj-5.4.1.jar
> ./hadass/apache-nutch-1.12/build/plugins/indexer-solr/solr-solrj-5.4.1.jar
> ./.ivy2/cache/org.apache.solr/solr-solrj
> ./.ivy2/cache/org.apache.solr/solr-solrj/jars/solr-solrj-5.4.1.jar
> ./.ivy2/cache/org.apache.solr/solr-solrj/jars/solr-solrj-4.6.0.jar
> On the solr machine, I have:
> ./solr-5.4.1/dist/solrj-lib
> ./solr-5.4.1/server/solr-webapp/webapp/WEB-INF/lib/solr-solrj-5.4.1.jar
> ./solr-5.4.1/docs/solr-solrj
> ./solr-5.4.1/docs/solr-solrj/org/apache/solr/client/solrj
> ./solr-5.4.1/docs/solr-core/org/apache/solr/client/solrj
>
> Should I make the change to SolrUtils.java that is mentioned in NUTCH-2267?
> https://issues.apache.org/jira/browse/NUTCH-2267
> Lewis and Stephen might know about this.
>
> From: Furkan KAMACI <[email protected]>
> To: Michael Coffey <[email protected]>; [email protected]
> Sent: Monday, December 19, 2016 4:13 PM
> Subject: Re: nutch 1.12 and Solr 5.4.1
>
> Hi Michael,
> Could you check the version of solrj at your Nutch and compare it with
> version of Solr at your server?
> Kind Regards,
> Furkan KAMACI
> On Dec 20, 2016 1:01 AM, "Michael Coffey" <[email protected]>
> wrote:
>
> What is the recommended fix (or workaround) for the "bad return type"
> error related to "Type 'org/apache/http/impl/client/DefaultHttpClient'
> (current frame, stack[0]) is not assignable to
> 'org/apache/http/impl/client/CloseableHttpClient'"?
> It seems that switching to different versions of Solr has not helped
> (6.3.0, 5.5.3, 5.4.1). FWIW, I have the same version of Java on both machines.
>
> OpenJDK Runtime Environment (IcedTea 2.6.8) (7u121-2.6.8-1ubuntu0.14.04.1)
> OpenJDK 64-Bit Server VM (build 24.121-b00, mixed mode)
>
>
>
> From: Michael Coffey <[email protected]>
> To: "[email protected]" <[email protected]>; Michael Coffey <
> [email protected]>
> Sent: Saturday, November 19, 2016 8:05 AM
> Subject: Re: nutch 1.12 and Solr 6.3.0
>
> I think this is what Lewis and Furkan know as NUTCH-2267. I get the same
> problem with Solr 5.5.3.
>
> I really would like to know which versions of Nutch and Solr work together
> "out of the box".
>
> From: Michael Coffey <[email protected]>
> To: "[email protected]" <[email protected]>
> Sent: Friday, November 18, 2016 2:04 PM
> Subject: nutch 1.12 and Solr 6.3.0
>
> I decided to plunge ahead with Solr indexing, but so far it doesn't work.
> The first error I got is listed below. Could it be because I am running JDK 7
> on the Nutch server and JDK 8 on the Solr server? As far as I know, Nutch
> 1.x won't work with JDK 8 and Solr 6.3 won't work with a JDK older than 8. Any
> suggestions or advice?
>
> 16/11/18 13:59:52 INFO mapreduce.Job: Task Id :
> attempt_1479499237600_0021_r_000000_0, Status : FAILED
> Error: Bad return type
> Exception Details:
> Location:
> org/apache/solr/client/solrj/impl/HttpClientUtil.createClient(Lorg/apache/solr/common/params/SolrParams;Lorg/apache/http/conn/ClientConnectionManager;)Lorg/apache/http/impl/client/CloseableHttpClient; @58: areturn
> Reason:
> Type 'org/apache/http/impl/client/DefaultHttpClient' (current frame,
> stack[0]) is not assignable to
> 'org/apache/http/impl/client/CloseableHttpClient' (from method signature)
> Current Frame:
> bci: @58
> flags: { }
> locals: { 'org/apache/solr/common/params/SolrParams',
> 'org/apache/http/conn/ClientConnectionManager',
> 'org/apache/solr/common/params/ModifiableSolrParams',
> 'org/apache/http/impl/client/DefaultHttpClient' }
> stack: { 'org/apache/http/impl/client/DefaultHttpClient' }
> Bytecode:
> 0000000: bb00 0359 2ab7 0004 4db2 0005 b900 0601
> 0000010: 0099 001e b200 05bb 0007 59b7 0008 1209
> 0000020: b600 0a2c b600 0bb6 000c b900 0d02 002b
> 0000030: b800 104e 2d2c b800 0f2d b0
> Stackmap Table:
> append_frame(@47,Object[#143])
>
> Container killed by the ApplicationMaster.