[jira] [Updated] (NUTCH-1086) Rewrite protocol-httpclient

2020-04-21 Thread Sebastian Nagel (Jira)


 [ 
https://issues.apache.org/jira/browse/NUTCH-1086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastian Nagel updated NUTCH-1086:
---
Fix Version/s: (was: 1.17)
   1.18

> Rewrite protocol-httpclient
> ---
>
> Key: NUTCH-1086
> URL: https://issues.apache.org/jira/browse/NUTCH-1086
> Project: Nutch
>  Issue Type: Improvement
>  Components: protocol
>Affects Versions: nutchgora, 1.5
>Reporter: Markus Jelsma
>Assignee: Fabio Santagostino
>Priority: Major
> Fix For: 1.18
>
> Attachments: Http.java, HttpResponse.java
>
>
> There are several issues about protocol-httpclient and several comments about 
> rewriting the plugin with the new http client libraries. There is, however, 
> not yet an issue for rewriting/reimplementing protocol-httpclient.
> http://hc.apache.org/httpcomponents-client-ga/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (NUTCH-1086) Rewrite protocol-httpclient

2019-10-11 Thread Sebastian Nagel (Jira)


 [ 
https://issues.apache.org/jira/browse/NUTCH-1086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastian Nagel updated NUTCH-1086:
---
Fix Version/s: (was: 2.5)

> Rewrite protocol-httpclient
> ---
>
> Key: NUTCH-1086
> URL: https://issues.apache.org/jira/browse/NUTCH-1086
> Project: Nutch
>  Issue Type: Improvement
>  Components: protocol
>Affects Versions: nutchgora, 1.5
>Reporter: Markus Jelsma
>Assignee: Fabio Santagostino
>Priority: Major
> Fix For: 1.17
>
> Attachments: Http.java, HttpResponse.java
>
>


[jira] [Updated] (NUTCH-1086) Rewrite protocol-httpclient

2019-09-10 Thread Sebastian Nagel (Jira)


 [ 
https://issues.apache.org/jira/browse/NUTCH-1086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastian Nagel updated NUTCH-1086:
---
Fix Version/s: (was: 1.16)
   1.17

> Rewrite protocol-httpclient
> ---
>
> Key: NUTCH-1086
> URL: https://issues.apache.org/jira/browse/NUTCH-1086
> Project: Nutch
>  Issue Type: Improvement
>  Components: protocol
>Affects Versions: nutchgora, 1.5
>Reporter: Markus Jelsma
>Assignee: Fabio Santagostino
>Priority: Major
> Fix For: 2.5, 1.17
>
> Attachments: Http.java, HttpResponse.java
>
>


[jira] [Updated] (NUTCH-1086) Rewrite protocol-httpclient

2018-06-13 Thread Sebastian Nagel (JIRA)


 [ 
https://issues.apache.org/jira/browse/NUTCH-1086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastian Nagel updated NUTCH-1086:
---
Fix Version/s: 1.16

> Rewrite protocol-httpclient
> ---
>
> Key: NUTCH-1086
> URL: https://issues.apache.org/jira/browse/NUTCH-1086
> Project: Nutch
>  Issue Type: Improvement
>  Components: protocol
>Affects Versions: nutchgora, 1.5
>Reporter: Markus Jelsma
>Assignee: Fabio Santagostino
>Priority: Major
> Fix For: 2.5, 1.16
>
> Attachments: Http.java, HttpResponse.java
>
>


[jira] [Commented] (NUTCH-1086) Rewrite protocol-httpclient

2015-07-24 Thread Nikolai Vasilev (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14640872#comment-14640872
 ] 

Nikolai Vasilev commented on NUTCH-1086:


Hello Peter,
the deprecation warning you see means that you should no longer create an 
HttpClient with DefaultHttpClient; use HttpClientBuilder instead:
http://hc.apache.org/httpcomponents-client-ga/httpclient/apidocs/org/apache/http/impl/client/DefaultHttpClient.html
{code}
Deprecated. 
(4.3) use HttpClientBuilder see also CloseableHttpClient.
{code}

There is a flaw in Fabio's implementation. By default, DefaultHttpClient uses 
[BasicClientConnectionManager|http://hc.apache.org/httpcomponents-client-ga/httpclient/apidocs/org/apache/http/impl/conn/BasicClientConnectionManager.html],
which is not designed to manage connections in a multithreaded environment. 
That is crucial for Nutch, whose fetcher runs many threads. The 
[PoolingClientConnectionManager|http://hc.apache.org/httpcomponents-client-ga/httpclient/apidocs/org/apache/http/impl/conn/PoolingClientConnectionManager.html]
should be used instead.

In our project we run Nutch on Amazon EMR, and we ran into some odd 
dependency clashes when we tried to port protocol-httpclient to 
HttpClient 4.x. Unfortunately I have lost the error logs and cannot tell 
exactly what went wrong.
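A minimal sketch of the pooling change suggested above, assuming HttpClient 4.3 or later on the plugin classpath. In 4.3+ the pooling manager is called PoolingHttpClientConnectionManager; the PoolingClientConnectionManager linked above is its deprecated 4.2 counterpart. The class name PooledClientFactory and the pool sizes are illustrative assumptions, not values taken from Nutch or from the attached Http.java.

{code:java}
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.impl.conn.PoolingHttpClientConnectionManager;

/** Hypothetical helper: builds one client that all fetcher threads can share. */
final class PooledClientFactory {

  private PooledClientFactory() {}

  /**
   * @param maxTotal    maximum number of open connections across all hosts
   * @param maxPerRoute maximum number of open connections to a single host
   */
  static CloseableHttpClient create(int maxTotal, int maxPerRoute) {
    // The pooling manager leases connections to concurrent threads safely.
    PoolingHttpClientConnectionManager cm = new PoolingHttpClientConnectionManager();
    cm.setMaxTotal(maxTotal);
    cm.setDefaultMaxPerRoute(maxPerRoute);
    // HttpClients.custom() returns an HttpClientBuilder, the replacement for new DefaultHttpClient().
    return HttpClients.custom()
        .setConnectionManager(cm)
        .build();
  }
}
{code}

The idea is to build the client once (for example when the plugin is configured), share it across threads, and call client.close() on shutdown instead of creating a client per request.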

 Rewrite protocol-httpclient
 ---

 Key: NUTCH-1086
 URL: https://issues.apache.org/jira/browse/NUTCH-1086
 Project: Nutch
  Issue Type: Improvement
  Components: protocol
Affects Versions: nutchgora, 1.5
Reporter: Markus Jelsma
Assignee: Fabio Santagostino
 Fix For: 2.4

 Attachments: Http.java, HttpResponse.java




[jira] [Commented] (NUTCH-1086) Rewrite protocol-httpclient

2015-07-05 Thread Peter Ciuffetti (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14614244#comment-14614244
 ] 

Peter Ciuffetti commented on NUTCH-1086:


After an unsuccessful attempt to resolve NUTCH-2059, I thought I'd try this 
upgrade to HttpClient. I placed the attached Java files into a branch based on 
the v1.11 trunk, but I'm getting a unit test failure and some deprecation 
warnings from the compiler.

{code}
compile:
 [echo] Compiling plugin: protocol-httpclient
[javac] Compiling 10 source files to /Users/pciuffetti/Documents/Dev/workspace/nutch/build/protocol-httpclient/classes
[javac] /Users/pciuffetti/Documents/Dev/workspace/nutch/src/plugin/protocol-httpclient/src/java/org/apache/nutch/protocol/httpclient/Http.java:35: warning: [deprecation] ConnRoutePNames in org.apache.http.conn.params has been deprecated
[javac] import org.apache.http.conn.params.ConnRoutePNames;
[javac]   ^
[javac] /Users/pciuffetti/Documents/Dev/workspace/nutch/src/plugin/protocol-httpclient/src/java/org/apache/nutch/protocol/httpclient/Http.java:37: warning: [deprecation] DefaultHttpClient in org.apache.http.impl.client has been deprecated
[javac] import org.apache.http.impl.client.DefaultHttpClient;
[javac]   ^
[javac] /Users/pciuffetti/Documents/Dev/workspace/nutch/src/plugin/protocol-httpclient/src/java/org/apache/nutch/protocol/httpclient/Http.java:68: warning: [deprecation] DefaultHttpClient in org.apache.http.impl.client has been deprecated
[javac]   private static DefaultHttpClient client = new DefaultHttpClient();
[javac]  ^
[javac] /Users/pciuffetti/Documents/Dev/workspace/nutch/src/plugin/protocol-httpclient/src/java/org/apache/nutch/protocol/httpclient/Http.java:68: warning: [deprecation] DefaultHttpClient in org.apache.http.impl.client has been deprecated
[javac]   private static DefaultHttpClient client = new DefaultHttpClient();
[javac] ^
[javac] /Users/pciuffetti/Documents/Dev/workspace/nutch/src/plugin/protocol-httpclient/src/java/org/apache/nutch/protocol/httpclient/Http.java:96: warning: [deprecation] DefaultHttpClient in org.apache.http.impl.client has been deprecated
[javac]   static synchronized DefaultHttpClient getClient() {
[javac]   ^
[javac] /Users/pciuffetti/Documents/Dev/workspace/nutch/src/plugin/protocol-httpclient/src/java/org/apache/nutch/protocol/httpclient/Http.java:201: warning: [deprecation] ConnRoutePNames in org.apache.http.conn.params has been deprecated
[javac]   client.getParams().setParameter(ConnRoutePNames.DEFAULT_PROXY, proxy);
[javac]   ^
[javac] 6 warnings
{code}

{code}
Testcase: testNtlmAuth took 1.791 sec
FAILED
HTTP Status Code for http://127.0.0.1:47501/ntlm.jsp expected:<200> but was:<401>
junit.framework.AssertionFailedError: HTTP Status Code for http://127.0.0.1:47501/ntlm.jsp expected:<200> but was:<401>
at org.apache.nutch.protocol.httpclient.TestProtocolHttpClient.fetchPage(TestProtocolHttpClient.java:200)
at org.apache.nutch.protocol.httpclient.TestProtocolHttpClient.testNtlmAuth(TestProtocolHttpClient.java:162)
{code}

...will investigate if I can resolve these.
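The deprecation warnings in the compile log above point at two places in the attached Http.java: the static DefaultHttpClient at line 68 and the ConnRoutePNames.DEFAULT_PROXY parameter at line 201. A minimal sketch of non-deprecated equivalents, assuming HttpClient 4.3+; ProxyClientFactory is a hypothetical name and this is not the attached code.

{code:java}
import org.apache.http.HttpHost;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClientBuilder;
import org.apache.http.impl.client.HttpClients;

/** Hypothetical helper replacing new DefaultHttpClient() plus ConnRoutePNames.DEFAULT_PROXY. */
final class ProxyClientFactory {

  private ProxyClientFactory() {}

  /** Routes requests via proxyHost:proxyPort, or connects directly if proxyHost is null or empty. */
  static CloseableHttpClient create(String proxyHost, int proxyPort) {
    HttpClientBuilder builder = HttpClients.custom();
    if (proxyHost != null && !proxyHost.isEmpty()) {
      // Builder-level replacement for getParams().setParameter(ConnRoutePNames.DEFAULT_PROXY, proxy)
      builder.setProxy(new HttpHost(proxyHost, proxyPort));
    }
    return builder.build();
  }
}
{code}

If the proxy has to vary per request, RequestConfig.custom().setProxy(...) is the per-request alternative in the same API generation. This sketch does not address the testNtlmAuth failure.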

 Rewrite protocol-httpclient
 ---

 Key: NUTCH-1086
 URL: https://issues.apache.org/jira/browse/NUTCH-1086
 Project: Nutch
  Issue Type: Improvement
  Components: protocol
Affects Versions: nutchgora, 1.5
Reporter: Markus Jelsma
Assignee: Fabio Santagostino
 Fix For: 2.4

 Attachments: Http.java, HttpResponse.java




[jira] [Updated] (NUTCH-1086) Rewrite protocol-httpclient

2015-02-24 Thread Lewis John McGibbney (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-1086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney updated NUTCH-1086:

Assignee: Fabio Santagostino

 Rewrite protocol-httpclient
 ---

 Key: NUTCH-1086
 URL: https://issues.apache.org/jira/browse/NUTCH-1086
 Project: Nutch
  Issue Type: Improvement
  Components: protocol
Affects Versions: nutchgora, 1.5
Reporter: Markus Jelsma
Assignee: Fabio Santagostino
 Fix For: 2.4

 Attachments: Http.java, HttpResponse.java




[jira] [Updated] (NUTCH-1086) Rewrite protocol-httpclient

2015-02-15 Thread Fabio Santagostino (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-1086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fabio Santagostino updated NUTCH-1086:
--
Attachment: HttpResponse.java

Add httpclient 4.4 library

 Rewrite protocol-httpclient
 ---

 Key: NUTCH-1086
 URL: https://issues.apache.org/jira/browse/NUTCH-1086
 Project: Nutch
  Issue Type: Improvement
  Components: protocol
Affects Versions: nutchgora, 1.5
Reporter: Markus Jelsma
 Fix For: 2.4

 Attachments: Http.java, HttpResponse.java




[jira] [Commented] (NUTCH-1086) Rewrite protocol-httpclient

2015-02-15 Thread Fabio Santagostino (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14322166#comment-14322166
 ] 

Fabio Santagostino commented on NUTCH-1086:
---

Hi,
I've made an attempt to rewrite the component using HttpClient 4.4. It works 
for me!
My main goal was to get a correct implementation of NTLMv2 authentication for my 
corporate web sites.
It also seems to be backward compatible with the previous implementation. Proxy 
support is the only part I haven't tested yet.

I had to change only two classes (attached):
- /src/plugin/protocol-httpclient/src/java/org/apache/nutch/protocol/httpclient/Http.java
- /src/plugin/protocol-httpclient/src/java/org/apache/nutch/protocol/httpclient/HttpResponse.java


Of course, the dependency files must be modified as well. In /ivy/ivy.xml:

+ added the httpclient 4.4 dependency
{code:xml}
<dependency org="org.apache.httpcomponents" name="httpclient" rev="4.4" conf="*->master" />
{code}

+ updated the commons-codec version from
{code:xml}
<dependency org="commons-codec" name="commons-codec" rev="1.3" conf="*->default" />
{code}
to
{code:xml}
<dependency org="commons-codec" name="commons-codec" rev="1.4" conf="*->default" />
{code}

The attached files were tested against the v1.9 branch; some minor changes are 
probably needed to make them suitable for v2.3.

Regards,
Fabio

 Rewrite protocol-httpclient
 ---

 Key: NUTCH-1086
 URL: https://issues.apache.org/jira/browse/NUTCH-1086
 Project: Nutch
  Issue Type: Improvement
  Components: protocol
Affects Versions: nutchgora, 1.5
Reporter: Markus Jelsma
 Fix For: 2.4

 Attachments: Http.java, HttpResponse.java




[jira] [Updated] (NUTCH-1086) Rewrite protocol-httpclient

2015-02-15 Thread Fabio Santagostino (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-1086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fabio Santagostino updated NUTCH-1086:
--
Attachment: Http.java

Add httpclient 4.4 library

 Rewrite protocol-httpclient
 ---

 Key: NUTCH-1086
 URL: https://issues.apache.org/jira/browse/NUTCH-1086
 Project: Nutch
  Issue Type: Improvement
  Components: protocol
Affects Versions: nutchgora, 1.5
Reporter: Markus Jelsma
 Fix For: 2.4

 Attachments: Http.java




[jira] [Commented] (NUTCH-1086) Rewrite protocol-httpclient

2014-07-16 Thread Simon Zhu (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14063643#comment-14063643
 ] 

Simon Zhu commented on NUTCH-1086:
--

Hi Talat/Julien/Markus,

I tested NTCredentials from HttpComponents HttpClient 4.3.4 against a proxy server 
that requires NTLM authentication, and the response code was 200 OK. However, 
when I used NTCredentials from Commons HttpClient 3.1, which protocol-httpclient 
currently uses, the response code was 407, indicating that the proxy server I'm 
using did not accept the NTLM handshake produced by HttpClient 3.1. I suppose the 
reason is that Commons HttpClient 3.1 reached end of life in 2007, while the 
current NTLM version was released in 2008.

Since HttpClient 4.x is not compatible with 3.1, IMHO it's not easy to 
address the NTLM authentication issue with a simple patch. But I would be very 
happy if anyone could help develop such a patch for the issue.

I'd appreciate any advice, suggestions or clues regarding the proxy server 
authentication issue, and I'm more than happy to discuss this further.

Regards

Simon
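For reference, a minimal sketch of an NTLM-authenticating proxy setup with the 4.x API, assuming HttpClient 4.3+. The class name NtlmProxyClientFactory and all host, port and credential values are placeholders; this is not the code Simon tested.

{code:java}
import org.apache.http.HttpHost;
import org.apache.http.auth.AuthScope;
import org.apache.http.auth.NTCredentials;
import org.apache.http.client.CredentialsProvider;
import org.apache.http.impl.client.BasicCredentialsProvider;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;

/** Hypothetical helper: a client that answers the proxy's NTLM (407) challenge. */
final class NtlmProxyClientFactory {

  private NtlmProxyClientFactory() {}

  static CloseableHttpClient create(String proxyHost, int proxyPort,
      String user, String password, String workstation, String domain) {
    CredentialsProvider credentials = new BasicCredentialsProvider();
    // Scope the NT credentials to the proxy only, so they are not offered to origin servers.
    credentials.setCredentials(
        new AuthScope(proxyHost, proxyPort),
        new NTCredentials(user, password, workstation, domain));
    return HttpClients.custom()
        .setProxy(new HttpHost(proxyHost, proxyPort))
        .setDefaultCredentialsProvider(credentials)
        .build();
  }
}
{code}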



 Rewrite protocol-httpclient
 ---

 Key: NUTCH-1086
 URL: https://issues.apache.org/jira/browse/NUTCH-1086
 Project: Nutch
  Issue Type: Improvement
  Components: protocol
Affects Versions: nutchgora, 1.5
Reporter: Markus Jelsma
 Fix For: 2.4




[jira] [Updated] (NUTCH-1086) Rewrite protocol-httpclient

2014-04-20 Thread Julien Nioche (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-1086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Nioche updated NUTCH-1086:
-

Priority: Major  (was: Critical)

 Rewrite protocol-httpclient
 ---

 Key: NUTCH-1086
 URL: https://issues.apache.org/jira/browse/NUTCH-1086
 Project: Nutch
  Issue Type: Improvement
  Components: protocol
Affects Versions: nutchgora, 1.5
Reporter: Markus Jelsma
 Fix For: 2.4




[jira] [Updated] (NUTCH-1086) Rewrite protocol-httpclient

2014-04-18 Thread Julien Nioche (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-1086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Nioche updated NUTCH-1086:
-

Component/s: (was: fetcher)
 protocol

 Rewrite protocol-httpclient
 ---

 Key: NUTCH-1086
 URL: https://issues.apache.org/jira/browse/NUTCH-1086
 Project: Nutch
  Issue Type: Improvement
  Components: protocol
Affects Versions: nutchgora, 1.5
Reporter: Markus Jelsma
Priority: Critical
 Fix For: 2.4




[jira] [Commented] (NUTCH-1086) Rewrite protocol-httpclient

2013-09-17 Thread Talat UYARER (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13769338#comment-13769338
 ] 

Talat UYARER commented on NUTCH-1086:
-

Hi Markus,

Yes, I know that HttpClient is still being developed as part of Apache 
HttpComponents. Your second comment is very useful information for me. Actually, I 
asked that question because I found a little bug in protocol-http: even if 
http.content.limit is set, protocol-http fetches files of all sizes 
(larger files are fetched up to the limit). 
But when parsing, the parser skips incomplete files (parser.skip.truncated 
configuration), so it seems like unnecessary effort to partially fetch content 
larger than the limit if it is not going to be parsed.
What do you think about this? I will upload a patch for this issue.
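A minimal JDK-only sketch of the two behaviours discussed here: skipping a fetch up front when the declared Content-Length already exceeds http.content.limit and truncated content would be skipped anyway (parser.skip.truncated), and a bounded read that flags truncation. BoundedReader, shouldSkip and read are hypothetical names, not Nutch APIs; a negative limit means "no limit", mirroring the http.content.limit convention.

{code:java}
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical helper, not part of Nutch. */
final class BoundedReader {

  /** Bytes kept, plus whether the stream had more data than the limit allowed. */
  static final class Result {
    final byte[] content;
    final boolean truncated;

    Result(byte[] content, boolean truncated) {
      this.content = content;
      this.truncated = truncated;
    }
  }

  private BoundedReader() {}

  /**
   * True if fetching is pointless: the server declared a length above the limit
   * and truncated content would be skipped by the parser. declaredLength < 0 means no header.
   */
  static boolean shouldSkip(long declaredLength, int limit, boolean skipTruncated) {
    return skipTruncated && limit >= 0 && declaredLength > limit;
  }

  /** Reads at most limit bytes (limit < 0 means unlimited) and reports truncation. */
  static Result read(InputStream in, int limit) throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buf = new byte[8192];
    int n;
    while ((n = in.read(buf)) != -1) {
      if (limit >= 0 && out.size() + n > limit) {
        // Keep only what fits; at least one byte beyond the limit existed.
        out.write(buf, 0, limit - out.size());
        return new Result(out.toByteArray(), true);
      }
      out.write(buf, 0, n);
    }
    return new Result(out.toByteArray(), false);
  }
}
{code}

The up-front check only helps when the server sends Content-Length; without it, the bounded read is the fallback.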

 Rewrite protocol-httpclient
 ---

 Key: NUTCH-1086
 URL: https://issues.apache.org/jira/browse/NUTCH-1086
 Project: Nutch
  Issue Type: Improvement
  Components: fetcher
Affects Versions: nutchgora, 1.5
Reporter: Markus Jelsma
Priority: Critical
 Fix For: 2.4




[jira] [Commented] (NUTCH-1086) Rewrite protocol-httpclient

2013-09-16 Thread Talat UYARER (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13768755#comment-13768755
 ] 

Talat UYARER commented on NUTCH-1086:
-

Markus,

I guess HttpClient is end of life. Are you doing any development on this issue?

 Rewrite protocol-httpclient
 ---

 Key: NUTCH-1086
 URL: https://issues.apache.org/jira/browse/NUTCH-1086
 Project: Nutch
  Issue Type: Improvement
  Components: fetcher
Affects Versions: nutchgora, 1.5
Reporter: Markus Jelsma
Priority: Critical
 Fix For: 2.4




[jira] [Commented] (NUTCH-1086) Rewrite protocol-httpclient

2013-09-16 Thread Markus Jelsma (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13768814#comment-13768814
 ] 

Markus Jelsma commented on NUTCH-1086:
--

Hi Talat - what do you mean by EOL of HttpClient? Version 4.3 was released just 
a few months ago. I assume you mean that Nutch's implementation of it is old; it 
is indeed! This issue is about completely rewriting Nutch's protocol-httpclient 
plugin for the most recent HttpClient 4.x release.

 Rewrite protocol-httpclient
 ---

 Key: NUTCH-1086
 URL: https://issues.apache.org/jira/browse/NUTCH-1086
 Project: Nutch
  Issue Type: Improvement
  Components: fetcher
Affects Versions: nutchgora, 1.5
Reporter: Markus Jelsma
Priority: Critical
 Fix For: 2.4




[jira] [Commented] (NUTCH-1086) Rewrite protocol-httpclient

2013-09-16 Thread Markus Jelsma (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13768819#comment-13768819
 ] 

Markus Jelsma commented on NUTCH-1086:
--

And to answer your question: no, I'm not working on this issue. We still manage 
with protocol-http and only use protocol-httpclient for TLS connections. It 
still works, for now :)

 Rewrite protocol-httpclient
 ---

 Key: NUTCH-1086
 URL: https://issues.apache.org/jira/browse/NUTCH-1086
 Project: Nutch
  Issue Type: Improvement
  Components: fetcher
Affects Versions: nutchgora, 1.5
Reporter: Markus Jelsma
Priority: Critical
 Fix For: 2.4




[jira] [Updated] (NUTCH-1086) Rewrite protocol-httpclient

2013-05-08 Thread Sebastian Nagel (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-1086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastian Nagel updated NUTCH-1086:
---

Fix Version/s: (was: 1.7)
   1.8
   2.3

 Rewrite protocol-httpclient
 ---

 Key: NUTCH-1086
 URL: https://issues.apache.org/jira/browse/NUTCH-1086
 Project: Nutch
  Issue Type: Improvement
  Components: fetcher
Affects Versions: nutchgora, 1.5
Reporter: Markus Jelsma
Priority: Critical
 Fix For: 2.3, 1.8




[jira] [Updated] (NUTCH-1086) Rewrite protocol-httpclient

2012-09-18 Thread Lewis John McGibbney (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-1086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney updated NUTCH-1086:


Fix Version/s: (was: 2.1)
   2.2

 Rewrite protocol-httpclient
 ---

 Key: NUTCH-1086
 URL: https://issues.apache.org/jira/browse/NUTCH-1086
 Project: Nutch
  Issue Type: Improvement
  Components: fetcher
Affects Versions: nutchgora, 1.5
Reporter: Markus Jelsma
Priority: Critical
 Fix For: 1.6, 2.2




[jira] [Updated] (NUTCH-1086) Rewrite protocol-httpclient

2012-05-11 Thread Lewis John McGibbney (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-1086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney updated NUTCH-1086:


Affects Version/s: 1.5
   nutchgora
Fix Version/s: 2.1
   1.6

 Rewrite protocol-httpclient
 ---

 Key: NUTCH-1086
 URL: https://issues.apache.org/jira/browse/NUTCH-1086
 Project: Nutch
  Issue Type: Improvement
  Components: fetcher
Affects Versions: nutchgora, 1.5
Reporter: Markus Jelsma
Priority: Critical
 Fix For: 1.6, 2.1






[jira] [Commented] (NUTCH-1086) Rewrite protocol-httpclient

2012-04-22 Thread Ross Judson (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13259288#comment-13259288
 ] 

Ross Judson commented on NUTCH-1086:


The Oracle bug report # is 7129065. HttpURLConnection-based NTLM authentication to 
SharePoint succeeds on JDK 6, but crashes the VM on JDK. I am investigating 
other solutions to this.

 Rewrite protocol-httpclient
 ---

 Key: NUTCH-1086
 URL: https://issues.apache.org/jira/browse/NUTCH-1086
 Project: Nutch
  Issue Type: Improvement
  Components: fetcher
Reporter: Markus Jelsma
Priority: Critical





[jira] [Updated] (NUTCH-1086) Rewrite protocol-httpclient

2012-02-17 Thread Lewis John McGibbney (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-1086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney updated NUTCH-1086:


Priority: Critical  (was: Major)

 Rewrite protocol-httpclient
 ---

 Key: NUTCH-1086
 URL: https://issues.apache.org/jira/browse/NUTCH-1086
 Project: Nutch
  Issue Type: Improvement
  Components: fetcher
Reporter: Markus Jelsma
Priority: Critical





[jira] [Commented] (NUTCH-1086) Rewrite protocol-httpclient

2012-01-25 Thread Ferdy Galema (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13193007#comment-13193007
 ] 

Ferdy Galema commented on NUTCH-1086:
-

Seems like a JVM bug; perhaps you could reproduce it using specific URLs? Btw, 
does anyone have an NTLMv2 example URL that is publicly accessible?

Besides the lacking NTLMv2 support, is there anything else that isn't working 
properly? Support for https is not entirely broken, because 
https://www.iana.org/, for example, can be fetched perfectly fine.

 Rewrite protocol-httpclient
 ---

 Key: NUTCH-1086
 URL: https://issues.apache.org/jira/browse/NUTCH-1086
 Project: Nutch
  Issue Type: Improvement
  Components: fetcher
Reporter: Markus Jelsma





[jira] [Commented] (NUTCH-1086) Rewrite protocol-httpclient

2012-01-25 Thread Oleg Kalnichevski (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13193031#comment-13193031
 ] 

Oleg Kalnichevski commented on NUTCH-1086:
--

For what it's worth, HttpClient users have been reporting the best 
NTLMv2 compatibility results when using JCIFS as the NTLM engine. The trouble is 
that the library is LGPL-licensed and therefore may not be directly incorporated 
into ASF works. However, you might consider giving your users the option of 
hooking JCIFS up through some sort of extension mechanism, similar to the one 
used by HttpClient [1].

Oleg

[1] http://hc.apache.org/httpcomponents-client-ga/ntlm.html
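The page in [1] shows how a JCIFS-based engine can be plugged into HttpClient. A minimal sketch of the registration side only, assuming HttpClient 4.3+ and Java 8: the caller passes in any NTLMEngine implementation (for example a JCIFS-backed one that users add themselves, so no LGPL code ships with Nutch). PluggableNtlmClientFactory is a hypothetical name; the engine itself is deliberately left out.

{code:java}
import org.apache.http.auth.AuthSchemeProvider;
import org.apache.http.client.config.AuthSchemes;
import org.apache.http.config.Registry;
import org.apache.http.config.RegistryBuilder;
import org.apache.http.impl.auth.BasicSchemeFactory;
import org.apache.http.impl.auth.NTLMEngine;
import org.apache.http.impl.auth.NTLMScheme;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;

/** Hypothetical extension point: NTLM is handled by whatever engine the caller supplies. */
final class PluggableNtlmClientFactory {

  private PluggableNtlmClientFactory() {}

  static CloseableHttpClient create(NTLMEngine engine) {
    Registry<AuthSchemeProvider> schemes = RegistryBuilder.<AuthSchemeProvider>create()
        .register(AuthSchemes.BASIC, new BasicSchemeFactory())
        // Each authentication exchange gets a fresh NTLMScheme backed by the supplied engine.
        .register(AuthSchemes.NTLM, context -> new NTLMScheme(engine))
        .build();
    return HttpClients.custom()
        .setDefaultAuthSchemeRegistry(schemes)
        .build();
  }
}
{code}

Setting a custom registry replaces HttpClient's default one, so a real plugin would likely also register any other schemes (Digest, SPNEGO, ...) it still needs.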

 Rewrite protocol-httpclient
 ---

 Key: NUTCH-1086
 URL: https://issues.apache.org/jira/browse/NUTCH-1086
 Project: Nutch
  Issue Type: Improvement
  Components: fetcher
Reporter: Markus Jelsma





Re: [jira] [Commented] (NUTCH-1086) Rewrite protocol-httpclient

2012-01-19 Thread Lewis John Mcgibbney
Thanks for dropping this on, Remi.

For future reference, you might want to check out this online book on
Subversion [1]. Here at Nutch we use Subversion for SCM, and therefore it
is the program we use for creating patches, applying them and, hopefully,
improving Nutch in the process ;0) It's straightforward, no-nonsense source
code management and is really easy to get to grips with given a little time.

Regarding this issue, unfortunately it has been open for a while, and
additionally it doesn't look like there is quite enough demand
from those using it to get a new implementation written up yet... I'm not
even using it at all...

Thanks again

Lewis

[1] http://svnbook.red-bean.com/en/1.7/index.html

On Thu, Jan 19, 2012 at 6:56 AM, Remi Tassing (Commented) (JIRA) 
j...@apache.org wrote:


[
 https://issues.apache.org/jira/browse/NUTCH-1086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13188961#comment-13188961]

 Remi Tassing commented on NUTCH-1086:
 -

 For the NTLMv2 issue I used a dirty solution in HttpResponse.java. Inside
 the constructor, after the getResponseBodyAsStream() attempt:
 1. I check whether the result code is 500 (inside finally{...})
 2. I use HttpURLConnection to authenticate and open a connection
 3. Then I read the InputStream, get the content and change the code to 200

 The problems with that solution are:
 1. The authentication keys are hardcoded
 2. It doesn't check whether the content is valid; it just sets the return
 code to 200
 3. Error code 500 doesn't necessarily mean that it's an NTLMv2
 authentication problem

 I have no idea how to write patches against the trunk...

 Remi






-- 
*Lewis*


[jira] [Commented] (NUTCH-1086) Rewrite protocol-httpclient

2012-01-18 Thread Lewis John McGibbney (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13188505#comment-13188505
 ] 

Lewis John McGibbney commented on NUTCH-1086:
-

When trying to access some SharePoint (IIS) websites that use NTLMv2 authentication, 
Nutch fails and gets error code 500. HttpClient only supports an early 
version of NTLM, not NTLMv2. HttpURLConnection can be used instead (see [1], [2]).

[1] http://oaklandsoftware.com/papers/ntlm.html
[2] http://developer-resource.blogspot.com/2008/06/ntlm-authentication-from-java.html
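Here is a minimal sketch of the HttpURLConnection route. It assumes the JDK's built-in NTLM support is used and installs a default `java.net.Authenticator` that supplies credentials when a server asks for the NTLM scheme. Several details are assumptions for illustration: the class and method names, and the `DOMAIN\user` user-name form. Which NTLM versions the JDK negotiates is version- and platform-dependent, as is whether it uses the logged-in Windows user transparently instead.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.Authenticator;
import java.net.HttpURLConnection;
import java.net.PasswordAuthentication;
import java.net.URI;

/** Hypothetical helper: fetch pages through HttpURLConnection with NTLM credentials. */
final class NtlmUrlConnectionFetch {

  private NtlmUrlConnectionFetch() {}

  /**
   * Installs a JVM-wide Authenticator that answers NTLM challenges from
   * servers. Credentials come from the caller (e.g. configuration), not
   * from constants in the code.
   */
  static void installNtlmCredentials(String domain, String user, char[] password) {
    // Assumption: the JDK's NTLM support accepts a "DOMAIN\\user" user name.
    final String qualifiedUser =
        (domain == null || domain.isEmpty()) ? user : domain + "\\" + user;
    final char[] secret = password.clone();
    Authenticator.setDefault(new Authenticator() {
      @Override
      protected PasswordAuthentication getPasswordAuthentication() {
        // Only answer server (not proxy) challenges that use the NTLM scheme.
        if (getRequestorType() == RequestorType.SERVER
            && "NTLM".equalsIgnoreCase(getRequestingScheme())) {
          return new PasswordAuthentication(qualifiedUser, secret);
        }
        return null;
      }
    });
  }

  /** Fetches a URL, failing with the real HTTP status instead of masking it. */
  static byte[] fetch(String url) throws IOException {
    HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
    try {
      int status = conn.getResponseCode();
      if (status >= 400) {
        throw new IOException("HTTP " + status + " for " + url);
      }
      try (InputStream in = conn.getInputStream()) {
        return in.readAllBytes();
      }
    } finally {
      conn.disconnect();
    }
  }
}
```

`Authenticator.setDefault` is JVM-wide, so a multi-threaded fetcher would share one set of credentials across all hosts. Per-host credentials would need a lookup inside `getPasswordAuthentication`, for example keyed on `getRequestingHost()`.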






[jira] [Commented] (NUTCH-1086) Rewrite protocol-httpclient

2012-01-18 Thread Remi Tassing (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13188961#comment-13188961
 ] 

Remi Tassing commented on NUTCH-1086:
-

For the NTLMv2 issue I used a dirty solution in HttpResponse.java. Inside the 
constructor, after the getResponseBodyAsStream() attempt:
1. I check whether the result code is 500 (inside finally{...})
2. I use HttpURLConnection to authenticate and open a connection
3. Then I read the InputStream, get the content and change the code to 200

The problems with that solution are:
1. The authentication keys are hardcoded
2. It doesn't check whether the content is valid; it just sets the return code 
to 200
3. Error code 500 doesn't necessarily mean that it's an NTLMv2 authentication 
problem

I have no idea how to write patches against the trunk...

Remi
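The third problem in the list can be narrowed by falling back only when the server actually asks for NTLM, rather than on every 500. This minimal sketch assumes the server signals that in the usual HTTP way: a 401 response whose `WWW-Authenticate` header offers `NTLM` or `Negotiate`. Whether that holds for a SharePoint setup that surfaced as a 500 in Nutch is an assumption. `NtlmFallbackPolicy` is a hypothetical name, not part of Nutch.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper deciding when an NTLM fallback fetch is worth trying. */
final class NtlmFallbackPolicy {

  private NtlmFallbackPolicy() {}

  /**
   * Returns true only for a 401 response whose WWW-Authenticate header(s)
   * offer the NTLM or Negotiate scheme; any other status (including 500)
   * is left alone.
   */
  static boolean shouldRetryWithNtlm(int status, Map<String, List<String>> headers) {
    if (status != 401 || headers == null) {
      return false;
    }
    for (Map.Entry<String, List<String>> header : headers.entrySet()) {
      // Header names are case-insensitive; HttpURLConnection's header map
      // also contains a null key for the status line, which is skipped here.
      if (header.getKey() == null
          || !header.getKey().equalsIgnoreCase("WWW-Authenticate")
          || header.getValue() == null) {
        continue;
      }
      for (String value : header.getValue()) {
        if (value == null) {
          continue;
        }
        // Rough scan, not a full RFC 7235 parser: one header value may list
        // several challenges, so look at every comma/space separated token.
        for (String token : value.split("[\\s,]+")) {
          String scheme = token.toLowerCase(Locale.ROOT);
          if (scheme.equals("ntlm") || scheme.equals("negotiate")) {
            return true;
          }
        }
      }
    }
    return false;
  }
}
```

On a retry, the status code of the retried request would then be reported as-is instead of being rewritten to 200.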





[jira] [Commented] (NUTCH-1086) Rewrite protocol-httpclient

2011-08-23 Thread Aravind Srini (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13089314#comment-13089314
 ] 

Aravind Srini commented on NUTCH-1086:
--

Some transitive dependencies:

* Solr 3.1.0 seems to depend on commons-httpclient 3.1.

Started an independent email thread with the Solr community ("solr - 
httpclient from 3.x to 4.1.x") to open it up for discussion.

* Hadoop 0.20.2 depends on commons-httpclient 3.0.1 as well.

Also, httpclient 4.1.2 depends on httpcore 4.1.2, but there seems to have 
been an emergency release of httpcore 4.1.3 (and httpclient was not republished 
afterwards), so both need to be declared explicitly in ivy.xml (or pom.xml).







Re: Rewrite protocol-httpclient

2011-08-23 Thread Markus Jelsma
In branch 1.4 at first. It should be easy to port to trunk, however. You're 
more than welcome to contribute.

 On Tue, Aug 23, 2011 at 12:28 AM, Markus Jelsma
 markus.jel...@openindex.io wrote:
 Agreed. Which branch would this go into? I would like to pitch in on
 the same and start contributing as well.
 


[jira] [Commented] (NUTCH-1086) Rewrite protocol-httpclient

2011-08-23 Thread Oleg Kalnichevski (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13089466#comment-13089466
 ] 

Oleg Kalnichevski commented on NUTCH-1086:
--

The 4.1.3 release of HttpCore patched a regression affecting non-blocking (NIO) 
SSL transports only. There have been no changes between 4.1.2 and 4.1.3 
releases in blocking transport components relevant for HttpClient.

Please let me know if you need any help migrating off HttpClient 3.1 to 
HttpClient 4.1.x.

Oleg





[jira] [Commented] (NUTCH-1086) Rewrite protocol-httpclient

2011-08-23 Thread Aravind Srini (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13089503#comment-13089503
 ] 

Aravind Srini commented on NUTCH-1086:
--

Thanks, Oleg, for pitching in and confirming the right thing.

Meanwhile, SOLR-2727 has been logged independently to upgrade Solr to the 
HttpClient 4.x codeline.







[jira] [Created] (NUTCH-1086) Rewrite protocol-httpclient

2011-08-22 Thread Markus Jelsma (JIRA)
Rewrite protocol-httpclient
---

 Key: NUTCH-1086
 URL: https://issues.apache.org/jira/browse/NUTCH-1086
 Project: Nutch
  Issue Type: Improvement
  Components: fetcher
Reporter: Markus Jelsma


There are several issues about protocol-httpclient and several comments about 
rewriting the plugin with the new http client libraries. There is, however, not 
yet an issue for rewriting/reimplementing protocol-httpclient.

http://hc.apache.org/httpcomponents-client-ga/





[jira] [Commented] (NUTCH-1086) Rewrite protocol-httpclient

2011-08-22 Thread Aravind Srini (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13088850#comment-13088850
 ] 

Aravind Srini commented on NUTCH-1086:
--

Are we talking about httpclient 4.0.1?





[jira] [Commented] (NUTCH-1086) Rewrite protocol-httpclient

2011-08-22 Thread Markus Jelsma (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13088871#comment-13088871
 ] 

Markus Jelsma commented on NUTCH-1086:
--

Preferably the 4.1.x version. Nutch still uses the deprecated 3.x line, and there 
are a lot of issues to be resolved, such as HTTPS support.





[jira] [Commented] (NUTCH-1086) Rewrite protocol-httpclient

2011-08-22 Thread Ken Krugler (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13088875#comment-13088875
 ] 

Ken Krugler commented on NUTCH-1086:


For what it's worth, there's a SimpleHttpFetcher in crawler-commons that uses 
HttpClient 4.1.





Re: Rewrite protocol-httpclient

2011-08-22 Thread Markus Jelsma
Hi,

Please see Julien's comment in this recent thread:
Re: Future of Nutch 2.0 [Was: Unresolved dependencies org.apache.gora#gora-hbase;0.1: not found in Nutch trunk]

To be short: no. The bulk of the work is code and manual testing, not building 
or pushing deps around :)

Cheers,

 Just a thought: while we are talking about package upgrades here, I see
 that the current build system uses Ant (build.xml). Would there be any
 interest in moving towards a Maven-based build, to make upgrades (and
 testing them) a bit simpler?
 


Re: Rewrite protocol-httpclient

2011-08-22 Thread Arvind Srini
On Tue, Aug 23, 2011 at 12:28 AM, Markus Jelsma
markus.jel...@openindex.io wrote:

 Hi,

 Please see Julien's comment in this recent thread:
 Re: Future of Nutch 2.0 [Was: Unresolved dependencies org.apache.gora#gora-hbase;0.1: not found in Nutch trunk]

 To be short: no. The bulk of the work is code and manual testing, not
 building or pushing deps around :)


Agreed. Which branch would this go into? I would like to pitch in on
the same and start contributing as well.





