Patch for httpResponse

2011-08-23 Thread Simone Frenzel
-- Forwarded message -- From: Simone Frenzel psimon...@googlemail.com Date: 2011/8/22 Subject: Patch for httpResponse To: dev-subscr...@nutch.apache.org Hi, I tested Nutch on different web pages. In the case of short zipped pages it throws an IOException: java.io.IOException:

[jira] [Commented] (NUTCH-1086) Rewrite protocol-httpclient

2011-08-23 Thread Aravind Srini (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13089314#comment-13089314 ] Aravind Srini commented on NUTCH-1086: -- Some transitive dependencies: * Solr 3.1.0 ,

Re: The crawl command, keep or get rid of

2011-08-23 Thread Julien Nioche
+1 let's replace it with a shell script instead. On 22 August 2011 21:56, Markus Jelsma markus.jel...@openindex.io wrote: Hi, The crawl command seems to add a lot of confusion. It hides the entire crawl cycle logic from new users, leading to questions, lack of understanding of basic Nutch

Re: Rewrite protocol-httpclient

2011-08-23 Thread Markus Jelsma
In branch 1.4 at first. It should be easy to port to trunk however. You're more than welcome to contribute. On Tue, Aug 23, 2011 at 12:28 AM, Markus Jelsma markus.jel...@openindex.io wrote: Hi, Please see Julien's comment in this recent thread: Re: Future of Nutch 2.0 [Was:

[jira] [Assigned] (NUTCH-578) URL fetched with 403 is generated over and over again

2011-08-23 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma reassigned NUTCH-578: --- Assignee: Markus Jelsma (was: Dennis Kubes) URL fetched with 403 is generated over and over

Re: The crawl command, keep or get rid of

2011-08-23 Thread Markus Jelsma
What kind of shell script did you have in mind? The wiki already provides some useful scripts. The tutorials on Nutch also show commands that can be used in custom scripts. Is an immediate crawl-with-one-command a desired feature? Provided as Java code or shell script? On Tuesday 23 August

[jira] [Commented] (NUTCH-578) URL fetched with 403 is generated over and over again

2011-08-23 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13089382#comment-13089382 ] Markus Jelsma commented on NUTCH-578: - I just confirmed this is still an issue. I've

Re: The crawl command, keep or get rid of

2011-08-23 Thread Julien Nioche
What kind of shell script did you have in mind? The wiki already provides some useful scripts. The tutorials on Nutch also show commands that can be used in custom scripts. That's exactly my point. There are various scripts in the wiki, based on different versions of Nutch and of variable

[jira] [Created] (NUTCH-1087) Deprecate crawl command and replace with example script

2011-08-23 Thread Markus Jelsma (JIRA)
Deprecate crawl command and replace with example script --- Key: NUTCH-1087 URL: https://issues.apache.org/jira/browse/NUTCH-1087 Project: Nutch Issue Type: Task Affects Versions: 1.4

Re: The crawl command, keep or get rid of

2011-08-23 Thread Markus Jelsma
You're right: https://issues.apache.org/jira/browse/NUTCH-1087 On Tuesday 23 August 2011 13:24:27 Julien Nioche wrote: What kind of shell script did you have in mind? The wiki already provides some useful scripts. The tutorials on Nutch also show commands that can be used in custom

[jira] [Commented] (NUTCH-1087) Deprecate crawl command and replace with example script

2011-08-23 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13089405#comment-13089405 ] Andrzej Bialecki commented on NUTCH-1087: -- IIRC we had this discussion in the

[jira] [Created] (NUTCH-1088) Write Solr XML documents

2011-08-23 Thread Markus Jelsma (JIRA)
Write Solr XML documents Key: NUTCH-1088 URL: https://issues.apache.org/jira/browse/NUTCH-1088 Project: Nutch Issue Type: New Feature Components: indexer Reporter: Markus Jelsma

Re: The crawl command, keep or get rid of

2011-08-23 Thread Radim Kolar
I agree. Nuke crawl command

Re: The crawl command, keep or get rid of

2011-08-23 Thread Eric Pugh
I wonder if the name crawl implies that the command is a sort of standard command, and all you would need? After all, if I were to sit down with a crawler, it seems very logical that crawl would be how you run it! I like the simplicity of crawl from a getting started approach. I agree though

[jira] [Commented] (NUTCH-1086) Rewrite protocol-httpclient

2011-08-23 Thread Oleg Kalnichevski (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13089466#comment-13089466 ] Oleg Kalnichevski commented on NUTCH-1086: -- The 4.1.3 release of HttpCore patched

Re: Patch for httpResponse

2011-08-23 Thread Julien Nioche
Simone, Would you mind opening a JIRA for this, attaching your patch, and granting it to the ASF? I know it is fairly small but it makes it easier to track the progress, link to svn commits, etc... Thanks Julien On 23 August 2011 07:53, Simone Frenzel psimon...@googlemail.com wrote: --

[Nutch Wiki] Trivial Update of bin/nutch_generate by LewisJohnMcgibbney

2011-08-23 Thread Apache Wiki
Dear Wiki user, You have subscribed to a wiki page or wiki category on Nutch Wiki for change notification. The bin/nutch_generate page has been changed by LewisJohnMcgibbney: http://wiki.apache.org/nutch/bin/nutch_generate?action=diff&rev1=12&rev2=13 '''[-topN N]''': Where N is the number

[jira] [Commented] (NUTCH-1085) Nutch script does not require HADOOP_HOME

2011-08-23 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13089496#comment-13089496 ] Lewis John McGibbney commented on NUTCH-1085: - As well as being nice for

[jira] [Commented] (NUTCH-1086) Rewrite protocol-httpclient

2011-08-23 Thread Aravind Srini (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13089503#comment-13089503 ] Aravind Srini commented on NUTCH-1086: -- Thanks, Oleg for pitching in and confirming

[jira] [Updated] (NUTCH-1089) short compressed pages caused Exception

2011-08-23 Thread simone frenzel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] simone frenzel updated NUTCH-1089: -- Attachment: HttpResponsePatch.patch short compressed pages caused Exception

[jira] [Resolved] (NUTCH-1089) short compressed pages caused Exception

2011-08-23 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche resolved NUTCH-1089. -- Resolution: Fixed 1.4 Committed revision 1160753. trunk Committed revision 1160754 Thanks

[jira] [Commented] (NUTCH-1057) Make fetcher thread time out configurable

2011-08-23 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13089762#comment-13089762 ] Markus Jelsma commented on NUTCH-1057: -- I'd like to commit this issue this Friday

Build failed in Jenkins: Nutch-trunk #1583

2011-08-23 Thread Apache Jenkins Server
See https://builds.apache.org/job/Nutch-trunk/1583/ -- [...truncated 986 lines...] A src/plugin/subcollection/src/java/org/apache/nutch/collection/CollectionManager.java A