-- Forwarded message --
From: Simone Frenzel psimon...@googlemail.com
Date: 2011/8/22
Subject: Patch for httpResponse
To: dev-subscr...@nutch.apache.org
Hi,
I tested Nutch on different webpages. With short zipped pages it throws
an IOException:
java.io.IOException:
[ https://issues.apache.org/jira/browse/NUTCH-1086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13089314#comment-13089314 ]
Aravind Srini commented on NUTCH-1086:
--
Some transitive dependencies:
* Solr 3.1.0 ,
+1 let's replace it with a shell script instead.
On 22 August 2011 21:56, Markus Jelsma markus.jel...@openindex.io wrote:
Hi,
The crawl command seems to add a lot of confusion. It hides the entire crawl
cycle logic from new users, leading to questions, lack of understanding of
basic Nutch
In branch 1.4 at first. It should be easy to port to trunk, however. You're
more than welcome to contribute.
On Tue, Aug 23, 2011 at 12:28 AM, Markus Jelsma markus.jel...@openindex.io wrote:
Hi,
Please see Julien's comment in this recent thread:
Re: Future of Nutch 2.0 [Was:
[ https://issues.apache.org/jira/browse/NUTCH-578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Jelsma reassigned NUTCH-578:
---
Assignee: Markus Jelsma (was: Dennis Kubes)
URL fetched with 403 is generated over and over
What kind of shell script did you have in mind? The wiki already provides some
useful scripts. The tutorials on Nutch also show commands that can be used in
custom scripts.
Is an immediate crawl-with-one-command a desired feature? Provided as Java
code or shell script?
On Tuesday 23 August
[ https://issues.apache.org/jira/browse/NUTCH-578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13089382#comment-13089382 ]
Markus Jelsma commented on NUTCH-578:
-
I just confirmed this is still an issue. I've
What kind of shell script did you have in mind? The wiki already provides some
useful scripts. The tutorials on Nutch also show commands that can be used in
custom scripts.
That's exactly my point. There are various scripts in the wiki, based on
different versions of Nutch and of variable
Deprecate crawl command and replace with example script
---
Key: NUTCH-1087
URL: https://issues.apache.org/jira/browse/NUTCH-1087
Project: Nutch
Issue Type: Task
Affects Versions: 1.4
You're right: https://issues.apache.org/jira/browse/NUTCH-1087
On Tuesday 23 August 2011 13:24:27 Julien Nioche wrote:
What kind of shell script did you have in mind? The wiki already provides some
useful scripts. The tutorials on Nutch also show commands that can be used in
custom
[ https://issues.apache.org/jira/browse/NUTCH-1087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13089405#comment-13089405 ]
Andrzej Bialecki commented on NUTCH-1087:
--
IIRC we had this discussion in the
Write Solr XML documents
Key: NUTCH-1088
URL: https://issues.apache.org/jira/browse/NUTCH-1088
Project: Nutch
Issue Type: New Feature
Components: indexer
Reporter: Markus Jelsma
I agree. Nuke crawl command
I wonder if the name crawl implies that the command is a sort of standard
command and all you would need. After all, if I were to sit down with a
crawler, it seems very logical that crawl would be how you run it! I like
the simplicity of crawl from a getting-started approach. I agree though
[ https://issues.apache.org/jira/browse/NUTCH-1086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13089466#comment-13089466 ]
Oleg Kalnichevski commented on NUTCH-1086:
--
The 4.1.3 release of HttpCore patched
Simone,
Would you mind opening a JIRA for this, attaching your patch and granting it
to the ASF? I know it is fairly small, but it makes it easier to track the
progress, link to svn commits, etc.
Thanks
Julien
On 23 August 2011 07:53, Simone Frenzel psimon...@googlemail.com wrote:
--
Dear Wiki user,
You have subscribed to a wiki page or wiki category on Nutch Wiki for change
notification.
The bin/nutch_generate page has been changed by LewisJohnMcgibbney:
http://wiki.apache.org/nutch/bin/nutch_generate?action=diffrev1=12rev2=13
'''[-topN N]''': Where N is the number
[ https://issues.apache.org/jira/browse/NUTCH-1085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13089496#comment-13089496 ]
Lewis John McGibbney commented on NUTCH-1085:
-
As well as being nice for
[ https://issues.apache.org/jira/browse/NUTCH-1086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13089503#comment-13089503 ]
Aravind Srini commented on NUTCH-1086:
--
Thanks, Oleg, for pitching in and confirming
[ https://issues.apache.org/jira/browse/NUTCH-1089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
simone frenzel updated NUTCH-1089:
--
Attachment: HttpResponsePatch.patch
short compressed pages caused Exception
[ https://issues.apache.org/jira/browse/NUTCH-1089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Julien Nioche resolved NUTCH-1089.
--
Resolution: Fixed
1.4 Committed revision 1160753.
trunk Committed revision 1160754
Thanks
[ https://issues.apache.org/jira/browse/NUTCH-1057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13089762#comment-13089762 ]
Markus Jelsma commented on NUTCH-1057:
--
I'd like to commit this issue this Friday.
See https://builds.apache.org/job/Nutch-trunk/1583/
--
[...truncated 986 lines...]
A
src/plugin/subcollection/src/java/org/apache/nutch/collection/CollectionManager.java
A