Findings about Nutch-2.0 RC 1.
The Nutch job jar is not present in the binary archive. This means
distributed running of jobs is not supported. I'm not sure if this is a
problem (since users can always build one themselves), merely pointing it
out. The recently released 1.5 also lacks this job
Hmm, please ignore the point about the parse text being limited to 100 chars; this is
actually not the case. (It only happens in our branch, which has a fix for limiting
anchor texts that is not yet present in the nutchgora branch because it still needs
polishing.) So no need to wait for commits on my part.
On Wed, Jun 13, 2012 at
Hi Seb,
As Chris said, the issues you highlight well justify another RC.
I can shift it by the end of play today.
Thanks very much for having a look through, guys
Lewis
On Tue, Jun 12, 2012 at 11:33 PM, Sebastian Nagel
wastl.na...@googlemail.com wrote:
Hi Lewis,
my first steps with 2.0 (to
Dear Wiki user,
You have subscribed to a wiki page or wiki category on Nutch Wiki for change
notification.
The GORA_HBase page has been changed by LewisJohnMcgibbney:
http://wiki.apache.org/nutch/GORA_HBase?action=diff&rev1=11&rev2=12
- This document describes how to get Nutch to use HBase as a
The GORA_HBase page has been changed by LewisJohnMcgibbney:
http://wiki.apache.org/nutch/GORA_HBase?action=diff&rev1=12&rev2=13
This document describes how to get Nutch 2.0 to use
The GORA_HBase page has been changed by LewisJohnMcgibbney:
http://wiki.apache.org/nutch/GORA_HBase?action=diff&rev1=13&rev2=14
<value>org.apache.gora.hbase.store.HBaseStore</value>
The FrontPage page has been changed by LewisJohnMcgibbney:
http://wiki.apache.org/nutch/FrontPage?action=diff&rev1=240&rev2=241
* [[NutchMavenSupport|Using Nutch as a Maven dependency]]
The Nutch2Tutorial page has been changed by LewisJohnMcgibbney:
http://wiki.apache.org/nutch/Nutch2Tutorial
New page:
= Nutch 2.0 Tutorial =
The FrontPage page has been changed by LewisJohnMcgibbney:
http://wiki.apache.org/nutch/FrontPage?action=diff&rev1=241&rev2=242
* Nutch2Roadmap -- Discussions on the architecture and
Hi Seb,
Quick update
On Tue, Jun 12, 2012 at 11:33 PM, Sebastian Nagel
wastl.na...@googlemail.com wrote:
1 some guidance would be nice. README.txt points
to http://wiki.apache.org/nutch/NutchTutorial which refers to 1.x
Please see http://wiki.apache.org/nutch/Nutch2Tutorial which is an
update
Hi,
Seeing as we have the ball rolling with the 2.0 RC, I thought I'd ask
about a suitable project descriptor.
So far on trunk we have
** Apache Nutch is an open source web-search software project.
Stemming from Apache Lucene, it now builds on Apache Solr adding
web-specifics, such as a
Hi,
I would remove the 'experimental' notion. Aside from that it's fine with me.
Ferdy.
On Wed, Jun 13, 2012 at 2:29 PM, Lewis John Mcgibbney
lewis.mcgibb...@gmail.com wrote:
Hi,
Seeing as we have the ball rolling with the 2.0 RC, I thought I'd ask
about a suitable project descriptor.
[
https://issues.apache.org/jira/browse/NUTCH-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13294429#comment-13294429
]
Ferdy Galema commented on NUTCH-1342:
-
Do you have any clue as to why
Ferdy
The Nutch job jar is not present in the binary archive. This means
distributed running of jobs is not supported. I'm not sure if this is a
problem (since users can always build one themselves), merely pointing it
out. The recently released 1.5 also lacks this job jar, so at least no
and and array other document looks like a typo, rest is fine
On 13 June 2012 13:45, Ferdy Galema ferdy.gal...@kalooga.com wrote:
Hi,
I would remove the 'experimental' notion. Aside from that it's fine with
me.
Ferdy.
On Wed, Jun 13, 2012 at 2:29 PM, Lewis John Mcgibbney
The Nutch2Tutorial page has been changed by LewisJohnMcgibbney:
http://wiki.apache.org/nutch/Nutch2Tutorial?action=diff&rev1=2&rev2=3
The Nutch2Tutorial page has been changed by LewisJohnMcgibbney:
http://wiki.apache.org/nutch/Nutch2Tutorial?action=diff&rev1=3&rev2=4
This document describes how to get Nutch 2.0 to use
Hi Guys,
Whilst updating the Nutch2Tutorial I got to thinking that within Gora we don't
supply binary distributions of the code. This is because when using Gora a
user may wish or need to recompile the code to accommodate config changes
etc. We only supply src distributions...
Does this principle
+1 to the description w/o experimental too (I agree with Ferdy).
You guys ROCK.
Cheers,
Chris
On Jun 13, 2012, at 5:29 AM, Lewis John Mcgibbney wrote:
Hi,
Seeing as we have the ball rolling with the 2.0 RC, I thought I'd ask
about a suitable project descriptor.
So far on trunk we have
Hi Lewis,
Please see http://wiki.apache.org/nutch/Nutch2Tutorial which is an
update of Julien's (I think) page on GORA_HBase. This will get you
rocking with HBase. The changes between Cassandra, Accumulo and the
other data stores are fairly trivial.
I've managed to perform a crawl with 2.0
Lewis John McGibbney created NUTCH-1390:
---
Summary: readdb -url $url throws NPE with gora-cassandra
Key: NUTCH-1390
URL: https://issues.apache.org/jira/browse/NUTCH-1390
Project: Nutch
Lewis John McGibbney created NUTCH-1391:
---
Summary: readdb -stats fires java.io.EOFException
Key: NUTCH-1391
URL: https://issues.apache.org/jira/browse/NUTCH-1391
Project: Nutch
Issue
Lewis John McGibbney created NUTCH-1392:
---
Summary: -force and -resume arguments being ignored in ParserJob
Key: NUTCH-1392
URL: https://issues.apache.org/jira/browse/NUTCH-1392
Project: Nutch
Lewis John McGibbney created NUTCH-1393:
---
Summary: Display consistent usage of GeneratorJob with 1.X
Key: NUTCH-1393
URL: https://issues.apache.org/jira/browse/NUTCH-1393
Project: Nutch
Lewis John McGibbney created NUTCH-1394:
---
Summary: backport NUTCH-1232 Remove host field from index-basic
Key: NUTCH-1394
URL: https://issues.apache.org/jira/browse/NUTCH-1394
Project: Nutch
Hi Sebastian,
On Wed, Jun 13, 2012 at 11:30 PM, Sebastian Nagel
wastl.na...@googlemail.com wrote:
I've managed to perform a crawl with 2.0 and HBase: it rocks, indeed.
Much simpler than 1.x (no segments!).
:0)
% ./bin/nutch readdb -stats
WebTable statistics start
WebTableReader:
[
https://issues.apache.org/jira/browse/NUTCH-1392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13294730#comment-13294730
]
Lewis John McGibbney commented on NUTCH-1392:
-
Additionally this issue should