Dear Wiki user,
You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change
notification.
The following page has been changed by MarcinOkraszewski:
http://wiki.apache.org/nutch/Nutch0%2e9-Hadoop0%2e10-Tutorial
The comment on the change is:
Troubleshoo
Dear Wiki user,
You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change
notification.
The following page has been changed by AlessioTomasino:
http://wiki.apache.org/nutch/Nutch_0%2e9_Crawl_Script_Tutorial
---
e been fixed.
Hadoop needs a tmp directory to execute jobs in a distributed
fashion. I usually point mine to C:\tmp. Hadoop will also create some
directories related to its filesystem. The main directories you will
work with will be your crawl directory and its subfolders crawldb, linkdb,
indexes,
The Nutch 0.8 tutorial has a section and it is pretty straightforward.
Make sure to remember to change the nutch-site.xml file and fill in your
username.
I have had mixed results with Cygwin and Nutch (so make backups etc.).
Cheers
op or DFS [I don't mind if they are running "under the hood"].
Later on if the initial study is successful, I will of course
switch to the full blown Nutch with Hadoop+DFS+Distributed Search.
(Q1) What tutorial do I need to follow to get Nutch 9.12
to crawl and index on
[ http://issues.apache.org/jira/browse/NUTCH-340?page=all ]
Sami Siren resolved NUTCH-340.
--
Fix Version/s: (was: 0.8.1)
Resolution: Fixed
I just committed this to svn trunk and updated the website, thanks!
> Bug(s) in 0.8 tutor
[ http://issues.apache.org/jira/browse/NUTCH-340?page=all ]
Uros Gruber updated NUTCH-340:
--
Attachment: patch.txt
I hope it's ok now.
> Bug(s) in 0.8 tutorial
> --
>
> Key: NUTCH-340
>
/HowToContribute
for instructions on how to construct a patch file for nutch and recreate that.
> Bug(s) in 0.8 tutorial
> --
>
> Key: NUTCH-340
> URL: http://issues.apache.org/jira/browse/NUTCH-340
> Project: Nutch
>
[ http://issues.apache.org/jira/browse/NUTCH-340?page=all ]
Uros Gruber updated NUTCH-340:
--
Attachment: patch.txt
I found those two problems while reading the 0.8 tutorial. Maybe I missed
something else.
> Bug(s) in 0.8 tutor
Bug(s) in 0.8 tutorial
--
Key: NUTCH-340
URL: http://issues.apache.org/jira/browse/NUTCH-340
Project: Nutch
Issue Type: Bug
Components: documentation
Affects Versions: 0.8
Reporter: Sami Siren
Hi Tyrell,
Could you please help me find your tutorial on the wiki pages? Somehow, I
can't find it on the Nutch wiki (http://wiki.apache.org/nutch/).
Also, searching for "tutorial" didn't give me the expected result:
http://wiki.apache.org/nutch/?action=fullsearch&context=180&am
I want to contribute my work to Nutch. I am a newbie here, so some help
would be appreciated.
I tried to use the J2EE plugin (Lobmoz), but it was impossible to build it
as a web app. I also want to change the web content so that it looks neat.
Cheers,
Jackey Yang
Hi,
I reported some typos and incomplete information in the Nutch 0.8 tutorial
some time ago. It seems that all committers and volunteers are busy
with more important issues, so I took this opportunity and now I am
proud to present my *first-small-humble-patch-ever*.
Please review the patch and let
Hi,
The Nutch 0.8 tutorial (see:
http://lucene.apache.org/nutch/tutorial8.html), in the whole-web indexing
paragraph, says: bin/nutch index indexes crawl/linkdb
crawl/segments/*
Shouldn't it say: bin/nutch index crawl/indexes crawl/crawldb
crawl/linkdb segment#1_path [segment#2
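The argument order proposed in this message can be sketched as a dry run. This assumes the proposed order (index directory, crawldb, linkdb, then one or more segments) is the intended usage. The crawl layout and segment names are placeholders, and the command is only printed, not executed.

```shell
#!/bin/sh
# Placeholder crawl layout; segment names are made up for illustration.
crawl=$(mktemp -d)/crawl
mkdir -p "$crawl/crawldb" "$crawl/linkdb" \
         "$crawl/segments/20060101000000" "$crawl/segments/20060102000000"

# Proposed order: <index dir> <crawldb> <linkdb> <segment> [<segment> ...]
set -- "$crawl/indexes" "$crawl/crawldb" "$crawl/linkdb" "$crawl"/segments/*
nargs=$#
cmd="bin/nutch index $*"

# Dry run: show the command instead of invoking Nutch.
echo "$cmd"
```

Printing the command first makes it easy to check that every segment path was picked up before running the indexer for real.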
Hi Jake,
Updated the wiki. My tutorial builds on the previous ones and goes on
to identify the possible errors that can be encountered. It also
contains an index update shell script, which I have used in my
projects. There are some duplications, which were necessary to maintain
context. Hope
e new wiki pages (see
the help section on the wiki for more specific instructions).
You might want to take a look at the existing tutorials to avoid
duplication:
* Website tutorial for 0.7 -
http://lucene.apache.org/nutch/tutorial.html
* Website tutorial for 0.8 -
http://lucene.apache.org/
Hi,
I would like to submit a tutorial I have prepared on implementing
and maintaining a Nutch installation. Please tell me how to proceed.
The tutorial draft is complete and I would like your feedback.
Thanks and Regards,
Tyrell
+1
If we go with that idea, then the one on the website should be the
tutorial for the latest release with a link to the wiki for the dev version of
the tutorial and a note explaining that tutorials for older versions come with
the source.
Jake.
-Original Message-
From
> My motivation is to have usable version of tutorial - as simple as it is
> possible to be versioned with the sources - only for historical purposes
> - if somebody wants to use nutch 0.7 a year from now he will be able to
> find a tutorial for it without problems.
+1
But for m
Oops, sorry for ignoring this discussion - I was looking for comments in
JIRA and had already committed the change before reading your discussion.
My motivation is to have usable version of tutorial - as simple as it is
possible to be versioned with the sources - only for historical purposes
- if
each tutorial stating that more detailed tutorials
are available on Nutch Wiki.
> Changed the links to the tutorial to point to the wiki
> --
>
> Key: NUTCH-225
> URL: http://issues.apache.org/jira/browse/NUTCH-225
+1
Site tutorial links pointing to wiki tutorials is the best option.
Jeff.
Richard Braman wrote:
+1. No need for 2 tutorials. The only discrepancy I saw was that the
invertlinks command is not in 0.7. I updated the wiki to note that that
command only applies to 0.8
-Original Message
@lucene.apache.org
Subject: Tutorial
This is in response to Piotr's comment to my JIRA entry
(http://issues.apache.org/jira/browse/NUTCH-225). I haven't been
subscribed to this list, so I'm afraid I missed the discussion about the
tutorial that went on here.
After
This is in response to Piotr's comment to my JIRA entry
(http://issues.apache.org/jira/browse/NUTCH-225). I haven't been
subscribed to this list, so I'm afraid I missed the discussion about the
tutorial that went on here.
After getting Piotr's comment I wen
[
http://issues.apache.org/jira/browse/NUTCH-225?page=comments#action_12369405 ]
Piotr Kosiorowski commented on NUTCH-225:
-
As stated in another thread I prefer to have a simple tutorial kept in version
control with releases.
We already have a
Changed the links to the tutorial to point to the wiki
--
Key: NUTCH-225
URL: http://issues.apache.org/jira/browse/NUTCH-225
Project: Nutch
Type: Improvement
Versions: 0.8-dev
Reporter: Jake Vanderdray
Walking through the tutorial
http://lucene.apache.org/nutch/tutorial.html
and just a little suggestion. For the
s1=`ls -d segments/2* | tail -1`
s2=`ls -d segments/2* | tail -1`
s3=`ls -d segments/2* | tail -1`
I suggest using \ls just in case users have an alias
like
alias ls='ls
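A minimal sketch of the suggestion, assuming a POSIX-style shell. The scratch directory and segment names are made up for illustration. A leading backslash makes the shell skip alias expansion for that word, so the listing stays plain even if ls is aliased to add colours or extra flags.

```shell
#!/bin/sh
# Scratch area with fake segment directories (names are illustrative only).
workdir=$(mktemp -d)
mkdir -p "$workdir/segments/20050101000000" "$workdir/segments/20050102000000"
cd "$workdir" || exit 1

# \ls skips any user alias for ls; tail -1 keeps the last name listed,
# which is the newest one because timestamp-style names sort in time order.
s1=$(\ls -d segments/2* | tail -1)
echo "$s1"
```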
directory. It also looks
> like the name of the file doesn't matter. So I made a
> myurls directory, put a urls file in there and then
> ran
>
> bin/nutch crawl myurls -dir crawl.test -depth 3
>
> But, yeah, would like to put such steps in a tutorial.
>
>
> I
le in there and then
ran
bin/nutch crawl myurls -dir crawl.test -depth 3
But, yeah, would like to put such steps in a tutorial.
It looks like the front page got hit, and that's about
it, so there is more to do.
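The steps described in this message could be written down roughly as follows. This is a sketch under assumptions: the seed URL is a placeholder, and the crawl command is only printed because it needs a Nutch checkout to run.

```shell
#!/bin/sh
# Work in a scratch directory so nothing existing is touched.
cd "$(mktemp -d)" || exit 1

# A directory holding the seed list, with a file named urls inside it.
mkdir myurls
echo "http://lucene.apache.org/nutch/" > myurls/urls   # placeholder seed URL

# The crawl invocation from the message, printed rather than run.
crawl_cmd="bin/nutch crawl myurls -dir crawl.test -depth 3"
echo "$crawl_cmd"
```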
Earl
--- Earl Cahill <[EMAIL PROTECTED]> wrote:
> howdy,
>
> I
howdy,
I have been looking around for a nutch/mapred tutorial
and haven't had much luck. I found this one
http://lucene.apache.org/nutch/tutorial.html
which did help me get a crawl going on trunk, but no
such luck in branches/mapred. I set the urls file and
the filter in the same way t
+1
Piotr Kosiorowski wrote:
Hello,
Some time ago someone mentioned on the list a problem with nutch
tutorial (I cannot find this email now). I have checked it today and
he/she was right. If you follow the Nutch Intranet Crawling tutorial
you will end up with a not very interesting index.
This is
Piotr Kosiorowski wrote:
I can commit such changes for the 0.7 release (it means today) if I get
positive feedback from other committers.
+1
--
Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _ __
[__ || __|__/|__||\/| Information Retrieval, Semantic
Hello,
Some time ago someone mentioned on the list a problem with nutch
tutorial (I cannot find this email now). I have checked it today and
he/she was right. If you follow the Nutch Intranet Crawling tutorial
you will end up with a not very interesting index.
This is because it recommends users to