Hi,
I am not able to solve this problem.
Any ideas?
Brian Tingle wrote:
Hey,
I'm playing around with Nutch on Hadoop; when I run
hadoop jar nutch-1.0.job org.apache.nutch.crawl.Crawl -threads ... is
that threads per node or total threads for all nodes?
Threads per map task - if you run multiple map tasks per node then you
will get that many fetcher threads for each of those tasks, so the
per-node total is threads times map tasks.
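For example, -threads 10 with two map tasks on a node means 20 fetcher
threads on that node. The -threads option just sets the
fetcher.threads.fetch property, so you can also pin it in
conf/nutch-site.xml (a minimal sketch; the value 10 is made up):

  <!-- number of fetcher threads per map task -->
  <property>
    <name>fetcher.threads.fetch</name>
    <value>10</value>
  </property>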
On Thu, Jul 23, 2009 at 10:58, Beats tarun_agrawal...@yahoo.com wrote:
Hi,
I am not able to solve this problem.
The crawl command uses crawl-urlfilter.txt, while the
inject/generate/etc. commands use other files (such as
regex-urlfilter.txt), so you should check your filters.
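For example, if crawl-urlfilter.txt allows your domain with the stock
pattern, make sure regex-urlfilter.txt carries the same allow rule
(example.com is a placeholder for your own domain):

  # accept hosts in example.com
  +^http://([a-z0-9]*\.)*example.com/

Otherwise inject/generate will silently filter out your seed URLs.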
Any ideas?
Why does your example say both monster.crawl and test.crawl?
Are you perhaps entering the command wrong or is this just an error in
the email?
Alex
2009/7/18 Beats tarun_agrawal...@yahoo.com:
Hi,
I am getting this weird error (at least for me):
I am trying to crawl some web pages.
Sorry for the error,
it is just a typing error.
Thanks for replying.
alexmc wrote:
Why does your example say both monster.crawl and test.crawl?
Are you perhaps entering the command wrong or is this just an error in
the email?
Alex
2009/7/18 Beats tarun_agrawal...@yahoo.com:
Hi,
I
Hi,
In ParseData the following fields are there:
Version, Status, Title, Outlinks...
I want to add a new field like location. I will get this during parsing
in the HTML parser. How can I set new fields so that they are visible
in ParseData?
You can add it to ParseData's metadata.
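Roughly, with the Nutch 1.0 API, you would write an HtmlParseFilter
plugin and set the value in the parse metadata. A minimal sketch (the
"location" key and extractLocation() are placeholders for your own
logic, and the usual plugin.xml wiring is omitted):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.nutch.parse.HTMLMetaTags;
  import org.apache.nutch.parse.HtmlParseFilter;
  import org.apache.nutch.parse.Parse;
  import org.apache.nutch.parse.ParseResult;
  import org.apache.nutch.protocol.Content;
  import org.w3c.dom.DocumentFragment;

  public class LocationParseFilter implements HtmlParseFilter {
    private Configuration conf;

    public ParseResult filter(Content content, ParseResult parseResult,
                              HTMLMetaTags metaTags, DocumentFragment doc) {
      Parse parse = parseResult.get(content.getUrl());
      String location = extractLocation(doc); // your extraction logic
      if (location != null) {
        // this ends up in ParseData's metadata
        parse.getData().getParseMeta().set("location", location);
      }
      return parseResult;
    }

    private String extractLocation(DocumentFragment doc) {
      return null; // placeholder for parsing the document
    }

    public void setConf(Configuration conf) { this.conf = conf; }
    public Configuration getConf() { return conf; }
  }

You can check the result with bin/nutch readseg -dump; the new key
shows up in the ParseData metadata.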
On Thu, Jul 23, 2009 at 13:38, Saurabh Suman saurabhsuman...@rediff.com wrote:
Hi,
In ParseData the following fields are there:
Version, Status, Title, Outlinks...
I want to add a new field like location. I will get this during parsing
in the HTML parser. How
Hi All,
I'm sure I've read somewhere before that URLs formed like
http://www.site.com/categories.asp?cid=25&page=9
can't be crawled. Is that true?
Warmest Regards,
Zaihan
Because?
You mean URLs which contain a query part?
They can be crawled.
The default Nutch configuration excludes them by this filter rule in
conf/crawl-urlfilter.txt:
# skip URLs containing certain characters as probable queries, etc.
-[?*!@=]
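If you do want such dynamic URLs fetched, a common fix is simply to
comment that rule out (and do the same in regex-urlfilter.txt, which
the inject/generate commands use):

  # skip URLs containing certain characters as probable queries, etc.
  # -[?*!@=]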
Zaihan wrote:
Hi All,
I'm sure I've read
Thanks for the pointer to the LocalFetchRecover tool. It seems there
were some changes to the Hadoop API since Nutch 0.8.1, so this tool
didn't work initially. I've made what I think are the correct changes
and have attached them. (Hopefully the attachment gets through.) I
put together a
Hi guys,
I'm in the middle of a very long fetch phase, too long actually. I would
like to stop it but not lose 5 days of fetching.
Is there anything I can do?
--
-MilleBii-
Thanks, I eventually found where the job trackers were, in the :50030 web
page of the Cloudera thing, and I saw it said 10 threads for each
crawler in the little status update box where it was telling me how far
along each crawl was. I have to say, this whole thing (Nutch/Hadoop) is
pretty
On Thu, Jul 23, 2009 at 21:29, MilleBii mille...@gmail.com wrote:
Hi guys,
I'm in the middle of a very long fetch phase, too long actually. I would
like to stop it but not lose 5 days of fetching.
Is there anything I can do?
No, unfortunately Nutch 1.0 does not have that feature. But we
How do I set the number of map tasks when I run a command like
hadoop jar nutch-1.0.job org.apache.nutch.crawl.Crawl
?
I think I'm going to try out the change below. Is there any reason not
to do it, or is Crawl supposed to be more of a demo, and should I write
some script or my own
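For what it's worth, one knob that usually works (a sketch; the value
20 is made up, and mapred.map.tasks is only a hint that Hadoop combines
with the number of input splits) is to set the property in the
configuration the job picks up, e.g. conf/nutch-site.xml:

  <property>
    <name>mapred.map.tasks</name>
    <value>20</value>
  </property>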