Hi Julien,
Any update on the MongoDB plugin for Nutch?
Using https://github.com/ctjmorgan/nutch-mongodb-indexer is a problem for me,
as I don't know how to create a new package and I can't find the Ivy folders.
It's way too complex for a non-Java developer.
Currently I have installed Nutch 1.6 on my
Hi,
Changing the Hadoop jar file to a lower version solved the issue.
I removed hadoop-core-1.0.3.jar from the lib folder and replaced it with the
hadoop-core-0.20.2.jar file.
Sebastian Nagel wrote
Hi,
that's a known problem with Hadoop on Windows / Cygwin:
I get a similar error for Nutch 2.1. How do I fix it?
Buildfile: C:\apache-nutch-2.1\build.xml
[taskdef] Could not load definitions from resource org/sonar/ant/antlib.xml. It could not be found.
ivy-probe-antlib:
ivy-download:
[taskdef] Could not load definitions from resource
, January 27, 2013, peterbarretto <peterbarretto08@> wrote:
I want to increase the number of URLs fetched at a time in Nutch. I have
around 10 websites to crawl, so how can I crawl all the sites at once?
Right now I am fetching 1 site with a fetch delay of 2 seconds, but it is
too slow.
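As a rough back-of-the-envelope sketch: assuming each host gets its own fetch queue, the per-queue politeness delay (fetcher.server.delay in nutch-default.xml) dominates download time, and there are enough fetcher threads (fetcher.threads.fetch), throughput is about one URL per delay interval per active queue. Fetching all 10 sites in one round instead of one site at a time therefore multiplies throughput by roughly the number of sites. The class and method names are illustrative only, not part of Nutch.

```java
public class FetchThroughput {

    // Approximate URLs fetched per second when each queue (e.g. one per host)
    // waits delaySeconds between requests; download time is ignored.
    static double urlsPerSecond(int activeQueues, double delaySeconds) {
        if (delaySeconds <= 0) {
            throw new IllegalArgumentException("delay must be positive");
        }
        return activeQueues / delaySeconds;
    }

    public static void main(String[] args) {
        double oneSite = urlsPerSecond(1, 2.0);   // crawling a single site
        double tenSites = urlsPerSecond(10, 2.0); // all 10 sites in parallel
        System.out.println("1 site:   " + oneSite + " URLs/s");
        System.out.println("10 sites: " + tenSites + " URLs/s");
    }
}
```

In practice this only holds if the generated fetch list actually contains URLs from all 10 sites in the same segment.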
On 25.01.2013 19:51, Gora Mohanty wrote:
On 25 January 2013 16:05, peterbarretto <peterbarretto08@> wrote:
I still get the error below after setting the Java home (JAVA_HOME) variable:
<http://lucene.472066.n3.nabble.com/file/n4036204/nutch_java_home_error.png>
Not sure of how much
Hi Tejas,
I changed generate.count.mode to domain and generate.max.count to 100,
but it still shows the queue mode as byhost rather than by domain.
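For context, in Nutch 1.x the generator's counting mode (generate.count.mode) and the fetcher's queue grouping are separate settings: the fetcher groups URLs according to fetcher.queue.mode (byHost, byDomain or byIP). The sketch below only illustrates what grouping by host versus by domain means for queue keys. Its domain extraction is a deliberate simplification (it keeps the last two host labels), whereas Nutch consults a domain-suffix list, so names such as example.co.uk would be handled differently there. The class and method names are hypothetical.

```java
import java.net.URI;
import java.util.Locale;

public class QueueKeys {

    // Simplified stand-in for a fetch queue key: scheme + "://" + host or domain.
    static String queueKey(String url, String mode) {
        URI uri = URI.create(url);
        String host = uri.getHost().toLowerCase(Locale.ROOT);
        String key;
        if ("byDomain".equals(mode)) {
            // Simplification: keep only the last two labels of the host name.
            String[] labels = host.split("\\.");
            int n = labels.length;
            key = n >= 2 ? labels[n - 2] + "." + labels[n - 1] : host;
        } else {
            // "byHost": every distinct host name gets its own queue.
            key = host;
        }
        return uri.getScheme().toLowerCase(Locale.ROOT) + "://" + key;
    }

    public static void main(String[] args) {
        String a = "http://www.example.com/page1";
        String b = "http://news.example.com/page2";
        // byHost: two different queues, each with its own politeness delay.
        System.out.println(queueKey(a, "byHost") + " vs " + queueKey(b, "byHost"));
        // byDomain: both URLs share one queue.
        System.out.println(queueKey(a, "byDomain") + " vs " + queueKey(b, "byDomain"));
    }
}
```

If the fetcher log still reports byhost on Nutch 1.x, fetcher.queue.mode in nutch-site.xml is probably the setting to check.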
peterbarretto wrote
Hi Tejas
The fetcher.threads.per.host property has been deprecated and replaced
with fetcher.threads.per.queue.
I am not sure
to add the code and all.
Jorge Luis Betancourt Gonzalez wrote
I suppose you could write a custom indexer to store the data in MongoDB
instead of Solr. I think there is an open repo on GitHub for this.
- Original message -
From: peterbarretto <peterbarretto08@>
To: user
Not sure why you see the queue mode as byhost and not by domain. Did it
print that in the logs?
I should have asked you this before: are you using Nutch 1.x or 2.x?
thanks,
Tejas Patil
On Tue, Jan 29, 2013 at 12:08 AM, peterbarretto <peterbarretto08@> wrote:
Hi
mcgibbney wrote
You are not getting very many URLs!
On Tue, Jan 29, 2013 at 8:29 PM, peterbarretto <peterbarretto08@> wrote:
2013-01-29 08:44:35,014 INFO crawl.CrawlDbReader - TOTAL urls: 96404
2013-01-29 08:44:35,018 INFO crawl.CrawlDbReader - status 1 (db_unfetched): 85672
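Reading the two CrawlDbReader figures above, the share of the CrawlDb still in status db_unfetched can be computed directly. A small sketch using the numbers from the log; the class name is illustrative only.

```java
public class CrawlDbShare {

    // Percentage of CrawlDb entries that are still unfetched.
    static double unfetchedPercent(long total, long unfetched) {
        if (total <= 0) {
            throw new IllegalArgumentException("total must be positive");
        }
        return 100.0 * unfetched / total;
    }

    public static void main(String[] args) {
        long total = 96404;      // "TOTAL urls" line of the CrawlDbReader log
        long unfetched = 85672;  // "status 1 (db_unfetched)" line
        System.out.printf("unfetched: %d of %d (%.1f%%)%n",
                unfetched, total, unfetchedPercent(total, unfetched));
        System.out.println("other statuses: " + (total - unfetched));
    }
}
```

Roughly 89% of the known URLs (85672 of 96404) had not been fetched yet at that point, leaving 10732 entries in other statuses.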
I seem to be getting one issue with javac:
On Tue, Jan 29, 2013 at 8:39 PM, peterbarretto <peterbarretto08@> wrote:
C:\nutch-16\src\java\org\apache\nutch\indexer\mongodb\MongodbWriter.java:18: error: MongodbWriter is not abstract and does not override abstract method delete(String
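This javac message means the class declares that it implements an interface but provides no body for one of that interface's abstract methods (here delete(String...)). A self-contained sketch of the shape of the fix; IndexWriterLike and InMemoryWriter are hypothetical stand-ins, not the real NutchIndexWriter, whose full method list is not shown in this thread.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical interface standing in for the real index writer interface.
interface IndexWriterLike {
    void write(String key, String text) throws IOException;
    void delete(String key) throws IOException;
}

// Compiles only because it provides a body for every abstract method;
// leaving out delete(String) reproduces the "is not abstract and does not
// override abstract method" error.
class InMemoryWriter implements IndexWriterLike {
    final Map<String, String> docs = new HashMap<>();

    @Override
    public void write(String key, String text) throws IOException {
        docs.put(key, text);
    }

    @Override
    public void delete(String key) throws IOException {
        docs.remove(key);
    }
}

public class WriterDemo {
    public static void main(String[] args) throws IOException {
        InMemoryWriter w = new InMemoryWriter();
        w.write("http://example.com/", "hello");
        w.delete("http://example.com/");
        System.out.println("documents left: " + w.docs.size());
    }
}
```

A delete method that simply returns also satisfies the compiler; it just means deletions are never propagated to the backing store.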
, 2013 at 8:06 PM, Lewis John Mcgibbney <lewis.mcgibbney@> wrote:
and will hopefully have patches for
Nutch trunk cooked up for tomorrow.
I'll update this thread likewise.
Thanks
Lewis
On Wed, Jan 30, 2013 at 10:02 PM, peterbarretto <peterbarretto08@> wrote:
Hi Lewis,
I am new to Java and I don't know how to inherit all the public methods from
NutchIndexWriter
, peterbarretto <peterbarretto08@> wrote:
Hi Lewis,
I managed to get the code working by adding the function below to
MongodbWriter.java, inside the class declared as
public class MongodbWriter implements NutchIndexWriter:
public void delete(String key) throws IOException {
    // Does nothing: only satisfies the interface's abstract delete(String) method.
    return;
}
Hi Lewis,
Is this patch done?
lewis john mcgibbney wrote
Hi,
Once I get access to my office I am going to build the patches from trunk.
Is it trunk that you are using?
Thanks
Lewis
On Fri, Feb 8, 2013 at 9:00 PM, peterbarretto <peterbarretto08@> wrote:
Hi Lewis,
I managed
, peterbarretto <peterbarretto08@> wrote:
the crawled URLs to MongoDB.
I can get the HTML content of crawled URLs from the readseg -dump command in
Nutch 1.6, so I guess it should be possible to get the full HTML along with
just the text part?
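Since readseg -dump prints the raw fetched content as well as the parsed text, a writer could in principle store both. A minimal sketch, assuming the raw page bytes and the parsed text are both available to the writer and that the page charset is known; the method and field names ("url", "text", "html") are hypothetical and are not the API of the MongoDB indexer discussed in this thread.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class HtmlAndTextDoc {

    // Builds one document holding both the parsed text and the raw HTML.
    // Field names are hypothetical choices for illustration.
    static Map<String, Object> buildDoc(String url, String parsedText,
                                        byte[] rawContent, Charset charset) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("url", url);
        doc.put("text", parsedText);
        // Raw fetched bytes must be decoded with the page's charset.
        doc.put("html", new String(rawContent, charset));
        return doc;
    }

    public static void main(String[] args) {
        byte[] raw = "<html><body><p>Hello</p></body></html>"
                .getBytes(StandardCharsets.UTF_8);
        Map<String, Object> doc =
                buildDoc("http://example.com/", "Hello", raw, StandardCharsets.UTF_8);
        System.out.println(doc);
    }
}
```

Storing the full HTML makes each record considerably larger than the text alone, so it may be worth making it optional.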
lewis john mcgibbney wrote
Hi Peter
On Saturday, February 16, 2013, peterbarretto
Hi Lewis,
I tried applying the patch on 2.1, but it gives the error below:
patching file pom.xml
patching file ivy/ivy.xml
Hunk #1 succeeded at 34 with fuzz 2 (offset 4 lines).
patching file src/bin/nutch
Hunk #1 FAILED at 61.
Hunk #2 succeeded at 220 with fuzz 2 (offset 2 lines).
1 out of 2 hunks