-Original Message-
From: Lewis John Mcgibbney lewis.mcgibb...@gmail.com
To: user user@nutch.apache.org
Sent: Wed, Feb 20, 2013 12:56 pm
Subject: Re: nutch with cassandra internal network usage
Hi Alex,
On Wed, Feb 20, 2013 at 11:54 AM, alx...@aim.com wrote:
The generator also does not have filters. Its mapper goes over all
records as far as I know. If you use hadoop you can see how many
records go ... hbase or sent to hbase as
filters to select a subset of all hbase records?
Thanks.
Alex.
Hi list,
we're experimenting with nutch 2.1 and cassandra 1.2.1 (on different hosts).
Our cassandra 'webpage' store has about 31GB right now on disk, we add
URLs by 'injecting' them, about 100k-300k per cycle.
When starting a 'fetch' run, it now needs about an hour before the
queues are set up
Hi Roland,
You say you start a fetch run; does this mean the FetcherJob or the
GeneratorJob? What kind of settings do you run your Nutch server with?
On Wednesday, February 20, 2013, Roland rol...@rvh-gmbh.de wrote:
Hi list,
we're experimenting with nutch 2.1 and cassandra 1.2.1 (on different hosts).
Hi Lewis,
the GeneratorJob takes only ~5 minutes.
I'm running it in standalone mode, like this:
./bin/nutch fetch 1361367698-1708119958 -threads 40
It's configured to fetch and parse, but it makes no difference if it only
fetches:
FetcherJob: starting
FetcherJob: batchId: 1361367698-1708119958
I am assuming that your generate.max.count property is set to the
default of -1? Have you tried configuring more, smaller batchIds (fetch
lists)?
I don't have an immediate answer as to why, overall, the FetcherJob is
taking this amount of time and resources
On Wednesday, February 20, 2013,
.
Alex.
-Original Message-
From: Roland rol...@rvh-gmbh.de
To: user user@nutch.apache.org
Sent: Wed, Feb 20, 2013 10:56 am
Subject: Re: nutch with cassandra internal network usage
Hi Lewis,
the GeneratorJob takes only ~5 minutes.
I'm running it in standalone mode, like this:
./bin
Hi Alex,
the GeneratorJob seems to have a solution for that, if not it would
iterate over all records too, am I right?
--Roland
On 20.02.2013 20:42, alx...@aim.com wrote:
Hi,
This is because fetch's mapper goes over all records and selects those that have
the given batchId. Currently
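To make the cost described above concrete, here is a minimal sketch in Python. It is not Nutch code: the in-memory `records` dict, the `batch_id` field, and both function names are hypothetical stand-ins for the 'webpage' store and the mapper. It only illustrates why selecting one batch by scanning every record costs time proportional to the whole store, and contrasts that with a hypothetical lookup keyed by batch. Nothing in this thread says that Nutch or Cassandra offers such a lookup.

```python
from collections import defaultdict


def select_by_full_scan(records, batch_id):
    """Visit every record and keep the keys marked with batch_id.

    Mirrors the behaviour described in the thread: the filter is applied
    only after each record has been read, so the number of records
    scanned equals the size of the store, not the size of the batch.
    """
    scanned = 0
    selected = []
    for key, row in records.items():
        scanned += 1
        if row.get("batch_id") == batch_id:
            selected.append(key)
    return selected, scanned


def build_batch_index(records):
    """Hypothetical alternative: group keys by batch id once, up front."""
    index = defaultdict(list)
    for key, row in records.items():
        index[row.get("batch_id")].append(key)
    return index


# Toy store: 10,000 records, of which every 100th belongs to "batch-a".
records = {
    f"url-{i}": {"batch_id": "batch-a" if i % 100 == 0 else "batch-b"}
    for i in range(10_000)
}

selected, scanned = select_by_full_scan(records, "batch-a")
index = build_batch_index(records)

print("records scanned:", scanned)
print("records selected:", len(selected))
print("records returned by index lookup:", len(index["batch-a"]))
```

In the toy store the scan reads every record to return a small fraction of them, which is consistent with a fetch run spending a long time reading from the store before any queue is filled.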
-gmbh.de
To: user user@nutch.apache.org
Sent: Wed, Feb 20, 2013 11:47 am
Subject: Re: nutch with cassandra internal network usage
Hi Alex,
the GeneratorJob seems to have a solution for that, if not it would
iterate over all records too, am I right?
--Roland
On 20.02.2013 20:42, alx
Hi,
Please head over to most recent thread on dev@ for potential improvements
for the Generator* code.
Thanks for invoking this discussion, it is well overdue.
Lewis
On Wed, Feb 20, 2013 at 12:55 PM, Lewis John Mcgibbney
lewis.mcgibb...@gmail.com wrote:
Hi Alex,
On Wed, Feb 20, 2013 at
Subject: Re: nutch with cassandra internal network usage
Hi Alex,
On Wed, Feb 20, 2013 at 11:54 AM, alx...@aim.com wrote:
The generator also does not have filters. Its mapper goes over all
records as far as I know. If you use hadoop you can see how many records go
as input to mappers. Also