Julien,

You might be right; the swap might be the bottleneck.
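For what it's worth, this is roughly how I plan to check it (a quick sketch using the standard procps tools, nothing Nutch-specific):

```shell
# Quick swap check: totals and current usage, values in MiB
free -m

# The kernel's own numbers, straight from procfs
grep -i '^Swap' /proc/meminfo

# If vmstat is available, watch swap-in/out rates (si/so columns)
# while a parse job runs; sustained non-zero values mean thrashing
{ command -v vmstat >/dev/null && vmstat 2 3; } || true
```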

On the other hand, I am only guessing, because I am not taking advantage of
map/reduce while running in local single-node mode. That means one mapper and
one reducer, I assume? Might that be an issue?

I am thinking of running it on cluster.

Would you be able to give some tips on the number of mappers and reducers I
should set? Say, a number of mappers per GiB of memory per node? If I use a
slave node with 8 GiB of memory, what would be a sensible number of mappers
for me to start with?

I have been browsing the wiki and the mailing list, and the advice ranges
from 2 mappers/reducers per box to 99 per box.
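To make the question concrete, here is the kind of thing I expect I'd be tuning. This is only a sketch assuming the classic Hadoop 1.x mapred-site.xml properties; the example numbers (4 map slots, 2 reduce slots, 1 GiB heap each on an 8 GiB slave) are my own guess, not something anyone here has confirmed:

```xml
<!-- mapred-site.xml: per-TaskTracker slot counts and per-task heap.
     On an 8 GiB slave, 4 map + 2 reduce slots at ~1 GiB heap each
     leaves headroom for the OS and the Hadoop daemons. -->
<configuration>
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>4</value>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>2</value>
  </property>
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx1024m</value>
  </property>
</configuration>
```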

Any tips would be lovely.

Cheers,

Ye

On Mon, Mar 11, 2013 at 4:56 PM, Julien Nioche <
[email protected]> wrote:

> > My guess is that 48 hours to parse 100k URLs is not efficient.
> >
>
> That's definitely not right. You mentioned that you are using a medium
> instance and that memory is at ~100% usage, so my guess would be that
> you are swapping a lot. Check the swap usage. Maybe move to a large
> instance instead and allow more memory per mapper / reducer, as the
> parsing step can be greedy.
>
>
> > Unfortunately, 100k is just the beginning for me. :( I am looking at 10
> > million URLs per fetch cycle, and I am looking for ideas and pointers on
> > how to gain speed. Maybe using/tweaking MapReduce would be the answer?
> >
>
> I think your problem is a lot more basic than that.
>
> Julien
>
> --
> Open Source Solutions for Text Engineering
>
> http://digitalpebble.blogspot.com/
> http://www.digitalpebble.com
> http://twitter.com/digitalpebble
>