I agree.
I can sort this tomorrow.
@Kiran,
Are we still adding documentation contributors by entering their wiki UID in
the Contributor and Admin groups?
Tejas can and should be added to both groups.
thanks
lewis

On Monday, April 22, 2013, Tejas Patil <tejas.patil...@gmail.com> wrote:
> Hi Lewis,
>
> Thanks !!
> I have huge respect for those who engineered the Fetcher class (esp. of
> 1.x), as it's simply an *awesome* and complex piece of code.
> I can polish my post further so that it reaches "wiki" quality. I don't
> have access to the wiki. Can you provide me with access?
>
> Thanks,
> Tejas
>
>
> On Mon, Apr 22, 2013 at 8:09 PM, Lewis John Mcgibbney <lewis.mcgibb...@gmail.com> wrote:
>
>> hi Tejas,
>> this is a really excellent and very useful reply.
>> It would be great if we could somehow have this kind of low-level
>> information readily available on the Nutch wiki.
>>
>> On Monday, April 22, 2013, Tejas Patil <tejas.patil...@gmail.com> wrote:
>> > Fetcher threads try to get a fetch item (URL) from a queue of all the fetch
>> > items (this queue is actually a queue of queues; for details see [0]). If a
>> > thread doesn't get a fetch item, it spin-waits for 500 ms before polling
>> > the queue again.
>> > The '*spinWaiting*' count tells us how many threads are in the
>> > spin-waiting state at a given instant.
>> >
>> > The '*active*' count tells us how many threads are currently performing
>> > the activities related to fetching a fetch item. This involves sending
>> > requests to the server, receiving the bytes from the server, parsing,
>> > storing, etc.
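A minimal Java sketch of the loop described above, assuming a "queue of queues" modelled as a map from host to a FIFO of URLs. The class, method, and field names are hypothetical and this is not Nutch's code; it follows the email's description, where "active" counts threads busy with a fetch item and "spinWaiting" counts threads sleeping between empty polls.

```java
import java.util.ArrayDeque;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch, not Nutch's FetcherReducer: per-host queues plus status counters. */
class FetchLoopSketch {
    // The outer map is the "queue of queues": one FIFO of URLs per host.
    private final Map<String, Queue<String>> queues = new LinkedHashMap<>();
    final AtomicInteger spinWaiting = new AtomicInteger();
    final AtomicInteger activeThreads = new AtomicInteger();
    final AtomicInteger pages = new AtomicInteger();
    private final long spinWaitMillis; // the email describes 500 ms

    FetchLoopSketch(long spinWaitMillis) { this.spinWaitMillis = spinWaitMillis; }

    synchronized void add(String host, String url) {
        queues.computeIfAbsent(host, h -> new ArrayDeque<>()).add(url);
    }

    /** Returns the next URL from the first non-empty per-host queue, or null if none. */
    synchronized String getFetchItem() {
        for (Queue<String> q : queues.values()) {
            String url = q.poll();
            if (url != null) return url;
        }
        return null;
    }

    /** Body of one fetcher thread; this sketch stops after maxEmptyPolls empty polls in a row. */
    void runFetcherThread(int maxEmptyPolls) throws InterruptedException {
        int emptyPolls = 0;
        while (emptyPolls < maxEmptyPolls) {
            String url = getFetchItem();
            if (url == null) {
                // No item available: count this thread as spin-waiting, sleep, then poll again.
                spinWaiting.incrementAndGet();
                try {
                    Thread.sleep(spinWaitMillis);
                } finally {
                    spinWaiting.decrementAndGet();
                }
                emptyPolls++;
                continue;
            }
            emptyPolls = 0;
            activeThreads.incrementAndGet(); // busy with a fetch item
            try {
                fetchAndParse(url);
                pages.incrementAndGet();
            } finally {
                activeThreads.decrementAndGet();
            }
        }
    }

    /** Placeholder for sending the request, reading bytes, parsing, and storing. */
    void fetchAndParse(String url) { }
}
```

Several threads running `runFetcherThread` against one instance would make the two counters move independently, which is what the status line samples.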
>> >
>> > '*pages*' is the total number of pages fetched up to a given point.
>> > '*errors*' is the total number of errors seen.
>> >
>> > *Next comes pages/s:*
>> > The first number comes from this:
>> > ((((float)pages.get())*10)/elapsed)/10.0
>> >
>> > The second one comes from this:
>> > (actualPages*10)/10.0
>> >
>> > actualPages holds the count of pages processed in the last 5 secs (when
>> > the calculation is done).
>> >
>> > The first number can be seen as the overall speed for that execution. The
>> > second number can be regarded as the instantaneous speed, as it only uses
>> > the number of pages in the last 5 secs when this calculation is done. See
>> > lines 818-830 in [0].
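A small Java sketch of the two pages/s figures, following the formulas quoted above. One assumption: the `*10 ... /10.0` pattern only rounds to one decimal place if `Math.round` wraps the inner expression, and the sample log line ("4.1") suggests it does, so the sketch adds it; check [0] for the exact code. `elapsed` is assumed to be in seconds, and the class and the `windowCount` helper are hypothetical.

```java
/** Hypothetical sketch of the two pages/s figures (not Nutch's code). */
class PagesPerSecondSketch {
    /** Overall speed: total pages / elapsed seconds, rounded to one decimal (Math.round is assumed). */
    static double overall(int pages, long elapsedSeconds) {
        if (elapsedSeconds <= 0) return 0.0; // sketch guard against division by zero
        return Math.round(((float) pages * 10) / elapsedSeconds) / 10.0;
    }

    /** "Instantaneous" figure from actualPages, the count for the last 5-second window. */
    static double instantaneous(int actualPages) {
        return Math.round((float) actualPages * 10) / 10.0;
    }

    /** Hypothetical helper: pages counted in a window, as the difference of two counter snapshots. */
    static int windowCount(int pagesNow, int pagesAtWindowStart) {
        return pagesNow - pagesAtWindowStart;
    }
}
```

With the sample line's 53852 pages, an overall figure of 4.1 implies an elapsed time of roughly 13,000 seconds (about 3.6 hours).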
>> >
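>> > *Next come the kb/s* values, which are computed as follows:
>> > ((((float)bytes.get())*8)/1024)/elapsed
>> > (((float)actualBytes)*8)/1024
>> >
>> > Multiplying by 8 converts bytes to bits and dividing by 1024 gives
>> > kilobits, so both figures are in kilobits rather than kilobytes. Otherwise
>> > this is similar to pages/s. See lines 818-830 in [0].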
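The kb/s formulas can be sketched the same way in Java. The class name is hypothetical, and rounding to a whole number is an assumption based on the integer values in the sample log line ("2632 7346"); `elapsed` is again assumed to be in seconds.

```java
/** Hypothetical sketch of the two kb/s figures (not Nutch's code). */
class KilobitsPerSecondSketch {
    /** Overall: bytes * 8 = bits, / 1024 = kilobits, / elapsed seconds; rounding is assumed. */
    static int overall(long bytes, long elapsedSeconds) {
        if (elapsedSeconds <= 0) return 0; // sketch guard against division by zero
        return Math.round((((float) bytes) * 8 / 1024) / elapsedSeconds);
    }

    /** Recent figure from actualBytes, converted to kilobits the same way. */
    static int instantaneous(long actualBytes) {
        return Math.round(((float) actualBytes) * 8 / 1024);
    }
}
```

For example, 1 MiB (1048576 bytes) over 8 seconds is 8192 kilobits in total, i.e. 1024 kb/s.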
>> >
>> > '*URLs*' indicates how many URLs are pending and '*queues*' indicates the
>> > number of queues present. Queues are formed on the basis of hostname or
>> > IP, depending on the configuration.
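A hypothetical Java sketch of how URLs might be grouped into per-host or per-IP queues, and how the "URLs" and "queues" figures relate to that grouping. The `QueueMode` names are invented for illustration; the real setting and its values are whatever Nutch's configuration defines.

```java
import java.net.InetAddress;
import java.net.URI;
import java.net.UnknownHostException;
import java.util.ArrayDeque;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Queue;

/** Hypothetical sketch: grouping URLs into fetch queues keyed by host or by IP. */
class QueueGroupingSketch {
    enum QueueMode { BY_HOST, BY_IP } // invented names, not Nutch's configuration values

    private final Map<String, Queue<String>> queues = new LinkedHashMap<>();
    private final QueueMode mode;

    QueueGroupingSketch(QueueMode mode) { this.mode = mode; }

    /** Queue key for a URL: its lower-cased host name, or the host's IP address. */
    String queueId(String url) throws UnknownHostException {
        String host = URI.create(url).getHost().toLowerCase(Locale.ROOT);
        if (mode == QueueMode.BY_IP) {
            // May trigger a DNS lookup unless the host is already a literal IP address.
            return InetAddress.getByName(host).getHostAddress();
        }
        return host;
    }

    void add(String url) throws UnknownHostException {
        queues.computeIfAbsent(queueId(url), k -> new ArrayDeque<>()).add(url);
    }

    /** The "URLs" figure: items still pending across all queues. */
    int pendingUrls() {
        int total = 0;
        for (Queue<String> q : queues.values()) total += q.size();
        return total;
    }

    /** The "queues" figure. */
    int queueCount() { return queues.size(); }
}
```

Keying by IP groups several virtual hosts on one server into a single queue, while keying by host keeps them apart.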
>> >
>> > See FetcherReducer.java [0] for more details.
>> >
>> > [0]: http://svn.apache.org/viewvc/nutch/branches/2.x/src/java/org/apache/nutch/fetcher/FetcherReducer.java?view=markup
>> >
>> >
>> > On Mon, Apr 22, 2013 at 6:09 PM, kaveh minooie <ka...@plutoz.com> wrote:
>> >
>> >> Could someone please tell me one more time: in this line,
>> >> 0/20 spinwaiting/active, 53852 pages, 7612 errors, 4.1 12 pages/s, 2632 7346 kb/s, 989 URLs in 5 queues > reduce
>> >>
>> >> what are the two numbers before pages/s and the two numbers before kb/s?
>> >>
>> >> thanks,
>> >>
>> >
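To tie the explanation back to the question, here is a hypothetical Java helper that splits a status line of the format quoted above into the fields Tejas describes. The class and field names are invented for illustration, and the regular expression only assumes the exact layout of the sample line.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: parse a fetcher status line into its named fields. */
class FetcherStatusLine {
    private static final Pattern STATUS = Pattern.compile(
        "(\\d+)/(\\d+) spinwaiting/active, (\\d+) pages, (\\d+) errors, "
        + "([\\d.]+) ([\\d.]+) pages/s, ([\\d.]+) ([\\d.]+) kb/s, "
        + "(\\d+) URLs in (\\d+) queues");

    final int spinWaiting, active, pages, errors, pendingUrls, queues;
    final double overallPagesPerSec, recentPagesPerSec, overallKbps, recentKbps;

    private FetcherStatusLine(Matcher m) {
        spinWaiting = Integer.parseInt(m.group(1));
        active = Integer.parseInt(m.group(2));
        pages = Integer.parseInt(m.group(3));
        errors = Integer.parseInt(m.group(4));
        overallPagesPerSec = Double.parseDouble(m.group(5)); // whole-run average
        recentPagesPerSec = Double.parseDouble(m.group(6));  // last 5-second window
        overallKbps = Double.parseDouble(m.group(7));        // kilobits, whole run
        recentKbps = Double.parseDouble(m.group(8));         // kilobits, recent window
        pendingUrls = Integer.parseInt(m.group(9));
        queues = Integer.parseInt(m.group(10));
    }

    /** Returns null if the line does not contain a status line in this format. */
    static FetcherStatusLine parse(String line) {
        Matcher m = STATUS.matcher(line);
        return m.find() ? new FetcherStatusLine(m) : null;
    }
}
```

Applied to the sample line, this gives 4.1 and 12 as the overall and recent pages/s, and 2632 and 7346 as the overall and recent kb/s.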
>>
>> --
>> *Lewis*
>>
>

-- 
*Lewis*
