Hi
On Wed, Apr 9, 2014 at 8:43 AM, <[email protected]> wrote:
>
> user Digest 9 Apr 2014 14:43:51 -0000 Issue 2188
>
> I might not be thinking in the right direction so need some help. Is there
> a way to find an approximate web content size of a particular website in
> Nutch 2.2.1?

You can obtain the WebPage content by looking at the following line in FetcherReducer:

    if (content != null && content.getContent() != null)
        length = content.getContent().length;

content.getContent() returns a byte[] containing the binary content retrieved for that resource, so its length is the size of the page in bytes. You would then need to think about how to sum these lengths across all fetched pages to obtain an approximate total for a given domain.

> I have crawled a research website which has lot of images, pdfs, etc. and I
> am interested to know the content size of all the files in that website.
> Please advise.

I haven't needed to do this yet, so I don't have a concrete answer as to how you could implement it all.

hth
Lewis
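As a rough illustration of the summing step, here is a minimal standalone sketch (not Nutch code — the class name, the record/totalFor methods, and the host strings are all hypothetical). It mirrors the null checks from FetcherReducer before reading the byte[] length, and tallies bytes per host in a plain map:

```java
// Hypothetical sketch: tallying fetched content sizes per host.
// The byte[] stands in for what Content.getContent() returns in Nutch;
// in a real job you would do this aggregation in MapReduce, not in memory.
import java.util.HashMap;
import java.util.Map;

public class ContentSizeTally {
    private final Map<String, Long> bytesPerHost = new HashMap<>();

    // Called once per fetched page; skips null payloads, mirroring
    // the (content != null && content.getContent() != null) guard.
    public void record(String host, byte[] content) {
        if (content != null) {
            bytesPerHost.merge(host, (long) content.length, Long::sum);
        }
    }

    // Approximate total bytes fetched for a host (0 if never seen).
    public long totalFor(String host) {
        return bytesPerHost.getOrDefault(host, 0L);
    }

    public static void main(String[] args) {
        ContentSizeTally tally = new ContentSizeTally();
        tally.record("example.edu", new byte[1024]); // e.g. an HTML page
        tally.record("example.edu", new byte[4096]); // e.g. a PDF
        tally.record("example.edu", null);           // failed fetch, skipped
        System.out.println(tally.totalFor("example.edu")); // prints 5120
    }
}
```

For a real crawl you would emit (host, length) pairs from the fetch step and let a reducer sum them, but the arithmetic is the same.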

