Hi Luis and d_k,

On Fri, Jan 10, 2014 at 3:10 PM, <[email protected]> wrote:
> One way is to use a copyField [0] in Solr and limit its length using the
> maxChars attribute, search on the original text and return the copied
> field, although I'm not sure how useful it will be for the end user.

Yes, you could use this. However, if you know that you DO NOT require anything over a certain size threshold (and that this is NOT going to come back and bite you in the future), then I would suggest overriding the http.content.limit property in nutch-site.xml (note that the limit is expressed in bytes, not characters). This limits the webpage content you fetch, parse and send to be indexed. It would be more efficient than fetching the content, parsing it and then NOT using it later on; the latter seems a waste of time and resources.

hth,
Lewis
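For reference, here is a minimal sketch of the http.content.limit override in nutch-site.xml.

- The value is in bytes.
- The 32768 shown here is only an illustrative threshold; choose one that fits your data.
- As far as I recall, the stock nutch-default.xml description says that a non-negative value truncates longer content, while a negative value disables truncation entirely.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Overrides the default defined in nutch-default.xml -->
  <property>
    <name>http.content.limit</name>
    <!-- Illustrative threshold only: 32768 bytes (32 KB).
         Content fetched over http:// beyond this length is truncated. -->
    <value>32768</value>
  </property>
</configuration>
```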
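For comparison, the copyField approach from the quoted message would look roughly like this in Solr's schema.xml.

- The destination field name `content_short` and the 300-character cap are hypothetical.
- I am assuming the full text lives in a field called `content`, as in the schema shipped with Nutch.
- With this approach the whole page is still fetched, parsed and indexed into `content`, which is the overhead mentioned in the reply.

```xml
<!-- Hypothetical stored-only field holding a truncated copy for display -->
<field name="content_short" type="string" indexed="false" stored="true"/>

<!-- Assumed source field "content"; maxChars caps how many characters are copied -->
<copyField source="content" dest="content_short" maxChars="300"/>
```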

