Hi Luis and d_k,

On Fri, Jan 10, 2014 at 3:10 PM, <[email protected]> wrote:

>
> One way is to use a copyField [0] in Solr and limit its length using the
> maxChars attribute, then search on the original text and return the copied
> field, although I'm not sure how useful that will be for the end user.
>
Yes, you could use this. However, if you know that you do NOT require
anything over a certain character threshold (and that this is NOT going to
come back and bite you in the future), then I would suggest overriding the
http.content.limit property in nutch-site.xml.
This will limit the web page content you fetch, parse and send to be
indexed. It would be more efficient than fetching the full content, parsing
it and then NOT using it later on, which seems a bit of a waste of time and
resources.
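
For illustration, a minimal sketch of the override in nutch-site.xml. The
value 20000 is only an example, not a recommendation. Note that this
property is expressed in bytes rather than characters, and that a negative
value disables truncation altogether:

```xml
<!-- nutch-site.xml: values here override those in nutch-default.xml -->
<configuration>
  <!-- Example value only (an assumption); the limit is in bytes. -->
  <property>
    <name>http.content.limit</name>
    <value>20000</value>
  </property>
</configuration>
```

For comparison, the copyField approach mentioned above might look roughly
like this in the Solr schema. The destination field name "content_short",
its type and the 500 character cap are all hypothetical, so adjust them to
your own schema:

```xml
<!-- Hypothetical stored-only field holding the truncated copy -->
<field name="content_short" type="text_general" indexed="false" stored="true"/>
<!-- Copy at most 500 characters of "content" into "content_short" -->
<copyField source="content" dest="content_short" maxChars="500"/>
```

With the second approach the full content is still fetched, parsed and
indexed; only the returned field is shortened.
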
hth
Lewis
