Hello,

# marks an in-page anchor (a URL fragment), which means both URLs point to the same web page.
Nutch strips the fragment via its URLNormalizers. If you really have to keep it, comment out the following entry in regex-normalize.xml:

  <regex>
    <pattern>#.*?(\?|&|$)</pattern>
    <substitution>$1</substitution>
  </regex>

Thanks,
Charan

On Tue, Jan 11, 2011 at 9:21 PM, Sourabh Kasliwal <[email protected]> wrote:
> Hi,
>
> While crawling some links I found that nutch truncate some urls that have #
> within it.
> Eg:-
> *http://www.techmeme.com/110111/p82#a110111p82* gets truncated to *
> http://www.techmeme.com/110111/p82*
>
> Can any one please let me know why does nutch does this... or is there a
> simple way to avoid it.
>
> regards
> Sourabh
>
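To see what the regex-normalize.xml rule quoted in this reply does to a URL, here is a minimal sketch in Python. It is only an illustration, not Nutch code: the helper name strip_fragment and the example.com URL are assumptions made up for this example, and Python writes the back-reference as \1 where the Java-style substitution in the XML uses $1.

```python
import re

# Same pattern as the <pattern> element in the quoted rule.
# It matches "#", then as little as possible, up to "?", "&", or end of string.
FRAGMENT_RULE = re.compile(r"#.*?(\?|&|$)")


def strip_fragment(url: str) -> str:
    """Hypothetical helper: apply the rule the way the substitution describes.

    The captured terminator ("?", "&", or nothing at end of string) is kept,
    so only the "#..." part is dropped.
    """
    return FRAGMENT_RULE.sub(r"\1", url)


if __name__ == "__main__":
    # URL from the original question
    print(strip_fragment("http://www.techmeme.com/110111/p82#a110111p82"))
    # Made-up URL with a "?" after the fragment
    print(strip_fragment("http://example.com/page#top?x=1"))
```

With the rule active, the techmeme URL loses everything from # onward, which is the truncation described in the question. With the rule commented out, this substitution is simply not applied.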

