What does it truncate: 'http://' or 'sId=386'? Or something inside the URL?
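To tell which part was cut, compare the original URL with the one found in the dump. This is a minimal sketch that assumes you have both strings at hand. `describe_truncation` is a hypothetical helper written for illustration. It is not part of Nutch.

```python
def describe_truncation(original: str, dumped: str) -> str:
    """Report where `dumped` differs from `original`.

    Hypothetical helper for illustration only; not a Nutch API.
    """
    if dumped == original:
        return "identical"
    # Dump is a prefix of the original: the end of the URL was cut off.
    if original.startswith(dumped):
        return f"tail cut: lost {original[len(dumped):]!r}"
    # Dump is a suffix of the original: the start (e.g. 'http://') was cut off.
    if original.endswith(dumped):
        return f"head cut: lost {original[:len(original) - len(dumped)]!r}"
    # Otherwise locate the differing middle region using the longest
    # common prefix and the longest non-overlapping common suffix.
    limit = min(len(original), len(dumped))
    prefix = 0
    while prefix < limit and original[prefix] == dumped[prefix]:
        prefix += 1
    suffix = 0
    while suffix < limit - prefix and original[-1 - suffix] == dumped[-1 - suffix]:
        suffix += 1
    lost = original[prefix:len(original) - suffix]
    got = dumped[prefix:len(dumped) - suffix]
    return f"differs at index {prefix}: original has {lost!r}, dump has {got!r}"
```

For example, `describe_truncation(full_url, dumped_url)` with the two URLs pasted in (both names are placeholders) shows whether 'http://', 'sId=386', or something in the middle went missing. `len()` on each string then gives the exact character counts to compare against 318.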
Just inject http://business.verizon.net/ ... Nutch should find the rest. I believe Nutch doesn't impose any limit on URL length, although some web servers limit URLs to around 4000 characters.

> http://business.verizon.net/SMBPortalWeb/appmanager/SMBPortal/smb?_pageLabel=SMBPortal_page_main_marketplace&_nfpb=true&_windowLabel=MarketPlacePFController_1&MarketPlacePFController_1_actionOverride=%252Fpageflows%252Fverizon%252Fsmb%252Fportal%252FmarketPlacePF%252FgetProductDetails&MarketPlacePFController_1productsId=386
>
> Thanks/Regards,
> Parvez
>
> On Tue, Sep 1, 2009 at 4:43 PM, Fuad Efendi <[email protected]> wrote:
>
> > > I opened the part-00000 file in the dump folder, and there is only ONE URL,
> > > and it has been truncated to 318 chars.
> > > How can I make Nutch consider URLs longer than 318 chars?
> >
> > Please provide an original (before truncation) sample of such a URL.
> > Thanks
