But the 'normalizer' can't be used with the 'injector' (seed.txt)... the 'normalizer' is called only after fetching the HTML, parsing it, and extracting outlinks...
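
If the seed list really does bypass the normalizer, one possible workaround is to encode the whitespace in seed.txt before injecting. Below is a minimal sketch in plain Python; it is not part of Nutch, and the helper names `encode_spaces` and `clean_seed_lines` are made up for illustration. It applies the same \s -> %20 substitution as the regex rule quoted below.

```python
import re


def encode_spaces(url: str) -> str:
    """Replace every whitespace character inside a URL with %20."""
    # strip() drops the trailing newline and any padding before substituting
    return re.sub(r"\s", "%20", url.strip())


def clean_seed_lines(lines):
    """Encode whitespace in each non-blank seed line (hypothetical helper)."""
    return [encode_spaces(line) for line in lines if line.strip()]


# Example: one seed entry that contains spaces
seeds = ["http://example.com/smb?categoryname=Small Business\n"]
cleaned = clean_seed_lines(seeds)
```

Writing the cleaned lines back to urls/seed.txt before injecting should give the same starting point as the seed that already contains %20.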

> -----Original Message-----
> From: Mohamed Parvez [mailto:[email protected]]
> Sent: September-03-09 3:58 PM
> To: [email protected]
> Subject: Re: URL with Space
>
> Thanks for the suggestion, Fuad.
>
> I tried your suggestion, but it does not seem to work: the space does not
> get replaced by %20 or +.
>
> Scenario 1
> urls/seed.txt:
> ------------------
> http://business.verizon.net/SMBPortalWeb/appmanager/SMBPortal/smb?_nfpb=true&_pageLabel=SMBPortal_page_newsandresources_headlinedetail&newsId=10553&categoryname=SmallBusiness&portletTitle=Small Business Features
>
> I get the following error:
> ---------------------------------
> fetch of
> http://business.verizon.net/SMBPortalWeb/appmanager/SMBPortal/smb?_nfpb=true&_pageLabel=SMBPortal_page_newsandresources_headlinedetail&newsId=10553&categoryname=Small Business&portletTitle=Small Business
> *Features failed with: Httpcode=406*
>
> But if I start with a URL that has %20 instead of a space:
>
> Scenario 2
> urls/seed.txt:
> ------------------
> http://business.verizon.net/SMBPortalWeb/appmanager/SMBPortal/smb?_nfpb=true&_pageLabel=SMBPortal_page_newsandresources_headlinedetail&newsId=10553&categoryname=Small%20Business&portletTitle=Small%20Business%20Features
>
> Everything works as expected.
>
> ----
> Thanks/Regards,
> Parvez
>
> On Thu, Sep 3, 2009 at 1:45 PM, Fuad Efendi <[email protected]> wrote:
>
> > > I am using the urlnormalizer plugin (urlnormalizer-(pass|regex|basic))
> > > and I put the rule below in the conf/regex-normalize.xml file:
> > >
> > > <regex>
> > >   <pattern>\s</pattern>
> > >   <substitution>%20</substitution>
> > > </regex>
> >
> > Should be escaped backslash:
> > <pattern>\\s</pattern>
> >
> > You can also use + (plus) instead of %20.
