Hi,

Nutch should convert the &amp; in the href attribute
to a bare ampersand (&) and keep it that way for all
subsequent operations.

Which version of Nutch are you using?
Have you changed anything in the default configuration?

Trial with a dummy test document on a local Apache httpd:

% cat /var/www/test_amp.html
<html>
  <head>
    <title>test ampersand in href</title>
  </head>
  <body>
    test: <a href="test.html#!/test?q=a&amp;p=b">link</a>
  </body>
</html>


% nutch parsechecker http://localhost/test_amp.html
fetching: http://localhost/test_amp.html
...
Status: success(1,0)
Title: test ampersand in href
Outlinks: 1
  outlink: toUrl: http://localhost/test.html#!/test?q=a&p=b anchor: link
...
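
For comparison, here is a minimal sketch in Python (not Nutch's own Java
parser) of the behaviour I would expect from any HTML link extractor: the
character reference in the href value is decoded once, and the resulting
bare ampersand is kept when the link is resolved against the page URL.
The names LinkExtractor and extract_outlinks are made up for this
illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects absolute link targets from <a href="..."> tags.

    Hypothetical helper for illustration only; it is not part of Nutch.
    """

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.outlinks = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            # HTMLParser has already decoded character references such
            # as &amp; in attribute values by the time we see them here.
            if name == "href" and value:
                self.outlinks.append(urljoin(self.base_url, value))


def extract_outlinks(html_text, base_url):
    """Return the absolute URLs of all <a href> links in html_text."""
    parser = LinkExtractor(base_url)
    parser.feed(html_text)
    parser.close()
    return parser.outlinks


if __name__ == "__main__":
    page = '<a href="test.html#!/test?q=a&amp;p=b">link</a>'
    print(extract_outlinks(page, "http://localhost/test_amp.html"))
    # prints ['http://localhost/test.html#!/test?q=a&p=b']
```

The result matches the outlink reported by parsechecker above, so a URL
that still contains &amp; after parsing points to a problem elsewhere,
for example in the page markup or in a changed configuration.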

Regards,
Sebastian

On 11/10/2015 01:10 AM, bbarani wrote:
> Hi,
> 
> I see that Nutch converts & to &amp; in the crawldb. When it fetches
> these URLs, does it automatically convert &amp; back to &? I don't see this
> happening now, so I want to check whether there is a way to fix this issue.
> 
> href="test.html#!/testing/id/interactive_1500001491?make=Apple*&amp;*model=Apple6s"
>  
> 
> Thanks,
> Barani
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Nutch-Stop-converting-in-the-url-to-amp-tp4239292.html
> Sent from the Nutch - User mailing list archive at Nabble.com.
> 
