Hi,
Nutch should convert the &amp; in the href attribute
to a bare ampersand (&) and keep it that way for all
subsequent operations.
Which version of Nutch are you using?
Have you changed anything in the default configuration?
Trial with a dummy test document on a local Apache httpd:
% cat /var/www/test_amp.html
<html>
<head>
<title>test ampersand in href</title>
</head>
<body>
test: <a href="test.html#!/test?q=a&amp;p=b">link</a>
</body>
</html>
% nutch parsechecker http://localhost/test_amp.html
fetching: http://localhost/test_amp.html
...
Status: success(1,0)
Title: test ampersand in href
Outlinks: 1
outlink: toUrl: http://localhost/test.html#!/test?q=a&p=b anchor: link
...
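For comparison, here is a minimal sketch of the same decoding step outside of Nutch. It uses only Python's standard library (html.parser and urllib.parse), not Nutch's own HTML parser, so it only illustrates the expected behaviour: the &amp; in the href is decoded to a bare & before the link is resolved against the page URL. The names LinkCollector and extract_outlinks are made up for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect absolute link targets from <a href="..."> attributes."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.outlinks = []

    def handle_starttag(self, tag, attrs):
        # html.parser passes attribute values with character references
        # such as &amp; already decoded, so value holds a bare "&".
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.outlinks.append(urljoin(self.base_url, value))


def extract_outlinks(html_text, base_url):
    """Return the resolved href targets found in html_text."""
    parser = LinkCollector(base_url)
    parser.feed(html_text)
    parser.close()
    return parser.outlinks


# Same link as in the test document above (hypothetical local URL).
doc = '<a href="test.html#!/test?q=a&amp;p=b">link</a>'
print(extract_outlinks(doc, "http://localhost/test_amp.html"))
```

The printed link is http://localhost/test.html#!/test?q=a&p=b, the same toUrl that parsechecker reports.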
Regards,
Sebastian
On 11/10/2015 01:10 AM, bbarani wrote:
> Hi,
>
> I see that Nutch converts & to &amp; in crawldb. When it invokes the call to
> these URLs, does it automatically convert &amp; back to &? I don't see this
> happening now, hence I want to check if there is a solution to fix this issue.
>
> href="test.html#!/testing/id/interactive_1500001491?make=Apple*&*model=Apple6s"
>
>
> Thanks,
> Barani
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Nutch-Stop-converting-in-the-url-to-amp-tp4239292.html
> Sent from the Nutch - User mailing list archive at Nabble.com.
>