[jira] [Updated] (NUTCH-952) fix outlink which started with '?' in html parser

2014-04-26 Thread Sebastian Nagel (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastian Nagel updated NUTCH-952:
--

Attachment: test_nutch_952.html

This was fixed by NUTCH-797 for 1.4 (2.x will follow soon). The example link 
(attached) now works in 1.8, with both parse-html and parse-tika:
{code}
% nutch parsechecker http://localhost/test_nutch_952.html
...
Outlinks: 1
  outlink: toUrl: http://bbs.soso.com/search?w=ruby%20on%20rails&ty=c&sd=0
{code}

 fix outlink which started with '?' in html parser
 -

 Key: NUTCH-952
 URL: https://issues.apache.org/jira/browse/NUTCH-952
 Project: Nutch
  Issue Type: Bug
  Components: parser
Affects Versions: nutchgora
Reporter: Stondet
 Attachments: NUTCH-952-v2.patch, test_nutch_952.html


 <a href=?w=ruby%20on%20rails&ty=c&sd=0> ruby on rails</a> (a snippet from
 http://bbs.soso.com/search?ty=c&sd=0&w=rails)
 outlink parsed from above link:
 http://bbs.soso.com/?w=ruby%20on%20rails&ty=c&sd=0
 but expected is http://bbs.soso.com/search?w=ruby%20on%20rails&ty=c&sd=0
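
The expected result matches RFC 3986 (section 5.2.2): a reference that consists only of a query keeps the base path and replaces just the query. The following is a minimal sketch of that rule in Java, not the code from NUTCH-797 or from the attached patch. The class and method names are hypothetical, and the URLs assume that the query separators in the report are `&`.

```java
import java.net.URI;

public class QueryOnlyLinkResolver {

    // Hypothetical helper for illustration only; not part of Nutch.
    static String resolve(String base, String target) {
        String t = target.trim();
        if (t.startsWith("?")) {
            // Query-only reference: keep scheme, authority and path of the
            // base, drop its fragment and query, then append the new query.
            String b = base;
            int hash = b.indexOf('#');
            if (hash >= 0) {
                b = b.substring(0, hash);
            }
            int query = b.indexOf('?');
            if (query >= 0) {
                b = b.substring(0, query);
            }
            return b + t;
        }
        // All other references are delegated to java.net.URI.
        return URI.create(base).resolve(t).toString();
    }

    public static void main(String[] args) {
        String base = "http://bbs.soso.com/search?ty=c&sd=0&w=rails";
        String href = "?w=ruby%20on%20rails&ty=c&sd=0";
        // prints http://bbs.soso.com/search?w=ruby%20on%20rails&ty=c&sd=0
        System.out.println(resolve(base, href));
    }
}
```

The special case is handled before delegating because a resolver that follows the older RFC 2396 rules drops the last path segment ("search") for a query-only reference, which is the behaviour reported above.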



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (NUTCH-952) fix outlink which started with '?' in html parser

2014-04-26 Thread Sebastian Nagel (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastian Nagel updated NUTCH-952:
--

Fix Version/s: (was: 1.9)

 fix outlink which started with '?' in html parser
 -

 Key: NUTCH-952
 URL: https://issues.apache.org/jira/browse/NUTCH-952
 Project: Nutch
  Issue Type: Bug
  Components: parser
Affects Versions: nutchgora
Reporter: Stondet
 Attachments: NUTCH-952-v2.patch, test_nutch_952.html


 <a href=?w=ruby%20on%20rails&ty=c&sd=0> ruby on rails</a> (a snippet from
 http://bbs.soso.com/search?ty=c&sd=0&w=rails)
 outlink parsed from above link:
 http://bbs.soso.com/?w=ruby%20on%20rails&ty=c&sd=0
 but expected is http://bbs.soso.com/search?w=ruby%20on%20rails&ty=c&sd=0





[jira] [Updated] (NUTCH-952) fix outlink which started with '?' in html parser

2013-01-12 Thread Lewis John McGibbney (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney updated NUTCH-952:
---

Fix Version/s: 1.7

 fix outlink which started with '?' in html parser
 -

 Key: NUTCH-952
 URL: https://issues.apache.org/jira/browse/NUTCH-952
 Project: Nutch
  Issue Type: Bug
  Components: parser
Affects Versions: nutchgora
Reporter: Stondet
 Fix For: 1.7

 Attachments: NUTCH-952-v2.patch


 <a href=?w=ruby%20on%20rails&ty=c&sd=0> ruby on rails</a> (a snippet from
 http://bbs.soso.com/search?ty=c&sd=0&w=rails)
 outlink parsed from above link:
 http://bbs.soso.com/?w=ruby%20on%20rails&ty=c&sd=0
 but expected is http://bbs.soso.com/search?w=ruby%20on%20rails&ty=c&sd=0

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] Updated: (NUTCH-952) fix outlink which started with '?' in html parser

2011-01-07 Thread Stondet (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stondet updated NUTCH-952:
--

Affects Version/s: (was: 1.3)
   2.0

 fix outlink which started with '?' in html parser
 -

 Key: NUTCH-952
 URL: https://issues.apache.org/jira/browse/NUTCH-952
 Project: Nutch
  Issue Type: Bug
  Components: parser
Affects Versions: 2.0
Reporter: Stondet
 Attachments: NUTCH-952-v2.patch


 <a href=?w=ruby%20on%20rails&ty=c&sd=0> ruby on rails</a> (a snippet from
 http://bbs.soso.com/search?ty=c&sd=0&w=rails)
 outlink parsed from above link:
 http://bbs.soso.com/?w=ruby%20on%20rails&ty=c&sd=0
 but expected is http://bbs.soso.com/search?w=ruby%20on%20rails&ty=c&sd=0

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (NUTCH-952) fix outlink which started with '?' in html parser

2011-01-05 Thread Stondet (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stondet updated NUTCH-952:
--

Attachment: NUTCH-952.patch

Patch to fix outlinks that start with '?'.

 fix outlink which started with '?' in html parser
 -

 Key: NUTCH-952
 URL: https://issues.apache.org/jira/browse/NUTCH-952
 Project: Nutch
  Issue Type: Bug
  Components: parser
Reporter: Stondet
 Attachments: NUTCH-952.patch


 <a href=?w=ruby%20on%20rails&ty=c&sd=0> ruby on rails</a> (a snippet from
 http://bbs.soso.com/search?ty=c&sd=0&w=rails)
 outlink parsed from above link:
 http://bbs.soso.com/?w=ruby%20on%20rails&ty=c&sd=0
 but expected is http://bbs.soso.com/search?w=ruby%20on%20rails&ty=c&sd=0
