[jira] [Updated] (NUTCH-1749) Title duplicated in document body

2014-04-05 Thread Greg Padiasek (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Padiasek updated NUTCH-1749: - Attachment: DOMContentUtils.patch > Title duplicated in document body > -

[jira] [Created] (NUTCH-1749) Title duplicated in document body

2014-04-05 Thread Greg Padiasek (JIRA)
Greg Padiasek created NUTCH-1749: Summary: Title duplicated in document body Key: NUTCH-1749 URL: https://issues.apache.org/jira/browse/NUTCH-1749 Project: Nutch Issue Type: Bug Com

[jira] [Issue Comment Deleted] (NUTCH-1615) Implementing A Feature for Fetching From Websites Dump

2014-04-05 Thread JIRA
[ https://issues.apache.org/jira/browse/NUTCH-1615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] cihad güzel updated NUTCH-1615: --- Comment: was deleted (was: I'm trying for this issue.) > Implementing A Feature for Fetching From We

[jira] [Commented] (NUTCH-1747) Use AtomicInteger as semaphore in Fetcher

2014-04-05 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13961236#comment-13961236 ] Hudson commented on NUTCH-1747: --- SUCCESS: Integrated in Nutch-trunk #2592 (See [https://bui

[jira] [Created] (NUTCH-1748) despite unix systems allow "abc..xyz.txt" kind of urls, url validator plugin rejects.

2014-04-05 Thread Sertac TURKEL (JIRA)
Sertac TURKEL created NUTCH-1748: Summary: despite unix systems allow "abc..xyz.txt" kind of urls, url validator plugin rejects. Key: NUTCH-1748 URL: https://issues.apache.org/jira/browse/NUTCH-1748

[jira] [Updated] (NUTCH-1342) Read time out protocol-http

2014-04-05 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-1342: - Component/s: (was: fetcher) protocol > Read time out protocol-http > ---

[jira] [Updated] (NUTCH-827) HTTP POST Authentication

2014-04-05 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-827: Component/s: (was: fetcher) protocol > HTTP POST Authentication > -

[jira] [Updated] (NUTCH-410) Faster RegexNormalize with more features

2014-04-05 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-410: Component/s: (was: fetcher) > Faster RegexNormalize with more features > ---

[jira] [Resolved] (NUTCH-1278) Fetch Improvement in threads per host

2014-04-05 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche resolved NUTCH-1278. -- Resolution: Won't Fix No follow up from contributor + solution proposed quite invasive (changes

[jira] [Resolved] (NUTCH-1297) it is better for fetchItemQueues to select items from greater queues first

2014-04-05 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche resolved NUTCH-1297. -- Resolution: Won't Fix NUTCH-1687 is a nicer approach + no feedback from original contributor >

[jira] [Updated] (NUTCH-490) Extension point with filters for Neko HTML parser (with patch)

2014-04-05 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-490: Component/s: (was: fetcher) parser > Extension point with filters for Neko HTML

[jira] [Resolved] (NUTCH-385) Server delay feature conflicts with maxThreadsPerHost

2014-04-05 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche resolved NUTCH-385. - Resolution: Not a Problem This is not a problem but a discussion of how things work in the Fetcher

[jira] [Resolved] (NUTCH-1747) Use AtomicInteger as semaphore in Fetcher

2014-04-05 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche resolved NUTCH-1747. -- Resolution: Fixed Committed revision 1585196. > Use AtomicInteger as semaphore in Fetcher > -

[jira] [Commented] (NUTCH-1747) Use AtomicInteger as semaphore in Fetcher

2014-04-05 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13961211#comment-13961211 ] Sebastian Nagel commented on NUTCH-1747: +1 Looks like inProgress was intended to

[jira] [Updated] (NUTCH-1182) fetcher should track and shut down hung threads

2014-04-05 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-1182: --- Attachment: NUTCH-1182-trunk-v1.patch From time to time this problem is reported by users >

[jira] [Updated] (NUTCH-1182) fetcher should track and shut down hung threads

2014-04-05 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-1182: --- Fix Version/s: 1.9 > fetcher should track and shut down hung threads > --

[jira] [Commented] (NUTCH-1735) code dedup fetcher queue redirects

2014-04-05 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13961185#comment-13961185 ] Hudson commented on NUTCH-1735: --- SUCCESS: Integrated in Nutch-trunk #2591 (See [https://bui

[jira] [Resolved] (NUTCH-1735) code dedup fetcher queue redirects

2014-04-05 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-1735. Resolution: Fixed Committed to trunk r1585144. > code dedup fetcher queue redirects >

[jira] [Commented] (NUTCH-1687) Pick queue in Round Robin

2014-04-05 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13961009#comment-13961009 ] Julien Nioche commented on NUTCH-1687: -- I like the idea but am a bit concerned by the

[jira] [Commented] (NUTCH-1735) code dedup fetcher queue redirects

2014-04-05 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13961002#comment-13961002 ] Julien Nioche commented on NUTCH-1735: -- +1 Nice to simplify the code of the Fetcher

[jira] [Assigned] (NUTCH-207) Bandwidth target for fetcher rather than a thread count

2014-04-05 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche reassigned NUTCH-207: --- Assignee: Julien Nioche Will see if I can port this patch to the current version of the Fetche

[jira] [Updated] (NUTCH-1747) Use AtomicInteger as semaphore in Fetcher

2014-04-05 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-1747: - Attachment: NUTCH-1747-trunk.patch > Use AtomicInteger as semaphore in Fetcher >

[jira] [Created] (NUTCH-1747) Use AtomicInteger as semaphore in Fetcher

2014-04-05 Thread Julien Nioche (JIRA)
Julien Nioche created NUTCH-1747: Summary: Use AtomicInteger as semaphore in Fetcher Key: NUTCH-1747 URL: https://issues.apache.org/jira/browse/NUTCH-1747 Project: Nutch Issue Type: Improveme