[jira] [Commented] (NUTCH-1714) Nutch 2.x upgrade to use GORA_94 branch

2014-04-16 Thread Matzz (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13970730#comment-13970730 ] Matzz commented on NUTCH-1714: -- This patch is not compatible w index-metadata plugin {code}

[jira] [Commented] (NUTCH-1676) Add rudimentary SSL support to protocol-http

2014-04-16 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13971452#comment-13971452 ] Julien Nioche commented on NUTCH-1676: -- Hi Markus - any progress on this issue? Would

[jira] [Resolved] (NUTCH-1720) Duplicate lines in HttpBase.java

2014-04-16 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche resolved NUTCH-1720. -- Resolution: Fixed Thanks Walter! Committed revision 1587923. Duplicate lines in

[jira] [Commented] (NUTCH-1147) WebGraph nodeDumper uses only 1 reducer

2014-04-16 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13971471#comment-13971471 ] Julien Nioche commented on NUTCH-1147: -- Good idea not to force it to 1 but what about

[jira] [Resolved] (NUTCH-1603) ZIP parser complains about truncated PDF file

2014-04-16 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche resolved NUTCH-1603. -- Resolution: Fixed Committed revision 1587928. ZIP parser complains about truncated PDF file

[jira] [Commented] (NUTCH-1521) CrawlDbFilter pass null url to urlNormailzers

2014-04-16 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13971486#comment-13971486 ] Julien Nioche commented on NUTCH-1521: -- Can we close this one? CrawlDbFilter pass

[jira] [Commented] (NUTCH-1697) SegmentMerger to implement Tool

2014-04-16 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13971477#comment-13971477 ] Julien Nioche commented on NUTCH-1697: -- Hi Markus. Actually it does matter and BTW

[jira] [Resolved] (NUTCH-1743) parsechecker to show outlinks

2014-04-16 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche resolved NUTCH-1743. -- Resolution: Fixed Committed revision 1587935. parsechecker to show outlinks

[jira] [Comment Edited] (NUTCH-1743) parsechecker to show outlinks

2014-04-16 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13971508#comment-13971508 ] Julien Nioche edited comment on NUTCH-1743 at 4/16/14 2:56 PM:

[jira] [Commented] (NUTCH-1743) parsechecker to show outlinks

2014-04-16 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13971511#comment-13971511 ] Julien Nioche commented on NUTCH-1743: -- 2-x : Committed revision 1587936.

[jira] [Issue Comment Deleted] (NUTCH-1743) parsechecker to show outlinks

2014-04-16 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-1743: - Comment: was deleted (was: Trunk Committed revision 1587935. ) parsechecker to show outlinks

[jira] [Created] (NUTCH-1757) ParserChecker to take custom metadata as input

2014-04-16 Thread Julien Nioche (JIRA)
Julien Nioche created NUTCH-1757: Summary: ParserChecker to take custom metadata as input Key: NUTCH-1757 URL: https://issues.apache.org/jira/browse/NUTCH-1757 Project: Nutch Issue Type:

[jira] [Updated] (NUTCH-1757) ParserChecker to take custom metadata as input

2014-04-16 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-1757: - Attachment: NUTCH-1757.patch ParserChecker to take custom metadata as input

[jira] [Commented] (NUTCH-1603) ZIP parser complains about truncated PDF file

2014-04-16 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13971545#comment-13971545 ] Hudson commented on NUTCH-1603: --- SUCCESS: Integrated in Nutch-trunk #2606 (See

[jira] [Commented] (NUTCH-1743) parsechecker to show outlinks

2014-04-16 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13971546#comment-13971546 ] Hudson commented on NUTCH-1743: --- SUCCESS: Integrated in Nutch-trunk #2606 (See

[jira] [Created] (NUTCH-1758) IndexChecker to send document to IndexWriters

2014-04-16 Thread Julien Nioche (JIRA)
Julien Nioche created NUTCH-1758: Summary: IndexChecker to send document to IndexWriters Key: NUTCH-1758 URL: https://issues.apache.org/jira/browse/NUTCH-1758 Project: Nutch Issue Type:

[jira] [Updated] (NUTCH-1758) IndexChecker to send document to IndexWriters

2014-04-16 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-1758: - Attachment: NUTCH-1758.patch IndexChecker to send document to IndexWriters

[jira] [Commented] (NUTCH-1758) IndexChecker to send document to IndexWriters

2014-04-16 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13971551#comment-13971551 ] Julien Nioche commented on NUTCH-1758: -- The parameter -D doIndex=true must be either

[jira] [Commented] (NUTCH-1743) parsechecker to show outlinks

2014-04-16 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13971644#comment-13971644 ] Hudson commented on NUTCH-1743: --- SUCCESS: Integrated in Nutch-nutchgora #991 (See

[jira] [Updated] (NUTCH-1566) bin/nutch to allow whitespace in paths

2014-04-16 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-1566: --- Attachment: NUTCH-1566-2x.patch NUTCH-1566-v3-trunk.patch * patch updated

[jira] [Commented] (NUTCH-1603) ZIP parser complains about truncated PDF file

2014-04-16 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13972023#comment-13972023 ] Lewis John McGibbney commented on NUTCH-1603: - Committed @ revision 1588088 in

[jira] [Updated] (NUTCH-1603) ZIP parser complains about truncated PDF file

2014-04-16 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-1603: Fix Version/s: 2.3 ZIP parser complains about truncated PDF file

[jira] [Resolved] (NUTCH-1473) Column length too big for column 'text' (max = 21845); use BLOB or TEXT instead

2014-04-16 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved NUTCH-1473. - Resolution: Won't Fix This won't be touched and should be closed as such.

[jira] [Closed] (NUTCH-1473) Column length too big for column 'text' (max = 21845); use BLOB or TEXT instead

2014-04-16 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney closed NUTCH-1473. --- Column length too big for column 'text' (max = 21845); use BLOB or TEXT instead

[jira] [Updated] (NUTCH-992) SolrDedup is broken in 2.x

2014-04-16 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-992: --- Fix Version/s: (was: 2.4) 2.3 SolrDedup is broken in 2.x

[jira] [Closed] (NUTCH-970) Injector job crashes with MySQL with table collation set to utf8_general_ci

2014-04-16 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney closed NUTCH-970. -- Injector job crashes with MySQL with table collation set to utf8_general_ci

[jira] [Reopened] (NUTCH-970) Injector job crashes with MySQL with table collation set to utf8_general_ci

2014-04-16 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney reopened NUTCH-970: Injector job crashes with MySQL with table collation set to utf8_general_ci

[jira] [Resolved] (NUTCH-970) Injector job crashes with MySQL with table collation set to utf8_general_ci

2014-04-16 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved NUTCH-970. Resolution: Won't Fix Injector job crashes with MySQL with table collation set to

[jira] [Updated] (NUTCH-970) Injector job crashes with MySQL with table collation set to utf8_general_ci

2014-04-16 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-970: --- Fix Version/s: (was: 2.4) 2.3 Injector job crashes with MySQL

[jira] [Closed] (NUTCH-970) Injector job crashes with MySQL with table collation set to utf8_general_ci

2014-04-16 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney closed NUTCH-970. -- Injector job crashes with MySQL with table collation set to utf8_general_ci

[jira] [Updated] (NUTCH-1473) Column length too big for column 'text' (max = 21845); use BLOB or TEXT instead

2014-04-16 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-1473: Fix Version/s: (was: 2.4) 2.3 Column length too big for

[jira] [Resolved] (NUTCH-1473) Column length too big for column 'text' (max = 21845); use BLOB or TEXT instead

2014-04-16 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved NUTCH-1473. - Resolution: Won't Fix Column length too big for column 'text' (max = 21845);

[jira] [Reopened] (NUTCH-1473) Column length too big for column 'text' (max = 21845); use BLOB or TEXT instead

2014-04-16 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney reopened NUTCH-1473: - Column length too big for column 'text' (max = 21845); use BLOB or TEXT instead

[jira] [Closed] (NUTCH-1473) Column length too big for column 'text' (max = 21845); use BLOB or TEXT instead

2014-04-16 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney closed NUTCH-1473. --- Column length too big for column 'text' (max = 21845); use BLOB or TEXT instead

[jira] [Created] (NUTCH-1759) Upgrade to Crawler Commons 0.4

2014-04-16 Thread Lewis John McGibbney (JIRA)
Lewis John McGibbney created NUTCH-1759: --- Summary: Upgrade to Crawler Commons 0.4 Key: NUTCH-1759 URL: https://issues.apache.org/jira/browse/NUTCH-1759 Project: Nutch Issue Type: Bug

[jira] [Updated] (NUTCH-1308) Unnecessary truncate content configuration, and logging in parse-zip/ZipParser

2014-04-16 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-1308: --- Attachment: NUTCH-1308-ZipParser-main-trunk.patch Hi [~lewismc], is this fixed with

[jira] [Commented] (NUTCH-1605) mime type detector recognizes xlsx as zip file

2014-04-16 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13972053#comment-13972053 ] Sebastian Nagel commented on NUTCH-1605: Changes to MIME magic may result in

[jira] [Commented] (NUTCH-1308) Unnecessary truncate content configuration, and logging in parse-zip/ZipParser

2014-04-16 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13972051#comment-13972051 ] Lewis John McGibbney commented on NUTCH-1308: - Thanks Seb. I'll dig in to this

[jira] [Commented] (NUTCH-1603) ZIP parser complains about truncated PDF file

2014-04-16 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13972217#comment-13972217 ] Hudson commented on NUTCH-1603: --- SUCCESS: Integrated in Nutch-nutchgora #992 (See

[jira] [Closed] (NUTCH-1521) CrawlDbFilter pass null url to urlNormailzers

2014-04-16 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lufeng closed NUTCH-1521. - Resolution: Fixed Fix Version/s: (was: 2.4) 1.9 CrawlDbFilter pass null url to