RE: [RESULT] [VOTE] Moving to Git

2016-02-26 Thread Markus Jelsma
University of Southern California, Los Angeles, CA 90089 USA > ++++++ > > > > > > -Original Message- > From: Markus Jelsma > Reply-To: "dev@nutch.apache.org" > Date: Monday, February 22, 2016 at 1:54

[jira] [Commented] (NUTCH-2231) Jexl support in generator job

2016-02-25 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15167409#comment-15167409 ] Markus Jelsma commented on NUTCH-2231: -- Proper null check. Committed to t

[jira] [Reopened] (NUTCH-2231) Jexl support in generator job

2016-02-25 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma reopened NUTCH-2231: -- If no expression is set, an error is logged, which it shouldn't be. > Jexl support in gener

[jira] [Commented] (NUTCH-1687) Pick queue in Round Robin

2016-02-24 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15163365#comment-15163365 ] Markus Jelsma commented on NUTCH-1687: -- Hi Tien - where did you patch and comm

[jira] [Resolved] (NUTCH-2231) Jexl support in generator job

2016-02-24 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma resolved NUTCH-2231. -- Resolution: Fixed Committed to trunk in revision 1732177. This Jexl stuff is awesome! > J

[jira] [Updated] (NUTCH-2231) Jexl support in generator job

2016-02-24 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-2231: - Attachment: NUTCH-2231.patch Updated patch that transforms hyphens in field identifiers to

[jira] [Updated] (NUTCH-2229) Allow Jexl expressions on CrawlDatum's fixed attributes

2016-02-24 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-2229: - Description: CrawlDatum allows Jexl expressions on its metadata fields nicely, but it lacks the

[jira] [Updated] (NUTCH-2231) Jexl support in generator job

2016-02-24 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-2231: - Description: Generator should support Jexl expressions. This would make it much easier to

[jira] [Updated] (NUTCH-2231) Jexl support in generator job

2016-02-24 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-2231: - Attachment: NUTCH-2231.patch Patch for trunk! It adds a JexlUtil where the expression parsing is

[jira] [Resolved] (NUTCH-1179) Option to restrict generated records by metadata

2016-02-24 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma resolved NUTCH-1179. -- Resolution: Duplicate > Option to restrict generated records by metad

[jira] [Closed] (NUTCH-1179) Option to restrict generated records by metadata

2016-02-24 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-1179. > Option to restrict generated records by metad

[jira] [Closed] (NUTCH-2215) Generator to restrict crawl to mime type

2016-02-24 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-2215. > Generator to restrict crawl to mime t

[jira] [Resolved] (NUTCH-2215) Generator to restrict crawl to mime type

2016-02-24 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma resolved NUTCH-2215. -- Resolution: Duplicate > Generator to restrict crawl to mime t

[jira] [Updated] (NUTCH-2215) Generator to restrict crawl to mime type

2016-02-24 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-2215: - Affects Version/s: (was: 1.11) > Generator to restrict crawl to mime t

[jira] [Updated] (NUTCH-2215) Generator to restrict crawl to mime type

2016-02-24 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-2215: - Fix Version/s: (was: 1.12) > Generator to restrict crawl to mime t

[jira] [Resolved] (NUTCH-2232) DeduplicationJob should decode URL's before length is compared

2016-02-24 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma resolved NUTCH-2232. -- Resolution: Fixed Assignee: Markus Jelsma Committed to trunk in revision 1732160. Thanks

[jira] [Updated] (NUTCH-2232) DeduplicationJob should decode URL's before length is compared

2016-02-24 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-2232: - Attachment: NUTCH-2232.patch Updated patch with only the following modification: * moved imports

[jira] [Updated] (NUTCH-2232) DeduplicationJob should decode URL's before length is compared

2016-02-24 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-2232: - Summary: DeduplicationJob should decode URL's before length is compared (was: Deduplicati

[jira] [Commented] (NUTCH-2232) DeduplicationJob: Url is not decoded before the url length is compared.

2016-02-24 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15163003#comment-15163003 ] Markus Jelsma commented on NUTCH-2232: -- Yes, there is clearly a difference in le

[jira] [Updated] (NUTCH-2232) DeduplicationJob: Url is not decoded before the url length is compared.

2016-02-24 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-2232: - Fix Version/s: 1.12 > DeduplicationJob: Url is not decoded before the url length is compa

[jira] [Updated] (NUTCH-2232) DeduplicationJob: Url is not decoded before the url length is compared.

2016-02-24 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-2232: - Affects Version/s: 1.11 > DeduplicationJob: Url is not decoded before the url length is compa

[jira] [Resolved] (NUTCH-2229) Allow Jexl expressions on CrawlDatum's fixed attributes

2016-02-24 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma resolved NUTCH-2229. -- Resolution: Fixed Committed to trunk in revision 1732140. > Allow Jexl expressions

[jira] [Commented] (NUTCH-2229) Allow Jexl expressions on CrawlDatum's fixed attributes

2016-02-24 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15162945#comment-15162945 ] Markus Jelsma commented on NUTCH-2229: -- Ah, this works very nicely! I'

[jira] [Created] (NUTCH-2231) Jexl support in generator job

2016-02-24 Thread Markus Jelsma (JIRA)
Markus Jelsma created NUTCH-2231: Summary: Jexl support in generator job Key: NUTCH-2231 URL: https://issues.apache.org/jira/browse/NUTCH-2231 Project: Nutch Issue Type: Improvement

[jira] [Updated] (NUTCH-2229) Allow Jexl expressions on CrawlDatum's fixed attributes

2016-02-24 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-2229: - Attachment: NUTCH-2229.patch Patch for trunk! > Allow Jexl expressions on CrawlDatum'

[jira] [Updated] (NUTCH-2229) Allow Jexl expressions on CrawlDatum's fixed attributes

2016-02-24 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-2229: - Patch Info: Patch Available Description: CrawlDatum allows Jexl expressions on its metadata

[jira] [Created] (NUTCH-2229) Allow Jexl expressions on CrawlDatum's fixed attributes

2016-02-23 Thread Markus Jelsma (JIRA)
Markus Jelsma created NUTCH-2229: Summary: Allow Jexl expressions on CrawlDatum's fixed attributes Key: NUTCH-2229 URL: https://issues.apache.org/jira/browse/NUTCH-2229 Project: Nutch

[jira] [Resolved] (NUTCH-2227) RegexParseFilter

2016-02-23 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma resolved NUTCH-2227. -- Resolution: Fixed Committed to trunk in revision 1731849. > RegexParseFil

[jira] [Updated] (NUTCH-2227) RegexParseFilter

2016-02-23 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-2227: - Attachment: NUTCH-2227.patch Updated patch. conf/regex-parsefilter.txt was missing in the patch

[jira] [Comment Edited] (NUTCH-2227) RegexParseFilter

2016-02-23 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15158808#comment-15158808 ] Markus Jelsma edited comment on NUTCH-2227 at 2/23/16 12:4

[jira] [Updated] (NUTCH-2227) RegexParseFilter

2016-02-23 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-2227: - Attachment: NUTCH-2227.patch Updated patch. It now includes package-info.java. Will commit

[jira] [Updated] (NUTCH-2216) db.ignore.*.links to optionally follow internal redirects

2016-02-23 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-2216: - Attachment: NUTCH-2216.patch Updated patch for trunk. And included second and third comments by

[jira] [Resolved] (NUTCH-2221) Introduce db.ignore.internal.links to FetcherThread

2016-02-23 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma resolved NUTCH-2221. -- Resolution: Fixed Assignee: Markus Jelsma > Introduce db.ignore.internal.links

[jira] [Commented] (NUTCH-2221) Introduce db.ignore.internal.links to FetcherThread

2016-02-23 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15158691#comment-15158691 ] Markus Jelsma commented on NUTCH-2221: -- Committed to trunk in revision 173

[jira] [Comment Edited] (NUTCH-2144) Plugin to override db.ignore.external to exempt interesting external domain URLs

2016-02-23 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15158684#comment-15158684 ] Markus Jelsma edited comment on NUTCH-2144 at 2/23/16 10:3

[jira] [Commented] (NUTCH-2144) Plugin to override db.ignore.external to exempt interesting external domain URLs

2016-02-23 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15158684#comment-15158684 ] Markus Jelsma commented on NUTCH-2144: -- ParseOutputFormat.filterNorma

[jira] [Updated] (NUTCH-2221) Introduce db.ignore.internal.links to FetcherThread

2016-02-23 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-2221: - Attachment: NUTCH-2221.patch Updated patch for current trunk revision. Will commit shortly

[jira] [Resolved] (NUTCH-2220) Rename db.* options used only by the linkdb to linkdb.*

2016-02-23 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma resolved NUTCH-2220. -- Resolution: Fixed Committed to trunk in revision 1731831. Thanks for your comments Sebastian

[jira] [Updated] (NUTCH-2220) Rename db.* options used only by the linkdb to linkdb.*

2016-02-23 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-2220: - Description: We need an option db.ignore.internal.links that operates in FetcherThread, just

[jira] [Updated] (NUTCH-2220) Rename db.* options used only by the linkdb to linkdb.*

2016-02-23 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-2220: - Description: We need an option db.ignore.internal.links that operates in FetcherThread, just

[jira] [Updated] (NUTCH-2220) Rename db.* options used only by the linkdb to linkdb.*

2016-02-23 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-2220: - Description: We need an option db.ignore.internal.links that operates in FetcherThread, just

[jira] [Comment Edited] (NUTCH-2220) Rename db.* options used only by the linkdb to linkdb.*

2016-02-23 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15158651#comment-15158651 ] Markus Jelsma edited comment on NUTCH-2220 at 2/23/16 10:0

[jira] [Commented] (NUTCH-2220) Rename db.* options used only by the linkdb to linkdb.*

2016-02-23 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15158651#comment-15158651 ] Markus Jelsma commented on NUTCH-2220: -- Yes, i would opt for an incompatibility

[jira] [Resolved] (NUTCH-2228) Plugin index-replace unit test broken on Java 8

2016-02-23 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma resolved NUTCH-2228. -- Resolution: Fixed Committed to trunk in revision 1731824. Thanks Sebastian! > Plugin in

[jira] [Updated] (NUTCH-2228) Plugin index-replace unit test broken on Java 8

2016-02-23 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-2228: - Summary: Plugin index-replace unit test broken on Java 8 (was: index-replace unit test fails

[jira] [Commented] (NUTCH-2228) index-replace unit test fails

2016-02-23 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15158633#comment-15158633 ] Markus Jelsma commented on NUTCH-2228: -- Ah i see! Your patch addresses the pro

[jira] [Assigned] (NUTCH-2228) index-replace unit test fails

2016-02-23 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma reassigned NUTCH-2228: Assignee: Markus Jelsma > index-replace unit test fa

[jira] [Created] (NUTCH-2228) index-replace unit test fails

2016-02-22 Thread Markus Jelsma (JIRA)
Markus Jelsma created NUTCH-2228: Summary: index-replace unit test fails Key: NUTCH-2228 URL: https://issues.apache.org/jira/browse/NUTCH-2228 Project: Nutch Issue Type: Bug

[jira] [Work stopped] (NUTCH-2227) RegexParseFilter

2016-02-22 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on NUTCH-2227 stopped by Markus Jelsma. > RegexParseFilter > > > Key

[jira] [Updated] (NUTCH-2227) RegexParseFilter

2016-02-22 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-2227: - Attachment: NUTCH-2227.patch Updated patch, added negative test. Which works. Will commit

[jira] [Updated] (NUTCH-2227) RegexParseFilter

2016-02-22 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-2227: - Attachment: NUTCH-2227.patch Updated patch, build.xml was missing > RegexParseFil

[jira] [Updated] (NUTCH-2227) RegexParseFilter

2016-02-22 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-2227: - Attachment: NUTCH-2227.patch Patch for trunk! Tests pass. > RegexParseFil

[jira] [Work started] (NUTCH-2227) RegexParseFilter

2016-02-22 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on NUTCH-2227 started by Markus Jelsma. > RegexParseFilter > > > Key

[jira] [Updated] (NUTCH-2227) RegexParseFilter

2016-02-22 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-2227: - Description: A parse filter that takes a regex and a field name. If regex matches via

[jira] [Created] (NUTCH-2227) RegexParseFilter

2016-02-22 Thread Markus Jelsma (JIRA)
Markus Jelsma created NUTCH-2227: Summary: RegexParseFilter Key: NUTCH-2227 URL: https://issues.apache.org/jira/browse/NUTCH-2227 Project: Nutch Issue Type: New Feature Components

[jira] [Updated] (NUTCH-2219) Criteria order to be configurable in DeduplicationJob

2016-02-22 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-2219: - Fix Version/s: 1.12 > Criteria order to be configurable in Deduplication

[jira] [Updated] (NUTCH-2219) Criteria order to be configurable in DeduplicationJob

2016-02-22 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-2219: - Affects Version/s: 1.11 > Criteria order to be configurable in Deduplication

[jira] [Resolved] (NUTCH-2219) Criteria order to be configurable in DeduplicationJob

2016-02-22 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma resolved NUTCH-2219. -- Resolution: Fixed Committed to trunk in revision 1731651. Thanks Ron van der Vegt > Crite

[jira] [Commented] (NUTCH-2226) SOLR mismatch in deploy mode

2016-02-22 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15157027#comment-15157027 ] Markus Jelsma commented on NUTCH-2226: -- Hello - how is this related? Are you u

[jira] [Commented] (NUTCH-2220) Rename db.* options used only by the linkdb to linkdb.*

2016-02-22 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15156711#comment-15156711 ] Markus Jelsma commented on NUTCH-2220: -- Any comments to this change, e.g. sepa

RE: [RESULT] [VOTE] Moving to Git

2016-02-22 Thread Markus Jelsma
Can someone please put up a small howto somewhere? I need to know how to: * check out trunk * check out a specific tag * do a svn up * create a patch, e.g. svn diff * perform a commit Thanks, Markus -Original message- > From:Mattmann, Chris A (3980) > Sent: Sunday 21st February 2016 1

[jira] [Updated] (NUTCH-2219) Criteria order to be configurable in DeduplicationJob

2016-02-19 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-2219: - Description: Current implementation: "This command takes a path to a crawldb as paramete

[jira] [Updated] (NUTCH-2219) Criteria order to be configurable in DeduplicationJob

2016-02-19 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-2219: - Summary: Criteria order to be configurable in DeduplicationJob (was: Dedup script, allow users

[jira] [Updated] (NUTCH-2219) Dedup script, allow users to change the order in which main documents are selected.

2016-02-19 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-2219: - Attachment: NUTCH-2219.patch Thanks, looks fine! Slightly updated patch: * changed usage output

[jira] [Assigned] (NUTCH-2219) Dedup script, allow users to change the order in which main documents are selected.

2016-02-19 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma reassigned NUTCH-2219: Assignee: Markus Jelsma > Dedup script, allow users to change the order in which m

[jira] [Commented] (NUTCH-2191) Add protocol-htmlunit

2016-02-18 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15152221#comment-15152221 ] Markus Jelsma commented on NUTCH-2191: -- 1. although that could work, it does

[jira] [Comment Edited] (NUTCH-2191) Add protocol-htmlunit

2016-02-18 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15152184#comment-15152184 ] Markus Jelsma edited comment on NUTCH-2191 at 2/18/16 11:3

[jira] [Commented] (NUTCH-2191) Add protocol-htmlunit

2016-02-18 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15152184#comment-15152184 ] Markus Jelsma commented on NUTCH-2191: -- 1. ah yes, we still need to fix this c

[jira] [Commented] (NUTCH-2191) Add protocol-htmlunit

2016-02-18 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15152141#comment-15152141 ] Markus Jelsma commented on NUTCH-2191: -- Hello Kshijtij - well no, certainly no

[jira] [Commented] (NUTCH-2191) Add protocol-htmlunit

2016-02-18 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15152140#comment-15152140 ] Markus Jelsma commented on NUTCH-2191: -- Hi - it works indeed. But new prob

[jira] [Commented] (NUTCH-2191) Add protocol-htmlunit

2016-02-17 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15151154#comment-15151154 ] Markus Jelsma commented on NUTCH-2191: -- Hi Karanjeet - looks like the only cha

[jira] [Resolved] (NUTCH-2223) Upgrade xercesImpl to 2.11.0 to fix hang on issue in tika mimetype detection

2016-02-17 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma resolved NUTCH-2223. -- Resolution: Fixed Committed to trunk in revision 1730808. > Upgrade xercesImpl to 2.11.0

[jira] [Commented] (NUTCH-2223) Upgrade xercesImpl to 2.11.0 to fix hang on issue in tika mimetype detection

2016-02-17 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15150264#comment-15150264 ] Markus Jelsma commented on NUTCH-2223: -- Thanks Tien Nguyen Manh! >

[jira] [Commented] (NUTCH-2223) Upgrade xercesImpl to 2.11.0 to fix hang on issue in tika mimetype detection

2016-02-17 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15150248#comment-15150248 ] Markus Jelsma commented on NUTCH-2223: -- Incredible, i tried the tika-breaker.

[jira] [Assigned] (NUTCH-2223) Upgrade xercesImpl to 2.11.0 to fix hang on issue in tika mimetype detection

2016-02-17 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma reassigned NUTCH-2223: Assignee: Markus Jelsma > Upgrade xercesImpl to 2.11.0 to fix hang on issue in t

[jira] [Updated] (NUTCH-2223) Upgrade xercesImpl to 2.11.0 to fix hang on issue in tika mimetype detection

2016-02-17 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-2223: - Priority: Major (was: Minor) > Upgrade xercesImpl to 2.11.0 to fix hang on issue in t

[jira] [Updated] (NUTCH-2223) Upgrade xercesImpl to 2.11.0 to fix hang on issue in tika mimetype detection

2016-02-17 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-2223: - Description: Stacktrace for the hang seems to be: {code} at

[jira] [Updated] (NUTCH-2223) Upgrade xercesImpl to 2.11.0 to fix hang on issue in tika mimetype detection

2016-02-17 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-2223: - Fix Version/s: 1.12 > Upgrade xercesImpl to 2.11.0 to fix hang on issue in tika mimet

[jira] [Updated] (NUTCH-2223) Upgrade xercesImpl to 2.11.0 to fix hang on issue in tika mimetype detection

2016-02-17 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-2223: - Description: {code}Stacktrace for the hang seems to be: at

[jira] [Updated] (NUTCH-2224) Average bytes/second calculated incorrectly in fetcher

2016-02-17 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-2224: - Component/s: fetcher > Average bytes/second calculated incorrectly in fetc

[jira] [Updated] (NUTCH-2224) Average bytes/second calculated incorrectly in fetcher

2016-02-17 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-2224: - Affects Version/s: 1.11 > Average bytes/second calculated incorrectly in fetc

[jira] [Updated] (NUTCH-2224) Average bytes/second calculated incorrectly in fetcher

2016-02-17 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-2224: - Fix Version/s: 1.12 > Average bytes/second calculated incorrectly in fetc

[jira] [Resolved] (NUTCH-2224) Average bytes/second calculated incorrectly in fetcher

2016-02-17 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma resolved NUTCH-2224. -- Resolution: Fixed Committed to trunk in revision 1730803. Thanks Tien Nguyen Manh! > Aver

[jira] [Updated] (NUTCH-2224) Average bytes/second calculated incorrectly in fetcher

2016-02-17 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-2224: - Summary: Average bytes/second calculated incorrectly in fetcher (was: Wrong metric compute in

[jira] [Assigned] (NUTCH-2224) Wrong metric compute in Fetcher status report

2016-02-17 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma reassigned NUTCH-2224: Assignee: Markus Jelsma > Wrong metric compute in Fetcher status rep

[jira] [Resolved] (NUTCH-2225) Parsed time calculated incorrectly

2016-02-17 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma resolved NUTCH-2225. -- Resolution: Fixed Committed to trunk in revision 1730802. Thanks Tien Nguyen Manh! > Par

[jira] [Updated] (NUTCH-2225) Parsed time calculated incorrectly

2016-02-17 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-2225: - Summary: Parsed time calculated incorrectly (was: Parsed time not include time to parse

[jira] [Assigned] (NUTCH-2225) Parsed time not include time to parse

2016-02-17 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma reassigned NUTCH-2225: Assignee: Markus Jelsma > Parsed time not include time to pa

[jira] [Updated] (NUTCH-2225) Parsed time not include time to parse

2016-02-17 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-2225: - Affects Version/s: 1.11 > Parsed time not include time to pa

[jira] [Updated] (NUTCH-2225) Parsed time not include time to parse

2016-02-17 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-2225: - Fix Version/s: 1.12 > Parsed time not include time to pa

[jira] [Resolved] (NUTCH-961) Expose Tika's boilerpipe support

2016-02-16 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma resolved NUTCH-961. - Resolution: Fixed Committed to trunk in revision 1730694. Thanks everyone for contributions

[jira] [Updated] (NUTCH-961) Expose Tika's boilerpipe support

2016-02-16 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-961: Attachment: NUTCH-961.patch Updated patch. ExtractorRepository was missing. > Expose Tik

[jira] [Updated] (NUTCH-961) Expose Tika's boilerpipe support

2016-02-16 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-961: Fix Version/s: 1.12 > Expose Tika's boilerpipe

[jira] [Updated] (NUTCH-961) Expose Tika's boilerpipe support

2016-02-16 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-961: Affects Version/s: 1.11 > Expose Tika's boilerpipe

[jira] [Commented] (NUTCH-961) Expose Tika's boilerpipe support

2016-02-16 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15148642#comment-15148642 ] Markus Jelsma commented on NUTCH-961: - Tests pass as expected and Boilerpipe as

[jira] [Updated] (NUTCH-961) Expose Tika's boilerpipe support

2016-02-16 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-961: Description: Tika 0.8 comes with the Boilerpipe content handler which can be used to extract

[jira] [Updated] (NUTCH-961) Expose Tika's boilerpipe support

2016-02-16 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-961: Description: Tika 0.8 comes with the Boilerpipe content handler which can be used to extract

[jira] [Updated] (NUTCH-961) Expose Tika's boilerpipe support

2016-02-16 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-961: Attachment: NUTCH-961.patch Patch for trunk. > Expose Tika's boilerpipe

[jira] [Resolved] (NUTCH-1233) Rely on Tika for outlink extraction

2016-02-16 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma resolved NUTCH-1233. -- Resolution: Fixed Committed to trunk in revision 1730687. > Rely on Tika for outl

[jira] [Updated] (NUTCH-1233) Rely on Tika for outlink extraction

2016-02-16 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1233: - Affects Version/s: 1.11 > Rely on Tika for outlink extract
