> University of Southern California, Los Angeles, CA 90089 USA
> ++++++
>
> -Original Message-
> From: Markus Jelsma
> Reply-To: "dev@nutch.apache.org"
> Date: Monday, February 22, 2016 at 1:54
[
https://issues.apache.org/jira/browse/NUTCH-2231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15167409#comment-15167409
]
Markus Jelsma commented on NUTCH-2231:
--
Proper null check. Committed to t
[
https://issues.apache.org/jira/browse/NUTCH-2231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma reopened NUTCH-2231:
--
If no expression is set, an error is logged, which it shouldn't be.
> Jexl support in gener
[
https://issues.apache.org/jira/browse/NUTCH-1687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15163365#comment-15163365
]
Markus Jelsma commented on NUTCH-1687:
--
Hi Tien - where did you patch and comm
[
https://issues.apache.org/jira/browse/NUTCH-2231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma resolved NUTCH-2231.
--
Resolution: Fixed
Committed to trunk in revision 1732177. This Jexl stuff is awesome!
> J
[
https://issues.apache.org/jira/browse/NUTCH-2231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2231:
-
Attachment: NUTCH-2231.patch
Updated patch that transforms hyphens in field identifiers to
[
https://issues.apache.org/jira/browse/NUTCH-2229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2229:
-
Description:
CrawlDatum allows Jexl expressions on its metadata fields nicely, but it lacks
the
[
https://issues.apache.org/jira/browse/NUTCH-2231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2231:
-
Description:
Generator should support Jexl expressions. This would make it much easier to
[
https://issues.apache.org/jira/browse/NUTCH-2231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2231:
-
Attachment: NUTCH-2231.patch
Patch for trunk! It adds a JexlUtil where the expression parsing is
[
https://issues.apache.org/jira/browse/NUTCH-1179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma resolved NUTCH-1179.
--
Resolution: Duplicate
> Option to restrict generated records by metad
[
https://issues.apache.org/jira/browse/NUTCH-1179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma closed NUTCH-1179.
> Option to restrict generated records by metad
[
https://issues.apache.org/jira/browse/NUTCH-2215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma closed NUTCH-2215.
> Generator to restrict crawl to mime t
[
https://issues.apache.org/jira/browse/NUTCH-2215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma resolved NUTCH-2215.
--
Resolution: Duplicate
> Generator to restrict crawl to mime t
[
https://issues.apache.org/jira/browse/NUTCH-2215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2215:
-
Affects Version/s: (was: 1.11)
> Generator to restrict crawl to mime t
[
https://issues.apache.org/jira/browse/NUTCH-2215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2215:
-
Fix Version/s: (was: 1.12)
> Generator to restrict crawl to mime t
[
https://issues.apache.org/jira/browse/NUTCH-2232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma resolved NUTCH-2232.
--
Resolution: Fixed
Assignee: Markus Jelsma
Committed to trunk in revision 1732160. Thanks
[
https://issues.apache.org/jira/browse/NUTCH-2232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2232:
-
Attachment: NUTCH-2232.patch
Updated patch with only the following modification:
* moved imports
[
https://issues.apache.org/jira/browse/NUTCH-2232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2232:
-
Summary: DeduplicationJob should decode URL's before length is compared
(was: Deduplicati
[
https://issues.apache.org/jira/browse/NUTCH-2232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15163003#comment-15163003
]
Markus Jelsma commented on NUTCH-2232:
--
Yes, there is clearly a difference in le
[
https://issues.apache.org/jira/browse/NUTCH-2232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2232:
-
Fix Version/s: 1.12
> DeduplicationJob: Url is not decoded before the url length is compa
[
https://issues.apache.org/jira/browse/NUTCH-2232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2232:
-
Affects Version/s: 1.11
> DeduplicationJob: Url is not decoded before the url length is compa
[
https://issues.apache.org/jira/browse/NUTCH-2229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma resolved NUTCH-2229.
--
Resolution: Fixed
Committed to trunk in revision 1732140.
> Allow Jexl expressions
[
https://issues.apache.org/jira/browse/NUTCH-2229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15162945#comment-15162945
]
Markus Jelsma commented on NUTCH-2229:
--
Ah, this works very nicely! I'
Markus Jelsma created NUTCH-2231:
Summary: Jexl support in generator job
Key: NUTCH-2231
URL: https://issues.apache.org/jira/browse/NUTCH-2231
Project: Nutch
Issue Type: Improvement
[
https://issues.apache.org/jira/browse/NUTCH-2229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2229:
-
Attachment: NUTCH-2229.patch
Patch for trunk!
> Allow Jexl expressions on CrawlDatum'
[
https://issues.apache.org/jira/browse/NUTCH-2229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2229:
-
Patch Info: Patch Available
Description:
CrawlDatum allows Jexl expressions on its metadata
Markus Jelsma created NUTCH-2229:
Summary: Allow Jexl expressions on CrawlDatum's fixed attributes
Key: NUTCH-2229
URL: https://issues.apache.org/jira/browse/NUTCH-2229
Project: Nutch
[
https://issues.apache.org/jira/browse/NUTCH-2227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma resolved NUTCH-2227.
--
Resolution: Fixed
Committed to trunk in revision 1731849.
> RegexParseFil
[
https://issues.apache.org/jira/browse/NUTCH-2227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2227:
-
Attachment: NUTCH-2227.patch
Updated patch. conf/regex-parsefilter.txt was missing in the patch
[
https://issues.apache.org/jira/browse/NUTCH-2227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15158808#comment-15158808
]
Markus Jelsma edited comment on NUTCH-2227 at 2/23/16 12:4
[
https://issues.apache.org/jira/browse/NUTCH-2227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2227:
-
Attachment: NUTCH-2227.patch
Updated patch. It now includes package-info.java. Will commit
[
https://issues.apache.org/jira/browse/NUTCH-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2216:
-
Attachment: NUTCH-2216.patch
Updated patch for trunk. And included second and third comments by
[
https://issues.apache.org/jira/browse/NUTCH-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma resolved NUTCH-2221.
--
Resolution: Fixed
Assignee: Markus Jelsma
> Introduce db.ignore.internal.links
[
https://issues.apache.org/jira/browse/NUTCH-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15158691#comment-15158691
]
Markus Jelsma commented on NUTCH-2221:
--
Committed to trunk in revision 173
[
https://issues.apache.org/jira/browse/NUTCH-2144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15158684#comment-15158684
]
Markus Jelsma edited comment on NUTCH-2144 at 2/23/16 10:3
[
https://issues.apache.org/jira/browse/NUTCH-2144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15158684#comment-15158684
]
Markus Jelsma commented on NUTCH-2144:
--
ParseOutputFormat.filterNorma
[
https://issues.apache.org/jira/browse/NUTCH-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2221:
-
Attachment: NUTCH-2221.patch
Updated patch for current trunk revision. Will commit shortly
[
https://issues.apache.org/jira/browse/NUTCH-2220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma resolved NUTCH-2220.
--
Resolution: Fixed
Committed to trunk in revision 1731831. Thanks for your comments Sebastian
[
https://issues.apache.org/jira/browse/NUTCH-2220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2220:
-
Description:
We need an option db.ignore.internal.links that operates in FetcherThread, just
[
https://issues.apache.org/jira/browse/NUTCH-2220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2220:
-
Description:
We need an option db.ignore.internal.links that operates in FetcherThread, just
[
https://issues.apache.org/jira/browse/NUTCH-2220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2220:
-
Description:
We need an option db.ignore.internal.links that operates in FetcherThread, just
[
https://issues.apache.org/jira/browse/NUTCH-2220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15158651#comment-15158651
]
Markus Jelsma edited comment on NUTCH-2220 at 2/23/16 10:0
[
https://issues.apache.org/jira/browse/NUTCH-2220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15158651#comment-15158651
]
Markus Jelsma commented on NUTCH-2220:
--
Yes, I would opt for an incompatibility
[
https://issues.apache.org/jira/browse/NUTCH-2228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma resolved NUTCH-2228.
--
Resolution: Fixed
Committed to trunk in revision 1731824. Thanks Sebastian!
> Plugin in
[
https://issues.apache.org/jira/browse/NUTCH-2228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2228:
-
Summary: Plugin index-replace unit test broken on Java 8 (was:
index-replace unit test fails
[
https://issues.apache.org/jira/browse/NUTCH-2228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15158633#comment-15158633
]
Markus Jelsma commented on NUTCH-2228:
--
Ah, I see! Your patch addresses the pro
[
https://issues.apache.org/jira/browse/NUTCH-2228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma reassigned NUTCH-2228:
Assignee: Markus Jelsma
> index-replace unit test fa
Markus Jelsma created NUTCH-2228:
Summary: index-replace unit test fails
Key: NUTCH-2228
URL: https://issues.apache.org/jira/browse/NUTCH-2228
Project: Nutch
Issue Type: Bug
[
https://issues.apache.org/jira/browse/NUTCH-2227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Work on NUTCH-2227 stopped by Markus Jelsma.
> RegexParseFilter
>
>
> Key
[
https://issues.apache.org/jira/browse/NUTCH-2227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2227:
-
Attachment: NUTCH-2227.patch
Updated patch, added negative test, which works. Will commit
[
https://issues.apache.org/jira/browse/NUTCH-2227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2227:
-
Attachment: NUTCH-2227.patch
Updated patch, build.xml was missing
> RegexParseFil
[
https://issues.apache.org/jira/browse/NUTCH-2227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2227:
-
Attachment: NUTCH-2227.patch
Patch for trunk! Tests pass.
> RegexParseFil
[
https://issues.apache.org/jira/browse/NUTCH-2227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Work on NUTCH-2227 started by Markus Jelsma.
> RegexParseFilter
>
>
> Key
[
https://issues.apache.org/jira/browse/NUTCH-2227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2227:
-
Description:
A parse filter that takes a regex and a field name. If regex matches via
Markus Jelsma created NUTCH-2227:
Summary: RegexParseFilter
Key: NUTCH-2227
URL: https://issues.apache.org/jira/browse/NUTCH-2227
Project: Nutch
Issue Type: New Feature
Components
[
https://issues.apache.org/jira/browse/NUTCH-2219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2219:
-
Fix Version/s: 1.12
> Criteria order to be configurable in Deduplication
[
https://issues.apache.org/jira/browse/NUTCH-2219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2219:
-
Affects Version/s: 1.11
> Criteria order to be configurable in Deduplication
[
https://issues.apache.org/jira/browse/NUTCH-2219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma resolved NUTCH-2219.
--
Resolution: Fixed
Committed to trunk in revision 1731651. Thanks Ron van der Vegt
> Crite
[
https://issues.apache.org/jira/browse/NUTCH-2226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15157027#comment-15157027
]
Markus Jelsma commented on NUTCH-2226:
--
Hello - how is this related? Are you u
[
https://issues.apache.org/jira/browse/NUTCH-2220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15156711#comment-15156711
]
Markus Jelsma commented on NUTCH-2220:
--
Any comments to this change, e.g. sepa
Can someone please put up a small howto somewhere? I need to know how to:
* check out trunk
* check out a specific tag
* do an svn up
* create a patch, e.g. svn diff
* perform a commit
Thanks,
Markus
-Original message-
> From: Mattmann, Chris A (3980)
> Sent: Sunday 21st February 2016 1
[
https://issues.apache.org/jira/browse/NUTCH-2219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2219:
-
Description:
Current implementation:
"This command takes a path to a crawldb as paramete
[
https://issues.apache.org/jira/browse/NUTCH-2219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2219:
-
Summary: Criteria order to be configurable in DeduplicationJob (was: Dedup
script, allow users
[
https://issues.apache.org/jira/browse/NUTCH-2219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2219:
-
Attachment: NUTCH-2219.patch
Thanks, looks fine!
Slightly updated patch:
* changed usage output
[
https://issues.apache.org/jira/browse/NUTCH-2219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma reassigned NUTCH-2219:
Assignee: Markus Jelsma
> Dedup script, allow users to change the order in which m
[
https://issues.apache.org/jira/browse/NUTCH-2191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15152221#comment-15152221
]
Markus Jelsma commented on NUTCH-2191:
--
1. Although that could work, it does
[
https://issues.apache.org/jira/browse/NUTCH-2191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15152184#comment-15152184
]
Markus Jelsma edited comment on NUTCH-2191 at 2/18/16 11:3
[
https://issues.apache.org/jira/browse/NUTCH-2191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15152184#comment-15152184
]
Markus Jelsma commented on NUTCH-2191:
--
1. Ah yes, we still need to fix this c
[
https://issues.apache.org/jira/browse/NUTCH-2191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15152141#comment-15152141
]
Markus Jelsma commented on NUTCH-2191:
--
Hello Kshijtij - well no, certainly no
[
https://issues.apache.org/jira/browse/NUTCH-2191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15152140#comment-15152140
]
Markus Jelsma commented on NUTCH-2191:
--
Hi - it works indeed. But new prob
[
https://issues.apache.org/jira/browse/NUTCH-2191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15151154#comment-15151154
]
Markus Jelsma commented on NUTCH-2191:
--
Hi Karanjeet - looks like the only cha
[
https://issues.apache.org/jira/browse/NUTCH-2223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma resolved NUTCH-2223.
--
Resolution: Fixed
Committed to trunk in revision 1730808.
> Upgrade xercesImpl to 2.11.0
[
https://issues.apache.org/jira/browse/NUTCH-2223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15150264#comment-15150264
]
Markus Jelsma commented on NUTCH-2223:
--
Thanks Tien Nguyen Manh!
>
[
https://issues.apache.org/jira/browse/NUTCH-2223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15150248#comment-15150248
]
Markus Jelsma commented on NUTCH-2223:
--
Incredible, I tried the tika-breaker.
[
https://issues.apache.org/jira/browse/NUTCH-2223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma reassigned NUTCH-2223:
Assignee: Markus Jelsma
> Upgrade xercesImpl to 2.11.0 to fix hang on issue in t
[
https://issues.apache.org/jira/browse/NUTCH-2223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2223:
-
Priority: Major (was: Minor)
> Upgrade xercesImpl to 2.11.0 to fix hang on issue in t
[
https://issues.apache.org/jira/browse/NUTCH-2223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2223:
-
Description:
Stack trace for the hang seems to be:
{code}
at
[
https://issues.apache.org/jira/browse/NUTCH-2223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2223:
-
Fix Version/s: 1.12
> Upgrade xercesImpl to 2.11.0 to fix hang on issue in tika mimet
[
https://issues.apache.org/jira/browse/NUTCH-2223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2223:
-
Description:
{code}Stack trace for the hang seems to be:
at
[
https://issues.apache.org/jira/browse/NUTCH-2224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2224:
-
Component/s: fetcher
> Average bytes/second calculated incorrectly in fetc
[
https://issues.apache.org/jira/browse/NUTCH-2224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2224:
-
Affects Version/s: 1.11
> Average bytes/second calculated incorrectly in fetc
[
https://issues.apache.org/jira/browse/NUTCH-2224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2224:
-
Fix Version/s: 1.12
> Average bytes/second calculated incorrectly in fetc
[
https://issues.apache.org/jira/browse/NUTCH-2224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma resolved NUTCH-2224.
--
Resolution: Fixed
Committed to trunk in revision 1730803. Thanks Tien Nguyen Manh!
> Aver
[
https://issues.apache.org/jira/browse/NUTCH-2224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2224:
-
Summary: Average bytes/second calculated incorrectly in fetcher (was:
Wrong metric compute in
[
https://issues.apache.org/jira/browse/NUTCH-2224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma reassigned NUTCH-2224:
Assignee: Markus Jelsma
> Wrong metric compute in Fetcher status rep
[
https://issues.apache.org/jira/browse/NUTCH-2225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma resolved NUTCH-2225.
--
Resolution: Fixed
Committed to trunk in revision 1730802. Thanks Tien Nguyen Manh!
> Par
[
https://issues.apache.org/jira/browse/NUTCH-2225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2225:
-
Summary: Parsed time calculated incorrectly (was: Parsed time not include
time to parse
[
https://issues.apache.org/jira/browse/NUTCH-2225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma reassigned NUTCH-2225:
Assignee: Markus Jelsma
> Parsed time not include time to pa
[
https://issues.apache.org/jira/browse/NUTCH-2225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2225:
-
Affects Version/s: 1.11
> Parsed time not include time to pa
[
https://issues.apache.org/jira/browse/NUTCH-2225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2225:
-
Fix Version/s: 1.12
> Parsed time not include time to pa
[
https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma resolved NUTCH-961.
-
Resolution: Fixed
Committed to trunk in revision 1730694. Thanks everyone for contributions
[
https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-961:
Attachment: NUTCH-961.patch
Updated patch. ExtractorRepository was missing.
> Expose Tik
[
https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-961:
Fix Version/s: 1.12
> Expose Tika's boilerpipe
[
https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-961:
Affects Version/s: 1.11
> Expose Tika's boilerpipe
[
https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15148642#comment-15148642
]
Markus Jelsma commented on NUTCH-961:
-
Tests pass as expected and Boilerpipe as
[
https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-961:
Description:
Tika 0.8 comes with the Boilerpipe content handler which can be used to extract
[
https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-961:
Description:
Tika 0.8 comes with the Boilerpipe content handler which can be used to extract
[
https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-961:
Attachment: NUTCH-961.patch
Patch for trunk.
> Expose Tika's boilerpipe
[
https://issues.apache.org/jira/browse/NUTCH-1233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma resolved NUTCH-1233.
--
Resolution: Fixed
Committed to trunk in revision 1730687.
> Rely on Tika for outl
[
https://issues.apache.org/jira/browse/NUTCH-1233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1233:
-
Affects Version/s: 1.11
> Rely on Tika for outlink extract