[
https://issues.apache.org/jira/browse/NUTCH-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1293:
-
Attachment: NUTCH-1293-1.5-1.patch
Patch for 1.5.
IndexingFiltersChecker to
[
https://issues.apache.org/jira/browse/NUTCH-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1293:
-
Attachment: (was: NUTCH-1293-1.5-1.patch)
IndexingFiltersChecker to store detected
[
https://issues.apache.org/jira/browse/NUTCH-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1293:
-
Attachment: NUTCH-1293-1.5-1.patch
Wrong patch indeed :)
[
https://issues.apache.org/jira/browse/NUTCH-1291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1291:
-
Attachment: NUTCH-1291-1.5-1.patch
Patch for 1.5.
Fetcher to stringify
[
https://issues.apache.org/jira/browse/NUTCH-1215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1215:
-
Attachment: NUTCH-1215-1.5-1.patch
Patch for 1.5. Couldn't be simpler.
[
https://issues.apache.org/jira/browse/NUTCH-1259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1259:
-
Patch Info: Patch Available
TikaParser should not add Content-Type from HTTP Headers to
[
https://issues.apache.org/jira/browse/NUTCH-1262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1262:
-
Priority: Minor (was: Major)
Patch Info: Patch Available
Map `duplicating`
[
https://issues.apache.org/jira/browse/NUTCH-1258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1258:
-
Patch Info: Patch Available
MoreIndexingFilter should be able to read Content-Type from
[
https://issues.apache.org/jira/browse/NUTCH-1005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1005:
-
Summary: Parse headings plugin (was: Index headings plugin)
Parse headings plugin
[
https://issues.apache.org/jira/browse/NUTCH-1259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1259:
-
Attachment: NUTCH-1259-1.5-1.patch
Here's a patch for 1.5. Comments? We have this running in
[
https://issues.apache.org/jira/browse/NUTCH-1258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1258:
-
Priority: Minor (was: Major)
MoreIndexingFilter should be able to read Content-Type from
[
https://issues.apache.org/jira/browse/NUTCH-1266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1266:
-
Attachment: NUTCH-1266-1.5-1.patch
Patch adds an optional key element. If configured that value
[
https://issues.apache.org/jira/browse/NUTCH-1266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1266:
-
Patch Info: Patch Available
Subcollection to optionally write to configured fields
[
https://issues.apache.org/jira/browse/NUTCH-1005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1005:
-
Attachment: NUTCH-1005-1.5-5.patch
New patch without indexing capabilities. Use NUTCH-1264 for
[
https://issues.apache.org/jira/browse/NUTCH-1005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1005:
-
Attachment: NUTCH-1005-1.5-4.patch
New patch as per Julien's comments.
Index
[
https://issues.apache.org/jira/browse/NUTCH-1262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1262:
-
Attachment: NUTCH-1262-1.5-1.patch
Here's a patch for 1.5. It seems to work fine when tested
[
https://issues.apache.org/jira/browse/NUTCH-1242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1242:
-
Attachment: NUTCH-1242-1.5-1.patch
Patch for latest trunk. Changed config options from
[
https://issues.apache.org/jira/browse/NUTCH-1245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1245:
-
Priority: Critical (was: Major)
URL gone with 404 after db.fetch.interval.max stays
[
https://issues.apache.org/jira/browse/NUTCH-1260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1260:
-
Fix Version/s: 1.5
Fetcher should log fetching of redirects
[
https://issues.apache.org/jira/browse/NUTCH-1258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1258:
-
Attachment: NUTCH-1258-1.5-1.patch
Patch for 1.5. Adds configuration to read from contentmeta,
[
https://issues.apache.org/jira/browse/NUTCH-1252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1252:
-
Fix Version/s: 1.5
Thanks. Marked for 1.5, keeping it on the radar.
[
https://issues.apache.org/jira/browse/NUTCH-1256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1256:
-
Attachment: NUTCH-1256-1.5-1.patch
Patch introduces new parameter with two mandatory arguments.
[
https://issues.apache.org/jira/browse/NUTCH-1252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1252:
-
Thanks. Marked for 1.5, keeping it on the radar.
SegmentReader -get shows wrong
[
https://issues.apache.org/jira/browse/NUTCH-1201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1201:
-
Attachment: CustomFetcher.java
NUTCH-1201-1.5-wip.patch
Here's a WIP that allows
[
https://issues.apache.org/jira/browse/NUTCH-1251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1251:
-
Fix Version/s: 1.5
Deletion of duplicates fails with
[
https://issues.apache.org/jira/browse/NUTCH-1248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1248:
-
Attachment: NUTCH-1248-1.5-1.patch
Any comments? Tests pass and it works as expected. I'll
[
https://issues.apache.org/jira/browse/NUTCH-1139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1139:
-
Attachment: NUTCH-1139-1.5-2.patch
New patch for 1.5. Any final comments?
[
https://issues.apache.org/jira/browse/NUTCH-827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-827:
Fix Version/s: 1.5
HTTP POST Authentication
Key:
[
https://issues.apache.org/jira/browse/NUTCH-1245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1245:
-
Fix Version/s: 1.5
URL gone with 404 after db.fetch.interval.max stays db_unfetched in
[
https://issues.apache.org/jira/browse/NUTCH-1244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1244:
-
Attachment: NUTCH-1244-1.5-1.patch
Patch for 1.5. It relies on an exact match of the whole
[
https://issues.apache.org/jira/browse/NUTCH-1244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1244:
-
Attachment: NUTCH-1244-1.5-2.patch
Patch for 1.5 fixes small issue with arguments and adds
[
https://issues.apache.org/jira/browse/NUTCH-1210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1210:
-
Attachment: NUTCH-1210-1.5-1.patch
Patch for 1.5.
DomainBlacklistFilter
[
https://issues.apache.org/jira/browse/NUTCH-1210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1210:
-
Patch Info: Patch Available
DomainBlacklistFilter
-
[
https://issues.apache.org/jira/browse/NUTCH-1232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1232:
-
Summary: Remove host field from index-basic (was: Remove host|site fields
from index-basic)
[
https://issues.apache.org/jira/browse/NUTCH-1239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1239:
-
Attachment: NUTCH-1239-1.5-1.patch
Patch for 1.5. Little review would be appreciated. I added a
[
https://issues.apache.org/jira/browse/NUTCH-1238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1238:
-
Priority: Trivial (was: Major)
Fetcher throughput threshold must start before feeder
[
https://issues.apache.org/jira/browse/NUTCH-1238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1238:
-
Patch Info: Patch Available
Fetcher throughput threshold must start before feeder finished
[
https://issues.apache.org/jira/browse/NUTCH-1238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1238:
-
Attachment: NUTCH-1238-1.5-1.patch
Patch for 1.5. The exceeding check is replaced by the new
[
https://issues.apache.org/jira/browse/NUTCH-1104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1104:
-
Description:
Umbrella issue for tracking issues that should be ported from 1.x trunk to the
[
https://issues.apache.org/jira/browse/NUTCH-1230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1230:
-
Priority: Blocker (was: Major)
Patch Info: Patch Available
Summary: MimeType API
[
https://issues.apache.org/jira/browse/NUTCH-1230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1230:
-
Attachment: NUTCH-1230-1.5-2.patch
Patches for MimeUtil and some other classes. Everything works
[
https://issues.apache.org/jira/browse/NUTCH-1230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1230:
-
Attachment: NUTCH-1230-1.5-3.patch
I feel like a fool sometimes but it's sorted now! All tests
[
https://issues.apache.org/jira/browse/NUTCH-1233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1233:
-
Attachment: NUTCH-1233-1.5-wip.patch
The boilerpipe code relies on an unavailable BP version.
[
https://issues.apache.org/jira/browse/NUTCH-1184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1184:
-
Description:
Fetcher improvements to parse and follow outlinks up to a specified depth. The
[
https://issues.apache.org/jira/browse/NUTCH-1222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1222:
-
Assignee: Markus Jelsma
Summary: Upgrade to new Hadoop 0.22.0 (was: Upgrade to newer Hadoop
[
https://issues.apache.org/jira/browse/NUTCH-1222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1222:
-
Patch Info: Patch Available
Upgrade to new Hadoop 0.22.0
[
https://issues.apache.org/jira/browse/NUTCH-1222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1222:
-
Attachment: NUTCH-1222-1.5-1.patch
Ivy patch. Everything is fine! Make sure to do ant clean or
[
https://issues.apache.org/jira/browse/NUTCH-1225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1225:
-
Patch Info: Patch Available
Assignee: Markus Jelsma
Migrate CrawlDBScanner to
[
https://issues.apache.org/jira/browse/NUTCH-1225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1225:
-
Attachment: NUTCH-1225-1.5-1.patch
Patch for 1.5. This is only compatible with Hadoop 0.21 or
[
https://issues.apache.org/jira/browse/NUTCH-1225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1225:
-
Attachment: NUTCH-1225-1.5-2.patch
New patch uses proper value iteration in reducer.
Old API:
[
https://issues.apache.org/jira/browse/NUTCH-1226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1226:
-
Attachment: NUTCH-1226-1.5-1.patch
First crack! Had a lot of trouble with some deprecated stuff
[
https://issues.apache.org/jira/browse/NUTCH-1226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1226:
-
Description: Hadoop 0.21 only!
Patch Info: Patch Available
Migrate CrawlDbReader to
[
https://issues.apache.org/jira/browse/NUTCH-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1219:
-
Description:
We should upgrade to the new Hadoop API for Nutch trunk as has already been
done
[
https://issues.apache.org/jira/browse/NUTCH-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1221:
-
Patch Info: Patch Available
Migrate DomainStatistics to MapReduce API
[
https://issues.apache.org/jira/browse/NUTCH-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1219:
-
Description:
We should upgrade to the new Hadoop API for Nutch trunk as has already been
done
[
https://issues.apache.org/jira/browse/NUTCH-1214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1214:
-
Patch Info: Patch Available
DomainStats tool should be named for what it's doing
[
https://issues.apache.org/jira/browse/NUTCH-1104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1104:
-
Description:
Umbrella issue for tracking issues that should be ported from 1.x trunk to the
[
https://issues.apache.org/jira/browse/NUTCH-1184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1184:
-
Attachment: NUTCH-1184-1.5-9-ParseOutputFormat.patch
Patch fixes issue described in NUTCH-1212.
[
https://issues.apache.org/jira/browse/NUTCH-1104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1104:
-
Description:
Umbrella issue for tracking issues that should be ported from 1.x trunk to the
[
https://issues.apache.org/jira/browse/NUTCH-1184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1184:
-
Attachment: NUTCH-1185-1.5-9.patch
New patch [9] solves an issue of NPE in filtering. It's now
[
https://issues.apache.org/jira/browse/NUTCH-1184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1184:
-
Attachment: (was: NUTCH-1185-1.5-9.patch)
Fetcher to parse and follow Nth degree
[
https://issues.apache.org/jira/browse/NUTCH-1184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1184:
-
Attachment: NUTCH-1185-1.5-9.patch
Fetcher to parse and follow Nth degree outlinks
[
https://issues.apache.org/jira/browse/NUTCH-1184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1184:
-
Description:
Fetcher improvements to parse and follow outlinks up to a specified depth. The
[
https://issues.apache.org/jira/browse/NUTCH-1184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1184:
-
Attachment: NUTCH-1185-1.5-6.patch
New patch includes all involved files:
* ParseData
*
[
https://issues.apache.org/jira/browse/NUTCH-1184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1184:
-
Attachment: NUTCH-1185-1.5-7.patch
This patch refactors filtering and parsing of outlinks to a
[
https://issues.apache.org/jira/browse/NUTCH-1184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1184:
-
Description:
Fetcher improvements to parse and follow outlinks up to a specified depth. The
[
https://issues.apache.org/jira/browse/NUTCH-1203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1203:
-
Attachment: NUTCH-1203-1.5-1.patch
ParseSegment to list ms per record
[
https://issues.apache.org/jira/browse/NUTCH-1184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1184:
-
Attachment: NUTCH-1184-1.5-5-ParseData.patch
Patch for ParseData was missing. This now has a
[
https://issues.apache.org/jira/browse/NUTCH-1171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1171:
-
Fix Version/s: (was: 1.4)
WebGraph to overwrite normalized input keys
[
https://issues.apache.org/jira/browse/NUTCH-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1153:
-
Attachment: NUTCH-1153-1.5-2.patch
Final patch also disabled writing of _SUCCESS files by recent
[
https://issues.apache.org/jira/browse/NUTCH-1155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1155:
-
Attachment: NUTCH-1155-1.5-1.patch
Simple patch.
Host/domain limit in generator
[
https://issues.apache.org/jira/browse/NUTCH-1173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1173:
-
Attachment: NUTCH-1173-1.5-1.patch
Simple patch.
DomainStats doesn't count
[
https://issues.apache.org/jira/browse/NUTCH-1098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1098:
-
Attachment: patch-with-utf8-encoding.diff
Restored original patch.
better
[
https://issues.apache.org/jira/browse/NUTCH-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1140:
-
Fix Version/s: 1.5
index-more plugin, resetTitle method creates multiple values in the
[
https://issues.apache.org/jira/browse/NUTCH-828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-828:
Due Date: 9/Jun/10 (was: 9/Jun/10)
Fix Version/s: 1.5
Fetch Filter
[
https://issues.apache.org/jira/browse/NUTCH-1184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1184:
-
Attachment: NUTCH-1184-1.5-5.patch
New patch adds fetcher.follow.outlinks.num.links setting that
[
https://issues.apache.org/jira/browse/NUTCH-1193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1193:
-
Priority: Trivial (was: Major)
Fix Version/s: 1.5
Thank you for reporting. This is
[
https://issues.apache.org/jira/browse/NUTCH-1104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1104:
-
Description:
Umbrella issue for tracking issues that should be ported from 1.x trunk to the
[
https://issues.apache.org/jira/browse/NUTCH-1184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1184:
-
Attachment: NUTCH-1184-1.5-3.patch
New patch fixes the todo's and incorporates NUTCH-1174.
[
https://issues.apache.org/jira/browse/NUTCH-1184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1184:
-
Attachment: NUTCH-1184-1.5-4.patch
New patch does not initialize maxOutlinkDepth in fetcher.
[
https://issues.apache.org/jira/browse/NUTCH-1184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1184:
-
Attachment: NUTCH-1184-1.5-2.patch
New patch uses HashSet to deduplicate the outlinks.
Todo:
*
[
https://issues.apache.org/jira/browse/NUTCH-1184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1184:
-
Attachment: NUTCH-1184-1.5-1.patch
Here's a first attempt, it introduces a new configuration
[
https://issues.apache.org/jira/browse/NUTCH-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1178:
-
Attachment: NUTCH-1178-1.5-1.patch
Patch adding a new distinct retry interval field.
[
https://issues.apache.org/jira/browse/NUTCH-1180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1180:
-
Description:
Nutch currently replaces an existing CrawlDB with the new CrawlDB. By
optionally
[
https://issues.apache.org/jira/browse/NUTCH-1177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1177:
-
Attachment: NUTCH-1177-1.5-1.patch
Patch for trunk.
Generator to select on
[
https://issues.apache.org/jira/browse/NUTCH-1142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1142:
-
Attachment: NUTCH-1142-1.5-3.patch
New patch with the ability to normalize and filter existing
[
https://issues.apache.org/jira/browse/NUTCH-1142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1142:
-
Attachment: NUTCH-1142-1.5-2.patch
New patch also filters collected outlinks instead of just map
[
https://issues.apache.org/jira/browse/NUTCH-1151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1151:
-
Attachment: NUTCH-1151-1.5-1.patch
Patch for trunk. Adds configuration directive to
[
https://issues.apache.org/jira/browse/NUTCH-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1153:
-
Attachment: NUTCH-1153-1.5-1.patch
Patch for trunk.
LinkRank must not log all
[
https://issues.apache.org/jira/browse/NUTCH-1139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1139:
-
Fix Version/s: (was: 1.4)
1.5
Needs proper testing, pass to 1.5
[
https://issues.apache.org/jira/browse/NUTCH-1024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1024:
-
Fix Version/s: (was: 1.4)
1.5
Dynamically set fetchInterval by
[
https://issues.apache.org/jira/browse/NUTCH-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-965:
Fix Version/s: (was: 1.4)
1.5
Skip parsing for truncated documents
[
https://issues.apache.org/jira/browse/NUTCH-1147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1147:
-
Patch Info: Patch Available
WebGraph nodeDumper uses only 1 reducer
[
https://issues.apache.org/jira/browse/NUTCH-1147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1147:
-
Attachment: NUTCH-1147-1.5-1.patch
Patch for trunk.
WebGraph nodeDumper uses
[
https://issues.apache.org/jira/browse/NUTCH-1150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1150:
-
Summary: http.redirect.max can lead to multiple parses of the same url
(was: http.redirect.max
[
https://issues.apache.org/jira/browse/NUTCH-1144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1144:
-
Fix Version/s: (was: 1.5)
Filtering optional in WebGraph
[
https://issues.apache.org/jira/browse/NUTCH-1142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1142:
-
Description:
The WebGraph program performs URL normalization. Since normalization of
outlinks
[
https://issues.apache.org/jira/browse/NUTCH-717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-717:
Fix Version/s: (was: 1.4)
1.5
Make Nutch Solr integration easier
[
https://issues.apache.org/jira/browse/NUTCH-1061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1061:
-
Fix Version/s: (was: 1.4)
(was: nutchgora)
1.5
[
https://issues.apache.org/jira/browse/NUTCH-1084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1084:
-
Affects Version/s: (was: 1.4)
1.3
Fix Version/s: (was: