Julien Nioche created NUTCH-3025:
Summary: urlfilter-fast to filter based on the length of the URL
Key: NUTCH-3025
URL: https://issues.apache.org/jira/browse/NUTCH-3025
Project: Nutch
Issue
Julien Nioche created NUTCH-3017:
Summary: Allow fast-urlfilter to load from HDFS/S3 and support
gzipped input
Key: NUTCH-3017
URL: https://issues.apache.org/jira/browse/NUTCH-3017
Project: Nutch
[
https://issues.apache.org/jira/browse/NUTCH-2648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16643035#comment-16643035
]
Julien Nioche commented on NUTCH-2648:
--
[~wastl-nagel]
?? (code borrowed
[
https://issues.apache.org/jira/browse/NUTCH-2046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche resolved NUTCH-2046.
--
Resolution: Fixed
Assignee: Julien Nioche (was: Lewis John McGibbney)
> The crawl script
[
https://issues.apache.org/jira/browse/NUTCH-1371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche closed NUTCH-1371.
Resolution: Duplicate
> Replace Ivy with Maven Ant tasks
>
>
>
[
https://issues.apache.org/jira/browse/NUTCH-2363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15890043#comment-15890043
]
Julien Nioche commented on NUTCH-2363:
--
Got it! Thanks for the explanation [~markus17]! Had missed
[
https://issues.apache.org/jira/browse/NUTCH-1531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche resolved NUTCH-1531.
--
Resolution: Duplicate
No follow up on this one + same functionality discussed elsewhere
> URL
[
https://issues.apache.org/jira/browse/NUTCH-2320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15549206#comment-15549206
]
Julien Nioche commented on NUTCH-2320:
--
Hi @markus17, you haven't left much time for people to
[
https://issues.apache.org/jira/browse/NUTCH-1371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15359504#comment-15359504
]
Julien Nioche commented on NUTCH-1371:
--
None whatsoever [~lewismc]. Maybe mark it as duplicate and
[
https://issues.apache.org/jira/browse/NUTCH-2046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15142863#comment-15142863
]
Julien Nioche commented on NUTCH-2046:
--
I agree with the objective but I'd rather have a consistent
[
https://issues.apache.org/jira/browse/NUTCH-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche reopened NUTCH-2213:
--
Assignee: Julien Nioche
The WARC Export actually has the same issue as its CommonCrawl
[
https://issues.apache.org/jira/browse/NUTCH-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15140608#comment-15140608
]
Julien Nioche edited comment on NUTCH-2213 at 2/10/16 10:36 AM:
Hi Joris
[
https://issues.apache.org/jira/browse/NUTCH-2204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15113021#comment-15113021
]
Julien Nioche commented on NUTCH-2204:
--
+1
> remove junit lib from runtime
>
[
https://issues.apache.org/jira/browse/NUTCH-2177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15033491#comment-15033491
]
Julien Nioche commented on NUTCH-2177:
--
Do you mean 'mapreduce.framework.name' ?
> Generator
[
https://issues.apache.org/jira/browse/NUTCH-2177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated NUTCH-2177:
-
Attachment: NUTCH-2177.patch
> Generator produces only one partition even in distributed mode
>
[
https://issues.apache.org/jira/browse/NUTCH-2177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15033491#comment-15033491
]
Julien Nioche edited comment on NUTCH-2177 at 12/1/15 11:43 AM:
Do you
[
https://issues.apache.org/jira/browse/NUTCH-2177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche resolved NUTCH-2177.
--
Resolution: Fixed
Committed revision 1717412.
Thanks [~wastl-nagel] and [~markus17]
>
Julien Nioche created NUTCH-2177:
Summary: Generator produces only one partition even in distributed
mode
Key: NUTCH-2177
URL: https://issues.apache.org/jira/browse/NUTCH-2177
Project: Nutch
[
https://issues.apache.org/jira/browse/NUTCH-2177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15029037#comment-15029037
]
Julien Nioche commented on NUTCH-2177:
--
I am on
Hadoop version: 2.4.0-amzn-7
not clear which
[
https://issues.apache.org/jira/browse/NUTCH-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15018232#comment-15018232
]
Julien Nioche commented on NUTCH-2069:
--
no probs. Would be good to find a way to format based on the
[
https://issues.apache.org/jira/browse/NUTCH-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche resolved NUTCH-2069.
--
Resolution: Fixed
Trunk committed revision 1715386.
Thanks everyone for comments and reviews
[
https://issues.apache.org/jira/browse/NUTCH-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche closed NUTCH-2069.
> Ignore external links based on domain
> -
>
>
[
https://issues.apache.org/jira/browse/NUTCH-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated NUTCH-2069:
-
Attachment: NUTCH-2069.v2.patch
new patch introducing 'db.ignore.external.links.mode'
this is
[
https://issues.apache.org/jira/browse/NUTCH-2064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14998467#comment-14998467
]
Julien Nioche commented on NUTCH-2064:
--
FYI have ported the code to Crawler-Commons
[
https://issues.apache.org/jira/browse/NUTCH-2064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche resolved NUTCH-2064.
--
Resolution: Fixed
Fix Version/s: (was: 1.12)
1.11
Trunk :
[
https://issues.apache.org/jira/browse/NUTCH-2158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche reassigned NUTCH-2158:
Assignee: Julien Nioche (was: Chris A. Mattmann)
> Upgrade to Tika 1.11
>
[
https://issues.apache.org/jira/browse/NUTCH-2158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated NUTCH-2158:
-
Attachment: NUTCH-2158.patch
Patch which upgrades to Tika 1.11
tests fail for protocol-http
[
https://issues.apache.org/jira/browse/NUTCH-2132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14943757#comment-14943757
]
Julien Nioche commented on NUTCH-2132:
--
Looking at it from a slightly different angle, couldn't you
[
https://issues.apache.org/jira/browse/NUTCH-2132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14943856#comment-14943856
]
Julien Nioche commented on NUTCH-2132:
--
bq. but that locks us into using Kibana, etc. Ideally one
[
https://issues.apache.org/jira/browse/NUTCH-2129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14939503#comment-14939503
]
Julien Nioche commented on NUTCH-2129:
--
I'd rather keep it simple and not modify the CrawlDatum so
[
https://issues.apache.org/jira/browse/NUTCH-2095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14902651#comment-14902651
]
Julien Nioche commented on NUTCH-2095:
--
Thanks [~jorgelbg]. Please add a line to CHANGES.txt to
[
https://issues.apache.org/jira/browse/NUTCH-2095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14902715#comment-14902715
]
Julien Nioche commented on NUTCH-2095:
--
See [https://issues.apache.org/jira/browse/HADOOP-10961].
[
https://issues.apache.org/jira/browse/NUTCH-2095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14902578#comment-14902578
]
Julien Nioche commented on NUTCH-2095:
--
[~jorgelbg] could you please fix the test. See below
{code}
[
https://issues.apache.org/jira/browse/NUTCH-2102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche resolved NUTCH-2102.
--
Resolution: Fixed
Committed revision 1704634.
Thanks for the reviews
> WARC Exporter
>
[
https://issues.apache.org/jira/browse/NUTCH-2102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated NUTCH-2102:
-
Fix Version/s: 1.11
> WARC Exporter
> -
>
> Key: NUTCH-2102
>
[
https://issues.apache.org/jira/browse/NUTCH-2114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche closed NUTCH-2114.
Resolution: Invalid
> kkk
> ---
>
> Key: NUTCH-2114
> URL:
[
https://issues.apache.org/jira/browse/NUTCH-2102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14747300#comment-14747300
]
Julien Nioche commented on NUTCH-2102:
--
The only modification to existing code is in the class
[
https://issues.apache.org/jira/browse/NUTCH-2102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated NUTCH-2102:
-
Description:
This patch adds a WARC exporter
[
https://issues.apache.org/jira/browse/NUTCH-2102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14747301#comment-14747301
]
Julien Nioche commented on NUTCH-2102:
--
Please review
> WARC Exporter
> -
>
>
[
https://issues.apache.org/jira/browse/NUTCH-2102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14747327#comment-14747327
]
Julien Nioche edited comment on NUTCH-2102 at 9/16/15 11:21 AM:
Hi Markus
[
https://issues.apache.org/jira/browse/NUTCH-2102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated NUTCH-2102:
-
Description:
This patch adds a WARC exporter
[
https://issues.apache.org/jira/browse/NUTCH-2102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated NUTCH-2102:
-
Attachment: (was: NUTCH-2102.patch)
> WARC Exporter
> -
>
> Key:
[
https://issues.apache.org/jira/browse/NUTCH-2102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14747327#comment-14747327
]
Julien Nioche commented on NUTCH-2102:
--
Hi Markus
> I believe this warc format is the updated arc
[
https://issues.apache.org/jira/browse/NUTCH-2102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated NUTCH-2102:
-
Attachment: NUTCH-2102.patch
> WARC Exporter
> -
>
> Key: NUTCH-2102
Julien Nioche created NUTCH-2102:
Summary: WARC Exporter
Key: NUTCH-2102
URL: https://issues.apache.org/jira/browse/NUTCH-2102
Project: Nutch
Issue Type: Improvement
Components:
[
https://issues.apache.org/jira/browse/NUTCH-2102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated NUTCH-2102:
-
Attachment: NUTCH-2102.patch
> WARC Exporter
> -
>
> Key: NUTCH-2102
[
https://issues.apache.org/jira/browse/NUTCH-2064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14744078#comment-14744078
]
Julien Nioche commented on NUTCH-2064:
--
yep, can discuss that post 1.11
> URLNormalizer basic to
[
https://issues.apache.org/jira/browse/NUTCH-2064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14731114#comment-14731114
]
Julien Nioche commented on NUTCH-2064:
--
What about moving the basic URL normalizer to
[
https://issues.apache.org/jira/browse/NUTCH-1517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche resolved NUTCH-1517.
--
Resolution: Fixed
trunk committed revision 1697911.
Thanks for comments and review
[
https://issues.apache.org/jira/browse/NUTCH-1517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14712988#comment-14712988
]
Julien Nioche commented on NUTCH-1517:
--
Thanks [~jorgelbg]. Will commit soon unless
[
https://issues.apache.org/jira/browse/NUTCH-2049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche resolved NUTCH-2049.
--
Resolution: Fixed
Committed revision 1697466.
Thanks to everyone involved.
Upgrade Trunk to
[
https://issues.apache.org/jira/browse/NUTCH-2049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14706402#comment-14706402
]
Julien Nioche commented on NUTCH-2049:
--
Fantastic work [~lewismc]! I think this is
[
https://issues.apache.org/jira/browse/NUTCH-1517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated NUTCH-1517:
-
Attachment: (was: NUTCH-1517.patch)
CloudSearch indexer
---
[
https://issues.apache.org/jira/browse/NUTCH-1517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated NUTCH-1517:
-
Flags: Patch
CloudSearch indexer
---
Key: NUTCH-1517
[
https://issues.apache.org/jira/browse/NUTCH-1517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated NUTCH-1517:
-
Attachment: NUTCH-1517.patch
New implementation of the CloudSearchIndexWriter, uses the latest
[
https://issues.apache.org/jira/browse/NUTCH-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14647467#comment-14647467
]
Julien Nioche commented on NUTCH-2069:
--
Hi [~wastl-nagel] and [~markus17]. BTW did
[
https://issues.apache.org/jira/browse/NUTCH-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646543#comment-14646543
]
Julien Nioche commented on NUTCH-2069:
--
What code restyle? I applied the formatting
Julien Nioche created NUTCH-2069:
Summary: Ignore external links based on domain
Key: NUTCH-2069
URL: https://issues.apache.org/jira/browse/NUTCH-2069
Project: Nutch
Issue Type: Improvement
[
https://issues.apache.org/jira/browse/NUTCH-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated NUTCH-2069:
-
Attachment: NUTCH-2069.patch
Ignore external links based on domain
[
https://issues.apache.org/jira/browse/NUTCH-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated NUTCH-2069:
-
Patch Info: Patch Available
Ignore external links based on domain
[
https://issues.apache.org/jira/browse/NUTCH-2048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14640138#comment-14640138
]
Julien Nioche commented on NUTCH-2048:
--
howto_upgrade_tika.txt has been around for 2
[
https://issues.apache.org/jira/browse/NUTCH-1517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche reassigned NUTCH-1517:
Assignee: Julien Nioche
CloudSearch indexer
---
Key:
[
https://issues.apache.org/jira/browse/NUTCH-2016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14600946#comment-14600946
]
Julien Nioche commented on NUTCH-2016:
--
+1
Remove OldFetcher from trunk
[
https://issues.apache.org/jira/browse/NUTCH-2036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated NUTCH-2036:
-
Affects Version/s: (was: 1.11)
Adding some continuous crawl goodies to the crawl script
[
https://issues.apache.org/jira/browse/NUTCH-2036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14600949#comment-14600949
]
Julien Nioche commented on NUTCH-2036:
--
Any thoughts on this? This is useful and
[
https://issues.apache.org/jira/browse/NUTCH-2036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated NUTCH-2036:
-
Fix Version/s: 1.11
Adding some continuous crawl goodies to the crawl script
[
https://issues.apache.org/jira/browse/NUTCH-2046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14599840#comment-14599840
]
Julien Nioche commented on NUTCH-2046:
--
re-script : what about a positive parameter
[
https://issues.apache.org/jira/browse/NUTCH-2000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14589951#comment-14589951
]
Julien Nioche commented on NUTCH-2000:
--
Hi Seb, +1 to commit. Not sure I'll be able
[
https://issues.apache.org/jira/browse/NUTCH-2006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche resolved NUTCH-2006.
--
Resolution: Fixed
Fix Version/s: 1.11
Committed revision 1679567.
Thanks Seb
[
https://issues.apache.org/jira/browse/NUTCH-2012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14545534#comment-14545534
]
Julien Nioche commented on NUTCH-2012:
--
+1 to merging them into a more generic tool.
[
https://issues.apache.org/jira/browse/NUTCH-2008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14541843#comment-14541843
]
Julien Nioche commented on NUTCH-2008:
--
Makes total sense. +1
Could also make it
Julien Nioche created NUTCH-2006:
Summary: IndexingFiltersChecker to take custom metadata as input
Key: NUTCH-2006
URL: https://issues.apache.org/jira/browse/NUTCH-2006
Project: Nutch
Issue
[
https://issues.apache.org/jira/browse/NUTCH-2006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated NUTCH-2006:
-
Attachment: NUTCH-2006.patch
Patch which allows to take custom metadata into account + improved
[
https://issues.apache.org/jira/browse/NUTCH-2006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated NUTCH-2006:
-
Patch Info: Patch Available
IndexingFiltersChecker to take custom metadata as input
[
https://issues.apache.org/jira/browse/NUTCH-1999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated NUTCH-1999:
-
Assignee: (was: Julien Nioche)
Add http://nutch.apache.org/robots.txt
[
https://issues.apache.org/jira/browse/NUTCH-2002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated NUTCH-2002:
-
Attachment: NUTCH-2002.patch
ParserChecker to check robots.txt
Julien Nioche created NUTCH-2002:
Summary: ParserChecker to check robots.txt
Key: NUTCH-2002
URL: https://issues.apache.org/jira/browse/NUTCH-2002
Project: Nutch
Issue Type: Improvement
[
https://issues.apache.org/jira/browse/NUTCH-2000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated NUTCH-2000:
-
Fix Version/s: (was: 1.10)
1.11
Link inversion fails with .locked already
[
https://issues.apache.org/jira/browse/NUTCH-2000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14510629#comment-14510629
]
Julien Nioche commented on NUTCH-2000:
--
Lewis - could be, need to investigate but
[
https://issues.apache.org/jira/browse/NUTCH-2000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14509898#comment-14509898
]
Julien Nioche commented on NUTCH-2000:
--
[~lewismc] reverted to 1.10 as this is a
[
https://issues.apache.org/jira/browse/NUTCH-2000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated NUTCH-2000:
-
Priority: Blocker (was: Major)
Link inversion fails with .locked already exists.
[
https://issues.apache.org/jira/browse/NUTCH-2000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated NUTCH-2000:
-
Fix Version/s: (was: 1.11)
1.10
Link inversion fails with .locked already
Julien Nioche created NUTCH-1999:
Summary: Add http://nutch.apache.org/robots.txt
Key: NUTCH-1999
URL: https://issues.apache.org/jira/browse/NUTCH-1999
Project: Nutch
Issue Type: Improvement
[
https://issues.apache.org/jira/browse/NUTCH-1999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche reassigned NUTCH-1999:
Assignee: Julien Nioche
Add http://nutch.apache.org/robots.txt
Julien Nioche created NUTCH-2000:
Summary: Link inversion fails with .locked already exists.
Key: NUTCH-2000
URL: https://issues.apache.org/jira/browse/NUTCH-2000
Project: Nutch
Issue Type:
[
https://issues.apache.org/jira/browse/NUTCH-1990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14506745#comment-14506745
]
Julien Nioche commented on NUTCH-1990:
--
bq. lot of garbage
yep, that's what the
[
https://issues.apache.org/jira/browse/NUTCH-1990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche resolved NUTCH-1990.
--
Resolution: Fixed
Fix Version/s: 1.10
Committed revision 1675305.
Use URI.normalise()
[
https://issues.apache.org/jira/browse/NUTCH-1990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14503072#comment-14503072
]
Julien Nioche commented on NUTCH-1990:
--
Thanks [~wastl-nagel]!
I have extracted
Julien Nioche created NUTCH-1990:
Summary: Use URI.normalise() in BasicURLNormalizer
Key: NUTCH-1990
URL: https://issues.apache.org/jira/browse/NUTCH-1990
Project: Nutch
Issue Type:
[
https://issues.apache.org/jira/browse/NUTCH-1958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14375857#comment-14375857
]
Julien Nioche commented on NUTCH-1958:
--
I agree but I think there could be benefits
[
https://issues.apache.org/jira/browse/NUTCH-1958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14375696#comment-14375696
]
Julien Nioche commented on NUTCH-1958:
--
What would you suggest as a replacement?
[
https://issues.apache.org/jira/browse/NUTCH-1965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche closed NUTCH-1965.
Resolution: Fixed
WTF is this?
My
--
Key: NUTCH-1965
URL:
[
https://issues.apache.org/jira/browse/NUTCH-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14319992#comment-14319992
]
Julien Nioche commented on NUTCH-1942:
--
See
Julien Nioche created NUTCH-1942:
Summary: Remove TopLevelDomain
Key: NUTCH-1942
URL: https://issues.apache.org/jira/browse/NUTCH-1942
Project: Nutch
Issue Type: Task
Reporter:
[
https://issues.apache.org/jira/browse/NUTCH-1937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche closed NUTCH-1937.
Resolution: Invalid
Please use the mailing list to ask questions like these instead of filing bugs
[
https://issues.apache.org/jira/browse/NUTCH-1889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche resolved NUTCH-1889.
--
Resolution: Fixed
Committed revision 1655960.
Store all values from Tika metadata in Nutch
[
https://issues.apache.org/jira/browse/NUTCH-1918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14296621#comment-14296621
]
Julien Nioche commented on NUTCH-1918:
--
Quite an important issue for those who
[
https://issues.apache.org/jira/browse/NUTCH-1889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14296619#comment-14296619
]
Julien Nioche commented on NUTCH-1889:
--
This one is quite trivial, I'd like to see it
[
https://issues.apache.org/jira/browse/NUTCH-1889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated NUTCH-1889:
-
Fix Version/s: (was: 1.11)
1.10
Store all values from Tika metadata in
[
https://issues.apache.org/jira/browse/NUTCH-1918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated NUTCH-1918:
-
Fix Version/s: (was: 1.11)
1.10
TikaParser specifies a default namespace