...@gmail.com]
Sent: Wednesday 2 May 2012 2:21
To: user@nutch.apache.org
Subject: Re: Crawl sites with hashtags in url
Hi Roberto,
If you're getting an invalid URI error, this might help you:
http://lucene.472066.n3.nabble.com/Invalid-uri-td3742047.html
Remi
On Tue, May 1, 2012 at 7
Hi,
Has any of you worked on crawling HTTPS sites with a certificate? Please let
me know.
--
View this message in context:
http://lucene.472066.n3.nabble.com/Re-Crawl-sites-with-hashtags-in-url-tp3954098p3968209.html
Sent from the Nutch - User mailing list archive at Nabble.com.
Hello,
I'm currently trying to crawl a site which uses hashtags in the URLs. I don't
seem to get any results and I'm hoping I'm just overlooking something.
I have created a JIRA bug report because I was not aware of the existence of
this mailing list. It's my first time using such channels so i
Hi,
URLs are passed through a series of normalizers. By default both the
RegexNormalizer and the BasicNormalizer affect URLs with anchors; the latter
removes the anchor completely and is not configurable.
You can either hack your way through it by simply disabling the removal of the
page reference
: Markus Jelsma (JIRA) [mailto:j...@apache.org]
Sent: Tuesday 1 May 2012 13:40
To: r.garden...@simgroep.nl
Subject: [jira] [Closed] (NUTCH-1343) Crawl sites with hashtags in url
[
https://issues.apache.org/jira/browse/NUTCH-1343?page=com.atlassian.jira.plugin.system.issuetabpanels:all
Hi Roberto,
as defined in ftp://ftp.rfc-editor.org/in-notes/rfc3986.txt the
hash ('#') is used to separate the fragment from the rest of the URL.
The RFC explicitly delegates the semantics of the fragment to the media
type of the document. In good old HTML the fragment is just an anchor
and
Hi Roberto,
If you're getting an invalid URI error, this might help you:
http://lucene.472066.n3.nabble.com/Invalid-uri-td3742047.html
Remi
On Tue, May 1, 2012 at 7:25 PM, Roberto Gardenier
r.garden...@simgroep.nl wrote:
Hello,
I'm currently trying to crawl a site which uses