The protocol plugins seem to be the right starting point. But here, and in other places such as the Fetcher, I see that pages basically depend on java.net.URL, and really only for splitting the URL into host, port, path, and so on. So the URLStreamHandler is needed only in the protocol plugins.
Using something like javax.mail.URLName would leave the necessary StreamHandler in the protocol plugins, and it would then be easy to write plugins for whatever protocol I would like to implement.
Any ideas for building protocol plugins without using java.net.URL?
You are correct that java.net.URL is used only to parse URLs into protocol, host, port, file, etc. So we could indeed use a different class that only supports this. Two questions:
1. Why should we replace it? What is the problem with java.net.URL? Does it reject unknown protocols? If so, that would be a good reason.
2. What should we replace it with? I would opt for java.net.URI over javax.mail.URLName. It seems well suited to our purposes and is included in the base JVM. However, I recall trying to use it when initially writing Nutch's URLNormalizer and finding it deficient. But if URL does not permit us to use arbitrary protocol names, then perhaps we should revisit this and work around these deficiencies.
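To make the two questions concrete, here is a minimal sketch, not Nutch code, that checks whether java.net.URL accepts a protocol for which no URLStreamHandler is installed, and shows java.net.URI splitting the same string into its parts. The class name UrlVsUri, the helper methods, and the "myproto" scheme are made up for illustration.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

public class UrlVsUri {

    /**
     * Returns true if java.net.URL accepts the string. The constructor
     * throws MalformedURLException when no stream handler is registered
     * for the protocol, which is the case for "myproto" by default.
     */
    @SuppressWarnings("deprecation") // the URL(String) constructor is deprecated in recent JDKs
    static boolean urlAccepts(String spec) {
        try {
            new URL(spec);
            return true;
        } catch (MalformedURLException e) {
            return false;
        }
    }

    /** Splits the string with java.net.URI, which needs no stream handler. */
    static String describe(String spec) throws URISyntaxException {
        URI uri = new URI(spec);
        return uri.getScheme() + " | " + uri.getHost() + " | "
                + uri.getPort() + " | " + uri.getPath();
    }

    public static void main(String[] args) throws Exception {
        // "myproto" is a hypothetical scheme with no handler installed.
        String spec = "myproto://example.org:8080/some/path";
        System.out.println(urlAccepts(spec)); // false
        System.out.println(describe(spec));   // myproto | example.org | 8080 | /some/path
    }
}
```

So, on a stock JVM, URL does reject unknown protocols unless a handler has been registered, while URI parses them. Whether URI is good enough for URL normalization is a separate matter that this sketch does not address.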
Would someone like to try replacing URL with URI globally, and seeing what works and what fails?
Doug