Impressive! Are you guys going for the ASF incubator?
> [Apologies for cross-posting]
>
> The initial release of crawler-commons is available from:
> http://code.google.com/p/crawler-commons/downloads/list
>
> The purpose of this project is to develop a set of reusable Java components
> that implement functionality common to any web crawler. These components
> would benefit from collaboration among various existing web crawler
> projects, and reduce duplication of effort.
>
> The current version contains resources for:
> - parsing robots.txt
> - parsing sitemaps
> - URL analyzer which returns Top Level Domains
> - a simple HttpFetcher
>
> This release is available on Sonatype's OSS Nexus repository
> [https://oss.sonatype.org/content/repositories/releases/com/google/code/crawler-commons/]
> and should be available on Maven Central soon.
>
> Please send your questions, comments or suggestions to
> http://groups.google.com/group/crawler-commons
>
> Best regards,
>
> Julien
>
> --
> Open Source Solutions for Text Engineering
>
> http://digitalpebble.blogspot.com/
> http://www.digitalpebble.com

