[cc to crawler-commons list]

I wasn't part of the initial discussion, so I don't know what the arguments for and against were. I suppose it depends partially on user adoption. The project has had a slow start, but with this initial release it should gain a bit of traction. The license is already Apache 2.0. We'll see how it goes, but as long as it thrives I don't really mind where it lives.

Julien

On 6 July 2011 21:15, Markus Jelsma <[email protected]> wrote:

> Impressive! Are you guys going for the ASF incubator?
>
> > [Apologies for cross-posting]
> >
> > The initial release of crawler-commons is available from :
> > http://code.google.com/p/crawler-commons/downloads/list
> >
> > The purpose of this project is to develop a set of reusable Java components
> > that implement functionality common to any web crawler. These components
> > would benefit from collaboration among various existing web crawler
> > projects, and reduce duplication of effort.
> >
> > The current version contains resources for :
> > - parsing robots.txt
> > - parsing sitemaps
> > - URL analyzer which returns Top Level Domains
> > - a simple HttpFetcher
> >
> > This release is available on Sonatype's OSS Nexus repository
> > [https://oss.sonatype.org/content/repositories/releases/com/google/code/crawler-commons/]
> > and should be available on Maven Central soon.
> >
> > Please send your questions, comments or suggestions to
> > http://groups.google.com/group/crawler-commons
> >
> > Best regards,
> >
> > Julien
> >
> > --
> > Open Source Solutions for Text Engineering
> >
> > http://digitalpebble.blogspot.com/
> > http://www.digitalpebble.com

--
Open Source Solutions for Text Engineering
http://digitalpebble.blogspot.com/
http://www.digitalpebble.com

