On Jul 6, 2011, at 1:37pm, Julien Nioche wrote:

> [cc to crawler-commons list]
>
> I wasn't part of the initial discussion so I don't know what the arguments
> for / against were.
> I suppose it depends partially on user adoption. The project has had a slow
> start, but with this initial release it should gain a bit of traction. The
> license is already Apache 2.0. We'll see how it goes, but as long as it
> thrives I don't really mind where it lives.
See the "Hosting Options" section on this page: http://wiki.apache.org/nutch/ApacheConUs2009MeetUp -- Ken > On 6 July 2011 21:15, Markus Jelsma <[email protected]> wrote: > >> Impressive! Are you guys going for the ASF incubator? >> >>> [Apologies for cross-posting] >>> >>> The initial release of crawler-commons is available from : >>> http://code.google.com/p/crawler-commons/downloads/list >>> >>> The purpose of this project is to develop a set of reusable Java >> components >>> that implement functionality common to any web crawler. These components >>> would benefit from collaboration among various existing web crawler >>> projects, and reduce duplication of effort. >>> The current version contains resources for : >>> - parsing robots.txt >>> - parsing sitemaps >>> - URL analyzer which returns Top Level Domains >>> - a simple HttpFetcher >>> >>> This release is available on Sonatype's OSS Nexus repository [ >>> >> https://oss.sonatype.org/content/repositories/releases/com/google/code/craw >>> ler-commons/] and should be available on Maven Central soon. >>> >>> Please send your questions, comments or suggestions to >>> http://groups.google.com/group/crawler-commons >>> >>> Best regards, >>> >>> Julien -------------------------- Ken Krugler +1 530-210-6378 http://bixolabs.com custom data mining solutions

