On Jul 6, 2011, at 1:37pm, Julien Nioche wrote:

> [cc to crawler-commons list]
> 
> I wasn't part of the initial discussion so I don't know what the arguments
> for / against were.
> I suppose it depends partially on user adoption. The project has had a slow
> start but with this initial release it should gain a bit of traction. The
> license is already Apache 2.0. We'll see how it goes, but as long as it
> thrives I don't really mind where it lives.

See the "Hosting Options" section on this page:

http://wiki.apache.org/nutch/ApacheConUs2009MeetUp

-- Ken

> On 6 July 2011 21:15, Markus Jelsma <[email protected]> wrote:
> 
>> Impressive! Are you guys going for the ASF incubator?
>> 
>>> [Apologies for cross-posting]
>>> 
>>> The initial release of crawler-commons is available from :
>>> http://code.google.com/p/crawler-commons/downloads/list
>>> 
>>> The purpose of this project is to develop a set of reusable Java
>>> components that implement functionality common to any web crawler. These
>>> components would benefit from collaboration among various existing web
>>> crawler projects, and reduce duplication of effort.
>>> The current version contains resources for :
>>> - parsing robots.txt
>>> - parsing sitemaps
>>> - URL analyzer which returns Top Level Domains
>>> - a simple HttpFetcher
>>> 
>>> This release is available on Sonatype's OSS Nexus repository
>>> [https://oss.sonatype.org/content/repositories/releases/com/google/code/crawler-commons/]
>>> and should be available on Maven Central soon.
>>> 
>>> Please send your questions, comments or suggestions to
>>> http://groups.google.com/group/crawler-commons
>>> 
>>> Best regards,
>>> 
>>> Julien

--------------------------
Ken Krugler
+1 530-210-6378
http://bixolabs.com
custom data mining solutions





