Hi there,

The search engine is Sphinx; it's pretty quick and had the options I 
wanted.
The crawler is custom written. It's something that started small and just got 
bigger, I like crawling :) I did look at Nutch, but as I wanted to 
stay small it just felt too big. The goal is to provide a searchable index 
of NZ domains. Some people like stats: how many sites are running x, or 
using y JS framework. This site helps to provide those answers.
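
To give a rough idea of the kind of question that means, here is a minimal Python sketch. It is not the actual crawl.co.nz code: the sample pages, the `sites_matching` helper and the regex are all made up for illustration, and the real index is queried through Sphinx rather than by scanning HTML like this.

```python
import re

# Hypothetical sample data: domain -> raw HTML as a crawler might have stored it.
PAGES = {
    "example-a.co.nz": '<script src="/js/jquery.min.js"></script>',
    "example-b.co.nz": '<script src="/js/prototype.js"></script>',
    "example-c.co.nz": '<script src="//cdn.example/jquery-1.7.2.js"></script>',
}


def sites_matching(pages, pattern):
    """Return the sorted domains whose HTML matches the regex (case-insensitive)."""
    regex = re.compile(pattern, re.IGNORECASE)
    return sorted(domain for domain, html in pages.items() if regex.search(html))


# "How many sites are running jQuery?" as a naive pattern match on script names.
jquery_sites = sites_matching(PAGES, r"jquery[\w.\-]*\.js")
print(len(jquery_sites), jquery_sites)
```

A plain regex like this will miss sites that bundle or rename the library, so treat any count it gives as a lower bound.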

Check out the blog post - http://www.crawl.co.nz/blog/stats-first-crawl

Cheers




On Monday, June 25, 2012 9:34:07 PM UTC+12, .Net2Php wrote:
>
> Interesting. Are you using a FOSS search engine like Lucene? What about 
> for crawling? Also, what's the goal of this site?
>
> On Saturday, June 23, 2012 12:48:27 AM UTC+12, Nick wrote:
>>
>> Evening (or morning to be precise),
>>
>> Just wanted to share something that I've spent a little bit of time on, I 
>> got it up the other week and finally thought I'd let other people see it.
>>
>> http://www.crawl.co.nz/
>>
>> Basically, I'm trying to get all the NZ domains. On my first "crawl" I've 
>> come up with around a third, which isn't too bad. I've put up an index of 
>> the crawl here; unlike search engines, you can search the HTML here, so you 
>> can see which sites are running jQuery, or some other piece of JS. You can 
>> also do ANDs and ORs, and search IPs and headers.
>>
>> Have a look :) Working on getting another 20-30% on the next crawl.
>>
>> Any questions, just ask.
>>
>

-- 
NZ PHP Users Group: http://groups.google.com/group/nzphpug
To post, send email to [email protected]
To unsubscribe, send email to
[email protected]
