Hi there,

The search engine is Sphinx; it's pretty quick and has the options I wanted. The crawler is custom written. It's something that started small and just got bigger, I like crawling :) I did look at Nutch, but as I wanted to stay small it just felt too big.

The goal is to provide a searchable index of NZ domains. Some people like stats: how many sites are running x, or using y JS framework. This site helps to provide those answers.
Check out the blog post - http://www.crawl.co.nz/blog/stats-first-crawl

Cheers

On Monday, June 25, 2012 9:34:07 PM UTC+12, .Net2Php wrote:
>
> Interesting. Are you using a FOSS search engine like Lucene? What about
> for crawling? Also, what's the goal of this site?
>
> On Saturday, June 23, 2012 12:48:27 AM UTC+12, Nick wrote:
>>
>> Evening (or morning to be precise),
>>
>> Just wanted to share something that I've spent a little bit of time on, I
>> got it up the other week and finally thought I'd let other people see it.
>>
>> http://www.crawl.co.nz/
>>
>> Basically, I'm trying to get all the NZ domains, my first "crawl" i've
>> come up with around a third, which isn't to bad. I've put up an index of
>> the crawl here, unlike search engines you can search the html here, so you
>> can see which sites running jquery, or some other piece of js. Can also do
>> ands and ors, search ips and headers.
>>
>> Have a look :) Working on getting another 20-30% on the next crawl.
>>
>> Any questions, just ask.
>>
--
NZ PHP Users Group: http://groups.google.com/group/nzphpug
To post, send email to [email protected]
To unsubscribe, send email to [email protected]
