Ah, so you implemented your own depth-limited search algorithm? Sounds exciting. The stats look interesting. Looks like you might try to do a bit of parsing on the HTML for some of this stuff, right? You're staying away from regex, I hope.
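To show what I mean by parsing rather than regex, here is a rough sketch in Python using the standard library's `html.parser`. It is only an illustration, not how crawl.co.nz actually works: the names `ScriptSrcCollector`, `script_sources` and `uses_library` are made up for this example, and matching on the `src` value is just one simple way to guess which JS library a page loads.

```python
from html.parser import HTMLParser


class ScriptSrcCollector(HTMLParser):
    """Collects the src attribute of every <script> tag in a page."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag and attribute names for us.
        if tag == "script":
            for name, value in attrs:
                if name == "src" and value:
                    self.sources.append(value)


def script_sources(html):
    """Return the list of external script URLs referenced by the page."""
    parser = ScriptSrcCollector()
    parser.feed(html)
    parser.close()
    return parser.sources


def uses_library(html, needle):
    """Hypothetical helper: True if any script src contains `needle`."""
    return any(needle.lower() in src.lower() for src in script_sources(html))


page = """
<html><head>
<!-- <script src="/js/old-prototype.js"></script> -->
<script src="/js/jquery-1.7.2.min.js"></script>
</head><body>Hello</body></html>
"""

print(script_sources(page))
print(uses_library(page, "jquery"))
```

The commented-out script tag is skipped because the parser treats it as a comment, which is the sort of case a naive regex tends to get wrong.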
Ultimately, what's the point of this search engine?

On Tuesday, June 26, 2012 9:31:06 PM UTC+12, Nick wrote:
>
> Hi there,
>
> The search engine used is Sphinx; it's pretty quick and had the options I
> wanted.
> The crawl is custom written. It's something that started small and just got
> bigger. I like crawling :) I did look at Nutch, but as I was wanting to
> stay small it just felt too big. The goal is to provide a searchable index
> of NZ domains. Some people like stats: how many sites are running x, or
> using y JS framework. This site helps to provide those answers.
>
> Check out the blog post - http://www.crawl.co.nz/blog/stats-first-crawl
>
> Cheers
>
> On Monday, June 25, 2012 9:34:07 PM UTC+12, .Net2Php wrote:
>>
>> Interesting. Are you using a FOSS search engine like Lucene? What about
>> for crawling? Also, what's the goal of this site?
>>
>> On Saturday, June 23, 2012 12:48:27 AM UTC+12, Nick wrote:
>>>
>>> Evening (or morning to be precise),
>>>
>>> Just wanted to share something that I've spent a little bit of time on.
>>> I got it up the other week and finally thought I'd let other people see it.
>>>
>>> http://www.crawl.co.nz/
>>>
>>> Basically, I'm trying to get all the NZ domains. On my first "crawl" I've
>>> come up with around a third, which isn't too bad. I've put up an index of
>>> the crawl here. Unlike search engines, you can search the HTML here, so you
>>> can see which sites are running jQuery, or some other piece of JS. You can
>>> also do ANDs and ORs, and search IPs and headers.
>>>
>>> Have a look :) Working on getting another 20-30% on the next crawl.
>>>
>>> Any questions, just ask.
>>>

--
NZ PHP Users Group: http://groups.google.com/group/nzphpug
To post, send email to [email protected]
To unsubscribe, send email to [email protected]
