Ah, so you implemented your own depth-limited search algorithm? Sounds 
exciting. The stats look interesting. Looks like you're doing a bit of 
parsing on the HTML for some of this stuff, right? You're staying away 
from regex, I hope.
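The thread doesn't show how Nick's crawler actually works, so here is a hypothetical sketch of the two ideas raised above: a crawl bounded by link depth, and pulling `<script src>` URLs out of pages with a real HTML parser (Python's standard `html.parser`) instead of regex. Note that it uses a breadth-first walk with a depth cap rather than a classic depth-first depth-limited search, because breadth-first visits each page at its shallowest depth. The names `crawl`, `PageParser`, `uses_jquery` and the `fetch` callable are made up for illustration; `fetch` stands in for whatever HTTP client you'd really use.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class PageParser(HTMLParser):
    """Collects <a href> links and <script src> URLs from one HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self.scripts = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # Resolve relative URLs against the page they were found on.
        if tag == "a" and attrs.get("href"):
            self.links.append(urljoin(self.base_url, attrs["href"]))
        elif tag == "script" and attrs.get("src"):
            self.scripts.append(urljoin(self.base_url, attrs["src"]))


def crawl(start_url, fetch, max_depth):
    """Breadth-first crawl that never follows links beyond max_depth.

    `fetch` is a hypothetical callable returning a page's HTML, or None
    if the page can't be retrieved. Returns {url: [script srcs]} for
    every page fetched.
    """
    seen = {start_url}
    queue = deque([(start_url, 0)])
    results = {}
    while queue:
        url, depth = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        parser = PageParser(url)
        parser.feed(html)
        results[url] = parser.scripts
        if depth == max_depth:
            continue  # depth limit reached: keep the page, don't expand its links
        for link in parser.links:
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return results


def uses_jquery(script_srcs):
    """Crude heuristic: does any script URL mention jquery?"""
    return any("jquery" in src.lower() for src in script_srcs)


if __name__ == "__main__":
    # A fake three-page site served from a dict instead of the network.
    pages = {
        "http://example.nz/": '<a href="/about">About</a>'
                              '<script src="/js/jquery.min.js"></script>',
        "http://example.nz/about": '<a href="/team">Team</a>',
        "http://example.nz/team": "<p>deep page</p>",
    }
    for url, scripts in crawl("http://example.nz/", pages.get, max_depth=1).items():
        print(url, uses_jquery(scripts))
```

With `max_depth=1` the crawl fetches the home page and `/about` but never reaches `/team`. Matching on the script URL only catches sites that load jQuery under a recognisable file name; searching the full HTML, as crawl.co.nz does, would catch more.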

Ultimately, what's the point of this search engine?

On Tuesday, June 26, 2012 9:31:06 PM UTC+12, Nick wrote:
>
> Hi there,
>
> The search engine used is Sphinx; it's pretty quick and had the options I 
> wanted.
> The crawler is custom written. It's something that started small and just 
> got bigger; I like crawling :) I did look at Nutch, but as I wanted to 
> stay small it just felt too big. The goal is to provide a searchable index 
> of NZ domains. Some people like stats: how many sites are running x, or 
> using y JS framework. This site helps to provide those answers.
>
> Check out the blog post - http://www.crawl.co.nz/blog/stats-first-crawl
>
> Cheers
>
>
>
>
> On Monday, June 25, 2012 9:34:07 PM UTC+12, .Net2Php wrote:
>>
>> Interesting. Are you using a FOSS search engine like Lucene? What about 
>> for crawling? Also, what's the goal of this site?
>>
>> On Saturday, June 23, 2012 12:48:27 AM UTC+12, Nick wrote:
>>>
>>> Evening (or morning to be precise),
>>>
>>> Just wanted to share something that I've spent a little bit of time on. 
>>> I got it up the other week and finally thought I'd let other people see it.
>>>
>>> http://www.crawl.co.nz/
>>>
>>> Basically, I'm trying to get all the NZ domains. On my first "crawl" I've 
>>> come up with around a third, which isn't too bad. I've put up an index of 
>>> the crawl here. Unlike search engines, you can search the HTML here, so you 
>>> can see which sites are running jQuery or some other piece of JS. You can 
>>> also do ANDs and ORs, and search IPs and headers.
>>>
>>> Have a look :) Working on getting another 20-30% on the next crawl.
>>>
>>> Any questions, just ask.
>>>
>>

-- 
NZ PHP Users Group: http://groups.google.com/group/nzphpug
To post, send email to [email protected]
To unsubscribe, send email to
[email protected]