Have a look at the Neo4j graph database (with the neo4jphp wrapper library).
It would let you build a graph of .nz domains based on crosslinks,
and you could also store other related info there, such as servers.
There is a bit of a learning curve, especially for writing queries, but it is a fun thing to play with.
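
To give a rough idea of what a crosslink graph looks like, here is a minimal sketch in plain Python. It is not Neo4j and not the neo4jphp API, just an in-memory stand-in; the function names (`nz_domain`, `build_crosslink_graph`, `linked_from`) and the sample pages are made up for illustration, and the regex-based link extraction is a simplification.

```python
import re
from collections import defaultdict
from urllib.parse import urlparse

# Simplified link extraction: absolute http(s) URLs in href attributes only.
HREF_RE = re.compile(r'href=["\'](https?://[^"\']+)["\']', re.IGNORECASE)


def nz_domain(url):
    """Return the lowercased host (without a leading 'www.') if it ends in .nz, else None."""
    host = (urlparse(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    return host if host.endswith(".nz") else None


def build_crosslink_graph(pages):
    """pages maps a source domain to its HTML; returns domain -> set of linked .nz domains."""
    graph = defaultdict(set)
    for source, html in pages.items():
        for url in HREF_RE.findall(html):
            target = nz_domain(url)
            if target and target != source:
                graph[source].add(target)
    return graph


def linked_from(graph, target):
    """Return the sorted list of domains that link to the given target domain."""
    return sorted(src for src, targets in graph.items() if target in targets)


# Hypothetical sample data, not real crawl output.
pages = {
    "a.co.nz": '<a href="http://www.b.co.nz/">b</a> <a href="http://example.com/">x</a>',
    "b.co.nz": '<a href="https://a.co.nz/about">a</a> <a href="http://c.org.nz">c</a>',
    "c.org.nz": '<a href="http://b.co.nz/">b</a>',
}
graph = build_crosslink_graph(pages)
print(linked_from(graph, "b.co.nz"))
```

In a graph database each domain would become a node and each crosslink a relationship, so "who links to X" turns into a query rather than a loop.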

Cheers,
Alexei


Nick wrote:
Hi there,

The search engine used is Sphinx; it's pretty quick and had the options I
wanted.
The crawler is custom written; it's something that started small and just
got bigger. I like crawling :) I did look at Nutch, but as I wanted
to stay small it just felt too big. The goal is to provide a searchable
index of NZ domains. Some people like stats: how many sites are running
x, or using y JS framework. This site helps to provide those answers.
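
As a rough illustration of that kind of stat, here is a minimal Python sketch that counts how many crawled pages mention a given JS framework. This is not how the site itself does it (that goes through the Sphinx index); the `SIGNATURES` markers, the `count_frameworks` name and the sample pages are assumptions made up for the example, and plain substring matching will produce false positives on real HTML.

```python
from collections import Counter

# Hypothetical substring markers per framework; real detection would need more care.
SIGNATURES = {
    "jquery": "jquery",
    "mootools": "mootools",
    "prototype": "prototype.js",
}


def count_frameworks(pages):
    """pages maps a domain to its HTML; returns a Counter of framework -> number of sites."""
    counts = Counter()
    for html in pages.values():
        lowered = html.lower()  # match case-insensitively
        for name, marker in SIGNATURES.items():
            if marker in lowered:
                counts[name] += 1  # count each site at most once per framework
    return counts


# Hypothetical sample data, not real crawl output.
sample = {
    "a.co.nz": '<script src="/js/jquery.min.js"></script>',
    "b.co.nz": '<script src="/js/jQuery-1.7.2.js"></script>'
               '<script src="/js/mootools.js"></script>',
    "c.org.nz": "<p>No scripts here</p>",
}
stats = count_frameworks(sample)
print(stats)
```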

Check out the blog post - http://www.crawl.co.nz/blog/stats-first-crawl

Cheers




On Monday, June 25, 2012 9:34:07 PM UTC+12, .Net2Php wrote:

    Interesting. Are you using a FOSS search engine like Lucene? What
    about for crawling? Also, what's the goal of this site?

    On Saturday, June 23, 2012 12:48:27 AM UTC+12, Nick wrote:

        Evening (or morning to be precise),

        Just wanted to share something that I've spent a little bit of
        time on, I got it up the other week and finally thought I'd let
        other people see it.

        http://www.crawl.co.nz/

        Basically, I'm trying to get all the NZ domains. In my first
        "crawl" I've come up with around a third, which isn't too bad.
        I've put up an index of the crawl here. Unlike search engines,
        you can search the HTML here, so you can see which sites are running
        jQuery, or some other piece of JS. You can also do ANDs and ORs,
        and search IPs and headers.

        Have a look :) Working on getting another 20-30% on the next crawl.

        Any questions, just ask.


--
NZ PHP Users Group: http://groups.google.com/group/nzphpug
To post, send email to [email protected]
To unsubscribe, send email to
[email protected]
