Hi All!
I hope this email isn't an intrusion. I'm looking for a consultant (or
consultants plural) to work on a Nutch project. I've sent this out to a few
of you but I wanted to make sure I got it out to everybody.
The client is interested in a "proof of concept." What this boils down to is
a prototype system that crawls terrorist sites with Nutch, provides a web
interface for searching, and adds some additional features such as saved
searches, active hot spots, and export to blog. I've included a rough outline
of functionality below:
- *Spider* – Visit a starting set of seed sites, provided by the
analyst, and follow links to other sites … index and store what it finds.
Most sites will be in Arabic.
- *Geographical Tagging* – When geo.position,
DC.coverage.spatial, and/or ICBM metadata is available for a
web page, the location is calculated and stored in the index.
- *Link-graph* – Each URL is ranked based on a number of
parameters, such as inbound links.
- *Query capability* – Query the set of indexed web pages based
on keyword, geo-location, and Boolean triggers.
- *Saved query capability* – A query can be saved for easy
future viewing.
- *Hot spots* – A summary report showing which pages are
actively being linked to.
- *Hot Memes* – A summary report showing phraseology that is
currently active.
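To make the geographical tagging item a little more concrete, here is a minimal sketch of how the coordinates might be pulled out of a page's meta tags before they go into the index. It assumes the common conventions that geo.position holds "latitude;longitude" and ICBM holds "latitude, longitude"; DC.coverage.spatial is left out because its format varies. The GeoTagParser and LatLon names are hypothetical and are not part of Nutch.

```java
import java.util.Locale;
import java.util.Optional;

/** Hypothetical helper (not a Nutch class): reads lat/lon from geo meta tag values. */
final class GeoTagParser {

    /** Immutable latitude/longitude pair in decimal degrees. */
    static final class LatLon {
        final double lat;
        final double lon;

        LatLon(double lat, double lon) {
            this.lat = lat;
            this.lon = lon;
        }
    }

    /**
     * Parses the content of a "geo.position" ("lat;lon") or "ICBM" ("lat, lon")
     * meta tag. Returns empty for any other tag name, for malformed values,
     * and for coordinates outside the valid range.
     */
    static Optional<LatLon> parse(String metaName, String content) {
        if (metaName == null || content == null) {
            return Optional.empty();
        }
        String name = metaName.trim().toLowerCase(Locale.ROOT);
        if (!name.equals("geo.position") && !name.equals("icbm")) {
            return Optional.empty();
        }
        // Accept either separator, since pages in the wild mix the two styles.
        String[] parts = content.trim().split("\\s*[;,]\\s*");
        if (parts.length != 2) {
            return Optional.empty();
        }
        try {
            double lat = Double.parseDouble(parts[0]);
            double lon = Double.parseDouble(parts[1]);
            // Written as a negated range check so NaN and infinities are rejected too.
            if (!(Math.abs(lat) <= 90 && Math.abs(lon) <= 180)) {
                return Optional.empty();
            }
            return Optional.of(new LatLon(lat, lon));
        } catch (NumberFormatException e) {
            return Optional.empty();
        }
    }
}
```

In a real build this would presumably be called from whatever parsing or indexing hook the consultant chooses in Nutch, with the resulting latitude and longitude stored as extra index fields for the geo-location queries.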
*Deliverable Workflow for Analysts:*
- Log in to the Spider tool
- View daily summary reports, export a few hot sites to the blog
- Based on memes appearing in other media (television, radio,
print), run some queries to see how those memes are spreading
online, and export relevant sites to the blog
- Create a saved search in the Spider tool to watch progress of
memes
- Log in to Blog tool
- Review the list of sites that were just exported from the
Spider tool
- Write a qualitative analysis, storing it as part of the
blog entry
- Evaluate site / page via I4 and other quantitative
methodologies
- Categorize site via datablogging fields (e.g. area = {Middle
East, Indonesia, etc.})
- Publish the blog entries for other analysts to see
- Analysts subscribe to each other's blogs via RSS and read
daily analysis in one location without having to visit
individual blog sites
- Collaboration happens via comments on blogs,
posts-about-posts, etc. This collaboration is on
cultural/contextual elements and is documented, archived, and
searchable.
The deliverable is an installable web application (Tomcat, Java, MySQL)
along with installation, configuration, and startup support. I've tried to
pare down the requirements to what I know Nutch can do well out of the box
and that we can wrap fairly quickly. This is a six-week proof of concept, so
I'll need a working beta within four weeks.
Don't worry about the blog collaboration tool at all... that's a piece of
software that we already have and can export to via the MetaWeblog API very
quickly and easily. We need help on the Nutch side, but I wanted you to see
the workflow from Nutch to the blogging collaboration system.
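For reference, the export step amounts to an XML-RPC call to metaWeblog.newPost, which takes a blog id, username, password, a struct describing the post, and a publish flag. Here is a rough sketch of building that request body by hand; the MetaWeblogRequest class and its parameter names are hypothetical, and a real build would more likely use an XML-RPC client library and send the body over HTTP POST to the blog's endpoint.

```java
/** Hypothetical helper: builds the XML-RPC body for a metaWeblog.newPost call. */
final class MetaWeblogRequest {

    /** Escapes the characters that are not allowed inside XML text content. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Wraps a string as a positional XML-RPC parameter. */
    private static String stringParam(String value) {
        return "<param><value><string>" + escape(value) + "</string></value></param>";
    }

    /** Wraps a name/value pair as a member of the post struct. */
    private static String member(String name, String value) {
        return "<member><name>" + name + "</name><value><string>"
                + escape(value) + "</string></value></member>";
    }

    /**
     * Returns the request body for metaWeblog.newPost(blogid, username,
     * password, struct, publish), where the struct carries title, link,
     * and description.
     */
    static String newPost(String blogId, String username, String password,
                          String title, String link, String description,
                          boolean publish) {
        return "<?xml version=\"1.0\"?>"
                + "<methodCall><methodName>metaWeblog.newPost</methodName><params>"
                + stringParam(blogId)
                + stringParam(username)
                + stringParam(password)
                + "<param><value><struct>"
                + member("title", title)
                + member("link", link)
                + member("description", description)
                + "</struct></value></param>"
                + "<param><value><boolean>" + (publish ? "1" : "0")
                + "</boolean></value></param>"
                + "</params></methodCall>";
    }
}
```

Passing false for the publish flag would leave the exported site as a draft, which seems to fit the workflow above where the analyst reviews and annotates entries before publishing.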
A few questions for you:
1) If you're interested, when can you start? The client is a hurry-up-and-wait
client, but they may be willing to jump quickly in the coming days.
2) What are your hourly consulting/coding rates?
3) How many hours over the course of six weeks do you guess this project
would take? I'm guessing one to two people full time.
4) Is this sort of deliverable something that you feel you can pull off on
your own in four weeks (to beta) or would you recommend I bring in somebody
else with Nutch/web experience? Do you have anybody in mind that you work
well with?
As a software developer myself I know that these are heavily loaded
questions given the lack of exact design requirements. I'm looking for
somebody who feels this is within their capability and is willing to work
hard to make it happen. The client is willing to trust us with many of the
details and if we do a good job this should lead to a much more robust and
dynamic application. And, of course, building the prototype gives you a good
leg up on getting the contract once they move forward. So, please answer to
the best of your ability... this isn't a commitment at this point... just a
ballpark to get me moving forward with the client.
I'm going to speak with the client again later this afternoon and would
like to have a sense of what's possible. I apologize for the urgency... the
client awoke from something of a slumber yesterday.
Best,
Joe Reger