I have some budget to hire someone to enhance Nutch / Lucene in some
interesting ways, and also budget to develop a short course on deploying
web search for the news media. Who has experience developing search engines
that index commercial websites, including newspaper websites and other
sources of conventional news, allowing rapid and precise search over timely
news stories?
The course they are interested in would cover topics including:
- How to avoid indexing irrelevant content like advertising text to
avoid false hits
- How to avoid getting blacklisted by commercial news sites
- Limiting indexing to the relevant news websites, i.e., not spidering
into an advertiser's website
- Creating archives of news stories for later search and historical
retrieval
- Other challenges involved in deploying Nutch on rapidly changing
content like news articles
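To make the third topic concrete (keeping the spider on the news sites and
out of advertisers' sites), here is a minimal sketch in Python. It is not
Nutch code and does not use any Nutch API; the host names and the function
names is_allowed and filter_outlinks are made up for illustration. The idea
is just to drop any outlink whose host is not on an allowlist before it is
queued for fetching.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist; these hosts are placeholders, not real targets.
ALLOWED_HOSTS = {"news.example.com", "example-times.test"}


def is_allowed(url, allowed_hosts=ALLOWED_HOSTS):
    """Return True if the URL's host is an allowed host or a subdomain of one."""
    # hostname is None for strings that do not parse as a URL with a host.
    host = (urlsplit(url).hostname or "").lower()
    return any(host == h or host.endswith("." + h) for h in allowed_hosts)


def filter_outlinks(urls, allowed_hosts=ALLOWED_HOSTS):
    """Keep only the outlinks that stay on allowed news hosts."""
    return [u for u in urls if is_allowed(u, allowed_hosts)]
```

Matching on the parsed host name rather than on a substring of the URL
avoids letting through an ad link that merely mentions a news site in its
path or query string.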
If anyone is interested, contact me at crawler29, which is my
gmail.com account, or IM me on MSN at voting911 (hotmail).
By the way, is there an IRC channel for anything about web search, Nutch /
Lucene, or related topics? If so, how do I get there?
Thanks!
_______________________________________________
Nutch-developers mailing list
Nutch-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-developers