Hi Lyle,


"For right now the publicly available databases (objectssource.com) are small and Nutch works very well already."


I think you mean ObjectsSearch.com. I just want to say that anyone can use or test our search engine. We are growing our database and trying to keep it up to date so users can see how the latest version works.

Just keep checking, as we fetch sites all the time and the database will grow fast. It may help in your work.

Let us know how we can help.


Best Regards,


Asim Iqbal

"What do you all think I should get started on? The code works really well now, so I don't think we want to do much tweaking on small databases, because of scaling factors. But we probably do want to get some tools/benchmarks working so we can tweak on a very large database.

1. I spent some time reading the documentation on Mike's QualityTestTool.
Code at:
http://cvs.sourceforge.net/viewcvs.py/nutch/nutch/src/java/net/nutch/quality/QualityTestTool.java

I liked the concept, and liked the questions and approach he used (100
questions from Inktomi): rating on overlap in the top 10 returned results.
It would be very useful to extend this to a training approach. I could code
a training extension--would probably need some help.

2. I could spend time just doing queries, and looking for deficiencies.
I was able to find problems this way with the Yahoo Labs implementation
(9 months old?), but the problems I've found have been fixed. This approach
probably would not be too productive until somebody gets a really big
database. For right now the publicly available databases
(objectssource.com) are small and Nutch works very well already.


3. I could code some sort of user interface that would allow manual
tweaking of the parameters and examination of the queries. An
easy-to-use testing interface would let a lot of people test and tweak.

4. I could systematically hand-rate hits. I could probably do a couple of
hundred a week, and that is enough to produce useful results. This is also
pretty necessary as a reality check: even if Google and Nutch produce
similar results, both could be garbage.

Again, this might be better left until it is scaled up. However, it might
also be useful to do at intranet scale. This would be useful for intranets,
as well as giving a baseline comparison for bigger projects.

One possibility: Doug, if you could get a query set from the OSU intranet,
30-50 random questions, I could hand-compare it to Google results and
report.

Any other suggestions?
What should I start with?

Thanks
Lyle"
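
The overlap rating Lyle mentions in point 1 (agreement in the top 10 returned results) could be sketched roughly as below. This is only an illustration under stated assumptions, not the actual QualityTestTool code: the function names, the crude URL normalization, and the choice to divide by the size of the first engine's top-k set are all hypothetical.

```python
from typing import Dict, List


def normalize(url: str) -> str:
    """Crude normalization (an assumption): lowercase, drop scheme and trailing slash."""
    url = url.strip().lower()
    for prefix in ("http://", "https://"):
        if url.startswith(prefix):
            url = url[len(prefix):]
            break
    return url.rstrip("/")


def top_k_overlap(results_a: List[str], results_b: List[str], k: int = 10) -> float:
    """Fraction of engine A's top-k URLs that also appear in engine B's top k."""
    top_a = {normalize(u) for u in results_a[:k]}
    top_b = {normalize(u) for u in results_b[:k]}
    if not top_a:
        return 0.0
    return len(top_a & top_b) / len(top_a)


def mean_overlap(
    engine_a: Dict[str, List[str]],
    engine_b: Dict[str, List[str]],
    k: int = 10,
) -> float:
    """Average top-k overlap over the queries both engines have results for."""
    shared = sorted(set(engine_a) & set(engine_b))
    if not shared:
        return 0.0
    return sum(top_k_overlap(engine_a[q], engine_b[q], k) for q in shared) / len(shared)
```

As point 4 notes, a high overlap score only shows that two engines agree, not that either is good, so hand-rated hits would still be needed as a reality check.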

_______________________________________________
Nutch-developers mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
