Hey Paul,

    Great that somebody is trying to get this group moving again!

    I agree with you that there is still a lot to be done in 'understanding' web pages. I'm especially hopeful that the "Semantic Web" initiative will, before too long, give us a more tractable way of generating useful web indexes. So far NLP has proven too time-consuming and error-prone for a task this size (correct me if I'm wrong! NLP is not really my area). Ontology use (that's a little bit closer to my area) has been shown to help a lot in classifying web pages, but there is still a long way to go before it can be made as scalable as the brute-force algorithms used by Google.
    Remember that the web is highly dynamic and HUGE. There are no standard protocols for receiving a message when a new page is created or when the contents or address of a page have changed. So you always have to keep "browsing" and updating your knowledge. My opinion is that Google may be the best you can get (well, the ranking scheme can always get a little better with some minor changes) when you want to treat all web pages. NLP and other processing methods can be used on top of this to generate something better, but the domain has to be constrained.
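    To make the "keep browsing and updating" point concrete, here is a minimal sketch of what a crawler is left with when nothing notifies it: re-fetch, fingerprint and compare. The names (fingerprint, detect_changes, the known table) are hypothetical, the pages are passed in as an already-fetched snapshot rather than downloaded, and this is not a description of how Google or any other engine actually does it.

```python
import hashlib


def fingerprint(content: bytes) -> str:
    """Return a stable digest of a page body (SHA-256, chosen arbitrarily)."""
    return hashlib.sha256(content).hexdigest()


def detect_changes(known: dict, snapshot: dict):
    """Compare a fresh crawl against what we already know.

    known    -- hypothetical table: url -> fingerprint from the last crawl
    snapshot -- url -> raw bytes fetched in the current crawl
    Returns (new, changed, gone) as sorted lists and updates `known` in place.
    """
    new, changed = [], []
    for url, content in snapshot.items():
        digest = fingerprint(content)
        if url not in known:
            new.append(url)          # never seen before
        elif known[url] != digest:
            changed.append(url)      # same address, different contents
        known[url] = digest
    # Anything we knew about but did not see this time has vanished or moved.
    gone = [url for url in known if url not in snapshot]
    for url in gone:
        del known[url]
    return sorted(new), sorted(changed), sorted(gone)
```

    Note that a page whose address changed simply shows up as one "gone" URL plus one "new" URL; with no notification protocol, the only hint that they are the same page would be a matching fingerprint.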
    Maybe, as a long-term project, many different constrained indexes could be combined if they are built on the same infrastructure (DAML+OIL/Web Ontology?).

    Well, that's just my point of view. Actually, my research area is more related to the integration part. I'm working on an Ontology-enabled Link Discovery system. The objective of such a system is to find patterns (spatial and temporal, hopefully seamlessly) in data that has previously been pre-processed and stored using ontologies (DAML+OIL, mainly, with some extra things to enable pattern definition) as a data structure framework.

Michel

Paul Maddox wrote:
Hi,

I'm sure even Google themselves would admit that there's scope for 
improvement.  With Answers, Catalogs, Image Search, News, etc, etc, 
they seem to be quite busy! :-)

As an AI programmer specialising in NLP, personally I'd like to see 
web bots actually 'understanding' the content they review, rather 
than indexing by brute force.  How about the equivalent of Dmoz or 
Yahoo Directory, but generated by a web spider?

Paul.


On Fri, 08 Nov 2002 10:22:48 +0100, Harry Behrens wrote:
  
Haven't seen traffic in ages.
I guess the theme's pretty much dead.

What's there to invent after Google?

   -h

    


_______________________________________________
Robots mailing list
[EMAIL PROTECTED]
http://www.mccmedia.com/mailman/listinfo/robots
  
