Hi all,
Meanwhile we have launched our event crawler, which searches the German web for events (parties, exhibitions, and so on). The crawler analyzes each downloaded file to determine whether its running text contains an event calendar. The challenge posed by the many different calendar formats out there could be solved by analyzing the structure of each website and comparing it with structures typical of known event calendars.

Identifying the event location is one of the biggest challenges: there are about 120,000 city districts in 13,885 German cities, and their names, as used in the calendars on the web, are frequently ambiguous. We are still improving the location recognition, but it works well now. If every website were geocoded, as described in Andrew Daviel's mail, we would not have had to solve this problem.

What's everyone working on at the moment?
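To illustrate the ambiguity problem, here is a minimal sketch of one way a district name could be disambiguated against a gazetteer by looking for supporting context on the same page. The gazetteer entries, names, and coordinates below are purely illustrative, not our actual data or algorithm:

```python
# Toy gazetteer: an ambiguous district name mapped to its candidate cities.
# "Neustadt" exists as a district in many German cities; two are shown here.
GAZETTEER = {
    "Neustadt": [
        {"city": "Hamburg", "lat": 53.55, "lon": 9.99},
        {"city": "Dresden", "lat": 51.06, "lon": 13.74},
    ],
}

def resolve_district(name, page_text):
    """Pick the candidate whose parent city is also mentioned on the page."""
    text = page_text.lower()
    for candidate in GAZETTEER.get(name, []):
        if candidate["city"].lower() in text:
            return candidate
    return None  # still ambiguous: no supporting context found

# Example: the page mentions Dresden, so that candidate wins.
hit = resolve_district("Neustadt", "Konzert in der Neustadt, Dresden")
```

A real system would of course combine many more signals (postal codes, venue names, link targets), but the basic idea of scoring candidates by surrounding context is the same.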
After a few years of work, we now have a crawler that searches around 50,000 pages each day for event calendars, extracts each individual event, and indexes it together with its exact geocodes and dates. Using our index, you can search the German web for events by giving the system your location and the perimeter it should search within.
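The perimeter search over geocoded events can be sketched as a simple great-circle distance filter. This is only an illustration of the idea, not our actual index implementation; the event records and function names are made up:

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    # Great-circle distance between two points on Earth, in kilometres.
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def events_within(events, lat, lon, radius_km):
    # Keep only the geocoded events inside the requested perimeter.
    return [e for e in events
            if haversine_km(lat, lon, e["lat"], e["lon"]) <= radius_km]
```

In practice a search index would use a spatial data structure rather than a linear scan, but the filter criterion is the same.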
Andrew, we also have 13,885 web pages (one for each of the 13,885 cities in Germany) presenting the events we could geocode in each city. Would these be interesting for you to index?
Bye
Matthias Jaekle
--
http://www.eventax.com
_______________________________________________
Robots mailing list
[EMAIL PROTECTED]
http://www.mccmedia.com/mailman/listinfo/robots
