Hi,

On 25.07.2009, 17:13, River Tarnell <[email protected]> wrote:
> .NET is not, itself, non-free. Microsoft's implementation (the most
> common one) is, but Mono (http://mono-project.com/Main_Page) is not.
> Perhaps the AWB developers could make whatever changes are needed to
> run it on a free implementation.

Mono works great; I'm running bots built on the DotNetWikiBot framework on the toolserver.

For simple parsing of a pages-articles.xml file, you could try a script I used some time ago. It is a very simple line-based XML parser, written only for the pages-articles.xml structure, that calls a function named "test" with the title and text of each article. It's not the perfect solution, but it's one implemented in five minutes ;)

<?php
// Called once per article; replace the body with your own processing.
function test($title, $text) {
  // do something here
}

// Remove the XML tags on a line, then turn &lt; &gt; &amp; &quot;
// back into the characters they stand for in the wikitext.
function stripXml($line) {
  return htmlspecialchars_decode(strip_tags($line), ENT_QUOTES | ENT_XML1);
}

$filename = "enwiki-200XXXXX-pages-articles.xml";
$dataFile = fopen($filename, "r");
if ($dataFile) {
  // 0 = outside <page>, 1 = inside <page>, 2 = inside <revision>,
  // 3 = inside a <text> element that spans several lines
  $status = 0;
  $title = "";
  $text = "";
  // Read whole lines: fixed 4096-byte chunks could split "</text>".
  while (($buffer = fgets($dataFile)) !== false) {
    if (($status == 0) && (stripos($buffer, "<page>") !== false))
      $status = 1;
    elseif (($status == 1) && (stripos($buffer, "<title>") !== false))
      $title = stripXml($buffer);
    elseif (($status == 1) && (stripos($buffer, "<revision>") !== false))
      $status = 2;
    elseif (($status == 2) && (stripos($buffer, "<text") !== false)) {
      $text = stripXml($buffer);
      // Only stay inside <text> if it neither closes on this line nor
      // is self-closing (e.g. <text xml:space="preserve" />).
      if ((stripos($buffer, "</text>") === false)
          && !preg_match('#<text[^>]*/>#i', $buffer))
        $status = 3;
    }
    elseif ($status == 3) {
      $text .= stripXml($buffer);
      if (stripos($buffer, "</text>") !== false)
        $status = 2;
    }
    elseif (($status == 2) && (stripos($buffer, "</revision>") !== false))
      $status = 1;
    elseif (($status == 1) && (stripos($buffer, "</page>") !== false)) {
      test(trim($title), trim($text));
      $title = "";
      $text = "";
      $status = 0;
    }
  }
  fclose($dataFile);
} else {
  die("File not found: $filename");
}

_______________________________________________
Toolserver-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/toolserver-l
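The script matches tags line by line, so it relies on the dump putting each tag on its own line. If that ever proves too fragile, PHP's bundled XMLReader extension can stream the same pages-articles.xml file node by node without loading it into memory. The following is a minimal sketch rather than a tested tool: it assumes the xmlreader extension is enabled, and `parseDump` is a hypothetical name. It calls the callback with each page's title and text, with entities such as &lt; already decoded by the XML parser.

```php
<?php
// Sketch (assumption: ext/xmlreader is available): walk a MediaWiki
// pages-articles dump and call $callback($title, $text) for each <page>.
function parseDump(XMLReader $reader, callable $callback): void
{
    $title = '';
    $text = '';
    while ($reader->read()) {
        if ($reader->nodeType === XMLReader::ELEMENT) {
            if ($reader->localName === 'title') {
                // readString() returns the decoded text content.
                $title = $reader->readString();
            } elseif ($reader->localName === 'text') {
                // An empty <text ... /> yields ''.
                $text = $reader->readString();
            }
        } elseif ($reader->nodeType === XMLReader::END_ELEMENT
                  && $reader->localName === 'page') {
            // One page is complete: hand it over and reset.
            $callback($title, $text);
            $title = '';
            $text = '';
        }
    }
}
```

To process a real dump, create the reader with `$reader = new XMLReader();`, call `$reader->open($filename)`, and pass `'test'` as the callback. Since only the last `<text>` seen in a page is kept, this fits pages-articles dumps (one revision per page), not full-history dumps.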
