Hey Oliver Keys from Wikipedia.org. I am glad you are interested in the wikitimelines.net concept, and I look forward to engaging the Wikipedia community in a discussion about its potential. We developed wikitimelines.net strictly for resume enhancement purposes. :)
You asked/said:

>> I think it would be pretty awesome, yep :).
>> How exactly would it work; we'd link through to your setup?
>> Your software would be hosted here?
>> How it's formatted will alter who needs to approve what (if it's not
>> something that involves developer work or approval, for example, it's a
>> community matter), but however we do it I would be very interested to see
>> more feedback and reactions from the community: we try to include them in
>> any technical changes.

**************************************************************************************

First off, how did you get a job at Wikipedia? Do you have to know somebody? Does bribery work? <snicker, just kidding> How many servers does Wikipedia have now? Are you in the Fort Lauderdale, Florida Wikipedia server center?

**************************************************************************************

I believe a timeline of a Wikipedia article could be placed in a Wikipedia article using the following notation:

<script src="http://wikitimelines.net/H/P/HP40C1A2X.js" type="text/javascript"></script>
<div id="HP40C1A2X"></div>

Of course the magic part is not quite that easy; read on.

I think you/we would only need one fast server for this ($300 a month, Shawn?). Just give it a quad processor, 16 gigs of memory and a big pipe. I could probably serve up a million timelines a day off a server like that, with about 30 instances of my database running concurrently. It would be a very busy server (so we would probably have to pack it in ice <ha ha>). You could use Carbonite to back it up, because in the end, the timelines back end would consist of millions of little databases. Where the server is physically located and who actually owns/maintains it makes no difference to us, as long as we/I have access to it.

The back end for timelines was designed for:

1) Portability
2) Storage optimization (no indexes needed, almost)
3) Ease of replication and backup
4) Ease of multi-user usage (no large databases or records to lock)
5) Heavy demand
6) Speed, speed and speed (I have it on a cheap, slow server; that is why it appears a bit slow right now)

The heart of the timelines is a rather complex date parsing system and a presentation layer (10,000 lines of code, 3,000+ man-hours of work). Here is a step-by-step listing of its (back-end) algorithm traversing its task of greatest friction. :) (In these steps I use the word "I" instead of "we" for clarity. I have 2 to 3 people working on this project at any one time, so using "we" would be more accurate.) A rough sketch of the request flow and the date handling follows the step list below.

1) A timeline is requested.
2) I check if the timeline's directory exists (I have all 7 million Wikipedia titles and have assigned them all unique IDs).
3) If the directory does exist (and therefore the timeline has already gone through initial processing), I send the timeline JavaScript via AJAX, then I send the timeline data as XML or JSON.
4) If the timeline doesn't exist, I create a directory for it.
5) In this directory I create the following tables:

Create Table epochs (Id c(9), Selected l)
Create Table pics (Id c(9), Caption m, bigpic m, startdate T, Date T, Current c(1), modified T, added T, Height N(4,0), Width N(4,0), Link m)
Create Table mess (Id c(9), Name c(35), email c(35), website c(35), Date T, Active c(1), Mess m)
Create Table sen (sen c(9), numdates N(2), Para c(9), Start N(5), End N(5), startd T, Endd T, First c(1), Current c(1), Deleted T, added T, Color c(6), tsen N(4))
Create Table para (Id c(9), Fixed m, dates m, marked m, Current c(1), added T, First c(1), Deleted T)
Create Table decorator (Id c(9), startdate T, enddate T, Color c(6), opacity N(3), startlabel m, Current c(1), Type c(1), Deleted T, modified T, added T)
Create Table allmags (Id c(9), Start T, End T, unit c(1), mag N(4), Current c(1), Order N(4,0), band N(3), Deleted T, modified T, added T)
Create Table tljsdb (Id c(9), band N(2), Prop c(1), Value m, Current c(1), Deleted T, modified T, added T)
Create Table global (Date d, Height N(10), Width N(10), tlheight N(10), Current c(1), gotpics l, modified T, added T, picsavail l, rtotal N(10), rcount N(10), lasttime T)

6) Then I pull the Wikipedia article from your (en.Wikipedia.org) web servers.
7) I then pull out and clean up each individual paragraph from the article and stick each one into a database.
8) I then mark all of the (suspected) dates in the paragraph and save that into another field in the database (very complicated).
9) I then do sentence disambiguation, which is a lot more complex than we had originally thought it would be; mine is nearly 100% accurate. http://en.wikipedia.org/wiki/Sentence_boundary_disambiguation
10) I then send that into a comprehensive date disambiguation algorithm to see if the (suspected) dates are really dates.
11) If they are, I first detect "continuous dates" (dates that denote a continuum). Example: "He was prime minister from January 20, 1874 to August 2, 1880."
12) I then grab single dates. Example: "He became prime minister on January 20, 1874."
13) I then grab "widow dates" like "He became prime minister in January of that year," because in order for this to make any sense (in natural language) a year is almost always in the previous sentence or paragraph.
14) I then parse out all pictures from the article and disambiguate all of the potential dates from the pictures' captions, using the same algorithm I used on the paragraphs. These dates are used for picture placement on the timeline. Users can turn pictures on or off and can change the dates on the pictures to adjust where they are placed on the timeline.
15) I then construct the JavaScript for the new timeline.
16) I send the timeline's JavaScript to the client browser.
17) I then construct the XML data for the timeline.
18) I then send the XML to the browser (actually, the timeline's JavaScript requests the XML as it executes).

Of course, this whole thing is much more involved than this; I just wanted to give you an overview. If the timeline already exists (in my databases), it only takes seconds to construct and display it. If the timeline does not exist, it can take up to 60 seconds to traverse steps 1 to 18 above (depending on the size of the article). Luckily this is a one-shot deal, as it only happens once for each Wikipedia article. Each timeline is only "born" once.

I hope this helps! We have a rather long list of improvements for the website and the back end. We just put the website up as a beta to get as much feedback as quickly as possible.
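To make steps 1-4 and 15-18 a bit more concrete, here is a minimal sketch of the cached-vs-new decision as it might look in Node-style JavaScript. The directory layout, file names and the buildTimeline helper are just illustrative stand-ins, not the actual back-end code (which, as I said, is not nearly this simple):

// Rough sketch only: the paths and the buildTimeline helper are made up for illustration.
var fs = require('fs');
var path = require('path');

var TIMELINE_ROOT = '/data/timelines'; // assumed layout, e.g. /data/timelines/H/P/HP40C1A2X

// buildTimeline stands in for steps 5 through 15 (tables, article fetch,
// date disambiguation, picture parsing, JavaScript construction).
function getTimelineJs(articleId, buildTimeline) {
  var dir = path.join(TIMELINE_ROOT, articleId.charAt(0), articleId.charAt(1), articleId);
  var jsFile = path.join(dir, 'timeline.js');

  if (fs.existsSync(jsFile)) {
    // The timeline was already "born": serving it again is just a file read.
    return fs.readFileSync(jsFile, 'utf8');
  }

  // One-shot build: this branch runs only the first time an article is requested.
  fs.mkdirSync(dir, { recursive: true });
  var js = buildTimeline(articleId);
  fs.writeFileSync(jsFile, js, 'utf8');
  return js;
}

On the article side, the <script src="..."> tag above requests exactly this JavaScript, and the returned script draws the timeline into the matching <div> and then asks for its XML/JSON data via AJAX as it executes.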
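And since the date handling is the heart of the thing, here is a toy illustration of the three cases in steps 11 through 13 (continuous, single and "widow" dates). The real parser is thousands of lines and handles far more formats; these patterns, and the simple "borrow the year from the previous sentence" rule, are only a simplification to show the idea:

// Toy illustration of steps 11-13; the patterns below are a big simplification.
var MONTH = '(?:January|February|March|April|May|June|July|August|September|October|November|December)';
var FULL_DATE = MONTH + '\\s+\\d{1,2},\\s+\\d{4}';
var CONTINUOUS = new RegExp('(' + FULL_DATE + ')\\s+to\\s+(' + FULL_DATE + ')');
var SINGLE = new RegExp('(' + FULL_DATE + ')');
var WIDOW = new RegExp('in\\s+(' + MONTH + ')\\s+of\\s+that\\s+year');

function classifyDates(sentence, previousYear) {
  var m;
  // Order matters: look for a continuum before settling for a single date.
  if ((m = CONTINUOUS.exec(sentence))) {
    // "He was prime minister from January 20, 1874 to August 2, 1880."
    return { type: 'continuous', start: m[1], end: m[2] };
  }
  if ((m = SINGLE.exec(sentence))) {
    // "He became prime minister on January 20, 1874."
    return { type: 'single', date: m[1] };
  }
  if ((m = WIDOW.exec(sentence)) && previousYear) {
    // "He became prime minister in January of that year." (year taken from earlier text)
    return { type: 'widow', date: m[1] + ' ' + previousYear };
  }
  return { type: 'none' };
}

In the real pipeline, of course, it is the marked-up paragraphs from step 8, not raw sentences like these, that feed the disambiguation.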
So it is BACK TO WORK! lol

Thanks

Jeff Roehl
[email protected]
(818) 912-7530
