>> Terry, Jeff is the builder of http://www.chaosreligion.com/wiki/WikiTimeLine,
Hey, this is Jeff Roehl.

There has been a bit of misdirection here.

Oliver Keyes, a noble member of the Wikipedia staff, has provided all of us with the incorrect link.

Our website is NOT http://www.chaosreligion.com/wiki/WikiTimeLine (a previous stab at timelining Wikipedia articles).

It is the new:

http://wikitimelines.net/

Which makes all the difference in the world.

Not that we don't appreciate past attempts to make Wikipedia a better website.

Any attempt to make Wikipedia.org a better website is inspirational. :)

Thanks
Jeff Roehl
[email protected]
(818) 912-7530

>________________________________
> From: Oliver Keyes <[email protected]>
>To: Jeff Roehl <[email protected]>
>Cc: faty <[email protected]>; Oliver Roehl <[email protected]>;
>"[email protected]" <[email protected]>; "[email protected]"
><[email protected]>; "[email protected]"
><[email protected]>; David Karger <[email protected]>;
>"[email protected]" <[email protected]>;
>"[email protected]" <[email protected]>;
>"[email protected]" <[email protected]>; Terry Chay
><[email protected]>
>Sent: Friday, August 24, 2012 4:44 AM
>Subject: Re: Here in Southern California
>
>
>Thanks for this feedback :). So, this sounds like the sort of thing that
>should be run by Terry Chay, our head of Features; I'm CCing him into the
>conversation now :).
>
>Terry, Jeff is the builder of http://www.chaosreligion.com/wiki/WikiTimeLine,
>an extension that allows for gorgeous timelines to be embedded within
>Wikipedia articles. He's interested in (first-off) feedback, and second in
>thinking about ways we could use his software. I would imagine this would
>require approval from you guys, as well as ops and presumably community buy-in,
>but I'll start with you because you're the most qualified person to speak on
>the technical merits or issues. I'm going to gracefully step aside and let you
>handle this - let me know if/when it's possible to reach the "community" stage
>and see if people are interested in it.
>
>
>On 23 August 2012 21:45, Jeff Roehl <[email protected]> wrote:
>
>Hey Oliver Keyes from Wikipedia.org.
>>
>>
>>I am glad you are interested in the wikitimelines.net concept and I look
>>forward to engaging the Wikipedia community in a discussion about its
>>potential. We developed Wikitimelines.net strictly for
>>resume enhancement purposes. :)
>>
>>
>>You asked/said:
>>
>>
>>>> I think it would be pretty awesome, yep :).
>>>> How exactly would it work; we'd link through to your setup?
>>>> Your software would be hosted here?
>>>> How it's formatted will alter who needs to approve what (if it's not
>>>>something that involves developer work or approval, for example, it's a
>>>>community matter),
>>>> but however we do it I would be very interested to see more feedback and
>>>>reactions from the community: we try to include them in any technical
>>>>changes.
>>
>>
>>
>>**************************************************************************************
>>
>>
>>
>>First off, how did you get a job at Wikipedia? Do you have to know somebody?
>>Does bribery work? <snicker, just kidding>
>>
>>
>>How many servers does Wikipedia have now? Are you in the Fort Lauderdale,
>>Florida Wikipedia server center?
>>
>>
>>**************************************************************************************
>>
>>
>>I believe a timeline of a Wikipedia article could be placed in
>>a Wikipedia article using the following notation:
>>
>>
>>
>><script src="http://wikitimelines.net/H/P/HP40C1A2X.js"
>>type="text/javascript"></script>
>><div id="HP40C1A2X"></div>
>>
>>
>>Of course the magic part is not quite that easy; read on.
>>
>>
>>I think you/me would only need 1 fast server for this ($300 a month, Shawn?).
>>Just give it a quad processor, 16 gigs of memory and a big pipe. I
>>could probably serve up a million timelines a day off a server like that,
>>with about 30 instances of my database running concurrently.
>>It would be a very busy server (so we would probably have to pack it in ice
>><ha ha>). You could use Carbonite to back it up, because in the end, the
>>timelines back-end would consist of millions of little databases. Where the
>>server is physically located and who actually owns/maintains it makes no
>>difference to us, as long as we/I have access to it.
>>
>>
>>The back-end for timelines was designed for:
>>
>>
>>1) Portability
>>2) Storage optimization (no indexes needed, almost)
>>3) Ease of replication and backup
>>4) Ease of multi-user usage (no large databases or records to lock)
>>5) Heavy demand
>>6) Speed, speed and speed (I have it on a cheap, slow server, that is why it
>>appears a bit slow right now)
>>
>>
>>The heart of the timelines is a rather complex date parsing system and a
>>presentation layer (10,000 lines of code, 3000+ man hours of work).
>>
>>
>>Here is a step by step listing of its (back-end) algorithm traversing its
>>task of greatest friction. :)
>>
>>
>>(In these steps I use the word "I" instead of "we" for clarity. I have 2 to 3
>>people working on this project, at any one time, so using "we" would be more
>>accurate.)
>>
>>
>>1) A timeline is requested.
>>2) I check if the timeline's directory exists (I have all the 7 million
>>Wikipedia titles and have assigned them all unique IDs).
>>3) If the directory does exist (and therefore the timeline has already gone
>>through initial processing), I send the timeline JavaScript via AJAX, then I
>>send the timeline data as XML or JSON.
>>4) If the timeline doesn't exist, I create a directory for it.
>>5) In this directory I create the following tables:
>>
>>
>>Create Table epochs (Id c(9), Selected l)
>>Create Table pics (Id c(9), Caption m, bigpic m, startdate T, Date T,
>>    Current c(1), modified T, added T, Height N(4,0), Width N(4,0), Link m)
>>Create Table mess (Id c(9), Name c(35), email c(35), website c(35), Date T,
>>    Active c(1), Mess m)
>>Create Table sen (sen c(9), numdates N(2), Para c(9), Start N(5), End N(5),
>>    startd T, Endd T, First c(1), Current c(1), Deleted T, added T,
>>    Color c(6), tsen N(4))
>>Create Table para (Id c(9), Fixed m, dates m, marked m, Current c(1),
>>    added T, First c(1), Deleted T)
>>Create Table decorator (Id c(9), startdate T, enddate T, Color c(6),
>>    opacity N(3), startlabel m, Current c(1), Type c(1), Deleted T,
>>    modified T, added T)
>>Create Table allmags (Id c(9), Start T, End T, unit c(1), mag N(4),
>>    Current c(1), Order N(4,0), band N(3), Deleted T, modified T, added T)
>>Create Table tljsdb (Id c(9), band N(2), Prop c(1), Value m, Current c(1),
>>    Deleted T, modified T, added T)
>>Create Table global (Date d, Height N(10), Width N(10), tlheight N(10),
>>    Current c(1), gotpics l, modified T, added T, picsavail l, rtotal N(10),
>>    rcount N(10), lasttime T)
>>
>>
>>6) Then I pull the Wikipedia article from your (en.Wikipedia.org) website
>>servers.
>>
>>7) I then pull out and clean up each individual paragraph from the article and
>>stick each one into a database.
>>8) I then mark all of the (suspected) dates in the paragraph and save that
>>into another field in the database (very complicated).
>>9) I then do sentence disambiguation, which is a lot more complex than we had
>>originally thought it would be; mine is nearly 100%
>>accurate. http://en.wikipedia.org/wiki/Sentence_boundary_disambiguation
>>10) I then send that into a comprehensive date disambiguation algorithm to
>>see if the (suspected) dates are really dates.
>>11) If they are, I first detect "continuous dates": dates that denote
>>a continuum.
>>Example - "He was prime minister from January 20, 1874 to August 2, 1880".
>>12) I then grab single dates. Example - "He became prime minister on January
>>20, 1874."
>>13) I then grab "widow dates" like "He became prime minister in January of
>>that year." because in order for this to make any sense (in natural language)
>>a year is almost always in the previous sentence or paragraph.
>>14) I then parse out all pictures from the article and disambiguate all of
>>the potential dates from each picture's caption, using the same algorithm I
>>used on the paragraphs. These dates are used for picture placement on the
>>timeline. Users can turn pictures on or off and can change the dates on the
>>pictures, to adjust where they are placed on the timeline.
>>15) I then construct the JavaScript for the new timeline.
>>16) I send the timeline's JavaScript to the client browser.
>>17) I then construct the XML data for the timeline.
>>18) I then send the XML to the browser (actually the
>>timeline's JavaScript requests the XML as it executes).
>>
>>
>>Of course, this whole thing is much more involved than this. I just wanted to
>>give you an overview.
>>
>>
>>If the timeline already exists (in my databases), it only takes seconds
>>to construct and display it.
>>
>>
>>If the timeline does not exist, it can take up to 60 seconds to traverse
>>steps 1 to 18 above (depending on the size of the article). Luckily this is a
>>one shot deal, as it only happens once for each Wikipedia article. Each
>>timeline is only "born" once.
>>
>>
>>I hope this helps!
>>
>>
>>We have a rather long list of improvements for the website and the back-end.
>>We just put the website up as a beta to get as much feedback as quickly as
>>possible.
>>
>>
>>So it is BACK TO WORK!
>>lol
>>
>>Thanks
>>Jeff Roehl
>>[email protected]
>>(818) 912-7530
>
>
>--
>Oliver Keyes
>Community Liaison, Product Development
>Wikimedia Foundation
>
>
>
>

--
You received this message because you are subscribed to the Google Groups "SIMILE Widgets" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to [email protected].
For more options, visit this group at http://groups.google.com/group/simile-widgets?hl=en.
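
For readers following the thread, here is a rough sketch of the three date passes described in steps 11 to 13 of the quoted message (continuous dates, single dates, then "widow dates"). It is not the wikitimelines.net code, which is not shown anywhere in this thread. Python is used only for illustration, and every name here (extract_dates, RANGE_RE, FULL_RE, WIDOW_RE) is hypothetical. Two rules are assumptions of mine: a widow date borrows the most recent year seen in an earlier sentence, and it is placed on the first day of its month.

```python
import re
from datetime import date

# Month name -> month number (English month names only; an assumption).
MONTHS = {name: i for i, name in enumerate(
    ["January", "February", "March", "April", "May", "June", "July",
     "August", "September", "October", "November", "December"], start=1)}

MONTH = "|".join(MONTHS)
FULL = rf"(?:{MONTH})\s+\d{{1,2}},\s+\d{{4}}"

# Step 11: "from <date> to <date>" marks a continuous span.
RANGE_RE = re.compile(rf"from\s+({FULL})\s+to\s+({FULL})")
# Step 12: a single fully specified date.
FULL_RE = re.compile(rf"({MONTH})\s+(\d{{1,2}}),\s+(\d{{4}})")
# Step 13: a "widow date" that has a month but no year of its own.
WIDOW_RE = re.compile(rf"in\s+({MONTH})\s+of\s+that\s+year")


def parse_full(text):
    """Turn 'January 20, 1874' into a datetime.date."""
    m = FULL_RE.fullmatch(text)
    return date(int(m.group(3)), MONTHS[m.group(1)], int(m.group(2)))


def extract_dates(sentences):
    """Return (kind, start, end) tuples, trying ranges first, then singles, then widows."""
    events = []
    last_year = None  # most recent year seen, reused by widow dates (assumed rule)
    for sentence in sentences:
        rng = RANGE_RE.search(sentence)
        if rng:
            start, end = parse_full(rng.group(1)), parse_full(rng.group(2))
            events.append(("range", start, end))
            last_year = end.year
            continue
        single = FULL_RE.search(sentence)
        if single:
            d = parse_full(single.group(0))
            events.append(("single", d, d))
            last_year = d.year
            continue
        widow = WIDOW_RE.search(sentence)
        if widow and last_year is not None:
            d = date(last_year, MONTHS[widow.group(1)], 1)
            events.append(("widow", d, d))
    return events


if __name__ == "__main__":
    for event in extract_dates([
        "He was prime minister from January 20, 1874 to August 2, 1880.",
        "He became prime minister on January 20, 1874.",
        "He became prime minister in January of that year.",
    ]):
        print(event)
```

Trying the range pattern before the single-date pattern matters: otherwise the first half of "from January 20, 1874 to August 2, 1880" would be picked up as a single date. A real system would need far more patterns than these three, as the message itself says.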
