I remember you saying somebody replied before to say that each department fills in its own page of the Reporter separately. If that's the case, it might be easier to get one person per tripos year to add their courses. I'm considering doing this for Part II Psych: at the first lecture in a module they hand out summaries of the lectures to come, which would give other people a better idea of whether or not to attend.
Fergus Ross Ferrier

2009/10/13 Ximin Luo <xl...@cam.ac.uk>
> Note: there are 2 forwarded emails here. I sent the 2nd one to Sam only,
> instead of the list, by mistake.
>
> X
>
> -------- Original Message --------
> Subject: Lecture List source data
> Date: Tue, 13 Oct 2009 10:16:16 +0100
> From: Ximin Luo <xl...@cam.ac.uk>
> To: reporter.edi...@admin.cam.ac.uk
>
> Hi,
>
> I'm with a group of a few students who are planning to build a web-based
> timetable application for arranging supervisions and other
> university-related stuff. The idea is for students to be able to select
> which tripos / courses they are doing and have this data automatically
> added to their timetable.
>
> Unfortunately, the Reporter's Lecture-List isn't easily computer-readable;
> there are various quirks and inconsistencies which make it very
> complicated to process directly from the PDFs. Do you have this data in a
> simpler format?
>
> Thanks,
>
> Ximin
>
> -------- Original Message --------
> Subject: Re: [Pidge-dev] good work
> Date: Mon, 12 Oct 2009 19:23:36 +0100
> From: Ximin Luo <xl...@cam.ac.uk>
> To: Sam Davyson <samdavy...@gmail.com>
> References: <20091007170651.18384.97...@slice.fergusrossferrier.co.uk>
> <d896124c0910080813w1a66c34dsf04e8729a7849...@mail.gmail.com>
> <4ad0ac72.1000...@cam.ac.uk>
> <354cb040910111627p567e9745v209c45c2b79f0...@mail.gmail.com>
>
> There are several major issues with "parsing the lecture-lists":
>
> - Some subjects (e.g. SPS) don't publish tables; instead they link to
>   their own website. We'll need to write custom parsers for these.
>
> - Lots of courses make cross-references to each other in their
>   scheduling, such as "first 8 lectures" / "last 16 lectures". This can
>   be detected and accounted for, but it does mean we can't just parse
>   every course individually; we need to keep track of its context too.
>
> - There are many typographical mistakes that make it a bitch to parse
>   strings. For example, we have things like "TRIPO S", etc. This is
>   possible to account for but would be a bitch to code.
>
> - The document structure and titles are inconsistent; we get things like
>   "ENGLISH TRIPOS, PART I", which is fine, but we also have things like
>   "C COURSES", etc. The layout is such that it's impossible *in general*
>   (even for a human) to work out which ones are sub-parts and
>   sub-sub-parts. One option would be to remove these groupings entirely
>   and only keep per-course data, but that would massively inconvenience
>   the end user.
>
> All in all, the situation with the global lecture lists is a total mess.
> Ideally we would persuade the faculties to use a more consistent
> timetabling system, but it's unlikely that any of them will listen to us.
>
> I think the best thing to do for now is to ask whoever writes the
> Reporter to provide us with the source of the data. Hopefully it will be
> slightly cleaner. I'll go and do some more research in this direction.
>
> X
>
> _______________________________________________
> Mailing list: https://launchpad.net/~pidge-dev
> Post to     : email@example.com
> Unsubscribe : https://launchpad.net/~pidge-dev
> More help   : https://help.launchpad.net/ListHelp
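For the two parsing quirks in the quoted list that are mechanical rather than structural (split keywords such as "TRIPO S", and relative references such as "first 8 lectures" / "last 16 lectures"), here is a rough sketch of how they might be handled. It is only an illustration under some assumptions: the keyword list, the function names `normalise_heading` and `resolve_relative`, and the idea that a parent course's lectures have already been extracted into an ordered list are all hypothetical, not anything the Reporter provides.

```python
import re

# Hypothetical, hand-curated vocabulary of heading keywords that the PDF
# extraction sometimes splits with a stray space (e.g. "TRIPO S").
KEYWORDS = ["TRIPOS", "COURSES", "LECTURES"]


def _spaced_pattern(word: str) -> "re.Pattern":
    """Match `word` allowing at most one whitespace char between its letters."""
    body = r"\s?".join(re.escape(ch) for ch in word)
    return re.compile(rf"\b{body}\b")


_PATTERNS = {word: _spaced_pattern(word) for word in KEYWORDS}


def normalise_heading(text: str) -> str:
    """Repair known keywords that were split apart during PDF extraction."""
    for word, pattern in _PATTERNS.items():
        text = pattern.sub(word, text)
    return text


# Matches e.g. "first 8 lectures" or "Last 16 lecture", case-insensitively.
_RELATIVE = re.compile(r"\b(first|last)\s+(\d+)\s+lectures?\b", re.IGNORECASE)


def resolve_relative(spec: str, parent_lectures: list) -> list:
    """Resolve "first N" / "last N lectures" against the parent course's
    (hypothetical) ordered list of lectures.

    Raises ValueError if `spec` is not a relative reference or asks for
    more lectures than the parent course has.
    """
    match = _RELATIVE.search(spec)
    if match is None:
        raise ValueError(f"not a relative lecture reference: {spec!r}")
    which, count = match.group(1).lower(), int(match.group(2))
    total = len(parent_lectures)
    if count > total:
        raise ValueError(f"{spec!r} asks for more than {total} lectures")
    if which == "first":
        return parent_lectures[:count]
    # Slice from total - count so that "last 0 lectures" yields [], not everything.
    return parent_lectures[total - count:]
```

For example, `normalise_heading("ENGLISH TRIPO S, PART I")` gives `"ENGLISH TRIPOS, PART I"`, and with a 24-lecture parent course, "first 8 lectures" and "last 16 lectures" together cover the whole course. The optional-space trick is only safe for a small vocabulary of longish keywords; applied to short words it would start merging genuinely separate tokens. The relative-reference resolver also shows why courses can't be parsed in isolation: it needs the parent course's lecture list as context.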