So yeah... I managed to grab the XML file from the Export (it's fun trying to do that on a webpage written in modern Greek when you're used to ancient Greek and you can't remember what the Koine word for "hyperlink" or "webpage" is :P).
It comes to a mere 4.2 MB file, so now the trick will be parsing the wanted text out of that and creating an OSIS document from it. The main problem is that the text in the file is placed inside a tag with an xml:space="preserve" attribute, and all of the HTML is encoded as entities beneath it. Therefore all of the structure of the actual data (other than the large groupings under alpha, beta, gamma, etc.) is lost to an XML/XSL parsing combination. Wish me luck... ::dives into a pile of libxml2::

--Greg Hellings

On 11/9/06, Troy A. Griffitts <[EMAIL PROTECTED]> wrote:
> We had a contributor on IRC, today, post this link:
>
> http://el.wikisource.org/wiki/%CE%93%CE%BB%E1%BF%B6%CF%83%CF%83%CE%B1%CE%B9
>
> It looks promising.
>
> I know there is a way to download content in XML of a mediawiki site,
> but have no experience doing so.
>
> Anyone want to take a shot at producing a SWORD Hesychius Lexicon (or
> even just a text file) from this link?
>
> Thanks for everyone's input and help.
>
> -Troy.
>
> Peter von Kaehne wrote:
> > I spoke yesterday both to Prof Hansen and to Prof Ian Cunningham (who is a
> > collaborator of Hansen)
> >
> > http://www.csad.ox.ac.uk/CSAD/Hesychius/Hansen.html
> >
> > Prof Hansen mentioned the TLG and Prof Cunningham confirmed this and said
> > further that there is no electronic version of Hansen's work available. I
> > understand that Hansen's work is published in de Gruyters' Sammlung
> > Griechischer and Lateinischer Altertuemer
> >
> > http://www.degruyter.com/rs/174_AT_E_ED_ENU_h.cfm?rc=19992&id=SER-M1-WDG-HESYCH-B-19992&fg=AT
> >
> > - a copy of which I found here to buy:
> >
> > http://www.basis-buch.de/main-173503.html
> >
> > WRT the TLG: I read the licence in detail and, bluntly said, they have no
> > leg to stand on to deny us use of the texts:
> >
> > They already allowed us to do what we want to do on the basis of the licence
> > - even if they now get cold feet on direct questioning.
> > That said, at least Schmidt's edition is now public domain anyway, and unless
> > there are DMCA restrictions everyone can copy it out of there anyway. And
> > outside of DMCA-like legislation only the public-domain status would apply
> > anyway. But IANAL etc.
> >
> > WRT Latte/Hansen - I am not sure how far Latte's work would constitute an
> > original work in its own right - I presume it does - but again the TLG
> > licence does allow text extraction for scholarly work which is
> > non-commercial.
> >
> > Peter
> >
> > -------- Original Message --------
> > Date: Fri, 03 Nov 2006 17:23:03 -0700
> > From: "Troy A. Griffitts" <[EMAIL PROTECTED]>
> > To: SWORD Developers' Collaboration Forum <[email protected]>
> > Subject: Re: [sword-devel] Hesychius
> >
> >> Peter,
> >> Thank you for your time and info. We have an ongoing dialog with UCI
> >> regarding the use of the data from TLG. They have denied our request
> >> twice, but I am hoping a detailed third plea might solicit sympathy.
> >>
> >> -Troy.
> >>
> >> Peter von Kaehne wrote:
> >>> The TLG also has the older edition by Schmidt, which should by now be
> >>> public domain as it is from 1861
> >>>
> >>> Peter
> >>>
> >>> -------- Original Message --------
> >>> Date: Fri, 03 Nov 2006 15:59:02 +0100
> >>> From: "Peter von Kaehne" <[EMAIL PROTECTED]>
> >>> To: SWORD Developers' Collaboration Forum <[email protected]>
> >>> Subject: Re: [sword-devel] Hesychius
> >>>
> >>>> The TLG indeed contains parts of the Hesychius - Latte's work only.
> >>>>
> >>>> Hansen's work is published on paper only in Germany. Electronic copies
> >>>> are not available.
> >>>>
> >>>> The TLG licence of the text is such that the work might be possible to
> >>>> integrate - i.e. commercial scholarly tools making use of the whole
> >>>> text are forbidden but crosswire might be possible.
> >>>>
> >>>> HTH
> >>>>
> >>>> Peter
> >>>>
> >>>> -------- Original Message --------
> >>>> Date: Thu, 02 Nov 2006 16:38:36 -0700
> >>>> From: "Troy A. Griffitts" <[EMAIL PROTECTED]>
> >>>> To: [email protected]
> >>>> Subject: [sword-devel] Hesychius
> >>>>
> >>>>> If anyone has the time to research where we can find an electronic copy
> >>>>> of Hesychius' Greek Lexicon, your efforts would be extremely valuable to
> >>>>> me right now. I believe the TLG has a copy of it, but I currently don't
> >>>>> have easy access to the TLG. Thanks in advance.
> >>>>>
> >>>>> -Troy.
> >>>>>
> >>>>> _______________________________________________
> >>>>> sword-devel mailing list: [email protected]
> >>>>> http://www.crosswire.org/mailman/listinfo/sword-devel
> >>>>> Instructions to unsubscribe/change your settings at above page
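
The parsing problem described at the top of this message (markup stored as entities inside a text element, so an XML/XSL pass sees only one flat string) can be illustrated with a rough sketch. This is not the libxml2 approach mentioned above; it uses only the Python standard library. The sample document, its namespace URI, the page title, and the helper names iter_pages, TagStripper and strip_markup are all made up for illustration and are not taken from the real export.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical miniature of an export file; the element layout and the
# namespace URI are assumptions, and the content is placeholder text.
SAMPLE = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.3/">
  <page>
    <title>Example/A</title>
    <revision>
      <text xml:space="preserve">&lt;b&gt;lemma&lt;/b&gt;: gloss one
&lt;b&gt;other&lt;/b&gt;: gloss two</text>
    </revision>
  </page>
</mediawiki>"""


def iter_pages(xml_string):
    """Yield (title, raw_text) for every <page> directly under the root."""
    root = ET.fromstring(xml_string)
    # "{*}" matches any namespace (Python 3.8+), so the sketch does not
    # depend on which export schema version the file declares.
    for page in root.findall("{*}page"):
        title = page.findtext("{*}title", default="")
        text = page.findtext("{*}revision/{*}text", default="")
        yield title, text


class TagStripper(HTMLParser):
    """Second pass: keep character data, drop the inner markup."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def strip_markup(raw_text):
    """Parse the decoded string as HTML and return only its text."""
    parser = TagStripper()
    parser.feed(raw_text)
    parser.close()
    return "".join(parser.chunks)


pages = list(iter_pages(SAMPLE))
title, raw = pages[0]
# The XML parser has decoded the entities, but the <b> tags are just
# characters in a string, not elements it can navigate.
plain = strip_markup(raw)
print(title)
print(plain)
```

The point of the two passes is that the first one only recovers the coarse page grouping; any finer structure has to come from parsing the decoded string separately, and a real converter would build entries from that second pass rather than discard the tags.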
