And I forgot to mention that I had posted it to the wxSword download site on SourceForge: https://sourceforge.net/project/showfiles.php?group_id=142229
Sorry!

--Greg

On 11/10/06, Greg Hellings <[EMAIL PROTECTED]> wrote:
> Getting the output from their included wiki export page was the
> trivial portion of the task (read: I had to guess completely, judging
> from the directions on Wikipedia's site, and extrapolate from them to
> figure out what name WikiSource actually wanted for each page).
> Writing the XSLT is proving to be far more cumbersome. I just spent
> over an hour trying to figure out why my XSLT was not producing any
> output, only to realize that the exported file had a default
> namespace.
>
> It will be incredibly difficult to extract any structural information
> from the files in an automated system. For one, I am not familiar
> with what Hesychius is, and while I took extensive Greek in my
> undergraduate course of study, reading through that massive document
> would be unwieldy for me at this point, since I could not dedicate a
> huge amount of time to the work.
>
> For now I have posted an XML file that is the filtered XML that comes
> from the export, with everything except the page, title and text
> fields removed (the rest of the information simply records who made
> the latest modification to each page, when it happened, and their
> change-log entry). I have also converted all of the &gt; and &lt;
> entities to > and < in an effort to return the data to its display
> format.
>
> Someone will need to figure out how to tell when a < or > belongs to
> the HTML/XML markup and when it belongs to the more specific data
> within. The WikiSource document seems to make very poor use of the
> < and > characters, using them both to denote a keyword and to
> emphasize certain words or phrases, which makes the data even more
> difficult to parse. I don't know that a fully automated solution will
> be possible with this data or with the original data... but it's all
> just a starting point.
>
> If you want other files, let me know.
>
> --Greg
>
> On 11/9/06, Troy A. Griffitts <[EMAIL PROTECTED]> wrote:
> > Greg,
> > You're amazing!!! I must have played with stuff for hours today
> > trying to make sense of the MediaWiki export docs. I even downloaded
> > some PyWikipediaBot Python thingy but couldn't get it to run either
> > (I am inept at Python, so I wasn't surprised, though quite
> > frustrated nonetheless). Thank you!!! In case it makes any
> > difference, my personal interest in the lexicon, once it is usable
> > by SWORD, is to build a synonyms database from the data. If there is
> > any indication in the data that a synonym for an entry is being
> > listed, I would most appreciate a unique <seg type="x-synonym">, or
> > some such. Thank you again, so much, for your work. I am very
> > excited!
> >
> > -Troy.
> >
> > Greg Hellings wrote:
> > > So yeah... I managed to grab the XML file from the Export (it's fun
> > > trying to do that on a webpage written in modern Greek when you're
> > > used to ancient Greek and you can't remember what the Koine word for
> > > "hyperlink" or "webpage" is :P).
> > >
> > > It comes to a mere 4.2 MB file, so now the trick will be parsing the
> > > wanted text out of it and creating an OSIS file from it. The main
> > > problem is that the text in the file is placed inside a tag with an
> > > xml:space="preserve" attribute, and all of the HTML beneath it is
> > > encoded as entities. Therefore all of the structure of the actual
> > > data (other than the large groupings under alpha, beta, gamma, etc.)
> > > is lost to an XML/XSL parsing combination.
> > >
> > > Wish me luck... ::dives into a pile of libxml2::
> > >
> > > --Greg Hellings
> > >
> > > On 11/9/06, Troy A. Griffitts <[EMAIL PROTECTED]> wrote:
> > >> A contributor on IRC posted this link today:
> > >>
> > >> http://el.wikisource.org/wiki/%CE%93%CE%BB%E1%BF%B6%CF%83%CF%83%CE%B1%CE%B9
> > >>
> > >> It looks promising.
> > >>
> > >> I know there is a way to download the content of a MediaWiki site as
> > >> XML, but I have no experience doing so.
> > >>
> > >> Anyone want to take a shot at producing a SWORD Hesychius Lexicon (or
> > >> even just a text file from this link)?
> > >>
> > >> Thanks for everyone's input and help.
> > >>
> > >> -Troy.
> > >>
> > >> Peter von Kaehne wrote:
> > >>> I spoke yesterday to both Prof. Hansen and Prof. Ian Cunningham (who
> > >>> is a collaborator of Hansen's):
> > >>>
> > >>> http://www.csad.ox.ac.uk/CSAD/Hesychius/Hansen.html
> > >>>
> > >>> Prof. Hansen mentioned the TLG, and Prof. Cunningham confirmed this
> > >>> and said further that there is no electronic version of Hansen's work
> > >>> available. I understand that Hansen's work is published in
> > >>> de Gruyter's Sammlung Griechischer and Lateinischer Altertuemer:
> > >>>
> > >>> http://www.degruyter.com/rs/174_AT_E_ED_ENU_h.cfm?rc=19992&id=SER-M1-WDG-HESYCH-B-19992&fg=AT
> > >>>
> > >>> A copy of it is for sale here:
> > >>>
> > >>> http://www.basis-buch.de/main-173503.html
> > >>>
> > >>> WRT the TLG: I read the licence in detail and, bluntly said, they have
> > >>> no leg to stand on in denying us use of the texts.
> > >>>
> > >>> On the basis of the licence they have already allowed us to do what we
> > >>> want to do, even if they now get cold feet on direct questioning. That
> > >>> said, at least Schmidt's edition is now public domain anyway, and
> > >>> unless there are DMCA restrictions everyone can copy it out of there.
> > >>> And outside of DMCA-like legislation, only its public-domain status
> > >>> would apply anyway. But IANAL, etc.
> > >>>
> > >>> WRT Latte/Hansen: I am not sure how far Latte's work would constitute
> > >>> an original work in its own right (I presume it does), but again the
> > >>> TLG licence does allow text extraction for scholarly work that is
> > >>> non-commercial.
> > >>>
> > >>> Peter
> > >>>
> > >>> -------- Original Message --------
> > >>> Date: Fri, 03 Nov 2006 17:23:03 -0700
> > >>> From: "Troy A. Griffitts" <[EMAIL PROTECTED]>
> > >>> To: SWORD Developers' Collaboration Forum <[email protected]>
> > >>> Subject: Re: [sword-devel] Hesychius
> > >>>
> > >>>> Peter,
> > >>>> Thank you for your time and info. We have an ongoing dialog with
> > >>>> UCI regarding the use of the data from TLG. They have denied our
> > >>>> request twice, but I am hoping a detailed third plea might elicit
> > >>>> sympathy.
> > >>>>
> > >>>> -Troy.
> > >>>>
> > >>>> Peter von Kaehne wrote:
> > >>>>> The TLG also has the older edition by Schmidt, though, which should
> > >>>>> by now be in the public domain, as it dates from 1861.
> > >>>>> Peter
> > >>>>>
> > >>>>> -------- Original Message --------
> > >>>>> Date: Fri, 03 Nov 2006 15:59:02 +0100
> > >>>>> From: "Peter von Kaehne" <[EMAIL PROTECTED]>
> > >>>>> To: SWORD Developers' Collaboration Forum <[email protected]>
> > >>>>> Subject: Re: [sword-devel] Hesychius
> > >>>>>
> > >>>>>> The TLG does indeed contain parts of Hesychius, but only Latte's
> > >>>>>> work.
> > >>>>>>
> > >>>>>> Hansen's work is published in Germany, on paper only. Electronic
> > >>>>>> copies are not available.
> > >>>>>>
> > >>>>>> The TLG licence for the text is such that it might be possible to
> > >>>>>> integrate the work: i.e. commercial scholarly tools making use of
> > >>>>>> the whole text are forbidden, but CrossWire might be possible.
> > >>>>>>
> > >>>>>> HTH
> > >>>>>>
> > >>>>>> Peter
> > >>>>>>
> > >>>>>> -------- Original Message --------
> > >>>>>> Date: Thu, 02 Nov 2006 16:38:36 -0700
> > >>>>>> From: "Troy A. Griffitts" <[EMAIL PROTECTED]>
> > >>>>>> To: [email protected]
> > >>>>>> Subject: [sword-devel] Hesychius
> > >>>>>>
> > >>>>>>> If anyone has the time to research where we can find an electronic
> > >>>>>>> copy of Hesychius' Greek Lexicon, your efforts would be extremely
> > >>>>>>> valuable to me right now. I believe the TLG has a copy of it, but I
> > >>>>>>> currently don't have easy access to the TLG. Thanks in advance.
> > >>>>>>>
> > >>>>>>> -Troy.
> > >>>>>>>
> > >>>>>>> _______________________________________________
> > >>>>>>> sword-devel mailing list: [email protected]
> > >>>>>>> http://www.crosswire.org/mailman/listinfo/sword-devel
> > >>>>>>> Instructions to unsubscribe/change your settings at above page
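For anyone picking up the export work: the default-namespace trap Greg describes (a query for a bare `page` element silently matching nothing) and his filtering down to the page, title and text fields can be sketched in a few lines of Python. This is a minimal sketch, not the script behind the posted file. It assumes the export layout `mediawiki/page/title` and `mediawiki/page/revision/text`, reads the namespace URI from the root element instead of hard-coding an export version, and the `SAMPLE` document and both function names are made up for illustration. ElementTree decodes entities while parsing, so the `&lt;`/`&gt;` in the wiki text come back as literal `<` and `>`. In XSLT 1.0 the equivalent fix is to bind the export namespace to a prefix in the stylesheet and use that prefix in every match and select expression.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down export. The real file declares a versioned
# MediaWiki export namespace as the default namespace on <mediawiki>.
SAMPLE = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.3/">
  <page>
    <title>Alpha</title>
    <id>1</id>
    <revision>
      <id>10</id>
      <timestamp>2006-11-09T00:00:00Z</timestamp>
      <text xml:space="preserve">&lt;b&gt;aaa&lt;/b&gt; gloss</text>
    </revision>
  </page>
</mediawiki>"""


def namespace_prefix(tag: str) -> str:
    """Return the '{uri}' part of an ElementTree tag, or '' if it has none."""
    return tag[: tag.index("}") + 1] if tag.startswith("{") else ""


def extract_pages(xml_text: str) -> list[tuple[str, str]]:
    """Keep only (title, text) for each <page>, dropping revision metadata."""
    root = ET.fromstring(xml_text)
    ns = namespace_prefix(root.tag)  # e.g. "{http://www.mediawiki.org/...}"
    pages = []
    for page in root.iter(f"{ns}page"):
        title = page.findtext(f"{ns}title", default="")
        # If the export holds several revisions, the last <text> wins.
        text = ""
        for node in page.iter(f"{ns}text"):
            text = node.text or ""  # entities are already decoded here
        pages.append((title, text))
    return pages


if __name__ == "__main__":
    # A bare "page" matches nothing: every element is in the default namespace.
    print(len(ET.fromstring(SAMPLE).findall("page")))  # 0
    for title, text in extract_pages(SAMPLE):
        print(title, "->", text)  # Alpha -> <b>aaa</b> gloss
```

For the real 4.2 MB file, `ET.parse(path).getroot()` would take the place of `ET.fromstring`.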
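Greg's open question, how to tell a `<` or `>` that belongs to HTML markup from one that belongs to the lexicon's own text, has no general answer. One possible heuristic is a whitelist: treat an angle-bracketed run as markup only when its name is a known HTML tag, and re-escape everything else so it survives as character data in the XML/OSIS output. The sketch below assumes exactly that rule. The `HTML_TAGS` set and both function names are hypothetical, and any entry that brackets an ASCII word which happens to be an HTML tag name would be misclassified, so the result would still need proofreading.

```python
import html
import re

# Hypothetical whitelist: tag names treated as genuine HTML markup.
# Anything else in angle brackets is assumed to be lexicon data
# (e.g. a bracketed Greek keyword) and is escaped instead.
HTML_TAGS = {"b", "i", "u", "em", "strong", "br", "p", "span", "div",
             "sup", "sub", "small", "big"}

# "<", optional "/", a tag name, then anything up to ">" (no nested brackets).
TAG_RE = re.compile(r"</?([^<>\s/]+)[^<>]*>")


def split_markup(text: str) -> list[tuple[str, str]]:
    """Split text into ("markup", tag) and ("data", chunk) pieces."""
    pieces = []
    pos = 0
    for m in TAG_RE.finditer(text):
        if m.group(1).lower() not in HTML_TAGS:
            continue  # not whitelisted: leave it inside the surrounding data
        if m.start() > pos:
            pieces.append(("data", text[pos:m.start()]))
        pieces.append(("markup", m.group(0)))
        pos = m.end()
    if pos < len(text):
        pieces.append(("data", text[pos:]))
    return pieces


def protect_data(text: str) -> str:
    """Keep whitelisted tags; escape <, > and & everywhere else."""
    return "".join(
        chunk if kind == "markup" else html.escape(chunk, quote=False)
        for kind, chunk in split_markup(text)
    )


if __name__ == "__main__":
    print(protect_data("<b>word</b> <kw>gloss<br/>"))
    # <b>word</b> &lt;kw&gt;gloss<br/>
```

A whitelist errs toward treating unknown brackets as data, which keeps the output well-formed; the opposite choice (blacklisting known data patterns) would need knowledge of the edition's conventions that this thread does not provide.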
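On Troy's request for a unique `<seg type="x-synonym">`: finding synonyms in the glosses is the hard part, and nothing in this thread says how to detect them. Assuming a list of synonyms for an entry has already been identified by some earlier manual or heuristic step, the markup itself is mechanical. The hypothetical helper below escapes the rest of the gloss, matches whole words only, and tries longer synonyms first so that a multi-word synonym is not split by a shorter one. It emits only the `seg` elements Troy mentions, not a complete entry.

```python
import html
import re


def mark_synonyms(gloss: str, synonyms: list[str]) -> str:
    """Escape gloss and wrap each whole-word synonym in <seg type="x-synonym">.

    `synonyms` is assumed to come from an earlier identification step;
    this helper only produces the markup.
    """
    if not synonyms:
        return html.escape(gloss, quote=False)
    # Longest alternatives first, so "beta gamma" is preferred over "beta".
    alternatives = sorted(synonyms, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, alternatives)) + r")\b")
    out = []
    pos = 0
    for m in pattern.finditer(gloss):
        out.append(html.escape(gloss[pos:m.start()], quote=False))
        out.append('<seg type="x-synonym">'
                   + html.escape(m.group(0), quote=False) + "</seg>")
        pos = m.end()
    out.append(html.escape(gloss[pos:], quote=False))
    return "".join(out)


if __name__ == "__main__":
    print(mark_synonyms("alpha, beta & gamma", ["beta", "gamma"]))
    # alpha, <seg type="x-synonym">beta</seg> &amp; <seg type="x-synonym">gamma</seg>
```

Python 3's `\b` is Unicode-aware for `str` patterns, so the whole-word check also works on Greek headwords.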
