Sorry, I attached a version of the tarball that had the executable in it and the list moderation caught it. Here's the cleaned version. See the detailed summary below.
--Greg

On Thu, Sep 11, 2008 at 7:52 PM, Greg Hellings <[EMAIL PROTECTED]> wrote:
> Troy,
>
> The task that I'm currently working on as research for my dissertation
> can possibly be leveraged. We are attempting to sort out image
> annotations (in an effort to learn how to automatically create them).
> As such, we are given a list of terms which annotate the contents of
> an image - but we want to know how similar the semantics of some of
> the terms are. Here is where I think parallels can be drawn:
>
> We use established semantic relatedness measurement techniques (see
> wn-similarity.sourceforge.net for some of the best tools currently
> available for that) to construct a graph connecting each term with all
> the other annotating terms, where the edge weight of the graph is the
> value of the average over all of the semantic measures that the
> WordNet Similarity measure returns (in time we will take a weighted
> average with all the values normalized between [0..1], since some
> measures only scale from [0..1/2] and others can take values up to
> 16,000 and more). We then do some strange graph partitioning tricks,
> etc -- that's someone else's domain.
>
> However, you could possibly utilize the following modification of the
> technique. For each term in the ESV, find the similarity between it
> and every term in the KJV. If they are identical, set the value to 1,
> otherwise, use the WordNet::Similarity tools to produce a value. Then
> weight the value of the link by their relative positions in the text
> (that way two occurrences of the same term can be differentiated), for
> example, divide by abs(position(ESV) - position(KJV)) or something
> similar. Then assign the value for each term based on the word that
> it most closely resembles.
>
> This is very similar to what you're already doing, but not identical.
> I have modified the esvtag.cpp to use the included similarity.py to
> get the semantic distance from a few of the metrics that
> WordNet::Similarity uses (however, it scrapes a webpage to do so - you
> will do better, if you decide to use this system, to install the local
> Perl data and run the system locally) whenever the terms are not
> identical. It continues to work for Gen 1:1, but the program pegs out
> my processor and does not appear to have any intention of completing
> Gen 1:2 -- I don't know where the fault for that lies, but it does that
> both in your original version and in this version. Obviously, the
> weighting I proposed would work best when the version being used
> maintains very similar phrase ordering and structuring to the KJV, but
> I suppose any metric we use will require human supervision anyway.
>
> As a bonus, I also have it sticking contiguous terms which are part of
> the same source -- "In the beginning" -- into the same <w> tag.
>
> --Greg
> P.S. The attached tarball will clobber any current esvtag directory
> that's a child of where you unpack it - so be careful about that.
>
> On Thu, Sep 11, 2008 at 4:02 PM, Troy A. Griffitts <[EMAIL PROTECTED]> wrote:
>> Hey guys. I have a fun and useful challenge for anyone wishing to show off
>> their prowess at problem solving and basic world domination.
>>
>>
>> We have morphological data for the KJV. Lots of work by many people went
>> into this data, to mark up each English word in the Bible text to the
>> corresponding Hebrew or Greek word in the original text.
>>
>>
>> We have many other Bibles with /similar/ wording to the KJV which are not
>> yet marked up.
>>
>>
>> Lane Dennis from Crossway (ESV publishers) is here at Tyndale House visiting
>> and we've talked in the past about helping them mark up their ESV text to the
>> original.
>>
>>
>> I have done most all of the grunt work for you!
>>
>> Attached is source for a program which attempts to insert <w> markup into
>> the ESV markup using the KJV data.
>>
>> It is HEAVILY commented, requires latest SVN of the SWORD engine INSTALLED
>> on your system, both the KJV and ESV modules INSTALLED, and has a nice
>> little method:
>>
>> void matchWords(...)
>>
>> where you're given:
>> a word list from ESV
>> a word list from KJV
>> a map from KJV word to an XMLTag "<w...>"
>>
>> and all you have to do is fill out the equivalent:
>> map from ESV word to an XMLTag.
>>
>>
>> As a sample, it currently has a really silly algorithm that actually works for
>> Gen.1.1, so you have an example of the work you need to do.
>>
>> All you have to do is add the real magic that figures out which words in the
>> ESV map to which words in the KJV (well, you get the idea).
>>
>> Have fun! And I'm sure you can see where this is going and how useful it
>> can be for future work!
>>
>>
>> -Troy.
>>
>>
>> _______________________________________________
>> sword-devel mailing list: [email protected]
>> http://www.crosswire.org/mailman/listinfo/sword-devel
>> Instructions to unsubscribe/change your settings at above page
>
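
For anyone skimming the thread, here is a rough sketch of the position-weighted matching proposed in the quoted message. It is not the code in the tarball. It is written in Python only because similarity.py is, and several things in it are assumptions: the function names, the use of a tag list parallel to the KJV word list (the real matchWords(...) signature is not shown in the thread), the placeholder tag strings, and the "1 +" in the divisor, which is there only so that words at the same position do not divide by zero. The default similarity function is a stand-in that knows nothing about WordNet; a real run would pass in a function backed by WordNet::Similarity.

```python
from typing import Callable, List, Optional


def exact_only(a: str, b: str) -> float:
    """Stand-in for a real semantic measure: non-identical words score 0."""
    return 0.0


def match_words(
    esv_words: List[str],
    kjv_words: List[str],
    kjv_tags: List[str],
    similarity: Callable[[str, str], float] = exact_only,
    min_score: float = 0.0,
) -> List[Optional[str]]:
    """For each ESV word, return the tag of its best-scoring KJV word.

    An ESV word gets None when no KJV word scores above min_score.
    kjv_tags[j] is assumed to be the tag belonging to kjv_words[j].
    """
    result: List[Optional[str]] = []
    for i, esv in enumerate(esv_words):
        best_score, best_tag = min_score, None
        for j, kjv in enumerate(kjv_words):
            # Identical terms score 1; otherwise ask the semantic measure.
            if esv.lower() == kjv.lower():
                base = 1.0
            else:
                base = similarity(esv, kjv)
            # Penalise distance between positions so repeated words
            # ("the ... the ... the") pair up with their nearest twin.
            score = base / (1 + abs(i - j))
            if score > best_score:
                best_score, best_tag = score, kjv_tags[j]
        result.append(best_tag)
    return result


# Gen 1:1 with punctuation stripped; the tags are placeholders, not real markup.
kjv = "In the beginning God created the heaven and the earth".split()
esv = "In the beginning God created the heavens and the earth".split()
tags = ["KJV#%d" % j for j in range(len(kjv))]

for word, tag in zip(esv, match_words(esv, kjv, tags)):
    print(word, tag)
```

With the stand-in measure, "heavens" is left untagged because nothing in the KJV verse is identical to it; a real similarity function should link it to "heaven". Raw word indices only make sense while the two versions keep a similar word order, so dividing each index by the verse length first may be a better fit for verses whose lengths differ a lot.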
esvtag.tar.gz
Description: GNU Zip compressed data
