Some discussion came up on the developers' list about running SenseClusters when there is no target word (no head tags) in the data. In short, SenseClusters allows all 4 combinations of training and test data with or without head tags, and the following explains the "tricks" for doing so! We will try to clarify this in the documentation in upcoming releases...
Regards,
Amruta

> we had to remove it! [so, windower was the only program in old
> senseclusters that would *check* for the target word]

To be more specific and clear, windower still checks for the target word in every instance, as otherwise it wouldn't be able to give us the window around the word. It's discriminate.pl that handles the "no-target" issue.

So, if you want to run discriminate.pl on data where the instances DO have the target word, you need to either use the --target option or copy the target.regex file into the current dir (where discriminate.pl is being run).

If you need to run discriminate.pl on data where the instances DON'T have the target word, just don't use the --target option, and make sure that target.regex isn't there in the same dir. In this case, discriminate.pl makes sure NOT to run windower.pl on such data, and I *think* it prompts you if you are using the --scope options (which is when it would call the windower)!

Actually, SenseClusters allows the following possibilities:

1. both train and test have target/head words!
2. plain train (like Giga Word) with target-specific test!
3. both train and test do not have target words!
4. not sure why one would do this, but plain test with target-specific train!

Note that by "plain" I mean data w/o target/head words!

A trickier part of discriminate.pl is that if you do have data with target words and forget to use --target or to copy target.regex into the same dir, discriminate.pl calls the maketarget.pl program, which automatically creates a target regex! If the instances do have head tags, maketarget.pl succeeds and gives the regex for the possible target word forms. If the data doesn't include any head tags, maketarget.pl doesn't create any target.regex file, and so discriminate.pl knows that the data doesn't have any head tags!

Hope this makes sense! All this is probably not documented well currently, so I thought I would record these details at least in the email...
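
For anyone who prefers to see the rules above as code, here is a minimal Python sketch of the decision as described in this email. It is only a model of the behaviour, not the actual discriminate.pl logic (which is Perl); the function name target_handling and its three boolean parameters are hypothetical. The email does not say what happens if a regex is supplied for data that has no head tags, so the sketch simply takes a supplied regex at face value.

```python
from typing import Optional, Tuple


def target_handling(target_option_given: bool,
                    target_regex_in_cwd: bool,
                    data_has_head_tags: bool) -> Tuple[Optional[str], bool]:
    """Return (where the target regex comes from, whether the data is
    treated as having target words). Hypothetical helper, not SenseClusters code."""
    if target_option_given or target_regex_in_cwd:
        # Regex supplied via --target or a target.regex file in the current dir.
        # Assumption: a mismatch with untagged data is not modelled here.
        return ("user-supplied", True)
    if data_has_head_tags:
        # Nothing supplied, but maketarget.pl can build a regex from the head tags.
        return ("maketarget.pl", True)
    # Nothing supplied and no head tags: no target.regex is created,
    # so the data is treated as plain and windower.pl is not run on it.
    return (None, False)


# Example: plain data (no head tags), no --target, no target.regex in the dir.
print(target_handling(False, False, False))
```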
Please let me know if this leaves anything unclear!

Thanks,
Amruta

_______________________________________________
senseclusters-users mailing list
https://lists.sourceforge.net/lists/listinfo/senseclusters-users
