Some discussion came up on the developers' list about running
SenseClusters when there is no target word (head tag) in the
data. In short, SenseClusters allows all 4 combinations of
training and test data with or without head tags. The
following explains the "tricks" to do so! We will try to clarify
this in the documentation in upcoming releases...

Regards,
Amruta

> we had to remove it! [so, windower was the only program in old
> senseclusters that would *check* for the target word]

To be more specific and clear, windower still checks for the
target word in every instance, as otherwise it wouldn't be able
to give us the window around the word. It's discriminate.pl that
handles the "no-target" case. So, if you want to run
discriminate.pl on data where the instances DO have the
target word, you need to either use the --target option or copy
the target.regex file into the current dir (where discriminate.pl
is being run). If you need to run discriminate.pl on data
where the instances DON'T have the target word, just don't
use the --target option and make sure that target.regex isn't
there in the same dir. In this case, discriminate.pl makes sure
NOT to run windower.pl on such data and, I *think*, warns you
if you are using the --scope options (which is when it calls the windower)!

Actually, SenseClusters allows the following possibilities:

1. both train and test have target/head words!

2. plain train (like Giga Word) with target-specific test!

3. both train and test do not have target words!

4. not sure why one would do this, but plain test with target-specific
train!

Note that by "plain" I mean data w/o target/head words!
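
To make the four cases concrete, here is a rough Python sketch (not part
of SenseClusters) that labels a train/test pair by checking for head tags.
It assumes head tags look like <head>word</head>; the names has_head_tags
and classify_pair are made up for illustration only.

```python
import re

# Assumption: a head tag wraps the target word as <head>...</head>.
HEAD_TAG = re.compile(r"<head>.*?</head>", re.DOTALL)


def has_head_tags(text: str) -> bool:
    """Return True if the text contains at least one head-tagged word."""
    return HEAD_TAG.search(text) is not None


def classify_pair(train_text: str, test_text: str) -> int:
    """Map a train/test pair to case 1-4 from the list above."""
    train_tagged = has_head_tags(train_text)
    test_tagged = has_head_tags(test_text)
    if train_tagged and test_tagged:
        return 1  # both train and test have target/head words
    if not train_tagged and test_tagged:
        return 2  # plain train with target-specific test
    if not train_tagged and not test_tagged:
        return 3  # both plain
    return 4      # target-specific train with plain test
```

For example, classify_pair("some plain text", "a <head>line</head> here")
falls into case 2.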

A trickier part of discriminate.pl is that, if you do have
data with target words and forget to use --target or to copy target.regex
into the same dir, discriminate.pl calls the maketarget.pl program, which
automatically creates a target regex! If the instances do have head
tags, maketarget.pl succeeds and gives the regex for the possible
target word forms. If the data doesn't include any head tags,
maketarget.pl doesn't create a target.regex file, and so
discriminate.pl knows that the data doesn't have any head tags!
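
The decision flow described above can be summed up in a small Python
sketch. This is only a model of the behaviour as explained in this email,
not the actual discriminate.pl code; the function name resolve_target and
its arguments are hypothetical, and the order in which --target and
target.regex are checked is an assumption.

```python
from typing import Optional, Tuple


def resolve_target(target_option_given: bool,
                   regex_in_current_dir: bool,
                   data_has_head_tags: bool) -> Tuple[Optional[str], bool]:
    """Return (where the target regex comes from, whether windower may run).

    A source of None means the data is treated as having no target word.
    """
    if target_option_given:
        # The user passed --target explicitly.
        return ("--target", True)
    if regex_in_current_dir:
        # A target.regex file was copied into the current dir.
        return ("target.regex", True)
    if data_has_head_tags:
        # Fallback: maketarget.pl builds the regex from the head tags.
        return ("maketarget.pl", True)
    # No regex could be found or created: the data is "plain",
    # so windower.pl is not run on it.
    return (None, False)
```

So resolve_target(False, False, False) gives (None, False), which is the
"no-target" case where the windower is skipped.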

Hope this makes sense! All this is probably not well documented
currently, so I thought I would record these details at least in
email...

Please let me know if this leaves anything unclear!

Thanks,
Amruta

_______________________________________________
senseclusters-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/senseclusters-users