Hi Stefano,

This sounds like an interesting project, and it's good to know SenseClusters is proving to be useful. See my responses inline...

On Wed, Oct 22, 2014 at 5:58 AM, Stefano Silvestri <[email protected]> wrote:
> I've used a clustering technique to discover, in an unsupervised way,
> relations between medical entities contained in a large collection of
> anonymized medical records, in a research project at the University of Naples.
> The data set is composed of a large set of features - all the results will
> be published shortly in a journal.
>
> The next step in the development of our system is performing unsupervised
> cluster (relation) labeling. To do that, I plan to try the clusterlabeling
> module from SenseClusters. To create the input to clusterlabeling I have
> to use the format_clusters module with the --context option, and now I have
> some problems.
>
> I have already produced a cluto-style cluster solution file (no problem with
> that) from my system.
>
> The rlabel file, if I'm right, is a file containing the explicit
> corresponding name of each entity in the cluster (in my case the relation).
> Is that right?

Yes, rlabel shows the cluster to which each instance has been assigned.

>
> And now the problems with the context file...
> It should be in senseval2 format. My experimental assessment is made up of
> plain text files - so I should use the plain text to headless senseval2
> utility.
>
> I have some questions.
>
> 1) Does the context file have to put together all my input files (the
> medical records) in one large file (and must each context correspond to a
> medical record)?

Yes, the input for each run of SenseClusters should be a single file with all your contexts included.

>
> 2) Should the contexts be headless, or do I have to tag (<head></head>) all
> the entities (medical names) in the input?

Your contexts can be headless, and so there is no need to include <head> tags in your contexts.

>
> 3) Are there other constraints on the context files (formatting, tags, or
> other)?
>

There shouldn't be. The output from text2sval.pl should be acceptable for input "as is".

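The single-file requirement from the answer to question 1 can be sketched as follows. This is a minimal Python sketch and not part of SenseClusters: the function names `records_to_contexts` and `write_context_file` are hypothetical, and it assumes that text2sval.pl expects a plain text file with one context per line, which should be checked against the text2sval.pl documentation. It also assumes each medical record is stored in its own plain text file.

```python
import re
from pathlib import Path


def records_to_contexts(records):
    """Collapse the whitespace of each record so it fits on exactly one line.

    Every record is kept, in order, so that line i of the output still
    corresponds to record i (an empty record would break that mapping,
    so it is rejected rather than silently skipped).
    """
    contexts = []
    for index, text in enumerate(records):
        one_line = re.sub(r"\s+", " ", text).strip()
        if not one_line:
            raise ValueError(f"record {index} is empty")
        contexts.append(one_line)
    return contexts


def write_context_file(record_paths, out_path, encoding="utf-8"):
    """Merge several plain text record files into one file, one context per line.

    Returns the number of contexts written.
    """
    records = [Path(p).read_text(encoding=encoding) for p in record_paths]
    contexts = records_to_contexts(records)
    Path(out_path).write_text("\n".join(contexts) + "\n", encoding=encoding)
    return len(contexts)
```

The resulting file would then be the single input handed to text2sval.pl. The order of the record files matters here, since presumably each context has to line up with the corresponding row of the cluster solution file.
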
> If the experiments succeed, of course, I'll credit and cite the
> SenseClusters project.
>
> PS - my system works on the Italian language.

That's great! We'd be happy to answer further questions as they arise, and will be curious to know how things work out!

Good luck,
Ted

>
> Thanks for your response,
> Stefano Silvestri,
> NLP researcher at University of Naples "Federico II"

--
Ted Pedersen
http://www.d.umn.edu/~tpederse

_______________________________________________
senseclusters-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/senseclusters-users
