[ https://issues.apache.org/jira/browse/STANBOL-197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Olivier Grisel updated STANBOL-197:
-----------------------------------

    Description:
Implementation plan:

Use MoreLikeThis queries on a SolrYard instance in which each topic is indexed by aggregating the abstracts of all entities categorized under a given SKOS topic from DBpedia.

Such an index can be constructed using the Pig scripts available at:

https://github.com/ogrisel/pignlproc/tree/master/examples/topic-corpus

or

https://github.com/ogrisel/dbpediakit

To perform MoreLikeThis queries using the SolrJ API, do the following:

#1 - Define the mlt handler in solrconfig.xml (it was not defined in the example solrconfig.xml I was using):

    <requestHandler name="/mlt" class="solr.MoreLikeThisHandler" />

#2 - With SolrJ, access the mlt handler via something similar to the following:

    query.setQueryType("/" + MoreLikeThisParams.MLT);
    query.set(MoreLikeThisParams.MATCH_INCLUDE, false);
    query.set(MoreLikeThisParams.MIN_DOC_FREQ, 1);
    query.set(MoreLikeThisParams.MIN_TERM_FREQ, 1);
    query.set(MoreLikeThisParams.SIMILARITY_FIELDS, "subject,body");
    query.setQuery("Your query here or in my case the unique key field:value");

  was:
Implementation plan:

Use MoreLikeThis queries on a SolrYard instance in which each topic is indexed by aggregating the abstracts of all entities categorized under a given SKOS topic from DBpedia.
Such an index can be constructed using the Pig scripts available at:

https://github.com/ogrisel/pignlproc/tree/master/examples/topic-corpus

To perform MoreLikeThis queries using the SolrJ API, do the following:

#1 - Define the mlt handler in solrconfig.xml (it was not defined in the example solrconfig.xml I was using):

    <requestHandler name="/mlt" class="solr.MoreLikeThisHandler" />

#2 - With SolrJ, access the mlt handler via something similar to the following:

    query.setQueryType("/" + MoreLikeThisParams.MLT);
    query.set(MoreLikeThisParams.MATCH_INCLUDE, false);
    query.set(MoreLikeThisParams.MIN_DOC_FREQ, 1);
    query.set(MoreLikeThisParams.MIN_TERM_FREQ, 1);
    query.set(MoreLikeThisParams.SIMILARITY_FIELDS, "subject,body");
    query.setQuery("Your query here or in my case the unique key field:value");

> Enhancement Engine for Wikipedia/DBpedia-based topic classification of text content
> -----------------------------------------------------------------------------------
>
>                 Key: STANBOL-197
>                 URL: https://issues.apache.org/jira/browse/STANBOL-197
>             Project: Stanbol
>          Issue Type: New Feature
>          Components: Enhancer, Entity Hub
>            Reporter: Olivier Grisel
>            Assignee: Olivier Grisel
>              Labels: text-categorization
>
> Implementation plan:
> Use MoreLikeThis queries on a SolrYard instance in which each topic is indexed
> by aggregating the abstracts of all entities categorized under a given SKOS
> topic from DBpedia.
> Such an index can be constructed using the Pig scripts available at:
> https://github.com/ogrisel/pignlproc/tree/master/examples/topic-corpus
> or
> https://github.com/ogrisel/dbpediakit
> To perform MoreLikeThis queries using the SolrJ API, do the following:
> #1 - Define the mlt handler in solrconfig.xml (it was not defined in the
> example solrconfig.xml I was using):
>     <requestHandler name="/mlt" class="solr.MoreLikeThisHandler" />
> #2 - With SolrJ, access the mlt handler via something similar to the
> following:
>     query.setQueryType("/" + MoreLikeThisParams.MLT);
>     query.set(MoreLikeThisParams.MATCH_INCLUDE, false);
>     query.set(MoreLikeThisParams.MIN_DOC_FREQ, 1);
>     query.set(MoreLikeThisParams.MIN_TERM_FREQ, 1);
>     query.set(MoreLikeThisParams.SIMILARITY_FIELDS, "subject,body");
>     query.setQuery("Your query here or in my case the unique key field:value");
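
For illustration only, the request that the SolrJ snippet in the description sends can be approximated as a plain HTTP query string against the /mlt handler. This is a minimal sketch using only the Java standard library rather than SolrJ. It assumes the MoreLikeThisParams constants map to the Solr parameter names mlt.match.include, mlt.mindf, mlt.mintf and mlt.fl. The class name MltQuerySketch, the base URL and the query value id:topic-1 are hypothetical placeholders, not part of the issue.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

public class MltQuerySketch {

    /**
     * Builds the URL of a MoreLikeThis request against the /mlt handler,
     * using the same settings as the SolrJ snippet in the description.
     */
    public static String buildMltUrl(String baseUrl, String query) {
        // LinkedHashMap keeps the parameters in insertion order.
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", query);                    // query.setQuery(...)
        params.put("mlt.match.include", "false");  // MATCH_INCLUDE
        params.put("mlt.mindf", "1");              // MIN_DOC_FREQ
        params.put("mlt.mintf", "1");              // MIN_TERM_FREQ
        params.put("mlt.fl", "subject,body");      // SIMILARITY_FIELDS

        String queryString = params.entrySet().stream()
            .map(e -> encode(e.getKey()) + "=" + encode(e.getValue()))
            .collect(Collectors.joining("&"));
        return baseUrl + "/mlt?" + queryString;
    }

    private static String encode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hypothetical Solr base URL and unique-key query.
        System.out.println(buildMltUrl("http://localhost:8983/solr", "id:topic-1"));
    }
}
```

Printing the URL makes it easy to paste into a browser and confirm that the /mlt handler from step #1 is actually registered before wiring the query into an enhancement engine.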