Hi Mara, since you were asking about ontologies, let me point you to our work on computational fact checking from knowledge networks, published in PLoS ONE <http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0128193>. We developed a measure of semantic similarity based on shortest paths between any two Wikipedia concepts, using the linked data from DBpedia. These are the links found in the infoboxes of Wikipedia articles, so they are a subset of the hyperlinks on the full article page.
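To make the shortest-path idea concrete, here is a minimal Python sketch. It is not the measure from the paper, only an illustration: it assumes an undirected, unweighted graph and scores two concepts as 1 / (1 + d), where d is the hop count of the shortest path between them. The toy graph and the names shortest_path_length and path_similarity are made up for this example.

```python
from collections import deque


def shortest_path_length(graph, source, target):
    """Return the hop count of the shortest path, or None if unreachable."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    # Plain breadth-first search: the first time we reach the target,
    # we have found a shortest path in an unweighted graph.
    while queue:
        node, dist = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor == target:
                return dist + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None


def path_similarity(graph, source, target):
    """Illustrative score in [0, 1]: 1 / (1 + hops), 0.0 if disconnected."""
    dist = shortest_path_length(graph, source, target)
    return 0.0 if dist is None else 1.0 / (1.0 + dist)


# Hypothetical toy graph (adjacency lists), not real DBpedia data.
toy_graph = {
    "Barack Obama": ["Honolulu", "United States"],
    "Honolulu": ["Barack Obama", "Hawaii"],
    "Hawaii": ["Honolulu", "United States"],
    "United States": ["Barack Obama", "Hawaii"],
    "Isolated": [],
}

if __name__ == "__main__":
    print(path_similarity(toy_graph, "Barack Obama", "Honolulu"))
    print(path_similarity(toy_graph, "Barack Obama", "Hawaii"))
    print(path_similarity(toy_graph, "Barack Obama", "Isolated"))
```

A score like this could be attached to each link and thresholded, but on the full Wikipedia graph you would want a graph library rather than plain dictionaries.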
In the article we use it to check simple relational statements, but it could serve other purposes too. The paper also cites a couple of other approaches from the literature that could be relevant to what you are doing.

HTH!

Giovanni

Giovanni Luca Ciampaglia <http://glciampaglia.com> ∙ Assistant Research Scientist, Indiana University

On Sun, Feb 19, 2017 at 2:56 PM, Mara Sorella <[email protected]> wrote:

> Hi everybody,
>
> I'm new to the list and have been referred here by a comment from a SO user as per my question [1], that I'm quoting next:
>
> *I have been successfully able to use the Wikipedia pagelinks SQL dump to obtain hyperlinks between Wikipedia pages for a specific revision time. However, there are cases where multiple instances of such links exist, e.g. the very same https://en.wikipedia.org/wiki/Wikipedia page and https://en.wikipedia.org/wiki/Wikimedia_Foundation. I'm interested to find number of links between pairs of pages for a specific revision. Ideal solutions would involve dump files other than pagelinks (which I'm not aware of), or using the MediaWiki API.*
>
> To elaborate, I need this information to weight (almost) every hyperlink between article pages (that is, in NS0), that was present in a specific wikipedia revision (end of 2015), therefore, I would prefer not to follow the solution suggested by the SO user, that would be rather impractical.
>
> Indeed, my final aim is to use this weight in a thresholding fashion to sparsify the wikipedia graph (that due to the short diameter is more or less a giant connected component), in a way that should reflect the "relatedness" of the linked pages (where relatedness is not intended as strictly semantic, but at a higher "concept" level, if I may say so).
>
> For this reason, other suggestions on how determine such weights (possibly using other data sources -- ontologies?) are more than welcome.
>
> The graph will be used as dataset to test an event tracking algorithm I am doing research on.
>
> Thanks,
>
> Mara
>
> [1] http://stackoverflow.com/questions/42277773/number-of-links-between-two-wikipedia-pages/
>
> _______________________________________________
> Wiki-research-l mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
