> We are interested in data mining the texts of these newspapers for changes
> that would be of interest to a social psychologist. What I am particularly
> wanting advice on is the next step. We want to filter out all the invariant
> and useless junk that you see on a newspaper's website. We want the text of
> the articles and nothing else.
I think your best bet is to ask the newspapers' sales departments for the texts directly. Failing that, some newspapers offer print versions of their articles, which usually leave out most of the navigation and advertising. These may or may not be linked via URLs that follow a predictable pattern.

HTH,
georg
