"It is often helpful to classify small parts of large articles and then somehow deal with these multiple classifications at the full document level."
The way I understand it: for an article, classify the paragraphs (for
example) and then use these first-level classification results as features
to classify the complete document. Am I correct? If so, a training set
would also be needed at both the paragraph level and the document level,
which I think would not be easy to obtain. I think the original question is
more about whether to train on small pieces of documents, or to train on a
single document that is the aggregation of all the training documents for a
particular class. Please correct me if I am wrong.

On Thu, Nov 1, 2012 at 9:39 PM, Ted Dunning <[email protected]> wrote:

> Your mileage will vary.
>
> It is often helpful to classify small parts of large articles and then
> somehow deal with these multiple classifications at the full document
> level.
>
> Sometimes it is not helpful, especially if the small parts get too small.
>
> Try it both ways. My tendency is to prefer to classify book-sized things
> at a level smaller than a chapter and sometimes as small as a paragraph.
> Going below the paragraph level is usually bad.
>
> On Thu, Nov 1, 2012 at 3:23 AM, dennis zhuang <[email protected]>
> wrote:
>
> > Hi all,
> >
> > I am using the SGD classifier for our article classification. I want
> > to train a new model, but there is a problem. I can provide the
> > learner one large article or several small articles, but I extract
> > only one vector per article. Is there any difference for the learner
> > between one vector and many vectors when training? Should I provide
> > the learner one large article or many small articles? I can't find
> > any documents about this; can anybody help me? Thanks.
> >
> > --
> > 庄晓丹
> > Email: [email protected] [email protected]
> > Site: http://fnil.net
> > Twitter: @killme2008
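To make the two-stage idea concrete (classify paragraphs first, then combine the paragraph-level results at the document level), here is a minimal sketch in Python. The keyword-counting "classifier" and the majority-vote aggregation are stand-ins of my own, not anything from Mahout; in practice the paragraph-level model would be a trained classifier (e.g. Mahout's SGD logistic regression), and its per-paragraph labels or scores could instead be fed as features into a second, document-level model.

```python
from collections import Counter

def classify_paragraph(paragraph):
    # Hypothetical paragraph-level classifier: a stand-in for a real
    # trained model. It simply counts keyword hits per class and
    # returns the class with the most hits.
    keywords = {
        "sports": {"game", "team", "score"},
        "tech": {"software", "code", "classifier"},
    }
    words = set(paragraph.lower().split())
    scores = {label: len(words & kws) for label, kws in keywords.items()}
    return max(scores, key=scores.get)

def classify_document(document):
    # First level: split the document into paragraphs and classify each.
    # Second level: combine the paragraph labels, here by majority vote.
    paragraphs = [p for p in document.split("\n\n") if p.strip()]
    votes = Counter(classify_paragraph(p) for p in paragraphs)
    return votes.most_common(1)[0][0]
```

For example, `classify_document("the team won the game\n\nthe final score was high")` classifies each of the two paragraphs as "sports" and so returns "sports" by majority vote. Note that this simple voting scheme only needs document-level training labels (each paragraph inherits its document's label during training), which partly sidesteps the concern about needing a separate paragraph-level training set.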
