"It is often helpful to classify small parts of large articles and then
somehow deal with these multiple classifications at the full document
level."

The way I understand it is:

For an article, classify its paragraphs (for example) and then use these
first-level classification results as features to classify the complete
document.
Am I correct?

If yes, then training sets would be needed at both the paragraph level and
the document level, which I think would not be that easy to get.
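
Something like the following is what I have in mind (just a rough sketch,
assuming the Mahout OnlineLogisticRegression SGD learner; the encodeText()
helper, the class/feature counts and the score-averaging step are my own
assumptions, not anything from the docs):

import java.util.List;
import org.apache.mahout.classifier.sgd.L1;
import org.apache.mahout.classifier.sgd.OnlineLogisticRegression;
import org.apache.mahout.math.RandomAccessSparseVector;
import org.apache.mahout.math.Vector;

public class TwoLevelSketch {
  static final int NUM_CLASSES = 3;      // paragraph-level categories (assumption)
  static final int NUM_FEATURES = 10000; // size of the hashed feature space (assumption)

  // First-level model: classifies individual paragraphs.
  OnlineLogisticRegression paragraphModel =
      new OnlineLogisticRegression(NUM_CLASSES, NUM_FEATURES, new L1());

  // Second-level model: classifies the whole document, using the averaged
  // paragraph score vector as its feature vector.
  OnlineLogisticRegression documentModel =
      new OnlineLogisticRegression(NUM_CLASSES, NUM_CLASSES, new L1());

  // Turn the paragraph-level scores into document-level features.
  Vector documentFeatures(List<String> paragraphs) {
    Vector sum = new RandomAccessSparseVector(NUM_CLASSES);
    for (String paragraph : paragraphs) {
      // classifyFull() returns one score per category for this paragraph
      sum = sum.plus(paragraphModel.classifyFull(encodeText(paragraph)));
    }
    return sum.divide(paragraphs.size()); // average over paragraphs
  }

  // Hypothetical helper: hash the tokens of a text into a sparse feature vector.
  Vector encodeText(String text) {
    Vector v = new RandomAccessSparseVector(NUM_FEATURES);
    // ... token hashing / TF weighting omitted ...
    return v;
  }
}

So documentModel.train(label, documentFeatures(paragraphs)) would be the
second-level training step, which is why I think labels would be needed at
both levels.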

I think the question is more about whether to train on small pieces of the
documents or to train on a single document that is the aggregation of all
the training documents for a particular class.
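
On that original question (one large vector per article vs. many small
ones), as far as I can tell the only mechanical difference for an SGD
learner is how many train() calls it sees. A rough sketch, reusing the
hypothetical TwoLevelSketch class and encodeText() helper from above;
letting each paragraph simply inherit its document's label is just one
possible shortcut, not something from the docs:

// One update per document: a single aggregated vector per article.
void trainPerDocument(OnlineLogisticRegression learner,
                      List<String> documents, List<Integer> labels) {
  for (int i = 0; i < documents.size(); i++) {
    learner.train(labels.get(i), encodeText(documents.get(i)));
  }
}

// Many smaller updates: one vector per paragraph, each paragraph
// inheriting the label of the document it came from.
void trainPerParagraph(OnlineLogisticRegression learner,
                       List<List<String>> documents, List<Integer> labels) {
  for (int i = 0; i < documents.size(); i++) {
    for (String paragraph : documents.get(i)) {
      learner.train(labels.get(i), encodeText(paragraph));
    }
  }
}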

Please correct me if I am wrong.

On Thu, Nov 1, 2012 at 9:39 PM, Ted Dunning <[email protected]> wrote:

> Your mileage will vary.
>
> It is often helpful to classify small parts of large articles and then
> somehow deal with these multiple classifications at the full document
> level.
>
> Sometimes it is not helpful, especially if the small parts get too small.
>
> Try it both ways.  My tendency is to prefer to classify book-sized things
> at a level smaller than a chapter and sometimes as small as a paragraph.
>  Going below the paragraph level is usually bad.
>
> On Thu, Nov 1, 2012 at 3:23 AM, dennis zhuang <[email protected]>
> wrote:
>
> > Hi, all
> >
> >    I am using the SGD classifier for our article classification. I want
> > to train a new model, but I have a question. I can provide the learner
> > with one large article or with several small articles, but I extract only
> > one vector per article. Is there any difference for the learner between
> > training on one vector and on many vectors? Should I provide the learner
> > with one large article or with many small articles? I couldn't find any
> > documentation about this. Can anybody help me? Thanks.
> >
> > --
> > 庄晓丹
> > Email:        [email protected] [email protected]
> > Site:           http://fnil.net
> > Twitter:      @killme2008
> >
>
