I did exactly what you suggested: I tried a subset of these documents and
found that some junk non-.txt files were causing the issue.
Everything worked fine with DirSource() once I deleted them from the directory.
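A quick way to spot such junk files before building the corpus is to list everything in the directory whose name does not end in .txt. A minimal base-R sketch, assuming the wanted files all carry a .txt extension (find_non_txt is a hypothetical helper name):

```r
# Hypothetical helper: list the regular files in `dir` whose names do not
# end in ".txt" (case-insensitive), i.e. candidates for junk files.
find_non_txt <- function(dir) {
  files <- list.files(dir, full.names = TRUE)
  files <- files[!dir.exists(files)]  # drop subdirectories
  files[!grepl("\\.txt$", files, ignore.case = TRUE)]
}

# Example (path taken from the original post):
# find_non_txt("Z:\\projectk_viz\\docs_to_index")
```

tm's DirSource() also takes a `pattern` argument (a regular expression used when listing the directory), so something like DirSource(sourceDir, pattern = "\\.txt$") should skip such files without deleting them; check ?DirSource for the tm version you have.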
But I feel these functions should also report which file they fail on.
I have ended up debugging with subsets of the input one too many times.
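Until the functions report this themselves, one workaround is to read each file on its own inside tryCatch() and print the name of any file that triggers an error. The sketch below is only an illustration: check_files is a hypothetical helper, and readLines is a stand-in reader (pass whatever per-file reading function you suspect). Note that readLines() will read many binary files without raising an error, so the default reader will not necessarily catch junk files.

```r
# Hypothetical helper: apply `reader` to each file separately and report,
# by name, every file on which it signals an error.
check_files <- function(files, reader = readLines) {
  failed <- character(0)
  for (f in files) {
    ok <- tryCatch({
      reader(f)
      TRUE
    }, error = function(e) {
      message("Failed on ", f, ": ", conditionMessage(e))
      FALSE
    })
    if (!ok) failed <- c(failed, f)
  }
  failed  # names of the files that errored
}

# Example:
# check_files(list.files(sourceDir, full.names = TRUE))
```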
On Aug 18, 2013 9:01 AM, "Milan Bouchet-Valat" <nalimi...@club.fr> wrote:

> On Saturday, 17 August 2013 at 11:16 -0700, Ajinkya Kale wrote:
> > It contains only text files, which were converted from doc, docx, ppt,
> > etc. using LibreOffice.
> > Some of them are non-English text documents.
> >
> >
> > Sorry, I cannot share the corpus, but if someone can shed light on
> > what might cause this error, I can try to eliminate those
> > documents if specific ones are causing it.
> I think you should go the other way round: try with only one document
> and see if it works, and make enough attempts to find out in which cases it
> works and in which cases it fails. If it always fails, try with the examples
> provided by tm, and then with parts of your documents.
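This narrowing-down can be automated by bisection: test the whole file list, and if it fails, split it in half and recurse into each half that fails. A sketch, assuming a hypothetical predicate works(files) that returns TRUE when processing those files succeeds (for example, by copying them to a temporary directory and running the same Corpus(DirSource(...)) call inside tryCatch()), and assuming a set fails exactly when it contains at least one bad file:

```r
# Bisection sketch: return every single file that makes `works` fail.
# `works` is a hypothetical predicate: TRUE if processing `files` succeeds.
find_bad <- function(files, works) {
  if (length(files) == 0 || works(files)) return(character(0))
  if (length(files) == 1) return(files)
  half <- length(files) %/% 2
  c(find_bad(files[seq_len(half)], works),    # first half
    find_bad(files[-seq_len(half)], works))   # second half
}

# Toy example: pretend files f3 and f7 are the bad ones.
bad <- c("f3", "f7")
find_bad(paste0("f", 1:10), function(fs) !any(fs %in% bad))
# returns c("f3", "f7")
```

With a single bad file among n, this takes roughly 2*log2(n) + 1 calls to works() instead of n separate tries.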
>
> I don't think it makes sense to try to use VectorSource() as it would
> imply reimplementing DirSource().
>
>
> Regards
>
> > On Sat, Aug 17, 2013 at 9:55 AM, Milan Bouchet-Valat
> > <nalimi...@club.fr> wrote:
> >         On Friday, 16 August 2013 at 19:35 -0700, Ajinkya Kale wrote:
> >         > I am trying to use the text mining package (tm), and I keep getting this error:
> >         >
> >         > rm(list = ls())
> >         > library(tm)
> >         > sourceDir <- "Z:\\projectk_viz\\docs_to_index"
> >         > ovid <- Corpus(DirSource(sourceDir), readerControl = list(language = "lat"))
> >         >
> >         > Error in if (vectorized && (length <= 0)) stop("vectorized sources must have positive length") :
> >         >   missing value where TRUE/FALSE needed
> >         >
> >         > I am not sure what it means.
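The message itself comes from R's `if`: its condition must be a single TRUE or FALSE, and `vectorized && (length <= 0)` evaluates to NA when `vectorized` is TRUE and `length` is NA. A stand-alone reproduction (not tm's code; check_length is a hypothetical function that only mirrors the condition shown in the error):

```r
# Not tm's code: mirrors the condition from the error message.
check_length <- function(vectorized, length_value) {
  if (vectorized && (length_value <= 0))
    stop("vectorized sources must have positive length")
  invisible(TRUE)
}

# A missing (NA) length makes the whole condition NA, and `if` refuses NA.
msg <- tryCatch(check_length(TRUE, NA_integer_),
                error = function(e) conditionMessage(e))
print(msg)
```

So the error points to the source ending up with a missing length rather than a zero one; in this thread, the cause turned out to be junk non-.txt files in the directory.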
> >
> >         The posting guide asks for a reproducible example. If you cannot make
> >         the contents of sourceDir available to us, you should at least tell us
> >         what kind of files it contains. Have you tried with only some of the
> >         files the directory contains?
> >
> >
> >         Regards
> >
> >         > --ajinkya
> >         >
> >         > ______________________________________________
> >         > R-help@r-project.org mailing list
> >         > https://stat.ethz.ch/mailman/listinfo/r-help
> >         > PLEASE do read the posting guide
> >         http://www.R-project.org/posting-guide.html
> >         > and provide commented, minimal, self-contained, reproducible
> >         code.
> >
> >
> >
> >
> >
> > --
> >
> > Sincerely,
> > Ajinkya
> > http://ajinkya.info
> >
>
>
