[R] configure help
Hello,

as I just spent a (too long) while searching for a way to persistently switch the help display back from chm to text, here is a short note on how to do that with the Windows version of R.

The Windows installer asks which help type you want to use. I wanted to test the chm version, which I did not like. If you want to switch back after installation, you have to edit

myRdirectory\etc\Rprofile.site

and change the line

options(chmhelp=TRUE)

back to

options(chmhelp=FALSE)

It would be nice if, in a future release, this could also be changed in the GUI settings.

Best,
Fridolin

--
Fridolin Wild, Institute for Information Systems and New Media,
Vienna University of Economics and Business Administration (WUW),
Augasse 2-6, A-1090 Wien, Austria
fon +43-1-31336-4488, fax +43-1-31336-746

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
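For reference, a minimal sketch of the change described in this post. The option name chmhelp is taken from the post and applies to the R version discussed here; other versions may not honor it. Building the path with R.home("etc") is an assumption about where the installer placed the site profile.

```r
# Locate the site-wide profile of the running R installation
# (assumption: the default etc/ directory is in use).
site_profile <- file.path(R.home("etc"), "Rprofile.site")
print(site_profile)

# This is the line that goes into Rprofile.site. Running it in a
# session instead changes the setting for that session only.
options(chmhelp = FALSE)

# Check the value that is currently in effect.
getOption("chmhelp")
```

Editing the file makes the setting permanent for every session started from that installation; the options() call alone does not survive a restart.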
Re: [R] configure help
Hey Brian,

> But it has nothing to do with the GUI preferences: it applies to the
> command-line version of R as well.

I was expecting this argument ;) -- however, from a usability point of view I think it should also be possible to reset the default somewhere *within* the GUI if the GUI installer allows setting it. Maybe in a different place?

> What about ?help did you not understand? It says

That is where I found how to set it (for my current workspace). Finding out where I can set it *permanently* was more difficult (especially if you have never needed to deal with R profiles before).

That is why I decided to document it here on the mailing list, just in case others have trouble finding it, too.

Best,
Fridolin
Re: [R] FW: new to R: don't understand errors
Hello Jerad,

> It was suggested I contact you for possible help with this issue.
> Well, as you can see for the emails below, that is what I was told at
> R-help. Any insight to my lsa problems (also listed below) would be
> of great help.

From what I see, the problem probably does lie within the text files: for performance reasons, it was not possible to include any check routines that exclude a file if it contains no words (or only words below the document-frequency threshold) and thus produces an empty column vector.

I am pretty sure that you do not want to use the document-frequency threshold (minDocFreq) with a value like 50: it would mean that a term in a document is only included if it appears more than 50 times in *that* document.

I will send you the alpha release of the updated lsa package in a separate message. It also includes a parameter called minGlobFreq, which filters out terms that appear fewer than x times in the whole document collection. I guess that is what you were looking for.

Regarding the sanitizing: if you set minDocFreq to 1 and minWordLength to 1, you should not get an error with your document collection, as you are then basically taking everything (even a single character appearing only once). This is probably not very problematic, as the LSA step will group these low-frequency terms in a lower-order factor anyway. Of course you will still get an error if you use documents that are completely empty, so delete all zero-byte documents beforehand.

I am thinking about what to do with this sanitizing part. It is not a good idea to integrate it into the textmatrix method: it would slow things down tremendously. So what about this idea: does it make sense to provide a collection of sanitizing methods that help to select the files you want to work with (copying them to a different directory, or just returning a list with the filenames of the ones that are good)? What should we do with other sanitizing options (deleting URLs from texts, deleting short words, etc.)?
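The file-selection idea floated above could be sketched roughly as follows. This is an illustration only, not code from the lsa package: the helper name usableFiles, its arguments, and the simple split on non-alphanumeric characters are all assumptions.

```r
# Hypothetical helper (not part of the lsa package): return the paths of
# all files in `dir` that contain at least one word of `minWordLength`
# or more characters, so that empty documents never reach textmatrix().
usableFiles <- function(dir, minWordLength = 1) {
  files <- list.files(dir, full.names = TRUE)
  # Drop sub-directories and zero-byte files first; this needs no reading.
  info <- file.info(files)
  files <- files[!info$isdir & info$size > 0]
  keep <- vapply(files, function(f) {
    # Split the text on anything that is not a letter or a digit.
    txt <- paste(readLines(f, warn = FALSE), collapse = " ")
    words <- unlist(strsplit(tolower(txt), "[^[:alnum:]]+"))
    any(nchar(words) >= minWordLength)
  }, logical(1))
  files[keep]
}

# Small demonstration in a fresh temporary directory.
d <- tempfile("corpus")
dir.create(d)
writeLines("word1 word2 word3", file.path(d, "good.txt"))
writeLines("   ", file.path(d, "blank.txt"))   # whitespace only
file.create(file.path(d, "empty.txt"))         # zero bytes

basename(usableFiles(d))
```

Returning file names rather than copying files keeps the check cheap and leaves the original directory untouched; the price is that every candidate file is read once before the matrix is built.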

I hope I could be of help.

Best,
Fridolin
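To illustrate the difference between the per-document threshold and the collection-wide threshold discussed in this message, here is a small base-R sketch on toy data. It does not call the lsa package; the variable names minDoc and minGlob are made up for the illustration, and treating both bounds as inclusive ("at least") is an assumption.

```r
# Toy collection: each element is the word list of one document.
docs <- list(
  d1 = c("apple", "apple", "pear"),
  d2 = c("apple", "plum"),
  d3 = c("pear", "plum", "plum", "plum")
)

# Long format: one row per (document, term) pair with its count.
counts <- do.call(rbind, lapply(names(docs), function(n) {
  tab <- table(docs[[n]])
  data.frame(doc = n, term = names(tab), freq = as.vector(tab))
}))

# Per-document threshold: keep a term in a document only if it occurs
# at least `minDoc` times in that document.
minDoc <- 2
perDoc <- counts[counts$freq >= minDoc, ]

# Collection-wide threshold: keep a term everywhere if its total count
# over all documents is at least `minGlob`.
minGlob <- 3
totals <- tapply(counts$freq, counts$term, sum)
global <- counts[counts$term %in% names(totals)[totals >= minGlob], ]

perDoc
global
```

With the per-document filter, "apple" disappears from d2 although it is frequent overall; with the collection-wide filter it stays in every document where it occurs. A large per-document value therefore empties most documents quickly.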
[R] sorting during xtabs? sorting by individual order?
Hello all,

while refactoring a package (before it is released), I ran into the following problem.

I have two directories with different text files. I want to read the first one and construct a document-term matrix from it (every term, i.e. word, in a row, every file in a column, with the occurrence frequencies as values). The second directory contains different files. It needs to be read in to construct a document-term matrix as well, but with the terms in the same order, to enable similarity comparisons in a vector space of the same format.

Let's make a (fake) example:

(1) support function

  # directory 1 contains 2 files (F1, F2):
  F1 = c("word4", "word3", "word2")
  F2 = c("word1", "word4", "word2")

  # directory 2 also contains 2 files (F3, F4):
  F3 = c("word1", "word2", "bla")
  F4 = c("word1", "word2", "word3")

  # I read in the first directory, file by file, and
  # create triples of the format (file, word, 1)
  F1tab = sort(table(F1), decreasing = TRUE)
  F2tab = sort(table(F2), decreasing = TRUE)

  # and create a data frame per file; as.vector() keeps Freq a plain
  # count column, and terms must be a factor for the row order below
  F1frame = data.frame(docs = "F1", terms = names(F1tab),
                       Freq = as.vector(F1tab), row.names = NULL,
                       stringsAsFactors = TRUE)
  F2frame = data.frame(docs = "F2", terms = names(F2tab),
                       Freq = as.vector(F2tab), row.names = NULL,
                       stringsAsFactors = TRUE)

(2) textmatrix function

... to be bound together for every file and converted with xtabs into a document-term matrix:

  dummy = list(F1frame, F2frame)
  dtm = t(xtabs(Freq ~ ., data = do.call(rbind, dummy)))

which gives

         docs
  terms   F1 F2
    word2  1  1
    word3  1  0
    word4  1  1
    word1  0  1

Now, when I want to reuse this to construct another document-term matrix from the files F3 and F4, with the same terms in exactly the same order, I first need to add

  F3clean = F3[F3 %in% rownames(dtm)]
  F4clean = F4[F4 %in% rownames(dtm)]

to keep unwanted terms from getting into the tabs.

And here is my problem: I need to reformat the output document-term matrix (as produced by running step (2) again with F3clean and F4clean) so that it corresponds to the order given by rownames(dtm) of the first directory.

How can I do this cheaply (the matrices I have to deal with are usually really big)? Hopefully just by adding something to the xtabs call?

To give an example of what I need: dtm2 should look exactly like this (the document order is not important):

         docs
  terms   F3 F4
    word2  1  1
    word3  0  1
    word4  0  0
    word1  1  1

Can anybody help me?

Best,
Fridolin
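One possible approach to the question above, sketched under the assumption that the term order of the first matrix is available as a character vector (here vocab, standing in for rownames(dtm)): turn the terms of the new files into a factor whose levels are that vector. xtabs then emits one row per level in exactly that order, keeps all-zero rows, and never sees terms outside the vocabulary. The helper name toFrame is hypothetical.

```r
# Term order of the first matrix; stands in for rownames(dtm).
vocab <- c("word2", "word3", "word4", "word1")

# Files of the second directory, as in the example.
newDocs <- list(
  F3 = c("word1", "word2", "bla"),
  F4 = c("word1", "word2", "word3")
)

# Hypothetical helper: one (docs, terms, Freq) row per known term that
# actually occurs in the file.
toFrame <- function(name, words, vocab) {
  # Terms outside `vocab` (such as "bla") become NA and are not counted.
  tab <- table(factor(words, levels = vocab))
  tab <- tab[tab > 0]
  data.frame(docs  = rep(name, length(tab)),
             terms = factor(names(tab), levels = vocab),
             Freq  = as.vector(tab))
}

frames <- lapply(names(newDocs), function(n) toFrame(n, newDocs[[n]], vocab))

# xtabs keeps unused factor levels by default, so every term of `vocab`
# gets a row, in the order of the levels.
dtm2 <- t(xtabs(Freq ~ docs + terms, data = do.call(rbind, frames)))
dtm2
```

Because the order is fixed by the factor levels before tabulation, no reordering pass over the finished matrix is needed, and the explicit %in% filtering step becomes unnecessary. Whether this is fast enough for very large collections has not been measured here.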