[R] configure help

2006-12-19 Fridolin Wild

Hello,

I just spent (too long) a while searching for a way
to persistently switch the help display routines back
from chm to text, so here is a short note on how to
do that with the Windows version of R.

The Windows installer asks which help type
you want to use. I wanted to test the chm
version, which I didn't like.

If you want to switch back after installation,
you have to edit

 myRdirectory\etc\Rprofile.site 

and change the line

options(chmhelp=TRUE)

back to

options(chmhelp=FALSE)
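As a rough sketch of the two scopes involved: the option can be switched for the running session only, while the permanent default lives in the site profile. The snippet below locates that file for the running installation and lists any chmhelp line in it. The option name is taken from this post and applies to the Windows builds of that time; the variable names are only illustrative.

```r
# Locate the site-wide profile of the running R installation.
site_profile <- file.path(R.home("etc"), "Rprofile.site")
print(site_profile)

# Show any existing chmhelp line, if the file exists at all.
if (file.exists(site_profile)) {
  profile_lines <- readLines(site_profile, warn = FALSE)
  print(grep("chmhelp", profile_lines, value = TRUE))
}

# Session-only switch; this is lost when R is restarted.
options(chmhelp = FALSE)
print(getOption("chmhelp"))
```

Editing the line in the file itself remains a manual step; the sketch only reads the file and does not modify it.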

It would be nice if, in a future release, this
could also be changed in the GUI settings.

Best,
Fridolin

-- 
Fridolin Wild, Institute for Information Systems and New Media,
Vienna University of Economics and Business Administration (WUW),
Augasse 2-6, A-1090 Wien, Austria
fon +43-1-31336-4488, fax +43-1-31336-746

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] configure help

2006-12-19 Fridolin Wild

Hey Brian,

> But it has nothing to do with the GUI preferences: it applies to the
> command-line version of R as well.

I was expecting this argument ;) -- however, I think
that from a usability point of view it should be possible
to reset the default somewhere *within* the GUI as well,
if the GUI installer allows you to set it. Maybe
in a different place?

> What about ?help did you not understand? It says

That is where I found how to set it (for my current
workspace). Finding where I can set it *permanently*
was more difficult (especially when you have never
needed to deal with R profiles before).

That's why I decided to document it here on the
mailing list, just in case others have trouble
finding it, too.

Best,
Fridolin



Re: [R] FW: new to R: don't understand errors

2006-10-04 Fridolin Wild

Hello Jerad,

> It was suggested I contact you for possible help with this issue. Well,
> as you can see for the emails below, that is what I was told at R-help.
> Any insight to my lsa problems (also listed below) would be of great
> help.

From what I see, the problem probably does lie in the
text files: for performance reasons, it was not possible to
include any check routines that exclude a file if it contains
no words (or only words below the docFrequency) and thus produces
an empty column vector.

I am pretty sure that you do not want to use docFrequency
with a value like 50 (it would mean that a term in a document
is only included if it appears more than 50 times in *that*
document).
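To illustrate why a high per-document threshold can empty a document, here is a minimal base R sketch. It does not use the lsa package; the word list and the variable min_doc_freq are made up for illustration, and whether the boundary is inclusive (>=) or strict (>) in the package is an assumption here.

```r
# Words of one hypothetical document.
doc_words <- c("apple", "apple", "apple", "pear", "pear", "plum")

# Hypothetical per-document threshold (the post discusses a value of 50).
min_doc_freq <- 2

# Count each term within this one document and keep the frequent ones.
term_counts <- table(doc_words)
kept <- term_counts[term_counts >= min_doc_freq]
print(names(kept))

# With a threshold above every count nothing survives,
# which corresponds to an empty column vector for this document.
print(length(term_counts[term_counts >= 50]))
```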

In a separate message, I will send you the alpha release of the
updated lsa package. It includes a parameter called minGlobFreq,
which filters out terms that appear fewer than x times in the
whole document collection. I guess that is what you were
looking for.
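The kind of collection-wide filter described here might look roughly like the following in base R. This is only a sketch of the idea as worded in this message (total occurrences across all documents), not the package's implementation; the matrix and the name min_glob_freq are invented for the example.

```r
# A small hypothetical term-by-document matrix of raw counts.
tdm <- matrix(
  c(3, 0, 1,
    1, 0, 0,
    2, 2, 2),
  nrow = 3, byrow = TRUE,
  dimnames = list(terms = c("apple", "plum", "pear"),
                  docs  = c("D1", "D2", "D3"))
)

# Hypothetical collection-wide threshold.
min_glob_freq <- 2

# Total occurrences of each term over the whole collection.
glob_freq <- rowSums(tdm)

# Keep only the terms that reach the threshold.
tdm_filtered <- tdm[glob_freq >= min_glob_freq, , drop = FALSE]
print(rownames(tdm_filtered))
```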

Regarding the sanitizing: if you set minDocFreq to 1
and minWordLength to 1, you should not get an error
with your document collection, as you are then basically
taking everything (even a single character appearing
only once). That is probably not too problematic, as the
LSA step will group these low-frequency terms
into a lower-order factor anyway. Of course, you will still get
an error if you use documents that are completely empty,
so delete all zero-byte documents beforehand.
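Instead of deleting files by hand, the empty ones could also be skipped when the file list is built. A minimal sketch, using a temporary directory with made-up file names so that it runs on its own:

```r
# Build a throwaway directory with one normal and one empty file.
doc_dir <- tempfile("docs")
dir.create(doc_dir)
writeLines("some words here", file.path(doc_dir, "a.txt"))
file.create(file.path(doc_dir, "empty.txt"))

# List all files and keep only those larger than zero bytes.
all_files <- list.files(doc_dir, full.names = TRUE)
sizes <- file.info(all_files)$size
non_empty <- all_files[sizes > 0]
print(basename(non_empty))
```

The resulting vector of file names could then be copied elsewhere or processed directly, leaving the original directory untouched.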

I am thinking about what to do with this sanitizing part.
It is not a good idea to integrate that into the
textmatrix method -- it would slow things down
tremendously.

So what about this idea: does it make sense to provide a
sanitizing collection of methods that help to select the
files you want to work with (copy them to a different
directory or just return a list with the filenames of
the ones that are good)? What should we do with other
sanitizing options (deleting urls from texts, deleting
short words, etc.)?
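To make the question more concrete, a text-level sanitizing helper might look like the sketch below. The function name sanitize_text and its parameter are hypothetical and not part of any package; the tokenizer is deliberately crude (ASCII letters only) and is just one possible design.

```r
# Hypothetical helper: strip URLs, split into lowercase words,
# and drop words shorter than min_word_length.
sanitize_text <- function(text, min_word_length = 2) {
  # Remove anything that looks like an http(s) URL.
  text <- gsub("https?://\\S+", " ", text, perl = TRUE)
  # Split on every run of characters that are not ASCII letters.
  words <- unlist(strsplit(tolower(text), "[^a-z]+", perl = TRUE))
  # Drop empty strings and words that are too short.
  words[nchar(words) >= min_word_length]
}

cleaned <- sanitize_text("See http://example.org for a short LSA intro", 3)
print(cleaned)
```

Keeping such helpers separate from the matrix construction would leave the fast path unchanged for users whose files are already clean.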

I hope I could be of help,

Best,
Fridolin

-- 
Fridolin Wild, Institute for Information Systems and New Media,
Vienna University of Economics and Business Administration (WUW),
Augasse 2-6, A-1090 Wien, Austria
fon +43-1-31336-4488, fax +43-1-31336-746



[R] sorting during xtabs? sorting by individual order?

2005-11-08 Fridolin Wild

Hello all,

while refactoring a package (before it is released),
I ran across the following problem.

I have two directories with different text files.
I want to read the first and construct a document-term
matrix from it (every term=word in a row, every file in
a column, with occurrence frequencies as the values).

The second directory contains different files. It
needs to be read in to also construct a document-term
matrix -- however, in the same term-order to enable
similarity comparisons in a vector space of the
same format.

Let's make a (fake) example:

(1) support function

# directory 1 contains 2 files (F1 & F2):
   F1 = c("word4", "word3", "word2")
   F2 = c("word1", "word4", "word2")

# directory 2 also contains 2 files (F3 & F4):
   F3 = c("word1", "word2", "bla")
   F4 = c("word1", "word2", "word3")

# I read in the first directory, file by file, and
# create triples of the format (file, word, 1)

F1tab = sort(table(F1), decreasing = TRUE)
F2tab = sort(table(F2), decreasing = TRUE)

# and create a data frame per file; as.vector() keeps Freq a plain
# count column, and stringsAsFactors = TRUE (the default in older
# R versions) keeps the terms as factors

F1frame = data.frame( docs = "F1", terms = names(F1tab),
  Freq = as.vector(F1tab), row.names = NULL, stringsAsFactors = TRUE)
F2frame = data.frame( docs = "F2", terms = names(F2tab),
  Freq = as.vector(F2tab), row.names = NULL, stringsAsFactors = TRUE)

(2) textmatrix function

... to be bound together for every file and to be
converted with xtabs into a document term matrix:

dummy = list(F1frame, F2frame)
dtm = t(xtabs(Freq ~ ., data = do.call(rbind, dummy)))

=>
   docs
terms   F1 F2
  word2  1  1
  word3  1  0
  word4  1  1
  word1  0  1

Now, when I want to re-use this to construct another
document-term matrix from files F3 & F4, with the same terms
in exactly the same order, I first need to add

F3clean = F3[F3 %in% rownames(dtm)]
F4clean = F4[F4 %in% rownames(dtm)]

to keep unwanted terms from getting into the tabs.

And here is my problem:

I need to reformat the output document-term matrix
(as it would be produced by running step 2 again
with F3clean and F4clean) so that it follows the
order of rownames(dtm) from the first directory.

How can I do this cheaply (the matrices I have to
deal with are usually really big)? Hopefully just
by adding something to the xtabs call?

To make an example of what I need: I need dtm2
to look exactly like this (doc-order is not important):

=>
   docs
terms   F3 F4
  word2  1  1
  word3  0  1
  word4  0  0
  word1  1  1

Can anybody help me?
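One possible direction, sketched under assumptions rather than offered as a definitive answer: xtabs keeps unused factor levels by default, so turning the terms into a factor whose levels are the row names of the first matrix should fix both the row order and the zero rows. The helper make_triples and the vector term_order are hypothetical names; term_order stands in for rownames(dtm).

```r
# Stand-in for rownames(dtm) from the first directory.
term_order <- c("word2", "word3", "word4", "word1")

F3 <- c("word1", "word2", "bla")
F4 <- c("word1", "word2", "word3")

# Hypothetical helper: build (file, word, frequency) triples for one
# file, with terms as a factor that carries the reference term order.
make_triples <- function(doc_name, words, term_order) {
  words <- words[words %in% term_order]   # drop unknown terms
  counts <- table(words)
  data.frame(docs  = rep(doc_name, length(counts)),
             terms = factor(names(counts), levels = term_order),
             Freq  = as.vector(counts))
}

triples <- rbind(make_triples("F3", F3, term_order),
                 make_triples("F4", F4, term_order))

# Unused levels are kept by xtabs, so absent terms become zero rows.
dtm2 <- t(xtabs(Freq ~ docs + terms, data = triples))
print(dtm2)
```

Only the terms that actually occur are stored as triples, so the extra cost over the original approach should mainly be the factor conversion.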

Best,
Fridolin

-- 
Fridolin Wild, Institute for Information Systems and New Media,
Vienna University of Economics and Business Administration (WUW),
Augasse 2-6, A-1090 Wien, Austria
fon +43-1-31336-4488, fax +43-1-31336-746
