[R] Java vs. C speed

2007-05-11 Thread Peter Muhlberger
I tried writing a textual comparison program in R, but found that it
is too slow for my purposes.  I need to make about 145 million
comparisons of the word patterns in pieces of text.  I basically
compare vectors that contain count data on a multitude of words and
find ones that are similar to others (145 million X 2 %in% comparisons,
a couple million multinomial estimates).  The best I could do in R
would take about 9 hours of computation time (a matrix solution bogs
down because of the size of the matrix:  10k by 17k; a looping
solution takes too long).  I've been looking into whether a Java or C
routine would be the best alternative.  There seems to be quite a debate on
the web about which is faster, and I don't know how much of that
debate is up to date.  Any impressions out there regarding whether
Java or C would be faster for this application and by how much?  I'd
have to learn quite a bit to implement either so I'd rather just work
on one.
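
For reference, one way a job like this can sometimes stay in R is to block
the comparisons so the full 10k x 17k similarity matrix never exists at
once.  A minimal sketch, with made-up stand-ins (counts, threshold) for the
real data:

counts <- matrix(rpois(200 * 50, 1), nrow = 200)  # toy documents-x-words counts
threshold <- 40                                   # made-up similarity cutoff
n <- nrow(counts)
hits <- vector("list", n)
for (b in split(seq_len(n), ceiling(seq_len(n) / 64))) {
  s <- tcrossprod(counts[b, , drop = FALSE], counts)  # one 64-row slab of dot products
  hits[b] <- lapply(seq_along(b), function(i) which(s[i, ] > threshold))
}

Whether this beats a C or Java rewrite depends on how much of the work the
BLAS-level tcrossprod() can absorb.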



Re: [R] A comment about R:

2006-01-05 Thread Peter Muhlberger
On 1/5/06 11:27 AM, Achim Zeileis [EMAIL PROTECTED] wrote:

 As John and I seem to have written our replies in parallel, I've added
 some more clarifying remarks in this mail:
 Note that the Anova() function, also in car, can more conveniently compute
 Wald tests for certain kinds of hypotheses. More generally, however, I'd be
 interested in your suggestions for an alternative method of specifying
 linear hypotheses.
 My understanding was that Peter just wants to eliminate various elements
 from the terms(obj) which is what waldtest() in lmtest supports. If some
 other way of specifying nested models is required, I'd also be interested
 in that.


My two most immediate problems were a) to test whether a set of coefficients
were jointly zero (as Achim suggests, though the complication here is that
the varcov matrix is bootstrapped), but also b) to test whether the average
of a set of coefficients was equal to zero.  At other points in time, I
remember having had to test more complex linear hypotheses involving joint
combinations of equality, non-zero, and 'averages.'  The Stata interface for
linear hypothesis tests is amazingly straightforward.  For example, after a
regression, I could use the following to test the joint hypothesis that
v1=v2 and the average (or sum) of v3 through v5 is zero and .75v6+.25v7 is
zero:

test v1=v2
test v3+v4+v5=0, accum
test .75*v6+.25*v7=0, accum

I don't even have to set up a matrix for my test ];-) !  The output would
show not merely the joint test of all the hypotheses but the tests along the
way, one for each line of commands.  I vaguely remember the hypothesis
testing command after an ml run is much the same and cross-equation
hypothesis tests simply involve adding an equation indicator to the terms.
I can get huberized var-cov matrices simply by adding 'robust' to the
regression command.  I believe there's also a command that will huberize a
var-cov matrix after the fact.  Subsequent hypothesis tests would be on the
huberized matrix.
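
For reference, car's linear.hypothesis() (linearHypothesis() in later
versions of car) accepts symbolic strings much like the Stata lines above.
A minimal sketch on made-up data, assuming car is installed:

library(car)
set.seed(1)
dat <- data.frame(matrix(rnorm(800), ncol = 8))
names(dat) <- c("y", paste0("v", 1:7))
fm <- lm(y ~ v1 + v2 + v3 + v4 + v5 + v6 + v7, data = dat)
linearHypothesis(fm, c("v1 = v2",
                       "v3 + v4 + v5 = 0",
                       "0.75*v6 + 0.25*v7 = 0"))  # joint Wald test of all three
# a bootstrapped or huberized covariance matrix can be supplied via the
# vcov. argument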

I won't claim to know what's good for R or the R community, but it would be
nice for me, and perhaps others, if there were a comparably straightforward
command in R that could meet a variety of needs.  I need to play w/ the
commands you guys have suggested recently, but I'm looking at a multitude of
commands, none of which, I suspect, has the flexibility and ease of use of
the Stata commands above, at least for the kind of applications I'd like.
Perhaps the point of R isn't to serve as a package for a wider set of
non-statisticians, but if it wishes to develop in that direction, facilities
like this may be helpful.  It's interesting that Achim points out that a
function John suggests is already available in R--an indication that even R
experts don't have a complete handle on everything in R, even on a
relatively straightforward topic like hypothesis tests.

John is no doubt right that editorializing about statistics would be out of
place on an R help page.  But when I have gone to statistical papers, many
have been difficult to access & not very helpful for practical concerns.
I'm glad to hear that Long and Ervin's paper is helpful, but there's a
goodly list of papers mentioned in help.  Perhaps something that would be
useful is some way of highlighting on a help page which reference is most
helpful for practical concerns?

Again, thanks for all the great input from everyone!

Peter



Re: [R] Wald tests and Huberized variances (was: A comment about R:)

2006-01-05 Thread Peter Muhlberger
Thanks Z, it's coming more into focus.  I don't know what would work, though
maybe it's not impossible to have a richer set of cross-references by
interest area--e.g., "People interested in econometrics may wish to
examine ..."  The views help in this regard, though something in help itself
would be handy.

Peter

On 1/5/06 2:52 PM, Achim Zeileis [EMAIL PROTECTED] wrote:

 Peter:
 
 If R wants to bring in a wider audience, one thing that might help is a
 denser set of cross-references.  For example, perhaps lm's help should
 mention the econometrics view materials as well as other places to look for
 tests and procedures people may want to do w/ lm.  Another thought is that
 
 This is difficult because the core development team has to ensure a
 certain stability of the system and you wouldn't start cross-linking to
 potentially unstable contributed packages. Furthermore, what is obvious to
 you (or me) as further desired functionality for linear models might be
 completely counter-intuitive for someone in genomics or biostatistics or
 environmetrics or ... and you can't link to all of these without confusing
 everybody.
 
 perhaps the standard R package help should allow people to find
 non-installed but commonly used contributed packages and perhaps their help
 page contents.  A feature that would be very helpful for me is the capacity
 to search all the contents of help files, not just keywords that at times
 seem to miss what I'm trying to find.
 
 This is surely desirable but unfortunately not that simple to implement;
 you'll find some discussion in the list archives about this. However,
 there are various very helpful search facilities like RSiteSearch().
 
 Best,
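
For reference, both search facilities mentioned above are one-liners from
the R prompt:

RSiteSearch("cluster bootstrap")  # searches help pages & list archives in the browser
help.search("Wald")               # searches only the locally installed packages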



[R] Bug in bootcov; R 2.2

2006-01-04 Thread Peter Muhlberger
There's a bug in my version of bootcov.  I'm not sure whether to report it
here or in r-bugs, because it is in a contributed package.  The bug is
straightforward, so perhaps it has been reported, tho I found no reference
in a search of the archive.

Any attempt to run bootcov with both cluster and strata will throw an error
indicating that jadd is not defined (Error: object "jadd" not found).  The
source of the error is the line:

jadd <- c(j, jadd)

jadd does not appear elsewhere in the code, so it is undefined.  I believe
the correct line is:

 j <- c(j, obs.gci)



Information on my R version:

> R.Version()
$platform
[1] "powerpc-apple-darwin7.9.0"

$os
[1] "darwin7.9.0"

$system
[1] "powerpc, darwin7.9.0"

$status
[1] ""

$major
[1] "2"

$minor
[1] "2.0"

$`svn rev`
[1] "35749"



[R] A comment about R:

2006-01-04 Thread Peter Muhlberger
I'm someone who from time to time comes to R to do applied stats for social
science research.  I think the R language is excellent--much better than
Stata for writing complex statistical programs.  I am thrilled that I can do
complex stats readily in R--sem, maximum likelihood, bootstrapping, some
Bayesian analysis.  I wish I could make R my main statistical package, but
find that a few stats that are important to my work are difficult to find or
produce in R.  Before I list some examples, I recognize that people view R
not as a statistical package but rather as a statistical programming
environment.  That said, however, it seems, from my admittedly limited
perspective, that it would be fairly easy to make a few adjustments to R
that would make it a lot more practical and friendly for a broader range of
people--including people like me who from time to time want to do
statistical programming but more often need to run canned procedures.  I'm
not a statistician, so I don't want to have to learn everything there is to
know about common procedures I use, including how to write them from
scratch.  I want to be able to focus my efforts on more novel problems w/o
reinventing the wheel.  I would also prefer not to have to work through a
couple books on R or S+ to learn how to meet common needs in R.  If R were
extended a bit in the direction of helping people like me, I wonder whether
it would not acquire a much broader audience.  Then again, these may just be
the rantings of someone not sufficiently familiar w/ R or the community of
stat package users--so take my comments w/ a grain of salt.

Some examples of statistics I typically use that are difficult to find and/or
produce, or to produce in a usefully formatted way, in R--

Ex. 1)  Wald tests of linear hypotheses after max. likelihood or even after
a regression.  'Wald' does not even turn up in a search of my standard R
installation.  There's no comment in the lm help or optim help about what
function to use for hypothesis tests.  I know that statisticians prefer likelihood
ratio tests, but Wald tests are still useful and indeed crucial for
first-pass analysis.  After searching with Google for some time, I found
several Wald functions in various contributed R packages I did not have
installed.  One confusion was which one would be relevant to my needs.  This
took some time to resolve.  I concluded, perhaps on insufficient evidence,
that package car's Wald test would be most helpful.  To use it, however, one
has to put together a matrix for the hypotheses, which can be arduous for a
many-term regression or a complex hypothesis.  In comparison, in Stata one
simply states the hypothesis in symbolic terms.  I also don't know for
certain that this function in car will work or work properly w/ various
kinds of output, say from lm or from optim.  To be sure, I'd need to run
time-consuming tests comparing it with Stata output or examine the
function's code.  In Stata the test is easy to find, and there's no
uncertainty about where it can be run or about its accuracy.  Simply having a
comment or 'see also' in the lm, mle, or optim help pointing the user to
the right Wald function would be of enormous help.
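
For reference, a general Wald test is only a few lines once a coefficient
vector and covariance matrix are in hand.  A minimal sketch (the matrix R
encodes the hypothesis R b = q; the built-in cars data stand in for real
data):

wald.test <- function(b, V, R, q = rep(0, nrow(R))) {
  d <- R %*% b - q
  W <- drop(t(d) %*% solve(R %*% V %*% t(R)) %*% d)  # (Rb-q)' [RVR']^{-1} (Rb-q)
  c(W = W, df = nrow(R), p = pchisq(W, nrow(R), lower.tail = FALSE))
}
fm <- lm(dist ~ speed, data = cars)
R <- matrix(c(0, 1), nrow = 1)     # H0: the speed coefficient is zero
wald.test(coef(fm), vcov(fm), R)

Because it needs only coef()-style and vcov()-style inputs, the same
function can be fed optim output once a parameter vector and an inverted
hessian are extracted.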

Ex. 2) Getting neat output of a regression with Huberized variance matrix.
I frequently have to run regressions w/ robust variances.  In Stata, one
simply adds the word 'robust' to the end of the command, or
'cluster(cluster.variable)' for a cluster-robust error.  In R, there are two
functions, robcov and hccm.  I had to run tests to figure out what the
relationship is between them and between them and Stata (robcov w/o cluster
gives hccm's hc0; hccm's hc1 is equivalent to Stata's 'robust' w/o cluster;
etc.).  A single sentence in hccm's help saying something to the effect that
statisticians prefer hc3 for most types of data might save me from having to
scramble through the statistical literature to try to figure out which of
these I should be using.  A few sentences on what the differences are
between these methods would be even better.  Then, there's the problem of
output.  Given that hc1 or hc3 are preferred for non-clustered data, I'd
need to be able to get regression output of the form summary(lm) out of
hccm, for any practical use.  Getting this, however, would require
programming my own function.  Huberized t-stats for regressions are a
commonplace need; an R oriented a little more toward everyday needs would
not require users to program them.  Also, I'm not sure yet how well any
of the existing functions handle missing data.
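
For reference, the contributed sandwich and lmtest packages can be combined
to produce exactly this kind of summary()-style table.  A minimal sketch on
a built-in dataset, assuming both packages are installed:

library(sandwich)
library(lmtest)
fm <- lm(dist ~ speed, data = cars)
coeftest(fm, vcov = vcovHC(fm, type = "HC3"))  # coefficient table w/ HC3 SEs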

Ex. 3)  I need to do bootstrapping w/ clustered data, again a common
statistical need.  I wasted a good deal of time reading the help contents of
boot and Bootstrap, only to conclude that I'd need to write my own, probably
inefficient, function to bootstrap clustered data if I were to use boot.
It's odd that boot can't handle this more directly.  After more digging, I
learned that bootcov in package Design would handle the cluster bootstrap
and save 
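
For reference, a hand-rolled cluster bootstrap of the sort described above
is short, if not especially efficient.  A sketch on made-up data (grp marks
the clusters; all names are hypothetical):

cluster.boot <- function(dat, cluster, stat, R = 199) {
  ids <- unique(dat[[cluster]])
  replicate(R, {
    take <- sample(ids, length(ids), replace = TRUE)   # resample whole clusters
    stat(do.call(rbind, lapply(take, function(g) dat[dat[[cluster]] == g, ])))
  })
}
set.seed(3)
dat <- data.frame(grp = rep(1:20, each = 5), x = rnorm(100))
dat$y <- dat$x + rep(rnorm(20), each = 5) + rnorm(100)  # cluster-level noise
bs <- cluster.boot(dat, "grp", function(d) coef(lm(y ~ x, data = d))["x"])
sd(bs)                                 # cluster bootstrap SE of the slope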

[R] Bug in bootcov; R 2.2

2006-01-04 Thread Peter Muhlberger
I see that I need to send my bug report to the package maintainer.
Apologies for sending it to this list.  There's a lot to absorb from various
online pages when reporting a bug & I missed the part about sending to the
maintainer.

Peter



[R] Bootstrap w/ Clustered Data

2006-01-03 Thread Peter Muhlberger
Looks like I may have found a function that addresses my needs.  Bootcov in
Design handles bootstrapping from clustered data and will save the
coefficients.  I'm not entirely sure it handles clusters the way I'd like,
but I'm going through the code.  If it doesn't, it looks easily
re-writeable.  As far as I can tell, boot in package boot would do clusters
only if the estimation function passed to it pastes together data based on
the clusters boot selects.



[R] Bootstrap w/ Clustered Data

2006-01-02 Thread Peter Muhlberger
Are there any functions in R for running bootstraps with clustered (as
opposed to stratified) data?  I can't seem to find anything obvious in boot
or Bootstrap, though I imagine boot can be manipulated to resample from
clusters.  Is that what people use?

I do see some cluster bootstrap resampling in glsD, but I need
non-parametric resampling & the capacity to run a sem model.

Thanks in advance,

Peter



[R] Testing a linear hypothesis after maximum likelihood

2005-12-22 Thread Peter Muhlberger
I'd like to be able to test linear hypotheses after setting up and running a
model using optim or perhaps nlm.  One hypothesis I need to test is that
the average of several coefficients is less than zero, so I don't believe I
can use the likelihood ratio test.

I can't seem to find a provision anywhere for testing linear combinations of
coefficients after max. likelihood.
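
For reference, one route is optim(..., hessian = TRUE): the inverted hessian
of a negative log-likelihood is the usual asymptotic covariance matrix, and
a one-sided z test of the average of several coefficients follows.  A sketch
on a toy least-squares likelihood (error variance fixed at one for
simplicity; idx is a made-up set of coefficients being averaged):

negll <- function(b, X, y) 0.5 * sum((y - X %*% b)^2)  # toy negative log-lik
set.seed(4)
X <- cbind(1, matrix(rnorm(300), 100, 3)); y <- rnorm(100)
fit <- optim(rep(0, 4), negll, X = X, y = y, method = "BFGS", hessian = TRUE)
V   <- solve(fit$hessian)              # asymptotic vcov of the MLE
idx <- 2:4                             # coefficients whose average is tested
a   <- rep(0, 4); a[idx] <- 1 / length(idx)
est <- sum(a * fit$par)
se  <- sqrt(drop(t(a) %*% V %*% a))
pnorm(est / se)                        # one-sided p-value for H0: average >= 0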

Cheers & happy holidays,

Peter



Re: [R] ML optimization question--unidimensional unfolding scaling

2005-11-03 Thread Peter Muhlberger
Hi Spencer & Andy:  Thanks for your thoughtful input!  I did at one point
look at the optim() function & run debug on it (wasn't aware of
browser--that's helpful!).  My impression is that optim() simply calls a C
function that handles the maximization.  So if I want to break out of my
likelihood function to restart optim() w/ new values, it seems I'd have to
somehow communicate to C that it's time to stop.  I may need to rewrite the
C, with which I'm not familiar--Java yes, so maybe when I have some real
free time...

Another possibility might be finding some jerry-rigged way to break out of
optim.  Maybe if I tell the likelihood function to freeze its returned value
at some point, optim will conclude it's done and stop.  That's probably
inefficient, & I will have the problem of telling when the break point ought
to occur.  I just wish there were some programmatic way to say 'stop this
and return control to the higher-level calling function "blah"'.
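
For reference, R's condition system can do exactly this: signal a classed
condition from inside the objective function and catch it in a handler
wrapped around the optim() call.  A sketch (the restart test here is a
made-up iteration counter standing in for "found a better branch"):

make.negll <- function() {
  n <- 0
  function(p) {
    n <<- n + 1
    if (n > 50)                        # stand-in for detecting a better branch
      stop(structure(list(message = "restart me", call = NULL),
                     class = c("restartOptim", "error", "condition")))
    sum((p - c(1, 2))^2)               # toy objective
  }
}
out <- tryCatch(optim(c(0, 0), make.negll()),
                restartOptim = function(cond) NULL)
if (is.null(out)) cat("optim interrupted; restart w/ improved start values\n")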

A third possibility is one suggested by Spencer, who seems to think it's ok
for the routine to pursue multiple branches w/o restarting, hence no restart
problem.  But w/ Newtonian-style convergence the latent scale values (which
are parameters to be estimated) have current positions & are supposed to
move smoothly toward lower likelihood values.  What will happen in branched
convergence, however, is that some of the latent values will prove to have
better values on the other side of a normal curve from their current
position.  My guess is that this will cause the likelihood function to make
a sudden, non-continuous jump not predictable by derivatives, which may mean
it can't converge properly.

Spencer's MDS alternative is intriguing & I'll need to think more about it.
Maybe I should also consider full-out Bayesian Monte Carlo methods (if I
have time), which would simultaneously explore the whole solution space.

Thanks,
Peter

On 11/2/05 9:01 PM, Spencer Graves [EMAIL PROTECTED] wrote:

  Have you looked at the code for optim?  If you execute optim, it
 will list the code.  You can copy that into a script file and walk
 through it line by line to figure out what it does.  By doing this, you
 should be able to find a place in the iteration where you can test both
 branches of each bifurcation and pick one -- or keep a list of however
 many you want and follow them all more or less simultaneously, pruning
 the ones that seem too implausible.  Then you can alternate between a
 piece of the optim code, bifurcating and pruning, adjusting each and
 printing intermediate progress reports to help you understand what it's
 doing and how you might want to modify it.
 
  With a bit more effort, you can get the official source code with
 comments.  To do that, I think you go to www.r-project.org -> CRAN ->
 (select a local mirror) -> Software: R sources.  From there, just
 download "The latest release: R-2.2.0.tar.gz".
 
  For more detailed help, I suggest you try to think of the simplest
 possible toy problem that still contains one of the issues you find most
 difficult.  Then send that to this list.  If readers can copy a few
 lines of R code from your email into R and try a couple of things in
 less than a minute, I think you might get more useful replies quicker.

On 11/3/05 8:08 AM, Liaw, Andy [EMAIL PROTECTED] wrote:

 Alternatively, just type debug(optim) before using it, then step through it
 by hitting enter repeatedly...
 
 When you're done, do undebug(optim).

On 11/3/05 11:06 AM, Liaw, Andy [EMAIL PROTECTED] wrote:

 Essentially all that debug() does is like inserting browser() as the first
 line of the function being debug()ed.  You can type just about any command
 at the browser prompt, e.g., for checking data, etc.  ?browser has a list of
 special commands for the browser prompt.
 
 Andy



Re: [R] ML optimization question--unidimensional unfolding scaling

2005-11-03 Thread Peter Muhlberger
Hi Spencer:  Just realized I may have misunderstood your comments about
branching--you may have been thinking about a restart.  Sorry if I
misrepresented them.

See below:


On 11/3/05 11:03 AM, Spencer Graves [EMAIL PROTECTED] wrote:

 Hi, Andy and Peter:
 
  That's interesting.  I still like the idea of making my own local
 copy, because I can more easily add comments and test ideas while
 working through the code.  I haven't used debug, but I think I should
 try it, because some things occur when running a function that don't
 occur when I walk through it line by line, e.g., parsing the call and
 ... arguments.
 

Debug's handy, though I think it is line by line.

  Two more comments on the original question:
 
  1.  What is the structure of your data?  Have you considered
 techniques for Multidimensional Scaling (MDS)?  It seems that your
 problem is just a univariate analogue of the MDS problem.  For metric
 MDS from a complete distance matrix, the solution is relatively
 straightforward computation of eigenvalues and vectors from a matrix
 computed from the distance matrix, and there is software widely
 available for the nonmetric MDS problem.  For a terse introduction to
 that literature, see Venables and Ripley (2002) Modern Applied
 Statistics with S, 4th ed. (Springer, distance methods in sec. 11.1,
 pp. 306-308).
 

I was looking for something on MDS in R--that'll be handy!

The data structure is a set of variables (say about 6) that I have reason to
believe measure an underlying dimension.  I suspect that several of the
variables are unfolding--that is, they have their highest value for some
point on the scale and fall off w/ distance from that point in either
direction.  The degree of fall-off may vary depending on the variable.  Some
seem to fall off very rapidly, others not.  A couple variables probably
monotonically increase w/ the underlying scale, so they don't unfold.  I can
construct a distance matrix consisting of distances between these variables.

Do you think MDS might be able to handle an arrangement like this, w/ some
values folded about a scale point and with drop-off varying between
variables?  The distances between the variables do not map in any
straightforward way into distances on the underlying scale because of
folding and non-linearity.

  2.  If you don't have a complete distance matrix, might it be
 feasible to approach the problem starting small and building larger,
 i.e., start with 3 nodes, then add a fourth, etc.?
 

Not sure I follow, but I do have a complete distance matrix of distances
between the variables.
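
For reference, the metric flavor Spencer mentions is nearly a one-liner in R
once the distance matrix is in hand.  A sketch on made-up data:

X <- matrix(rnorm(200 * 6), 200, 6)   # toy: 200 observations on 6 variables
d <- dist(t(X))                       # distances between the variables
cmdscale(d, k = 1)                    # classical metric MDS, 1-D configuration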

  spencer graves

Thanks,

Peter



Re: [R] ML optimization question--unidimensional unfolding scaling

2005-10-13 Thread Peter Muhlberger
Hi Spencer:  Thanks for your interest!  Also, the posting guide was helpful.

I think my problem might be solved if I could find a way to terminate nlm or
optim runs from within the user-given minimization function they call.
Optimization is unconstrained.

I'm essentially using normal-like curves that translate observed values on a
set of variables (one curve per variable) into latent unfolded values.  The
observed values are on the Y-axis & the latent (hence parameters to be
estimated) are on the X-axis.  The problem is that there are two points into
which an observed value can map on a curve--one on either side of the curve
mean.  Only one of these values actually will be optimal for all observed
variables, but it's easy to show that most estimation methods will get stuck
on the non-optimal value if they find that one first.  Moving away from that
point, the likelihood gets a whole lot worse before the routine will 'see'
the optimal point on the other side of the normal curve.

SANN might work, but I kind of wonder how useful it'd be in estimating
hundreds of parameters--thanks to that latent scale.

My (possibly harebrained) thought for how to estimate this unfolding using
some gradient-based method would be to run through some iterations and then
check to see whether a better solution exists on the 'other side' of the
normal curves.  If it does, replace those parameters with the better ones.
Because this causes the likelihood to jump, I'd probably have to start the
estimation process over again (maybe).  But, I see no way from within the
minimization function called by NLM or optim to tell NLM or optim to
terminate its current run.  I could make the algorithm recursive, but that
eats up resources & will probably have to be terminated w/ an error.

Peter


On 10/11/05 11:11 PM, Spencer Graves [EMAIL PROTECTED] wrote:

  There may be a few problems where ML (or more generally Bayes) fails
 to give sensible answers, but they are relatively rare.
 
  What is your likelihood?  How many parameters are you trying to
 estimate?
 
  Are you using constrained or unconstrained optimization?  If
 constrained, I suggest you remove the constraints by appropriate
 transformation.  When considering alternative transformations, I
 consider (a) what makes physical sense, and (b) which transformation
 produces a log likelihood that is closer to being parabolic.
 
 How are you calling optim?  Have you tried "SANN" as well as
 "Nelder-Mead", "BFGS", and "CG"?  If you are using constrained
 optimization, I suggest you move the constraints to Inf by appropriate
 transformation and use the other methods, as I just suggested.
 
  If you would still like more suggestions from this group, please
 provide more detail -- but as tersely as possible.  The posting guide
 is, I believe, quite useful (www.R-project.org/posting-guide.html).
 
  spencer graves



Re: [R] Matrix calculations in R--erroneous?

2005-10-08 Thread Peter Muhlberger

Hi Spencer:  Thanks!  This gives me a number of other ways of thinking about
this problem.  My one concern is that these approaches would also run into
some difficulties with how long it takes to calculate.  I'm interested not
in a single value but a matrix of over 300k values that has to be recomputed
more than a million times.  If I had to apply ifelse or floor to each of
these, it might take too long.  I'll have to see.

Peter

On 10/7/05 5:55 PM, Spencer Graves [EMAIL PROTECTED] wrote:

  Rather than adding 1e-15 to all numbers, I suggest you simply make
 that the floor.  (Or use .Machine$double.eps or 2*.Machine$double.eps in
 place of 1e-15.)
 
  Another alternative that may or may not apply in your case is to
 develop an asymptotic expansion for the log(likelihood) for the small
 numbers.  I've had good success with this kind of method.  For example,
 consider the Box-Cox transformation:
 
  bc(y, b) = (y^b-1)/b
 
  What do we do with b = 0?  We can test for b = 0 and replace those
 cases by the limit log(y).  However, it is numerically more stable to
 use the following:
 
  bc(y, b) = ifelse(abs(b*log(y)) > .Machine$double.eps,
  expm1(b*log(y))/b, log(y)).
 
  I don't have time to study your example to see if I could see
 anything like this that could be done, but I think there should be a
 good chance of finding something like this.  Of course, if there are
 only very few 0's, then it hardly matters.  However, if there are quite
 a few, then you need something like this.
 
  hope this helps.
  spencer graves
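
A direct R transcription of that stable branch (a sketch, with the
comparison spelled out) might read:

bc <- function(y, b) {
  bl <- b * log(y)
  ifelse(abs(bl) > .Machine$double.eps, expm1(bl) / b, log(y))
}
bc(2, 1e-10)   # effectively log(2); the naive (y^b - 1)/b loses precision here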



[R] Matrix calculations in R--erroneous?

2005-10-07 Thread Peter Muhlberger
Does anyone know how -log(x) can equal 743 but -log(x+0)=Inf?  That's what
the following stream of calculations suggest:

Browse[2]> -log(1e-323 + yMat2 - yMat1 * logitShape(matrix(parsList$Xs,
nrow = numXs, ncol=numOfCurves), matrix(means, nrow = numXs,
ncol=numOfCurves, byrow=TRUE), matrix(sigmas, nrow = numXs,
ncol=numOfCurves, byrow=TRUE))   )[5,9]
[1] Inf

Yet:

Browse[2]> logitShape(matrix(parsList$Xs, nrow = numXs, ncol=numOfCurves),
matrix(means, nrow = numXs, ncol=numOfCurves, byrow=TRUE), matrix(sigmas,
nrow = numXs, ncol=numOfCurves, byrow=TRUE))[5,9]
[1] 1

So, the logitShape component equals 1.

Browse[2]> yMat1[5,9]
[1] 1

So yMat1[5,9]*logitShape()[5,9]=1

Browse[2]> yMat2[5,9]
[1] 1

So, yMat2[5,9]-yMat1[5,9]*logitShape()[5,9]=0

Browse[2]> -log(1e-323)
[1] 743.7469

So, -log(1e-323)=743 while -log(1e-323+0)=Inf?

Any idea of a neat way to get around this?  Even if I put in 1e-50 I still
get Inf.  I deliberately included 1e-323 to ensure the function didn't go to
infinity.

Peter



Re: [R] Matrix calculations in R--erroneous?

2005-10-07 Thread Peter Muhlberger
Hi Thomas:  Thanks!  Yes, the function
(yMat2[5,9]-yMat1[5,9]*logitShape()[5,9]) appears to be producing a value of
-1.102216e-16 rather than 0.  I would have thought it would approach 0 from
above given that all input values are at or above zero, but evidently not.

The max function won't do the trick because I need the entire matrix.  I
could do one cell at a time, but this is part of an ML routine that needs to
be evaluated hundreds of thousands of times, so I can't afford to slow it
down that much.

I guess I can add 1e-15 rather than 1e-323, but wonder what that might end up
doing to my estimates.  Guess I'll find out.

Cheers,  Peter
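
For reference, pmax() is the elementwise cousin of max() and keeps the whole
matrix, so the guard value can still be applied in one vectorized call:

m <- matrix(c(0.5, -1.1e-16, 0.25, 1), 2, 2)  # one entry is a tiny negative "zero"
-log(m + 1e-323)                              # NaN for the negative entry
-log(pmax(m, 1e-323))                         # finite everywhere, still vectorized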

On 10/7/05 1:12 PM, Thomas Lumley [EMAIL PROTECTED] wrote:

 On Fri, 7 Oct 2005, Peter Muhlberger wrote:
 
 Does anyone know how -log(x) can equal 743 but -log(x+0)=Inf?  That's what
 the following stream of calculations suggest:
 
 Browse[2]> -log(1e-323 + yMat2 - yMat1 * logitShape(matrix(parsList$Xs,
 nrow = numXs, ncol=numOfCurves), matrix(means, nrow = numXs,
 ncol=numOfCurves, byrow=TRUE), matrix(sigmas, nrow = numXs,
 ncol=numOfCurves, byrow=TRUE))   )[5,9]
 [1] Inf
 
 Yet:
 
 Browse[2]> logitShape(matrix(parsList$Xs, nrow = numXs, ncol=numOfCurves),
 matrix(means, nrow = numXs, ncol=numOfCurves, byrow=TRUE), matrix(sigmas,
 nrow = numXs, ncol=numOfCurves, byrow=TRUE))[5,9]
 [1] 1
 
 So, the logitShape component equals 1.
 
 to within 2e-16
 
 Browse[2]> yMat1[5,9]
 [1] 1
 
 So yMat1[5,9]*logitShape()[5,9]=1
 
 to within 2e-16
 
 Browse[2]> yMat2[5,9]
 [1] 1
 
 to within 2e-16
 
 So, yMat2[5,9]-yMat1[5,9]*logitShape()[5,9]=0
 
 to within a few parts in 10^16
 
 You haven't actually shown us yMat2[5,9]-yMat1[5,9]*logitShape()[5,9],
 though
 
 Browse[2]> -log(1e-323)
 [1] 743.7469
 
 So, -log(1e-323)=743 while -log(1e-323+0)=Inf?
 
 
 If 0 is really of the order of 1e-16 then this isn't surprising. If the
 only point of 1e-323 is as a guard value for 0 then use max(1e-323,
 yMat2[5,9]-yMat1[5,9]*logitShape()[5,9])
 
 
 -thomas
 




[R] ML optimization question--unidimensional unfolding scaling

2005-10-03 Thread Peter Muhlberger
I'm trying to put together an R routine to conduct unidimensional unfolding
scaling analysis using maximum likelihood.  My problem is that ML
optimization will get stuck at latent scale points that are far from
optimal.  The point optimizes on one of the observed variables but not
others and for ML to move away from this 'local optimum', it has to move in
a direction in which the likelihood is decreasing, which it won't.

It's not hard to know where to look for a more optimal value--it'll be just
on the other side of the mean of a curve.  So, I can find better values, but
these values need to be fed back into ML for continued optimization.
Problem is, optim or nlm don't allow me to feed them new values for
parameters and in any event ML will likely choke w/ parameters jumping
around.  

One solution I've thought of is to restart optim or nlm w/ the new values
whenever a point jumps.  Is there any good way to get optim or nlm to
terminate prematurely and return control to the calling program, while
retaining a copy of the estimates?

Perhaps ML isn't the best approach for this kind of problem.  Suggestions
welcome!

Cheers,
Peter



Re: [R] a question about linear mixed model in R

2005-01-19 Thread Peter Muhlberger
On 1/19/05 10:31 AM, Chung Chang [EMAIL PROTECTED] wrote:

 Thanks for your post.
 Yes, your example is indeed similar to my question.
 If i means group, j means individual(subject)

Isn't 'i' individual & 'j' group?

 h: indicator (0: control; 1: experiment), k: repeat (if no repeat, then k=1);
 then the model is also X_hijk = alpha_h + h * b_i + r_(ij) + e_hijk.
 
 After I posted this question, I found out how to do it in R.
 So I would like to share it with you guys and hear comments from you.
 X: response, b_i: subject effect, r_(ij): nested effect within subject
 
 lme(X ~ alpha_h, data=dataset, random=list(subject=~h-1, r=~1), method="ML",
     na.action=na.omit)
 The fixed-effect part is alpha_h.
 The first random effect is the subject effect; the corresponding coefficient
 is h, and -1 means no random intercept for subject.
 The second is the random effect of r (the nested effect within subject).
 Thanks for your help

Hi Chung Chang:  I gather that subject & r above are the grouping variables.
subject would indicate each individual participant, which probably means you
must have multiple observations per individual.  I'm not clear why you would
nest within subject for h but within r for the model constant, but then I
don't know details about your experiment.

Let me see, though, whether I can apply this to my own experiment
(simplified):

Participants engaged in a 2X2 experiment:

c1 (condition 1):  0, 1 indicator.  1=person participated in group-based
political discussion.  0=no group-based discussion (individual sit & think).

c2:  1=person received a citizenship prime.  0=no citizenship prime

c1 & c2 are crossed to yield 4 cells.

grp=a 1:n variable indicating which discussion group a person was in, and
n+1 for those in no discussion (c1==0)

V=some continuous covariate of Y, but one whose coefficient I suspect may be
affected through discussion groups, for those people who were in discussion.

a=constant

lm Model (expanded for clarity):

Y ~ a + c1 + c2 + c1:c2 + V

If I understood correctly, you are suggesting the following lme Model:

Fixed Model for lme:

Y ~ a + c1 + c2 + c1:c2 + c1:V + V

list(grp = ~ c1 + c1:c2 + c1:V)

Is this what you had in mind?

Thanks,

Peter



Re: [R] Function to modify existing data.frame--Improving R Language

2005-01-19 Thread Peter Muhlberger
Thomas & Jeff:  Thanks again for your thoughts.  The program Thomas suggests
below is elegant, but I was avoiding that because I assumed the memory
requirements and amount of time required for a large dataset would be
substantial.  Of course, it depends on what's happening 'under the hood.'
Perhaps mydata doesn't get copied and then replaced w/ a modified copy of
itself, as it seems.  R might simply have one copy  a list of updates in
memory.  I tried something like the program below w/ my data  it only takes
a couple seconds, so this looks like the elegant solution to my problem.
Thomas is right that there is a programming advantage to pass by value,
though I wonder whether for complex programming it would be enough to allow
a function to modify only one workspace object at a time.  I guess I'll see.
R is very slick.

Peter

On 1/19/05 1:31 PM, Thomas Lumley [EMAIL PROTECTED] wrote:

 I don't see why
mydata <- some.program(mydata)
 is much less elegant than
mydata.someProgram()
 as a way of updating a data set. It may use more memory, but that wasn't
 the point at issue.
 
 Of course there are advantages to the ability to pass by reference, and
 disadvantages -- the most obvious disadvantage is that it is not easy to
 tell which variables are modified by a given piece of code.
 
 It probably wouldn't be that hard to produce something that looked like a
 data frame but was passed by reference, by wrapping it in an environment.
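
A minimal sketch of the environment idea Thomas mentions--environments,
unlike data frames, are passed by reference, so the caller sees the update
without any explicit reassignment:

e <- new.env()
e$mydata <- data.frame(x = 1:3, y = 4:6)
center <- function(env, var)
  env$mydata[[var]] <- env$mydata[[var]] - mean(env$mydata[[var]])
center(e, "x")    # updates the frame inside e; nothing is handed back
e$mydata$x        # now centered: -1 0 1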



[R] Function to modify existing data.frame

2005-01-18 Thread Peter Muhlberger
I'm used to statistical languages, such as Stata, in which it's trivial to
pass a list of variables to a function & have that function modify those
variables in the existing dataset rather than create copies of the variables
or having to replace the entire dataset to change a few variables.  In R, I
suppose I could paste together the right instructions in a function and then
execute it, but is there any more straightforward way of doing this I'm
missing?

Thx,

Peter



[R] a question about linear mixed model in R

2005-01-18 Thread Peter Muhlberger
Hi Chung Cheng:  This seems related to a problem I'm having in some data of
mine as well.  I'm new to R (played w/ it some a year ago) & to lme
modeling, so take this w/ a grain of salt, but here are some thoughts:

In my problem, D would be an indicator of whether a subject was in the
control condition or not.  In the control condition, all people participated
individually; in the experimental condition there was small-group-based
discussion.  r(ij) would be some variable that affects the outcome, but
whose effect may be moderated by the group the discussion was in.

The model assumes that the non-control condition values will have a
distribution of coefficients for r(ij).  The coefficient for r(ij) in the
controls need not have the same central value as for the non-controls
(though it would be nice to be able to constrain it so it would be).  So, it
might make some sense to split the variable into two variables, one with
zeros for the controls & one w/ zeros for the experimental groups, and
estimate the former w/ random effects & the other not.

I'm not 100% sure that's what you're asking, but it seems related.

Peter

Dear all,

I have a somewhat unusual linear mixed model that I can't seem
to code in lme.  It's only unusual in that one random effect is
applied only to some of the observations (I have an indicator
variable
that specifies which observations have this random effect).

The model is:

X_hijk = alpha_h + h * b_i + r_(ij) + e_hijk , where

  h = 0 or 1 (indicator)
  i = 1, ..., N
  j = 1, ..., n_i
  k = 1, ..., K
alpha is fixed, and the rest are random.
I'm willing to assume b, r, and e are mutually independent
and normal with var(b) = sigma^2_b, var(r) = sigma^2_r, and
var(e) = sigma^2.

Any help in writing this model in lme() would be greatly
appreciated.

Thanks,

Chung Cheng



[R] Book recommendations: Multilevel longitudinal analysis

2003-08-25 Thread Peter Muhlberger
Hi, does anyone out there have a book recommendation for multilevel /
random-effects and longitudinal analysis?

My dream book would be something that's accessible to a non-statistician
but rigorous (because I seem to be slowly turning into a statistician), and
that would ideally use R.

Peter



[R] Re: Programs stopped working--.print

2003-07-26 Thread Peter Muhlberger
Thank you to everyone who replied to my curious problem, which just got more
curious.  Today I closed my copy of R, opened up a different copy of .RData
(in another directory), one that didn't have the .print problem.  Worked
w/ that for a few minutes.  Then closed R again & restarted from the copy of
.RData that was giving me the .print problem in all my programs (and
consistently gave me that problem, even when I reloaded it yesterday).  All
those problems disappeared, even running exactly the same code I couldn't
get to work yesterday.  The .'s disappeared from the code as well.  Go
figure!  I guess I'll always keep a copy of my current .RData file, just in
case this happens again.  For those who asked, I'm working w/ R 1.7.0 on an
OS X 10.2.6 system.

Thanks again,

Peter



[R] Programs stopped working--.print (newbie question)

2003-07-25 Thread Peter Muhlberger
I must have messed up my R environment, but don't know how or how to undo
it.  The problem is this:

I paste the following into R:

test <- function()
{
print("hello")
}

And I see this:

> test <- function()
+ {
+ .print("hello")
+ }
> test()
Error in test() : couldn't find function ".print"
>

When I do the same in a fresh environment, I see this:

> test <- function()
+ {
+ print("hello")
+ }

> test2()
[1] "hello"

Peter



Re: [R] Programs stopped working--.print (newbie question)

2003-07-25 Thread Peter Muhlberger
On 7/25/03 5:53 PM, Spencer Graves [EMAIL PROTECTED] wrote:

 I see a period . before 'print' in your function definition.  Might
 this be the problem?
 
 spencer graves

The code I paste in has no . in front of 'print'.  But when the code
displays after I put it in, R puts a . in front of it.  I don't know why.

Thx,

Peter



[R] Problem w/ source

2003-07-24 Thread Peter Muhlberger
I'm trying to use the source command to run commands from a file.  For
instance:  source("do.R"), where do.R is a file in the same directory in
which I am running R.

The contents of do.R are:

ls()
print("hello")
sum(y1)
mean(y1)


After source("do.R"), all I see is:

> source("do.R")
[1] "hello"


I'm using the X11 version of R for Mac OS X (downloadable binary).  Does
anyone know how to get source to work?

Thanks!

Peter



Re: [R] Problem w/ source

2003-07-24 Thread Peter Muhlberger
Thanks to everyone for their suggestions on getting source to print!  It
seems not everyone was aware of a couple of options that get source to print
out everything.  I'm now using the following command:

source("do.R", print.eval=TRUE, echo=TRUE)



[R] Hypothesis testing after optim

2003-07-14 Thread Peter Muhlberger
Hi folks:  Does anyone know of a way to do (linear) hypothesis tests of
parameters after fitting a maximum-likelihood model w/ optim?  I can't seem
to find anything like a Wald test whose documentation says it applies to
optim output.  

Also, thanks again to everyone who gave me feedback on the robustness of ML
estimation in R!

Peter




Peter Muhlberger
Visiting Professor of Political Science
Institute for the Study of Information Technology and Society (InSITeS)
Carnegie Mellon University



[R] How robust is mle in R?

2003-07-13 Thread Peter Muhlberger
A newbie question:  I'm trying to decide whether to run a maximum likelihood
estimation in R or Stata and am wondering if the R mle routine is reasonably
robust.  I'm fairly certain that, with this data, in Stata I would get a lot
of complaints about non-concave functions and unproductive steps attempted,
but would eventually have a successful ML estimate.  I believe that, with
the 'unproductive step' at least, Stata gets around the problem by switching
to some alternative estimation method in difficult cases.  Does anyone know
how robust mle is in R?

Thanks,
Peter
