Dear Roland,

Have you looked at Zelig (http://gking.harvard.edu/zelig)?
Several professors in my department are going to use it to teach
R to political science undergraduates and graduate students this
fall.  We just presented it at the Political Methodology
meeting, and will present it again at the American Political
Science Association meeting, so we hope that other departments
will start to use Zelig as a teaching tool (for applied social
science in general, as an alternative to Stata, etc.).
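
(For a flavor of it, the basic Zelig workflow is three calls -- the
data and variable names here are invented for illustration:

  z.out <- zelig(y ~ age + sex, model = "ls", data = mydata)
  x.out <- setx(z.out, age = 30)
  s.out <- sim(z.out, x = x.out)

so students run one readable line per step instead of raw model code.)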

Although a GUI would be good, students won't learn to use R that
way.  I think the key to getting them to use the command-line
interface is to draw an analogy between R and English (or any
other language): there are rules of syntax; here they are; if you
get a "syntax error", look for the following common errors; here
are some simple examples and demos that you can and will want to
follow (because you're interested in the problem); and here are
the models in a logical format.
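
For example, the kind of demo I have in mind (illustrative only,
not Zelig syntax):

  > sum(1 2, 3)
  Error: unexpected numeric constant in "sum(1 2"
  > sum(1, 2, 3)   # missing comma restored; the "grammar" is satisfied
  [1] 6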

Social scientists aren't statisticians, but they're pretty
clever.  They probably had to learn at least one foreign
language at university, and they're probably pretty careful
writers in any language, so presenting R as just another
language will make it seem *easy* to use.

Yours,

Olivia Lau

> On Thu, 12 Aug 2004, Rau, Roland wrote:
> >
> > That is why I would like to ask the experts on this list if any of
> > you have encountered a similar experience and what you could advise
> > to persuade people quickly that it is worth learning new software?
>
> One problem is that it may not be true.  Unless these people are going
> to be doing their own statistics in the future (which is probably true
> only for a minority), they might actually be better off with a
> point-and-click interface.  I'm (obviously) not arguing that SPSS is a
> better statistical environment than R, but it is easier to learn, and
> in 10 or 15 weeks they may not get to see the benefits of R.
>
>
> -thomas
>
>
>
> ------------------------------
>
> Message: 12
> Date: Thu, 12 Aug 2004 16:24:28 +0100
> From: Barry Rowlingson <[EMAIL PROTECTED]>
> Subject: Re: [R] Giving a first good impression of R to Social
> Scientists
> To: "'[EMAIL PROTECTED]'" <[EMAIL PROTECTED]>
> Message-ID: <[EMAIL PROTECTED]>
> Content-Type: text/plain; charset=us-ascii; format=flowed
>
> Thomas Lumley wrote:
> > On Thu, 12 Aug 2004, Rau, Roland wrote:
> >
> >> That is why I would like to ask the experts on this list if any of
> >> you have encountered a similar experience and what you could advise
> >> to persuade people quickly that it is worth learning new software?
> >
>
>   The usual way of teaching R seems to be bottom-up. Here's the
> command prompt, type some arithmetic, make some assignments, learn
> about function calls and arguments, write your own functions, write
> your own packages.
>
>   Perhaps a top-down approach might help in certain cases. People
> using point-n-click packages tend to use a limited range of analyses.
> Write some functions that do these analyses, or give them wrappers, so
> that they get something like:
>
>   > myData = readDataFile("foo.dat")
>     Read 4 variables: Z, Age, Sex, Disease
>
>   > analyseThis(myData, response="Z", covariate="Age")
>
>    Z = 0.36 * Age, Significance level = 0.932
>
>   or whatever. Really spoon-feed the things they need to do. Make it
> really easy, foolproof.
>
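>   For instance, a minimal sketch of such a wrapper (analyseThis() and
> its one-line output format above are invented for this example, and
> lm() is just one possible engine):
>
>   analyseThis <- function(data, response, covariate) {
>     fit <- lm(reformulate(covariate, response), data = data)
>     slope <- coef(fit)[covariate]
>     pval <- summary(fit)$coefficients[covariate, "Pr(>|t|)"]
>     # print a one-line, spoon-fed summary instead of raw lm() output
>     cat(sprintf("%s = %.2f * %s, Significance level = %.3f\n",
>                 response, slope, covariate, pval))
>     invisible(fit)
>   }
>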
>   Then show them what's behind the analyseThis() function. How it's
> not even part of the R distribution. How easy you made it for a
> beginner to do a complex and novel analysis. Then maybe it'll "click"
> for them, and they'll see how having a programming language behind
> their statistics functions lets them explore in ways not thought
> possible within the point-n-click paradigm. Perhaps they'll start
> editing analyseThis() and writing analyseThat(), start thinking for
> themselves.
>
>   Or maybe they'll just stare at you blankly...
>
> Baz
>
>
>
> ------------------------------
>
> Message: 13
> Date: Thu, 12 Aug 2004 08:28:18 -0700 (PDT)
> From: Jason Liao <[EMAIL PROTECTED]>
> Subject: [R] truly object oriented programming in R
> To: [EMAIL PROTECTED]
> Message-ID:
<[EMAIL PROTECTED]>
> Content-Type: text/plain; charset=us-ascii
>
> Good morning! I recently implemented a KD tree in Java for faster
> kernel density estimation (part of the code follows). It went well.
> Hooking it up with R, however, has proved more difficult. My question
> is: is it possible to implement the algorithm in R? My impression is
> no, as the code requires a complete class-object framework that R does
> not support. But is there an R package or something that may make it
> possible? Thanks in advance for your help.
>
> Java implementation of KD tree:
>
> public class Kdnode {
>
>     private double[] center;   // center of the bounding box
>     private double diameter;   // maximum distance from center to
>                                // anywhere within the bounding box
>     private int numOfPoints;   // number of source data points in the
>                                // bounding box
>
>     private Kdnode left, right;
>
>     public Kdnode(double[][] points, int split_dim,
>                   int[][] sortedIndices, double[][] bBox) {
>         // bBox: the bounding box, 1st row the lower bound,
>         // 2nd row the upper bound
>         numOfPoints = points.length;
>         int d = points[0].length;
>
>         center = new double[d];
>         for (int j = 0; j < d; j++)
>             center[j] = (bBox[0][j] + bBox[1][j]) / 2.;
>         diameter = get_diameter(bBox);
>
>         if (numOfPoints == 1) {    // leaf node
>             diameter = 0.;
>             for (int j = 0; j < d; j++) center[j] = points[0][j];
>             left = null;
>             right = null;
>         } else {
>             int middlePoint = sortedIndices[split_dim][numOfPoints / 2];
>             double splitValue = points[middlePoint][split_dim];
>
>             middlePoint = sortedIndices[split_dim][numOfPoints / 2 - 1];
>             double splitValue_small = points[middlePoint][split_dim];
>
>             int left_size = numOfPoints / 2;
>             int right_size = numOfPoints - left_size;
>
>             double[][] leftPoints = new double[left_size][d];
>             double[][] rightPoints = new double[right_size][d];
>
>             int[][] leftSortedIndices = new int[d][left_size];
>             int[][] rightSortedIndices = new int[d][right_size];
>
>             int left_counter = 0, right_counter = 0;
>             int[] splitInfo = new int[numOfPoints];
>
>             for (int i = 0; i < numOfPoints; i++) {
>                 if (points[i][split_dim] < splitValue) {
>                     for (int j = 0; j < d; j++)
>                         leftPoints[left_counter][j] = points[i][j];
>                     splitInfo[i] = right_counter;
>                     left_counter++;
>                 } else {
>                     for (int j = 0; j < d; j++)
>                         rightPoints[right_counter][j] = points[i][j];
>                     splitInfo[i] = left_counter;
>                     right_counter++;
>                 }
>             }
>
>             // modify the indices appropriately to correspond to the
>             // new lists
>             for (int i = 0; i < d; i++) {
>                 int left_index = 0, right_index = 0;
>                 for (int j = 0; j < numOfPoints; j++) {
>                     if (points[sortedIndices[i][j]][split_dim] < splitValue)
>                         leftSortedIndices[i][left_index++] =
>                             sortedIndices[i][j] - splitInfo[sortedIndices[i][j]];
>                     else
>                         rightSortedIndices[i][right_index++] =
>                             sortedIndices[i][j] - splitInfo[sortedIndices[i][j]];
>                 }
>             }
>
>             // Recursively compute the kdnodes for the points in the
>             // two split subspaces
>             double[][] leftBBox = new double[2][];
>             double[][] rightBBox = new double[2][];
>
>             for (int i = 0; i < 2; i++) {
>                 leftBBox[i] = (double[]) bBox[i].clone();
>                 rightBBox[i] = (double[]) bBox[i].clone();
>             }
>
>             leftBBox[1][split_dim] = splitValue_small;
>             rightBBox[0][split_dim] = splitValue;
>
>             int next_dim = (split_dim + 1) % d;
>             left = new Kdnode(leftPoints, next_dim, leftSortedIndices,
>                               leftBBox);
>             right = new Kdnode(rightPoints, next_dim, rightSortedIndices,
>                                rightBBox);
>         }
>     }
>
>     public double evaluate(double[] target, double delta,
>                            double bandwidth) throws Exception {
>         double dis_2_center = Common.distance(target, center) / bandwidth;
>         double dm = diameter / bandwidth;
>
>         // prune: the whole box lies beyond the kernel's support
>         if (dis_2_center >= 1 + dm) return 0.;
>         if (numOfPoints == 1) return Common.K(dis_2_center);
>
>         /* if (dis_2_center < 1) {
>                double temp2 = dm * Common.KDeriv(dis_2_center);
>                if (temp2 < delta)
>                    return Common.K(dis_2_center) * numOfPoints;
>            } */
>
>         return left.evaluate(target, delta, bandwidth)
>              + right.evaluate(target, delta, bandwidth);
>     }
>
>     public double get_diameter(double[][] bBox) {
>         double value = 0., diff;
>         for (int i = 0; i < bBox[0].length; i++) {
>             diff = (bBox[1][i] - bBox[0][i]) / 2.;
>             value += diff * diff;
>         }
>         return Math.sqrt(value);
>     }
> }
>
> =====
> Jason Liao, http://www.geocities.com/jg_liao
> Dept. of Biostatistics, http://www2.umdnj.edu/bmtrxweb
> University of Medicine and Dentistry of New Jersey
> phone 732-235-5429, School of Public Health office
> phone 732-235-8611, Cancer Institute of New Jersey office
> mobile phone 908-720-4205
>
>
>
> ------------------------------
>
> Message: 14
> Date: Thu, 12 Aug 2004 15:40:52 +0000 (UTC)
> From: Gabor Grothendieck <[EMAIL PROTECTED]>
> Subject: Re: [R] truly object oriented programming in R
> To: [EMAIL PROTECTED]
> Message-ID: <[EMAIL PROTECTED]>
> Content-Type: text/plain; charset=us-ascii
>
> Jason Liao <jg_liao <at> yahoo.com> writes:
>
> :
> : Good morning! I recently implemented a KD tree in Java for faster
> : kernel density estimation (part of the code follows). It went well.
> : Hooking it up with R, however, has proved more difficult. My
> : question is: is it possible to implement the algorithm in R? My
> : impression is no, as the code requires a complete class-object
> : framework that R does not support. But is there an R package or
> : something that may make it possible? Thanks in advance for your
> : help.
>
> R comes with the S3 and S4 object systems out of the box, and there is
> an add-on package, R.oo, available at:
>
>    http://www.maths.lth.se/help/R/R.classes/
>
> that provides a more conventional OO system.  It's likely that one or
> more of these would satisfy your requirements.
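>
> For instance, a minimal S4 sketch of a recursive node type (slot names
> follow your Java class; the "ANY" slots allow NULL children, and
> dnorm() is only a stand-in kernel):
>
>    setClass("Kdnode", representation(center = "numeric",
>                                      diameter = "numeric",
>                                      numOfPoints = "numeric",
>                                      left = "ANY", right = "ANY"))
>
>    setGeneric("evaluate", function(node, target, bandwidth)
>               standardGeneric("evaluate"))
>
>    setMethod("evaluate", "Kdnode", function(node, target, bandwidth) {
>        d <- sqrt(sum((target - node@center)^2)) / bandwidth
>        if (node@numOfPoints == 1) return(dnorm(d))  # leaf: kernel value
>        evaluate(node@left, target, bandwidth) +     # internal: recurse
>            evaluate(node@right, target, bandwidth)
>    })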
>
>
>
> ------------------------------
>
> Message: 15
> Date: Thu, 12 Aug 2004 17:56:05 +0200
> From: "Kahra Hannu" <[EMAIL PROTECTED]>
> Subject: RE: [R] linear constraint optim with
bounds/reparametrization
> To: "Spencer Graves" <[EMAIL PROTECTED]>, "Ingmar Visser"
> <[EMAIL PROTECTED]>
> Cc: Thomas Lumley <[EMAIL PROTECTED]>,
[EMAIL PROTECTED]
> Message-ID:
> <[EMAIL PROTECTED]>
> Content-Type: text/plain; charset="iso-8859-1"
>
> From Spencer Graves:
>
> > However, for an equality constraint, I've had good luck with adding
> > something like the following to my objective function:
> > constraintViolationPenalty*(A%*%theta-c)^2, where
> > "constraintViolationPenalty" is passed via "..." in a call to optim.
>
> I applied Spencer's suggestion to a set of eight different constrained
> portfolio optimization problems. It seems to give a workable way to
> solve the portfolio problem when the QP optimizer is not applicable.
> After all, practical portfolio management is more an art than a
> science.
>
> > I may first run optim with a modest value for
> > constraintViolationPenalty, then restart it with the output of the
> > initial run as starting values and with a larger value for
> > constraintViolationPenalty.
>
> I wrote a loop that starts with a small value for the penalty and
> stops when the change in the function value from increasing the
> penalty is less than epsilon. I found that epsilon = 1e-06 provides
> reasonable accuracy with respect to computational time.
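>
> A minimal sketch of that loop (objective(), A, cvec, and theta0 stand
> in for the actual portfolio problem, and the tenfold penalty increase
> is just one choice):
>
>   penalized <- function(theta, penalty)
>       objective(theta) + penalty * sum((A %*% theta - cvec)^2)
>
>   eps <- 1e-06
>   penalty <- 1
>   fit <- optim(theta0, penalized, penalty = penalty)
>   repeat {
>       penalty <- 10 * penalty
>       # restart from the previous solution with a stiffer penalty
>       fit2 <- optim(fit$par, penalized, penalty = penalty)
>       if (abs(fit2$value - fit$value) < eps) break
>       fit <- fit2
>   }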
>
> Spencer, many thanks for your suggestion.
>
> Hannu Kahra
>
>
>
> ------------------------------
>
> Message: 16
> Date: Thu, 12 Aug 2004 17:59:21 +0200
> From: Martin Maechler <[EMAIL PROTECTED]>
> Subject: Re: [R] error using daisy() in library(cluster). Bug?
> To: Javier Garcia - CEBAS <[EMAIL PROTECTED]>
> Cc: [EMAIL PROTECTED]
> Message-ID: <[EMAIL PROTECTED]>
> Content-Type: text/plain; charset=iso-8859-1
>
> [Reverted back to R-help, after private exchange]
>
> >>>>> "MM" == Martin Maechler <[EMAIL PROTECTED]>
> >>>>>     on Thu, 12 Aug 2004 17:12:01 +0200 writes:
>
> >>>>> "javier" == javier garcia <- CEBAS
<[EMAIL PROTECTED]>>
> >>>>>     on Thu, 12 Aug 2004 16:28:27 +0200 writes:
>
>     javier> Martin; yes, I know that there are variables with
>     javier> all five values 'NA'. I've left them as they are,
>     javier> partly to save a couple of lines in the script and
>     javier> partly because I like to see that they are there,
>     javier> even though all their values are 'NA'.  I don't
>     javier> expect them to be used in the analysis, but are they
>     javier> the source of the problem?
>
>     MM> yes, but only because of "stand = TRUE".
>
>     MM> Yes, one could imagine that it would be good if
>     MM> standardizing these "all-NA variables" worked.
>
>     MM> I'll think a bit more about it.  Thank you for the
>     MM> example.
>
> Ok. I've thought (and looked at the R code) a bit longer.
> I also considered the fact (you mentioned) that this worked in R
> 1.8.0.  Hence, I'm considering the current behavior a bug.
>
> Here is the patch (apply to cluster/R/daisy.q in the *source*,
> or at the appropriate place in <cluster_installed>/R/cluster):
>
> --- daisy.q 2004/06/25 16:17:47 1.17
> +++ daisy.q 2004/08/12 15:23:26
> @@ -78,8 +78,8 @@
>      if(all(type2 == "I")) {
>          if(stand) {
>              x <- scale(x, center = TRUE, scale = FALSE) #-> 0-means
> -            sx <- colMeans(abs(x))
> -            if(any(sx == 0)) {
> +            sx <- colMeans(abs(x), na.rm = TRUE) # can still have NA's
> +            if(0 %in% sx) {
>                  warning(sQuote("x"), " has constant columns ",
>                          pColl(which(sx == 0)), "; these are standardized to 0")
>                  sx[sx == 0] <- 1
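>
> A minimal example of the kind of input that triggers this (data
> invented for illustration; before the patch, the all-NA column made
> "stand = TRUE" fail):
>
>     library(cluster)
>     x <- data.frame(a = rnorm(5), b = rep(as.numeric(NA), 5))
>     daisy(x, stand = TRUE)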
>
>
> Thank you for helping to find and fix this bug.
> Martin Maechler, ETH Zurich, Switzerland
>
>     javier> On Thu, 12 Aug 2004, at 15:11, MM wrote:
>
>     >>> Javier, I could well read your .RData and try your
>     >>> script to reproduce the same error from daisy().
>     >>>
>     >>> Your dataframe is of dimension 5 x 180 and has many
>     >>> variables that have all five values 'NA' (see below).
>     >>>
>     >>> You can't expect to use these, can you?  Martin
>
>
>
> ------------------------------
>
> Message: 17
> Date: Thu, 12 Aug 2004 16:14:07 +0000 (UTC)
> From: Gabor Grothendieck <[EMAIL PROTECTED]>
> Subject: Re: [R] RE: Giving a first good impression of R to
Social
> Scientists
> To: [EMAIL PROTECTED]
> Message-ID: <[EMAIL PROTECTED]>
> Content-Type: text/plain; charset=us-ascii
>
> Rau, Roland <Rau <at> demogr.mpg.de> writes:
>
> > Yes, I do know the R-Commander. But I did not want to give them a
> > GUI, but rather expose them to the command line after I demonstrated
> > that the steep learning curve in the beginning is worth the effort
> > for the final results.
>
> Note that Rcmdr displays all the underlying generated R code that does
> the analysis as it runs, so you are exposed to the command line.  This
> might pique the interest of students wishing to learn more, while
> giving an easy-to-use and immediately useful environment for those who
> just want to get results in the shortest, most direct fashion.
>
>
>
> ------------------------------
>
> Message: 18
> Date: Thu, 12 Aug 2004 09:25:07 -0700
> From: Seth Falcon <[EMAIL PROTECTED]>
> Subject: Re: [R] Approaches to using RUnit
> To: [EMAIL PROTECTED]
> Message-ID: <[EMAIL PROTECTED]>
> Content-Type: text/plain; charset=us-ascii
>
> On Tue, Aug 10, 2004 at 04:53:49PM +0200, Klaus Juenemann wrote:
> > If you don't organize your code into packages but source individual
> > R files, your approach of sourcing the code at the beginning of a
> > test file looks like the right thing to do.
>
> Appears to be working pretty well for me too ;-)
>
> > We mainly use packages, and the code we use to test packages A and
> > B, say, looks like
>
> SNIP
>
> > We use the tests subdirectory of a package to store our RUnit tests,
> > even though this is not really according to R conventions.
>
> In an off-list exchange with A.J. Rossini, we discussed an alternative
> for using RUnit in a package.  The idea was to put the runit_*.R files
> (containing test code) into somePackage/inst/runit/ and then put a
> script, say dorunit.R, inside somePackage/tests/ that would create the
> test suites, similar to the code you included in your mail.  The
> advantage of this would be that the unit tests would run under R CMD
> check.
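>
> A minimal sketch of what such a dorunit.R could look like (the suite
> name is made up; the file layout is the one suggested above):
>
>   library(RUnit)
>   suite <- defineTestSuite("somePackage unit tests",
>                            dirs = system.file("runit",
>                                               package = "somePackage"),
>                            testFileRegexp = "^runit_.*\\.R$")
>   result <- runTestSuite(suite)
>   printTextProtocol(result)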
>
> In the next week or so I hope to package-ify some code and try
this out.
>
>
> + seth
>
>
>
> ------------------------------
>
> Message: 19
> Date: Thu, 12 Aug 2004 12:25:03 -0400
> From: "Liaw, Andy" <[EMAIL PROTECTED]>
> Subject: RE: [R] Giving a first good impression of R to Social
> Scientists
> To: "'Barry Rowlingson'" <[EMAIL PROTECTED]>,
> "'[EMAIL PROTECTED]'" <[EMAIL PROTECTED]>
> Message-ID:
> <[EMAIL PROTECTED]>
> Content-Type: text/plain
>
> > From: Barry Rowlingson
> >
> > Thomas Lumley wrote:
> > > On Thu, 12 Aug 2004, Rau, Roland wrote:
> > >
> > >> That is why I would like to ask the experts on this list if any
> > >> of you have encountered a similar experience and what you could
> > >> advise to persuade people quickly that it is worth learning new
> > >> software?
> > >
> >
> >   The usual way of teaching R seems to be bottom-up. Here's the
> > command prompt, type some arithmetic, make some assignments, learn
> > about function calls and arguments, write your own functions, write
> > your own packages.
> >
> >   Perhaps a top-down approach might help in certain cases. People
> > using point-n-click packages tend to use a limited range of
> > analyses. Write some functions that do these analyses, or give them
> > wrappers, so that they get something like:
> >
> >   > myData = readDataFile("foo.dat")
> >     Read 4 variables: Z, Age, Sex, Disease
> >
> >   > analyseThis(myData, response="Z", covariate="Age")
> >
> >    Z = 0.36 * Age, Significance level = 0.932
> >
> >   or whatever. Really spoon-feed the things they need to do. Make it
> > really easy, foolproof.
>
> The problem is that the only `fool' that has been `proofed' against is
> the one that the developer(s) imagined.  One should never
> underestimate users' ability to out-fool the developers' imagination...
>
> Cheers,
> Andy
>
>
> >   Then show them what's behind the analyseThis() function. How it's
> > not even part of the R distribution. How easy you made it for a
> > beginner to do a complex and novel analysis. Then maybe it'll
> > "click" for them, and they'll see how having a programming language
> > behind their statistics functions lets them explore in ways not
> > thought possible within the point-n-click paradigm. Perhaps they'll
> > start editing analyseThis() and writing analyseThat(), start
> > thinking for themselves.
> >
> >   Or maybe they'll just stare at you blankly...
> >
> > Baz
> >
> > ______________________________________________
> > [EMAIL PROTECTED] mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide!
> > http://www.R-project.org/posting-guide.html

______________________________________________
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
