[R] Minor documentation issue

2006-04-21 Thread Vivek Satsangi
I looked at ?seq

--
-- Vivek Satsangi
Rochester, NY USA

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] Minor documentation issue

2006-04-21 Thread Vivek Satsangi
(Sorry about the last email which was incomplete. I hit 'send' accidentally).

I looked at ?seq. One of the forms given under Usage is seq(from).
This would be the form used if seq is called with only one argument.
However, this should actually say seq(to). For example,
> seq(1)
[1] 1
> seq(3)
[1] 1 2 3

Cheers,
--
-- Vivek Satsangi
Rochester, NY USA



[R] (newbie) Weighted qqplot?

2006-03-15 Thread Vivek Satsangi
Folks,
Normally, in a data frame, one observation counts as one observation
of the distribution. Thus one can easily produce a CDF and (in S-Plus
at least) use cdf.compare to compare the CDFs. (BTW: what is the R
equivalent of the S-Plus cdf.compare() function, if any?)

However, if each point should not count equally, how can I weight the
points before comparing the distributions? I was thinking of somehow
creating multiple observations for each actual observation based on
the weights and creating a new dataframe etc. -- but that seems excessive.
Surely there is a simpler way?

> x <- rnorm(100)
> y <- rnorm(10)
> xw <- rnorm(100) * 1.73 # The weights. These won't add up to 1 or N or
>                         # anything because of missing values.
> yw <- rnorm(10) * 6.23  # The weights. These won't add up to 1 or to the
>                         # same number as xw.
> # The question to answer is, how can I create a qq plot or cdf compare of x
> # vs. y, weighted by their weights, xw and yw (to eventually figure out if y
> # comes from the population x, similar to Kolmogorov-Smirnov GOF)?
> qqplot(x, y) # What now?

Thanks for any help,

--
-- Vivek Satsangi
Student, Rochester, NY USA

Life is short, the art long, opportunity fleeting, experiment
treacherous, judgement difficult.



Re: [R] (newbie) Weighted qqplot?

2006-03-15 Thread Vivek Satsangi
Folks,
I am documenting what I finally did, for the next person who comes along...

Following Dr. Murdoch's suggestion, I looked at qqplot. The following
approach might be helpful to get to the same information as given by
qqplot.
To summarize the ask: given x, y, xw and yw, show (visually is okay)
whether x and y are from the same distribution. xw is the weight of
each x observation and yw is the weight of each y observation.

Put x and xw into a dataframe.
Sort by x.
Calculate cumulative x weights, normalized to total 1.

Put y and yw into a dataframe.
Sort by y.
Calculate cumulative weights, normalized to total 1.

Plot x and y against their cumulative normalized weights. The shapes of
the two lines should be similar (to the eye) -- or else the distributions
are different.
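The steps above can be sketched in R as follows (an editorial sketch; the function name is mine, not from the thread, and weights are assumed non-negative):

```r
## Overlay two weighted empirical CDFs, following the steps described above.
weighted_ecdf_plot <- function(x, xw, y, yw) {
  ok <- !is.na(x) & !is.na(xw)
  x <- x[ok]; xw <- xw[ok]
  o <- order(x)
  cx <- cumsum(xw[o]) / sum(xw)        # normalized cumulative x weights
  ok <- !is.na(y) & !is.na(yw)
  y <- y[ok]; yw <- yw[ok]
  o2 <- order(y)
  cy <- cumsum(yw[o2]) / sum(yw)       # normalized cumulative y weights
  plot(x[o], cx, type = "s", xlab = "value", ylab = "cumulative weight")
  lines(y[o2], cy, type = "s", lty = 2)
  invisible(list(cx = cx, cy = cy))
}
```

With equal weights this reduces to the ordinary empirical CDF.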

Vivek

On 3/15/06, Duncan Murdoch [EMAIL PROTECTED] wrote:
 On 3/15/2006 8:31 AM, Vivek Satsangi wrote:
  Folks,
  Normally, in a data frame, one observation counts as one observation
  of the distribution. Thus one can easily produce a CDF and (in S-Plus
  at least) use cdf.compare to compare the CDFs. (BTW: what is the R
  equivalent of the S-Plus cdf.compare() function, if any?)
 
  However, if each point should not count equally, how can I weight the
  points before comparing the distributions? I was thinking of somehow
  creating multiple observations for each actual observation based on
  the weights and creating a new dataframe etc. -- but that seems excessive.
  Surely there is a simpler way?
 
  x <- rnorm(100)
  y <- rnorm(10)
  xw <- rnorm(100) * 1.73 # The weights. These won't add up to 1 or N or
  anything because of missing values.
  yw <- rnorm(10) * 6.23 # The weights. These won't add up to 1 or to the
  same number as xw.
  # The question to answer is, how can I create a qq plot or cdf compare of
  x vs. y, weighted by their weights, xw and yw (to eventually figure out if
  y comes from the population x, similar to Kolmogorov-Smirnov GOF)?
  qqplot(x,y) # What now?

 qqplot doesn't support weights, but it's a simple enough function that
 you could write a version that did.  Look at the cases where length(x)
 is not equal to length(y):  e.g. if length(y) < length(x), qqplot
 constructs a linear approximation to a function mapping 1:nx onto the
 sorted x values, then takes length(y) evenly spaced values from that
 function.  You want to do the same sort of thing, except that instead of
 even spacing, you want to look at the cumulative sums of the weights.

 You might want to use some kind of graphical indicator of whether points
 are heavily weighted or not, but I don't know what to recommend for that.

 By the way, your example above will give negative weights in xw and yw;
 you probably won't like the results if you do that.

 Duncan Murdoch
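The approach described above can be sketched as follows (an editorial sketch; the function name and the midpoint plotting positions are my own choices, and weights are assumed positive, per the caution about negative weights):

```r
## Evaluate each sample's weighted quantile function on a common
## probability grid, then plot the quantiles against each other.
weighted_qqplot <- function(x, xw, y, yw, n = 100) {
  ox <- order(x); oy <- order(y)
  px <- (cumsum(xw[ox]) - xw[ox] / 2) / sum(xw)  # midpoint positions
  py <- (cumsum(yw[oy]) - yw[oy] / 2) / sum(yw)
  p  <- seq(0, 1, length.out = n)
  qx <- approx(px, x[ox], xout = p, rule = 2)$y  # weighted x quantiles
  qy <- approx(py, y[oy], xout = p, rule = 2)$y  # weighted y quantiles
  plot(qx, qy, xlab = "x quantiles", ylab = "y quantiles")
  abline(0, 1, lty = 2)
  invisible(cbind(qx, qy))
}
```

If x and y come from the same distribution, the points should hug the 45-degree line.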



--
-- Vivek Satsangi
Student, Rochester, NY USA

Life is short, the art long, opportunity fleeting, experiment
treacherous, judgement difficult.



[R] (newbie) Accessing the pieces of a 'by' object

2006-03-07 Thread Vivek Satsangi
Folks,
I know that I can do the following using a loop. That's been a lot
easier for me to write and understand. But I am trying to force myself
to use more vectorized / matrixed code so that eventually I will
become a better R programmer.

I have a dataframe that has some values by Year, Quarter and Ranking.
The variable of interest is the return (F3MRet), which is to be
weighted-averaged within each year, quarter and ranking. In the end,
we want a table like this:
year  quarter  ranking1  ranking2 ... ranking10
1987     1       1.33      1.45   ...   1.99
1987     2       6.45      3.22   ...   8.33
.
.
2005     4       2.22      3.33   ...   1.22

The dataset is too large to post and I can't come up with a small
working example very easily.

I tried the reshape package and also the aggregate and reshape
functions. Those don't work too well because of the need to pass
weighted.mean a weights vector. I tried the by() function, but now I
don't know how to coerce the returned object into a matrix so that I
can reshape it.

> fvs_weighted.mean <- function(y) weighted.mean(y$F3MRet, y$IndexWeight,
+     na.rm = TRUE);
> tmp_byRet <- by(dfReturns,
+     list(dfReturns$Quarter, dfReturns$Year, dfReturns$Ranking),
+     fvs_weighted.mean);

And various other ways to get the tmp_byRet object into a matrix were
tried, e.g. unlist(), or a loop like this:
dfRet <- data.frame(tmp_byRet);
for (i in 1:dim(dfRet)[2]) {
    dfRet[, i] <- as.vector(dfRet[, i]);
}
In each case, I got some error or other.

So, please help me get unstuck. How can I get the tmp_byRet object
into a matrix or a dataframe?

--
-- Vivek Satsangi
Rochester, NY USA
No amount of sophistication is going to allay the fact that all your
knowledge is about the past and all your decisions are about the
future. -- Ian Wilson



Re: [R] (newbie) Accessing the pieces of a 'by' object

2006-03-07 Thread Vivek Satsangi
I am writing to document the answer for the next poor sod who comes along.

To get tmp_byRet into a multi-dimensional matrix, copy the object
using as.vector(), then copy the dim and dimnames from tmp_byRet into
the new object. However, this may not be what you want, since you
probably want the values of the factors within the object (i.e. it
should be a dataframe, not a matrix).

To get tmp_byRet into a dataframe, use unique() to create a dataframe
with just the unique values of your factors. Add a new column to the
dataframe, where you will store the summary stats. Use a loop to
populate this vector. Then use reshape() on the dataframe to get it to
the shape you want it in. It is difficult at best to vectorize this
and avoid the loop -- and trying to do so will probably lead to less
transparent code.
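The description above can be sketched on a tiny made-up data set (an editorial sketch; the column names follow the original post, but the data are invented):

```r
## Invented miniature of the dfReturns frame from the post.
dfReturns <- data.frame(
  Year        = rep(1987, 8),
  Quarter     = rep(1:2, each = 4),
  Ranking     = rep(1:2, 4),
  F3MRet      = 1:8,
  IndexWeight = rep(1, 8))

## Unique factor combinations, then a loop to fill in the weighted means.
keys <- unique(dfReturns[, c("Year", "Quarter", "Ranking")])
keys$WRet <- NA
for (i in seq_len(nrow(keys))) {
  sel <- dfReturns$Year == keys$Year[i] &
         dfReturns$Quarter == keys$Quarter[i] &
         dfReturns$Ranking == keys$Ranking[i]
  keys$WRet[i] <- weighted.mean(dfReturns$F3MRet[sel],
                                dfReturns$IndexWeight[sel], na.rm = TRUE)
}

## One row per year/quarter, one WRet.<ranking> column per ranking.
wide <- reshape(keys, idvar = c("Year", "Quarter"),
                timevar = "Ranking", direction = "wide")
```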

Vivek


On 3/7/06, Vivek Satsangi [EMAIL PROTECTED] wrote:
 [... original message quoted in full; snipped ...]


[R] Minor documentation improvement

2006-02-24 Thread Vivek Satsangi
Gentlemen,

In the documentation for reshape, in the function signature, the
argument 'direction' is not listed. However, it is explained in the
explanation of the parameters below.

I am using R 2.2.1.


Out of curiosity: Is the R core team still an all-male affair? I don't
think I have seen a single lady's name.
--
-- Vivek Satsangi
Student, Rochester, NY USA



[R] (Newbie) Aggregate for NA values

2006-02-24 Thread Vivek Satsangi
Folks,

Sorry if this question has been answered before or is obvious (or
worse, statistically bad). I don't understand what was said in one
of the search results that seems somewhat related.

I use aggregate to get a quick summary of the data. Part of what I am
looking for in the summary is, how much influence might the NA's have
had, if they were included, and is excluding them from the means
causing some sort of bias. So I want the summary stat for the NA's
also.

Here is a simple example session (edited to remove the typos I made,
comments added later):

> tmp_a <- 1:10
> tmp_b <- rep(1:5, 2)
> tmp_c <- rep(1:2, 5)
> tmp_d <- c(1, 1, 1, 2, 2, 2, 3, 3, 3, 4)
> tmp_df <- data.frame(tmp_a, tmp_b, tmp_c, tmp_d);
> tmp_df$tmp_c[9:10] <- NA;
> tmp_df
   tmp_a tmp_b tmp_c tmp_d
1      1     1     1     1
2      2     2     2     1
3      3     3     1     1
4      4     4     2     2
5      5     5     1     2
6      6     1     2     2
7      7     2     1     3
8      8     3     2     3
9      9     4    NA     3
10    10     5    NA     4
> aggregate(tmp_df$tmp_d, by = list(tmp_df$tmp_b, tmp_df$tmp_c), mean);
  Group.1 Group.2 x
1       1       1 1
2       2       1 3
3       3       1 1
4       5       1 2
5       1       2 2
6       2       2 1
7       3       2 3
8       4       2 2
# Only one row for each (tmp_b, tmp_c) combination, NA's getting dropped.

> aggregate(tmp_df$tmp_d, by = list(tmp_df$tmp_c), mean);
  Group.1    x
1       1 1.75
2       2 2.00

What I want in this last aggregate is a mean for the values in tmp_d
that correspond to tmp_c values of NA. Similarly, perhaps there is a
way to make the second-to-last call to aggregate return the values of
tmp_d for the NA values of tmp_c as well.

How can I achieve this?
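One way to get this (an editorial suggestion, not from the thread) is to recode the NA's in the grouping variable to an explicit label before aggregating; using the session's own data:

```r
## Rebuild the example data frame from the session above.
tmp_df <- data.frame(tmp_a = 1:10,
                     tmp_b = rep(1:5, 2),
                     tmp_c = rep(1:2, 5),
                     tmp_d = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 4))
tmp_df$tmp_c[9:10] <- NA

## Recode NA group values to a visible label so aggregate keeps them.
tmp_c2 <- ifelse(is.na(tmp_df$tmp_c), "missing", tmp_df$tmp_c)
res <- aggregate(tmp_df$tmp_d, by = list(tmp_c2), mean)
res   # now includes a "missing" row alongside groups "1" and "2"
```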

--
-- Vivek Satsangi
Student, Rochester, NY USA



Re: [R] Minor documentation improvement

2006-02-24 Thread Vivek Satsangi
Please ignore this message. I was not reading carefully enough, the
parameter is in there.

Vivek

On 2/24/06, Vivek Satsangi [EMAIL PROTECTED] wrote:
 [... original message quoted in full; snipped ...]



--
-- Vivek Satsangi
Student, Rochester, NY USA




Re: [R] R-help, specifying the places to decimal

2006-02-13 Thread Vivek Satsangi
In addition to round() mentioned earlier, if you are merely looking to
*display* your results differently, you may want to check out the
digits option, e.g. in summary():

(This is the method signature for data.frames):

summary(object, maxsum = 7,
        digits = max(3, getOption("digits") - 3), ...)
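For completeness, a few ways to get four places after the decimal (an editorial sketch; round() was mentioned earlier in the thread):

```r
x <- 0.160325923
round(x, 4)             # rounds the value itself: 0.1603
sprintf("%.4f", x)      # formats for display: "0.1603"
format(x, digits = 4)   # 4 significant digits for display
```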


(Begin quoted message)
Date: Mon, 13 Feb 2006 14:03:55 +0530
From: Subhabrata [EMAIL PROTECTED]
Subject: [R] R-help, specifying the places to decimal
To: r-help r-help@stat.math.ethz.ch
Message-ID: [EMAIL PROTECTED]
Content-Type: text/plain;   charset=iso-8859-1

Hello - R-experts,


Is there any way with which we can specify the number of places after
the decimal point to take? Like, I have a situation where
the values are coming out as 0.160325923 but I only want
4 places after the decimal, say 0.1603. Is there any way to do that?

I am no expert in R, and this may sound simple to many... sorry.


Thanks for any help.

With Regards

Subhabrata

--
-- Vivek Satsangi
Student, Rochester, NY USA



Re: [R] Bloomberg Data Import to R

2006-02-08 Thread Vivek Satsangi
Hi Sumanta,

1. This message is much more appropriate for the sig-finance DL
instead. Consider signing up (I read up on Amba, so I am sure you have
good contributions to make in that forum).

2. To my knowledge, there isn't a direct package. However, if you use
Bloomberg's Excel plugin, just get the data into Excel, save, and then
bring it in as usual. I suspect that that's what you are doing
already.

3. You may have better luck with the S-Plus plugins. I am just getting
started (and don't have any support/maintenance contract), so I don't
know what all Insightful has up its sleeve, but I talked to Carol
Wedekind about just this thing yesterday. Dr. Yollin, who also listens
in on the sig-finance list, may be able to advise you better about
what exists.

With warm regards,

Vivek

Message: 70
Date: Wed, 8 Feb 2006 15:51:13 +0530
From: Sumanta Basak [EMAIL PROTECTED]
Subject: [R] Bloomberg Data Import to R
To: r-help@stat.math.ethz.ch
Message-ID:
   [EMAIL PROTECTED]
Content-Type: text/plain

Hi R-Experts,

Can anyone tell me how Bloomberg data can be directly downloaded to R?
Is there any package?

Sumanta Basak.



[R] (Newbie) Merging two data frames

2006-02-01 Thread Vivek Satsangi
This one is an easy question. I am looking for the idiomatic way to do it.

I have two large data frames. I want to merge them. What is the
idiomatic way to say: match the rows from dataframe 1 to the rows in
dataframe 2 which have the following fields the same: Identifier, Year
and Quarter. (These three fields form something like a composite
primary key in SQL.) Then tell me which rows you could not find a
match for, etc.
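The usual idiom for this is merge() on the composite key; a sketch with tiny made-up frames (the column p/q names are illustrative). all.x = TRUE keeps the unmatched rows from the first frame so they can be inspected afterwards:

```r
## Two frames sharing a composite key (Identifier, Year, Quarter).
df1 <- data.frame(Identifier = c("A", "B"), Year = 2005, Quarter = 1, p = 1:2)
df2 <- data.frame(Identifier = c("A", "C"), Year = 2005, Quarter = 1, q = 3:4)

## Left join on the composite key.
m <- merge(df1, df2, by = c("Identifier", "Year", "Quarter"), all.x = TRUE)

## Rows of df1 with no match in df2 have NA in the df2-only columns.
unmatched <- m[is.na(m$q), ]
```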

-- Vivek Satsangi
Student, Rochester, NY USA



[R] Possible improvement in lm

2006-01-18 Thread Vivek Satsangi
Folks,

I do a series of regressions (one for each quarter in the dataset) and
then go and extract the residuals from each stored lm object that is
returned as follows:

vResiduals <- as.vector(unlist(resid(lQuarterlyRegressions[[i]])));

Here lQuarterlyRegressions is a vector of objects returned by lm().

Next, I may go find outliers using identify() on a plot or do some
other analysis which tells me which row of the quarterly data I need
to take a closer look at.

However, if I try to match some point in one of the quarters that I
have with its residual, then I have to drop the points from my
current data which have NA's for either the explanatory variables or
the explained, so that the vector of residuals and the data have the
same indexes.

This led to some serious confusion/bugs for me, and I am wondering if
it might not be better for lm to put an NA into those rows where the
point was dropped because of NA's in the explanatory or explained
variables (currently it just returns nothing at that index). Of course,
there might be some arguments against this idea, and I would be
interested to hear them.
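An editorial footnote: lm() already offers this behaviour through its na.action argument; na.exclude pads residuals and fitted values with NA at the dropped rows. A minimal sketch:

```r
## One row has a missing response; na.exclude keeps its slot as NA.
d <- data.frame(x = 1:4, y = c(2, 4, NA, 8))
fit <- lm(y ~ x, data = d, na.action = na.exclude)
resid(fit)   # length 4, with NA in position 3
```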

Thank you for your time and attention,


-- Vivek Satsangi
Student, Rochester, NY USA



[R] / Operator not meaningful for factors

2006-01-15 Thread Vivek Satsangi
Folks,
I have a very basic question. The solution eludes me perhaps because
of my own lack of creativity. I am not attaching a fully reproducible
session because the issue may well be because of the way the data file
is, and the data file is large (and I don't know whether I can legally
distribute it). If people can suggest things that might be wrong in my
data or the way that I am reading it, I would be most grateful.

I get the following error message in the session quoted at the end of
this email:
"/" not meaningful for factors in: Ops.factor(BookValuePS, Price)

As you can see in that same session, I check that the two vectors
being divided are numeric. I also check that the divisor is not 0 at
any index. I also believe that this is not because of the NA's in the
data. My question is, what are other problems that can cause the "/"
operator to not be meaningful?

I did try some simple examples to try to get the same error. However,
I am not sure how to put the same NA's that one gets from
read.table() into a vector:
> a <- c(1, 2, 3, NA);
> a
[1]  1  2  3 NA
> b <- c(1, 2, 3, 4);
> c <- b / a;
> b
[1] 1 2 3 4
> a <- c(1, 2, 3, );
> c <- b/a;
Warning message:
longer object length
is not a multiple of shorter object length in: b/a


==== Quoted Session below ====
> explainPriceSimplified <- read.table("combinedClean.csv",
+     sep = ",", header = TRUE);
> attach(explainPriceSimplified);
> summary(explainPriceSimplified);
     Symbol           Date              Price             EPS                 BookValuePS
 XL     :   98   Min.   :19870630   22     :   61   Min.   :-1.401e+05   Min.   :-6.901e+05
 ZION   :   97   1st Qu.:19910930   26.5   :   61   1st Qu.: 4.650e-01   1st Qu.: 3.892e+00
 YRCW   :   72   Median :19960331   27.5   :   58   Median : 1.060e+00   Median : 7.882e+00
 AA     :   71   Mean   :19957688   30     :   58   Mean   :-1.534e+01   Mean   : 1.515e+02
 ABS    :   71   3rd Qu.:20001231   25     :   56   3rd Qu.: 1.890e+00   3rd Qu.: 1.444e+01
 ABT    :   71   Max.   :20041231   (Other):29561   Max.   : 5.309e+03   Max.   : 3.366e+06
 (Other):29624                      NA's   :  249   NA's   : 2.460e+02   NA's   : 4.760e+02
 FiscalQuarterRep     F12MRet
 2004/2F:  482    Min.   :-100.00
 2003/4F:  471    1st Qu.:  -8.82
 2004/1F:  470    Median :  10.57
 2004/3F:  470    Mean   :  13.36
 2003/3F:  464    3rd Qu.:  31.12
 2003/2F:  463    Max.   :4700.00
 (Other):27284    NA's   : 463.00
> mode(Price)
[1] "numeric"
> mode(EPS)
[1] "numeric"
> mode(BookValuePS)
[1] "numeric"
> BP <- BookValuePS / Price;
Warning message:
"/" not meaningful for factors in: Ops.factor(BookValuePS, Price)
> which(Price == 0)
numeric(0)



--
-- Vivek Satsangi
Student, Rochester, NY USA



Re: [R] / Operator not meaningful for factors

2006-01-15 Thread Vivek Satsangi
Sir,
I made the (incorrect, probably unjustified) deduction of using mode()
based on section 3.1 of "An Introduction to R". Since the write-up
talks about the mode of an object, and using attr() did not work (it
gives some error saying that the mode of 'name' must be "character"), I
tried mode() and reached this incorrect conclusion.

I have had this confusion for a while now about the fact that
something is numeric AND it is a factor, since if it were just a
vector and not a factor, it would still be numeric, as in:
> a <- c(1, 2, 3);
> class(a);
[1] "numeric"

I'll try to think of a way to improve the explanation in "An
Introduction to R" so that the next person coming along does not fall
into the same pit.
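An editorial footnote on the fix itself: once a column has come in as a factor, the standard recovery is as.numeric(as.character(...)); a minimal sketch:

```r
## A numeric-looking column that was read in as a factor.
f <- factor(c("1.5", "2.5", "22"))

## Convert via character to recover the values.
as.numeric(as.character(f))   # gives 1.5, 2.5, 22

## Note: as.numeric(f) alone returns the internal level codes,
## not the values -- a classic trap.
```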

Thank you for getting me unstuck,

Vivek

On 1/15/06, Prof Brian Ripley [EMAIL PROTECTED] wrote:
 The mode of a factor is numeric, so your test does not do what you think
 it does.

 is.numeric() is the recommended test of a vector being numeric.  I have no
 idea where you got the idea that mode() was a useful test (perhaps you
 could give us the reference you used), but it rather rarely is (typeof is
 usually more informative).

 From the summary quoted, Price is clearly a factor.  Test it with
 is.factor.

 On Sun, 15 Jan 2006, Vivek Satsangi wrote:

  Folks,
  I have a very basic question. The solution eludes me perhaps because
  of my own lack of creativity. I am not attaching a fully reproducible
  session because the issue may well be because of the way the data file
  is, and the data file is large (and I don't know whether I can legally
  distribute it). If people can suggest things that might be wrong in my
  data or the way that I am reading it, I would be most grateful.
 
  I get the following error message in the session quoted at the end of
  this email:
  "/" not meaningful for factors in: Ops.factor(BookValuePS, Price)
 
  As you can see in that same session, I check that the two vectors
  being divided are numeric.

 (see the request above for your reference here)

  I also check that the divisor is not 0 at any index. I also believe that
  this is not because of the NA's in the data. My question is, what are
  other problems that can cause the / operator to not be meaningful?

 Why not test for factor, since that is what the very helpful error message
 told you the problem was?

  [... rest of quoted message snipped ...]

 --
 Brian D. Ripley,  [EMAIL PROTECTED]
 Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
 University of Oxford, Tel:  +44 1865 272861 (self)
 1 South Parks Road, +44 1865 272866 (PA)
 Oxford OX1 3TG, UKFax:  +44 1865 272595



--
-- Vivek Satsangi
Student, Rochester, NY USA



[R] Suggested add to the documentation for the identify() function

2005-11-24 Thread Vivek Satsangi
Folks,

1. Is there a more appropriate list (r-devel?) for posting such
suggestions? I am a newbie to R, and doubtless will have some
suggestions for the documentation -- some good, others not quite so. I
would actually like to help give back to the community (I was
motivated by Prof. Ripley's 2001 talk in which he had commented that
open source software users rarely give back anything.) -- but I know
very little right now, so I might make things worse in some cases.

2. I would like to suggest adding the following to the examples
section of the help on the identify function:

Suppose you want to be able to remove some points from your analysis.

In its simplest form, identify() will give you the row number of the
points that you mark. Try running the following 3 commands:

plot.new()
plot(1:10, 1:10)
identify(x = 1:10, y = 1:10, n = 10)

What you will observe is that when you click on the points of the
plot, it will show the row number of those points.

If you are using some other function to produce your plot, identify
can work with that as well. Just use the same vectors in the
arguments to plot and identify.

Next, you can remove those outlier points from your data using:
  x1 <- x[-c(3, 5, 7), ]

In this case x is your original matrix and 3, 5, 7 are the row numbers
shown by identify() for your outlier data points.

See also: negative subscripts


3. My most sincere apologies for sending HTML in my email to the
distribution list the last time.

-- Vivek Satsangi



[R] Cacheing in read.table/ attached data?

2005-11-20 Thread Vivek Satsangi
Disclaimer/Apology: I am an R newbie

I am seeing some behaviour that seems to me to be the result of some
caching going on at some level, and perhaps this is expected behaviour. I
would just like to understand the basic rules.

What I have is a file with some data. I read it in and then do a summary on
the resulting dataframe. I find that some values are completely outside the
expected range; these values need to be dropped from further analysis as
erroneous observations (yes, I apologize to the purists in advance :-) ).

If I do this and read the file again, then plot two of the columns in the
data with circlesPlot (from fBasics), the plot is not updated. The outlier
point is still there. However, when I detach and reattach the dataframe, it
seems to work okay. For example,
# Plot has the outlier point in it.
# Edit the file, commenting out the outlier line, save, then...
> SG <- read.table("c:/Vivek/MFC/Data/SG/combinedSG.tdf",
+     header = TRUE, sep = "\t")
> SGm2 <- lm(A3Yr ~ A10Holdings, data = SG)
> circlesPlot(A10Holdings, A3Yr, size = NetAssets)
> abline(coef(SGm2)) # Put the regression line on the plot
> SG <- read.table("c:/Vivek/MFC/Data/SG/combinedSG.tdf",
+     header = TRUE, sep = "\t")
> summary(SG) # Outlier does not show in the summary
> circlesPlot(A10Holdings, A3Yr, size = NetAssets) # ...but plot still has
>     # the outlier
> detach(SG)
> attach(SG)
> circlesPlot(A10Holdings, A3Yr, size = NetAssets) # Outlier is gone from
>     # the plot

So, here are my questions:
1. Is there a simpler / more idiomatic way in R than commenting out the
data in the data file to exclude some outliers in the data (i.e. to do data
trimming)? In EViews this is done by setting the sample.
2. Is the flushing of the cache happening as a result of the
detach/attach, or for some other reason?
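Two editorial notes (not from the thread): on (1), the idiomatic trim is a logical subset of the data frame, no file editing needed; on (2), attach() places a copy of the data frame on the search path, so re-reading the file into SG does not change the attached copy until you detach() and attach() again -- no cache flushing is involved. A minimal sketch with a made-up frame and an illustrative threshold:

```r
## Logical-index trimming instead of editing the data file.
d <- data.frame(x = c(1, 2, 3, 1000), y = c(2, 4, 6, 8))
d.trim <- subset(d, x < 100)   # equivalently: d[d$x < 100 & !is.na(d$x), ]
nrow(d.trim)                   # the outlier row is gone
```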

Thanks for any help,

Vivek Satsangi

