Re: [R] PERMANOVA+ and adonis in vegan package
On Sat, 2011-07-09 at 09:35 -0700, VG wrote:

Hi, I was wondering if someone can tell me what the difference is between the strata argument (function adonis in the vegan package) and using random effects in the PERMANOVA+ add-on package for PRIMER 6 when doing permutational MANOVAs? Is the way permutations are done the same? Thank you very much in advance, Vesna

By the looks of the PRIMER page on this new add-on, there are substantial differences between adonis() and PERMANOVA+. The current way that adonis() permutes data is either to shuffle the data totally at random, or at random *within* the levels of `strata`. This conditions the permutations on the clustering of samples, and reflects a null hypothesis under which samples are not fully freely exchangeable (independent), but exchangeable only within the groups/clusters of samples.

It appears PERMANOVA+ has facilities for more complex permutations than this. I have implemented some of these restricted permutations in my package `permute` - on R-Forge within the vegan stable of packages - which will hopefully be used within vegan for all its permutations, perhaps as soon as the end of the summer break.

As PERMANOVA+ is closed source and I have not seen any works from Marti Anderson describing the newer features of her software that are included in the new PRIMER add-on, I would suggest you enquire with the PRIMER people as to the similarities/differences between adonis() and PERMANOVA+ and report back your findings.

HTH

G

--
Dr. Gavin Simpson
ECRC, UCL Geography, Pearson Building, Gower Street, London WC1E 6BT, UK.
[e] gavin.simpsonATNOSPAMucl.ac.uk
[w] http://www.ucl.ac.uk/~ucfagls/
[w] http://www.freshwaters.org.uk

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
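The "shuffle at random *within* the levels of `strata`" idea described above can be illustrated in a few lines of base R. This is a minimal sketch of the permutation scheme, not vegan's actual implementation:

```r
## Permute row indices only within each stratum: each sample may swap
## places with samples from its own group, never with another group's.
set.seed(1)
strata <- rep(c("A", "B"), each = 4)
idx <- seq_along(strata)               # 1..8
perm <- ave(idx, strata, FUN = sample) # within-group shuffle, positions kept

## Every permuted index still belongs to its original stratum:
all(strata[perm] == strata)            # TRUE
```

`ave()` applies `sample()` separately to the indices of each group while leaving them in place, which is exactly the restricted exchangeability that `strata` encodes.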
Re: [R] Sweave in R 2.13.1 doesn't support cp1250 encoding
Prof Brian Ripley (ripley at stats.ox.ac.uk) writes:

On Mon, 11 Jul 2011, Tomaz wrote: I upgraded R on Windows XP from 2.12.2 to 2.13.1 and now I cannot process Rnw files with Windows cp1250 encoding. Sweave complains: "file.Rnw declares an encoding that Sweave does not know about". What can I do besides downgrading R? When will Sweave support more encodings? Has anybody found a solution?

Which is of course not an ISO standard encoding. One way out is to use the ISO encoding latin2, which is supported.

It is a great pity that you did not raise this during the alpha/beta/RC period of R 2.13.0, and that you did not give the 'at a minimum' information asked for in the posting guide, so we do not know your locale. Nor do we have the 'commented, minimal, self-contained, reproducible' example we asked for. These things happen because of lack of cooperation from users with unusual requirements. Note that in several months no one else has reported a need for cp1250.

The short answer is that, now that 2.13.1 was missed, it will be 2.14.0 and 2.13.1-patched: please do try a version of the latter dated tomorrow or later (r56361 or later), since there is no way we can test whether the change made is adequate without your example.

Next time you wish to request an enhancement to R for a very unusual use case, please write to R-devel with full details and an example. Very much preferably, do so during the pre-release testing period.

I don't think that the Windows-1250 (a.k.a. cp1250) encoding is an unusual use case, as it is the default native encoding for Rgui on Windows XP in Slovenia, Slovakia, Hungary, the Czech Republic, Romania, Albania, Croatia, Bosnia and Serbia. I'm providing a minimal example; it's a modified example from the Sweave installation.
%%% file: example-1.Rnw
\documentclass[a4paper]{article}
\usepackage[slovene]{babel}
\usepackage[cp1250]{inputenc}
\usepackage[T1]{fontenc}
\title{Sweave Example 1}
\author{Friedrich Leisch}
\begin{document}
\maketitle

In this example we embed parts of the examples from the \texttt{kruskal.test} help page into a \LaTeX{} document:

<<>>=
data(airquality, package = "datasets")
library(stats)
kruskal.test(Ozone ~ Month, data = airquality)
@

which shows that the location parameter of the Ozone distribution varies significantly from month to month. Finally we include a boxplot of the data:

\begin{center}
<<fig=TRUE, echo=FALSE>>=
library(graphics)
boxplot(Ozone ~ Month, data = airquality)
@
\end{center}

\end{document}
%%%

Excerpt from R session:

Sweave("example-1.Rnw")
Error: 'example-1.Rnw' declares an encoding that Sweave does not know about

sessionInfo()
R version 2.13.1 (2011-07-08)
Platform: i386-pc-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=Slovenian_Slovenia.1250  LC_CTYPE=Slovenian_Slovenia.1250
[3] LC_MONETARY=Slovenian_Slovenia.1250 LC_NUMERIC=C
[5] LC_TIME=Slovenian_Slovenia.1250

attached base packages:
[1] tools     stats     graphics  grDevices utils     datasets  methods
[8] base

Best regards, Tomaz
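The workaround suggested above (use latin2, an ISO standard encoding, instead of cp1250) amounts to re-saving the Rnw file in ISO 8859-2 and changing one preamble line. A minimal sketch, assuming the file has actually been re-encoded:

```latex
%% example-1.Rnw re-saved in ISO 8859-2 (latin2), which Sweave recognises
\usepackage[slovene]{babel}
\usepackage[latin2]{inputenc}  % instead of [cp1250]
\usepackage[T1]{fontenc}
```

Most editors (and tools such as iconv) can convert cp1250 text to ISO 8859-2; the two encodings cover the same Central European languages.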
Re: [R] fitdistr() Error
Any NA values or values outside the support region of your distribution?

Uwe

On 11.07.2011 23:21, Peter Maclean wrote:

I am trying to estimate a gamma distribution using real data and I am getting the following error messages. When I set a lower limit, the error message is:

  L-BFGS-B needs finite values of 'fn'

For other methods the error message is:

  Error in optim(x = c(0.1052867, 0.3472275, 2.057625, 0.329675, :
    non-finite finite-difference value [1]

The code works fine for simulated data (see below); I am using the same code.

#Grouped vector
n <- c(1:100)
yr <- c(1:100)
ny <- list(yr = yr, n = n)
require(utils)
ny <- expand.grid(ny)
y <- rgamma(1000, shape = 1.5, scale = 2)
Gdata <- cbind(ny, y)

#MLE estimation of gamma distribution parameters
library(MASS)
#Generate starting values
y <- as.numeric(Gdata$y)
me <- mean(y)
sde <- sd(y)
sh <- sqrt(me/sde)
sc <- sqrt(sde)/me
Gdata <- split(Gdata, Gdata$n)
parm <- lapply(Gdata, function(x) {
  y <- as.numeric(x$y)
  fitdistr(y, "gamma", list(shape = sh, scale = sc),
           #method = c("Nelder-Mead", "BFGS", "CG", "L-BFGS-B", "SANN"),
           lower = 0, method = "CG", control = list(maxit = 1))
})
parmss <- lapply(parm, function(x) x$estimate)
parmss <- t(as.data.frame(parmss))  #Estimates
parmsd <- lapply(parm, function(x) x$sd)
parmsd <- t(as.data.frame(parmsd))  #Standard errors
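Uwe's question can be turned into a quick pre-flight check before calling fitdistr(). A minimal sketch (the helper name `check_gamma_input` is invented for illustration): count the NAs and the values outside the gamma support, since the gamma density requires strictly positive observations and either problem makes the log-likelihood non-finite.

```r
## Hypothetical helper: count values fitdistr() cannot handle for a gamma fit.
check_gamma_input <- function(y) {
  c(n_na     = sum(is.na(y)),              # NAs give a non-finite likelihood
    n_nonpos = sum(y <= 0, na.rm = TRUE))  # outside the gamma support (y > 0)
}

check_gamma_input(c(0.105, 0.347, NA, -1, 0))
#     n_na n_nonpos
#        1        2
```

Running this on each group before the lapply() loop pinpoints which groups contain the offending values in the real data.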
Re: [R] Sweave in R 2.13.1 doesn't support cp1250 encoding
On 12.07.2011 09:01, Tomaz wrote:

Prof Brian Ripley (ripley at stats.ox.ac.uk) writes: Which is of course not an ISO standard encoding. One way out is to use the ISO encoding latin2, which is supported.

Have you read this and tried latin2?

Uwe Ligges
Re: [R] problem with 'predict'
flags <- c(rep(1, length(patient_indices)), rep(0, length(control_indices)))
# dataset is a data.frame and param the parameter to be analysed:
data1 <- dataset[, param][c(patient_indices, control_indices)]
fit1 <- glm(flags ~ data1, family = binomial)
new.data <- seq(0, 300, 10)
new.p <- predict(fit1, data.frame(newdata = new.data), type = "response")

Should (probably) have been ... the names of the RHS variables need to match exactly:

new.p <- predict(fit1, newdata = data.frame(data1 = new.data), type = "response")

Thanks, David and Dennis. That's the thing. I tried so many alterations, overlooking that I had mangled the old and new variable names, unable to see the obvious. Well, again a nice demonstration of the perils of copy and paste.

Thanks, Christian
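The fix above can be checked with a small self-contained sketch (simulated data, since the original `dataset` is not available): when the newdata column carries the model term's name, predict() returns one fitted probability per new value instead of silently falling back to the training data.

```r
## Simulated stand-in for the thread's data: a binary outcome modelled
## on a single numeric predictor named 'data1'.
set.seed(7)
data1 <- runif(50, 0, 300)
flags <- rbinom(50, 1, plogis((data1 - 150) / 50))
fit1  <- glm(flags ~ data1, family = binomial)

new.data <- seq(0, 300, 10)
## newdata column name matches the RHS variable 'data1' exactly:
new.p <- predict(fit1, newdata = data.frame(data1 = new.data),
                 type = "response")
length(new.p)   # 31: one prediction per new value
```

With a mismatched name (e.g. `data.frame(newdata = new.data)`), predict() would instead return 50 values, the fitted probabilities for the training rows.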
Re: [R] Spectral Coherence
On 12/07/11 09:04, Joseph Park wrote:

Greetings, I would like to estimate the spectral coherence between two time series. The stats spectrum() function returns a coh matrix which estimates (squared) coherence. A basic test from which I expect near-zero coherence:

x = rnorm(500)
y = rnorm(500)
xts = ts(x, frequency = 10)
yts = ts(y, frequency = 10)
gxy = spectrum(cbind(xts, yts))
plot(gxy$freq, gxy$coh)

yields a white spectrum of 1. Clearly I'm not using this correctly... or I misinterpret coh as a cross-spectral density estimate of coherence |Gxy|^2 / (Gxx Gyy). Thanks in advance!

By default spectrum() calls spec.pgram() with spans = NULL. The result is that it calculates the coherence between x and y as

  |I_{xy}(omega)|^2 / (I_{xx}(omega) * I_{yy}(omega))

where I_{xy}() is the cross-periodogram and I_{xx}() and I_{yy}() are the respective periodograms. This quantity will indeed be identically 1 --- see equation 10.1 in Bloomfield, second ed., page 203. It would be nice if the help on spectrum mentioned this.

According to Bloomfield, what is needed is not the periodograms but rather estimated spectra, i.e. smoothed versions of the periodograms, s_{xy}() etc. Equation 10.4 in Bloomfield, second ed., page 206 indicates that the spectral estimates satisfy an *inequality*

  |s_{xy}(omega)|^2 <= s_{xx}(omega) * s_{yy}(omega)

whence the coherence is always between 0 and 1.

To resolve the problem you need to specify the spans argument in the call to spectrum, e.g.

  gxy <- spectrum(cbind(xts, yts), spans = c(5, 7))

HTH

cheers, Rolf Turner
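Rolf's point can be verified directly: without smoothing the estimated coherence is identically 1, while with `spans` it spreads out over [0, 1] for independent white-noise series. A minimal sketch:

```r
## Two independent white-noise series, as in the original post.
set.seed(123)
xts <- ts(rnorm(500), frequency = 10)
yts <- ts(rnorm(500), frequency = 10)

## Raw periodograms (spans = NULL): coherence is identically 1.
raw <- spectrum(cbind(xts, yts), plot = FALSE)
range(raw$coh)       # essentially c(1, 1) at every frequency

## Smoothed spectral estimates: coherence now lies properly in [0, 1].
smoothed <- spectrum(cbind(xts, yts), spans = c(5, 7), plot = FALSE)
range(smoothed$coh)  # values spread out below 1, mostly near 0
```

The smoothing spans trade frequency resolution for a meaningful coherence estimate; larger spans give a smoother (and lower-variance) estimate.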
Re: [R] Named numeric vectors with the same value but different names return different results when used as thresholds for calculating true positives
Hi,

On 11.07.2011 22:57, Lyndon Estes wrote:

ctch[ctch$threshold == 3.5, ]
# [1] threshold val tp fp tn fn tpr fpr tnr fnr
# <0 rows> (or 0-length row.names)

this is the very effective FAQ 7.31 trap. http://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-doesn_0027t-R-think-these-numbers-are-equal_003f Welcome to the first circle of Patrick Burns' R Inferno!

Also, unname() is a more intuitive way of removing names. And I think your code is quite inefficient, because you calculate quantiles many times, which involves repeated ordering of x, and you may be using an inefficient bin size (either too small, and therefore calculating the same split many times, or too large, and then missing some splits). I'm a bit puzzled about what x and y are in your code, so any further advice is vague, but you might have a look at any package that calculates ROC curves, such as ROCR or pROC (and many more).

Hth

--
Eik Vettorazzi
Department of Medical Biometry and Epidemiology
University Medical Center Hamburg-Eppendorf
Martinistr. 52, 20246 Hamburg
T ++49/40/7410-58243  F ++49/40/7410-57790
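The FAQ 7.31 trap in miniature: a threshold built up by floating-point accumulation need not be bitwise equal to the literal you type, so `==` silently matches nothing. A small sketch of the trap and the tolerance-based fix:

```r
## Thresholds built by accumulation: 0.1, 0.2, 0.3, ... in double precision.
thresholds <- cumsum(rep(0.1, 10))

thresholds[3] == 0.3        # FALSE: 0.1 + 0.1 + 0.1 is not bitwise 0.3

## The fix for comparisons like ctch$threshold == 3.5: match with a
## tolerance instead of exact equality.
sel <- abs(thresholds - 0.3) < 1e-8
which(sel)                  # 3
```

`all.equal()` offers the same idea with a default relative tolerance: `isTRUE(all.equal(thresholds[3], 0.3))` is TRUE.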
[R] apply (or similar preferred) for multiple columns
Dear all,

I would like to use apply or a similar function from this family, but applying a function not to each single column (or row) but to each group of q columns. For example, I would like to apply a function FUN to the first q columns of matrix X, then to columns q+1 through 2*q, and so on. If I do apply(X, 2, FUN) it applies FUN to each column, not to every q columns. Is this possible with any similar function?

Thank you

Dimitris
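One possible approach (a sketch, not taken from the thread): split the column indices into groups of q and iterate over the groups with sapply(), passing each q-column block to FUN.

```r
## Apply FUN to successive blocks of q columns of a matrix X.
X <- matrix(1:24, nrow = 4)   # 6 columns, filled column-wise
q <- 2

## Group column indices: 1:2, 3:4, 5:6.
blocks <- split(seq_len(ncol(X)), ceiling(seq_len(ncol(X)) / q))

## FUN = sum on each q-column block; any FUN taking a submatrix works.
sapply(blocks, function(j) sum(X[, j]))
#   1   2   3
#  36 100 164
```

If FUN returns more than a scalar per block, replace sapply() with lapply() to keep the per-block results in a list.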
[R] Creating a zero matrix when a condition doesn't get it
Hi all,

I first create a matrix/data frame called da2 if another matrix, dacc2, satisfies some restrictions:

da2 <- da1[colSums(dacc2) < 9, ]
da2 <- da2[da2[, 13] <= 24, ]
write.csv(da2, file = paste('hggi', i, '.csv', sep = ''))

The thing is, if da2 does not pass the filters, no csv can be written because no row meets the conditions. How can I create anyway a csv with zeros, of one row and n columns (n being the number of columns of da2)? Do I need a loop?
[R] Brier score for extended Cox model
Dear all,

I would like to obtain the Brier score prediction error at different times t for an extended Cox model. Previously I have used the 'pec' function (pec{pec}) to obtain prediction error curves for standard Cox PH models, but now I have data in the counting process format (I have a covariate with a time-varying effect) and it seems that the pec function does not support the counting process format, or am I doing something wrong? Here's a (tiny) example of what I'm trying to do:

# Original survival data set:
dat
  time status x1
1  169      1  2
2  149      1 11
3  207      1 22
4  192      1 27
5  200      1 10

# Split original data at cutpoint 190. New data will be in counting process format:
dat.x = survSplit(dat, cut = 190, end = "time", event = "status", start = "start")

# New data set:
dat.x
   time status x1 start
1   169      1  2     0
2   149      1 11     0
3   190      0 22     0
4   190      0 27     0
5   190      0 10     0
8   207      1 22   190
9   192      1 27   190
10  200      1 10   190

# Load pec and fit Cox model:
library(pec)
models = list("Cox" = coxph(Surv(start, time, status) ~ x1, data = dat.x))

# Compute the apparent prediction error:
predError = pec(object = models, formula = Surv(start, time, status) ~ x1,
                data = dat.x, exact = TRUE, cens.model = "marginal",
                replan = "none", B = 0, verbose = TRUE)

Error in pec.list(object = models, formula = Surv(start, time, status) ~ :
  Survival response must at least consist of two columns: time and status.

Am I doing something wrong here, or is it not possible to apply the pec function to counting process data? If I can't use pec, perhaps someone knows of some other function I could use instead to get the Brier score at different times t when using the counting process approach. Any guidance on these questions is much appreciated.

Thanks, Ulf
[R] as.numeric
Dear R users,

After I imported data (csv format) into R, I printed it out, but it is in non-numeric format. I then used the as.numeric function; however, the output is really awful!

PE[1, 90:99]
        V90        V91        V92        V93        V94        V95        V96        V97        V98        V99
1 16.8467742 17.5853166 19.7400328 21.7277241 21.5015489 19.1922102 20.3351524 18.1615471 18.5479946 16.8983887

as.numeric(PE[1, 90:99])
[1] 11 10 11 10 11  9 10  9  9  8

How can I solve the above problem? Thanks so much!

Jessica
[R] Running R on a Computer Cluster in the Cloud - cloudnumbers.com
Dear R community,

cloudnumbers.com provides researchers and companies with the resources to perform high-performance calculations in the cloud. As cloudnumbers.com's community manager, I would like to invite you to register and test R on a computer cluster in the cloud for free: http://my.cloudnumbers.com/register

Our aim is to change the way research collaboration is done today by bringing together scientists and businesses from all over the world on a single platform. cloudnumbers.com is a Berlin (Germany) based international high-tech startup striving to enable everyone to benefit from the High Performance Computing advantages of the cloud. We provide easy access to applications running on any kind of computer hardware: from single-core high-memory machines up to 1000-core computer clusters. Our platform provides several advantages:

* Turn fixed into variable costs and pay only for the capacity you need. Watch our latest saving-costs video: http://www.youtube.com/watch?v=ln_BSVigUhg&feature=player_embedded
* Enter the cloud using an intuitive and user-friendly platform. Watch our latest cloudnumbers.com-in-a-nutshell video: http://www.youtube.com/watch?v=0ZNEpR_ElV0&feature=player_embedded
* Be released from ongoing technological obsolescence and continuous maintenance costs (e.g. linking to libraries or system dependencies).
* Accelerate your R, C, C++, Fortran, Python, ... calculations through parallel processing and great computing capacity - more than 1000 cores are available, and GPUs are coming soon.
* Share your results worldwide (coming soon).
* Get high-speed access to public databases.
* We have developed a security architecture that meets high requirements of data security and privacy. Read our security white paper: http://d1372nki7bx5yg.cloudfront.net/wp-content/uploads/2011/06/cloudnumberscom-security.whitepaper.pdf

This is only a selection of our top features. To get more information, check out our web page (http://www.cloudnumbers.com/) or follow our blog about cloud computing, HPC and HPC applications (with R): http://cloudnumbers.com/blog

Register and test for free now at cloudnumbers.com: http://my.cloudnumbers.com/register

We are looking forward to your feedback and consumer insights.

Best

Markus

--
Dr. rer. nat. Markus Schmidberger
Senior Community Manager
Cloudnumbers.com GmbH
Chausseestraße 6, 10119 Berlin
www.cloudnumbers.com
E-Mail: markus.schmidber...@cloudnumbers.com
Amtsgericht München, HRB 191138
Geschäftsführer: Erik Muttersbach, Markus Fensterer, Moritz v. Petersdorff-Campen
Re: [R] Creating a zero matrix when a condition doesn't get it
Hi,

You don't provide us with a reproducible example, so I can't provide you with actual code. But two approaches come to mind:

1. Create da2 with one row and n columns, then change the appropriate elements, if any, based on your conditions.

2. Do the conditional parts, then check to see whether da2 is empty. If it is, then replace the empty data frame with a data frame of one row and n columns.

Sarah

On Tue, Jul 12, 2011 at 3:51 AM, Trying To learn again tryingtolearnag...@gmail.com wrote:

Hi all, I first create a matrix/data frame called da2 if another matrix, dacc2, satisfies some restrictions:

da2 <- da1[colSums(dacc2) < 9, ]
da2 <- da2[da2[, 13] <= 24, ]
write.csv(da2, file = paste('hggi', i, '.csv', sep = ''))

The thing is, if da2 does not pass the filters, no csv can be written because no row meets the conditions. How can I create anyway a csv with zeros, of one row and n columns (n being the number of columns of da2)? Do I need a loop?

Rarely.

--
Sarah Goslee
http://www.functionaldiversity.org
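Sarah's second approach can be sketched in a few lines (names such as `da2` follow the thread; the column names here are invented for illustration): after filtering, test whether the data frame is empty and, if so, replace it with a single all-zero row before writing the csv.

```r
## Pretend the filters removed every row: a 0-row data frame remains.
da2 <- data.frame(a = numeric(0), b = numeric(0), c = numeric(0))

## Approach 2: if empty, substitute one row of zeros with the same columns.
if (nrow(da2) == 0) {
  da2 <- as.data.frame(matrix(0, nrow = 1, ncol = ncol(da2),
                              dimnames = list(NULL, names(da2))))
}

nrow(da2)   # 1: write.csv(da2, ...) now always has something to write
```

No loop is needed; the `if` guard runs once per file, right before the write.csv() call.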
Re: [R] as.numeric
Jessica, This would be easier to solve if you gave us more information, like str(PE). However, my guess is that your data somewhere has a nonnumeric value in that column, so the entire column is being imported as a factor. It's not really awful - R is converting those factor values to their numeric levels, just as you asked. The best solution is to find and deal with the nonnumeric value before you import your data (something else you did not tell us about). Failing that, you may find this useful: as.numeric(as.character(PE[1, 90:99])) Sarah On Tue, Jul 12, 2011 at 4:38 AM, Jessica Lam ma_lk...@yahoo.com.hk wrote: Dear R user, After I imported data (csv format) in R, I called it out. But it is in non-numeric format. Then using as.numeric function. However, the output is really awful ! PE[1,90:99] V90 V91 V92 V93 V94 V95 V96 V97 V98 V99 1 16.8467742 17.5853166 19.7400328 21.7277241 21.5015489 19.1922102 20.3351524 18.1615471 18.5479946 16.8983887 as.numeric(PE[1,90:99]) [1] 11 10 11 10 11 9 10 9 9 8 How can I solve the above problem?? Thanks so much! Jessica -- -- Sarah Goslee http://www.functionaldiversity.org
Re: [R] as.numeric
On 07/12/2011 06:38 PM, Jessica Lam wrote: Dear R user, After I imported data (csv format) in R, I called it out. But it is in non-numeric format. Then using as.numeric function. However, the output is really awful ! PE[1,90:99] V90 V91 V92 V93 V94 V95 V96 V97 V98 V99 1 16.8467742 17.5853166 19.7400328 21.7277241 21.5015489 19.1922102 20.3351524 18.1615471 18.5479946 16.8983887 as.numeric(PE[1,90:99]) [1] 11 10 11 10 11 9 10 9 9 8 How can I solve the above problem?? Hi Jessica, Try as.numeric(as.character(PE[1,90:99])) If that works, your variable has probably managed to become a factor. Jim
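The factor trap both replies describe can be shown in miniature: as.numeric() on a factor returns the internal level codes, while going through as.character() first recovers the printed values.

```r
## A numeric-looking column that was imported as a factor (e.g. because
## of a stray non-numeric entry elsewhere in the column).
f <- factor(c("16.8", "17.6", "19.7"))

as.numeric(f)                # 1 2 3  -- the level codes, not the values
as.numeric(as.character(f))  # 16.8 17.6 19.7 -- the actual values
```

This is exactly why Jessica's output collapsed to small integers: R converted the factor levels to their codes, just as asked.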
Re: [R] problem finding p-value for entropy in reldist package
Have you noticed that you sent your message to the R-help list only and forgot to include the original poster? You also forgot to cite the original question (and any other former parts of the thread, as far as there were any). Please do so when sending messages to a mailing list such as R-help.

Thanks, Uwe Ligges

On 11.07.2011 19:24, VictorDelgado wrote:

Hi Amy Wesolowski, I don't have a straightforward answer to your question. I have been working with reldist too, and the 'rpy' and 'rpluy' functions described by "Applying Relative Distribution Methods in R" are also not working here in my 2.9.1 R version. I think it's because they are reldist internal functions, so maybe it's possible that they only work with previous objects and set-ups...

But if you look at the internal parameters of reldist, you can set ci = TRUE; this constructs the confidence interval for entropy by the proportion of the original cohort. It is still unhelpful for understanding how this interval is constructed, and it also does not show the overall interval, but you can see the values with $ci.

From the Handcock and Morris (1998) paper it is possible to intuit that they are comparing the 0.00 entropy with the 95% confidence interval around the estimate. For example, in this article, page 74, they reach an overall entropy of 0.125; the lower 95% CI is 0.092. The 0.00 comparison is far below this lower bound, so it is reasonable to think the p-value is really 0.000. But this is only a clue to approximate the true p-value. We still need to see: 1) how this interval is constructed (I have no idea what distribution the entropy should have, and whether it changes with the data) and 2) knowing the first point, how to set alpha values.

Good luck,

Victor Delgado
cedeplar.ufmg.br Ph.D. student
www.fjp.mg.gov.br researcher
Re: [R] Gaussian low-pass filter
gfcoeffs <- function(s, n) {
  t <- seq(-n, n, 1)  ## assuming 2*n+1 taps
  exp(-(t^2 / (2 * s^2))) / sqrt(2 * pi * s^2)
}

2011/6/29 Martin Wilkes m.wil...@worc.ac.uk:

I want to filter my time series with a low-pass filter using a Gaussian smoothing function defined as:

  w(t) = (2*pi*sigma^2)^(-1/2) * exp(-t^2 / (2*sigma^2))

I was hoping to use an existing function to filter my data, but help.search and RSiteSearch produced no useful results. Can anyone tell me if there is an existing function that will do the job? If not, how would I begin to go about building such a filter?

Thanks

Martin Wilkes
University of Worcester

--
Cheers, jcb!
http://twitter.com/jcborras
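A usage sketch for the coefficients above (my own illustration, not part of the thread): normalise the weights so the filter has unit gain, then smooth a series with stats::filter() as a two-sided weighted moving average.

```r
## Gaussian low-pass coefficients, as posted above.
gfcoeffs <- function(s, n) {
  t <- seq(-n, n, 1)                        # 2*n + 1 taps
  exp(-(t^2 / (2 * s^2))) / sqrt(2 * pi * s^2)
}

w <- gfcoeffs(s = 2, n = 6)
w <- w / sum(w)                             # unit gain at zero frequency

## A noisy sine wave; the Gaussian kernel suppresses the high-frequency noise.
x <- sin(seq(0, 4 * pi, length.out = 200)) + rnorm(200, sd = 0.3)
x.smooth <- stats::filter(x, w, sides = 2)  # NA-padded at both ends
```

Without the normalisation, truncating the Gaussian to 2*n+1 taps would slightly attenuate the whole series, since the raw weights sum to a little less than 1.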
Re: [R] Running R on a Computer Cluster in the Cloud - cloudnumbers.com
On Tue, Jul 12, 2011 at 7:15 AM, Markus Schmidberger schmi...@in.tum.de wrote: This is only a selection of our top features. To get more information check out our web-page (http://www.cloudnumbers.com/) or follow our blog about cloud computing, HPC and HPC applications (with R): http://cloudnumbers.com/blog Register and test for free now at cloudnumbers.com: http://my.cloudnumbers.com/register We are looking forward to get your feedback and consumer insights. Spam? Anyway, I quite like Dogbert's insights: http://dilbert.com/strips/comic/2011-01-07/ Barry
[R] timezones - any practical solution?
Hello all,

Could someone help me with time zones in an understandable, practical way? I am completely stuck with this. I have googled for a while and read the manuals, but without finding a solution...

When data are imported from Excel 2007 into R (2.13), all time variables, depending on the date (summer or winter), get (unasked for!) a time zone addition: CEST (for summer dates) or CET (for winter dates).

Dataset
        Start       End1       End2  days2End1.from.Excel  days2End2.from.Excel  days2End1.in.R  days2End2.in.R
1  2010-01-01 2011-01-01 2012-01-01                   365                   730        365 days   730.0000 days
2  2010-02-01 2011-02-01 2012-01-01                   365                   699        365 days   699.0000 days
3  2010-03-01 2011-03-01 2012-01-01                   365                   671        365 days   671.0000 days
4  2010-04-01 2011-04-01 2012-01-01                   365                   640        365 days   640.0417 days
5  2010-05-01 2011-05-01 2012-01-01                   365                   610        365 days   610.0417 days
6  2010-06-01 2011-06-01 2012-01-01                   365                   579        365 days   579.0417 days
7  2010-07-01 2011-07-01 2012-01-01                   365                   549        365 days   549.0417 days
8  2010-08-01 2011-08-01 2012-01-01                   365                   518        365 days   518.0417 days
9  2010-09-01 2011-09-01 2012-01-01                   365                   487        365 days   487.0417 days
10 2010-10-01 2011-10-01 2012-01-01                   365                   457        365 days   457.0417 days
11 2010-11-01 2011-11-01 2012-01-01                   365                   426        365 days   426.0000 days
12 2010-12-01 2011-12-01 2012-01-01                   365                   396        365 days   396.0000 days

The variables 'days2End1.from.Excel' and 'days2End2.from.Excel' were calculated in Excel. The same calculation (with the same outcome!) I would like to be able to perform with R. The variables 'days2End1.in.R' and 'days2End2.in.R' were calculated with R:

Dataset$days2End1.from.Excel
[1] 365 365 365 365 365 365 365 365 365 365 365 365

Dataset$days2End1.in.R <- with(Dataset, End1 - Start)
Dataset$days2End1.in.R
Time differences in days
[1] 365 365 365 365 365 365 365 365 365 365 365 365
attr(,"tzone")
[1] ""

Dataset$days2End2.from.Excel
[1] 730 699 671 640 610 579 549 518 487 457 426 396

Dataset$days2End2.in.R <- with(Dataset, End2 - Start)
Dataset$days2End2.in.R
Time differences in days
[1] 730.0000 699.0000 671.0000 640.0417 610.0417 579.0417 549.0417 518.0417 487.0417 457.0417 426.0000 396.0000
attr(,"tzone")
[1] ""

Question 1: As you can see, 'Dataset$days2End2.in.R' gives a wrong 'day' calculation for the time period April until October, when CEST (summer) times are recorded: 640.0417 610.0417 579.0417 549.0417 518.0417 487.0417 457.0417, giving decimals where round days are expected (640 610 579 549 518 487 457). Can someone explain how to deal with this in R? What is the best way to calculate days in R and get correct results?

Question 2: As I only need to work with dates, without times and without time zones, I would be happy to remove them if possible. I already tried the trunc() function, but without success; the result doesn't change:

Dataset$days2End2.in.R.TRUNC <- with(Dataset, trunc(End2) - trunc(Start))
Dataset$days2End2.in.R.TRUNC
Time differences in days
[1] 730.0000 699.0000 671.0000 640.0417 610.0417 579.0417 549.0417 518.0417 487.0417 457.0417 426.0000 396.0000
attr(,"tzone")
[1] ""

I would be happy if someone could shed light on this. Many thanks in advance!

Laura
[R] time zone - any practical solution?
Hello all, Could someone help me with time zones in an understandable, practical way? I am completely stuck with this. I have googled for a while and read the manuals, but found no solution...

When data are imported from Excel 2007 into R (2.13), all time variables get an unasked-for time zone suffix depending on the date: CEST (for summer dates) or CET (for winter dates).

> Dataset
   Start      End1       End2       days2End1.from.Excel days2End2.from.Excel days2End1.in.R days2End2.in.R
1  2010-01-01 2011-01-01 2012-01-01 365 730 365 days 730.0000 days
2  2010-02-01 2011-02-01 2012-01-01 365 699 365 days 699.0000 days
3  2010-03-01 2011-03-01 2012-01-01 365 671 365 days 671.0000 days
4  2010-04-01 2011-04-01 2012-01-01 365 640 365 days 640.0417 days
5  2010-05-01 2011-05-01 2012-01-01 365 610 365 days 610.0417 days
6  2010-06-01 2011-06-01 2012-01-01 365 579 365 days 579.0417 days
7  2010-07-01 2011-07-01 2012-01-01 365 549 365 days 549.0417 days
8  2010-08-01 2011-08-01 2012-01-01 365 518 365 days 518.0417 days
9  2010-09-01 2011-09-01 2012-01-01 365 487 365 days 487.0417 days
10 2010-10-01 2011-10-01 2012-01-01 365 457 365 days 457.0417 days
11 2010-11-01 2011-11-01 2012-01-01 365 426 365 days 426.0000 days
12 2010-12-01 2011-12-01 2012-01-01 365 396 365 days 396.0000 days

> Dataset$Start
 [1] "2010-01-01 CET"  "2010-02-01 CET"  "2010-03-01 CET"  "2010-04-01 CEST" "2010-05-01 CEST" "2010-06-01 CEST" "2010-07-01 CEST" "2010-08-01 CEST" "2010-09-01 CEST" "2010-10-01 CEST" "2010-11-01 CET"  "2010-12-01 CET"
> Dataset$End1
 [1] "2011-01-01 CET"  "2011-02-01 CET"  "2011-03-01 CET"  "2011-04-01 CEST" "2011-05-01 CEST" "2011-06-01 CEST" "2011-07-01 CEST" "2011-08-01 CEST" "2011-09-01 CEST" "2011-10-01 CEST" "2011-11-01 CET"  "2011-12-01 CET"
> Dataset$End2
 [1] "2012-01-01 CET" "2012-01-01 CET" "2012-01-01 CET" "2012-01-01 CET" "2012-01-01 CET" "2012-01-01 CET" "2012-01-01 CET" "2012-01-01 CET" "2012-01-01 CET" "2012-01-01 CET" "2012-01-01 CET" "2012-01-01 CET"

Variables 'days2End1.from.Excel' and 'days2End2.from.Excel' are calculated in Excel. I would like to be able to perform the same calculation (with the same outcome!) in R. Variables 'days2End1.in.R' and 'days2End2.in.R' are calculated with R.

> Dataset$days2End1.from.Excel
 [1] 365 365 365 365 365 365 365 365 365 365 365 365
> Dataset$days2End1.in.R <- with(Dataset, End1 - Start)
> Dataset$days2End1.in.R
Time differences in days
 [1] 365 365 365 365 365 365 365 365 365 365 365 365
attr(,"tzone")
[1] ""
> Dataset$days2End2.from.Excel
 [1] 730 699 671 640 610 579 549 518 487 457 426 396
> Dataset$days2End2.in.R <- with(Dataset, End2 - Start)
> Dataset$days2End2.in.R
Time differences in days
 [1] 730.0000 699.0000 671.0000 640.0417 610.0417 579.0417 549.0417 518.0417 487.0417 457.0417 426.0000 396.0000
attr(,"tzone")
[1] ""

Question 1: As you can see, 'Dataset$days2End2.in.R' gives a wrong 'day' calculation for the period April until October, when CEST (summer) times are recorded: 640.0417 610.0417 579.0417 549.0417 518.0417 487.0417 457.0417, i.e. decimals on days where round days are expected (640 610 579 549 518 487 457). Can someone explain to me how to deal with this in R? What is the best way to calculate days in R and get correct results?

Question 2: As I only need to work with dates, without times and without time zones, I would be happy to remove them if possible. I already tried the trunc() function, but without success. The result doesn't change.

> Dataset$days2End2.in.R.TRUNC <- with(Dataset, trunc(End2) - trunc(Start))
> Dataset$days2End2.in.R.TRUNC
Time differences in days
 [1] 730.0000 699.0000 671.0000 640.0417 610.0417 579.0417 549.0417 518.0417 487.0417 457.0417 426.0000 396.0000
attr(,"tzone")
[1] ""

I would be happy if someone could shed light on this. Many thanks in advance! Laura
Re: [R] Sweave in R 2.13.1 doesn't support cp1250 encoding
Uwe Ligges <ligges at statistik.tu-dortmund.de> writes: On 12.07.2011 09:01, Tomaz wrote: Prof Brian Ripley <ripley at stats.ox.ac.uk> writes: On Mon, 11 Jul 2011, Tomaz wrote: I upgraded R on Windows XP from 2.12.2 to 2.13.1 and now I cannot process Rnw files with the Windows cp1250 encoding. Sweave complains: Which is of course not an ISO standard encoding. One way out is to use the ISO encoding latin2, which is supported. Have you read this and tried latin2? Uwe Ligges Windows XP doesn't have a latin2 locale, and I think that Sweave should support all locales (encodings) that are supported by the LaTeX package inputenc. If someone can show me how to set Rgui/Rterm, Emacs/ESS and Sweave/LaTeX to use UTF-8, that would be helpful. My best current setup was to set all tools to use cp1250. Best regards, Tomaz
Re: [R] Spectral Coherence
Thank you! I should have realized that without explicitly engaging some form of averaging (which raises a windowing question) the coh is always 1. On 7/12/2011 4:48 AM, Rolf Turner wrote: On 12/07/11 09:04, Joseph Park wrote: Greetings, I would like to estimate the spectral coherence between two time series. stats::spectrum() returns a coh component which estimates the (squared) coherence. A basic test from which I expect near-zero coherence:

x = rnorm(500)
y = rnorm(500)
xts = ts(x, frequency = 10)
yts = ts(y, frequency = 10)
gxy = spectrum(cbind(xts, yts))
plot(gxy$freq, gxy$coh)

yields a white spectrum of 1. Clearly I'm not using this correctly... or I misinterpret the coh as a cross-spectral density estimate of coherence |Gxy|^2 / (Gxx Gyy). Thanks in advance! By default spectrum() calls spec.pgram() with spans=NULL. The result is that it calculates the coherence between x and y as

|I_{xy}(omega)|^2 / (I_{xx}(omega) * I_{yy}(omega))

where I_{xy}() is the cross periodogram and I_{xx}() and I_{yy}() are the respective periodograms. This quantity will indeed be identically 1 --- see equation 10.1 in Bloomfield, second ed., page 203. It would be nice if the help on spectrum mentioned this. According to Bloomfield, what is needed is not the periodograms but rather estimated spectra, smoothed versions of the periodograms, s_{xy}() etc. Equation 10.4 in Bloomfield, second ed., page 206 indicates that the spectral estimates satisfy an *inequality*

|s_{xy}(omega)|^2 <= s_{xx}(omega) * s_{yy}(omega)

whence the coherence is always between 0 and 1. To resolve the problem you need to specify the spans argument in the call to spectrum, e.g.

gxy <- spectrum(cbind(xts, yts), spans = c(5, 7))

HTH cheers, Rolf Turner
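For the archive, a minimal runnable sketch of the point above (object names are my own invention, not from the thread):

```r
## Coherence of two independent white-noise series: the raw
## periodogram gives coh == 1 at every frequency, while a smoothed
## spectral estimate (via spans) gives values between 0 and 1.
set.seed(1)
xts <- ts(rnorm(500), frequency = 10)
yts <- ts(rnorm(500), frequency = 10)

g.raw <- spectrum(cbind(xts, yts), plot = FALSE)
range(g.raw$coh)   # essentially 1 everywhere

g.sm <- spectrum(cbind(xts, yts), spans = c(5, 7), plot = FALSE)
plot(g.sm$freq, g.sm$coh, type = "l",
     xlab = "frequency", ylab = "squared coherence")
```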
Re: [R] R-help Digest, Vol 101, Issue 12
I will be out of the office from 7 July till 14 July with no access to my emails. In urgent cases please contact Ms. Edit Kárpáti (karpati.e...@gyemszi.hu). With regards, Peter Mihalicza
Re: [R] Delete row takes ages. alternative?!
Thanks to both of you for your help! This is my conclusion based on Rolf's and David's suggestions (x = a data.frame): when looping over a data.frame and deleting certain rows with x = x[-i,], it is better to collect all rows which need to be deleted in a vector and do one final delete step:

collecting: vector = append(vector, i)
then: x = x[-vector,]

cheers, sven -- View this message in context: http://r.789695.n4.nabble.com/Delete-row-takes-ages-alternative-tp3656949p3661979.html Sent from the R help mailing list archive at Nabble.com.
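A small sketch of the difference for the archive (object names and data are mine, not from the thread):

```r
## Deleting rows one at a time copies the data frame on every
## iteration; collecting the indices first does a single subset.
x <- data.frame(a = 1:5000, b = rnorm(5000))

slow <- x
for (i in rev(which(slow$b < 0))) slow <- slow[-i, ]  # one copy per delete

idx  <- which(x$b < 0)   # collect all offending rows first
fast <- x[-idx, ]        # one final delete step

## or simply keep the rows you want, fully vectorised:
keep <- x[x$b >= 0, ]
```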
Re: [R] Help in error removal
Dear all, I am new to programming in R. I deal with microarray data, which is a data frame object type. I need to carry out a few statistical procedures on this, one of them being the Pearson correlation. I need to do this between each row, which is a gene. So the desired result is a square matrix with the Pearson correlation value between each pair of rows. So the first column would be (1,1)=0, (1,2), (1,3) and so on. I uploaded the data frame as a:

a <- read.csv("a.csv", header = TRUE, row.names = 1)

and then I started the script:

pearson <- function(a){
  r <- matrix[x,y]
  for(x in as.vector(a[,1], mode="double")){
    x++{
    for(y in as.vector(a[2,], mode="double")){
      y <- x+1
      x++
      {
      r <- (cor.test(as.vector(as.matrix(a)[x,], mode="double"),
                     as.vector(as.matrix(a)[y,], mode="double"))$p.value)
      }
    }
    }
    r[x,y]==r[y,x]
  }
  return(r)
}

However whenever I run it, I get the error:

> pearson(a)
Error in matrix[x, y] : object of type 'closure' is not subsettable

Please help! Best Regards Sumona Mitra
[R] MC-Simulation with foreach: Some cores finish early
Dear R-Users, I run an MC simulation using the packages foreach and doMC on a PowerMac with 24 cores. There are roughly a hundred parameter sets, and I parallelized the program in a way that each core computes one of these parameter sets completely. The problem is that some parameter sets take a lot longer to compute than others. After a while only a quarter of the cores are still computing (their first parameter set), while others are already finished. But some parameter sets are still untouched. I have thought about changing my parameter file so that every combination takes roughly the same time (longer computations are offset with fewer repetitions), but maybe there is a more elegant solution. Is it somehow possible to wake the finished cores while there is still work to do? ;-) Sincerely, H. Bumann -- View this message in context: http://r.789695.n4.nabble.com/MC-Simulation-with-foreach-Some-cores-finish-early-tp3661998p3661998.html Sent from the R help mailing list archive at Nabble.com.
Re: [R] Named numeric vectors with the same value but different names return different results when used as thresholds for calculating true positives
Also note that the statistical method you are using does not seem in line with decision theory, and you are assuming that the threshold actually exists. It is seldom the case that the relationship of a predictor with the response is flat on at least one side of the threshold. A smooth prediction model may be in order. Frank

Eik Vettorazzi wrote: Hi, Am 11.07.2011 22:57, schrieb Lyndon Estes:

> ctch[ctch$threshold == 3.5, ]
# [1] threshold val tp fp tn fn tpr fpr tnr fnr
# <0 rows> (or 0-length row.names)

this is the very effective FAQ 7.31 trap: http://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-doesn_0027t-R-think-these-numbers-are-equal_003f Welcome to the first circle of Patrick Burns' R Inferno! Also, unname() is a more intuitive way of removing names. And I think your code is quite inefficient, because you calculate quantiles many times, which involves repeated ordering of x, and you may use an inefficient bin size (either too small, and therefore calculating the same split many times, or too large, and then missing some splits). I'm a bit puzzled what x and y are in your code, so any further advice is vague, but you might have a look at any package that calculates ROC curves, such as ROCR or pROC (and many more). Hth -- Eik Vettorazzi, Department of Medical Biometry and Epidemiology, University Medical Center Hamburg-Eppendorf, Martinistr. 52, 20246 Hamburg, T ++49/40/7410-58243, F ++49/40/7410-57790

- Frank Harrell, Department of Biostatistics, Vanderbilt University -- View this message in context: http://r.789695.n4.nabble.com/Named-numeric-vectors-with-the-same-value-but-different-names-return-different-results-when-used-as-s-tp3660833p3662030.html Sent from the R help mailing list archive at Nabble.com.
Re: [R] time zone - any practical solution?
Hi Jim, by dropping them down that way it gives 1 day less than it should, for both time zone notations, CEST and CET.

> start
 [1] "2002-09-04 CEST" "2000-07-27 CEST" "2003-01-04 CET"  "2001-06-29 CEST" "2005-01-12 CET"  "2000-05-28 CEST" "2002-06-01 CEST" "2000-06-02 CEST" "2000-02-27 CET"  "2000-09-29 CEST" "2003-10-22 CEST" "2002-06-03 CEST"
[13] "2004-12-30 CET"  "2000-04-07 CEST" "2006-02-03 CET"  "2003-06-12 CEST" "2004-07-15 CEST" "2000-04-29 CEST" "2000-05-06 CEST" "2004-10-27 CEST"
> start <- format(as.Date(start, "%Y-%m-%d"), "%Y-%m-%d")
> start
 [1] "2002-09-03" "2000-07-26" "2003-01-03" "2001-06-28" "2005-01-11" "2000-05-27" "2002-05-31" "2000-06-01" "2000-02-26" "2000-09-28" "2003-10-21" "2002-06-02" "2004-12-29" "2000-04-06" "2006-02-02" "2003-06-11" "2004-07-14"
[18] "2000-04-28" "2000-05-05" "2004-10-26"

2011/7/12 Jim Lemon <j...@bitwrit.com.au>: On 07/12/2011 08:58 PM, B Laura wrote: [original message quoted in full -- snipped]
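For the archive, a sketch of what seems to be happening here (the time zone name is my assumption about the poster's locale): as.Date() on a POSIXct value converts via UTC by default, so a midnight CEST stamp slips back one calendar day unless a tz is passed.

```r
## midnight CEST is 22:00 UTC of the previous day
x <- as.POSIXct("2002-09-04", tz = "Europe/Berlin")

as.Date(x)                        # 2002-09-03, via the default tz = "UTC"
as.Date(x, tz = "Europe/Berlin")  # 2002-09-04, the calendar date wanted
```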
Re: [R] time zone - any practical solution?
On Tue, Jul 12, 2011 at 6:58 AM, B Laura gm.spam2...@gmail.com wrote: Hello all, Could someone help me with time zones in an understandable, practical way? I am completely stuck with this. I have googled for a while and read the manuals, but found no solution... When data are imported from Excel 2007 into R (2.13), all time variables get an unasked-for time zone suffix depending on the date: CEST (for summer dates) or CET (for winter dates). Read http://rwiki.sciviews.org/doku.php?id=tips:data-io:ms_windowss=excel which gives many ways of reading Excel into R, and read R News 4/1, which discusses appropriate R classes to use (you would be best to use Date, not POSIXct, in which case you could not have time zone problems in the first place) and the internal representations of R vs. Excel. -- Statistics Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com
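A minimal sketch of this suggestion (the inline CSV stands in for the poster's Excel export):

```r
## Reading the columns as class Date (no time, no time zone) makes
## day differences come out as whole numbers.
txt <- "Start,End1,End2
2010-04-01,2011-04-01,2012-01-01
2010-05-01,2011-05-01,2012-01-01"
Dataset <- read.csv(text = txt, colClasses = "Date")

Dataset$End1 - Dataset$Start  # 365 365 days
Dataset$End2 - Dataset$Start  # 640 610 days -- no stray 0.0417 fractions
```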
Re: [R] as.numeric
It may be helpful to make sure that, in the dialog that pops up when saving a spreadsheet to CSV, the option "Save cell content as shown" is checked - that would leave numbers as numbers, not wrapped in quotes. That has helped me at least in a similar situation! Rgds, Rainer

On Tuesday 12 July 2011 06:09:18 Sarah Goslee wrote: Jessica, This would be easier to solve if you gave us more information, like str(PE). However, my guess is that your data somewhere has a non-numeric value in that column, so the entire column is being imported as a factor. It's not really awful - R is converting those factor values to their numeric levels, just as you asked. The best solution is to find and deal with the non-numeric value before you import your data (something else you did not tell us about). Failing that, you may find this useful:

as.numeric(as.character(PE[1, 90:99]))

Sarah On Tue, Jul 12, 2011 at 4:38 AM, Jessica Lam ma_lk...@yahoo.com.hk wrote: Dear R user, After I imported data (csv format) into R, I called it out. But it is in non-numeric format. I then used the as.numeric function. However, the output is really awful!

> PE[1,90:99]
        V90       V91       V92       V93       V94       V95       V96       V97       V98       V99
1 16.8467742 17.5853166 19.7400328 21.7277241 21.5015489 19.1922102 20.3351524 18.1615471 18.5479946 16.8983887
> as.numeric(PE[1,90:99])
 [1] 11 10 11 10 11  9 10  9  9  8

How can I solve the above problem?? Thanks so much! Jessica
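The factor trap in miniature (a made-up vector, not Jessica's data):

```r
## as.numeric() on a factor returns the internal level codes,
## not the printed values.
f <- factor(c("16.85", "17.59", "19.74"))
as.numeric(f)                # 1 2 3 -- the level codes
as.numeric(as.character(f))  # 16.85 17.59 19.74 -- the actual values
as.numeric(levels(f))[f]     # same values, converting each level only once
```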
Re: [R] MC-Simulation with foreach: Some cores finish early
peter_petersen <henning.bumann at gmail.com> writes: [question quoted in full -- snipped] It sounds to me like this would require writing an entire batch scheduling system within R -- i.e., the system would have to maintain a queue and track which cores were finished. I'd love to know if someone's written it, but I sort of doubt it ...
Re: [R] Sweave in R 2.13.1 doesn't support cp1250 encoding
On 11-07-12 6:42 AM, Tomaz wrote: Uwe Ligges <ligges at statistik.tu-dortmund.de> writes: [earlier exchange quoted in full -- snipped] Windows XP doesn't have a latin2 locale, and I think that Sweave should support all locales (encodings) that are supported by the LaTeX package inputenc. If someone can show me how to set Rgui/Rterm, Emacs/ESS and Sweave/LaTeX to use UTF-8, that would be helpful. My best current setup was to set all tools to use cp1250. Have you tried following Brian's advice, and testing the new version? It works for me. Duncan Murdoch
[R] matplot with dates/times on horizontal axis
matplot(timestamp, xymatrix, type = 'l')

where timestamp is a vector filled with POSIXct objects and xymatrix is a numeric 2x2 matrix, plots, but the horizontal axis labels are raw unformatted timestamps. I would like to format these with any of the available codes for strftime, for instance format = "%H:%M". Passing a vector of formatted strings does not work. Any obvious other ways fail on me as well. Any ideas to make this work? Thanks in advance, Alex van der Spek
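For the archive, one approach that should work (data invented; axis.POSIXct() is the relevant base-graphics helper): suppress the default x axis, then draw a formatted POSIXct axis.

```r
## hourly timestamps and two series to overlay
timestamp <- as.POSIXct("2011-07-12 00:00", tz = "UTC") + 3600 * (0:23)
xymatrix  <- cbind(sin((0:23) / 4), cos((0:23) / 4))

matplot(timestamp, xymatrix, type = "l", xaxt = "n", xlab = "time")
axis.POSIXct(1, x = timestamp, format = "%H:%M")  # labels as HH:MM
```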
Re: [R] time zone - any practical solution?
Dear Gabor, http://rwiki.sciviews.org/doku.php?id=tips:data-io:ms_windowss=excel doesn't describe handling dates with daylight saving time issues. The R class Date can remove the time and time zone, but when calculating day differences between two such converted variables the same problem appears as when handling them without Date. R News 4/1 doesn't provide a solution to this either. I have read and struggled with this stuff for 3 days. Anyone else who could help with this? Regards, Laura.

2011/7/12 Gabor Grothendieck ggrothendi...@gmail.com: [previous reply quoted in full -- snipped]
[R] applying function to multiple columns of a matrix
Hi, I want to apply a function to a matrix, taking the columns 3 by 3. I could use a for loop:

for(i in 1:3){ # here I assume my data matrix has 9 columns
  j = i*3
  set = my.data[, c(j-2, j-1, j)]
  my.function(set)
}

which looks cumbersome and possibly slow. I was hoping there is some function in the apply()/lapply() families that could take 3 columns at a time. I thought of turning my.data into a list, then using lapply():

new.data = list(my.data[,1:3], my.data[,4:6], my.data[,7:9])
lapply(new.data, my.function)

but that might incur too much memory penalty, and it does have the issue of requiring a for loop to create the list (not all my data is conveniently of 9 columns only). Any suggestion would be much appreciated. Bw Federico -- Federico C. F. Calboli, Department of Epidemiology and Biostatistics, Imperial College, St. Mary's Campus, Norfolk Place, London W2 1PG, Tel +44 (0)20 75941602, Fax +44 (0)20 75943193, f.calboli [.a.t] imperial.ac.uk, f.calboli [.a.t] gmail.com
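One apply-family idiom that avoids both the loop and the hand-built list (a sketch; my.function is stood in by colMeans):

```r
## split the column indices into consecutive groups of 3, then
## lapply over the groups -- works for any multiple-of-3 width
my.data <- matrix(rnorm(45), ncol = 9)
my.function <- colMeans   # stand-in for the real function

groups <- split(seq_len(ncol(my.data)),
                (seq_len(ncol(my.data)) - 1) %/% 3)
results <- lapply(groups, function(idx) my.function(my.data[, idx, drop = FALSE]))
length(results)   # 3 groups of columns: 1:3, 4:6, 7:9
```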
[R] fixed effects Tobit, Honore style?
Hi all, Is there any code to run fixed effects Tobit models in the style of Honore (1992) in R? (The original Honore article is here: http://www.jstor.org/sici?sici=0012-9682%28199205%2960%3A3%3C533%3ATLALSE%3E2.0.CO%3B2-2) Cheers David
[R] Print file updated/created date to console?
Hello, Are there any built in or user defined functions for printing the date created or date updated for a given file? Ideally a function that works across operating systems. Thanks! Scott Chamberlain
Re: [R] Help in error removal
On Jul 12, 2011, at 7:27 AM, Mitra, Sumona wrote: Dear all, I am new to programming in R.

You seem to think there is a ++ operation in R. That is not so.

I deal with microarray data, which is a data frame object type. I need to carry out a few statistical procedures on this, one of them being the Pearson correlation. I need to do this between each row, which is a gene. So the desired result is a square matrix with the Pearson correlation value between each row. So the first column would be (1,1)=0, (1,2), (1,3) and so on.

I do not understand what that means. You should offer either a minimal dataset or at the very least the results of str(a).

I uploaded the data frame as a:

If by that you mean you made a failed effort at attaching the data in a file, then you need to read the Posting Guide for what the server will accept as a file type.

a <- read.csv("a.csv", header = TRUE, row.names = 1)

and then I started the script:

pearson <- function(a){
  r <- matrix[x,y]
  for(x in as.vector(a[,1], mode="double")){

I do not see a need for as.vector here or at any point later. a[,1] would already be a vector, and if it is not numeric to begin with, then you are going to get junk.

[remainder of the quoted function and error message snipped]

Please help! Best Regards Sumona Mitra
David Winsemius, MD
West Hartford, CT
Re: [R] Print file updated/created date to console?
Hi, file.info() does that. Cheers

Am 12.07.2011 15:29, schrieb Scott Chamberlain: [question quoted in full -- snipped]

-- Eik Vettorazzi, Institut für Medizinische Biometrie und Epidemiologie, Universitätsklinikum Hamburg-Eppendorf, Martinistr. 52, 20246 Hamburg, T ++49/40/7410-58243, F ++49/40/7410-57790
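For the archive, a short sketch of file.info() in use (temp file invented):

```r
f <- tempfile()
writeLines("hello", f)

info <- file.info(f)
info$mtime   # last modification time (POSIXct)
info$ctime   # "creation" time on Windows; last status change on Unix
info$atime   # last access time
```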
Re: [R] Help in error removal
On 12 July 2011 12:27, Mitra, Sumona sumona.mi...@kcl.ac.uk wrote: Dear all, I am new to programming in R.

You sure are ;-)

I deal with microarray data, which is a data frame object type. I need to carry out a few statistical procedures on this, one of them being the Pearson correlation. I need to do this between each row, which is a gene. So the desired result is a square matrix with the Pearson correlation value between each row. So the first column would be (1,1)=0, (1,2), (1,3) and so on. I uploaded the data frame as a:

a <- read.csv("a.csv", header = TRUE, row.names = 1)

and then I started the script:

pearson <- function(a){
  r <- matrix[x,y]

I bet the problem you are getting is here. You want r to be an x by y matrix. To do this, try r <- matrix(nrow=x, ncol=y). But you haven't defined x and y, unless we are missing that part of your code. As I understand it, you want the correlation matrix between all the rows of your matrix. If so, then look at the help file for cor (i.e. type ?cor). You will find that it automatically computes the correlations between all columns of a matrix. So, once your data is correctly read in, you should be able to just do:

cor(t(a))

  for(x in as.vector(a[,1], mode="double")){
    x++{
    for(y in as.vector(a[2,], mode="double")){
      y <- x+1
      x++

This code looks like a horrible mess. It's almost never right to loop through your vectors. In addition, there is no such thing as ++, as somebody mentioned.

[remainder of the quoted function and error message snipped]
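The cor(t(a)) suggestion in the reply above, on a toy gene-by-sample data frame: transposing makes the genes the columns, so cor() returns the gene-by-gene Pearson correlation matrix in one call.

```r
## Toy stand-in for the poster's microarray data: 3 genes x 3 samples.
a <- data.frame(s1 = c(1, 4, 2), s2 = c(2, 5, 1), s3 = c(3, 7, 5),
                row.names = c("g1", "g2", "g3"))
r <- cor(t(a))   # 3 x 3 symmetric matrix of row-wise correlations
diag(r)          # all 1: each gene correlated with itself
```

No loops are needed; cor() handles all pairs of columns (here, rows of the original data) at once.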
[R] RES: applying function to multiple columns of a matrix
Hi Federico. I would keep the data as it is, create two small vectors referring to the ranges and use mapply (like sapply, but over multiple argument vectors) for the function. Hope the example below is helpful, although as usual someone out there will have a better solution for it. dta <- c() for (i in 1:12) dta <- cbind(dta, matrix(i,5,1)) dta [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [1,] 1 2 3 4 5 6 7 8 9 10 11 12 [2,] 1 2 3 4 5 6 7 8 9 10 11 12 [3,] 1 2 3 4 5 6 7 8 9 10 11 12 [4,] 1 2 3 4 5 6 7 8 9 10 11 12 [5,] 1 2 3 4 5 6 7 8 9 10 11 12 rng.a <- seq(1,10,by=3) rng.b <- seq(3,12,by=3) rng.a [1] 1 4 7 10 rng.b [1] 3 6 9 12 mapply(x=rng.a, y=rng.b, function(x,y) sum(dta[,c(x:y)])) [1] 30 75 120 165 Cheers, Filipe -Original message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On behalf of Federico Calboli Sent: Tuesday, 12 July 2011 10:07 To: r-help Subject: [R] applying function to multiple columns of a matrix Hi, I want to apply a function to a matrix, taking the columns 3 by 3. I could use a for loop: for(i in 1:3){ # here I assume my data matrix has 9 columns j = i*3 set = my.data[,c(j-2,j-1,j)] my.function(set) } which looks cumbersome and possibly slow. I was hoping there is some function in the apply()/lapply() families that could take 3 columns at a time. I thought of turning my.data into a list, then using lapply() new.data = list(my.data[,1:3], my.data[,4:6], my.data[,7:9]) lapply(new.data, my.function) but that might incur too much memory penalty and does have the issue of requiring a for loop to create the list (not all my data is conveniently of 9 columns only). Any suggestion would be much appreciated. Bw Federico -- Federico C. F. Calboli Department of Epidemiology and Biostatistics Imperial College, St.
Mary's Campus Norfolk Place, London W2 1PG Tel +44 (0)20 75941602 Fax +44 (0)20 75943193 f.calboli [.a.t] imperial.ac.uk f.calboli [.a.t] gmail.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. This message and its attachments may contain confidential and/or privileged information. If you are not the addressee, please, advise the sender immediately by replying to the e-mail and delete this message. Este mensaje y sus anexos pueden contener información confidencial o privilegiada. Si ha recibido este e-mail por error por favor bórrelo y envíe un mensaje al remitente. Esta mensagem e seus anexos podem conter informação confidencial ou privilegiada. Caso não seja o destinatário, solicitamos a imediata notificação ao remetente e exclusão da mensagem. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Print file updated/created date to console?
Eik, Thanks very much! Scott On Tuesday, July 12, 2011 at 8:34 AM, Eik Vettorazzi wrote: Hi, file.info() does that. Cheers On 12.07.2011 15:29, Scott Chamberlain wrote: Hello, Are there any built-in or user-defined functions for printing the date created or date updated for a given file? Ideally a function that works across operating systems. Thanks! Scott Chamberlain [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Eik Vettorazzi Institut für Medizinische Biometrie und Epidemiologie Universitätsklinikum Hamburg-Eppendorf Martinistr. 52 20246 Hamburg T ++49/40/7410-58243 F ++49/40/7410-57790 [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] FW: lasso regression
Hi, I am trying to do a lasso regression using the lars package with the following data (see attached): FastestTime WinPercentage PlacePercentage ShowPercentage BreakAverage FinishAverage Time7Average Time3Average Finish 116.90 0.14 0.14 0.29 4.43 3.29 117.56 117.77 5.00 116.23 0.29 0.43 0.14 6.14 2.14 116.84 116.80 2.00 116.41 0.00 0.14 0.29 5.71 3.71 117.24 117.17 4.00 115.80 0.57 0.00 0.29 2.14 2.57 116.21 116.53 6.00 117.76 0.14 0.14 0.43 5.43 3.57 118.57 118.87 3.00 117.69 0.14 0.14 0.00 4.71 4.00 118.69 118.60 6.00 116.46 0.14 0.00 0.00 5.14 5.00 118.50 118.97 5.00 119.77 0.00 0.00 0.14 4.57 4.14 120.74 121.03 4.00 116.81 0.14 0.29 0.00 4.86 3.57 117.63 117.40 5.00 117.66 0.14 0.14 0.14 4.57 4.71 119.19 120.57 7.00 #load Data crs- read.csv(file:///C:/temp/Horse//horseracing.csvfile:///C:\temp\Horse\horseracing.csv, na.strings=c(,, NA, , ?), encoding=UTF-8) ## define x and y x= x-crs[,9]#predictor variables y= y-crs[1:8,] #response variable library(lars) cv.lars(x, y, K=10, trace=TRUE, plot.it = TRUE,se = TRUE, type=lasso) and I get: LASSO sequence Error in one %*% x : requires numeric/complex matrix/vector arguments Any idea on what I am doing wrong? Thank you!! Sincerely, tom [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] RES: applying function to multiple columns of a matrix
I just realised that: apply(matrix(1:dim(my.data)[2], nrow =3), 2, function(x){my.function(my.data[,x])}) is the simplest possible method. bw F On 12 Jul 2011, at 14:44, Filipe Leme Botelho wrote: Hi Frederico. I would keep the data as it is, create two small vectors referring to the ranges and use a mapply (as a sapply but with multiple variables) for the function. Hope the example below is helpful, although as usual someone out there will have a better solution for it. dta - c() for (i in 1:12) dta - cbind(dta,matrix(i,5,1)) dta [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [1,]123456789101112 [2,]123456789101112 [3,]123456789101112 [4,]123456789101112 [5,]123456789101112 rng.a - seq(1,10,by=3) rng.b - seq(3,12,by=3) rng.a [1] 1 4 7 10 rng.b [1] 3 6 9 12 mapply(x=rng.a, y=rng.b, function(x,y) sum(dta[,c(x:y)])) [1] 30 75 120 165 Cheers, Filipe -Mensagem original- De: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] Em nome de Federico Calboli Enviada em: terça-feira, 12 de julho de 2011 10:07 Para: r-help Assunto: [R] applying function to multiple columns of a matrix Hi, I want to apply a function to a matrix, taking the columns 3 by 3. I could use a for loop: for(i in 1:3){ # here I assume my data matrix has 9 columns j = i*3 set = my.data[,c(j-2,j-1,j)] my.function(set) } which looks cumbersome and possibly slow. I was hoping there is some function in the apply()/lapply() families that could take 3 columns at a time. I though of turning mydata in a list, then using lapply() new.data = list(my.data[,1:3], my.data[,4:6], my.data[,7:9]) lapply(new.data, my.function) but that might incur in too much memory penalty and does have the issue of requiring a for loop to create the list (not all my data is conveniently of 9 columns only). Any suggestion would be much appreciated. Bw Federico -- Federico C. F. Calboli Department of Epidemiology and Biostatistics Imperial College, St. 
Mary's Campus Norfolk Place, London W2 1PG Tel +44 (0)20 75941602 Fax +44 (0)20 75943193 f.calboli [.a.t] imperial.ac.uk f.calboli [.a.t] gmail.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. This message and its attachments may contain confidential and/or privileged information. If you are not the addressee, please, advise the sender immediately by replying to the e-mail and delete this message. Este mensaje y sus anexos pueden contener información confidencial o privilegiada. Si ha recibido este e-mail por error por favor bórrelo y envíe un mensaje al remitente. Esta mensagem e seus anexos podem conter informação confidencial ou privilegiada. Caso não seja o destinatário, solicitamos a imediata notificação ao remetente e exclusão da mensagem. -- Federico C. F. Calboli Department of Epidemiology and Biostatistics Imperial College, St. Mary's Campus Norfolk Place, London W2 1PG Tel +44 (0)20 75941602 Fax +44 (0)20 75943193 f.calboli [.a.t] imperial.ac.uk f.calboli [.a.t] gmail.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
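Federico's one-liner, tried on a toy 5 x 9 matrix with sum() standing in for my.function(): matrix(1:9, nrow = 3) groups the column indices into blocks of three, and apply() hands each block of columns to the function.

```r
## Toy data: 5 rows, 9 columns filled 1..45 column-wise.
my.data <- matrix(1:45, nrow = 5)
## One call of the function per block of 3 columns.
apply(matrix(1:ncol(my.data), nrow = 3), 2,
      function(idx) sum(my.data[, idx]))
## [1] 120 345 570
```

The same pattern works for any column count divisible by the block size, since the index matrix is built from ncol(my.data).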
[R] FW: lasso regression
Hi, I am trying to do a lasso regression using the lars package with the following data: FastestTime WinPercentage PlacePercentage ShowPercentage BreakAverage FinishAverage Time7Average Time3Average Finish 116.90 0.14 0.14 0.29 4.43 3.29 117.56 117.77 5.00 116.23 0.29 0.43 0.14 6.14 2.14 116.84 116.80 2.00 116.41 0.00 0.14 0.29 5.71 3.71 117.24 117.17 4.00 115.80 0.57 0.00 0.29 2.14 2.57 116.21 116.53 6.00 #load Data crs- read.csv(file:///C:/temp/Horse//horseracing.csvfile:///C:\temp\Horse\horseracing.csv, na.strings=c(,, NA, , ?), encoding=UTF-8) ## define x and y x= x-crs[,9]#predictor variables y= y-crs[1:8,] #response variable library(lars) cv.lars(x, y, K=10, trace=TRUE, plot.it = TRUE,se = TRUE, type=lasso) and I get: LASSO sequence Error in one %*% x : requires numeric/complex matrix/vector arguments Any idea on what I am doing wrong? Thank you!! Sincerely, tom [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] plot means ?
* David Winsemius qjvafrz...@pbzpnfg.arg [2011-07-11 18:16:25 -0400]: What is the point of offering this code? To illustrate what I was talking about (code is its own specification). I hoped that there was already a package doing that (and more in that direction). It seems to be doing what you want yes. Are you trying to use someone else's code no, I wrote it myself. I hoped someone would comment on it to help me improve it. I actually now use findInterval which you suggested. (who by the way appears to have been a former SAS programmer The last time I used SAS was more than 10 years ago. I am a Lisper (I also know C/C++/Perl c). the totally unnecessary semi-colons) then why are they accepted? optional syntax elements suck... thanks for your help. -- Sam Steingold (http://sds.podval.org/) on CentOS release 5.6 (Final) X 11.0.60900031 http://truepeace.org http://pmw.org.il http://camera.org http://jihadwatch.org http://www.PetitionOnline.com/tap12009/ http://iris.org.il http://memri.org Lisp: it's here to save your butt. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] FW: lasso regression
On Jul 12, 2011, at 9:53 AM, Heiman, Thomas J. wrote: Hi, I am trying to do a lasso regression using the lars package with the following data (see attached): Nothing attached. (And now you have also sent an exact duplicate.) snipped failed attempt to include data inline that was sabotaged by using HTML mail #load Data crs- read.csv(file:///C:/temp/Horse//horseracing.csvfile:///C:\temp\Horse\horseracing.csv , na.strings=c(,, NA, , ?), encoding=UTF-8) This looks wrong. Your data had no commas in it and you are also setting na.strings to include commas. If I am wrong then you should provide dput on crs instead of ## define x and y x= x-crs[,9]#predictor variables y= y-crs[1:8,] #response variable library(lars) cv.lars(x, y, K=10, trace=TRUE, plot.it = TRUE,se = TRUE, type=lasso) and I get: LASSO sequence Error in one %*% x : requires numeric/complex matrix/vector arguments Any idea on what I am doing wrong? Thank you!! Sincerely, tom [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. David Winsemius, MD West Hartford, CT __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] FW: lasso regression
Hi, Hopefully I got the formatting down.. I am trying to do a lasso regression using the lars package with the following data (the data file is in .csv format): V1 V2 V3 V4 V5 V6 V7 V8 V9 1 FastestTime WinPercentage PlacePercentage ShowPercentage BreakAverage FinishAverage Time7Average Time3Average Finish 2 116.9 0.14285715 0.14285715 0.2857143 4.428571 3.2857144 117.557144 117.76667 5.0 3 116.22857 0.2857143 0.42857143 0.14285715 6.142857 2.142857 116.84286 116.8 2.0 4 116.41428 0.0 0.14285715 0.2857143 5.714286 3.7142856 117.24286 117.14 4.0 5 115.8 0.5714286 0.0 0.2857143 2.142857 2.5714285 116.21429 116.5 6.0 #load Data crs <- read.csv("file:///C:/temp/Horse/horseracing.csv", na.strings=c(",", "NA", "", "?"), encoding="UTF-8") ## define x and y x= x<-crs[,9] #predictor variables y= y<-crs[1:8,] #response variable library(lars) cv.lars(x, y, K=10, trace=TRUE, plot.it = TRUE, se = TRUE, type="lasso") and I get: LASSO sequence Error in one %*% x : requires numeric/complex matrix/vector arguments Any idea on what I am doing wrong? Thank you!! Sincerely, tom __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] how to find out whether a string is a factor?
I have two data frames: str(ysmd) 'data.frame': 8325 obs. of 6 variables: $ X.stock : Factor w/ 8325 levels "A","AA","AA-",..: 2702 6547 4118 7664 7587 6350 3341 5640 5107 7589 ... $ market.cap : num -1.00 2.97e+10 3.54e+08 3.46e+08 -1.00 ... $ X52.week.low : num 40.2 22.5 27.5 12.2 20.7 ... $ X52.week.high: num 43.3 38.2 35.1 19.2 32.7 ... $ X3.month.average.daily.volume: num 154 7862250 16330 205784 14697 ... $ X50.day.moving.average.price : num 41.8 36.3 30.5 15.2 29.9 ... str(top1000) 'data.frame': 1000 obs. of 1 variable: $ V1: Factor w/ 1000 levels "AA","AAI","AAP",..: 146 96 341 814 382 977 66 1 737 595 ... I want to split ysmd into two new data frames, ysmd.top1000 and ysmd.rest, so that ysmd.top1000$X.stock only contains factors from top1000$V1 and ysmd.rest$X.stock contains all the other factors. I should be able to just write ysmd.top1000 <- ysmd[ysmd$X.stock is in top1000$V1,] ysmd.rest <- ysmd[ysmd$X.stock not in top1000$V1,] but how do I check whether a string is a member of a factor? -- Sam Steingold (http://sds.podval.org/) on CentOS release 5.6 (Final) X 11.0.60900031 http://mideasttruth.com http://truepeace.org http://camera.org http://thereligionofpeace.com http://pmw.org.il Professionalism is being dispassionate about your work. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] High density scatter plot with logarithmic binning
How can I perform logarithmic binning in a scatterplot? I could only take the log of the variables and plot them, but I am sure that is not the way. I have very large data, and would want to plot those high-density scatterplots and code them with different colors for the bins/density. -- View this message in context: http://r.789695.n4.nabble.com/High-density-scatter-plot-with-logarithmic-binning-tp3662226p3662226.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
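A sketch of one way to do this in base R (the hexbin package and smoothScatter() are alternatives worth a look): bin the plane with cut()/table() and colour the tiles by log10 of the counts, so sparse and dense regions are both visible. The bin count and palette here are arbitrary choices, not recommendations.

```r
## 2-D histogram of a large point cloud, coloured on a log scale.
set.seed(1)
x <- rnorm(1e5)
y <- x + rnorm(1e5)
nb <- 100                                  # bins per axis (arbitrary)
counts <- table(cut(x, nb), cut(y, nb))    # nb x nb matrix of counts
image(log10(unclass(counts) + 1),          # +1 avoids log10(0)
      col = rev(heat.colors(32)),
      xlab = "x (binned)", ylab = "y (binned)")
```

The log10 transform is the "logarithmic binning" of the colour scale; the axis bins themselves stay linear, which is usually what is wanted for a density scatterplot.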
Re: [R] MC-Simulation with foreach: Some cores finish early
If you switch directly to the multicore package you can use the mclapply() function. There, check the parameter mc.preschedule=T / F. You can use this parameter to improve the load balancing. I do not know a parameter to tune foreach in this way. Best Markus On Tuesday, 12.07.2011, 04:31 -0700, peter_petersen wrote: Dear R-Users, I run a MC simulation using the packages foreach and doMC on a PowerMac with 24 cores. There are roughly a hundred parameter sets and I parallelized the program in a way that each core computes one of these parameter sets completely. The problem is that some parameter sets take a lot longer to compute than others. After a while only a quarter of the cores are still computing (their first parameter set), while others are already finished. But some parameter sets are still untouched. I have thought about changing my parameter file so that every combination takes roughly the same time (longer computations are offset with fewer repetitions), but maybe there is a more elegant solution. Is it somehow possible to wake the finished cores while there is still work to do? ;-) Sincerely, H. Bumann -- View this message in context: http://r.789695.n4.nabble.com/MC-Simulation-with-foreach-Some-cores-finish-early-tp3661998p3661998.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
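Markus's suggestion in code form, with a dummy job and parameter list standing in for the poster's simulation: with mc.preschedule = FALSE, jobs are handed out one at a time as cores free up, instead of being split into mc.cores chunks up front, which helps when job times vary a lot.

```r
## Dynamic load balancing with mclapply (not available on Windows).
library(parallel)   # mclapply lived in the multicore package before R 2.14
run.one.set <- function(p) sum(rnorm(p * 1e4))   # stand-in for one parameter set
param.list  <- as.list(1:8)                      # stand-in parameter sets
res <- mclapply(param.list, run.one.set,
                mc.cores = 2, mc.preschedule = FALSE)
```

With mc.preschedule = TRUE (the default) the list is split into mc.cores pieces before any work starts, which is cheaper for many small, uniform jobs but leaves cores idle in exactly the situation the poster describes.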
Re: [R] Creating a zero matrix when a condition isn't met
Many thanks!!! I will try tonight; I have read this and I think it could help me. I'm aware the question was badly formulated; I will try to explain better next time!!! On a side note: apply always accesses the function you use at least once. If the input is a dataframe without any rows but with defined variables, it sends FALSE as an argument to the function. If the dataframe is completely empty, it sends a logical(0) to the function. x <- data.frame(a=numeric(0)) str(x) 'data.frame': 0 obs. of 1 variable: $ a: num y <- apply(x, MARGIN=1, FUN=function(x){print(x)}) [1] FALSE x <- data.frame() str(x) 'data.frame': 0 obs. of 0 variables y <- apply(x, MARGIN=1, FUN=function(x){print(x)}) logical(0) 2011/7/12 Sarah Goslee sarah.gos...@gmail.com Hi, You don't provide us with a reproducible example, so I can't provide you with actual code. But two approaches come to mind: 1. Create da2 with one row and n columns, then change the appropriate elements, if any, based on your conditions. 2. Do the conditional parts, then check to see whether da2 is empty. If it is, then replace the empty data frame with a data frame of one row and n columns. Sarah On Tue, Jul 12, 2011 at 3:51 AM, Trying To learn again tryingtolearnag...@gmail.com wrote: Hi all, I first create a matrix/data frame called d2 if another matrix accomplishes some restrictions dacc2 da2 <- da1[colSums(dacc2)9,] da2 <- da2[(da2[,13]=24),] write.csv(da2, file = paste('hggi', i, '.csv', sep = '')) The thing is, if da2 finally cannot pass the filters, it cannot write a csv because there is no true condition. How can I create anyway a csv with zeros of one row and n columns (n being the number of columns of da2)? I need a loop? Rarely.
-- Sarah Goslee http://www.functionaldiversity.org [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
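Sarah's second option, sketched on a toy filter that matches nothing (da1 and da2 mirror the poster's objects; the file name is illustrative):

```r
## Replace an empty filtered result with one row of zeros before writing.
da1 <- data.frame(a = 1:3, b = 4:6)
da2 <- da1[da1$a > 10, ]            # empty: no row passes the filter
if (nrow(da2) == 0) {
  da2 <- as.data.frame(matrix(0, nrow = 1, ncol = ncol(da1)))
  names(da2) <- names(da1)          # keep the original column names
}
write.csv(da2, file = "hggi1.csv")  # now always writes at least one row
```

The nrow() check is the whole trick: it runs after the conditional subsetting, so no loop is needed.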
[R] Avoiding loops to detect number of coincidences
Hi all, I have this information on a file ht.txt; imagine it is a data frame without labels: 1 1 1 8 1 1 6 4 1 3 1 3 3 And in another table called pru.txt I have sequences similar to this: 4 1 1 8 1 1 6 4 1 3 1 3 3 1 6 1 8 1 1 6 4 1 3 1 3 3 1 1 1 8 1 1 6 4 1 3 1 3 3 6 6 6 8 1 1 6 4 1 3 1 3 3 I want to know how many positions are identical between each row in pru compared with ht. n and m are the rows and columns of pru (m is the same number in pru and ht) I tried this with loops: n <- nrow(pru) m <- ncol(pru) dacc2 <- mat.or.vec(n, m) for (g in 1:n){ for (j in 1:m){ if(pru[g,j]-ht[1,j]!=0) dacc2[g,j]=0 else {dacc2[g,j]=1} } } So when I have dacc2 I can filter this: dar2 <- pru[colSums(dacc2) > 2 & colSums(dacc2) < 10,] Is there some way to avoid loops? [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Reorganize data frame
Hi, I have a data frame of about 700 000 rows which looks something like this: Date Temperature Category 2007102 16 A 2007102 17 B 2007102 18 C but need it to be: Date TemperatureA TemperatureB TemperatureC 2007102 16 17 18 Any suggestions? /Angelica -- View this message in context: http://r.789695.n4.nabble.com/Reorganize-data-fram-tp3662123p3662123.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Cross K Ripley's function and spatio-temporal interaction power
Dear All, I have a collection of spatial data. I have to analyze pairs of these point patterns to test their spatial interaction. I was moving towards the cross-K Ripley function. The problems, however, are the following: 1) What is the best way to get a single real value that represents the interaction power? 2) How to obtain a value that even allows me to rank the pairwise point patterns according to their interaction power? PS: I have to perform the same analysis for temporal interaction and spatio-temporal interaction. Thanks in advance for your help Best Regards Massimiliano Ruocco __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
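One concrete route, offered as an assumption since the poster names no package: spatstat's Kcross() computes the cross-type K function, and a crude scalar summary for ranking pairs of patterns is the mean deviation from the Poisson benchmark. Better summaries exist (e.g. maximum deviation, or a Monte Carlo test statistic); this is only a sketch.

```r
## Cross-type Ripley K with spatstat, reduced to one number.
library(spatstat)
data(amacrine)                       # bivariate point pattern shipped with spatstat
K <- Kcross(amacrine, "on", "off")   # cross K between the two mark types
## Mean deviation of the isotropic-corrected estimate from the
## theoretical Poisson curve: > 0 suggests attraction, < 0 repulsion.
score <- mean(K$iso - K$theo, na.rm = TRUE)
```

Computing the same score for every pair of patterns gives a comparable value per pair, which answers the ranking question at least roughly; significance would still need an envelope or permutation test.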
Re: [R] how to find out whether a string is a factor?
On Jul 12, 2011, at 10:12 AM, Sam Steingold wrote: I have two data frames: str(ysmd) 'data.frame': 8325 obs. of 6 variables: $ X.stock : Factor w/ 8325 levels A,AA,AA-,..: 2702 6547 4118 7664 7587 6350 3341 5640 5107 7589 ... $ market.cap : num -1.00 2.97e+10 3.54e+08 3.46e +08 -1.00 ... $ X52.week.low : num 40.2 22.5 27.5 12.2 20.7 ... $ X52.week.high: num 43.3 38.2 35.1 19.2 32.7 ... $ X3.month.average.daily.volume: num 154 7862250 16330 205784 14697 ... $ X50.day.moving.average.price : num 41.8 36.3 30.5 15.2 29.9 ... str(top1000) 'data.frame': 1000 obs. of 1 variable: $ V1: Factor w/ 1000 levels AA,AAI,AAP,..: 146 96 341 814 382 977 66 1 737 595 ... I want to split ysmd into two new data frames: ysmd.top1000 and ysmd.rest so that ysmd.top1000$X.stock only contains factors from top1000$V1 and ysmd.rest$X.stock contains all the other factors. I should be able to just write ysmd.top1000 - ysmd[ysmd$X.stock is in top1000$V1,] ysmd.rest - ysmd[ysmd$X.stock not in top1000$V1,] but how so I check whether a string is a member of a factor? ?%in% -- Sam Steingold (http://sds.podval.org/) on CentOS release 5.6 (Final) X 11.0.60900031 http://mideasttruth.com http://truepeace.org http://camera.org http://thereligionofpeace.com http://pmw.org.il Professionalism is being dispassionate about your work. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. David Winsemius, MD West Hartford, CT __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
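David's %in% pointer, spelled out on toy versions of the two data frames (factor values compare by their underlying character values, so no conversion is needed):

```r
## Split one data frame by membership of a column in another.
ysmd    <- data.frame(X.stock = factor(c("A", "AA", "IBM")),
                      market.cap = 1:3)
top1000 <- data.frame(V1 = factor(c("AA", "IBM")))
keep <- ysmd$X.stock %in% top1000$V1
ysmd.top1000 <- ysmd[keep, ]    # rows whose stock is in top1000
ysmd.rest    <- ysmd[!keep, ]   # all the other rows
```

One logical vector drives both subsets, so the two results partition the original rows exactly.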
Re: [R] FW: lasso regression
On Tue, 2011-07-12 at 10:12 -0400, Heiman, Thomas J. wrote: Hi, Hopefully I got the formatting down.. I am trying to do a lasso regression using the lars package with the following data (the data files is in .csv format): V1 V2 V3 V4 V5 V6 V7 V8 V9 1 FastestTime WinPercentage PlacePercentage ShowPercentage BreakAverageFinishAverage Time7AverageTime3AverageFinish 2 116.9 0.14285715 0.14285715 0.2857143 4.4285713.2857144 117.557144 117.76667 5.0 3 116.22857 0.2857143 0.42857143 0.14285715 6.1428572.142857116.84286 116.8 2.0 4 116.41428 0.0 0.14285715 0.2857143 5.7142863.7142856 117.24286 117.14 4.0 5 115.8 0.5714286 0.0 0.2857143 2.1428572.5714285 116.21429 116.5 6.0 #load Data crs- read.csv(file:///C:/temp/Horse//horseracing.csvfile:///C:\temp\Horse\horseracing.csv, na.strings=c(,, NA, , ?), encoding=UTF-8) ## define x and y x= x-crs[,9]#predictor variables y= y-crs[1:8,] #response variable library(lars) cv.lars(x, y, K=10, trace=TRUE, plot.it = TRUE,se = TRUE, type=lasso) and I get: LASSO sequence Error in one %*% x : requires numeric/complex matrix/vector arguments Any idea on what I am doing wrong? Thank you!! Row 1 contains character data, the variable names. Are you missing a `header = TRUE` (this is the default in `read.csv()`), or do you have several header lines? I also think you have the response/predictors back to front there; otherwise, why would you need to shrink the coefficient and select from a model with a single predictor? HTH G Sincerely, tom __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~% Dr. Gavin Simpson [t] +44 (0)20 7679 0522 ECRC, UCL Geography, [f] +44 (0)20 7679 0565 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk Gower Street, London [w] http://www.ucl.ac.uk/~ucfagls/ UK. WC1E 6BT. 
[w] http://www.freshwaters.org.uk %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~% __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
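Following Gavin's point about the response and predictors being reversed: with Finish (column 9) as the response and the first eight columns as a numeric predictor matrix, the call the poster probably wanted looks like the sketch below. A random stand-in data frame is used here, since the real crs never made it to the list cleanly.

```r
## Lasso via lars with x as a numeric matrix, y as the response.
library(lars)
set.seed(1)
crs <- as.data.frame(matrix(rnorm(9 * 40), ncol = 9))  # stand-in data
x <- as.matrix(crs[, 1:8])   # predictors: must be a numeric matrix
y <- crs[, 9]                # response: Finish
cv.lars(x, y, K = 10, type = "lasso")
```

The as.matrix() step is what the original code was missing; a data frame fails the numeric-matrix check inside lars and produces the "%*%" error quoted above.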
Re: [R] Avoiding loops to detect number of coincidences
Hi Trying, It would be helpful if you provided reproducible examples. It would also be polite to sign a name so that we have something by which to address you. On Tue, Jul 12, 2011 at 8:00 AM, Trying To learn again tryingtolearnag...@gmail.com wrote: Hi all, I have this information on a file ht.txt, imagine it is a data frame without labels: 1 1 1 8 1 1 6 4 1 3 1 3 3 And on other table called pru.txt I have sequences similar this 4 1 1 8 1 1 6 4 1 3 1 3 3 1 6 1 8 1 1 6 4 1 3 1 3 3 1 1 1 8 1 1 6 4 1 3 1 3 3 6 6 6 8 1 1 6 4 1 3 1 3 3 I want to now how many positions are identical between each row in pru compared with ht. I have no idea what you are trying to do with the loops below, but if you are trying to count matches by row: a reproducible example dput(ht) c(1, 1, 1, 8, 1, 1, 6, 4, 1, 3, 1, 3, 3) dput(pru) structure(list(V1 = c(4L, 1L, 1L, 6L), V2 = c(1L, 6L, 1L, 6L), V3 = c(1L, 1L, 1L, 6L), V4 = c(8L, 8L, 8L, 8L), V5 = c(1L, 1L, 1L, 1L), V6 = c(1L, 1L, 1L, 1L), V7 = c(6L, 6L, 6L, 6L ), V8 = c(4L, 4L, 4L, 4L), V9 = c(1L, 1L, 1L, 1L), V10 = c(3L, 3L, 3L, 3L), V11 = c(1L, 1L, 1L, 1L), V12 = c(3L, 3L, 3L, 3L), V13 = c(3L, 3L, 3L, 3L)), .Names = c(V1, V2, V3, V4, V5, V6, V7, V8, V9, V10, V11, V12, V13 ), class = data.frame, row.names = c(NA, -4L)) # count the positional matches by row apply(pru, 1, function(x)sum(x == ht)) [1] 12 12 13 10 Sarah n and m are the col and row of pru (m is the same number in pru and ht) I tried this with loops n-nrow(pru) m-ncol(pru) dacc2-mat.or.vec(n, m) for (g in 1:n){ for (j in 1:m){ if(pru[g,j]-ht[1,j]!=0) dacc2[g,j]=0 else {dacc2[g,j]=1} } } So when I have dacc2 I can filter this: dar2-pru[colSums(dacc2)2 colSums(dacc2)10,] There is some way to avoid loops? [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. 
-- Sarah Goslee http://www.functionaldiversity.org __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
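A fully vectorised alternative to Sarah's apply() call: sweep() compares every row of pru against ht in one shot, and the match counts then drive the poster's row filter directly (the 3..9 range here reads the original filter as "more than 2 and fewer than 10 matches").

```r
## Count positional matches per row without any explicit loop.
ht  <- c(1, 1, 1, 8, 1, 1, 6, 4, 1, 3, 1, 3, 3)
pru <- as.data.frame(rbind(c(4, 1, 1, 8, 1, 1, 6, 4, 1, 3, 1, 3, 3),
                           c(1, 6, 1, 8, 1, 1, 6, 4, 1, 3, 1, 3, 3),
                           c(1, 1, 1, 8, 1, 1, 6, 4, 1, 3, 1, 3, 3),
                           c(6, 6, 6, 8, 1, 1, 6, 4, 1, 3, 1, 3, 3)))
hits <- rowSums(sweep(as.matrix(pru), 2, ht, FUN = "=="))
hits                                  # 12 12 13 10, as in the apply() version
dar2 <- pru[hits > 2 & hits < 10, ]   # keep rows with 3..9 matches
```

sweep() recycles ht across the columns of each row, so the logical matrix and rowSums() replace both nested loops at once.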
Re: [R] FW: lasso regression
On Jul 12, 2011, at 10:12 AM, Heiman, Thomas J. wrote: Hi, Hopefully I got the formatting down.. I am trying to do a lasso regression using the lars package with the following data (the data files is in .csv format): V1 V2 V3 V4 V5 V6 V7 V8 V9 1 FastestTime WinPercentage PlacePercentage ShowPercentage BreakAverage FinishAverage Time7Average Time3Average Finish 2 116.9 0.14285715 0.14285715 0.2857143 4.428571 3.2857144 117.557144 117.76667 5.0 3 116.22857 0.2857143 0.42857143 0.14285715 6.142857 2.142857 116.84286 116.8 2.0 4 116.41428 0.0 0.14285715 0.2857143 5.714286 3.7142856 117.24286 117.14 4.0 5 115.8 0.5714286 0.0 0.2857143 2.142857 2.5714285 116.21429 116.5 6.0 It is now clear that you failed to get your data in properly. Since stringsAsFactors is set to TRUE by default for all of the read.* functions, all of your columns are now factors. Perhaps you had a blank line at the beginning of your data? The default for read.csv (which is just a wrapper with different parameters for read.table) is to set header =TRUE. You should learn to use str() on your data immediately after data entry steps. -- David. #load Data crs- read.csv(file:///C:/temp/Horse//horseracing.csvfile:///C:\temp\Horse\horseracing.csv , na.strings=c(,, NA, , ?), encoding=UTF-8) ## define x and y x= x-crs[,9]#predictor variables y= y-crs[1:8,] #response variable library(lars) cv.lars(x, y, K=10, trace=TRUE, plot.it = TRUE,se = TRUE, type=lasso) and I get: LASSO sequence Error in one %*% x : requires numeric/complex matrix/vector arguments Any idea on what I am doing wrong? Thank you!! Sincerely, tom __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. 
David Winsemius, MD West Hartford, CT
Re: [R] Reorganize data fram
On Jul 12, 2011, at 8:42 AM, anglor wrote:

> Hi, I have a data frame of about 700 000 rows which looks something like this:
>
>    Date Temperature Category
> 2007102          16        A
> 2007102          17        B
> 2007102          18        C
>
> but need it to be:
>
>    Date TemperatureA TemperatureB TemperatureC
> 2007102           16           17           18

reshape(dat, idvar = "Date", timevar = "Category", direction = "wide")

     Date Temperature.A Temperature.B Temperature.C
1 2007102            16            17            18

> Any suggestions? /Angelica
> --
> View this message in context: http://r.789695.n4.nabble.com/Reorganize-data-fram-tp3662123p3662123.html
> Sent from the R help mailing list archive at Nabble.com.

David Winsemius, MD West Hartford, CT
Re: [R] FW: lasso regression
Hi,

(i) As David suggested, please use `dput` to provide examples of data!

(ii) The nub of your problem is that you are giving lars an object that it is not expecting. It wants a *matrix* for its `x` variable, as you'll see in the help for ?lars. So, as long as this expression:

R> is.numeric(x) && is.matrix(x)

evaluates to FALSE for your x, you won't get it to work.

(iii) Consider using glmnet -- you get the lasso for free when you set alpha = 1, but you can also see if the elastic net is helpful.

-steve

On Tue, Jul 12, 2011 at 10:12 AM, Heiman, Thomas J. thei...@mitre.org wrote:

> Hi, hopefully I got the formatting down. I am trying to do a lasso regression using the lars package with the following data (the data file is in .csv format):
>
>   V1          V2            V3              V4             V5           V6            V7           V8           V9
> 1 FastestTime WinPercentage PlacePercentage ShowPercentage BreakAverage FinishAverage Time7Average Time3Average Finish
> 2 116.9       0.14285715    0.14285715      0.2857143      4.428571     3.2857144     117.557144   117.76667    5.0
> 3 116.22857   0.2857143     0.42857143      0.14285715     6.142857     2.142857      116.84286    116.8        2.0
> 4 116.41428   0.0           0.14285715      0.2857143      5.714286     3.7142856     117.24286    117.14       4.0
> 5 115.8       0.5714286     0.0             0.2857143      2.142857     2.5714285     116.21429    116.5        6.0
>
> # load Data
> crs <- read.csv("file:///C:/temp/Horse/horseracing.csv",
>                 na.strings = c("", "NA", "?"), encoding = "UTF-8")
> ## define x and y
> x <- crs[, 9]   # predictor variables
> y <- crs[1:8, ] # response variable
> library(lars)
> cv.lars(x, y, K = 10, trace = TRUE, plot.it = TRUE, se = TRUE, type = "lasso")
>
> and I get:
> LASSO sequence
> Error in one %*% x : requires numeric/complex matrix/vector arguments
>
> Any idea on what I am doing wrong? Thank you!! Sincerely, tom
-- Steve Lianoglou Graduate Student: Computational Systems Biology | Memorial Sloan-Kettering Cancer Center | Weill Medical College of Cornell University Contact Info: http://cbio.mskcc.org/~lianos/contact
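[Editor's note: a minimal base-R sketch of the matrix requirement Steve describes, using simulated stand-in data. The object names are illustrative, not from the original post; lars and glmnet both want a numeric matrix of predictors and a numeric vector response.]

```r
set.seed(1)
# A data frame read in with read.csv() is NOT a matrix, even if all-numeric
crs <- data.frame(matrix(rnorm(50 * 9), ncol = 9))

x <- crs[, 1:8]                 # still a data frame
is.numeric(x) && is.matrix(x)   # FALSE -> lars/glmnet will choke on this

X <- as.matrix(crs[, 1:8])      # coerce, after confirming with str(crs)
y <- crs[, 9]                   # response: a plain numeric vector
is.numeric(X) && is.matrix(X)   # TRUE

# With glmnet installed, the lasso would then be, e.g.:
#   library(glmnet); fit <- cv.glmnet(X, y, alpha = 1)
```

If some columns came in as factors (the stringsAsFactors problem David raised), as.matrix() will produce a character matrix, so the is.numeric() check catches that case too.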
Re: [R] Reorganize data fram
Hi: Try the cast() function in the reshape package. Using d as the name of your data frame,

library(reshape)
cast(d, Date ~ Category, value = 'Temperature')

     Date  A  B  C
1 2007102 16 17 18

HTH, Dennis

On Tue, Jul 12, 2011 at 5:42 AM, anglor angelica.ekens...@dpes.gu.se wrote:

> Hi, I have a data frame of about 700 000 rows which looks something like this:
>
>    Date Temperature Category
> 2007102          16        A
> 2007102          17        B
> 2007102          18        C
>
> but need it to be:
>
>    Date TemperatureA TemperatureB TemperatureC
> 2007102           16           17           18
>
> Any suggestions? /Angelica
> --
> View this message in context: http://r.789695.n4.nabble.com/Reorganize-data-fram-tp3662123p3662123.html
Re: [R] time zone - any practical solution?
On Tue, Jul 12, 2011 at 8:57 AM, B Laura gm.spam2...@gmail.com wrote:

> Dear Gabor, http://rwiki.sciviews.org/doku.php?id=tips:data-io:ms_windows&s=excel doesn't describe handling dates with daylight saving time issues.

Two references were given, and it's discussed in the R News article. It was also mentioned again in my first post -- namely, don't use POSIXct; then you don't have time zones, and all these problems go away.

> The R class Date can remove time and timezone; however, when calculating the days difference between two manipulated variables, the same problem appears if handling these without Dates.

You don't have to remove the time zone if you never use POSIXct. The Date class has no time zones in the first place.

> R News 4/1 doesn't provide a solution to this either.

It certainly discusses how to choose the appropriate date/time class. Your problem is that you are using the wrong class for the problem, whereas you seem to be interpreting it as how to fix it up after having chosen the wrong class. By that time the wrong design decision has already been made, and that is what is fundamentally causing the problem. The entire first page of the R News article discusses choosing the right class in the first place.

--
Statistics Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com
Re: [R] Connecting to Empress DB using RODBC
On Jul 11, 2011, at 9:16 PM, Steve Parker wrote:

> Hi there, I am using the RODBC library to connect to an Empress database. I have installed the ODBC data source with the server DNS number and port, and named the source Trawl. It is odbcDriverConnect that seems to have the problem, and I suspect one of the settings in my Data Source is wrong, or that my syntax to identify the database is wrong. I have not set CodeSet or ODBC Version. Here are the error messages:
>
> 1: In odbcDriverConnect("Trawl") :
>    [RODBC] ERROR: state 08001, code -256, message [Empress Software][ODBC DLL] Unable to connect to data source
> 2: In odbcDriverConnect("Trawl") :
>    [RODBC] ERROR: state 01S00, code 0, message [Microsoft][ODBC Driver Manager] Invalid connection string attribute
> 3: In odbcDriverConnect("Trawl") : ODBC connection failed
>
> Can anyone point me in the right direction? Is there a specific syntax for naming the database other than its name? It does bring up a GUI where I can choose my data source and login, but then it just gives the errors above. Any help is greatly appreciated. Steve

I would repost your query to r-sig-db: https://stat.ethz.ch/pipermail/r-sig-db/ but also include information on your OS (presumably some version of Windows), the version of R and the version of RODBC, being sure that you are running the latest of each (R 2.13.1 and RODBC 1.3-2). Also include the actual function calls you are making along with the error messages, being sure to mask your userID and password where included. We can't tell you if your syntax is wrong if you don't include it. The above errors could be your syntax or perhaps an ODBC configuration error. See vignette("RODBC") for general information on creating a proper Windows DSN for your database. If Empress has any kind of ODBC client, or if you can use something like Excel or MS Query to connect via ODBC, you can test your connection to the database independent of R. That will help assess whether your problem is your basic ODBC configuration or something specific to R/RODBC, such as your syntax or perhaps a driver issue.

HTH, Marc Schwartz
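[Editor's note: a sketch of the syntax question, as an assumption about the cause rather than a confirmed diagnosis. In RODBC, odbcConnect() takes a bare DSN name, while odbcDriverConnect() expects a full "attribute=value" connection string, so passing just "Trawl" to the latter is one plausible source of the "Invalid connection string attribute" complaint. The helper below only assembles the string; the commented calls show where it would be used, and the credentials are placeholders.]

```r
# Assemble an ODBC connection string from its parts
make_conn_string <- function(dsn, uid, pwd) {
  paste0("DSN=", dsn, ";UID=", uid, ";PWD=", pwd)
}

cs <- make_conn_string("Trawl", "myuser", "mypass")
cs  # "DSN=Trawl;UID=myuser;PWD=mypass"

# With RODBC loaded and the DSN configured, either of these would be tried:
#   library(RODBC)
#   ch <- odbcConnect("Trawl", uid = "myuser", pwd = "mypass")  # DSN form
#   ch <- odbcDriverConnect(cs)                                 # string form
#   if (ch != -1) { print(sqlTables(ch)); odbcClose(ch) }       # -1 = failure
```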
Re: [R] time zone - any practical solution?
If you don't need POSIXt types, then as Gabor says, don't use them. However, there are good reasons to use them sometimes, and the most workable solution I have found is to set your default timezone in R to a non-DST timezone before you convert from character to POSIXct. This is dependent on your OS and particular build of R, but on Windows and Linux I have found that

Sys.setenv(TZ = "Etc/GMT-2")

at the beginning of my R session should handle Central European Standard Time. To identify which time zones are supported on your system, read the documentation (there is generally a zoneinfo directory somewhere with filenames matching the time zones).

---
Jeff Newmiller, jdnew...@dcn.davis.ca.us
Sent from my phone. Please excuse my brevity.

B Laura gm.spam2...@gmail.com wrote:

> Dear Gabor, http://rwiki.sciviews.org/doku.php?id=tips:data-io:ms_windows&s=excel doesn't describe handling dates with daylight saving time issues. The R class Date can remove time and timezone; however, when calculating the days difference between two manipulated variables, the same problem appears if handling these without Dates. R News 4/1 doesn't provide a solution to this either. Have read and struggled with this stuff for 3 days. Anyone else who could help on this? Regards, Laura.
>
> 2011/7/12 Gabor Grothendieck ggrothendi...@gmail.com
>
>> On Tue, Jul 12, 2011 at 6:58 AM, B Laura gm.spam2...@gmail.com wrote:
>>> Hello all, could someone help me with the time zones in an understandable, practical way? I got completely stuck with this. Have googled for a while and read the manuals, but without solutions... When data are imported from Excel 2007 into R (2.13), all time variables, depending on date (summer or winter), get (un-asked for!) a time zone addition CEST (for summer dates) or CET (for winter dates).
>>
>> Read http://rwiki.sciviews.org/doku.php?id=tips:data-io:ms_windows&s=excel which gives many ways of reading Excel into R, and read R News 4/1 which discusses appropriate R classes to use (you would be best to use Date, not POSIXct, in which case you could not have time zone problems in the first place) and internal representations of R vs. Excel.
>>
>> --
>> Statistics Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com
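[Editor's note: a small base-R illustration of the two approaches in this thread — the Date class (no time zones at all) versus pinning the session time zone to a fixed offset before any POSIXct conversion. The "Etc/GMT-2" zone is Jeff's example; zone-name support varies by OS.]

```r
# Date class: day arithmetic with no time-zone component at all
d1 <- as.Date("2011-03-26")   # day before a European DST change
d2 <- as.Date("2011-03-28")
as.numeric(d2 - d1)           # exactly 2 days; DST cannot interfere

# POSIXct: fix the zone to a non-DST offset before converting from character
Sys.setenv(TZ = "Etc/GMT-2")  # POSIX sign inversion: GMT-2 means UTC+2
t1 <- as.POSIXct("2011-03-26 12:00:00")
t2 <- as.POSIXct("2011-03-28 12:00:00")
difftime(t2, t1, units = "days")  # 2 days, since no DST shift occurs
```

Note the sign inversion in the "Etc/GMT..." names: "Etc/GMT-2" is two hours *east* of UTC, which is easy to get backwards.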
[R] Adding a correlation value (like Rsquared) to a 4 parameter logistic fit model.
Hello, In my lab we use a four-parameter logistic fit model for our ELISA data (absorbance values). We are currently testing the use of different solvents and need a way to add a correlation value (such as an R squared or something similar) so we can compare different solvents when making this standard curve. We currently use the drc package, and this is our script for the 4-parameter fit:

SC <- read.delim(file = "C:/Documents and Settings/rekem/My Documents/SCBook.txt",
                 header = TRUE, check.names = FALSE, as.is = TRUE)
FourP <- drm(Response ~ Expected, data = SC, fct = LL.4())
plot(FourP, main = "LTB4 Standard Curve Zi Phase 7",
     xlab = "Expected (pg/mL)", ylab = "Response (%Bound)")

Thanks for any help. Kevin McEnroy
--
View this message in context: http://r.789695.n4.nabble.com/Adding-a-correlation-value-like-Rsquared-to-a-4-parameter-logistic-fit-model-tp3662480p3662480.html
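[Editor's note: one common answer — a suggestion, not from the thread — is a "pseudo R-squared" computed from the model's residuals, 1 - RSS/TSS. It is not a true R-squared for a nonlinear model, but it gives a comparable goodness-of-fit number across solvents. The sketch uses a plain lm() fit so it runs anywhere; for the drc fit the same function applies with the observed `SC$Response` and `fitted(FourP)`.]

```r
# Generic pseudo-R^2: 1 - (residual sum of squares / total sum of squares)
pseudo_r2 <- function(observed, fitted_vals) {
  rss <- sum((observed - fitted_vals)^2)
  tss <- sum((observed - mean(observed))^2)
  1 - rss / tss
}

# Demonstration with a simple linear fit on synthetic data
set.seed(42)
x <- 1:20
y <- 3 + 2 * x + rnorm(20)
fit <- lm(y ~ x)
pseudo_r2(y, fitted(fit))      # for lm this equals summary(fit)$r.squared
```

For the drm object in the script above, the analogous call would be `pseudo_r2(SC$Response, fitted(FourP))`.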
Re: [R] lm: mark sample used in estimation
Thanks Peter, Ted! Best, Anirban

On Tue, Jul 12, 2011 at 4:54 AM, Ted Harding ted.hard...@wlandres.net wrote:

On 11-Jul-11 07:55:44, Anirban Mukherjee wrote:

> Hi all, I wanted to mark the estimation sample: mark what rows (observations) are deleted by lm due to missingness. For eg, from the original example in help, I have changed one of the values in trt to be NA (missing).
>
> # original example
> ctl <- c(4.17,5.58,5.18,6.11,4.50,4.61,5.17,4.53,5.33,5.14)
> trt <- c(4.81,4.17,4.41,3.59,5.87,3.83,6.03,4.89,4.32,4.69)
> # change the 8th observation of trt (the 18th observation overall)
> trt <- c(4.81,4.17,4.41,3.59,5.87,3.83,6.03,NA,4.32,4.69)
> group <- gl(2, 10, 20, labels = c("Ctl", "Trt"))
> weight <- c(ctl, trt)
> lm.D9 <- lm(weight ~ group)
> summary(lm.D9)
>
> Call:
> lm(formula = weight ~ group)
>
> Residuals:
>      Min       1Q   Median       3Q      Max
> -1.04556 -0.48378  0.05444  0.23622  1.39444
>
> Coefficients:
>             Estimate Std. Error t value Pr(>|t|)
> (Intercept)   5.0320     0.2258  22.281 5.09e-14 ***
> groupTrt     -0.3964     0.3281  -1.208    0.244
> ---
> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>
> Residual standard error: 0.7142 on 17 degrees of freedom
>   (1 observation deleted due to missingness)
> Multiple R-squared: 0.07907, Adjusted R-squared: 0.0249
> F-statistic: 1.46 on 1 and 17 DF, p-value: 0.2435
>
> I want to generate an indicator variable to mark the observations used in estimation: 1 for a row not deleted, 0 for a row deleted. In this case I want an indicator variable that has seventeen 1s, one 0, and then two 1s. I know I can do ind = !is.na(group) in the above example. But I am ideally looking for a way that allows one to use any formula in lm, and still be able to mark the estimation sample. Function/option I am missing?
>
> The best I could come up with:
>
> lm.D9 <- lm(weight ~ group, model = TRUE)
> ind <- as.numeric(row.names(lm.D9$model))
> esamp <- rep(0, length(group))  # substitute nrow(data.frame used in estimation) for length(group)
> esamp[ind] <- 1
> esamp
> [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1
>
> Is this safe (recommended)? Appreciate any help. Best, A

Separately from Peter Dalgaard's response, you raise a generic question about how to find out which observations have been used in an LM fit when some cases may have been omitted, e.g. because of missing values (NA). Take the following as an example:

X <- (1:10)
Y <- X + rnorm(10)
LM <- lm(Y ~ X)
X1 <- X
X1[c(4,8)] <- NA  ## so cases 4 & 8 will be omitted
LM1 <- lm(Y ~ X1)

row.names(LM$model)
# [1] "1"  "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9"  "10"
row.names(LM1$model)
# [1] "1"  "2"  "3"  "5"  "6"  "7"  "9"  "10"

which( (row.names(LM$model) %in% row.names(LM1$model)) )
# [1] 1 2 3 5 6 7 9 10
### These are the indices of the cases which were kept
which(!(row.names(LM$model) %in% row.names(LM1$model)) )
# [1] 4 8
### These are the indices of the cases which were omitted

You could also use 'names(LM$res)' and 'names(LM1$res)' instead of 'row.names(LM$model)' and 'row.names(LM1$model)' in the above.

Hoping this helps, Ted.
E-Mail: (Ted Harding) ted.hard...@wlandres.net Fax-to-email: +44 (0)870 094 0861 Date: 11-Jul-11 Time: 21:54:05 -- XFMail --
[R] What's wrong with my code?
I've written out code for one particular file, and now I want to generate the same kind of graphs and files for the rest of my similar data files. When I plugged in this code, R produced only one plot, for the file eight, and it states my error (see below). I have edited and checked my code so many times but still couldn't figure out what's wrong with it... would you please help me? Thanks!

library(plyr)  # for ddply
my.files <- list.files()
for (i in 1:length(my.files)) {
  temp.dat <- read.csv(my.files[i])
  eight <- read.csv(file = "8.csv", header = TRUE, sep = ",")
  eightout <- subset(eight, inout == "Outgoing from panel hh" & o_duration > 0,
                     select = c(inout, enc_callee, o_duration))
  f <- function(eightoutf) nrow(eightoutf)
  eightnocalls <- ddply(eightout, .(enc_callee), f)
  colnames(eightnocalls)[2] <- "nocalls"
  eightout$nocalls <- eightnocalls$nocalls[match(eightout$enc_callee, eightnocalls$enc_callee)]
  eightout <- data.frame(eightout, time = c(1:nrow(eightout)))
  plot(eightout$time, eightout$nocalls)
  write.csv(eightout, "eight.csv", row.names = FALSE)
  pdf(paste(Sys.Date(), "_", my.files[i], "_.pdf", sep = ""))
  plot(temp.dat$time, temp.dat$nocalls, main = my.files[i])
  dev.off()
  write.csv(temp.dat, paste(Sys.Date(), "_", my.files[i], "_.csv", sep = ""), row.names = FALSE)
}

R says:

need finite 'xlim' values
In addition: Warning messages:
1: In min(x) : no non-missing arguments to min; returning Inf
2: In max(x) : no non-missing arguments to max; returning -Inf
3: In min(x) : no non-missing arguments to min; returning Inf
4: In max(x) : no non-missing arguments to max; returning -Inf

--
View this message in context: http://r.789695.n4.nabble.com/What-s-wrong-with-my-code-tp3662579p3662579.html
Re: [R] plot means ?
On 2011-07-12 07:03, Sam Steingold wrote:
> [snip] (the totally unnecessary semi-colons) then why are they accepted? optional syntax elements suck...

They're accepted because they *can* be useful (multiple statements on one line). Is there *any* language that can *not* be abused?

Peter Ehlers
Re: [R] What's wrong with my code?
Hi Susie,

At a guess, there are no non-missing arguments to min or max. But no, we can't help you. You haven't provided a minimal reproducible example, and without knowing anything about your data it is impossible for the list to offer any constructive suggestions. The posting guide offers suggestions for doing that. In particular, dput() and str() are both very useful.

Sarah

On Tue, Jul 12, 2011 at 11:15 AM, Susie susiecrab_l...@hotmail.com wrote:

> I've written out code for one particular file, and now I want to generate the same kind of graphs and files for the rest of my similar data files. When I plugged in this code, R produced only one plot, for the file eight, and it states my error (see below). I have edited and checked my code so many times but still couldn't figure out what's wrong with it... would you please help me? Thanks!
>
> my.files <- list.files()
> for (i in 1:length(my.files)) {
>   temp.dat <- read.csv(my.files[i])
>   eight <- read.csv(file = "8.csv", header = TRUE, sep = ",")
>   eightout <- subset(eight, inout == "Outgoing from panel hh" & o_duration > 0,
>                      select = c(inout, enc_callee, o_duration))
>   f <- function(eightoutf) nrow(eightoutf)
>   eightnocalls <- ddply(eightout, .(enc_callee), f)
>   colnames(eightnocalls)[2] <- "nocalls"
>   eightout$nocalls <- eightnocalls$nocalls[match(eightout$enc_callee, eightnocalls$enc_callee)]
>   eightout <- data.frame(eightout, time = c(1:nrow(eightout)))
>   plot(eightout$time, eightout$nocalls)
>   write.csv(eightout, "eight.csv", row.names = FALSE)
>   pdf(paste(Sys.Date(), "_", my.files[i], "_.pdf", sep = ""))
>   plot(temp.dat$time, temp.dat$nocalls, main = my.files[i])
>   dev.off()
>   write.csv(temp.dat, paste(Sys.Date(), "_", my.files[i], "_.csv", sep = ""), row.names = FALSE)
> }
>
> R says:
> need finite 'xlim' values
> In addition: Warning messages:
> 1: In min(x) : no non-missing arguments to min; returning Inf
> 2: In max(x) : no non-missing arguments to max; returning -Inf
> 3: In min(x) : no non-missing arguments to min; returning Inf
> 4: In max(x) : no non-missing arguments to max; returning -Inf

-- Sarah Goslee http://www.functionaldiversity.org
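[Editor's note: the warnings Susie reports ("no non-missing arguments to min") are what plot() emits when both coordinate vectors are empty or NULL — here, most likely because some `temp.dat` lacks `time`/`nocalls` columns, so `temp.dat$time` is NULL. A defensive check along these lines (the column names are taken from her script; `safe_plot` is an invented helper) would make the failing file obvious rather than stopping the loop:]

```r
# Before plotting, confirm the columns exist and contain finite values
safe_plot <- function(dat, file_label) {
  ok <- all(c("time", "nocalls") %in% names(dat)) &&
        any(is.finite(dat$time)) && any(is.finite(dat$nocalls))
  if (!ok) {
    warning("skipping ", file_label, ": no plottable time/nocalls data")
    return(invisible(FALSE))
  }
  plot(dat$time, dat$nocalls, main = file_label)
  invisible(TRUE)
}

# Example: a data frame without the expected columns is skipped, not an error
d_bad <- data.frame(x = 1:3)
safe_plot(d_bad, "bad.csv")   # warns and returns FALSE invisibly
```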
Re: [R] FW: lasso regression
On 07/12/2011 09:53 AM, Heiman, Thomas J. wrote:
> ## define x and y
> x <- crs[, 9]   # predictor variables
> y <- crs[1:8, ] # response variable

This cannot be correct. The response variable is a vector, while the predictor variables form a matrix. You have the response variable consisting of only the first 8 observations, then all the columns. Perhaps you mean:

X <- crs[, 1:8]
y <- crs[, 9]

If this is not the case, please include the output of head(crs) and then tell us which variable is your response.

--
Patrick Breheny Assistant Professor Department of Biostatistics Department of Statistics University of Kentucky
[R] Role of na.rm inside mean()
This is just posed out of curiosity (not as a criticism per se). But what is the functional role of the argument na.rm inside the mean() function? If there are missing values, mean() will always return an NA, as in the example below. But is there ever a purpose in computing a mean only to receive NA as a result? In 10 years of using R, I have always used mean() in order to get a result, which is the opposite of its default behavior (when there are NAs). Can anyone suggest a reason why it is in fact desired to get NA as a result of computing mean()?

x <- rnorm(100)
x[1] <- NA
mean(x)
[1] NA
mean(x, na.rm = TRUE)
[1] 0.08136736

If the reason is to alert the user that the vector has missing values, I suppose I could buy that. But I think other checks are better.

Harold
Re: [R] Role of na.rm inside mean()
In SQL, the default is to ignore NULL (equivalent to NA in R). However, it can be dangerous to fail to verify how much data was actually used in an aggregation, so the logic behind the default na.rm setting may be one of encouraging the user to take responsibility for missing data.

---
Jeff Newmiller, jdnew...@dcn.davis.ca.us
Sent from my phone. Please excuse my brevity.

Doran, Harold hdo...@air.org wrote:

> This is just posed out of curiosity (not as a criticism per se). But what is the functional role of the argument na.rm inside the mean() function? If there are missing values, mean() will always return an NA, as in the example below. But is there ever a purpose in computing a mean only to receive NA as a result? In 10 years of using R, I have always used mean() in order to get a result, which is the opposite of its default behavior (when there are NAs). Can anyone suggest a reason why it is in fact desired to get NA as a result of computing mean()?
>
> x <- rnorm(100)
> x[1] <- NA
> mean(x)
> [1] NA
> mean(x, na.rm = TRUE)
> [1] 0.08136736
>
> If the reason is to alert the user that the vector has missing values, I suppose I could buy that. But I think other checks are better. Harold
Re: [R] Role of na.rm inside mean()
On 12/07/2011 12:26 PM, Doran, Harold wrote:
> This is just posed out of curiosity (not as a criticism per se). But what is the functional role of the argument na.rm inside the mean() function? If there are missing values, mean() will always return an NA, as in the example below. But is there ever a purpose in computing a mean only to receive NA as a result?

The general idea in R is that NA stands for unknown. If some of the values in a vector are unknown, then the mean of the vector is also unknown. NA is also used in other ways sometimes; then it makes sense to remove it and compute the mean of the other values.

Duncan Murdoch

> In 10 years of using R, I have always used mean() in order to get a result, which is the opposite of its default behavior (when there are NAs). Can anyone suggest a reason why it is in fact desired to get NA as a result of computing mean()?
>
> x <- rnorm(100)
> x[1] <- NA
> mean(x)
> [1] NA
> mean(x, na.rm = TRUE)
> [1] 0.08136736
>
> If the reason is to alert the user that the vector has missing values, I suppose I could buy that. But I think other checks are better. Harold
Re: [R] Role of na.rm inside mean()
Hi Harold,

Many (most?) of the statistics functions have a similar argument. I suspect it is sort of to warn the user---you have to be explicit about it rather than the program just silently removing or ignoring values that would not work in the function called. I can think of one example where I want a missing value returned. In psychology we often create scores on some construct (say optimism) by averaging individuals' responses to several questions. In certain cases, if a subject does not respond to one question, their overall score should be missing. This is easily accomplished by letting na.rm = FALSE.

Cheers, Josh

On Tue, Jul 12, 2011 at 9:26 AM, Doran, Harold hdo...@air.org wrote:

> This is just posed out of curiosity (not as a criticism per se). But what is the functional role of the argument na.rm inside the mean() function? If there are missing values, mean() will always return an NA, as in the example below. But is there ever a purpose in computing a mean only to receive NA as a result? In 10 years of using R, I have always used mean() in order to get a result, which is the opposite of its default behavior (when there are NAs). Can anyone suggest a reason why it is in fact desired to get NA as a result of computing mean()?
>
> x <- rnorm(100)
> x[1] <- NA
> mean(x)
> [1] NA
> mean(x, na.rm = TRUE)
> [1] 0.08136736
>
> If the reason is to alert the user that the vector has missing values, I suppose I could buy that. But I think other checks are better. Harold

--
Joshua Wiley Ph.D. Student, Health Psychology University of California, Los Angeles https://joshuawiley.com/
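[Editor's note: Josh's psychology example can be shown directly with rowMeans(). With na.rm = FALSE a respondent missing any item gets a missing scale score, while na.rm = TRUE averages only the items that were answered. The item names are invented for illustration.]

```r
# Three respondents x three optimism items; respondent 2 skipped item q2
items <- data.frame(q1 = c(4, 5, 2),
                    q2 = c(3, NA, 2),
                    q3 = c(5, 4, 3))

rowMeans(items)                # default na.rm = FALSE: 4.000000 NA 2.333333
rowMeans(items, na.rm = TRUE)  # 4.000000 4.500000 2.333333
```

Whether the second respondent's score should be 4.5 or NA is a substantive decision, which is exactly why the functions make you state it explicitly.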
[R] For applying formula in rows
Dear all,

I have a problem and it is very difficult for me to get code for it. I am reading a file (attached with this mail) using the code

df <- read.table("summary.txt", fill = TRUE, sep = "", colClasses = "character", header = TRUE)

and dataframe df is like this:

V1        V2 CaseA CaseC CaseG CaseT new
10 135344109     0     0     1     0  12
10 135344110     0     1     0     0  12
10 135344111     0     0     1     0  12
10 135344112     0     0     1     0  12
10 135344113     0     0     1     0  12
10 135344114     1     0     0     0  12
10 135344115     1     0     0     0  12
10 135344116     0     0     0     1  12
10 135344117     0     1     0     0  12
10 135344118     0     0     0     1  12

I want to apply a formula which is (number/total)*new*2, where number is in the columns CaseA, CaseC, CaseG and CaseT, and total is the sum of these 4 columns. I will explain with an example: the output of the first row should be

V1        V2 CaseA CaseC CaseG CaseT new
10 135344109     0     0    24     0  12

because the sum of the 3rd, 4th, 5th and 6th columns is 1 for the first row. For Case A, C and T, applying the above formula gives zero ((0/1)*12*2 = 0), but for Case G it gives (1/1)*12*2 = 24. Can you please help me?

Thanking you, Warm Regards, Vikas Bansal, Msc Bioinformatics, Kings College London

[attached summary.txt:]
V1        V2 CaseA CaseC CaseG CaseT new
10 135344109     0     0     1     0  12
10 135344110     0     1     0     0  12
10 135344111     0     0     1     0  12
10 135344112     0     0     1     0  12
10 135344113     0     0     1     0  12
10 135344114     1     0     0     0  12
10 135344115     1     0     0     0  12
10 135344116     0     0     0     1  12
10 135344117     0     1     0     0  12
10 135344118     0     0     0     1  12
10A*A0 0 0 0 12
10 135344120     1     0     0     0  12
10 135344121     0     0     1     0  12
10 135344122     0     1     0     0  12
10 135344123     0     1     0     0  12
10 135344124     0     1     0     0  12
10 135344125     0     0     0     1  12
10 135344126     0     0     1     0  12
10 135344127     0     0     1     0  12
10 135344128     0     1     0     0  12
10 135344129     0     1     0     0  12
10 135344130     0     0     0     1  12
10 135344185     0     1     0     0  12
10 135344186     1     0     0     0  12
10 135344187     0     0     1     0  12
10 135344188     1     0     0     0  12
10 135344189     0     1     0     0  12
10 135344190     0     0     0     1  12
10 135344191     0     0     1     0  12
10 135344192     0     1     0     0  12
10 135344193     0     1     0     0  12
10 135344194     0     1     0     0  12
10 135344195     0     0     0     1  12
10 135344196     0     0     1     0  12
10 135344197     0     1     0     0  12
Re: [R] For applying formula in rows
Hi Vikas, Here is one way: df - read.table(summary.txt, header = TRUE) str(df) df[, total] - rowSums(df[, 3:6]) df[, 3:6] - apply(df[, 3:6], 2, function(x) x / df[, total] * df[, new] * 2) head(df) V1V2 CaseA CaseC CaseG CaseT new total 1 10 135344109 0 024 0 12 1 2 10 135344110 024 0 0 12 1 3 10 135344111 0 024 0 12 1 4 10 135344112 0 024 0 12 1 5 10 135344113 0 024 0 12 1 6 10 13534411424 0 0 0 12 1 Note that I read the data in differently than you did. This matters. Cheers, Josh 2011/7/12 Bansal, Vikas vikas.ban...@kcl.ac.uk: Dear all, I have a problem and it is very difficult for me to get a code. I am reading a file(attached with this mail) using the code- df=read.table(summary.txt,fill=T,sep=,colClasses = character,header=T) and dataframe df is like this- V1 V2 CaseA CaseC CaseG CaseT new 10 135344109 0 0 1 0 12 10 135344110 0 1 0 0 12 10 135344111 0 0 1 0 12 10 135344112 0 0 1 0 12 10 135344113 0 0 1 0 12 10 135344114 1 0 0 0 12 10 135344115 1 0 0 0 12 10 135344116 0 0 0 1 12 10 135344117 0 1 0 0 12 10 135344118 0 0 0 1 12 I want to apply a formula which is (number/total)*new*2. where number is in column caseA,G,C,T and total is sum of these 4 columns.I will explain with an example.the output of first row should be- V1 V2 CaseA CaseC CaseG CaseT new 10 135344109 0 0 24 0 12 because sum of 3rd,4th,5th and 6th column is 1 for first row.and for case A,C and T if we will apply above formula the answer will be zero (0/1)*12*2 which is equal to 0 but for Case G- (1/1)*12*2 which is equal to 24. Can you please help me. Thanking you, Warm Regards Vikas Bansal Msc Bioinformatics Kings College London __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Joshua Wiley Ph.D. 
Student, Health Psychology
University of California, Los Angeles
https://joshuawiley.com/
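Josh's one-liner can be checked without the attached file. The sketch below rebuilds the first six rows of the data frame in memory (values transcribed from the post) and applies the same (number/total)*new*2 transformation:

```r
## Reconstruct the first six rows of the posted data by hand,
## then apply (number/total) * new * 2 to the four Case columns.
df <- data.frame(
  V1    = rep(10, 6),
  V2    = 135344109:135344114,
  CaseA = c(0, 0, 0, 0, 0, 1),
  CaseC = c(0, 1, 0, 0, 0, 0),
  CaseG = c(1, 0, 1, 1, 1, 0),
  CaseT = c(0, 0, 0, 0, 0, 0),
  new   = rep(12, 6)
)
df$total  <- rowSums(df[, 3:6])
df[, 3:6] <- df[, 3:6] / df$total * df$new * 2
df$CaseG[1]  # 24: (1/1) * 12 * 2, as in the worked example
```

Dividing the four-column data frame by the length-6 `total` vector recycles element-wise down each column, which is exactly the row-wise division the formula asks for.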
[R] installation of package 'mapproj' had non-zero exit status
## Hello. I have asked a similar question before, but it was not fixed then.
## I am running the following under Ubuntu:
R version 2.13.1 (2011-07-08)
Copyright (C) 2011 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
Platform: x86_64-pc-linux-gnu (64-bit)

## When I do this:
install.packages("mapproj", dependencies = T)

## I get this:
Installing package(s) into '/home/brad/R/x86_64-pc-linux-gnu-library/2.13' (as 'lib' is unspecified)
also installing the dependency 'maps'
trying URL 'http://cran.case.edu/src/contrib/maps_2.1-6.tar.gz'
Content type 'application/x-gzip' length 1371854 bytes (1.3 Mb)
opened URL
downloaded 1.3 Mb
trying URL 'http://cran.case.edu/src/contrib/mapproj_1.1-8.3.tar.gz'
Content type 'application/x-gzip' length 23955 bytes (23 Kb)
opened URL
downloaded 23 Kb
* installing *source* package 'maps' ...
** libs
** arch -
gcc -std=gnu99 -O3 -pipe -g Gmake.c -o Gmake
Gmake.c: In function 'get_lh':
Gmake.c:111: warning: cast from pointer to integer of different size
Gmake.c:113: warning: cast from pointer to integer of different size
Gmake.c: In function 'main':
Gmake.c:211: warning: cast from pointer to integer of different size
Gmake.c:214: warning: cast from pointer to integer of different size
Gmake.c:217: warning: cast from pointer to integer of different size
Gmake.c:219: warning: cast from pointer to integer of different size
Gmake.c:221: warning: cast from pointer to integer of different size
Gmake.c:224: warning: cast from pointer to integer of different size
Gmake.c:227: warning: cast from pointer to integer of different size
gcc -std=gnu99 -O3 -pipe -g Lmake.c -o Lmake
Lmake.c: In function 'main':
Lmake.c:223: warning: cast from pointer to integer of different size
Lmake.c:228: warning: cast from pointer to integer of different size
Lmake.c:230: warning: cast from pointer to integer of different size
Lmake.c:232: warning: cast from pointer to integer of different size
Lmake.c:235: warning: cast from pointer to integer of different size
Converting world to world2
f convert.awk world.line world2.line
/bin/bash: f: command not found
make: [world2.line] Error 127 (ignored)
make county.L state.L usa.L nz.L world.L world2.L italy.L france.L
make[1]: Entering directory `/tmp/RtmpssTER5/R.INSTALL21eb6525/maps/src'
./Lmake 0 s b county.line county.linestats ../inst/mapdata/county.L
./Lmake 0 s b state.line state.linestats ../inst/mapdata/state.L
./Lmake 0 s b usa.line usa.linestats ../inst/mapdata/usa.L
./Lmake 0 s b nz.line nz.linestats ../inst/mapdata/nz.L
./Lmake 0 s b world.line world.linestats ../inst/mapdata/world.L
./Lmake 0 s b world2.line world2.linestats ../inst/mapdata/world2.L
Cannot read left and right at line 1
make[1]: *** [world2.L] Error 1
make[1]: Leaving directory `/tmp/RtmpssTER5/R.INSTALL21eb6525/maps/src'
make: *** [ldata] Error 2
ERROR: compilation failed for package 'maps'
* removing '/home/brad/R/x86_64-pc-linux-gnu-library/2.13/maps'
ERROR: dependency 'maps' is not available for package 'mapproj'
* removing '/home/brad/R/x86_64-pc-linux-gnu-library/2.13/mapproj'
The downloaded packages are in '/tmp/RtmpwXL9El/downloaded_packages'
Warning messages:
1: In install.packages("mapproj", dependencies = T) :
  installation of package 'maps' had non-zero exit status
2: In install.packages("mapproj", dependencies = T) :
  installation of package 'mapproj' had non-zero exit status

## Any idea as to why? This also happens when I try to install the 'maps' package.
-- View this message in context: http://r.789695.n4.nabble.com/installation-of-package-mapproj-had-non-zero-exit-status-tp3662940p3662940.html Sent from the R help mailing list archive at Nabble.com.
Re: [R] elimination duplicate elements sampling!
On 7/7/2011 3:23 PM, elephann wrote:

Hi everyone! I have a data frame with 1112 time series, and I am going to randomly sample r series, z times, to compose portfolios of different sizes (r-security portfolios). For r=2 and z=1, that is:

z = 1
A = seq(1:1112)
x1 = sample(A, z, replace = TRUE)
x2 = sample(A, z, replace = TRUE)
M = cbind(x1, x2)  # combination of 2 series

Because a portfolio with x1[i] = x2[i] (i = 1, 2, ..., z) is a 1-security portfolio, not a 2-security one, it should be eliminated and resampled. As r increases, say to r = k, how do I efficiently eliminate all portfolios with x1[i] = x2[i] = ... = xk[i]?

Why not sample the r securities without replacement, and replicate that z times?

z <- 1       # number of replicates
r <- 2       # number in each replicate
A <- 1:1112  # space to sample from
M <- t(replicate(z, sample(A, r)))

Besides, any r-security portfolio with the same combination of securities is the same portfolio (given the same weights, as here), e.g. M(x1[i], x5[i], x7[i], x1000[i]) and M(x5[i], x7[i], x1[i], x1000[i]) or M(x1[i], x7[i], x5[i], x1000[i]) are all the same. How do I efficiently eliminate these possibilities?

Do you mean you don't want any of the replicates to be the same? You can eliminate duplicates:

M <- t(replicate(z, sort(sample(A, r))))
M <- M[!duplicated(M), ]

Or you can create all possible portfolios of size r, and sample z from that without replacement, to do it in one pass:

cmb <- t(combn(A, r))
M <- cmb[sample(nrow(cmb), z), ]

Note this is not practical for r > 2. combn(A, r) is an array of size r by choose(length(A), r) (which is 2 x 617716 in this case). In fact, for r > 3, this won't even work with the 1112 sample space; for r = 3, it is already 3 x 228554920. But for the three-security case, the probability of getting a duplicate portfolio is small.
Better is to sample a few extra so that you still have enough after throwing out duplicates:

M <- t(replicate(1.01*z, sort(sample(A, r))))
M <- M[!duplicated(M), ][1:z, ]

The 1.01 multiplier may not be big enough; there is no multiplier that will guarantee you have z samples when you are done. However, the second line will throw an error if there are not z unique samples, so a shortfall is at least easy to detect.

-- View this message in context: http://r.789695.n4.nabble.com/elimination-duplicate-elements-sampling-tp3652791p3652791.html Sent from the R help mailing list archive at Nabble.com.

-- Brian S. Diggs, PhD
Senior Research Associate, Department of Surgery
Oregon Health & Science University
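The "sample a few extra" idea can be made deterministic with a small top-up loop. This is a sketch along those lines; the helper name is mine, not from the thread, and it assumes r >= 2:

```r
## Keep drawing r-element portfolios until z unique, order-free ones exist.
## Rows are sorted so that permutations of the same securities collide.
unique_portfolios <- function(A, r, z) {
  M <- matrix(integer(0), nrow = 0, ncol = r)
  while (nrow(M) < z) {
    extra <- t(replicate(z - nrow(M), sort(sample(A, r))))
    M <- rbind(M, extra)
    M <- M[!duplicated(M), , drop = FALSE]
  }
  M
}

set.seed(42)
M <- unique_portfolios(1:1112, r = 4, z = 100)
nrow(M)           # exactly 100 portfolios
anyDuplicated(M)  # 0: all rows distinct
```

Because duplicates are rare for a large sample space, the loop almost always finishes in one or two passes.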
[R] What's wrong with my code? (Edited version-added my data)
I've written code for one particular file, and now I want to generate the same kind of graphs and files for the rest of the similar data files. For example, a file 8.csv would look like this:

enc_callee inout o_duration type
A out 342 de
B in 234 de
C out 132 de
E in 111 de
A in 13 cf
H in 15.7 cf
G out 32 de
A out 32 cf
I in 14 de
K in 189 de
J out 34.1 cf
B in 98.7 de
H out 23 de
C out 43 cf
H in 567 cf
I out 12 de
E out 12 de
K out 12 cf
B in 1 cf
A out 29 de
D out 89 cf
J in 302 de
H in 12 cf
A in 153 cf
C out 233 de

My commands to deal with this simple file would be:

eight <- read.csv(file = "8.csv", header = TRUE, sep = ",")
eightout <- subset(eight, inout == "out" & o_duration > 0,
                   select = c(inout, enc_callee, o_duration))
f <- function(eightoutf) nrow(eightoutf)
eightnocalls <- ddply(eightout, .(enc_callee), f)
colnames(eightnocalls)[2] <- "nocalls"
eightout$nocalls <- eightnocalls$nocalls[match(eightout$enc_callee, eightnocalls$enc_callee)]
eightout = data.frame(eightout, time = c(1:nrow(eightout)))
plot(eightout$time, eightout$nocalls)
write.csv(eightout, "eightM.csv", row.names = FALSE)

And then R will produce eightM.csv like this:

   inout enc_callee o_duration nocalls time
1  out   A               342.0       3    1
3  out   C               132.0       3    2
7  out   G                32.0       1    3
8  out   A                32.0       3    4
11 out   J                34.1       1    5
13 out   H                23.0       1    6
14 out   C                43.0       3    7
16 out   I                12.0       1    8
17 out   E                12.0       1    9
18 out   K                12.0       1   10
20 out   A                29.0       3   11
21 out   D                89.0       1   12
25 out   C               233.0       3   13

I will also get a plot: http://r.789695.n4.nabble.com/file/n3662910/eightM.png

What I want to do now is that I have a few hundred similar files, and I want to generate the same type of plots and files, so I've written the following code. However, R states that there's some error, and I've tried editing it many times without success.
my.files <- list.files()
for (i in 1:length(my.files)) {
  temp.dat <- read.csv(my.files[i])
  eight <- read.csv(file = "8.csv", header = TRUE, sep = ",")
  eightout <- subset(eight, inout == "out" & o_duration > 0,
                     select = c(inout, enc_callee, o_duration))
  f <- function(eightoutf) nrow(eightoutf)
  eightnocalls <- ddply(eightout, .(enc_callee), f)
  colnames(eightnocalls)[2] <- "nocalls"
  eightout$nocalls <- eightnocalls$nocalls[match(eightout$enc_callee, eightnocalls$enc_callee)]
  eightout = data.frame(eightout, time = c(1:nrow(eightout)))
  plot(eightout$time, eightout$nocalls)
  write.csv(eightout, "eight.csv", row.names = FALSE)
  pdf(paste(Sys.Date(), "_", my.files[i], "_.pdf", sep = ""))
  plot(temp.dat$time, temp.dat$nocalls, main = my.files[i])
  dev.off()
  write.csv(temp.dat, paste(Sys.Date(), "_", my.files[i], "_.csv", sep = ""), row.names = FALSE)
}

R says:

need finite 'xlim' values
In addition: Warning messages:
1: In min(x) : no non-missing arguments to min; returning Inf
2: In max(x) : no non-missing arguments to max; returning -Inf
3: In min(x) : no non-missing arguments to min; returning Inf
4: In max(x) : no non-missing arguments to max; returning -Inf

I wonder what went wrong with my code. Please help me! Thank you very much!!

-- View this message in context: http://r.789695.n4.nabble.com/What-s-wrong-with-my-code-Edited-version-added-my-data-tp3662910p3662910.html Sent from the R help mailing list archive at Nabble.com.
[R] Explain how it gets back out?
Probability <- function(N, f, m, b, x, t) {
  ## N is the number of lymph nodes
  ## f is the fraction of dendritic cells (in the correct node) that have the antigen
  ## m is the number of time steps
  ## b is the starting position (somewhere in the node or somewhere in the gap
  ##   between nodes; it is a number between 1 and (x+t))
  ## x is the number of time steps it takes to traverse the gap
  ## t is the number of time steps it takes to traverse a node
  A <- 1/N
  B <- 1 - A
  C <- 1 - f
  D <- ((m + b - 1) %% (x + t)) + 1
  if (b <= t) {                                  # starts inside a node
    if (m <= (t - b)) { return(B + A*(C^m)) }    # start and end in first node
    if (D <= t) {                                # we finish in a node
      a <- (B + A*(C^(t - b)))                            # first node
      b <- ((B + A*(C^t))^(floor((m + b)/(x + t)) - 1))   # intermediate nodes (if any)
      c <- (B + A*(C^D))                                  # last node
      d <- (a*b*c)
      return(d)
    } else { Probability(N, f, (m - 1), b, x, t) }  ## finish in a gap
  } else {                                       ## starts outside a node
    if (m <= (x + t - b)) { return(1) }          # also end in the gap
    if (D <= t) {                                # end in a node
      b <- ((B + A*(C^t))^(floor(m/(x + t))))
      c <- (B + (A*(C^D)))
      d <- (b*c)
      return(d)
    } else { Probability(N, f, (m - 1), b, x, t) }  # outside node
  }
}

I have the above code and I know it works, but I need to explain what is going on, particularly with the recursion. Is it true that when each call finishes, it passes a quantity back to the generation above, until you return to the start of the chain, which then outputs the final result? If so, could someone explain it a bit more clearly? If not, how does the recursion work, and how does it finally output a value?

-- View this message in context: http://r.789695.n4.nabble.com/Explain-how-it-gets-back-out-tp3662928p3662928.html Sent from the R help mailing list archive at Nabble.com.
[R] Subsetting NaN values in localG()
Hi, I'm currently trying to calculate local Getis-Ord Gi* statistics for a 169x315 cell matrix of temperature values. Below is the code I currently have (diffc is the data vector I am removing NaN values from, and I am moving said values to diffD; -999 represents NaN values; id contains ID values for the cells I want to use in the calculation, which I already know to contain 25064 values):

counter = 1
diffD = array(0, 25064)
id = array(0, 25064)
for (i in 1:53235) {
  if (diffc[i] != -999) {
    diffD[counter] = diffc[i]
    id[counter] = i
    counter = counter + 1
  }
}
## Isolates values I want to use in the localG calculation
neigh = cell2nb(169, 315, type = 'queen')
neigh2 = subset.nb(neigh, 1:length(neigh) %in% id)
mylist = nb2listw(neigh2, style = "B")
stats = localG(diffD, mylist)

Unfortunately, when I get to the last line of the code, I receive the following error:

Error in matrix(0, nrow = nrow(x), ncol = ncol(x)) : invalid 'ncol' value (too large or NA)

I can't figure out what it is referring to, as I have verified that there are no NA values, and ncol should only be 1, as diffD and mylist are the same size (25064 data regions). My data works when I don't remove the cells with values of -999; however, it then returns some ridiculous Z-values (as expected). All I can think of is that I'm either using subset.nb() incorrectly or subset.nb() isn't returning a usable nb object for localG(). I'm basically trying to mimic ArcGIS' Hot Spot Analysis to locate cold and hot spots spatially in this code.

Thanks, Dan

-- View this message in context: http://r.789695.n4.nabble.com/Subsetting-NaN-values-in-localG-tp3662781p3662781.html Sent from the R help mailing list archive at Nabble.com.
Re: [R] as.numeric
It works well. Thanks so much. -- View this message in context: http://r.789695.n4.nabble.com/as-numeric-tp3661739p3662671.html Sent from the R help mailing list archive at Nabble.com.
[R] How to smoothen a geodata set in R
Hello, I'm new to this list, so sorry if my question or parts of it have already come up before. For my research in geostatistics, I am working with large sets of data in R (basically large matrices containing discrete x and y coordinates and a value for a certain parameter). These sets are obtained by kriging. The operation I'd like to perform is to smooth the output data set. I want to do it by adding each data point and its 8 surrounding points, dividing this by nine (which gives an average), and then replacing each element in the matrix with the result.

Question 1: is there a way to address the parameter value of a single element (for example, the value for element [x=452, y=682] inside the matrix) and perform an operation on it in R?
Question 2: is there a way to program a loop in R, so that the same operation can be performed on all elements inside the matrix?
Question 3: is it a problem if my data is geodata (made with the geoR library)?

-- View this message in context: http://r.789695.n4.nabble.com/How-to-smoothen-a-geodata-set-in-R-tp3662902p3662902.html Sent from the R help mailing list archive at Nabble.com.
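For questions 1 and 2: a single cell is addressed as m[row, col], and a double loop covers the whole matrix. Below is a minimal sketch of the nine-point average on a plain numeric matrix (random demo data; for a geoR kriging object you would first extract the predicted values into such a matrix, since the smoother itself only needs a matrix):

```r
## Nine-point (3x3) moving-average smoother for a plain matrix.
## Border cells are left unchanged in this sketch.
smooth9 <- function(m) {
  out <- m
  for (i in 2:(nrow(m) - 1)) {
    for (j in 2:(ncol(m) - 1)) {
      out[i, j] <- mean(m[(i - 1):(i + 1), (j - 1):(j + 1)])
    }
  }
  out
}

set.seed(1)
z  <- matrix(rnorm(25), 5, 5)  # demo data standing in for kriging output
zs <- smooth9(z)
zs[3, 3] - mean(z[2:4, 2:4])   # 0: centre cell is the average of its 3x3 block
```

Reading from the original matrix `m` while writing into the copy `out` matters: it keeps already-smoothed neighbours from leaking into later averages.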
Re: [R] What's wrong with my code? (Edited version-added my data)
Dear Susie,

See inline for some suggestions, but generally, I think you would benefit from breaking this down into smaller pieces. The error you are getting indicates that the problem has to do with the plotting, but that will be trickier to isolate while also dealing with reading in data, looping, etc.

On Tue, Jul 12, 2011 at 10:11 AM, Susie susiecrab_l...@hotmail.com wrote:

I've written code for one particular file, and now I want to generate the same kind of graphs and files for the rest of the similar data files. For example, a file 8.csv would look like this:

enc_callee inout o_duration type
A out 342 de
B in 234 de
C out 132 de
E in 111 de
A in 13 cf
H in 15.7 cf
G out 32 de
A out 32 cf
I in 14 de
K in 189 de
J out 34.1 cf
B in 98.7 de
H out 23 de
C out 43 cf
H in 567 cf
I out 12 de
E out 12 de
K out 12 cf
B in 1 cf
A out 29 de
D out 89 cf
J in 302 de
H in 12 cf
A in 153 cf
C out 233 de

My commands to deal with this simple file would be:

eight <- read.csv(file = "8.csv", header = TRUE, sep = ",")
eightout <- subset(eight, inout == "out" & o_duration > 0,
                   select = c(inout, enc_callee, o_duration))
f <- function(eightoutf) nrow(eightoutf)
eightnocalls <- ddply(eightout, .(enc_callee), f)
colnames(eightnocalls)[2] <- "nocalls"
eightout$nocalls <- eightnocalls$nocalls[match(eightout$enc_callee, eightnocalls$enc_callee)]
eightout = data.frame(eightout, time = c(1:nrow(eightout)))
plot(eightout$time, eightout$nocalls)
write.csv(eightout, "eightM.csv", row.names = FALSE)

And then R will produce eightM.csv like this:

   inout enc_callee o_duration nocalls time
1  out   A               342.0       3    1
3  out   C               132.0       3    2
7  out   G                32.0       1    3
8  out   A                32.0       3    4
11 out   J                34.1       1    5
13 out   H                23.0       1    6
14 out   C                43.0       3    7
16 out   I                12.0       1    8
17 out   E                12.0       1    9
18 out   K                12.0       1   10
20 out   A                29.0       3   11
21 out   D                89.0       1   12
25 out   C               233.0       3   13

I will also get a plot: http://r.789695.n4.nabble.com/file/n3662910/eightM.png

What I want to do now is that I have a few hundred similar files, and I want to generate the same type of plots and files, so I've written the
following code. However, R states that there's some error, and I've tried editing it many times without success.

my.files <- list.files()
for (i in 1:length(my.files)) {
  temp.dat <- read.csv(my.files[i])

Maybe I'm missing something, but starting here, I do not see anything that changes with each iteration of your loop. It will just keep reading in, editing, and writing out 8.csv over and over. If I'm right, then you should just move this part outside of the loop so it is done once:

  eight <- read.csv(file = "8.csv", header = TRUE, sep = ",")
  eightout <- subset(eight, inout == "out" & o_duration > 0,
                     select = c(inout, enc_callee, o_duration))
  f <- function(eightoutf) nrow(eightoutf)
  eightnocalls <- ddply(eightout, .(enc_callee), f)
  colnames(eightnocalls)[2] <- "nocalls"
  eightout$nocalls <- eightnocalls$nocalls[match(eightout$enc_callee, eightnocalls$enc_callee)]
  eightout = data.frame(eightout, time = c(1:nrow(eightout)))
  plot(eightout$time, eightout$nocalls)
  write.csv(eightout, "eight.csv", row.names = FALSE)

{end part that does not seem to change}

  pdf(paste(Sys.Date(), "_", my.files[i], "_.pdf", sep = ""))
  plot(temp.dat$time, temp.dat$nocalls, main = my.files[i])

From the error, my guess is that the problem is right here. Try looking at temp.dat$time and temp.dat$nocalls to see if the data are appropriate for plotting. Are any of the pdfs and files getting produced? If yes, this would strongly suggest that your code is working, but some of your data files are not plottable. Something else you could try would be to add str(temp.dat) right after you read in the data in your loop; this should print out the basic structure of the data and might give you some clues.
HTH,
Josh

  dev.off()
  write.csv(temp.dat, paste(Sys.Date(), "_", my.files[i], "_.csv", sep = ""), row.names = FALSE)
}

R says:

need finite 'xlim' values
In addition: Warning messages:
1: In min(x) : no non-missing arguments to min; returning Inf
2: In max(x) : no non-missing arguments to max; returning -Inf
3: In min(x) : no non-missing arguments to min; returning Inf
4: In max(x) : no non-missing arguments to max; returning -Inf
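Putting the suggestions above together, one way to restructure the loop is to move the per-file work into a function that can be tested on a single data frame. This is a sketch, not the poster's exact code: the empty-subset guard is an addition, and the ddply/match step is replaced by base R's ave() so no extra package is needed.

```r
## Per-file processing as a testable function. Column names follow the
## posted files; the guard against empty subsets is my addition.
process_calls <- function(dat) {
  out <- subset(dat, inout == "out" & o_duration > 0,
                select = c(inout, enc_callee, o_duration))
  if (nrow(out) == 0) return(NULL)  # nothing plottable in this file
  ## count outgoing calls per callee (base-R stand-in for ddply + match)
  out$nocalls <- ave(seq_len(nrow(out)), out$enc_callee, FUN = length)
  out$time <- seq_len(nrow(out))
  out
}

## The loop then only does I/O, skipping files with nothing to plot:
## for (f in list.files(pattern = "\\.csv$")) {
##   res <- process_calls(read.csv(f))
##   if (is.null(res)) next
##   pdf(paste(Sys.Date(), "_", f, ".pdf", sep = ""))
##   plot(res$time, res$nocalls, main = f)
##   dev.off()
##   write.csv(res, paste(Sys.Date(), "_", f, sep = ""), row.names = FALSE)
## }
```

An all-"in" file would make `res$time` and `res$nocalls` empty, which is exactly what produces the "need finite 'xlim' values" error; the NULL guard sidesteps it.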
[R] Deviance of zeroinfl/hurdle models
Dear list, I'm wondering if anyone can help me calculate the deviance of either a zeroinfl or hurdle model from package pscl? Even if someone could point me to the correct formula for calculating the deviance, I could do the rest on my own. I am trying to calculate a pseudo-R-squared measure based on the R^{2}_{DEV} of [1], so I need to be able to calculate the deviance of the full and null models. Does anyone have any suggestions? Alternatively, does anyone have a suggestion for a better measure to report (I'm aware that R^2 measures aren't really appropriate here), preferably something that is easy enough to program or compute using existing packages... Thanks in advance, Carson

[1] Cameron, A.C., Windmeijer, F.A.G., 1996. R^2 measures for count data regression models with applications to health-care utilization. J. Bus. Econom. Statist. 14, 209–220.

-- Carson J. Q. Farmer
ISSP Doctoral Fellow
National Centre for Geocomputation
National University of Ireland, Maynooth
http://www.carsonfarmer.com/
Re: [R] Explain how it gets back out?
On 12-Jul-11 17:18:26, mousy0815 wrote:

Probability <- function(N, f, m, b, x, t) {
  ## N is the number of lymph nodes
  ## f is the fraction of dendritic cells (in the correct node) that have the antigen
  ## m is the number of time steps
  ## b is the starting position (somewhere in the node or somewhere in the gap
  ##   between nodes; it is a number between 1 and (x+t))
  ## x is the number of time steps it takes to traverse the gap
  ## t is the number of time steps it takes to traverse a node
  A <- 1/N
  B <- 1 - A
  C <- 1 - f
  D <- ((m + b - 1) %% (x + t)) + 1
  if (b <= t) {                                  # starts inside a node
    if (m <= (t - b)) { return(B + A*(C^m)) }    # start and end in first node
    if (D <= t) {                                # we finish in a node
      a <- (B + A*(C^(t - b)))                            # first node
      b <- ((B + A*(C^t))^(floor((m + b)/(x + t)) - 1))   # intermediate nodes (if any)
      c <- (B + A*(C^D))                                  # last node
      d <- (a*b*c)
      return(d)
    } else { Probability(N, f, (m - 1), b, x, t) }  ## finish in a gap
  } else {                                       ## starts outside a node
    if (m <= (x + t - b)) { return(1) }          # also end in the gap
    if (D <= t) {                                # end in a node
      b <- ((B + A*(C^t))^(floor(m/(x + t))))
      c <- (B + (A*(C^D)))
      d <- (b*c)
      return(d)
    } else { Probability(N, f, (m - 1), b, x, t) }  # outside node
  }
}

I have the above code and I know it works, but I need to explain what is going on, particularly with the recursion. Is it true that when each call finishes, it passes a quantity back to the generation above, until you return to the start of the chain, which then outputs the final result? If so, could someone explain it a bit more clearly? If not, how does the recursion work, and how does it finally output a value?

This is a generic reply, rather than referring to your specific code above. The most succinct definition of recursion is in Ted's Dictionary:

*Recursion*: If you understand *Recursion*, then stop reading now and do something else. Otherwise, see *Recursion*.

(I have found that this goes down well in lectures.
It presupposes, however, that the reader will eventually catch on, so the definition is not suitable for the infinitely stupid -- which is perhaps a realistic assumption in a lecture context.)

The really important element in the above definition is the initial escape clause (which, by the above assumption, will eventually be reached). A proper recursive definition must include something which will eventually cause it to return a result to the level above. The structure of the process which occurs when a recursive function is called can be illustrated by a function to compute n! (the factorial of a positive integer n):

factorial <- function(n) {
  if (n == 0) return(1)              ## Escape clause
  else return( n * factorial(n - 1) )
}

So what happens when you call 'factorial(3)' is:

n==3, so !(n==0), so return(3*(
  n==2, so !(n==0), so return(2*(
    n==1, so !(n==0), so return(1*(
      n==0, so return(1)
    1*(1) = 1
  2*(1) = 2
3*(2) = 6
return(6)

Another way of looking at it is that each successive call opens another pending multiplication, descending until the escape clause is activated:

factorial(3)
= 3 * factorial(2)
= 3 * (2 * factorial(1))
= 3 * (2 * (1 * factorial(0)))   ## escape clause activated here

and then backing up through the levels completes each pending multiplication and passes the result up:

= 3 * (2 * (1 * 1))
= 3 * (2 * 1)
= 3 * 2
= 6

Hoping this helps!
Ted.

E-Mail: (Ted Harding) ted.hard...@wlandres.net
Fax-to-email: +44 (0)870 094 0861
Date: 12-Jul-11 Time: 20:02:47
-- XFMail --
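The descent-and-return structure can also be made visible at the console. The tracing below is my addition, not from the post: it is Ted's factorial with a depth counter, printing each call on the way down and each return on the way back up.

```r
## Instrumented factorial: indentation shows recursion depth, so the
## descending calls and the ascending returns are both visible.
fact_trace <- function(n, depth = 0) {
  pad <- paste(rep("  ", depth), collapse = "")
  cat(pad, "calling factorial(", n, ")\n", sep = "")
  res <- if (n == 0) 1 else n * fact_trace(n - 1, depth + 1)
  cat(pad, "returning ", res, "\n", sep = "")
  res
}

ans <- fact_trace(3)
ans  # 6, handed back through each pending multiplication
```

Every "returning" line after the innermost call shows a value being passed back to the generation above, which is exactly the behaviour the original question asks about.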
Re: [R] grey colored lines and overwriting labels in ggplot2
Merging two posts (data and questions); see inline below.

On 7/11/2011 7:55 PM, Sigrid wrote:

Thank you, Dennis. This is my regenerated dput code. It should be correct, as I closed R and re-ran it based on the dput output. NB, this is the `test` dataset used later:

test <- structure(list(year = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L),
treatment = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L,
5L, 5L, 5L, 6L, 6L, 6L, 7L, 7L, 7L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L,
4L, 4L, 4L, 5L, 5L, 5L, 6L, 6L, 6L, 7L, 7L, 7L, 1L, 1L, 1L, 2L, 2L, 2L,
3L, 3L, 3L, 4L, 4L, 4L, 5L, 5L, 5L, 6L, 6L, 6L, 7L, 7L, 7L, 1L, 1L, 1L,
2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L, 5L, 5L, 5L, 6L, 6L, 6L, 7L, 7L, 7L,
1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L, 5L, 5L, 5L, 6L, 6L, 6L,
7L, 7L, 7L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L, 5L, 5L, 5L,
6L, 6L, 6L, 7L, 7L, 7L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L, 5L,
5L, 5L, 6L, 6L, 6L, 7L, 7L, 7L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L,
4L, 4L, 5L, 5L, 5L, 6L, 6L, 6L, 7L, 7L, 7L),
.Label = c("A", "B", "C", "D", "E", "F", "G"), class = "factor"),
total = c(135L, 118L, 121L, 64L, 53L, 49L, 178L, 123L, 128L, 127L, 62L,
129L, 126L, 99L, 183L, 45L, 57L, 45L, 72L, 30L, 71L, 123L, 89L, 102L,
60L, 44L, 59L, 124L, 145L, 126L, 103L, 67L, 97L, 66L, 76L, 108L, 36L,
48L, 41L, 69L, 47L, 57L, 167L, 136L, 176L, 85L, 36L, 82L, 222L, 149L, 171L,
145L, 122L, 192L, 136L, 164L, 154L, 46L, 57L, 57L, 70L, 55L, 102L,
111L, 152L, 204L, 41L, 46L, 103L, 156L, 148L, 155L, 103L, 124L, 176L,
111L, 142L, 187L, 43L, 52L, 75L, 64L, 91L, 78L, 196L, 314L, 265L, 44L,
39L, 98L, 197L, 273L, 274L, 89L, 91L, 74L, 91L, 112L, 98L, 140L, 90L,
121L, 120L, 161L, 83L, 230L, 266L, 282L, 35L, 53L, 57L, 315L, 332L,
202L, 90L, 79L, 89L, 67L, 116L, 109L, 44L, 68L, 75L, 29L, 52L, 52L,
253L, 203L, 87L, 105L, 234L, 152L, 247L, 243L, 144L, 167L, 165L, 95L,
300L, 128L, 125L, 84L, 183L, 88L, 153L, 185L, 175L, 226L, 216L, 118L,
118L, 94L, 224L, 259L, 176L, 175L, 147L, 197L, 141L, 176L, 187L, 87L,
92L, 148L, 86L, 139L, 122L),
country = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L),
.Label = c("high", "low"), class = "factor")),
.Names = c("year", "treatment", "total", "country"),
class = "data.frame", row.names = c(NA, -167L))

I hope this is useful for you when giving me a hand with my difficulties.

On 7/9/2011 8:24 PM, Sigrid wrote:

I created this graph in ggplot and added ablines to the different facets by specifying them with subset commands. As you might see, there are still a few issues.

1.) I would like to have the diamonds in a grey scale instead of colors. I accomplished this (see graph 2) until I overwrote the label title for the treatments and the colors came back (graph 1).
I used these two commands:

p = ggplot(data = test, aes(x = YEAR, y = TOTAL, colour = TREATMENT)) +
  geom_point() +
  facet_wrap(~ country) +
  scale_colour_grey() +
  scale_y_continuous("number of votes") +
  scale_x_continuous("Years") +
  scale_x_continuous(breaks = 1:4) +
  scale_colour_hue(breaks = 'A', labels = 'label A') +
  scale_colour_hue(breaks = 'B', labels = 'label B')

How can I keep the grey scale, but avoid changing back to colors when using the scale_colour_hue command?

You should only have one scale_ call for each scale type. Here, you have three scale_colour_ calls: the first selecting a grey scale, the second defining a single break with its label (and thus implicitly subsetting on that single break value), and a third which defines a different break/label/subset. Only the last one has any effect.

http://r.789695.n4.nabble.com/file/n3657119/color_graph.gif

2.) Furthermore, only one of the overwritten labels of the treatments came up, despite putting in two commands (graph 1). What could have happened here?

p +
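Following the one-scale-per-type advice, a single combined colour scale keeps both the grey palette and the custom labels. This is a sketch, not the poster's final code: the stand-in data frame and the label strings are illustrative, and the lowercase column names follow the dput structure rather than the YEAR/TOTAL spelling in the question.

```r
library(ggplot2)

## Minimal stand-in for the 'test' data frame (illustrative values only).
test <- data.frame(year      = rep(1:4, each = 2),
                   total     = c(135, 64, 123, 60, 167, 85, 253, 105),
                   treatment = factor(rep(c("A", "B"), 4)),
                   country   = factor(rep(c("low", "high"), each = 4)))

## ONE colour scale: scale_colour_grey() takes name and labels directly,
## so relabelling no longer reverts the palette to hue colours.
p <- ggplot(test, aes(x = year, y = total, colour = treatment)) +
  geom_point() +
  facet_wrap(~ country) +
  scale_colour_grey(name = "Treatment", labels = c("label A", "label B")) +
  scale_x_continuous("Years", breaks = 1:4) +
  scale_y_continuous("number of votes")
```

The axis-title and breaks arguments can likewise be merged into one scale_x_continuous() call, for the same reason.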
[R] Generating a histogram with R
Hello, I have a sample file:

chr22 100 150 125 21 0.145 +
chr22 200 300 212 13 0.05  +
chr22 345 365 351 12 0.09  +
chr22 500 750 510 15 0.10  +
chr22 500 750 642  9 0.02  +
chr22 800 900 850 10 0.05  +

where I need to generate a histogram from the data in column 6 (i.e. 0.145, 0.05, etc.). To make it easier to read, I would plot the data as 1-0.05=0.95 for all of the data in column 6. What I would like to know is how to generate a histogram with the data from one file. Also, would I be able to generate one histogram from multiple files as well (with the same format)? For example, I have multiple files in the same format as the sample file above, and I would like to make one histogram for all column-six data in all files.

Thank you, a217

-- View this message in context: http://r.789695.n4.nabble.com/Generating-a-histogram-with-R-tp3663350p3663350.html Sent from the R help mailing list archive at Nabble.com.
Re: [R] Generating a histogram with R
Hello: R has an extensive Help system. Please learn to use it.

?histogram
?help

Also see the online manual/tutorial "An Introduction to R".

-- Bert

On Tue, Jul 12, 2011 at 12:41 PM, a217 aj...@case.edu wrote:

Hello, I have a sample file:

chr22 100 150 125 21 0.145 +
chr22 200 300 212 13 0.05  +
chr22 345 365 351 12 0.09  +
chr22 500 750 510 15 0.10  +
chr22 500 750 642  9 0.02  +
chr22 800 900 850 10 0.05  +

where I need to generate a histogram from the data in column 6 (i.e. 0.145, 0.05, etc.). To make it easier to read, I would plot the data as 1-0.05=0.95 for all of the data in column 6. What I would like to know is how to generate a histogram with the data from one file. Also, would I be able to generate one histogram from multiple files as well (with the same format)? For example, I have multiple files in the same format as the sample file above, and I would like to make one histogram for all column-six data in all files.

Thank you, a217

-- View this message in context: http://r.789695.n4.nabble.com/Generating-a-histogram-with-R-tp3663350p3663350.html Sent from the R help mailing list archive at Nabble.com.

-- "Men by nature long to get on to the ultimate truths, and will often be impatient with elementary studies or fight shy of them. If it were possible to reach the ultimate truths without the elementary studies usually prefixed to them, these would not be preparatory studies but superfluous diversions." -- Maimonides (1135-1204)

Bert Gunter
Genentech Nonclinical Biostatistics
Re: [R] Generating a histogram with R
On Jul 12, 2011, at 3:41 PM, a217 wrote:

> Hello, I have a sample file: [data snipped] where I need to generate a histogram from the data in column 6 (i.e. 0.145, 0.05, etc.). To make it easier to read, I would plot the data as 1-0.05=0.95 for all of the data in column 6.

That makes no sense to me, unless you want to pre-multiply all values by 0.95.

> What I would like to know is how to generate a histogram with the data from one file? Also, would I be able to generate one histogram from multiple files as well (with the same format)?

?hist
?histogram  # lattice

There are a ton of worked examples in the Archives. Learn to search. Reasonable search terms once you get to Baron's site with RSiteSearch (after setting the web interface to get r-help postings): "grouped histogram".

> For example, I have multiple files in the same format as the sample file above, and I would like to make one histogram for all column six data in all files.

There are also a ton of worked examples in the archives dealing with accessing multiple files.

David Winsemius, MD
West Hartford, CT
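Following the pointers above, here is a minimal self-contained sketch of both requests (one file, then several pooled). The file paths are temporary files written from the poster's sample rows so the example runs anywhere; with real data, substitute your own paths.

```r
# Write the poster's sample rows to two temporary files so the
# example is self-contained; replace `files` with your own paths.
rows <- c("chr22 100 150 125 21 0.145 +",
          "chr22 200 300 212 13 0.05 +",
          "chr22 345 365 351 12 0.09 +")
files <- replicate(2, tempfile(fileext = ".txt"))
for (f in files) writeLines(rows, f)

# One file: after read.table() on a headerless file, column 6 is V6
d <- read.table(files[1], header = FALSE)
hist(d$V6, main = "Column 6, one file", xlab = "value")

# Multiple files with the same layout, pooled into a single histogram
v6 <- unlist(lapply(files, function(f) read.table(f, header = FALSE)$V6))
hist(v6, main = "Column 6, all files", xlab = "value")
```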
Re: [R] Deviance of zeroinfl/hurdle models
Carson Farmer carson.farmer at gmail.com writes:

> Dear list, I'm wondering if anyone can help me calculate the deviance of either a zeroinfl or hurdle model from package pscl? Even if someone could point me to the correct formula for calculating the deviance, I could do the rest on my own.

What about

library(pscl)
example(hurdle)
-2*logLik(fm_hnb2)

?
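A sketch of the suggestion above, fitting a hurdle model directly rather than via example(): the deviance is taken as -2 times the log-likelihood, which is the usual definition up to an additive constant that cancels when comparing nested models. This assumes the pscl package (CRAN) and its bundled bioChemists data; the formula is illustrative only.

```r
# Deviance of a hurdle model as -2 * log-likelihood (up to a constant).
# Assumes the pscl package is installed; bioChemists ships with pscl.
library(pscl)
data("bioChemists", package = "pscl")

# Illustrative model: article counts on PhD prestige and mentor output
fm  <- hurdle(art ~ phd + ment, data = bioChemists, dist = "negbin")
dev <- -2 * as.numeric(logLik(fm))
dev
```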
[R] when to use `which'?
when do I need to use which()?

> a <- c(1,2,3,4,5,6)
> a
[1] 1 2 3 4 5 6
> a[a==4]
[1] 4
> a[which(a==4)]
[1] 4
> which(a==4)
[1] 4
> a[which(a>2)]
[1] 3 4 5 6
> a[a>2]
[1] 3 4 5 6

seems unnecessary...

--
Sam Steingold (http://sds.podval.org/) on CentOS release 5.6 (Final) X 11.0.60900031
http://jihadwatch.org http://palestinefacts.org http://mideasttruth.com http://truepeace.org http://thereligionofpeace.com
Good programmers treat Microsoft products as damage and route around it.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] How to smoothen a geodata set in R
To answer your questions: yes, yes, and probably no. You will have to pick up an introductory manual of R where questions 1 and 2 are discussed. For 1: you index x as in x[452, 682]. For 2: there are ways to write (and avoid) loops in R (e.g. for or while loops). Often avoidance is preferable, because R is not very fast at looping. For 3: I cannot say for sure, but as long as your data is in matrix format, or something that can be coerced to it, no.

Daniel

Tariq wrote:
> Hello, I'm new to this list. Sorry if my question or parts of it already came up before. For my research in geostatistics, I am working with large sets of data in R (basically large matrices containing discrete x and y coordinates and a value for a certain parameter). These sets are obtained by kriging. The operation I'd like to perform is to smooth the output data set. I want to do it by adding each data point and its 8 surrounding points and dividing by nine (giving an average), and then replacing each element in the matrix with the result. Question 1: is there a way to address the parameter value of a single element (for example, the value for element [x=452, y=682] inside the matrix) and perform an operation on it in R? Question 2: is there a way to program R into a loop, so that the same operation can be performed on all elements inside the matrix? Question 3: is it a problem if my data is geodata (made with the geoR library)?
Re: [R] when to use `which'?
Well ...

which(a==4)^2

?? -- Bert

On Tue, Jul 12, 2011 at 1:17 PM, Sam Steingold s...@gnu.org wrote:
[original message snipped]

Bert Gunter
Genentech Nonclinical Biostatistics
Re: [R] when to use `which'?
On Jul 12, 2011, at 4:17 PM, Sam Steingold wrote:
[original message snipped]

It is unnecessary when `a` is a toy case and has no NAs. And you will find some of the cognoscenti trying to correct you when you do use which().

> a <- c(1, 2, NA, 3, 4, NaN, 5, 6)
> data.frame(lets = letters[1:8], stringsAsFactors = FALSE)[a > 0, ]
[1] a b NA d e NA g h
> data.frame(lets = letters[1:8], stringsAsFactors = FALSE)[which(a > 0), ]
[1] a b d e g h

If you have millions of records and tens of thousands of NAs (say ~1% of the data), imagine what your console looks like when you try to pick out records from one day and get 10,000 where you were expecting 100. A real PITA when you are doing real work.

--
David Winsemius, MD
West Hartford, CT
Re: [R] when to use `which'?
On Tue, Jul 12, 2011 at 1:17 PM, Sam Steingold s...@gnu.org wrote:
> when do I need to use which()?

See ?which. For examples, try: example(which)

> seems unnecessary...

Yes, it can be used as a redundant wrapper, as you have demonstrated in your examples. In those cases it is most certainly unnecessary.

--
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
https://joshuawiley.com/
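The two situations raised in this thread where which() does earn its keep can be condensed into a few runnable lines (no assumptions beyond base R):

```r
a <- c(1, 2, NA, 3)

# 1. Logical indexing propagates NA slots; which() drops them,
#    because which(x) returns only the indices where x is TRUE.
a[a > 2]          # NA 3
a[which(a > 2)]   # 3

# 2. You need the positions themselves, e.g. to do arithmetic
#    on the indices (cf. which(a==4)^2 earlier in the thread).
which(a > 2)      # 4
```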
Re: [R] simple save question
Here is a worked example. Can you point out to me where in temp rmean is stored? Thanks. Tom

> library(survival)
> library(ISwR)
> dat.s <- Surv(melanom$days, melanom$status == 1)
> fit <- survfit(dat.s ~ 1)
> plot(fit)
> summary(fit)
Call: survfit(formula = dat.s ~ 1)

 time n.risk n.event survival std.err lower 95% CI upper 95% CI
  185    201       1    0.995 0.00496        0.985        1.000
  204    200       1    0.990 0.00700        0.976        1.000
  210    199       1    0.985 0.00855        0.968        1.000
  232    198       1    0.980 0.00985        0.961        1.000
  279    196       1    0.975 0.01100        0.954        0.997
 [... 50 further rows omitted ...]
 3042     52       1    0.664 0.03994        0.590        0.747
 3338     35       1    0.645 0.04307        0.566        0.735

> print(fit, print.rmean=TRUE)
Call: survfit(formula = dat.s ~ 1)

   records      n.max    n.start     events     *rmean *se(rmean)     median
       205        205        205         57       4125        161         NA
   0.95LCL    0.95UCL
        NA         NA
    * restricted mean with upper limit = 5565

> temp <- summary(fit)
> str(temp)
List of 12
 $ surv    : num [1:57] 0.995 0.99 0.985 0.98 0.975 ...
 $ time    : num [1:57] 185 204 210 232 279 295 386 426 469 529 ...
 $ n.risk  : num [1:57] 201 200 199 198 196 195 193 192 191 189 ...
 $ n.event : num [1:57] 1 1 1 1 1 1 1 1 1 1 ...
 $ conf.int: num 0.95
 $ type    : chr "right"
 $ table   : Named num [1:7] 205 205 205 57 NA NA NA
  ..- attr(*, "names")= chr [1:7] "records" "n.max" "n.start" "events" ...
 $ n.censor: num [1:57] 0 0 0 1 0 0 0 0 0 0 ...
 $ std.err : num [1:57] 0.00496 0.007 0.00855 0.00985 0.011 ...
 $ lower   : num [1:57] 0.985 0.976 0.968 0.961 0.954 ...
 $ upper   : num [1:57] 1 1 1 1 0.997 ...
 $ call    : language survfit(formula = dat.s ~ 1)
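A sketch of where the restricted mean can be found programmatically: it is not stored in the default summary(fit) object at all; you have to ask for it. In recent versions of the survival package, summary() (like print()) accepts an rmean argument, after which the value appears in the $table component. This uses survival's bundled lung data rather than ISwR, and assumes a reasonably recent survival; the exact table names can vary across versions, hence the grep.

```r
# Extract the restricted mean from a survfit object.
# Assumption: a recent survival package where summary.survfit
# has an `rmean` argument ("none", "common", "individual", or a number).
library(survival)
fit <- survfit(Surv(time, status) ~ 1, data = lung)
tab <- summary(fit, rmean = "common")$table
tab[grep("rmean", names(tab))]   # restricted mean and its standard error
```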
[R] Question re complex survey design and cure models
Hello all, I am using AddHealth data to fit a cure, aka split-population, model using nltm. I am not sure how to account for the complex survey design - does anyone have any suggestions? Any help would be greatly appreciated! Sincerely, Sam