Re: [R] data manipulation question

2021-08-23 Thread Jim Lemon
Hi Kai,
How about setting:

germlinepatients$DisclosureStatus <- NA

then having your three conditional statements as indices:

germlinepatients$DisclosureStatus[germlinepatients$gl_resultsdisclosed == 1] <- "DISCLOSED"
germlinepatients$DisclosureStatus[germlinepatients$gl_resultsdisclosed == 0] <- "ATTEMPTED"
germlinepatients$DisclosureStatus[is.na(germlinepatients$gl_resultsdisclosed) &
  germlinepatients$gl_discloseattempt1 != ""] <- "ATTEMPTED"

I know it's not elegant and you could join the last two statements
with OR (|) but it may work.
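
A minimal sketch of why the nested ifelse() returns NA while indexed assignment does not. The toy data frame is hypothetical: the column names come from the thread, the values are made up. It also assumes that "not missing" for gl_discloseattempt1 means neither NA nor an empty string.

```r
# Hypothetical toy data; column names follow the thread, values are invented.
germlinepatients <- data.frame(
  gl_resultsdisclosed = c(1, 0, NA, NA, NA),
  gl_discloseattempt1 = c("", "2021-01-05", "2021-02-10", "", NA),
  stringsAsFactors = FALSE
)

# Nested ifelse(): wherever gl_resultsdisclosed is NA the first test is NA,
# and ifelse() returns NA for an NA test, so the is.na() branch is never
# consulted for those rows.
nested <- with(germlinepatients,
  ifelse(gl_resultsdisclosed == 1, "DISCLOSED",
    ifelse(gl_resultsdisclosed == 0, "ATTEMPTED",
      ifelse(is.na(gl_resultsdisclosed) & gl_discloseattempt1 != "",
             "ATTEMPTED", NA))))

# Indexed assignment: start from NA and fill in each case separately.
# which() drops NA comparisons, so only rows that are definitely TRUE change.
disclosed <- germlinepatients$gl_resultsdisclosed
attempt   <- germlinepatients$gl_discloseattempt1
status    <- rep(NA_character_, nrow(germlinepatients))
status[which(disclosed == 1)] <- "DISCLOSED"
status[which(disclosed == 0)] <- "ATTEMPTED"
status[which(is.na(disclosed) & !is.na(attempt) & attempt != "")] <- "ATTEMPTED"
germlinepatients$DisclosureStatus <- status

print(data.frame(nested = nested, indexed = status))
```

In this toy case the third row is the one that differs: nested is NA there, while the indexed version gives "ATTEMPTED".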

Jim

On Tue, Aug 24, 2021 at 9:22 AM Kai Yang via R-help wrote:
>
> Hello List,
> I wrote the script below to assign a value to a new field, DisclosureStatus.
> My goal is: if gl_resultsdisclosed = 1 then DisclosureStatus = DISCLOSED
> else if gl_resultsdisclosed = 0 then DisclosureStatus = ATTEMPTED
> else if gl_resultsdisclosed is missing and gl_discloseattempt1 is not missing
> then DisclosureStatus = ATTEMPTED
> else missing
>
>
> germlinepatients$DisclosureStatus <-
>   ifelse(germlinepatients$gl_resultsdisclosed == 1, "DISCLOSED",
>     ifelse(germlinepatients$gl_resultsdisclosed == 0, "ATTEMPTED",
>       ifelse(is.na(germlinepatients$gl_resultsdisclosed) &
>              germlinepatients$gl_discloseattempt1 != '', "ATTEMPTED",
>         NA)))
>
> The first two conditions give me the right result, but the third does not. After
> checking the data, there are 23 cases where gl_resultsdisclosed is missing and
> gl_discloseattempt1 is not missing. The code doesn't give any error message.
> Please help.
> Thank you
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



Re: [R] data manipulation

2016-12-19 Thread Duncan Mackay
, 210.8, 203.6, 175.2, 
168.7, 155.9, 147.3, 137, 141.1, 167.4, 160.2, 191.9, 174.4, 
208.2, 159.4, 161.1, 172.1, 158.4, 114.6, 159.6, 159.7, 159.4, 
160.7, 165.5, 205, 205.2, 141.6, 148.1, 184.9, 132.5, 137.3, 
135.5, 121.7, 166.1, 146.8, 162.8, 186.8, 185.5, 151.5, 158.1, 
143, 151.2, 147.6, 130.7, 137.5, 146.1, 133.6, 167.9, 181.9, 
202, 166.5, 151.3, 146.2, 148.3, 144.7, 123.6, 151.6, 133.9, 
137.4, 181.6, 182, 190, 161.2, 155.5, 141.9, 164.6, 136.2, 126.8, 
152.5, 126.6, 150.1, 186.3, 147.5, 200.4, 177.2, 127.4, 177.1, 
154.4, 135.2, 126.4, 147.3, 140.6, 152.3, 151.2, 172.2, 215.3, 
154.1, 159.3, 160.4, 151.9, 148.4, 139.6, 148.2, 153.5, 145.1, 
183.7, 210.5, 203.3, 153.3, 144.3, 169.6, 143.7, 160, 135.5, 
141.7, 159.9, 145.6, 183.4, 198.1, 186.7, 171.9, 150.5, 163, 
153.6, 152.8, 135.4, 148.3, 148.3, 133.5, 193.8, 208.4, 197), 
elec = c(1497L, 1463L, 1648L, 1595L, 1777L, 1824L, 1994L, 
1835L, 1787L, 1699L, 1633L, 1645L, 1597L, 1577L, 1709L, 1756L, 
1936L, 2052L, 2105L, 2016L, 1914L, 1925L, 1824L, 1765L, 1721L, 
1752L, 1914L, 1857L, 2159L, 2195L, 2287L, 2276L, 2096L, 2055L, 
2004L, 1924L, 1851L, 1839L, 2019L, 1937L, 2270L, 2251L, 2382L, 
2364L, 2129L, 2110L, 2072L, 1980L, 1995L, 1932L, 2171L, 2162L, 
2489L, 2424L, 2641L, 2630L, 2324L, 2412L, 2284L, 2186L, 2184L, 
2144L, 2379L, 2383L, 2717L, 2774L, 3051L, 2891L, 2613L, 2600L, 
2493L, 2410L, 2390L, 2463L, 2616L, 2734L, 2970L, 3125L, 3342L, 
3207L, 2964L, 2919L, 2764L, 2732L, 2622L, 2698L, 2950L, 2895L, 
3200L, 3408L, 3679L, 3473L, 3154L, 3107L, 3052L, 2918L, 2786L, 
2739L, 3125L, 3033L, 3486L, 3661L, 3927L, 3851L, 3456L, 3390L, 
3280L, 3166L, 3080L, 3069L, 3340L, 3310L, 3798L, 3883L, 4191L, 
4213L, 3766L, 3628L, 3520L, 3322L, 3250L, 3287L, 3552L, 3440L, 
4153L, 4265L, 4655L, 4492L, 4051L, 3967L, 3807L, 3639L, 3647L, 
3560L, 3929L, 3858L, 4485L, 4697L, 4977L, 4675L, 4596L, 4491L, 
4127L, 4144L, 4014L, 3994L, 4320L, 4400L, 5002L, 5091L, 5471L, 
5193L, 4997L, 4737L, 4546L, 4498L, 4350L, 4206L, 4743L, 4582L, 
5191L, 5457L, 5891L, 5618L, 5158L, 5030L, 4800L, 4654L, 4453L, 
4440L, 4945L, 4788L, 5425L, 5706L, 6061L, 5846L, 5242L, 5408L, 
5114L, 5042L, 5008L, 4657L, 5359L, 5193L, 5891L, 5980L, 6390L, 
6366L, 5756L, 5640L, 5429L, 5398L, 5413L, 5141L, 5695L, 5554L, 
6369L, 6592L, 7107L, 6917L, 6353L, 6205L, 5830L, 5646L, 5379L, 
5489L, 5824L, 5907L, 6482L, 6795L, 7028L, 6776L, 6274L, 6362L, 
5940L, 5958L, 5769L, 5887L, 6367L, 6165L, 6868L, 7201L, 7601L, 
7581L, 7090L, 6841L, 6408L, 6435L, 6176L, 6138L, 6717L, 6470L, 
7312L, 7763L, 8171L, 7788L, 7311L, 6679L, 6704L, 6724L, 6552L, 
6427L, 7105L, 6869L, 7683L, 8082L, 8555L, 8386L, 7553L, 7398L, 
7112L, 6886L, 7077L, 6820L, 7426L, 7143L, 8261L, 8240L, 8977L, 
8991L, 8026L, 7911L, 7510L, 7381L, 7366L, 7414L, 7824L, 7524L, 
8279L, 8707L, 9486L, 8973L, 8231L, 8206L, 7927L, 7999L, 7834L, 
7521L, 8284L, 7999L, 8940L, 9381L, 10078L, 9796L, 8471L, 
8572L, 8150L, 8168L, 8166L, 7903L, 8606L, 8071L, 9178L, 9873L, 
10476L, 9296L, 8818L, 8697L, 8381L, 8293L, 7942L, 8001L, 
8744L, 8397L, 9115L, 9773L, 10358L, 9849L, 9083L, 9143L, 
8800L, 8741L, 8492L, 8795L, 9354L, 8796L, 10072L, 10174L, 
11326L, 10744L, 9806L, 9740L, 9373L, 9244L, 9407L, 8827L, 
9880L, 9364L, 10580L, 10899L, 11687L, 11280L, 10208L, 10212L, 
9725L, 9721L, 9846L, 9407L, 10265L, 9970L, 10801L, 11246L, 
12167L, 11578L, 10645L, 10613L, 10104L, 10348L, 10263L, 9973L, 
10803L, 10409L, 11458L, 11845L, 12559L, 12070L, 11221L, 11338L, 
10761L, 11012L, 10923L, 10790L, 11427L, 10788L, 11772L, 12104L, 
12634L, 12772L, 11764L, 11956L, 11646L, 11750L, 11485L, 11198L, 
12265L, 11704L, 12419L, 13259L, 13945L, 13839L, 12387L, 12546L, 
12038L, 11977L, 12336L, 11793L, 12877L, 11923L, 13306L, 13988L, 
14002L, 14338L, 12867L, 12761L, 12449L, 12658L)), .Names = c("choc", 
"beer", "elec"), class = "data.frame", row.names = c(NA, -396L
))
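
For readers unfamiliar with the technique used above, here is a minimal sketch of the dput() round trip. The three-row data frame is only a stand-in with values copied from this thread; the object names are hypothetical.

```r
# Three-row stand-in for the full data frame (which has 396 rows).
cbe_small <- data.frame(choc = c(4650L, 4446L, 3061L),
                        beer = c(165.6, 198.6, 201.5),
                        elec = c(6679L, 6704L, 6724L))

# dput() writes R code that recreates the object; capture it as plain text,
# which is what gets pasted into the body of an e-mail.
txt <- paste(capture.output(dput(cbe_small)), collapse = "\n")
cat(txt, "\n")

# The recipient pastes the text into R; parsing and evaluating it
# rebuilds an equivalent object.
rebuilt <- eval(parse(text = txt))
stopifnot(isTRUE(all.equal(rebuilt, cbe_small)))
```

Because the result is plain text in the message body, it does not depend on the list server accepting attachments.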

Regards

Duncan


Re: [R] data manipulation

2016-12-18 Thread David Winsemius

> On Dec 18, 2016, at 3:31 PM, peter dalgaard <pda...@gmail.com> wrote:
> 
> 
>> On 18 Dec 2016, at 19:36 , David Winsemius <dwinsem...@comcast.net> wrote:
>> 
>>> 
>>> On Dec 17, 2016, at 7:57 PM, Duncan Mackay <dulca...@bigpond.com> wrote:
>>> 
>>> Hi 
>>> 
>>> Coming late to the discussion  - I deleted the original message
>>> I found that I have a cbe.dat that I downloaded some years ago from
>>> cowpertwaite's site .
>>> 
>>> And have attached it
>> 
>> Experience has shown that when you attach a file that you hope to be 
>> distributed to the list it needs to have a .txt extension. Leaving it with a 
>> .csv, .tsv, or .dat extension will cause it to be dropped by the server, 
>> even if the contents of the file are ASCII text.
> 
> That's not the actual mechanism, as far as I understand. If the content-type 
> is text/plain, the server will pass it through just fine. It's the mail 
> program at the sender end that refuses to send .csv files and friends as 
> text/plain, based on the extension. But the net result is essentially the 
> same.

If this explanation is correct, my hypothesis is that all mail clients in
common use fail to label .csv, .dat and .tsv files as text/plain. I've been
watching the "behavior" of our mail-server for several years now and have not
seen _any_ files with an extension other than .txt be successfully passed to
subscribers. So clarifying this issue might be possible _if_ we could find a
mail client for which we were certain of its labeling characteristics. Does
your mail client label such files as text/plain?


>> The URL I offered earlier should have made the file available:
>> https://web.archive.org/web/20130501161812/http://staff.elena.aut.ac.nz/Paul-Cowpertwait/ts/cbe.dat
>> 
>> ... but if there is interest in having it in the Rhelp Archive, I can attach 
>> it.
>> 
>> 
> 
> Would be good if someone could ask the authors about what is going on. Better 
> if someone could check licencing issues and put the data somewhere permanent. 
> Still better: convert them to an R package.

It appeared to me that the authors lost their entire university-hosted website. 
I was unable to find another email address for sending a query to Paul 
Cowpertwait.


-- 
David. 



Re: [R] data manipulation

2016-12-18 Thread David Winsemius

> On Dec 18, 2016, at 5:39 PM, Duncan Mackay <dulca...@bigpond.com> wrote:
> 
> 
> Hi David
> 
> Thanks for the info. 
> As a test I am attaching it anyway

Nothing allowed as an attachment by the server.

--
David.

Re: [R] data manipulation

2016-12-18 Thread Duncan Mackay

Hi David

Thanks for the info. 
As a test I am attaching it anyway

Regards

Duncan


Re: [R] data manipulation

2016-12-18 Thread peter dalgaard

> On 18 Dec 2016, at 19:36 , David Winsemius <dwinsem...@comcast.net> wrote:
> 
>> 
>> On Dec 17, 2016, at 7:57 PM, Duncan Mackay <dulca...@bigpond.com> wrote:
>> 
>> Hi 
>> 
>> Coming late to the discussion  - I deleted the original message
>> I found that I have a cbe.dat that I downloaded some years ago from
>> cowpertwaite's site .
>> 
>> And have attached it
> 
> Experience has shown that when you attach a file that you hope to be 
> distributed to the list it needs to have a .txt extension. Leaving it with a 
> .csv, .tsv, or .dat extension will cause it to be dropped by the server, even 
> if the contents of the file are ASCII text.

That's not the actual mechanism, as far as I understand. If the content-type is 
text/plain, the server will pass it through just fine. It's the mail program at 
the sender end that refuses to send .csv files and friends as text/plain, based 
on the extension. But the net result is essentially the same.

> The URL I offered earlier should have made the file available:
> https://web.archive.org/web/20130501161812/http://staff.elena.aut.ac.nz/Paul-Cowpertwait/ts/cbe.dat
> 
> ... but if there is interest in having it in the Rhelp Archive, I can attach 
> it.
> 
> 

Would be good if someone could ask the authors about what is going on. Better 
if someone could check licencing issues and put the data somewhere permanent. 
Still better: convert them to an R package.


Re: [R] data manipulation

2016-12-18 Thread David Winsemius
4650   165.6   6679
4446   198.6   6704
3061   201.5   6724
2155   170.7   6552
4274   164.4   6427
4695   179.7   7105
4362   157   6869
4889   168   7683
5370   139.3   8082
5072   138.6   8555
4985   153.4   8386
3978   138.9   7553
4139   172.1   7398
3995   198.4   7112
3025   217.8   6886
1949   173.7   7077
4357   153.8   6820
4638   175.6   7426
3994   147.1   7143
6174   160.3   8261
5656   135.2   8240
4411   148.8   8977
5504   151   8991
4463   148.2   8026
4458   182.2   7911
4528   189.2   7510
2830   183.1   7381
1843   170   7366
5042   158.4   7414
5348   176.1   7824
5257   156.2   7524
6699   153.2   8279
5388   117.9   8707
6001   149.8   9486
5966   156.6   8973
4845   166.7   8231
4507   156.8   8206
4214   158.6   7927
3460   210.8   7999
1833   203.6   7834
4978   175.2   7521
6464   168.7   8284
5820   155.9   7999
6447   147.3   8940
6191   137   9381
6628   141.1   10078
5452   167.4   9796
5295   160.2   8471
5080   191.9   8572
5564   174.4   8150
3965   208.2   8168
2062   159.4   8166
5099   161.1   7903
6162   172.1   8606
5529   158.4   8071
6416   114.6   9178
6382   159.6   9873
5624   159.7   10476
5785   159.4   9296
4644   160.7   8818
5331   165.5   8697
5143   205   8381
4596   205.2   8293
2180   141.6   7942
5786   148.1   8001
5840   184.9   8744
5666   132.5   8397
6360   137.3   9115
6219   135.5   9773
6082   121.7   10358
5653   166.1   9849
5726   146.8   9083
5049   162.8   9143
5859   186.8   8800
4091   185.5   8741
2167   151.5   8492
6480   158.1   8795
7375   143   9354
6583   151.2   8796
7251   147.6   10072
6730   130.7   10174
6428   137.5   11326
5228   146.1   10744
4716   133.6   9806
6101   167.9   9740
5753   181.9   9373
4000   202   9244
2691   166.5   9407
5898   151.3   8827
6526   146.2   9880
5840   148.3   9364
6650   144.7   10580
5717   123.6   10899
7236   151.6   11687
6523   133.9   11280
5729   137.4   10208
6004   181.6   10212
5950   182   9725
4690   190   9721
3687   161.2   9846
7791   155.5   9407
7153   141.9   10265
6434   164.6   9970
7850   136.2   10801
6809   126.8   11246
8379   152.5   12167
6914   126.6   11578
6919   150.1   10645
7265   186.3   10613
6994   147.5   10104
5503   200.4   10348
3782   177.2   10263
7502   127.4   9973
8119   177.1   10803
7292   154.4   10409
6886   135.2   11458
7049   126.4   11845
7977   147.3   12559
8519   140.6   12070
6680   152.3   11221
7994   151.2   11338
7047   172.2   10761
5782   215.3   11012
3771   154.1   10923
7906   159.3   10790
8970   160.4   11427
6077   151.9   10788
7919   148.4   11772
7340   139.6   12104
7791   148.2   12634
7368   153.5   12772
8255   145.1   11764
7816   183.7   11956
7476   210.5   11646
6696   203.3   11750
4484   153.3   11485
8274   144.3   11198
8866   169.6   12265
8572   143.7   11704
9176   160   12419
8645   135.5   13259
8265   141.7   13945
9558   159.9   13839
7037   145.6   12387
9101   183.4   12546
8180   198.1   12038
7072   186.7   11977
3832   171.9   12336
7253   150.5   11793
8667   163   12877
7658   153.6   11923
8859   152.8   13306
7291   135.4   13988
7529   148.3   14002
8715   148.3   14338
8450   133.5   12867
9085   193.8   12761
8350   208.4   12449
7080   197   12658




Re: [R] data manipulation

2016-12-18 Thread David Winsemius

> On Dec 17, 2016, at 7:57 PM, Duncan Mackay <dulca...@bigpond.com> wrote:
> 
> Hi 
> 
> Coming late to the discussion  - I deleted the original message
> I found that I have a cbe.dat that I downloaded some years ago from
> cowpertwaite's site .
> 
> And have attached it

Experience has shown that when you attach a file that you hope to be 
distributed to the list it needs to have a .txt extension. Leaving it with a 
.csv, .tsv, or .dat extension will cause it to be dropped by the server, even 
if the contents of the file are ASCII text.

The URL I offered earlier should have made the file available:
https://web.archive.org/web/20130501161812/http://staff.elena.aut.ac.nz/Paul-Cowpertwait/ts/cbe.dat

... but if there is interest in having it in the Rhelp Archive, I can attach it.
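
A rough sketch of how the file might be read once obtained. It assumes the file is whitespace-delimited text with a header row naming the columns choc, beer and elec (the names in the dput output elsewhere in this thread) and that the series is monthly starting in January 1958; neither assumption is confirmed in this thread. read_cbe is a hypothetical helper, and the temporary file is a three-row stand-in so the code runs without network access.

```r
# Hypothetical helper: read a whitespace-delimited file with a header row.
read_cbe <- function(src) {
  read.table(src, header = TRUE)
}

# Stand-in file with three rows, so the sketch runs offline.
tmp <- tempfile(fileext = ".dat")
writeLines(c("choc\tbeer\telec",
             "4650\t165.6\t6679",
             "4446\t198.6\t6704",
             "3061\t201.5\t6724"), tmp)
cbe <- read_cbe(tmp)
str(cbe)

# With network access, the archived copy might be read the same way
# (untested here; the archive may wrap or redirect the response):
# cbe <- read_cbe("https://web.archive.org/web/20130501161812/http://staff.elena.aut.ac.nz/Paul-Cowpertwait/ts/cbe.dat")

# Assumption: monthly data starting January 1958.
elec_ts <- ts(cbe$elec, start = c(1958, 1), frequency = 12)
print(elec_ts)
```

Wrapping the read in tryCatch() would let a script report a dead URL, such as the 404 seen earlier in the thread, instead of stopping.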



-- 
David.

Re: [R] data manipulation

2016-12-17 Thread Duncan Mackay
Hi 

Coming late to the discussion  - I deleted the original message
I found that I have a cbe.dat that I downloaded some years ago from
Cowpertwait's site.

And have attached it

If it does not get through will do a dput as the file is only 7K

Regards

Duncan
Duncan Mackay
Department of Agronomy and Soil Science
University of New England
Armidale NSW 2351
Email: home: mac...@northnet.com.au

-----Original Message-----
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Rui Barradas
Sent: Thursday, 15 December 2016 01:19
To: Farshad Fathian; r-help
Subject: Re: [R] data manipulation

Hello,

Please cc your mails to the list.
As for your data, your URL is wrong; you need to contact Massey or maybe 
the source of your information and get a valid internet address.
Without one there's not much we can do.

Rui Barradas

Em 14-12-2016 12:16, Farshad Fathian escreveu:
> Hello,
>
> Thanks for your e-mail. I was reading "Introductory Time Series with R"
> by PS. Cowperwait. I want to run the R code in this book, but I
> cannot access the input data from the website
> ("http://massey.ac.nz/~pscoperwait/ts/cbe.dat").
>
> Regards,
>
> On Wed, Dec 14, 2016 at 3:42 PM, Rui Barradas <ruipbarra...@sapo.pt> wrote:
>
> Hello,
>
> What do you mean by "gives me something"?
>
> xx <- read.csv("http://massey.ac.nz/~pscoperwait/ts/cbe.dat")
> Error in file(file, "rt") : cannot open the connection
> In addition: Warning message:
> In file(file, "rt") :
>   cannot open URL 'http://massey.ac.nz/~pscoperwait/ts/cbe.dat': HTTP
> status was '404 Not Found'
>
> Rui Barradas
>
>
> On 14-12-2016 11:56, John Kane via R-help wrote:
>
> xx <- read.csv("http://massey.ac.nz/~pscoperwait/ts/cbe.dat")
> gives me something. Since we have no idea of what you are doing
> I don't know if the data has downloaded correctly
>
> On Tuesday, December 13, 2016 1:38 PM, Farshad Fathian
> <farshad.fath...@gmail.com> wrote:
>
>
>Hi,
>
>
>
> I couldn't access to data file about PSCoperwait by
> http://massey.ac.nz/~pscoperwait/ts/cbe.dat.
>
>
>
> Looking forward to hearing from you,
>
>
>   [[alternative HTML version deleted]]
>

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] data manipulation

2016-12-15 Thread John Kane via R-help
That should read "now" instead of "not"
 

On Thursday, December 15, 2016 6:49 PM, John Kane  
wrote:
 

 It downloaded a file for me earlier but I am not getting the 404 error and I 
did not bother to save the download.  Shrug.
 

On Wednesday, December 14, 2016 6:57 AM, John Kane via R-help 
 wrote:
 

xx <- read.csv("http://massey.ac.nz/~pscoperwait/ts/cbe.dat")
gives me something. Since we have no idea of what you are doing I don't know if 
the data has downloaded correctly 

    On Tuesday, December 13, 2016 1:38 PM, Farshad Fathian 
 wrote:
 

 Hi,

 

I couldn't access to data file about PSCoperwait by
http://massey.ac.nz/~pscoperwait/ts/cbe.dat.

 

Looking forward to hearing from you,


    [[alternative HTML version deleted]]


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] data manipulation

2016-12-15 Thread John Kane via R-help
It downloaded a file for me earlier but I am not getting the 404 error and I 
did not bother to save the download.  Shrug.
 

On Wednesday, December 14, 2016 6:57 AM, John Kane via R-help 
 wrote:
 

xx <- read.csv("http://massey.ac.nz/~pscoperwait/ts/cbe.dat")
gives me something. Since we have no idea of what you are doing I don't know if 
the data has downloaded correctly 

    On Tuesday, December 13, 2016 1:38 PM, Farshad Fathian 
 wrote:
 

 Hi,

 

I couldn't access to data file about PSCoperwait by
http://massey.ac.nz/~pscoperwait/ts/cbe.dat.

 

Looking forward to hearing from you,


    [[alternative HTML version deleted]]


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] data manipulation

2016-12-14 Thread Rui Barradas

OK, to the op: don't use read.csv, use read.table. Like this:

URL <-
"https://web.archive.org/web/20130501161812/http://staff.elena.aut.ac.nz/Paul-Cowpertwait/ts/cbe.dat"

xx <- read.table(URL, header = TRUE)
str(xx)

Hope this helps,

Rui Barradas
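
A small sketch of why read.table() is the better fit here, assuming the file is whitespace-delimited with a header row (which is what the header = TRUE call above implies). The text below is made up for illustration and is not the real cbe.dat.

```r
# Made-up, whitespace-delimited text standing in for a .dat file.
txt <- "a b c
1 2.5 10
2 3.5 20
3 4.5 30"

# read.csv() splits on commas, so each line ends up as a single field.
as_csv <- read.csv(text = txt)
ncol(as_csv)    # 1

# read.table() splits on whitespace by default.
as_table <- read.table(text = txt, header = TRUE)
str(as_table)   # three columns: a, b, c
```

This may also explain the earlier "gives me something" report: read.csv() does not fail on such input, it just returns one wide column.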

On 14-12-2016 20:03, David Winsemius wrote:



On Dec 14, 2016, at 7:17 AM, David L Carlson <dcarl...@tamu.edu> wrote:

It seems to be a data set for use with Introductory Time Series with R by P S 
Cowpertwait and A V Metcalfe. It is not just the file that is missing, the 
whole folder is missing:

The requested URL /~pscoperwait/ was not found on this server.

The Springer website for the book indicates the data sets are located at

http://staff.elena.aut.ac.nz/Paul-Cowpertwait/ts/

but there is no server at staff.elena.aut.ac.nz. A web search turns up this 
link:

http://www.maths.adelaide.edu.au/andrew.metcalfe/

But the link to cbe.dat and the other data sets are dead.


There were images of them in the Wayback Machine. This appears to be the one 
originally sought.

https://web.archive.org/web/20130501161812/http://staff.elena.aut.ac.nz/Paul-Cowpertwait/ts/cbe.dat

Many others can be found with this search:

https://web.archive.org/web/*/http://staff.elena.aut.ac.nz/Paul-Cowpertwait/*




-
David L Carlson
Department of Anthropology
Texas A University
College Station, TX 77840-4352



-----Original Message-----
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Rui Barradas
Sent: Wednesday, December 14, 2016 6:12 AM
To: John Kane; Farshad Fathian; r-h...@stat.math.ethz.ch
Subject: Re: [R] data manipulation

Hello,

What do you mean by "gives me something"?

xx <- read.csv("http://massey.ac.nz/~pscoperwait/ts/cbe.dat")
Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") :
   cannot open URL 'http://massey.ac.nz/~pscoperwait/ts/cbe.dat': HTTP
status was '404 Not Found'

Rui Barradas


On 14-12-2016 11:56, John Kane via R-help wrote:

xx <- read.csv("http://massey.ac.nz/~pscoperwait/ts/cbe.dat")
gives me something. Since we have no idea of what you are doing I don't know if 
the data has downloaded correctly

 On Tuesday, December 13, 2016 1:38 PM, Farshad Fathian 
<farshad.fath...@gmail.com> wrote:


  Hi,



I couldn't access to data file about PSCoperwait by
http://massey.ac.nz/~pscoperwait/ts/cbe.dat.



Looking forward to hearing from you,


 [[alternative HTML version deleted]]



David Winsemius
Alameda, CA, USA



__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] data manipulation

2016-12-14 Thread David Winsemius

> On Dec 14, 2016, at 7:17 AM, David L Carlson <dcarl...@tamu.edu> wrote:
> 
> It seems to be a data set for use with Introductory Time Series with R by P S 
> Cowpertwait and A V Metcalfe. It is not just the file that is missing, the 
> whole folder is missing:
> 
> The requested URL /~pscoperwait/ was not found on this server.
> 
> The Springer website for the book indicates the data sets are located at
> 
> http://staff.elena.aut.ac.nz/Paul-Cowpertwait/ts/
> 
> but there is no server at staff.elena.aut.ac.nz. A web search turns up this 
> link:
> 
> http://www.maths.adelaide.edu.au/andrew.metcalfe/
> 
> But the link to cbe.dat and the other data sets are dead.

There were images of them in the Wayback Machine. This appears to be the one 
originally sought.

https://web.archive.org/web/20130501161812/http://staff.elena.aut.ac.nz/Paul-Cowpertwait/ts/cbe.dat

Many others can be found with this search:

https://web.archive.org/web/*/http://staff.elena.aut.ac.nz/Paul-Cowpertwait/*
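
As a sketch, the archived address above follows the pattern "web.archive.org/web/", then a capture timestamp, then the original URL. The helper name `wayback_url` is hypothetical; only the URL and timestamp come from this thread.

```r
# Hypothetical helper: build a Wayback Machine URL from the original
# address and a capture timestamp (YYYYMMDDhhmmss).
wayback_url <- function(original, timestamp) {
  paste0("https://web.archive.org/web/", timestamp, "/", original)
}

u <- wayback_url("http://staff.elena.aut.ac.nz/Paul-Cowpertwait/ts/cbe.dat",
                 "20130501161812")
u

# Reading the file needs network access, so it is left commented out:
# xx <- read.table(u, header = TRUE)
```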


> 
> -
> David L Carlson
> Department of Anthropology
> Texas A&M University
> College Station, TX 77840-4352
> 
> 
> 
> -Original Message-
> From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Rui Barradas
> Sent: Wednesday, December 14, 2016 6:12 AM
> To: John Kane; Farshad Fathian; r-h...@stat.math.ethz.ch
> Subject: Re: [R] data manipulation
> 
> Hello,
> 
> What do you mean by "gives me something"?
> 
> xx <- read.csv("http://massey.ac.nz/~pscoperwait/ts/cbe.dat")
> Error in file(file, "rt") : cannot open the connection
> In addition: Warning message:
> In file(file, "rt") :
>   cannot open URL 'http://massey.ac.nz/~pscoperwait/ts/cbe.dat': HTTP 
> status was '404 Not Found'
> 
> Rui Barradas
> 
> 
> On 14-12-2016 11:56, John Kane via R-help wrote:
>> xx <- read.csv("http://massey.ac.nz/~pscoperwait/ts/cbe.dat")
>> gives me something. Since we have no idea of what you are doing I don't know 
>> if the data has downloaded correctly
>> 
>> On Tuesday, December 13, 2016 1:38 PM, Farshad Fathian 
>> <farshad.fath...@gmail.com> wrote:
>> 
>> 
>>  Hi,
>> 
>> 
>> 
>> I couldn't access to data file about PSCoperwait by
>> http://massey.ac.nz/~pscoperwait/ts/cbe.dat.
>> 
>> 
>> 
>> Looking forward to hearing from you,
>> 
>> 
>> [[alternative HTML version deleted]]
>> 

David Winsemius
Alameda, CA, USA

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] data manipulation

2016-12-14 Thread David L Carlson
It seems to be a data set for use with Introductory Time Series with R by P S 
Cowpertwait and A V Metcalfe. It is not just the file that is missing, the 
whole folder is missing:

The requested URL /~pscoperwait/ was not found on this server.

The Springer website for the book indicates the data sets are located at

http://staff.elena.aut.ac.nz/Paul-Cowpertwait/ts/

but there is no server at staff.elena.aut.ac.nz. A web search turns up this 
link:

http://www.maths.adelaide.edu.au/andrew.metcalfe/

But the link to cbe.dat and the other data sets are dead.

-
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352



-----Original Message-----
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Rui Barradas
Sent: Wednesday, December 14, 2016 6:12 AM
To: John Kane; Farshad Fathian; r-h...@stat.math.ethz.ch
Subject: Re: [R] data manipulation

Hello,

What do you mean by "gives me something"?

xx <- read.csv("http://massey.ac.nz/~pscoperwait/ts/cbe.dat")
Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") :
   cannot open URL 'http://massey.ac.nz/~pscoperwait/ts/cbe.dat': HTTP 
status was '404 Not Found'

Rui Barradas


On 14-12-2016 11:56, John Kane via R-help wrote:
> xx <- read.csv("http://massey.ac.nz/~pscoperwait/ts/cbe.dat")
> gives me something. Since we have no idea of what you are doing I don't know 
> if the data has downloaded correctly
>
>  On Tuesday, December 13, 2016 1:38 PM, Farshad Fathian 
> <farshad.fath...@gmail.com> wrote:
>
>
>   Hi,
>
>
>
> I couldn't access to data file about PSCoperwait by
> http://massey.ac.nz/~pscoperwait/ts/cbe.dat.
>
>
>
> Looking forward to hearing from you,
>
>
>  [[alternative HTML version deleted]]
>

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Re: [R] data manipulation

2016-12-14 Thread Rui Barradas

Hello,

Please cc your mails to the list.
As for your data, your url is wrong, you need to contact Massey or maybe 
the source of your information and get a valid internet address.

Without one there's not much we can do.

Rui Barradas

On 14-12-2016 12:16, Farshad Fathian wrote:

Hello,

Thanks for your e-mail. I was reading "Introductory Time Series with R"
by P. S. Cowpertwait. I am going to run the R code in this book, but I
cannot access the input data at
http://massey.ac.nz/~pscoperwait/ts/cbe.dat.

Regards,

On Wed, Dec 14, 2016 at 3:42 PM, Rui Barradas wrote:

Hello,

What do you mean by "gives me something"?

xx <- read.csv("http://massey.ac.nz/~pscoperwait/ts/cbe.dat")
Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") :
   cannot open URL 'http://massey.ac.nz/~pscoperwait/ts/cbe.dat': HTTP
status was '404 Not Found'

Rui Barradas


On 14-12-2016 11:56, John Kane via R-help wrote:

xx <- read.csv("http://massey.ac.nz/~pscoperwait/ts/cbe.dat")
gives me something. Since we have no idea of what you are doing
I don't know if the data has downloaded correctly

On Tuesday, December 13, 2016 1:38 PM, Farshad Fathian wrote:


   Hi,



I couldn't access to data file about PSCoperwait by
http://massey.ac.nz/~pscoperwait/ts/cbe.dat.



Looking forward to hearing from you,


  [[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] data manipulation

2016-12-14 Thread Rui Barradas

Hello,

What do you mean by "gives me something"?

xx <- read.csv("http://massey.ac.nz/~pscoperwait/ts/cbe.dat")
Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") :
  cannot open URL 'http://massey.ac.nz/~pscoperwait/ts/cbe.dat': HTTP 
status was '404 Not Found'


Rui Barradas


On 14-12-2016 11:56, John Kane via R-help wrote:

xx <- read.csv("http://massey.ac.nz/~pscoperwait/ts/cbe.dat")
gives me something. Since we have no idea of what you are doing I don't know if 
the data has downloaded correctly

 On Tuesday, December 13, 2016 1:38 PM, Farshad Fathian 
 wrote:


  Hi,



I couldn't access to data file about PSCoperwait by
http://massey.ac.nz/~pscoperwait/ts/cbe.dat.



Looking forward to hearing from you,


 [[alternative HTML version deleted]]


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] data manipulation

2016-12-14 Thread John Kane via R-help
xx <- read.csv("http://massey.ac.nz/~pscoperwait/ts/cbe.dat")
gives me something. Since we have no idea of what you are doing I don't know if 
the data has downloaded correctly 

On Tuesday, December 13, 2016 1:38 PM, Farshad Fathian 
 wrote:
 

 Hi,

 

I couldn't access to data file about PSCoperwait by
http://massey.ac.nz/~pscoperwait/ts/cbe.dat.

 

Looking forward to hearing from you,


    [[alternative HTML version deleted]]


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] data manipulation

2016-12-13 Thread John McKown
On Tue, Dec 13, 2016 at 3:23 AM, Farshad Fathian wrote:

> Hi,
>
> I couldn't access to data file about PSCoperwait by
> http://massey.ac.nz/~pscoperwait/ts/cbe.dat.
>

First off, this post is nearly useless. You don't tell us what you tried
to do. And you didn't tell us what error message you got.


From a fast test, the reason is that the above URL is invalid. It gets a
404 error. That is "requested document does not exist." You can't read that
which does not exist. Why doesn't it exist? I don't know - ask Massey
University of New Zealand.
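
One way to make a failure like this 404 visible in a script is to wrap the read in tryCatch(). This is only a sketch: the wrapper name `try_read` is hypothetical, and a non-existent local path stands in for the dead URL so the example runs without network access.

```r
# Hypothetical wrapper: report the error and return NULL instead of
# stopping when the file or URL cannot be opened.
try_read <- function(path, ...) {
  tryCatch(
    suppressWarnings(read.table(path, ...)),
    error = function(e) {
      message("Could not read '", path, "': ", conditionMessage(e))
      NULL
    }
  )
}

# A path that does not exist stands in for the invalid URL.
res <- try_read(file.path(tempdir(), "no-such-file.dat"), header = TRUE)
is.null(res)
```

Quoting the reported message in a post tells readers what was tried and what went wrong.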


>
> Looking forward to hearing from you,
>
>
>


-- 
Heisenberg may have been here.

http://xkcd.com/1770/

Maranatha! <><
John McKown

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] data manipulation

2016-12-13 Thread Rui Barradas

Hello,

And what has your question to do with R?
Please read the posting guide before posting and when you do, post a 
question where at least the link is correct.


Rui Barradas


On 13-12-2016 09:23, Farshad Fathian wrote:

Hi,



I couldn't access to data file about PSCoperwait by
http://massey.ac.nz/~pscoperwait/ts/cbe.dat.



Looking forward to hearing from you,


[[alternative HTML version deleted]]


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Data manipulation in a data.frame

2014-02-21 Thread ioanna ioannou
Thank you very much. One further question. 

Assuming that for some points there is no classification, for example:

A <- data.frame(A = c(10, 100, 1000, 30, 50, 60, 300, 3),
                B = c(0, 1, 1, 1, 0, 0, 0, 0),
                C = c(0, 0, 0, 0, 1, 1, 0, 0),
                D = c(1, 0, 0, 0, 0, 0, 1, 0))

Is there an easy way to introduce an extra "none" option in the variable?

A <- data.frame(A = c(10, 100, 1000, 30, 50, 60, 300, 3),
                B = c(0, 1, 1, 1, 0, 0, 0, 0),
                C = c(0, 0, 0, 0, 1, 1, 0, 0),
                D = c(1, 0, 0, 0, 0, 0, 1, 0),
                Variable = c("D", "B", "B", "B", "C", "C", "D", "none"))

Thanks in advance, 
IOanna

-----Original Message-----
From: arun [mailto:smartpink...@yahoo.com] 
Sent: 21 February 2014 00:19
To: r-help@r-project.org
Cc: ioanna ioannou
Subject: Re: [R] Data manipulation in a data.frame

Also,
rownames(which(t(!!A[,-1]),arr.ind=TRUE))
A.K.




On Thursday, February 20, 2014 6:48 PM, arun smartpink...@yahoo.com wrote:
Hi,
Maybe this helps:

A$Variable <- rep(colnames(A[,-1]),nrow(A))[t(!!A[,-1])]
A.K.



On Thursday, February 20, 2014 5:55 PM, ioanna ioannou ii54...@msn.com
wrote:
Hello,





Assuming that I have a data frame

A <- data.frame(A = c(10, 100, 1000, 30, 50, 60, 300),
                B = c(0, 1, 1, 1, 0, 0, 0),
                C = c(0, 0, 0, 0, 1, 1, 0),
                D = c(1, 0, 0, 0, 0, 0, 1))

What I would like is to introduce a new column Variable such that:

A <- data.frame(A = c(10, 100, 1000, 30, 50, 60, 300),
                B = c(0, 1, 1, 1, 0, 0, 0),
                C = c(0, 0, 0, 0, 1, 1, 0),
                D = c(1, 0, 0, 0, 0, 0, 1),
                Variable = c("D", "B", "B", "B", "C", "C", "D"))



How can I do it?



Best 

IOanna


    [[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Data manipulation in a data.frame

2014-02-21 Thread Bert Gunter
This is easy to do in the approach I showed.

Instead of:

 names(A)[-1][as.matrix(A[,-1])%*%(seq_len(ncol(A)-1))]

modify it to:

 c("none",names(A)[-1])[as.matrix(A[,-1])%*%seq_len(ncol(A)-1)+1]

[1] "D"    "B"    "B"    "B"    "C"    "C"    "D"    "none"

Cheers,
Bert



Bert Gunter
Genentech Nonclinical Biostatistics
(650) 467-7374

Data is not information. Information is not knowledge. And knowledge
is certainly not wisdom.
H. Gilbert Welch




On Fri, Feb 21, 2014 at 1:44 AM, ioanna ioannou ii54...@msn.com wrote:
 Thank you very much. One further question.

 Assuming that for some points there is no classification, for example:

 A <- data.frame(A = c(10, 100, 1000, 30, 50, 60, 300, 3),
                 B = c(0, 1, 1, 1, 0, 0, 0, 0),
                 C = c(0, 0, 0, 0, 1, 1, 0, 0),
                 D = c(1, 0, 0, 0, 0, 0, 1, 0))

 Is there an easy way to introduce an extra "none" option in the variable?

 A <- data.frame(A = c(10, 100, 1000, 30, 50, 60, 300, 3),
                 B = c(0, 1, 1, 1, 0, 0, 0, 0),
                 C = c(0, 0, 0, 0, 1, 1, 0, 0),
                 D = c(1, 0, 0, 0, 0, 0, 1, 0),
                 Variable = c("D", "B", "B", "B", "C", "C", "D", "none"))

 Thanks in advance,
 IOanna

 -Original Message-
 From: arun [mailto:smartpink...@yahoo.com]
 Sent: 21 February 2014 00:19
 To: r-help@r-project.org
 Cc: ioanna ioannou
 Subject: Re: [R] Data manipulation in a data.frame

 Also,
 rownames(which(t(!!A[,-1]),arr.ind=TRUE))
 A.K.




 On Thursday, February 20, 2014 6:48 PM, arun smartpink...@yahoo.com wrote:
 Hi,
 May be this helps:

 A$Variable <- rep(colnames(A[,-1]),nrow(A))[t(!!A[,-1])]
 A.K.



 On Thursday, February 20, 2014 5:55 PM, ioanna ioannou ii54...@msn.com
 wrote:
 Hello,





 Assuming that I have a data frame

 A <- data.frame(A = c(10, 100, 1000, 30, 50, 60, 300),
                 B = c(0, 1, 1, 1, 0, 0, 0),
                 C = c(0, 0, 0, 0, 1, 1, 0),
                 D = c(1, 0, 0, 0, 0, 0, 1))

 What I would like is to introduce a new column Variable such that:

 A <- data.frame(A = c(10, 100, 1000, 30, 50, 60, 300),
                 B = c(0, 1, 1, 1, 0, 0, 0),
                 C = c(0, 0, 0, 0, 1, 1, 0),
                 D = c(1, 0, 0, 0, 0, 0, 1),
                 Variable = c("D", "B", "B", "B", "C", "C", "D"))



 How can I do it?



 Best

 IOanna


 [[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Data manipulation in a data.frame

2014-02-21 Thread arun
Hi IOanna,

Do you have rows with multiple '1's?  If not, you could also try:
A$Variable <- c("none",names(A)[-1])[1+with(A,B+2*C+3*D)]
A.K.




On Friday, February 21, 2014 4:44 AM, ioanna ioannou ii54...@msn.com wrote:
Thank you very much. One further question. 

Assuming that for some points there is no classification, for example:

A <- data.frame(A = c(10, 100, 1000, 30, 50, 60, 300, 3),
                B = c(0, 1, 1, 1, 0, 0, 0, 0),
                C = c(0, 0, 0, 0, 1, 1, 0, 0),
                D = c(1, 0, 0, 0, 0, 0, 1, 0))

Is there an easy way to introduce an extra "none" option in the variable?

A <- data.frame(A = c(10, 100, 1000, 30, 50, 60, 300, 3),
                B = c(0, 1, 1, 1, 0, 0, 0, 0),
                C = c(0, 0, 0, 0, 1, 1, 0, 0),
                D = c(1, 0, 0, 0, 0, 0, 1, 0),
                Variable = c("D", "B", "B", "B", "C", "C", "D", "none"))

Thanks in advance, 
IOanna


-----Original Message-----
From: arun [mailto:smartpink...@yahoo.com] 
Sent: 21 February 2014 00:19
To: r-help@r-project.org
Cc: ioanna ioannou
Subject: Re: [R] Data manipulation in a data.frame

Also,
rownames(which(t(!!A[,-1]),arr.ind=TRUE))
A.K.




On Thursday, February 20, 2014 6:48 PM, arun smartpink...@yahoo.com wrote:
Hi,
Maybe this helps:

A$Variable <- rep(colnames(A[,-1]),nrow(A))[t(!!A[,-1])]
A.K.



On Thursday, February 20, 2014 5:55 PM, ioanna ioannou ii54...@msn.com
wrote:
Hello,





Assuming that I have a data frame 

A <- data.frame(A = c(10, 100, 1000, 30, 50, 60, 300),
                B = c(0, 1, 1, 1, 0, 0, 0),
                C = c(0, 0, 0, 0, 1, 1, 0),
                D = c(1, 0, 0, 0, 0, 0, 1))

What I would like is to introduce a new column Variable such that:

A <- data.frame(A = c(10, 100, 1000, 30, 50, 60, 300),
                B = c(0, 1, 1, 1, 0, 0, 0),
                C = c(0, 0, 0, 0, 1, 1, 0),
                D = c(1, 0, 0, 0, 0, 0, 1),
                Variable = c("D", "B", "B", "B", "C", "C", "D"))



How can I do it?



Best 

IOanna


    [[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Data manipulation in a data.frame

2014-02-21 Thread Bert Gunter
This merely translates the matrix multiplication I used into explicit
arithmetic!

Nor does it generalize without extra manipulation to get the correct
arithmetic expression.

-- Bert
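
A sketch of one way to avoid hand-writing the arithmetic for each new set of columns. This swaps in base R's max.col() rather than the matrix product or the B+2*C+3*D expression discussed above; the function name `label_from_indicators` and its arguments are hypothetical, and it assumes 0/1 columns with no missing values.

```r
# Hypothetical helper: name the indicator column that is set in each row,
# or `none` when no column is set. Works for any number of columns.
label_from_indicators <- function(df, cols, none = "none") {
  ind <- as.matrix(df[, cols, drop = FALSE])
  n_set <- rowSums(ind)
  if (any(n_set > 1)) {
    stop("some rows have more than one indicator set")
  }
  # max.col() returns the column index of the row maximum.
  out <- cols[max.col(ind, ties.method = "first")]
  out[n_set == 0] <- none   # all-zero rows have no real maximum
  out
}

# Made-up data with an extra indicator column E.
A <- data.frame(A = c(10, 100, 3),
                B = c(0, 1, 0),
                C = c(0, 0, 0),
                D = c(1, 0, 0),
                E = c(0, 0, 0))
label_from_indicators(A, c("B", "C", "D", "E"))
```

The explicit rowSums() check makes the "at most one 1 per row" assumption fail loudly instead of silently returning a wrong label.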

Bert Gunter
Genentech Nonclinical Biostatistics
(650) 467-7374

Data is not information. Information is not knowledge. And knowledge
is certainly not wisdom.
H. Gilbert Welch




On Fri, Feb 21, 2014 at 9:25 AM, arun smartpink...@yahoo.com wrote:
 Hi IOanna,

 Do you have rows with multiple '1's?  If not, you could also try:
 A$Variable <- c("none",names(A)[-1])[1+with(A,B+2*C+3*D)]
 A.K.




 On Friday, February 21, 2014 4:44 AM, ioanna ioannou ii54...@msn.com wrote:
 Thank you very much. One further question.

 Assuming that for some points there is no classification, for example:

 A <- data.frame(A = c(10, 100, 1000, 30, 50, 60, 300, 3),
                 B = c(0, 1, 1, 1, 0, 0, 0, 0),
                 C = c(0, 0, 0, 0, 1, 1, 0, 0),
                 D = c(1, 0, 0, 0, 0, 0, 1, 0))

 Is there an easy way to introduce an extra "none" option in the variable?

 A <- data.frame(A = c(10, 100, 1000, 30, 50, 60, 300, 3),
                 B = c(0, 1, 1, 1, 0, 0, 0, 0),
                 C = c(0, 0, 0, 0, 1, 1, 0, 0),
                 D = c(1, 0, 0, 0, 0, 0, 1, 0),
                 Variable = c("D", "B", "B", "B", "C", "C", "D", "none"))

 Thanks in advance,
 IOanna




Re: [R] Data manipulation in a data.frame

2014-02-21 Thread arun
Hi Ioanna,
If you need to paste the colnames if there are multiple 1's per row:
You could try:
A <- data.frame(A=c(10,100,1000,30,50,60,300,3,4,2,20,35,45),B=c(0,1,1,1,0,0,0,0,0,1,0,0,1),C=c(0,0,0,0,1,1,0,0,0,0,1,1,1),D=c(1,0,0,0,0,0,1,0,0,1,NA,1,1))
apply(A[,-1],1,function(x) {x1 <- paste(colnames(A[,-1])[x & !is.na(x)],collapse=","); x1[x1==''] <- "none"; x1})
# [1] "D"     "B"     "B"     "B"     "C"     "C"     "D"     "none"  "none"
#[10] "B,D"   "C"     "C,D"   "B,C,D"


#or Bert's method with some modification:
c("none",names(A)[-1],"B,D","C,D","B,C,D")[c(as.matrix(!!A[,-1] & !is.na(A[,-1]))%*%seq_len(ncol(A)-1)+1)]
# [1] "D"     "B"     "B"     "B"     "C"     "C"     "D"     "none"  "none"
#[10] "B,D"   "C"     "C,D"   "B,C,D"


But in this case you need to check whether the combinations are actually
present in the dataset; otherwise the index can land on the wrong label.

For example:
c("none",names(A)[-1],apply(combn(LETTERS[2:4],2),2,paste,collapse=","),"B,C,D")[c(as.matrix(!!A[,-1] & !is.na(A[,-1]))%*%seq_len(ncol(A)-1)+1)]
# [1] "D"    "B"    "B"    "B"    "C"    "C"    "D"    "none" "none" "B,C"
#[11] "C"    "B,D"  "C,D"
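
The wrong labels in the last example arise because the sums 1*B + 2*C + 3*D are not unique (B + C and D both give 3), so a hand-built lookup table is only right for the combinations that happen to occur. One possible variation, not taken from the thread, is to use powers of two as weights so that every combination of columns maps to its own index. The sketch below assumes that an NA indicator counts as "not set"; the names flags, weights, code, combos and labels are illustrative only.

```r
A <- data.frame(A = c(10, 100, 1000, 30, 50, 60, 300, 3, 4, 2, 20, 35, 45),
                B = c(0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1),
                C = c(0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1),
                D = c(1, 0, 0, 0, 0, 0, 1, 0, 0, 1, NA, 1, 1))

m <- as.matrix(A[, -1])
flags <- !is.na(m) & m == 1           # TRUE only where the indicator is 1

# Weights 1, 2, 4: each subset of columns gets a distinct sum (0..7)
weights <- 2^(seq_len(ncol(flags)) - 1)
code <- as.vector((flags + 0) %*% weights)

# One label per possible code; expand.grid varies the first column fastest,
# which matches the binary order of the codes
combos <- expand.grid(rep(list(c(FALSE, TRUE)), ncol(flags)))
labels <- apply(combos, 1,
                function(keep) paste(colnames(flags)[keep], collapse = ","))
labels[labels == ""] <- "none"

A$Variable <- labels[code + 1]
A$Variable
```

With this encoding the lookup table is generated rather than typed in, so it does not need to be checked against the data by hand.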


A.K.




On Friday, February 21, 2014 4:20 PM, ioanna ioannou ii54...@msn.com wrote:
Hello Arun, 

Actually I do have rows with multiple 1s. Could you advise how to modify the
code then?

Thanks in advance, 

Best
IOanna



Re: [R] Data manipulation in a data.frame

2014-02-20 Thread arun
Hi,
May be this helps:

A$Variable <- rep(colnames(A[,-1]),nrow(A))[t(!!A[,-1])]
A.K.




Re: [R] Data manipulation in a data.frame

2014-02-20 Thread arun
Also,
rownames(which(t(!!A[,-1]),arr.ind=TRUE))
A.K.






Re: [R] Data manipulation in a data.frame

2014-02-20 Thread Bert Gunter
... and yet another approach (written for generalization)

  names(A)[-1][as.matrix(A[,-1])%*%(seq_len(ncol(A)-1))]

[1] "D" "B" "B" "B" "C" "C" "D"

Cheers,
Bert


Bert Gunter
Genentech Nonclinical Biostatistics
(650) 467-7374

Data is not information. Information is not knowledge. And knowledge
is certainly not wisdom.
H. Gilbert Welch






Re: [R] data manipulation

2013-11-22 Thread Patrick Burns

I think a list is the wrong structure,
a vector would be better since you can
use 'match':

# transform data structure:
neutralVec <- unlist(neutral_classes)

names(neutralVec) <- 
names(neutral_classes[rep(1:length(neutral_classes), 
sapply(neutral_classes, length))])


# get one or more results with 'match':
names(neutralVec[match(c(50, 20, 10, -4), neutralVec)])

# result:
# [1] "B" "D" "D" NA
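
The same lookup can be written a little more compactly with rep() and lengths(). This is only a sketch of the idea above, not Pat's exact code; the helper name lookup is made up for illustration.

```r
neutral_classes <- list(A = 71:100, B = 46:70, C = 21:45, D = 0:20)

# Flatten the list and label every value with the name of its class
neutralVec <- unlist(neutral_classes, use.names = FALSE)
names(neutralVec) <- rep(names(neutral_classes), lengths(neutral_classes))

# Hypothetical helper: class name for each value, NA when there is no match
lookup <- function(x) names(neutralVec)[match(x, neutralVec)]

lookup(c(50, 20, 10, -4))
```

Because match() compares exact values, this only works for the integers stored in the list.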


Pat


On 22/11/2013 16:58, philippe massicotte wrote:

Hi everyone.
I have a list like this:
neutral_classes = list(A = 71:100, B = 46:70, C = 21:45, D = 0:20)
and I'm trying to return the name of the list element to which an integer 
belongs. For example, B if I use the value 50.
Any help would be greatly appreciated.
Regards, Phil



--
Patrick Burns
pbu...@pburns.seanet.com
twitter: @burnsstat @portfolioprobe
http://www.portfolioprobe.com/blog
http://www.burns-stat.com
(home of:
 'Impatient R'
 'The R Inferno'
 'Tao Te Programming')



Re: [R] data manipulation

2013-11-22 Thread arun
Hi,
You could use either:
names(which(sapply(lapply(neutral_classes,`%in%`,50),any)))
#[1] "B"


#or
vec1 <- unlist(neutral_classes)
 names(vec1) <- gsub("\\d+","",names(vec1))
 names(vec1)[vec1==50]
#[1] "B"



A.K.




Re: [R] data manipulation

2013-11-22 Thread Patrick Burns

The final ) went missing in the command
starting 'names(neutralVec) <- '.



Re: [R] data manipulation

2013-11-22 Thread philippe massicotte
Thank you everyone for your suggestions.



Re: [R] data manipulation

2013-11-22 Thread philippe massicotte
Hi again.
I realized that the example I gave is not valid, because the number I'm using 
is not an integer (e.g. 50.1).
So I thought of using 
is.between = function(x, a, b) {  (x - a)  *  (b - x) > 0}
But I'm not sure how to use it with lapply to avoid looping in my code.
Regards, Phil
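
One way to apply is.between() without an explicit loop is sapply() over the list, taking min() and max() of each element as the interval bounds. The following is a minimal sketch, assuming each class is a contiguous integer range; class_of is a hypothetical helper name, not something from the thread.

```r
neutral_classes <- list(A = 71:100, B = 46:70, C = 21:45, D = 0:20)

# TRUE when x lies strictly between a and b
is.between <- function(x, a, b) (x - a) * (b - x) > 0

# Hypothetical helper: name of the first class whose range contains x, else NA
class_of <- function(x, classes) {
  hit <- sapply(classes, function(cl) is.between(x, min(cl), max(cl)))
  if (any(hit)) names(classes)[which(hit)[1]] else NA_character_
}

class_of(50.1, neutral_classes)

# Several values at once; 20.5 falls in the gap between D and C
sapply(c(50.1, 20.5, 10.2), class_of, classes = neutral_classes)
```

Because the comparison is strict, the end points of a range (46 or 70, say) are not matched; replacing > 0 with >= 0 would include them.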


Re: [R] data manipulation

2013-11-22 Thread David Carlson
You probably want to use cut(), but as currently stated, your
intervals leave gaps (between 20 and 21 for example):

set.seed(42)
values <- runif(25)*100
values
 [1] 91.480604 93.707541 28.613953 83.044763 64.174552 51.909595 73.658831
 [8] 13.466660 65.699229 70.506478 45.774178 71.911225 93.467225 25.542882
[15] 46.229282 94.001452 97.822643 11.748736 47.499708 56.033275 90.403139
[22] 13.871017 98.889173 94.666823  8.243756
code <- cut(values, breaks=c(-1, 20, 45, 70, 100), labels=LETTERS[4:1])
code
 [1] A A C A B B A D B A B A A C B A A D B B A D A A D
Levels: D C B A

The levels are defined as (-1,20], (20,45], (45,70], (70,100] so
the second interval includes anything larger than 20 up to and
including 45.
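
If values in the gaps should instead go to the class below (so that 20.5 counts as D rather than C), findInterval() on the lower bounds of the original list is an alternative. This is a sketch under that assumption only; lower and classify are illustrative names.

```r
neutral_classes <- list(A = 71:100, B = 46:70, C = 21:45, D = 0:20)

# Lower bound of each class, sorted ascending: D = 0, C = 21, B = 46, A = 71
lower <- sort(sapply(neutral_classes, min))
upper_limit <- max(unlist(neutral_classes))

# Assumption: a value belongs to the class whose lower bound it has reached
classify <- function(x) {
  idx <- findInterval(x, lower)           # 0 when x is below the smallest bound
  idx[idx == 0 | x > upper_limit] <- NA   # outside the overall range
  names(lower)[idx]
}

classify(c(50.1, 20.5, 10.2, -4, 100.5))
```

Unlike cut(), this returns a character vector rather than a factor, which may or may not be what is wanted downstream.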

-
David L Carlson
Department of Anthropology
Texas AM University
College Station, TX 77840-4352




Re: [R] Data Manipulation in R

2013-10-21 Thread Anamika Chaudhuri
Hi Arun:

Thanks for your help. Separate files are being created by concatenating the
rows from the two files, but I was looking to have them as columns rather
than text. This is the way it appears in Excel, with the row number at the beginning.
X Y1 Y2
1 1 4 0 20 17 1 20 52 15 18

Ideally I would like it to look like
X Y1 Y2
1 4 20
1 0 52
1 20 15
1 17 18
Thanks again!
Anamika



Re: [R] Data Manipulation in R

2013-10-21 Thread arun
Hi,

I am getting this.


res <- lapply(seq_len(nrow(Y1)),function(i) {dat <- 
data.frame(X=i,Y1=unlist(Y1[i,]),Y2=unlist(Y2[i,])); row.names(dat) <- 
1:nrow(dat); 
write.csv(dat,paste0("Anam",i,".csv"),row.names=FALSE,quote=FALSE)})


dat1 <- read.csv("Anam1.csv",header=TRUE)

dat1
  X Y1 Y2
1 1  4 20
2 1  0 52
3 1 20 15
4 1 17 18


Attaching one of the files generated.


A.K.






Re: [R] Data Manipulation in R

2013-10-20 Thread arun
Hi,
May be this helps:
Y1 <- read.table(text="V1 V2 V3 V4
1 4 0 20 17
2 4 0 15 17
3 2 0 13 21",sep="",header=TRUE)

Y2 <- read.table(text="V1 V2 V3 V4
1 20 52 15 18
2 18 54 14 21
3 18 51 13 21",sep="",header=TRUE)
res <- lapply(seq_len(nrow(Y1)),function(i) {dat <- 
data.frame(X=i,Y1=unlist(Y1[i,]),Y2=unlist(Y2[i,])); row.names(dat) <- 
1:nrow(dat); 
write.csv(dat,paste0("file",i,".csv"),row.names=FALSE,quote=FALSE)})
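
A variant of the same idea that returns the paths of the files it writes, so the result can be read back and checked. This is a sketch, assuming it is acceptable to write into a temporary directory; out_dir and files are illustrative names.

```r
Y1 <- read.table(text = "V1 V2 V3 V4
1 4 0 20 17
2 4 0 15 17
3 2 0 13 21", header = TRUE)

Y2 <- read.table(text = "V1 V2 V3 V4
1 20 52 15 18
2 18 54 14 21
3 18 51 13 21", header = TRUE)

out_dir <- tempdir()   # assumption: a temporary directory is fine for output

# One CSV per row index: row i of Y1 and Y2 become the columns Y1 and Y2
files <- vapply(seq_len(nrow(Y1)), function(i) {
  dat <- data.frame(X = i, Y1 = unlist(Y1[i, ]), Y2 = unlist(Y2[i, ]))
  path <- file.path(out_dir, paste0("file", i, ".csv"))
  write.csv(dat, path, row.names = FALSE, quote = FALSE)
  path
}, character(1))

read.csv(files[1])
```

Opening one of these files in a spreadsheet should show three columns (X, Y1, Y2) and four data rows.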


A.K.




On Monday, October 21, 2013 12:24 AM, Anamika Chaudhuri canam...@gmail.com 
wrote:
Hi:

I am looking for some help to manipulate data in R. I have two csv files.

datasetY1
V1 V2 V3 V4
1 4 0 20 17
2 4 0 15 17
3 2 0 13 21

datasetY2
V1 V2 V3 V4
1 20 52 15 18
2 18 54 14 21
3 18 51 13 21

I want to be able to create separate csv files by taking the corresponding
rows of dataset1 and dataset2, convert them into columns. So from the above
example I would be creating 3 datasets (csvs), of which the first one would
be
X Y1 Y2
1  4 20
1  0 52
1 20 15
1 17 18

Appreciate any help.

Thanks
Anamika



Re: [R] data manipulation

2013-05-15 Thread Berend Hasselman

On 15-05-2013, at 08:15, catalin roibu catalinro...@gmail.com wrote:

 Hello all!
 I have a problem with my data.
 My initial data is a list years and months (see below). I want to transpose
 my data (12 rows (months) and year values as column). I try t(spi3), but
 the year values do not appear as column names, but as values.

As expected, since the column year is a column of numbers and is printed with 
the appropriate number of decimals for each column.
You can get what you want in two ways.

I've called your initial dataframe DF.

First transpose DF, then set the column names to the contents of the year row 
and finally delete the first row (the year row).

DF.t <- t(DF)
colnames(DF.t) <- DF.t["year",]
DF.t <- DF.t[-1,]
DF.t

The second method is to put DF in the form you apparently want and then 
transpose.
So set the rownames of DF to the column year, delete the year column and then 
transpose.

rownames(DF) <- DF[,"year"]
DF <- DF[,-1]
t(DF)
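
As a worked illustration of both methods, here is a small sketch. The data frame below is a made-up stand-in for the poster's data (which was not included), with year as the first column and only two month columns.

```r
# Hypothetical stand-in for the poster's data: one row per year
DF <- data.frame(year = c(2001, 2002, 2003),
                 jan  = c(0.5, -1.2, 0.3),
                 feb  = c(1.1, 0.4, -0.7))

# Method 1: transpose, then promote the "year" row to column names
DF.t <- t(DF)
colnames(DF.t) <- DF.t["year", ]
DF.t <- DF.t[-1, ]

# Method 2: move "year" into the row names first, then transpose
DF2 <- DF
rownames(DF2) <- DF2[, "year"]
DF2 <- DF2[, -1]
DF2.t <- t(DF2)

DF.t
DF2.t
```

Both results are matrices with the months as rows and the years as column names.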

Berend



Re: [R] data manipulation

2013-05-15 Thread Jim Lemon

On 05/15/2013 04:15 PM, catalin roibu wrote:

Hello all!
I have a problem with my data.
My initial data is a list years and months (see below). I want to transpose
my data (12 rows (months) and year values as column). I try t(spi3), but
the year values do not appear as column names, but as values. Please help
me to solve this problem!


Hi catalin,
Try this:

spi4 <- data.frame(t(spi3[,-1]))
names(spi4) <- spi3$year

Jim



Re: [R] Data manipulation

2013-03-15 Thread John Kane
What zero values?  And are they actual zeros, or are they NAs, that is, missing 
values?

The code looks okay, but without some sample data it is difficult to know 
exactly what you are doing. 

The easiest way to supply data is to use the dput() function.  Example with 
your file named testfile: 
dput(testfile) 
Then copy the output and paste it into your email.  For large data sets, you can 
just supply a representative sample.  Usually, 
dput(head(testfile, 100)) will be sufficient.   

 
http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example

Please supply some sample data. 
 

John Kane
Kingston ON Canada


 -Original Message-
 From: ii54...@msn.com
 Sent: Fri, 15 Mar 2013 12:40:54 +
 To: r-help@r-project.org
 Subject: [R] Data manipulation
 
 Hello all,

 I would appreciate your thoughts on a seemingly simple problem. I have a
 database, where each row represents a single record. I want to aggregate
 this database, so I use the aggregate command:

 D <- read.csv("C:\\Users\\test.csv")
 attach(D)
 by1 <- factor(Class)
 by2 <- factor(X)
 W <- aggregate(x=Count,by=list(by1,by2),FUN=sum)

 The results I get have the following form:

 W
   Group.1 Group.2 x
 1       1     0.1 4
 2       2     0.1 7
 3       3     0.1 1
 4       1     0.2 3
 5       3     0.2 4
 6       3     0.3 4

 However, what I really want is an aggregation which includes the zero
 values, i.e.:

 W
   Group.1 Group.2 x
 1       1     0.1 4
 2       2     0.1 7
 3       3     0.1 1
 4       1     0.2 3
         2     0.2 0
 5       3     0.2 4
         1     0.3 0
         2     0.3 0
 6       3     0.3 4

 How can I achieve what I want?

 Best regards,

 Ioanna
 


Re: [R] Data manipulation

2013-03-15 Thread Blaser Nello
Is this what you want to do?

D2 <- expand.grid(Class=unique(D$Class), X=unique(D$X))
D2 <- merge(D2, D, all=TRUE)
D2$Count[is.na(D2$Count)] <- 0

W <- aggregate(D2$Count, list(D2$Class, D2$X), sum)
W
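
An alternative to expand.grid() plus merge() is xtabs(), which sums Count over every combination of the two factors and reports empty combinations as 0. The sketch below rebuilds the data from the counts shown in the question, assuming every record has Count = 1; the column name x is chosen to mimic the aggregate() output.

```r
# Data reconstructed from the question, assuming one record per row
D <- data.frame(
  Class = rep(c(1, 2, 3), times = c(7, 7, 9)),
  X     = c(rep(c(0.1, 0.2), times = c(4, 3)),
            rep(0.1, 7),
            rep(c(0.1, 0.2, 0.3), times = c(1, 4, 4))),
  Count = 1
)

# Cross-tabulate the summed counts, then flatten to long format
W <- as.data.frame(xtabs(Count ~ Class + X, data = D), responseName = "x")
W
```

Note that Class and X come back as factors in W, so they may need converting if numeric values are required later.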

Best, 
Nello




Re: [R] Data manipulation

2013-03-15 Thread John Kane
Hi IOANNA,
I got the data, but it is missing a value in Count (length 22 vs. length 23 in 
the other two variables), so I stuck in an extra 1. I hope this is correct.

There was also an attachment called winmail.dat that appears to be some kind 
of Microsoft Mail note that is pure gibberish to me; I'm on a Linux box.

For some reason, in neither posting does your example of the output you want 
come through.  Are you posting in HTML?  R-help strips any HTML, so is there a 
chance it stripped out a table?

If I do this 
> table(Class, X)
     X
Class 0.1 0.2 0.3
    1   4   3   0
    2   7   0   0
    3   1   4   4
I see that you have three combinations of Class and X with no entries. Is this 
what you wanted to show in W?  If so, it is not immediately apparent how to go 
about this.  

John Kane
Kingston ON Canada
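
One possible way to get from the table() output above to a long layout with explicit zero rows is as.data.frame() on the table object. This is only a sketch using the Class and X vectors from the thread; the object names tab and W are chosen here for illustration.

```r
# Class and X as posted in the thread (23 records each).
Class <- c(rep(1, 7), rep(2, 7), rep(3, 9))
X <- c(rep(0.1, 4), rep(0.2, 3), rep(0.1, 7), 0.1, rep(0.2, 4), rep(0.3, 4))

# Cross-tabulate; empty cells are kept as 0.
tab <- table(Class, X)

# One row per cell, with the count in a column called Freq.
W <- as.data.frame(tab)

# The combinations that have no records.
print(W[W$Freq == 0, ])
```

Because table() counts rows, this only matches the aggregate() result when every record has Count equal to 1.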




Re: [R] Data manipulation

2013-03-15 Thread John Kane
Nice. That does look like it. IOANNA?

John Kane
Kingston ON Canada




Re: [R] Data manipulation

2013-03-15 Thread IOANNA
Thanks a lot! 



Re: [R] Data manipulation

2013-03-15 Thread IOANNA

Hello John, 


I thought I attached the file. So here we go: 
Class=c(1,1,1,1,1,1,1,2,2,2,2,2,2,2,3,3,
3,3,3,3,3,3,3)
X=c(0.1,0.1,0.1,0.1,0.2,0.2,0.2,0.1,0.1,
0.1,0.1,0.1,0.1,0.1,0.1,0.2,0.2,0.2,0.2,0.3,0.3,0.3,0.3)
Count=c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1)

by1<-factor(Class)
by2<-factor(X)
W<-aggregate(x=Count,by=list(by1,by2),FUN=sum)

However, what I want is a table that also includes lines for the Group.1 and
Group.2 values for which there are no records. In other words, something like
this:

Thanks again. I hope it's clearer now. 
Ioanna


-Original Message-
From: John Kane [mailto:jrkrid...@inbox.com] 
Sent: 15 March 2013 12:51
To: IOANNA; r-help@r-project.org
Subject: RE: [R] Data manipulation

What zero values?  And are they actual zeros or are they NAs, that is,
missing values?

The code looks okay, but without some sample data it is difficult to know
exactly what you are doing. 

The easiest way to supply data is to use the dput() function.  Example with
your file named testfile: 
dput(testfile)
Then copy the output and paste it into your email.  For large data sets, you
can just supply a representative sample.  Usually, 
dput(head(testfile, 100)) will be sufficient.   

http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example

Please supply some sample data. 
 

John Kane
Kingston ON Canada
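
To illustrate the dput() advice, here is a small sketch of the round trip: the sender prints a text representation, and the reader rebuilds the same object from it. The data frame testfile below is made up for the example, and capture.output() is only used to stand in for copying and pasting the printed text.

```r
# A made-up data frame standing in for the poster's data.
testfile <- data.frame(
  Class = c(1, 2, 3),
  X     = c(0.1, 0.2, 0.3),
  Count = c(4L, 0L, 4L)
)

# What the sender would paste into the email.
txt <- capture.output(dput(testfile))
cat(txt, sep = "\n")

# What the reader does with the pasted text: evaluate it to rebuild the object.
restored <- eval(parse(text = paste(txt, collapse = "\n")))
str(restored)
```

The advantage over pasting a printed table is that column types (factor, integer, numeric) survive the trip.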




Re: [R] Data manipulation

2013-03-15 Thread David L Carlson
Wouldn't this do the same thing?

xtabs(Count~Class+X, D)

--
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77843-4352




Re: [R] Data manipulation

2013-03-15 Thread David L Carlson
I was too quick on the Send button. Xtabs produces a table. If you want a 
data.frame, it would be data.frame(xtabs(Count~Class+X, D)):

# Match John's summary table and generate Counts
> set.seed(42)
> Count <- sample(1:50, 23)
> Class <- c(rep(1, 4), rep(2, 7), 3, rep(1, 3), rep(3, 4), rep(3, 4))
> X <- c(rep(.1, 12), rep(.2, 7), rep(.3, 4))
> D <- data.frame(Class=factor(Class), X=factor(X), Count)
> table(D$Class, D$X)
   
    0.1 0.2 0.3
  1   4   3   0
  2   7   0   0
  3   1   4   4

# Create the table/data.frame
> D.table <- xtabs(Count~Class+X)
> D.table
     X
Class 0.1 0.2 0.3
    1 150  63   0
    2 169   0   0
    3  41  98 114
> D.df <- data.frame(D.table)
> D.df
  Class   X Freq
1     1 0.1  150
2     2 0.1  169
3     3 0.1   41
4     1 0.2   63
5     2 0.2    0
6     3 0.2   98
7     1 0.3    0
8     2 0.3    0
9     3 0.3  114

--
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77843-4352
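
A deterministic variant of the xtabs() idea may be easier to check by eye, since it does not depend on the random number generator. The small data frame below is invented for illustration (it is already summed per combination), and responseName is an argument of as.data.frame() for tables that names the value column something other than Freq.

```r
# Invented data: one row per non-empty Class/X combination.
D <- data.frame(
  Class = factor(c(1, 1, 2, 3, 3)),
  X     = factor(c(0.1, 0.2, 0.1, 0.2, 0.3)),
  Count = c(4, 3, 7, 4, 4)
)

# Sum Count over every Class/X cell; cells without data become 0.
tab <- xtabs(Count ~ Class + X, data = D)

# Long layout, with the summed values in a column named Count.
W <- as.data.frame(tab, responseName = "Count")
print(W)
```

Unlike table(), xtabs() with a left-hand side sums that variable, so it also works when Count is not 1 for every record.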



Re: [R] data manipulation between vector and matrix

2012-12-05 Thread C W
The only solution I found was
x-t(mu)

Is there a better way?
Mike

On Wed, Dec 5, 2012 at 1:30 PM, C W tmrs...@gmail.com wrote:

> Dear list,
> I was curious how to subtract a vector from matrix?
>
> Say, I have
>
> mat <- matrix(1:40, nrow=20, ncol=2)
>
> x <- c(1,2)
>
> I want,
>
> x-mat[1,] and x-mat[2,], and so on... Basically, subtract column elements
> of x against column elements in mat.
>
> But x-mat won't do it.
>
> Thanks,
>
> Mike





Re: [R] data manipulation between vector and matrix

2012-12-05 Thread Sarah Goslee
Hi,

On Wed, Dec 5, 2012 at 1:30 PM, C W tmrs...@gmail.com wrote:
> Dear list,
> I was curious how to subtract a vector from matrix?
>
> Say, I have
>
> mat <- matrix(1:40, nrow=20, ncol=2)
>
> x <- c(1,2)

Thanks for the actual reproducible example.

> I want,
>
> x-mat[1,] and x-mat[2,], and so on... Basically, subtract column elements
> of x against column elements in mat.
>
> But x-mat won't do it.

This will (note the modification to get x - mat):
> sweep(-mat, 2, x, "+")
      [,1] [,2]
 [1,]    0  -19
 [2,]   -1  -20
 [3,]   -2  -21
 [4,]   -3  -22
 [5,]   -4  -23
etc.

--
Sarah Goslee
http://www.functionaldiversity.org
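
As a small runnable sketch of the sweep() call above: with MARGIN = 2, element j of x is paired with column j of the matrix. The second form, which passes a custom function instead of negating the matrix first, is an alternative added here for illustration and was not part of the original reply.

```r
mat <- matrix(1:40, nrow = 20, ncol = 2)
x <- c(1, 2)

# Negate the matrix, then add x[j] to every element of column j: x[j] - mat[i, j].
res <- sweep(-mat, 2, x, "+")

# Same result without negating first: FUN receives (matrix, expanded x).
res2 <- sweep(mat, 2, x, function(m, v) v - m)

# Compare the first row with the row-wise subtraction asked for in the question.
print(res[1, ])
print(x - mat[1, ])
```

Either form keeps the 20 x 2 shape of mat, which is what distinguishes sweep() from the transposed results discussed later in the thread.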



Re: [R] data manipulation between vector and matrix

2012-12-05 Thread C W
Thanks, Sarah.  This is the first time I've heard about sweep(); it worked just
the way I wanted.
Mike



Re: [R] data manipulation between vector and matrix

2012-12-05 Thread C W
Thanks, I knew about apply, but did not know you can put plus signs in
quotes.  That's a cool trick.
Mike



Re: [R] data manipulation between vector and matrix

2012-12-05 Thread arun
HI,
In addition to ?sweep(), you can use

apply(-mat,1,`+`,x) 

#or
library(plyr)
aaply(-mat,1,"+",x) 


A.K.
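
One thing worth checking with the apply() form is the shape of the result: apply() over rows returns each row's result as a column, so the output is 2 x 20 rather than 20 x 2. A minimal sketch with the matrix from the question (the names by_row and fixed are chosen here for illustration):

```r
mat <- matrix(1:40, nrow = 20, ncol = 2)
x <- c(1, 2)

# Each row's result (length 2) becomes a column of the output.
by_row <- apply(-mat, 1, `+`, x)
print(dim(by_row))

# Transposing restores the row/column layout of mat.
fixed <- t(by_row)
print(dim(fixed))
print(head(fixed, 3))
```

So the values are the same as with sweep(), but a t() is needed if the result should line up with mat.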








Re: [R] data manipulation between vector and matrix

2012-12-05 Thread C W
Thanks for the benchmark.  I actually wanted to go with the winner, except
the x-t(mat) output is very different than the others.
Mike



Re: [R] data manipulation between vector and matrix

2012-12-05 Thread arun
Hi,

By comparing the different methods:
library(plyr)  # provides aaply()
set.seed(5)
mat1<-matrix(sample(1:1e6,1e6,replace=TRUE),ncol=1)
set.seed(25)
x<-sample(1:1e6,1,replace=TRUE)
system.time(z1<-sweep(-mat1,2,x,"+"))
#   user  system elapsed 
#  0.076   0.000   0.069 
system.time(z2<-apply(-mat1,1,`+`,x))
#   user  system elapsed 
#  0.036   0.000   0.031 
system.time(z3<-aaply(-mat1,1,`+`,x))
#   user  system elapsed 
#  1.880   0.000   1.704 
system.time(z4<-x-t(mat1))  #winner
#   user  system elapsed 
#  0.004   0.000   0.007 
system.time(z5<-t(x-t(mat1)))
#   user  system elapsed 
#  0.008   0.000   0.009 


A.K.
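
The benchmark above uses a one-column matrix and a length-one x. A sketch of the same comparison on a two-column matrix, which is closer to the original question, might look like the following. The matrix size and the object names are assumptions made here, plyr is left out to keep the example base-R only, and the timings will differ from machine to machine.

```r
set.seed(5)
n <- 1e5
mat1 <- matrix(sample(1:1e6, 2 * n, replace = TRUE), ncol = 2)
x <- c(1, 2)

# Time three base-R ways of computing x[j] - mat1[i, j], all returning n x 2.
t_sweep <- system.time(z1 <- sweep(-mat1, 2, x, "+"))
t_apply <- system.time(z2 <- t(apply(-mat1, 1, `+`, x)))
t_trans <- system.time(z5 <- t(x - t(mat1)))

# User, system and elapsed seconds side by side.
print(rbind(sweep = t_sweep, apply = t_apply, transpose = t_trans)[, 1:3])

# The three results should agree before the timings are compared.
print(isTRUE(all.equal(z1, z2)) && isTRUE(all.equal(z1, z5)))
```

Checking that the results agree first matters here, because a faster expression that returns a differently shaped result is not really a like-for-like comparison.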








Re: [R] data manipulation between vector and matrix

2012-12-05 Thread arun
HI,
The option z5 takes care of it.
z5<-t(x-t(mat)) #still faster than ?sweep()
dim(z5)
#[1] 20  2
identical(sweep(-mat,2,x,"+"),z5)
#[1] TRUE


A.K.
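
The reason the double transpose works, and plain x - mat does not, is R's recycling rule: a short vector is recycled down the columns of a matrix, not across its rows. A small sketch with the matrix from the question; the rep() variant is an additional option shown here for comparison and was not proposed in the thread.

```r
mat <- matrix(1:40, nrow = 20, ncol = 2)
x <- c(1, 2)

# x is recycled down each column, so x[1] and x[2] alternate by row: not what was asked.
wrong <- x - mat

# After t(), each column holds one original row, so recycling pairs x[j] with column j.
right <- t(x - t(mat))

# Alternative: stretch x so that it lines up with mat's column-major storage.
also <- rep(x, each = nrow(mat)) - mat

print(wrong[2, ])
print(right[2, ])
print(also[2, ])
```

In the second row, the plain subtraction uses x[2] for the first column, while the other two forms use x[1] as intended.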






From: C W tmrs...@gmail.com
To: arun smartpink...@yahoo.com 
Sent: Wednesday, December 5, 2012 5:09 PM
Subject: Re: [R] data manipulation between vector and matrix


Hi Arun,
Sorry, I might be a little unclear with my words.

The dimensions are different. This is what I got:
 x-t(mat)
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14]
[1,]    0   -1   -2   -3   -4   -5   -6   -7   -8    -9   -10   -11   -12   -13
[2,]  -19  -20  -21  -22  -23  -24  -25  -26  -27   -28   -29   -30   -31   -32
     [,15] [,16] [,17] [,18] [,19] [,20]
[1,]   -14   -15   -16   -17   -18   -19
[2,]   -33   -34   -35   -36   -37   -38
 sweep(-mat, 2, x, "+")
      [,1] [,2]
 [1,]    0  -19
 [2,]   -1  -20
 [3,]   -2  -21
 [4,]   -3  -22
 [5,]   -4  -23
 [6,]   -5  -24
 [7,]   -6  -25
 [8,]   -7  -26
 [9,]   -8  -27
[10,]   -9  -28
[11,]  -10  -29
[12,]  -11  -30
[13,]  -12  -31
[14,]  -13  -32
[15,]  -14  -33
[16,]  -15  -34
[17,]  -16  -35
[18,]  -17  -36
[19,]  -18  -37
[20,]  -19  -38
 dim(x-t(mat))
[1]  2 20
 dim(sweep(-mat, 2, x, "+"))
[1] 20  2

On Wed, Dec 5, 2012 at 4:55 PM, arun smartpink...@yahoo.com wrote:

HI Mike,
I didn't understand except the x-t(mat) output is very different than the 
others.  Are you saying that it needs to be transposed?  BTW, that was z5.

A.K.








From: C W tmrs...@gmail.com
To: arun smartpink...@yahoo.com
Cc: R help r-help@r-project.org; Sarah Goslee sarah.gos...@gmail.com
Sent: Wednesday, December 5, 2012 4:47 PM

Subject: Re: [R] data manipulation between vector and matrix


Thanks for the benchmark.  I actually wanted to go with the winner, except the 
x-t(mat) output is very different than the others.
Mike


On Wed, Dec 5, 2012 at 4:40 PM, arun smartpink...@yahoo.com wrote:

Hi,

By comparing the different methods:
library(plyr)  # needed for aaply()
set.seed(5)
mat1 <- matrix(sample(1:1e6, 1e6, replace=TRUE), ncol=1)
set.seed(25)
x <- sample(1:1e6, 1, replace=TRUE)
system.time(z1 <- sweep(-mat1, 2, x, "+"))
#   user  system elapsed
#  0.076   0.000   0.069
system.time(z2 <- apply(-mat1, 1, `+`, x))
#   user  system elapsed
#  0.036   0.000   0.031
system.time(z3 <- aaply(-mat1, 1, `+`, x))
#   user  system elapsed
#  1.880   0.000   1.704
system.time(z4 <- x - t(mat1))  #winner
#   user  system elapsed
#  0.004   0.000   0.007
system.time(z5 <- t(x - t(mat1)))
#   user  system elapsed
#  0.008   0.000   0.009


A.K.







From: C W tmrs...@gmail.com
To: arun smartpink...@yahoo.com
Cc: R help r-help@r-project.org; Sarah Goslee sarah.gos...@gmail.com
Sent: Wednesday, December 5, 2012 4:11 PM

Subject: Re: [R] data manipulation between vector and matrix


Thanks, I knew about apply, but did not know you can put plus signs with 
quotes.  That's a cool trick.
Mike


On Wed, Dec 5, 2012 at 4:05 PM, arun smartpink...@yahoo.com wrote:

HI,
In addition to ?sweep(), you can use

apply(-mat,1,`+`,x)  # result is 2 x 20; wrap in t() to match sweep()

#or
library(plyr)
aaply(-mat,1,`+`,x)


A.K.









Re: [R] Data manipulation with aggregate

2012-07-04 Thread arun
Hi,

Try this:
myData = data.frame(Name = c('a', 'a', 'b', 'b'), length = c(1,2,3,4), type= 
c('x','x','y','z'))

z <- aggregate(length~Name,myData,mean)
z1 <- aggregate(length~type,myData,mean)
merge(z,merge(z,z1),all=TRUE)
  Name length type
1    a    1.5    x
2    b    3.5 <NA>
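
One caveat, with a hedged alternative: the inner merge(z, z1) joins on the shared length column, so a type survives only when its mean length happens to equal a Name mean. A minimal sketch that does not rely on that coincidence follows, assuming type is stored as character; one_or_na is a hypothetical helper, not a base R function.

```r
myData <- data.frame(Name = c("a", "a", "b", "b"),
                     length = c(1, 2, 3, 4),
                     type = c("x", "x", "y", "z"),
                     stringsAsFactors = FALSE)

# Hypothetical helper: return the single shared value, or NA if values differ
one_or_na <- function(v) {
  v <- as.character(v)
  if (length(unique(v)) == 1L) v[1L] else NA_character_
}

# Split each column by Name and summarise the pieces separately
len_by_name  <- split(myData$length, myData$Name)
type_by_name <- split(myData$type, myData$Name)

res <- data.frame(
  Name   = names(len_by_name),
  length = vapply(len_by_name, mean, numeric(1), USE.NAMES = FALSE),
  type   = vapply(type_by_name, one_or_na, character(1), USE.NAMES = FALSE),
  stringsAsFactors = FALSE
)
res
```

Because each column is summarised per Name, the result does not change if two unrelated groups happen to share a mean.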

A.K.




- Original Message -
From: Filoche pmassico...@hotmail.com
To: r-help@r-project.org
Cc: 
Sent: Tuesday, July 3, 2012 12:04 PM
Subject: [R] Data manipulation with aggregate

Hi everyone.

I have these data :

myData = data.frame(Name = c('a', 'a', 'b', 'b'), length = c(1,2,3,4), type
= c('x','x','y','z'))

which gives me:

  Name length type
1    a      1    x
2    a      2    x
3    b      3    y
4    b      4   z

I would group (mean) this DF using 'Name' as grouping factor. However, I
have a field ('type') which is a string. I would like to use the unique
value of this field when possible (i.e. when all the 'type' values are the
same for each group) or replace with NA when 'type' has multiple values.

In fact, I would like to obtain this:

  Name length type
1    a      1.5    x
2    b      3.5    NA

For instance, I was using this command:

aggregate(list(myData$length, myData$type), list(myData$Name), FUN = mean)

But it can't deal with string data.

I hope I have been clear enough.

With regards,
Phil



Re: [R] Data manipulation with aggregate

2012-07-03 Thread jim holtman
try this:

myData = data.frame(Name = c('a', 'a', 'b', 'b'), length = c(1,2,3,4),
                    type = c('x','x','y','z'))

result <- do.call(rbind, lapply(split(myData, myData$Name), function(.name){
    data.frame(Name = .name$Name[1L]
        , length = mean(.name$length)
        , type = if (all(.name$type[1L] == .name$type)) .name$type[1L] else NA
        )
}))
result
  Name length type
a    a    1.5    x
b    b    3.5 <NA>







-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.



Re: [R] Data Manipulation - make diagonal matrix of each element of a matrix

2011-12-16 Thread Clemontina Alexander
Thank you, that is much simpler!





Re: [R] Data Manipulation - make diagonal matrix of each element of a matrix

2011-12-15 Thread Clemontina Alexander
I'm sorry, the indices of my X matrix are wrong.
It should be:

 X = x11    0    0  x12    0    0
       0  x11    0    0  x12    0
       0    0  x11    0    0  x12
     x21    0    0  x22    0    0
       0  x21    0    0  x22    0
       0    0  x21    0    0  x22
     ...
     xn1    0    0  xn2    0    0
       0  xn1    0    0  xn2    0
       0    0  xn1    0    0  xn2

or

 X = -0.63     0     0 -0.82     0     0
         0 -0.63     0     0 -0.82     0
         0     0 -0.63     0     0 -0.82
      0.18     0     0  0.49     0     0
         0  0.18     0     0  0.49     0
         0     0  0.18     0     0  0.49
      ...
      0.33     0     0 -0.31     0     0
         0  0.33     0     0 -0.31     0
         0     0  0.33     0     0 -0.31

Sorry for the confusion.
Tina








On Thu, Dec 15, 2011 at 10:02 AM, Clemontina Alexander
ckale...@ncsu.edu wrote:
 Dear R list,
 I have the following data:

 set.seed(1)
 n  <- 5     # number of subjects
 tt <- 3     # number of repeated observations per subject
 numco <- 2  # number of covariates
 x <- matrix(round(rnorm(n*numco),2), ncol=numco)   # the actual covariates
 x
      [,1]  [,2]
 [1,] -0.63 -0.82
 [2,]  0.18  0.49
 [3,] -0.84  0.74
 [4,]  1.60  0.58
 [5,]  0.33 -0.31

 I need to form a matrix X such that
 X =      x11      0      0     x21      0      0
              0   x11      0        0   x21      0
              0      0   x11        0      0   x21
           x12      0      0     x22      0      0
              0   x12      0        0   x22      0
              0      0   x12        0      0   x22
                       ...
           x15      0      0     x25      0      0
              0   x15      0        0   x25      0
              0      0   x15        0      0   x25
 where both tt and numco can change. (So if tt=5 and numco=4, then X
 needs to have 20 columns and n*tt rows. Each diagonal matrix should be
 5x5 and there will be 4 of them for the 4 covariates.) I wrote this
 funky for loop:

 idd <- length(diag(1,tt))    # length of intercept matrix
 X <- matrix(numeric(n*numco*idd),ncol=tt*numco)
 for(i in 1:numco){
      X[,((i-1)*tt+1):(i*tt)] <- matrix(
        c(matrix(rep(diag(1,tt),n),ncol=tt, byrow=TRUE))   *
        rep(rep(x[,i],each=tt),tt)
       , ncol=tt)
 }
 X

 It works fine, but is there an easier way when n, tt, and numco get
 larger and larger?
 Thanks,
 Tina


 --
 Clemontina Alexander
 Ph.D Student
 Department of Statistics
 NC State University
 Email: ckale...@ncsu.com



-- 
Clemontina Alexander
Ph.D Student
Department of Statistics
NC State University
Raleigh, NC 27695
Phone: (850) 322-6878
Email: ckale...@ncsu.com



Re: [R] Data Manipulation - make diagonal matrix of each element of a matrix

2011-12-15 Thread Rui Barradas
Hello,

I believe I can help, or at least, my code is simpler.
First, look at your first line:

idd <- length(diag(1,tt))   # length of intercept matrix

This line is not needed: diag(tt) would do the job of diag(1,tt), but even
that is unnecessary. Why call two functions, one of which, 'diag', uses
memory(*), when the result is simply tt squared? tt^2 is much simpler.
(*) and, like you say, larger and larger amounts of it

My solution to your problem is as follows (as a function, and yours).

fun2 <- function(n, tt, numco){
    M.Unit <- matrix(rep(diag(1,tt),n), ncol=tt, byrow=TRUE)
    M <- NULL
    for(i in 1:numco) M <- cbind(M, M.Unit*rep(x[,i], each=tt))
    M
}

fun1 <- function(n, tt, numco){
    idd <- length(diag(1,tt))    # length of intercept matrix
    X <- matrix(numeric(n*numco*idd),ncol=tt*numco)
    for(i in 1:numco){
        X[,((i-1)*tt+1):(i*tt)] <- matrix(
            c(matrix(rep(diag(1,tt),n),ncol=tt, byrow=TRUE))*
                rep(rep(x[,i],each=tt),tt)
            , ncol=tt)
    }
    X
}

I've tested the two with larger values of 'n', 'tt' and 'numco',
using the following timing instructions:


n  <- 1000
tt <- 50
numco <- 15
set.seed(1)
x <- matrix(round(rnorm(n*numco),2), ncol=numco)   # the actual covariates

Runs <- 10^1

t1 <- system.time(for(i in 1:Runs) a1 <- fun1(n, tt, numco))[c(1,3)]
t2 <- system.time(for(i in 1:Runs) a2 <- fun2(n, tt, numco))[c(1,3)]

rbind(t1, t2, t1/t2)

   user.self   elapsed
t1 23.21      31.06
t2 14.97      22.54
    1.550434   1.377995

As you can see, it's not a great speed improvement.
I hope it's at least useful.

Rui Barradas
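
A further option that was not raised in the thread, given here as a sketch under the assumption that the corrected layout (block row j holds x[j,1]*I, x[j,2]*I, ...) is the target: that layout is the Kronecker product of x with a tt-by-tt identity matrix. build_loop is a hypothetical rewrite of fun2 that takes x as an argument instead of reading it from the workspace.

```r
set.seed(1)
n <- 5      # number of subjects
tt <- 3     # number of repeated observations per subject
numco <- 2  # number of covariates
x <- matrix(round(rnorm(n * numco), 2), ncol = numco)

# Loop-based construction, modelled on fun2 from the thread
build_loop <- function(x, tt) {
  n <- nrow(x)
  M.Unit <- matrix(rep(diag(1, tt), n), ncol = tt, byrow = TRUE)
  M <- NULL
  for (i in seq_len(ncol(x))) M <- cbind(M, M.Unit * rep(x[, i], each = tt))
  M
}

# Kronecker product: every element x[j, i] is replaced by x[j, i] * diag(tt)
X_kron <- kronecker(x, diag(tt))
X_loop <- build_loop(x, tt)

dim(X_kron)                 # n*tt rows, tt*numco columns
all.equal(X_kron, X_loop)   # compares the two constructions
```

The dense result still needs n*tt*tt*numco cells, so for very large n and tt memory, not the loop, is likely to be the limit.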




Re: [R] data manipulation and summaries with few million rows

2011-08-27 Thread jim holtman
Factors are your friend here:

myData
       mydate gender mygroup id mygrp.f
1  2012-03-25      F       A  1       1
2  2005-05-23      F       B  2       2
3  2005-09-08      F       B  2       2
4  2005-12-07      F       B  2       2
5  2006-02-26      F       C  2       3
6  2006-05-13      F       C  2       3
7  2006-09-01      F       C  2       3
8  2006-12-12      F       D  2       4
9  2006-02-19      F       D  2       4
10 2006-05-03      F       D  2       4
11 2006-04-23      F       D  2       4
12 2007-12-08      F       D  2       4
13 2011-03-19      F       D  2       4
14 2007-12-20      M       A  3       1
15 2008-06-15      M       A  3       1
16 2008-12-16      M       A  3       1
17 2009-06-07      M       B  3       2
18 2009-10-09      M       B  3       2
19 2010-01-28      M       B  3       2
20 2007-06-05      M       A  4       1
# change 'mygroup' to a factor so you can use 'diff' to count the changes
myData$mygrp.f <- as.integer(factor(myData$mygroup))
# count the changes for each 'id'
changes <- tapply(myData$mygrp.f, myData$id, function(x){
    sum(diff(x) != 0)
})


changes
1 2 3 4
0 2 1 0
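
The question also asks for days observed and gender in one row per id, which the replies do not cover. A minimal base R sketch of that final table follows; summarise_id is a hypothetical helper, and it assumes the rows of each id are already in the intended order and that gender is constant within an id.

```r
# Example data from the thread, rebuilt compactly
myData <- data.frame(
  mydate = c("2012-03-25", "2005-05-23", "2005-09-08", "2005-12-07",
             "2006-02-26", "2006-05-13", "2006-09-01", "2006-12-12",
             "2006-02-19", "2006-05-03", "2006-04-23", "2007-12-08",
             "2011-03-19", "2007-12-20", "2008-06-15", "2008-12-16",
             "2009-06-07", "2009-10-09", "2010-01-28", "2007-06-05"),
  gender  = rep(c("F", "M"), c(13, 7)),
  mygroup = c("A", rep("B", 3), rep("C", 3), rep("D", 6),
              rep("A", 3), rep("B", 3), "A"),
  id      = rep(1:4, c(1, 12, 6, 1)),
  stringsAsFactors = FALSE
)

# Hypothetical helper: one summary row for the rows of a single id
summarise_id <- function(d) {
  dates <- as.Date(d$mydate)
  data.frame(
    id            = d$id[1L],
    gender        = d$gender[1L],
    # span in days between the earliest and latest date
    days_observed = as.numeric(diff(range(dates))),
    # number of runs minus one = number of changes, in row order
    n_changes     = length(rle(as.character(d$mygroup))$lengths) - 1L,
    stringsAsFactors = FALSE
  )
}

out <- do.call(rbind, lapply(split(myData, myData$id), summarise_id))
out
```

With about 6 million rows, splitting into one data frame per id may be slow; the same two expressions can be used inside a data.table or plyr grouping call instead.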



On Wed, Aug 24, 2011 at 12:48 PM, Juliet Hannah juliet.han...@gmail.com wrote:
 I have a data set with about 6 million rows and 50 columns. It is a
 mixture of dates, factors, and numerics.

 What I am trying to accomplish can be seen with the following
 simplified data, which is given as dput output below.

 head(myData)
      mydate gender mygroup id
 1 2012-03-25      F       A  1
 2 2005-05-23      F       B  2
 3 2005-09-08      F       B  2
 4 2005-12-07      F       B  2
 5 2006-02-26      F       C  2
 6 2006-05-13      F       C  2

 For each id, I want to count the number of changes of the variable
 'mygroup' that occur. For example, id=1 has 0 changes because it is
 observed only once.  id=2 has 2 changes (B to C, and C to D).  I also
 need to calculate the total observation time for each id using the
 variable mydate.  In the end, I am trying to have a new data set in
 which each row has an id, days observed, number of changes, and
 gender.

 I made some simple summaries using data.table and plyr, but I'm stuck
 on this reformatting.

 Thanks for your help.

 myData <- structure(list(mydate = c("2012-03-25", "2005-05-23", "2005-09-08",
 "2005-12-07", "2006-02-26", "2006-05-13", "2006-09-01", "2006-12-12",
 "2006-02-19", "2006-05-03", "2006-04-23", "2007-12-08", "2011-03-19",
 "2007-12-20", "2008-06-15", "2008-12-16", "2009-06-07", "2009-10-09",
 "2010-01-28", "2007-06-05"), gender = c("F", "F", "F", "F", "F",
 "F", "F", "F", "F", "F", "F", "F", "F", "M", "M", "M", "M", "M",
 "M", "M"), mygroup = c("A", "B", "B", "B", "C", "C", "C", "D",
 "D", "D", "D", "D", "D", "A", "A", "A", "B", "B", "B", "A"),
    id = c(1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
    3L, 3L, 3L, 3L, 3L, 3L, 4L)), .Names = c("mydate", "gender",
 "mygroup", "id"), class = "data.frame", row.names = c(NA, -20L
 ))

 sessionInfo()
 R version 2.13.1 (2011-07-08)
 Platform: x86_64-unknown-linux-gnu (64-bit)

 locale:
  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
  [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
  [9] LC_ADDRESS=C               LC_TELEPHONE=C
 [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

 attached base packages:
 [1] stats     graphics  grDevices utils     datasets  methods   base





-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?



Re: [R] data manipulation and summaries with few million rows

2011-08-24 Thread Dennis Murphy
Hi Juliet:

Here's a Q & D solution:

# (1) plyr
library(plyr)
f <- function(d) length(unique(d$mygroup)) - 1
ddply(myData, .(id), f)
  id V1
1  1  0
2  2  2
3  3  1
4  4  0

# (2) data.table
library(data.table)
myDT <- data.table(myData, key = 'id')
myDT[, list(nswitch = length(unique(mygroup)) - 1), by = 'id']

If one can switch back and forth between levels more than once, then
the above is clearly not appropriate. A more robust method would be to
employ rle() [run length encoding]:

# rle() needs an atomic vector; use as.character() first if mygroup is a factor
g <- function(d) length(rle(d$mygroup)$lengths) - 1
ddply(myData, .(id), g)    # gives the same answer as above
myDT[, list(nswitch = length(rle(mygroup)$lengths) - 1), by = 'id']   # ditto


HTH,
Dennis



Re: [R] data manipulation and summaries with few million rows

2011-08-24 Thread Juliet Hannah
Thanks Dennis! I'll check this out.

Just to clarify, I need the total number of switches/changes
regardless of if that state
had occurred in the past. So A-A-B-A, would have 2 changes: A to B and B to A.

Thanks again.
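
The A-A-B-A case can be checked directly. This is a small sketch, with count_distinct and count_switches as hypothetical names for the two counting rules discussed in the thread (distinct levels minus one, and runs minus one via rle()).

```r
# Distinct levels minus one: ignores returns to an earlier state
count_distinct <- function(v) length(unique(v)) - 1L

# Runs minus one: counts every change between neighbouring observations
count_switches <- function(v) length(rle(as.character(v))$lengths) - 1L

v <- c("A", "A", "B", "A")
count_distinct(v)   # 1: only A and B are distinct
count_switches(v)   # 2: A to B, then B to A
```

Only the run-based count matches the requirement stated here, since it treats the return to A as a second change.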




Re: [R] Data manipulation

2011-02-12 Thread Johannes Huesing
mathijsdevaan mathijsdev...@gmail.com [Sat, Feb 12, 2011 at 03:00:18PM CET]:
 
 Hi,
 
 I have a dataset with info on individuals (B) that have been involved in
 projects (A) during multiple years (C). The dataset contains three columns:
 A, B, C. Example:
    A  B  C
 1 1  a  1999
 2 1  b  1999
 3 1  c  1999
 4 2  c  2001
 5 2  d  2001
 6 3  a  2004
 7 3  b  2004
 
 I am interested in the average tenure of all individuals for each project
 (assuming that the tenure of an individual = 0 in the first project this
 individual is involved in). So based on the data above:
   A  D
 1 1  0
 2 2  1
 3 3  5
 
 where D = average project tenure. How do I do this?
 

I am not getting how you arrive at D calculating an average.
Could you write down the arithmetic operations involved?


-- 
Johannes Hüsing   There is something fascinating about science. 
  One gets such wholesale returns of conjecture 
mailto:johan...@huesing.name  from such a trifling investment of fact.  
  
http://derwisch.wikidot.com (Mark Twain, Life on the Mississippi)



Re: [R] Data manipulation

2011-02-12 Thread jim holtman
Will this do it for you:

x <- read.table(textConnection("  A  B  C
1 1  a  1999
2 1  b  1999
3 1  c  1999
4 2  c  2001
5 2  d  2001
6 3  a  2004
7 3  b  2004"), header = TRUE)
closeAllConnections()
# add a tenure column
x$tenure <- ave(x$C, x$B, FUN = function(yr) yr - min(yr))
x
  A B    C tenure
1 1 a 1999      0
2 1 b 1999      0
3 1 c 1999      0
4 2 c 2001      2
5 2 d 2001      0
6 3 a 2004      5
7 3 b 2004      5
# compute tenure on project
aggregate(x$tenure, list(project = x$A), mean)
  project x
1       1 0
2       2 1
3       3 5






-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?



Re: [R] Data manipulation

2011-02-12 Thread mathijsdevaan

That worked great! Thanks!


Re: [R] Data manipulation in R

2010-09-24 Thread Dennis Murphy
Hi:

Please provide a minimal reproducible example that resembles your real data
so that people can try it out and provide potential solutions for you. Show
what you tried that failed, and what you expect. A number of people on this
list are very adept in data summarization, but most of them are loath to
provide abstract solutions unless the problem is crystal clear.

On Thu, Sep 23, 2010 at 8:41 PM, Thomas Parr thomasbp...@gmail.com wrote:

 If this has already been answered, my apologies in advance I am relatively
 new to this aspect of [R]. it is a bit of a basic question.



 I have 4 columns of data (site, Date, measurement type, value) in a tab
 delimited text file.  Site is a site where measurements were collected,
 Date is a date in DD/MM/ format, measurement is a code for the type of
 measurement made, and value just the value observed.



 So each site has multiple dates on which it was sampled and each date has
 multiple measurement types (fortunately only one value per measurement type
 per day).



 I want to know how I can separate this into multiple columns by measurement
 type averaged over the range of dates available.  The output would have a
 single averaged measurement value per site.


 This suggests you may need to reshape your data first.


 Site, Measurement 1, measurement2, measurement3, etc.


 Matrices are OK, data frames are usually better.


 I have been reading it in as a matrix as.matrix(read.table("myfile.txt",
 header=TRUE)), but I don't quite know what to do with it afterward.

 There are several functions/packages that are more than capable of solving
your problem, but it would be a lot easier and more productive if you
provide a concrete example.
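
As a rough sketch of the kind of reshaping hinted at above (not a solution to the poster's actual file, which was never shown), the following uses an invented data frame `long` with hypothetical columns `site`, `date`, `type` and `value`, and averages `value` over dates for every site and measurement type with base R's tapply():

```r
# Hypothetical long-format data: one row per site/date/measurement type
long <- data.frame(
  site  = c("S1", "S1", "S1", "S1", "S2", "S2", "S2", "S2"),
  date  = c("01/06/2010", "01/06/2010", "02/06/2010", "02/06/2010",
            "01/06/2010", "01/06/2010", "02/06/2010", "02/06/2010"),
  type  = c("pH", "temp", "pH", "temp", "pH", "temp", "pH", "temp"),
  value = c(7.0, 15, 7.2, 17, 6.5, 12, 6.7, 14)
)

# Mean of value for every site x type combination, averaged over dates;
# the result is a matrix with one row per site and one column per type
wide <- with(long, tapply(value, list(site = site, type = type), mean))
wide
```

The result is a matrix; as.data.frame.matrix(wide) would turn it into a data frame if that is more convenient downstream.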

HTH,
Dennis



 Thanks


Re: [R] Data Manipulation

2010-09-10 Thread Joshua Wiley
Hi,

Look at the table() function.  Here is an example with your data:


dat <- read.table(textConnection("
Study
A
A
B
B
B
A
C
C
D"), header = TRUE)
closeAllConnections()

table(dat)


Hope that helps,

Josh

On Fri, Sep 10, 2010 at 8:53 AM, dfong df...@medicine.umaryland.edu wrote:

 Hi,

 I just started using R and need some guidance.

 I need to create a time series chart in R, but the problem is the data is
 not numeric.
 The data is in the following format

 Study
 A
 A
 B
 B
 B
 A
 C
 C
 D

 Then there is also another column with dates. How can I manipulate this in
 order to have something that will count the number of unique entries and
 group them.
 Say A = 3 B= 3 C=2 D=1

 Thanks




-- 
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/



Re: [R] Data Manipulation

2010-09-10 Thread dfong

I'm actually importing it from a CSV, so I already have that in a table. But
I can't make a graph with text. I assume I need to do some counting in order
to draw the graph?
Any example of this?

thanks
-- 
View this message in context: 
http://r.789695.n4.nabble.com/Data-Manipulation-tp2534662p2534690.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Data Manipulation

2010-09-10 Thread Joshua Wiley
Hi,

Yes, the table() function is not to read the data, but to do the
frequency counts of each level.  I just included the read.table() part
so that you could copy and paste my code, but I did not include the R
output from table(dat).

> table(dat)
dat
A B C D
3 3 2 1

It nicely tallies for you.  Also, you can look at a simple plot:

# you will have to run this in your R
# because I do not know an easy way to include graphs
plot(table(dat))

You can also save the results in a new variable and then access portions of it:

> my.table <- table(dat)
> my.table # the full table
dat
A B C D
3 3 2 1
> my.table[2] # just extract the second element
B
3
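
Since the original question also mentions a column of dates, here is a small sketch of how the same counting could be split by date. The `Date` values below are invented purely for illustration; only the `Study` values come from the thread.

```r
# Hypothetical data: the Study column from the thread plus an invented Date column
dat <- data.frame(
  Study = c("A", "A", "B", "B", "B", "A", "C", "C", "D"),
  Date  = as.Date(c("2010-09-01", "2010-09-01", "2010-09-01",
                    "2010-09-02", "2010-09-02", "2010-09-02",
                    "2010-09-03", "2010-09-03", "2010-09-03"))
)

# Overall counts per study, as a data frame that is easy to plot or merge
counts <- as.data.frame(table(Study = dat$Study))

# Counts per date and study: one row per date, one column per study
by.date <- table(dat$Date, dat$Study)
by.date
```

Each row of `by.date` is then one time point, which is probably closer to what a time series chart needs than the overall tally.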


Cheers,

Josh

On Fri, Sep 10, 2010 at 9:11 AM, dfong df...@medicine.umaryland.edu wrote:

 I'm actually importing it from a CSV, so I already have that in a table. But
 i Can't make a graph with text. I assume I need to do some counting in order
 to draw the graph?
 Any example of this?

 thanks




-- 
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/



Re: [R] Data Manipulation

2010-09-10 Thread Erik Iverson

Hello,

This is definitely possible with R; there are lots of packages for making
good graphics.

However, the easiest way for us to help you is if you give us a small
reproducible example, as you started to with your initial post.

If you have an object in your R session that you'd like help with
you can use ?dput to create a text version of it to share with the list.
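
A minimal sketch of what that looks like, using a tiny made-up data frame in place of the poster's imported CSV (the object names here are hypothetical):

```r
# Small hypothetical data frame standing in for the poster's imported CSV
dat <- data.frame(Study = c("A", "A", "B"), stringsAsFactors = FALSE)

# dput() prints R code that recreates the object
dput(dat)

# Pasting that output back into R rebuilds the object; here the round
# trip is simulated by capturing the text and evaluating it
txt  <- paste(capture.output(dput(dat)), collapse = "\n")
dat2 <- eval(parse(text = txt))
all.equal(dat, dat2)
```

The text printed by dput() is what would be pasted into a message to the list, so that readers can recreate the object with a single assignment.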

The table class in R is separate from a data.frame, which is
probably what you have now...

dfong wrote:

I'm actually importing it from a CSV, so I already have that in a table. But
i Can't make a graph with text. I assume I need to do some counting in order
to draw the graph?
Any example of this?

thanks




Re: [R] Data manipulation search

2010-08-11 Thread Erik Iverson

?match, look at the %in% operator.
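
For the question quoted below, a short sketch of how that suggestion might be applied (the variable names `found` and `pos` are not from the thread):

```r
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

# %in% returns TRUE if the left-hand value occurs anywhere in x
found <- 5 %in% x

# The poster's counter, set from the condition
count <- if (found) 1 else 0

# match() gives the position of the first occurrence, or NA if absent
pos <- match(5, x)
```

any(x == 5) would give the same answer here, but %in% never returns NA, which makes it safer inside if().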

Mestat wrote:

Hi listers,
I did some searching, but I didn't find an answer in the forum.
I have a data set.
I would like to test a condition on my data set.

x <- c(1,2,3,4,5,6,7,8,9,10)
count <- 0
if (CONDITION){count <- 1}else{count <- 0}

My CONDITION would be: is the number 5 in my data set?

Thanks in advance,
Marcio




Re: [R] Data manipulation problem

2010-04-09 Thread moleps
In the end, after going at it from scratch, this worked out all right:


##set up data
age.cat <- seq(0,100,10)
year <- (1953:(1953+55))
## replace=TRUE because 616 values are drawn from only 10 candidates
dat.vec <- sample(1:10,(length(age.cat)*length(year)),replace=TRUE)
dat.matrix <- matrix(dat.vec,nrow=length(age.cat),ncol=length(year))
rownames(dat.matrix) <- age.cat
colnames(dat.matrix) <- year
year.int <- seq(1950,2010,5)
age.div <- cut(year,year.int,include.lowest=TRUE)

##summarise by another variable

a <- do.call(cbind,by(t(dat.matrix),age.div,function(x)colSums(x)));a
 
//M
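
For comparison, a sketch of an alternative to the by()/do.call() construction above, using base R's rowsum(). This is not what the poster used; the name `period` and the seed are introduced only for this illustration.

```r
set.seed(1)  # only so the example is repeatable
age.cat <- seq(0, 100, 10)
year    <- 1953:2008
dat.matrix <- matrix(sample(1:10, length(age.cat) * length(year), replace = TRUE),
                     nrow = length(age.cat),
                     dimnames = list(age.cat, year))

# One 5-year period label per column of dat.matrix
period <- cut(year, seq(1950, 2010, 5), include.lowest = TRUE)

# rowsum() adds up rows by group, so transpose, sum, and transpose back
binned <- t(rowsum(t(dat.matrix), period))
dim(binned)
```

The result keeps the age categories in the rows and has one column per 5-year period, which is the general "collapse columns by another vector" pattern asked about earlier in the thread.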










Re: [R] Data manipulation problem

2010-04-09 Thread Dieter Menne


Bert Gunter wrote:
 
 Yes. Don't do this.
 
 (what you probably really want to do is fit a model with age as a factor,
 which can be done statistically e.g. by logistic regression; or
 graphically
 using conditioning plots, e.g. via trellis graphics (the lattice package).
 This avoids the arbitrariness and discontinuities of binning by age
 range.)
 
 

Moleps' reply: the reviewer wants it.

Dieter: Sigh. Too often I have received such a request, asking for all
pairwise tests between age groups. Applying the most generic Bonferroni
correction often ends the debate quickly.
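
To illustrate why that tends to end the debate, here is a sketch with invented p-values (they are not results from any data in this thread). Base R's p.adjust() applies the Bonferroni correction:

```r
# Hypothetical raw p-values from six pairwise comparisons between age groups
p.raw <- c(0.004, 0.012, 0.030, 0.049, 0.200, 0.650)

# Bonferroni multiplies each p-value by the number of tests, capped at 1
p.bonf <- p.adjust(p.raw, method = "bonferroni")

# Which comparisons stay below 0.05 after the correction?
keep <- which(p.bonf < 0.05)
```

With many age groups the number of pairwise tests grows quickly, so few comparisons survive the correction.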

Dieter


-- 
View this message in context: 
http://n4.nabble.com/Data-manipulation-problem-tp1751932p1819579.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Data manipulation problem

2010-04-07 Thread moleps islon
So.. here we try again.

##generate dataset
age.cat <- seq(0,100,10)
year <- (1953:(1953+55))
data.vec <- sample(1:1,(age.cat*year))
data.matrix <- matrix(data.vec,c(length(age.cat),length(year))
rownames(data.matrix) <- age.cat
colnames(data.matrix) <- year

##divide into 5 year periods
age.div <- cut(year,seq(1950,2010,6),include.lowest=T) ##interval is
beyond my datainterval so I doubt the include.lowest matters

Now what I'd like to do is summarise the rows within the 5-year intervals.

I did read about apply in its different variants and Dalgaard, but I
do not understand how it could be applied in this setting.

I tried making an array and summarise by that (used the vector and
applied it into a
length(age.cat)*max(vector(table(age.div)*length(age.div) array. It
worked but required a bit of tweaking (inserting null columns) and I
find myself in this situation quite often whereby I need to add
multiple columns based on another vector so I'd be very interested in
another more general approach.

//M




Re: [R] Data manipulation problem

2010-04-07 Thread David Winsemius
That code throws multiple errors. Can you at least test your code  
before posting?


(And, again, please avoid using function names as names for your  
objects.)


-- David.


Re: [R] Data manipulation problem

2010-04-06 Thread moleps islon
OK... next question.. Which is still a data manipulation problem so I
believe the heading is still OK.

##So now I read my population data from excel.
pop <- read.csv("pop.csv")

typeof(pop) ## yields a list where I have age-specific population rows
and a yearly column population, where the years are suffixed by X

c <- (1953:2008)
names(pop) <- c
c.div <- cut(c,break=seq(1950,2010,by=5)

Now I'd like to sum the agespecific population over the individual
levels of -c.div- and generate a new table for this with agespecific
rows and columns containing the 5-year bins instead of the original
yearly data. Do I have to program this from scratch or is it possible
to use an already existing function?


//M






qta <- table(cut(age,breaks = seq(0, 100, by = 10),include.lowest =
TRUE),cut(year,breaks=seq(1950,2010,by=5),include.lowest=TRUE))

On Mon, Apr 5, 2010 at 10:11 PM, moleps mole...@gmail.com wrote:

 Thx Erik,
 I have no idea what went wrong with the other code snippet, but this one 
 works.. Appreciate it.

 qta <- table(cut(age,breaks = seq(0, 100, by = 10),include.lowest =
 TRUE),cut(year,breaks=seq(1950,2010,by=5),include.lowest=TRUE))

 M


 On 5. apr. 2010, at 21.45, Erik Iverson wrote:

 I don't know what your data are like, since you haven't given a reproducible 
 example. I was imagining something like:

 ## generate fake data
 age <- sample(20:90, 100, replace = TRUE)
 year <- sample(1950:2000, 100, replace = TRUE)

 ## look at big table
 table(age, year)

 ## categorize data
 ## see include.lowest and right arguments to cut
 age.factor <- cut(age, breaks = seq(20, 90, by = 10),
                   include.lowest = TRUE)

 year.factor <- cut(year, breaks = seq(1950, 2000, by = 10),
                    include.lowest = TRUE)

 table(age.factor, year.factor)

 moleps wrote:
 I already did try the regression modeling approach. However, the
 epidemiologist (referee) turns out to be quite fond of comparing the
 incidence rates to different standard populations, hence the need for this
 laborious approach. And trying the cutting approach I ended up with:
 > table(age5)
 age5
    (0,5]   (5,10]  (10,15]  (15,20]  (20,25]  (25,30]  (30,35]  (35,40]  (40,45]
       35       34       33       47       51      109      157      231      362
  (45,50]  (50,55]  (55,60]  (60,65]  (65,70]  (70,75]  (75,80]  (80,85] (85,100]
      511      745      926     1002      866      547      247       82       18
 > table(yr5)
 yr5
 (1950,1955] (1955,1960] (1960,1965] (1965,1970] (1970,1975] (1975,1980]
           3           5           5           5           5           5
 (1980,1985] (1985,1990] (1990,1995] (1995,2000] (2000,2005] (2005,2009]
           5           5           5           5           5           3
 > table(yr5, age5)
 Error in table(yr5, age5) : all arguments must have the same length
 Sincerely,
 M
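
A possible reading of the error above, offered as an assumption since the original data were never posted: the yr5 counts (3, 5, ..., 5, 3) add up to 56, the number of calendar years from 1953 to 2008, while the age5 counts add up to 6003 cases. That pattern would arise if cut() had been applied to the sequence of years rather than to each case's year of diagnosis. The sketch below reproduces both situations with invented data.

```r
# Hypothetical patient-level data: one age and one diagnosis year per case
set.seed(42)  # only so the example is repeatable
n    <- 200
age  <- round(runif(n, 0, 100), 1)
year <- sample(1953:2008, n, replace = TRUE)

age5 <- cut(age,  breaks = seq(0, 100, by = 5),     include.lowest = TRUE)
yr5  <- cut(year, breaks = seq(1950, 2010, by = 5), include.lowest = TRUE)

# Both factors have one entry per case, so they can be cross-tabulated
tab <- table(age5, yr5)

# Cutting the sequence of calendar years instead gives a factor of length 56,
# which cannot be paired with the per-case age factor
yr5.wrong <- cut(1953:2008, breaks = seq(1950, 2010, by = 5))
res <- try(table(yr5.wrong, age5), silent = TRUE)
```

Checking length(yr5) against length(age5) before calling table() is a quick way to catch this kind of mismatch.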
 On 5. apr. 2010, at 20.59, Bert Gunter wrote:
 You have tempted, and being weak, I yield to temptation:

 Any good ideas?

 Yes. Don't do this.

 (what you probably really want to do is fit a model with age as a factor,
 which can be done statistically e.g. by logistic regression; or graphically
 using conditioning plots, e.g. via trellis graphics (the lattice package).
 This avoids the arbitrariness and discontinuities of binning by age range.)

 Bert Gunter
 Genentech Nonclinical Biostatistics

 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
 Behalf Of moleps
 Sent: Monday, April 05, 2010 11:46 AM
 To: r-help@r-project.org
 Subject: [R] Data manipulation problem

 Dear R'ers.

 I've got a dataset with age and year of diagnosis. In order to
 age-standardize the incidence I need to transform the data into a matrix
 with age-groups (divided in 5 or 10 years) along one axis and year divided
 into 5 years along the other axis. Each cell should contain the number of
 cases for that age group and for that period.
 I.e.
 My data format now is
 ID-age (to one decimal)-year(yearly data).

 What I'd like is

 age 1960-1965 1966-1970 etc...
 0-5 3 8 10 15
 6-10 2 5 8 13
 etc..


 Any good ideas?

 Regards,
 M






Re: [R] Data manipulation problem

2010-04-06 Thread David Winsemius


On Apr 6, 2010, at 9:56 AM, moleps islon wrote:


OK... next question.. Which is still a data manipulation problem so I
believe the heading is still OK.

##So now I read my population data from excel.


No, you read it from a text file and providing the first ten lines of  
that text file should have been really easy. Read the Posting Guide  
for advice about offering datasets either as structure() objects with  
dput or dump or as attached files with *.txt extension (not .csv).  
Just change the file name with your file browser.



pop <- read.csv("pop.csv")

typeof(pop) ## yields a list


Really? I would have guessed it to yield just "list".


where I have age-specific population rows
and a yearly column population, where the years are suffixed by X


And had you used class(pop) you would have learned it was a dataframe  
and even more informative would have been str(pop).
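
A small sketch of the difference between the three calls, using a made-up two-column data frame in place of the poster's file (the values are invented):

```r
# Small hypothetical stand-in for the data frame returned by read.csv()
pop <- data.frame(X1953 = c(100, 120), X1954 = c(105, 118))

typeof(pop)  # the storage type: "list"
class(pop)   # the class that methods dispatch on: "data.frame"
str(pop)     # compact overview of dimensions and column types
```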


c <- (1953:2008)


No, no, no. Do not use variable names that are important function  
names. The R interpreter can (usually) keep things straight but it is  
our brains that experience problems.  Other  function names to avoid:  
data, df, cut, mean, sd, list, vector, matrix



names(pop) <- c
c.div <- cut(c,break=seq(1950,2010,by=5)


(You should have gotten an error here.) After fixing the error, did  
you notice that there were only 3 of the first level???


Watch out for cut(). It uses the default convention of ( , ] , i.e.  
open interval at right which is backwards to what some (most?) of us  
think natural. Because of that the lowest level gets dropped unless  
you take special precautions.  That is undoubtedly why Harrell set up  
his Hmisc::cut2 to have the default be [ , )
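
To make the interval convention concrete: with the default right = TRUE, cut() builds intervals of the form (a, b], that is, open on the left and closed on the right. A short sketch with a handful of made-up years:

```r
x <- c(1950, 1953, 1955, 1956, 1960)
breaks <- c(1950, 1955, 1960)

# Default (right = TRUE): intervals are (a, b]; 1950 falls outside and becomes NA
f.default <- cut(x, breaks)

# include.lowest = TRUE closes the first interval: [1950,1955], (1955,1960]
f.lowest <- cut(x, breaks, include.lowest = TRUE)

# right = FALSE flips the convention to [a, b); now the top value 1960 is NA
f.left <- cut(x, breaks, right = FALSE)
```

Note how 1955 moves from the first bin to the second when right = FALSE is used, so the choice affects boundary values as well as the extremes.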


Aggregating across columns? Certainly possible, but maybe not as  
natural a fit to functions like split as would occur with working  
across rows. I suppose you could use something like this untested  
(because _still_ no sample dataset provided) code:


apply(pop, 1,   # this works a row at a time
      function(x) tapply(x, list(c.div), sum))  # or use aggregate, which uses tapply


I'm not sure it will work, since I don't know if the column names  
would get carried over into x by apply(). You might need to create a  
separate index that used the numeric positions of the columns rather  
than their names. Perhaps use c.div <- seq(0,(2008-1953)) %/% 5  or  
some such inside tapply.
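
A tested sketch of that idea on a small invented matrix (3 age groups by 10 years; the names `years` and `bin` are introduced here and are not from the thread). apply() also accepts a data frame, which it converts to a matrix first.

```r
# Hypothetical population table: 3 age groups (rows) by 10 years (columns)
years <- 1953:1962
pop <- matrix(1:30, nrow = 3,
              dimnames = list(c("0-9", "10-19", "20-29"), years))

# Position-based index: columns 1-5 form bin 0, columns 6-10 form bin 1
bin <- seq(0, length(years) - 1) %/% 5

# For each row, sum the values within each bin; apply() returns the
# bins as rows, so transpose to get age groups back in the rows
binned <- t(apply(pop, 1, function(x) tapply(x, bin, sum)))
binned
```

Because the index is built from column positions, it does not matter whether the column names survive the trip through apply().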




Now I'd like to sum the agespecific population over the individual
levels of -c.div- and generate a new table for this with agespecific
rows and columns containing the 5-year bins instead of the original
yearly data. Do I have to program this from scratch or is it possible
to use an already existing function?


I think you ought to read more introductory material (and the Posting  
Guide regarding how to offer example datasets). In this case there are  
many functions that do data aggregation and most of them should be  
illustrated in a good introductory text.


--
David.




Re: [R] Data manipulation problem

2010-04-06 Thread David Winsemius


On Apr 6, 2010, at 3:30 PM, David Winsemius wrote:



On Apr 6, 2010, at 9:56 AM, moleps islon wrote:


OK... next question.. Which is still a data manipulation problem so I
believe the heading is still OK.

##So now I read my population data from excel.


No, you read it from a text file and providing the first ten lines  
of that text file should have been really easy. Read the Posting  
Guide for advice about offering datasets either as structure()  
objects with dput or dump or as attached files with *.txt  
extension (not .csv). Just change the file name with your file  
browser.



pop <- read.csv("pop.csv")

typeof(pop) ## yields a list


Really? I would have guessed it to yield just "list".


where I have age-specific population rows
and a yearly column population, where the years are suffixed by X


And had you used class(pop) you would have learned it was a  
dataframe and even more informative would have been str(pop).
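
A small sketch of the difference between these calls, assuming a toy data frame shaped like the one described (age-specific rows, yearly columns prefixed by X). The column names and values here are made up for illustration:

```r
# Hypothetical stand-in for the population file (values invented)
pop <- data.frame(age   = c("0-4", "5-9"),
                  X1953 = c(10, 12),
                  X1954 = c(11, 13))

typeof(pop)  # storage mode: "list"
class(pop)   # what most functions dispatch on: "data.frame"
str(pop)     # compact view of every column and its type
```

typeof() reports only how the object is stored, which is why class() and str() tend to be more useful for checking what read.csv() returned.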


c <- (1953:2008)


No, no, no. Do not use variable names that are important function  
names. The R interpreter can (usually) keep things straight but it  
is our brains that experience problems.  Other  function names to  
avoid: data, df, cut, mean, sd, list, vector, matrix



names(pop) <- c
c.div <- cut(c,break=seq(1950,2010,by=5)


(You should have gotten an error here.) After fixing the error, did
you notice that there were only 3 of the first level???


Watch out for cut(). It uses the default convention of ( , ] , i.e.
open interval at left (er, not "right" as I first wrote),
which is backwards to what some (most?) of us think natural. Because
of that the lowest value gets dropped (it becomes NA) unless you take
special precautions such as include.lowest=TRUE.  That is undoubtedly
why Harrell set up his Hmisc::cut2 to have the default be [ , )
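
A minimal sketch of the three conventions, using base R's cut() only. The four test years are made up, and the object names are illustrative:

```r
x   <- c(1950, 1953, 1955, 1960)
brk <- seq(1950, 1960, by = 5)

# Default: intervals are ( , ], so a value equal to the first break is lost
default.cut <- cut(x, breaks = brk)

# include.lowest = TRUE closes the first interval: [1950,1955] (1955,1960]
lowest.cut <- cut(x, breaks = brk, include.lowest = TRUE)

# right = FALSE gives [ , ) intervals; now the top break is the one that is lost
left.cut <- cut(x, breaks = brk, right = FALSE)

data.frame(x, default.cut, lowest.cut, left.cut)
```

With right = FALSE, 1955 moves from the first bin to the second, so the choice matters for any case that falls exactly on a break.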


Aggregating across columns? Certainly possible, but maybe not as  
natural a fit to functions like split as would occur with working  
across rows. I suppose you could use something like this untested  
(because _still_ no sample dataset provided) code:


apply(pop, 1,   # this works a row at a time
      function(x) tapply(x, list(c.div), sum) )  # or use aggregate, which uses tapply


I'm not sure it will work, since I don't know if the column names
would get carried over into x by apply(). You might need to create
a separate index that used the numeric positions of the columns
rather than their names. Perhaps use c.div <- seq(0,(2008-1953)) %/% 5
or some such inside tapply.
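
A hedged sketch of that column-wise aggregation, assuming the population table is purely numeric with one column per year named X1953 ... X2008. The matrix below is filled with ones so the bin sums are easy to check; the object names are illustrative, not from the original data:

```r
years <- 1953:2008
# Hypothetical population table: 3 age groups x 56 yearly columns, all ones
pop <- matrix(1, nrow = 3, ncol = length(years),
              dimnames = list(c("0-4", "5-9", "10-14"), paste0("X", years)))

# Recover the year from the column name and bin it into 5-year periods
yr     <- as.integer(sub("^X", "", colnames(pop)))
yr.bin <- cut(yr, breaks = seq(1950, 2010, by = 5), right = FALSE)

# Row by row, sum the yearly values within each bin; transpose so that
# age groups stay in rows and the 5-year bins become columns
pop5 <- t(apply(pop, 1, function(x) tapply(x, yr.bin, sum)))
pop5
```

tapply() groups by the position of each value, not by its name, so it does not matter whether apply() carries the column names into x. If the real table has a non-numeric column (an age label, say), it would need to be dropped or moved to the row names first.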




Now I'd like to sum the agespecific population over the individual
levels of -c.div- and generate a new table for this with agespecific
rows and columns containing the 5-year bins instead of the original
yearly data. Do I have to program this from scratch or is it possible
to use an already existing function?


I think you ought to read more introductory material (and the  
Posting Guide regarding how to offer example datasets). In this case  
there are many functions that do data aggregation and most of them  
should be illustrated in a good introductory text.


--
David.



//M

qta <- table(cut(age,breaks = seq(0, 100, by = 10),include.lowest =
TRUE),cut(year,breaks=seq(1950,2010,by=5),include.lowest=TRUE

On Mon, Apr 5, 2010 at 10:11 PM, moleps mole...@gmail.com wrote:


Thx Erik,
I have no idea what went wrong with the other code snippet, but  
this one works.. Appreciate it.


qta <- table(cut(age,breaks = seq(0, 100, by = 10),include.lowest =
TRUE),cut(year,breaks=seq(1950,2010,by=5),include.lowest=TRUE))


M


On 5. apr. 2010, at 21.45, Erik Iverson wrote:

I don't know what your data are like, since you haven't given a  
reproducible example. I was imagining something like:


## generate fake data
age <- sample(20:90, 100, replace = TRUE)
year <- sample(1950:2000, 100, replace = TRUE)

##look at big table
table(age, year)

## categorize data
## see include.lowest and right arguments to cut
age.factor <- cut(age, breaks = seq(20, 90, by = 10),
                  include.lowest = TRUE)

year.factor <- cut(year, breaks = seq(1950, 2000, by = 10),
                   include.lowest = TRUE)

table(age.factor, year.factor)

moleps wrote:
I already tried the regression modeling approach. However, the
epidemiologist (referee) turns out to be quite fond of
comparing the incidence rates to different standard populations,
hence the need for this laborious approach. Trying the
cutting approach, I ended up with:

table (age5)

age5
   (0,5]   (5,10]  (10,15]  (15,20]  (20,25]  (25,30]  (30,35]  (35,40]  (40,45]
      35       34       33       47       51      109      157      231      362
 (45,50]  (50,55]  (55,60]  (60,65]  (65,70]  (70,75]  (75,80]  (80,85] (85,100]
     511      745      926     1002      866      547      247       82       18

table (yr5)

yr5
(1950,1955] (1955,1960] (1960,1965] (1965,1970] (1970,1975]  
(1975,1980] (1980,1985] (1985,1990] (1990,1995] (1995,2000]  
(2000,2005] (2005,2009]   3  

Re: [R] Data manipulation problem

2010-04-05 Thread Erik Iverson

?cut to create categories
?table to make the table

moleps wrote:

Dear R´ers.

I´ve got a dataset with age and year of diagnosis. In order to age-standardize the incidence I need to transform the data into a matrix with age-groups (divided in 5 or 10 years) along one axis and year divided into 5 years along the other axis. Each cell should contain the number of cases for that age group and for that period. 


I.e.
My data format now is
ID-age (to one decimal)-year(yearly data).

What I´d like is 



age 1960-1965 1966-1970 etc...
0-5 3 8 10 15
6-10 2 5 8 13
etc..


Any good ideas?

Regards,
M

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




Re: [R] Data manipulation problem

2010-04-05 Thread Bert Gunter
You have tempted, and being weak, I yield to temptation:

Any good ideas?

Yes. Don't do this.

(what you probably really want to do is fit a model with age as a factor,
which can be done statistically e.g. by logistic regression; or graphically
using conditioning plots, e.g. via trellis graphics (the lattice package).
This avoids the arbitrariness and discontinuities of binning by age range.)

Bert Gunter
Genentech Nonclinical Biostatistics
 
 -Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
Behalf Of moleps
Sent: Monday, April 05, 2010 11:46 AM
To: r-help@r-project.org
Subject: [R] Data manipulation problem

Dear R´ers.

I´ve got a dataset with age and year of diagnosis. In order to
age-standardize the incidence I need to transform the data into a matrix
with age-groups (divided in 5 or 10 years) along one axis and year divided
into 5 years along the other axis. Each cell should contain the number of
cases for that age group and for that period. 

I.e.
My data format now is
ID-age (to one decimal)-year(yearly data).

What I´d like is 


age 1960-1965 1966-1970 etc...
0-5 3 8 10 15
6-10 2 5 8 13
etc..


Any good ideas?

Regards,
M



Re: [R] Data manipulation problem

2010-04-05 Thread moleps
I already tried the regression modeling approach. However, the epidemiologist
(referee) turns out to be quite fond of comparing the incidence rates to
different standard populations, hence the need for this laborious approach.
Trying the cutting approach, I ended up with:

 table (age5)
age5
   (0,5]   (5,10]  (10,15]  (15,20]  (20,25]  (25,30]  (30,35]  (35,40]  (40,45]
      35       34       33       47       51      109      157      231      362
 (45,50]  (50,55]  (55,60]  (60,65]  (65,70]  (70,75]  (75,80]  (80,85] (85,100]
     511      745      926     1002      866      547      247       82       18
 table (yr5)
yr5
(1950,1955] (1955,1960] (1960,1965] (1965,1970] (1970,1975] (1975,1980]
          3           5           5           5           5           5
(1980,1985] (1985,1990] (1990,1995] (1995,2000] (2000,2005] (2005,2009]
          5           5           5           5           5           3
 table (yr5,age5)
Error in table(yr5, age5) : all arguments must have the same length
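
A sketch of one way this error can arise, offered as a guess rather than a diagnosis. table() requires one entry per case in every argument. The yr5 counts above sum to 56, which would fit a factor built from the 56 calendar years 1953 to 2008 rather than from each case's year of diagnosis. The data below are simulated and all object names are illustrative:

```r
set.seed(1)
n    <- 200
age  <- round(runif(n, 0.1, 100), 1)            # one age per case
year <- sample(1953:2008, n, replace = TRUE)    # one diagnosis year per case

age.brk <- c(seq(0, 85, by = 5), 100)
yr.brk  <- c(seq(1950, 2005, by = 5), 2009)

age5      <- cut(age, breaks = age.brk)
yr5.wrong <- cut(1953:2008, breaks = yr.brk)    # 56 values: the list of years
yr5       <- cut(year, breaks = yr.brk)         # n values: one per case

table(yr5.wrong)                                # 3 5 5 ... 5 3, as in the post
res <- try(table(yr5.wrong, age5), silent = TRUE)  # lengths differ: error
tab <- table(age5, yr5)                         # lengths match: 18 x 12 table
```

If both factors are cut from columns of the same data frame, their lengths agree by construction and the cross-tabulation goes through.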

Sincerely,
M





On 5. apr. 2010, at 20.59, Bert Gunter wrote:

 You have tempted, and being weak, I yield to temptation:
 
 Any good ideas?
 
 Yes. Don't do this.
 
 (what you probably really want to do is fit a model with age as a factor,
 which can be done statistically e.g. by logistic regression; or graphically
 using conditioning plots, e.g. via trellis graphics (the lattice package).
 This avoids the arbitrariness and discontinuities of binning by age range.)
 
 Bert Gunter
 Genentech Nonclinical Biostatistics
 
 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
 Behalf Of moleps
 Sent: Monday, April 05, 2010 11:46 AM
 To: r-help@r-project.org
 Subject: [R] Data manipulation problem
 
 Dear R´ers.
 
 I´ve got a dataset with age and year of diagnosis. In order to
 age-standardize the incidence I need to transform the data into a matrix
 with age-groups (divided in 5 or 10 years) along one axis and year divided
 into 5 years along the other axis. Each cell should contain the number of
 cases for that age group and for that period. 
 
 I.e.
 My data format now is
 ID-age (to one decimal)-year(yearly data).
 
 What I´d like is 
 
 
 age 1960-1965 1966-1970 etc...
 0-5 3 8 10 15
 6-10 2 5 8 13
 etc..
 
 
 Any good ideas?
 
 Regards,
 M
 


Re: [R] Data manipulation problem

2010-04-05 Thread Erik Iverson
I don't know what your data are like, since you haven't given a 
reproducible example. I was imagining something like:


## generate fake data
age <- sample(20:90, 100, replace = TRUE)
year <- sample(1950:2000, 100, replace = TRUE)

##look at big table
table(age, year)

## categorize data
## see include.lowest and right arguments to cut
age.factor <- cut(age, breaks = seq(20, 90, by = 10),
                  include.lowest = TRUE)

year.factor <- cut(year, breaks = seq(1950, 2000, by = 10),
                   include.lowest = TRUE)

table(age.factor, year.factor)

moleps wrote:
I already tried the regression modeling approach. However, the epidemiologist (referee) turns out to be quite fond of comparing the incidence rates to different standard populations, hence the need for this laborious approach.
Trying the cutting approach, I ended up with:

table (age5)

age5
   (0,5]   (5,10]  (10,15]  (15,20]  (20,25]  (25,30]  (30,35]  (35,40]  (40,45]
      35       34       33       47       51      109      157      231      362
 (45,50]  (50,55]  (55,60]  (60,65]  (65,70]  (70,75]  (75,80]  (80,85] (85,100]
     511      745      926     1002      866      547      247       82       18

table (yr5)

yr5
(1950,1955] (1955,1960] (1960,1965] (1965,1970] (1970,1975] (1975,1980]
          3           5           5           5           5           5
(1980,1985] (1985,1990] (1990,1995] (1995,2000] (2000,2005] (2005,2009]
          5           5           5           5           5           3

table (yr5,age5)

Error in table(yr5, age5) : all arguments must have the same length

Sincerely,
M





On 5. apr. 2010, at 20.59, Bert Gunter wrote:


You have tempted, and being weak, I yield to temptation:

Any good ideas?

Yes. Don't do this.

(what you probably really want to do is fit a model with age as a factor,
which can be done statistically e.g. by logistic regression; or graphically
using conditioning plots, e.g. via trellis graphics (the lattice package).
This avoids the arbitrariness and discontinuities of binning by age range.)

Bert Gunter
Genentech Nonclinical Biostatistics

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
Behalf Of moleps
Sent: Monday, April 05, 2010 11:46 AM
To: r-help@r-project.org
Subject: [R] Data manipulation problem

Dear R´ers.

I´ve got a dataset with age and year of diagnosis. In order to
age-standardize the incidence I need to transform the data into a matrix
with age-groups (divided in 5 or 10 years) along one axis and year divided
into 5 years along the other axis. Each cell should contain the number of
cases for that age group and for that period. 


I.e.
My data format now is
ID-age (to one decimal)-year(yearly data).

What I´d like is 



age 1960-1965 1966-1970 etc...
0-5 3 8 10 15
6-10 2 5 8 13
etc..


Any good ideas?

Regards,
M



Re: [R] Data manipulation problem

2010-04-05 Thread moleps

Thx Erik,
I have no idea what went wrong with the other code snippet, but this one 
works.. Appreciate it.

qta <- table(cut(age,breaks = seq(0, 100, by = 10),include.lowest =
TRUE),cut(year,breaks=seq(1950,2010,by=5),include.lowest=TRUE))

M


On 5. apr. 2010, at 21.45, Erik Iverson wrote:

 I don't know what your data are like, since you haven't given a reproducible 
 example. I was imagining something like:
 
 ## generate fake data
 age <- sample(20:90, 100, replace = TRUE)
 year <- sample(1950:2000, 100, replace = TRUE)
 
 ##look at big table
 table(age, year)
 
 ## categorize data
 ## see include.lowest and right arguments to cut
 age.factor <- cut(age, breaks = seq(20, 90, by = 10),
                   include.lowest = TRUE)
 
 year.factor <- cut(year, breaks = seq(1950, 2000, by = 10),
                    include.lowest = TRUE)
 
 table(age.factor, year.factor)
 
 moleps wrote:
 I already tried the regression modeling approach. However, the
 epidemiologist (referee) turns out to be quite fond of comparing the
 incidence rates to different standard populations, hence the need for this
 laborious approach. Trying the cutting approach, I ended up with:
 table (age5)
 age5
    (0,5]   (5,10]  (10,15]  (15,20]  (20,25]  (25,30]  (30,35]  (35,40]  (40,45]
       35       34       33       47       51      109      157      231      362
  (45,50]  (50,55]  (55,60]  (60,65]  (65,70]  (70,75]  (75,80]  (80,85] (85,100]
      511      745      926     1002      866      547      247       82       18
 table (yr5)
 yr5
 (1950,1955] (1955,1960] (1960,1965] (1965,1970] (1970,1975] (1975,1980]
           3           5           5           5           5           5
 (1980,1985] (1985,1990] (1990,1995] (1995,2000] (2000,2005] (2005,2009]
           5           5           5           5           5           3
 table (yr5,age5)
 Error in table(yr5, age5) : all arguments must have the same length
 Sincerely,
 M
 On 5. apr. 2010, at 20.59, Bert Gunter wrote:
 You have tempted, and being weak, I yield to temptation:
 
 Any good ideas?
 
 Yes. Don't do this.
 
 (what you probably really want to do is fit a model with age as a factor,
 which can be done statistically e.g. by logistic regression; or graphically
 using conditioning plots, e.g. via trellis graphics (the lattice package).
 This avoids the arbitrariness and discontinuities of binning by age range.)
 
 Bert Gunter
 Genentech Nonclinical Biostatistics
 
 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
 Behalf Of moleps
 Sent: Monday, April 05, 2010 11:46 AM
 To: r-help@r-project.org
 Subject: [R] Data manipulation problem
 
 Dear R´ers.
 
 I´ve got a dataset with age and year of diagnosis. In order to
 age-standardize the incidence I need to transform the data into a matrix
 with age-groups (divided in 5 or 10 years) along one axis and year divided
 into 5 years along the other axis. Each cell should contain the number of
 cases for that age group and for that period. 
 I.e.
 My data format now is
 ID-age (to one decimal)-year(yearly data).
 
 What I´d like is 
 
 age 1960-1965 1966-1970 etc...
 0-5 3 8 10 15
 6-10 2 5 8 13
 etc..
 
 
 Any good ideas?
 
 Regards,
 M
 


Re: [R] Data Manipulation

2010-01-26 Thread Peter Rote

I am still struggling with this:

 error message:

> by(AlexETF,AlexETF$Industry,function(a) {filename = paste("C:/ab/",gsub(" ","",a$Industry[1]),".txt",sep="")
+ print(filename)
+ write.table(a[,3,drop=FALSE],quote=FALSE,col.names=FALSE,row.names=FALSE)
+ }
+  )

[1] "C:/ab/AccidentHealthInsurance.txt"
Error in `[.data.frame`(a, , 3, drop = FALSE) :
  undefined columns selected

Best,
Peter 
-- 
View this message in context: 
http://n4.nabble.com/Data-Manipulation-tp1018249p1290191.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Data Manipulation

2010-01-26 Thread Peter Dalgaard
Peter Rote wrote:
 I am still struggling with this:
 
  error message:
 
 > by(AlexETF,AlexETF$Industry,function(a) {filename = paste("C:/ab/",gsub(" ","",a$Industry[1]),".txt",sep="")
 + print(filename)
 + write.table(a[,3,drop=FALSE],quote=FALSE,col.names=FALSE,row.names=FALSE)
 + }
 +  )
 
 [1] "C:/ab/AccidentHealthInsurance.txt"
 Error in `[.data.frame`(a, , 3, drop = FALSE) :
   undefined columns selected

The message says that you haven't got three columns in a, so try
inserting print(dim(a)). Perhaps what you showed earlier was rownames
plus two columns?
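
A minimal sketch of that check, assuming a two-column data frame like the one printed earlier in the thread. The column names Industry and Ticker are taken from the code in the posts, and the rows are made up:

```r
AlexETF <- data.frame(
  Industry = c("Advertising Agencies", "Advertising Agencies", "Aluminum"),
  Ticker   = c("CMM", "FMCN", "AA"),
  stringsAsFactors = FALSE
)

# One piece per industry, as by() would hand to the function
pieces <- split(AlexETF, AlexETF$Industry)
a <- pieces[["Advertising Agencies"]]

dim(a)   # 2 rows, 2 columns: there is no third column to select

res  <- try(a[, 3, drop = FALSE], silent = TRUE)  # "undefined columns selected"
tick <- a[, "Ticker", drop = FALSE]               # select by name instead
```

Selecting by name rather than by position avoids miscounting when the printed row names look like an extra column.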

-- 
   O__   Peter Dalgaard Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark  Ph:  (+45) 35327918
~~ - (p.dalga...@biostat.ku.dk)  FAX: (+45) 35327907



Re: [R] Data Manipulation

2010-01-26 Thread Jim Lemon

On 01/26/2010 09:15 PM, Peter Rote wrote:


I am still struggling with this:

  error message:


> by(AlexETF,AlexETF$Industry,function(a) {filename = paste("C:/ab/",gsub(" ","",a$Industry[1]),".txt",sep="")
+ print(filename)
+ write.table(a[,3,drop=FALSE],quote=FALSE,col.names=FALSE,row.names=FALSE)
+ }
+  )

[1] "C:/ab/AccidentHealthInsurance.txt"
Error in `[.data.frame`(a, , 3, drop = FALSE) :
   undefined columns selected


Hi Peter,
I would suggest that you print the first say 10 rows of a and see if 
it has three columns.


Jim



Re: [R] Data Manipulation

2010-01-22 Thread Don MacQueen

Does this example help?



> a <- matrix(letters[1:12], ncol=3)
> a

     [,1] [,2] [,3]
[1,] "a"  "e"  "i"
[2,] "b"  "f"  "j"
[3,] "c"  "g"  "k"
[4,] "d"  "h"  "l"


> write.table(a[,3,drop=FALSE],quote=FALSE,col.names=FALSE,row.names=FALSE)

i
j
k
l
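
Note that write.table() without a file argument prints to the console, as in the example above. A hedged sketch of writing one ticker-only file per industry follows. It assumes columns named Industry and Ticker, uses writeLines() in place of write.table(), and writes to a temporary directory instead of C:/ab/; the data rows are taken from the listing posted in the thread:

```r
AlexETF <- data.frame(
  Industry = c(rep("Advertising Agencies", 4), "Aluminum"),
  Ticker   = c("CMM", "FMCN", "IPG", "MWW", "AA"),
  stringsAsFactors = FALSE
)

out.dir <- file.path(tempdir(), "ab")      # stand-in for C:/ab/
dir.create(out.dir, showWarnings = FALSE)

for (ind in unique(AlexETF$Industry)) {
  # keep only letters and digits so the name is safe on any file system
  fn <- file.path(out.dir, paste0(gsub("[^A-Za-z0-9]", "", ind), ".txt"))
  # one ticker per line, no quotes, no row or column names
  writeLines(AlexETF$Ticker[AlexETF$Industry == ind], fn)
}

readLines(file.path(out.dir, "AdvertisingAgencies.txt"))
```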



At 4:11 PM -0800 1/21/10, Peter Rote wrote:

Thank you Dieter and Rolf,

I have solved the slash problem, but I am still struggling with the output
files.

I have tried this
 by(AlexETF,AlexETF$Industry,function(a) {filename = paste("C:/ab/",gsub(" ","",a$Industry[1]),".txt",sep="")
print(filename)
	write.table(a,file=filename,col.names = FALSE)
}

 )

and this

 by(AlexETF,AlexETF$Industry,function(a) {filename = paste("C:/ab/",gsub(" ","",a$Industry[1]),".txt",sep="")
print(filename)
	write(as.character(a),file=filename)
}
 )



In each file I want just the ticker, without any quotation marks.

CMM
FMCN
IPG
MWW

Thanks in advance,
Peter

--
View this message in context: 
http://n4.nabble.com/Data-Manipulation-tp1018249p1073567.html

Sent from the R help mailing list archive at Nabble.com.




--
--
Don MacQueen
Environmental Protection Department
Lawrence Livermore National Laboratory
Livermore, CA, USA
925-423-1062



Re: [R] Data Manipulation

2010-01-22 Thread Peter Rote

Thank you Don for the code,

but I get the following error message:

> by(AlexETF,AlexETF$Industry,function(a) {filename = paste("C:/ab/",gsub(" ","",a$Industry[1]),".txt",sep="")
+ print(filename)
+ write.table(a[,3,drop=FALSE],quote=FALSE,col.names=FALSE,row.names=FALSE)
+ }
+  )

[1] "C:/ab/AccidentHealthInsurance.txt"
Error in `[.data.frame`(a, , 3, drop = FALSE) : 
  undefined columns selected

Best,
Peter
-- 
View this message in context: 
http://n4.nabble.com/Data-Manipulation-tp1018249p1100168.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Data Manipulation

2010-01-21 Thread Peter Rote

Thank you Dieter and Rolf,

I have solved the slash problem, but I am still struggling with the output
files.

I have tried this
 by(AlexETF,AlexETF$Industry,function(a) {filename = paste("C:/ab/",gsub(" ","",a$Industry[1]),".txt",sep="")
print(filename)
write.table(a,file=filename,col.names = FALSE)
}
 )

and this

 by(AlexETF,AlexETF$Industry,function(a) {filename = paste("C:/ab/",gsub(" ","",a$Industry[1]),".txt",sep="")
print(filename)
write(as.character(a),file=filename)
}
 )


In each file I want just the ticker, without any quotation marks.

CMM
FMCN
IPG
MWW 

Thanks in advance, 
Peter
 
-- 
View this message in context: 
http://n4.nabble.com/Data-Manipulation-tp1018249p1073567.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Data Manipulation

2010-01-20 Thread Dieter Menne


Peter Rote wrote:
 
 I would like to group the Ticker by Industry, create file names from
 the Industry factor, and export to a txt file.
 
 I have tried the following 
 
 ind=finvizAllexETF$Industry
 
 ind is then  Aluminum  Business Services Regional Airlines
 
 ind2=gsub(" ","",ind)
  ind3
 [1] Aluminum BusinessServices RegionalAirlines
 
 for (i in 1:3) ind3[i] <- AllexETF$Ticker[AllexETF$Industry==ind2[i]]
 
 Warning messages:
 1: In ind3[i] <- finvizAllexETF$Ticker[AllexETF$Industry == ind2[i]] :
   number of items to replace is not a multiple of replacement length
 
 

If this happens, try running

finvizAllexETF$Ticker[AllexETF$Industry == ind2[i]] 

on its own. You will note that it returns not one but many items, so assigning
it to ind3[i] will fail. Sometimes it helps to add a [1] at the end, but there
is another problem: these are factors and you want strings.

The example below shows one method:

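set.seed(4711)
# made-up data: 10 rows with a random industry and price
AlexETF = 
 data.frame(Industry=sample(c("Business Services", "Aluminium","Regional Airlines"),10,TRUE),Price = rnorm(10,10))
# one call per industry; a is the subset of rows for that industry
by(AlexETF,AlexETF$Industry,function(a) {
 filename = paste(gsub(" ","",a$Industry[1]),".txt",sep="")
 print(filename)
 write.table(a,file=filename)
   }
)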
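
The factor-versus-string point can be seen in a small sketch. The tickers and industries are made up, and split() is shown only as one possible way to collect all tickers per industry, not as the method used above:

```r
tick     <- factor(c("AA", "AAC", "AAI"))
industry <- c("Aluminum", "Business Services", "Aluminum")

hits <- tick[industry == "Aluminum"]  # two items, so it cannot fill one slot

as.integer(hits)     # the underlying factor codes: 1 3
as.character(hits)   # the labels you actually want: "AA" "AAI"

# All tickers per industry as plain character vectors
by.ind <- split(as.character(tick), industry)
by.ind
```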

 
Dieter

 





-- 
View this message in context: 
http://n4.nabble.com/Data-Manipulation-tp1018249p1018269.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Data Manipulation

2010-01-20 Thread Peter Rote

Thank you Dieter, 

but I still have a problem writing to file. The problem is the slash in
file names (Aerospace/Defense Products  Services ). If I want to write to C:/ab/,
then C:/ab/AdvertisingAgencies.txt is OK but
C:/ab/Aerospace/Defense-MajorDiversified.txt is not.

> head(AlexETF)
                      AlexETF.Industry AlexETF.Ticker
1    Scientific  Technical Instruments              A
2                             Aluminum             AA
3                    Business Services            AAC
4                      Credit Services           AACC
5                    Regional Airlines            AAI
6 Aerospace/Defense Products  Services           AAII

> by(AlexETF,AlexETF$Industry,function(a) {filename = paste(gsub(" ","",a$Industry[1]),".txt",sep="")
+  print(filename)
+  
+}
+ ) 
[1] "AccidentHealthInsurance.txt"
[1] "AdvertisingAgencies.txt"
[1] "Aerospace/Defense-MajorDiversified.txt"
[1] "Aerospace/DefenseProductsServices.txt"
[1] "AgriculturalChemicals.txt"
[1] "AirDeliveryFreightServices.txt"



> by(AlexETF,AlexETF$Industry,function(a) {filename = paste("C:/ab/",gsub(" ","",a$Industry[1]),".txt",sep="")
+  write.table(a,file=filename,col.names = FALSE) 
+}
+ ) 
Error in file(file, ifelse(append, "a", "w")) : 
  cannot open the connection
In addition: Warning message:
In file(file, ifelse(append, "a", "w")) :
  cannot open file 'C:/ab/Aerospace/Defense-MajorDiversified.txt': No such file or directory



Thanks in advance,

Peter 

-- 
View this message in context: 
http://n4.nabble.com/Data-Manipulation-tp1018249p1032029.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Data Manipulation

2010-01-20 Thread Peter Rote

by the way how do i change the output

1016 Advertising Agencies CMM
1803 Advertising Agencies FMCN
2427 Advertising Agencies IPG
3093 Advertising Agencies MWW
3372 Advertising Agencies OMC
4809 Advertising Agencies VCLK
4832 Advertising Agencies VISN
5005 Advertising Agencies WPPGY
5089 Advertising Agencies XSEL

to just

CMM
FMCN
IPG
MWW

Peter
-- 
View this message in context: 
http://n4.nabble.com/Data-Manipulation-tp1018249p1032753.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Data Manipulation

2010-01-20 Thread Rolf Turner


A name such as ``Aerospace/Defense etc.'' is certainly not a legal
file name under unix-alike systems, and I suspect it would not
be even under Windoze.  Even if it is, you shouldn't use it!

Change the name to ``Aerospace-Defense Products  Services''
or something like that, for goodness sake.
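
A minimal sketch of that renaming, assuming the file name is built the same way as in the quoted code. The two industry names are examples, and the object names safe and filename are illustrative:

```r
industry <- c("Advertising Agencies", "Aerospace/Defense Products Services")

# Replace the slash first, then drop the spaces as the original code does
safe <- gsub(" ", "", gsub("/", "-", industry, fixed = TRUE), fixed = TRUE)

filename <- paste("C:/ab/", safe, ".txt", sep = "")
filename
```

Because the slash is gone before the path is pasted together, the only directory separators left are the ones in C:/ab/ itself.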

cheers,

Rolf Turner


On 21/01/2010, at 2:19 PM, Peter Rote wrote:



Thank you Dieter,

but I still have a problem writing to file. The problem is the slash in
file names (Aerospace/Defense Products  Services ). If I want to write to C:/ab/,
then C:/ab/AdvertisingAgencies.txt is OK but
C:/ab/Aerospace/Defense-MajorDiversified.txt is not.

> head(AlexETF)
                      AlexETF.Industry AlexETF.Ticker
1    Scientific  Technical Instruments              A
2                             Aluminum             AA
3                    Business Services            AAC
4                      Credit Services           AACC
5                    Regional Airlines            AAI
6 Aerospace/Defense Products  Services           AAII

> by(AlexETF,AlexETF$Industry,function(a) {filename = paste(gsub(" ","",a$Industry[1]),".txt",sep="")
+  print(filename)
+
+}
+ )
[1] "AccidentHealthInsurance.txt"
[1] "AdvertisingAgencies.txt"
[1] "Aerospace/Defense-MajorDiversified.txt"
[1] "Aerospace/DefenseProductsServices.txt"
[1] "AgriculturalChemicals.txt"
[1] "AirDeliveryFreightServices.txt"

> by(AlexETF,AlexETF$Industry,function(a) {filename = paste("C:/ab/",gsub(" ","",a$Industry[1]),".txt",sep="")
+  write.table(a,file=filename,col.names = FALSE)
+}
+ )
Error in file(file, ifelse(append, "a", "w")) :
  cannot open the connection
In addition: Warning message:
In file(file, ifelse(append, "a", "w")) :
  cannot open file 'C:/ab/Aerospace/Defense-MajorDiversified.txt': No such file or directory



Thanks in advance,

Peter

--
View this message in context: http://n4.nabble.com/Data-Manipulation-tp1018249p1032029.html

Sent from the R help mailing list archive at Nabble.com.






Re: [R] Data Manipulation

2010-01-20 Thread Dieter Menne


Peter Rote wrote:
 
 
 but i  still have a problem to write to file. The problem is the slash in
 file names (Aerospace/Defense Products  Services ). If i want  it to
 C:/ab/
 so C:/ab/AdvertisingAgencies.txt is ok but
 C:/ab/Aerospace/Defense-MajorDiversified.txt is not
 
 

As Rolf said, the slash is not legal in a file name; it is treated as a
directory separator, like a backslash (\\) when run under Windows. Use
dir.create to create the Aerospace directory, or change the slash to
something else.
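
A hedged sketch of the first option, using base R's dir.create(). A temporary directory stands in for C:/ab/, and the one-row data frame is made up:

```r
# The slash in the industry name becomes a subdirectory "Aerospace"
fn <- file.path(tempdir(), "ab", "Aerospace", "Defense-MajorDiversified.txt")

# Create every missing directory on the path before opening the file
dir.create(dirname(fn), recursive = TRUE, showWarnings = FALSE)

write.table(data.frame(Ticker = "AAII"), file = fn,
            quote = FALSE, row.names = FALSE, col.names = FALSE)

readLines(fn)
```

This keeps the slash but scatters the output over subdirectories, so replacing the slash is usually the simpler choice when all files should sit in one folder.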

Dieter


-- 
View this message in context: 
http://n4.nabble.com/Data-Manipulation-tp1018249p1049554.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] data manipulation/subsetting and relation matrix

2009-12-08 Thread jim holtman
try this:

myDat <- read.table(textConnection("group id
1 101
1 201
1 301
2 401
2 501
2 601
3 701
3 801
3 901"),header=TRUE)
closeAllConnections()
corr_mat <- as.matrix(read.table(textConnection("1 1   .5  0   0   0   0   0   0   0
2 .5  1   0   0   0   0   0   0   0
3 0   0   1.0 0   0   0   0   0   0
4 0   0   0   1   .5  .5  0   0   0
5 0   0   0   .5  1   .5  0   0   0
6 0   0   0   .5  .5  1   0   0   0
7 0   0   0   0   0   0   1   0   0
8 0   0   0   0   0   0   0   1   .5
9 0   0   0   0   0   0   0   .5  1"),header=FALSE))
closeAllConnections()
corr_mat <- corr_mat[,-1]
colnames(corr_mat) <- myDat$id
rownames(corr_mat) <- myDat$id
# split out the groups
groups <- split(as.character(myDat$id), myDat$group)
# process each subgroup
result <- lapply(groups, function(.grp){
    subgroup <- corr_mat[.grp, .grp]
    output <- NULL
    # zero the diag
    diag(subgroup) <- 0
    same <- apply(subgroup, 1, function(x) any(x != 0))
    if (any(same)){  # some match, choose one
        output <- sample(same[same], 1)
    }
    if (any(!same)){  # get all that don't correlate
        output <- c(output, same[!same])
    }
    output
})
# output as matrix
do.call(rbind, lapply(names(result), function(x) cbind(x,
    names(result[[x]]))))



On Mon, Dec 7, 2009 at 7:38 PM, Juliet Hannah <juliet.han...@gmail.com> wrote:

> Hi List,
>
> Here is some example data.
>
> myDat <- read.table(textConnection("group id
> 1 101
> 1 201
> 1 301
> 2 401
> 2 501
> 2 601
> 3 701
> 3 801
> 3 901"),header=TRUE)
> closeAllConnections()
>
> corr_mat <- read.table(textConnection("1 1 .5 0 0 0 0 0 0 0
> 2 .5 1 0 0 0 0 0 0 0
> 3 0 0 1.0 0 0 0 0 0 0
> 4 0 0 0 1 .5 .5 0 0 0
> 5 0 0 0 .5 1 .5 0 0 0
> 6 0 0 0 .5 .5 1 0 0 0
> 7 0 0 0 0 0 0 1 0 0
> 8 0 0 0 0 0 0 0 1 .5
> 9 0 0 0 0 0 0 0 .5 1"),header=FALSE)
> closeAllConnections()
>
> corr_mat <- corr_mat[,-1]
> colnames(corr_mat) <- myDat$id
> rownames(corr_mat) <- myDat$id
>
> I need to subset this data such that observations within a group are not
> related, which is indicated by a 0 in corr_mat.
>
> For example, within group 1, 101 and 201 are related, so one of these
> has to be selected, say
> 101. 301 is not related to 101 or 201, so the final set for group 1
> consists of 101 and 301. There will always be at least 2 members in
> each group. I need to carry out this task on all groups.
>
> One possible final data set looks like:
>
>   group  id
> 1     1 101
> 3     1 301
> 4     2 401
> 7     3 701
> 8     3 801
>
> Any suggestions? Thanks!
>
> Juliet





-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?



Re: [R] Data Manipulation Question

2009-12-04 Thread Barry Rowlingson
On Thu, Dec 3, 2009 at 9:52 PM, John Filben <johnfil...@yahoo.com> wrote:
> Can R support data manipulation programming that is available in the SAS
> datastep?  Specifically, can R support the following:
> -  Read multiple datasets one record at a time and compare values from
>    each; then, based on if-then logic, write to multiple output files
> -  Load a lookup table and then process a different file; based on
>    if-then logic, access and look up values in the table
> -  Support modular "gosub" programming
> -  Sort files
> -  Date math and conversions
> -  Would it be able to support the following type of logic:
>    o  Start
>       -  Read Record from File 1
>       -  Read Record from File 2
>       -  Match
>          .  If Key 1 < Key 2 and Key 1 <> Key 2, Write to output file A
>          .  If Key 1 = Key 2, Write to output file B
>          .  If Key 1 > Key 2 and Key 1 <> Key 2, Write to output file C
>       -  Goto Start until File 1 Done
>   John Filben

I'll expand on Hadley Wickham's "Yes" to say "Yes, and it wouldn't be
much of a 'system for statistical computation and graphics' if it
couldn't do that."

Remember R uses the 'S' and C programming languages and is Open
Source. If it _can't_ do something you want it to do, you can write
code that does it. Like the date math and conversions. Originally,
maybe way back in R version 0.something, it didn't have that. But
someone wrote it, and wisely contributed it, and the community saw
that it was good. And now we have date math and conversions. And
nobody has to write any date math or conversion code ever again.

  Now tell me how to get something into the SAS core code.

Barry

P.S. I see a very obvious optimisation you can do on this line:

  If Key 1 < Key 2 and Key 1 <> Key 2, Write to output file A

but maybe that's some kind of weird SASism
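
The record-matching logic in the question can be sketched in base R without
a record-at-a-time loop. This assumes the intent is the usual match-merge
(keys only in file 1 go to A, matching keys to B, keys only in file 2 to C)
and that keys are unique within each file. The data frames and output file
names below are invented for illustration:

```r
# Hypothetical example data: two tables that share a key column
file1 <- data.frame(key = c(1, 2, 4), x = c("a", "b", "c"))
file2 <- data.frame(key = c(2, 3, 4), y = c("p", "q", "r"))

# Keys found only in file 1, in both files, and only in file 2
only_1 <- file1[!(file1$key %in% file2$key), ]
both   <- merge(file1, file2, by = "key")
only_2 <- file2[!(file2$key %in% file1$key), ]

# Write each piece to its own output file (temporary directory for illustration)
out_dir <- tempdir()
write.csv(only_1, file.path(out_dir, "A.csv"), row.names = FALSE)
write.csv(both,   file.path(out_dir, "B.csv"), row.names = FALSE)
write.csv(only_2, file.path(out_dir, "C.csv"), row.names = FALSE)
```

Working on whole columns at once is the usual R idiom here; a loop over
records would also work but is rarely needed.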



Re: [R] Data Manipulation Question

2009-12-04 Thread Gray Calhoun
This is probably far more discussion than the question warranted, but...

On Thu, Dec 3, 2009 at 11:14 PM, David Winsemius <dwinsem...@comcast.net> wrote:
>
> On Dec 3, 2009, at 10:52 PM, Gray Calhoun wrote:
>
> > The data import/export manual can elaborate on a lot of these; this is
> > all straightforward, although many people would prefer to use a
> > relational database for some of the things you mentioned.
>
> See Wickham's pithy response to this.

Sure.  My (indirect) point is that representing query results as
separate files is usually not the right approach, regardless of
which statistical language/package one uses.


> > I'm not
> > aware of a goto command in R, though (although I could be wrong).
>
> In fairness to the OP, he did not ask if there were a go-to construct, but
> rather whether there were a gosub construct that supported modular
> programming. My response would have been that calling modular functions
> (i.e., subroutines with defined arguments) is fundamental to R and the key
> to understanding how to use it with grace and efficiency. I would say that
> the concept of functional programming is to a much greater extent supported
> by R than by SAS, whose datastep mechanisms (as I remember them from an earlier
> incarnation) in no way supported modular programming. I suspect that S and R
> arose precisely because of the mental straitjackets imposed by SAS.

From the original: "Goto Start until File 1 Done".  But, yes, probably
unfair and certainly less informative than your response.
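
The points about modular functions, lookup tables, date math and sorting can
be illustrated with a short base R sketch. Everything in it (the lookup
table, the column names, the add_status helper) is invented for illustration
and is not taken from the original question:

```r
# Hypothetical lookup table and data, invented for illustration
lookup <- c(A = "Approved", D = "Denied")
records <- data.frame(
  code   = c("D", "A", "A"),
  opened = as.Date(c("2009-12-03", "2009-11-01", "2009-12-01")),
  stringsAsFactors = FALSE
)

# A function plays the role of a "gosub" routine: defined once, called anywhere
add_status <- function(df, lut) {
  df$status <- unname(lut[df$code])   # look up each code in the named vector
  df
}

records <- add_status(records, lookup)

# Date math: days elapsed up to a reference date
records$age_days <- as.numeric(as.Date("2009-12-04") - records$opened)

# Sorting: order rows by date
records <- records[order(records$opened), ]
```

Unlike a gosub, the function takes its inputs as arguments and returns a
value, so it does not depend on or change anything outside itself.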


> --
> David.
>
>
> > --Gray


> David Winsemius, MD
> Heritage Laboratories
> West Hartford, CT





Re: [R] data manipulation

2009-12-03 Thread jim holtman
try this:

x <- c('v2FfaPre15','v2FfaPre10','v2FfaPre5','v2Ffa2',
       'v2Ffa3','v2Ffa4')
sub("^.*?([0-9]+)$", "\\1", x, perl=TRUE)
[1] "15" "10" "5"  "2"  "3"  "4"



On Thu, Dec 3, 2009 at 9:00 AM, oscar linares <wins...@gmail.com> wrote:
> Dear Wiza[R]ds,
>
> I have a data.frame header that looks like this:
>
> v2FfaPre15    v2FfaPre10    v2FfaPre5    v2Ffa2    v2Ffa3    v2Ffa4
>
> I need it to look like this,
>
> 15    10    5    2    3     4
>
> i.e., with v2FfaPre and  v2Ffa stripped off
>
> Any suggestions,
>
> Thanks in advance!
>
> --
> Oscar
> Oscar A. Linares, MD
> Translational Medicine Unit
> LaPlaisance Bay, Bolles Harbor
> Monroe, Michigan 48161
>
> Department of Medicine,
> University of Toledo College of Medicine
> Toledo, OH 43606-3390
>
> Department of Internal Medicine,
> The Detroit Medical Center (DMC)
> Harper University Hospital
> Wayne State University School of Medicine
> Detroit, Michigan 48201




-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?



Re: [R] data manipulation

2009-12-03 Thread Henrique Dallazuanna
Try this:

gsub(".*[^0-9]", "", header)






-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O



Re: [R] data manipulation

2009-12-03 Thread Gabor Grothendieck
Try this where [0-9]+ matches one or more digits and $ matches the end of
string.  See http://gsubfn.googlecode.com for more.

library(gsubfn)
x <- c("v2FfaPre15", "v2FfaPre10", "v2FfaPre5", "v2Ffa2", "v2Ffa3",
       "v2Ffa4")

strapply(x, "[0-9]+$", c, simplify = TRUE)


# or if you want a numeric result:
strapply(x, "[0-9]+$", as.numeric, simplify = TRUE)
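
If installing a package is not an option, the same pattern can be used with
base R alone. This is a sketch rather than part of the gsubfn approach; the
data frame dat is a made-up stand-in for the real one:

```r
x <- c("v2FfaPre15", "v2FfaPre10", "v2FfaPre5", "v2Ffa2", "v2Ffa3", "v2Ffa4")

# regexpr() locates the trailing run of digits; regmatches() extracts it
digits <- regmatches(x, regexpr("[0-9]+$", x))
digits
# [1] "15" "10" "5"  "2"  "3"  "4"

# Hypothetical one-row data frame whose columns carry the original names
dat <- as.data.frame(matrix(0, nrow = 1, ncol = length(x),
                            dimnames = list(NULL, x)))
names(dat) <- digits
```

Note that regmatches() silently drops elements with no match, so this only
lines up with the columns when every name ends in digits, as it does here.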

