Re: [R] find unique and summerize

2018-02-04 Thread Val
Thank you so much Rui!

On Sun, Feb 4, 2018 at 12:20 AM, Rui Barradas  wrote:

> Hello,
>
> Please always cc the list.
>
> As for the question, I believe the following does it.
>
> a <- strsplit(mydata$ID, "[[:alpha:]]+")
> b <- strsplit(mydata$ID, "[[:digit:]]+")
>
> a <- sapply(a, `[`, 1)
> c <- sapply(a, `[`, 2)
> b <- sapply(b, function(x) x[x != ""])
>
> c2 <- sprintf("%010d", as.integer(c))
>
> newID <- paste0(a, b, c2)
>
>
> Hope this helps,
>
> Rui Barradas
>
> On 2/4/2018 2:01 AM, Val wrote:
>
>> Thank you so much again for your help!
>>
>> I have one more question related to this.
>>
>> 1. How do I further split  this "358USA1540165 " into three parts.
>> a) 358
>> b) USA
>> c) 1540165
>>
>> I want to add leading zeros to the third part  like "0001540165"
>> and then combine   b and c  to get this USA1540165
>> so USA1540165  changed to USA1540165
>>
>> The other one is that the data set has several country codes and if I
>> want to limit my data set to only certain country codes , how do I do that.
>>
>> Thank you again
>>
>>
>>
>>
>> On Sat, Feb 3, 2018 at 1:05 PM, Rui Barradas > > wrote:
>>
>> Hello,
>>
>> As for the first question, instead of writing a xlsx file, maybe it
>> is easier to write a csv file and then open it with Excel.
>>
>> tbl2 <- addmargins(tbl1)
>> write.csv(tbl2, "tt1.csv")
>>
>> As for the second question, the following does it.
>>
>> inx <- apply(tbl1, 1, function(x) all(x != 0))
>> tbl1b <- addmargins(tbl1[inx, ])
>> tbl1b
>>
>>
>> Hope this helps,
>>
>> Rui Barradas
>>
>> On 2/3/2018 4:42 PM, Val wrote:
>>
>> Thank you so much Rui.
>>
>> 1. How do I export this table to excel file?
>> I used this
>> tbl1 <- table(Country, IDNum)
>> tbl2=addmargins(tbl1)
>> write.xlsx(tbl2,"tt1.xlsx"),sheetName="summary",
>> row.names=FALSE)
>> The above did not give me that table.
>>
>>
>> 2. I want select those unique Ids that do have records in all
>> countries.
>>From the above data set, this ID  "FIN1540166"  should be
>> excluded from the summary table and the table looks like as follow
>>
>> IDNum Country 1 33 358 44 Sum CAN1540164 47 141 248 90 526
>> USA1540165 290 757 321 171 1539 Sum 337 898 569 261 2065
>>
>> Thank you again
>>
>>
>> On Fri, Feb 2, 2018 at 11:26 PM, Rui Barradas
>> mailto:ruipbarra...@sapo.pt>
>> >>
>> wrote:
>>
>>  Hello,
>>
>>  Thanks for the reproducible example.
>>  See if the following does what you want.
>>
>>  IDNum <- sub("^(\\d+).*", "\\1", mydata$ID)
>>  Country <- sub("^\\d+(.*)", "\\1", mydata$ID)
>>
>>  tbl1 <- table(Country, IDNum)
>>  addmargins(tbl1)
>>
>>  tbl2 <- xtabs(Y ~ Country + IDNum, mydata)
>>  addmargins(tbl2)
>>
>>
>>  Hope this helps,
>>
>>  Rui Barradas
>>
>>
>>  On 2/3/2018 3:00 AM, Val wrote:
>>
>>  Hi all,
>>
>>  I have a data set  need to be summarized by unique ID
>> (count and
>>  sum of a
>>  variable)
>>  A unique individual ID (country name  Abbreviation
>>followed by
>>  an integer
>>  numbers)  may  have observation in several countries.
>> Then the ID was
>>  changed by adding the country code as a prefix  and
>>new ID was
>>  constructed
>>  or recorded like (country code, + the original unique
>> ID  Example
>>  original ID   "CAN1540164" , if this ID has an
>> observation in
>>  CANADA then
>>  the ID was changed to"1CAN1540164".   From this new
>> ID I
>>  want get out
>>  the country code  get the  original unique ID  and
>>  summarize
>>  the data by
>>  unique ID and country code
>>
>>  The data set look like
>>  mydata <- read.table(textConnection("GR ID iflag Y
>>  A 1CAN1540164 1 20
>>  A 1CAN1540164 1 12
>>  A 1CAN1540164 1 15
>>  A 44CAN1540164 1 30
>>  A 44CAN1540164 1 24
>>  A 44CAN1540164 1 25
>>  A 44CAN1540164 1 11
>>  A 33CAN1540164 1 12
>>  A 33CAN1540164 1 23
>>  A 33CAN1540164 1 65
>>  A 33CAN1540164 1 41
>>  A 358CAN1540164 1 28
>>  A 358CAN1540164 1 32
>>  A 358CAN1540164 1 41
>>  A 358CAN1540164 1 54
>>  A 358CAN1540164 1 29
>>  

Re: [R] find unique and summerize

2018-02-03 Thread Rui Barradas

Hello,

As for the first question, instead of writing a xlsx file, maybe it is 
easier to write a csv file and then open it with Excel.


tbl2 <- addmargins(tbl1)
write.csv(tbl2, "tt1.csv")

As for the second question, the following does it.

inx <- apply(tbl1, 1, function(x) all(x != 0))
tbl1b <- addmargins(tbl1[inx, ])
tbl1b


Hope this helps,

Rui Barradas

On 2/3/2018 4:42 PM, Val wrote:

Thank you so much Rui.

1. How do I export this table to excel file?
I used this
   tbl1 <- table(Country, IDNum)
   tbl2=addmargins(tbl1)
   write.xlsx(tbl2,"tt1.xlsx"),sheetName="summary", row.names=FALSE)
The above did not give me that table.


2. I want select those unique Ids that do have records in all countries.
  From the above data set, this ID  "FIN1540166"  should be excluded 
from the summary table and the table looks like as follow


IDNum Country 1 33 358 44 Sum CAN1540164 47 141 248 90 526 USA1540165 
290 757 321 171 1539 Sum 337 898 569 261 2065


Thank you again


On Fri, Feb 2, 2018 at 11:26 PM, Rui Barradas > wrote:


Hello,

Thanks for the reproducible example.
See if the following does what you want.

IDNum <- sub("^(\\d+).*", "\\1", mydata$ID)
Country <- sub("^\\d+(.*)", "\\1", mydata$ID)

tbl1 <- table(Country, IDNum)
addmargins(tbl1)

tbl2 <- xtabs(Y ~ Country + IDNum, mydata)
addmargins(tbl2)


Hope this helps,

Rui Barradas


On 2/3/2018 3:00 AM, Val wrote:

Hi all,

I have a data set  need to be summarized by unique ID (count and
sum of a
variable)
A unique individual ID (country name  Abbreviation  followed by
an integer
numbers)  may  have observation in several countries. Then the 
ID was

changed by adding the country code as a prefix  and  new ID was
constructed
or recorded like (country code, + the original unique ID  Example
original ID   "CAN1540164" , if this ID has an observation in
CANADA then
the ID was changed to    "1CAN1540164".   From this new ID I
want get out
the country code  get the  original unique ID  and   summarize
the data by
unique ID and country code

The data set look like
mydata <- read.table(textConnection("GR ID iflag Y
A 1CAN1540164 1 20
A 1CAN1540164 1 12
A 1CAN1540164 1 15
A 44CAN1540164 1 30
A 44CAN1540164 1 24
A 44CAN1540164 1 25
A 44CAN1540164 1 11
A 33CAN1540164 1 12
A 33CAN1540164 1 23
A 33CAN1540164 1 65
A 33CAN1540164 1 41
A 358CAN1540164 1 28
A 358CAN1540164 1 32
A 358CAN1540164 1 41
A 358CAN1540164 1 54
A 358CAN1540164 1 29
A 358CAN1540164 1 64
B 1USA1540165 1 125
B 1USA1540165 1 165
B 44USA1540165 1 171
B 33USA1540165 1 254
B 33USA1540165 1 241
B 33USA1540165 1 262
B 358USA1540165 1 321
C 358FIN1540166 1 225 "),header = TRUE ,stringsAsFactors = FALSE)

  From the above data there are three unique IDs and  four
country codes (1,
44, 33 and 358)

I want the following two tables

Table 1. count  the  unique ID by country code
                            1   44   33   358     TOT
CAN1540164     3    4     4      6        17
USA1540165      2   1      3     1          7
FIN1540166       -     -       -      1         1
             TOT         5    5      7      8       25


Table 2  Sum of Y variable by unique ID and country. code

                            1       44       33      358      TOT
CAN1540164    47     90      141      248       526
USA1540165   290   171      757      321     1539
FIN1540166        -        -         -         225       225
              TOT      337     261      898    794     2290


How do I do it in R?

   The first step is to get the unique country codes unique ID
by splitting
the new ID

Thank you in advance

         [[alternative HTML version deleted]]

__
R-help@r-project.org  mailing list
-- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help

PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html

and provide commented, minimal, self-contained, reproducible code.




__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, m

Re: [R] find unique and summerize

2018-02-03 Thread Val
Thank you so much Rui.

1. How do I export this table to excel file?
I used this
  tbl1 <- table(Country, IDNum)
  tbl2=addmargins(tbl1)
  write.xlsx(tbl2,"tt1.xlsx"),sheetName="summary", row.names=FALSE)
The above did not give me that table.


2. I want select those unique Ids that do have records in all countries.
 From the above data set, this ID  "FIN1540166"  should be excluded from
the summary table and the table looks like as follow

IDNum
Country 1   33  358   44  Sum
  CAN1540164   47  141  248   90  526
  USA1540165  290  757  321  171 1539
  Sum 337  898  569  261 2065

Thank you again


On Fri, Feb 2, 2018 at 11:26 PM, Rui Barradas  wrote:

> Hello,
>
> Thanks for the reproducible example.
> See if the following does what you want.
>
> IDNum <- sub("^(\\d+).*", "\\1", mydata$ID)
> Country <- sub("^\\d+(.*)", "\\1", mydata$ID)
>
> tbl1 <- table(Country, IDNum)
> addmargins(tbl1)
>
> tbl2 <- xtabs(Y ~ Country + IDNum, mydata)
> addmargins(tbl2)
>
>
> Hope this helps,
>
> Rui Barradas
>
>
> On 2/3/2018 3:00 AM, Val wrote:
>
>> Hi all,
>>
>> I have a data set  need to be summarized by unique ID (count and sum of a
>> variable)
>> A unique individual ID (country name  Abbreviation  followed by an integer
>> numbers)  may  have observation in several countries. Then the  ID was
>> changed by adding the country code as a prefix  and  new ID was
>> constructed
>> or recorded like (country code, + the original unique ID  Example
>> original ID   "CAN1540164" , if this ID has an observation in CANADA then
>> the ID was changed to"1CAN1540164".   From this new ID I want get out
>> the country code  get the  original unique ID  and   summarize the data by
>> unique ID and country code
>>
>> The data set look like
>> mydata <- read.table(textConnection("GR ID iflag Y
>> A 1CAN1540164 1 20
>> A 1CAN1540164 1 12
>> A 1CAN1540164 1 15
>> A 44CAN1540164 1 30
>> A 44CAN1540164 1 24
>> A 44CAN1540164 1 25
>> A 44CAN1540164 1 11
>> A 33CAN1540164 1 12
>> A 33CAN1540164 1 23
>> A 33CAN1540164 1 65
>> A 33CAN1540164 1 41
>> A 358CAN1540164 1 28
>> A 358CAN1540164 1 32
>> A 358CAN1540164 1 41
>> A 358CAN1540164 1 54
>> A 358CAN1540164 1 29
>> A 358CAN1540164 1 64
>> B 1USA1540165 1 125
>> B 1USA1540165 1 165
>> B 44USA1540165 1 171
>> B 33USA1540165 1 254
>> B 33USA1540165 1 241
>> B 33USA1540165 1 262
>> B 358USA1540165 1 321
>> C 358FIN1540166 1 225 "),header = TRUE ,stringsAsFactors = FALSE)
>>
>>  From the above data there are three unique IDs and  four country codes
>> (1,
>> 44, 33 and 358)
>>
>> I want the following two tables
>>
>> Table 1. count  the  unique ID by country code
>>1   44   33   358 TOT
>> CAN1540164 34 4  617
>> USA1540165  2   1  3 1  7
>> FIN1540166   - -   -  1 1
>> TOT 55  7  8   25
>>
>>
>> Table 2  Sum of Y variable by unique ID and country. code
>>
>>1   44   33  358  TOT
>> CAN154016447 90  141  248   526
>> USA1540165   290   171  757  321 1539
>> FIN1540166-- - 225   225
>>  TOT  337 261  898794 2290
>>
>>
>> How do I do it in R?
>>
>>   The first step is to get the unique country codes unique ID by splitting
>> the new ID
>>
>> Thank you in advance
>>
>> [[alternative HTML version deleted]]
>>
>> __
>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posti
>> ng-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] find unique and summerize

2018-02-02 Thread Rui Barradas

Hello,

Thanks for the reproducible example.
See if the following does what you want.

IDNum <- sub("^(\\d+).*", "\\1", mydata$ID)
Country <- sub("^\\d+(.*)", "\\1", mydata$ID)

tbl1 <- table(Country, IDNum)
addmargins(tbl1)

tbl2 <- xtabs(Y ~ Country + IDNum, mydata)
addmargins(tbl2)


Hope this helps,

Rui Barradas

On 2/3/2018 3:00 AM, Val wrote:

Hi all,

I have a data set  need to be summarized by unique ID (count and sum of a
variable)
A unique individual ID (country name  Abbreviation  followed by an integer
numbers)  may  have observation in several countries. Then the  ID was
changed by adding the country code as a prefix  and  new ID was constructed
or recorded like (country code, + the original unique ID  Example
original ID   "CAN1540164" , if this ID has an observation in CANADA then
the ID was changed to"1CAN1540164".   From this new ID I want get out
the country code  get the  original unique ID  and   summarize the data by
unique ID and country code

The data set look like
mydata <- read.table(textConnection("GR ID iflag Y
A 1CAN1540164 1 20
A 1CAN1540164 1 12
A 1CAN1540164 1 15
A 44CAN1540164 1 30
A 44CAN1540164 1 24
A 44CAN1540164 1 25
A 44CAN1540164 1 11
A 33CAN1540164 1 12
A 33CAN1540164 1 23
A 33CAN1540164 1 65
A 33CAN1540164 1 41
A 358CAN1540164 1 28
A 358CAN1540164 1 32
A 358CAN1540164 1 41
A 358CAN1540164 1 54
A 358CAN1540164 1 29
A 358CAN1540164 1 64
B 1USA1540165 1 125
B 1USA1540165 1 165
B 44USA1540165 1 171
B 33USA1540165 1 254
B 33USA1540165 1 241
B 33USA1540165 1 262
B 358USA1540165 1 321
C 358FIN1540166 1 225 "),header = TRUE ,stringsAsFactors = FALSE)

 From the above data there are three unique IDs and  four country codes (1,
44, 33 and 358)

I want the following two tables

Table 1. count  the  unique ID by country code
   1   44   33   358 TOT
CAN1540164 34 4  617
USA1540165  2   1  3 1  7
FIN1540166   - -   -  1 1
TOT 55  7  8   25


Table 2  Sum of Y variable by unique ID and country. code

   1   44   33  358  TOT
CAN154016447 90  141  248   526
USA1540165   290   171  757  321 1539
FIN1540166-- - 225   225
 TOT  337 261  898794 2290


How do I do it in R?

  The first step is to get the unique country codes unique ID by splitting
the new ID

Thank you in advance

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] find unique and summerize

2018-02-02 Thread Val
Hi all,

I have a data set  need to be summarized by unique ID (count and sum of a
variable)
A unique individual ID (country name  Abbreviation  followed by an integer
numbers)  may  have observation in several countries. Then the  ID was
changed by adding the country code as a prefix  and  new ID was constructed
or recorded like (country code, + the original unique ID  Example
original ID   "CAN1540164" , if this ID has an observation in CANADA then
the ID was changed to"1CAN1540164".   From this new ID I want get out
the country code  get the  original unique ID  and   summarize the data by
unique ID and country code

The data set look like
mydata <- read.table(textConnection("GR ID iflag Y
A 1CAN1540164 1 20
A 1CAN1540164 1 12
A 1CAN1540164 1 15
A 44CAN1540164 1 30
A 44CAN1540164 1 24
A 44CAN1540164 1 25
A 44CAN1540164 1 11
A 33CAN1540164 1 12
A 33CAN1540164 1 23
A 33CAN1540164 1 65
A 33CAN1540164 1 41
A 358CAN1540164 1 28
A 358CAN1540164 1 32
A 358CAN1540164 1 41
A 358CAN1540164 1 54
A 358CAN1540164 1 29
A 358CAN1540164 1 64
B 1USA1540165 1 125
B 1USA1540165 1 165
B 44USA1540165 1 171
B 33USA1540165 1 254
B 33USA1540165 1 241
B 33USA1540165 1 262
B 358USA1540165 1 321
C 358FIN1540166 1 225 "),header = TRUE ,stringsAsFactors = FALSE)

>From the above data there are three unique IDs and  four country codes (1,
44, 33 and 358)

I want the following two tables

Table 1. count  the  unique ID by country code
  1   44   33   358 TOT
CAN1540164 34 4  617
USA1540165  2   1  3 1  7
FIN1540166   - -   -  1 1
   TOT 55  7  8   25


Table 2  Sum of Y variable by unique ID and country. code

  1   44   33  358  TOT
CAN154016447 90  141  248   526
USA1540165   290   171  757  321 1539
FIN1540166-- - 225   225
TOT  337 261  898794 2290


How do I do it in R?

 The first step is to get the unique country codes unique ID by splitting
the new ID

Thank you in advance

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.