Re: [R] turning comma separated string from multiple choices into flags

2016-10-11 Thread Bob Rudis
Take a look at tidyr::separate()
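
A minimal sketch of that suggestion, assuming the tidyr package is installed and using the sample data from the question quoted below. separate() splits the cell into positional columns (the names choice1 to choice3 are made up here), which is not yet the 0/1 layout asked for; separate_rows() followed by table() gets closer. The id column is also an assumption, added so that respondents of the same age stay distinct.

```r
library(tidyr)

survey1 <- data.frame(
  age = c(29, 31, 32),
  favorite_magazine = c(
    "Fast Company",
    "Fast Company, Business Week",
    "Havard Business Review, Business Week, The Economist"
  ),
  stringsAsFactors = FALSE
)

# separate(): one column per position; shorter answers are padded with NA
wide <- separate(survey1, favorite_magazine,
                 into = c("choice1", "choice2", "choice3"),
                 sep = ", ", fill = "right")

# separate_rows(): one row per chosen magazine, then cross-tabulate
survey1$id <- seq_len(nrow(survey1))   # hypothetical respondent id
long <- separate_rows(survey1, favorite_magazine, sep = ", ")
flags <- table(long$id, long$favorite_magazine)
```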

On Fri, Oct 7, 2016 at 12:57 PM, silvia giussani wrote:
> Hi all,
>
> could you please tell me if you find a solution to this problem (in
> Subject)?
>
> June Kim wrote:
>
>> Hello,
>>
>> I use google docs' Forms to conduct surveys online. Multiple choices
>> questions are coded as comma separated values.
>>
>> For example,
>>
>> if the question is like:
>>
>> 1. What magazines do you currently subscribe to? (you can choose
>> multiple choices)
>> 1) Fast Company
>> 2) Havard Business Review
>> 3) Business Week
>> 4) The Economist
>>
>> And if the subject chose 1) and 3), the data is coded as a cell in a
>> spreadsheet as,
>>
>> "Fast Company, Business Week"
>>
>> I read the data with read.csv into R. To analyze the data, I have to
>> change that string into something like flags(indicator variables?).
>> That is, there should be 4 variables, of which values are either 1 or
>> 0, indicating chosen or not-chosen respectively.
>>
>> Suppose the data is something like,
>>
>> > survey1
>>   age favorite_magazine
>> 1  29 Fast Company
>> 2  31 Fast Company, Business Week
>> 3  32 Havard Business Review, Business Week, The Economist
>>
>> Then I have to chop the string in favorite_magazine column to turn
>> that data into something like,
>>
>> > survey1transformed
>>   age Fast Company Havard Business Review Business Week The Economist
>> 1  29            1                      0             0             0
>> 2  31            1                      0             1             0
>> 3  32            0                      1             1             1
>>
>> Actually I have many more multiple choice questions in the survey.
>>
>> What is the easy elegant and natural way in R to do the job?
>
> I'd look into something like as.data.frame(lapply(strings, grep,
> x=favorite_magazine, fixed=TRUE)), where strings <- c("Fast Company",
> "Havard Business Review", ...).
>
> (I take it that the mechanism is such that you can rely on at least
> having everything misspelled in the same way? If it is alternatingly
> "Havard" and "Harvard", then things get a bit trickier.)
>
> Thank you and regards,
> Silvia Giussani

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] turning comma separated string from multiple choices into flags

2008-09-29 Thread Henrique Dallazuanna
Try this:

# as.character() in case read.csv() made the column a factor;
# ", *" also drops the space after each comma
items <- strsplit(as.character(x$favorite_magazine), ", *")
table(rep(x$age, lengths(items)), unlist(items))
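
Spelled out on the sample data from the question, the split-and-tabulate idea might look like the sketch below. It is an illustration rather than the code posted above: it keys rows on a respondent index instead of age (two respondents can share an age), and it fixes the column order through a factor whose levels are the four magazines, so an option nobody chose still gets a column of zeros. The names magazines, choices, respondent and flags are assumptions.

```r
survey1 <- data.frame(
  age = c(29, 31, 32),
  favorite_magazine = c(
    "Fast Company",
    "Fast Company, Business Week",
    "Havard Business Review, Business Week, The Economist"
  ),
  stringsAsFactors = FALSE
)

magazines <- c("Fast Company", "Havard Business Review",
               "Business Week", "The Economist")

# One character vector of chosen magazines per respondent
choices <- strsplit(as.character(survey1$favorite_magazine), ", ", fixed = TRUE)

# Repeat each respondent's row index once per chosen magazine
respondent <- rep(seq_along(choices), lengths(choices))

# Cross-tabulate respondent by magazine; the factor levels fix the columns
counts <- table(respondent, factor(unlist(choices), levels = magazines))

# Put age back in front; check.names = FALSE keeps the spaces in the names
flags <- data.frame(age = survey1$age, as.data.frame.matrix(counts),
                    check.names = FALSE)
```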




-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O



Re: [R] turning comma separated string from multiple choices into flags

2008-09-29 Thread Peter Dalgaard
June Kim wrote:
> Thank you. The misspelling of Harvard wasn't intended. The data are
> spelled consistently.
>   
OK. One other potential problem: If the strings are substrings of
each other (as in "Science" and "Statistical Science") then you may need
more care.
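
A small sketch of that pitfall, with made-up answers (the third title, "Nature", is an assumption added for illustration). A substring search for "Science" also flags the respondent who only chose "Statistical Science"; splitting each answer first and comparing whole items avoids that.

```r
answers <- c("Statistical Science", "Science, Nature", "Nature")

# Substring search: "Science" also hits "Statistical Science"
substring_hit <- grepl("Science", answers, fixed = TRUE)

# Whole-item comparison: split each answer, then test membership
items <- strsplit(answers, ", ", fixed = TRUE)
exact_hit <- vapply(items, function(chosen) "Science" %in% chosen, logical(1))
```

Here substring_hit flags the first respondent while exact_hit does not.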

And I misremembered: It is probably better to use regexpr() != -1
than grep() for this purpose because the latter returns indices
rather than a value for each element.
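
To see the difference on the sample answers from the question (a sketch; the variable names hits, chosen and flag are illustrative):

```r
favorite_magazine <- c(
  "Fast Company",
  "Fast Company, Business Week",
  "Havard Business Review, Business Week, The Economist"
)

# grep() returns the positions of the matching elements, so the result
# is shorter than the input whenever some answers do not match
hits <- grep("Business Week", favorite_magazine, fixed = TRUE)

# regexpr() returns one match position per element, -1 for no match,
# so comparing with -1 gives a logical flag per respondent
chosen <- regexpr("Business Week", favorite_magazine, fixed = TRUE) != -1

# as.integer() turns the flags into the 0/1 coding from the question
flag <- as.integer(chosen)
```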

-- 
   O__   Peter Dalgaard Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark  Ph:  (+45) 35327918
~~ - ([EMAIL PROTECTED])  FAX: (+45) 35327907



Re: [R] turning comma separated string from multiple choices into flags

2008-09-29 Thread June Kim
Thank you. The misspelling of Harvard wasn't intended. The data are
spelled consistently.



Re: [R] turning comma separated string from multiple choices into flags

2008-09-29 Thread Peter Dalgaard

I'd look into something like as.data.frame(lapply(strings, grep,
x=favorite_magazine, fixed=TRUE)), where strings <- c("Fast Company",
"Havard Business Review", ...).
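
A sketch of how this could be wired up on the sample data. One deviation, stated up front: grep() returns the indices of the matching elements, so the per-magazine results can differ in length and will not generally line up as data frame columns; the sketch uses grepl(), which returns one TRUE/FALSE per respondent. Naming the strings vector is an added assumption so that the columns get labels.

```r
survey1 <- data.frame(
  age = c(29, 31, 32),
  favorite_magazine = c(
    "Fast Company",
    "Fast Company, Business Week",
    "Havard Business Review, Business Week, The Economist"
  ),
  stringsAsFactors = FALSE
)

strings <- c("Fast Company", "Havard Business Review",
             "Business Week", "The Economist")
names(strings) <- strings   # lapply() keeps these as list names

# One 0/1 vector per magazine, each as long as the number of respondents
flags <- lapply(strings, function(s) {
  as.integer(grepl(s, survey1$favorite_magazine, fixed = TRUE))
})

# check.names = FALSE keeps the spaces in the column names
survey1transformed <- data.frame(age = survey1$age, flags,
                                 check.names = FALSE)
```

For a survey with several such questions, the same lapply() call could be repeated per question column with its own vector of option strings.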

(I take it that the mechanism is such that you can rely on at least
having everything misspelled in the same way? If it is alternatingly
"Havard" and "Harvard", then things get a bit trickier.)
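
If both spellings did occur, one option would be to normalize the text before matching, or to use a pattern that tolerates the variant. A sketch with made-up answers:

```r
answers <- c("Havard Business Review",
             "Harvard Business Review, The Economist",
             "The Economist")

# Option 1: rewrite the known misspelling, then match a fixed string
normalized <- gsub("Havard", "Harvard", answers, fixed = TRUE)
flag_fixed <- grepl("Harvard Business Review", normalized, fixed = TRUE)

# Option 2: a regular expression in which the first "r" is optional
flag_regex <- grepl("Har?vard Business Review", answers)
```

Both approaches only cover misspellings that are known in advance.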

-- 
   O__   Peter Dalgaard Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark  Ph:  (+45) 35327918
~~ - ([EMAIL PROTECTED])  FAX: (+45) 35327907
