[R] I'm trying to parse 1 column of a dataframe into 3 separate columns

2013-01-14 Thread Joel Pulliam
I have a factor called 'utm_medium' in the dataframe 'data'

> str(data$utm_medium)
 Factor w/ 396925 levels
 "","affiliateID=&sessionID=821850667323ec6ae6cffd28f380&etag=",..:
 366183 355880 357141 20908 357513 365348 368088 360827 31704 364767 ...

 

The data in this factor is delimited with '&'. I basically want the
affiliateID, sessionID and etag data separated. Ex.

> data$utm_medium[1:10]
 [1] affiliateID=4f3ac4b695e7d&sessionID=993f4c447e68dfc36ed692223349f2e3&etag=
 [2] affiliateID=4f3ac4b695e7d&sessionID=209dd9986ace55d50a450afeba62b78f&etag=
 [3] affiliateID=4f3ac4b695e7d&sessionID=2efdb8e1e1f5ac9c0d5baec355c78f85&etag=
 [4] affiliateID=&sessionID=5a6ca9d41148f30ce694628427af7991&etag=
 [5] affiliateID=4f3ac4b695e7d&sessionID=331fbcdf1f3d5e7bac0d92c12e19f63d&etag=
 [6] affiliateID=4f3ac4b695e7d&sessionID=8fc27c8478e9bd30043ea4d3c7ddb29c&etag=
 [7] affiliateID=4f3ac4b695e7d&sessionID=af467d480addffca43ffbdbce1edfdb4&etag=
 [8] affiliateID=4f3ac4b695e7d&sessionID=598645e05a187ee63ff922a36360f021&etag=
 [9] affiliateID=&sessionID=8895e21d0842ed45063ba8328dc3bc61&etag=
[10] affiliateID=4f3ac4b695e7d&sessionID=88ca2998c5a91b6efbece0c4f79caeb7&etag=
396925 Levels:  ... affiliateID=50bfbbbeed918&sessionID=5c49c142cbf1b149c6a4647d1a4fc97b&etag=

 

I've parsed it via:

test <- as.character(data$utm_medium)

test <- strsplit(test, "&")

 

which results in a list, which I 'unlisted':

test2 <- unlist(test)

 

and then attempted to extract into separate vectors:

a <- vector(mode = "character", length = length(test2))
s <- vector(mode = "character", length = length(test2))
e <- vector(mode = "character", length = length(test2))
i <- 1
j <- 1

for (i in 1:length(test2))
{
  a[j] <- test2[i]
  s[j] <- test2[i+1]
  e[j] <- test2[i+2]
  i <- i + 3
  j <- j + 1
}

 

This code runs, but I'm indexing it incorrectly and I can't figure out
why. I'll sleep on it tonight and probably figure it out, but I can't
help thinking that there's a much easier way to parse this data. Help!
Please!
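
A possible explanation for the indexing problem, offered as a sketch rather than a diagnosis of the real data: in R, a `for` loop takes its next value from the sequence it was given, so reassigning `i` inside the body is discarded on the following iteration. The toy vector below is hypothetical and assumes exactly three pieces per record. That assumption would not hold for any row equal to the empty string "", since `strsplit("", "&")` returns zero pieces.

```r
# Hypothetical stand-in for the unlisted pieces: two records, three pieces each.
test2 <- c("affiliateID=a1", "sessionID=s1", "etag=",
           "affiliateID=",   "sessionID=s2", "etag=")

# Reassigning `i` in the body does not change which element comes next.
visited <- integer(0)
for (i in 1:length(test2)) {
  visited <- c(visited, i)
  i <- i + 3   # overwritten when the loop advances to the next element
}
visited        # every index from 1 to 6 is visited, not just 1 and 4

# Stepping by 3 with seq() instead, under the three-pieces-per-record assumption.
starts <- seq(1, length(test2), by = 3)
a <- test2[starts]
s <- test2[starts + 1]
e <- test2[starts + 2]
```

With a stride of 3 the result vectors have one entry per record, so they do not need to be preallocated at `length(test2)`.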

 

joel

 



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] I'm trying to parse 1 column of a dataframe into 3 separate columns

2013-01-14 Thread David L Carlson
How about

a <- sapply(test, function(x) x[1])
s <- sapply(test, function(x) x[2])
e <- sapply(test, function(x) x[3])
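
A minimal end-to-end sketch of this approach, assuming '&' is the delimiter and using two of the values shown in the post. Here `test` is the list returned by strsplit(), before unlist(). The `parsed` data frame and the sub() step that drops the "key=" prefixes are illustrative additions, not part of the original code.

```r
# Two sample values copied from the post, stored as a factor like data$utm_medium.
x <- factor(c(
  "affiliateID=4f3ac4b695e7d&sessionID=993f4c447e68dfc36ed692223349f2e3&etag=",
  "affiliateID=&sessionID=5a6ca9d41148f30ce694628427af7991&etag="
))

# One character vector of pieces per row; fixed = TRUE treats "&" literally.
test <- strsplit(as.character(x), "&", fixed = TRUE)

# Take the 1st, 2nd and 3rd piece of every row.
a <- sapply(test, function(p) p[1])
s <- sapply(test, function(p) p[2])
e <- sapply(test, function(p) p[3])

# Optional: strip the "key=" prefix so only the values remain.
parsed <- data.frame(
  affiliateID = sub("^affiliateID=", "", a),
  sessionID   = sub("^sessionID=", "", s),
  etag        = sub("^etag=", "", e),
  stringsAsFactors = FALSE
)
parsed
```

Because each row keeps its own element in the list, a row that yields fewer than three pieces gives NA in the missing positions rather than shifting later rows out of alignment.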

--
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77843-4352

 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-project.org]
 On Behalf Of Joel Pulliam
 Sent: Monday, January 14, 2013 4:30 PM
 To: r-help@r-project.org
 Cc: pullia...@gmail.com
 Subject: [R] I'm trying to parse 1 column of a dataframe into 3 separate columns
 
