Re: [R] Creating a Data Frame from an XML
On Jan 22, 2013, at 3:11 PM, Adam Gabbert wrote: Hello, I'm attempting to read information from an XML into a data frame in R using the XML package. I am unable to get the data into a data frame as I would like. I have some sample code below. *XML Code:* Header... Data I want in a data frame: data row BRAND=GMC NUM=1 YEAR=1999 VALUE=1 / row BRAND=FORD NUM=1 YEAR=2000 VALUE=12000 / row BRAND=GMC NUM=1 YEAR=2001 VALUE=12500 / row BRAND=FORD NUM=1 YEAR=2002 VALUE=13000 / row BRAND=GMC NUM=1 YEAR=2003 VALUE=14000 / row BRAND=FORD NUM=1 YEAR=2004 VALUE=17000 / row BRAND=GMC NUM=1 YEAR=2005 VALUE=15000 / row BRAND=GMC NUM=1 YEAR=1967 VALUE=PRICLESS / row BRAND=FORD NUM=1 YEAR=2007 VALUE=17500 / row BRAND=GMC NUM=1 YEAR=2008 VALUE=22000 / /data *R Code:* doc -xmlInternalTreeParse (Sample2.xml) top - xmlRoot (doc) xmlName (top) names (top) art - top [[row]] art ** *Output:* artrow BRAND=GMC NUM=1 YEAR=1999 VALUE=1/ This is where I am having difficulties. I am unable to access additional rows; ( i.e. row BRAND=GMC NUM=1 YEAR=1967 VALUE=PRICLESS / ) and I am unable to access the individual entries to actually create the data frame. The data frame I would like is as follows: BRANDNUMYEARVALUE GMC1 1999 1 FORD 2 2000 12000 GMC1 2001 12500 etc Any help or suggestions would be appreciated. Conversly, my eventual goal would be to take a data frame and write it into an XML in the previously shown format. Hi, You are so close! You have a number of nodes with the name 'row'. The [[ function selects just one item from a list, and when there's a number that have that name it returns just the first. So you really want to use the [ function instead and then select by order index using [[ library(XML) s - c( data, row BRAND=\GMC\ NUM=\1\ YEAR=\1999\ VALUE=\1\ /, row BRAND=\FORD\ NUM=\1\ YEAR=\2000\ VALUE=\12000\ /, row BRAND=\GMC\ NUM=\1\ YEAR=\2001\ VALUE=\12500\ /, row BRAND=\FORD\ NUM=\1\ YEAR=\2002\ VALUE=\13000\ /, row BRAND=\GMC\ NUM=\1\ YEAR=\2003\ VALUE=\14000\ /, row BRAND=\FORD\ NUM=\1\ YEAR=\2004\ VALUE=\17000\ /, row BRAND=\GMC\ NUM=\1\ YEAR=\2005\ VALUE=\15000\ /, row BRAND=\GMC\ NUM=\1\ YEAR=\1967\ VALUE=\PRICLESS\ /, row BRAND=\FORD\ NUM=\1\ YEAR=\2007\ VALUE=\17500\ /, row BRAND=\GMC\ NUM=\1\ YEAR=\2008\ VALUE=\22000\ /, /data) x - xmlRoot(xmlTreeParse(s, asText = TRUE, useInternalNodes = TRUE)) x[row][[1]] row BRAND=GMC NUM=1 YEAR=1999 VALUE=1/ x[row][[2]] row BRAND=FORD NUM=1 YEAR=2000 VALUE=12000/ Your rows are set up so the attributes have the values you want - use xmlAttrs to retrieve them. xmlAttrs(x[row][[2]]) BRAND NUMYEAR VALUE FORD 1 2000 12000 You can use lapply to iterate through each row and apply the xmlAttrs function. You'll end up with a list if character vectors. y - lapply(x[row], xmlAttrs) str(y) List of 10 $ row: Named chr [1:4] GMC 1 1999 1 ..- attr(*, names)= chr [1:4] BRAND NUM YEAR VALUE $ row: Named chr [1:4] FORD 1 2000 12000 ..- attr(*, names)= chr [1:4] BRAND NUM YEAR VALUE $ row: Named chr [1:4] GMC 1 2001 12500 ..- attr(*, names)= chr [1:4] BRAND NUM YEAR VALUE . . . Next make a character matrix using do.call and rbind ... m - do.call(rbind, y) str(m) chr [1:10, 1:4] GMC FORD GMC FORD GMC FORD GMC GMC FORD ... - attr(*, dimnames)=List of 2 ..$ : chr [1:10] row row row row ... ..$ : chr [1:4] BRAND NUM YEAR VALUE And then on to a data.frame... d - as.data.frame(m) str(d) 'data.frame': 10 obs. of 4 variables: $ BRAND: chr GMC FORD GMC FORD ... $ NUM : chr 1 1 1 1 ... $ YEAR : chr 1999 2000 2001 2002 ... $ VALUE: chr 1 12000 12500 13000 ... Cheers, Ben Thank you AG [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Ben Tupper Bigelow Laboratory for Ocean Sciences 180 McKown Point Rd. P.O. Box 475 West Boothbay Harbor, Maine 04575-0475 http://www.bigelow.org __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Creating a Data Frame from an XML
On Tue, Jan 22, 2013 at 3:11 PM, Adam Gabbert adamjgabb...@gmail.com wrote: Hello, I'm attempting to read information from an XML into a data frame in R using the XML package. I am unable to get the data into a data frame as I would like. I have some sample code below. *XML Code:* Header... Data I want in a data frame: data row BRAND=GMC NUM=1 YEAR=1999 VALUE=1 / row BRAND=FORD NUM=1 YEAR=2000 VALUE=12000 / row BRAND=GMC NUM=1 YEAR=2001 VALUE=12500 / row BRAND=FORD NUM=1 YEAR=2002 VALUE=13000 / row BRAND=GMC NUM=1 YEAR=2003 VALUE=14000 / row BRAND=FORD NUM=1 YEAR=2004 VALUE=17000 / row BRAND=GMC NUM=1 YEAR=2005 VALUE=15000 / row BRAND=GMC NUM=1 YEAR=1967 VALUE=PRICLESS / row BRAND=FORD NUM=1 YEAR=2007 VALUE=17500 / row BRAND=GMC NUM=1 YEAR=2008 VALUE=22000 / /data *R Code:* doc -xmlInternalTreeParse (Sample2.xml) top - xmlRoot (doc) xmlName (top) names (top) art - top [[row]] art ** This will get a data frame of character columns as.data.frame(t(xpathSApply(doc, //row, xmlAttrs)), stringsAsFactors = FALSE) BRAND NUM YEARVALUE 1GMC 1 19991 2 FORD 1 200012000 3GMC 1 200112500 4 FORD 1 200213000 5GMC 1 200314000 6 FORD 1 200417000 7GMC 1 200515000 8GMC 1 1967 PRICLESS 9 FORD 1 200717500 10 GMC 1 200822000 -- Statistics Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Creating a Data Frame from an XML
Hi, May be this also helps: s - c( data, row BRAND=\GMC\ NUM=\1\ YEAR=\1999\ VALUE=\1\ /, row BRAND=\FORD\ NUM=\1\ YEAR=\2000\ VALUE=\12000\ /, row BRAND=\GMC\ NUM=\1\ YEAR=\2001\ VALUE=\12500\ /, row BRAND=\FORD\ NUM=\1\ YEAR=\2002\ VALUE=\13000\ /, row BRAND=\GMC\ NUM=\1\ YEAR=\2003\ VALUE=\14000\ /, row BRAND=\FORD\ NUM=\1\ YEAR=\2004\ VALUE=\17000\ /, row BRAND=\GMC\ NUM=\1\ YEAR=\2005\ VALUE=\15000\ /, row BRAND=\GMC\ NUM=\1\ YEAR=\1967\ VALUE=\PRICLESS\ /, row BRAND=\FORD\ NUM=\1\ YEAR=\2007\ VALUE=\17500\ /, row BRAND=\GMC\ NUM=\1\ YEAR=\2008\ VALUE=\22000\ /, /data) Lines1-gsub(^\\s+| \\s+$,,gsub([^0-9A-Z], ,s)) dat1-read.table(text=Lines1[Lines1!=],sep=,header=F,stringsAsFactors=F) dat1New-dat1[,seq(2,ncol(dat1),by=2)] colnames(dat1New)- unlist(unique(dat1[,seq(1,ncol(dat1),by=2)])) str(dat1New) #'data.frame': 10 obs. of 4 variables: # $ BRAND: chr GMC FORD GMC FORD ... # $ NUM : int 1 1 1 1 1 1 1 1 1 1 # $ YEAR : int 1999 2000 2001 2002 2003 2004 2005 1967 2007 2008 # $ VALUE: chr 1 12000 12500 13000 ... #or Lines2-gsub( .*,,gsub(^.*=\(.*)\\\s+.*=\(.*)\\\s+.*=\(.*)\\\s+.*=\(.*)\.*,\\1 \\2 \\3 \\4,s)) dat2-read.table(text=Lines2[Lines2!=Lines2!= ],sep=,header=FALSE,stringsAsFactors=FALSE) colnames(dat2)- unlist(unique(dat1[,seq(1,ncol(dat1),by=2)])) str(dat2) 'data.frame': 10 obs. of 4 variables: # $ BRAND: chr GMC FORD GMC FORD ... # $ NUM : int 1 1 1 1 1 1 1 1 1 1 # $ YEAR : int 1999 2000 2001 2002 2003 2004 2005 1967 2007 2008 # $ VALUE: chr 1 12000 12500 13000 ... head(dat2,3) # BRAND NUM YEAR VALUE #1 GMC 1 1999 1 #2 FORD 1 2000 12000 #3 GMC 1 2001 12500 A.K. - Original Message - From: Ben Tupper btup...@bigelow.org To: Adam Gabbert adamjgabb...@gmail.com Cc: r-help@r-project.org Sent: Tuesday, January 22, 2013 10:13 PM Subject: Re: [R] Creating a Data Frame from an XML On Jan 22, 2013, at 3:11 PM, Adam Gabbert wrote: Hello, I'm attempting to read information from an XML into a data frame in R using the XML package. I am unable to get the data into a data frame as I would like. I have some sample code below. *XML Code:* Header... Data I want in a data frame: data row BRAND=GMC NUM=1 YEAR=1999 VALUE=1 / row BRAND=FORD NUM=1 YEAR=2000 VALUE=12000 / row BRAND=GMC NUM=1 YEAR=2001 VALUE=12500 / row BRAND=FORD NUM=1 YEAR=2002 VALUE=13000 / row BRAND=GMC NUM=1 YEAR=2003 VALUE=14000 / row BRAND=FORD NUM=1 YEAR=2004 VALUE=17000 / row BRAND=GMC NUM=1 YEAR=2005 VALUE=15000 / row BRAND=GMC NUM=1 YEAR=1967 VALUE=PRICLESS / row BRAND=FORD NUM=1 YEAR=2007 VALUE=17500 / row BRAND=GMC NUM=1 YEAR=2008 VALUE=22000 / /data *R Code:* doc -xmlInternalTreeParse (Sample2.xml) top - xmlRoot (doc) xmlName (top) names (top) art - top [[row]] art ** *Output:* artrow BRAND=GMC NUM=1 YEAR=1999 VALUE=1/ This is where I am having difficulties. I am unable to access additional rows; ( i.e. row BRAND=GMC NUM=1 YEAR=1967 VALUE=PRICLESS / ) and I am unable to access the individual entries to actually create the data frame. The data frame I would like is as follows: BRAND NUM YEAR VALUE GMC 1 1999 1 FORD 2 2000 12000 GMC 1 2001 12500 etc Any help or suggestions would be appreciated. Conversly, my eventual goal would be to take a data frame and write it into an XML in the previously shown format. Hi, You are so close! You have a number of nodes with the name 'row'. The [[ function selects just one item from a list, and when there's a number that have that name it returns just the first. So you really want to use the [ function instead and then select by order index using [[ library(XML) s - c( data, row BRAND=\GMC\ NUM=\1\ YEAR=\1999\ VALUE=\1\ /, row BRAND=\FORD\ NUM=\1\ YEAR=\2000\ VALUE=\12000\ /, row BRAND=\GMC\ NUM=\1\ YEAR=\2001\ VALUE=\12500\ /, row BRAND=\FORD\ NUM=\1\ YEAR=\2002\ VALUE=\13000\ /, row BRAND=\GMC\ NUM=\1\ YEAR=\2003\ VALUE=\14000\ /, row BRAND=\FORD\ NUM=\1\ YEAR=\2004\ VALUE=\17000\ /, row BRAND=\GMC\ NUM=\1\ YEAR=\2005\ VALUE=\15000\ /, row BRAND=\GMC\ NUM=\1\ YEAR=\1967\ VALUE=\PRICLESS\ /, row BRAND=\FORD\ NUM=\1\ YEAR=\2007\ VALUE=\17500\ /, row BRAND=\GMC\ NUM=\1\ YEAR=\2008\ VALUE=\22000\ /, /data) x - xmlRoot(xmlTreeParse(s, asText = TRUE, useInternalNodes = TRUE)) x[row][[1]] row BRAND=GMC NUM=1 YEAR=1999 VALUE=1/ x[row][[2]] row BRAND=FORD NUM=1 YEAR=2000 VALUE=12000/ Your rows are set up so the attributes have the values you want - use xmlAttrs to retrieve them. xmlAttrs(x[row][[2]]) BRAND NUM YEAR VALUE FORD 1 2000 12000 You can use lapply to iterate through each row and apply the xmlAttrs function. You'll end up with a list if character vectors. y - lapply(x[row], xmlAttrs) str(y) List of 10 $ row: Named chr [1:4] GMC 1 1999 1