Re: [R] Parsing JSON records to a dataframe

2011-01-07 Thread Martin Morgan
On 01/07/2011 12:05 AM, Dieter Menne wrote:
> 
> 
> Jeroen Ooms wrote:
>>
>> What is the most efficient method of parsing a dataframe-like structure
>> that has been json encoded in record-based format rather than vector
>> based. For example a structure like this:
>>
>> [ {"name":"joe", "gender":"male", "age":41}, {"name":"anna",
>> "gender":"female", "age":23} ]
>>
>> RJSONIO parses this as a list of lists, which I would then have to apply
>> as.data.frame to and append them to an existing dataframe, which is
>> terribly slow. 
>>
>>
> 
> unlist is pretty fast. The solution below assumes that you know how your
> structure is, so it is not very flexible, but it should show you that the
> conversion to data.frame is not the bottleneck.
> 
> # json
> library(RJSONIO)
> # [ {"name":"joe", "gender":"male", "age":41},
> #  {"name":"anna", "gender":"female", "age":23} ]
> n = 30
> d = data.frame(name=rep(c("joe","anna"),n),
>    gender=rep(c("male","female"),n),
>    age = rep(c("23","41"),n))
> dj = toJSON(d)

This doesn't create the required structure; toJSON() encodes the data
frame column by column rather than record by record:

> cat(dj)
{
   "name": [ "joe", "anna", "joe", "anna" ],
   "gender": [ "male", "female", "male", "female" ],
   "age": [ "23", "41", "23", "41" ]
}

Instead, build record-based JSON directly:

library(rjson)
n <- 1000
name <- apply(matrix(sample(letters, n * 5, TRUE), n),
              1, paste, collapse="")
gender <- sample(c("male", "female"), n, TRUE)
age <- ceiling(runif(n, 20, 60))
recs <- sprintf('{"name": "%s", "gender":"%s", "age":%d}',
                name, gender, age)
j <- sprintf("[%s]", paste(recs, collapse=","))
lol <- fromJSON(j)

and then with

# f(lst) returns a function that extracts field 'nm' from every record
f <- function(lst)
    function(nm) unlist(lapply(lst, "[[", nm), use.names=FALSE)

> oopt <- options(stringsAsFactors=FALSE) # convenience for 'identical'
> system.time({
+ df0 <- as.data.frame(Map(f(lol), names(lol[[1]])))
+ })
   user  system elapsed
  0.006   0.000   0.006

versus for instance

> system.time({
+ df1 <- do.call(rbind, lapply(lol, data.frame))
+ })
   user  system elapsed
  1.497   0.000   1.500
> identical(df0, df1)
[1] TRUE
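
To see what f does on a small case, here is a minimal sketch using the two records from the original question. The list 'records' is hand-built as a stand-in for the parser output (an assumption, so that the example runs without a JSON package), and 'column_of' is only an illustrative name for the same closure as f.

```r
# Hand-built stand-in for the parsed JSON (assumption: the parser
# returns one named list per record, as described in the question).
records <- list(
  list(name = "joe",  gender = "male",   age = 41),
  list(name = "anna", gender = "female", age = 23)
)

# Given the record list, return a function that pulls one field out of
# every record and flattens the result into a plain vector.
column_of <- function(lst)
  function(nm) unlist(lapply(lst, "[[", nm), use.names = FALSE)

# Field names are taken from the first record, so every record is
# assumed to carry the same fields.
fields <- names(records[[1]])

# Map() names its result after the character vector 'fields', giving a
# named list with one vector per column.
cols <- Map(column_of(records), fields)
df <- as.data.frame(cols, stringsAsFactors = FALSE)
```

The data frame is built once from whole columns, rather than once per record, which is where the time is saved.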

Martin


> 
> system.time(d1 <- fromJSON(dj))
> #  user  system elapsed
> #   4.06    0.26    4.32
> 
> system.time(
>   dd <- data.frame(
>     name = unlist(d1$name),
>     gender = unlist(d1$gender),
>     age = as.numeric(unlist(d1$age)))
> )
> #   user  system elapsed
> #   1.13    0.05    1.18
> 
> 
> 
> 


-- 
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109

Location: M1-B861
Telephone: 206 667-2793

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Parsing JSON records to a dataframe

2011-01-07 Thread Dieter Menne


Jeroen Ooms wrote:
> 
> What is the most efficient method of parsing a dataframe-like structure
> that has been json encoded in record-based format rather than vector
> based. For example a structure like this:
> 
> [ {"name":"joe", "gender":"male", "age":41}, {"name":"anna",
> "gender":"female", "age":23} ]
> 
> RJSONIO parses this as a list of lists, which I would then have to apply
> as.data.frame to and append them to an existing dataframe, which is
> terribly slow. 
> 
> 

unlist is pretty fast. The solution below assumes that you know the
structure of your data in advance, so it is not very flexible, but it should
show you that the conversion to data.frame is not the bottleneck.

# json
library(RJSONIO)
# [ {"name":"joe", "gender":"male", "age":41},
#  {"name":"anna", "gender":"female", "age":23} ]
n = 30
d = data.frame(name=rep(c("joe","anna"),n),
   gender=rep(c("male","female"),n),
   age = rep(c("23","41"),n))
dj = toJSON(d)

system.time(d1 <- fromJSON(dj))
#  user  system elapsed
#   4.06    0.26    4.32

system.time(
  dd <- data.frame(
    name = unlist(d1$name),
    gender = unlist(d1$gender),
    age = as.numeric(unlist(d1$age)))
)
#   user  system elapsed
#   1.13    0.05    1.18
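
If the column types are not known in advance, one possible generalisation is to unlist every column and let utils::type.convert() guess the type. This is only a sketch: 'd1' here is hand-built to mimic a parsed column-based structure (an assumption, not actual fromJSON output), and automatic type guessing may not suit every data set.

```r
# Hand-built stand-in for a parsed column-based structure (assumption):
# one list per column, with ages stored as strings as in the code above.
d1 <- list(
  name   = list("joe", "anna"),
  gender = list("male", "female"),
  age    = list("23", "41")
)

# Flatten each column, then let type.convert() pick a type; as.is = TRUE
# keeps character columns as character instead of turning them into factors.
cols <- lapply(d1, function(x)
  utils::type.convert(unlist(x, use.names = FALSE), as.is = TRUE))

dd <- as.data.frame(cols, stringsAsFactors = FALSE)
```

This avoids naming each column by hand, at the cost of trusting the guessed types.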




-- 
View this message in context: 
http://r.789695.n4.nabble.com/Parsing-JSON-records-to-a-dataframe-tp3178646p3178753.html
Sent from the R help mailing list archive at Nabble.com.



[R] Parsing JSON records to a dataframe

2011-01-06 Thread Jeroen Ooms

What is the most efficient method of parsing a dataframe-like structure that
has been JSON encoded in record-based rather than vector-based format? For
example, a structure like this:

[ {"name":"joe", "gender":"male", "age":41}, {"name":"anna",
"gender":"female", "age":23} ]

RJSONIO parses this as a list of lists. I would then have to call
as.data.frame on each record and append the result to an existing dataframe,
which is terribly slow.
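
To make the two layouts concrete, here is a minimal sketch in base R. Both lists are written by hand as stand-ins for parser output (an assumption, so that no JSON package is needed), and the names 'records', 'columns', 'fast' and 'slow' are only illustrative.

```r
# Record-based layout: one named list per row.
records <- list(
  list(name = "joe",  gender = "male",   age = 41),
  list(name = "anna", gender = "female", age = 23)
)

# Vector-based layout: one vector per column.
columns <- list(
  name   = c("joe", "anna"),
  gender = c("male", "female"),
  age    = c(41, 23)
)

# The vector-based layout converts in a single call.
fast <- as.data.frame(columns, stringsAsFactors = FALSE)

# The record-based layout, converted naively, builds one small data
# frame per record and then binds them together row by row.
slow <- do.call(rbind,
                lapply(records, as.data.frame, stringsAsFactors = FALSE))
```

Both routes end with the same columns; the difference is that the second creates a data frame for every record.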


-- 
View this message in context: 
http://r.789695.n4.nabble.com/Parsing-JSON-records-to-a-dataframe-tp3178646p3178646.html
Sent from the R help mailing list archive at Nabble.com.
