Hi Simon,

Aha! I re-read your message and noticed this line:

lapply(J("A")$direct(), .jevalArray)

which I had overlooked earlier. I wrote an example that is very similar
to yours and see what you mean now regarding how we can do this directly.

Many thanks,
T

groovyScript <- paste (
    "def stringList = [] as java.util.List",
    "def numberList = [] as java.util.List",
    "for (def ctr in 0..99) { stringList << new String(\"TGIF $ctr\"); numberList << ctr; }",
    "def strings = stringList.toArray()",
    "def numbers = numberList.toArray()",
    "def result = [strings, numbers]",
    "return (Object[]) result",
    sep="\n")
result <- Evaluate (groovyScript=groovyScript)
temp <- lapply(result, .jevalArray)

On Fri, Jan 15, 2016 at 1:58 PM, Simon Urbanek
<simon.urba...@r-project.org> wrote:
>
>> On Jan 15, 2016, at 12:35 PM, Thomas Fuller
>> <thomas.ful...@coherentlogic.com> wrote:
>>
>> Hi Simon,
>>
>> Thanks for your feedback -- this is an observation that I wasn't
>> considering when I wrote this, mainly because I am, in fact, working
>> with rather small data sets. BTW: There is code there, it's under the
>> bitbucket link -- here's the direct link if you'd still like to look
>> at it:
>>
>> https://bitbucket.org/CoherentLogic/jdataframe
>>
>
> Ah, sorry, all links just send you back to the page, so I missed the little
> field that tells you how to check it out.
>
>
>> Re "for practical purposes it doesn't seem like the most efficient
>> solution" and "So the JSON route is very roughly ~13x slower than
>> using Java directly."
>>
>> I've not benchmarked this and will take a closer look at what you have
>> today -- in fact I may include these details on the JDataFrame page.
>> The JDataFrame targets the use case where there's significant
>> development being done in Java and data is exported into R and,
>> additionally, the developer intends to keep the two separated as much
>> as possible.
>> I could work with Java directly, but then I potentially
>> end up with quite a bit of Java code taking up space in R and I don't
>> like this because if I need to refactor something I have to do it in
>> two places.
>>
>
> No, the code is the same - it makes no difference. The R code is only one
> call to fetch what you need by calling your Java method. The nice thing is
> that you in fact save some code since there is no reason to serialize since
> you can simply access all Java objects directly without serialization.
>
>
>> There's another use case for the JDataFrame as well and that's in an
>> enterprise application (you may have alluded to this when you said
>> "[i]f you need process separation..."). Consider a business where
>> users are working with R and the application that produces the data is
>> actually running in Tomcat. Shipping large amounts of data over the
>> wire in this example would be a performance destroyer, but for small
>> data sets it certainly would be helpful from a development perspective
>> to expose JSON-based web services where the R script would be able to
>> convert a result into a data frame gracefully.
>>
>
> Yes, sure, that makes sense. Like I said, I would probably use some native
> format in that case if I worried about performance. Some candidates that come
> to my mind are ProtoBuf and QAP (serialization used by Rserve). If you have
> arrays, you can always serialize them directly which may be most efficient,
> but you'd probably have to write the wrapper for that yourself (annoyingly,
> the default Java methods use big-endian format which is slower on most
> machines). But then, you're right that for Tomcat applications the sizes are
> small enough that using JSON has the benefit that you can inspect payload by
> eye and/or other tools very easily.
>
> Cheers,
> Simon
>
>
>>
>> On Fri, Jan 15, 2016 at 10:58 AM, Simon Urbanek
>> <simon.urba...@r-project.org> wrote:
>>> Tom,
>>>
>>> this may be good for embedding small data sets, but for practical purposes
>>> it doesn't seem like the most efficient solution.
>>>
>>> Since you didn't provide any code, I built a test case using the built-in
>>> Java JSON API to build a medium-sized dataset (1e6 rows) and read it in
>>> just to get a ballpark (see
>>> https://gist.github.com/s-u/4efb284e3c15c6a2db16)
>>>
>>> # generate:
>>> time java -cp .:javax.json-api-1.0.jar:javax.json-1.0.4.jar A > 1e6
>>>
>>> real 0m2.764s
>>> user 0m20.356s
>>> sys 0m0.962s
>>>
>>> # read:
>>>> system.time(temp <- RJSONIO::fromJSON("1e6"))
>>>    user  system elapsed
>>>   3.484   0.279   3.834
>>>> str(temp)
>>> List of 2
>>>  $ V1: num [1:1000000] 0 0.2 0.4 0.6 0.8 1 1.2 1.4 1.6 1.8 ...
>>>  $ V2: chr [1:1000000] "X0" "X1" "X2" "X3" ...
>>>
>>> For comparison using Java directly (includes both generation and reading
>>> into R):
>>>
>>>> system.time(temp <- lapply(J("A")$direct(), .jevalArray))
>>>    user  system elapsed
>>>   0.962   0.186   0.494
>>>
>>> So the JSON route is very roughly ~13x slower than using Java directly.
>>> Obviously, this will vary by data set type etc. since there is R overhead
>>> involved as well: for example, if you have only numeric variables, the JSON
>>> route is 30x slower on reading alone [50x total]. String variables slow
>>> down everyone equally. Interestingly, the JSON encoding is using all 16
>>> cores, so the 2.7s real time adds up to over 20s CPU time, so on smaller
>>> machines you may see more overhead.
>>>
>>> If you need process separation, it may be a different story - in principle
>>> it is faster to use more native serialization than JSON since parsing is
>>> the slowest part for big datasets.
>>>
>>> Cheers,
>>> Simon
>>>
>>>
>>>> On Jan 14, 2016, at 4:52 PM, Thomas Fuller
>>>> <thomas.ful...@coherentlogic.com> wrote:
>>>>
>>>> Hi Folks,
>>>>
>>>> If you need to send data from Java to R you may consider using the
>>>> JDataFrame API -- which is used to convert data into JSON which then
>>>> can be converted into a data frame in R.
>>>>
>>>> Here's the project page:
>>>>
>>>> https://coherentlogic.com/middleware-development/jdataframe/
>>>>
>>>> and here's a partial example which demonstrates what the API looks like:
>>>>
>>>> String result = new JDataFrameBuilder()
>>>>     .addColumn("Code", new Object[] {"WV", "VA", })
>>>>     .addColumn("Description", new Object[] {"West Virginia", "Virginia"})
>>>>     .toJson();
>>>>
>>>> and in R script we would need to do this:
>>>>
>>>> temp <- RJSONIO::fromJSON(json)
>>>> tempDF <- as.data.frame(temp)
>>>>
>>>> which yields a data frame that looks like this:
>>>>
>>>>> tempDF
>>>>     Description Code
>>>> 1 West Virginia   WV
>>>> 2      Virginia   VA
>>>>
>>>> It is my intention to deploy this project to Maven Central this week,
>>>> time permitting.
>>>>
>>>> Questions and comments are welcomed.
>>>>
>>>> Tom
>>>>
>>>> ______________________________________________
>>>> R-devel@r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-devel
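
For readers following the thread: the call lapply(J("A")$direct(), .jevalArray) relies on a Java class "A" with a static direct() method that returns the columns as an Object[] of arrays. The actual class is in the gist linked above and is not reproduced here; the following is only a minimal sketch of what such a class might look like. The column values, the ROWS constant and the direct(int) overload are assumptions chosen to mirror the str(temp) output shown in the thread.

```java
import java.util.Arrays;

// Hypothetical stand-in for the class "A" referenced by J("A")$direct().
// Assumption: when loaded through rJava the class would normally be declared
// public in its own file A.java; it is left package-private here only to keep
// the sketch self-contained.
class A {
    // Assumed row count, matching the 1e6-row test case in the thread.
    static final int ROWS = 1_000_000;

    // No-argument entry point, as called from R via J("A")$direct().
    public static Object[] direct() {
        return direct(ROWS);
    }

    // Builds one numeric and one string column and returns both as Object[],
    // so that R can convert each element with .jevalArray.
    public static Object[] direct(int rows) {
        double[] v1 = new double[rows];
        String[] v2 = new String[rows];
        for (int i = 0; i < rows; i++) {
            v1[i] = i / 5.0;   // 0, 0.2, 0.4, ...
            v2[i] = "X" + i;   // "X0", "X1", ...
        }
        return new Object[] { v1, v2 };
    }

    public static void main(String[] args) {
        // Small demonstration with five rows instead of one million.
        Object[] columns = direct(5);
        System.out.println(Arrays.toString((double[]) columns[0]));
        System.out.println(Arrays.toString((String[]) columns[1]));
    }
}
```

On the R side, each element of the returned Object[] is a Java array reference, which is why the thread applies .jevalArray to every element to obtain ordinary R vectors. No serialization step is involved in this route.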