I've almost completed a library for speeding up the serving of current Spark models
- https://github.com/Hydrospheredata/fastserving. It depends on Spark, but
it provides a way to turn the Spark logical plan, derived from a DataFrame sample
that was passed into a pipeline/transformer, into an alternative transformer
#1 - Yes. It doesn't look like that is being honored. This is
something we should follow up with CRAN about
#2 - Looking at it more closely, I'm not sure what the problem is. If
the version string is 1.8.0_144, then our parsing code does work
correctly. We might need to add more debug logging or
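For reference, parsing a Java version string of the form 1.8.0_144 can be sketched as below. This is an illustrative example only, not SparkR's actual parsing code; the function name and regex are assumptions:

```python
import re

def parse_java_version(version_string):
    """Split a pre-Java-9 version string like '1.8.0_144' into
    (major, minor, patch, update). Illustrative sketch only."""
    match = re.match(r"^(\d+)\.(\d+)\.(\d+)(?:_(\d+))?", version_string)
    if match is None:
        raise ValueError(f"Unrecognized Java version: {version_string!r}")
    major, minor, patch, update = match.groups()
    # Update number is optional (e.g. '1.8.0' has no '_144' suffix).
    return (int(major), int(minor), int(patch), int(update or 0))

print(parse_java_version("1.8.0_144"))  # (1, 8, 0, 144)
```

Note that Java 9+ switched to a different version-string scheme (e.g. "10.0.1"), which is one reason builds started with Java 10 can trip up parsers written for the 1.x.y_u format.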
Hey all,
There are some fixes that went into 2.1.3 recently that probably
deserve a release. So, as usual, please take a look and see if there's anything
else you'd like in that release; otherwise I'd like to start the
process by early next week.
I'll go through JIRA to see what's the status of
Jakub,
You're right that Spark currently doesn't use the vectorized read path for
nested data, but I'm not sure that's the problem here. With 50k elements in
the f1 array, it could easily be that you're getting a significant
speed-up from not reading or materializing that column. The
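The effect described above, skipping a wide nested column entirely, can be illustrated with a toy columnar layout in plain Python. The field names and sizes here are assumptions for illustration, not Spark or Parquet internals:

```python
import json

# Toy dataset: each row has a small scalar field and a wide
# nested array field, like the large `f1` array in the thread.
rows = [{"id": i, "f1": list(range(1000))} for i in range(100)]

# Row-oriented storage (e.g. JSON lines): reading any one field
# forces parsing every row in full, including the wide array.
row_store = "\n".join(json.dumps(r) for r in rows)
ids_row = [json.loads(line)["id"] for line in row_store.splitlines()]

# Column-oriented storage (the idea behind Parquet): each column
# is serialized separately, so a query touching only `id` never
# deserializes the `f1` arrays at all.
col_store = {
    "id": json.dumps([r["id"] for r in rows]),
    "f1": json.dumps([r["f1"] for r in rows]),
}
ids_col = json.loads(col_store["id"])

assert ids_row == ids_col  # same answer, far less data touched
```

This is why dropping (or simply not selecting) a wide nested column can dominate the runtime difference, independent of whether the vectorized reader is used.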
Corresponding to the Spark 2.3.1 release, I submitted the SparkR build
to CRAN yesterday. Unfortunately, it looks like there are a couple of
issues (the full message from CRAN is forwarded below):
1. There are some builds started with Java 10
For #1, are the system requirements not honored?
For #2, it looks like Oracle JDK?
From: Shivaram Venkataraman
Sent: Tuesday, June 12, 2018 3:17:52 PM
To: dev
Cc: Felix Cheung
Subject: Fwd: [CRAN-pretest-archived] CRAN submission SparkR 2.3.1