Hi,
Is there a drag-and-drop (code-free) GUI available for RDD functions, i.e. a
GUI that generates code from drag-and-drop actions?
http://spark.apache.org/docs/latest/programming-guide.html#resilient-distributed-datasets-rdds
Thanks for brainstorming.
I'm going to cut branch-2.2 tomorrow morning.
On Thu, Apr 13, 2017 at 11:02 AM, Michael Armbrust
wrote:
> Yeah, I was delaying until 2.1.1 was out and some of the hive questions
> were resolved. I'll make progress on that by the end of the week. Let's
> aim for 2.2 branch cut next week.
>
> On
I think this is Java 8 vs. Java 7. If you look at the previous build, you see
a lot of the same missing classes, but tagged as "warning" rather than
"error". All in all, I think it makes sense to stick to JDK 7 for the
legacy builds that were previously built with it.
If there is consensus on
Also, Q-tree is implemented in Algebird, and it is not hard to get it going
in Spark. It is another probabilistic data structure that is useful for this.
On Apr 17, 2017 11:27, "Jason White" wrote:
> Have you looked at t-digests?
>
> Calculating percentiles (including medians) is something that is inhere
The DataFrame API includes an approximate quantile implementation. If you
ask for quantile 0.5, you will get the approximate median.
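
A minimal sketch of that call from PySpark, assuming a DataFrame with a
numeric column. The helper name approx_median and the default relative error
of 0.01 are illustrative choices, not something from this thread.
DataFrame.approxQuantile(col, probabilities, relativeError) returns a list
with one value per requested probability.

```python
def approx_median(df, column, relative_error=0.01):
    """Return the approximate median of `column` in a Spark DataFrame.

    `relative_error` is the tolerated rank error; smaller values are more
    accurate but more expensive, and 0.0 asks for exact quantiles.
    """
    # Ask for a single quantile (0.5); the result is a one-element list.
    quantiles = df.approxQuantile(column, [0.5], relative_error)
    # Guard against an empty result rather than raising IndexError.
    return quantiles[0] if quantiles else None
```

With a SparkSession named spark, something like
approx_median(spark.range(1, 1001), "id") should return a value near 500.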
On Sun, Apr 16, 2017 at 9:24 PM svjk24 wrote:
> Hello,
> Is there any interest in an efficient distributed algorithm for computing
> the median?
> A google search
Have you looked at t-digests?
Calculating percentiles (including medians) is something that is inherently
difficult/inefficient to do in a distributed system. T-digests provide a
useful probabilistic structure to allow you to compute any percentile with a
known (and tunable) margin of error.
http
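
To make the "mergeable summary" idea concrete, here is a rough sketch in
plain Python. It is not a t-digest (which uses adaptive centroids and does
not need the value range up front); it swaps in a much simpler fixed-bin
histogram, assuming the range [lo, hi) is known in advance. The class and
function names are made up for illustration. Each partition builds its own
summary, summaries merge by adding counts, and the estimate for in-range
data is off by roughly one bin width at most, so the bin count is the knob
that tunes the error.

```python
from functools import reduce


class HistogramSketch:
    """Fixed-width histogram over [lo, hi); mergeable by adding counts."""

    def __init__(self, lo, hi, bins=100):
        self.lo, self.hi, self.bins = lo, hi, bins
        self.counts = [0] * bins
        self.total = 0

    def add(self, x):
        width = (self.hi - self.lo) / self.bins
        i = int((x - self.lo) / width)
        # Clamp out-of-range values into the first or last bin.
        i = min(max(i, 0), self.bins - 1)
        self.counts[i] += 1
        self.total += 1

    def merge(self, other):
        # Summaries built on different partitions combine by adding counts.
        merged = HistogramSketch(self.lo, self.hi, self.bins)
        merged.counts = [a + b for a, b in zip(self.counts, other.counts)]
        merged.total = self.total + other.total
        return merged

    def quantile(self, q):
        if self.total == 0:
            return None
        # Walk the cumulative counts to the bin holding rank q * total,
        # then interpolate linearly inside that bin.
        width = (self.hi - self.lo) / self.bins
        target = q * self.total
        seen = 0
        for i, c in enumerate(self.counts):
            if c and seen + c >= target:
                frac = (target - seen) / c
                return self.lo + (i + frac) * width
            seen += c
        return self.hi


def summarize(partition, lo, hi, bins=100):
    """Build one sketch from the values of a single partition."""
    sketch = HistogramSketch(lo, hi, bins)
    for x in partition:
        sketch.add(x)
    return sketch


# Three "partitions" of the integers 0..999, summarized independently.
partitions = [range(0, 300), range(300, 650), range(650, 1000)]
sketches = [summarize(p, 0, 1000, bins=50) for p in partitions]
combined = reduce(HistogramSketch.merge, sketches)
median = combined.quantile(0.5)
```

In Spark the per-partition step would map onto something like mapPartitions
followed by a reduce over the sketches; only the small summaries, not the
data, cross the network.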