> You need to implement three functions: createCombiner,
> mergeValue, and mergeCombiners.
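The roles of those three functions can be sketched in plain Scala, without a cluster. This is a minimal illustration, not code from the thread: the per-key mean computation and all names are my own assumptions, and the local `groupBy`/`foldLeft` only simulates what Spark would do across partitions.

```scala
// Plain-Scala sketch of combineByKey's three callbacks, building a
// per-key (sum, count) combiner from which a mean can be derived.
// Names and the mean computation are illustrative, not from the thread.
object CombineByKeySketch {
  // createCombiner: turn the first value seen for a key into a combiner
  def createCombiner(v: Double): (Double, Int) = (v, 1)

  // mergeValue: fold a further value into an existing combiner
  // (called for values on the same partition)
  def mergeValue(c: (Double, Int), v: Double): (Double, Int) =
    (c._1 + v, c._2 + 1)

  // mergeCombiners: merge combiners built on different partitions
  def mergeCombiners(a: (Double, Int), b: (Double, Int)): (Double, Int) =
    (a._1 + b._1, a._2 + b._2)

  def main(args: Array[String]): Unit = {
    // Simulate what Spark does for each key, locally:
    val pairs = Seq("a" -> 1.0, "a" -> 3.0, "b" -> 5.0)
    val combined = pairs.groupBy(_._1).map { case (k, kvs) =>
      val vs = kvs.map(_._2)
      k -> vs.tail.foldLeft(createCombiner(vs.head))(mergeValue)
    }
    val means = combined.map { case (k, (sum, n)) => k -> sum / n }
    means.toSeq.sortBy(_._1).foreach(println) // prints (a,2.0) then (b,5.0)
  }
}
```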
>
> Hope this helps!
> Liquan
>
> On Sun, Sep 28, 2014 at 11:59 PM, David Rowe wrote:
>
Hi All,
After some hair pulling, I've reached the realisation that an operation I
am currently doing via:
myRDD.groupByKey.mapValues(func)
should be done more efficiently using aggregateByKey or combineByKey. Both
of these methods would do, and they seem very similar to me in terms of
their func
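Assuming `func` is something like a per-key mean (an assumption on my part — the thread never says what `func` does), the aggregateByKey variant can be sketched in plain Scala. It takes a zero value plus two merge functions, which play the roles of combineByKey's mergeValue and mergeCombiners; the actual Spark call is shown only in a comment since it needs a cluster.

```scala
// aggregateByKey-style callbacks for a per-key mean, sketched without Spark.
// The zero value replaces combineByKey's createCombiner; seqOp and combOp
// correspond to mergeValue and mergeCombiners respectively.
object AggregateByKeySketch {
  val zero: (Double, Int) = (0.0, 0)

  // seqOp: fold one value into the running (sum, count) accumulator
  def seqOp(acc: (Double, Int), v: Double): (Double, Int) =
    (acc._1 + v, acc._2 + 1)

  // combOp: merge accumulators built on different partitions
  def combOp(a: (Double, Int), b: (Double, Int)): (Double, Int) =
    (a._1 + b._1, a._2 + b._2)

  def main(args: Array[String]): Unit = {
    // With Spark this would be roughly:
    //   myRDD.aggregateByKey(zero)(seqOp, combOp)
    //        .mapValues { case (s, n) => s / n }
    // Locally, fold each key's values starting from the zero value:
    val pairs = Seq("a" -> 1.0, "a" -> 3.0, "b" -> 5.0)
    val means = pairs.groupBy(_._1).map { case (k, kvs) =>
      val (s, n) = kvs.map(_._2).foldLeft(zero)(seqOp)
      k -> s / n
    }
    means.toSeq.sortBy(_._1).foreach(println) // prints (a,2.0) then (b,5.0)
  }
}
```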
Hi Andrew,
I can't speak for Theodore, but I would find that incredibly useful.
Dave
On Wed, Sep 24, 2014 at 11:24 AM, Andrew Ash wrote:
> Hi Theodore,
>
> What do you mean by module diagram? A high-level architecture diagram of
> how the classes are organized into packages?
>
> Andrew
>
> On
> may be different from the previous code,
> so I guess some potential bugs may be introduced.
>
> Thanks
>
> Jerry
>
> *From:* David Rowe [mailto:davidr...@gmail.com]
> *Sent:* Monday, September 22, 2014 7:12 PM
> *To:* Andrew Ash
> *Cc:* Shao, Saisai;
> I'm seeing the same using Spark SQL on 1.1.0 -- I think there may have
> been a regression in 1.1 because the same SQL query works on the same
> cluster when back on 1.0.2
>
> Thanks!
> Andrew
>
> On Sun, Sep 21, 2014 at 5:15 AM, David Rowe wrote:
>
Hi,
I've seen this problem before, and I'm not convinced it's GC.
When Spark shuffles, it writes a lot of small files to store the data to be
sent to other executors (AFAICT). According to what I've read around the
place, the intention is that these files be stored in disk buffers, and
since sync()
Oh I see, I think you're trying to do something like (in SQL):
SELECT order, mean(price) FROM orders GROUP BY order
In this case, I'm not aware of a way to use the DoubleRDDFunctions, since
you have a single RDD of pairs where each pair is of type (KeyType,
Iterable[Double]).
It seems to me that
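For that (KeyType, Iterable[Double]) shape, the mean and standard deviation can be computed per key with a small helper. A plain-Scala sketch — `meanAndStdev` is a hypothetical helper of my own, not a Spark API, and it mirrors only the mean/stdev part of what Spark's StatCounter reports:

```scala
// Plain-Scala sketch: per-key mean and (population) standard deviation
// over a (KeyType, Iterable[Double]) shape. meanAndStdev is a hypothetical
// helper, not part of any Spark API; with an RDD you would apply it via
// myPairRdd.mapValues(meanAndStdev) instead of the local map below.
object PerKeyStats {
  def meanAndStdev(xs: Iterable[Double]): (Double, Double) = {
    val n = xs.size.toDouble
    val mean = xs.sum / n
    val variance = xs.map(x => (x - mean) * (x - mean)).sum / n
    (mean, math.sqrt(variance))
  }

  def main(args: Array[String]): Unit = {
    val grouped: Map[String, Iterable[Double]] =
      Map("a" -> Seq(1.0, 3.0), "b" -> Seq(5.0))
    val stats = grouped.map { case (k, vs) => k -> meanAndStdev(vs) }
    stats.toSeq.sortBy(_._1).foreach(println)
    // prints (a,(2.0,1.0)) then (b,(5.0,0.0))
  }
}
```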
I generally call values.stats, e.g.:
val stats = myPairRdd.values.stats
On Fri, Sep 12, 2014 at 4:46 PM, rzykov wrote:
> Is it possible to use DoubleRDDFunctions
> <
> https://spark.apache.org/docs/1.0.0/api/java/org/apache/spark/rdd/DoubleRDDFunctions.html
> >
> for calculating mean and std d