Sounds like you guys are on the right track. This is purely FYI because
I haven't seen it posted; I'm just responding to the line in the original
post saying that your data structure should fit in memory.
OK, two more disclaimers: "FWIW" and "maybe this is not relevant or
already covered". OK, here goes...
Sorry, I need to clarify:
When you say:
When the docs say "If your application is launched through Spark
submit, then the application jar is automatically distributed to all
worker nodes," it is actually saying that your executors get their
jars from the driver. This is true
Wouldn't Amazon Elastic IP do this for you?
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/elastic-ip-addresses-eip.html
On 12/28/2015 10:58 PM, Divya Gehlot wrote:
Hi,
I have HDP2.3.2 cluster installed in Amazon EC2.
I want to update the IP address of spark.driver.appUIAddress, which is
Hi,
What would be the best way to get percentiles from a Spark RDD? I can see that
JavaDoubleRDD and MLlib's MultivariateStatisticalSummary
(https://spark.apache.org/docs/latest/mllib-statistics.html) provide
mean() but not percentiles.
Thank you!
Horace
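
One way to think about the percentile question, as a minimal sketch rather than anything built into Spark: the nearest-rank percentile only needs the values sorted and a rank computed from the count. The plain-Python version below shows that arithmetic. The function name nearest_rank_percentile and the sample values are made up for illustration. On an RDD the same idea would presumably map onto sorting the values, attaching positions with zipWithIndex, and looking up the computed rank, but that is an assumption and untested here.

```python
import math


def nearest_rank_percentile(values, p):
    """Return the p-th percentile (0 < p <= 100) by the nearest-rank method.

    Hypothetical helper for illustration; not part of Spark or MLlib.
    """
    if not values:
        raise ValueError("values must be non-empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(values)
    # 1-based rank of the smallest value that covers p percent of the data.
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


# Made-up sample data.
sample = [40.0, 15.0, 50.0, 20.0, 35.0]
median = nearest_rank_percentile(sample, 50)
p90 = nearest_rank_percentile(sample, 90)
```

Nearest-rank always returns a value that actually occurs in the data. Other definitions interpolate between neighbours, so results can differ slightly from what other tools report.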