Hi everyone,
I want to run my Spark application with a javaagent; specifically, I want
to use New Relic with my application.
When I run spark-submit I must pass --conf
"spark.driver.extraJavaOptions=-javaagent="
My problem is that I can't specify the full path, as I run in cluster mode
and I don't kn
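One common workaround (a sketch, not verified against this setup; the
New Relic file paths, application class, and jar names below are all
placeholders) is to ship the agent with --files, which copies it into each
container's working directory, so a relative path works in cluster mode:
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --files /local/path/newrelic.jar,/local/path/newrelic.yml \
  --conf "spark.driver.extraJavaOptions=-javaagent:newrelic.jar" \
  --class com.example.MyApp \
  my-app.jar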
Hello everyone,
Spark has a dynamic resource allocation scheme where, when resources are
available, the cluster manager will automatically add executors to the
application.
Spark's default configuration is for executors to allocate the entire
worker node they are running on, but this is configurable; my que
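For reference, dynamic allocation is driven by settings along these lines
(a sketch; the numeric values are placeholders), and it is the executor
sizing settings that keep one executor from claiming a whole worker node:
spark.dynamicAllocation.enabled=true
spark.shuffle.service.enabled=true
spark.dynamicAllocation.minExecutors=1
spark.dynamicAllocation.maxExecutors=10
spark.executor.cores=2
spark.executor.memory=4g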
Hello everyone,
I have been looking into Spark pools and have two questions I would really
like to get answered.
1. Are pools available when Yarn is used as the resource manager?
2. Do pools define static partitioning of the cluster? I mean, if I define
two pools (using an XML file) with equal weig
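For context, this is the kind of pool definition the question refers to
(a minimal sketch following the fairscheduler.xml format from the Spark
docs; pool names and values are placeholders):
<?xml version="1.0"?>
<allocations>
  <pool name="poolA">
    <schedulingMode>FAIR</schedulingMode>
    <weight>1</weight>
    <minShare>2</minShare>
  </pool>
  <pool name="poolB">
    <schedulingMode>FAIR</schedulingMode>
    <weight>1</weight>
    <minShare>2</minShare>
  </pool>
</allocations>
Jobs are then assigned to a pool per thread, e.g.
spark.sparkContext().setLocalProperty("spark.scheduler.pool", "poolA");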
You can always use array+explode; I don't know if it's the most
elegant/optimal solution (would be happy to hear from the experts).
code example:
// create data (InternalData is a bean class from the original post)
Dataset<Row> test = spark.createDataFrame(Arrays.asList(
        new InternalData("bob", "b1", 1, 2, 3),
        new InternalData("alive", "c1", 3, 4, 6)),
    InternalData.class);
Hi everyone,
I have a cluster managed by Yarn that runs Spark jobs; the components were
installed using Ambari (2.6.3.0-235). I have 6 hosts, each with 6 cores. I
use the Fair scheduler.
I want Yarn to automatically add/remove executor cores, but no matter what
I do it doesn't work.
Relevant Spark confi
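A frequent cause of this (a guess, sketched under the assumption of a
stock YARN setup) is that dynamic allocation on YARN also requires the
Spark external shuffle service to be registered as an auxiliary service
in yarn-site.xml on every NodeManager:
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle,spark_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
  <value>org.apache.spark.network.yarn.YarnShuffleService</value>
</property>
together with spark.shuffle.service.enabled=true and
spark.dynamicAllocation.enabled=true on the Spark side.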
I am trying to use the Spark SQL built-in window function:
https://spark.apache.org/docs/2.0.2/api/java/org/apache/spark/sql/functions.html#window(org.apache.spark.sql.Column,%20java.lang.String)
I run it with step = 1 second and window = 3 minutes (a ratio of 180) and
it runs extremely slowly compared to other
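If it helps frame the question: with a 1-second slide and a 3-minute
window, every record belongs to ~180 overlapping windows, so Spark expands
the input to roughly 180x its size before aggregating, which by itself can
account for a large slowdown. A sketch of the call in question (the column
name "timestamp", the events dataset, and the count aggregation are
assumptions):
import static org.apache.spark.sql.functions.*;

// each event is copied into ~180 windows (3 minutes / 1 second)
Dataset<Row> counts = events
    .groupBy(window(col("timestamp"), "3 minutes", "1 second"))
    .count();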
I am writing a sliding-window analytic program and use the
functions.window function (
https://spark.apache.org/docs/2.1.0/api/java/org/apache/spark/sql/functions.html#window(org.apache.spark.sql.Column,%20java.lang.String,%20java.lang.String)
)
The code looks like this:
// the original message is truncated here; completed to the linked
// three-argument signature with placeholder arguments
Column slidingWindow = functions.window(col("timestamp"), "3 minutes", "1 second");
Hi everyone,
I am trying to improve the performance of data loading from disk. For
that, I have implemented my own RDD, and now I am trying to improve
performance further with predicate pushdown.
I have used many sources, including the documentation and
https://www.slideshare.net/databricks/yin-huai
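In case it helps frame the question: with the Data Source API (which is
what the linked slides cover), predicate pushdown is usually exposed
through PrunedFilteredScan rather than a bare RDD. A minimal sketch,
assuming Spark 2.x; DiskRelation and readRowsFromDisk are hypothetical:
import java.util.List;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.rdd.RDD;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SQLContext;
import org.apache.spark.sql.sources.BaseRelation;
import org.apache.spark.sql.sources.Filter;
import org.apache.spark.sql.sources.GreaterThan;
import org.apache.spark.sql.sources.PrunedFilteredScan;
import org.apache.spark.sql.types.StructType;

public class DiskRelation extends BaseRelation implements PrunedFilteredScan {
    private final SQLContext sqlContext;
    private final StructType schema;

    public DiskRelation(SQLContext sqlContext, StructType schema) {
        this.sqlContext = sqlContext;
        this.schema = schema;
    }

    @Override
    public SQLContext sqlContext() { return sqlContext; }

    @Override
    public StructType schema() { return schema; }

    // Spark passes in the columns it actually needs and the predicates it
    // could translate; unsupported filters can simply be ignored, since
    // Spark re-applies every filter to the rows this scan returns.
    @Override
    public RDD<Row> buildScan(String[] requiredColumns, Filter[] filters) {
        for (Filter f : filters) {
            if (f instanceof GreaterThan) {
                GreaterThan gt = (GreaterThan) f;
                // e.g. skip disk blocks whose maximum value for
                // gt.attribute() is <= gt.value()
            }
        }
        // hypothetical helper doing the column-pruned, filtered disk read
        List<Row> rows = readRowsFromDisk(requiredColumns, filters);
        return new JavaSparkContext(sqlContext.sparkContext())
                .parallelize(rows)
                .rdd();
    }

    private List<Row> readRowsFromDisk(String[] requiredColumns, Filter[] filters) {
        throw new UnsupportedOperationException("storage-specific; sketch only");
    }
}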