Folks,
Does anybody have production experience running dockerized Spark
applications on DC/OS, and can the Spark cluster run in any mode other than
Spark standalone?
What are the major differences between running Spark with the Mesos cluster
manager vs. running Spark as a dockerized container under DC/OS?
Any help on this?
On Thu, Aug 31, 2017 at 10:30 AM, ayan guha wrote:
> Hi
>
> Want to understand a basic issue. Here is my code:
>
> def createStreamingContext(sparkCheckpointDir: String, batchDuration: Int) = {
>
> val ssc = new StreamingContext(spark.sparkContext, Seconds(batchDuration))
>
>
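For context, a factory function like the one quoted above is usually paired with StreamingContext.getOrCreate so the context is rebuilt from the checkpoint directory on restart. A minimal sketch of that pattern, assuming a SparkSession named `spark` is already in scope (the checkpoint path and batch duration are placeholders):

```scala
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Sketch only: assumes a SparkSession `spark` already exists.
def createStreamingContext(sparkCheckpointDir: String, batchDuration: Int): StreamingContext = {
  val ssc = new StreamingContext(spark.sparkContext, Seconds(batchDuration))
  ssc.checkpoint(sparkCheckpointDir)  // enable checkpointing for stateful operations
  ssc
}

val checkpointDir = "/tmp/spark-checkpoint"  // placeholder path

// getOrCreate recovers the context from the checkpoint directory if one
// exists; otherwise it calls the factory to build a fresh context.
val ssc = StreamingContext.getOrCreate(
  checkpointDir,
  () => createStreamingContext(checkpointDir, 10))
```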
Spark 2.1
My settings are: running Spark 2.1 on a 3-node YARN cluster with 160 GB. Dynamic
allocation is turned on, with spark.executor.memory=6G and spark.executor.cores=6.
First, I read the Hive tables orders (329 MB) and lineitems (1.43 GB) and do a
left outer join. Next, I apply 7 different filter conditions
I am getting the following in the logs:
Sink class org.apache.spark.metrics.sink.CloudwatchSink cannot be instantiated,
due to a ClassNotFoundException for CloudwatchSink. I am running this on EMR 5.7.0.
Does anyone have experience adding this sink to an EMR cluster?
Thanks,
Alex
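A ClassNotFoundException for a metrics sink usually means the jar containing the sink class is not on the JVM classpath when the metrics system starts. A hedged sketch of the relevant configuration (the jar path is a placeholder; the class name is taken from the log line above):

```
# metrics.properties (must be distributed to the cluster nodes)
*.sink.cloudwatch.class=org.apache.spark.metrics.sink.CloudwatchSink

# spark-defaults.conf -- the sink jar has to be on the classpath at JVM
# startup; --jars alone can be too late for the metrics system.
spark.driver.extraClassPath=/path/to/cloudwatch-sink.jar
spark.executor.extraClassPath=/path/to/cloudwatch-sink.jar
```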
I think something like the state store can be used to keep the intermediate
data. For aggregations, the engine keeps processing batches of data and
updates the results in the state store (or something like it), and when a
trigger begins, the engine just fetches the current result from the state
store and outputs it to
Hello All
I am observing some strange results with the aggregateByKey API, which is
implemented with combineByKey. I am not sure if this is by design or a bug.
I created this toy example, but the same problem can be observed on large
datasets as well:
case class ABC(key: Int, c1: Int, c2: Int)
case class AB
Hello,
We are running a few Star Schema Benchmark queries on Parquet performance in
Spark. We have set up the functions below (shown at the bottom) for measuring
runtimes. This is a simplified version of Spark's benchmark API. The
benchmarks are called with 1 warmup and 10 runs. SSB link:
https://www.cs.umb
Hey, did you manage to solve the problem?
I have exactly the same problem and I am not able to solve it.
Thank you!
This has indeed been caused by the network backend, which dropped several
outgoing packets. I'm not sure why this wasn't "caught" by TCP.
We ended up setting send_queue_size=256 and recv_queue_size=512 for
ib_ipoib, and krcvqs=4 for hfi1. We also updated our OmniPath switch
firmware to the current v
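On RHEL-style systems, module parameters like these are typically made persistent via a modprobe.d file — a sketch, assuming that layout (the filename is an assumption):

```
# /etc/modprobe.d/opa-tuning.conf  (sketch; filename is an assumption)
options ib_ipoib send_queue_size=256 recv_queue_size=512
options hfi1 krcvqs=4
```

The parameters take effect the next time the modules are loaded (or after a reboot).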