Folks,
Does anybody have production experience running dockerized Spark
applications on DC/OS, and can the Spark cluster run in any mode other than
Spark standalone?
What are the major differences between running Spark with the Mesos cluster
manager vs. running Spark as a dockerized container under DC/OS?
Any help on this?
On Thu, Aug 31, 2017 at 10:30 AM, ayan guha wrote:
> Hi
>
> Want to understand a basic issue. Here is my code:
>
> def createStreamingContext(sparkCheckpointDir: String, batchDuration: Int) = {
>
> val ssc = new StreamingContext(spark.sparkContext, Seconds(batchDuration))
>
>
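For context, a factory function like the one quoted above is usually paired with StreamingContext.getOrCreate so the context is rebuilt from the checkpoint directory on restart. A minimal sketch of that pattern, assuming a SparkSession named `spark` is already in scope (the checkpoint path and batch duration are placeholders):

```scala
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Sketch only: assumes a SparkSession `spark` already exists.
def createStreamingContext(sparkCheckpointDir: String, batchDuration: Int): StreamingContext = {
  val ssc = new StreamingContext(spark.sparkContext, Seconds(batchDuration))
  ssc.checkpoint(sparkCheckpointDir)  // enable checkpointing for stateful operations
  ssc
}

val checkpointDir = "/tmp/spark-checkpoint"  // placeholder path

// getOrCreate recovers the context from the checkpoint directory if one
// exists; otherwise it calls the factory to build a fresh context.
val ssc = StreamingContext.getOrCreate(
  checkpointDir,
  () => createStreamingContext(checkpointDir, 10))
```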
Spark 2.1
My settings are: running Spark 2.1 on a 3-node YARN cluster with 160 GB. Dynamic
allocation is turned on, with spark.executor.memory=6G and spark.executor.cores=6.
First, I read the Hive tables orders (329 MB) and lineitems (1.43 GB) and do a
left outer join. Next, I apply 7 different filter conditions
I am getting the following in the logs:
Sink class org.apache.spark.metrics.sink.CloudwatchSink cannot be instantiated,
due to a ClassNotFoundException for CloudwatchSink. I am running this on EMR 5.7.0.
Does anyone have experience adding this sink to an EMR cluster?
Thanks,
Alex
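A ClassNotFoundException for a metrics sink usually means the jar containing the sink class is not on the JVM classpath when the metrics system starts. A hedged sketch of the relevant configuration (the jar path is a placeholder; the class name is taken from the log line above):

```
# metrics.properties (must be distributed to the cluster nodes)
*.sink.cloudwatch.class=org.apache.spark.metrics.sink.CloudwatchSink

# spark-defaults.conf -- the sink jar has to be on the classpath at JVM
# startup; --jars alone can be too late for the metrics system.
spark.driver.extraClassPath=/path/to/cloudwatch-sink.jar
spark.executor.extraClassPath=/path/to/cloudwatch-sink.jar
```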
I think something like the state store can be used to keep the intermediate
data. For aggregations, the engine keeps processing batches of data and
updates the results in the state store (or something like it), and when a
trigger begins, the engine just fetches the current result from the state
store and outputs it to
Hello All
I am observing some strange results with the aggregateByKey API, which is
implemented with combineByKey. I am not sure if this is by design or a bug.
I created this toy example, but the same problem can be observed on large
datasets as well:
case class ABC(key: Int, c1: Int, c2: Int)
case class AB
Hello,
We are running a few Star Schema Benchmark queries on Parquet performance in
Spark. We have set up the functions below (shown at the bottom) for measuring
runtimes. This is a simplified version of Spark's benchmark API. The
benchmarks are called with 1 warmup and 10 runs. SSB link:
https://www.cs.umb
Hey, did you manage to solve the problem?
I have exactly the same problem and I am not able to solve it.
Thank you!
This has indeed been caused by the network backend, which dropped several
outgoing packets. I'm not sure why this wasn't "caught" by TCP.
We ended up setting send_queue_size=256 and recv_queue_size=512 for
ib_ipoib, and krcvqs=4 for hfi1. We also updated our OmniPath switch
firmware to the current v
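On RHEL-style systems, module parameters like these are typically made persistent via a modprobe.d file — a sketch, assuming that layout (the filename is an assumption):

```
# /etc/modprobe.d/opa-tuning.conf  (sketch; filename is an assumption)
options ib_ipoib send_queue_size=256 recv_queue_size=512
options hfi1 krcvqs=4
```

The parameters take effect the next time the modules are loaded (or after a reboot).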