Questions on HDFS with Spark

2017-04-18 Thread kant kodali
Hi All, I've been using Spark standalone for a while and now it's time for me to install HDFS. If a Spark worker goes down, the Spark master restarts the worker. Similarly, if a datanode process goes down, it looks like it is not the namenode's job to restart the datanode, and if so, 1) should I use

How to fix error "Failed to get records for..." after polling for 120000

2017-04-18 Thread Dmitry Goldenberg
Hi, I was wondering if folks have any ideas or recommendations for how to fix this error (full stack trace included below). We're on Kafka 0.10.0.0 and spark_streaming_2.11 v. 2.0.0. We've tried a few things as suggested in these sources: -

Spark 2.1.0 hanging while writing a table in HDFS in parquet format

2017-04-18 Thread gae123
I have put more details and stack traces here: http://stackoverflow.com/questions/43462638/emr-spark-2-1-0-process-get-stuck-at-at-org-apache-spark-unsafe-platform-copymem Any suggestions would be very much appreciated.

An Apache Spark metric sink for Kafka

2017-04-18 Thread Erik Erlandson
I wrote up a simple metric sink for Spark that publishes metrics to a Kafka broker. Each metric is published as a message (in JSON format), with the metric name as the message key. https://github.com/erikerlandson/spark-kafka-sink Build with "(x)sbt assembly" and make sure the resulting jar

CfP - VHPC at ISC extension - Papers due May 2

2017-04-18 Thread VHPC 17
CALL FOR PAPERS 12th Workshop on Virtualization in High-Performance Cloud Computing (VHPC '17) held in conjunction with the International Supercomputing Conference - High Performance, June 18-22, 2017, Frankfurt, Germany.

In an executor, are the Python worker memory and the MemoryOverhead overlapping?

2017-04-18 Thread o_rayer
Hi there, I'm using PySpark on a Hadoop cluster and I could not find the info about the executor memory model with Python. I know that the Python memory (spark.python.worker.memory) does not overlap the JVM Heap (spark.executor.memory). However, does the Python Memory overlap the Executor

Running 100 GB at standalone node

2017-04-18 Thread Vivek Mishra
Hi, I am running an application on Spark v1.6.2 (in standalone mode) over 100 GB of data. Given below are my configurations: Job configuration spark.driver.memory=5g spark.executor.memory=5g spark.cores.max=4 spark-env.sh export SPARK_WORKER_INSTANCES=3; export SPARK_WORKER_MEMORY=5g;