Re: Continuous warning while consuming using new kafka-spark010 API

2016-11-04 Thread vonnagy
Nitin, I am getting similar issues using Spark 2.0.1 and Kafka 0.10. I have two jobs, one that uses a Kafka stream and one that uses just the KafkaRDD. With the KafkaRDD, I continually get "Failed to get records" errors. I have adjusted the polling with `spark.streaming.kafka.consumer.poll.ms`
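For reference, a minimal sketch of where that setting is typically applied; the app name, timeout value, and batch interval below are placeholders, not values from the original job:

```scala
// Sketch only: raising the kafka-0-10 consumer poll timeout on the SparkConf
// used by the job (applies to both the stream and KafkaRDD usage).
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf()
  .setAppName("kafka-poll-example")                        // placeholder app name
  .set("spark.streaming.kafka.consumer.poll.ms", "10000")  // example timeout in ms
val ssc = new StreamingContext(conf, Seconds(10))          // placeholder batch interval
```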

Memory leak warnings in Spark 2.0.1

2016-10-12 Thread vonnagy
I am getting excessive memory leak warnings when running multiple mappings and aggregations using Datasets. Is there anything I should be looking for to resolve this, or is this a known issue? WARN [Executor task launch worker-0] org.apache.spark.memory.TaskMemoryManager - leak 16.3 MB memory
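For illustration only, a representative pattern of chained mappings and aggregations on a Dataset of the kind described; the case class, column names, and input path are hypothetical and not taken from the original job:

```scala
// Hedged illustration of a Dataset map + aggregation pipeline; names are placeholders.
import org.apache.spark.sql.SparkSession

case class Event(userId: String, value: Long)

val spark = SparkSession.builder.appName("dataset-agg-example").getOrCreate()
import spark.implicits._

val events = spark.read.parquet("/path/to/events").as[Event]  // hypothetical input
val totals = events
  .map(e => (e.userId, e.value * 2))     // example mapping step
  .toDF("userId", "doubled")
  .groupBy("userId")
  .agg(Map("doubled" -> "sum"))          // example aggregation step
totals.show()
```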

Submit job with driver options in Mesos Cluster mode

2016-10-06 Thread vonnagy
I am trying to submit a job to Spark running in a Mesos cluster. We need to pass custom Java options to the driver and executor for configuration, but the driver task never includes the options. Here is an example submit. GC_OPTS="-XX:+UseConcMarkSweepGC -verbose:gc
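As a point of comparison (not the poster's actual spark-submit command), a sketch of passing the same JVM options through `spark.driver.extraJavaOptions` / `spark.executor.extraJavaOptions` using the programmatic SparkLauncher API; the jar path, main class, and Mesos master URL are placeholders:

```scala
// Hedged sketch: programmatic submission with driver/executor JVM options.
import org.apache.spark.launcher.SparkLauncher

object SubmitWithDriverOpts {
  def main(args: Array[String]): Unit = {
    val gcOpts = "-XX:+UseConcMarkSweepGC -verbose:gc"
    val process = new SparkLauncher()
      .setAppResource("/path/to/app.jar")            // placeholder jar
      .setMainClass("com.example.Main")              // placeholder class
      .setMaster("mesos://zk://host:2181/mesos")     // placeholder Mesos master
      .setDeployMode("cluster")
      .setConf(SparkLauncher.DRIVER_EXTRA_JAVA_OPTIONS, gcOpts)
      .setConf(SparkLauncher.EXECUTOR_EXTRA_JAVA_OPTIONS, gcOpts)
      .launch()
    process.waitFor()
  }
}
```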

Committing Kafka offsets when using DirectKafkaInputDStream

2016-09-02 Thread vonnagy
I have upgraded to Spark 2.0 and am experimenting with Kafka 0.10.0. I have a stream from which I extract the data, and I would like to update the Kafka offsets as each partition is handled. With Spark 1.6 or Spark 2.0 and Kafka 0.8.2 I was able to update the offsets, but now there seems to be no way to
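For context, the kafka-0-10 integration exposes offset commits through `HasOffsetRanges` and `CanCommitOffsets`. A sketch of that pattern follows; the broker, topic, group id, and the surrounding `ssc` are placeholders or assumed from context:

```scala
// Sketch of committing offsets back to Kafka with the kafka-0-10 direct stream.
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.streaming.kafka010._
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

val kafkaParams = Map[String, Object](
  "bootstrap.servers" -> "broker:9092",            // placeholder broker
  "key.deserializer" -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id" -> "example-group",                   // placeholder group
  "enable.auto.commit" -> (false: java.lang.Boolean)
)

val stream = KafkaUtils.createDirectStream[String, String](
  ssc, PreferConsistent, Subscribe[String, String](Seq("events"), kafkaParams))

stream.foreachRDD { rdd =>
  // Capture the offset ranges before any shuffle changes the partitioning.
  val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
  // ... process the batch here ...
  // Commit the offsets asynchronously once the batch has been handled.
  stream.asInstanceOf[CanCommitOffsets].commitAsync(offsetRanges)
}
```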

Partitioned parquet files missing partition columns from data

2016-01-15 Thread vonnagy
When writing a DataFrame into partitioned parquet files, the partition columns are removed from the data. For example: df.write.mode(SaveMode.Append).partitionBy("year", "month", "day", "hour").parquet(somePath) This creates a directory structure like: events |-> 2016 |-> 1 |-> 15
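A sketch of the round trip for reference; `df`, `somePath`, and `spark` are assumed from the surrounding context. Spark encodes the partition columns in the directory names and restores them when the base path is read back:

```scala
// Write partitioned parquet; the partition columns move into the directory layout.
import org.apache.spark.sql.SaveMode

df.write
  .mode(SaveMode.Append)
  .partitionBy("year", "month", "day", "hour")
  .parquet(somePath)

// Reading the base path rediscovers the partition columns from the directories.
val restored = spark.read.parquet(somePath)
restored.printSchema()   // includes year, month, day, hour again
```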

Re: Calling stop on StreamingContext locks up

2015-11-08 Thread vonnagy
Hi Ted, your fix addresses the issue for me. Thanks again for your help; I saw the PR you submitted to master. Ivan

Calling stop on StreamingContext locks up

2015-11-07 Thread vonnagy
If I have a streaming job (Spark 1.5.1) and attempt to stop the stream after the first batch, the system locks up and never completes. The pseudo code below shows that, after the batch-complete notification fires, the stream is stopped. I have traced the lockup to the call `listener.stop()` in
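A sketch of the pattern described (class and variable names are placeholders, not the poster's code): a StreamingListener that stops the context after the first completed batch. Calling `ssc.stop()` directly inside the callback can lock up, since stop waits on the listener machinery that is still executing the callback, so the sketch stops from a separate thread:

```scala
// Hedged sketch: stop the StreamingContext once the first batch completes,
// doing the stop on a separate thread rather than inside the listener callback.
import org.apache.spark.streaming.StreamingContext
import org.apache.spark.streaming.scheduler.{StreamingListener, StreamingListenerBatchCompleted}

class StopAfterFirstBatch(ssc: StreamingContext) extends StreamingListener {
  override def onBatchCompleted(batch: StreamingListenerBatchCompleted): Unit = {
    new Thread("stop-streaming-context") {
      override def run(): Unit =
        ssc.stop(stopSparkContext = false, stopGracefully = true)
    }.start()
  }
}

// Usage: ssc.addStreamingListener(new StopAfterFirstBatch(ssc))
```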