Did you enable "spark.speculation"?
On Tue, Jan 5, 2016 at 9:14 AM, Prasad Ravilla wrote:
> I am using Spark 1.5.2.
>
> I am not using Dynamic allocation.
>
> Thanks,
> Prasad.
>
>
>
>
> On 1/5/16, 3:24 AM, "Ted Yu" wrote:
>
> >Which version of Spark do
Hey Rachana,
There are actually two jobs in your code: `rdd.isEmpty` and
`rdd.saveAsTextFile`. Since you don't cache or checkpoint this RDD, it will
execute your map function twice for each record.
You can move "accum.add(1)" to "rdd.saveAsTextFile" like this:
JavaDStream lines =
You can use "_", e.g.,
sparkConf.registerKryoClasses(Array(classOf[scala.Tuple3[_, _, _]]))
Best Regards,
Shixiong(Ryan) Zhu
Software Engineer
Databricks Inc.
shixi...@databricks.com
databricks.com
On Wed, Dec 30, 2015 at 10:16 AM, Russ <russ.br..
You can use "reduceByKeyAndWindow", e.g.,
val lines = ssc.socketTextStream("localhost", 9999) // 9999 is a placeholder port
val words = lines.flatMap(_.split(" "))
val wordCounts = words.map(x => (x, 1))
  .reduceByKeyAndWindow((x: Int, y: Int) => x + y, Seconds(60), Seconds(60))
wordCounts.print()
On Wed, Dec
Could you disable `spark.kryo.registrationRequired`? Some classes may not
be registered but they work well with Kryo's default serializer.
On Fri, Jan 8, 2016 at 8:58 AM, Ted Yu wrote:
> bq. try adding scala.collection.mutable.WrappedArray
>
> But the hint said registering
Could you use "coalesce" to reduce the number of partitions?
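For example, a rough sketch (the RDD name and partition count are placeholders):

val smaller = bigRdd.coalesce(1000)                  // no shuffle, just merges partitions
val balanced = bigRdd.coalesce(1000, shuffle = true) // shuffles, but gives evenly sized partitions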
Shixiong Zhu
On Mon, Jan 11, 2016 at 12:21 AM, Gavin Yue wrote:
> Here is more info.
>
> The job stuck at:
> INFO cluster.YarnScheduler: Adding task set 1.0 with 79212 tasks
>
> Then got the error:
> Caused
Hey Terry,
That's expected. If you want to only output (1, 3), you can use
"reduceByKey" before "mapWithState" like this:
dstream.reduceByKey(_ + _).mapWithState(spec)
On Fri, Jan 15, 2016 at 1:21 AM, Terry Hoo wrote:
> Hi,
> I am doing a simple test with mapWithState,
> allKafkaWindowData =
> this.sparkReceiverDStream.createReceiverDStream(jsc,this.streamingConf.getWindowDuration(),
>
> this.streamingConf.getSlideDuration());
>
>
>
> this.businessProcess(allKafkaWindowData);
>
> this.sleep();
>
>jsc.start();
&
Hey Marco,
Since the code in the Future runs asynchronously, you cannot call
"sparkContext.stop" at the end of "fetch", because the code in the Future may
not have finished.
However, the exception seems weird. Do you have a simple reproducer?
On Mon, Jan 18, 2016 at 9:13 AM, Ted Yu
mapWithState uses HashPartitioner by default. You can use
"StateSpec.partitioner" to set your custom partitioner.
On Sun, Jan 17, 2016 at 11:00 AM, Lin Zhao wrote:
> When the state is passed to the task that handles a mapWithState for a
> particular key, if the key is
Hey, did you mean that the scheduling delay timeline is incorrect because
it's too short and some values are missing? A batch won't have a scheduling
delay until it starts to run. In your example, a lot of batches are still
waiting, so they don't have a scheduling delay yet.
On Sun, Jan 17, 2016 at
Could you change MEMORY_ONLY_SER to MEMORY_AND_DISK_SER_2 and see if this
still happens? It may be because you don't have enough memory to cache the
events.
On Thu, Jan 14, 2016 at 4:06 PM, Lin Zhao wrote:
> Hi,
>
> I'm testing spark streaming with actor receiver. The actor
Yeah, it's hard-coded as "0.0.0.0". Could you send a PR to add a
configuration for it?
On Thu, Jan 14, 2016 at 2:51 PM, Zee Chen wrote:
> Hi, what is the easiest way to configure the Spark webui to bind to
> localhost or 127.0.0.1? I intend to use this with ssh socks proxy to
>
:31:31 INFO storage.BlockManager: Removing RDD 44
> 16/01/15 00:31:31 INFO storage.BlockManager: Removing RDD 43
> 16/01/15 00:31:31 INFO storage.BlockManager: Removing RDD 42
> 16/01/15 00:31:31 INFO storage.BlockManager: Removing RDD 41
> 16/01/15 00:31:31 INFO storage.BlockMana
Could you try to use "Kryo.setDefaultSerializer" like this:
class YourKryoRegistrator extends KryoRegistrator {
  override def registerClasses(kryo: Kryo) {
    kryo.setDefaultSerializer(classOf[com.esotericsoftware.kryo.serializers.JavaSerializer])
  }
}
On Thu, Jan 14, 2016 at 12:54 PM, Durgesh
Could you show your code? Did you use `StreamingContext.awaitTermination`?
If so, it will return if any exception happens.
On Wed, Jan 13, 2016 at 11:47 PM, Triones,Deng(vip.com) <
triones.d...@vipshop.com> wrote:
> What’s more, I am running a 7*24 hours job , so I won’t call System.exit()
> by
oint(RDD.scala:300)
>
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
>
> at
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
>
> at org.a
You can't. The number of cores must be greater than the number of receivers.
On Wed, Feb 10, 2016 at 2:34 AM, ajay garg wrote:
> Hi All,
> I am running 3 executors in my spark streaming application with 3
> cores per executors. I have written my custom receiver
tion.Iterator$$anon$
> org.apache.spark.InterruptibleIterator#
> scala.collection.IndexedSeqLike$Elements#
> scala.collection.mutable.ArrayOps$ofRef#
> java.lang.Object[]#
>
>
>
>
> On Thu, Feb 4, 2016 at 7:14 PM, Shixiong(Ryan) Zhu <
> shixi...@databricks.com> wrot
Hey Udo,
mapWithState usually uses much more memory than updateStateByKey since it
caches the states in memory.
However, from your description, it looks like BlockGenerator cannot push data
into BlockManager; there may be something wrong in BlockGenerator. Could you
share the top 50 objects in the heap
Could you do a thread dump in the executor that runs the Kinesis receiver
and post it? It would be great if you could provide the executor log as well.
On Tue, Feb 9, 2016 at 3:14 PM, Roberto Coluccio wrote:
> Hello,
>
> can anybody kindly help me out a little bit
Are you using a custom input dstream? If so, you can make the `compute`
method return None to skip a batch.
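For example, a rough sketch of such a dstream ("fetchBatch" is a hypothetical method
that returns the records for the current interval):

import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{StreamingContext, Time}
import org.apache.spark.streaming.dstream.InputDStream

class SkippingInputDStream(ssc_ : StreamingContext) extends InputDStream[String](ssc_) {
  override def start(): Unit = {}
  override def stop(): Unit = {}

  override def compute(validTime: Time): Option[RDD[String]] = {
    val records = fetchBatch(validTime) // hypothetical: fetch this interval's data
    if (records.isEmpty) None           // returning None skips the batch
    else Some(ssc_.sparkContext.parallelize(records))
  }

  private def fetchBatch(time: Time): Seq[String] = Seq.empty // placeholder
}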
On Thu, Feb 11, 2016 at 1:03 PM, Sebastian Piu
wrote:
> I was wondering if there is there any way to skip batches with zero events
> when streaming?
> By skip I
is behaviour
>
> Thanks!
> On 11 Feb 2016 9:07 p.m., "Shixiong(Ryan) Zhu" <shixi...@databricks.com>
> wrote:
>
>> Are you using a custom input dstream? If so, you can make the `compute`
>> method return None to skip a batch.
>>
>> On Thu, Feb 11
Thanks for reporting it. I will take a look.
On Thu, Feb 4, 2016 at 6:56 AM, Yuval.Itzchakov wrote:
> Hi,
> I've been playing with the experimental PairDStreamFunctions.mapWithState
> feature and I seem to have stumbled across a bug, and was wondering if
> anyone else has
Do you just want to write some unit tests? If so, you can use "queueStream"
to create a DStream from a queue of RDDs. However, because it doesn't
support metadata checkpointing, it's better to only use it in unit tests.
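For example, a rough sketch of a test (assuming an existing SparkContext "sc"):

import scala.collection.mutable
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Seconds, StreamingContext}

val ssc = new StreamingContext(sc, Seconds(1))
val rddQueue = new mutable.Queue[RDD[Int]]()

ssc.queueStream(rddQueue).map(_ * 2).print()
ssc.start()

// push one RDD per batch interval to simulate the input
for (_ <- 1 to 3) {
  rddQueue += sc.parallelize(1 to 100)
  Thread.sleep(1000)
}
ssc.stop()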
On Fri, Jan 29, 2016 at 7:35 AM, Sateesh Karuturi <
Hey Jesse,
Could you provide the operators you are using?
As for the heap dump, it may not be a real memory leak. Since batches started
to queue up, the memory usage is expected to increase.
On Thu, Jan 28, 2016 at 11:54 AM, Ted Yu wrote:
> bq. The total size by class B is 3GB in 1.5.1
fileStream has a parameter "newFilesOnly". By default it's true, which means
only new files are processed and existing files in the directory are ignored.
So you need to ***move*** the files into the directory; otherwise existing
files will be ignored.
You can also set "newFilesOnly" to false. Then in the
I guess he used client mode and the local Spark version is 1.5.2 but the
standalone Spark version is 1.5.1. In other words, he used a 1.5.2 driver
to talk to 1.5.1 executors.
On Mon, Feb 1, 2016 at 2:08 PM, Holden Karau wrote:
> So I'm a little confused to exactly how
hat all threads inside the worker clean up the
>> ThreadLocal once they are done with processing this task?
>>
>> Thanks
>> NB
>>
>>
>> On Fri, Jan 29, 2016 at 1:00 PM, Shixiong(Ryan) Zhu <
>> shixi...@databricks.com> wrote:
>>
>>> Sp
anyways.
>
> Thanks
> NB
>
>
> On Fri, Jan 29, 2016 at 5:09 PM, Shixiong(Ryan) Zhu <
> shixi...@databricks.com> wrote:
>
>> It looks weird. Why don't you just pass "new SomeClass()" to "somefunc"?
>> You don't need to use ThreadLocal if ther
1. To remove a state, you need to call "state.remove()". Returning None from
the function only means nothing is emitted as the DStream's output; the state
won't be removed unless you call "state.remove()".
2. For NoSuchElementException, here is the doc for "State.get":
/**
* Get
It depends. The default Kryo serializer cannot handle all cases. If you
encounter any issues, you can follow the Kryo doc to set up a custom
serializer: https://github.com/EsotericSoftware/kryo/blob/master/README.md
On Wed, Jan 27, 2016 at 3:13 AM, amit tewari wrote:
> This
It's a known issue. See https://issues.apache.org/jira/browse/SPARK-10719
On Thu, Jan 28, 2016 at 5:44 PM, Khusro Siddiqui wrote:
> It is happening on random executors on random nodes. Not on any specific
> node everytime.
> Or not happening at all
>
> On Thu, Jan 28, 2016 at
Of course. If you use a ThreadLocal in a long-lived thread and forget to
remove it, it's definitely a memory leak.
On Thu, Jan 28, 2016 at 9:31 PM, N B wrote:
> Hello,
>
> Does anyone know if there are any potential pitfalls associated with using
> ThreadLocal variables in
> On Fri, Jan 29, 2016 at 12:12 PM, Shixiong(Ryan) Zhu <
> shixi...@databricks.com> wrote:
>
>> Of course. If you use a ThreadLocal in a long-lived thread and forget to
>> remove it, it's definitely a memory leak.
>>
>> On Thu, Jan 28, 2016 at 9:31 PM,
What's the error info reported by Streaming? And could you use "telnet" to
test if the network is normal?
On Mon, Feb 22, 2016 at 6:59 AM, Vinti Maheshwari
wrote:
> For reference, my program:
>
> def main(args: Array[String]): Unit = {
> val conf = new
Hey Abhi,
Could you post how you use mapWithState? By default, it should do
checkpointing every 10 batches.
However, there is a known issue that prevents mapWithState from
checkpointing in some special cases:
https://issues.apache.org/jira/browse/SPARK-6847
On Mon, Feb 22, 2016 at 5:47 AM,
=> rdd.count())
On Mon, Feb 22, 2016 at 12:25 PM, Ted Yu <yuzhih...@gmail.com> wrote:
> Fix for SPARK-6847 is not in branch-1.6
>
> Should the fix be ported to branch-1.6 ?
>
> Thanks
>
> On Feb 22, 2016, at 11:55 AM, Shixiong(Ryan) Zhu <shixi...@databricks.com>
urrently I am using reducebykeyandwindow without the inverse function and
> I am able to get the correct data. But the issue that might arise is when I
> have to restart my application from checkpoint and it repartitions and
> computes the previous 120 partitions, which delays the incoming batch
This is because the Snappy library cannot load the native library. Did you
forget to install the snappy native library in your new machines?
On Fri, Feb 26, 2016 at 11:05 PM, Abhishek Anand
wrote:
> Any insights on this ?
>
> On Fri, Feb 26, 2016 at 1:21 PM, Abhishek
;
> Thanks
> Abhi
>
>
> On Tue, Feb 23, 2016 at 2:36 AM, Shixiong(Ryan) Zhu <
> shixi...@databricks.com> wrote:
>
>> Hey Abhi,
>>
>> Using reducebykeyandwindow and mapWithState will trigger the bug
>> in SPARK-6847. Here is a workaround to trigger c
Please use Java 7 instead.
On Thu, Feb 25, 2016 at 1:54 PM, Marco Mistroni wrote:
> Hello all
> could anyone help?
> i have tried to install spark 1.6.0 on ubuntu, but the installation failed
> Here are my steps
>
> 1. download spark (successful)
>
> 31 wget
You can use `DStream.map` to transform objects to anything you want.
On Thu, Feb 25, 2016 at 11:06 AM, Mohammad Tariq wrote:
> Hi group,
>
> I have just started working with confluent platform and spark streaming,
> and was wondering if it is possible to access individual
Do you mean you cannot access Master UI after your application completes?
Could you check the master log?
On Mon, Feb 29, 2016 at 3:48 PM, Sumona Routh wrote:
> Hi there,
> I've been doing some performance tuning of our Spark application, which is
> using Spark 1.2.1
ype of operation that we can perform
> inside mapWithState ?
>
> Really need to resolve this one as currently if my application is
> restarted from checkpoint it has to repartition 120 previous stages which
> takes hell lot of time.
>
> Thanks !!
> Abhi
>
> On Mon, Feb
Did you move the file into "hdfs://helmhdfs/user/patcharee/cerdata/", or
write into it directly? `textFileStream` requires that files be written to
the monitored directory by "moving" them from another location within the
same file system.
On Mon, Jan 25, 2016 at 6:30 AM, patcharee
Hey Andrey,
`ConstantInputDStream` doesn't support checkpointing because it contains an
RDD field, so it cannot resume from checkpoints.
On Mon, Jan 25, 2016 at 1:12 PM, Andrey Yegorov
wrote:
> Hi,
>
> I am new to spark (and scala) and hope someone can help me with the issue
> I
You need to define a create function and use StreamingContext.getOrCreate.
See the example here:
http://spark.apache.org/docs/latest/streaming-programming-guide.html#how-to-configure-checkpointing
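For example, a rough sketch following that example (the checkpoint path is a placeholder):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val checkpointDir = "hdfs:///tmp/streaming-checkpoint"

def createContext(): StreamingContext = {
  val conf = new SparkConf().setAppName("MyApp")
  val ssc = new StreamingContext(conf, Seconds(10))
  ssc.checkpoint(checkpointDir)
  // set up the DStream operations here, before returning ssc
  ssc
}

// recover from the checkpoint if it exists, otherwise call createContext()
val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
ssc.start()
ssc.awaitTermination()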
On Thu, Jan 21, 2016 at 3:32 AM, Patrick McGloin
wrote:
> Hi all,
>
>
Could you share your log?
On Wed, Jan 20, 2016 at 7:55 AM, Brian London
wrote:
> I'm running a streaming job that has two calls to updateStateByKey. When
> run in standalone mode both calls to updateStateByKey behave as expected.
> When run on a cluster, however, it
Could you share your log?
On Wed, Jan 20, 2016 at 5:37 AM, Siddharth Ubale <
siddharth.ub...@syncoms.com> wrote:
>
>
> Hi,
>
>
>
> I am running a Spark Job on the yarn cluster.
>
> The spark job is a spark streaming application which is reading JSON from
> a kafka topic , inserting the JSON
You should not use SparkContext or RDD directly in your closures.
Could you show the code of "method1"? Maybe you only need a join or
something else. E.g.,
val cassandraRDD = sc.cassandraTable("keySpace", "tableName")
reRDD.join(cassandraRDD).map(...).saveAsTextFile(outputDir)
On Tue, Jan 19,
The number of concurrent Streaming jobs is controlled by
"spark.streaming.concurrentJobs". It's 1 by default. However, you need to
keep in mind that setting it to a bigger number allows jobs from several
batches to run at the same time. It's hard to predict the behavior and
sometimes will
You can use spark-xml to read the xml files.
https://github.com/databricks/spark-xml has some examples.
To save your results to cassandra, you can use spark-cassandra-connector:
https://github.com/datastax/spark-cassandra-connector
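For example, a rough sketch (row tag, paths, keyspace and table names are
placeholders, and both packages must be on the classpath):

import com.datastax.spark.connector._

val df = sqlContext.read
  .format("com.databricks.spark.xml")
  .option("rowTag", "record")
  .load("/path/to/input/*.xml")

df.rdd
  .map(row => (row.getString(0), row.getString(1)))
  .saveToCassandra("my_keyspace", "my_table", SomeColumns("key", "value"))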
On Tue, Jan 26, 2016 at 10:10 AM, Sree Eedupuganti
`onApplicationEnd` is posted when SparkContext is stopping, and you cannot
submit any job to a stopping SparkContext. In general, SparkListener is
used to monitor the job progress and collect job information, and you should
not submit jobs there. Why not submit your jobs in the main thread?
On
textFileStream doesn't support that. It only supports monitoring one folder.
On Wed, Feb 17, 2016 at 7:20 AM, in4maniac wrote:
> Hi all,
>
> I am new to pyspark streaming and I was following a tutorial I saw in the
> internet
> (
>
The broadcasted object is serialized in driver and sent to the executors.
And in the executor, it will deserialize the bytes to get the broadcasted
object.
On Fri, Feb 19, 2016 at 5:54 AM, jeff saremi wrote:
> could someone please comment on this? thanks
>
>
I don't know where "argonaut.EncodeJson$$anon$2" comes from. However, you can
always put your code into a method of an "object" and then call it like
a Java static method.
On Tue, Mar 1, 2016 at 10:30 AM, Yuval.Itzchakov wrote:
> I have a small snippet of code which relays
lared inside a companion object of a case class.
>
> The problem is that Spark will still try to serialize the method, as it
> needs to execute on the worker. How will that change the fact that
> `EncodeJson[T]` is not serializable?
>
>
> On Tue, Mar 1, 2016, 21:12 Shixiong(Ryan) Zhu
For Array, you need to call `toSeq` first. Scala can convert an Array to
ArrayOps automatically. However, that's not a `Seq`, so you need to call
`toSeq` explicitly.
On Tue, Mar 1, 2016 at 1:02 AM, Ashok Kumar
wrote:
> Thank you sir
>
> This works OK
> import
Could you search for "OutOfMemoryError" in the executor logs? It could be
"OutOfMemoryError: Direct buffer memory" or something else.
On Tue, Mar 1, 2016 at 6:23 AM, Nirav Patel wrote:
> Hi,
>
> We are using spark 1.5.2 or yarn. We have a spark application utilizing
> about
Could you use netstat to show the ports that the driver is listening on?
On Mon, Mar 14, 2016 at 1:45 PM, David Gomez Saavedra
wrote:
> hi everyone,
>
> I'm trying to set up spark streaming using akka with a similar example of
> the word count provided. When using spark master
You cannot. Streaming doesn't support it because code changes will break
Java serialization.
On Mon, Apr 11, 2016 at 4:30 PM, Siva Gudavalli wrote:
> hello,
>
> i am writing a spark streaming application to read data from kafka. I am
> using no receiver approach and enabled
Hey,
KafkaUtils.createDirectStream doesn't need a StorageLevel as it doesn't
store blocks to BlockManager. However, the error is not related
to StorageLevel. It may be a bug. Could you provide more info about it?
E.g., Spark version, your code, logs.
On Wed, Mar 2, 2016 at 3:02 AM, Vinti
See https://issues.apache.org/jira/browse/SPARK-12591
After applying the patch, it should work. However, if you want to enable
"registrationRequired", you still need to register
"org.apache.spark.streaming.util.OpenHashMapBasedStateMap",
"org.apache.spark.streaming.util.EmptyStateMap" and
Could you post the log of Master?
On Mon, Mar 21, 2016 at 9:25 AM, Hao Ren wrote:
> Update:
>
> I am using --supervise flag for fault tolerance.
>
>
>
> On Mon, Mar 21, 2016 at 4:16 PM, Hao Ren wrote:
>
>> Using spark 1.6.1
>> Spark Streaming Jobs are
It's because the Scala version of Spark and the Scala version of Kafka
don't match. Please check them.
On Wed, May 4, 2016 at 6:17 AM, أنس الليثي wrote:
> NoSuchMethodError usually appears because of a difference in the library
> versions.
>
> Check the version of the
Hey Tom,
Could you provide all blocked threads? There may be a potential deadlock.
On Fri, Jul 8, 2016 at 10:30 AM, Ellis, Tom (Financial Markets IT) <
tom.el...@lloydsbanking.com.invalid> wrote:
> Hi There,
>
>
>
> We’re currently using HDP 2.3.4, Spark 1.5.2 with a Spark Streaming job in
> }
>
> private void initialize() {
> pool = new ConcurrentLinkedQueue<KafkaProducer<String, String>>();
>
> for (int i = 0; i < minIdle; i++) {
> pool.add(createProducer());
> }
>}
>
>public void close
Could you provide your Spark version please?
On Tue, Jan 31, 2017 at 10:37 AM, Nipun Arora
wrote:
> Hi,
>
> I get a resource leak, where the number of file descriptors in spark
> streaming keeps increasing. We end up with a "too many file open" error
> eventually
You can create lazily instantiated singleton instances. See
http://spark.apache.org/docs/latest/streaming-programming-guide.html#accumulators-broadcast-variables-and-checkpoints
for examples of accumulators and broadcast variables. You can use the same
approach to create your cached RDD.
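For example, a rough sketch of the lazily instantiated singleton pattern from
that page (the broadcast word list is a placeholder):

import org.apache.spark.SparkContext
import org.apache.spark.broadcast.Broadcast

object WordBlacklist {
  @volatile private var instance: Broadcast[Seq[String]] = null

  def getInstance(sc: SparkContext): Broadcast[Seq[String]] = {
    if (instance == null) {
      synchronized {
        if (instance == null) {
          instance = sc.broadcast(Seq("a", "b", "c"))
        }
      }
    }
    instance
  }
}

// use it inside an output operation so it is re-created after recovering
// from a checkpoint:
// dstream.foreachRDD { rdd =>
//   val blacklist = WordBlacklist.getInstance(rdd.sparkContext)
//   ...
// }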
On Tue,
It's documented here:
http://spark.apache.org/docs/latest/streaming-programming-guide.html#accumulators-broadcast-variables-and-checkpoints
On Tue, Feb 7, 2017 at 8:12 AM, Amit Sela wrote:
> Hi all,
>
> I was wondering if anyone ever used a broadcast variable within
> an
It means the total time to run a batch, including the Spark job duration +
time spent on the driver. E.g.,
foreachRDD { rdd =>
  rdd.count() // say this takes 1 second
  Thread.sleep(10000) // sleep 10 seconds
}
In the above example, the Spark job duration is 1 second and the output op
> On Tue, Jan 31, 2017 at 1:45 PM Shixiong(Ryan) Zhu <
> shixi...@databricks.com> wrote:
>
>> Could you provide your Spark version please?
>>
>> On Tue, Jan 31, 2017 at 10:37 AM, Nipun Arora <nipunarora2...@gmail.com>
>> wrote:
>>
>>
t; String>(topic, message);
>
> if (async) {
> producer.send(record);
> } else {
> try {
> producer.send(record).get();
> } catch (Exception e) {
> e.printStackTrace();
> }
Which Spark version are you using? If you are using 2.1.0, could you use
the monitoring APIs (
http://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#monitoring-streaming-queries)
to check the input rate and the processing rate? One possible issue is that
the Kafka source
Thanks for reporting this. Which Spark version are you using? Could you
provide the full log, please?
On Fri, Jan 27, 2017 at 10:24 AM, Koert Kuipers wrote:
> i checked my topic. it has 5 partitions but all the data is written to a
> single partition: wikipedia-2
> i turned
-- Forwarded message --
From: Shixiong(Ryan) Zhu <shixi...@databricks.com>
Date: Fri, Jan 20, 2017 at 12:06 PM
Subject: Re: Spark streaming app that processes Kafka DStreams produces no
output and no error
To: shyla deshpande <deshpandesh...@gmail.com>
That's how K
Even if you don't need the checkpointing data for recovery, "mapWithState"
still needs to use "checkpoint" to cut off the RDD lineage.
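For example (the directory is a placeholder), assuming "ssc" is your
StreamingContext and "spec" is your StateSpec:

ssc.checkpoint("hdfs:///tmp/streaming-checkpoint") // required by mapWithState
val stateDstream = pairDstream.mapWithState(spec)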
On Mon, Jan 23, 2017 at 12:30 AM, shyla deshpande
wrote:
> Hello spark users,
>
> I do have the same question as Daniel.
>
> I would
You can use the monitoring APIs of Structured Streaming to get metrics. See
http://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#monitoring-streaming-queries
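For example, a rough sketch (assuming a SparkSession "spark" and a streaming
DataFrame "df"):

import org.apache.spark.sql.streaming.StreamingQueryListener
import org.apache.spark.sql.streaming.StreamingQueryListener._

val query = df.writeStream.format("console").start()

// pull-based: poll the query for its latest metrics
println(query.status)
println(query.lastProgress) // includes inputRowsPerSecond and processedRowsPerSecond

// push-based: register a listener for progress events
spark.streams.addListener(new StreamingQueryListener {
  override def onQueryStarted(event: QueryStartedEvent): Unit = {}
  override def onQueryProgress(event: QueryProgressEvent): Unit = println(event.progress)
  override def onQueryTerminated(event: QueryTerminatedEvent): Unit = {}
})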
On Tue, Jan 17, 2017 at 5:01 PM, Heji Kim
wrote:
> Hello. We are trying to
The REST APIs are not just for the Spark history server. When an application
is running, you can use them to talk to the Spark UI HTTP server as well.
On Tue, Feb 28, 2017 at 10:46 AM, Parag Chaudhari
wrote:
> ping...
>
>
>
> Thanks, Parag Chaudhari, USC Alumnus (Fight
Conf and I registered the class. Is there a different
> configuration setting for the state map keys?
>
> Thanks!
>
> -Joey
>
> On Sun, Oct 9, 2016 at 10:58 PM, Shixiong(Ryan) Zhu
> <shixi...@databricks.com> wrote:
> > You can use Kryo. It also implements KryoSerializa
Yeah, the KafkaRDD cannot be reused. It's better to document it.
On Thu, Nov 10, 2016 at 8:26 AM, Ivan von Nagy wrote:
> Ok, I have split he KafkaRDD logic to each use their own group and bumped
> the poll.ms to 10 seconds. Anything less then 2 seconds on the poll.ms
> ends up
> wrote:
> I tried with 1.6.2 and saw the same behavior.
>
> -Joey
>
> On Tue, Oct 11, 2016 at 5:18 PM, Shixiong(Ryan) Zhu
> <shixi...@databricks.com> wrote:
> > There are some known issues in 1.6.0, e.g.,
> > https://issues.apache.org/jira/browse/SPARK-12591
Possibly https://issues.apache.org/jira/browse/SPARK-17396
On Tue, Nov 22, 2016 at 1:42 PM, Mohit Durgapal
wrote:
> Hi Everyone,
>
>
> I am getting the following error while running a spark streaming example
> on my local machine, the data being ingested is only 506kb.
>
>
>
> On 22 November 2016 at 22:13, Shixiong(Ryan) Zhu <shixi...@databricks.com>
> wrote:
>
>> The workaround is defining the imports and class together using ":paste".
>>
>> On Tue, Nov 22, 2016 at 11:12 AM, Shixiong(Ryan) Zhu <
>> shixi...@databricks.com&
This relates to a known issue:
https://issues.apache.org/jira/browse/SPARK-14146 and
https://issues.scala-lang.org/browse/SI-9799
On Tue, Nov 22, 2016 at 6:37 AM, dbolshak wrote:
> Hello,
>
> We have the same issue,
>
> We use latest release 2.0.2.
>
> Setup with
The workaround is defining the imports and class together using ":paste".
On Tue, Nov 22, 2016 at 11:12 AM, Shixiong(Ryan) Zhu <
shixi...@databricks.com> wrote:
> This relates to a known issue: https://issues.apache.
> org/jira/browse/SPARK-14146 and https://issues.scala
Could you provide the Person class?
On Wed, Nov 16, 2016 at 1:19 PM, shyla deshpande <deshpandesh...@gmail.com>
wrote:
> I am using 2.11.8. Thanks
>
> On Wed, Nov 16, 2016 at 1:15 PM, Shixiong(Ryan) Zhu <
> shixi...@databricks.com> wrote:
>
>> Which Scala versio
Which Scala version are you using? Is it Scala 2.10? Scala 2.10 has some
known race conditions in reflection, and the Scala community has no plan to
fix them (
http://docs.scala-lang.org/overviews/reflection/thread-safety.html) AFAIK,
the only way to fix it is upgrading to Scala 2.11.
On Wed,
`SQLContext.getOrCreate` will return the HiveContext you created.
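For example, a rough sketch:

// in the driver, before starting the streaming computation
val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)

// later, e.g. inside foreachRDD, this returns the HiveContext created above
dstream.foreachRDD { rdd =>
  val sqlContext = org.apache.spark.sql.SQLContext.getOrCreate(rdd.sparkContext)
  // use sqlContext here
}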
On Mon, Nov 14, 2016 at 11:17 PM, Praseetha wrote:
>
> Hi All,
>
>
> I have a streaming app and when i try invoking the
> HiveContext.getOrCreate, it errors out with the following stmt. 'object
>
aultInstance = com.example.protos.demo.Person(
>> )
>> implicit class PersonLens[UpperPB](_l: com.trueaccord.lenses.Lens[UpperPB,
>> com.example.protos.demo.Person]) extends
>> com.trueaccord.lenses.ObjectLens[UpperPB,
>> com.example.protos.demo.Person](_l) {
>>
rs for me to
> reproduce the error so I can get back as early as possible.
>
> Thanks a lot!
>
> On Mon, Oct 31, 2016 at 12:41 PM, Shixiong(Ryan) Zhu <
> shixi...@databricks.com> wrote:
>
>> Then it should not be a Receiver issue. Could you use `jstack` to find
>> ou
Dstream "Window" uses "union" to combine multiple RDDs in one window into a
single RDD.
On Tue, Nov 1, 2016 at 2:59 AM kant kodali wrote:
> @Sean It looks like this problem can happen with other RDD's as well. Not
> just unionRDD
>
> On Tue, Nov 1, 2016 at 2:52 AM, kant
mode).
>
> Thanks!
>
> On Mon, Oct 31, 2016 at 12:32 PM, Shixiong(Ryan) Zhu <
> shixi...@databricks.com> wrote:
>
>> Sorry, there is a typo in my previous email: this may **not** be the
>> root cause if the leak threads are in the driver side.
>>
>> Doe
Yes, try 2.0.1!
On Tue, Nov 1, 2016 at 11:25 AM, kant kodali <kanth...@gmail.com> wrote:
> AH!!! Got it! Should I use 2.0.1 then ? I don't see 2.1.0
>
> On Tue, Nov 1, 2016 at 10:14 AM, Shixiong(Ryan) Zhu <
> shixi...@databricks.com> wrote:
>
>> Dstream "Wi
So in your code, each Receiver will start a new thread. Did you stop the
receiver properly in `Receiver.onStop`? Otherwise, you may leak threads
after a receiver crashes and is restarted by Spark. However, this may not be
the root cause if the leaked threads are on the driver side. Could you use
cs.oracle.com/javase/8/docs/api/java/lang/Thread.html doesn't seem
> to have any method where I can clean up the threads created during OnStart.
> any ideas?
>
> Thanks!
>
>
> On Mon, Oct 31, 2016 at 11:58 AM, Shixiong(Ryan) Zhu <
> shixi...@databricks.com> wrote:
&g
; Kryo. Also, this is with 1.6.0 so if this behavior changed/got fixed
> > in a later release let me know.
> >
> > -Joey
> >
> > On Mon, Oct 10, 2016 at 9:54 AM, Shixiong(Ryan) Zhu
> > <shixi...@databricks.com> wrote:
> >> That's enough. Did you see a
You can increase the timeout using the option
"kafkaConsumer.pollTimeoutMs". In addition, I would recommend you try Spark
2.1.0 as there are many improvements in Structured Streaming.
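For example, a rough sketch (brokers, topic and the timeout value are placeholders):

val stream = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker1:9092")
  .option("subscribe", "my-topic")
  .option("kafkaConsumer.pollTimeoutMs", "120000") // raise the poll timeout
  .load()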
On Wed, Jan 11, 2017 at 11:05 AM, Timothy Chan wrote:
> I'm using Spark 2.0.2 and running
You can find the Spark version used by spark-submit in the log. Could you
check whether it's consistent?
On Thu, Jan 12, 2017 at 7:35 AM Ramkumar Venkataraman <
ram.the.m...@gmail.com> wrote:
> Spark: 1.6.1
>
> I am trying to use the new mapWithState API and I am getting the following
> error:
>
>