Re: Why don't I see my spark jobs running in parallel in Cassandra/Spark DSE cluster?

2017-10-26 Thread Cassa L
No, I don't use YARN. This is the standalone Spark that comes with the
DataStax Enterprise version of Cassandra.

On Thu, Oct 26, 2017 at 11:22 PM, Jörn Franke  wrote:

> Do you use yarn ? Then you need to configure the queues with the right
> scheduler and method.
>
> On 27. Oct 2017, at 08:05, Cassa L  wrote:
>
> Hi,
> I have a Spark job with the following use case:
> RDD1 and RDD2 are read from Cassandra tables. I apply some transformations
> to each of them and then run a count on the transformed data.
>
> The code looks roughly like this:
>
> RDD1 = JavaFunctions.cassandraTable(...)
> RDD2 = JavaFunctions.cassandraTable(...)
> RDD3 = RDD1.flatMap(...)
> RDD4 = RDD2.flatMap(...)
>
> RDD3.count()
> RDD4.count()
>
> In the Spark UI I see that the count() calls run one after another.
> How do I make them run in parallel? I also looked at the discussion below
> from Cloudera, but it does not show how to run driver functions in parallel.
> Do I just add an Executor and run them in threads?
>
> https://community.cloudera.com/t5/Advanced-Analytics-Apache-Spark/Getting-Spark-stages-to-run-in-parallel-inside-an-application/td-p/38515
>
> I am attaching a UI snapshot here.
>
>
> Thanks.
> LCassa
>
>
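
Spark's scheduler is thread-safe and accepts jobs submitted from separate
driver threads, so the threading idea asked about in the quoted question can
be sketched roughly as below. This is a stdlib-only sketch, not code from the
thread: the two Supplier arguments are placeholders standing in for calls such
as RDD3.count() and RDD4.count(), and the class and method names are
illustrative.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

public class ParallelActions {

    // Runs two blocking actions on separate threads and waits for both results.
    // Each Supplier is a placeholder for a blocking action such as rdd.count().
    public static long[] runBoth(Supplier<Long> first, Supplier<Long> second) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            // Both actions are submitted before either result is awaited,
            // so neither one waits for the other to finish first.
            CompletableFuture<Long> a = CompletableFuture.supplyAsync(first, pool);
            CompletableFuture<Long> b = CompletableFuture.supplyAsync(second, pool);
            return new long[] { a.join(), b.join() };
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Stand-ins for the two counts; real code would call the RDD actions here.
        long[] counts = runBoth(() -> 3L, () -> 4L);
        System.out.println(counts[0] + " " + counts[1]);
    }
}
```

In a driver this would be called as runBoth(() -> rdd3.count(), () ->
rdd4.count()). Whether the two jobs really overlap also depends on free
executor cores, and spark.scheduler.mode (FIFO by default) affects how
concurrent jobs share them.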


Custom receiver for WebSocket in Spark not working

2016-11-02 Thread Cassa L
Hi,
I am using Spark 1.6. I wrote a custom receiver to read from a WebSocket.
When I start my Spark job, it connects to the WebSocket but doesn't receive
any messages. The same code, written as a standalone Scala class, works and
prints messages from the WebSocket. Is anything missing in my Spark code?
There are no errors in the Spark console.

Here is my receiver -

import org.apache.spark.Logging
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.receiver.Receiver
import org.jfarcand.wcs.{MessageListener, TextListener, WebSocket}

/**
  * Custom receiver for WebSocket
  */
class WebSocketReceiver extends Receiver[String](StorageLevel.MEMORY_ONLY)
  with Runnable with Logging {

  private var webSocket: WebSocket = _

  @transient
  private var thread: Thread = _

  override def onStart(): Unit = {
    thread = new Thread(this)
    thread.start()
  }

  override def onStop(): Unit = {
    setWebSocket(null)
    thread.interrupt()
  }

  override def run(): Unit = {
    println("Received ")
    receive()
  }

  private def receive(): Unit = {
    val connection = WebSocket().open("ws://localhost:3001")
    println("WebSocket  Connected ...")
    println("Connected --- " + connection)
    setWebSocket(connection)

    connection.listener(new TextListener {
      override def onMessage(message: String) {
        System.out.println("Message in Spark client is --> " + message)
      }
    })
  }

  private def setWebSocket(newWebSocket: WebSocket) = synchronized {
    if (webSocket != null) {
      webSocket.shutDown
    }
    webSocket = newWebSocket
  }

}



Here is the code for the Spark job:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.dstream.ReceiverInputDStream

object WebSocketTestApp {

  def main(args: Array[String]) {
    val conf = new SparkConf()
      .setAppName("Test Web Socket")
      .setMaster("local[20]")
      .set("test", "")
    val ssc = new StreamingContext(conf, Seconds(5))

    val stream: ReceiverInputDStream[String] =
      ssc.receiverStream(new WebSocketReceiver())
    stream.print()

    ssc.start()
    ssc.awaitTermination()
  }

}


Thanks,

LCassa
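
One thing that may be worth checking: a Spark Streaming receiver hands records
to Spark by calling store(...), and the onMessage callback in the receiver
above only prints the message. The sketch below is a stdlib-only model of that
difference, not Spark code. MiniReceiver and both listener factories are
hypothetical names invented for illustration, and MiniReceiver.store only
plays the role of Receiver.store.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class ReceiverHandoffSketch {

    /** Hypothetical stand-in for the buffering side of a receiver. */
    public static class MiniReceiver {
        private final List<String> stored = new ArrayList<>();

        // Plays the role of Receiver.store(...): only records passed here
        // become visible to the downstream stream.
        public synchronized void store(String record) {
            stored.add(record);
        }

        public synchronized List<String> storedRecords() {
            return new ArrayList<>(stored);
        }
    }

    /** Listener that only logs; nothing is handed to the receiver. */
    public static Consumer<String> loggingListener() {
        return message -> System.out.println("Message is --> " + message);
    }

    /** Listener that logs and also hands each message to the receiver. */
    public static Consumer<String> storingListener(MiniReceiver receiver) {
        return message -> {
            System.out.println("Message is --> " + message);
            receiver.store(message);
        };
    }

    public static void main(String[] args) {
        MiniReceiver receiver = new MiniReceiver();
        loggingListener().accept("a");          // logged, but never stored
        storingListener(receiver).accept("b");  // logged and stored
        System.out.println(receiver.storedRecords());
    }
}
```

In the Scala receiver the corresponding change would be a store(message) call
inside onMessage; whether that is the only problem here is an assumption.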


Re: Spark Memory Error - Not enough space to cache broadcast

2016-06-16 Thread Cassa L
Hi Dennis,

On Wed, Jun 15, 2016 at 11:39 PM, Dennis Lovely  wrote:

> You could try tuning spark.shuffle.memoryFraction and
> spark.storage.memoryFraction (both of which have been deprecated in 1.6),
> but ultimately you need to find out where you are bottlenecked and address
> that, as adjusting memoryFraction will only be a stopgap. The storage
> memoryFraction defaults to 0.6 and the shuffle memoryFraction to 0.2.
>
I have set the above parameters to 0.5. Do they need to be increased?

Thanks.

> On Wed, Jun 15, 2016 at 9:37 PM, Cassa L  wrote:
>
>> Hi,
>>  I did set  --driver-memory 4G. I still run into this issue after 1 hour
>> of data load.
>>
>> I also tried version 1.6 in test environment. I hit this issue much
>> faster than in 1.5.1 setup.
>> LCassa
>>
>> On Tue, Jun 14, 2016 at 3:57 PM, Gaurav Bhatnagar 
>> wrote:
>>
>>> try setting the option --driver-memory 4G
>>>
>>> On Tue, Jun 14, 2016 at 3:52 PM, Ben Slater 
>>> wrote:
>>>
>>>> A high level shot in the dark but in our testing we found Spark 1.6 a
>>>> lot more reliable in low memory situations (presumably due to
>>>> https://issues.apache.org/jira/browse/SPARK-1). If it’s an option,
>>>> probably worth a try.
>>>>
>>>> Cheers
>>>> Ben
>>>>
>>>> On Wed, 15 Jun 2016 at 08:48 Cassa L  wrote:
>>>>
>>>>> Hi,
>>>>> I would appreciate any clue on this. It has become a bottleneck for
>>>>> our spark job.
>>>>>
>>>>> On Mon, Jun 13, 2016 at 2:56 PM, Cassa L  wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I'm using spark 1.5.1 version. I am reading data from Kafka into Spark 
>>>>>> and writing it into Cassandra after processing it. Spark job starts fine 
>>>>>> and runs all good for some time until I start getting below errors. Once 
>>>>>> these errors come, job start to lag behind and I see that job has 
>>>>>> scheduling and processing delays in streaming  UI.
>>>>>>
>>>>>> Worker memory is 6GB, executor-memory is 5GB, I also tried to tweak 
>>>>>> memoryFraction parameters. Nothing works.
>>>>>>
>>>>>>
>>>>>> 16/06/13 21:26:02 INFO MemoryStore: ensureFreeSpace(4044) called with 
>>>>>> curMem=565394, maxMem=2778495713
>>>>>> 16/06/13 21:26:02 INFO MemoryStore: Block broadcast_69652_piece0 stored 
>>>>>> as bytes in memory (estimated size 3.9 KB, free 2.6 GB)
>>>>>> 16/06/13 21:26:02 INFO TorrentBroadcast: Reading broadcast variable 
>>>>>> 69652 took 2 ms
>>>>>> 16/06/13 21:26:02 WARN MemoryStore: Failed to reserve initial memory 
>>>>>> threshold of 1024.0 KB for computing block broadcast_69652 in memory.
>>>>>> 16/06/13 21:26:02 WARN MemoryStore: Not enough space to cache 
>>>>>> broadcast_69652 in memory! (computed 496.0 B so far)
>>>>>> 16/06/13 21:26:02 INFO MemoryStore: Memory use = 556.1 KB (blocks) + 2.6 
>>>>>> GB (scratch space shared across 0 tasks(s)) = 2.6 GB. Storage limit = 
>>>>>> 2.6 GB.
>>>>>> 16/06/13 21:26:02 WARN MemoryStore: Persisting block broadcast_69652 to 
>>>>>> disk instead.
>>>>>> 16/06/13 21:26:02 INFO BlockManager: Found block rdd_100761_1 locally
>>>>>> 16/06/13 21:26:02 INFO Executor: Finished task 0.0 in stage 71577.0 (TID 
>>>>>> 452316). 2043 bytes result sent to driver
>>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> L
>>>>>>
>>>>>>
>>>>> --
>>>> 
>>>> Ben Slater
>>>> Chief Product Officer
>>>> Instaclustr: Cassandra + Spark - Managed | Consulting | Support
>>>> +61 437 929 798
>>>>
>>>
>>>
>>
>


Re: Spark Memory Error - Not enough space to cache broadcast

2016-06-16 Thread Cassa L
On Thu, Jun 16, 2016 at 5:27 AM, Deepak Goel  wrote:

> What is the hardware configuration that you are running Spark on?
>
It is 24 cores and 128GB RAM.

> Hey
>
> Namaskara~Nalama~Guten Tag~Bonjour
>
>
>
> On Thu, Jun 16, 2016 at 5:33 PM, Jacek Laskowski  wrote:
>
>> Hi,
>>
>> What do you see under Executors and Details for Stage (for the
>> affected stages)? Anything weird memory-related?
>>
>> How does your "I am reading data from Kafka into Spark and writing it
>> into Cassandra after processing it." pipeline look like?
>>
>> Pozdrawiam,
>> Jacek Laskowski
>> 
>> https://medium.com/@jaceklaskowski/
>> Mastering Apache Spark http://bit.ly/mastering-apache-spark
>> Follow me at https://twitter.com/jaceklaskowski
>>
>>
>> On Mon, Jun 13, 2016 at 11:56 PM, Cassa L  wrote:
>> > Hi,
>> >
>> > I'm using spark 1.5.1 version. I am reading data from Kafka into Spark
>> and
>> > writing it into Cassandra after processing it. Spark job starts fine and
>> > runs all good for some time until I start getting below errors. Once
>> these
>> > errors come, job start to lag behind and I see that job has scheduling
>> and
>> > processing delays in streaming  UI.
>> >
>> > Worker memory is 6GB, executor-memory is 5GB, I also tried to tweak
>> > memoryFraction parameters. Nothing works.
>> >
>> >
>> > 16/06/13 21:26:02 INFO MemoryStore: ensureFreeSpace(4044) called with
>> > curMem=565394, maxMem=2778495713
>> > 16/06/13 21:26:02 INFO MemoryStore: Block broadcast_69652_piece0 stored
>> as
>> > bytes in memory (estimated size 3.9 KB, free 2.6 GB)
>> > 16/06/13 21:26:02 INFO TorrentBroadcast: Reading broadcast variable
>> 69652
>> > took 2 ms
>> > 16/06/13 21:26:02 WARN MemoryStore: Failed to reserve initial memory
>> > threshold of 1024.0 KB for computing block broadcast_69652 in memory.
>> > 16/06/13 21:26:02 WARN MemoryStore: Not enough space to cache
>> > broadcast_69652 in memory! (computed 496.0 B so far)
>> > 16/06/13 21:26:02 INFO MemoryStore: Memory use = 556.1 KB (blocks) +
>> 2.6 GB
>> > (scratch space shared across 0 tasks(s)) = 2.6 GB. Storage limit = 2.6
>> GB.
>> > 16/06/13 21:26:02 WARN MemoryStore: Persisting block broadcast_69652 to
>> disk
>> > instead.
>> > 16/06/13 21:26:02 INFO BlockManager: Found block rdd_100761_1 locally
>> > 16/06/13 21:26:02 INFO Executor: Finished task 0.0 in stage 71577.0 (TID
>> > 452316). 2043 bytes result sent to driver
>> >
>> >
>> > Thanks,
>> >
>> > L
>>
>>
>>
>


Re: Spark Memory Error - Not enough space to cache broadcast

2016-06-16 Thread Cassa L
Hi,

>
> What do you see under Executors and Details for Stage (for the
> affected stages)? Anything weird memory-related?
>
Under the Executors tab, the logs show these warnings:

16/06/16 20:45:40 INFO TorrentBroadcast: Reading broadcast variable
422145 took 1 ms
16/06/16 20:45:40 WARN MemoryStore: Failed to reserve initial memory
threshold of 1024.0 KB for computing block broadcast_422145 in memory.
16/06/16 20:45:40 WARN MemoryStore: Not enough space to cache
broadcast_422145 in memory! (computed 496.0 B so far)
16/06/16 20:45:40 INFO MemoryStore: Memory use = 147.9 KB (blocks) +
2.2 GB (scratch space shared across 0 tasks(s)) = 2.2 GB. Storage
limit = 2.2 GB.
16/06/16 20:45:40 WARN MemoryStore: Persisting block broadcast_422145
to disk instead.
16/06/16 20:45:40 INFO MapOutputTrackerWorker: Don't have map outputs
for shuffle 70278, fetching them

16/06/16 20:45:40 INFO MapOutputTrackerWorker: Doing the fetch; tracker endpoint =
AkkaRpcEndpointRef(Actor[akka.tcp://sparkDriver@17.40.240.71:46187/user/MapOutputTracker#-1794035569])

I don't see any memory-related errors on the 'Stages' tab.

>
> How does your "I am reading data from Kafka into Spark and writing it
> into Cassandra after processing it." pipeline look like?
>
This part has no issues. Reading from Kafka is always up to date. There
are no offset lags. Writing to Cassandra is also fine, taking less than 1 ms
to write data.


> Pozdrawiam,
> Jacek Laskowski
> 
> https://medium.com/@jaceklaskowski/
> Mastering Apache Spark http://bit.ly/mastering-apache-spark
> Follow me at https://twitter.com/jaceklaskowski
>
>
> On Mon, Jun 13, 2016 at 11:56 PM, Cassa L  wrote:
> > Hi,
> >
> > I'm using spark 1.5.1 version. I am reading data from Kafka into Spark
> and
> > writing it into Cassandra after processing it. Spark job starts fine and
> > runs all good for some time until I start getting below errors. Once
> these
> > errors come, job start to lag behind and I see that job has scheduling
> and
> > processing delays in streaming  UI.
> >
> > Worker memory is 6GB, executor-memory is 5GB, I also tried to tweak
> > memoryFraction parameters. Nothing works.
> >
> >
> > 16/06/13 21:26:02 INFO MemoryStore: ensureFreeSpace(4044) called with
> > curMem=565394, maxMem=2778495713
> > 16/06/13 21:26:02 INFO MemoryStore: Block broadcast_69652_piece0 stored
> as
> > bytes in memory (estimated size 3.9 KB, free 2.6 GB)
> > 16/06/13 21:26:02 INFO TorrentBroadcast: Reading broadcast variable 69652
> > took 2 ms
> > 16/06/13 21:26:02 WARN MemoryStore: Failed to reserve initial memory
> > threshold of 1024.0 KB for computing block broadcast_69652 in memory.
> > 16/06/13 21:26:02 WARN MemoryStore: Not enough space to cache
> > broadcast_69652 in memory! (computed 496.0 B so far)
> > 16/06/13 21:26:02 INFO MemoryStore: Memory use = 556.1 KB (blocks) + 2.6
> GB
> > (scratch space shared across 0 tasks(s)) = 2.6 GB. Storage limit = 2.6
> GB.
> > 16/06/13 21:26:02 WARN MemoryStore: Persisting block broadcast_69652 to
> disk
> > instead.
> > 16/06/13 21:26:02 INFO BlockManager: Found block rdd_100761_1 locally
> > 16/06/13 21:26:02 INFO Executor: Finished task 0.0 in stage 71577.0 (TID
> > 452316). 2043 bytes result sent to driver
> >
> >
> > Thanks,
> >
> > L
>


Re: Spark Memory Error - Not enough space to cache broadcast

2016-06-15 Thread Cassa L
Hi,
I did set --driver-memory 4G. I still run into this issue after 1 hour of
data load.

I also tried version 1.6 in a test environment. I hit this issue much faster
there than in the 1.5.1 setup.
LCassa

On Tue, Jun 14, 2016 at 3:57 PM, Gaurav Bhatnagar 
wrote:

> try setting the option --driver-memory 4G
>
> On Tue, Jun 14, 2016 at 3:52 PM, Ben Slater 
> wrote:
>
>> A high level shot in the dark but in our testing we found Spark 1.6 a lot
>> more reliable in low memory situations (presumably due to
>> https://issues.apache.org/jira/browse/SPARK-1). If it’s an option,
>> probably worth a try.
>>
>> Cheers
>> Ben
>>
>> On Wed, 15 Jun 2016 at 08:48 Cassa L  wrote:
>>
>>> Hi,
>>> I would appreciate any clue on this. It has become a bottleneck for our
>>> spark job.
>>>
>>> On Mon, Jun 13, 2016 at 2:56 PM, Cassa L  wrote:
>>>
>>>> Hi,
>>>>
>>>> I'm using spark 1.5.1 version. I am reading data from Kafka into Spark and 
>>>> writing it into Cassandra after processing it. Spark job starts fine and 
>>>> runs all good for some time until I start getting below errors. Once these 
>>>> errors come, job start to lag behind and I see that job has scheduling and 
>>>> processing delays in streaming  UI.
>>>>
>>>> Worker memory is 6GB, executor-memory is 5GB, I also tried to tweak 
>>>> memoryFraction parameters. Nothing works.
>>>>
>>>>
>>>> 16/06/13 21:26:02 INFO MemoryStore: ensureFreeSpace(4044) called with 
>>>> curMem=565394, maxMem=2778495713
>>>> 16/06/13 21:26:02 INFO MemoryStore: Block broadcast_69652_piece0 stored as 
>>>> bytes in memory (estimated size 3.9 KB, free 2.6 GB)
>>>> 16/06/13 21:26:02 INFO TorrentBroadcast: Reading broadcast variable 69652 
>>>> took 2 ms
>>>> 16/06/13 21:26:02 WARN MemoryStore: Failed to reserve initial memory 
>>>> threshold of 1024.0 KB for computing block broadcast_69652 in memory.
>>>> 16/06/13 21:26:02 WARN MemoryStore: Not enough space to cache 
>>>> broadcast_69652 in memory! (computed 496.0 B so far)
>>>> 16/06/13 21:26:02 INFO MemoryStore: Memory use = 556.1 KB (blocks) + 2.6 
>>>> GB (scratch space shared across 0 tasks(s)) = 2.6 GB. Storage limit = 2.6 
>>>> GB.
>>>> 16/06/13 21:26:02 WARN MemoryStore: Persisting block broadcast_69652 to 
>>>> disk instead.
>>>> 16/06/13 21:26:02 INFO BlockManager: Found block rdd_100761_1 locally
>>>> 16/06/13 21:26:02 INFO Executor: Finished task 0.0 in stage 71577.0 (TID 
>>>> 452316). 2043 bytes result sent to driver
>>>>
>>>>
>>>> Thanks,
>>>>
>>>> L
>>>>
>>>>
>>> --
>> 
>> Ben Slater
>> Chief Product Officer
>> Instaclustr: Cassandra + Spark - Managed | Consulting | Support
>> +61 437 929 798
>>
>
>


Re: Spark Memory Error - Not enough space to cache broadcast

2016-06-15 Thread Cassa L
Hi,
Upgrading Spark is not an option right now. I did set --driver-memory 4G. I
still run into this issue after 1 hour of data load.

LCassa


On Tue, Jun 14, 2016 at 3:57 PM, Gaurav Bhatnagar 
wrote:

> try setting the option --driver-memory 4G
>
> On Tue, Jun 14, 2016 at 3:52 PM, Ben Slater 
> wrote:
>
>> A high level shot in the dark but in our testing we found Spark 1.6 a lot
>> more reliable in low memory situations (presumably due to
>> https://issues.apache.org/jira/browse/SPARK-1). If it’s an option,
>> probably worth a try.
>>
>> Cheers
>> Ben
>>
>> On Wed, 15 Jun 2016 at 08:48 Cassa L  wrote:
>>
>>> Hi,
>>> I would appreciate any clue on this. It has become a bottleneck for our
>>> spark job.
>>>
>>> On Mon, Jun 13, 2016 at 2:56 PM, Cassa L  wrote:
>>>
>>>> Hi,
>>>>
>>>> I'm using spark 1.5.1 version. I am reading data from Kafka into Spark and 
>>>> writing it into Cassandra after processing it. Spark job starts fine and 
>>>> runs all good for some time until I start getting below errors. Once these 
>>>> errors come, job start to lag behind and I see that job has scheduling and 
>>>> processing delays in streaming  UI.
>>>>
>>>> Worker memory is 6GB, executor-memory is 5GB, I also tried to tweak 
>>>> memoryFraction parameters. Nothing works.
>>>>
>>>>
>>>> 16/06/13 21:26:02 INFO MemoryStore: ensureFreeSpace(4044) called with 
>>>> curMem=565394, maxMem=2778495713
>>>> 16/06/13 21:26:02 INFO MemoryStore: Block broadcast_69652_piece0 stored as 
>>>> bytes in memory (estimated size 3.9 KB, free 2.6 GB)
>>>> 16/06/13 21:26:02 INFO TorrentBroadcast: Reading broadcast variable 69652 
>>>> took 2 ms
>>>> 16/06/13 21:26:02 WARN MemoryStore: Failed to reserve initial memory 
>>>> threshold of 1024.0 KB for computing block broadcast_69652 in memory.
>>>> 16/06/13 21:26:02 WARN MemoryStore: Not enough space to cache 
>>>> broadcast_69652 in memory! (computed 496.0 B so far)
>>>> 16/06/13 21:26:02 INFO MemoryStore: Memory use = 556.1 KB (blocks) + 2.6 
>>>> GB (scratch space shared across 0 tasks(s)) = 2.6 GB. Storage limit = 2.6 
>>>> GB.
>>>> 16/06/13 21:26:02 WARN MemoryStore: Persisting block broadcast_69652 to 
>>>> disk instead.
>>>> 16/06/13 21:26:02 INFO BlockManager: Found block rdd_100761_1 locally
>>>> 16/06/13 21:26:02 INFO Executor: Finished task 0.0 in stage 71577.0 (TID 
>>>> 452316). 2043 bytes result sent to driver
>>>>
>>>>
>>>> Thanks,
>>>>
>>>> L
>>>>
>>>>
>>> --
>> 
>> Ben Slater
>> Chief Product Officer
>> Instaclustr: Cassandra + Spark - Managed | Consulting | Support
>> +61 437 929 798
>>
>
>


Re: Spark Memory Error - Not enough space to cache broadcast

2016-06-14 Thread Cassa L
Hi,
I would appreciate any clue on this. It has become a bottleneck for our
spark job.

On Mon, Jun 13, 2016 at 2:56 PM, Cassa L  wrote:

> Hi,
>
> I'm using spark 1.5.1 version. I am reading data from Kafka into Spark and 
> writing it into Cassandra after processing it. Spark job starts fine and runs 
> all good for some time until I start getting below errors. Once these errors 
> come, job start to lag behind and I see that job has scheduling and 
> processing delays in streaming  UI.
>
> Worker memory is 6GB, executor-memory is 5GB, I also tried to tweak 
> memoryFraction parameters. Nothing works.
>
>
> 16/06/13 21:26:02 INFO MemoryStore: ensureFreeSpace(4044) called with 
> curMem=565394, maxMem=2778495713
> 16/06/13 21:26:02 INFO MemoryStore: Block broadcast_69652_piece0 stored as 
> bytes in memory (estimated size 3.9 KB, free 2.6 GB)
> 16/06/13 21:26:02 INFO TorrentBroadcast: Reading broadcast variable 69652 
> took 2 ms
> 16/06/13 21:26:02 WARN MemoryStore: Failed to reserve initial memory 
> threshold of 1024.0 KB for computing block broadcast_69652 in memory.
> 16/06/13 21:26:02 WARN MemoryStore: Not enough space to cache broadcast_69652 
> in memory! (computed 496.0 B so far)
> 16/06/13 21:26:02 INFO MemoryStore: Memory use = 556.1 KB (blocks) + 2.6 GB 
> (scratch space shared across 0 tasks(s)) = 2.6 GB. Storage limit = 2.6 GB.
> 16/06/13 21:26:02 WARN MemoryStore: Persisting block broadcast_69652 to disk 
> instead.
> 16/06/13 21:26:02 INFO BlockManager: Found block rdd_100761_1 locally
> 16/06/13 21:26:02 INFO Executor: Finished task 0.0 in stage 71577.0 (TID 
> 452316). 2043 bytes result sent to driver
>
>
> Thanks,
>
> L
>
>


Spark Memory Error - Not enough space to cache broadcast

2016-06-13 Thread Cassa L
Hi,

I'm using Spark 1.5.1. I am reading data from Kafka into Spark and
writing it into Cassandra after processing it. The Spark job starts
fine and runs well for some time until I start getting the errors
below. Once these errors appear, the job starts to lag behind and I see
scheduling and processing delays in the streaming UI.

Worker memory is 6GB and executor-memory is 5GB. I also tried tweaking
the memoryFraction parameters. Nothing works.


16/06/13 21:26:02 INFO MemoryStore: ensureFreeSpace(4044) called with
curMem=565394, maxMem=2778495713
16/06/13 21:26:02 INFO MemoryStore: Block broadcast_69652_piece0
stored as bytes in memory (estimated size 3.9 KB, free 2.6 GB)
16/06/13 21:26:02 INFO TorrentBroadcast: Reading broadcast variable
69652 took 2 ms
16/06/13 21:26:02 WARN MemoryStore: Failed to reserve initial memory
threshold of 1024.0 KB for computing block broadcast_69652 in memory.
16/06/13 21:26:02 WARN MemoryStore: Not enough space to cache
broadcast_69652 in memory! (computed 496.0 B so far)
16/06/13 21:26:02 INFO MemoryStore: Memory use = 556.1 KB (blocks) +
2.6 GB (scratch space shared across 0 tasks(s)) = 2.6 GB. Storage
limit = 2.6 GB.
16/06/13 21:26:02 WARN MemoryStore: Persisting block broadcast_69652
to disk instead.
16/06/13 21:26:02 INFO BlockManager: Found block rdd_100761_1 locally
16/06/13 21:26:02 INFO Executor: Finished task 0.0 in stage 71577.0
(TID 452316). 2043 bytes result sent to driver


Thanks,

L
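
The storage limit in the log can be related back to the executor heap with a
little arithmetic. This is a minimal sketch, assuming the legacy (pre-1.6)
memory manager, in which the storage pool is roughly heap x
spark.storage.memoryFraction x spark.storage.safetyFraction, and assuming the
defaults of 0.6 and 0.9. The class and method names are illustrative only.

```java
public class StorageLimitSketch {

    // Approximate storage pool size under the legacy memory manager
    // (assumed formula: heap * memoryFraction * safetyFraction).
    public static double storageLimitBytes(double heapBytes,
                                           double memoryFraction,
                                           double safetyFraction) {
        return heapBytes * memoryFraction * safetyFraction;
    }

    // Inverse of the above: the heap size implied by an observed storage limit.
    public static double impliedHeapBytes(double storageLimitBytes,
                                          double memoryFraction,
                                          double safetyFraction) {
        return storageLimitBytes / (memoryFraction * safetyFraction);
    }

    public static void main(String[] args) {
        double maxMem = 2778495713.0; // maxMem value from the MemoryStore log line
        double heap = impliedHeapBytes(maxMem, 0.6, 0.9);
        System.out.printf("implied heap = %.2f GiB%n",
                heap / (1024.0 * 1024.0 * 1024.0));
    }
}
```

With the assumed defaults, the logged maxMem works out to a heap of roughly
4.8 GiB, which is in line with a 5GB executor (a JVM usually reports slightly
less usable heap than its configured maximum). Note that the log attributes
almost the entire limit to scratch space while the stored blocks total only
about 556 KB.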


Re: Accessing Cassandra data from Spark Shell

2016-05-18 Thread Cassa L
I tried all combinations of the Spark Cassandra connector. None of them worked.
Finally, I downgraded Spark to 1.5.1 and now it works.
LCassa

On Wed, May 18, 2016 at 11:11 AM, Mohammed Guller 
wrote:

> As Ben mentioned, Spark 1.5.2 does work with C*.  Make sure that you are
> using the correct version of the Spark Cassandra Connector.
>
>
>
>
>
> Mohammed
>
> Author: Big Data Analytics with Spark
> <http://www.amazon.com/Big-Data-Analytics-Spark-Practitioners/dp/1484209656/>
>
>
>
> *From:* Ben Slater [mailto:ben.sla...@instaclustr.com]
> *Sent:* Tuesday, May 17, 2016 11:00 PM
> *To:* user@cassandra.apache.org; Mohammed Guller
> *Cc:* user
>
> *Subject:* Re: Accessing Cassandra data from Spark Shell
>
>
>
> It definitely should be possible for 1.5.2 (I have used it with
> spark-shell and cassandra connector with 1.4.x). The main trick is in
> lining up all the versions and building an appropriate connector jar.
>
>
>
> Cheers
>
> Ben
>
>
>
> On Wed, 18 May 2016 at 15:40 Cassa L  wrote:
>
> Hi,
>
> I followed instructions to run SparkShell with Spark-1.6. It works fine.
> However, I need to use spark-1.5.2 version. With it, it does not work. I
> keep getting NoSuchMethod Errors. Is there any issue running Spark Shell
> for Cassandra using older version of Spark?
>
>
>
>
>
> Regards,
>
> LCassa
>
>
>
> On Tue, May 10, 2016 at 6:48 PM, Mohammed Guller 
> wrote:
>
> Yes, it is very simple to access Cassandra data using Spark shell.
>
>
>
> Step 1: Launch the spark-shell with the spark-cassandra-connector package
>
> $SPARK_HOME/bin/spark-shell --packages com.datastax.spark:spark-cassandra-connector_2.10:1.5.0
>
>
>
> Step 2: Create a DataFrame pointing to your Cassandra table
>
> val dfCassTable = sqlContext.read
>   .format("org.apache.spark.sql.cassandra")
>   .options(Map("table" -> "your_column_family", "keyspace" -> "your_keyspace"))
>   .load()
>
>
>
> From this point onward, you have complete access to the DataFrame API. You
> can even register it as a temporary table, if you would prefer to use
> SQL/HiveQL.
>
>
>
> Mohammed
>
> Author: Big Data Analytics with Spark
> <http://www.amazon.com/Big-Data-Analytics-Spark-Practitioners/dp/1484209656/>
>
>
>
> *From:* Ben Slater [mailto:ben.sla...@instaclustr.com]
> *Sent:* Monday, May 9, 2016 9:28 PM
> *To:* user@cassandra.apache.org; user
> *Subject:* Re: Accessing Cassandra data from Spark Shell
>
>
>
> You can use SparkShell to access Cassandra via the Spark Cassandra
> connector. The getting started article on our support page will probably
> give you a good steer to get started even if you’re not using Instaclustr:
> https://support.instaclustr.com/hc/en-us/articles/213097877-Getting-Started-with-Instaclustr-Spark-Cassandra-
>
>
>
> Cheers
>
> Ben
>
>
>
> On Tue, 10 May 2016 at 14:08 Cassa L  wrote:
>
> Hi,
>
> Has anyone tried accessing Cassandra data using SparkShell? How do you do
> it? Can you use HiveContext for Cassandra data? I'm using community version
> of Cassandra-3.0
>
>
>
> Thanks,
>
> LCassa
>
> --
>
> 
>
> Ben Slater
>
> Chief Product Officer, Instaclustr
>
> +61 437 929 798
>
>
>
> --
>
> 
>
> Ben Slater
>
> Chief Product Officer, Instaclustr
>
> +61 437 929 798
>


Re: Accessing Cassandra data from Spark Shell

2016-05-17 Thread Cassa L
Hi,
I followed the instructions to run the Spark shell with Spark 1.6. It works
fine. However, I need to use Spark 1.5.2, and with that version it does not
work. I keep getting NoSuchMethod errors. Is there any issue running the Spark
shell for Cassandra with an older version of Spark?


Regards,
LCassa

On Tue, May 10, 2016 at 6:48 PM, Mohammed Guller 
wrote:

> Yes, it is very simple to access Cassandra data using Spark shell.
>
>
>
> Step 1: Launch the spark-shell with the spark-cassandra-connector package
>
> $SPARK_HOME/bin/spark-shell --packages com.datastax.spark:spark-cassandra-connector_2.10:1.5.0
>
>
>
> Step 2: Create a DataFrame pointing to your Cassandra table
>
> val dfCassTable = sqlContext.read
>   .format("org.apache.spark.sql.cassandra")
>   .options(Map("table" -> "your_column_family", "keyspace" -> "your_keyspace"))
>   .load()
>
>
>
> From this point onward, you have complete access to the DataFrame API. You
> can even register it as a temporary table, if you would prefer to use
> SQL/HiveQL.
>
>
>
> Mohammed
>
> Author: Big Data Analytics with Spark
> <http://www.amazon.com/Big-Data-Analytics-Spark-Practitioners/dp/1484209656/>
>
>
>
> *From:* Ben Slater [mailto:ben.sla...@instaclustr.com]
> *Sent:* Monday, May 9, 2016 9:28 PM
> *To:* user@cassandra.apache.org; user
> *Subject:* Re: Accessing Cassandra data from Spark Shell
>
>
>
> You can use SparkShell to access Cassandra via the Spark Cassandra
> connector. The getting started article on our support page will probably
> give you a good steer to get started even if you’re not using Instaclustr:
> https://support.instaclustr.com/hc/en-us/articles/213097877-Getting-Started-with-Instaclustr-Spark-Cassandra-
>
>
>
> Cheers
>
> Ben
>
>
>
> On Tue, 10 May 2016 at 14:08 Cassa L  wrote:
>
> Hi,
>
> Has anyone tried accessing Cassandra data using SparkShell? How do you do
> it? Can you use HiveContext for Cassandra data? I'm using community version
> of Cassandra-3.0
>
>
>
> Thanks,
>
> LCassa
>
> --
>
> 
>
> Ben Slater
>
> Chief Product Officer, Instaclustr
>
> +61 437 929 798
>


Accessing Cassandra data from Spark Shell

2016-05-09 Thread Cassa L
Hi,
Has anyone tried accessing Cassandra data using the Spark shell? How do you do
it? Can you use HiveContext for Cassandra data? I'm using the community version
of Cassandra 3.0.

Thanks,
LCassa


Issue with protobuff and Spark cassandra connector

2015-11-16 Thread Cassa L
Hi,
Has anyone used protobuf with the Spark Cassandra connector? I am using
protobuf-3.0-beta with spark-1.4 and cassandra-connector-2.10. I keep
getting "Unable to find proto buffer class" in my code. I checked the version
of the protobuf jar, and 3.0-beta is the one loaded on the classpath. The
protobuf messages are coming from a Kafka stream.

5/11/16 15:32:21 ERROR Executor: Exception in task 2.0 in stage 13.0 (TID 35)
java.lang.RuntimeException: Unable to find proto buffer class: com.test.serializers.TestEvent$Event
    at com.google.protobuf.GeneratedMessageLite$SerializedForm.readResolve(GeneratedMessageLite.java:1063)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)



Here is my code:

JavaDStream rddStream = protoBuffMsgs.map(protoBuff ->
    StreamRawData.convertProtoBuffToRawData(protoBuff));

rddStream.foreachRDD(rdd -> {
    StreamRawData.writeToCassandra(rdd);
    return null;
});

public static void writeToCassandra(JavaRDD rowRDD) {
    // write to Cassandra
    javaFunctions(rowRDD).writerBuilder("keyspace", "data",
        mapToRow(MyData.class)).saveToCassandra();
}

If I remove writeToCassandra() from my code, it works. Counting and
filtering on my protobuf stream of data also work.
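
The stack trace goes through GeneratedMessageLite$SerializedForm.readResolve,
which suggests the message is being passed through Java serialization and its
class is looked up again on the reading side. The sketch below is a
stdlib-only model of that serialization-proxy pattern, assuming the proxy
resolves the class by name. Event, SerializedForm and roundTrip are
hypothetical stand-ins written for illustration, not protobuf's actual code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.charset.StandardCharsets;

public class SerializedFormSketch {

    /** Hypothetical message type standing in for a generated protobuf class. */
    public static final class Event implements Serializable {
        public final String payload;

        public Event(String payload) {
            this.payload = payload;
        }

        // Java serialization writes the proxy instead of the message itself.
        private Object writeReplace() {
            return new SerializedForm(Event.class.getName(),
                    payload.getBytes(StandardCharsets.UTF_8));
        }
    }

    /** Proxy that carries only a class name and the raw bytes. */
    public static final class SerializedForm implements Serializable {
        private final String className;
        private final byte[] bytes;

        SerializedForm(String className, byte[] bytes) {
            this.className = className;
            this.bytes = bytes;
        }

        // On read, the class is looked up by name. The lookup fails when the
        // class loader performing it cannot see the message class.
        private Object readResolve() {
            try {
                Class<?> cls = Class.forName(className);
                return cls.getConstructor(String.class)
                        .newInstance(new String(bytes, StandardCharsets.UTF_8));
            } catch (ReflectiveOperationException e) {
                throw new RuntimeException("Unable to find class: " + className, e);
            }
        }
    }

    // Serializes and deserializes a value in memory.
    public static Object roundTrip(Object value) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(buffer.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

In this model the round trip succeeds because the class is visible to the
loader that deserializes it. If the real failure has the same shape, the
question becomes which class loader on the executor is used when the
serialized messages are read back; that reading of the trace is an assumption.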


Re: Netflix/Astynax Client for Cassandra

2013-02-07 Thread Cassa L
Thank you all for the responses to this thread. I am planning to use
Cassandra 1.1.9 with Astyanax. Does anyone have a Cassandra 1.x version running
in production with Astyanax? Did you come across any show-stopper issues?

Thanks
LCassa


On Thu, Feb 7, 2013 at 8:50 AM, Bartłomiej Romański  wrote:

> Hi,
>
> Does anyone know about virtual node support in Astyanax? Are they
> handled correctly? Especially with ConnectionPoolType.TOKEN_AWARE?
>
> Thanks,
> BR
>


Netflix/Astynax Client for Cassandra

2013-02-06 Thread Cassa L
Hi,
Has anyone used the Netflix/Astyanax Java client library for Cassandra? I have
used Hector before and would like to evaluate Astyanax. I am not sure how well
it is accepted in the Cassandra community. Are there any issues with it, or
advantages? The API looks very clean and simple compared to Hector. Has anyone
other than Netflix themselves used it in production?

Thanks
LCassa


Re: shutdown by drain

2011-10-14 Thread Cassa L
> Now it is true that it could be a shame to interrupt a compaction that has
> been running for a long time and is about to finish (so typically not one
> that has just been triggered by your drain), but you can always check the
> compaction manager in JMX to see if it's the case before killing the node.
>
> --


Continuing this conversation: if a long-running compaction is in progress and
I have to kill the node and start it again, will it pick up that compaction
immediately? Will this be part of the start-up process, or will the compaction
be triggered after the restart?

L.


Re: Multi DC setup

2011-10-10 Thread Cassa L
We already have two separate rings. The idea of bidirectional sync is that if
one ring is down, we can still send traffic to the other ring. When the original
cluster comes back, it will pick up the data from the available cluster. I'm not
sure whether it makes sense to keep separate rings or to combine the two rings
into one.



On Mon, Oct 10, 2011 at 10:17 PM, Milind Parikh wrote:

> Why have two rings? Cassandra manages the replication for you. One ring
> with physical nodes in two DCs might be a better option. Of course, depending
> on the inter-DC failure characteristics, you might need to endure split-brain
> for a while.
>
>
> On Oct 10, 2011 10:09 PM, "Cassa L"  wrote:
>
> I am trying to understand multi-DC setup for Cassandra. As I understand it, in
> this setup replicas exist in the same cluster ring, but the nodes are physically
> distributed across DCs. Is this correct?
> I have two different cluster rings in two DCs and want to replicate data
> bidirectionally. Both have the same keyspace. They take data traffic from
> different sources, but we want to make sure the data exists in both rings.
> What would be a way to achieve this?
>
> Thanks,
> L.
>
>


Multi DC setup

2011-10-10 Thread Cassa L
I am trying to understand multi-DC setup for Cassandra. As I understand it, in
this setup replicas exist in the same cluster ring, but the nodes are physically
distributed across DCs. Is this correct?
I have two different cluster rings in two DCs and want to replicate data
bidirectionally. Both have the same keyspace. They take data traffic from
different sources, but we want to make sure the data exists in both rings.
What would be a way to achieve this?

Thanks,
L.


Copy data from 0.7.4 to 0.8

2011-10-06 Thread Cassa L
Hi,
I want to transfer data from a ring running 0.7.4 to a separate ring
running 0.8. The 0.8 ring does not yet have the schema definition for the data
on 0.7.4. What is the best way to copy data and schema from the 0.7 cluster to
the 0.8 cluster? Do I need to define the schema manually and then copy the
SSTables?

L.


Gossiper question

2011-05-17 Thread Cassa L
Hi,
I have a 9-node cluster with RF=3 and am using Cassandra0.70/Hector26. Recently
we have been seeing a lot of "UnavailableException" errors on the client side.
Whenever this happens, I find the following pattern in the Cassandra node's log
file at that time:

* INFO [ScheduledTasks:1] 2011-05-13 02:59:55,365 Gossiper.java (line 195)
InetAddress /**.**.***.54 is now dead.*
* INFO [ScheduledTasks:1] 2011-05-13 02:59:57,369 Gossiper.java (line 195)
InetAddress /.**.***.**59 is now dead.*
 INFO [HintedHandoff:1] 2011-05-13 03:00:04,706 HintedHandOffManager.java
(line 192) Started hinted handoff for endpoint /***.**.***.*54
* INFO [GossipStage:1] 2011-05-13 03:00:04,706 Gossiper.java (line 569)
InetAddress /.**.*.54 is now UP*
 INFO [HintedHandoff:1] 2011-05-13 03:00:04,706 HintedHandOffManager.java
(line 248) Finished hinted handoff of 0 rows to endpoint /***.**..54
 INFO [HintedHandoff:1] 2011-05-13 03:00:20,601 HintedHandOffManager.java
(line 192) Started hinted handoff for endpoint /***.**..59
* INFO [GossipStage:1] 2011-05-13 03:00:20,601 Gossiper.java (line 569)
InetAddress /.**.*.59 is now UP*

The exception occurred at "2011-05-13 03:00:00,664". I am wondering why
this dead/up pattern is occurring in the Gossiper.

Thanks in advance,
Cassa L.
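
A coordinator raises UnavailableException when it sees fewer live replicas
than the consistency level requires, so two nodes being marked dead in the same
window can be enough with RF=3. The arithmetic is sketched below. The post
does not state the consistency level, so QUORUM is assumed, and the class and
method names are illustrative only.

```java
public class QuorumCheck {

    // Number of replicas a QUORUM request needs: floor(RF / 2) + 1.
    public static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    // True when enough replicas are seen as alive to attempt the request.
    public static boolean canServe(int liveReplicas, int requiredReplicas) {
        return liveReplicas >= requiredReplicas;
    }

    public static void main(String[] args) {
        int rf = 3;
        int required = quorum(rf);
        // Case where both nodes marked dead are replicas for the same key.
        int live = rf - 2;
        System.out.println("required=" + required + " live=" + live
                + " canServe=" + canServe(live, required));
    }
}
```

With RF=3 a QUORUM request needs 2 replicas, so a key whose replicas include
both of the nodes marked dead has only 1 live replica until they are seen as
UP again. Keys with at most one replica on those nodes are unaffected under
this assumption.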