Re: saveAsTextFile error

2014-11-14 Thread Harold Nguyen
Hi Niko, It looks like you are calling a method that does not exist on DStream. Check out: https://spark.apache.org/docs/1.1.0/streaming-programming-guide.html#output-operations-on-dstreams for the method "saveAsTextFiles" Harold On Fri, Nov 14, 2014 at 10:39 AM, Niko Gamulin wr
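
For reference, a minimal sketch of that output operation, assuming Spark 1.1-era APIs; the socket source and output path are placeholders:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf().setAppName("SaveExample")
val ssc = new StreamingContext(conf, Seconds(10))

val lines = ssc.socketTextStream("localhost", 9999)

// saveAsTextFiles (plural) is the DStream operation; it writes each batch
// to files named <prefix>-<batchTime>.<suffix>
lines.saveAsTextFiles("output/batch", "txt")

ssc.start()
ssc.awaitTermination()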

Re: Can spark read and write to cassandra without HDFS?

2014-11-12 Thread Harold Nguyen
Hi Kevin, Yes, Spark can read and write to Cassandra without Hadoop. Have you seen this: https://github.com/datastax/spark-cassandra-connector Harold On Wed, Nov 12, 2014 at 9:28 PM, Kevin Burton wrote: > We have all our data in Cassandra so I’d prefer to not have to bring up > Hadoo
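
A rough sketch of what the connector's read/write API looks like; the keyspace, table, and host below are made-up placeholders, and the connector jar must be on the classpath:

import com.datastax.spark.connector._
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("CassandraExample")
  .set("spark.cassandra.connection.host", "127.0.0.1")
val sc = new SparkContext(conf)

// Read a Cassandra table as an RDD of CassandraRow, no HDFS involved
val rows = sc.cassandraTable("test_ks", "words")

// Write a pair RDD back to another table
sc.parallelize(Seq(("foo", 1), ("bar", 2)))
  .saveToCassandra("test_ks", "word_counts", SomeColumns("word", "count"))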

Spark Streaming - Most popular Twitter Hashtags

2014-11-03 Thread Harold Nguyen
managed to do this, and was willing to share as an example :) This seems to be the exact use case that will help me! Thanks! Harold
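
For anyone searching later, a sketch along the lines of the TwitterPopularTags example that ships with Spark; the window size and output are illustrative, and it assumes an existing StreamingContext ssc, the spark-streaming-twitter dependency, and twitter4j OAuth system properties already set:

import org.apache.spark.SparkContext._
import org.apache.spark.streaming.Seconds
import org.apache.spark.streaming.StreamingContext._
import org.apache.spark.streaming.twitter.TwitterUtils

val stream = TwitterUtils.createStream(ssc, None)

// Pull hashtags out of each tweet's text
val hashTags = stream.flatMap(status => status.getText.split(" ").filter(_.startsWith("#")))

// Count per 60-second window, then sort each batch by count
val topCounts = hashTags.map((_, 1))
  .reduceByKeyAndWindow(_ + _, Seconds(60))
  .map { case (tag, count) => (count, tag) }
  .transform(_.sortByKey(false))

topCounts.foreachRDD { rdd =>
  rdd.take(10).foreach { case (count, tag) => println(s"$tag ($count tweets)") }
}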

Re: Manipulating RDDs within a DStream

2014-10-31 Thread Harold Nguyen
. However, for all the examples I've seen, inserting into Cassandra is something like: val collection = sc.parallelize(Seq("foo", "bar")) Where "foo" and "bar" could be elements in the arr array. So I would like to know how to insert into Cassandra at the
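
With the connector, the sc.parallelize step isn't needed inside the stream at all; the batch RDD (or the DStream itself) can be saved directly. A sketch, where wordCounts stands in for a hypothetical DStream[(String, Int)] and the keyspace, table, and column names are placeholders:

import com.datastax.spark.connector._
import com.datastax.spark.connector.streaming._

// Save each batch RDD directly...
wordCounts.foreachRDD { rdd =>
  rdd.saveToCassandra("test_ks", "word_counts", SomeColumns("word", "count"))
}

// ...or, via the streaming implicits, save the DStream itself
wordCounts.saveToCassandra("test_ks", "word_counts", SomeColumns("word", "count"))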

Re: Manipulating RDDs within a DStream

2014-10-31 Thread Harold Nguyen
Hi Helena, Thanks very much! I'm using Spark 1.1.0, and spark-cassandra-connector-assembly-1.2.0-SNAPSHOT Best wishes, Harold On Fri, Oct 31, 2014 at 10:31 AM, Helena Edelson < helena.edel...@datastax.com> wrote: > Hi Harold, > Can you include the versions of spark an

Manipulating RDDs within a DStream

2014-10-30 Thread Harold Nguyen
g like that, but as you know, I can't do this within the "foreachRDD" but only at the driver level. How do I use the "arr" variable to do something like that? Thanks for any help, Harold

Re: Manipulating RDDs within a DStream

2014-10-30 Thread Harold Nguyen
Hi, Sorry, there's a typo there: val arr = rdd.toArray Harold On Thu, Oct 30, 2014 at 9:58 AM, Harold Nguyen wrote: > Hi all, > > I'd like to be able to modify values in a DStream, and then send it off to > an external source like Cassandra, but I keep getting Seriali

NonSerializable Exception in foreachRDD

2014-10-30 Thread Harold Nguyen
Hi all, In Spark Streaming, when I do "foreachRDD" on my DStreams, I get a NonSerializable exception when I try to do something like: DStream.foreachRDD( rdd => { sc.parallelize(Seq(("test", "blah"))) }) Is there any way around that? Thanks, Harold
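
One pattern that sidesteps this: the function given to foreachRDD runs on the driver, but anything it captures can get dragged into serialized closures, and the SparkContext itself is not serializable. Working with the rdd argument directly avoids referencing sc. A sketch, assuming dstream is a DStream[(String, String)]:

dstream.foreachRDD { rdd =>
  // Transform the batch RDD that foreachRDD hands you, rather than
  // building a new RDD from the (non-serializable) SparkContext
  val upper = rdd.map { case (k, v) => (k, v.toUpperCase) }
  upper.foreach(pair => println(pair)) // executes on the executors
}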

Re: Convert DStream to String

2014-10-29 Thread Harold Nguyen
As an example, here is a line: "hello world you are SECRETWORDthebest hello world" And it should do this: (SECRETWORDthebest_hello, 2), (SECRETWORDthebest_world, 2), (SECRETWORDthebest_you, 1), etc... Harold On Wed, Oct 29, 2014 at 3:36 PM, Sean Owen wrote: > What would it
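
A sketch of one way to get that output per batch, assuming lines is a DStream[String] and taking the literal token match as given:

import org.apache.spark.streaming.StreamingContext._ // pair-DStream implicits in Spark 1.1

val counts = lines.flatMap { line =>
  val tokens = line.split(" ").toSeq
  // Find the token carrying the secret word, then pair it with every other word
  tokens.find(_.startsWith("SECRETWORD")) match {
    case Some(secret) => tokens.filterNot(_ == secret).map(w => (secret + "_" + w, 1))
    case None => Seq.empty[(String, Int)]
  }
}.reduceByKey(_ + _)

counts.print()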

Convert DStream to String

2014-10-29 Thread Harold Nguyen
+"_"+word, 1)) Thanks for any help, Harold

Re: Spark Streaming with Kinesis

2014-10-29 Thread Harold Nguyen
nsResolver;)V It looks very similar to this post: http://stackoverflow.com/questions/24788949/nosuchmethoderror-while-running-aws-s3-client-on-spark-while-javap-shows-otherwi Since I'm a little new to everything, would someone be able to provide step-by-step guidance for that? Harold

Spark Streaming with Kinesis

2014-10-29 Thread Harold Nguyen
Hi all, I followed the guide here: http://spark.apache.org/docs/latest/streaming-kinesis-integration.html But got this error: Exception in thread "main" java.lang.NoClassDefFoundError: com/amazonaws/auth/AWSCredentialsProvider Would you happen to know what dependency or jar is needed? Harold
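
That class comes from the AWS SDK, which the Kinesis ASL module pulls in transitively; for a Spark 1.1.0 build the usual missing piece is the dependency below (the version is an assumption chosen to match the Spark version, adjust as needed):

libraryDependencies += "org.apache.spark" %% "spark-streaming-kinesis-asl" % "1.1.0"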

Re: Spark Streaming from Kafka

2014-10-29 Thread harold
Looks like the kafka jar that you are using isn't compatible with your >> scala version. >> Thanks >> Best Regards >> On Wed, Oct 29, 2014 at 11:50 AM, Harold Nguyen wrote: >>> Hi, >>> >>> Just wondering if you've seen the following error when rea

Re: Spark Streaming from Kafka

2014-10-29 Thread harold
Thanks! How do I find out which Kafka jar to use for scala 2.10.4? — Sent from Mailbox On Wed, Oct 29, 2014 at 12:26 AM, Akhil Das wrote: > Looks like the kafka jar that you are using isn't compatible with your > scala version. > Thanks > Best Regards > On Wed, Oct 29, 2014
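
The Kafka artifact name encodes the Scala binary version; with %% sbt picks the right suffix from scalaVersion, so for Scala 2.10.4 something like this (the Spark version here is an assumption, match it to your cluster):

// resolves to spark-streaming-kafka_2.10 when scalaVersion := "2.10.4"
libraryDependencies += "org.apache.spark" %% "spark-streaming-kafka" % "1.1.0"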

Spark Streaming from Kafka

2014-10-28 Thread Harold Nguyen
a:355) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:354) at java.lang.ClassLoader.loadClass(ClassLoader.java:425) at java.lang.ClassLoader.loadClass(ClassLoader.java:358) ... 18 more Thanks, Harold

Re: Including jars in Spark-shell vs Spark-submit

2014-10-28 Thread Harold Nguyen
at now, and I couldn't be happier. I had to piece together 6 different forums and sites to get that working (being absolutely new to Spark and Scala and sbt). I'll write a blog post on how to get this working later, in case it can help someone. I really appreciate the help! Harold On Tu

Including jars in Spark-shell vs Spark-submit

2014-10-28 Thread Harold Nguyen
-cassandra-connector/target/scala-2.10/spark-cassandra-connector-assembly-1.2.0-SNAPSHOT.jar Am I issuing the spark-submit command incorrectly? Each of the workers has that built jar in their respective directories (spark-cassandra-connector-assembly-1.2.0-SNAPSHOT.jar) Thanks, Harold
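
For comparison, the usual shape of the command; the class name, master URL, and application jar path are placeholders. Note that --jars ships the listed jars to the executors automatically, so the workers don't each need a local copy:

spark-submit \
  --class com.example.MyStreamingApp \
  --master spark://master:7077 \
  --jars spark-cassandra-connector/target/scala-2.10/spark-cassandra-connector-assembly-1.2.0-SNAPSHOT.jar \
  target/scala-2.10/simple-streaming_2.10-1.0.jar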

Saving to Cassandra from Spark Streaming

2014-10-28 Thread Harold Nguyen
he error: Exception in thread "main" java.lang.NoClassDefFoundError: com/datastax/spark/connector/mapper/ColumnMapper Thanks, Harold

Re: Spark Streaming into Cassandra - NoClass ColumnMapper

2014-10-27 Thread Harold Nguyen
a) On Mon, Oct 27, 2014 at 9:22 PM, Harold Nguyen wrote: > Hi Spark friends, > > I'm trying to connect Spark Streaming into Cassandra by modifying the > NetworkWordCount.scala streaming example, and doing the "make as few > changes as possible" but having it insert d

Spark Streaming into Cassandra - NoClass ColumnMapper

2014-10-27 Thread Harold Nguyen
is my sbt build file:
==
name := "Simple Streaming"

version := "1.0"

scalaVersion := "2.10.4"

libraryDependencies ++= Seq(
  "org.apache.spark" % "spark-streaming_2.10" % "1.1.0",
  "com.datastax.spark" %% "spark-cassandra-connector" % "1.1.0-alpha3" withSources() withJavadoc(),
  "org.apache.spark" %% "spark-sql" % "1.1.0"
)
=
Any help would be appreciated! Thanks so much! Harold
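
A build file like that compiles, but at runtime the connector classes still have to reach the driver and executors; one common fix is a fat jar via sbt-assembly (the plugin version below is from memory for an sbt 0.13-era build), then submitting the assembly jar, or passing the connector jar with --jars as above:

// project/plugins.sbt
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.11.2")

Running "sbt assembly" then produces a single jar with the dependencies bundled.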