Unsubscribe

2022-08-10 Thread Shrikar archak
unsubscribe


Re: Shark Vs Spark SQL

2014-07-02 Thread Shrikar archak
As of Spark Summit 2014, it was announced that there will be no further active
development on Shark.

Thanks,
Shrikar


On Wed, Jul 2, 2014 at 3:53 PM, Subacini B  wrote:

> Hi,
>
>
> http://mail-archives.apache.org/mod_mbox/spark-user/201403.mbox/%3cb75376b8-7a57-4161-b604-f919886cf...@gmail.com%3E
>
> This says that the Shark backend will be replaced with the Spark SQL engine
> in the future.
> Does that mean Spark will continue to support Shark + Spark SQL for the long
> term, or will Shark be decommissioned after some period?
>
> Thanks
> Subacini
>


Re: Possible approaches for adding extra metadata (Spark Streaming)?

2014-06-20 Thread Shrikar archak
Thanks Mayur and TD for your inputs.

~Shrikar


On Fri, Jun 20, 2014 at 1:20 PM, Tathagata Das 
wrote:

> If the metadata is directly related to each individual record, then it can
> be done either way. Since I am not sure how easy or hard it will be for you
> to add tags before putting the data into Spark Streaming, it's hard to
> recommend one method over the other.
>
> However, if the metadata is related to each key (the key on which you are
> calling updateStateByKey) and not to every record, then it may be more
> efficient to maintain that per-key metadata in updateStateByKey's state
> object.
>
> Regarding the HTTP calls, I would be a bit cautious about performance.
> Doing an HTTP call for every record is going to be quite expensive and will
> reduce throughput significantly. If possible, cache values as much as
> possible to amortize the cost of the HTTP calls.
>
> TD
>
>
>
>
>
> On Fri, Jun 20, 2014 at 11:16 AM, Shrikar archak 
> wrote:
>
>> Hi All,
>>
>> I was curious to know which of the two approaches is better for doing
>> analytics using Spark Streaming. Let's say we want to add some metadata to
>> the stream being processed, such as sentiment or tags, and then perform
>> some analytics using this added metadata.
>>
>> 1) Is it OK to make an HTTP call and add the extra information to the
>> stream being processed inside the updateByKeyAndWindow operations?
>>
>> 2) Add the sentiment/tags beforehand and then stream the enriched records
>> through DStreams?
>>
>> Thanks,
>> Shrikar
>>
>>
>
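A rough sketch of the per-key approach TD describes, with the HTTP enrichment
cached so repeated keys in a batch do not trigger repeated calls (assumed
names: pairs is a DStream[(String, Int)] and enrichOverHttp stands in for
whatever HTTP call adds the sentiment/tags; neither comes from this thread):

import org.apache.spark.streaming.StreamingContext._
import org.apache.spark.streaming.dstream.DStream

case class KeyState(count: Long, tags: Set[String])

// Enrich each record, caching the HTTP result per partition so a given key
// is looked up at most once per batch partition.
val enriched: DStream[(String, (Int, Set[String]))] =
  pairs.mapPartitions { iter =>
    val cache = scala.collection.mutable.Map.empty[String, Set[String]]
    iter.map { case (key, value) =>
      (key, (value, cache.getOrElseUpdate(key, enrichOverHttp(key))))
    }
  }

// Keep the per-key metadata in updateStateByKey's state object
// (updateStateByKey requires ssc.checkpoint(...) to be set).
val stateful: DStream[(String, KeyState)] =
  enriched.updateStateByKey[KeyState] { (values, state) =>
    val prev = state.getOrElse(KeyState(0L, Set.empty))
    Some(KeyState(prev.count + values.map(_._1).sum,
                  prev.tags ++ values.flatMap(_._2)))
  }

Even cached, one call per distinct key per batch adds latency, so pre-tagging
the data before it enters Spark Streaming (option 2 in the original question)
may still be the safer choice when throughput matters.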


Possible approaches for adding extra metadata (Spark Streaming)?

2014-06-20 Thread Shrikar archak
Hi All,

I was curious to know which of the two approaches is better for doing
analytics using Spark Streaming. Let's say we want to add some metadata to
the stream being processed, such as sentiment or tags, and then perform some
analytics using this added metadata.

1) Is it OK to make an HTTP call and add the extra information to the stream
being processed inside the updateByKeyAndWindow operations?

2) Add the sentiment/tags beforehand and then stream the enriched records
through DStreams?

Thanks,
Shrikar


Re: How do you run your spark app?

2014-06-20 Thread Shrikar archak
Hi Shivani,

I use sbt-assembly to create a fat jar:
https://github.com/sbt/sbt-assembly

An example sbt build file is below.

import AssemblyKeys._ // put this at the top of the file

assemblySettings

mainClass in assembly := Some("FifaSparkStreaming")

name := "FifaSparkStreaming"

version := "1.0"

scalaVersion := "2.10.4"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "1.0.0" % "provided",
  "org.apache.spark" %% "spark-streaming" % "1.0.0" % "provided",
  ("org.apache.spark" %% "spark-streaming-twitter" % "1.0.0")
    .exclude("org.eclipse.jetty.orbit", "javax.transaction")
    .exclude("org.eclipse.jetty.orbit", "javax.servlet")
    .exclude("org.eclipse.jetty.orbit", "javax.mail.glassfish")
    .exclude("org.eclipse.jetty.orbit", "javax.activation")
    .exclude("com.esotericsoftware.minlog", "minlog"),
  ("net.debasishg" % "redisclient_2.10" % "2.12")
    .exclude("com.typesafe.akka", "akka-actor_2.10")
)

mergeStrategy in assembly <<= (mergeStrategy in assembly) { (old) =>
  {
    case PathList("javax", "servlet", xs @ _*) => MergeStrategy.first
    case PathList("org", "apache", xs @ _*) => MergeStrategy.first
    case "application.conf" => MergeStrategy.concat
    case "unwanted.txt" => MergeStrategy.discard
    case x => old(x)
  }
}


resolvers += "Akka Repository" at "http://repo.akka.io/releases/";


And I run it as shown below.

LOCALLY:
1)  sbt 'run AP1z4IYraYm5fqWhITWArY53x
Cyyz3Zr67tVK46G8dus5tSbc83KQOdtMDgYoQ5WLQwH0mTWzB6
115254720-OfJ4yFsUU6C6vBkEOMDlBlkIgslPleFjPwNcxHjN
Qd76y2izncM7fGGYqU1VXYTxg1eseNuzcdZKm2QJyK8d1 fifa fifa2014'

If you want to submit it to the cluster:

CLUSTER:
2) spark-submit --class FifaSparkStreaming --master
"spark://server-8-144:7077" --driver-memory 2048 --deploy-mode cluster
FifaSparkStreaming-assembly-1.0.jar AP1z4IYraYm5fqWhITWArY53x
Cyyz3Zr67tVK46G8dus5tSbc83KQOdtMDgYoQ5WLQwH0mTWzB6
115254720-OfJ4yFsUU6C6vBkEOMDlBlkIgslPleFjPwNcxHjN
Qd76y2izncM7fGGYqU1VXYTxg1eseNuzcdZKm2QJyK8d1 fifa fifa2014


Hope this helps.

Thanks,
Shrikar


On Fri, Jun 20, 2014 at 9:16 AM, Shivani Rao  wrote:

> Hello Michael,
>
> I have a quick question for you. Can you clarify the statement "build fat
> JARs and build dist-style TAR.GZ packages with launch scripts, JARs and
> everything needed to run a Job"? Can you give an example?
>
> I am using sbt assembly as well to create a fat jar, and I am supplying the
> Spark and Hadoop locations on the classpath. Inside the main() function
> where the SparkContext is created, I use SparkContext.jarOfClass(this).toList
> to add the fat jar to my SparkContext. However, I seem to be running into
> issues with this approach. I was wondering if you had any input, Michael.
>
> Thanks,
> Shivani
>
>
> On Thu, Jun 19, 2014 at 10:57 PM, Sonal Goyal 
> wrote:
>
>> We use maven for building our code and then invoke spark-submit through
>> the exec plugin, passing in our parameters. Works well for us.
>>
>> Best Regards,
>> Sonal
>> Nube Technologies 
>>
>> 
>>
>>
>>
>>
>> On Fri, Jun 20, 2014 at 3:26 AM, Michael Cutler 
>> wrote:
>>
>>> P.S. Last but not least, we use sbt-assembly to build fat JARs and build
>>> dist-style TAR.GZ packages with launch scripts, JARs and everything needed
>>> to run a Job.  These are automatically built from source by our Jenkins and
>>> stored in HDFS.  Our Chronos/Marathon jobs fetch the latest release TAR.GZ
>>> direct from HDFS, unpack it and launch the appropriate script.
>>>
>>> It makes for a much cleaner development / testing / deployment process to
>>> package everything required in one go instead of relying on cluster-specific
>>> classpath additions or any add-jars functionality.
>>>
>>>
>>> On 19 June 2014 22:53, Michael Cutler  wrote:
>>>
 When you start seriously using Spark in production there are basically
 two things everyone eventually needs:

1. Scheduled Jobs - recurring hourly/daily/weekly jobs.
2. Always-On Jobs - that require monitoring, restarting etc.

 There are lots of ways to implement these requirements, everything from
 crontab through to workflow managers like Oozie.

 We opted for the following stack:

- Apache Mesos  (mesosphere.io distribution)


- Marathon  - init/control
system for starting, stopping, and maintaining always-on applications.


- Chronos  - general-purpose
scheduler for Mesos, supports job dependency graphs.


- ** Spark Job Server  -
primarily for its ability to reuse shared contexts with multiple jobs

 The majority of our jobs are periodic (batch) jobs run through

Possible approaches for adding extra metadata (Spark Streaming)

2014-06-19 Thread Shrikar archak
Hi All,

I was curious to know which of the two approaches is better for doing
analytics using Spark Streaming. Let's say we want to add some metadata to
the stream being processed, such as sentiment or tags, and then perform some
analytics using this added metadata.

1) Is it OK to make an HTTP call and add the extra information to the stream
being processed inside the updateByKeyAndWindow operations?

2) Add the sentiment/tags beforehand and then stream the enriched records
through DStreams?

Thanks,
Shrikar


SaveAsTextfile per day instead of window?

2014-06-09 Thread Shrikar archak
Hi All,

Is there a way to store the streamed data as text files per day instead of
per window?

Thanks,
Shrikar
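One possible approach (a sketch, not an answer that appeared on the list; it
assumes a DStream named counts, which stands for whatever stream is being
written): use foreachRDD and build the output path from the batch time, so
each batch lands under a per-day directory.

import java.text.SimpleDateFormat
import java.util.Date

val dayFormat = new SimpleDateFormat("yyyy-MM-dd")

// Write each batch under output/<day>/, so files are grouped by day rather
// than by window. `time` is the batch time supplied by Spark Streaming.
counts.foreachRDD { (rdd, time) =>
  val day = dayFormat.format(new Date(time.milliseconds))
  rdd.saveAsTextFile("output/" + day + "/batch-" + time.milliseconds)
}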


Spark Streaming union expected behaviour?

2014-06-08 Thread Shrikar archak
Hi All,

I was writing a simple streaming job to get a better understanding of Spark
Streaming, and I do not understand the union behaviour in this particular
case.

*WORKS:*
val lines = ssc.socketTextStream("localhost", ,
StorageLevel.MEMORY_AND_DISK_SER)
val words = lines.flatMap(_.split(" "))
val wordCounts = words.map(x => (x, 1)).reduceByKey(_ + _)
wordCounts.print()
wordCounts.saveAsTextFiles("all")

This works as expected, and the streams are stored as files.


*DOESN'T WORK*
val lines = ssc.socketTextStream("localhost", ,
StorageLevel.MEMORY_AND_DISK_SER)
val lines1 = ssc.socketTextStream("localhost", 1,
StorageLevel.MEMORY_AND_DISK_SER)
   * val words = lines.union(lines1).flatMap(_.split(" "))*


val wordCounts = words.map(x => (x, 1)).reduceByKey(_ + _)
wordCounts.print()
wordCounts.saveAsTextFiles("all")

In the above case neither the messages are printed nor the files are saved.
Am I doing something wrong here?

Thanks,
Shrikar
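A likely cause (a guess, not something confirmed on the list): each
socketTextStream creates a receiver that permanently occupies one thread, so
with two input streams a local run needs at least three threads, otherwise no
thread is left to process the batches and nothing is printed or saved. A
minimal sketch of starting the context with enough threads:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Two socket receivers occupy two threads permanently, so at least one more
// is needed for the actual batch processing. The batch interval here is an
// arbitrary value for the sketch.
val conf = new SparkConf().setMaster("local[3]").setAppName("UnionWordCount")
val ssc = new StreamingContext(conf, Seconds(10))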


Re: Unable to run a Standalone job([NOT FOUND ] org.eclipse.jetty.orbit#javax.mail.glassfish;1.4.1.v201005082020)

2014-06-05 Thread Shrikar archak
Hi Prabeesh/ Sean,

I tried both of the steps you mentioned; it looks like it is still not able
to resolve the artifacts.

[warn] [NOT FOUND  ]
org.eclipse.jetty.orbit#javax.transaction;1.1.1.v201105210645!javax.transaction.orbit
(131ms)
[warn]  public: tried
[warn]
http://repo1.maven.org/maven2/org/eclipse/jetty/orbit/javax.transaction/1.1.1.v201105210645/javax.transaction-1.1.1.v201105210645.orbit
[warn] [NOT FOUND  ]
org.eclipse.jetty.orbit#javax.servlet;3.0.0.v201112011016!javax.servlet.orbit
(225ms)
[warn]  public: tried
[warn]
http://repo1.maven.org/maven2/org/eclipse/jetty/orbit/javax.servlet/3.0.0.v201112011016/javax.servlet-3.0.0.v201112011016.orbit
[warn] [NOT FOUND  ]
org.eclipse.jetty.orbit#javax.mail.glassfish;1.4.1.v201005082020!javax.mail.glassfish.orbit
(214ms)
[warn]  public: tried
[warn]
http://repo1.maven.org/maven2/org/eclipse/jetty/orbit/javax.mail.glassfish/1.4.1.v201005082020/javax.mail.glassfish-1.4.1.v201005082020.orbit
[warn] [NOT FOUND  ]
org.eclipse.jetty.orbit#javax.activation;1.1.0.v201105071233!javax.activation.orbit
(112ms)
[warn]  public: tried

Thanks,
Shrikar
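One workaround that circulated at the time for this orbit packaging problem
(an assumption, not a fix confirmed in this thread) was to declare the orbit
dependency explicitly and force sbt/Ivy to fetch it as a plain jar, and
similarly for the other orbit artifacts, for example:

libraryDependencies += "org.eclipse.jetty.orbit" % "javax.servlet" % "3.0.0.v201112011016" artifacts Artifact("javax.servlet", "jar", "jar")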


On Thu, Jun 5, 2014 at 1:27 AM, prabeesh k  wrote:

> Try running sbt clean before building the app.
>
> Or delete the .ivy2 and .sbt folders (not a good method), then try to
> rebuild the project.
>
>
> On Thu, Jun 5, 2014 at 11:45 AM, Sean Owen  wrote:
>
>> I think this is SPARK-1949 again:
>> https://github.com/apache/spark/pull/906
>> I think this change fixed this issue for a few people using the SBT
>> build, worth committing?
>>
>> On Thu, Jun 5, 2014 at 6:40 AM, Shrikar archak 
>> wrote:
>> > Hi All,
>> > Now that Spark 1.0.0 is released, there should not be any problem
>> > with the local jars.
>> > Shrikars-MacBook-Pro:SimpleJob shrikar$ cat simple.sbt
>> > name := "Simple Project"
>> >
>> > version := "1.0"
>> >
>> > scalaVersion := "2.10.4"
>> >
>> > libraryDependencies ++= Seq("org.apache.spark" %% "spark-core" %
>> "1.0.0",
>> > "org.apache.spark" %% "spark-streaming" %
>> > "1.0.0")
>> >
>> > resolvers += "Akka Repository" at "http://repo.akka.io/releases/";
>> >
>> > I am still having this issue
>> > [error] (run-main) java.lang.NoClassDefFoundError:
>> > javax/servlet/http/HttpServletResponse
>> > java.lang.NoClassDefFoundError: javax/servlet/http/HttpServletResponse
>> > at org.apache.spark.HttpServer.start(HttpServer.scala:54)
>> > at
>> >
>> org.apache.spark.broadcast.HttpBroadcast$.createServer(HttpBroadcast.scala:156)
>> > at
>> >
>> org.apache.spark.broadcast.HttpBroadcast$.initialize(HttpBroadcast.scala:127)
>> > at
>> >
>> org.apache.spark.broadcast.HttpBroadcastFactory.initialize(HttpBroadcastFactory.scala:31)
>> > at
>> >
>> org.apache.spark.broadcast.BroadcastManager.initialize(BroadcastManager.scala:48)
>> > at
>> >
>> org.apache.spark.broadcast.BroadcastManager.(BroadcastManager.scala:35)
>> > at org.apache.spark.SparkEnv$.create(SparkEnv.scala:218)
>> > at org.apache.spark.SparkContext.(SparkContext.scala:202)
>> >
>> > Any help would be greatly appreciated.
>> >
>> > Thanks,
>> > Shrikar
>> >
>> >
>> > On Fri, May 23, 2014 at 3:58 PM, Shrikar archak 
>> wrote:
>> >>
>> >> Still the same error no change
>> >>
>> >> Thanks,
>> >> Shrikar
>> >>
>> >>
>> >> On Fri, May 23, 2014 at 2:38 PM, Jacek Laskowski 
>> wrote:
>> >>>
>> >>> Hi Shrikar,
>> >>>
>> >>> How did you build Spark 1.0.0-SNAPSHOT on your machine? My
>> >>> understanding is that `sbt publishLocal` is not enough and you really
>> >>> need `sbt assembly` instead. Give it a try and report back.
>> >>>
>> >>> As to your build.sbt, upgrade Scala to 2.10.4 and "org.apache.spark"
>> >>> %% "spark-streaming" % "1.0.0-SNAPSHOT" only that will pull down
>> >>> spark-core as a transitive dep. The resolver for Akka Repository is
>> >>> not needed. Your build.sbt should really look as follows:
>> >>>
>> >>> name := "Simple Project"
>> >>>
>> >>> version := "1.0"
>> >>>
>> >>> scalaVersion := "2.10.4"
>> &g

Re: Unable to run a Standalone job

2014-06-04 Thread Shrikar archak
Hi All,
Now that Spark 1.0.0 is released, there should not be any problem with the
local jars.
Shrikars-MacBook-Pro:SimpleJob shrikar$ cat simple.sbt
name := "Simple Project"

version := "1.0"

scalaVersion := "2.10.4"

libraryDependencies ++= Seq("org.apache.spark" %% "spark-core" % "1.0.0",
"org.apache.spark" %% "spark-streaming" %
"1.0.0")

resolvers += "Akka Repository" at "http://repo.akka.io/releases/";

I am still having this issue
[error] (run-main) java.lang.NoClassDefFoundError:
javax/servlet/http/HttpServletResponse
java.lang.NoClassDefFoundError: javax/servlet/http/HttpServletResponse
 at org.apache.spark.HttpServer.start(HttpServer.scala:54)
at
org.apache.spark.broadcast.HttpBroadcast$.createServer(HttpBroadcast.scala:156)
 at
org.apache.spark.broadcast.HttpBroadcast$.initialize(HttpBroadcast.scala:127)
at
org.apache.spark.broadcast.HttpBroadcastFactory.initialize(HttpBroadcastFactory.scala:31)
 at
org.apache.spark.broadcast.BroadcastManager.initialize(BroadcastManager.scala:48)
at
org.apache.spark.broadcast.BroadcastManager.(BroadcastManager.scala:35)
 at org.apache.spark.SparkEnv$.create(SparkEnv.scala:218)
at org.apache.spark.SparkContext.(SparkContext.scala:202)

Any help would be greatly appreciated.

Thanks,
Shrikar
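A workaround often suggested for this NoClassDefFoundError (not a fix
confirmed in this thread) is to add an explicit servlet API dependency to
simple.sbt so that javax.servlet.http.HttpServletResponse is on the runtime
classpath, for example:

libraryDependencies += "javax.servlet" % "javax.servlet-api" % "3.0.1"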


On Fri, May 23, 2014 at 3:58 PM, Shrikar archak  wrote:

> Still the same error no change
>
> Thanks,
> Shrikar
>
>
> On Fri, May 23, 2014 at 2:38 PM, Jacek Laskowski  wrote:
>
>> Hi Shrikar,
>>
>> How did you build Spark 1.0.0-SNAPSHOT on your machine? My
>> understanding is that `sbt publishLocal` is not enough and you really
>> need `sbt assembly` instead. Give it a try and report back.
>>
>> As to your build.sbt, upgrade Scala to 2.10.4 and "org.apache.spark"
>> %% "spark-streaming" % "1.0.0-SNAPSHOT" only that will pull down
>> spark-core as a transitive dep. The resolver for Akka Repository is
>> not needed. Your build.sbt should really look as follows:
>>
>> name := "Simple Project"
>>
>> version := "1.0"
>>
>> scalaVersion := "2.10.4"
>>
>> libraryDependencies += "org.apache.spark" %% "spark-streaming" %
>> "1.0.0-SNAPSHOT"
>>
>> Jacek
>>
>> On Thu, May 22, 2014 at 11:27 PM, Shrikar archak 
>> wrote:
>> > Hi All,
>> >
>> > I am trying to run the network count example as a seperate standalone
>> job
>> > and running into some issues.
>> >
>> > Environment:
>> > 1) Mac Mavericks
>> > 2) Latest spark repo from Github.
>> >
>> >
>> > I have a structure like this
>> >
>> > Shrikars-MacBook-Pro:SimpleJob shrikar$ find .
>> > .
>> > ./simple.sbt
>> > ./src
>> > ./src/main
>> > ./src/main/scala
>> > ./src/main/scala/NetworkWordCount.scala
>> > ./src/main/scala/SimpleApp.scala.bk
>> >
>> >
>> > simple.sbt
>> > name := "Simple Project"
>> >
>> > version := "1.0"
>> >
>> > scalaVersion := "2.10.3"
>> >
>> > libraryDependencies ++= Seq("org.apache.spark" %% "spark-core" %
>> > "1.0.0-SNAPSHOT",
>> > "org.apache.spark" %% "spark-streaming" %
>> > "1.0.0-SNAPSHOT")
>> >
>> > resolvers += "Akka Repository" at "http://repo.akka.io/releases/";
>> >
>> >
>> > I am able to run the SimpleApp which is mentioned in the doc but when I
>> try
>> > to run the NetworkWordCount app I get error like this am I missing
>> > something?
>> >
>> > [info] Running com.shrikar.sparkapps.NetworkWordCount
>> > 14/05/22 14:26:47 INFO spark.SecurityManager: Changing view acls to:
>> shrikar
>> > 14/05/22 14:26:47 INFO spark.SecurityManager: SecurityManager:
>> > authentication disabled; ui acls disabled; users with view permissions:
>> > Set(shrikar)
>> > 14/05/22 14:26:48 INFO slf4j.Slf4jLogger: Slf4jLogger started
>> > 14/05/22 14:26:48 INFO Remoting: Starting remoting
>> > 14/05/22 14:26:48 INFO Remoting: Remoting started; listening on
>> addresses
>> > :[akka.tcp://spark@192.168.10.88:49963]
>> > 14/05/22 14:26:48 INFO Remoting: Remoting now listens on addresses:
>> > [akka.tcp://spark@192.168.10.88:49963]
>> > 14/05/22 14:26:48 INFO spark.SparkEnv: Regis

Re: Unable to run a Standalone job

2014-05-23 Thread Shrikar archak
Still the same error no change

Thanks,
Shrikar


On Fri, May 23, 2014 at 2:38 PM, Jacek Laskowski  wrote:

> Hi Shrikar,
>
> How did you build Spark 1.0.0-SNAPSHOT on your machine? My
> understanding is that `sbt publishLocal` is not enough and you really
> need `sbt assembly` instead. Give it a try and report back.
>
> As to your build.sbt, upgrade Scala to 2.10.4 and depend only on
> "org.apache.spark" %% "spark-streaming" % "1.0.0-SNAPSHOT"; that will pull
> down spark-core as a transitive dep. The resolver for the Akka Repository is
> not needed. Your build.sbt should really look as follows:
>
> name := "Simple Project"
>
> version := "1.0"
>
> scalaVersion := "2.10.4"
>
> libraryDependencies += "org.apache.spark" %% "spark-streaming" %
> "1.0.0-SNAPSHOT"
>
> Jacek
>
> On Thu, May 22, 2014 at 11:27 PM, Shrikar archak 
> wrote:
> > Hi All,
> >
> > I am trying to run the network count example as a seperate standalone job
> > and running into some issues.
> >
> > Environment:
> > 1) Mac Mavericks
> > 2) Latest spark repo from Github.
> >
> >
> > I have a structure like this
> >
> > Shrikars-MacBook-Pro:SimpleJob shrikar$ find .
> > .
> > ./simple.sbt
> > ./src
> > ./src/main
> > ./src/main/scala
> > ./src/main/scala/NetworkWordCount.scala
> > ./src/main/scala/SimpleApp.scala.bk
> >
> >
> > simple.sbt
> > name := "Simple Project"
> >
> > version := "1.0"
> >
> > scalaVersion := "2.10.3"
> >
> > libraryDependencies ++= Seq("org.apache.spark" %% "spark-core" %
> > "1.0.0-SNAPSHOT",
> > "org.apache.spark" %% "spark-streaming" %
> > "1.0.0-SNAPSHOT")
> >
> > resolvers += "Akka Repository" at "http://repo.akka.io/releases/";
> >
> >
> > I am able to run the SimpleApp which is mentioned in the doc but when I
> try
> > to run the NetworkWordCount app I get error like this am I missing
> > something?
> >
> > [info] Running com.shrikar.sparkapps.NetworkWordCount
> > 14/05/22 14:26:47 INFO spark.SecurityManager: Changing view acls to:
> shrikar
> > 14/05/22 14:26:47 INFO spark.SecurityManager: SecurityManager:
> > authentication disabled; ui acls disabled; users with view permissions:
> > Set(shrikar)
> > 14/05/22 14:26:48 INFO slf4j.Slf4jLogger: Slf4jLogger started
> > 14/05/22 14:26:48 INFO Remoting: Starting remoting
> > 14/05/22 14:26:48 INFO Remoting: Remoting started; listening on addresses
> > :[akka.tcp://spark@192.168.10.88:49963]
> > 14/05/22 14:26:48 INFO Remoting: Remoting now listens on addresses:
> > [akka.tcp://spark@192.168.10.88:49963]
> > 14/05/22 14:26:48 INFO spark.SparkEnv: Registering MapOutputTracker
> > 14/05/22 14:26:48 INFO spark.SparkEnv: Registering BlockManagerMaster
> > 14/05/22 14:26:48 INFO storage.DiskBlockManager: Created local directory
> at
> >
> /var/folders/r2/mbj08pb55n5d_9p8588xk5b0gn/T/spark-local-20140522142648-0a14
> > 14/05/22 14:26:48 INFO storage.MemoryStore: MemoryStore started with
> > capacity 911.6 MB.
> > 14/05/22 14:26:48 INFO network.ConnectionManager: Bound socket to port
> 49964
> > with id = ConnectionManagerId(192.168.10.88,49964)
> > 14/05/22 14:26:48 INFO storage.BlockManagerMaster: Trying to register
> > BlockManager
> > 14/05/22 14:26:48 INFO storage.BlockManagerInfo: Registering block
> manager
> > 192.168.10.88:49964 with 911.6 MB RAM
> > 14/05/22 14:26:48 INFO storage.BlockManagerMaster: Registered
> BlockManager
> > 14/05/22 14:26:48 INFO spark.HttpServer: Starting HTTP Server
> > [error] (run-main) java.lang.NoClassDefFoundError:
> > javax/servlet/http/HttpServletResponse
> > java.lang.NoClassDefFoundError: javax/servlet/http/HttpServletResponse
> > at org.apache.spark.HttpServer.start(HttpServer.scala:54)
> > at
> >
> org.apache.spark.broadcast.HttpBroadcast$.createServer(HttpBroadcast.scala:156)
> > at
> >
> org.apache.spark.broadcast.HttpBroadcast$.initialize(HttpBroadcast.scala:127)
> > at
> >
> org.apache.spark.broadcast.HttpBroadcastFactory.initialize(HttpBroadcastFactory.scala:31)
> > at
> >
> org.apache.spark.broadcast.BroadcastManager.initialize(BroadcastManager.scala:48)
> > at
> >
> org.apache.spark.broadcast.BroadcastManager.(BroadcastManager.scala:35)
> > at org.apache.spark.SparkEnv$.create(SparkEnv.scala:218)
> > at org.apa

Re: Unable to run a Standalone job

2014-05-22 Thread Shrikar archak
Hi,
I tried clearing the Maven and Ivy caches, and I am a bit confused at this
point:

1) Running the example from the Spark directory using bin/run-example works
fine and prints the word counts.

2) Running the same code as a separate job:
   *) With the latest 1.0.0-SNAPSHOT it doesn't work and throws an exception.
   *) With 0.9.1 it doesn't throw any exception but doesn't print any word
counts.

Thanks,
Shrikar


On Thu, May 22, 2014 at 9:19 PM, Soumya Simanta wrote:

> Try cleaning your maven (.m2) and ivy cache.
>
>
>
> On May 23, 2014, at 12:03 AM, Shrikar archak  wrote:
>
> Yes I did a sbt publish-local. Ok I will try with Spark 0.9.1.
>
> Thanks,
> Shrikar
>
>
> On Thu, May 22, 2014 at 8:53 PM, Tathagata Das <
> tathagata.das1...@gmail.com> wrote:
>
>> How are you getting Spark with 1.0.0-SNAPSHOT through maven? Did you
>> publish Spark locally which allowed you to use it as a dependency?
>>
>> This is weird indeed. SBT should take care of all the dependencies of
>> Spark.
>>
>> In any case, you can try the last released Spark 0.9.1 and see if the
>> problem persists.
>>
>>
>> On Thu, May 22, 2014 at 3:59 PM, Shrikar archak wrote:
>>
>>> I am running as sbt run. I am running it locally .
>>>
>>> Thanks,
>>> Shrikar
>>>
>>>
>>> On Thu, May 22, 2014 at 3:53 PM, Tathagata Das <
>>> tathagata.das1...@gmail.com> wrote:
>>>
>>>> How are you launching the application? sbt run ? spark-submit? local
>>>> mode or Spark standalone cluster? Are you packaging all your code into
>>>> a jar?
>>>> Looks to me that you seem to have spark classes in your execution
>>>> environment but missing some of Spark's dependencies.
>>>>
>>>> TD
>>>>
>>>>
>>>>
>>>> On Thu, May 22, 2014 at 2:27 PM, Shrikar archak 
>>>> wrote:
>>>> > Hi All,
>>>> >
>>>> > I am trying to run the network count example as a seperate standalone
>>>> job
>>>> > and running into some issues.
>>>> >
>>>> > Environment:
>>>> > 1) Mac Mavericks
>>>> > 2) Latest spark repo from Github.
>>>> >
>>>> >
>>>> > I have a structure like this
>>>> >
>>>> > Shrikars-MacBook-Pro:SimpleJob shrikar$ find .
>>>> > .
>>>> > ./simple.sbt
>>>> > ./src
>>>> > ./src/main
>>>> > ./src/main/scala
>>>> > ./src/main/scala/NetworkWordCount.scala
>>>> > ./src/main/scala/SimpleApp.scala.bk
>>>> >
>>>> >
>>>> > simple.sbt
>>>> > name := "Simple Project"
>>>> >
>>>> > version := "1.0"
>>>> >
>>>> > scalaVersion := "2.10.3"
>>>> >
>>>> > libraryDependencies ++= Seq("org.apache.spark" %% "spark-core" %
>>>> > "1.0.0-SNAPSHOT",
>>>> > "org.apache.spark" %% "spark-streaming" %
>>>> > "1.0.0-SNAPSHOT")
>>>> >
>>>> > resolvers += "Akka Repository" at "http://repo.akka.io/releases/";
>>>> >
>>>> >
>>>> > I am able to run the SimpleApp which is mentioned in the doc but when
>>>> I try
>>>> > to run the NetworkWordCount app I get error like this am I missing
>>>> > something?
>>>> >
>>>> > [info] Running com.shrikar.sparkapps.NetworkWordCount
>>>> > 14/05/22 14:26:47 INFO spark.SecurityManager: Changing view acls to:
>>>> shrikar
>>>> > 14/05/22 14:26:47 INFO spark.SecurityManager: SecurityManager:
>>>> > authentication disabled; ui acls disabled; users with view
>>>> permissions:
>>>> > Set(shrikar)
>>>> > 14/05/22 14:26:48 INFO slf4j.Slf4jLogger: Slf4jLogger started
>>>> > 14/05/22 14:26:48 INFO Remoting: Starting remoting
>>>> > 14/05/22 14:26:48 INFO Remoting: Remoting started; listening on
>>>> addresses
>>>> > :[akka.tcp://spark@192.168.10.88:49963]
>>>> > 14/05/22 14:26:48 INFO Remoting: Remoting now listens on addresses:
>>>> > [akka.tcp://spark@192.168.10.88:49963]
>>>> > 14/05/22 14:26:48 INFO spark.Spa

Re: Unable to run a Standalone job

2014-05-22 Thread Shrikar archak
Yes, I did an sbt publish-local. OK, I will try with Spark 0.9.1.

Thanks,
Shrikar


On Thu, May 22, 2014 at 8:53 PM, Tathagata Das
wrote:

> How are you getting Spark with 1.0.0-SNAPSHOT through maven? Did you
> publish Spark locally which allowed you to use it as a dependency?
>
> This is weird indeed. SBT should take care of all the dependencies of
> Spark.
>
> In any case, you can try the last released Spark 0.9.1 and see if the
> problem persists.
>
>
> On Thu, May 22, 2014 at 3:59 PM, Shrikar archak wrote:
>
>> I am running as sbt run. I am running it locally .
>>
>> Thanks,
>> Shrikar
>>
>>
>> On Thu, May 22, 2014 at 3:53 PM, Tathagata Das <
>> tathagata.das1...@gmail.com> wrote:
>>
>>> How are you launching the application? sbt run ? spark-submit? local
>>> mode or Spark standalone cluster? Are you packaging all your code into
>>> a jar?
>>> Looks to me that you seem to have spark classes in your execution
>>> environment but missing some of Spark's dependencies.
>>>
>>> TD
>>>
>>>
>>>
>>> On Thu, May 22, 2014 at 2:27 PM, Shrikar archak 
>>> wrote:
>>> > Hi All,
>>> >
>>> > I am trying to run the network count example as a seperate standalone
>>> job
>>> > and running into some issues.
>>> >
>>> > Environment:
>>> > 1) Mac Mavericks
>>> > 2) Latest spark repo from Github.
>>> >
>>> >
>>> > I have a structure like this
>>> >
>>> > Shrikars-MacBook-Pro:SimpleJob shrikar$ find .
>>> > .
>>> > ./simple.sbt
>>> > ./src
>>> > ./src/main
>>> > ./src/main/scala
>>> > ./src/main/scala/NetworkWordCount.scala
>>> > ./src/main/scala/SimpleApp.scala.bk
>>> >
>>> >
>>> > simple.sbt
>>> > name := "Simple Project"
>>> >
>>> > version := "1.0"
>>> >
>>> > scalaVersion := "2.10.3"
>>> >
>>> > libraryDependencies ++= Seq("org.apache.spark" %% "spark-core" %
>>> > "1.0.0-SNAPSHOT",
>>> > "org.apache.spark" %% "spark-streaming" %
>>> > "1.0.0-SNAPSHOT")
>>> >
>>> > resolvers += "Akka Repository" at "http://repo.akka.io/releases/";
>>> >
>>> >
>>> > I am able to run the SimpleApp which is mentioned in the doc but when
>>> I try
>>> > to run the NetworkWordCount app I get error like this am I missing
>>> > something?
>>> >
>>> > [info] Running com.shrikar.sparkapps.NetworkWordCount
>>> > 14/05/22 14:26:47 INFO spark.SecurityManager: Changing view acls to:
>>> shrikar
>>> > 14/05/22 14:26:47 INFO spark.SecurityManager: SecurityManager:
>>> > authentication disabled; ui acls disabled; users with view permissions:
>>> > Set(shrikar)
>>> > 14/05/22 14:26:48 INFO slf4j.Slf4jLogger: Slf4jLogger started
>>> > 14/05/22 14:26:48 INFO Remoting: Starting remoting
>>> > 14/05/22 14:26:48 INFO Remoting: Remoting started; listening on
>>> addresses
>>> > :[akka.tcp://spark@192.168.10.88:49963]
>>> > 14/05/22 14:26:48 INFO Remoting: Remoting now listens on addresses:
>>> > [akka.tcp://spark@192.168.10.88:49963]
>>> > 14/05/22 14:26:48 INFO spark.SparkEnv: Registering MapOutputTracker
>>> > 14/05/22 14:26:48 INFO spark.SparkEnv: Registering BlockManagerMaster
>>> > 14/05/22 14:26:48 INFO storage.DiskBlockManager: Created local
>>> directory at
>>> >
>>> /var/folders/r2/mbj08pb55n5d_9p8588xk5b0gn/T/spark-local-20140522142648-0a14
>>> > 14/05/22 14:26:48 INFO storage.MemoryStore: MemoryStore started with
>>> > capacity 911.6 MB.
>>> > 14/05/22 14:26:48 INFO network.ConnectionManager: Bound socket to port
>>> 49964
>>> > with id = ConnectionManagerId(192.168.10.88,49964)
>>> > 14/05/22 14:26:48 INFO storage.BlockManagerMaster: Trying to register
>>> > BlockManager
>>> > 14/05/22 14:26:48 INFO storage.BlockManagerInfo: Registering block
>>> manager
>>> > 192.168.10.88:49964 with 911.6 MB RAM
>>> > 14/05/22 14:26:48 INFO storage.BlockManagerMaster: Registered
>>> BlockManager
>>> > 14/05/22 14:26:48 INFO spark.HttpServer: Starting HTTP

Re: Unable to run a Standalone job

2014-05-22 Thread Shrikar archak
I am running it with sbt run, locally.

Thanks,
Shrikar


On Thu, May 22, 2014 at 3:53 PM, Tathagata Das
wrote:

> How are you launching the application? sbt run ? spark-submit? local
> mode or Spark standalone cluster? Are you packaging all your code into
> a jar?
> It looks to me like you have Spark classes in your execution environment
> but are missing some of Spark's dependencies.
>
> TD
>
>
>
> On Thu, May 22, 2014 at 2:27 PM, Shrikar archak 
> wrote:
> > Hi All,
> >
> > I am trying to run the network count example as a seperate standalone job
> > and running into some issues.
> >
> > Environment:
> > 1) Mac Mavericks
> > 2) Latest spark repo from Github.
> >
> >
> > I have a structure like this
> >
> > Shrikars-MacBook-Pro:SimpleJob shrikar$ find .
> > .
> > ./simple.sbt
> > ./src
> > ./src/main
> > ./src/main/scala
> > ./src/main/scala/NetworkWordCount.scala
> > ./src/main/scala/SimpleApp.scala.bk
> >
> >
> > simple.sbt
> > name := "Simple Project"
> >
> > version := "1.0"
> >
> > scalaVersion := "2.10.3"
> >
> > libraryDependencies ++= Seq("org.apache.spark" %% "spark-core" %
> > "1.0.0-SNAPSHOT",
> > "org.apache.spark" %% "spark-streaming" %
> > "1.0.0-SNAPSHOT")
> >
> > resolvers += "Akka Repository" at "http://repo.akka.io/releases/";
> >
> >
> > I am able to run the SimpleApp which is mentioned in the doc but when I
> try
> > to run the NetworkWordCount app I get error like this am I missing
> > something?
> >
> > [info] Running com.shrikar.sparkapps.NetworkWordCount
> > 14/05/22 14:26:47 INFO spark.SecurityManager: Changing view acls to:
> shrikar
> > 14/05/22 14:26:47 INFO spark.SecurityManager: SecurityManager:
> > authentication disabled; ui acls disabled; users with view permissions:
> > Set(shrikar)
> > 14/05/22 14:26:48 INFO slf4j.Slf4jLogger: Slf4jLogger started
> > 14/05/22 14:26:48 INFO Remoting: Starting remoting
> > 14/05/22 14:26:48 INFO Remoting: Remoting started; listening on addresses
> > :[akka.tcp://spark@192.168.10.88:49963]
> > 14/05/22 14:26:48 INFO Remoting: Remoting now listens on addresses:
> > [akka.tcp://spark@192.168.10.88:49963]
> > 14/05/22 14:26:48 INFO spark.SparkEnv: Registering MapOutputTracker
> > 14/05/22 14:26:48 INFO spark.SparkEnv: Registering BlockManagerMaster
> > 14/05/22 14:26:48 INFO storage.DiskBlockManager: Created local directory
> at
> >
> /var/folders/r2/mbj08pb55n5d_9p8588xk5b0gn/T/spark-local-20140522142648-0a14
> > 14/05/22 14:26:48 INFO storage.MemoryStore: MemoryStore started with
> > capacity 911.6 MB.
> > 14/05/22 14:26:48 INFO network.ConnectionManager: Bound socket to port
> 49964
> > with id = ConnectionManagerId(192.168.10.88,49964)
> > 14/05/22 14:26:48 INFO storage.BlockManagerMaster: Trying to register
> > BlockManager
> > 14/05/22 14:26:48 INFO storage.BlockManagerInfo: Registering block
> manager
> > 192.168.10.88:49964 with 911.6 MB RAM
> > 14/05/22 14:26:48 INFO storage.BlockManagerMaster: Registered
> BlockManager
> > 14/05/22 14:26:48 INFO spark.HttpServer: Starting HTTP Server
> > [error] (run-main) java.lang.NoClassDefFoundError:
> > javax/servlet/http/HttpServletResponse
> > java.lang.NoClassDefFoundError: javax/servlet/http/HttpServletResponse
> > at org.apache.spark.HttpServer.start(HttpServer.scala:54)
> > at
> >
> org.apache.spark.broadcast.HttpBroadcast$.createServer(HttpBroadcast.scala:156)
> > at
> >
> org.apache.spark.broadcast.HttpBroadcast$.initialize(HttpBroadcast.scala:127)
> > at
> >
> org.apache.spark.broadcast.HttpBroadcastFactory.initialize(HttpBroadcastFactory.scala:31)
> > at
> >
> org.apache.spark.broadcast.BroadcastManager.initialize(BroadcastManager.scala:48)
> > at
> >
> org.apache.spark.broadcast.BroadcastManager.(BroadcastManager.scala:35)
> > at org.apache.spark.SparkEnv$.create(SparkEnv.scala:218)
> > at org.apache.spark.SparkContext.(SparkContext.scala:202)
> > at
> >
> org.apache.spark.streaming.StreamingContext$.createNewSparkContext(StreamingContext.scala:549)
> > at
> >
> org.apache.spark.streaming.StreamingContext$.createNewSparkContext(StreamingContext.scala:561)
> > at
> >
> org.apache.spark.streaming.StreamingContext.(StreamingContext.scala:91)
> > at
> com.shrikar.sparkapps.NetworkWordCount$.main(NetworkWordCount.scala:39)
> > at com.shrikar.sparkapps.NetworkWordCount.main(NetworkWordCount.scala)
> > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > at
> >
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> > at
> >
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> > at java.lang.reflect.Method.invoke(Method.java:597)
> >
> >
> > Thanks,
> > Shrikar
> >
>


Unable to run a Standalone job

2014-05-22 Thread Shrikar archak
Hi All,

I am trying to run the network word count example as a separate standalone
job and am running into some issues.

Environment:
1) Mac Mavericks
2) Latest spark repo from Github.


I have a structure like this

Shrikars-MacBook-Pro:SimpleJob shrikar$ find .
.
./simple.sbt
./src
./src/main
./src/main/scala
./src/main/scala/NetworkWordCount.scala
./src/main/scala/SimpleApp.scala.bk


simple.sbt
name := "Simple Project"

version := "1.0"

scalaVersion := "2.10.3"

libraryDependencies ++= Seq("org.apache.spark" %% "spark-core" %
"1.0.0-SNAPSHOT",
"org.apache.spark" %% "spark-streaming" %
"1.0.0-SNAPSHOT")

resolvers += "Akka Repository" at "http://repo.akka.io/releases/";


I am able to run the SimpleApp that is mentioned in the docs, but when I try
to run the NetworkWordCount app I get an error like the one below. Am I
missing something?

[info] Running com.shrikar.sparkapps.NetworkWordCount
14/05/22 14:26:47 INFO spark.SecurityManager: Changing view acls to: shrikar
14/05/22 14:26:47 INFO spark.SecurityManager: SecurityManager:
authentication disabled; ui acls disabled; users with view permissions:
Set(shrikar)
14/05/22 14:26:48 INFO slf4j.Slf4jLogger: Slf4jLogger started
14/05/22 14:26:48 INFO Remoting: Starting remoting
14/05/22 14:26:48 INFO Remoting: Remoting started; listening on addresses
:[akka.tcp://spark@192.168.10.88:49963]
14/05/22 14:26:48 INFO Remoting: Remoting now listens on addresses:
[akka.tcp://spark@192.168.10.88:49963]
14/05/22 14:26:48 INFO spark.SparkEnv: Registering MapOutputTracker
14/05/22 14:26:48 INFO spark.SparkEnv: Registering BlockManagerMaster
14/05/22 14:26:48 INFO storage.DiskBlockManager: Created local directory at
/var/folders/r2/mbj08pb55n5d_9p8588xk5b0gn/T/spark-local-20140522142648-0a14
14/05/22 14:26:48 INFO storage.MemoryStore: MemoryStore started with
capacity 911.6 MB.
14/05/22 14:26:48 INFO network.ConnectionManager: Bound socket to port
49964 with id = ConnectionManagerId(192.168.10.88,49964)
14/05/22 14:26:48 INFO storage.BlockManagerMaster: Trying to register
BlockManager
14/05/22 14:26:48 INFO storage.BlockManagerInfo: Registering block manager
192.168.10.88:49964 with 911.6 MB RAM
*14/05/22 14:26:48 INFO storage.BlockManagerMaster: Registered BlockManager*
*14/05/22 14:26:48 INFO spark.HttpServer: Starting HTTP Server*
*[error] (run-main) java.lang.NoClassDefFoundError:
javax/servlet/http/HttpServletResponse*
*java.lang.NoClassDefFoundError: javax/servlet/http/HttpServletResponse*
* at org.apache.spark.HttpServer.start(HttpServer.scala:54)*
* at
org.apache.spark.broadcast.HttpBroadcast$.createServer(HttpBroadcast.scala:156)*
* at
org.apache.spark.broadcast.HttpBroadcast$.initialize(HttpBroadcast.scala:127)*
at
org.apache.spark.broadcast.HttpBroadcastFactory.initialize(HttpBroadcastFactory.scala:31)
 at
org.apache.spark.broadcast.BroadcastManager.initialize(BroadcastManager.scala:48)
at
org.apache.spark.broadcast.BroadcastManager.(BroadcastManager.scala:35)
 at org.apache.spark.SparkEnv$.create(SparkEnv.scala:218)
at org.apache.spark.SparkContext.(SparkContext.scala:202)
 at
org.apache.spark.streaming.StreamingContext$.createNewSparkContext(StreamingContext.scala:549)
at
org.apache.spark.streaming.StreamingContext$.createNewSparkContext(StreamingContext.scala:561)
 at
org.apache.spark.streaming.StreamingContext.(StreamingContext.scala:91)
at com.shrikar.sparkapps.NetworkWordCount$.main(NetworkWordCount.scala:39)
 at com.shrikar.sparkapps.NetworkWordCount.main(NetworkWordCount.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)


Thanks,
Shrikar