RE: DataFrame#show cost 2 Spark Jobs ?

2015-08-25 Thread Cheng, Hao
Ok, I see, thanks for the correction, but this should be optimized.

From: Shixiong Zhu [mailto:zsxw...@gmail.com]
Sent: Tuesday, August 25, 2015 2:08 PM
To: Cheng, Hao
Cc: Jeff Zhang; user@spark.apache.org
Subject: Re: DataFrame#show cost 2 Spark Jobs ?

That's two jobs. `SparkPlan.executeTake` will call `runJob` twice in this case.
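
For illustration, a minimal sketch of the same incremental behaviour at the RDD level (not the SparkPlan.executeTake source itself): with two partitions and too few rows in the first, a single take() typically shows up as two jobs in the UI.

// hedged sketch, runnable in spark-shell: take() scans partitions incrementally,
// so one logical take can launch more than one Spark job
val rdd = sc.parallelize(Seq(1, 2, 3, 4), numSlices = 2)
val firstThree = rdd.take(3) // first job scans partition 0; a follow-up job scans the rest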


Best Regards,
Shixiong Zhu

2015-08-25 14:01 GMT+08:00 Cheng, Hao <hao.ch...@intel.com>:
Oh, sorry, I missed reading your reply!

I know the minimum number of tasks will be 2 for scanning, but Jeff is talking
about 2 jobs, not 2 tasks.

From: Shixiong Zhu [mailto:zsxw...@gmail.com]
Sent: Tuesday, August 25, 2015 1:29 PM
To: Cheng, Hao
Cc: Jeff Zhang; user@spark.apache.org

Subject: Re: DataFrame#show cost 2 Spark Jobs ?

Hao,

I can reproduce it using the master branch. I'm curious why you cannot
reproduce it. Did you check whether the input HadoopRDD had two partitions? My
test code is:

val df = sqlContext.read.json("examples/src/main/resources/people.json")
df.show()
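
As a quick check of the partition count in question (a small addition, assuming the DataFrame-to-RDD conversion here):

println(df.rdd.partitions.length) // 2 here would explain the minimum of 2 tasks per scan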



Best Regards,
Shixiong Zhu

2015-08-25 13:01 GMT+08:00 Cheng, Hao <hao.ch...@intel.com>:
Hi Jeff, which version are you using? I couldn't reproduce the 2 Spark jobs in
`df.show()` with the latest code. We refactored the code for the JSON data
source recently, so you may be running an earlier version of it.

And a known issue is that Spark SQL will re-list the files every time it loads
the data for JSON, which probably causes a longer ramp-up time with a large
number of files/partitions.

From: Jeff Zhang [mailto:zjf...@gmail.com]
Sent: Tuesday, August 25, 2015 8:11 AM
To: Cheng, Hao
Cc: user@spark.apache.org
Subject: Re: DataFrame#show cost 2 Spark Jobs ?

Hi Cheng,

I know that sqlContext.read will trigger one Spark job to infer the schema.
What I mean is that DataFrame#show costs 2 Spark jobs, so overall it would cost
3 jobs.

Here's the command I use:

val df = sqlContext.read.json("file:///Users/hadoop/github/spark/examples/src/main/resources/people.json") // trigger one spark job to infer schema
df.show() // trigger 2 spark jobs, which is weird




On Mon, Aug 24, 2015 at 10:56 PM, Cheng, Hao <hao.ch...@intel.com> wrote:
The first job is to infer the JSON schema, and the second one is the actual
query. You can provide the schema while loading the JSON file, like below:

sqlContext.read.schema(xxx).json("…")
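
For example, a minimal sketch for the people.json case (the column names and types below are assumptions matching the plan shown later in this thread):

import org.apache.spark.sql.types._

// assumed schema for examples/src/main/resources/people.json: age bigint, name string
val schema = StructType(Seq(
  StructField("age", LongType, nullable = true),
  StructField("name", StringType, nullable = true)))

// with an explicit schema, the extra job that scans the files to infer it is skipped
val df = sqlContext.read.schema(schema).json("examples/src/main/resources/people.json")
df.show()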

Hao
From: Jeff Zhang [mailto:zjf...@gmail.com]
Sent: Monday, August 24, 2015 6:20 PM
To: user@spark.apache.org
Subject: DataFrame#show cost 2 Spark Jobs ?

It's weird to me that the simple show function costs 2 Spark jobs.
DataFrame#explain shows it is a very simple operation; I'm not sure why it needs 2 jobs.

== Parsed Logical Plan ==
Relation[age#0L,name#1] 
JSONRelation[file:/Users/hadoop/github/spark/examples/src/main/resources/people.json]

== Analyzed Logical Plan ==
age: bigint, name: string
Relation[age#0L,name#1] 
JSONRelation[file:/Users/hadoop/github/spark/examples/src/main/resources/people.json]

== Optimized Logical Plan ==
Relation[age#0L,name#1] 
JSONRelation[file:/Users/hadoop/github/spark/examples/src/main/resources/people.json]

== Physical Plan ==
Scan 
JSONRelation[file:/Users/hadoop/github/spark/examples/src/main/resources/people.json][age#0L,name#1]



--
Best Regards

Jeff Zhang



--
Best Regards

Jeff Zhang




Invalid environment variable name when submitting job from windows

2015-08-25 Thread Yann ROBIN
Hi,

We have a Spark standalone cluster running on Linux.
We have a job that we submit to the Spark cluster from Windows. When
submitting this job from Windows, the execution fails with this error
in the Notes: "java.lang.IllegalArgumentException: Invalid environment
variable name: "=::"". When submitting from Linux it works fine.

I thought that this might be caused by one of the ENV variables on
my system, so I modified the submit cmd to remove all env variables
except the ones needed by Java. This is the env before executing the java
command:
ASSEMBLY_DIR=c:\spark\spark-1.4.0-bin-hadoop2.6\bin\..\lib
ASSEMBLY_DIR1=c:\spark\spark-1.4.0-bin-hadoop2.6\bin\../assembly/target/scala-2.10
ASSEMBLY_DIR2=c:\spark\spark-1.4.0-bin-hadoop2.6\bin\../assembly/target/scala-2.11
CLASS=org.apache.spark.deploy.SparkSubmit
CLASSPATH=.;
JAVA_HOME=C:\Program Files\Java\jre1.8.0_51
LAUNCHER_OUTPUT=\spark-class-launcher-output-23386.txt
LAUNCH_CLASSPATH=c:\spark\spark-1.4.0-bin-hadoop2.6\bin\..\lib\spark-assembly-1.4.0-hadoop2.6.0.jar
PYTHONHASHSEED=0
RUNNER=C:\Program Files\Java\jre1.8.0_51\bin\java
SPARK_ASSEMBLY_JAR=c:\spark\spark-1.4.0-bin-hadoop2.6\bin\..\lib\spark-assembly-1.4.0-hadoop2.6.0.jar
SPARK_CMD="C:\Program Files\Java\jre1.8.0_51\bin\java" -cp
"c:\spark\spark-1.4.0-bin-hadoop2.6\bin\..\conf\;c:\spark\spark-1.4.0-bin-hadoop2.6\bin\..\lib\spark-assembly-1.4.0-hadoop2.6.0.jar;c:\spark\spark-1.4.0-bin-hadoop2.6\bin\..\lib\datanucleus-api-jdo-3.2.6.jar;c:\spark\spark-1.4.0-bin-hadoop2.6\bin\..\lib\datanucleus-core-3.2.10.jar;c:\spark\spark-1.4.0-bin-hadoop2.6\bin\..\lib\datanucleus-rdbms-3.2.9.jar"
org.apache.spark.deploy.SparkSubmit --master spark://172.16.8.21:7077
--deploy-mode cluster --conf "spark.driver.memory=4G" --conf
"spark.driver.extraClassPath=/opt/local/spark/lib/spark-assembly-1.4.0-hadoop2.6.0.jar"
--class com.publica.Accounts --verbose
http://server/data-analytics/data-analytics.jar
spark://172.16.8.21:7077 data-analysis
http://server/data-analytics/data-analytics.jar 23 8 2015
SPARK_ENV_LOADED=1
SPARK_HOME=c:\spark\spark-1.4.0-bin-hadoop2.6\bin\..
SPARK_SCALA_VERSION=2.10
SystemRoot=C:\Windows
user_conf_dir=c:\spark\spark-1.4.0-bin-hadoop2.6\bin\..\..\conf
_SPARK_ASSEMBLY=c:\spark\spark-1.4.0-bin-hadoop2.6\bin\..\lib\spark-assembly-1.4.0-hadoop2.6.0.jar
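
As an aside, a hedged diagnostic (not a confirmed fix): Windows keeps hidden per-drive entries whose names start with "=" (for example "=C:" or "=::", matching the name in the exception), and java.lang.ProcessBuilder rejects such names as environment keys. One way to check whether your shell exposes any of them, e.g. from a Scala REPL:

import scala.collection.JavaConverters._
// list environment variables whose names start with "="
System.getenv().asScala.keys.filter(_.startsWith("=")).foreach(println)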

Is there a way to make this work?

--
Yann




Re: Local Spark talking to remote HDFS?

2015-08-25 Thread Roberto Congiu
Port 8020 is not the only port you need tunnelled for HDFS to work. If you
only list the contents of a directory, port 8020 is enough... for instance,
using something like

val p = new org.apache.hadoop.fs.Path("hdfs://localhost:8020/")
val fs = p.getFileSystem(sc.hadoopConfiguration)
fs.listStatus(p)

you should see the file list.
But then, when accessing a file, it needs to actually get its blocks, so it
has to connect to the DataNode.
The error 'could not obtain block' means it can't get that block from the
DataNode.
Refer to
http://docs.hortonworks.com/HDPDocuments/HDP1/HDP-1.2.1/bk_reference/content/reference_chap2_1.html
to see the complete list of ports that also need to be tunnelled.
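
As a side note, a hedged sketch of one common workaround for NAT'd VMs (whether it applies depends on how the VM's networking and HDFS are configured): have the client connect to DataNodes by hostname instead of the private IP the NameNode reports, and tunnel or map the DataNode port as well (50010 by default in Hadoop 2.x).

// hedged sketch: prefer DataNode hostnames over the private IPs the NameNode reports
sc.hadoopConfiguration.set("dfs.client.use.datanode.hostname", "true")
val words = sc.textFile("hdfs://localhost:8020/tmp/people.txt")
words.count()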



2015-08-24 13:10 GMT-07:00 Dino Fancellu :

> Changing the ip to the guest IP address just never connects.
>
> The VM has port tunnelling, and it passes through all the main ports,
> 8020 included to the host VM.
>
> You can tell that it was talking to the guest VM before, simply
> because it reported it when a file was not found.
>
> Error is:
>
> Exception in thread "main" org.apache.spark.SparkException: Job
> aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most
> recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost):
> org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block:
> BP-452094660-10.0.2.15-1437494483194:blk_1073742905_2098
> file=/tmp/people.txt
>
> but I have no idea what it means by that. It certainly can find the
> file and knows it exists.
>
>
>
> On 24 August 2015 at 20:43, Roberto Congiu 
> wrote:
> > When you launch your HDP guest VM, most likely it gets launched with NAT and
> > an address on a private network (192.168.x.x), so on your Windows host you
> > should use that address (you can find out using ifconfig on the guest OS).
> > I usually add an entry to my /etc/hosts for VMs that I use often... if you
> > use vagrant, there's also a vagrant module that can do that automatically.
> > Also, I am not sure how the default HDP VM is set up, that is, if it only
> > binds HDFS to 127.0.0.1 or to all addresses. You can check that with
> > netstat -a.
> >
> > R.
> >
> > 2015-08-24 11:46 GMT-07:00 Dino Fancellu :
> >>
> >> I have a file in HDFS inside my HortonWorks HDP 2.3_1 VirtualBox VM.
> >>
> >> If I go into the guest spark-shell and refer to the file thus, it works
> >> fine
> >>
> >>   val words=sc.textFile("hdfs:///tmp/people.txt")
> >>   words.count
> >>
> >> However if I try to access it from a local Spark app on my Windows host,
> >> it
> >> doesn't work
> >>
> >>   val conf = new SparkConf().setMaster("local").setAppName("My App")
> >>   val sc = new SparkContext(conf)
> >>
> >>   val words=sc.textFile("hdfs://localhost:8020/tmp/people.txt")
> >>   words.count
> >>
> >> Emits
> >>
> >>
> >>
> >> The port 8020 is open, and if I choose the wrong file name, it will tell
> >> me
> >>
> >>
> >>
> >> My pom has
> >>
> >> <dependency>
> >>   <groupId>org.apache.spark</groupId>
> >>   <artifactId>spark-core_2.11</artifactId>
> >>   <version>1.4.1</version>
> >>   <scope>provided</scope>
> >> </dependency>
> >>
> >> Am I doing something wrong?
> >>
> >> Thanks.
> >>
> >>
> >>
> >>
> >> --
> >> View this message in context:
> >>
> http://apache-spark-user-list.1001560.n3.nabble.com/Local-Spark-talking-to-remote-HDFS-tp24425.html
> >> Sent from the Apache Spark User List mailing list archive at Nabble.com.
> >>
> >> -
> >> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> >> For additional commands, e-mail: user-h...@spark.apache.org
> >>
> >
>


Re: Spark stages very slow to complete

2015-08-25 Thread Olivier Girardot
I have pretty much the same "symptoms" - the computation itself is pretty
fast, but most of the time is spent in JavaToPython steps (~15 min).
I'm using Spark 1.5.0-rc1 with DataFrames and ML Pipelines.
Any insights into what these steps are, exactly?

2015-06-02 9:18 GMT+02:00 Karlson :

> Hi, the code is some hundreds lines of Python. I can try to compose a
> minimal example as soon as I find the time, though. Any ideas until then?
>
>
> Would you mind posting the code?
>> On 2 Jun 2015 00:53, "Karlson"  wrote:
>>
>> Hi,
>>>
>>> In all (pyspark) Spark jobs that become somewhat more involved, I am
>>> experiencing the issue that some stages take a very long time to complete,
>>> and sometimes don't complete at all. This clearly correlates with the size
>>> of my input data. Looking at the stage details for one such stage, I am
>>> wondering where Spark spends all this time. Take this table of the stage's
>>> task metrics for example:
>>>
>>> Metric                      Min          25th percentile  Median       75th percentile  Max
>>> Duration                    1.4 min      1.5 min          1.7 min      1.9 min          2.3 min
>>> Scheduler Delay             1 ms         3 ms             4 ms         5 ms             23 ms
>>> Task Deserialization Time   1 ms         2 ms             3 ms         8 ms             22 ms
>>> GC Time                     0 ms         0 ms             0 ms         0 ms             0 ms
>>> Result Serialization Time   0 ms         0 ms             0 ms         0 ms             1 ms
>>> Getting Result Time         0 ms         0 ms             0 ms         0 ms             0 ms
>>> Input Size / Records        23.9 KB / 1  24.0 KB / 1      24.1 KB / 1  24.1 KB / 1      24.3 KB / 1
>>>
>>> Why is the overall duration almost 2 min? Where is all this time spent,
>>> when no progress of the stages is visible? The progress bar simply displays
>>> 0 succeeded tasks for a very long time before sometimes slowly progressing.
>>>
>>> Also, the name of the stage displayed above is `javaToPython at null:-1`,
>>> which I find very uninformative. I don't even know which action exactly is
>>> responsible for this stage. Does anyone experience similar issues or have
>>> any advice for me?
>>>
>>> Thanks!
>>>
>>> -
>>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>>> For additional commands, e-mail: user-h...@spark.apache.org
>>>
>>>
>>>
> -
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>
>


-- 
*Olivier Girardot* | Associé
o.girar...@lateral-thoughts.com
+33 6 24 09 17 94


Re: What does Attribute and AttributeReference mean in Spark SQL

2015-08-25 Thread Michael Armbrust
Attribute is the Catalyst name for an input column from a child operator.
An AttributeReference has been resolved, meaning we know which input column
in particular it is referring to.  An AttributeReference also has a known
DataType.  In contrast, before analysis there might still exist
UnresolvedReferences, which are just string identifiers from a parsed query.

An Expression can be more complex (like you suggested, a + b), though
technically just a is also a very simple Expression.  The following console
session shows how these types are composed:

$ build/sbt sql/console

import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.catalyst.analysis._
import org.apache.spark.sql.catalyst.plans.logical._
import org.apache.spark.sql.catalyst.dsl.expressions._
import org.apache.spark.sql.catalyst.dsl.plans._

sc: org.apache.spark.SparkContext = org.apache.spark.SparkContext@5adfe37d
sqlContext: org.apache.spark.sql.SQLContext = org.apache.spark.sql.SQLContext@20d05227
import sqlContext.implicits._
import sqlContext._

Welcome to Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_45).
Type in expressions to have them evaluated.
Type :help for more information.

scala> val unresolvedAttr: UnresolvedAttribute = 'a
unresolvedAttr: org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute = 'a

scala> val relation = LocalRelation('a.int)
relation: org.apache.spark.sql.catalyst.plans.logical.LocalRelation =
LocalRelation [a#0]

scala> val parsedQuery = relation.select(unresolvedAttr)
parsedQuery: org.apache.spark.sql.catalyst.plans.logical.LogicalPlan =
'Project ['a]
 LocalRelation [a#0]

scala> parsedQuery.analyze
res11: org.apache.spark.sql.catalyst.plans.logical.LogicalPlan = Project [a#0]
 LocalRelation [a#0]

The #0 after a is a unique identifier (within this JVM) that says where the
data is coming from, even as plans are rearranged due to optimizations.
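
To tie this back to the a + b question, the same console can compose two attributes into an arithmetic Expression (a small, hedged addition to the session above; the printed result may differ by version):

scala> val sumExpr = 'a.int + 'b.int // an Add expression built from two AttributeReferences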

On Mon, Aug 24, 2015 at 6:13 PM, Todd  wrote:

> There are many case classes and concepts such as
> Attribute/AttributeReference/Expression in Spark SQL.
>
> I would like to ask what Attribute/AttributeReference/Expression mean. Given a SQL
> query like select a,b from c, are a and b two Attributes? Is a + b an
> expression?
> It looks like I misunderstand this, because Attribute extends Expression in the
> code, which means Attribute itself is an Expression.
>
>
> Thanks.
>


Re: Exception throws when running spark pi in Intellij Idea that scala.collection.Seq is not found

2015-08-25 Thread Jeff Zhang
As I remember, you also need to change the guava and jetty related dependencies
to compile scope if you want to run SparkPi in IntelliJ.



On Tue, Aug 25, 2015 at 3:15 PM, Hemant Bhanawat 
wrote:

> Go to the module settings of the project and in the dependencies section
> check the scope of scala jars. It would be either Test or Provided. Change
> it to compile and it should work. Check the following link to understand
> more about scope of modules:
>
>
> https://maven.apache.org/guides/introduction/introduction-to-dependency-mechanism.html
>
>
>
> On Tue, Aug 25, 2015 at 12:18 PM, Todd  wrote:
>
>> I cloned the code from https://github.com/apache/spark to my machine. It
>> can compile successfully,
>> But when I run the sparkpi, it throws an exception below complaining the
>> scala.collection.Seq is not found.
>> I have installed scala2.10.4 in my machine, and use the default profiles:
>> window,scala2.10,maven-3,test-java-home.
>> In Idea, I can find that the Seq class is on my classpath:
>>
>>
>>
>>
>>
>> Exception in thread "main" java.lang.NoClassDefFoundError:
>> scala/collection/Seq
>> at org.apache.spark.examples.SparkPi.main(SparkPi.scala)
>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> at
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>> at
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> at java.lang.reflect.Method.invoke(Method.java:606)
>> at
>> com.intellij.rt.execution.application.AppMain.main(AppMain.java:140)
>> Caused by: java.lang.ClassNotFoundException: scala.collection.Seq
>> at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>> at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>> at java.security.AccessController.doPrivileged(Native Method)
>> at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>> at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
>> at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>> ... 6 more
>>
>>
>


-- 
Best Regards

Jeff Zhang


Re: Loading already existing tables in spark shell

2015-08-25 Thread Jeetendra Gangele
In the spark shell, "use database" is not working; it says use is not found in
the shell. Did you run this with the scala shell?
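
As a side note, a minimal sketch of how this could look (mydb is a placeholder; in the Scala spark-shell the USE statement has to go through the HiveContext as SQL rather than being typed as a bare command):

val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
sqlContext.sql("use mydb") // mydb is a placeholder for the database that holds the table
sqlContext.sql("select count(*) from event_impressions").show()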

On 24 August 2015 at 18:26, Ishwardeep Singh  wrote:

> Hi Jeetendra,
>
>
> I faced this issue. I did not specify the database where this table
> exists. Please set the database by using "use " command before
> executing the query.
>
>
> Regards,
>
> Ishwardeep
>
> --
> *From:* Jeetendra Gangele 
> *Sent:* Monday, August 24, 2015 5:47 PM
> *To:* user
> *Subject:* Loading already existing tables in spark shell
>
> Hi all, I have a few tables in Hive and I wanted to run queries against them
> with Spark as the execution engine.
>
> Can I directly load these tables in the spark shell and run queries?
>
> I tried with
> 1. val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
> 2. sqlContext.sql("FROM event_impressions select count(*)") where
> event_impressions is the table name.
>
> It gives me an error saying "org.apache.spark.sql.AnalysisException: no such
> table event_impressions; line 1 pos 5"
>
> Does anybody hit similar issues?
>
>
> regards
> jeetendra
>
> --
>
>
>
>
>
>
> NOTE: This message may contain information that is confidential,
> proprietary, privileged or otherwise protected by law. The message is
> intended solely for the named addressee. If received in error, please
> destroy and notify the sender. Any use of this email is prohibited when
> received in error. Impetus does not represent, warrant and/or guarantee,
> that the integrity of this communication has been maintained nor that the
> communication is free of errors, virus, interception or interference.
>


Re: How to set environment of worker applications

2015-08-25 Thread Hemant Bhanawat
OK, I went in the direction of system vars from the beginning, probably because
the question was about passing variables to a particular job.

Anyway, the decision to use either system vars or environment vars would
solely depend on whether you want to make them available to all the Spark
processes on a node or only to a particular job.

Are there any other reasons why one would prefer one over the other?
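
For reference, a minimal sketch of the system-property route discussed in the quoted messages below (variable names are placeholders):

// submit with: --conf "spark.executor.extraJavaOptions=-Dmyenvvar=xxx"
// then read the property back inside task code running on the executors
val values = sc.parallelize(1 to 4).map(_ => System.getProperty("myenvvar")).collect()
values.foreach(println) // prints "xxx" for each element if the property reached the executors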


On Mon, Aug 24, 2015 at 8:48 PM, Raghavendra Pandey <
raghavendra.pan...@gmail.com> wrote:

> System properties and environment variables are two different things. One
> can use spark.executor.extraJavaOptions to pass system properties and
> spark-env.sh to pass environment variables.
>
> -raghav
>
> On Mon, Aug 24, 2015 at 1:00 PM, Hemant Bhanawat 
> wrote:
>
>> That's surprising. Passing the environment variables using
>> spark.executor.extraJavaOptions=-Dmyenvvar=xxx to the executor and then
>> fetching them using System.getProperty("myenvvar") has worked for me.
>>
>> What is the error that you guys got?
>>
>> On Mon, Aug 24, 2015 at 12:10 AM, Sathish Kumaran Vairavelu <
>> vsathishkuma...@gmail.com> wrote:
>>
>>> spark-env.sh works for me in Spark 1.4 but not
>>> spark.executor.extraJavaOptions.
>>>
>>> On Sun, Aug 23, 2015 at 11:27 AM Raghavendra Pandey <
>>> raghavendra.pan...@gmail.com> wrote:
>>>
 I think the only way to pass on environment variables to worker node is
 to write it in spark-env.sh file on each worker node.

 On Sun, Aug 23, 2015 at 8:16 PM, Hemant Bhanawat 
 wrote:

> Check for spark.driver.extraJavaOptions and
> spark.executor.extraJavaOptions in the following article. I think you can
> use -D to pass system vars:
>
> spark.apache.org/docs/latest/configuration.html#runtime-environment
> Hi,
>
> I am starting a spark streaming job in standalone mode with
> spark-submit.
>
> Is there a way to make the UNIX environment variables with which
> spark-submit is started available to the processes started on the worker
> nodes?
>
> Jan
> -
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>
>

>>
>


Re: Exception throws when running spark pi in Intellij Idea that scala.collection.Seq is not found

2015-08-25 Thread Hemant Bhanawat
Go to the module settings of the project and in the dependencies section
check the scope of scala jars. It would be either Test or Provided. Change
it to compile and it should work. Check the following link to understand
more about scope of modules:

https://maven.apache.org/guides/introduction/introduction-to-dependency-mechanism.html



On Tue, Aug 25, 2015 at 12:18 PM, Todd  wrote:

> I cloned the code from https://github.com/apache/spark to my machine. It
> can compile successfully,
> But when I run the sparkpi, it throws an exception below complaining the
> scala.collection.Seq is not found.
> I have installed scala2.10.4 in my machine, and use the default profiles:
> window,scala2.10,maven-3,test-java-home.
> In Idea, I can find that the Seq class is on my classpath:
>
>
>
>
>
> Exception in thread "main" java.lang.NoClassDefFoundError:
> scala/collection/Seq
> at org.apache.spark.examples.SparkPi.main(SparkPi.scala)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at com.intellij.rt.execution.application.AppMain.main(AppMain.java:140)
> Caused by: java.lang.ClassNotFoundException: scala.collection.Seq
> at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
> at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
> at java.security.AccessController.doPrivileged(Native Method)
> at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
> ... 6 more
>
>


Exception throws when running spark pi in Intellij Idea that scala.collection.Seq is not found

2015-08-25 Thread Todd
I cloned the code from https://github.com/apache/spark to my machine. It can
compile successfully.
But when I run SparkPi, it throws the exception below, complaining that
scala.collection.Seq is not found.
I have installed Scala 2.10.4 on my machine and use the default profiles:
window,scala2.10,maven-3,test-java-home.
In IDEA, I can find that the Seq class is on my classpath:





Exception in thread "main" java.lang.NoClassDefFoundError: scala/collection/Seq
at org.apache.spark.examples.SparkPi.main(SparkPi.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:140)
Caused by: java.lang.ClassNotFoundException: scala.collection.Seq
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
... 6 more


