Re: Reply: Spark-submit Problems

2016-10-16 Thread Sean Owen
Is it just a typo in the email or are you missing a space after your
--master argument?


The logs here don't actually say much beyond "something went wrong". It seems
fairly low-level, as if the gateway process failed or didn't start, rather
than a problem with the program itself. It's hard to say more unless you can
dig out more logs, for example from the worker or executor?

On Sun, Oct 16, 2016 at 4:24 AM Tobi Bosede  wrote:

Hi Mekal, thanks for wanting to help. I have attached the Python script as
well as the different exceptions here. I have also pasted the cluster
exception below so I can highlight the relevant parts.


[abosede2@badboy ~]$ spark-submit --master spark://10.160.5.48:7077
trade_data_count.py
Ivy Default Cache set to: /home/abosede2/.ivy2/cache
The jars for the packages stored in: /home/abosede2/.ivy2/jars
:: loading settings :: url =
jar:file:/usr/local/spark-1.6.1/assembly/target/scala-2.11/spark-assembly-1.6.1-hre/settings/ivysettings.xml
com.databricks#spark-csv_2.11 added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0
confs: [default]
found com.databricks#spark-csv_2.11;1.3.0 in central
found org.apache.commons#commons-csv;1.1 in central
found com.univocity#univocity-parsers;1.5.1 in central
:: resolution report :: resolve 160ms :: artifacts dl 7ms
:: modules in use:
com.databricks#spark-csv_2.11;1.3.0 from central in [default]
com.univocity#univocity-parsers;1.5.1 from central in [default]
org.apache.commons#commons-csv;1.1 from central in [default]
---------------------------------------------------------------------
|                  |            modules            ||   artifacts   |
|       conf       | number| search|dwnlded|evicted|| number|dwnlded|
---------------------------------------------------------------------
|      default     |   3   |   0   |   0   |   0   ||   3   |   0   |
---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent
confs: [default]
0 artifacts copied, 3 already retrieved (0kB/6ms)
[Stage 0:> (104 + 8) /235]16/10/15 19:42:37 ERROR TaskScadboy.win.ad.jhu.edu : Remote RPC client disassociated. Likely due to containers exceeding thresholds, or netwoWARN messages.
16/10/15 19:42:37 ERROR SparkDeploySchedulerBackend: Application has been killed. Reason: Master removed our a
[Stage 0:===> (104 + -28) / 235]Traceback (most recent call la
File "/home/abosede2/trade_data_count.py", line 79, in 
print("Raw data is %d rows." % data.count())
File
"/usr/local/spark-1.6.1/python/lib/pyspark.zip/pyspark/sql/dataframe.py",
line 269, in count
File
"/usr/lib/python2.7/site-packages/py4j-0.9.2-py2.7.egg/py4j/java_gateway.py",
line 836, in __call__
answer, self.gateway_client, self.target_id, self.name)
File "/usr/local/spark-1.6.1/python/lib/pyspark.zip/pyspark/sql/utils.py",
line 45, in deco
File
"/usr/lib/python2.7/site-packages/py4j-0.9.2-py2.7.egg/py4j/protocol.py",
line 310, in get_return_value
format(target_id, ".", name), value)
py4j.protocol.Py4JJavaError: An error occurred while calling o6867.count.
: org.apache.spark.SparkException: Job aborted due to stage failure: Master removed our application: FAILED
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndepend)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1419)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1418)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1418)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799
at scala.Option.foreach(Option.scala:257)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:799)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1640)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1599)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1588)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:620)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1832)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1845)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1858)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1929)
at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:927)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
at

Re: Reply: Spark-submit Problems

2016-10-15 Thread Tobi Bosede
lect$1.apply(RDD.scala:927)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
at org.apache.spark.rdd.RDD.collect(RDD.scala:926)
at org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:166)
at org.apache.spark.sql.execution.SparkPlan.executeCollectPublic(SparkPlan.scala:174)
at org.apache.spark.sql.DataFrame$$anonfun$org$apache$spark$sql$DataFrame$$execute$1$1.apply(DataFrame
at org.apache.spark.sql.DataFrame$$anonfun$org$apache$spark$sql$DataFrame$$execute$1$1.apply(DataFrame
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:56)
at org.apache.spark.sql.DataFrame.withNewExecutionId(DataFrame.scala:2086)
at org.apache.spark.sql.DataFrame.org$apache$spark$sql$DataFrame$$execute$1(DataFrame.scala:1498)
at org.apache.spark.sql.DataFrame.org$apache$spark$sql$DataFrame$$collect(DataFrame.scala:1505)
at org.apache.spark.sql.DataFrame$$anonfun$count$1.apply(DataFrame.scala:1515)
at org.apache.spark.sql.DataFrame$$anonfun$count$1.apply(DataFrame.scala:1514)
at org.apache.spark.sql.DataFrame.withCallback(DataFrame.scala:2099)
at org.apache.spark.sql.DataFrame.count(DataFrame.scala:1514)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
at py4j.Gateway.invoke(Gateway.java:259)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:209)
at java.lang.Thread.run(Unknown Source)

On Sat, Oct 15, 2016 at 10:06 PM, Mekal Zheng <mekal.zh...@gmail.com> wrote:

> Show me your code
>
>
> On 2016-10-16 08:24 +0800, hxfeng <980548...@qq.com> wrote:
>
> Show your pi.py code, and what is the exception message?
>
>
> -- Original Message --
> *From:* "Tobi Bosede";<ani.to...@gmail.com>;
> *Sent:* Sunday, October 16, 2016, 8:04 AM
> *To:* "user"<user@spark.apache.org>;
> *Subject:* Spark-submit Problems
>
> Hi everyone,
>
> I am having problems submitting an app through spark-submit when the
> master is not "local". However, the pi.py example which comes with Spark
> works with any master. I believe my script has the same structure as pi.py,
> but for some reason my script is not as flexible. Specifically, the failure
> occurs when count() is called. Count is the first action in the script.
> Also, Spark complains that it is losing executors; however, interactively in
> Jupyter, everything works perfectly with any master passed to the Spark conf.
>
> Does anyone know what might be happening? Is there anywhere I can look up
> the requirements for spark-submit scripts?
>
> Thanks,
> Tobi
>
>

[abosede2@badboy ~]$ spark-submit --master spark://10.160.5.48:7077 
trade_data_count.py
Ivy Default Cache set to: /home/abosede2/.ivy2/cache
The jars for the packages stored in: /home/abosede2/.ivy2/jars
:: loading settings :: url = 
jar:file:/usr/local/spark-1.6.1/assembly/target/scala-2.11/spark-assembly-1.6.1-hre/settings/ivysettings.xml
com.databricks#spark-csv_2.11 added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0
confs: [default]
found com.databricks#spark-csv_2.11;1.3.0 in central
found org.apache.commons#commons-csv;1.1 in central
found com.univocity#univocity-parsers;1.5.1 in central
:: resolution report :: resolve 160ms :: artifacts dl 7ms
:: modules in use:
com.databricks#spark-csv_2.11;1.3.0 from central in [default]
com.univocity#univocity-parsers;1.5.1 from central in [default]
org.apache.commons#commons-csv;1.1 from central in [default]
---------------------------------------------------------------------
|                  |            modules            ||   artifacts   |
|       conf       | number| search|dwnlded|evicted|| number|dwnlded|
---------------------------------------------------------------------
|      default     |   3   |   0   |   0   |   0   ||   3   |   0   |
---------------------------------------------------------------------
:: retrieving :: org.apache.s

Re: Reply: Spark-submit Problems

2016-10-15 Thread Mekal Zheng
Show me your code

On 2016-10-16 08:24 +0800, hxfeng <980548...@qq.com> wrote:
> Show your pi.py code, and what is the exception message?
>
>
> -- Original Message --
> From: "Tobi Bosede";<ani.to...@gmail.com>;
> Sent: Sunday, October 16, 2016, 8:04 AM
> To: "user"<user@spark.apache.org>;
>
> Subject: Spark-submit Problems
>
>
> Hi everyone,
>
> I am having problems submitting an app through spark-submit when the master 
> is not "local". However the pi.py example which comes with Spark works with 
> any master. I believe my script has the same structure as pi.py, but for some 
> reason my script is not as flexible. Specifically, the failure occurs when 
> count() is called. Count is the first action in the script. Also, Spark 
> complains that is is losing executors however, interactively in Jupyter, 
> everything works perfectly with any master passed to spark conf.
>
> Does anyone know what might be happening? Is there anywhere I can look up the 
> requirements for spark-submit scripts?
>
> Thanks,
> Tobi
>
>



Reply: Spark-submit Problems

2016-10-15 Thread hxfeng
Show your pi.py code, and what is the exception message?




-- Original Message --
From: "Tobi Bosede";<ani.to...@gmail.com>;
Sent: Sunday, October 16, 2016, 8:04 AM
To: "user"<user@spark.apache.org>;

Subject: Spark-submit Problems



Hi everyone,

I am having problems submitting an app through spark-submit when the master is 
not "local". However the pi.py example which comes with Spark works with any 
master. I believe my script has the same structure as pi.py, but for some 
reason my script is not as flexible. Specifically, the failure occurs when 
count() is called. Count is the first action in the script. Also, Spark 
complains that is is losing executors however, interactively in Jupyter, 
everything works perfectly with any master passed to spark conf. 


Does anyone know what might be happening? Is there anywhere I can look up the 
requirements for spark-submit scripts?


Thanks,
Tobi

Spark-submit Problems

2016-10-15 Thread Tobi Bosede
Hi everyone,

I am having problems submitting an app through spark-submit when the master
is not "local". However the pi.py example which comes with Spark works with
any master. I believe my script has the same structure as pi.py, but for
some reason my script is not as flexible. Specifically, the failure occurs
when count() is called. Count is the first action in the script. Also,
Spark complains that is is losing executors however, interactively in
Jupyter, everything works perfectly with any master passed to spark conf.

Does anyone know what might be happening? Is there anywhere I can look up
the requirements for spark-submit scripts?

Thanks,
Tobi
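
For reference, here is a minimal sketch of the kind of script being described (a hypothetical stand-in, since trade_data_count.py itself is not shown in the thread). It follows the same overall structure as pi.py, loads a CSV through the spark-csv package that the Ivy output above shows being resolved, and calls count() as the first action. The input path and option values are placeholders.

# Hypothetical sketch of a trade_data_count.py-style PySpark 1.6 script (not
# the actual script from this thread). Structure mirrors pi.py: set up the
# context under __main__, then run a single action (count()).
from pyspark import SparkConf, SparkContext
from pyspark.sql import SQLContext

if __name__ == "__main__":
    conf = SparkConf().setAppName("trade_data_count")
    sc = SparkContext(conf=conf)   # the master is supplied by spark-submit --master
    sqlContext = SQLContext(sc)

    # The "com.databricks.spark.csv" data source is provided by the spark-csv
    # package shown being resolved above (com.databricks#spark-csv_2.11;1.3.0).
    data = (sqlContext.read
            .format("com.databricks.spark.csv")
            .options(header="true", inferSchema="true")
            .load("hdfs:///path/to/trades.csv"))  # placeholder input path

    # count() is the first action, which is where the reported failure occurs.
    print("Raw data is %d rows." % data.count())

    sc.stop()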


spark-submit problems with --packages and --deploy-mode cluster

2015-12-11 Thread Greg Hill
I'm using Spark 1.5.0 with the standalone scheduler, and for the life of me I 
can't figure out why this isn't working.  I have an application that works fine 
with --deploy-mode client that I'm trying to get to run in cluster mode so I 
can use --supervise.  I ran into a few issues with my configuration that I had 
to sort out (classpath stuff mostly), but now I'm stumped.  We rely on the 
Databricks spark-csv plugin.  We're loading that using --packages 
"com.databricks:spark-csv_2.11:1.2.0".  This works without issue in client 
mode, but when run in cluster mode, it tries to load the spark-csv jar from 
/root/.ivy2 and fails because that folder doesn't exist on the slave node that 
ends up running the driver.  Does --packages not work when the driver is launched 
on the cluster?  Does it download the jars on the client before launching the 
driver on the cluster, without passing the downloaded JARs along?

Here's my stderr output:

https://gist.github.com/jimbobhickville/1f10b3508ef946eccb92

Thanks in advance for any suggestions.

Greg
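
For context, a minimal sketch (hypothetical, written in PySpark for illustration; not Greg's actual application) of how an app typically uses the spark-csv package declared via --packages. The data source name it references lives in the jar that --packages resolves into the submitter's local Ivy cache, which is why the node that runs the driver needs access to that jar in cluster mode. The input path is a placeholder.

# Hypothetical PySpark sketch of an app that depends on the spark-csv package
# ("com.databricks:spark-csv_2.11:1.2.0" passed via --packages).
from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext(appName="csv-example")
sqlContext = SQLContext(sc)

# The "com.databricks.spark.csv" format is provided by the spark-csv jar.
# In --deploy-mode cluster, the node that ends up running the driver must be
# able to see that jar, which is what fails in the scenario described above.
df = (sqlContext.read
      .format("com.databricks.spark.csv")
      .option("header", "true")
      .load("hdfs:///path/to/input.csv"))  # placeholder input path

print(df.count())
sc.stop()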