Re: Unable to ship external Python libraries in PYSPARK

2014-09-16 Thread daijia
Is there some way to ship a text file, just like shipping Python libraries?

Thanks in advance
Daijia
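One approach (a hedged sketch, not a confirmed fix for this thread) is SparkContext.addFile together with SparkFiles.get, which Spark provides for distributing arbitrary files to executors. The file path and name below are made up for illustration:

```python
from pyspark import SparkContext, SparkFiles

sc = SparkContext()

# Ship an arbitrary text file to every executor, analogous to how
# --py-files ships Python libraries. "lookup.txt" is a hypothetical
# file used only for illustration.
sc.addFile("/home/daijia/lookup.txt")

def first_line(_):
    # On each executor, resolve the local copy of the shipped file.
    with open(SparkFiles.get("lookup.txt")) as f:
        return f.readline().strip()

print(sc.parallelize(range(2)).map(first_line).collect())
```

Equivalently, passing `--files /home/daijia/lookup.txt` to spark-submit should ship the file the same way, with the same SparkFiles.get lookup on the executor side.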



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Unable-to-ship-external-Python-libraries-in-PYSPARK-tp14074p14412.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: How to submit Pyspark job in mesos?

2014-07-30 Thread daijia
I0730 16:57:21.963773 20042 status_update_manager.cpp:368] Forwarding status
update TASK_LOST (UUID: 84107fc4-d997-4e9c-a256-00d30e5eb4f4) for task 5 of
framework 20140730-165621-1526966464-5050-23977- to
master@192.168.3.91:5050
I0730 16:57:21.966195 20042 status_update_manager.cpp:393] Received status
update acknowledgement (UUID: 84107fc4-d997-4e9c-a256-00d30e5eb4f4) for task
5 of framework 20140730-165621-1526966464-5050-23977-
I0730 16:57:21.966434 20042 slave.cpp:2198] Cleaning up executor '5' of
framework 20140730-165621-1526966464-5050-23977-
I0730 16:57:21.966717 20049 gc.cpp:56] Scheduling
'/tmp/mesos/slaves/20140730-154530-1526966464-5050-22832-1/frameworks/20140730-165621-1526966464-5050-23977-/executors/5/runs/33dedf06-507b-4f0f-b59b-7890f876d3b4'
for gc 6.8881231704days in the future
I0730 16:57:21.966872 20042 slave.cpp:2273] Cleaning up framework
20140730-165621-1526966464-5050-23977-
I0730 16:57:21.967042 20049 gc.cpp:56] Scheduling
'/tmp/mesos/slaves/20140730-154530-1526966464-5050-22832-1/frameworks/20140730-165621-1526966464-5050-23977-/executors/5'
for gc 6.8880958518days in the future
I0730 16:57:21.967258 20049 gc.cpp:56] Scheduling
'/tmp/mesos/slaves/20140730-154530-1526966464-5050-22832-1/frameworks/20140730-165621-1526966464-5050-23977-'
for gc 6.8880614519days in the future
I0730 16:57:21.967341 20042 status_update_manager.cpp:277] Closing status
update streams for framework 20140730-165621-1526966464-5050-23977-



Spark console output:
14/07/30 16:56:48 INFO server.AbstractConnector: Started
SelectChannelConnector@0.0.0.0:4041
14/07/30 16:56:48 INFO ui.SparkUI: Started SparkUI at http://CentOS-19:4041
14/07/30 16:56:48 WARN util.NativeCodeLoader: Unable to load native-hadoop
library for your platform... using builtin-java classes where applicable
14/07/30 16:56:49 INFO scheduler.EventLoggingListener: Logging events to
/tmp/spark-events/my_test.py-1406710609033
14/07/30 16:56:49 INFO util.Utils: Copying
/home/daijia/deal_three_word/my_test.py to
/tmp/spark-c8e9af2f-32b5-4bf0-9f57-c46dc82a4450/my_test.py
14/07/30 16:56:49 INFO spark.SparkContext: Added file
file:/home/daijia/deal_three_word/my_test.py at
http://192.168.3.91:42379/files/my_test.py with timestamp 1406710609772
I0730 16:56:49.882772 24123 sched.cpp:121] Version: 0.18.1
I0730 16:56:49.884660 24131 sched.cpp:217] New master detected at
master@192.168.3.91:5050
I0730 16:56:49.884770 24131 sched.cpp:225] No credentials provided.
Attempting to register without authentication
I0730 16:56:49.885520 24131 sched.cpp:391] Framework registered with
20140730-165621-1526966464-5050-23977-
14/07/30 16:56:49 INFO mesos.CoarseMesosSchedulerBackend: Registered as
framework ID 20140730-165621-1526966464-5050-23977-
14/07/30 16:56:50 INFO spark.SparkContext: Starting job: count at
/home/daijia/deal_three_word/my_test.py:27
14/07/30 16:56:50 INFO scheduler.DAGScheduler: Got job 0 (count at
/home/daijia/deal_three_word/my_test.py:27) with 2 output partitions
(allowLocal=false)
14/07/30 16:56:50 INFO scheduler.DAGScheduler: Final stage: Stage 0(count at
/home/daijia/deal_three_word/my_test.py:27)
14/07/30 16:56:50 INFO scheduler.DAGScheduler: Parents of final stage:
List()
14/07/30 16:56:50 INFO scheduler.DAGScheduler: Missing parents: List()
14/07/30 16:56:50 INFO scheduler.DAGScheduler: Submitting Stage 0
(PythonRDD[1] at RDD at PythonRDD.scala:37), which has no missing parents
14/07/30 16:56:50 INFO scheduler.DAGScheduler: Submitting 2 missing tasks
from Stage 0 (PythonRDD[1] at RDD at PythonRDD.scala:37)
14/07/30 16:56:50 INFO scheduler.TaskSchedulerImpl: Adding task set 0.0 with
2 tasks
14/07/30 16:56:55 INFO mesos.CoarseMesosSchedulerBackend: Mesos task 0 is
now TASK_LOST
14/07/30 16:57:00 INFO mesos.CoarseMesosSchedulerBackend: Mesos task 1 is
now TASK_LOST
14/07/30 16:57:00 INFO mesos.CoarseMesosSchedulerBackend: Blacklisting Mesos
slave value: 20140730-154530-1526966464-5050-22832-2
 due to too many failures; is Spark installed on it?
14/07/30 16:57:05 INFO mesos.CoarseMesosSchedulerBackend: Mesos task 2 is
now TASK_LOST
14/07/30 16:57:05 WARN scheduler.TaskSchedulerImpl: Initial job has not
accepted any resources; check your cluster UI to ensure that workers are
registered and have sufficient memory
14/07/30 16:57:11 INFO mesos.CoarseMesosSchedulerBackend: Mesos task 3 is
now TASK_LOST
14/07/30 16:57:11 INFO mesos.CoarseMesosSchedulerBackend: Blacklisting Mesos
slave value: 20140730-154530-1526966464-5050-22832-0
 due to too many failures; is Spark installed on it?
14/07/30 16:57:16 INFO mesos.CoarseMesosSchedulerBackend: Mesos task 4 is
now TASK_LOST
14/07/30 16:57:20 WARN scheduler.TaskSchedulerImpl: Initial job has not
accepted any resources; check your cluster UI to ensure that workers are
registered and have sufficient memory
14/07/30 16:57:21 INFO mesos.CoarseMesosSchedulerBackend: Mesos task 5 is
now TASK_LOST
14/07/30 16:57:21 INFO mesos.CoarseMesosSchedulerBackend

How to submit Pyspark job in mesos?

2014-07-29 Thread daijia
Dear all, 

   I have Spark 1.0.0 and Mesos 0.18.1. After configuring Mesos and Spark
and starting the Mesos cluster, I try to run the PySpark job with the command
below:

   spark-submit /path/to/my_pyspark_job.py  --master
mesos://192.168.0.21:5050
   
   It produces the error below:

14/07/29 18:40:49 INFO server.Server: jetty-8.y.z-SNAPSHOT
14/07/29 18:40:49 INFO server.AbstractConnector: Started
SelectChannelConnector@0.0.0.0:4041
14/07/29 18:40:49 INFO ui.SparkUI: Started SparkUI at http://CentOS-19:4041
14/07/29 18:40:49 WARN util.NativeCodeLoader: Unable to load native-hadoop
library for your platform... using builtin-java classes where applicable
14/07/29 18:40:50 INFO scheduler.EventLoggingListener: Logging events to
/tmp/spark-events/my_test.py-1406630449771
14/07/29 18:40:50 INFO util.Utils: Copying
/home/daijia/deal_three_word/my_test.py to
/tmp/spark-4365b01d-b57a-4abb-b39c-cb57b83a28ce/my_test.py
14/07/29 18:40:50 INFO spark.SparkContext: Added file
file:/home/daijia/deal_three_word/my_test.py at
http://192.168.3.91:51188/files/my_test.py with timestamp 1406630450333
I0729 18:40:50.440551 15033 sched.cpp:121] Version: 0.18.1
I0729 18:40:50.442450 15035 sched.cpp:217] New master detected at
master@192.168.3.91:5050
I0729 18:40:50.442570 15035 sched.cpp:225] No credentials provided.
Attempting to register without authentication
I0729 18:40:50.443234 15036 sched.cpp:391] Framework registered with
20140729-174911-1526966464-5050-13758-0006
14/07/29 18:40:50 INFO mesos.CoarseMesosSchedulerBackend: Registered as
framework ID 20140729-174911-1526966464-5050-13758-0006
14/07/29 18:40:50 INFO spark.SparkContext: Starting job: count at
/home/daijia/deal_three_word/my_test.py:27
14/07/29 18:40:50 INFO mesos.CoarseMesosSchedulerBackend: Mesos task 0 is
now TASK_LOST
14/07/29 18:40:50 INFO mesos.CoarseMesosSchedulerBackend: Mesos task 1 is
now TASK_LOST
14/07/29 18:40:50 INFO mesos.CoarseMesosSchedulerBackend: Mesos task 3 is
now TASK_LOST
14/07/29 18:40:50 INFO mesos.CoarseMesosSchedulerBackend: Blacklisting Mesos
slave value: 20140729-163345-1526966464-5050-10913-0
 due to too many failures; is Spark installed on it?
14/07/29 18:40:50 INFO mesos.CoarseMesosSchedulerBackend: Mesos task 2 is
now TASK_LOST
14/07/29 18:40:50 INFO mesos.CoarseMesosSchedulerBackend: Blacklisting Mesos
slave value: 20140729-163345-1526966464-5050-10913-2
 due to too many failures; is Spark installed on it?
14/07/29 18:40:50 INFO scheduler.DAGScheduler: Got job 0 (count at
/home/daijia/deal_three_word/my_test.py:27) with 2 output partitions
(allowLocal=false)
14/07/29 18:40:50 INFO scheduler.DAGScheduler: Final stage: Stage 0(count at
/home/daijia/deal_three_word/my_test.py:27)
14/07/29 18:40:50 INFO scheduler.DAGScheduler: Parents of final stage:
List()
14/07/29 18:40:50 INFO scheduler.DAGScheduler: Missing parents: List()
14/07/29 18:40:50 INFO mesos.CoarseMesosSchedulerBackend: Mesos task 4 is
now TASK_LOST
14/07/29 18:40:50 INFO scheduler.DAGScheduler: Submitting Stage 0
(PythonRDD[1] at RDD at PythonRDD.scala:37), which has no missing parents
14/07/29 18:40:50 INFO mesos.CoarseMesosSchedulerBackend: Mesos task 5 is
now TASK_LOST
14/07/29 18:40:50 INFO mesos.CoarseMesosSchedulerBackend: Blacklisting Mesos
slave value: 20140729-163345-1526966464-5050-10913-1
 due to too many failures; is Spark installed on it?
14/07/29 18:40:50 INFO scheduler.DAGScheduler: Submitting 2 missing tasks
from Stage 0 (PythonRDD[1] at RDD at PythonRDD.scala:37)
14/07/29 18:40:50 INFO scheduler.TaskSchedulerImpl: Adding task set 0.0 with
2 tasks
14/07/29 18:41:05 WARN scheduler.TaskSchedulerImpl: Initial job has not
accepted any resources; check your cluster UI to ensure that workers are
registered and have sufficient memory
14/07/29 18:41:20 WARN scheduler.TaskSchedulerImpl: Initial job has not
accepted any resources; check your cluster UI to ensure that workers are
registered and have sufficient memory
14/07/29 18:41:20 WARN scheduler.TaskSchedulerImpl: Initial job has not
accepted any resources; check your cluster UI to ensure that workers are
registered and have sufficient memory

 It just repeats the last message.
 Here is my Python script: 

#!/usr/bin/env python
# coding=utf-8
from pyspark import SparkContext

sc = SparkContext()
temp = []
for index in range(1000):
    temp.append(index)
sc.parallelize(temp).count()


So, is the running command right? Or is there some other reason leading to
the problem?
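One thing worth checking (an observation about spark-submit's documented argument handling, not a confirmed diagnosis of this failure): spark-submit treats everything after the application file as arguments to the application itself, so a --master flag placed after the .py path is ignored and the job falls back to the default master. The usual ordering is:

```shell
# Options for spark-submit must come before the application file;
# anything after my_pyspark_job.py is passed to the script itself.
spark-submit --master mesos://192.168.0.21:5050 /path/to/my_pyspark_job.py
```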

Thanks in advance,
Daijia

--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/How-to-submit-Pyspark-job-in-mesos-tp10905.html


Re: How to submit Pyspark job in mesos?

2014-07-29 Thread daijia

Actually, it runs okay on my slaves when they are deployed in standalone mode.
When I switch to Mesos, the error occurs.

Anyway, thanks for your reply, and any ideas will help.
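A hedged guess based on the "is Spark installed on it?" blacklisting messages above: in coarse-grained mode on Mesos, Spark's documentation says each slave needs access to a Spark distribution, either installed locally or fetched via spark.executor.uri. A configuration sketch (the URI below is a placeholder; point it at a real Spark 1.0.0 tarball reachable from every Mesos slave, e.g. on HDFS):

```
# conf/spark-defaults.conf (placeholder URI for illustration)
spark.executor.uri  hdfs://192.168.3.91/spark/spark-1.0.0-bin-hadoop1.tgz
```

If the executors cannot fetch a Spark distribution, tasks die immediately, which would match the TASK_LOST messages and the "Initial job has not accepted any resources" warning seen here.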





--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/How-to-submit-Pyspark-job-in-mesos-tp10905p10918.html