RE: PySpark 1.6.1: 'builtin_function_or_method' object has no attribute '__code__' in Pickles

2016-07-30 Thread Joaquin Alzola
An example (adding a package to the spark submit):
bin/spark-submit --packages 
com.datastax.spark:spark-cassandra-connector_2.10:1.6.0 spark_v3.py
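
If what needs to be shipped is a local Python module (such as the test.py discussed below) rather than a package, the --py-files flag is the usual route; a minimal sketch along the same lines, with the file names taken as examples:

bin/spark-submit --py-files test.py spark.py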


From: Bhaarat Sharma [mailto:bhaara...@gmail.com]
Sent: 30 July 2016 06:38
To: ayan guha 
Cc: user 
Subject: Re: PySpark 1.6.1: 'builtin_function_or_method' object has no 
attribute '__code__' in Pickles

I'm very new to Spark. I'm running it on a single CentOS 7 box. How would I add a 
test.py to spark-submit? Pointers to any resources would be great. Thanks for your 
help.

On Sat, Jul 30, 2016 at 1:28 AM, ayan guha <guha.a...@gmail.com> wrote:
I think you need to add test.py to spark-submit so that it gets shipped to all 
executors.

On Sat, Jul 30, 2016 at 3:24 PM, Bhaarat Sharma <bhaara...@gmail.com> wrote:
I am using PySpark 1.6.1. In my Python program I'm using ctypes and trying to 
load the liblept library via the liblept.so.4.0.2 file on my system.

While trying to load the library via cdll.LoadLibrary("liblept.so.4.0.2") I get 
an error: 'builtin_function_or_method' object has no attribute '__code__'

Here are my files

test.py

from ctypes import *

class FooBar:
    def __init__(self, options=None, **kwargs):
        if options is not None:
            self.options = options

    def read_image_from_bytes(self, bytes):
        return "img"

    def text_from_image(self, img):
        self.leptonica = cdll.LoadLibrary("liblept.so.4.0.2")
        return "test from foobar"

spark.py

from pyspark import SparkContext
import test
import numpy as np

sc = SparkContext("local", "test")
foo = test.FooBar()

def file_bytes(rawdata):
    return np.asarray(bytearray(rawdata), dtype=np.uint8)

def do_some_with_bytes(bytes):
    return foo.do_something_on_image(foo.read_image_from_bytes(bytes))

images = sc.binaryFiles("/myimages/*.jpg")
image_to_text = lambda rawdata: do_some_with_bytes(file_bytes(rawdata))
print images.values().map(image_to_text).take(1)  # this gives an error



What is the way to load this library?
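
A workaround that often helps with this kind of pickling error is to avoid capturing anything ctypes-related in an object Spark has to serialize, and instead open the library inside the function that runs on the executors; a minimal sketch, assuming liblept.so.4.0.2 is installed on every worker node (the function name is illustrative):

from ctypes import cdll

def text_from_image_bytes(rawdata):
    # the handle is created on the executor itself, so nothing
    # un-picklable has to cross the driver/executor boundary
    leptonica = cdll.LoadLibrary("liblept.so.4.0.2")
    return "test from foobar"   # placeholder, as in the original FooBar

print images.values().map(lambda rawdata: text_from_image_bytes(rawdata)).take(1)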



--
Best Regards,
Ayan Guha



RE: Understanding spark concepts cluster, master, slave, job, stage, worker, executor, task

2016-07-21 Thread Joaquin Alzola
Do you have the same as link 1 but in English?

  *   spark-questions-concepts
  *   deep-into-spark-exection-model

It seems a really interesting post but it is in Chinese. I suppose Google Translate 
does a poor job on the translation.


From: Taotao.Li [mailto:charles.up...@gmail.com]
Sent: 21 July 2016 04:04
To: Jean Georges Perrin 
Cc: Sachin Mittal ; user 
Subject: Re: Understanding spark concepts cluster, master, slave, job, stage, 
worker, executor, task

Hi, Sachin, here are two posts about the basic concepts of Spark:

  *   spark-questions-concepts
  *   deep-into-spark-exection-model


And I fully recommend Databricks' post: 
https://databricks.com/blog/2016/06/22/apache-spark-key-terms-explained.html


On Thu, Jul 21, 2016 at 1:36 AM, Jean Georges Perrin <j...@jgp.net> wrote:
Hey,

I love when questions are numbered, it's easier :)

1) Yes (but I am not an expert)
2) You don't control it... One of my processes goes up to 8k tasks, so...
3) Yes, and if you have HT it doubles. My servers have 12 cores, but with HT that 
makes 24.
4) From my understanding: the slave is the logical computational unit and the worker 
is really the one doing the job.
5) Dunnoh
6) Dunnoh
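
On question 2: the number of tasks in a stage generally follows the number of partitions of the RDD being processed, so it can at least be inspected and adjusted; a small PySpark sketch, with the input path purely illustrative:

from pyspark import SparkContext

sc = SparkContext("local[*]", "partition-check")
rdd = sc.textFile("/tmp/some_input")    # example input
print(rdd.getNumPartitions())           # roughly the number of tasks the stage will launch
rdd8 = rdd.repartition(8)               # pick the parallelism explicitly
print(rdd8.getNumPartitions())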

On Jul 20, 2016, at 1:30 PM, Sachin Mittal <sjmit...@gmail.com> wrote:

Hi,
I was able to build and run my Spark application via spark-submit.
I have understood some of the concepts by going through the resources at 
https://spark.apache.org but a few doubts still remain. I have a few specific 
questions and would be glad if someone could shed some light on them.
I submitted the application using spark.master=local[*] and I have an 8-core PC.

- What I understand is that the application is called a job. Since mine had two 
stages it gets divided into 2 stages, and each stage had a number of tasks which 
ran in parallel.
Is this understanding correct?

- What I notice is that each stage is further divided into 262 tasks. Where did 
this number 262 come from? Is it configurable? Would increasing it improve 
performance?
- Also I see that the tasks run in parallel in sets of 8. Is this because I have 
an 8-core PC?
- What is the difference or relation between slave and worker? When I did 
spark-submit, did it start 8 slaves or worker threads?
- I see all worker threads running in one single JVM. Is this because I did not 
start slaves separately and connect them to a single master cluster manager? If 
I had done that, each worker would have run in its own JVM.
- What is the relationship between worker and executor? Can a worker have more 
than one executor? If yes, how do we configure that? Do all executors run in the 
worker JVM as independent threads?
I suppose that is all for now. Would appreciate any response. Will add follow-up 
questions if any.
Thanks
Sachin





--
___
Quant | Engineer | Boy
___
blog:
http://litaotao.github.io
github: www.github.com/litaotao


RE: run spark apps in linux crontab

2016-07-20 Thread Joaquin Alzola

Remember that you need to source your .bashrc
for your PATH to be set up.
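
A sketch of what that could look like in the crontab, assuming bash and that SPARK_HOME is exported from ~/.bashrc (paths below are only examples):

22 21 * * * /bin/bash -c 'source $HOME/.bashrc; $SPARK_HOME/bin/spark-submit --class com.abc.myclass --total-executor-cores 10 --jars $SPARK_HOME/lib/MyDep.jar $SPARK_HOME/MyJar.jar' > /home/hadoop/log 2>&1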

From: luohui20...@sina.com [mailto:luohui20...@sina.com]
Sent: 20 July 2016 11:01
To: user 
Subject: run spark apps in linux crontab

hi guys:
  I added a spark-submit job to my Linux crontab list by the means below, 
however none of them works. If I change it to a normal shell script, it is OK. 
I don't quite understand why. I checked the 8080 web UI of my Spark cluster: no 
job was submitted, and there are no messages in /home/hadoop/log.
  Any idea is welcome.

[hadoop@master ~]$ crontab -e
1.
22 21 * * * sh /home/hadoop/shellscripts/run4.sh > /home/hadoop/log

and run4.sh contains:
$SPARK_HOME/bin/spark-submit --class com.abc.myclass --total-executor-cores 10 
--jars $SPARK_HOME/lib/MyDep.jar $SPARK_HOME/MyJar.jar  > /home/hadoop/log

2.
22 21 * * * $SPARK_HOME/bin/spark-submit --class com.abc.myclass 
--total-executor-cores 10 --jars $SPARK_HOME/lib/MyDep.jar 
$SPARK_HOME/MyJar.jar  > /home/hadoop/log

3.
22 21 * * * /usr/lib/spark/bin/spark-submit --class com.abc.myclass 
--total-executor-cores 10 --jars /usr/lib/spark/lib/MyDep.jar 
/usr/lib/spark/MyJar.jar  > /home/hadoop/log

4.
22 21 * * * hadoop /usr/lib/spark/bin/spark-submit --class com.abc.myclass 
--total-executor-cores 10 --jars /usr/lib/spark/lib/MyDep.jar 
/usr/lib/spark/MyJar.jar  > /home/hadoop/log



Thanks&Best regards!
San.Luo


RE: Presentation in London: Running Spark on Hive or Hive on Spark

2016-07-15 Thread Joaquin Alzola
It is on the 20th (Wednesday) next week.

From: Marco Mistroni [mailto:mmistr...@gmail.com]
Sent: 15 July 2016 11:04
To: Mich Talebzadeh 
Cc: user @spark ; user 
Subject: Re: Presentation in London: Running Spark on Hive or Hive on Spark

Dr Mich
  do you have any slides or videos available for the presentation you did 
@Canary Wharf?
kindest regards
 marco

On Wed, Jul 6, 2016 at 10:37 PM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
Dear forum members


I will be presenting on the topic of "Running Spark on Hive or Hive on Spark, 
your mileage varies" in Future of Data: 
London

Details

Organized by: Hortonworks

Date: Wednesday, July 20, 2016, 6:00 PM to 8:30 PM

Place: London

Location: One Canada Square, Canary Wharf,  London E14 5AB.

Nearest Underground: Canary Wharf (map)

If you are interested please register 
here

Looking forward to seeing those who can make it to have an interesting 
discussion and leverage your experience.
Regards,


Dr Mich Talebzadeh



LinkedIn  
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com



Disclaimer: Use it at your own risk. Any and all responsibility for any loss, 
damage or destruction of data or any other property which may arise from 
relying on this email's technical content is explicitly disclaimed. The author 
will in no case be liable for any monetary damages arising from such loss, 
damage or destruction.





JAR files into python3

2016-07-03 Thread Joaquin Alzola
Hi List,


I have the following script which will be used in Spark.



#!/usr/bin/env python3
from pyspark_cassandra import CassandraSparkContext, Row
from pyspark import SparkContext, SparkConf
from pyspark.sql import SQLContext
import os

os.environ['CLASSPATH'] = "/mnt/spark/lib"

conf = SparkConf().setAppName("test").setMaster("spark://192.168.23.31:7077").set("spark.cassandra.connection.host", "192.168.23.31")
sc = CassandraSparkContext(conf=conf)
sqlContext = SQLContext(sc)
df = sqlContext.read.format("org.apache.spark.sql.cassandra").options(keyspace="lebara_diameter_codes", table="nl_lebara_diameter_codes").load()
list = df.select("errorcode2001").where("errorcode2001 > 1200").collect()
list2 = df.select("date").collect()
print([i for i in list[0]])
print(type(list[0]))



The error that it throws is the following one (which is logical because I do 
not load the jar files):

py4j.protocol.Py4JJavaError: An error occurred while calling o29.load.

: java.lang.ClassNotFoundException: Failed to find data source: 
org.apache.spark.sql.cassandra. Please find packages at 
http://spark-packages.org



Is there a way to load those jar files into Python or onto the classpath when 
calling sqlContext.read.format("org.apache.spark.sql.cassandra")?

Or, on the other hand, do I have to create Python scripts with #!/usr/bin/env 
pyspark?
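
One approach that may work while keeping the #!/usr/bin/env python3 shebang is to hand the jars to the gateway JVM before pyspark is imported, via PYSPARK_SUBMIT_ARGS; a sketch, assuming the connector jar has been copied into /mnt/spark/lib (the exact file name is an example):

#!/usr/bin/env python3
import os

# must be set before any pyspark import, so the JVM starts with the jar on its classpath
os.environ['PYSPARK_SUBMIT_ARGS'] = (
    '--jars /mnt/spark/lib/spark-cassandra-connector_2.10-1.6.0.jar pyspark-shell'
)

from pyspark import SparkConf, SparkContext
from pyspark.sql import SQLContext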



BR



Joaquin





RE: Remote RPC client disassociated

2016-07-01 Thread Joaquin Alzola
Hi Akhil

I am using:
Cassandra: 3.0.5
Spark: 1.6.1
Scala 2.10
Spark-cassandra connector: 1.6.0

From: Akhil Das [mailto:ak...@hacked.work]
Sent: 01 July 2016 11:38
To: Joaquin Alzola 
Cc: user@spark.apache.org
Subject: Re: Remote RPC client disassociated

This looks like a version conflict. Which version of Spark are you using? The 
Cassandra connector you are using is for Scala 2.10.x and Spark 1.6.

On Thu, Jun 30, 2016 at 6:34 PM, Joaquin Alzola <joaquin.alz...@lebara.com> wrote:
Hi List,

I am launching this spark-submit job:

hadoop@testbedocg:/mnt/spark> bin/spark-submit --packages 
com.datastax.spark:spark-cassandra-connector_2.10:1.6.0 --jars 
/mnt/spark/lib/TargetHolding_pyspark-cassandra-0.3.5.jar spark_v2.py

spark_v2.py is:
from pyspark_cassandra import CassandraSparkContext, Row
from pyspark import SparkContext, SparkConf
from pyspark.sql import SQLContext
conf = SparkConf().setAppName("test").setMaster("spark://192.168.23.31:7077").set("spark.cassandra.connection.host", "192.168.23.31")
sc = CassandraSparkContext(conf=conf)
table = sc.cassandraTable("lebara_diameter_codes","nl_lebara_diameter_codes")
food_count = table.select("errorcode2001").groupBy("errorcode2001").count()
food_count.collect()


Error I get when running the above command:

[Stage 0:>  (0 + 3) / 
7]16/06/30 10:40:36 ERROR TaskSchedulerImpl: Lost executor 0 on as5: Remote RPC 
client disassociated. Likely due to containers exceeding thresholds, or network 
issues. Check driver logs for WARN messages.
[Stage 0:>  (0 + 7) / 
7]16/06/30 10:40:40 ERROR TaskSchedulerImpl: Lost executor 1 on as4: Remote RPC 
client disassociated. Likely due to containers exceeding thresholds, or network 
issues. Check driver logs for WARN messages.
[Stage 0:>  (0 + 5) / 
7]16/06/30 10:40:42 ERROR TaskSchedulerImpl: Lost executor 3 on as5: Remote RPC 
client disassociated. Likely due to containers exceeding thresholds, or network 
issues. Check driver logs for WARN messages.
[Stage 0:>  (0 + 4) / 
7]16/06/30 10:40:46 ERROR TaskSchedulerImpl: Lost executor 4 on as4: Remote RPC 
client disassociated. Likely due to containers exceeding thresholds, or network 
issues. Check driver logs for WARN messages.
16/06/30 10:40:46 ERROR TaskSetManager: Task 5 in stage 0.0 failed 4 times; 
aborting job
Traceback (most recent call last):
  File "/mnt/spark-1.6.1-bin-hadoop2.6/spark_v2.py", line 11, in 
food_count = table.select("errorcode2001").groupBy("errorcode2001").count()
  File "/mnt/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 1004, in count
  File "/mnt/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 995, in sum
  File "/mnt/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 869, in fold
  File "/mnt/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 771, in collect
  File "/mnt/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 813, 
in __call__
  File "/mnt/spark/python/lib/py4j-0.9-src.zip/py4j/protocol.py", line 308, in 
get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling 
z:org.apache.spark.api.python.PythonRDD.collectAndServe.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 5 in 
stage 0.0 failed 4 times, most recent failure: Lost task 5.3 in stage 0.0 (TID 
14, as4): ExecutorLostFailure (executor 4 exited caused by one of the running 
tasks) Reason: Remote RPC client disassociated. Likely due to containers 
exceeding thresholds, or network issues. Check driver logs for WARN messages.
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1431)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1419)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1418)
at 
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at 
org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1418)
   at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
at scala.Option.foreach(Option.scala:236)
at 
org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAG

RE: Spark jobs

2016-06-30 Thread Joaquin Alzola
Hi Sujeet,

I think that might not work.

Running this:
#!/usr/bin/env python3
from pyspark_cassandra import CassandraSparkContext, Row
from pyspark import SparkContext, SparkConf
from pyspark.sql import SQLContext
conf = SparkConf().setAppName("test").setMaster("spark://192.168.23.31:7077").set("spark.cassandra.connection.host", "192.168.23.31")
sc = CassandraSparkContext(conf=conf)
sqlContext = SQLContext(sc)
df = sqlContext.read.format("org.apache.spark.sql.cassandra").options(keyspace="lebara_diameter_codes", table="nl_lebara_diameter_codes").load()
list = df.select("errorcode2001").where("errorcode2001 > 1200").collect()
list2 = df.select("date").collect()
print([i for i in list[0]])
print(type(list[0]))

of course it shows this error:
py4j.protocol.Py4JJavaError: An error occurred while calling o29.load.
: java.lang.ClassNotFoundException: Failed to find data source: 
org.apache.spark.sql.cassandra. Please find packages at 
http://spark-packages.org

Is there a way to load up those jar files into the script?

Jo

From: sujeet jog [mailto:sujeet@gmail.com]
Sent: 29 June 2016 14:51
To: Joaquin Alzola ; user 
Subject: Re: Spark jobs

check if this helps,

from multiprocessing import Process
import os

def training():
    print("Training Workflow")
    cmd = "spark/bin/spark-submit ./ml.py &"
    os.system(cmd)

w_training = Process(target=training)
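
A subprocess.Popen variant, which is what the original question asks about, could look like this sketch (the script path is an example):

import subprocess

def launch_job():
    # start spark-submit without blocking the main script;
    # call proc.wait() or proc.communicate() later to collect the exit status
    return subprocess.Popen(
        ["spark/bin/spark-submit", "./ml.py"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
    )

proc = launch_job()
print(proc.wait())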



On Wed, Jun 29, 2016 at 6:28 PM, Joaquin Alzola <joaquin.alz...@lebara.com> wrote:
Hi,

This is a totally newbie question but I seem not to find the link ... when I 
create a spark-submit Python script to be launched ...

how should I call it from the main Python script with subprocess.Popen?

BR

Joaquin








RE: Remote RPC client disassociated

2016-06-30 Thread Joaquin Alzola
>>> 16/06/30 10:44:34 ERROR util.Utils: Uncaught exception in thread stdout 
>>> writer for python
java.lang.AbstractMethodError: 
pyspark_cassandra.DeferringRowReader.read(Lcom/datastax/driver/core/Row;Lcom/datastax/spark/connector/CassandraRowMetadata;)Ljava/lang/Object;
>> You are trying to call an abstract method.  Please check the method 
>> DeferringRowReader.read

I do not know how to fix this issue.
I have seen many tutorials around the net that make the same call I am currently 
making.

from pyspark_cassandra import CassandraSparkContext, Row
from pyspark import SparkContext, SparkConf
from pyspark.sql import SQLContext
conf = 
SparkConf().setAppName("test").setMaster("spark://192.168.23.31:7077").set("spark.cassandra.connection.host",
 "192.168.23.31")
sc = CassandraSparkContext(conf=conf)
table = sc.cassandraTable("lebara_diameter_codes","nl_lebara_diameter_codes")
food_count = table.select("errorcode2001").groupBy("errorcode2001").count()
food_count.collect()

I am really new to this Spark thing. I was able to configure it correctly and am 
now learning the API.


Remote RPC client disassociated

2016-06-30 Thread Joaquin Alzola
Hi List,

I am launching this spark-submit job:

hadoop@testbedocg:/mnt/spark> bin/spark-submit --packages 
com.datastax.spark:spark-cassandra-connector_2.10:1.6.0 --jars 
/mnt/spark/lib/TargetHolding_pyspark-cassandra-0.3.5.jar spark_v2.py

spark_v2.py is:
from pyspark_cassandra import CassandraSparkContext, Row
from pyspark import SparkContext, SparkConf
from pyspark.sql import SQLContext
conf = 
SparkConf().setAppName("test").setMaster("spark://192.168.23.31:7077").set("spark.cassandra.connection.host",
 "192.168.23.31")
sc = CassandraSparkContext(conf=conf)
table = sc.cassandraTable("lebara_diameter_codes","nl_lebara_diameter_codes")
food_count = table.select("errorcode2001").groupBy("errorcode2001").count()
food_count.collect()


Error I get when running the above command:

[Stage 0:>  (0 + 3) / 
7]16/06/30 10:40:36 ERROR TaskSchedulerImpl: Lost executor 0 on as5: Remote RPC 
client disassociated. Likely due to containers exceeding thresholds, or network 
issues. Check driver logs for WARN messages.
[Stage 0:>  (0 + 7) / 
7]16/06/30 10:40:40 ERROR TaskSchedulerImpl: Lost executor 1 on as4: Remote RPC 
client disassociated. Likely due to containers exceeding thresholds, or network 
issues. Check driver logs for WARN messages.
[Stage 0:>  (0 + 5) / 
7]16/06/30 10:40:42 ERROR TaskSchedulerImpl: Lost executor 3 on as5: Remote RPC 
client disassociated. Likely due to containers exceeding thresholds, or network 
issues. Check driver logs for WARN messages.
[Stage 0:>  (0 + 4) / 
7]16/06/30 10:40:46 ERROR TaskSchedulerImpl: Lost executor 4 on as4: Remote RPC 
client disassociated. Likely due to containers exceeding thresholds, or network 
issues. Check driver logs for WARN messages.
16/06/30 10:40:46 ERROR TaskSetManager: Task 5 in stage 0.0 failed 4 times; 
aborting job
Traceback (most recent call last):
  File "/mnt/spark-1.6.1-bin-hadoop2.6/spark_v2.py", line 11, in 
food_count = table.select("errorcode2001").groupBy("errorcode2001").count()
  File "/mnt/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 1004, in count
  File "/mnt/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 995, in sum
  File "/mnt/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 869, in fold
  File "/mnt/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 771, in collect
  File "/mnt/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 813, 
in __call__
  File "/mnt/spark/python/lib/py4j-0.9-src.zip/py4j/protocol.py", line 308, in 
get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling 
z:org.apache.spark.api.python.PythonRDD.collectAndServe.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 5 in 
stage 0.0 failed 4 times, most recent failure: Lost task 5.3 in stage 0.0 (TID 
14, as4): ExecutorLostFailure (executor 4 exited caused by one of the running 
tasks) Reason: Remote RPC client disassociated. Likely due to containers 
exceeding thresholds, or network issues. Check driver logs for WARN messages.
Driver stacktrace:
at 
org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1431)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1419)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1418)
at 
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at 
org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1418)
   at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
at scala.Option.foreach(Option.scala:236)
at 
org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:799)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1640)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1599)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1588)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
at 
org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:620)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1832)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1845)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1858)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1929)
at org.apa

RE: Unsubscribe - 3rd time

2016-06-29 Thread Joaquin Alzola
And a 3rd time is not enough to know that unsubscribing is done through --> 
user-unsubscr...@spark.apache.org

From: Steve Florence [mailto:sflore...@ypm.com]
Sent: 29 June 2016 16:47
To: user@spark.apache.org
Subject: Unsubscribe - 3rd time




Spark jobs

2016-06-29 Thread Joaquin Alzola
Hi,

This is a totally newbie question but I seem not to find the link ... when I 
create a spark-submit Python script to be launched ...

how should I call it from the main Python script with subprocess.Popen?

BR

Joaquin








Spark-Cassandra connector

2016-06-21 Thread Joaquin Alzola
Hi List

I am trying to install the Spark-Cassandra connector through maven or sbt but 
neither works.
Both of them try to connect to the Internet (to which I have no connection) to 
download certain files.

Is there a way to install the files manually?

I downloaded spark-cassandra-connector_2.10-1.6.0.jar from the maven repository, 
which matches the Scala and Spark versions I have ... but where do I put it?
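
One offline-friendly option may be to point Spark at the downloaded jar explicitly instead of letting maven/sbt resolve it; a sketch, with the paths as examples only:

bin/spark-submit --jars /path/to/spark-cassandra-connector_2.10-1.6.0.jar your_app.py

or, set once in conf/spark-defaults.conf:

spark.jars  /path/to/spark-cassandra-connector_2.10-1.6.0.jar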

BR

Joaquin