Re: PySpark with OpenCV causes python worker to crash

2015-06-04 Thread Sam Stoelinga
On Tue, Jun 2, 2015 at 5:06 AM, Davies Liu wrote: > Could you run the single-thread version on the worker machine to make sure > that OpenCV is installed and configured correctly? > > On Sat, May 30, 2015 at 6:29 AM, Sam Stoelinga > wrote: > > I've verified the issue lies

Re: PySpark with OpenCV causes python worker to crash

2015-05-30 Thread Sam Stoelinga
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY) sift = cv2.xfeatures2d.SIFT_create() kp, descriptors = sift.detectAndCompute(gray, None) return (imgfilename, "test") And the corresponding tests.py: https://gist.github.com/samos123/d383c26f6d47d34d32d6 On Sat, May 30, 2015 at 8:04 PM, Sam
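
A minimal sketch of what the gist's extraction function appears to look like, assuming images arrive as (filename, raw bytes) pairs (e.g. from sequenceFile); the function and variable names here are illustrative, not taken from the gist:

    import cv2
    import numpy as np

    def extract_sift(pair):
        imgfilename, imgbytes = pair
        # Decode raw bytes into a BGR image; imdecode returns None on bad input
        arr = np.frombuffer(imgbytes, dtype=np.uint8)
        img = cv2.imdecode(arr, cv2.IMREAD_COLOR)
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        sift = cv2.xfeatures2d.SIFT_create()   # requires opencv-contrib
        kp, descriptors = sift.detectAndCompute(gray, None)
        return (imgfilename, "test")

    # features = sc.sequenceFile("hdfs:///images").map(extract_sift)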

Re: PySpark with OpenCV causes python worker to crash

2015-05-30 Thread Sam Stoelinga
> > If the bytes that came from sequenceFile() are broken, it's easy to crash a > C library in Python (OpenCV). > > On Thu, May 28, 2015 at 8:33 AM, Sam Stoelinga > wrote: > > Hi sparkers, > > > > I am working on a PySpark application which uses the OpenCV library
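
A quick way to test Davies' hypothesis is to compare, outside of any OpenCV call, the bytes that sequenceFile() hands back against the original file on disk. A rough pyspark-shell sketch, with paths assumed for illustration:

    # Pull one record out of the sequence file and diff it against a
    # local copy of the same image; a mismatch means the bytes are
    # corrupted before OpenCV ever sees them.
    fname, data = sc.sequenceFile("hdfs:///images").first()
    with open("/tmp/sample.jpg", "rb") as f:   # hypothetical local copy
        original = f.read()
    print(len(data), len(original), data == original)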

PySpark with OpenCV causes python worker to crash

2015-05-28 Thread Sam Stoelinga
The error message taken from STDERR of the worker log: https://gist.github.com/samos123/3300191684aee7fc8013 I would like pointers or tips on how to debug further. It would be nice to know why the worker crashed. Thanks, Sam Stoelinga org.apache.spark.SparkException: Python worker exited

Re: MLib KMeans on large dataset issues

2015-04-29 Thread Sam Stoelinga
Guys, great feedback, pointing out my stupidity :D Rows and columns got intermixed, hence the weird results I was seeing. Ignore my previous issues; I will reformat my data first. On Wed, Apr 29, 2015 at 8:47 PM, Sam Stoelinga wrote: > I'm mostly using example code, see here
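
For reference, the orientation MLlib expects: each RDD element is one observation (a 1-D feature vector), so a transposed matrix silently trains KMeans on columns instead of rows. A minimal sketch, assuming a pyspark shell with sc available:

    from pyspark.mllib.clustering import KMeans
    import numpy as np

    # 1000 samples of 128 features: parallelize iterates over ROWS,
    # so each RDD element is one length-128 feature vector
    data = np.random.rand(1000, 128)
    rdd = sc.parallelize(data)
    model = KMeans.train(rdd, k=10, maxIterations=20)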

Re: MLib KMeans on large dataset issues

2015-04-29 Thread Sam Stoelinga
On Wed, Apr 29, 2015 at 8:00 PM, Jeetendra Gangele wrote: > How are you passing the feature vector to KMeans? > Is it in 2-D space or a 1-D array? > > Did you try using Streaming KMeans? > > Will you be able to paste code here? > > On 29 April 2015 at 17:23, Sam Stoelinga wrote: > >>

MLib KMeans on large dataset issues

2015-04-29 Thread Sam Stoelinga
without too many problems. Looking forward to hearing you point out my stupidity or provide workarounds that could make Spark KMeans work well on large datasets. Regards, Sam Stoelinga

monit with spark

2015-02-15 Thread Mike Sam
We want to monitor the Spark master and Spark slaves using monit, but we want to use the sbin scripts to do so. The scripts create the Spark master and slave processes independently of themselves, so monit would not know the PID of the started process to watch. Is this correct? Should we watch the ports? How
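
For what it's worth, the sbin scripts (via sbin/spark-daemon.sh) do write pid files, by default under /tmp (configurable with SPARK_PID_DIR), so monit's pidfile-based check should work. Failing that, a port probe is a simple liveness check; a rough Python sketch monit could run as an external program (host and port assumed for illustration):

    import socket
    import sys

    # Exit 0 if the Spark master answers on its service port, 1 otherwise;
    # monit can treat a non-zero exit status as "down".
    HOST, PORT = "127.0.0.1", 7077   # default standalone master port

    try:
        socket.create_connection((HOST, PORT), timeout=5).close()
        sys.exit(0)
    except socket.error:
        sys.exit(1)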

Strategy to automatically configure spark workers env params in standalone mode

2015-02-14 Thread Mike Sam
We are planning to use servers of varying specs (32 GB, 64 GB, 244 GB RAM or even higher, and varying cores) for a standalone deployment of Spark, but we do not know the spec of the server ahead of time, and we need to script up some logic that will run on the server on boot and automatically set the following
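
One possible shape for that boot-time logic, sketched in Python: detect the machine's memory and cores, hold back some headroom for the OS, and write the result into spark-env.sh. The paths and the headroom figure are assumptions to adapt:

    import os
    import multiprocessing

    # Detect resources on this machine (Linux sysconf values)
    total_mem_gb = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") // (1024 ** 3)
    cores = multiprocessing.cpu_count()

    worker_mem_gb = max(total_mem_gb - 4, 1)   # leave ~4 GB for the OS (tunable)

    # Hypothetical install path; point this at your actual conf directory
    with open("/opt/spark/conf/spark-env.sh", "w") as f:
        f.write("export SPARK_WORKER_MEMORY=%dg\n" % worker_mem_gb)
        f.write("export SPARK_WORKER_CORES=%d\n" % cores)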

Re: Spark (Streaming?) holding on to Mesos resources

2015-01-27 Thread Sam Bessalah
Hi Gerard, isn't this the same issue as this? https://issues.apache.org/jira/browse/MESOS-1688 On Mon, Jan 26, 2015 at 9:17 PM, Gerard Maas wrote: > Hi, > > We are observing with certain regularity that our Spark jobs, as a Mesos > framework, are hoarding resources and not releasing them, resulting

Spark response times for queries seem slow

2015-01-05 Thread Sam Flint
Wondering if there is a configuration that needs to be tweaked or if this is the expected response time. Machines are 30 GB RAM and 4 cores. Seems the CPUs are just getting pegged and that is what is taking so long. Any help on this would be amazing. Thanks, -- MAGNE+IC Sam Flint

org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Unresolved attributes: pyspark on yarn

2015-01-05 Thread Sam Flint
er.java:231) at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379) at py4j.Gateway.invoke(Gateway.java:259) at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133) at py4j.commands.CallCommand.execute(CallCommand.java:79) at py4j.GatewayConnection.run(GatewayConnection.java:
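
In the 1.x PySpark era this TreeNodeException usually meant the query referenced an attribute the registered schema doesn't have (wrong column name or case). A minimal sketch of the registration pattern to double-check, assuming a pyspark shell with sc available:

    from pyspark.sql import SQLContext, Row

    sqlContext = SQLContext(sc)
    people = sc.parallelize([Row(name="a", clicks=1), Row(name="b", clicks=3)])
    schemaPeople = sqlContext.inferSchema(people)
    schemaPeople.registerTempTable("people")

    # "Unresolved attributes" is raised when a selected column does not
    # exist in the schema -- compare the SELECT list against printSchema()
    schemaPeople.printSchema()
    print(sqlContext.sql("SELECT name, clicks FROM people").collect())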

Strange results of running Spark GenSort.scala

2014-12-28 Thread Sam Liu
ition), but NOT 100 GB. Why? Thanks! ---- Sam Liu

Spark Sql on Yarn using python

2014-12-16 Thread Sam Flint
I have tested my Python script by using the PySpark shell. I run into an error because of memory limits on the name node. I am wondering how I run the script on Spark on YARN. I am not familiar with this at all. Any help would be greatly appreciated. Thanks, -- MAGNE+IC Sam Flint
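
A minimal sketch of how such a script is usually packaged for YARN: create the contexts in the file itself, then hand it to spark-submit with a YARN master. The file name, paths, and memory settings below are illustrative:

    # my_query.py -- submit with, e.g.:
    #   spark-submit --master yarn-client --executor-memory 4g my_query.py
    from pyspark import SparkContext
    from pyspark.sql import SQLContext

    sc = SparkContext(appName="sql-on-yarn")
    sqlContext = SQLContext(sc)

    data = sqlContext.parquetFile("hdfs:///data/events.parquet")  # assumed input
    data.registerTempTable("events")
    print(sqlContext.sql("SELECT COUNT(*) FROM events").collect())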

Actor System Corrupted!

2014-12-10 Thread Stephen Samuel (Sam)
Hi all, Having a strange issue that I can't find any previous issues for on the mailing list or Stack Overflow. Frequently we are getting "ACTOR SYSTEM CORRUPTED!! A Dispatcher can't have less than 0 inhabitants!" with a stack trace, from Akka, in the executor logs, and the executor is marked as

Re: NEW to spark and sparksql

2014-11-20 Thread Sam Flint
contains all the data. > > On Wed, Nov 19, 2014 at 2:46 PM, Sam Flint wrote: > >> Michael, >> Thanks for your help. I found wholeTextFiles(), which I can use to >> import all the files in a directory. I believe this would be the case if all >> the files existed in the
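
For reference, the call mentioned above as a short sketch (path assumed for illustration): wholeTextFiles() returns one (path, contents) pair per file, so a whole directory loads in a single call:

    # Each element is (file path, entire file contents as a string)
    files = sc.wholeTextFiles("hdfs:///data/input/")
    first_path, first_body = files.first()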

NEW to spark and sparksql

2014-11-19 Thread Sam Flint
Thanks for your time, Sam

single worker vs multiple workers on each machine

2014-09-11 Thread Mike Sam
Hi There, I am new to Spark and I was wondering, when you have so much memory on each machine of the cluster, is it better to run multiple workers with limited memory on each machine, or is it better to run a single worker with access to the majority of the machine's memory? If the answer is "it depends

Why spark-submit command hangs?

2014-07-21 Thread Sam Liu
t org.apache.spark.SparkContext@610c610c Thanks! Sam Liu

RE: Spark 1.0 and Logistic Regression Python Example

2014-07-01 Thread Sam Jacobs
Thanks Xiangrui, your suggestion fixed the problem. I will see if I can upgrade the numpy/python for a permanent fix. My current versions of python and numpy are 2.6 and 4.1.9 respectively. Thanks, Sam -Original Message- From: Xiangrui Meng [mailto:men...@gmail.com] Sent: Tuesday

Spark 1.0 and Logistic Regression Python Example

2014-06-30 Thread Sam Jacobs
Hi, I modified the example code for logistic regression to compute the error in classification. Please see below. However the code is failing when it makes a call to: labelsAndPreds.filter(lambda (v, p): v != p).count() with the error message (something related to numpy or dot product): F
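
For context, this is roughly the shape of the Spark 1.0 MLlib example being modified (Python 2 tuple-unpacking lambdas and all); the data path is illustrative, and the failing call is the final filter/count over predictions:

    from pyspark.mllib.classification import LogisticRegressionWithSGD
    from pyspark.mllib.regression import LabeledPoint

    def parsePoint(line):
        values = [float(x) for x in line.split(" ")]
        return LabeledPoint(values[0], values[1:])

    data = sc.textFile("data/mllib/sample_svm_data.txt")
    parsedData = data.map(parsePoint)
    model = LogisticRegressionWithSGD.train(parsedData)

    # Compare each label against the model's prediction on its features
    labelsAndPreds = parsedData.map(lambda p: (p.label, model.predict(p.features)))
    trainErr = labelsAndPreds.filter(lambda (v, p): v != p).count() / float(parsedData.count())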

Re: ArrayIndexOutOfBoundsException when reading bzip2 files

2014-06-09 Thread sam
Any idea when they will release it? Also, I'm uncertain what we will need to do to fix the shell. Will we have to reinstall Spark, or reinstall Hadoop? (I'm not a devops person, so maybe this question sounds silly.)

Re: Trouble launching EC2 Cluster with Spark

2014-06-04 Thread Sam Taylor Steyer
Awesome, that worked. Thank you! - Original Message - From: "Krishna Sankar" To: user@spark.apache.org Sent: Wednesday, June 4, 2014 12:52:00 PM Subject: Re: Trouble launching EC2 Cluster with Spark chmod 600 /FinalKey.pem Cheers On Wed, Jun 4, 2014 at 12:49 PM, Sam Tay

Re: Trouble launching EC2 Cluster with Spark

2014-06-04 Thread Sam Taylor Steyer
Also, once my friend logged in to his cluster he received the error "Permissions 0644 for 'FinalKey.pem' are too open." This sounds like the other problem described. How do we make the permissions more private? Thanks very much, Sam - Original Message - From: "

Re: Trouble launching EC2 Cluster with Spark

2014-06-04 Thread Sam Taylor Steyer
protocols explicitly. 7ff92687-b95a-4a39-94cb-e2d00a6928fd This sounds like it could have to do with the access settings of the security group, but I don't know how to change them. Any advice would be much appreciated! Sam - Original Message - From: "Krishna Sankar" To: u

Trouble launching EC2 Cluster with Spark

2014-06-04 Thread Sam Taylor Steyer
Invalid value 'null' for protocol. VPC security groups must specify protocols explicitly. My project partner gets one step further and then gets the error: The key pair 'JamesAndSamTest' does not exist. Any thoughts as to how we could fix these problems? Thanks a lot! Sam

Re: spark on yarn fail with IOException

2014-06-04 Thread sam
I get a very similar stack trace and have no idea what could be causing it (see below). I've created an SO question: http://stackoverflow.com/questions/24038908/spark-fails-on-big-jobs-with-java-io-ioexception-filesystem-closed 14/06/02 20:44:04 INFO client.AppClient$ClientActor: Executor updated: app-2014

Apache Spark Throws java.lang.IllegalStateException: unread block data

2014-05-17 Thread sam
What we are doing is: 1. Installing Spark 0.9.1 according to the documentation on the website, along with the CDH4 (and another cluster with the CDH5) distros of Hadoop/HDFS. 2. Building a fat jar with a Spark app with sbt, then trying to run it on the cluster. I've also included code snippets, and sbt dep

Re: [ann] Spark-NYC Meetup

2014-04-21 Thread Sam Bessalah
Sounds great François. On 21 Apr 2014 22:31, "François Le Lay" wrote: > Hi everyone, > > This is a quick email to announce the creation of a Spark-NYC Meetup. > We have 2 upcoming events, one at PlaceIQ, another at Spotify where > Reynold Xin (Databricks) and Christopher Johnson (Spotify) have t

Re: Spark is slow

2014-04-21 Thread Sam Bessalah
Why don't you start by explaining what kind of operation you're running on Spark that's faster than Hadoop MapReduce. Maybe we could start there. And yes, this mailing list is very busy since many people are getting into Spark; it's hard to answer everyone. On 21 Apr 2014 20:23, "Joe L" wrote: > It is claimed

Re: worker keeps getting disassociated upon a failed job spark version 0.90

2014-03-22 Thread sam
I have this problem too. Eventually the job fails (on the UI) and hangs the terminal until I Ctrl+C. (Logs below.) Now, the Spark docs explain that the heartbeat configuration can be tweaked to handle GC pauses. I'm wondering if this is symptomatic of pushing the cluster a little too hard (we w
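
For anyone tuning the knobs mentioned: the 0.9/1.x configuration docs list Akka failure-detector settings that can be loosened so long GC pauses don't get a worker marked dead. A hedged sketch; the values are placeholders to adjust, not recommendations:

    from pyspark import SparkConf, SparkContext

    conf = (SparkConf()
            .set("spark.akka.heartbeat.interval", "100")
            .set("spark.akka.heartbeat.pauses", "600")
            .set("spark.akka.failure-detector.threshold", "300.0"))
    sc = SparkContext(conf=conf)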
