Re: Getting spark to use more than 4 cores on Amazon EC2

2014-10-22 Thread Andy Davidson
On a related note, how are you submitting your job? I have a simple streaming proof of concept and noticed that everything runs on my master. I wonder if I do not have enough load for spark to push tasks to the slaves. Thanks Andy From: Daniel Mahler Date: Monday, October 20, 2014 at 5:22 P

small bug in pyspark

2014-10-10 Thread Andy Davidson
Hi I am running spark on an ec2 cluster. I need to update python to 2.7. I have been following the directions on http://nbviewer.ipython.org/gist/JoshRosen/6856670 https://issues.apache.org/jira/browse/SPARK-922 I noticed that when I start a shell using pyspark, I correctly got python2.7, how e

Does Ipython notebook work with spark? trivial example does not work. Re: bug with IPython notebook?

2014-10-09 Thread Andy Davidson
I wonder if I am starting iPython notebook incorrectly. The example in my original email does not work. It looks like stdout is not configured correctly. If I submit it as a python.py file, it works correctly. Any idea what the problem is? Thanks Andy From: Andrew Davidson Date: Tuesday

bug with IPython notebook?

2014-10-07 Thread Andy Davidson
Hi I think I found a bug in the iPython notebook integration. I am not sure how to report it. I am running spark-1.1.0-bin-hadoop2.4 on an AWS ec2 cluster. I start the cluster using the launch script provided by spark. I start iPython notebook on my cluster master as follows and use an ssh tunnel

problem with user@spark.apache.org spam filter

2014-10-03 Thread Andy Davidson
Any idea why my email was returned with the following error message? Thanks Andy This is the mail system at host smtprelay06.hostedemail.com. I'm sorry to have to inform you that your message could not be delivered to one or more recipients. It's attached below. For further assistance, please

Re: can I think of JavaDStream<> foreachRDD() as being 'for each mini batch' ?

2014-10-01 Thread Andy Davidson
since the logging will happen on a potentially remote > receiver. I am not sure if this explains your observed behavior; it > depends on what you were logging. > > On Wed, Oct 1, 2014 at 6:51 PM, Andy Davidson > wrote: >> Hi >> >> I am new to Spark Streaming.

can I think of JavaDStream<> foreachRDD() as being 'for each mini batch' ?

2014-10-01 Thread Andy Davidson
Hi I am new to Spark Streaming. Can I think of JavaDStream<> foreachRDD() as being 'for each mini batch'? The java doc does not say much about this function. Here is the background. I am writing a little test program to figure out how to use streams. At some point I wanted to calculate an aggre

Re: how to get actual count from as long from JavaDStream ?

2014-10-01 Thread Andy Davidson
ch RDD. DStream.count() gives you > exactly that: a DStream of Longs which are the counts of events in > each mini batch. > > On Tue, Sep 30, 2014 at 8:42 PM, Andy Davidson > wrote: >> Hi >> >> I have a simple streaming app. All I want to do is figure out how many l

Re: how to get actual count from as long from JavaDStream ?

2014-09-30 Thread Andy Davidson
the casting should be changed for Java and probably the > function argument syntax is wrong too, but hopefully there's enough there to > help. > > Jon > > > On Tue, Sep 30, 2014 at 3:42 PM, Andy Davidson > wrote: >> Hi >> >> I have a simple st

how to get actual count from as long from JavaDStream ?

2014-09-30 Thread Andy Davidson
Hi I have a simple streaming app. All I want to do is figure out how many lines I have received in the current mini batch. If numLines was a JavaRDD I could simply call count(). How do you do something similar in Streaming? Here is my pseudocode JavaDStream msg = logs.filter(selectINFO); J

newbie system architecture problem, trouble using streaming and RDD.pipe()

2014-09-29 Thread Andy Davidson
Hello I am trying to build a system that does a very simple calculation on a stream and displays the results in a graph that I want to update every second or so. I think I have a fundamental misunderstanding about how streams and rdd.pipe() work. I want to do the data visualization part

Re: iPython notebook ec2 cluster matlabplot not found?

2014-09-29 Thread Andy Davidson
g so for all the nodes in > your cluster using pssh. If you install stuff just on the master without > somehow transferring it to the slaves, that will be problematic. > > Finally, there is an open pull request > <https://github.com/apache/spark/pull/2554> related to IPython th

Re: iPython notebook ec2 cluster matlabplot not found?

2014-09-29 Thread Andy Davidson
Finally, there is an open pull request > <https://github.com/apache/spark/pull/2554> related to IPython that may be > relevant, though I haven’t looked at it too closely. > > Nick > > On Sat, Sep 27, 2014 at 7:33 PM, Andy Davidson > wrote: >> Hi

iPython notebook ec2 cluster matlabplot not found?

2014-09-27 Thread Andy Davidson
Hi I am having a heck of a time trying to get python to work correctly on my cluster created using the spark-ec2 script. The following link was really helpful https://issues.apache.org/jira/browse/SPARK-922 I am still running into problems with matplotlib (it works fine on my mac). I can not fig

Re: problem with spark-ec2 launch script Re: spark-ec2 ERROR: Line magic function `%matplotlib` not found

2014-09-26 Thread Andy Davidson
luster launched by spark-ec2, there are some instructions in the comments > here <https://issues.apache.org/jira/browse/SPARK-922> for doing so. > > Nick > > On Fri, Sep 26, 2014 at 2:18 PM, Andy Davidson > wrote: >> Hi Davies >> >> The real is

problem with spark-ec2 launch script Re: spark-ec2 ERROR: Line magic function `%matplotlib` not found

2014-09-26 Thread Andy Davidson
on master but Python 2.6 in cluster, > you should upgrade python to 2.7 in cluster, or use python 2.6 in > master by setting PYSPARK_PYTHON=python2.6 > > On Thu, Sep 25, 2014 at 5:11 PM, Andy Davidson > wrote: >> Hi >> >> I am running into trouble using iPython

spark-ec2 ERROR: Line magic function `%matplotlib` not found

2014-09-25 Thread Andy Davidson
Hi I am running into trouble using iPython notebook on my cluster. I use the following command to set the cluster up $ ./spark-ec2 --key-pair=$KEY_PAIR --identity-file=$KEY_FILE --region=$REGION --slaves=$NUM_SLAVES launch $CLUSTER_NAME On master I launch python as follows $ IPYTHON_OPTS="noteboo

understanding rdd pipe() and bin/spark-submit --master

2014-09-20 Thread Andy Davidson
Hi I am new to spark and started writing some simple test code to figure out how things work. I am very interested in spark streaming and python. It appears that streaming is not supported in python yet. The workaround I found by googling is to write your streaming code in either Scala or Jav

Re: RDD pipe example. Is this a bug or a feature?

2014-09-19 Thread Andy Davidson
c/main/bin/RDDPipe.sh").collect().iterator(); iter.hasNext();) >System.out.println(iter.next()); > > > Hope that helps, > -Jey > > On Fri, Sep 19, 2014 at 11:21 AM, Andy Davidson > wrote: >> Hi >> >> I am wrote a little java job to try and figure out how RDD

RDD pipe example. Is this a bug or a feature?

2014-09-19 Thread Andy Davidson
Hi I wrote a little java job to try and figure out how RDD pipe works. Below is my test shell script. If I turn on debugging in the script, I get output in my console. If debugging is turned off in the shell script, I do not see anything in my console. Is this a bug or feature? I am running t

Re: spark-1.1.0-bin-hadoop2.4 java.lang.NoClassDefFoundError: org/codehaus/jackson/annotate/JsonClass

2014-09-18 Thread Andy Davidson
After lots of hacking I figured out how to resolve this problem. This is good solution. It severely cripples jackson but at least for now I am unblocked 1) turn off annotations. mapper.configure(Feature.USE_ANNOTATIONS, false); 2) in maven set the jackson dependencies as provided. 1.9

spark-1.1.0-bin-hadoop2.4 java.lang.NoClassDefFoundError: org/codehaus/jackson/annotate/JsonClass

2014-09-17 Thread Andy Davidson
Hi I am new to spark. I am trying to write a simple java program that processes tweets that were collected and stored in a file. I figured the simplest thing to do would be to convert the JSON string into a java map. When I submit my jar file I keep getting the following error java.lang.NoClassDef

how to report documentation bug?

2014-09-16 Thread Andy Davidson
http://spark.apache.org/docs/latest/quick-start.html#standalone-applications Click on the Java tab. There is a bug in the Maven section: 1.1.0-SNAPSHOT should be 1.1.0. Hope this helps Andy

Re: SparkSql newbie problems with nested selects

2014-07-13 Thread Andy Davidson
quency freq JOIN >(select term, docid, count from Frequency) freqTranspose >where freq.term = freqTranspose.term >group by freq.docid, freqTranspose.docid""") > > Michael > > > On Sun, Jul 13, 2014 at 12:43 PM, Andy Davidson > wrote: >

SparkSql newbie problems with nested selects

2014-07-13 Thread Andy Davidson
Hi I am running into trouble with a nested query using python. To try and debug it, I first wrote the query I want using sqlite3 select freq.docid, freqTranspose.docid, sum(freq.count * freqTranspose.count) from Frequency as freq, (select term, docid, count from Frequency) as freqT
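The sqlite3 debugging step described above can be reproduced with Python's standard library. What follows is only a sketch: the query is assembled from the fragments visible in this thread (the select list from the question, the JOIN / WHERE / GROUP BY from the reply), and the full original text is truncated in this archive. The column types and the three sample rows are assumptions made up for illustration.

```python
import sqlite3

# In-memory database; the schema is an assumption based on the column
# names that appear in the thread (docid, term, count).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Frequency (docid TEXT, term TEXT, count INTEGER)")

# Hypothetical sample data, not from the original message.
rows = [
    ("d1", "spark", 2),
    ("d1", "python", 1),
    ("d2", "spark", 3),
]
conn.executemany("INSERT INTO Frequency VALUES (?, ?, ?)", rows)

# Self-join of Frequency against a nested select of itself, matching on term
# and summing the products of the counts for each pair of documents.
query = """
SELECT freq.docid, freqTranspose.docid, SUM(freq.count * freqTranspose.count)
FROM Frequency AS freq
JOIN (SELECT term, docid, count FROM Frequency) AS freqTranspose
WHERE freq.term = freqTranspose.term
GROUP BY freq.docid, freqTranspose.docid
"""

# Map each (docid, docid) pair to its summed product.
similarity = {(a, b): total for a, b, total in conn.execute(query)}

for pair, total in sorted(similarity.items()):
    print(pair, total)
```

Checking the query against a tiny table like this makes it easier to tell whether a failure in Spark SQL comes from the query itself or from how the nested select is parsed.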
