On a related note, how are you submitting your job?
I have a simple streaming proof of concept and noticed that everything runs
on my master. I wonder if I do not have enough load for Spark to push tasks
to the slaves.
Thanks
Andy
From: Daniel Mahler
Date: Monday, October 20, 2014 at 5:22 P
Hi
I am running spark on an ec2 cluster. I need to update python to 2.7. I have
been following the directions on
http://nbviewer.ipython.org/gist/JoshRosen/6856670
https://issues.apache.org/jira/browse/SPARK-922
I noticed that when I start a shell using pyspark, I correctly got
python2.7, how e
I wonder if I am starting iPython notebook incorrectly. The example in my
original email does not work. It looks like stdout is not configured
correctly. If I submit it as a .py file, it works correctly.
Any idea what the problem is?
Thanks
Andy
From: Andrew Davidson
Date: Tuesday
Hi
I think I found a bug in the iPython notebook integration. I am not sure how
to report it
I am running spark-1.1.0-bin-hadoop2.4 on an AWS ec2 cluster. I start the
cluster using the launch script provided by spark
I start iPython notebook on my cluster master as follows and use an ssh
tunnel
Any idea why my email was returned with the following error message?
Thanks
Andy
This is the mail system at host smtprelay06.hostedemail.com.
I'm sorry to have to inform you that your message could not
be delivered to one or more recipients. It's attached below.
For further assistance, please
since the logging will happen on a potentially remote
> receiver. I am not sure if this explains your observed behavior; it
> depends on what you were logging.
>
> On Wed, Oct 1, 2014 at 6:51 PM, Andy Davidson
> wrote:
>> Hi
>>
>> I am new to Spark Streaming.
Hi
I am new to Spark Streaming. Can I think of JavaDStream<> foreachRDD() as
being 'for each mini batch'? The java doc does not say much about this
function.
Here is the background. I am writing a little test program to figure out how
to use streams. At some point I wanted to calculate an aggre
ch RDD. DStream.count() gives you
> exactly that: a DStream of Longs which are the counts of events in
> each mini batch.
>
> On Tue, Sep 30, 2014 at 8:42 PM, Andy Davidson
> wrote:
>> Hi
>>
>> I have a simple streaming app. All I want to do is figure out how many l
the casting should be changed for Java and probably the
> function argument syntax is wrong too, but hopefully there's enough there to
> help.
>
> Jon
>
>
> On Tue, Sep 30, 2014 at 3:42 PM, Andy Davidson
> wrote:
>> Hi
>>
>> I have a simple st
Hi
I have a simple streaming app. All I want to do is figure out how many lines
I have received in the current mini batch. If numLines was a JavaRDD I could
simply call count(). How do you do something similar in Streaming?
Here is my pseudo code
JavaDStream msg = logs.filter(selectINFO);
J
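The reply quoted earlier in this digest points at DStream.count(), which yields one count per mini batch. As a toy illustration only (made-up log lines, no Spark involved), the idea of "one count per mini batch" looks like this in plain Python:

```python
# Toy sketch: each inner list stands in for one mini batch of log lines.
batches = [
    ["INFO start", "WARN low disk", "INFO done"],
    ["INFO tick"],
    [],
]

# DStream.count() conceptually produces one Long per mini batch;
# here that is simply the length of each batch.
counts = [len(batch) for batch in batches]
print(counts)  # [3, 1, 0]
```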
Hello
I am trying to build a system that does a very simple calculation on a
stream and displays the results in a graph that I want to update the graph
every second or so. I think I have a fundamental misunderstanding about how
streams and rdd.pipe() work. I want to do the data visualization part
g so for all the nodes in
> your cluster using pssh. If you install stuff just on the master without
> somehow transferring it to the slaves, that will be problematic.
>
> Finally, there is an open pull request
> <https://github.com/apache/spark/pull/2554> related to IPython that may be
> relevant, though I haven’t looked at it too closely.
>
> Nick
>
>
>
> On Sat, Sep 27, 2014 at 7:33 PM, Andy Davidson
> wrote:
>> Hi
Hi
I am having a heck of a time trying to get python to work correctly on my
cluster created using the spark-ec2 script.
The following link was really helpful
https://issues.apache.org/jira/browse/SPARK-922
I am still running into problems with matplotlib. (It works fine on my Mac.)
I can not fig
luster launched by spark-ec2, there are some instructions in the comments
> here <https://issues.apache.org/jira/browse/SPARK-922> for doing so.
>
> Nick
>
>
>
> On Fri, Sep 26, 2014 at 2:18 PM, Andy Davidson
> wrote:
>> Hi Davies
>>
>> The real is
on master but Python 2.6 in cluster,
> you should upgrade python to 2.7 in cluster, or use python 2.6 in
> master by set PYSPARK_PYTHON=python2.6
>
> On Thu, Sep 25, 2014 at 5:11 PM, Andy Davidson
> wrote:
>> Hi
>>
>> I am running into trouble using iPython
Hi
I am running into trouble using iPython notebook on my cluster. Use the
following command to set the cluster up
$ ./spark-ec2 --key-pair=$KEY_PAIR --identity-file=$KEY_FILE
--region=$REGION --slaves=$NUM_SLAVES launch $CLUSTER_NAME
On master I launch python as follows
$ IPYTHON_OPTS="noteboo
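The advice quoted above is to keep driver and workers on the same interpreter via PYSPARK_PYTHON. A minimal sketch of that convention (pyspark launches whichever interpreter this variable names; `sys.executable` here just stands in for the cluster's python2.7 path):

```python
import os
import sys

# Sketch, assuming the PYSPARK_PYTHON convention from the thread:
# set the variable before launching pyspark so the driver and the
# workers resolve the same Python interpreter.
os.environ["PYSPARK_PYTHON"] = sys.executable  # e.g. the cluster's python2.7
print(os.environ["PYSPARK_PYTHON"])
```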
Hi
I am new to spark and started writing some simple test code to figure out
how things work. I am very interested in spark streaming and python. It
appears that streaming is not supported in python yet. The workaround I
found by googling is to write your streaming code in either Scala or Jav
c/main/bin/RDDPipe.sh").collect().iterator(); iter.hasNext();)
>System.out.println(iter.next());
>
>
> Hope that helps,
> -Jey
>
> On Fri, Sep 19, 2014 at 11:21 AM, Andy Davidson
> wrote:
>> Hi
>>
>> I wrote a little java job to try and figure out how RDD
Hi
I wrote a little java job to try and figure out how RDD pipe works.
Below is my test shell script. If I turn on debugging in the script, I see
output in my console. If debugging is turned off in the shell script, I do
not see anything in my console. Is this a bug or a feature?
I am running t
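RDD.pipe() feeds each partition's elements to an external command's stdin, one per line, and reads the command's stdout back as the new RDD. A stdlib-only sketch of that round trip (`cat` stands in for the thread's RDDPipe.sh script, which is an assumption about what the script does):

```python
import subprocess

# Minimal sketch of the RDD.pipe() contract: write elements to the
# command's stdin, one per line, and read transformed lines back.
# "cat" is a stand-in for an arbitrary line-oriented script.
lines = ["line 1", "line 2", "line 3"]
result = subprocess.run(
    ["cat"],
    input="\n".join(lines) + "\n",
    capture_output=True,
    text=True,
)
print(result.stdout.splitlines())  # ['line 1', 'line 2', 'line 3']
```

Note that only stdout is captured by the pipe; anything the script writes to stderr (for example shell `set -x` debug tracing) goes to the worker's log instead, which may explain the debug-on versus debug-off difference described above.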
After lots of hacking I figured out how to resolve this problem. It is not a
good solution; it severely cripples Jackson, but at least for now I am unblocked.
1) turn off annotations.
mapper.configure(Feature.USE_ANNOTATIONS, false);
2) in maven set the jackson dependencies as provided.
1.9
Hi I am new to spark.
I am trying to write a simple java program that processes tweets that were
collected and stored in a file. I figured the simplest thing to do would be
to convert the JSON string into a java map. When I submit my jar file I keep
getting the following error
java.lang.NoClassDef
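The Java thread above uses Jackson to turn a tweet's JSON into a Map. For comparison, the same parse needs no extra dependency in Python; this sketch uses the stdlib json module on a made-up tweet (the fields shown are illustrative, not the real Twitter schema):

```python
import json

# Sample tweet JSON (made-up fields). json.loads returns a dict,
# the Python analogue of the Java Map the thread is after.
tweet = '{"user": {"screen_name": "andy"}, "text": "hello spark", "retweet_count": 2}'
record = json.loads(tweet)
print(record["user"]["screen_name"], record["text"])  # andy hello spark
```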
http://spark.apache.org/docs/latest/quick-start.html#standalone-applications
Click on the java tab. There is a bug in the maven section:
1.1.0-SNAPSHOT
Should be
1.1.0
Hope this helps
Andy
quency freq JOIN
>(select term, docid, count from Frequency) freqTranspose
>where freq.term = freqTranspose.term
>group by freq.docid, freqTranspose.docid""")
>
> Michael
>
>
> On Sun, Jul 13, 2014 at 12:43 PM, Andy Davidson
> wrote:
>
Hi
I am running into trouble with a nested query using python. To try and debug
it, I first wrote the query I want using sqlite3
select freq.docid, freqTranspose.docid, sum(freq.count *
freqTranspose.count) from
Frequency as freq,
(select term, docid, count from Frequency) as freqT
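The nested query being debugged above runs end to end with the stdlib sqlite3 module. In this sketch the Frequency table and its rows are made-up sample data; each row is (docid, term, count), and the self-join computes the dot products of the documents' term-count vectors:

```python
import sqlite3

# Build a tiny in-memory Frequency table: (docid, term, count).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Frequency (docid TEXT, term TEXT, count INTEGER)")
conn.executemany("INSERT INTO Frequency VALUES (?, ?, ?)", [
    ("d1", "spark", 2), ("d1", "python", 1),
    ("d2", "spark", 3), ("d2", "java", 1),
])

# Join Frequency with itself on term: each summed product is the dot
# product of two documents' term-count vectors (a similarity matrix).
rows = conn.execute("""
    SELECT freq.docid, freqTranspose.docid,
           SUM(freq.count * freqTranspose.count)
    FROM Frequency AS freq
    JOIN (SELECT term, docid, count FROM Frequency) AS freqTranspose
      ON freq.term = freqTranspose.term
    GROUP BY freq.docid, freqTranspose.docid
    ORDER BY freq.docid, freqTranspose.docid
""").fetchall()
print(rows)  # [('d1', 'd1', 5), ('d1', 'd2', 6), ('d2', 'd1', 6), ('d2', 'd2', 10)]
```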