odd python.PythonRunner Times values?

2016-05-23 Thread Adrian Bridgett
I'm seeing output like this on our mesos spark slaves: 16/05/23 11:44:04 INFO python.PythonRunner: Times: total = 1137, boot = -590, init = 593, finish = 1134 16/05/23 11:44:04 INFO python.PythonRunner: Times: total = 1652, boot = -446, init = 481, finish = 1617 This seems to be coming from p
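The negative boot value can be reasoned about from the log line itself, since total = boot + init + finish. A hedged sketch (an assumption about the log format, not Spark's actual source): if the three phases are simple differences of four timestamps, then a reused Python worker whose boot timestamp predates the task's start time yields a negative boot while the identity still holds.

```python
# Hypothetical timestamps (ms) reproducing the first log line above.
# A reused worker booted *before* this task started, so boot goes negative.
start_ts, boot_ts, init_ts, finish_ts = 1000, 410, 1003, 2137

boot = boot_ts - start_ts       # -590
init = init_ts - boot_ts        # 593
finish = finish_ts - init_ts    # 1134
total = finish_ts - start_ts    # 1137

print(total == boot + init + finish)  # True
```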

coalesce serialising earlier work

2016-08-09 Thread Adrian Bridgett
to be calculated in parallel and then this is _then_ coalesced before being written. (It may be that the -getmerge approach will still be faster) df.coalesce(100).coalesce(1).write. doesn't look very likely to help! Adrian -- *Adrian Bridgett*
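Why `df.coalesce(100).coalesce(1)` is unlikely to help can be modelled with a toy (this is an assumption about Spark's behaviour, not its internals): chained shuffle-free coalesces are fused into one stage, so composing the two partition-to-bucket maps is equivalent to a single `coalesce(1)` — every upstream partition still executes in one task.

```python
# Toy model: a coalesce is just a map from parent partition -> output bucket.
def coalesce_map(n_parts, n_buckets):
    return [p * n_buckets // n_parts for p in range(n_parts)]

def fuse(first, second):
    # In a fused stage, partition p's task is second[first[p]].
    return [second[b] for b in first]

# coalesce(100) then coalesce(1) over 1000 input partitions:
fused = fuse(coalesce_map(1000, 100), coalesce_map(100, 1))
print(set(fused))  # {0}: all 1000 input partitions land in a single task
```

A `repartition()` inserts a shuffle (a stage boundary), which is what actually lets the upstream work run in parallel before the final write.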

2.0.1/2.1.x release dates

2016-08-18 Thread Adrian Bridgett
Just wondering if there were any rumoured release dates for either of the above. I'm seeing some odd hangs with 2.0.0 and mesos (and I know that the mesos integration has had a bit of updating in 2.1.x). Looking at JIRA, there's no suggested release date and issues seem to be added to a rele

Re: default parallelism and mesos executors

2015-12-15 Thread Adrian Bridgett
Thanks Iulian, I'll retest with 1.6.x once it's released (probably won't have enough spare time to test with the RC). On 11/12/2015 15:00, Iulian Dragoș wrote: On Wed, Dec 9, 2015 at 4:29 PM, Adrian Bridgett <adr...@opensignal.com> wrote: (resending, te

Executor deregistered after 2mins (mesos, 1.6.0-rc4)

2015-12-29 Thread Adrian Bridgett
e driver (will retry setting that on the shuffle service): spark.network.timeout 180s spark.shuffle.io.connectionTimeout 240s Adrian -- *Adrian Bridgett*

Re: Executor deregistered after 2mins (mesos, 1.6.0-rc4)

2015-12-30 Thread Adrian Bridgett
to be the core issue. On 29/12/2015 21:17, Ted Yu wrote: Have you searched log for 'f02cb67a-3519-4655-b23a-edc0dd082bf1-S1/4' ? In the snippet you posted, I don't see registration of this Executor. Cheers On Tue, Dec 29, 2015 at 12:43 PM, Adrian Bridgett mailto:adr...@op

Re: Executor deregistered after 2mins (mesos, 1.6.0-rc4)

2015-12-30 Thread Adrian Bridgett
l spark (which I thought the Driver used). Anyhow I'll do more testing and then raise a JIRA. Adrian -- *Adrian Bridgett* | Sysadmin Engineer, OpenSignal <http://www.opensignal.com> _ Office: First Floor, Scriptor Court, 155-157 F

Re: Executor deregistered after 2mins (mesos, 1.6.0-rc4)

2015-12-30 Thread Adrian Bridgett
To wrap this up, it's the shuffle manager sending the FIN so setting spark.shuffle.io.connectionTimeout to 3600s is the only workaround right now. SPARK-12583 raised. Adrian -- *Adrian Bridgett*
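The workaround described in the thread would land in `spark-defaults.conf` roughly as follows (the property names are real Spark settings; the values are the ones quoted above):

```
spark.network.timeout                 180s
spark.shuffle.io.connectionTimeout    3600s
```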

Re: Worker's BlockManager Folder not getting cleared

2016-01-26 Thread Adrian Bridgett
get rid of this and help on understanding this behaviour. Thanks !!! Abhi -- *Adrian Bridgett* | Sysadmin Engineer, OpenSignal <http://www.opensignal.com> _ Office: 3rd Floor, The Angel Office, 2 Angel Square, London, EC1V 1NY Pho

default parallelism and mesos executors

2015-12-02 Thread Adrian Bridgett
9650]) with ID 20151117-115458-164233482-5050-24333-S22/5 15/12/02 14:34:15 INFO spark.ExecutorAllocationManager: New executor 20151117-115458-164233482-5050-24333-S22/5 has registered (new total is 1) >>> print (sc.defaultParallel

default parallelism and mesos executors

2015-12-09 Thread Adrian Bridgett
kExecutor@ip-10-1-200-147.ec2.internal:41194/user/Executor#-1021429650]) with ID 20151117-115458-164233482-5050-24333-S22/5 15/12/02 14:34:15 INFO spark.ExecutorAllocationManager: New executor 20151117-115458-164233482-5050-24333-S22/5 has registered (ne

JNI issues with mesos

2015-09-09 Thread Adrian Bridgett
I'm trying to run spark (1.4.1) on top of mesos (0.23). I've followed the instructions (uploaded spark tarball to HDFS, set executor uri in both places etc) and yet on the slaves it's failing to launch even the SparkPi example with a JNI error. It does run with a local master. A day of debugg

Re: JNI issues with mesos

2015-09-09 Thread Adrian Bridgett
dfs (and the two configs) from spark15.tgz to spark-1.5.0-bin-os1.tgz... Success!!! The same trick with 1.4 doesn't work, but now that I have something that does I can make progress. Hopefully this helps someone else :-) Adrian On 09/09/2015 16:59, Adrian Bridgett wrote: I'm trying to
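For reference, pointing Mesos executors at a tarball on HDFS is done with the `spark.executor.uri` setting (the path below is illustrative, matching the renamed tarball from this thread):

```
spark.executor.uri    hdfs:///spark/spark-1.5.0-bin-os1.tgz
```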

Re: JNI issues with mesos

2015-09-09 Thread Adrian Bridgett
then it should work. Tim On Wed, Sep 9, 2015 at 8:18 AM, Adrian Bridgett <adr...@opensignal.com> wrote: 5mins later... Trying 1.5 with a fairly plain build: ./make-distribution.sh --tgz --name os1 -Phadoop-2.6 and on my first attempt stderr showed: I0909 15

hdfs-ha on mesos - odd bug

2015-09-14 Thread Adrian Bridgett
:18 DEBUG Client: IPC Client (2055067800) connection to mesos-1.example.com/10.1.200.165:8020 from ubuntu sending #0 15/09/14 13:47:18 DEBUG Client: IPC Client (2055067800) connection to mesos-1.example.com/10.1.200.165:8020 from ubuntu got value #0 15/09/14 13:47:18 DEBUG ProtobufRpcEngi

Re: hdfs-ha on mesos - odd bug

2015-09-15 Thread Adrian Bridgett
hen you should resolve the namenode via hdfs:/// On Mon, Sep 14, 2015 at 3:55 PM, Adrian Bridgett <adr...@opensignal.com> wrote: I'm hitting an odd issue with running spark on mesos together with HA-HDFS, with an even odder workaround. In particu

Re: hdfs-ha on mesos - odd bug

2015-09-15 Thread Adrian Bridgett
spark side (or my spark config). http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithNFS.html#Configuration_details On 15/09/2015 10:24, Steve Loughran wrote: On 15 Sep 2015, at 08:55, Adrian Bridgett wrote: Hi Sam, in short, no, it's a traditiona
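For context, the HDFS HA client configuration referenced above lives in `hdfs-site.xml`; a minimal sketch (the nameservice name and hostnames are placeholders, though the property names are the standard Hadoop ones):

```xml
<!-- Illustrative HDFS HA client settings; "nameservice1" and the hosts
     are placeholders. Clients then use hdfs://nameservice1/... paths. -->
<property><name>dfs.nameservices</name><value>nameservice1</value></property>
<property><name>dfs.ha.namenodes.nameservice1</name><value>nn1,nn2</value></property>
<property><name>dfs.namenode.rpc-address.nameservice1.nn1</name><value>mesos-1.example.com:8020</value></property>
<property><name>dfs.namenode.rpc-address.nameservice1.nn2</name><value>mesos-2.example.com:8020</value></property>
<property>
  <name>dfs.client.failover.proxy.provider.nameservice1</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
```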

very high maxresults setting (no collect())

2016-09-19 Thread Adrian Bridgett
Hi, We've recently started seeing a huge increase in spark.driver.maxResultSize - we are starting to set it at 3GB (and increase our driver memory a lot to 12GB or so). This is on v1.6.1 with Mesos scheduler. All the docs I can see say that this is to do with .collect() being called on a la
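The settings described translate to a `spark-defaults.conf` fragment like this (real Spark property names; values as quoted in the thread):

```
spark.driver.maxResultSize    3g
spark.driver.memory           12g
```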

Re: very high maxresults setting (no collect())

2016-09-22 Thread Adrian Bridgett
Hi Michael, No spark upgrade, we've been changing some of our data pipelines so the data volumes have probably been getting a bit larger. Just in the last few weeks we've seen quite a few jobs needing a larger maxResultSize. Some jobs have gone from "fine with 1GB default" to 3GB. Wondering

Re: Issue with rogue data in csv file used in Spark application

2016-09-27 Thread Adrian Bridgett
We use the spark-csv (a successor of which is built into spark 2.0) for this. It doesn't cause crashes, failed parsing is logged. We run on Mesos so I have to pull back all the logs from all the executors and search for failed lines (so that we can ensure that the failure rate isn't too hig
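The behaviour described (drop and count malformed rows rather than crash) can be illustrated with a pure-Python stand-in - a sketch of the idea only, not spark-csv's implementation:

```python
import csv
import io

# Rows with the wrong column count are dropped and counted, so the job
# survives rogue data and the failure rate can be monitored afterwards.
raw = 'a,1\nb,2\nbroken line with,too,many,fields\nc,3\n'
good, bad = [], 0
for row in csv.reader(io.StringIO(raw)):
    if len(row) == 2:
        good.append(row)
    else:
        bad += 1

print(len(good), bad)  # 3 1
```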

mesos in spark 2.0.1 - must call stop() otherwise app hangs

2016-10-05 Thread Adrian Bridgett
I saw this originally with 2.0.0 but as 2.0.1 is freshly out I thought I'd retry. With pyspark apps, it seems that in 2.x you must call .stop() on the spark context as otherwise the application doesn't stop (see log below - first gap is when it finishes, second gap is when I hit ^C). Thought

Re: mesos in spark 2.0.1 - must call stop() otherwise app hangs

2016-10-05 Thread Adrian Bridgett
Fab thanks all - I'll ensure we fix our code :-) On 05/10/2016 18:10, Sean Owen wrote: Being discussed as we speak at https://issues.apache.org/jira/browse/SPARK-17707 Calling stop() is definitely the right thing to do and always has been (see examples), but, may be possible to get rid of th

Re: mesos in spark 2.0.1 - must call stop() otherwise app hangs

2016-10-06 Thread Adrian Bridgett
Just one question - what about errors? Should we be wrapping our entire code in a ...finally spark.stop() clause (as per http://spark.apache.org/docs/latest/programming-guide.html#unit-testing)? BTW the .stop() requirement was news to quite a few people here, maybe it'd be a good idea to shou
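The try/finally pattern being asked about looks like this - runnable here with a stub session so it doesn't need a Spark install (a real `SparkSession` would take `FakeSession`'s place):

```python
# FakeSession is a hypothetical stand-in for SparkSession, used so the
# pattern itself can be executed anywhere.
class FakeSession:
    def __init__(self):
        self.stopped = False

    def stop(self):
        self.stopped = True

spark = FakeSession()
try:
    raise RuntimeError("job failed")   # simulate an error mid-job
except RuntimeError:
    pass                               # real code would log or re-raise
finally:
    spark.stop()                       # runs on success *and* on failure

print(spark.stopped)  # True
```

The finally clause guarantees `.stop()` runs even when the job raises, which is exactly the case the question is about.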

Re: mesos in spark 2.0.1 - must call stop() otherwise app hangs

2016-10-06 Thread Adrian Bridgett
to get rid of this non-daemon thread if possible On Thu, Oct 6, 2016 at 9:02 AM Adrian Bridgett <adr...@opensignal.com> wrote: Just one question - what about errors? Should we be wrapping our entire code in a ...finally spark.stop() clause (as per http://spark.a

coalesce ending up very unbalanced - but why?

2016-12-14 Thread Adrian Bridgett
I realise that coalesce() isn't guaranteed to be balanced and adding a repartition() does indeed fix this (at the cost of a large shuffle). I'm trying to understand _why_ it's so uneven (hopefully it helps someone else too). This is using spark v2.0.2 (pyspark). Essentially we're just readin
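One plausible explanation (an assumption about the mechanism, not confirmed Spark internals): the shuffle-free coalesce groups *consecutive* parent partitions, so any skew in the parent partition sizes carries straight through to the coalesced output. A pure-Python sketch:

```python
# Group consecutive parent partitions into n output buckets, summing sizes.
def coalesce_sizes(sizes, n):
    out = [0] * n
    for i, s in enumerate(sizes):
        out[i * n // len(sizes)] += s
    return out

# Skewed input: the second half of the parent partitions is heavy.
parent = [100] * 8 + [5000] * 8
print(coalesce_sizes(parent, 4))  # [400, 400, 20000, 20000]
```

A repartition() hashes rows across all output partitions instead of grouping neighbours, which is why it rebalances - at the cost of the shuffle.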

Re: coalesce ending up very unbalanced - but why?

2016-12-14 Thread Adrian Bridgett
ny clue :-) Adrian On 14/12/2016 13:58, Dirceu Semighini Filho wrote: Hi Adrian, Which kind of partitioning are you using? Have you already tried to coalesce it to a prime number? 2016-12-14 11:56 GMT-02:00 Adrian Bridgett <adr...@opensignal.com>: I realise that co