I'm seeing output like this on our Mesos Spark slaves:
16/05/23 11:44:04 INFO python.PythonRunner: Times: total = 1137, boot = -590, init = 593, finish = 1134
16/05/23 11:44:04 INFO python.PythonRunner: Times: total = 1652, boot = -446, init = 481, finish = 1617
This seems to be coming from p
to be calculated in parallel and this is _then_ coalesced
before being written. (It may be that the -getmerge approach will still
be faster.)
df.coalesce(100).coalesce(1).write. doesn't look very likely to help!
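For comparison, a rough sketch of the options as I understand them (a
sketch only, assuming Spark 2.x's built-in csv writer; paths and the
input dataframe are hypothetical stand-ins):

    from pyspark.sql import SparkSession
    spark = SparkSession.builder.getOrCreate()
    df = spark.range(1000000)  # stand-in for the real dataframe

    # coalesce(1) can pull the whole upstream plan into a single task:
    df.coalesce(1).write.csv("hdfs:///tmp/single")

    # repartition(1) forces a shuffle, so upstream stages stay parallel
    # and only the final write runs as one task:
    df.repartition(1).write.csv("hdfs:///tmp/single2")

    # or write in parallel and merge outside Spark:
    #   df.write.csv("hdfs:///tmp/parts")
    #   hdfs dfs -getmerge /tmp/parts output.csv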
Adrian
--
*Adrian Bridgett*
Just wondering if there were any rumoured release dates for either of
the above. I'm seeing some odd hangs with 2.0.0 and Mesos (and I know
that the Mesos integration has had a bit of updating in 2.1.x).
Looking at JIRA, there's no suggested release date and issues seem to be
added to a release
Thanks Iulian, I'll retest with 1.6.x once it's released (probably won't
have enough spare time to test with the RC).
On 11/12/2015 15:00, Iulian Dragoș wrote:
On Wed, Dec 9, 2015 at 4:29 PM, Adrian Bridgett <adr...@opensignal.com> wrote:
(resending, te
the driver (will retry setting that on the shuffle service):
spark.network.timeout 180s
spark.shuffle.io.connectionTimeout 240s
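For reference, a minimal sketch of setting these programmatically when
building the context (same values as above):

    from pyspark import SparkConf, SparkContext
    conf = (SparkConf()
            .set("spark.network.timeout", "180s")
            .set("spark.shuffle.io.connectionTimeout", "240s"))
    sc = SparkContext(conf=conf)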
Adrian
--
*Adrian Bridgett*
to be the core issue.
On 29/12/2015 21:17, Ted Yu wrote:
Have you searched the log for 'f02cb67a-3519-4655-b23a-edc0dd082bf1-S1/4'?
In the snippet you posted, I don't see registration of this Executor.
Cheers
On Tue, Dec 29, 2015 at 12:43 PM, Adrian Bridgett
<adr...@op
l spark (which I thought the Driver used).
Anyhow I'll do more testing and then raise a JIRA.
Adrian
--
*Adrian Bridgett* | Sysadmin Engineer, OpenSignal
<http://www.opensignal.com>
Office: First Floor, Scriptor Court, 155-157 F
To wrap this up, it's the shuffle manager sending the FIN so setting
spark.shuffle.io.connectionTimeout to 3600s is the only workaround right
now. SPARK-12583 raised.
Adrian
--
*Adrian Bridgett*
get rid of this and help in understanding this
behaviour.
Thanks !!!
Abhi
--
*Adrian Bridgett* | Sysadmin Engineer, OpenSignal
<http://www.opensignal.com>
Office: 3rd Floor, The Angel Office, 2 Angel Square, London, EC1V 1NY
Pho
kExecutor@ip-10-1-200-147.ec2.internal:41194/user/Executor#-1021429650]) with ID 20151117-115458-164233482-5050-24333-S22/5
15/12/02 14:34:15 INFO spark.ExecutorAllocationManager: New executor 20151117-115458-164233482-5050-24333-S22/5 has registered (new total is 1)
>>> print(sc.defaultParallelism)
I'm trying to run Spark (1.4.1) on top of Mesos (0.23). I've followed
the instructions (uploaded the Spark tarball to HDFS, set the executor
URI in both places etc) and yet on the slaves it's failing to launch
even the SparkPi example with a JNI error. It does run with a local
master. A day of debugging
dfs (and the two configs) from
spark15.tgz to spark-1.5.0-bin-os1.tgz...
Success!!!
The same trick with 1.4 doesn't work, but now that I have something that
does I can make progress.
Hopefully this helps someone else :-)
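A rough sketch of the settings involved, for anyone following along
(the master URL and HDFS path here are hypothetical; the tarball rename
above is what actually fixed it for me):

    from pyspark import SparkConf, SparkContext
    conf = (SparkConf()
            .setMaster("mesos://zk://mesos-1.example.com:2181/mesos")
            # the slaves fetch and unpack this tarball themselves:
            .set("spark.executor.uri",
                 "hdfs:///apps/spark/spark-1.5.0-bin-os1.tgz"))
    sc = SparkContext(conf=conf)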
Adrian
On 09/09/2015 16:59, Adrian Bridgett wrote:
I'm trying to
then it should work.
Tim
On Wed, Sep 9, 2015 at 8:18 AM, Adrian Bridgett <adr...@opensignal.com> wrote:
5mins later...
Trying 1.5 with a fairly plain build:
./make-distribution.sh --tgz --name os1 -Phadoop-2.6
and on my first attempt stderr showed:
I0909 15
:18 DEBUG Client: IPC Client (2055067800) connection to mesos-1.example.com/10.1.200.165:8020 from ubuntu sending #0
15/09/14 13:47:18 DEBUG Client: IPC Client (2055067800) connection to mesos-1.example.com/10.1.200.165:8020 from ubuntu got value #0
15/09/14 13:47:18 DEBUG ProtobufRpcEngi
then you should resolve the namenode via hdfs:///
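For example, a minimal pyspark sketch of what that looks like (the path
is hypothetical): with no hostname in the URI, the HDFS client picks
the active namenode from the nameservice configured in hdfs-site.xml.

    from pyspark import SparkContext
    sc = SparkContext()
    rdd = sc.textFile("hdfs:///user/ubuntu/input.txt")
    print(rdd.count())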
On Mon, Sep 14, 2015 at 3:55 PM, Adrian Bridgett
<adr...@opensignal.com> wrote:
I'm hitting an odd issue running Spark on Mesos together with
HA-HDFS, with an even odder workaround.
In particu
spark side (or my spark config).
http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithNFS.html#Configuration_details
On 15/09/2015 10:24, Steve Loughran wrote:
On 15 Sep 2015, at 08:55, Adrian Bridgett wrote:
Hi Sam, in short, no, it's a traditiona
Hi,
We've recently started seeing a huge increase in
spark.driver.maxResultSize - we are now setting it to 3GB (and
increasing our driver memory a lot, to 12GB or so). This is on v1.6.1
with the Mesos scheduler.
All the docs I can see say that this is to do with .collect() being
called on a la
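For context, a minimal sketch of the setting (3GB as above;
spark.driver.maxResultSize caps the total serialized task results the
driver will accept, which is exactly what .collect() pulls back):

    from pyspark import SparkConf, SparkContext
    conf = SparkConf().set("spark.driver.maxResultSize", "3g")
    # driver memory itself has to be set at launch time,
    # e.g. spark-submit --driver-memory 12g
    sc = SparkContext(conf=conf)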
Hi Michael,
No Spark upgrade; we've been changing some of our data pipelines, so
the data volumes have probably been getting a bit larger. Just in the
last few weeks we've seen quite a few jobs needing a larger maxResultSize.
Some jobs have gone from "fine with 1GB default" to 3GB. Wondering
We use spark-csv (a successor of which is built into Spark 2.0) for
this. It doesn't cause crashes; failed parsing is logged. We run on
Mesos so I have to pull back all the logs from all the executors and
search for failed lines (so that we can ensure that the failure rate
isn't too high).
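A sketch of what the read looks like with the 2.x built-in reader
(hypothetical path; spark-csv exposes the same "mode" option):
PERMISSIVE keeps malformed rows with bad fields nulled, DROPMALFORMED
drops them, FAILFAST raises.

    from pyspark.sql import SparkSession
    spark = SparkSession.builder.getOrCreate()
    df = (spark.read
          .option("header", "true")
          .option("mode", "DROPMALFORMED")  # drop lines that fail parsing
          .csv("hdfs:///data/input"))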
I saw this originally with 2.0.0 but as 2.0.1 is freshly out I thought
I'd retry.
With pyspark apps, it seems that in 2.x you must call .stop() on the
SparkContext as otherwise the application doesn't stop (see log below -
the first gap is when it finishes, the second gap is when I hit ^C).
Thought
Fab thanks all - I'll ensure we fix our code :-)
On 05/10/2016 18:10, Sean Owen wrote:
Being discussed as we speak at
https://issues.apache.org/jira/browse/SPARK-17707
Calling stop() is definitely the right thing to do and always has been
(see examples), but it may be possible to get rid of th
Just one question - what about errors? Should we be wrapping our entire
code in a try ... finally spark.stop() clause (as per
http://spark.apache.org/docs/latest/programming-guide.html#unit-testing)?
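For reference, a minimal sketch of that pattern (a plain pyspark
script; the job body is a stand-in):

    from pyspark.sql import SparkSession
    spark = SparkSession.builder.appName("example").getOrCreate()
    try:
        spark.range(100).count()  # the real job goes here
    finally:
        spark.stop()  # runs even if the job raises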
BTW the .stop() requirement was news to quite a few people here, maybe
it'd be a good idea to shou
to get rid of this non-daemon thread if possible
On Thu, Oct 6, 2016 at 9:02 AM Adrian Bridgett <adr...@opensignal.com> wrote:
Just one question - what about errors? Should we be wrapping our entire
code in a try ... finally spark.stop() clause (as per
http://spark.a
I realise that coalesce() isn't guaranteed to be balanced and adding a
repartition() does indeed fix this (at the cost of a large shuffle).
I'm trying to understand _why_ it's so uneven (hopefully it helps
someone else too). This is using Spark v2.0.2 (pyspark).
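A quick sketch of the difference as I understand it: coalesce() merges
existing partitions without a shuffle, so the merged sizes inherit any
skew in the input, while repartition() hash-shuffles rows into roughly
equal partitions. (Synthetic input below, so both will look even here;
the point is the mechanism.)

    from pyspark.sql import SparkSession
    spark = SparkSession.builder.getOrCreate()
    df = spark.range(0, 1000000, 1, 64)  # 64 input partitions

    def sizes(d):
        return d.rdd.glom().map(len).collect()  # rows per partition

    # with real (skewed) input, coalesce() inherits the skew,
    # repartition() evens it out:
    print(sizes(df.coalesce(8)))
    print(sizes(df.repartition(8)))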
Essentially we're just readin
any clue :-)
Adrian
On 14/12/2016 13:58, Dirceu Semighini Filho wrote:
Hi Adrian,
Which kind of partitioning are you using?
Have you already tried to coalesce it to a prime number?
2016-12-14 11:56 GMT-02:00 Adrian Bridgett <adr...@opensignal.com>:
I realise that co