Re: TCP/IP speedup

2015-08-02 Thread Steve Loughran

On 1 Aug 2015, at 18:26, Ruslan Dautkhanov <dautkha...@gmail.com> wrote:

If your network is bandwidth-bound, you may see that enabling jumbo frames (MTU 9000)
increases bandwidth by up to ~20%.

http://docs.hortonworks.com/HDP2Alpha/index.htm#Hardware_Recommendations_for_Hadoop.htm
"Enabling Jumbo Frames across the cluster improves bandwidth"

+1

You also get better checksumming of packets, so the (very small but non-zero) risk 
of corrupted network packets slipping through drops a bit further.
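For what it's worth, a quick way to check whether jumbo frames are actually in 
effect on each worker is to read the interface MTU out of sysfs. A rough Python 
sketch, assuming a Linux host and that "eth0" is the data-path NIC (adjust for 
your naming); the switches along the path have to support MTU 9000 too, or you 
get fragmentation or black-holed frames:

#!/usr/bin/env python
# Rough sketch: check whether an interface is running with jumbo frames.
# Assumes a Linux host; "eth0" is an assumption -- use your data-path NIC.

IFACE = "eth0"
JUMBO_MTU = 9000

def read_mtu(iface):
    """Read the current MTU for an interface from sysfs."""
    with open("/sys/class/net/%s/mtu" % iface) as f:
        return int(f.read().strip())

if __name__ == "__main__":
    mtu = read_mtu(IFACE)
    if mtu >= JUMBO_MTU:
        print("%s: MTU %d -- jumbo frames enabled" % (IFACE, mtu))
    else:
        print("%s: MTU %d -- standard frames, jumbo frames not in effect" % (IFACE, mtu))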


If the Spark workload is not network bandwidth-bound, I expect it'll be a few 
percent of improvement, or none at all.



Put differently: it shouldn't hurt. The shuffle phase is the most network-heavy 
part; because it can span the entire cluster, the backbone's "bisection bandwidth" 
can become the bottleneck, which also means that jobs can interfere with each 
other.

Scheduling work close to the HDFS data means that HDFS reads should often be 
local (the TCP stack gets bypassed entirely), or at least rack-local (sharing 
the switch, not the backbone).
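(How long the scheduler holds a task waiting for a local slot before giving up 
and shipping the data instead is tunable. A minimal PySpark sketch, just to show 
the knobs -- the values are illustrative, not recommendations:)

from pyspark import SparkConf

# Sketch: locality wait settings -- how long the scheduler waits for a
# node-local (then rack-local) slot before degrading locality.
# Values are illustrative only.
conf = (SparkConf()
        .set("spark.locality.wait", "3s")        # base wait before degrading locality
        .set("spark.locality.wait.node", "3s")   # node-local specific override
        .set("spark.locality.wait.rack", "3s"))  # rack-local specific override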


But there are other things in play, as the talk covers:


-stragglers: often a sign of impending HDD failure, as failed reads get retried. 
The classic Hadoop MR engine detects these, can spin up alternate mappers (if you 
enable speculation), and will blacklist the node for further work. Sometimes, 
though, the straggling is just unbalanced data -some bits of work may be 
computationally a lot harder, slowing things down.
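Spark has the equivalent knobs; a minimal PySpark sketch of switching speculation 
on (the config names are the standard spark.speculation.* ones, the threshold 
values below are just illustrative):

from pyspark import SparkConf, SparkContext

# Sketch: enable speculative execution so straggling tasks get relaunched
# on other executors. Multiplier/quantile values are illustrative.
conf = (SparkConf()
        .setAppName("speculation-sketch")
        .set("spark.speculation", "true")
        .set("spark.speculation.multiplier", "1.5")  # "slow" = 1.5x the median task time
        .set("spark.speculation.quantile", "0.75"))  # only after 75% of tasks finish

sc = SparkContext(conf=conf)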

-contention for work on the nodes. In YARN you request how many "virtual cores" 
you want (ops get to define the mapping of virtual to physical cores), with each 
node having a finite set of cores.

but ...
  -Unless CPU throttling is turned on, competing processes can take up more CPU 
than they asked for.
  -that virtual:physical core mapping may be off, so the cores a container asks 
for don't match what the hardware can actually deliver (a sketch of the request, 
and a quick sanity check, follows below).
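As promised, a rough sketch of the core-counting side: what a Spark-on-YARN job 
asks for, plus a quick check of whether a NodeManager's advertised vcores bear 
any relation to the physical cores underneath. The yarn-site.xml path is an 
assumption -- adjust for your distro's layout:

from pyspark import SparkConf
import multiprocessing
import xml.etree.ElementTree as ET

# What the job asks YARN for -- numbers are illustrative.
conf = (SparkConf()
        .set("spark.executor.cores", "4")        # vcores per executor container
        .set("spark.executor.instances", "10"))

# Quick check on a worker: advertised vcores vs physical cores.
YARN_SITE = "/etc/hadoop/conf/yarn-site.xml"     # assumption: adjust per distro

def advertised_vcores(path=YARN_SITE):
    """Return yarn.nodemanager.resource.cpu-vcores from yarn-site.xml, or None."""
    root = ET.parse(path).getroot()
    for prop in root.findall("property"):
        if prop.findtext("name") == "yarn.nodemanager.resource.cpu-vcores":
            return int(prop.findtext("value"))
    return None

if __name__ == "__main__":
    physical = multiprocessing.cpu_count()
    vcores = advertised_vcores()
    print("physical cores: %d, advertised vcores: %s" % (physical, vcores))
    if vcores and vcores > physical:
        print("vcores over-commit the hardware; expect CPU contention under load")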

There's also disk IOP contention; two jobs trying to get at the same spindle, 
even though there are lots of disks on the server. There's not much you can do 
about that (today).

A key takeaway from that talk, which applies to all performance-tuning talks, is: 
get data from your real workloads. There's some good HTrace instrumentation in 
HDFS these days; I haven't looked at Spark's instrumentation to see how they hook 
up. You can also expect to have some network monitoring (sFlow, ...) which you 
could use to see if the backbone is overloaded. Don't forget the Linux tooling 
either, iotop &c. There's lots of room to play here -once you've got the data 
you can see where to focus, then decide how much time to spend trying to tune 
it.
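If you don't have sFlow or similar wired up, even a crude sample of the NIC byte 
counters during a shuffle tells you whether you're anywhere near saturating the 
link. A rough Python sketch, assuming a Linux worker and "eth0" as the data-path 
interface:

import time

IFACE = "eth0"    # assumption: use your data-path NIC
INTERVAL = 5.0    # seconds between samples

def read_bytes(iface, direction):
    """Cumulative rx/tx byte counter for an interface, from sysfs."""
    path = "/sys/class/net/%s/statistics/%s_bytes" % (iface, direction)
    with open(path) as f:
        return int(f.read().strip())

if __name__ == "__main__":
    rx0, tx0 = read_bytes(IFACE, "rx"), read_bytes(IFACE, "tx")
    time.sleep(INTERVAL)
    rx1, tx1 = read_bytes(IFACE, "rx"), read_bytes(IFACE, "tx")
    print("%s: rx %.1f Mbit/s, tx %.1f Mbit/s" % (
        IFACE,
        (rx1 - rx0) * 8 / INTERVAL / 1e6,
        (tx1 - tx0) * 8 / INTERVAL / 1e6))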

-steve


--
Ruslan Dautkhanov

On Sat, Aug 1, 2015 at 6:08 PM, Simon Edelhaus <edel...@gmail.com> wrote:
H

2% huh.


-- ttfn
Simon Edelhaus
California 2015

On Sat, Aug 1, 2015 at 3:45 PM, Mark Hamstra <m...@clearstorydata.com> wrote:
https://spark-summit.org/2015/events/making-sense-of-spark-performance/

On Sat, Aug 1, 2015 at 3:24 PM, Simon Edelhaus <edel...@gmail.com> wrote:
Hi All!

How important would a significant performance improvement to TCP/IP itself be, 
in terms of overall job performance? Which part would be most significantly 
accelerated?
Would it be HDFS?

-- ttfn
Simon Edelhaus
California 2015






Re: TCP/IP speedup

2015-08-02 Thread Michael Segel
This may seem like a silly question… but following Mark’s link, the 
presentation talks about the TPC-DS benchmark. 

Here’s my question… what benchmark results? 

If you go over to the TPC.org website, they have no TPC-DS benchmark results 
listed (either audited or unaudited). 

So what gives? 

Note: There are TPCx-HS benchmarks listed… 

Thx

-Mike

> On Aug 1, 2015, at 5:45 PM, Mark Hamstra  wrote:
> 
> https://spark-summit.org/2015/events/making-sense-of-spark-performance/ 
> 
> 
> On Sat, Aug 1, 2015 at 3:24 PM, Simon Edelhaus wrote:
> Hi All!
> 
> How important would a significant performance improvement to TCP/IP 
> itself be, in terms of overall job performance? Which part would be most 
> significantly accelerated? 
> Would it be HDFS?
> 
> -- ttfn
> Simon Edelhaus
> California 2015
> 




Re: TCP/IP speedup

2015-08-01 Thread Ruslan Dautkhanov
If your network is bandwidth-bound, you may see that enabling jumbo frames
(MTU 9000) increases bandwidth by up to ~20%.

http://docs.hortonworks.com/HDP2Alpha/index.htm#Hardware_Recommendations_for_Hadoop.htm
"Enabling Jumbo Frames across the cluster improves bandwidth"

If the Spark workload is not network bandwidth-bound, I expect it'll be a few
percent of improvement, or none at all.



-- 
Ruslan Dautkhanov

On Sat, Aug 1, 2015 at 6:08 PM, Simon Edelhaus  wrote:

> H
>
> 2% huh.
>
>
> -- ttfn
> Simon Edelhaus
> California 2015
>
> On Sat, Aug 1, 2015 at 3:45 PM, Mark Hamstra 
> wrote:
>
>> https://spark-summit.org/2015/events/making-sense-of-spark-performance/
>>
>> On Sat, Aug 1, 2015 at 3:24 PM, Simon Edelhaus  wrote:
>>
>>> Hi All!
>>>
>>> How important would a significant performance improvement to TCP/IP
>>> itself be, in terms of overall job performance?
>>> Which part would be most significantly accelerated?
>>> Would it be HDFS?
>>>
>>> -- ttfn
>>> Simon Edelhaus
>>> California 2015
>>>
>>
>>
>


Re: TCP/IP speedup

2015-08-01 Thread Simon Edelhaus
H

2% huh.


-- ttfn
Simon Edelhaus
California 2015

On Sat, Aug 1, 2015 at 3:45 PM, Mark Hamstra 
wrote:

> https://spark-summit.org/2015/events/making-sense-of-spark-performance/
>
> On Sat, Aug 1, 2015 at 3:24 PM, Simon Edelhaus  wrote:
>
>> Hi All!
>>
>> How important would a significant performance improvement to TCP/IP
>> itself be, in terms of overall job performance?
>> Which part would be most significantly accelerated?
>> Would it be HDFS?
>>
>> -- ttfn
>> Simon Edelhaus
>> California 2015
>>
>
>


Re: TCP/IP speedup

2015-08-01 Thread Mark Hamstra
https://spark-summit.org/2015/events/making-sense-of-spark-performance/

On Sat, Aug 1, 2015 at 3:24 PM, Simon Edelhaus  wrote:

> Hi All!
>
> How important would a significant performance improvement to TCP/IP
> itself be, in terms of overall job performance?
> Which part would be most significantly accelerated?
> Would it be HDFS?
>
> -- ttfn
> Simon Edelhaus
> California 2015
>