Re: TorrentBroadcast slow performance

2014-10-09 Thread Matei Zaharia
Thanks for the feedback. For 1, there is an open patch: 
https://github.com/apache/spark/pull/2659. For 2, broadcast blocks actually use 
MEMORY_AND_DISK storage, so they will spill to disk if you have low memory, but 
they're faster to access otherwise.
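
To illustrate what "spill to disk if you have low memory" means, here is a toy sketch in Python. It is not Spark's BlockManager: the class name, the byte budget and the rule of spilling only the newly added block are all assumptions made for illustration.

```python
import os
import tempfile


class MemoryAndDiskStore:
    """Toy block store: keep a block in memory if it fits, else write it to disk."""

    def __init__(self, max_memory_bytes, spill_dir=None):
        self.max_memory_bytes = max_memory_bytes
        # Hypothetical spill location; a fresh temp directory by default.
        self.spill_dir = spill_dir or tempfile.mkdtemp(prefix="toy-blocks-")
        self.in_memory = {}  # block id -> bytes
        self.on_disk = {}    # block id -> file path
        self.used = 0

    def put(self, block_id, data):
        """Store data and report where it ended up ("memory" or "disk")."""
        if self.used + len(data) <= self.max_memory_bytes:
            self.in_memory[block_id] = data
            self.used += len(data)
            return "memory"
        path = os.path.join(self.spill_dir, block_id)
        with open(path, "wb") as f:
            f.write(data)
        self.on_disk[block_id] = path
        return "disk"

    def get(self, block_id):
        """Memory hits return directly; spilled blocks are read back from disk."""
        if block_id in self.in_memory:
            return self.in_memory[block_id]
        with open(self.on_disk[block_id], "rb") as f:
            return f.read()
```

Reads of in-memory blocks skip the file system entirely, which is the "faster to access otherwise" part.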

Matei

On Oct 9, 2014, at 12:11 PM, Guillaume Pitel guillaume.pi...@exensa.com wrote:

 Hi,
 
 Thanks to your answer, we've found the problem. It was the reverse IP 
 resolution on the driver machines we used (the local bind9 was misconfigured). 
 Apparently, being unable to reverse-resolve the IP addresses of the nodes was 
 the cause of the 10s delay.
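
A quick way to check for this kind of problem is to time a reverse lookup for each node address from the driver machine. The sketch below uses only the Python standard library; the function name and the injectable `resolver` parameter are illustrative choices, not something from this thread.

```python
import socket
import time


def time_reverse_lookup(ip, resolver=socket.gethostbyaddr):
    """Return (hostname or None, elapsed seconds) for a reverse lookup of ip."""
    start = time.monotonic()
    try:
        # gethostbyaddr returns (hostname, aliaslist, ipaddrlist).
        hostname = resolver(ip)[0]
    except OSError:
        # socket.herror and socket.gaierror are both subclasses of OSError.
        hostname = None
    return hostname, time.monotonic() - start
```

Calling `time_reverse_lookup` with each worker's IP and looking for results that take seconds, or that come back as `None`, should point at a misconfigured resolver.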
 
 We've also hit two secondary problems with TorrentBroadcast, in case you're 
 interested:
 
 1 - Broadcasting a variable of about 2GB (1.8GB exactly) triggers a 
 java.lang.OutOfMemoryError: Requested array size exceeds VM limit, which 
 does not happen with HttpBroadcast (I guess HttpBroadcast splits the 
 serialized variable into small chunks).
 2 - Memory use of Torrent seems to be higher than that of Http (i.e. 
 switching from Http to Torrent triggers several OOM errors).
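
The general idea behind avoiding one huge array is to serialize straight into fixed-size blocks, so no single buffer ever holds the whole value. The sketch below shows that technique in Python with `pickle`. It is not a description of HttpBroadcast, TorrentBroadcast or the linked patch; `ChunkedWriter` and `serialize_in_blocks` are hypothetical names, and the 4 MB default simply mirrors the 4194304-byte piece size visible in the logs further down.

```python
import pickle


class ChunkedWriter:
    """File-like sink that stores written bytes as blocks of at most block_size."""

    def __init__(self, block_size=4 * 1024 * 1024):
        self.block_size = block_size
        self.blocks = []
        self._current = bytearray()

    def write(self, data):
        view = memoryview(data)
        written = len(view)
        while len(view) > 0:
            # Fill the current block, then start a new one when it is full.
            room = self.block_size - len(self._current)
            self._current += view[:room]
            view = view[room:]
            if len(self._current) == self.block_size:
                self.blocks.append(bytes(self._current))
                self._current = bytearray()
        return written

    def close(self):
        # Flush the final, possibly shorter, block.
        if self._current:
            self.blocks.append(bytes(self._current))
            self._current = bytearray()


def serialize_in_blocks(obj, block_size=4 * 1024 * 1024):
    """Pickle obj into a list of byte blocks without building one big buffer."""
    writer = ChunkedWriter(block_size)
    pickle.dump(obj, writer)
    writer.close()
    return writer.blocks
```

Joining the blocks in order gives back a byte string that `pickle.loads` can read, so a receiver only needs the pieces and their order.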
 
 Additionally, a question: while HttpBroadcast stores the broadcast pieces on 
 disk (in spark.local.dir/spark-... ), TorrentBroadcast does not seem to use 
 disk-backed storage. Does that mean HttpBroadcast can handle broadcasts that 
 are too big to fit in memory? If so, it's too bad that this design choice 
 wasn't used for Torrent.
 
 That being said, hats off to the people in charge of the broadcast unloading 
 with respect to the lineage; this stuff works great!
 
 Guillaume
 
 
 Maybe there is a firewall issue that makes it slow for your nodes to connect 
 through the IP addresses they're configured with. I see there's this 10 
 second pause between "Updated info of block broadcast_84_piece1" and 
 "ensureFreeSpace(4194304) called" (where it actually receives the block). 
 HTTP broadcast used only HTTP fetches from the executors to the driver, but 
 TorrentBroadcast has connections between the executors themselves and 
 between executors and the driver over a different port. Where are you 
 running your driver app and nodes?
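
One way to test the firewall hypothesis is to time a plain TCP connection from one node to a port another node listens on. This is a minimal sketch using the Python standard library; the function name and the default timeout are assumptions, and the host and port to probe have to come from your own cluster.

```python
import socket
import time


def time_connect(host, port, timeout=15.0):
    """Return seconds taken to open a TCP connection, or None if it failed."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts.
        return None
```

A result of several seconds, or `None` after the timeout, between two executors would suggest filtered or slow paths that executor-to-driver HTTP fetches never exercised.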
 
 Matei
 
 On Oct 7, 2014, at 11:42 AM, Davies Liu dav...@databricks.com wrote:
 
 Could you create a JIRA for it? Maybe it's a regression after
 https://issues.apache.org/jira/browse/SPARK-3119.
 
 We would appreciate it if you could tell us how to reproduce it.
 
 On Mon, Oct 6, 2014 at 1:27 AM, Guillaume Pitel
 guillaume.pi...@exensa.com wrote:
 Hi,
 
 I've had no answer to this on u...@spark.apache.org, so I'm posting it on dev
 before filing a JIRA (in case the problem or solution is already 
 identified).
 
 We've had some performance issues since switching to 1.1.0, and we finally
 found the cause: TorrentBroadcast seems to be very slow in our setup
 (it became the default in 1.1.0).
 
 The logs for a 4MB variable with TorrentBroadcast (15s):
 
 14/10/01 15:47:13 INFO storage.MemoryStore: Block broadcast_84_piece1 stored as bytes in memory (estimated size 171.6 KB, free 7.2 GB)
 14/10/01 15:47:13 INFO storage.BlockManagerMaster: Updated info of block broadcast_84_piece1
 14/10/01 15:47:23 INFO storage.MemoryStore: ensureFreeSpace(4194304) called with curMem=1401611984, maxMem=9168696115
 14/10/01 15:47:23 INFO storage.MemoryStore: Block broadcast_84_piece0 stored as bytes in memory (estimated size 4.0 MB, free 7.2 GB)
 14/10/01 15:47:23 INFO storage.BlockManagerMaster: Updated info of block broadcast_84_piece0
 14/10/01 15:47:23 INFO broadcast.TorrentBroadcast: Reading broadcast variable 84 took 15.202260006 s
 14/10/01 15:47:23 INFO storage.MemoryStore: ensureFreeSpace(4371392) called with curMem=1405806288, maxMem=9168696115
 14/10/01 15:47:23 INFO storage.MemoryStore: Block broadcast_84 stored as values in memory (estimated size 4.2 MB, free 7.2 GB)
 
 (Notice that a 10s lag happens after the "Updated info of block 
 broadcast_..." line and before the next MemoryStore log.)
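
Pauses like this can be found mechanically by comparing the timestamps of consecutive log lines. The sketch below assumes the `yy/MM/dd HH:mm:ss` prefix seen in the logs above; the function name and the 5 second threshold are arbitrary choices for illustration.

```python
from datetime import datetime


def find_gaps(log_lines, min_gap_seconds=5):
    """Return (gap_seconds, previous_line, next_line) for each large pause."""
    gaps = []
    previous = None
    for raw in log_lines:
        line = raw.strip()
        try:
            # The timestamp prefix "14/10/01 15:47:13" is 17 characters long.
            stamp = datetime.strptime(line[:17], "%y/%m/%d %H:%M:%S")
        except ValueError:
            continue  # wrapped continuation line without a timestamp
        if previous is not None:
            delta = (stamp - previous[0]).total_seconds()
            if delta >= min_gap_seconds:
                gaps.append((delta, previous[1], line))
        previous = (stamp, line)
    return gaps


sample = [
    "14/10/01 15:47:13 INFO storage.BlockManagerMaster: Updated info of block broadcast_84_piece1",
    "14/10/01 15:47:23 INFO storage.MemoryStore: ensureFreeSpace(4194304) called with curMem=1401611984, maxMem=9168696115",
    "14/10/01 15:47:23 INFO storage.BlockManagerMaster: Updated info of block broadcast_84_piece0",
]
for gap, before, after in find_gaps(sample):
    print(f"{gap:.0f}s pause before: {after}")
```

On the three sample lines taken from the log above, this reports a single 10s pause, right before the `ensureFreeSpace(4194304)` line.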
 
 And with HttpBroadcast (0.3s):
 
 14/10/01 16:05:58 INFO broadcast.HttpBroadcast: Started reading broadcast variable 147
 14/10/01 16:05:58 INFO storage.MemoryStore: ensureFreeSpace(4369376) called with curMem=1373493232, maxMem=9168696115
 14/10/01 16:05:58 INFO storage.MemoryStore: Block broadcast_147 stored as values in memory (estimated size 4.2 MB, free 7.3 GB)
 14/10/01 16:05:58 INFO broadcast.HttpBroadcast: Reading broadcast variable 147 took 0.320907112 s
 14/10/01 16:05:58 INFO storage.BlockManager: Found block broadcast_147 locally
 
 Since Torrent is supposed to perform much better than Http, we suspect a
 configuration error on our side, but we are unable to pin it down. Does
 anyone have an idea of what causes the problem?
 
 For now we're sticking with the HttpBroadcast workaround.
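
The thread does not say how the switch back to HttpBroadcast was made. In Spark 1.x the broadcast implementation is, to my knowledge, selected by the `spark.broadcast.factory` property, so a hedged sketch of the workaround is to pass that property to `spark-submit`. The helper name below is hypothetical, and the property and class names should be checked against the configuration docs for your Spark version.

```python
def to_submit_args(conf):
    """Render a dict of Spark properties as spark-submit --conf arguments."""
    args = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args


# Assumed property and factory class from the Spark 1.x configuration docs.
http_broadcast_conf = {
    "spark.broadcast.factory": "org.apache.spark.broadcast.HttpBroadcastFactory",
}
print(" ".join(to_submit_args(http_broadcast_conf)))
```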
 
 Guillaume
 --
 Guillaume PITEL, Président
 +33(0)626 222 431
 
 eXenSa S.A.S.
 41, rue Périer - 92120 Montrouge - FRANCE
 Tel +33(0)184 163 677 / Fax +33(0)972 283 705

Re: TorrentBroadcast slow performance

2014-10-09 Thread Matei Zaharia
Oops, I forgot to add: for 2, maybe we can add a flag to use DISK_ONLY for 
TorrentBroadcast, or use it automatically when a broadcast is bigger than 
some size.
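
As a sketch of how such a rule could look, assuming a hypothetical flag and a hypothetical size threshold (neither exists in Spark as far as this thread shows):

```python
def choose_broadcast_storage_level(size_bytes, force_disk_only=False,
                                   disk_only_threshold=None):
    """Pick a storage level name for a broadcast of the given size.

    force_disk_only and disk_only_threshold are hypothetical settings
    used only to illustrate the proposal.
    """
    if force_disk_only:
        return "DISK_ONLY"
    if disk_only_threshold is not None and size_bytes > disk_only_threshold:
        return "DISK_ONLY"
    # Current behavior described earlier in the thread.
    return "MEMORY_AND_DISK"
```

With a 1 GB threshold, for example, the 1.8GB broadcast mentioned above would go to disk while the 4MB one would stay in memory.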

Matei

On Oct 9, 2014, at 3:04 PM, Matei Zaharia matei.zaha...@gmail.com wrote:

 Thanks for the feedback. For 1, there is an open patch: 
 https://github.com/apache/spark/pull/2659. For 2, broadcast blocks actually 
 use MEMORY_AND_DISK storage, so they will spill to disk if you have low 
 memory, but they're faster to access otherwise.
 
 Matei
 

Re: TorrentBroadcast slow performance

2014-10-07 Thread Davies Liu
Could you create a JIRA for it? Maybe it's a regression after
https://issues.apache.org/jira/browse/SPARK-3119.

We would appreciate it if you could tell us how to reproduce it.


-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: TorrentBroadcast slow performance

2014-10-07 Thread Matei Zaharia
Maybe there is a firewall issue that makes it slow for your nodes to connect 
through the IP addresses they're configured with. I see there's this 10 second 
pause between "Updated info of block broadcast_84_piece1" and 
"ensureFreeSpace(4194304) called" (where it actually receives the block). HTTP 
broadcast used only HTTP fetches from the executors to the driver, but 
TorrentBroadcast has connections between the executors themselves and between 
executors and the driver over a different port. Where are you running your 
driver app and nodes?

Matei


