Hi,

I am running Spark 1.5.0 in yarn-client mode and am curious why so many
broadcasts are created when loading datasets with a large number of
partitions/files. My datasets have thousands of partitions, i.e. HDFS
files in the Avro folder, and I sometimes load hundreds of these large
datasets. I believe I have traced the broadcast to
SparkContext.scala:1006. It seems to broadcast just the Hadoop
configuration, and I don't see why that should be necessary for EVERY
file. Wouldn't it be possible to reuse the same broadcast configuration?
It is hardly the case that the configuration would differ between files
in a single dataset. This seems to waste a lot of memory, and blocks end
up being persisted to disk unnecessarily (see the log below).

Thanks,
Anders

15/09/24 17:11:11 INFO BlockManager: Writing block broadcast_1871_piece0 to
disk
15/09/24 17:11:11 INFO BlockManagerInfo: Added broadcast_1871_piece0 on disk on
10.254.35.24:49428 (size: 23.1 KB)
15/09/24 17:11:11 INFO MemoryStore: Block broadcast_4803_piece0 stored as
bytes in memory (estimated size 23.1 KB, free 2.4 KB)
15/09/24 17:11:11 INFO BlockManagerInfo: Added broadcast_4803_piece0 in
memory on 10.254.35.24:49428 (size: 23.1 KB, free: 464.0 MB)
15/09/24 17:11:11 INFO SpotifySparkContext: Created broadcast 4803 from
hadoopFile at AvroRelation.scala:121
15/09/24 17:11:11 WARN MemoryStore: Failed to reserve initial memory
threshold of 1024.0 KB for computing block broadcast_4804 in memory.
15/09/24 17:11:11 WARN MemoryStore: Not enough space to cache
broadcast_4804 in memory! (computed 496.0 B so far)
15/09/24 17:11:11 INFO MemoryStore: Memory use = 530.3 MB (blocks) + 0.0 B
(scratch space shared across 0 tasks(s)) = 530.3 MB. Storage
limit = 530.3 MB.
15/09/24 17:11:11 WARN MemoryStore: Persisting block broadcast_4804 to disk
instead.
15/09/24 17:11:11 INFO MemoryStore: ensureFreeSpace(23703) called with
curMem=556036460, maxMem=556038881
15/09/24 17:11:11 INFO MemoryStore: 1 blocks selected for dropping
15/09/24 17:11:11 INFO BlockManager: Dropping block broadcast_1872_piece0
from memory
15/09/24 17:11:11 INFO BlockManager: Writing block broadcast_1872_piece0 to
disk
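
To put a number on how many broadcasts each call site creates, the
"Created broadcast N from ..." lines in a driver log like the one above
can be tallied. The following is only a minimal Python sketch: the
function name and the regular expression are assumptions derived from
the log format shown here, not part of Spark, and other Spark versions
may word these lines differently.

```python
import re
from collections import Counter

# Matches driver log entries such as:
#   "... Created broadcast 4803 from hadoopFile at AvroRelation.scala:121"
# The logger name is deliberately not matched, since it can vary
# (e.g. a wrapper class around SparkContext).
CREATED = re.compile(r"Created broadcast (\d+) from\s+(\w+) at\s+(\S+)")


def count_broadcast_origins(log_text):
    """Count 'Created broadcast' events per (operation, call site)."""
    # Mail clients and terminals hard-wrap long log lines, so collapse
    # all whitespace before matching.
    flat = " ".join(log_text.split())
    return Counter((op, site) for _id, op, site in CREATED.findall(flat))


# Sample taken from the log excerpt above.
sample = """
15/09/24 17:11:11 INFO SpotifySparkContext: Created broadcast 4803 from
hadoopFile at AvroRelation.scala:121
"""

for (op, site), n in count_broadcast_origins(sample).most_common():
    print(f"{n:6d}  {op} at {site}")
```

Run against a full driver log, a count for "hadoopFile at
AvroRelation.scala:121" close to the number of input files would
support the suspicion that one broadcast is created per file.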
