We want to extract data from MySQL and run the calculations in Spark SQL.
The explain plan of the query is shown below.
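For context, the tables are loaded over JDBC and cached roughly like this (the URL, credentials, and table name here are placeholders, not our real job):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("mysql-to-sparksql").getOrCreate()

// Read one of the MySQL tables over JDBC.
val region = spark.read
  .format("jdbc")
  .option("url", "jdbc:mysql://mysql-host:3306/tpch")
  .option("dbtable", "region")
  .option("user", "user")
  .option("password", "password")
  .load()

// Register it as a view and cache it so later queries reuse it.
region.createOrReplaceTempView("region")
spark.sql("CACHE TABLE region")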


REGIONKEY#177,N_COMMENT#178] PushedFilters: [], ReadSchema: struct<N_NATIONKEY:int,N_NAME:string,N_REGIONKEY:int,N_COMMENT:string>
                  +- *(20) Sort [r_regionkey#203 ASC NULLS FIRST], false, 0
                     +- Exchange(coordinator id: 266374831) hashpartitioning(r_regionkey#203, 200), coordinator[target post-shuffle partition size: 67108864]
                        +- *(19) Project [R_REGIONKEY#203]
                           +- *(19) Filter ((isnotnull(r_name#204) && (r_name#204 = AFRICA)) && isnotnull(r_regionkey#203))
                              +- InMemoryTableScan [R_REGIONKEY#203, r_name#204], [isnotnull(r_name#204), (r_name#204 = AFRICA), isnotnull(r_regionkey#203)]
                                    +- InMemoryRelation [R_REGIONKEY#203, R_NAME#204, R_COMMENT#205], true, 10000, StorageLevel(disk, memory, 1 replicas)
                                          +- *(1) Scan JDBCRelation(region) [numPartitions=1] [R_REGIONKEY#203,R_NAME#204,R_COMMENT#205] PushedFilters: [], ReadSchema: struct<R_REGIONKEY:int,R_NAME:string,R_COMMENT:string>


As you can see, every JDBCRelation is converted to an InMemoryRelation. Because the JDBC
table is very large, all of its data cannot fit in memory, and an OOM occurs.
Is there an option to make Spark SQL use disk when there is not enough memory?
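What I have in mind is something like explicitly persisting with a disk-backed storage level; this is only a sketch of what I am asking about, not something we have verified:

import org.apache.spark.storage.StorageLevel

// Allow cached blocks that do not fit in memory to spill to local disk
// instead of being kept entirely in memory.
region.persist(StorageLevel.MEMORY_AND_DISK)
region.createOrReplaceTempView("region")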
