Hi,
I was exploring Spark, and in the process I tried to search a column
containing URLs.
Basically, we apply a contains filter on that column. It takes more than
3 minutes to return the results. Is there any way to optimize this
query?
.filter(line => line.contains("someUrl"))
I am currently running in standalone mode on a machine with *8 GB of RAM*.
Everything is cached in memory in deserialized format, and the in-memory
(deserialized) size of the data is around *1 GB*.
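For reference, here is a minimal sketch of the setup as I understand it. It assumes the data is a plain text file with one URL per line, and that the master URL is supplied through `spark-submit --master`. The object name `UrlSearch`, the helper `searchUrls`, and the input path are hypothetical placeholders. `StorageLevel.MEMORY_ONLY` is the level that keeps the RDD as deserialized objects in memory, which is the same as calling `.cache()`.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import org.apache.spark.storage.StorageLevel

object UrlSearch {
  // Keep only the lines containing `needle` (plain substring match,
  // same as the filter above).
  def searchUrls(lines: RDD[String], needle: String): RDD[String] =
    lines.filter(line => line.contains(needle))

  def main(args: Array[String]): Unit = {
    // Master URL is expected from spark-submit (e.g. --master spark://host:7077).
    val sc = new SparkContext(new SparkConf().setAppName("UrlSearch"))

    // Hypothetical input path; MEMORY_ONLY = deserialized objects in memory.
    val urls = sc.textFile("hdfs:///path/to/urls.txt")
      .persist(StorageLevel.MEMORY_ONLY)
    urls.count() // materialize the cache so the timed query reads from memory

    val start = System.nanoTime()
    val matches = searchUrls(urls, "someUrl").count()
    val elapsedMs = (System.nanoTime() - start) / 1000000
    println(s"$matches matching lines in $elapsedMs ms")

    sc.stop()
  }
}
```

Running the first `count()` separately means the timed query measures only the filter over cached data, not the initial load.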
Any suggestions?
Thanks in advance.
Regards,
SB