Hi,

I have been exploring Spark, and in the process I am trying to search a
column containing URLs.

Basically, we are applying a contains operation to the column. It takes
more than 3 minutes to return the results. Is there any way to optimize
this query?

.filter(line => line.contains("someUrl"))
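
For context, here is a minimal sketch of how such a filter can be applied
and timed against a cached RDD. The object name, the helper countMatches,
the sample data, and the local[*] master are placeholders for illustration
only, not our actual setup.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import org.apache.spark.storage.StorageLevel

object UrlFilterSketch {
  // Hypothetical helper: counts the lines that contain the given substring.
  def countMatches(lines: RDD[String], needle: String): Long =
    lines.filter(line => line.contains(needle)).count()

  def main(args: Array[String]): Unit = {
    // local[*] is an assumption for illustration; the real job runs in standalone mode.
    val conf = new SparkConf().setAppName("UrlFilterSketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      // Hypothetical sample data standing in for the real URL column.
      val lines = sc.parallelize(Seq(
        "http://example.com/someUrl/page1",
        "http://example.com/other",
        "http://example.org/someUrl"
      )).persist(StorageLevel.MEMORY_ONLY) // deserialized, in-memory storage

      // Materialize the cache first so the timing below excludes loading.
      lines.count()

      val start = System.nanoTime()
      val matches = countMatches(lines, "someUrl")
      val elapsedMs = (System.nanoTime() - start) / 1e6
      println(s"matches=$matches, elapsed=$elapsedMs ms")
    } finally {
      sc.stop()
    }
  }
}
```

Timing the filter only after an initial count() should show whether the
time is spent in the contains scan itself or in loading and caching the
data.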

I currently have a single system running in standalone mode with *8 GB of
RAM*. Everything is stored in memory in deserialized format. The data size
in memory (deserialized) is around *1 GB*.


Any suggestions?

Thanks in advance.

Regards,
SB
