Hi all, I have a model that has been stored in S3, and a Scala webapp which, for certain requests, loads the model and uses it to transform the submitted data.
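Roughly, the request path looks like this (simplified sketch of my setup; I'm showing a spark.ml PipelineModel, a local[*] master and a placeholder S3 path here, the real model type and paths differ):

```scala
import org.apache.spark.ml.PipelineModel
import org.apache.spark.sql.SparkSession

object ModelService {
  // Spark runs embedded inside the web app (single instance)
  val spark: SparkSession = SparkSession.builder()
    .master("local[*]")
    .appName("model-webapp")
    .getOrCreate()

  // Model was trained elsewhere and saved to S3 (path is a placeholder)
  val model: PipelineModel = PipelineModel.load("s3a://my-bucket/path/to/model")

  // Called per request: build a small DataFrame from the payload and score it
  def score(features: Seq[(Double, Double, Double)]) = {
    import spark.implicits._
    val df = features.toDF("f1", "f2", "f3")
    model.transform(df).collect()
  }
}
```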
I'm not sure how to run this quickly on a single instance, though. At the moment Spark is bundled with the web app in an uberjar (sbt assembly), but the process is quite slow. I'm aiming for responses under 1 second so that the webapp can answer requests quickly. When I look at the Spark UI I see:

Summary Metrics for 1 Completed Tasks

    Metric                      Min     25th percentile   Median   75th percentile   Max
    Duration                    94 ms   94 ms             94 ms    94 ms             94 ms
    Scheduler Delay             0 ms    0 ms              0 ms     0 ms              0 ms
    Task Deserialization Time   3 s     3 s               3 s      3 s               3 s
    GC Time                     2 s     2 s               2 s      2 s               2 s
    Result Serialization Time   0 ms    0 ms              0 ms     0 ms              0 ms
    Getting Result Time         0 ms    0 ms              0 ms     0 ms              0 ms
    Peak Execution Memory       0.0 B   0.0 B             0.0 B    0.0 B             0.0 B

I don't really understand why deserialization and GC should take so long when the models are already loaded. Is this evidence that I'm doing something wrong? And where can I get a better understanding of how Spark works under the hood here, and of how best to do a standalone/bundled-jar deployment? For reference, the relevant part of the build is sketched below.
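This is a simplified version of the build; the library versions and merge strategy shown are illustrative rather than exact:

```scala
// build.sbt (simplified; requires the sbt-assembly plugin in project/plugins.sbt)
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"  % "2.4.8",
  "org.apache.spark" %% "spark-sql"   % "2.4.8",
  "org.apache.spark" %% "spark-mllib" % "2.4.8",
  // hadoop-aws provides the s3a:// filesystem used to load the model
  "org.apache.hadoop" % "hadoop-aws"  % "2.7.7"
)

// sbt assembly builds the uberjar that bundles Spark with the web app
assemblyMergeStrategy in assembly := {
  case PathList("META-INF", xs @ _*) => MergeStrategy.discard
  case _                             => MergeStrategy.first
}
```

Thanks!
Nic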