Reviving this thread hoping I might be able to get an exact snippet for the
correct way to do this in Scala. I had a solution for OpenCV that I thought
was correct, but half the time the library was not loaded by the time it was
needed.
Keep in mind that I am completely new to Scala, so you're going to have to bear with me.
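One common pattern for this (a sketch only; `NativeLoader` and the library name are hypothetical): route all loading through a JVM-wide singleton, and trigger it inside the task closure itself, so that every executor JVM is guaranteed to load the library before the first native call runs on it.

```scala
// Hypothetical loader: makes the native load idempotent and thread-safe
// within one JVM. System.loadLibrary is already a no-op for a library
// that is loaded, but tracking the names avoids repeated calls.
object NativeLoader {
  private var loaded = Set.empty[String]

  def ensure(lib: String): Unit = synchronized {
    if (!loaded.contains(lib)) {
      System.loadLibrary(lib) // throws UnsatisfiedLinkError if not found
      loaded += lib
    }
  }
}

// In the Spark job, call the loader at the top of the task, not on the
// driver, so it runs in each executor JVM (library name is a placeholder):
//   rdd.mapPartitions { it =>
//     NativeLoader.ensure("opencv_java")
//     it.map(myNativeTransform)
//   }
```

The key point is that a load performed on the driver does nothing for the executors; each executor JVM must execute `System.loadLibrary` itself, which is why the call goes inside the `mapPartitions` closure.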
I'm trying to use Spark to process some data using some native functions
I've integrated using JNI and I pass around a lot of memory I've allocated
inside these functions. I'm not very familiar with the JVM, so I have a
couple of questions.
(1) Performance seemed terrible until I LD_PRELOAD'ed the native library.
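If LD_PRELOAD helped, an alternative that avoids wrapping the JVM launcher is Spark's library-path settings, which prepend directories to the driver's and executors' LD_LIBRARY_PATH (the path below is a placeholder):

```
# spark-defaults.conf -- /opt/native/lib is a placeholder path
spark.executor.extraLibraryPath  /opt/native/lib
spark.driver.extraLibraryPath    /opt/native/lib
```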
I have some RDDs stored as s3://-backed sequence files sharded into 1000
parts. The startup time is pretty long (tens of minutes). It's
communicating with S3, but I don't know what it's doing. Is it just
fetching the metadata from S3 for each part? Is there a way to pipeline
this with the
I need some configuration / debugging recommendations for working around
"No space left on device" errors. I am completely new to Spark, but I have some
experience with Hadoop.
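On the "No space left on device" point, the usual first suspect is Spark's scratch space for shuffle output and spills, which defaults to /tmp; pointing spark.local.dir at one or more larger volumes is the standard fix (mount points below are placeholders; a comma-separated list spreads I/O across disks):

```
# spark-defaults.conf -- /mnt/disk1, /mnt/disk2 are placeholder mounts
spark.local.dir  /mnt/disk1/spark,/mnt/disk2/spark
```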
I have a task where I read images stored in sequence files from s3://,
process them with a map in Scala, and write the result back
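For the per-record image work, the function handed to the map can be plain JVM code; a minimal, self-contained sketch (grayscale conversion via javax.imageio; the function name and the choice of processing are hypothetical) of the kind of transform applied to the (key, imageBytes) pairs read from the sequence file:

```scala
import java.awt.image.BufferedImage
import java.io.{ByteArrayInputStream, ByteArrayOutputStream}
import javax.imageio.ImageIO

// Hypothetical per-record transform: decode image bytes, convert to
// grayscale, and re-encode as PNG. In the Spark job this would be the
// function passed to rdd.map over the decoded sequence-file records.
def processImage(bytes: Array[Byte]): Array[Byte] = {
  val img = ImageIO.read(new ByteArrayInputStream(bytes))
  val gray = new BufferedImage(img.getWidth, img.getHeight,
    BufferedImage.TYPE_BYTE_GRAY)
  val g = gray.getGraphics
  g.drawImage(img, 0, 0, null) // repaint the source into the grayscale raster
  g.dispose()
  val out = new ByteArrayOutputStream()
  ImageIO.write(gray, "png", out)
  out.toByteArray
}
```

Writing the results back as sequence files is then a matter of keeping the RDD as (key, value) pairs and calling `saveAsSequenceFile` on it.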