Re: OpenCV + Spark : Where to put System.loadLibrary ?

2014-08-19 Thread kmatzen
Reviving this thread hoping I might be able to get an exact snippet for the correct way to do this in Scala. I had a solution for OpenCV that I thought was correct, but half the time the library was not loaded by the time it was needed. Keep in mind that I am completely new to Scala, so you're going

JVM heap and native allocation questions

2014-08-19 Thread kmatzen
I'm trying to use Spark to process some data using some native functions I've integrated using JNI, and I pass around a lot of memory that I've allocated inside these functions. I'm not very familiar with the JVM, so I have a couple of questions. (1) Performance seemed terrible until I LD_PRELOAD'ed

s3:// sequence file startup time

2014-08-16 Thread kmatzen
I have some RDDs stored as s3://-backed sequence files sharded into 1000 parts. The startup time is pretty long (tens of minutes). It's communicating with S3, but I don't know what it's doing. Is it just fetching the metadata from S3 for each part? Is there a way to pipeline this with the

No space left on device

2014-08-09 Thread kmatzen
I need some configuration / debugging recommendations to work around a "No space left on device" error. I am completely new to Spark, but I have some experience with Hadoop. I have a task where I read images stored in sequence files from s3://, process them with a map in Scala, and write the result back