Re: PySpark with OpenCV causes python worker to crash

2015-06-05 Thread Davies Liu
Thanks for letting us know.

On Fri, Jun 5, 2015 at 8:34 AM, Sam Stoelinga  wrote:
> Please ignore this whole thread. It suddenly started working, and I'm not
> sure what the root cause was. After I restarted the VM, the previous SIFT
> code also started working.

Re: PySpark with OpenCV causes python worker to crash

2015-06-05 Thread Sam Stoelinga
Please ignore this whole thread. It suddenly started working, and I'm not
sure what the root cause was. After I restarted the VM, the previous SIFT
code also started working.

On Fri, Jun 5, 2015 at 10:40 PM, Sam Stoelinga 
wrote:

> Thanks Davies. I will file a bug later with code and single image as
> dataset. Next to that I can give anybody access to my vagrant VM that
> already has spark with OpenCV and the dataset available.
>
> Or you can set up the same vagrant machine at your place. All is automated
> ^^
> git clone https://github.com/samos123/computer-vision-cloud-platform
> cd computer-vision-cloud-platform
> ./scripts/setup.sh
> vagrant ssh
>
> (Expect failures, I haven't cleaned up and tested it for other people) btw
> I study at Tsinghua also currently.

Re: PySpark with OpenCV causes python worker to crash

2015-06-05 Thread Sam Stoelinga
Thanks Davies. I will file a bug later with code and single image as
dataset. Next to that I can give anybody access to my vagrant VM that
already has spark with OpenCV and the dataset available.

Or you can set up the same vagrant machine at your place. All is automated ^^
git clone https://github.com/samos123/computer-vision-cloud-platform
cd computer-vision-cloud-platform
./scripts/setup.sh
vagrant ssh

(Expect failures, I haven't cleaned up and tested it for other people) btw
I study at Tsinghua also currently.

On Fri, Jun 5, 2015 at 2:43 PM, Davies Liu  wrote:

> Please file a bug here: https://issues.apache.org/jira/browse/SPARK/
>
> Could you also provide a way to reproduce this bug (including some
> datasets)?

Re: PySpark with OpenCV causes python worker to crash

2015-06-04 Thread Davies Liu
Please file a bug here: https://issues.apache.org/jira/browse/SPARK/

Could you also provide a way to reproduce this bug (including some datasets)?

On Thu, Jun 4, 2015 at 11:30 PM, Sam Stoelinga  wrote:
> I've changed the SIFT feature extraction to SURF feature extraction and it
> works...
>
> Following line was changed:
> sift = cv2.xfeatures2d.SIFT_create()
>
> to
>
> sift = cv2.xfeatures2d.SURF_create()
>
> Where should I file this as a bug? When not running on Spark it works fine
> so I'm saying it's a spark bug.

Re: PySpark with OpenCV causes python worker to crash

2015-06-04 Thread Sam Stoelinga
I've changed the SIFT feature extraction to SURF feature extraction and it
works...

Following line was changed:
sift = cv2.xfeatures2d.SIFT_create()

to

sift = cv2.xfeatures2d.SURF_create()

Where should I file this as a bug? When not running on Spark it works fine
so I'm saying it's a spark bug.

On Fri, Jun 5, 2015 at 2:17 PM, Sam Stoelinga  wrote:

> Yea should have emphasized that. I'm running the same code on the same VM.
> It's a VM with spark in standalone mode and I run the unit test directly on
> that same VM. So OpenCV is working correctly on that same machine but when
> moving the exact same OpenCV code to spark it just crashes.


Re: PySpark with OpenCV causes python worker to crash

2015-06-04 Thread Sam Stoelinga
Yea should have emphasized that. I'm running the same code on the same VM.
It's a VM with spark in standalone mode and I run the unit test directly on
that same VM. So OpenCV is working correctly on that same machine but when
moving the exact same OpenCV code to spark it just crashes.
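
One way to test the "same code, same VM" observation is to compare what a
plain shell sees with what a Spark Python worker sees. The sketch below is
not from the thread: `env_report` and `diff_reports` are hypothetical helper
names, it assumes Python 3, and it uses only the standard library. It
collects the interpreter path, the Python version, a few environment
variables, and the version and location of the modules of interest.

```python
import importlib
import os
import sys


def env_report(modules=("cv2", "numpy"),
               env_vars=("LD_LIBRARY_PATH", "PYTHONPATH")):
    """Collect details that can differ between a shell and a Spark worker."""
    report = {
        "executable": sys.executable,
        "python": sys.version.split()[0],
        "env": {name: os.environ.get(name) for name in env_vars},
        "modules": {},
    }
    for name in modules:
        try:
            mod = importlib.import_module(name)
            report["modules"][name] = (
                getattr(mod, "__version__", "unknown"),
                getattr(mod, "__file__", "unknown"),
            )
        except ImportError as exc:
            # Record the failure instead of raising, so the report is complete.
            report["modules"][name] = ("not importable", str(exc))
    return report


def diff_reports(local, remote):
    """Return the top-level keys whose values differ between two reports."""
    return sorted(key for key in local if local[key] != remote.get(key))
```

Assuming `sc` is an existing SparkContext, the worker-side report could be
fetched with `sc.parallelize([0], 1).map(lambda _: env_report()).collect()[0]`
and compared against a local `env_report()` using `diff_reports`. A
difference does not prove anything by itself, but it narrows down where to
look.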

On Tue, Jun 2, 2015 at 5:06 AM, Davies Liu  wrote:

> Could you run the single-threaded version on the worker machine to make
> sure that OpenCV is installed and configured correctly?


Re: PySpark with OpenCV causes python worker to crash

2015-06-01 Thread Davies Liu
Could you run the single-threaded version on the worker machine to make
sure that OpenCV is installed and configured correctly?
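
A sketch of what such a standalone check could look like, assuming Python 3
on a POSIX worker machine. `run_isolated` and `describe_exit` are
hypothetical helpers, not Spark or OpenCV APIs. The snippet is executed in a
fresh interpreter with `faulthandler` enabled, so a crash inside a C
extension should leave a Python traceback on stderr, and the exit code shows
whether the child was killed by a signal.

```python
import signal
import subprocess
import sys


def run_isolated(snippet, timeout=60):
    """Run a Python snippet in a fresh interpreter; return (exit code, stderr)."""
    proc = subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", snippet],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return proc.returncode, proc.stderr


def describe_exit(returncode):
    """Turn a subprocess exit code into a short human-readable description."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        # On POSIX a negative code means the child was terminated by a signal.
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = "signal %d" % -returncode
        return "killed by " + name
    return "exited with status %d" % returncode
```

The snippet passed to `run_isolated` would import the module under test and
call the feature extraction function on a single image. A result such as
"killed by SIGSEGV" points at native code rather than at Spark itself.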

On Sat, May 30, 2015 at 6:29 AM, Sam Stoelinga  wrote:
> I've verified the issue lies within Spark running OpenCV code and not within
> the sequence file BytesWritable formatting.
>
> The code below reproduces the failure without using the sequence file as
> input at all: the same function, run with the same input, still fails on
> Spark:
>
> def extract_sift_features_opencv(imgfile_imgbytes):
>     imgfilename, discardsequencefile = imgfile_imgbytes
>     imgbytes = bytearray(open("/tmp/img.jpg", "rb").read())
>     nparr = np.fromstring(buffer(imgbytes), np.uint8)
>     img = cv2.imdecode(nparr, 1)
>     gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
>     sift = cv2.xfeatures2d.SIFT_create()
>     kp, descriptors = sift.detectAndCompute(gray, None)
>     return (imgfilename, "test")
>
> And corresponding tests.py:
> https://gist.github.com/samos123/d383c26f6d47d34d32d6
>
>
> On Sat, May 30, 2015 at 8:04 PM, Sam Stoelinga 
> wrote:
>>
>> Thanks for the advice! The following line causes spark to crash:
>>
>> kp, descriptors = sift.detectAndCompute(gray, None)
>>
>> But I do need this line to be executed and the code does not crash when
>> running outside of Spark but passing the same parameters. You're saying
>> maybe the bytes from the sequencefile got somehow transformed and don't
>> represent an image anymore causing OpenCV to crash the whole python
>> executor.
>>
>> On Fri, May 29, 2015 at 2:06 AM, Davies Liu  wrote:
>>>
>>> Could you try commenting out some lines in
>>> `extract_sift_features_opencv` to find which line causes the crash?
>>>
>>> If the bytes that came from sequenceFile() are broken, it's easy to
>>> crash a C library in Python (OpenCV).
>>>
>>> On Thu, May 28, 2015 at 8:33 AM, Sam Stoelinga 
>>> wrote:
>>> > Hi sparkers,
>>> >
>>> > I am working on a PySpark application which uses the OpenCV library.
>>> > It runs fine when running the code locally but when I try to run it
>>> > on Spark on the same machine it crashes the worker.
>>> >
>>> > The code can be found here:
>>> > https://gist.github.com/samos123/885f9fe87c8fa5abf78f
>>> >
>>> > This is the error message taken from STDERR of the worker log:
>>> > https://gist.github.com/samos123/3300191684aee7fc8013
>>> >
>>> > Would like pointers or tips on how to debug further? Would be nice
>>> > to know the reason why the worker crashed.
>>> >
>>> > Thanks,
>>> > Sam Stoelinga
>>> >
>>> >
>>> > org.apache.spark.SparkException: Python worker exited unexpectedly
>>> > (crashed)
>>> > at
>>> > org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:172)
>>> > at
>>> >
>>> > org.apache.spark.api.python.PythonRDD$$anon$1.(PythonRDD.scala:176)
>>> > at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:94)
>>> > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
>>> > at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
>>> > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
>>> > at org.apache.spark.scheduler.Task.run(Task.scala:64)
>>> > at
>>> > org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
>>> > at
>>> >
>>> > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>> > at
>>> >
>>> > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>> > at java.lang.Thread.run(Thread.java:745)
>>> > Caused by: java.io.EOFException
>>> > at java.io.DataInputStream.readInt(DataInputStream.java:392)
>>> > at
>>> > org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:108)
>>> >
>>> >
>>> >
>>
>>
>

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: PySpark with OpenCV causes python worker to crash

2015-05-30 Thread Sam Stoelinga
I've verified that the issue lies in running the OpenCV code on Spark, not in
the sequence file BytesWritable formatting.

The code below reproduces the failure without using the sequence file as input
at all: the same function, given the same input, does not crash outside of
Spark but fails when run on Spark:

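import cv2
import numpy as np


def extract_sift_features_opencv(imgfile_imgbytes):
    # The sequence-file bytes are deliberately ignored; only the file name is kept.
    imgfilename, discardsequencefile = imgfile_imgbytes
    # Read the image from a local file instead of from the sequence file.
    imgbytes = bytearray(open("/tmp/img.jpg", "rb").read())
    nparr = np.fromstring(buffer(imgbytes), np.uint8)
    img = cv2.imdecode(nparr, 1)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    sift = cv2.xfeatures2d.SIFT_create()
    # This is the call that crashes the Python worker when run on Spark.
    kp, descriptors = sift.detectAndCompute(gray, None)
    return (imgfilename, "test")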

And corresponding tests.py:
https://gist.github.com/samos123/d383c26f6d47d34d32d6
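
One way to narrow this down further, sketched here under assumptions rather
than taken from the gists above: run the suspect code in a separate Python
process and look at how that process exits. On POSIX systems a negative return
code from `subprocess` means the child was killed by a signal (for example
SIGSEGV raised inside a native library), which would be consistent with Spark
reporting "Python worker exited unexpectedly". The helper names `run_isolated`
and `describe_exit` are hypothetical, and only the standard library is used.

```python
import signal
import subprocess
import sys


def run_isolated(code):
    """Run a Python snippet in a child interpreter; return (returncode, stderr)."""
    proc = subprocess.run(
        [sys.executable, "-c", code],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    return proc.returncode, proc.stderr.decode("utf-8", "replace")


def describe_exit(returncode):
    """Turn a subprocess return code into a short human-readable description."""
    if returncode == 0:
        return "exited normally"
    if returncode < 0:
        # On POSIX a negative code is the number of the signal that killed the child.
        return "killed by signal " + signal.Signals(-returncode).name
    return "exited with status %d" % returncode
```

Passing a snippet that imports the module and calls
`extract_sift_features_opencv` on a single record would show whether the crash
also happens in a plain child process, outside of Spark.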


On Sat, May 30, 2015 at 8:04 PM, Sam Stoelinga 
wrote:

> Thanks for the advice! The following line causes Spark to crash:
>
> kp, descriptors = sift.detectAndCompute(gray, None)
>
> But I do need this line to be executed, and the code does not crash when
> run outside of Spark with the same parameters. So you're saying that the
> bytes from the sequence file may have been transformed somehow and no
> longer represent an image, causing OpenCV to crash the whole Python
> executor?
>
> On Fri, May 29, 2015 at 2:06 AM, Davies Liu  wrote:
>
>> Could you try commenting out some lines in
>> `extract_sift_features_opencv` to find which line causes the crash?
>>
>> If the bytes that come from sequenceFile() are broken, it's easy to crash a
>> C library in Python (OpenCV).
>>
>> On Thu, May 28, 2015 at 8:33 AM, Sam Stoelinga 
>> wrote:
>> > Hi sparkers,
>> >
>> > I am working on a PySpark application which uses the OpenCV library. It runs
>> > fine when running the code locally but when I try to run it on Spark on the
>> > same Machine it crashes the worker.
>> >
>> > The code can be found here:
>> > https://gist.github.com/samos123/885f9fe87c8fa5abf78f
>> >
>> > This is the error message taken from STDERR of the worker log:
>> > https://gist.github.com/samos123/3300191684aee7fc8013
>> >
>> > Would like pointers or tips on how to debug further? Would be nice to know
>> > the reason why the worker crashed.
>> >
>> > Thanks,
>> > Sam Stoelinga
>> >
>> >
>> > org.apache.spark.SparkException: Python worker exited unexpectedly (crashed)
>> >     at org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:172)
>> >     at org.apache.spark.api.python.PythonRDD$$anon$1.<init>(PythonRDD.scala:176)
>> >     at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:94)
>> >     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
>> >     at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
>> >     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
>> >     at org.apache.spark.scheduler.Task.run(Task.scala:64)
>> >     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
>> >     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>> >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>> >     at java.lang.Thread.run(Thread.java:745)
>> > Caused by: java.io.EOFException
>> >     at java.io.DataInputStream.readInt(DataInputStream.java:392)
>> >     at org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:108)
>> >
>> >
>> >
>>
>
>


Re: PySpark with OpenCV causes python worker to crash

2015-05-30 Thread Sam Stoelinga
Thanks for the advice! The following line causes Spark to crash:

kp, descriptors = sift.detectAndCompute(gray, None)

But I do need this line to be executed, and the code does not crash when
run outside of Spark with the same parameters. So you're saying that the
bytes from the sequence file may have been transformed somehow and no
longer represent an image, causing OpenCV to crash the whole Python
executor?
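
A quick way to test the "bytes got transformed" hypothesis, offered only as a
sketch: compute a fingerprint of the image bytes on the local side and again
inside the function that runs on Spark, then compare the two. The helper
`fingerprint` is a hypothetical name, the sample bytes are made up for
illustration, and only `hashlib` from the standard library is used.

```python
import hashlib


def fingerprint(data):
    """Return (length, SHA-1 hex digest) for a bytes-like object."""
    raw = bytes(data)  # normalise bytearray / memoryview to bytes
    return len(raw), hashlib.sha1(raw).hexdigest()


# Hypothetical stand-ins for the bytes read locally and the bytes seen in the worker.
local_bytes = b"\xff\xd8\xff\xe0 example payload \xff\xd9"
worker_bytes = bytearray(local_bytes)

# True only if both sides hold exactly the same bytes.
same = fingerprint(local_bytes) == fingerprint(worker_bytes)
```

If the two fingerprints differ, the data was altered somewhere between the
sequence file and the worker; if they match, the input bytes can be ruled out
as the cause.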

On Fri, May 29, 2015 at 2:06 AM, Davies Liu  wrote:

> Could you try commenting out some lines in
> `extract_sift_features_opencv` to find which line causes the crash?
>
> If the bytes that come from sequenceFile() are broken, it's easy to crash a
> C library in Python (OpenCV).
>
> On Thu, May 28, 2015 at 8:33 AM, Sam Stoelinga 
> wrote:
> > Hi sparkers,
> >
> > I am working on a PySpark application which uses the OpenCV library. It runs
> > fine when running the code locally but when I try to run it on Spark on the
> > same Machine it crashes the worker.
> >
> > The code can be found here:
> > https://gist.github.com/samos123/885f9fe87c8fa5abf78f
> >
> > This is the error message taken from STDERR of the worker log:
> > https://gist.github.com/samos123/3300191684aee7fc8013
> >
> > Would like pointers or tips on how to debug further? Would be nice to know
> > the reason why the worker crashed.
> >
> > Thanks,
> > Sam Stoelinga
> >
> >
> > org.apache.spark.SparkException: Python worker exited unexpectedly (crashed)
> >     at org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:172)
> >     at org.apache.spark.api.python.PythonRDD$$anon$1.<init>(PythonRDD.scala:176)
> >     at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:94)
> >     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
> >     at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
> >     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
> >     at org.apache.spark.scheduler.Task.run(Task.scala:64)
> >     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
> >     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> >     at java.lang.Thread.run(Thread.java:745)
> > Caused by: java.io.EOFException
> >     at java.io.DataInputStream.readInt(DataInputStream.java:392)
> >     at org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:108)
> >
> >
> >
>


Re: PySpark with OpenCV causes python worker to crash

2015-05-28 Thread Davies Liu
Could you try commenting out some lines in
`extract_sift_features_opencv` to find which line causes the crash?

If the bytes that come from sequenceFile() are broken, it's easy to crash a
C library in Python (OpenCV).
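
As an alternative to commenting lines out one by one, and assuming a Python 3
interpreter (the `faulthandler` module is part of the standard library from
Python 3.3 on), the worker code could enable `faulthandler` so that a crash
inside a C library writes the Python traceback to a file before the process
dies. This is only a sketch: `enable_crash_traces` and the log path are
hypothetical names, not something from the code in this thread.

```python
import faulthandler
import os
import tempfile


def enable_crash_traces(log_path):
    """Dump Python tracebacks to log_path on SIGSEGV, SIGFPE, SIGABRT, SIGBUS or SIGILL."""
    # The file must stay open for as long as faulthandler is enabled,
    # so it is returned to the caller instead of being closed here.
    log_file = open(log_path, "a")
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file


# Hypothetical location; on a cluster each worker would write its own file.
crash_log = enable_crash_traces(
    os.path.join(tempfile.gettempdir(), "pyworker_crash.log")
)
```

Calling this at the top of `extract_sift_features_opencv` (or once per worker
process) should leave a traceback pointing at the Python line that was
executing when the native crash happened.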

On Thu, May 28, 2015 at 8:33 AM, Sam Stoelinga  wrote:
> Hi sparkers,
>
> I am working on a PySpark application which uses the OpenCV library. It runs
> fine when running the code locally but when I try to run it on Spark on the
> same Machine it crashes the worker.
>
> The code can be found here:
> https://gist.github.com/samos123/885f9fe87c8fa5abf78f
>
> This is the error message taken from STDERR of the worker log:
> https://gist.github.com/samos123/3300191684aee7fc8013
>
> Would like pointers or tips on how to debug further? Would be nice to know
> the reason why the worker crashed.
>
> Thanks,
> Sam Stoelinga
>
>
> org.apache.spark.SparkException: Python worker exited unexpectedly (crashed)
>     at org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:172)
>     at org.apache.spark.api.python.PythonRDD$$anon$1.<init>(PythonRDD.scala:176)
>     at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:94)
>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
>     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
>     at org.apache.spark.scheduler.Task.run(Task.scala:64)
>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.EOFException
>     at java.io.DataInputStream.readInt(DataInputStream.java:392)
>     at org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:108)
>
>
>
