Re: Nested UDFs

2016-11-17 Thread Perttu Ranta-aho
> You would need a UDF if you wanted to do something on the string
> value of a single row (e.g. return data + “bla”)
>
> Assaf.
>
> *From:* Perttu Ranta-aho [mailto:ranta...@iki.fi]
> *Sent:* Thursday, November 17, 2016 9:15 AM
> *To:* user@spark.ap

Nested UDFs

2016-11-16 Thread Perttu Ranta-aho
Hi,

Shouldn't this work?

from pyspark.sql.functions import regexp_replace, udf

def my_f(data):
    return regexp_replace(data, 'a', 'X')

my_udf = udf(my_f)

test_data = sqlContext.createDataFrame([('a',), ('b',), ('c',)], ('name',))
test_data.select(my_udf(test_data.name)).show()

But instead of
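
Editor's sketch, not part of the original thread: the likely reason the snippet above fails is that `regexp_replace` builds a Spark column expression, while the function wrapped by `udf` receives the plain Python string of a single row. Under that assumption there are two ways out: call `regexp_replace` directly on the column, or do the replacement with Python's own `re` module inside the UDF. The names `replace_a` and `demo_with_spark` are hypothetical, and the Spark part assumes a PySpark version that provides `SparkSession` (2.0 or later) instead of the thread's `sqlContext`.

```python
import re


def replace_a(value):
    """Replace every 'a' with 'X' in one row's string value (plain Python)."""
    if value is None:
        # Spark passes SQL NULL to a Python UDF as None.
        return None
    return re.sub("a", "X", value)


def demo_with_spark():
    """Show both variants on the thread's test data; requires pyspark."""
    # Imported here so replace_a stays usable without Spark installed.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import regexp_replace, udf
    from pyspark.sql.types import StringType

    spark = (
        SparkSession.builder.master("local[1]")
        .appName("nested-udf-sketch")  # hypothetical app name
        .getOrCreate()
    )
    test_data = spark.createDataFrame([("a",), ("b",), ("c",)], ("name",))

    # Variant 1: no UDF, regexp_replace is applied to the column itself.
    test_data.select(regexp_replace(test_data.name, "a", "X").alias("name")).show()

    # Variant 2: a UDF whose body uses only plain-Python string handling.
    replace_a_udf = udf(replace_a, StringType())
    test_data.select(replace_a_udf(test_data.name).alias("name")).show()

    spark.stop()
```

Calling `demo_with_spark()` runs both variants locally. Variant 1 is usually preferable when a built-in column function exists, since it avoids shipping each row to a Python worker.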

Re: UDF with column value comparison fails with PySpark

2016-11-10 Thread Perttu Ranta-aho
So it was something obvious, thanks!

-Perttu

On Thu, 10 November 2016 at 21:19, Davies Liu wrote:
> On Thu, Nov 10, 2016 at 11:14 AM, Perttu Ranta-aho wrote:
> > Hello,
> >
> > I want to create a UDF which modifies one column value depending on
> > value

UDF with column value comparison fails with PySpark

2016-11-10 Thread Perttu Ranta-aho
Hello,

I want to create a UDF which modifies one column value depending on the value of some other column. But the Python version of the code always fails in the column value comparison. Below are simple examples; the Scala version works as expected, but the Python version throws an exception. Am I missing something

Re: Running a spark-submit compatible app in spark-shell

2014-05-26 Thread Perttu Ranta-aho
Hi Roger,

Were you able to solve this?

-Perttu

On Tue, Apr 29, 2014 at 8:11 AM, Roger Hoover wrote:
> Patrick,
>
> Thank you for replying. That didn't seem to work either. I see the
> option parsed using verbose mode.
>
> Parsed arguments:
> ...
> driverExtraClassPath
> /Users/rhoover/Wo

Re: PySpark & Mesos random crashes

2014-05-26 Thread Perttu Ranta-aho
t cause of your problem though,
> which has to be found in whatever is causing the DAGScheduler to need to
> shutdown in the first place.
>
> On Sun, May 25, 2014 at 12:10 PM, Perttu Ranta-aho <perttu.ranta...@gmail.com> wrote:
>
>> Hi,
>>
>> We have

PySpark & Mesos random crashes

2014-05-25 Thread Perttu Ranta-aho
Hi,

We have a small Mesos (0.18.1) cluster with 4 nodes. We upgraded to Spark 1.0.0-rc9 to overcome some PySpark bugs, but now we are experiencing random crashes with almost every job. Local jobs run fine, but the same code with the same data set on the Mesos cluster leads to errors like:

14/05/22 15:03:34 ERR