Hi all,
I'm using Spark SQL in Python and want to write a UDF that takes an entire
Row as its argument.
I tried something like:

from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

def functionName(row):
    ...
    return a_string

udfFunctionName = udf(functionName, StringType())
df.withColumn('columnName', udfFunctionName('*'))

but this gives an error message:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/nina/Downloads/spark-1.6.0-bin-hadoop2.6/python/pyspark/sql/dataframe.py", line 1311, in withColumn
    return DataFrame(self._jdf.withColumn(colName, col._jc), self.sql_ctx)
  File "/home/nina/Downloads/spark-1.6.0-bin-hadoop2.6/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 813, in __call__
  File "/home/nina/Downloads/spark-1.6.0-bin-hadoop2.6/python/pyspark/sql/utils.py", line 51, in deco
    raise AnalysisException(s.split(': ', 1)[1], stackTrace)
pyspark.sql.utils.AnalysisException: u"unresolved operator 'Project [address#0,name#1,PythonUDF#functionName(*) AS columnName#26];"

Does anyone know whether this is possible, and if so how it can be done?
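
The closest thing I've come up with is to pack all the columns into a single
struct column with pyspark.sql.functions.struct and pass that to the UDF, so
the function receives a Row. A rough, untested sketch of what I mean (the
address and name fields are just the columns from my DataFrame above):

from pyspark.sql.functions import udf, struct
from pyspark.sql.types import StringType

def functionName(row):
    # row arrives as a Row built from the struct of all columns
    return '%s: %s' % (row.name, row.address)

udfFunctionName = udf(functionName, StringType())

# struct(*df.columns) bundles every column into one struct argument
df.withColumn('columnName', udfFunctionName(struct(*df.columns)))

But I'm not sure whether that is the intended way to do it, or whether there
is a way to pass the whole Row directly.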

Thank you,
Nisrina.
