[jira] [Commented] (SPARK-23883) Error with conversion to arrow while using pandas_udf

2018-04-09 Thread Omri (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-23883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16430148#comment-16430148 ]

Omri commented on SPARK-23883:
--------------------------------------

I've opened a new issue: https://issues.apache.org/jira/browse/SPARK-23929

 

> Error with conversion to arrow while using pandas_udf
> ------------------------------------------------------
>
>                 Key: SPARK-23883
>                 URL: https://issues.apache.org/jira/browse/SPARK-23883
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 2.3.0
>         Environment: Spark 2.3.0
>                      Python 3.5
>                      Java 1.8.0_161-b12
>            Reporter: Omri
>            Priority: Major
>
> Hi,
> I have code that works on Databricks but doesn't work on a local Spark
> installation.
> This is the code I'm running:
> {code:java}
> from pyspark.sql.functions import pandas_udf
> import pandas as pd
> import numpy as np
> from pyspark.sql.types import *
>
> schema = StructType([
>     StructField("Distance", FloatType()),
>     StructField("CarId", IntegerType())
> ])
>
> def haversine(lon1, lat1, lon2, lat2):
>     # Calculate distance, return scalar
>     return 3.5  # Removed logic to facilitate reading
>
> @pandas_udf(schema)
> def totalDistance(oneCar):
>     dist = haversine(oneCar.Longtitude.shift(1),
>                      oneCar.Latitude.shift(1),
>                      oneCar.loc[1:, 'Longitude'],
>                      oneCar.loc[1:, 'Latitude'])
>     return pd.DataFrame({"CarId": oneCar['CarId'].iloc[0],
>                          "Distance": np.sum(dist)},
>                         index=[0])
>
> ## Calculate the overall distance made by each car
> distancePerCar = df.groupBy('CarId').apply(totalDistance)
> {code}
> I'm getting this exception about Arrow not being able to handle this input:
> {noformat}
> ---------------------------------------------------------------------------
> TypeError                                 Traceback (most recent call last)
> C:\opt\spark\spark-2.3.0-bin-hadoop2.7\python\pyspark\sql\udf.py in returnType(self)
>     114         try:
> --> 115             to_arrow_type(self._returnType_placeholder)
>     116         except TypeError:
>
> C:\opt\spark\spark-2.3.0-bin-hadoop2.7\python\pyspark\sql\types.py in to_arrow_type(dt)
>    1641     else:
> -> 1642         raise TypeError("Unsupported type in conversion to Arrow: " + str(dt))
>    1643     return arrow_type
>
> TypeError: Unsupported type in conversion to Arrow: StructType(List(StructField(CarId,IntegerType,true),StructField(Distance,FloatType,true)))
>
> During handling of the above exception, another exception occurred:
>
> NotImplementedError                       Traceback (most recent call last)
> <ipython-input-...> in <module>()
>      18     km = 6367 * c
>      19     return km
> ---> 20 @pandas_udf("CarId: int, Distance: float")
>      21 def totalDistance(oneUser):
>      22     dist = haversine(oneUser.Longtitude.shift(1), oneUser.Latitude.shift(1),
>
> C:\opt\spark\spark-2.3.0-bin-hadoop2.7\python\pyspark\sql\udf.py in _create_udf(f, returnType, evalType)
>      62     udf_obj = UserDefinedFunction(
>      63         f, returnType=returnType, name=None, evalType=evalType, deterministic=True)
> ---> 64     return udf_obj._wrapped()
>      65
>      66
>
> C:\opt\spark\spark-2.3.0-bin-hadoop2.7\python\pyspark\sql\udf.py in _wrapped(self)
>     184
>     185         wrapper.func = self.func
> --> 186         wrapper.returnType = self.returnType
>     187         wrapper.evalType = self.evalType
>     188         wrapper.deterministic = self.deterministic
>
> C:\opt\spark\spark-2.3.0-bin-hadoop2.7\python\pyspark\sql\udf.py in returnType(self)
>     117                 raise NotImplementedError(
>     118                     "Invalid returnType with scalar Pandas UDFs: %s is "
> --> 119                     "not supported" % str(self._returnType_placeholder))
>     120         elif self.evalType == PythonEvalType.SQL_GROUPED_MAP_PANDAS_UDF:
>     121             if isinstance(self._returnType_placeholder, StructType):
>
> NotImplementedError: Invalid returnType with scalar Pandas UDFs: StructType(List(StructField(CarId,IntegerType,true),StructField(Distance,FloatType,true))) is not supported
> {noformat}
> I've also tried changing the schema to
> {code:java}
> @pandas_udf("")
> {code}
> and
> {code:java}
> @pandas_udf("CarId:int,Distance:float")
> {code}
>  
> As mentioned, this is working on a Databricks instance in Azure, but not
> locally.






[jira] [Commented] (SPARK-23883) Error with conversion to arrow while using pandas_udf

2018-04-08 Thread Hyukjin Kwon (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-23883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16429941#comment-16429941 ]

Hyukjin Kwon commented on SPARK-23883:
--------------------------------------

We need to choose one behavior. If I understood correctly, I think
documenting the current behavior should be good enough for now.




[jira] [Commented] (SPARK-23883) Error with conversion to arrow while using pandas_udf

2018-04-08 Thread Hyukjin Kwon (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-23883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16429940#comment-16429940 ]

Hyukjin Kwon commented on SPARK-23883:
--------------------------------------

Let's resolve this one and open a new ticket. So basically you mean the
output should be mapped by name, not by position, right?




[jira] [Commented] (SPARK-23883) Error with conversion to arrow while using pandas_udf

2018-04-08 Thread Omri (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-23883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16429864#comment-16429864 ]

Omri commented on SPARK-23883:
--------------------------------------

Yes it does. Thank you! I missed that part of the documentation.

I did find some odd behavior related to the order of the fields in the
struct (if you wish, I can open a new issue on this).

When I define the schema like this:
{code:java}
schema = StructType([
  StructField("CarId", IntegerType()),
  StructField("Distance", FloatType())
])
{code}
Spark doesn't use the column names of the DataFrame returned by the
pandas_udf, which results in a wrong assignment of types: CarId gets the
float value and Distance gets cast to an integer.

Here's the result, for example:
{code:java}
+-----+--------+
|CarId|Distance|
+-----+--------+
|    3|    29.0|
|    3|    65.0|
|    3|   191.0|
|    3|   222.0|
|    3|    19.0|
+-----+--------+
{code}
The pandas_udf returns 3.5 for the distance, which gets truncated to 3.

When I reverse the order of the fields in the struct to:
{code:java}
schema = StructType([
  StructField("Distance", FloatType()),
  StructField("CarId", IntegerType())
])
{code}
I get this result:
{code:java}
+--------+-----+
|Distance|CarId|
+--------+-----+
|     3.5|   29|
|     3.5|   65|
|     3.5|  191|
|     3.5|  222|
|     3.5|   19|
+--------+-----+
{code}
I would expect Spark to map the column names of the returned pandas
DataFrame to the StructField names.
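
Until that's settled, a workaround is to make the column order explicit.
Below is a minimal sketch (assuming Spark 2.3's positional mapping for
grouped-map pandas_udfs; the distance logic stays stubbed out as in the
report) that reorders the returned DataFrame's columns to match the schema's
field order before returning:
{code:java}
import pandas as pd
from pyspark.sql.functions import pandas_udf, PandasUDFType
from pyspark.sql.types import StructType, StructField, FloatType, IntegerType

schema = StructType([
    StructField("CarId", IntegerType()),
    StructField("Distance", FloatType())
])

@pandas_udf(schema, PandasUDFType.GROUPED_MAP)
def totalDistance(oneCar):
    result = pd.DataFrame({"CarId": oneCar['CarId'].iloc[0],
                           "Distance": 3.5},  # stub, as in the report
                          index=[0])
    # Reorder the columns to match the StructType field order exactly,
    # so positional assignment lines up with the intended names.
    return result[[field.name for field in schema.fields]]
{code}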

 

Thanks again

[jira] [Commented] (SPARK-23883) Error with conversion to arrow while using pandas_udf

2018-04-06 Thread Bryan Cutler (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-23883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16429123#comment-16429123 ]

Bryan Cutler commented on SPARK-23883:
--------------------------------------

I think the problem might be that since the {{pandas_udf}} is used for a
groupby-apply, you need to specify the functionType as
{{PandasUDFType.GROUPED_MAP}}, for example:

@pandas_udf("id long, v double", PandasUDFType.GROUPED_MAP)

See
[https://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.functions.pandas_udf]
for usage.

Can you try that and see if it works?
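
Applied to the code in this report, that would look roughly like the sketch
below (assuming the reporter's DataFrame {{df}} with its CarId column; the
haversine logic stays stubbed out as in the report):
{code:java}
import pandas as pd
from pyspark.sql.functions import pandas_udf, PandasUDFType
from pyspark.sql.types import StructType, StructField, FloatType, IntegerType

schema = StructType([
    StructField("Distance", FloatType()),
    StructField("CarId", IntegerType())
])

# GROUPED_MAP accepts a StructType return type; the default SCALAR eval
# type rejects it with the NotImplementedError shown above.
@pandas_udf(schema, PandasUDFType.GROUPED_MAP)
def totalDistance(oneCar):
    return pd.DataFrame({"Distance": 3.5,  # stub, as in the report
                         "CarId": oneCar['CarId'].iloc[0]},
                        index=[0])

distancePerCar = df.groupBy('CarId').apply(totalDistance)
{code}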




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)