Re: [I] User Defined Functions crash Spark Dataframes created directly, but not for ones made from Pandas on Spark. [spark]

via GitHub Tue, 19 May 2026 05:53:13 -0700


IMarvinTPA commented on issue #55882:
URL: https://github.com/apache/spark/issues/55882#issuecomment-4487916638


   > Could you reproduce the problem on Linux/Mac? I don't think Spark in 
general cares too much about Windows. We don't have any Windows-related tests, 
and most of our users do not use it on Windows. I tried this on my MacBook and 
it works fine.
   
   My work machine runs Windows, so that's where I'm running into the problem. 
I have a library that lets me run the same working script against Databricks 
or against SQL Server or Postgres on a local machine with minimal changes, by 
writing the correct SQL translations for each backend. It also has functions 
that convert between pandas, the pandas API on Spark, and Spark DataFrames. 
UDFs are important because they let me apply custom manipulations to the data.
   
   > The code is a bit weird though - `cols2 = list(map(list, zip(*cols)))` 
what are you trying to achieve here?
   
   My `cols` variable in my main code is a list of lists in which each nested 
list holds all of the values for one column. Spark wants the data row by row 
instead: each element of the outer list should hold one value for every 
column, matched up with the column names. So this line transposes the data 
from `[[val_for_col1_row1, val_for_col1_row2], [val_for_col2_row1, 
val_for_col2_row2]]` into `[[val_for_col1_row1, val_for_col2_row1], 
[val_for_col1_row2, val_for_col2_row2]]`, which Spark then pairs with the 
column names from the schema, i.e. `[{"col1": val_for_col1_row1, "col2": 
val_for_col2_row1}, {"col1": val_for_col1_row2, "col2": val_for_col2_row2}]`.
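   A minimal sketch of that transposition in plain Python. The column names, 
values, and the variable names `col_names`, `rows`, and `row_dicts` are 
hypothetical illustrations, not taken from the original script:

```python
# Hypothetical column-major input: one inner list per column.
col_names = ["col1", "col2"]
cols = [
    ["a", "b"],  # all values for col1
    [1, 2],      # all values for col2
]

# zip(*cols) walks the column lists in parallel and yields one tuple per row;
# map(list, ...) turns each row tuple into a list.
rows = list(map(list, zip(*cols)))
# rows == [["a", 1], ["b", 2]]

# The same rows with the column names attached, i.e. the dict form.
row_dicts = [dict(zip(col_names, row)) for row in rows]
# row_dicts == [{"col1": "a", "col2": 1}, {"col1": "b", "col2": 2}]
```

Note that `zip` stops at the shortest input, so if the column lists have 
different lengths the extra values are dropped silently instead of raising an 
error. The list-of-lists form can be handed to PySpark as 
`spark.createDataFrame(rows, schema=col_names)`, since `createDataFrame` 
accepts a list of column names as its schema.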
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

