You can have a list of all the columns and pass it to a recursive function to
fit and apply the transformation.
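The per-column fit-then-transform idea can be sketched in plain Python (iteratively rather than recursively, and without Spark, so the names and helpers here are illustrative only). In Spark the same loop would typically build one StringIndexer + OneHotEncoder stage per column inside a Pipeline, which avoids collecting the data to the driver:

```python
# Illustrative sketch: "fit" collects the distinct categories per listed
# column; "transform" replaces each categorical column with 0/1 dummy
# columns, mimicking what Pandas get_dummies produces. Function names are
# hypothetical, not part of any library API.

def fit_dummies(rows, columns):
    """Collect the sorted distinct categories for each listed column."""
    return {col: sorted({row[col] for row in rows}) for col in columns}

def transform_dummies(rows, categories):
    """Replace each categorical column with one 0/1 column per category."""
    out = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k not in categories}
        for col, cats in categories.items():
            for cat in cats:
                new_row[f"{col}_{cat}"] = 1 if row[col] == cat else 0
        out.append(new_row)
    return out

rows = [{"color": "red", "size": "S"}, {"color": "blue", "size": "M"}]
cats = fit_dummies(rows, ["color", "size"])
dummies = transform_dummies(rows, cats)
# dummies[0] is {"color_blue": 0, "color_red": 1, "size_M": 0, "size_S": 1}
```

The key point is that the fit step (learning the category list) is separated from the transform step, so the same fitted categories can be reused on new partitions of the data, which is exactly how Spark's per-column encoder stages behave.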
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Finding-a-Spark-Equivalent-for-Pandas-get-dummies-tp28064p28079.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
I have a dataset where I need to convert some of the variables to dummy
variables. The get_dummies function in Pandas works perfectly on smaller
datasets, but since it requires collecting the data, I'll always be
bottlenecked by the master node. I've looked at Spark's OHE feature, and
while that will work in theory, I still need to turn the categorical
variables into dummy variables and then save the transformed data back to
CSV. That is why I'm so interested in get_dummies, but it's not scalable
enough for my data size (500-600GB per file).

Thanks in advance.

Nick