Re: Finding a Spark Equivalent for Pandas' get_dummies

2016-11-15 Thread neil90
You can build a list of all the columns and pass it to a recursive function that fits and applies the transformation to each one.
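The recursion suggested above can be sketched in plain Python. This is only an illustration of the pattern, not Spark code: `fit_column`, `transform_column`, and `encode_columns` are hypothetical helper names, and the rows are in-memory dictionaries. In Spark, the fit and transform steps would presumably be played by feature transformers such as `StringIndexer` and `OneHotEncoder` from `pyspark.ml.feature`.

```python
def fit_column(rows, column):
    """'Fit' step: collect the sorted distinct values of one column."""
    return sorted({row[column] for row in rows})


def transform_column(rows, column, categories):
    """'Transform' step: replace the column with one 0/1 indicator per category."""
    encoded = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k != column}
        for category in categories:
            new_row[f"{column}_{category}"] = 1 if row[column] == category else 0
        encoded.append(new_row)
    return encoded


def encode_columns(rows, columns):
    """Recursively fit and transform each column in the list."""
    if not columns:
        return rows  # base case: nothing left to encode
    head, *rest = columns
    categories = fit_column(rows, head)
    return encode_columns(transform_column(rows, head, categories), rest)


# Hypothetical sample data, for illustration only.
rows = [
    {"id": 1, "color": "red", "size": "S"},
    {"id": 2, "color": "blue", "size": "M"},
    {"id": 3, "color": "red", "size": "M"},
]
encoded = encode_columns(rows, ["color", "size"])
print(encoded[0])
```

Each recursive call handles one column and passes the already-transformed rows on to the next, so the list of columns is the only thing that needs to change per dataset.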

Re: Finding a Spark Equivalent for Pandas' get_dummies

2016-11-11 Thread Nicholas Sharkey
gorical variables into dummy variables, then save the
>> transformed data back to CSV. That is why I'm so interested in get_dummies
>> but it's not scalable enough for my data size (500-600GB per file).
>>
>> Thanks in advance.
>>
>> Nick

Re: Finding a Spark Equivalent for Pandas' get_dummies

2016-11-11 Thread Nick Pentreath
med data back to CSV. That is why I'm so interested in get_dummies
> but it's not scalable enough for my data size (500-600GB per file).
>
> Thanks in advance.
>
> Nick

Finding a Spark Equivalent for Pandas' get_dummies

2016-11-11 Thread nsharkey
I have a dataset in which I need to convert some of the variables to dummy variables. The get_dummies function in Pandas works perfectly on smaller datasets, but since it collects, I'll always be bottlenecked by the master node. I've looked at Spark's OHE feature and while that will work in theory
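The bottleneck described here comes from holding the whole dataset on one node. As a point of comparison, and explicitly not the Spark OHE feature the post mentions, dummy encoding can also be done in two streaming passes over a CSV file: one pass to learn the distinct categories, a second to write the indicator columns. The sketch below uses only the Python standard library; the function names and the sample data are hypothetical, and the output columns follow the `column_value` naming that get_dummies uses by default.

```python
import csv
import io


def collect_categories(csv_file, columns):
    """Pass 1: stream the rows once and record the distinct values per column."""
    seen = {column: set() for column in columns}
    for row in csv.DictReader(csv_file):
        for column in columns:
            seen[column].add(row[column])
    return {column: sorted(values) for column, values in seen.items()}


def write_dummies(csv_in, csv_out, categories):
    """Pass 2: stream the rows again, writing 0/1 indicator columns."""
    reader = csv.DictReader(csv_in)
    kept = [name for name in reader.fieldnames if name not in categories]
    dummies = [(c, v) for c in categories for v in categories[c]]
    writer = csv.writer(csv_out, lineterminator="\n")
    writer.writerow(kept + [f"{c}_{v}" for c, v in dummies])
    for row in reader:
        writer.writerow(
            [row[name] for name in kept]
            + [1 if row[c] == v else 0 for c, v in dummies]
        )


# Hypothetical in-memory CSV standing in for a large file on disk.
source = "id,color\n1,red\n2,blue\n3,red\n"
categories = collect_categories(io.StringIO(source), ["color"])
result = io.StringIO()
write_dummies(io.StringIO(source), result, categories)
print(result.getvalue())
```

Only the set of distinct categories is kept in memory, so the memory cost depends on the number of categories rather than the number of rows. It still runs on a single machine, which is why a distributed approach remains attractive at 500-600GB per file.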
