Does this deserve a JIRA in Spark / Spark MLlib?
How do you guys normally determine data types?
Frameworks like H2O automatically determine data types by scanning a sample of
the data, or the whole dataset. One can then decide, for example, whether a
variable should be treated as categorical or numerical.
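The kind of sampling-based guess described above could be sketched roughly as follows. This is a hypothetical, minimal illustration (not H2O's or Spark's actual logic; the class name, threshold parameter, and heuristics are my own assumptions): try to parse each sampled value, and treat low-cardinality columns as categorical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of a sampling-based type guesser, not from any framework.
public class TypeGuesser {
    // Guess a column's type from a sample of its raw string values.
    // maxDistinctForCategorical: cardinality threshold below which we
    // treat the column as categorical even if the values look numeric.
    public static String guess(List<String> sample, int maxDistinctForCategorical) {
        boolean allLong = true;
        boolean allDouble = true;
        Set<String> distinct = new HashSet<>();
        for (String v : sample) {
            if (v == null || v.isEmpty()) continue;  // skip missing values
            distinct.add(v);
            try { Long.parseLong(v); } catch (NumberFormatException e) { allLong = false; }
            try { Double.parseDouble(v); } catch (NumberFormatException e) { allDouble = false; }
        }
        // Few distinct values -> likely categorical, even if numeric-looking.
        if (distinct.size() <= maxDistinctForCategorical) return "categorical";
        if (allLong) return "integer";
        if (allDouble) return "numeric";
        return "string";
    }

    public static void main(String[] args) {
        System.out.println(guess(Arrays.asList("1.5", "2.0", "3.25", "4.1", "5.9"), 3));
        System.out.println(guess(Arrays.asList("red", "blue", "red"), 3));
    }
}
```

In practice one would also want to handle dates, booleans, and locale-specific number formats, and to make the cardinality threshold relative to the sample size rather than absolute.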
I wanted to take something like this
https://github.com/fitzscott/AirQuality/blob/master/HiveDataTypeGuesser.java
and turn it into a Hive UDAF, i.e. an aggregate function that returns a
data-type guess for a column.
Am I reinventing the wheel?
Does Spark have something like this already built-in?
It would be very useful.