Hi there, I'm experimenting with Spark ML classification in Python and would like to raise some questions.
We have input data in a format like {label: "string", field1: "string", field2: "string", field3: "array[string]"}. The idea is to build the text field by concatenating these parameters in specified combinations before feeding it to the tokeniser and TF-IDF, e.g.: 3 x field1 + field2 + concat(field3).

I can easily preprocess the array before using it in Spark and turn field3 into a single string, but I'm wondering if there is value in having a function in Spark for doing this. Maybe it would be worth having some of the array ops like http://www.postgresql.org/docs/9.5/static/functions-array.html#ARRAY-FUNCTIONS-TABLE ? If so, I can probably help implement it with a little guidance from somebody.

Thanks,
Viktor
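P.S. For concreteness, here is a minimal sketch (plain Python, no Spark dependency) of the preprocessing I mean. The field names and the 3x weighting of field1 are just the illustrative example from above:

```python
def build_text(record, field1_weight=3):
    """Concatenate fields into one space-separated text string,
    repeating field1 to up-weight it before tokenisation/TF-IDF."""
    parts = [record["field1"]] * field1_weight   # 3 x field1
    parts.append(record["field2"])               # + field2
    parts.append(" ".join(record["field3"]))     # + concat(field3), flattening the array
    return " ".join(parts)

row = {"label": "spam", "field1": "cheap", "field2": "offer",
       "field3": ["buy", "now"]}
print(build_text(row))  # -> "cheap cheap cheap offer buy now"
```

The question is essentially whether this array-flattening step (and similar array ops) belongs in Spark itself rather than in user preprocessing code like the above.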