Hello Sean,

Thank you for the heads-up! The Interaction transform won't help for my use case, as it returns a vector that I won't be able to hash. I will definitely dig further into custom transformations, though.
Thanks!
David

On Fri, Oct 1, 2021 at 15:49, Sean Owen <sro...@gmail.com> wrote:

> Are you looking for
> https://spark.apache.org/docs/latest/ml-features.html#interaction ?
> That's the closest built-in thing I can think of. Otherwise you can make
> custom transformations.
>
> On Fri, Oct 1, 2021, 8:44 AM David Diebold <davidjdieb...@gmail.com>
> wrote:
>
>> Hello everyone,
>>
>> In MLlib, I'm trying to rely essentially on pipelines to create features
>> out of the Titanic dataset and showcase the power of feature hashing. I
>> want to:
>>
>> - Apply bucketization on some columns (QuantileDiscretizer is fine).
>>
>> - Cross all my columns with each other to get cross features.
>>
>> - Hash all of these cross features into a vector.
>>
>> - Feed that vector to a logistic regression.
>>
>> Looking at the documentation, it looks like the only way to hash features
>> is the *FeatureHasher* transformation. It takes multiple columns as
>> input, whose type can be numeric, boolean, or string (but not
>> vector/array).
>>
>> Now I'm left wondering how I can create my cross-feature columns. I'm
>> looking for a transformation that takes two columns as input and
>> returns a numeric, boolean, or string. I didn't manage to find anything
>> that does the job. There are transformations such as VectorAssembler
>> that operate on vectors, but that is not a type accepted by the
>> FeatureHasher.
>>
>> Of course, I could combine columns directly in my dataframe (before the
>> pipeline kicks in), but then I would no longer be able to benefit from
>> QuantileDiscretizer and other useful functions.
>>
>> Am I missing something in the transformation API? Or is my approach to
>> hashing wrong? Or should we consider extending the API somehow?
>>
>> Thank you, kind regards,
>>
>> David
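[Editor's note: setting the Spark API question aside, the crossing-then-hashing step being discussed can be sketched in plain Python. This is a minimal illustration of the hashing trick over pairwise cross features, not Spark's FeatureHasher; the column names are taken from the Titanic example in the thread, and the helper name `hash_cross_features` and the use of MD5 are illustrative assumptions.]

```python
import hashlib

def hash_cross_features(row, cols, num_features=32):
    """Cross every pair of columns, then hash each cross feature into a
    fixed-size vector (the hashing trick). `row` maps column name -> value.
    Illustrative sketch only; not the Spark FeatureHasher implementation."""
    vec = [0.0] * num_features
    names = sorted(cols)  # stable order so tokens are deterministic
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            a, b = names[i], names[j]
            # Represent the cross feature as a single string token,
            # e.g. "Pclass=3&Sex=male", the kind of value a string-typed
            # column could carry into a hasher.
            token = f"{a}={row[a]}&{b}={row[b]}"
            # Hash the token to a bucket index in [0, num_features).
            idx = int(hashlib.md5(token.encode()).hexdigest(), 16) % num_features
            vec[idx] += 1.0
    return vec

passenger = {"Sex": "male", "Pclass": "3", "AgeBucket": "2"}
vec = hash_cross_features(passenger, ["Sex", "Pclass", "AgeBucket"])
# Three columns yield three pairwise crosses spread over 32 buckets.
```

The same idea suggests a workaround inside a pipeline: a custom transformation that concatenates two columns into one string column produces exactly the numeric/boolean/string input that FeatureHasher accepts.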