Hello Frank,

Thanks a lot for your message, and I'm sorry for my late response.
So, if I have categorical features that end up as large vectors after one-hot encoding, is there a way to run glm without generating a huge denormalized representation of the features?

Nantia

On Fri, Apr 2, 2021 at 6:51 PM Frank McQuillan <fmcquil...@vmware.com> wrote:

> Hi Nantia,
>
> I replied to this but somehow I don't think my response got to the mailing
> list.
>
> The GLM method
> http://madlib.apache.org/docs/latest/group__grp__glm.html
> does not support SVEC inputs for the parameter `independent_varname`.
> That parameter can be any expression that resolves to an array, as in the
> example from the user docs:
>
> SELECT glm('warpbreaks_dummy',
>            'glm_model',
>            'breaks',
>            'ARRAY[1.0,"wool_B","tension_M", "tension_H"]',
>            'family=poisson, link=log');
>
> Frank
>
> ------------------------------
> *From:* Nantia Makrynioti <nantiam...@gmail.com>
> *Sent:* Saturday, March 13, 2021 10:46 AM
> *To:* user@madlib.apache.org <user@madlib.apache.org>
> *Subject:* GLM with svec column in independent variables
>
> Hello,
>
> Is there a way to run the glm training function using a svec (sparse
> vector) column in the independent variables? I'm using the
> encode_categorical_variables function to transform a set of categorical
> features to a sparse vector for every tuple, but glm does not seem to
> accept this column as an independent variable.
>
> Thank you very much in advance,
> Nantia