Thanks for those suggestions, Jarrod. They all sound pretty useful. Would
you mind taking a crack at numbering them 1, 2, 3, etc., in order of
priority as you see it?

Also it seems like some of these could be applied to the Pivot function as
well, e.g., UDF for column naming.
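
To make the suggestions quoted below a bit more concrete, here is a rough
Python sketch of the requested behavior: a caller-supplied naming function,
top-n encoding, dropping the original column, and numeric column names with
a mapping back to the values. Everything in it (encode_categorical,
default_namer, numeric_namer, sanitize) is a hypothetical name made up for
illustration; it is not the MADlib or PDLTools interface.

```python
import re
from collections import Counter


def sanitize(name):
    """Lowercase and replace characters outside [a-z0-9_] with underscores,
    so the resulting column name would not need double quotes."""
    cleaned = re.sub(r"[^a-z0-9_]+", "_", name.lower()).strip("_")
    return cleaned or "blank"


def default_namer(column, value, index):
    """Hypothetical naming function: embed the sanitized value in the name."""
    return sanitize(f"{column}_{value}")


def numeric_namer(column, value, index):
    """Hypothetical naming function: numbered names, value kept in the mapping."""
    return f"{column}_val{index}"


def encode_categorical(rows, column, top_n=None, namer=default_namer,
                       drop_original=False):
    """Dummy code `column` in a list of dict rows.

    Returns (encoded_rows, mapping), where mapping goes from each new column
    name to the original value. Note: two values that produce the same name
    would collide here; a real implementation would need to handle that.
    """
    counts = Counter(row[column] for row in rows)
    # most_common(None) returns every value; most_common(n) keeps the top n.
    values = [value for value, _ in counts.most_common(top_n)]
    mapping = {namer(column, value, i): value
               for i, value in enumerate(values, start=1)}

    encoded = []
    for row in rows:
        out = {key: val for key, val in row.items()
               if not (drop_original and key == column)}
        for name, value in mapping.items():
            out[name] = 1 if row[column] == value else 0
        encoded.append(out)
    return encoded, mapping
```

Passing namer=numeric_namer with drop_original=True would give columns like
color_val1, color_val2 plus the mapping, while the default namer keeps the
values in the names but strips characters that would force quoting.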

Frank



On Fri, Oct 14, 2016 at 1:02 PM, Jarrod Vawdrey <jvawd...@pivotal.io> wrote:

> Hey Frank,
>
> How are special character values handled today? Column names that have to
> be double-quoted are often not ideal, because they complicate downstream
> scripts.
>
> A couple of features that would be useful
>
> * Option to define resulting column names. Please see the pdltools
> implementation; the ability to pass in a function is especially useful:
> http://pivotalsoftware.github.io/PDLTools/group__grp__pivot01.html
> * Option to dummy code only the top n most frequently occurring values in
> any column
> * Option to exclude original column from results table
> * Option to create numeric column names (e.g. pivotcol_val1, pivotcol_val2,
> ...) instead of embedding values in column names, plus a secondary mapping
> table
>
> Thank you
>
> Jarrod Vawdrey
> Sr. Data Scientist
> Data Science & Engineering | Pivotal
> (650) 315-8905
> https://pivotal.io/
>
> On Fri, Oct 14, 2016 at 3:35 PM, Frank McQuillan <fmcquil...@pivotal.io>
> wrote:
>
>> For the module encoding categorical variables
>> http://madlib.incubator.apache.org/docs/latest/group__grp__data__prep.html
>> does anyone have any suggestions on improvements that we could make?
>>
>> Here is a video on how encoding categorical variables works for those not
>> familiar with it
>> https://www.youtube.com/watch?v=zxGgGMGJZRo&index=7&list=PL62pIycqXx-Qf6EXu5FDxUgXW23BHOtcQ
>>
>
>
