Hi Chris, This looks like a bug in summary table creation. I don't think MADlib has any test cases with JSON/JSONB types in input data. Based on the error message you have posted, it looks like the actual training has completed, but when it's trying to write out a summary table, it is not handling the dependent_varname correctly (I am guessing the quote around (data->>y)::int is the issue). I have created a JIRA to track this issue, please feel free to add more information in it if you'd like: https://issues.apache.org/jira/browse/MADLIB-1284
I am not aware of a recipe or example in MADlib docs that uses JSON/JSONB. The not so good work-around for this particular instance would probably be to create a new table that includes all existing columns of the current input table, and new columns that are created by expanding out the JSON blob (using something like jsonb_populate_record() in https://www.postgresql.org/docs/9.5/functions-json.html). Then the dependent variable to use in linregr_train using this new table would be `'y'`, and the independent variables would be `'x1,x2'`, since they are all columns in the new table. NJ On Fri, Nov 16, 2018 at 10:15 PM Chris Verges <cver...@medallia.com> wrote: > Hi MADLib users, > > I have a table that contains a JSONB field (Postgres 10.x) and am now > looking to analyze all that rich data with MADLib. Example query: > > SELECT madlib.linregr_train ( > 'regr_example', -- source table > 'regr_example_model', -- output model table > '(data->>''y'')::int', -- dependent variable > 'ARRAY[1, (data->>''x1'')::int, (data->>''x2'')::int]' -- > independent variables > ); > > > However, it looks like MADLib isn't liking using these fields when it > comes to creating the temporary table: > > ERROR: spiexceptions.SyntaxError: syntax error at or near "')::int'" > LINE 7: , '(data->>'y')::int'::varchar as d... > ^ > QUERY: > create table regr_example_model_summary as > select > 'linregr'::varchar as method > , 'regr_example'::varchar as source_table > , 'regr_example_model'::varchar as > out_table > , '(data->>'y')::int'::varchar as > dependent_varname > , 'ARRAY[1, (data->>'x1')::int, > (data->>'x2')::int]'::varchar as independent_varname > , 0::integer as num_rows_processed > , 4::integer as num_missing_rows_skipped > , NULL::text as grouping_col > > CONTEXT: Traceback (most recent call last): > PL/Python function "linregr_train", line 20, in <module> > return linear.linregr_train(**globals()) > PL/Python function "linregr_train", line 146, in linregr_train > PL/Python function "linregr_train" > > > Are there recipes or examples of using JSONB with MADLib? Is this a known > limit? > > Thanks for the guidance, > Chris >