Hi Nandish, Thank you for the rapid reply and confirmation! I've added a note to the ticket that we were able to find a short-term workaround using a PostgreSQL *VIEW* to effectively marshal the JSONB object keys to look like traditional columns in a virtual table. The *view* also allows us to impose any data filtering desired, which we couldn't figure out how to do with the straight-up MADLib calls.
I'll monitor the MADLIB-1284 ticket for updates. Again, thanks! Chris On Sat, Nov 17, 2018 at 10:05 AM Nandish Jayaram <njaya...@pivotal.io> wrote: > > Hi Chris, > > This looks like a bug in summary table creation. I don't think MADlib has any > test cases with JSON/JSONB types in input data. > Based on the error message you have posted, it looks like the actual training > has completed, but when it's trying to write out a summary table, it is not > handling the dependent_varname correctly (I am guessing the quote around > (data->>y)::int is the issue). I have created a JIRA to track this issue, > please feel free to add more information in it if you'd like: > https://issues.apache.org/jira/browse/MADLIB-1284 > > I am not aware of a recipe or example in MADlib docs that uses JSON/JSONB. > The not so good work-around for this particular instance would probably be to > create a new table that includes all existing columns of the current input > table, and new columns that are created by expanding out the JSON blob (using > something like jsonb_populate_record() in > https://www.postgresql.org/docs/9.5/functions-json.html). Then the dependent > variable to use in linregr_train using this new table would be `'y'`, and the > independent variables would be `'x1,x2'`, since they are all columns in the > new table. > > NJ > > On Fri, Nov 16, 2018 at 10:15 PM Chris Verges <cver...@medallia.com> wrote: >> >> Hi MADLib users, >> >> I have a table that contains a JSONB field (Postgres 10.x) and am now >> looking to analyze all that rich data with MADLib. Example query: >> >> SELECT madlib.linregr_train ( >> 'regr_example', -- source table >> 'regr_example_model', -- output model table >> '(data->>''y'')::int', -- dependent variable >> 'ARRAY[1, (data->>''x1'')::int, (data->>''x2'')::int]' -- independent >> variables >> ); >> >> >> However, it looks like MADLib isn't liking using these fields when it comes >> to creating the temporary table: >> >> ERROR: spiexceptions.SyntaxError: syntax error at or near "')::int'" >> LINE 7: , '(data->>'y')::int'::varchar as d... >> ^ >> QUERY: >> create table regr_example_model_summary as >> select >> 'linregr'::varchar as method >> , 'regr_example'::varchar as source_table >> , 'regr_example_model'::varchar as out_table >> , '(data->>'y')::int'::varchar as dependent_varname >> , 'ARRAY[1, (data->>'x1')::int, >> (data->>'x2')::int]'::varchar as independent_varname >> , 0::integer as num_rows_processed >> , 4::integer as num_missing_rows_skipped >> , NULL::text as grouping_col >> >> CONTEXT: Traceback (most recent call last): >> PL/Python function "linregr_train", line 20, in <module> >> return linear.linregr_train(**globals()) >> PL/Python function "linregr_train", line 146, in linregr_train >> PL/Python function "linregr_train" >> >> >> Are there recipes or examples of using JSONB with MADLib? Is this a known >> limit? >> >> Thanks for the guidance, >> Chris