Hi Nandish,

Thank you for the rapid reply and confirmation!  I've added a note to
the ticket that we were able to find a short-term workaround using a
PostgreSQL *VIEW* to effectively marshal the JSONB object keys to look
like traditional columns in a virtual table.  The *view* also allows
us to impose any data filtering desired, which we couldn't figure out
how to do with the straight-up MADLib calls.

I'll monitor the MADLIB-1284 ticket for updates.  Again, thanks!

Chris


On Sat, Nov 17, 2018 at 10:05 AM Nandish Jayaram <njaya...@pivotal.io> wrote:
>
> Hi Chris,
>
> This looks like a bug in summary table creation. I don't think MADlib has any 
> test cases with JSON/JSONB types in input data.
> Based on the error message you have posted, it looks like the actual training 
> has completed, but when it's trying to write out a summary table, it is not 
> handling the dependent_varname correctly (I am guessing the quote around 
> (data->>y)::int is the issue). I have created a JIRA to track this issue, 
> please feel free to add more information in it if you'd like:
> https://issues.apache.org/jira/browse/MADLIB-1284
>
> I am not aware of a recipe or example in MADlib docs that uses JSON/JSONB. 
> The not so good work-around for this particular instance would probably be to 
> create a new table that includes all existing columns of the current input 
> table, and new columns that are created by expanding out the JSON blob (using 
> something like jsonb_populate_record() in 
> https://www.postgresql.org/docs/9.5/functions-json.html). Then the dependent 
> variable to use in linregr_train using this new table would be `'y'`, and the 
> independent variables would be `'x1,x2'`, since they are all columns in the 
> new table.
>
> NJ
>
> On Fri, Nov 16, 2018 at 10:15 PM Chris Verges <cver...@medallia.com> wrote:
>>
>> Hi MADLib users,
>>
>> I have a table that contains a JSONB field (Postgres 10.x) and am now 
>> looking to analyze all that rich data with MADLib.  Example query:
>>
>> SELECT madlib.linregr_train (
>>   'regr_example',         -- source table
>>   'regr_example_model',   -- output model table
>>   '(data->>''y'')::int',     -- dependent variable
>>   'ARRAY[1, (data->>''x1'')::int, (data->>''x2'')::int]'      -- independent 
>> variables
>> );
>>
>>
>> However, it looks like MADLib isn't liking using these fields when it comes 
>> to creating the temporary table:
>>
>> ERROR:  spiexceptions.SyntaxError: syntax error at or near "')::int'"
>> LINE 7:                     , '(data->>'y')::int'::varchar      as d...
>>                                          ^
>> QUERY:
>>             create table regr_example_model_summary as
>>                 select
>>                       'linregr'::varchar                  as method
>>                     , 'regr_example'::varchar           as source_table
>>                     , 'regr_example_model'::varchar              as out_table
>>                     , '(data->>'y')::int'::varchar      as dependent_varname
>>                     , 'ARRAY[1, (data->>'x1')::int, 
>> (data->>'x2')::int]'::varchar    as independent_varname
>>                     , 0::integer       as num_rows_processed
>>                     , 4::integer         as num_missing_rows_skipped
>>                     , NULL::text                as grouping_col
>>
>> CONTEXT:  Traceback (most recent call last):
>>   PL/Python function "linregr_train", line 20, in <module>
>>     return linear.linregr_train(**globals())
>>   PL/Python function "linregr_train", line 146, in linregr_train
>> PL/Python function "linregr_train"
>>
>>
>> Are there recipes or examples of using JSONB with MADLib?  Is this a known 
>> limit?
>>
>> Thanks for the guidance,
>> Chris

Reply via email to