I would recommend to not use a count(*) but rather pick a column to use for count.
The missing function implementation is interesting though, would be good to understand exactly what that means. —Andries On Mar 27, 2015, at 10:49 AM, Christopher Matta <[email protected]> wrote: > I’m using the yelp academic data set <https://www.yelp.com/academic_dataset> > for my testing purposes. > > Say I want to list all businesses in the “Pets” category, this query > doesn’t work: > > 0: jdbc:drill:zk=sen11:5181,sen12:5181> select `name`, > flatten(`categories`) from > maprfs.cmatta.`yelp/yelp_academic_dataset_business.json` WHERE > flatten(`categories`) = 'Pets' limit 10; > Query failed: Query stopped., Failure while trying to materialize > incoming schema. Errors: > > Error in expression at index -1. Error: Missing function > implementation: [flatten(VARCHAR-REPEATED)]. Full expression: > --UNKNOWN EXPRESSION--.. [ daf9f62a-f39b-4745-a0e3-0160875f5cb7 on > se-node13.se.lab:31010 ] > > Error: exception while executing query: Failure while executing query. > (state=,code=0) > > But this query with a sub-select does work: > > 0: jdbc:drill:zk=sen11:5181,sen12:5181> select flat.`name`, > flat.`categories` from (select `name`, flatten(`categories`) as > categories from > maprfs.cmatta.`yelp/yelp_academic_dataset_business.json`) as flat > WHERE lower(flat.`categories`) = 'pets' limit 10; > +------------+------------+ > | name | categories | > +------------+------------+ > | Loving Hands Pet Care | Pets | > | Amec Mid-City Animal Hospital | Pets | > | PetSmart | Pets | > | A Dog's Life Photography | Pets | > | Goober Pet Direct | Pets | > | Pet Planet | Pets | > | All-Star Animal Hospital | Pets | > | Team Canine, Inc | Pets | > | Foothills Pet Resort | Pets | > | Petco | Pets | > +------------+------------+ > 10 rows selected (0.141 seconds) > > Now that I can pick out which businesses are in the “Pets” category, I may > want to get the average star rating and order them by the number of > reviews. However, using the above technique in an aggregation query doesn’t > seem to work: > > 0: jdbc:drill:zk=sen11:5181,sen12:5181> select > . . . . . . . . . . . . . . . . . . . > a.`name`, > . . . . . . . . . . . . . . . . . . . > count(b.`text`), > . . . . . . . . . . . . . . . . . . . > avg(b.`stars`) > . . . . . . . . . . . . . . . . . . . > FROM ( > . . . . . . . . . . . . . . . . . . . > SELECT `business_id`, > `name`, flatten(`categories`) as `category` > . . . . . . . . . . . . . . . . . . . > FROM > maprfs.`cmatta`.`yelp/yelp_academic_dataset_business.json`) a > . . . . . . . . . . . . . . . . . . . > JOIN > maprfs.`cmatta`.`yelp/yelp_academic_dataset_review.json` b > . . . . . . . . . . . . . . . . . . . > ON a.`business_id` = > b.`business_id` > . . . . . . . . . . . . . . . . . . . > WHERE a.`category` = 'Pets' > . . . . . . . . . . . . . . . . . . . > GROUP BY a.`name` > . . . . . . . . . . . . . . . . . . . > ORDER BY count(*) DESC > . . . . . . . . . . . . . . . . . . . > LIMIT 10; > Query failed: Query failed: Failure while running fragment., Failure > while trying to materialize incoming schema. Errors: > > Error in expression at index 2. Error: Missing function > implementation: [flatten(VARCHAR-REPEATED)]. Full expression: null.. > [ c28a76ef-ef20-426a-a47e-908e9b68b875 on se-node13.se.lab:31010 ] > [ c28a76ef-ef20-426a-a47e-908e9b68b875 on se-node13.se.lab:31010 ] > > Error: exception while executing query: Failure while executing query. > (state=,code=0) > > What does *Error: Missing function implementation: > [flatten(VARCHAR-REPEATED)]* in this context? Am I going about this the > wrong way? > > Chris Matta > [email protected] > 215-701-3146 >
