Keep in mind the danger of testing foo != null. That comparison doesn't work as a null check, and it catches me by surprise all the time. foo is null and its variants (such as foo is not null) are what you need.
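Here is a minimal sketch of why, assuming Drill follows the standard SQL rule that comparing a value to null with = or != yields null rather than true. It runs against Python's built-in sqlite3 module as a stand-in, not against Drill itself. The table "files", its three rows, and the helper names() are all made up for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for the Drill workspace (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (file_name TEXT, foo TEXT)")
conn.executemany(
    "INSERT INTO files VALUES (?, ?)",
    [
        ("a.json", "bar"),  # field present with the value we want
        ("b.json", None),   # field missing, so it reads as null
        ("c.json", "baz"),  # field present with another value
    ],
)


def names(where):
    """Hypothetical helper: file names matching a WHERE clause, sorted."""
    sql = f"SELECT file_name FROM files WHERE {where} ORDER BY file_name"
    return [row[0] for row in conn.execute(sql)]


ne_null = names("foo != NULL")          # comparison yields null for every row
is_not_null = names("foo IS NOT NULL")  # the null test that actually works
is_null = names("foo IS NULL")          # files where the field is absent
eq_bar = names("foo = 'bar'")           # null rows drop out on their own

print("foo != NULL     ->", ne_null)
print("foo IS NOT NULL ->", is_not_null)
print("foo IS NULL     ->", is_null)
print("foo = 'bar'     ->", eq_bar)
```

Under these assumptions the != NULL filter matches nothing at all, while IS NOT NULL returns the two files that carry the field. The last query also shows that foo = 'bar' already skips the file where foo is null.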
On Sat, Sep 14, 2019, 4:56 PM hanu mapr <[email protected]> wrote:

> Hello Sebastian,
>
> By default Drill sets the field 'foo' to null for the files that don't
> contain it. I am of the opinion that the condition where foo = 'bar' should
> result in false for all those files which don't contain the field.
> Please can you send across the queries which you have run and the observed
> result.
>
> Just off the top of my head, some query like the below one might work
> select file_name from dfs.`/bla/*/*` where foo != null. --- You might want
> to remove duplicate entries. (of course this also results in the rows which
> contain the field and are null).
>
> Hope this helps.
>
> Thanks
>
>
> On Fri, Sep 13, 2019 at 10:53 PM Sebastian Fischmeister <[email protected]> wrote:
>
> > Hi,
> >
> > When searching multiple directories, drill only searches fields that are
> > common to all files (see the json data model). Is there a way to query a
> > directory and list all files that contain a certain field?
> >
> > In other words, I would like to use the workaround in this way:
> >
> > select * from (select fqn from dfs.`/bla/*/*` where foo exists) where foo
> > = 'bar'
> >
> > Or is there another way to do this? I dynamically get more files, so
> > finding the files should be included in the query.
> >
> > An alternative would be to execute the query such that it sets the field
> > 'foo' to null for all files that don't contain it. However, I don't know
> > how to execute this.
> >
> > Thanks,
> > Sebastian
