Hi Daniel,

Looks like it's a data schema change related issue. You should be getting this error with the data uncompressed as well. Check for any schema change in the JSON data structure and see if setting the property below helps (experimental feature as per the docs).

ALTER SESSION SET `exec.enable_union_type` = true;

The links below may be helpful.

http://drill.apache.org/docs/json-data-model/#limitations-and-workarounds

https://issues.apache.org/jira/browse/DRILL-4520

Thanks,

Arjun

________________________________
From: Daniel McQuillen <[email protected]>
Sent: Friday, October 20, 2017 2:27 PM
To: [email protected]
Subject: Re: S3 with mixed files

Hi Arjun,

Yes! Thanks. I didn't have my "log" storage plugin defined correctly (it was missing the "extensions" key set to value "log").

However, when I try to query a file like abc.log.gz

select * from ibios3.root.`/tracking/abc.log.gz`;

I get a different error:

org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: IllegalStateException: You tried to start when you are using a ValueWriter of type NullableVarCharWriterImpl.

Fragment 0:0

[Error Id: 33dedb5f-2e3d-4e54-a918-0ad3553436ce on ip-10-0-0-24.us-west-1.compute.internal:31010]

I've followed the docs and have my storage plugin defined as:

"log": {
  "type": "json",
  "extensions": [
    "gz"
  ]
},

I also tried (thinking maybe I'm misreading the docs and .gz support is built in)...

"log": {
  "type": "json",
  "extensions": [
    "log"
  ]
},

and

"log": {
  "type": "json",
  "extensions": [
    "log",
    "gz"
  ]
},

with no luck.

Thanks for any further direction you can provide!

Best Regards,

Daniel

On Fri, Oct 20, 2017 at 6:52 PM, Arjun kr <[email protected]> wrote:

> Hi Daniel,
>
> This error may occur if you don't have format defined in S3 storage plugin
> that handles ".log" extension.
>
> For eg:
>
> -- I have file input.csv and have csv format defined in s3 storage plugin.
>
> 2 rows selected (1.233 seconds)
> 0: jdbc:drill:schema=dfs> select * from s3.root.`test-dir/input.csv`;
> +--------------------------------------------------+
> |                     columns                      |
> +--------------------------------------------------+
> | ["\"Pespsi,Pepsi\",\"Pespsi,Pepsi [100.00]",""]  |
> | ["Pespsi,Pepsi\",\"Pespsi,Pepsi [100.00]",""]    |
> | ["Pespsi,Pepsi","Pespsi,Pepsi [100.00]"]         |
> +--------------------------------------------------+
> 3 rows selected (3.418 seconds)
>
> -- Renamed S3 file input.csv to input.log
>
> 0: jdbc:drill:schema=dfs> select * from s3.root.`test-dir/input.log`;
> Error: VALIDATION ERROR: From line 1, column 15 to line 1, column 16:
> Table 's3.root.test-dir/input.log' not found
>
> SQL Query null
>
> [Error Id: 5996db7d-c886-45a8-bddf-99f11159db66 on arjun-lab-73:31010]
> (state=,code=0)
> 0: jdbc:drill:schema=dfs>
>
> Thanks,
>
> Arjun
>
>
> ________________________________
> From: Divya Gehlot <[email protected]>
> Sent: Friday, October 20, 2017 12:50 PM
> To: [email protected]
> Subject: Re: S3 with mixed files
>
> Hi Daniel,
> Can you try select * from ibios3.root.`./tracking/tracking.log`;
> instead of
> select * from ibios3.root.`tracking/tracking.log`;
>
> Thanks,
> Divya
>
>
> On 20 October 2017 at 13:13, Daniel McQuillen <[email protected]> wrote:
>
> > Thanks for your help, Padma!
> >
> > Just tried the following, per your suggestion:
> >
> > select * from ibios3.root.`tracking/tracking.log`;
> >
> > Still getting an error (although as I mentioned before I can do a 'show
> > files;' ok so the credentials must be working):
> >
> > "org.apache.drill.common.exceptions.UserRemoteException: VALIDATION ERROR:
> > From line 1, column 15 to line 1, column 20: Table
> > 'ibios3.root.tracking/tracking.log' not found SQL Query null [Error Id:
> > fbd59cf8-d6ec-4022-b682-9b51d33f8302 on
> > ip-10-0-0-24.us-west-1.compute.internal:31010]
> >
> >
> > I tried from both the embedded command line and the web interface. Do you
> > have any other suggestions? Thanks in advance.
> >
> > Best Regards,
> >
> > Daniel
> >
> >
> >
> > On Fri, Oct 20, 2017 at 12:25 PM, Padma Penumarthy <[email protected]> wrote:
> >
> > > From your error log, it seems like you may be specifying the table
> > > incorrectly.
> > > Instead of 'ibios3.root.tracking/tracking.log', can you try
> > > ibios3.root.`tracking/tracking.log`
> > >
> > > i.e. for example, select * from ibios3.root.`tracking/tracking.log`
> > >
> > > Thanks
> > > Padma
> > >
> > >
> > > > On Oct 18, 2017, at 7:15 PM, Daniel McQuillen <[email protected]> wrote:
> > > >
> > > > Hi,
> > > >
> > > > Attempting to use Apache Drill to parse Open edX tracking log files I have
> > > > stored on S3.
> > > >
> > > > I've successfully set up an S3 connection and I can see my different
> > > > directories in the target S3 bucket when I type `show files;` in embedded
> > > > drill. Hooray!
> > > >
> > > > However, I can't seem to do a query. I keep getting a "not found" error
> > > >
> > > > SEVERE: org.apache.calcite.runtime.CalciteContextException: From line 1,
> > > > column 15 to line 1, column 20: Table 'ibios3.root.tracking/tracking.log'
> > > > not found
> > > >
> > > > The "tracking" subdirectory has a most recent `tracking.log` file as well
> > > > as a bunch of gzipped older files, e.g. `tracking-log-20170518-1234.gz`
> > > > ... could this be confusing Drill? I've tried querying an individual file
> > > > (tracking.log) as well as the directory itself, but no luck.
> > > >
> > > > Thanks for any thoughts!
> > > >
> > > >
> > > > - Daniel
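
On the suggestion in this thread to check the JSON data for a schema change: one way to do that outside Drill is to scan a log file and list every field that shows up with more than one JSON type. The following is only a rough sketch, not something from the Drill docs. It assumes the log holds one JSON object per line, and the function names (`json_type`, `collect_types`, `find_mixed_fields`, `scan_file`) are made up for illustration. Nulls are ignored on the assumption that a null next to another type is not the kind of change that causes trouble.

```python
import gzip
import json
import sys
from collections import defaultdict


def json_type(value):
    """Return a coarse JSON type name for a decoded value."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    return "object"


def collect_types(value, path, seen):
    """Record the type observed at `path`, recursing into nested objects."""
    seen[path].add(json_type(value))
    if isinstance(value, dict):
        for key, child in value.items():
            collect_types(child, f"{path}.{key}" if path else key, seen)


def find_mixed_fields(lines):
    """Map each field path seen with more than one non-null type to those types."""
    seen = defaultdict(set)
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines
            collect_types(json.loads(line), "", seen)
    mixed = {}
    for path, types in seen.items():
        non_null = types - {"null"}  # assumption: nulls alone are not a schema change
        if path and len(non_null) > 1:
            mixed[path] = sorted(non_null)
    return mixed


def scan_file(filename):
    """Scan a plain or gzipped file of newline-delimited JSON records."""
    opener = gzip.open if filename.endswith(".gz") else open
    with opener(filename, "rt", encoding="utf-8") as handle:
        return find_mixed_fields(handle)


if __name__ == "__main__" and len(sys.argv) > 1:
    for field, types in sorted(scan_file(sys.argv[1]).items()):
        print(f"{field}: {', '.join(types)}")
```

Run against a local copy of the file (for example `python check_schema.py abc.log.gz`, where the script name is arbitrary), any field printed is a candidate for the string-versus-map style of mismatch that the ValueWriter error points at.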
