Re: Unable to connect to S3 parquet data using Drill

Jason Altekruse Wed, 20 Apr 2016 14:44:15 -0700

According to the Hadoop wiki [1], it appears that s3:// expects a dedicated,
initially empty bucket in which Hadoop stores its own filesystem. Are you
trying to connect to a bucket you populated with the normal S3 bucket APIs
rather than through the HDFS FileSystem API calls? Have you tried connecting
with s3a instead? From the same doc page, it looks like s3n and s3a are
designed to connect to existing buckets filled with files, with s3a being a
complete replacement for s3n.


I believe the error you are seeing means it cannot find the path "/". It is
probably trying to look up the root of the filesystem wherever it puts
metadata in the bucket (maybe a hidden file or something?) and it isn't
finding it. This makes me think that your bucket isn't set up as it is
expected to be for a connection using s3://.
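
For illustration, here is a minimal sketch of what an s3a version of your
plugin config might look like. It assumes Drill 1.5 or later (where I believe
the "config" block is accepted) and that the s3a filesystem classes are
available to Drill. The bucket name is copied from your config, and the two
key values are placeholders, not real settings. On 1.4 I would expect the
same two fs.s3a.* properties to need to live in core-site.xml instead, but
that is an assumption on my part.

```json
{
  "type": "file",
  "enabled": true,
  "connection": "s3a://inrixprod-tapp/",
  "config": {
    "fs.s3a.access.key": "<YOUR ACCESS KEY HERE>",
    "fs.s3a.secret.key": "<YOUR SECRET KEY HERE>"
  },
  "workspaces": {
    "root": {
      "location": "/",
      "writable": false,
      "defaultInputFormat": null
    }
  }
}
```

Only the "connection" scheme and the "config" block differ from what you
posted; the workspace definition is unchanged.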

[1] - https://wiki.apache.org/hadoop/AmazonS3

Jason Altekruse
Software Engineer at Dremio
Apache Drill Committer

On Wed, Apr 20, 2016 at 2:34 PM, Nick Monetta <[email protected]> wrote:

> Thanks for the quick responses!
>
> I'm using Drill 1.4. I think I may have sorted out my S3 connection
> issues, but I'm not sure because I'm having trouble executing a query:
>
>
> My s3 connection (named "s3"):
> {
>   "type": "file",
>   "enabled": true,
>   "connection": "s3://inrixprod-tapp/",
>   "workspaces": {
>     "root": {
>       "location": "/",
>       "writable": false,
>       "defaultInputFormat": null
>     }
>   }
> }
>
> Query:
> SELECT * FROM
> s3.`data/year=2016/month=02/day=28/part-r-00000-f2b42e00-ff01-4d82-84e3-c75aafa007ae.gz.parquet`
> LIMIT 3;
>
> Response:
> Query Failed: An Error Occurred
> org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR:
> IOException: / doesn't exist [Error Id:
> 9e076a2b-c4fa-4020-af2e-4d43c2e9588c on
> NickM-LPT02.inrix.corpnet.local:31010]
>
>
>
> Nick Monetta | INRIX |[email protected] |Movement Intelligence |
> www.inrix.com  | mobile +1 646-248-4105 |
>
>
> -----Original Message-----
> From: Jason Altekruse [mailto:[email protected]]
> Sent: Wednesday, April 20, 2016 4:45 PM
> To: [email protected]
> Cc: [email protected]
> Subject: Re: Unable to connect to S3 parquet data using Drill
>
> Which version of Drill are you running? The config block for adding your
> credentials was added in a recent release, I believe 1.5.
>
> Jason Altekruse
> Software Engineer at Dremio
> Apache Drill Committer
>
> On Wed, Apr 20, 2016 at 1:38 PM, Nick Monetta <[email protected]> wrote:
>
> > Copying and pasting your JSON directly into a new configuration gets
> > me “Error (invalid JSON Mapping)”.
> >
> > What am I doing wrong?
> >
> > Nick Monetta | INRIX |[email protected] |Movement Intelligence |
> > www.inrix.com  | mobile +1 646-248-4105 |
> >
> > -----Original Message-----
> > From: Jason Altekruse [mailto:[email protected]]
> > Sent: Wednesday, April 20, 2016 4:27 PM
> > To: [email protected]
> > Cc: [email protected]
> > Subject: Re: Unable to connect to S3 parquet data using Drill
> >
> > {
> >   "type": "file",
> >   "enabled": true,
> >   "connection": "s3a://PATH.TO.BUCKET/",
> >   "config": {
> >     "fs.s3a.access.key": "<YOUR ACCESS KEY HERE>",
> >     "fs.s3a.secret.key": "<YOUR SECRET KEY HERE>"
> >   },
> >   "workspaces": {
> >     "root": {
> >       "location": "/",
> >       "writable": false,
> >       "defaultInputFormat": null
> >     },
> >     "tmp": {
> >       "location": "/tmp",
> >       "writable": true,
> >       "defaultInputFormat": null
> >     }
> >   },
> >   "formats": {
> >     "psv": {
> >       "type": "text",
> >       "extensions": [
> >         "tbl"
> >       ],
> >       "delimiter": "|"
> >     },
> >     "csv": {
> >       "type": "text",
> >       "extensions": [
> >         "csv"
> >       ],
> >       "delimiter": ","
> >     },
> >     "tsv": {
> >       "type": "text",
> >       "extensions": [
> >         "tsv"
> >       ],
> >       "delimiter": "\t"
> >     },
> >     "parquet": {
> >       "type": "parquet"
> >     },
> >     "json": {
> >       "type": "json",
> >       "extensions": [
> >         "json"
> >       ]
> >     },
> >     "avro": {
> >       "type": "avro"
> >     },
> >     "sequencefile": {
> >       "type": "sequencefile",
> >       "extensions": [
> >         "seq"
> >       ]
> >     },
> >     "csvh": {
> >       "type": "text",
> >       "extensions": [
> >         "csvh"
> >       ],
> >       "extractHeader": true,
> >       "delimiter": ","
> >     }
> >   }
> > }
> >
> > Jason Altekruse
> > Software Engineer at Dremio
> > Apache Drill Committer
> >
> > On Wed, Apr 20, 2016 at 1:24 PM, Nick Monetta <[email protected]> wrote:
> >
> > > Can you send me the full JSON for the new config example you provided?
> > > I keep getting JSON errors.
> > >
> > > Nick Monetta | INRIX |[email protected] |Movement Intelligence |
> > > www.inrix.com  | mobile +1 646-248-4105 |
> > >
> > > -----Original Message-----
> > > From: Abhishek Girish [mailto:[email protected] <[email protected]>]
> > > Sent: Wednesday, April 20, 2016 12:57 PM
> > > To: user <[email protected]>
> > > Subject: Re: Unable to connect to S3 parquet data using Drill
> > >
> > > Hey Trang,
> > >
> > > A similar issue related to S3 config was discussed today on the
> > > mailing list [1]. Can you see if that helps resolve the issue?
> > >
> > > [1]
> > > http://mail-archives.apache.org/mod_mbox/drill-dev/201604.mbox/%3CCAN6ttnukzsAKgQE-RTF0RNCvBr1uWsB9SaxnS_7y-v0yBdUj%3Dw%40mail.gmail.com%3E
> > >
> > > -Abhishek
> > >
> > > On Tue, Apr 19, 2016 at 6:38 PM, Trang Nguyen <[email protected]> wrote:
> > >
> > > > Hi,
> > > >
> > > > I am having trouble connecting to an Amazon S3 bucket containing
> > > > parquet files.
> > > > I followed the instructions on
> > > > https://drill.apache.org/docs/s3-storage-plugin/ to download
> > > > jets3_0.9.3 on my Ubuntu VM.
> > > > My storage configs:
> > > > {
> > > >   "type": "file",
> > > >   "enabled": true,
> > > >   "connection": "s3://inrixprod-tapp",
> > > >   "config": null,
> > > >   "workspaces": {
> > > >     "root": {
> > > >       "location": "/",
> > > >       "writable": false,
> > > >       "defaultInputFormat": null
> > > >     },
> > > >     "tmp": {
> > > >       "location": "/tmp",
> > > >       "writable": true,
> > > >       "defaultInputFormat": null
> > > >     }
> > > >   },
> > > > ...
> > > > }
> > > >
> > > > I've started the embedded-drill instance but get the following error
> > > > when trying to connect:
> > > > 0: jdbc:drill:zk=local> use s3-trips.`root`;
> > > > Error: SYSTEM ERROR: IOException: / doesn't exist
> > > >
> > > > [Error Id: 081c66e6-177d-48fa-8eca-4ee1370ae785 on
> > > > ubuntu-VirtualBox:31010] (state=,code=0)
> > > >
> > > > Any advice would be appreciated!
> > > >
> > > > Thanks,
> > > > Trang
> > > >
> > >
> >
>
