RE: Unable to connect to S3 parquet data using Drill

Trang Nguyen Wed, 20 Apr 2016 15:51:45 -0700

Hi Jason,

I am seeing the same error. The files are stored as parquet using the native 
s3n format, and we have no issues reading the data via SparkSQL.
However, one difference is that with SparkSQL I can drill down to the folder I 
need, whereas in Drill I need to configure the storage plugin to point to the 
top-level bucket, which has subfolders that could be causing the issue, 
depending on how Drill scans it.
Is there a workaround to configure only a subfolder in Drill?
I've tried changing the workspace to "/data" as well as further nested 
subfolders, but I get the error: 
        IllegalArgumentException: Can not create a Path from an empty string
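
For reference, a subfolder workspace of the kind described above might be declared as in the sketch below. This is only an illustration under assumptions: the workspace name "data" and the bucket placeholder PATH.TO.BUCKET are hypothetical, credentials are left out, and nothing in this thread confirms that such a workspace avoids the error. JSON has no comment syntax, so the assumptions are listed here rather than inline.

```json
{
  "type": "file",
  "enabled": true,
  "connection": "s3a://PATH.TO.BUCKET/",
  "workspaces": {
    "root": {
      "location": "/",
      "writable": false,
      "defaultInputFormat": null
    },
    "data": {
      "location": "/data",
      "writable": false,
      "defaultInputFormat": "parquet"
    }
  },
  "formats": {
    "parquet": {
      "type": "parquet"
    }
  }
}
```

With a plugin saved under the name s3a, a query would then address the workspace as s3a.data.`year=2016/month=04/day=01/`, with the path taken relative to the workspace location.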


Any advice would be appreciated.

Thanks,
Trang

-----Original Message-----
From: Nick Monetta 
Sent: Wednesday, April 20, 2016 3:19 PM
To: [email protected]; Trang Nguyen <[email protected]>
Cc: [email protected]
Subject: RE: Unable to connect to S3 parquet data using Drill

+ Trang - see Jason's comments below. 

Jason - I updated the core-site file to have all s3, s3n, and s3a configs.

Now I'm getting an "IllegalArgumentException: Can not create a Path from an 
empty string" error from all of my queries below. 


My core-site file is edited with s3, s3n, and s3a configs - each with 
corresponding plugins on the web interface. 

<property>
  <name>fs.s3.awsAccessKeyId</name>
  <value> </value>
</property>

<property>
  <name>fs.s3.awsSecretAccessKey</name>
  <value> </value>
</property>

<property>
  <name>fs.s3n.awsAccessKeyId</name>
  <value> </value>
</property>

<property>
  <name>fs.s3n.awsSecretAccessKey</name>
  <value> </value>
</property>

<property>
  <name>fs.s3a.awsAccessKeyId</name>
  <value> </value>
</property>

<property>
  <name>fs.s3a.awsSecretAccessKey</name>
  <value> </value>
</property>
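
Note that the plugin JSON quoted further down this thread spells the s3a credential keys differently from the s3/s3n pattern used above: fs.s3a.access.key and fs.s3a.secret.key. Whether the awsAccessKeyId-style names are also honored for s3a depends on the Hadoop version bundled with Drill and is not confirmed here. A minimal sketch of just that part of a storage plugin definition, with placeholder values:

```json
{
  "type": "file",
  "enabled": true,
  "connection": "s3a://PATH.TO.BUCKET/",
  "config": {
    "fs.s3a.access.key": "<YOUR ACCESS KEY HERE>",
    "fs.s3a.secret.key": "<YOUR SECRET KEY HERE>"
  }
}
```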


SELECT * FROM s3a. `/data/year=2016/month=04/day=01/part-r-00000-2e33bf90-9e25-41a9-9a71-8ab758295686.gz.parquet` LIMIT 3;
SELECT * FROM s3n. `/data/year=2016/month=04/day=01/part-r-00000-2e33bf90-9e25-41a9-9a71-8ab758295686.gz.parquet` LIMIT 3;
SELECT * FROM s3. `/data/year=2016/month=04/day=01/part-r-00000-2e33bf90-9e25-41a9-9a71-8ab758295686.gz.parquet` LIMIT 3;
SELECT * FROM s3a. `/data/year=2016/month=04/day=01/` LIMIT 3;

RESPONSE: IllegalArgumentException: Can not create a Path from an empty string. 




Nick Monetta | INRIX |[email protected] |Movement Intelligence | www.inrix.com  | 
mobile +1 646-248-4105 |  
 

-----Original Message-----
From: Jason Altekruse [mailto:[email protected]]
Sent: Wednesday, April 20, 2016 5:43 PM
To: [email protected]
Cc: [email protected]
Subject: Re: Unable to connect to S3 parquet data using Drill

Looking here it appears you need to set up an empty bucket to store a 
filesystem if you are going to use s3:// [1]. Are you trying to connect to a 
bucket you have populated with the normal S3 bucket APIs and not just the HDFS 
FileSystem API calls? Have you tried connecting instead with s3a? It looks like 
from this doc page that s3n and s3a are designed to connect to existing buckets 
filled with files, with s3a being a complete replacement for s3n.

I believe the error you are seeing means it cannot find the path "/". It is 
probably trying to look up the root of the filesystem wherever it puts metadata 
in the bucket (maybe a hidden file or something?) and it isn't finding it. This 
makes me think that your bucket isn't set up as it is expected to be for a 
connection using s3://.

[1] - https://wiki.apache.org/hadoop/AmazonS3

Jason Altekruse
Software Engineer at Dremio
Apache Drill Committer

On Wed, Apr 20, 2016 at 2:34 PM, Nick Monetta <[email protected]> wrote:

> Thanks for the quick responses!
>
> I'm using drill 1.4.  I think I may have sorted out my S3 connections 
> issues, but I'm not sure because I'm having trouble executing a query:
>
>
> My s3 connection (named "s3"):
> {
>   "type": "file",
>   "enabled": true,
>   "connection": "s3://inrixprod-tapp/",
>   "workspaces": {
>     "root": {
>       "location": "/",
>       "writable": false,
>       "defaultInputFormat": null
>     }
>
> Query:
> SELECT * FROM
> s3.`data/year=2016/month=02/day=28/part-r-00000-f2b42e00-ff01-4d82-84e3-c75aafa007ae.gz.parquet`
> LIMIT 3;
>
>  Response:
> Query Failed: An Error Occurred
> org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR:
> IOException: / doesn't exist [Error Id:
> 9e076a2b-c4fa-4020-af2e-4d43c2e9588c on 
> NickM-LPT02.inrix.corpnet.local:31010]
>
>
>
> Nick Monetta | INRIX |[email protected] |Movement Intelligence | 
> www.inrix.com  | mobile +1 646-248-4105 |
>
>
> -----Original Message-----
> From: Jason Altekruse [mailto:[email protected]]
> Sent: Wednesday, April 20, 2016 4:45 PM
> To: [email protected]
> Cc: [email protected]
> Subject: Re: Unable to connect to S3 parquet data using Drill
>
> Which version of Drill are you running? The config block for adding 
> your credentials was added in a recent release, I believe 1.5.
>
> Jason Altekruse
> Software Engineer at Dremio
> Apache Drill Committer
>
> On Wed, Apr 20, 2016 at 1:38 PM, Nick Monetta <[email protected]> wrote:
>
> > Copying and pasting your JSON directly into a new configuration gets 
> > me “Error (invalid JSON Mapping)”.
> >
> > What am I doing wrong?
> >
> > Nick Monetta | INRIX |[email protected] |Movement Intelligence | 
> > www.inrix.com  | mobile +1 646-248-4105 |
> >
> > -----Original Message-----
> > From: Jason Altekruse [mailto:[email protected]]
> > Sent: Wednesday, April 20, 2016 4:27 PM
> > To: [email protected]
> > Cc: [email protected]
> > Subject: Re: Unable to connect to S3 parquet data using Drill
> >
> > {
> >   "type": "file",
> >   "enabled": true,
> >   "connection": "s3a://PATH.TO.BUCKET/",
> >   "config": {
> >     "fs.s3a.access.key": "<YOUR ACCESS KEY HERE>",
> >     "fs.s3a.secret.key": "<YOUR SECRET KEY HERE>"
> >   },
> >   "workspaces": {
> >     "root": {
> >       "location": "/",
> >       "writable": false,
> >       "defaultInputFormat": null
> >     },
> >     "tmp": {
> >       "location": "/tmp",
> >       "writable": true,
> >       "defaultInputFormat": null
> >     }
> >   },
> >   "formats": {
> >     "psv": {
> >       "type": "text",
> >       "extensions": [
> >         "tbl"
> >       ],
> >       "delimiter": "|"
> >     },
> >     "csv": {
> >       "type": "text",
> >       "extensions": [
> >         "csv"
> >       ],
> >       "delimiter": ","
> >     },
> >     "tsv": {
> >       "type": "text",
> >       "extensions": [
> >         "tsv"
> >       ],
> >       "delimiter": "\t"
> >     },
> >     "parquet": {
> >       "type": "parquet"
> >     },
> >     "json": {
> >       "type": "json",
> >       "extensions": [
> >         "json"
> >       ]
> >     },
> >     "avro": {
> >       "type": "avro"
> >     },
> >     "sequencefile": {
> >       "type": "sequencefile",
> >       "extensions": [
> >         "seq"
> >       ]
> >     },
> >     "csvh": {
> >       "type": "text",
> >       "extensions": [
> >         "csvh"
> >       ],
> >       "extractHeader": true,
> >       "delimiter": ","
> >     }
> >   }
> > }
> >
> > Jason Altekruse
> > Software Engineer at Dremio
> > Apache Drill Committer
> >
> > On Wed, Apr 20, 2016 at 1:24 PM, Nick Monetta <[email protected]> wrote:
> >
> > > Can you send me the full JSON for the new config example you provided?
> > > I keep getting JSON errors.
> > >
> > > Nick Monetta | INRIX |[email protected] |Movement Intelligence |
> > > www.inrix.com  | mobile +1 646-248-4105 |
> > >
> > > -----Original Message-----
> > > From: Abhishek Girish [mailto:[email protected] <[email protected]>]
> > > Sent: Wednesday, April 20, 2016 12:57 PM
> > > To: user <[email protected]>
> > > Subject: Re: Unable to connect to S3 parquet data using Drill
> > >
> > > Hey Trang,
> > >
> > > A similar issue related to S3 config was discussed today on the
> > > mailing list [1]. Can you see if that helps resolve the issue?
> > >
> > > [1]
> > > http://mail-archives.apache.org/mod_mbox/drill-dev/201604.mbox/%3CCAN6ttnukzsAKgQE-RTF0RNCvBr1uWsB9SaxnS_7y-v0yBdUj%3Dw%40mail.gmail.com%3E
> > >
> > > -Abhishek
> > >
> > > On Tue, Apr 19, 2016 at 6:38 PM, Trang Nguyen <[email protected]> wrote:
> > >
> > > > Hi,
> > > >
> > > > I am having trouble connecting to an Amazon S3 bucket containing
> > > > parquet files.
> > > > I followed the instructions on
> > > > https://drill.apache.org/docs/s3-storage-plugin/ to download
> > > > jets3_0.9.3 on my Ubuntu VM.
> > > > My storage configs:
> > > > {
> > > >   "type": "file",
> > > >   "enabled": true,
> > > >   "connection": "s3://inrixprod-tapp",
> > > >   "config": null,
> > > >   "workspaces": {
> > > >     "root": {
> > > >       "location": "/",
> > > >       "writable": false,
> > > >       "defaultInputFormat": null
> > > >     },
> > > >     "tmp": {
> > > >       "location": "/tmp",
> > > >       "writable": true,
> > > >       "defaultInputFormat": null
> > > >     }
> > > >   },
> > > > ...
> > > > }
> > > >
> > > > I've started the embedded-drill instance but get the following
> > > > error when trying to connect:
> > > > 0: jdbc:drill:zk=local> use s3-trips.`root`;
> > > > Error: SYSTEM ERROR: IOException: / doesn't exist
> > > >
> > > > [Error Id: 081c66e6-177d-48fa-8eca-4ee1370ae785 on
> > > > ubuntu-VirtualBox:31010] (state=,code=0)
> > > >
> > > > Any advice would be appreciated!
> > > >
> > > > Thanks,
> > > > Trang
>
