Thanks Andries, unfortunately NFS is not supported. This is an object storage system that can be accessed via the S3 or Swift API, or via plain http. I was not able to get the S3 API to work as it is not 100% AWS compatible. Also, quite a few people offer data via published S3 buckets that are accessed via https without authentication (e.g. this data is used in many examples around the web: http://www.nyc.gov/html/tlc/html/about/trip_record_data.shtml), and it seems like a great idea to bring up a Drill box in AWS to load this data directly. I don't think the S3 API will work here if I don't have any credentials?
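For what it's worth, grabbing one of those published objects over plain https needs no SDK and no credentials at all; a minimal Python sketch (the URL in the comment is just a placeholder, not a real bucket):

```python
import csv
import io
import urllib.request

def parse_csv_bytes(data: bytes) -> list[list[str]]:
    """Decode raw CSV bytes into rows of strings."""
    return list(csv.reader(io.StringIO(data.decode("utf-8"))))

def fetch_public_csv(url: str) -> list[list[str]]:
    """Download a CSV over plain https -- no credentials, no SDK -- and parse it."""
    with urllib.request.urlopen(url) as resp:
        return parse_csv_bytes(resp.read())

# Example call (hypothetical URL, any published bucket works the same way):
# rows = fetch_public_csv("https://example-bucket.s3.amazonaws.com/trip_data.csv")
```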
I found this, but my lack of Drill experience and Java coding makes me hesitate when I see a project that has only 4 commits: https://github.com/kevinlynx/drill-storage-http

On Mon, Oct 10, 2016 at 10:49 AM, Andries Engelbrecht <
aengelbre...@maprtech.com> wrote:

> Can you do an NFS connection to the webserver?
>
> Then maybe just use a local fs storage plugin with the NFS mount as the
> workspace.
>
> I have not tried it myself, but it may be an option to test in your case.
>
> --Andries
>
>
> > On Oct 7, 2016, at 11:39 AM, Di Pe <dip...@gmail.com> wrote:
> >
> > Hi,
> >
> > I have a couple of hundred CSV files on a web server that I can pull
> > down via https without any credentials. I wonder how I can write a
> > storage plugin for Drill that pulls these files directly from the web
> > server without having to download them to the local file system.
> >
> > I have a couple of options:
> >
> > 1) the plugin could just do a simple http directory listing to get
> > these files
> > 2) I could provide a text file with the URLs of the files, like
> > https://mywebserver.com/myfolder/myfile1.csv
> > https://mywebserver.com/myfolder/myfile2.csv
> > 3) the web server supports a JSON file listing, like this:
> >
> > curl -s https://mywebserver.com/myfolder?format=json | python -m json.tool
> > [
> >     {
> >         "hash": "e5f62378c79ec9c491aa130374dba93b",
> >         "last_modified": "2016-09-30T19:15:45.730950",
> >         "bytes": 211169,
> >         "name": "myfile1.csv",
> >         "content_type": "text/csv"
> >     },
> >     {
> >
> > Option 3 would be the most elegant to me.
> >
> > Does something like this already exist, or would I duplicate the s3
> > plugin and modify it?
> >
> > Like this?
> >
> > Thanks for your help!
> > dipe
> >
> > {
> >     "type": "file",
> >     "enabled": true,
> >     "connection": "https://mywebserver.com/myfolder?format=json",
> >     "config": null,
> >     "workspaces": {
> >         "root": {
> >             "location": "/",
> >             "writable": false,
> >             "defaultInputFormat": null
> >         },
> >         "tmp": {
> >             "location": "/tmp",
> >             "writable": true,
> >             "defaultInputFormat": null
> >         }
> >     },
> >     "formats": {
> >         "psv": {
> >             "type": "text",
> >             "extensions": ["tbl"],
> >             "delimiter": "|"
> >         },
> >         "csv": {
> >             "type": "text",
> >             "extensions": ["csv"],
> >             "delimiter": ","
> >         },
> >         "tsv": {
> >             "type": "text",
> >             "extensions": ["tsv"],
> >             "delimiter": "\t"
> >         },
> >         "parquet": {
> >             "type": "parquet"
> >         },
> >         "json": {
> >             "type": "json",
> >             "extensions": ["json"]
> >         },
> >         "avro": {
> >             "type": "avro"
> >         },
> >         "sequencefile": {
> >             "type": "sequencefile",
> >             "extensions": ["seq"]
> >         },
> >         "csvh": {
> >             "type": "text",
> >             "extensions": ["csvh"],
> >             "extractHeader": true,
> >             "delimiter": ","
> >         }
> >     }
> > }
> >
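Outside of Drill, option 3 from the quoted mail boils down to: fetch the JSON listing, keep the `text/csv` entries, and build per-file URLs. A rough sketch, assuming the listing shape shown above (the `csv_urls` name and the base URL handling are illustrative, not part of any existing plugin):

```python
import json

BASE = "https://mywebserver.com/myfolder"

def csv_urls(listing_json: str, base: str = BASE) -> list[str]:
    """Return download URLs for the text/csv entries in a container listing."""
    entries = json.loads(listing_json)
    return [f"{base}/{e['name']}" for e in entries
            if e.get("content_type") == "text/csv"]

# Sample listing in the format the server returns above.
sample = json.dumps([
    {"hash": "e5f62378c79ec9c491aa130374dba93b",
     "last_modified": "2016-09-30T19:15:45.730950",
     "bytes": 211169, "name": "myfile1.csv", "content_type": "text/csv"},
    {"hash": "0", "last_modified": "2016-09-30T00:00:00",
     "bytes": 10, "name": "notes.txt", "content_type": "text/plain"},
])
print(csv_urls(sample))
# ['https://mywebserver.com/myfolder/myfile1.csv']
```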