Thanks Andries,

unfortunately NFS is not supported. This is an object storage system that
can be accessed via the S3 or Swift API, or via plain HTTP. I was not able
to get the S3 API to work because the store is not 100% AWS compatible.
Also, quite a few people publish data via public S3 buckets that are
accessed over HTTPS without authentication (e.g. this data is used in many
examples around the web:
http://www.nyc.gov/html/tlc/html/about/trip_record_data.shtml), and it
seems like a great idea to bring up a Drill box in AWS to load this data
directly. I don't think the S3 API will work here if I don't have any
credentials?
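
If anonymous S3 access is possible at all, I would guess it needs something
like the sketch below, i.e. a copy of the s3 plugin that points the s3a
connector at the anonymous credentials provider instead of access keys. The
bucket name is just a placeholder, and I don't know whether the Hadoop
libraries bundled with Drill (or this object store) actually honor
fs.s3a.aws.credentials.provider, so treat it as untested:

{
  "type": "file",
  "enabled": true,
  "connection": "s3a://some-public-bucket",
  "config": {
    "fs.s3a.aws.credentials.provider": "org.apache.hadoop.fs.s3a.AnonymousAWSCredentialsProvider"
  },
  "workspaces": {
    "root": {
      "location": "/",
      "writable": false,
      "defaultInputFormat": null
    }
  },
  "formats": {
    "csv": {
      "type": "text",
      "extensions": ["csv"],
      "delimiter": ","
    }
  }
}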

I found this, but my lack of Drill experience and Java coding makes me
hesitate when I see a project that has only 4 commits:
https://github.com/kevinlynx/drill-storage-http

On Mon, Oct 10, 2016 at 10:49 AM, Andries Engelbrecht <
aengelbre...@maprtech.com> wrote:

> Can you do an NFS connection to the webserver?
>
> Then maybe just use a local fs storage plugin with the NFS mount as the
> workspace.
>
> I have not tried it myself, but it may be an option to test in your case.
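>
> Something like this might work (untested; /mnt/webdata is just a
> placeholder for wherever the NFS export ends up mounted on the Drill
> nodes, and the formats section would be the same as in the default dfs
> plugin):
>
> {
>   "type": "file",
>   "enabled": true,
>   "connection": "file:///",
>   "workspaces": {
>     "webdata": {
>       "location": "/mnt/webdata",
>       "writable": false,
>       "defaultInputFormat": null
>     }
>   }
> }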
>
> --Andries
>
>
> > On Oct 7, 2016, at 11:39 AM, Di Pe <dip...@gmail.com> wrote:
> >
> > Hi,
> >
> > I have a couple of hundred CSV files on a web server that I can pull
> > down via HTTPS without any credentials. I wonder how I can write a
> > storage plugin for Drill that pulls these files directly from the web
> > server without having to download them to the local file system.
> >
> > I have a couple of options:
> >
> > 1) the plugin could just do a simple HTTP directory listing to get
> >    these files
> > 2) I could provide a text file with the URLs of the files, like
> >    https://mywebserver.com/myfolder/myfile1.csv
> >    https://mywebserver.com/myfolder/myfile2.csv
> > 3) the web server supports a JSON file listing, like this:
> >    curl -s https://mywebserver.com/myfolder?format=json | python -m json.tool
> > [
> >    {
> >        "hash": "e5f62378c79ec9c491aa130374dba93b",
> >        "last_modified": "2016-09-30T19:15:45.730950",
> >        "bytes": 211169,
> >        "name": "myfile1.csv",
> >        "content_type": "text/csv"
> >    },
> >    {
> >
> > Option 3 seems the most elegant to me.
> >
> >
> > Does something like this already exist, or would I duplicate the s3
> > plugin and modify it?
> >
> > Like this?
> >
> > Thanks for your help!
> > dipe
> >
> >
> > {
> >  "type": "file",
> >  "enabled": true,
> >  "connection": "https://mywebserver.com/myfolder?format=json";,
> >  "config": null,
> >  "workspaces": {
> >    "root": {
> >      "location": "/",
> >      "writable": false,
> >      "defaultInputFormat": null
> >    },
> >    "tmp": {
> >      "location": "/tmp",
> >      "writable": true,
> >      "defaultInputFormat": null
> >    }
> >  },
> >  "formats": {
> >    "psv": {
> >      "type": "text",
> >      "extensions": [
> >        "tbl"
> >      ],
> >      "delimiter": "|"
> >    },
> >    "csv": {
> >      "type": "text",
> >      "extensions": [
> >        "csv"
> >      ],
> >      "delimiter": ","
> >    },
> >    "tsv": {
> >      "type": "text",
> >      "extensions": [
> >        "tsv"
> >      ],
> >      "delimiter": "\t"
> >    },
> >    "parquet": {
> >      "type": "parquet"
> >    },
> >    "json": {
> >      "type": "json",
> >      "extensions": [
> >        "json"
> >      ]
> >    },
> >    "avro": {
> >      "type": "avro"
> >    },
> >    "sequencefile": {
> >      "type": "sequencefile",
> >      "extensions": [
> >        "seq"
> >      ]
> >    },
> >    "csvh": {
> >      "type": "text",
> >      "extensions": [
> >        "csvh"
> >      ],
> >      "extractHeader": true,
> >      "delimiter": ","
> >    }
> >  }
> > }
>
>
