Hi,

I have a couple of hundred CSV files on a web server that I can pull down
via HTTPS without any credentials. How can I write a storage plugin for
Drill that pulls these files directly from the web server, without having
to download them to the local file system first?

I have a couple of options:

1) the plugin could do a simple HTTP directory listing to find these files
2) I could provide a text file with the URLs of the files, one per line:
    https://mywebserver.com/myfolder/myfile1.csv
    https://mywebserver.com/myfolder/myfile2.csv
3) the web server supports a JSON file listing, like this:
    curl -s 'https://mywebserver.com/myfolder?format=json' | python -m json.tool
    [
        {
            "hash": "e5f62378c79ec9c491aa130374dba93b",
            "last_modified": "2016-09-30T19:15:45.730950",
            "bytes": 211169,
            "name": "myfile1.csv",
            "content_type": "text/csv"
        },
        ...
    ]

Option 3 seems the most elegant to me.
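Whichever option is used, the first step is the same: turn the listing into a list of absolute file URLs. Below is a rough Python sketch of that step for all three options. It assumes the JSON entries carry `name` and `content_type` fields as in the sample output, and that an HTML index uses ordinary `<a href>` links; all function names are hypothetical and this is not part of any Drill API.

```python
import json
from html.parser import HTMLParser
from urllib.parse import quote, urljoin
from urllib.request import urlopen


def _base(url: str) -> str:
    # urljoin only appends when the base ends with "/"; otherwise it replaces the last segment.
    return url if url.endswith("/") else url + "/"


class _HrefCollector(HTMLParser):
    """Collects href values from <a> tags in an HTML directory index."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for key, value in attrs:
                if key == "href" and value:
                    self.hrefs.append(value)


def csv_urls_from_html(index_html: str, base_url: str) -> list:
    """Option 1: keep links ending in .csv from a plain HTML directory listing."""
    parser = _HrefCollector()
    parser.feed(index_html)
    return [urljoin(_base(base_url), h) for h in parser.hrefs if h.lower().endswith(".csv")]


def csv_urls_from_text(url_list: str) -> list:
    """Option 2: one URL per line; blank lines are ignored."""
    return [line.strip() for line in url_list.splitlines() if line.strip()]


def csv_urls_from_json(listing_json: str, base_url: str) -> list:
    """Option 3: build URLs from the ?format=json listing (assumed fields: name, content_type)."""
    urls = []
    for entry in json.loads(listing_json):
        name = entry["name"]
        if entry.get("content_type") == "text/csv" or name.lower().endswith(".csv"):
            # quote() escapes characters such as spaces in file names.
            urls.append(urljoin(_base(base_url), quote(name)))
    return urls


def fetch_text(url: str) -> str:
    """Fetch a listing or URL file over HTTPS (assumes no credentials are needed)."""
    with urlopen(url) as resp:
        return resp.read().decode("utf-8")
```

For option 3 this would be called as `csv_urls_from_json(fetch_text("https://mywebserver.com/myfolder?format=json"), "https://mywebserver.com/myfolder")`.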


Does something like this already exist, or would I have to duplicate the S3
plugin and modify it?

Something like the storage plugin config below?

Thanks for your help!
dipe


{
  "type": "file",
  "enabled": true,
  "connection": "https://mywebserver.com/myfolder?format=json",
  "config": null,
  "workspaces": {
    "root": {
      "location": "/",
      "writable": false,
      "defaultInputFormat": null
    },
    "tmp": {
      "location": "/tmp",
      "writable": true,
      "defaultInputFormat": null
    }
  },
  "formats": {
    "psv": {
      "type": "text",
      "extensions": [
        "tbl"
      ],
      "delimiter": "|"
    },
    "csv": {
      "type": "text",
      "extensions": [
        "csv"
      ],
      "delimiter": ","
    },
    "tsv": {
      "type": "text",
      "extensions": [
        "tsv"
      ],
      "delimiter": "\t"
    },
    "parquet": {
      "type": "parquet"
    },
    "json": {
      "type": "json",
      "extensions": [
        "json"
      ]
    },
    "avro": {
      "type": "avro"
    },
    "sequencefile": {
      "type": "sequencefile",
      "extensions": [
        "seq"
      ]
    },
    "csvh": {
      "type": "text",
      "extensions": [
        "csvh"
      ],
      "extractHeader": true,
      "delimiter": ","
    }
  }
}
