Yes that's correct. I saw the work you started with the env vars, but for now I set the proxy in the plugin.
- Rafael -----Original Message----- From: Charles Givre <[email protected]> Sent: Wednesday, April 1, 2020 12:42 PM To: [email protected] Subject: Re: REST data source? Hey Rafael, Thanks for the feedback. My original idea was to pull the proxy from the environment vars in HTTP_PROXY/HtTTPS_PROXY and ALL_PROXY but that part isn't quite done yet. Did you set the proxy info via the plugin config? -- C > On Apr 1, 2020, at 10:22 AM, Jaimes, Rafael - 0993 - MITLL <[email protected]> wrote: > > Hi all, > > I built Charles' latest branch including the proxy setup. It appears to be > working quite well going through the proxy. > > I'll continue to test and report back if I find any issues. > > Note: Beyond Paul's repo recommendations, I had to skip checkstyle to get the > maven build to complete. You're probably already aware of that, I think it's > just specific to this branch. > > Thanks! > Rafael > > -----Original Message----- > From: Paul Rogers <[email protected]> > Sent: Wednesday, April 1, 2020 1:29 AM > To: user <[email protected]> > Subject: Re: REST data source? > > Thanks, Charles. > > As Charles suggested, I pushed a commit that replaces the "old" JSON reader > with the new EVF-based one. Eventually this will allow us to use a "provided > schema" to handle any JSON ambiguities. > > As we've been discussing, I'll try to add the ability to specify a path to > data: "response/payload/records" or whatever. With the present commit, that > path can be parsed in code, but I think a simple path spec would be easier. > > Thanks, > - Paul > > > > On Tuesday, March 31, 2020, 10:00:52 PM PDT, Charles Givre > <[email protected]> wrote: > > Hello all, > I pushed some updates to the REST PR to include initial work on proxy > configuration. I haven't updated the docs yet (until this is finalized). It > adds new config variables as shown below: > > { > "type": "http", > "cacheResults": true, > "connections": {}, > "timeout": 0, > "proxyHost": null, > "proxyPort": 0, > "proxyType": null, > "proxyUsername": null, > "proxyPassword": null, > "enabled": true > } > I started on getting Drill to recognize the proxy info from the environment, > but haven't quite finished that. The plan is for the plugin config to > override environment vars. > Feedback is welcome. > > @paul-rogers, I think you can push to my branch (or submit a PR?) and that > will be included in the main PR. > -- C > > > >> On Mar 31, 2020, at 10:40 PM, Rafael Jaimes III <[email protected]> wrote: >> >> Yes your initial assessment was correct, there is extra material other >> than the data field. >> The returned JSON has some top-level fields that don't go any deeper, >> akin to your "status" : ok field. In the example I'm running now, one >> is called MessageState which is set to "NEW". There's another field >> called MessageData, which, obviously, holds most of the data. There >> are some other top-level fields, and one is called MessageHeader which >> is nested. There's a lot of stuff here, and this is just one "table" I'm >> querying against now. >> Not sure how it will differ with the other services. >> >> The service is definitely returning multiple records - I believe it's >> a JSON array and Drill+HTTP/plugin appears to handle it quite well. >> >> You're right, Drill is handling most of the structure by modifying my >> SELECT statement as you suggested. >> >> For filter pushdown, expressions of that form would be great. That's >> what I had in mind too. >> >> Thanks, >> Rafael >> >> On Tue, Mar 31, 2020 at 10:14 PM Paul Rogers >> <[email protected]> >> wrote: >> >>> Hi Rafael, >>> >>> Thanks much for the info. We had already implemented filter push-down >>> for other plugins, and for a few custom REST APIs, so should be >>> possible to port it over to the HTTP plugin. If you can supply code, >>> then you can convert filters to anything you want, a specialized JSON >>> request body, etc. >>> To do this generically, we have to make some assumptions, such as >>> either 1) all fields can be pushed as query parameters, or 2) only >>> those in some config list. Either way, we know how to create >>> name=value pairs in either a GET or POST format. >>> >>> You mentioned that your "payload" objects are structured. Drill can >>> already handle this; your query can map them to the top level: >>> >>> SELECT t.characteristic.color.name AS color_name, >>> t.characteristic.color.confidence AS color_confidence, ... FROM >>> yourTable AS t >>> >>> You'll get that "out of box." Drill does assume that data is in >>> "record >>> format": a single list of objects which represent records. Code would >>> be needed to handle, say, two separate lists of objects or other, >>> more-general, JSON structures. >>> >>> >>> My specific question was more around the response from your web service. >>> Does that have extra material besides just the data records? Something >>> like: >>> >>> >>> { "status": "ok", "data": [ {characteristic: ... }, {...}] } >>> >>> Or, is the response directly an array of objects: >>> >>> [ {characteristic: ... }, {...}] >>> >>> >>> If it is just an array, then the "out of the box" plugin will work. >>> If there is other stuff, then you'll need the new feature to tell >>> Drill how to find the field to your data. The present version needs >>> code, but I'm thinking we can just use an array of names in the plugin >>> config: >>> >>> dataPath: [ "data" ], >>> >>> Or, in your case, do you get a single record per HTTP request? If a >>> single record, then either your queries will be super-simple, or >>> performance will be horrible when requesting multiple records. (The >>> HTTP plugin only does one request and assumes it will get back a set >>> of records as a JSON array or as whitespace-separated JSON objects as >>> in a JSON file.) >>> >>> Can you clarify a bit which of these cases your data follows? >>> >>> I like your idea of optionally supplying a parser class for the "hard" >>> cases: >>> >>> messageParserClass: "com.mycompany.drill.MyMessageParser", >>> >>> As long as the class is on the classpath, Java will find it. >>> >>> Finally, on the filter push-down, the existing code we're thinking of >>> using can handle expressions of the form: >>> >>> column op constant >>> >>> Where "op" is one of the relational operators: =, !=, < etc. Also >>> handles the obvious variations (const op constant, column BETWEEN >>> const1 AND const2, column IN (const1, const2, ...)). >>> >>> The code cannot handle expressions (due to a limitation in Drill itself). >>> That is, this won't work as a filter push-down: col = 10 + 2 or col + >>> 2 = 10. Nor can it handle multi-column expressions: column1 = column2, etc. >>> >>> >>> I'll write up something more specific so you can see exactly what we >>> propose. >>> >>> >>> Thanks, >>> - Paul >>> >>> >>> >>> On Tuesday, March 31, 2020, 6:39:57 PM PDT, Rafael Jaimes III < >>> [email protected]> wrote: >>> >>> Either a text description of the parse path or specifying the class >>> with the message parser could work. >>> I think the latter would be better, if it were simple as dropping the >>> JAR in 3rdparty after Drill is already built. >>> That way we can just continually add parsers ad-hoc. >>> >>> An example JSON response includes about 4 top-level fields, then 2 of >>> those fields have many sub-fields. >>> For example a field could be nested 3 levels deep and say: >>> >>> Characteristic: >>> >>> Color: >>> >>> Color name: "Red" >>> >>> Confidence: 100 >>> >>> Physical: >>> >>> Size: 405 >>> >>> Confidence: 95 >>> >>> As you can imagine, it would be difficult to flatten this because of >>> repeated sub-field names like "Confidence". >>> >>> I don't think it would be easily exportable into a CSV. >>> At least for me pandas dataframe is the ultimate destination for all >>> of this, which also don't handle nested fields well either. >>> I'll have to handle some parsing on my end. >>> >>> Filter pushdown would be huge and much desired. >>> Our other end-users are accustomed to using SQL in that manner and >>> the REST API we use fully support AND, OR, BETWEEN, =, <, >, etc (I >>> can get a full list if you're interested). >>> For example I think "between" is a ",". Converting the SQL statement >>> into the URL format would be awesome and help streamline querying >>> across data sources. >>> This is one of the main reasons why we're so interested in Drill. >>> >>> >>> Thanks, >>> >>> Rafael >>> >
smime.p7s
Description: S/MIME cryptographic signature
