2016-12-26 16:22 GMT+01:00 Bob Rudis <[email protected]>:
> I guess I read more into "ANSI SQL" than I should have. While it does
> handle some complex SQL generated by dplyr (which you can see at
> https://github.com/hrbrmstr/sergeant if you scroll down) there are
> instances when if, say, you try to filter and left join two different
> tables it generates valid SQL that Drill dislikes. You can see the
> dplyr that fails here: http://rpubs.com/hrbrmstr/drillex
>
> dplyr generates the following SQL:
>
> SELECT *
> FROM (SELECT *
>       FROM (SELECT *
>             FROM dfs.tmp.`/in/c.parquet`
>             WHERE ( cyl = 6.0 )) xpintqtlzw
>       LEFT JOIN dfs.tmp.`/in/b.json`
>       USING ( car )) fyxhnlrrca
> LIMIT 10
>
> The error message (if one doesn't want to go to the rpubs URL) is:
>
> SYSTEM ERROR: DrillRuntimeException: Join only supports implicit
> casts between
> 1. Numeric data
> 2. Varchar, Varbinary data
> 3. Date, Timestamp data
> Left type: INT, Right type: VARCHAR. Add explicit casts to avoid this error
>
> "car" is the same in both data sets as it's just different column
> slices of the seminal mtcars data set but with a common `car` column.
> It fails if both files are the same file format.
>
> Also, when attempting to make a "data frame to parquet" function which
> sends a CTAS query with > 1000 (VALUES((1,2,3,...)) rows Drill tosses
> stack exceptions and hangs (in the embedded console, localhost console
> and REST POST query). Said operation is the only thing that really
> makes sense for such a convenience function since one could just
> `jsonlite::stream_out(...)` a data frame and CTAS from the console
> otherwise (an R wrapper for that makes little sense IMO). The SparkR
> `write.parquet()` actually calls a Java function to do the conversion.
> Note that this isn't a complaint, Drill is at heart an EDA/BI
> workhorse and while it provides the data format transformation
> functions such a use of VALUES is a bit extreme.
>
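Re: the join error above, if I'm reading the message right, the workaround
Drill is asking for is to drop the generated USING (car) and write an
explicit ON clause with casts on the key. A rough, untested sketch of the
hand-edited query, flattened a bit (the `b` alias and the VARCHAR target
type are my guesses; dplyr won't emit this on its own):

SELECT *
FROM (SELECT *
      FROM dfs.tmp.`/in/c.parquet`
      WHERE ( cyl = 6.0 )) xpintqtlzw
-- explicit casts on the join key instead of USING ( car )
LEFT JOIN dfs.tmp.`/in/b.json` b
  ON CAST(xpintqtlzw.car AS VARCHAR) = CAST(b.car AS VARCHAR)
LIMIT 10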
In the meantime, is the "VALUES syntax" a good way? I guess the planner
needs to read all of the SQL before writing any data. Maybe data frame ->
CSV -> CTAS FROM the CSV is a better way in any case? (Rough sketch at the
bottom of this mail.)

> On Sun, Dec 25, 2016 at 2:40 PM, Ted Dunning <[email protected]>
> wrote:
> > On Sun, Dec 25, 2016 at 11:04 AM, Bob Rudis <[email protected]> wrote:
> >
> >> Drill SQL seems to not be as "standard" as it claims (some
> >> operations that are valid SQL after a series of dplyr transforms shld
> >> work but do not).
> >>
> >
> > Can you elaborate? What operations?
> >
> > Also, Drill claims that it is as compatible as possible where it
> > implements some function. It doesn't claim to be complete.
>
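For what it's worth, here is a rough sketch of the CSV -> CTAS route I have
in mind (paths, column names and types are invented for the example; with
Drill's default text reader the CSV fields come back in the `columns` array
and everything has to be cast explicitly):

ALTER SESSION SET `store.format` = 'parquet';

CREATE TABLE dfs.tmp.`mtcars_parquet` AS
SELECT CAST(columns[0] AS VARCHAR(32)) AS car,  -- row names written out from R
       CAST(columns[1] AS DOUBLE)      AS mpg,
       CAST(columns[2] AS DOUBLE)      AS cyl
FROM dfs.tmp.`/in/mtcars.csv`;

The same CTAS shape should also work against a file produced with
jsonlite::stream_out(), just with named fields instead of the columns array.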
