Just as a bit of explanation for anyone who finds this thread: what is happening here is that the CSV parser reads a file containing no commas as a series of records with one value each. This method is a bit of a clever hack, but it will not work if any of your values contain commas. It is possible to configure the Drill delimited text reader to use a less likely character as the delimiter. This can be done in the web interface; more information is available at http://drill.apache.org/docs/storage-plugin-configuration/.
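For reference, a format entry along these lines (added to the "formats" section of the dfs storage plugin config in the web UI) would set the delimiter. The format name, file extension, and pipe delimiter below are just examples; pick a character you are certain never appears in your data:

```json
"formats": {
  "fixedwidth": {
    "type": "text",
    "extensions": ["txt"],
    "delimiter": "|"
  }
}
```

With a delimiter that never occurs in the file, each whole line reliably arrives in columns[0] and the substring approach below becomes robust.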
These files would be best converted to CSV or JSON by an external tool before processing in Drill. It would also be possible to write a new storage plugin to read the files in their current form.

One other small point: where I have seen fixed-width files before, the values are generally separated by spaces (as well as padded with extra spaces where the values are too short). You can see that this is likely the case with the example provided, given the lengths above (field1 = characters 1-12, field2 = characters 13-22, etc.), so the example should probably be written with the data as shown below, with appropriately adjusted bounds on the calls to the `right` function. Given the nature of the format, you would also likely need a call to the `trim` function before you try to cast the values to numeric types. Again, this is just a workaround, but if you configure the text reader to use a delimiter you are certain does not appear in the data, it should be a robust solution.

aaaaa bbbbbbbbbb 421.5 cc

- Jason

On Mon, Apr 20, 2015 at 4:37 PM, Abhishek Girish <[email protected]> wrote:

> Hey,
>
> I just tried out something. Not sure if there is a better way than this.
> See if this works for you:
>
> Assuming the text file contains the following line:
>
> $ cat /tmp/drill/abc.csv
> aaaaabbbbbbbbbb421.5cc
>
> (field1=aaaaa [chars 1-5], field2=bbbbbbbbbb [chars 6-15], field3=42
> [digits 16-17], field4=1.5 [digits 18-20], ...)
>
> A query similar to the one shown below can work:
>
> select
>   cast(columns[0] as char(5)),
>   `right`(cast(columns[0] as char(15)), 10),
>   cast(`right`(cast(columns[0] as char(17)), 2) as int),
>   cast(`right`(cast(columns[0] as char(20)), 3) as double)
> from dfs.tmp.`/drill/abc.csv`;
>
> +------------+------------+------------+------------+
> |   EXPR$0   |   EXPR$1   |   EXPR$2   |   EXPR$3   |
> +------------+------------+------------+------------+
> | aaaaa      | bbbbbbbbbb | 42         | 1.5        |
> +------------+------------+------------+------------+
>
> Regards,
> Abhishek
>
> On Mon, Apr 20, 2015 at 3:53 PM, Yousef Lasi <[email protected]> wrote:
>
> > Does anyone have any suggestions on querying fixed-length data files
> > with Drill? These are files that are received from a mainframe source
> > and the fields within a row are defined by length. For example,
> > field1 = characters 1-12, field2 = characters 13-22 etc.
> >
> > Thanks
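As a minimal sketch of the external-conversion route suggested above, here is a hypothetical Python script that slices each line by the field boundaries from the example (chars 1-5, 6-15, 16-17, 18-20, 21-22), trims the space padding (mirroring the `trim` call suggested before casting to numerics), and writes CSV that Drill can read directly. The field boundaries and file names are assumptions for illustration:

```python
# Sketch: convert a fixed-width file to CSV for loading into Drill.
# Field boundaries below follow the example record in this thread;
# adjust them to match your own layout.
import csv
import io

# (start, end) pairs, 1-based inclusive, as described in the thread.
FIELDS = [(1, 5), (6, 15), (16, 17), (18, 20), (21, 22)]

def split_fixed_width(line, fields=FIELDS):
    # Slice each field out of the line and strip the space padding.
    return [line[start - 1:end].strip() for start, end in fields]

def convert(src, dst):
    # Read fixed-width lines from src and write CSV rows to dst.
    writer = csv.writer(dst)
    for line in src:
        writer.writerow(split_fixed_width(line.rstrip("\n")))

record = "aaaaabbbbbbbbbb421.5cc"
print(split_fixed_width(record))  # → ['aaaaa', 'bbbbbbbbbb', '42', '1.5', 'cc']
```

Note that the slicing here reproduces the arithmetic in the query above: `right(cast(columns[0] as char(17)), 2)` is the last 2 characters of the first 17, i.e. characters 16-17.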
