I don't know how big your data is, and this won't be the most efficient solution, but there is a hack that makes this possible without modifying Drill.
Create a new format that only splits the file into lines and otherwise parses nothing, by giving it a delimiter that doesn't appear in your dataset:

    "not_csv": {
      "type": "text",
      "extensions": [ "not_csv" ],
      "delimiter": "^"
    },

Then you can just use string replace to make each line look like a JSON array and pass it into our convert_from(field, 'JSON') function:

    select convert_from(concat(concat('["', `replace`(`replace`(columns[0], ',', '","'), ';', '","')), '"]'), 'JSON')
    from dfs.`/tmp/mixed.not_csv`;

On Fri, Feb 19, 2016 at 7:56 AM, Magnus Pierre <mpie...@maprtech.com> wrote:

> Jacques,
>
> My code is based on the EasyFormat stuff (if I remember correctly): public class XMLFormatPlugin extends EasyFormatPlugin<XMLFormatConfig>, but I wouldn’t for the world be able to explain how it works :)
>
> Regards,
> /Magnus
>
> ---------------
> Magnus Pierre | Systems Engineer
> +46 72 710 1935
> mpie...@maprtech.com | http://www.mapr.com
>
> On 19 Feb 2016 at 16:50, Jacques Nadeau <jacq...@dremio.com> wrote:
> >
> > My suggestion is to actually focus on creating an EasyFormatPlugin [1] rather than a StoragePlugin (they are easier to do and are only a few classes). Maybe a RegexSplitFormatPlugin? You should be able to use the table-with-options syntax to state what pattern to split records on.
> >
> > [1] https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/easy/EasyFormatPlugin.java
> >
> > --
> > Jacques Nadeau
> > CTO and Co-Founder, Dremio
> >
> > On Fri, Feb 19, 2016 at 7:44 AM, Magnus Pierre <mpie...@maprtech.com> wrote:
> >
> >> Hello Scott,
> >>
> >> I don’t know of any documentation on writing storage plugins, but I’ve been able to write one, so it is not too hard :) Here you have an example of a storage plugin that, in my case, extends the JSON plugin and basically fits in a homegrown SAX parser so I could convert XML to JSON and then feed it into the normal JSON storage plugin. Not very pretty but it works.
> >>
> >> In your case I believe you would have to extend the same number of classes that I did. Please note that I was lazy, so I went in and changed one thing in the reader class I extended to make it easier for me. Of course this is not how one would do it in real life :)
> >>
> >> https://github.com/magpierre/drill/tree/DRILL-3878/contrib/storage-xml/
> >>
> >> In case of any trouble I am sure the community will assist.
> >>
> >> Regards,
> >> Magnus
> >>
> >>> On 19 Feb 2016 at 16:29, Wilburn, Scott <scott.wilb...@verizonwireless.com.INVALID> wrote:
> >>>
> >>> Magnus,
> >>> Great suggestions. I would like to try to extend the functionality of the "text" format type, used for csv files today.
> >>> Do you know if there are any published instructions for this process?
> >>>
> >>> Thanks,
> >>> Scott Wilburn
> >>>
> >>>
> >>> -----Original Message-----
> >>> From: Magnus Pierre [mailto:mpie...@maprtech.com]
> >>> Sent: Friday, February 19, 2016 01:51 AM
> >>> To: user@drill.apache.org
> >>> Subject: Re: [E] Re: Multiple Delimiter Format
> >>>
> >>> Hello Scott,
> >>>
> >>> What you typically do is decide which separator will give you the most granular split (in your case, comma) and then use SQL constructs to further transform the returned columns into the structure you would like to have. In SQL you can always create additional fields. I understand that this is probably the single thing you would like to avoid.
> >>>
> >>> Alternatively, you can create your own plugin that takes a set of separators as a parameter to the storage plugin, splits the incoming records based on that set, and basically reuses all the other existing classes for the storage plugin. I personally believe it should be quite easy to accomplish by extending existing code (even though I have not looked at the code for a considerable time now).
> >>>
> >>> I added very primitive XML support using the existing JSON classes and extending the reader as needed.
> >>>
> >>> Regards,
> >>> Magnus
> >>>
> >>>> On 19 Feb 2016 at 04:18, Jim Scott <jsc...@maprtech.com> wrote:
> >>>>
> >>>> The delimited file reader does not support that.
> >>>>
> >>>> On Thu, Feb 18, 2016 at 5:58 PM, Wilburn, Scott <scott.wilb...@verizonwireless.com.invalid> wrote:
> >>>>
> >>>>> Jim,
> >>>>> Just to clarify, I'm trying to use Drill on files that contain records where the fields are delimited by multiple different characters.
> >>>>>
> >>>>> Example record: 10-20-16,4477,99;98,aab,99;66,aab
> >>>>>
> >>>>> Desired result:
> >>>>> columns[0] = 10-20-16
> >>>>> columns[1] = 4477
> >>>>> columns[2] = 99
> >>>>> columns[3] = 98
> >>>>> columns[4] = aab
> >>>>> columns[5] = 99
> >>>>> columns[6] = 66
> >>>>> columns[7] = aab
> >>>>>
> >>>>> In this example, the record contains 8 fields when split by comma and by semicolon.
> >>>>> Is something like this possible?
> >>>>>
> >>>>> Thanks,
> >>>>> Scott Wilburn
> >>>>>
> >>>>>
> >>>>> -----Original Message-----
> >>>>> From: Jim Scott [mailto:jsc...@maprtech.com]
> >>>>> Sent: Thursday, February 18, 2016 03:46 PM
> >>>>> To: user
> >>>>> Subject: [E] Re: Multiple Delimiter Format
> >>>>>
> >>>>> Scott,
> >>>>>
> >>>>> You would need a format defined for each file type. e.g. csv has commas, tsv has tabs, so on
> >>>>>
> >>>>> If you are looking for multiple delimiters within the same file or potentially with a single file extension that isn't supported.
> >>>>>
> >>>>> Jim
> >>>>>
> >>>>> On Thu, Feb 18, 2016 at 5:43 PM, Wilburn, Scott <scott.wilb...@verizonwireless.com.invalid> wrote:
> >>>>>
> >>>>>> Hello,
> >>>>>> Is there a way to specify multiple delimiters when configuring a storage plugin record format? For example, could I split records into fields by comma or by semicolon characters.
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Scott Wilburn
> >>>>>
> >>>>> --
> >>>>> *Jim Scott*
> >>>>> Director, Enterprise Strategy & Architecture
> >>>>> +1 (347) 746-9281
> >>>>> @kingmesal <https://twitter.com/kingmesal>
> >>>>> <http://www.mapr.com/>
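P.S. For anyone who wants to see what the replace/convert_from expression at the top of this message does to a single line, here is a rough Python sketch of the same string manipulation, run on Scott's example record. It is not Drill code: split_mixed is a made-up name for illustration, and Python's json module stands in for convert_from(field, 'JSON').

```python
import json


def split_mixed(line):
    """Illustrative stand-in for the SQL expression (not a Drill API).

    Replace each delimiter with '","', wrap the result in '["' and '"]',
    then parse the resulting text as a JSON array of strings.
    """
    as_json = '["' + line.replace(",", '","').replace(";", '","') + '"]'
    return json.loads(as_json)


# Example record taken from the original question in this thread.
fields = split_mixed("10-20-16,4477,99;98,aab,99;66,aab")
for index, value in enumerate(fields):
    print("columns[%d] = %s" % (index, value))
```

In this sketch a line that contains a double quote or a backslash produces text that is not valid JSON, so the parse fails. I would assume the SQL version has the same limitation, so check your data for those characters first.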