Jacques,

My code is based on the EasyFormat classes, if I remember correctly:

public class XMLFormatPlugin extends EasyFormatPlugin<XMLFormatConfig>

but I wouldn’t for the world be able to explain how it works :)

Regards,
/Magnus
---------------
Magnus Pierre | Systems Engineer
+46 72 710 1935
[email protected] | http://www.mapr.com

> On 19 Feb 2016, at 16:50, Jacques Nadeau <[email protected]> wrote:
>
> My suggestion is to actually focus on creating an EasyFormatPlugin [1]
> rather than a StoragePlugin (they are easier to do and are only a few
> classes). Maybe a RegexSplitFormatPlugin? You should be able to use the
> table-with-options syntax to state what pattern to split records on.
>
> [1]
> https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/easy/EasyFormatPlugin.java
>
> --
> Jacques Nadeau
> CTO and Co-Founder, Dremio
>
> On Fri, Feb 19, 2016 at 7:44 AM, Magnus Pierre <[email protected]> wrote:
>
>> Hello Scott,
>>
>> I don’t know of any documentation on writing storage plugins, but I’ve
>> been able to write one, so it is not too hard :) Here is an example of a
>> storage plugin that, in my case, extends the JSON plugin and basically
>> fits in a homegrown SAX parser, so that I could convert XML to JSON and
>> then feed it into the normal JSON storage plugin. Not very pretty, but
>> it works.
>>
>> In your case I believe you would have to extend the same number of
>> classes that I did. Please note that I was lazy, so I went in and changed
>> one thing in the reader class I extended to make it easier for me. Of
>> course this is not how one would do it in real life :)
>>
>> https://github.com/magpierre/drill/tree/DRILL-3878/contrib/storage-xml/
>>
>> In case of any trouble, I am sure the community will assist.
>>
>> Regards,
>> Magnus
>>
>>> On 19 Feb 2016, at 16:29, Wilburn, Scott <[email protected]> wrote:
>>>
>>> Magnus,
>>> Great suggestions. I would like to try to extend the functionality of
>>> the "text" format type, used for csv files today. Do you know if there
>>> are any published instructions for this process?
>>>
>>> Thanks,
>>> Scott Wilburn
>>>
>>> -----Original Message-----
>>> From: Magnus Pierre [mailto:[email protected]]
>>> Sent: Friday, February 19, 2016 01:51 AM
>>> To: [email protected]
>>> Subject: Re: [E] Re: Multiple Delimiter Format
>>>
>>> Hello Scott,
>>>
>>> What you typically do is decide which separator will give you the most
>>> granular split (in your case comma) and then use SQL constructs to
>>> further transform the returned columns into the structure you would
>>> like to have. In SQL you can always create additional fields. I
>>> understand that this is probably the single thing you would like to
>>> avoid. Alternatively, you create your own plugin that takes a set of
>>> separators as a parameter to the storage plugin, splits the incoming
>>> records based on the set, and basically reuses all the other existing
>>> classes for the storage plugin. I personally believe it should be quite
>>> easy to accomplish by extending existing code (even though I have not
>>> looked at the code for a considerable time now).
>>>
>>> I added very primitive XML support using the existing JSON classes and
>>> extending the reader as needed.
>>>
>>> Regards,
>>> Magnus
>>>
>>>> On 19 Feb 2016, at 04:18, Jim Scott <[email protected]> wrote:
>>>>
>>>> The delimited file reader does not support that.
>>>>
>>>> On Thu, Feb 18, 2016 at 5:58 PM, Wilburn, Scott <[email protected]> wrote:
>>>>
>>>>> Jim,
>>>>> Just to clarify, I'm trying to use Drill on files that contain
>>>>> records where the fields are delimited by multiple different
>>>>> characters.
>>>>>
>>>>> Example record: 10-20-16,4477,99;98,aab,99;66,aab
>>>>>
>>>>> Desired result:
>>>>> columns[0] = 10-20-16
>>>>> columns[1] = 4477
>>>>> columns[2] = 99
>>>>> columns[3] = 98
>>>>> columns[4] = aab
>>>>> columns[5] = 99
>>>>> columns[6] = 66
>>>>> columns[7] = aab
>>>>>
>>>>> In this example, the record contains 8 fields when split by comma and
>>>>> by semicolon.
>>>>> Is something like this possible?
>>>>>
>>>>> Thanks,
>>>>> Scott Wilburn
>>>>>
>>>>> -----Original Message-----
>>>>> From: Jim Scott [mailto:[email protected]]
>>>>> Sent: Thursday, February 18, 2016 03:46 PM
>>>>> To: user
>>>>> Subject: [E] Re: Multiple Delimiter Format
>>>>>
>>>>> Scott,
>>>>>
>>>>> You would need a format defined for each file type, e.g. csv has
>>>>> commas, tsv has tabs, and so on.
>>>>>
>>>>> If you are looking for multiple delimiters within the same file, or
>>>>> potentially with a single file extension, that isn't supported.
>>>>>
>>>>> Jim
>>>>>
>>>>> On Thu, Feb 18, 2016 at 5:43 PM, Wilburn, Scott <[email protected]> wrote:
>>>>>
>>>>>> Hello,
>>>>>> Is there a way to specify multiple delimiters when configuring a
>>>>>> storage plugin record format? For example, could I split records
>>>>>> into fields by comma or by semicolon characters?
>>>>>>
>>>>>> Thanks,
>>>>>> Scott Wilburn
>>>>>>
>>>>>
>>>>> --
>>>>> *Jim Scott*
>>>>> Director, Enterprise Strategy & Architecture
>>>>> +1 (347) 746-9281
>>>>> @kingmesal <https://twitter.com/kingmesal>
>>>>> <http://www.mapr.com/>
>>>>
>>>> --
>>>> *Jim Scott*
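
For reference, the splitting step discussed in this thread (one record, fields separated by either a comma or a semicolon) can be sketched in plain Java as below. This is only a minimal illustration of the record-splitting logic that a custom format plugin's reader would need. The class name MultiDelimiterSplitter and its methods are hypothetical and are not part of Drill, and the wiring into EasyFormatPlugin is deliberately not shown.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, NOT a Drill class: splits one text record on any of
// several single-character delimiters.
final class MultiDelimiterSplitter {
    private final Pattern delimiterPattern;

    // Each character of `delimiters` is treated as a field separator, e.g. ",;".
    MultiDelimiterSplitter(String delimiters) {
        if (delimiters.isEmpty()) {
            throw new IllegalArgumentException("at least one delimiter is required");
        }
        // Build an alternation such as \Q,\E|\Q;\E so that characters with a
        // special meaning in regular expressions (for example '|') stay literal.
        StringBuilder alternation = new StringBuilder();
        for (int i = 0; i < delimiters.length(); i++) {
            if (i > 0) {
                alternation.append('|');
            }
            alternation.append(Pattern.quote(String.valueOf(delimiters.charAt(i))));
        }
        this.delimiterPattern = Pattern.compile(alternation.toString());
    }

    // A limit of -1 keeps trailing empty fields instead of dropping them.
    List<String> split(String record) {
        return Arrays.asList(delimiterPattern.split(record, -1));
    }

    public static void main(String[] args) {
        MultiDelimiterSplitter splitter = new MultiDelimiterSplitter(",;");
        List<String> columns = splitter.split("10-20-16,4477,99;98,aab,99;66,aab");
        for (int i = 0; i < columns.size(); i++) {
            System.out.println("columns[" + i + "] = " + columns.get(i));
        }
    }
}
```

Run against the example record from Scott's message, this yields the eight fields listed under "Desired result". Quoting and escaping rules, which the regular csv reader handles, are left out of this sketch.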
