Hello Scott,

I don't know of any documentation on writing storage plugins, but I have been able to write one, so it is not too hard :) Here is an example of a storage plugin that, in my case, extends the JSON plugin and fits in a homegrown SAX parser, so that I could convert XML to JSON and then feed it into the normal JSON storage plugin. Not very pretty, but it works.
In your case I believe you would have to extend the same number of classes that I did. Please note that I was lazy and changed one thing in the reader class I extended to make it easier for me. Of course, this is not how one would do it in real life :)

https://github.com/magpierre/drill/tree/DRILL-3878/contrib/storage-xml/

In case of any trouble, I am sure the community will assist.

Regards,
Magnus

---------------
Magnus Pierre | Systems Engineer
+46 72 710 1935
[email protected] | http://www.mapr.com

> On 19 Feb 2016, at 16:29, Wilburn, Scott <[email protected]> wrote:
>
> Magnus,
> Great suggestions. I would like to try to extend the functionality of the
> "text" format type, used for csv files today. Do you know if there are any
> published instructions for this process?
>
> Thanks,
> Scott Wilburn
>
>
> -----Original Message-----
> From: Magnus Pierre [mailto:[email protected]]
> Sent: Friday, February 19, 2016 01:51 AM
> To: [email protected]
> Subject: Re: [E] Re: Multiple Delimiter Format
>
> Hello Scott,
>
> What you typically do is decide which separator will give you the most
> granular split (in your case comma) and then use SQL constructs to further
> transform the returned columns of the set into the structure you would
> like to have. In SQL you can always create additional fields. I understand
> that this is probably the single thing you would like to avoid.
> Alternatively, you create your own plugin that takes a set of separators
> as a parameter to the storage plugin, splits the incoming records based on
> the set, and basically reuses all the other existing classes for the
> storage plugin. I personally believe it should be quite easy to accomplish
> by extending existing code (even though I have not looked at the code for
> a considerable time now).
>
> I added very primitive XML support using the existing JSON classes and
> extending the reader as needed.
>
> Regards,
> Magnus
>
>> On 19 Feb 2016, at 04:18, Jim Scott <[email protected]> wrote:
>>
>> The delimited file reader does not support that.
>>
>> On Thu, Feb 18, 2016 at 5:58 PM, Wilburn, Scott <
>> [email protected]> wrote:
>>
>>> Jim,
>>> Just to clarify, I'm trying to use Drill on files that contain
>>> records where the fields are delimited by multiple different characters.
>>>
>>> Example record: 10-20-16,4477,99;98,aab,99;66,aab
>>>
>>> Desired result:
>>> columns[0] = 10-20-16
>>> columns[1] = 4477
>>> columns[2] = 99
>>> columns[3] = 98
>>> columns[4] = aab
>>> columns[5] = 99
>>> columns[6] = 66
>>> columns[7] = aab
>>>
>>> In this example, the record contains 8 fields when split by comma and
>>> by semicolon.
>>> Is something like this possible?
>>>
>>> Thanks,
>>> Scott Wilburn
>>>
>>>
>>> -----Original Message-----
>>> From: Jim Scott [mailto:[email protected]]
>>> Sent: Thursday, February 18, 2016 03:46 PM
>>> To: user
>>> Subject: [E] Re: Multiple Delimiter Format
>>>
>>> Scott,
>>>
>>> You would need a format defined for each file type, e.g. csv has
>>> commas, tsv has tabs, and so on.
>>>
>>> If you are looking for multiple delimiters within the same file, or
>>> potentially with a single file extension, that isn't supported.
>>>
>>> Jim
>>>
>>> On Thu, Feb 18, 2016 at 5:43 PM, Wilburn, Scott <
>>> [email protected]> wrote:
>>>
>>>> Hello,
>>>> Is there a way to specify multiple delimiters when configuring a
>>>> storage plugin record format? For example, could I split records
>>>> into fields by comma or by semicolon characters?
>>>>
>>>> Thanks,
>>>> Scott Wilburn
>>>>
>>>
>>> --
>>> *Jim Scott*
>>> Director, Enterprise Strategy & Architecture
>>> +1 (347) 746-9281
>>> @kingmesal <https://twitter.com/kingmesal>
>>>
>>> <http://www.mapr.com/>
>>> [image: MapR Technologies] <http://www.mapr.com>
>>>
>>> Now Available - Free Hadoop On-Demand Training
>>> <http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available>
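
For reference, the split Scott asks for in the thread (comma and semicolon both acting as field separators) can be sketched outside Drill as follows. This is plain Python, not Drill plugin code, and it is only meant to pin down the expected behaviour of a reader that takes a set of separators. The names `split_record` and `DELIMITERS` are hypothetical, and the sketch assumes every delimiter is a single character and ignores any quoting or escaping rules a real text reader would have to honour.

```python
import re

# Hypothetical delimiter set: the comma and semicolon from Scott's example.
DELIMITERS = ",;"


def split_record(record, delimiters=DELIMITERS):
    """Split one record on any single character found in `delimiters`."""
    # re.escape keeps characters such as '|' or '.' literal inside the
    # character class, so the set can be changed without regex surprises.
    pattern = "[" + re.escape(delimiters) + "]"
    return re.split(pattern, record)


# The example record from the thread.
columns = split_record("10-20-16,4477,99;98,aab,99;66,aab")
for i, value in enumerate(columns):
    print(f"columns[{i}] = {value}")
```

A plugin along the lines Magnus suggests would presumably do the equivalent of `split_record` inside the record reader, with the separator set coming from the format configuration.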
