My suggestion is to focus on creating an EasyFormatPlugin [1] rather than a StoragePlugin. Format plugins are easier to write and need only a few classes. Maybe a RegexSplitFormatPlugin? You should be able to use the table-with-options syntax to specify the pattern to split records on.
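
For what it's worth, the splitting step itself is small. Below is a minimal sketch in plain Java of splitting Scott's example record on a comma-or-semicolon pattern. It uses no Drill classes at all; RegexSplitSketch and splitRecord are made-up names for illustration, and wiring this into an actual EasyFormatPlugin record reader is not shown.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Drill: it shows only the splitting step
// that a regex-based format plugin's record reader would have to perform.
class RegexSplitSketch {

    // Splits one record into fields on every match of delimiterRegex.
    // The limit of -1 keeps trailing empty fields instead of dropping them.
    static String[] splitRecord(String record, String delimiterRegex) {
        return Pattern.compile(delimiterRegex).split(record, -1);
    }

    public static void main(String[] args) {
        // Example record taken from the thread below.
        String record = "10-20-16,4477,99;98,aab,99;66,aab";
        String[] columns = splitRecord(record, "[,;]");
        for (int i = 0; i < columns.length; i++) {
            System.out.println("columns[" + i + "] = " + columns[i]);
        }
    }
}
```

With the pattern [,;] this yields the eight fields Scott listed. A real reader would presumably compile the pattern once per file rather than once per record.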

[1] https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/easy/EasyFormatPlugin.java

--
Jacques Nadeau
CTO and Co-Founder, Dremio

On Fri, Feb 19, 2016 at 7:44 AM, Magnus Pierre <[email protected]> wrote:

> Hello Scott,
>
> I don’t know of any documentation with regards to writing storage plugins,
> but I’ve been able to do so, which means it is not too hard :) Here you have
> an example of a storage plugin that extends, in my case, the JSON one and
> basically fits in a homegrown SAX parser so I could convert XML to JSON and
> then feed it into the normal JSON storage plugin. Not very pretty, but it
> works.
>
> In your case I believe you would have to extend the same number of classes
> that I did. Please observe that I was lazy, so I went in and changed one
> thing in the reader class I extended to make it easier for me. Of course
> this is not how you would do it in real life :)
>
> https://github.com/magpierre/drill/tree/DRILL-3878/contrib/storage-xml/
>
> In case of any trouble I am sure the community will assist.
>
> Regards,
> Magnus
> ---------------
> Magnus Pierre | Systems Engineer
> +46 72 710 1935
> [email protected] | http://www.mapr.com
>
> > On 19 Feb 2016, at 16:29, Wilburn, Scott <[email protected]> wrote:
> >
> > Magnus,
> > Great suggestions. I would like to try to extend the functionality of
> > the "text" format type, used for csv files today. Do you know if there
> > are any published instructions for this process?
> >
> > Thanks,
> > Scott Wilburn
> >
> >
> > -----Original Message-----
> > From: Magnus Pierre [mailto:[email protected]]
> > Sent: Friday, February 19, 2016 01:51 AM
> > To: [email protected]
> > Subject: Re: [E] Re: Multiple Delimiter Format
> >
> > Hello Scott,
> >
> > What you typically do is decide which separator will give you the most
> > granular split (in your case comma) and then use SQL constructs to
> > further transform the returned columns of the set into the structure you
> > would like to have. In SQL you can always create additional fields. I
> > understand that this is probably the single thing you would like to
> > avoid. Alternatively, you create your own plugin that takes a set of
> > separators as a parameter to the storage plugin, splits the incoming
> > records based on the set, and basically reuses all the other existing
> > classes for the storage plugin. I personally believe it should be quite
> > easy to accomplish by extending existing code (even though I have not
> > looked at the code for a considerable time now).
> >
> > I added very primitive XML support using the existing JSON classes and
> > extending the reader as needed.
> >
> > Regards,
> > Magnus
> >
> >> On 19 Feb 2016, at 04:18, Jim Scott <[email protected]> wrote:
> >>
> >> The delimited file reader does not support that.
> >>
> >> On Thu, Feb 18, 2016 at 5:58 PM, Wilburn, Scott <[email protected]> wrote:
> >>
> >>> Jim,
> >>> Just to clarify, I'm trying to use Drill on files that contain
> >>> records where the fields are delimited by multiple different
> >>> characters.
> >>>
> >>> Example record: 10-20-16,4477,99;98,aab,99;66,aab
> >>>
> >>> Desired result:
> >>> columns[0] = 10-20-16
> >>> columns[1] = 4477
> >>> columns[2] = 99
> >>> columns[3] = 98
> >>> columns[4] = aab
> >>> columns[5] = 99
> >>> columns[6] = 66
> >>> columns[7] = aab
> >>>
> >>> In this example, the record contains 8 fields when split by comma and
> >>> by semicolon.
> >>> Is something like this possible?
> >>>
> >>> Thanks,
> >>> Scott Wilburn
> >>>
> >>>
> >>> -----Original Message-----
> >>> From: Jim Scott [mailto:[email protected]]
> >>> Sent: Thursday, February 18, 2016 03:46 PM
> >>> To: user
> >>> Subject: [E] Re: Multiple Delimiter Format
> >>>
> >>> Scott,
> >>>
> >>> You would need a format defined for each file type, e.g. csv has
> >>> commas, tsv has tabs, and so on.
> >>>
> >>> If you are looking for multiple delimiters within the same file, or
> >>> potentially with a single file extension, that isn't supported.
> >>>
> >>> Jim
> >>>
> >>> On Thu, Feb 18, 2016 at 5:43 PM, Wilburn, Scott <[email protected]> wrote:
> >>>
> >>>> Hello,
> >>>> Is there a way to specify multiple delimiters when configuring a
> >>>> storage plugin record format? For example, could I split records
> >>>> into fields by comma or by semicolon characters?
> >>>>
> >>>> Thanks,
> >>>> Scott Wilburn
> >>>
> >>> --
> >>> *Jim Scott*
> >>> Director, Enterprise Strategy & Architecture
> >>> +1 (347) 746-9281
> >>> @kingmesal <https://twitter.com/kingmesal>
> >>>
> >>> <http://www.mapr.com/>
> >>> [image: MapR Technologies] <http://www.mapr.com>
> >>>
> >>> Now Available - Free Hadoop On-Demand Training
> >>> <http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available>
