My suggestion is to actually focus on creating an EasyFormatPlugin [1]
rather than a StoragePlugin (they are easier to do and are only a few
classes). Maybe a RegexSplitFormatPlugin? You should be able to use the
table-with-options syntax to state what pattern to split records on.

[1]
https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/easy/EasyFormatPlugin.java
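
As a rough sketch of just the splitting step (plain Java, not Drill API): the
class name RegexSplitSketch and the pattern "[,;]" are my own assumptions for
illustration, and the wiring into an EasyFormatPlugin record reader is not
shown. It splits the example record from earlier in the thread on either a
comma or a semicolon.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Drill: splits one text record into fields
// using a regular expression as the delimiter.
final class RegexSplitSketch {
    private final Pattern delimiter;

    RegexSplitSketch(String delimiterRegex) {
        // Compile once and reuse the pattern for every record.
        this.delimiter = Pattern.compile(delimiterRegex);
    }

    List<String> split(String record) {
        // A negative limit keeps trailing empty fields instead of dropping them.
        return Arrays.asList(delimiter.split(record, -1));
    }

    public static void main(String[] args) {
        // "[,;]" matches either a comma or a semicolon (assumed pattern).
        RegexSplitSketch splitter = new RegexSplitSketch("[,;]");
        List<String> columns = splitter.split("10-20-16,4477,99;98,aab,99;66,aab");
        for (int i = 0; i < columns.size(); i++) {
            System.out.println("columns[" + i + "] = " + columns.get(i));
        }
    }
}
```

In a real format plugin the pattern would presumably come from the format
configuration rather than being hard-coded.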

--
Jacques Nadeau
CTO and Co-Founder, Dremio

On Fri, Feb 19, 2016 at 7:44 AM, Magnus Pierre <[email protected]> wrote:

> Hello Scott,
>
> I don’t know of any documentation on writing storage plugins, but I’ve
> been able to write one, so it is not too hard :) Here is an example of a
> storage plugin that, in my case, extends the JSON plugin and basically fits
> in a homegrown SAX parser, so I could convert XML to JSON and then feed it
> into the normal JSON storage plugin. Not very pretty, but it works.
>
> In your case I believe you would have to extend the same number of classes
> that I did. Please note that I was lazy, so I went in and changed one
> thing in the reader class I extended to make it easier for me. Of course,
> this is not how I would do it in real life :)
>
> https://github.com/magpierre/drill/tree/DRILL-3878/contrib/storage-xml/
>
> In case of any troubles I am sure the community will assist.
>
> Regards,
> Magnus
> ---------------
> Magnus Pierre | Systems Engineer
> +46 72 710 1935
> [email protected] | http://www.mapr.com
>
>
>
> > On 19 Feb 2016, at 16:29, Wilburn, Scott
> <[email protected]> wrote:
> >
> > Magnus,
> > Great suggestions. I would like to try to extend the functionality of
> the "text" format type, used for csv files today. Do you know if there are
> any published instructions for this process?
> >
> > Thanks,
> > Scott Wilburn
> >
> >
> > -----Original Message-----
> > From: Magnus Pierre [mailto:[email protected]]
> > Sent: Friday, February 19, 2016 01:51 AM
> > To: [email protected]
> > Subject: Re: [E] Re: Multiple Delimiter Format
> >
> > Hello Scott,
> >
> > What you typically do is decide which separator will give you
> the most granular split (in your case comma) and then use SQL constructs to
> further transform the returned columns of the set into the structure you
> would like to have. In SQL you can always create additional fields. I
> understand that this is probably the single thing you would like to avoid.
> Alternatively, you create your own plugin that takes a set of separators as
> a parameter to the storage plugin, splits the incoming records based on the
> set, and basically reuses all the other existing classes for the storage
> plugin. I personally believe it should be quite easy to accomplish by
> extending existing code (even though I have not looked at the code for a
> considerable time now).
> >
> > I added very primitive XML support using the existing JSON classes and
> extending the reader as needed.
> >
> > Regards,
> > Magnus
> >
> >> On 19 Feb 2016, at 04:18, Jim Scott <[email protected]> wrote:
> >>
> >> The delimited file reader does not support that.
> >>
> >> On Thu, Feb 18, 2016 at 5:58 PM, Wilburn, Scott <
> >> [email protected]> wrote:
> >>
> >>> Jim,
> >>> Just to clarify, I'm trying to use Drill on files that contain
> >>> records where the fields are delimited by multiple different
> >>> characters.
> >>>
> >>> Example record: 10-20-16,4477,99;98,aab,99;66,aab
> >>>
> >>> Desired result:
> >>> columns[0] = 10-20-16
> >>> columns[1] = 4477
> >>> columns[2] = 99
> >>> columns[3] = 98
> >>> columns[4] = aab
> >>> columns[5] = 99
> >>> columns[6] = 66
> >>> columns[7] = aab
> >>>
> >>> In this example, the record contains 8 fields when split by comma and
> >>> by semicolon.
> >>> Is something like this possible?
> >>>
> >>> Thanks,
> >>> Scott Wilburn
> >>>
> >>>
> >>> -----Original Message-----
> >>> From: Jim Scott [mailto:[email protected]]
> >>> Sent: Thursday, February 18, 2016 03:46 PM
> >>> To: user
> >>> Subject: [E] Re: Multiple Delimiter Format
> >>>
> >>> Scott,
> >>>
> >>> You would need a format defined for each file type. e.g. csv has
> >>> commas, tsv has tabs, so on
> >>>
> >>> If you are looking for multiple delimiters within the same file, or
> >>> potentially with a single file extension, that isn't supported.
> >>>
> >>> Jim
> >>>
> >>> On Thu, Feb 18, 2016 at 5:43 PM, Wilburn, Scott <
> >>> [email protected]> wrote:
> >>>
> >>>> Hello,
> >>>> Is there a way to specify multiple delimiters when configuring a
> >>>> storage plugin record format? For example, could I split records
> >>>> into fields by comma or by semicolon characters.
> >>>>
> >>>> Thanks,
> >>>> Scott Wilburn
> >>>>
> >>>>
> >>>
> >>>
> >>>
> >>
> >>
> >>
> >> --
> >> *Jim Scott*
> >> Director, Enterprise Strategy & Architecture
> >> +1 (347) 746-9281
> >> @kingmesal <https://twitter.com/kingmesal>
> >>
> >> <http://www.mapr.com/>
> >> [image: MapR Technologies] <http://www.mapr.com>
> >>
> >> Now Available - Free Hadoop On-Demand Training
> >> <http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&ut
> >> m_campaign=Free%20available>
> >
>
>
