I don't know how big your data is, and this won't be the most efficient
solution, but there is a hack that enables this without modifying Drill.

Create a new format that won't actually parse anything but the lines, by
giving it a delimiter that doesn't appear in your dataset.

    "not_csv": {
      "type": "text",
      "extensions": [
        "not_csv"
      ],
      "delimiter": "^"
    },

Then you can use string replacement to make each line look like a JSON
array and pass it into our convert_from(field, 'JSON') function.

    select convert_from(
             concat(concat('["', `replace`(`replace`(columns[0], ',', '","'), ';', '","')), '"]'),
             'JSON')
    from dfs.`/tmp/mixed.not_csv`;
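To make it easier to see what the nested replace/concat calls produce, here
is a small Python sketch of the same string transformation, applied to the
example record from earlier in the thread. It only mirrors the logic of the
query; it is not Drill code, and the function name is made up for
illustration.

```python
import json


def split_like_query(line: str) -> list:
    """Mirror the SQL above (hypothetical helper, not part of Drill).

    Replace ',' and ';' with '","', wrap the result in '["' and '"]',
    then parse the string as a JSON array.
    """
    as_json = '["' + line.replace(',', '","').replace(';', '","') + '"]'
    return json.loads(as_json)


record = "10-20-16,4477,99;98,aab,99;66,aab"
fields = split_like_query(record)
print(fields)
# ['10-20-16', '4477', '99', '98', 'aab', '99', '66', 'aab']
```

Note that the trick assumes the data never contains a double quote or a
backslash, since either one would produce invalid JSON.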



On Fri, Feb 19, 2016 at 7:56 AM, Magnus Pierre <mpie...@maprtech.com> wrote:

> Jacques,
>
> My code is based on the easyformat stuff (if I remember correctly):
> public class XMLFormatPlugin extends EasyFormatPlugin<XMLFormatConfig>,
> but I wouldn't for the world be able to explain how it works :)
>
> Regards,
> /Magnus
>
>
> ---------------
> Magnus Pierre | Systems Engineer
> +46 72 710 1935
> mpie...@maprtech.com | http://www.mapr.com
>
>
>
> > On 19 Feb 2016, at 16:50, Jacques Nadeau <jacq...@dremio.com> wrote:
> >
> > My suggestion is to actually focus on creating an EasyFormatPlugin [1]
> > rather than a StoragePlugin (they are easier to do and are only a few
> > classes). Maybe a RegexSplitFormatPlugin? You should be able to use the
> > table-with-options syntax to state what pattern to split records on.
> >
> > [1]
> >
> https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/easy/EasyFormatPlugin.java
> >
> > --
> > Jacques Nadeau
> > CTO and Co-Founder, Dremio
> >
> > On Fri, Feb 19, 2016 at 7:44 AM, Magnus Pierre <mpie...@maprtech.com> wrote:
> >
> >> Hello Scott,
> >>
> >> I don't know of any documentation with regards to writing storage
> >> plugins, but I've been able to write one, so that means it is not too
> >> hard :) Here you have an example of a storage plugin that, in my case,
> >> extends the JSON one and basically fits in a homegrown SAX parser so I
> >> could convert XML to JSON and then feed it into the normal JSON storage
> >> plugin. Not very pretty, but it works.
> >>
> >> In your case I believe you would have to extend the same number of
> >> classes that I did. Please observe that I was lazy, so I went in and
> >> changed one thing in the reader class I extended to make it easier for
> >> me. Of course this is not how you would do it in real life :)
> >>
> >> https://github.com/magpierre/drill/tree/DRILL-3878/contrib/storage-xml/
> >>
> >> In case of any troubles I am sure the community will assist.
> >>
> >> Regards,
> >> Magnus
> >> ---------------
> >> Magnus Pierre | Systems Engineer
> >> +46 72 710 1935
> >> mpie...@maprtech.com | http://www.mapr.com
> >>
> >>
> >>
> >>> On 19 Feb 2016, at 16:29, Wilburn, Scott <scott.wilb...@verizonwireless.com.INVALID> wrote:
> >>>
> >>> Magnus,
> >>> Great suggestions. I would like to try to extend the functionality of
> >>> the "text" format type, used for csv files today. Do you know if there
> >>> are any published instructions for this process?
> >>>
> >>> Thanks,
> >>> Scott Wilburn
> >>>
> >>>
> >>> -----Original Message-----
> >>> From: Magnus Pierre [mailto:mpie...@maprtech.com]
> >>> Sent: Friday, February 19, 2016 01:51 AM
> >>> To: user@drill.apache.org
> >>> Subject: Re: [E] Re: Multiple Delimiter Format
> >>>
> >>> Hello Scott,
> >>>
> >>> What you typically do is decide which separator will give you the most
> >>> granular split (in your case comma) and then use SQL constructs to
> >>> further transform the returned columns of the set into the structure
> >>> you would like to have. In SQL you can always create additional fields.
> >>> I understand that this is probably the single thing you would like to
> >>> avoid. Alternatively, you create your own plugin that takes a set of
> >>> separators as a parameter to the storage plugin, splits the incoming
> >>> records based on that set, and basically reuses all the other existing
> >>> classes for the storage plugin. I personally believe it should be quite
> >>> easy to accomplish by extending existing code (even though I have not
> >>> looked at the code for a considerable time now).
> >>>
> >>> I added very primitive XML support using the existing JSON classes and
> >>> extending the reader as needed.
> >>>
> >>> Regards,
> >>> Magnus
> >>>
> >>>> On 19 Feb 2016, at 04:18, Jim Scott <jsc...@maprtech.com> wrote:
> >>>>
> >>>> The delimited file reader does not support that.
> >>>>
> >>>> On Thu, Feb 18, 2016 at 5:58 PM, Wilburn, Scott <
> >>>> scott.wilb...@verizonwireless.com.invalid> wrote:
> >>>>
> >>>>> Jim,
> >>>>> Just to clarify, I'm trying to use Drill on files that contain
> >>>>> records where the fields are delimited by multiple different
> >>>>> characters.
> >>>>>
> >>>>> Example record: 10-20-16,4477,99;98,aab,99;66,aab
> >>>>>
> >>>>> Desired result:
> >>>>> columns[0] = 10-20-16
> >>>>> columns[1] = 4477
> >>>>> columns[2] = 99
> >>>>> columns[3] = 98
> >>>>> columns[4] = aab
> >>>>> columns[5] = 99
> >>>>> columns[6] = 66
> >>>>> columns[7] = aab
> >>>>>
> >>>>> In this example, the record contains 8 fields when split by comma and
> >>>>> by semicolon.
> >>>>> Is something like this possible?
> >>>>>
> >>>>> Thanks,
> >>>>> Scott Wilburn
> >>>>>
> >>>>>
> >>>>> -----Original Message-----
> >>>>> From: Jim Scott [mailto:jsc...@maprtech.com]
> >>>>> Sent: Thursday, February 18, 2016 03:46 PM
> >>>>> To: user
> >>>>> Subject: [E] Re: Multiple Delimiter Format
> >>>>>
> >>>>> Scott,
> >>>>>
> >>>>> You would need a format defined for each file type, e.g. csv has
> >>>>> commas, tsv has tabs, and so on.
> >>>>>
> >>>>> If you are looking for multiple delimiters within the same file, or
> >>>>> potentially within a single file extension, that isn't supported.
> >>>>>
> >>>>> Jim
> >>>>>
> >>>>> On Thu, Feb 18, 2016 at 5:43 PM, Wilburn, Scott <
> >>>>> scott.wilb...@verizonwireless.com.invalid> wrote:
> >>>>>
> >>>>>> Hello,
> >>>>>> Is there a way to specify multiple delimiters when configuring a
> >>>>>> storage plugin record format? For example, could I split records
> >>>>>> into fields by comma or by semicolon characters.
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Scott Wilburn
> >>>>>>
> >>>>>>
> >>>>>
> >>>>>
> >>>>> --
> >>>>> *Jim Scott*
> >>>>> Director, Enterprise Strategy & Architecture
> >>>>> +1 (347) 746-9281
> >>>>> @kingmesal <https://twitter.com/kingmesal>
> >>>>>
> >>>>>
> >>>>
> >>>>
> >>>>
> >>>> --
> >>>> *Jim Scott*
> >>>> Director, Enterprise Strategy & Architecture
> >>>> +1 (347) 746-9281
> >>>> @kingmesal <https://twitter.com/kingmesal>
> >>>>
>
>
