Hello Scott,

I don’t know of any documentation on writing storage plugins, but I’ve been 
able to write one, so it is not too hard :) Here is an example of a storage 
plugin that extends the JSON storage plugin and plugs in a homegrown SAX 
parser, so that I can convert XML to JSON and then feed it into the normal 
JSON storage plugin. Not very pretty, but it works.

In your case I believe you would have to extend the same number of classes 
that I did. Please note that I was lazy and changed one thing in the reader 
class I extended to make it easier for me. Of course, this is not how one 
would do it in real life :)

https://github.com/magpierre/drill/tree/DRILL-3878/contrib/storage-xml/
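
As a rough illustration of the splitting step such a plugin would need (one 
record in, fields split on a set of separators out), here is a minimal sketch 
in plain Java. It does not use any Drill classes; the class name 
MultiDelimiterSplitter and the method split are hypothetical, and quoting or 
escaping of delimiters is not handled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Drill: splits one record on any of
// several single-character delimiters.
public class MultiDelimiterSplitter {

    // Returns the fields of `record`, splitting on every character that
    // appears in `delimiters`. Empty fields are preserved.
    public static List<String> split(String record, String delimiters) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < record.length(); i++) {
            char c = record.charAt(i);
            if (delimiters.indexOf(c) >= 0) {
                // Delimiter found: close the current field and start a new one.
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        // The last field has no trailing delimiter.
        fields.add(current.toString());
        return fields;
    }

    public static void main(String[] args) {
        // Example record from earlier in this thread.
        List<String> columns = split("10-20-16,4477,99;98,aab,99;66,aab", ",;");
        for (int i = 0; i < columns.size(); i++) {
            System.out.println("columns[" + i + "] = " + columns.get(i));
        }
    }
}
```

In a real plugin this logic would sit inside the extended reader class, with 
the delimiter set passed in from the format configuration.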

If you run into any trouble, I am sure the community will assist.

Regards,
Magnus
---------------
Magnus Pierre | Systems Engineer
+46 72 710 1935
[email protected] | http://www.mapr.com



> On 19 Feb 2016, at 16:29, Wilburn, Scott 
> <[email protected]> wrote:
> 
> Magnus,
> Great suggestions. I would like to try to extend the functionality of the 
> "text" format type, used for csv files today. Do you know if there are any 
> published instructions for this process? 
> 
> Thanks,
> Scott Wilburn
> 
> 
> -----Original Message-----
> From: Magnus Pierre [mailto:[email protected]] 
> Sent: Friday, February 19, 2016 01:51 AM
> To: [email protected]
> Subject: Re: [E] Re: Multiple Delimiter Format
> 
> Hello Scott,
> 
> What you typically do is decide which separator gives you the most granular 
> split (in your case comma) and then use SQL constructs to further transform 
> the returned columns into the structure you would like to have. In SQL you 
> can always create additional fields. I understand that this is probably the 
> single thing you would like to avoid. Alternatively, you can create your own 
> plugin that takes a set of separators as a parameter, splits the incoming 
> records based on that set, and reuses all the other existing classes for the 
> storage plugin. I personally believe this should be quite easy to accomplish 
> by extending existing code (even though I have not looked at the code for a 
> considerable time now).
> 
> I added very primitive XML support using the existing JSON classes and 
> extending the reader as needed.
> 
> Regards,
> Magnus
> 
>> On 19 Feb 2016, at 04:18, Jim Scott <[email protected]> wrote:
>> 
>> The delimited file reader does not support that.
>> 
>> On Thu, Feb 18, 2016 at 5:58 PM, Wilburn, Scott < 
>> [email protected]> wrote:
>> 
>>> Jim,
>>> Just to clarify, I'm trying to use Drill on files that contain 
>>> records where the fields are delimited by multiple different characters.
>>> 
>>> Example record: 10-20-16,4477,99;98,aab,99;66,aab
>>> 
>>> Desired result:
>>> columns[0] = 10-20-16
>>> columns[1] = 4477
>>> columns[2] = 99
>>> columns[3] = 98
>>> columns[4] = aab
>>> columns[5] = 99
>>> columns[6] = 66
>>> columns[7] = aab
>>> 
>>> In this example, the record contains 8 fields when split by comma and 
>>> by semicolon.
>>> Is something like this possible?
>>> 
>>> Thanks,
>>> Scott Wilburn
>>> 
>>> 
>>> -----Original Message-----
>>> From: Jim Scott [mailto:[email protected]]
>>> Sent: Thursday, February 18, 2016 03:46 PM
>>> To: user
>>> Subject: [E] Re: Multiple Delimiter Format
>>> 
>>> Scott,
>>> 
>>> You would need a format defined for each file type, e.g. csv has 
>>> commas, tsv has tabs, and so on.
>>> 
>>> If you are looking for multiple delimiters within the same file, or 
>>> potentially with a single file extension, that isn't supported.
>>> 
>>> Jim
>>> 
>>> On Thu, Feb 18, 2016 at 5:43 PM, Wilburn, Scott < 
>>> [email protected]> wrote:
>>> 
>>>> Hello,
>>>> Is there a way to specify multiple delimiters when configuring a 
>>>> storage plugin record format? For example, could I split records 
>>>> into fields by comma or by semicolon characters.
>>>> 
>>>> Thanks,
>>>> Scott Wilburn
>>>> 
>>>> 
>>> 
>>> 
>>> --
>>> *Jim Scott*
>>> Director, Enterprise Strategy & Architecture
>>> +1 (347) 746-9281
>>> @kingmesal <https://twitter.com/kingmesal>
>>> 
>>> <http://www.mapr.com/>
>>> [image: MapR Technologies] <http://www.mapr.com>
>>> 
>>> Now Available - Free Hadoop On-Demand Training < 
>>> http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&ut
>>> m_campaign=Free%20available
>>>> 
>>> 
>> 
>> 
>> 
> 
