Jacques,

My code is based on the EasyFormat classes, if I remember correctly (public
class XMLFormatPlugin extends EasyFormatPlugin<XMLFormatConfig>), but I
couldn’t for the world explain how it works :)

Regards,
/Magnus


---------------
Magnus Pierre | Systems Engineer
+46 72 710 1935
[email protected] | http://www.mapr.com



> On 19 Feb 2016, at 16:50, Jacques Nadeau <[email protected]> wrote:
> 
> My suggestion is to actually focus on creating an EasyFormatPlugin [1]
> rather than a StoragePlugin (they are easier to do and are only a few
> classes). Maybe a RegexSplitFormatPlugin? You should be able to use the
> table-with-options syntax to state what pattern to split records on.
> 
> [1]
> https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/easy/EasyFormatPlugin.java
> 
> --
> Jacques Nadeau
> CTO and Co-Founder, Dremio
> 
> On Fri, Feb 19, 2016 at 7:44 AM, Magnus Pierre <[email protected]> wrote:
> 
>> Hello Scott,
>> 
>> I don’t know of any documentation on writing storage plugins, but I’ve
>> been able to write one, so it is not too hard :) Here is an example of a
>> storage plugin that, in my case, extends the JSON plugin and fits in a
>> homegrown SAX parser, so I could convert XML to JSON and then feed it into
>> the normal JSON storage plugin. Not very pretty, but it works.
>> 
>> In your case I believe you would have to extend the same number of classes
>> that I did. Please note that I was lazy, so I went in and changed one
>> thing in the reader class I extended to make it easier for me. Of course
>> this is not how you would do it in real life :)
>> 
>> https://github.com/magpierre/drill/tree/DRILL-3878/contrib/storage-xml/
>> 
>> In case of any troubles I am sure the community will assist.
>> 
>> Regards,
>> Magnus
>> 
>> 
>> 
>>> On 19 Feb 2016, at 16:29, Wilburn, Scott <[email protected]> wrote:
>>> 
>>> Magnus,
>>> Great suggestions. I would like to try to extend the functionality of
>>> the "text" format type, used for csv files today. Do you know if there are
>>> any published instructions for this process?
>>> 
>>> Thanks,
>>> Scott Wilburn
>>> 
>>> 
>>> -----Original Message-----
>>> From: Magnus Pierre [mailto:[email protected]]
>>> Sent: Friday, February 19, 2016 01:51 AM
>>> To: [email protected]
>>> Subject: Re: [E] Re: Multiple Delimiter Format
>>> 
>>> Hello Scott,
>>> 
>>> What you typically do is decide which separator gives you the most
>>> granular split (in your case, comma) and then use SQL constructs to
>>> further transform the returned columns into the structure you would like
>>> to have. In SQL you can always create additional fields. I understand
>>> that this is probably the single thing you would like to avoid.
>>> Alternatively, you create your own plugin that takes a set of separators
>>> as a parameter to the storage plugin, splits the incoming records based on
>>> that set, and reuses all the other existing classes for the storage
>>> plugin. I personally believe this should be quite easy to accomplish by
>>> extending existing code (even though I have not looked at the code for a
>>> considerable time now).
>>> 
>>> I added very primitive XML support using the existing JSON classes and
>>> extending the reader as needed.
>>> 
>>> Regards,
>>> Magnus
>>> 
>>>> On 19 Feb 2016, at 04:18, Jim Scott <[email protected]> wrote:
>>>> 
>>>> The delimited file reader does not support that.
>>>> 
>>>> On Thu, Feb 18, 2016 at 5:58 PM, Wilburn, Scott <
>>>> [email protected]> wrote:
>>>> 
>>>>> Jim,
>>>>> Just to clarify, I'm trying to use Drill on files that contain
>>>>> records where the fields are delimited by multiple different
>>>>> characters.
>>>>> 
>>>>> Example record: 10-20-16,4477,99;98,aab,99;66,aab
>>>>> 
>>>>> Desired result:
>>>>> columns[0] = 10-20-16
>>>>> columns[1] = 4477
>>>>> columns[2] = 99
>>>>> columns[3] = 98
>>>>> columns[4] = aab
>>>>> columns[5] = 99
>>>>> columns[6] = 66
>>>>> columns[7] = aab
>>>>> 
>>>>> In this example, the record contains 8 fields when split by comma and
>>>>> by semicolon.
>>>>> Is something like this possible?
>>>>> 
>>>>> Thanks,
>>>>> Scott Wilburn
>>>>> 
>>>>> 
>>>>> -----Original Message-----
>>>>> From: Jim Scott [mailto:[email protected]]
>>>>> Sent: Thursday, February 18, 2016 03:46 PM
>>>>> To: user
>>>>> Subject: [E] Re: Multiple Delimiter Format
>>>>> 
>>>>> Scott,
>>>>> 
>>>>> You would need a format defined for each file type, e.g. csv has
>>>>> commas, tsv has tabs, and so on.
>>>>> 
>>>>> If you are looking for multiple delimiters within the same file, or
>>>>> potentially within a single file extension, that isn't supported.
>>>>> 
>>>>> Jim
>>>>> 
>>>>> On Thu, Feb 18, 2016 at 5:43 PM, Wilburn, Scott <
>>>>> [email protected]> wrote:
>>>>> 
>>>>>> Hello,
>>>>>> Is there a way to specify multiple delimiters when configuring a
>>>>>> storage plugin record format? For example, could I split records
>>>>>> into fields by comma or by semicolon characters?
>>>>>> 
>>>>>> Thanks,
>>>>>> Scott Wilburn
>>>>>> 
>>>>>> 
>>>>> 
>>>>> 
>>>>> --
>>>>> *Jim Scott*
>>>>> Director, Enterprise Strategy & Architecture
>>>>> +1 (347) 746-9281
>>>>> @kingmesal <https://twitter.com/kingmesal>
>>>>> 
>>>>> <http://www.mapr.com/>
>>>>> [image: MapR Technologies] <http://www.mapr.com>
>>>>> 
>>>>> Now Available - Free Hadoop On-Demand Training
>>>>> <http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available>
>>>>> 
>>>> 
>>>> 
>>>> 
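
A minimal sketch of the record-splitting step discussed in this thread, in plain Java and independent of Drill. It splits Scott's example record on either a comma or a semicolon with a regular expression, which is the kind of pattern a RegexSplitFormatPlugin might be given. The class name MultiDelimiterSplit, the method splitRecord, and the pattern [,;] are assumptions made for illustration; none of them are Drill API, and wiring this into an EasyFormatPlugin record reader is not shown.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Drill: splits one text record on a
// caller-supplied delimiter pattern.
final class MultiDelimiterSplit {

    // Splits a record into fields. The -1 limit keeps trailing empty
    // fields, so "a,b;" yields three fields rather than two.
    static String[] splitRecord(String record, Pattern delimiter) {
        return delimiter.split(record, -1);
    }

    public static void main(String[] args) {
        // Assumed delimiter set for the example: comma or semicolon.
        Pattern commaOrSemicolon = Pattern.compile("[,;]");
        String record = "10-20-16,4477,99;98,aab,99;66,aab";
        String[] columns = splitRecord(record, commaOrSemicolon);
        for (int i = 0; i < columns.length; i++) {
            System.out.println("columns[" + i + "] = " + columns[i]);
        }
    }
}
```

Run against the example record, this prints the eight values columns[0] through columns[7] listed in Scott's desired result.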
