Magnus,
Great suggestions. I would like to try extending the functionality of the 
"text" format type, which is used for CSV files today. Do you know if there 
are any published instructions for this process? 

Thanks,
Scott Wilburn


-----Original Message-----
From: Magnus Pierre [mailto:[email protected]] 
Sent: Friday, February 19, 2016 01:51 AM
To: [email protected]
Subject: Re: [E] Re: Multiple Delimiter Format

Hello Scott,

What you typically do is decide which separator gives you the most 
granular split (in your case, comma) and then use SQL constructs to further 
transform the returned columns into the structure you would like to 
have. In SQL you can always create additional fields. I understand that this is 
probably the single thing you would like to avoid. Alternatively, you can create 
your own plugin that takes a set of separators as a parameter to the storage 
plugin, splits the incoming records based on that set, and reuses all the 
other existing classes for the storage plugin. I personally believe it should 
be quite easy to accomplish by extending the existing code (even though I have 
not looked at the code for a considerable time now).
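
To make the record-splitting step concrete, here is a rough sketch in Python. It is only an illustration of splitting on a set of separators, not Drill code and not its plugin API; the function name split_on_any and the separators parameter are made up for this example. The sample record is the one from Scott's message quoted below.

```python
import re


def split_on_any(record, separators=",;"):
    """Split one text record on any single character in `separators`.

    Hypothetical helper for illustration only; not part of Drill.
    """
    if not separators:
        # Nothing to split on, so the whole record is one field.
        return [record]
    # Build a character class such as [,;]. re.escape keeps characters
    # like ']' or '^' from being interpreted as regex syntax.
    pattern = "[" + "".join(re.escape(ch) for ch in separators) + "]"
    return re.split(pattern, record)


fields = split_on_any("10-20-16,4477,99;98,aab,99;66,aab")
for i, value in enumerate(fields):
    print(f"columns[{i}] = {value}")
```

With both the comma and the semicolon in the separator set, the sample record yields the eight fields Scott listed. A real plugin would still have to deal with quoting, escaping, and header handling, which this sketch ignores.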

I added very primitive XML support using the existing JSON classes and 
extending the reader as needed.

Regards,
Magnus

> On 19 Feb 2016, at 04:18, Jim Scott <[email protected]> wrote:
> 
> The delimited file reader does not support that.
> 
> On Thu, Feb 18, 2016 at 5:58 PM, Wilburn, Scott < 
> [email protected]> wrote:
> 
>> Jim,
>> Just to clarify, I'm trying to use Drill on files that contain 
>> records where the fields are delimited by multiple different characters.
>> 
>> Example record: 10-20-16,4477,99;98,aab,99;66,aab
>> 
>> Desired result:
>> columns[0] = 10-20-16
>> columns[1] = 4477
>> columns[2] = 99
>> columns[3] = 98
>> columns[4] = aab
>> columns[5] = 99
>> columns[6] = 66
>> columns[7] = aab
>> 
>> In this example, the record contains 8 fields when split by comma and 
>> by semicolon.
>> Is something like this possible?
>> 
>> Thanks,
>> Scott Wilburn
>> 
>> 
>> -----Original Message-----
>> From: Jim Scott [mailto:[email protected]]
>> Sent: Thursday, February 18, 2016 03:46 PM
>> To: user
>> Subject: [E] Re: Multiple Delimiter Format
>> 
>> Scott,
>> 
>> You would need a format defined for each file type, e.g. csv has 
>> commas, tsv has tabs, and so on.
>> 
>> If you are looking for multiple delimiters within the same file, or 
>> potentially with a single file extension, that isn't supported.
>> 
>> Jim
>> 
>> On Thu, Feb 18, 2016 at 5:43 PM, Wilburn, Scott < 
>> [email protected]> wrote:
>> 
>>> Hello,
>>> Is there a way to specify multiple delimiters when configuring a 
>>> storage plugin record format? For example, could I split records 
>>> into fields by comma or by semicolon characters?
>>> 
>>> Thanks,
>>> Scott Wilburn
>>> 
>>> 
>> 
>> 
>> --
>> *Jim Scott*
>> Director, Enterprise Strategy & Architecture
>> +1 (347) 746-9281
>> @kingmesal <https://twitter.com/kingmesal>
>> 
> 
