Re: CSV Reader on 1.3

Julien Le Dem Thu, 03 Dec 2015 15:54:48 -0800

Here's a PR for the doc:
https://github.com/apache/drill/pull/290


On Thu, Dec 3, 2015 at 3:25 PM, Julien Le Dem <[email protected]> wrote:

> Hi,
> I need to update the docs for this. I'll send a PR soon.
> In the meantime you can look at the tests:
> https://github.com/apache/drill/blob/d855906b95d4182a93af936c4e16888a770039b5/exec/java-exec/src/test/java/org/apache/drill/TestSelectWithOption.java
> Basically, there is one type for each format plugin.
> It looks at the classes that implement FormatPluginConfig, just as the
> JSON-based configuration does:
>
> https://github.com/apache/drill/blob/d855906b95d4182a93af936c4e16888a770039b5/logical/src/main/java/org/apache/drill/common/logical/FormatPluginConfig.java
>
> For example, for the "text" format:
>
> https://github.com/apache/drill/blob/d855906b95d4182a93af936c4e16888a770039b5/exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/text/TextFormatPlugin.java#L135
>
> The type is "text", as defined by the annotation @JsonTypeName("text"), and
> the available parameters are the same fields as in the JSON config, with the
> same defaults:
>     public String lineDelimiter = "\n";
>     public char fieldDelimiter = '\n';
>     public char quote = '"';
>     public char escape = '"';
>     public char comment = '#';
>     public boolean skipFirstLine = false;
>     public boolean extractHeader = false;
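A minimal Java sketch of the lookup described above, under simplified assumptions: the hypothetical @TypeName annotation and FormatConfig interface stand in for Jackson's @JsonTypeName and Drill's FormatPluginConfig, and TextConfig copies the listed fields and defaults. This is not Drill's code; it only illustrates how a type => 'text' argument can select a config class by name, while the other named arguments overwrite public fields and anything left unspecified keeps its default.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FormatOptionsSketch {

  // Hypothetical stand-in for Jackson's @JsonTypeName.
  @Retention(RetentionPolicy.RUNTIME)
  @Target(ElementType.TYPE)
  @interface TypeName {
    String value();
  }

  // Hypothetical stand-in for Drill's FormatPluginConfig.
  public interface FormatConfig {}

  // Same fields and defaults as quoted above for the "text" format.
  @TypeName("text")
  public static class TextConfig implements FormatConfig {
    public String lineDelimiter = "\n";
    public char fieldDelimiter = '\n';
    public char quote = '"';
    public char escape = '"';
    public char comment = '#';
    public boolean skipFirstLine = false;
    public boolean extractHeader = false;
  }

  // Known config classes, indexed by their annotated type name.
  private static final Map<String, Class<? extends FormatConfig>> REGISTRY = new HashMap<>();

  static {
    for (Class<? extends FormatConfig> c : List.of(TextConfig.class)) {
      REGISTRY.put(c.getAnnotation(TypeName.class).value(), c);
    }
  }

  // Selects the class named by "type", then copies the remaining options onto its public fields.
  public static FormatConfig resolve(Map<String, Object> options) {
    Class<? extends FormatConfig> cls = REGISTRY.get(String.valueOf(options.get("type")));
    if (cls == null) {
      throw new IllegalArgumentException("unknown type: " + options.get("type"));
    }
    try {
      FormatConfig config = cls.getDeclaredConstructor().newInstance();
      for (Map.Entry<String, Object> e : options.entrySet()) {
        if (e.getKey().equals("type")) {
          continue;
        }
        Field f = cls.getField(e.getKey()); // unknown option name -> NoSuchFieldException
        Object v = e.getValue();
        // Accept a one-character string for char fields, e.g. fieldDelimiter => ','
        if (f.getType() == char.class && v instanceof String && ((String) v).length() == 1) {
          v = ((String) v).charAt(0);
        }
        f.set(config, v);
      }
      return config;
    } catch (ReflectiveOperationException ex) {
      throw new IllegalArgumentException("bad option: " + ex.getMessage(), ex);
    }
  }

  public static void main(String[] args) {
    TextConfig c = (TextConfig) resolve(
        Map.of("type", "text", "fieldDelimiter", ",", "extractHeader", true));
    System.out.println(c.fieldDelimiter + " " + c.extractHeader + " " + c.skipFirstLine); // prints ", true false"
  }
}
```

In this sketch a misspelled option name fails loudly through getField instead of being silently ignored; whether Drill itself behaves that way is not stated in this thread.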
>
> On Thu, Dec 3, 2015 at 1:12 PM, Jason Altekruse <[email protected]>
> wrote:
>
>> I don't think we have anything posted right now; it was just merged last
>> week.
>>
>> Julien,
>> Did you have something written for a short bit of documentation on the
>> functionality and any current limitations?
>>
>> - Jason
>>
>> On Thu, Dec 3, 2015 at 12:16 PM, Abdel Hakim Deneche <
>> [email protected]>
>> wrote:
>>
>> > I didn't notice that select with options is already available! Did we add
>> > it to the documentation?
>> >
>> > On Thu, Dec 3, 2015 at 12:05 PM, Jason Altekruse <[email protected]> wrote:
>> >
>> >> Yes!
>> >>
>> >> Thanks to the new feature "select with options", it is possible to
>> >> configure the text reader with query-specific options. You will need to
>> >> build the tip of master or use the soon-to-be-posted release candidate
>> >> for 1.4 to use the feature.
>> >>
>> >> select a, b from table(dfs.`path/to/data.csv`(type => 'text',
>> >> fieldDelimiter => ',', extractHeader => true))
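For anyone generating such queries from Java, a small sketch of how the named-argument form above could be assembled, assuming string options become single-quoted SQL literals and booleans or numbers are written bare (as in the query above). tableWithOptions and literal are hypothetical helpers, not part of Drill; the resulting string would still have to be submitted through sqlline or a JDBC connection.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class TableOptionsQuery {

  // Renders one option value as a SQL literal: strings are single-quoted
  // (embedded quotes doubled), booleans and numbers are written bare.
  public static String literal(Object value) {
    if (value instanceof String) {
      return "'" + ((String) value).replace("'", "''") + "'";
    }
    if (value instanceof Boolean || value instanceof Number) {
      return value.toString();
    }
    throw new IllegalArgumentException("unsupported option value: " + value);
  }

  // Hypothetical helper: wraps a dfs path in table(...) with "name => value" options.
  // Note: backticks inside the path are not escaped.
  public static String tableWithOptions(String path, Map<String, Object> options) {
    StringJoiner args = new StringJoiner(", ");
    for (Map.Entry<String, Object> e : options.entrySet()) {
      args.add(e.getKey() + " => " + literal(e.getValue()));
    }
    return "table(dfs.`" + path + "`(" + args + "))";
  }

  public static void main(String[] args) {
    Map<String, Object> opts = new LinkedHashMap<>(); // keeps insertion order
    opts.put("type", "text");
    opts.put("fieldDelimiter", ",");
    opts.put("extractHeader", true);
    // prints "select a, b from table(dfs.`path/to/data.csv`(type => 'text', fieldDelimiter => ',', extractHeader => true))"
    System.out.println("select a, b from " + tableWithOptions("path/to/data.csv", opts));
  }
}
```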
>> >>
>> >> On Thu, Dec 3, 2015 at 11:51 AM, John Omernik <[email protected]> wrote:
>> >>
>> >> > I can't reproduce it, so I must have done something wrong the first
>> >> > time. Thank you for replying.
>> >> >
>> >> > Is there a way to select from a CSV directory with extractHeader enabled
>> >> > for only that query or table? (Options?)
>> >> >
>> >> >
>> >> >
>> >> > On Wed, Dec 2, 2015 at 11:56 AM, Abdel Hakim Deneche <
>> >> > [email protected]>
>> >> > wrote:
>> >> >
>> >> > > Hey John,
>> >> > >
>> >> > > What do you get when you run "select * from sys.version"?
>> >> > >
>> >> > > extractHeader is false by default, so you need to explicitly set it to
>> >> > > true.
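To make the symptom discussed in this thread concrete (one "columns" array per row vs. named columns), here is a rough simulation in Java. It is hypothetical, not Drill's text reader: it splits on the delimiter with no quote or escape handling, and it assumes that with extractHeader off the header line comes back as an ordinary data row.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

public class HeaderSketch {

  // Naive split that keeps trailing empty fields; no quote or escape handling.
  static List<String> split(String line, char delimiter) {
    return Arrays.asList(line.split(Pattern.quote(String.valueOf(delimiter)), -1));
  }

  // Returns one map per record, keyed by column name.
  public static List<Map<String, Object>> read(List<String> lines, char delimiter, boolean extractHeader) {
    List<Map<String, Object>> rows = new ArrayList<>();
    List<String> header = null;
    for (String line : lines) {
      List<String> fields = split(line, delimiter);
      if (extractHeader && header == null) {
        header = fields; // first line supplies the names and is not returned as data
        continue;
      }
      Map<String, Object> row = new LinkedHashMap<>();
      if (extractHeader) {
        for (int i = 0; i < header.size(); i++) {
          row.put(header.get(i), i < fields.size() ? fields.get(i) : null);
        }
      } else {
        row.put("columns", fields); // whole record as one array-valued column
      }
      rows.add(row);
    }
    return rows;
  }

  public static void main(String[] args) {
    List<String> csv = List.of("FIELD_1,FIELD_2", "a,b");
    System.out.println(read(csv, ',', false)); // [{columns=[FIELD_1, FIELD_2]}, {columns=[a, b]}]
    System.out.println(read(csv, ',', true)); // [{FIELD_1=a, FIELD_2=b}]
  }
}
```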
>> >> > >
>> >> > > Can you post your storage plugin configuration?
>> >> > >
>> >> > > Thanks
>> >> > >
>> >> > > On Tue, Dec 1, 2015 at 6:04 AM, John Omernik <[email protected]> wrote:
>> >> > >
>> >> > > > Hey all,
>> >> > > >
>> >> > > > Per my comment on https://issues.apache.org/jira/browse/DRILL-4145, I
>> >> > > > am curious why a CSV query in S3 (I am assuming with a default
>> >> > > > configuration, but I have asked the question) would be interpreted
>> >> > > > differently than a CSV query in MapRFS.
>> >> > > >
>> >> > > > Per the other user, they are using Drill 1.3, and so am I (per the
>> >> > > > MapR folks, I am using a dev release from MapR that has the official
>> >> > > > 1.3 release code base).
>> >> > > >
>> >> > > > Basically, the query from the JIRA author showed the CSV file being
>> >> > > > interpreted, i.e. "FIELD_1", "FIELD_2", etc. were the headers and the
>> >> > > > results were broken out into columns. When I ran it on the same data,
>> >> > > > I got a single result column, "columns", containing an array of data.
>> >> > > >
>> >> > > > I tried setting extractHeader: true (what is the default for this
>> >> > > > setting?), and that had no effect. (After I update a storage plugin,
>> >> > > > what do I need to do to ensure I see the effect in my sqlline
>> >> > > > session? Do I need to reconnect? Basically, I updated the storage
>> >> > > > plugin, got the "success" message, then changed to a different schema
>> >> > > > and back to my original schema, and saw no effect... should I
>> >> > > > reconnect, or is that not needed?)
>> >> > > >
>> >> > > > I'm just curious why we'd see different ways of reading CSV files; S3
>> >> > > > vs. MapRFS shouldn't make a difference... or am I missing something?
>> >> > > >
>> >> > > > Thanks!
>> >> > > >
>> >> > > > John
>> >> > > >
>> >> > >
>> >> > >
>> >> > >
>> >> > > --
>> >> > >
>> >> > > Abdelhakim Deneche
>> >> > >
>> >> > > Software Engineer
>> >> > >
>> >> > >   <http://www.mapr.com/>
>> >> > >
>> >> > >
>> >> > > Now Available - Free Hadoop On-Demand Training
>> >> > > <http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available>
>> >> > >
>> >> >
>> >>
>> >
>> >
>> >
>> > --
>> >
>> > Abdelhakim Deneche
>> >
>> > Software Engineer
>> >
>>
>
>
>
> --
> Julien
>



-- 
Julien
