Re: delimiter in column values

Divya Gehlot Sun, 06 Aug 2017 19:43:08 -0700

Hi,

Please find the response inline :


1. Your delimiter is a pipe in this example, and not a comma as originally
seen in the attached file. For such seminars, either we modify the storage
plugin, or rename the extension to 'psv' so that drill understands what the
delimiter is.
 Delimeter the file is comma not pipe its the same file I query  in drill
console which I shared in earlier email messages .

2. Can you try Drill-1.11.0 ?
I am using Drill 1.11.0
3. There are table functions in Drill that guide it with additional inputs
on how to manage the preparation of the table.
Can you please share the link ?


Thanks,
Divya

On 3 August 2017 at 21:06, Kunal Khatua <[email protected]> wrote:

> A couple of things...
>
> 1. Your delimiter is a pipe in this example, and not a comma as originally
> seen in the attached file. For such seminars, either we modify the storage
> plugin, or rename the extension to 'psv' so that drill understands what the
> delimiter is.
>
> 2. Can you try Drill-1.11.0 ?
>
> 3. There are table functions in Drill that guide it with additional inputs
> on how to manage the preparation of the table.
>
> I'll try this in a Windows machine in the meanwhile.
>
>
> ________________________________
> From: Divya Gehlot <[email protected]>
> Sent: Wednesday, August 2, 2017 11:14:56 PM
> To: [email protected]
> Subject: Re: delimiter in column values
>
> Hi ,
> This is my output when run in sqlline on Windows Embedded mode
>
> 0: jdbc:drill:zk=local> select * from
> `dfs`.`installedsoftwares/ApacheDrill/apache-drill-1.10.
> 0.tar/apache-drill-1.10.0/sample-data/sample_data.csv`;
> +----------------+-------------------+----------------+-----
> -----------+----------------+
> |  col_Column1   |      Column2      |    Column3     |    Column4     |
>  Column5     |
> +----------------+-------------------+----------------+-----
> -----------+----------------+
> | "colonedata1"  | "coltwodata1"     | "-35.924476"   | "138.5987123"  | ""
>             |
> | "colonedata2"  | "coltwodata2"     | "-27.4372536"  | "153.0304583"  |
> "137"          |
> | "colonedata3"  | "coltwodata3"     | "-35.2793885"  | "149.1233503"  |
> "134"          |
> | "colonedata4"  | "coltwodata4"     | "-33.8724176"  | "151.2067579"  | ""
>             |
> | "colonedata5"  | "coltwodata5"     | ""             | ""             | ""
>             |
> | "This          |  col6 data"       | "coltwodata6"  | "-33.869732"   |
> "151.2055553"  |
> | "This          |  col7 data yes."  | "coltwodata7"  | "1.2845045"    |
> "103.8482739"  |
> | "Chifley"      | "coltwodata5"     | ""             | ""             | ""
>             |
> +----------------+-------------------+----------------+-----
> -----------+----------------+
> 8 rows selected (0.147 seconds)
> 0: jdbc:drill:zk=local> select `col_Column1` from
> `dfs`.`installedsoftwares/ApacheDrill/apache-drill-1.10.
> 0.tar/apache-drill-1.10.0/sample-data/sample_data.csv`;
> +----------------+
> |  col_Column1   |
> +----------------+
> | "colonedata1"  |
> | "colonedata2"  |
> | "colonedata3"  |
> | "colonedata4"  |
> | "colonedata5"  |
> | "This          |
> | "This          |
> | "Chifley"      |
> +----------------+
> 8 rows selected (0.1 seconds)
>
>
> The query returning the different results due to host operating system?
>
>
> Thanks,
> Divya
>
> On 3 August 2017 at 12:45, Kunal Khatua <[email protected]> wrote:
>
> > Based on your sample data, which contains this:
> > Column1,Column2,Column3,Column4,Column5
> > "colonedata1","coltwodata1","-35.924476","138.5987123",""
> > "colonedata2","coltwodata2","-27.4372536","153.0304583","137"
> > colonedata3","coltwodata3","-35.2793885","149.1233503","134"
> > "colonedata4","coltwodata4","-33.8724176","151.2067579",""
> > "colonedata5","coltwodata5","","",""
> > "This, col6 data","coltwodata6","-33.869732","151.2055553","351"
> > "This, col7 data yes.","coltwodata7","1.2845045","103.8482739","80"
> > "Chifley","coltwodata5","","",""
> >
> > I got this and it looks like this...
> >
> >
> > 0: jdbc:drill:schema=dfs.root> select * from `sample_data.csv`;
> > +-----------------------------------------------------------
> -------------+
> > |                                columns
>  |
> > +-----------------------------------------------------------
> -------------+
> > | ["ï»¿Column1","Column2","Column3","Column4","Column5"]
> >  |
> > | ["colonedata1","coltwodata1","-35.924476","138.5987123",""]
>   |
> > | ["colonedata2","coltwodata2","-27.4372536","153.0304583","137"]
>   |
> > | ["colonedata3\"","coltwodata3","-35.2793885","149.1233503","134"]
>   |
> > | ["colonedata4","coltwodata4","-33.8724176","151.2067579",""]
>  |
> > | ["colonedata5","coltwodata5","","",""]
>  |
> > | ["This, col6 data","coltwodata6","-33.869732","151.2055553","351"]
>  |
> > | ["This, col7 data yes.","coltwodata7","1.2845045","103.8482739","80"]
> |
> > | ["Chifley","coltwodata5","","",""]
>  |
> > +-----------------------------------------------------------
> -------------+
> > 9 rows selected (0.502 seconds)
> > 0: jdbc:drill:schema=dfs.root> select columns[0] from `sample_data.csv`;
> > +-----------------------+
> > |        EXPR$0         |
> > +-----------------------+
> > | ï»¿Column1              |
> > | colonedata1           |
> > | colonedata2           |
> > | colonedata3"          |
> > | colonedata4           |
> > | colonedata5           |
> > | This, col6 data       |
> > | This, col7 data yes.  |
> > | Chifley               |
> > +-----------------------+
> > 9 rows selected (0.581 seconds)
> >
> > I was wondering if there is something else you're seeing because you're
> > running this on Windows. So I tried after converting the Unix format and
> > got the exact same result. Is this what you're getting?
> > I'm running this on a Linux machine.
> >
> > -----Original Message-----
> > From: Divya Gehlot [mailto:[email protected]]
> > Sent: Wednesday, August 02, 2017 6:45 PM
> > To: [email protected]
> > Subject: Re: delimiter in column values
> >
> > Hi ,
> >
> > I am using Drill 1.11  and with all the setting which you have mentioned
> > in plugin configurations .
> > As Kunal advised to surrounded  the column values with quotes which acts
> > as a string delimiter as one of my column value includes delimiter  same
> as
> > field delimiter,  still getting the same results i.e the first column
> > values are getting split into two columns as posted my earlier posts.
> > I am kind of wondering how to resolve the column split issue  , as the
> > received data set is from third party.
> >
> >
> > Appreciate the help!
> >
> > Thanks,
> > Divya
> >
> > On 3 August 2017 at 00:10, Paul Rogers <[email protected]> wrote:
> >
> > > Hi Divya,
> > >
> > > Drill follows the commonly-accepted practice for CSV files. The
> > > general rule is:
> > >
> > > 1. Column headers all on one line, comma separated. (Drill 1.11 has
> > > fixes in this area, so you’ll want to use that if you have any
> problems.
> > > 2. Each record on its own line, comma-separated, no leading or
> > > trailing spaces.
> > > 3. No need for quotes unless your value contains commas.
> > >
> > > You can customize behavior using the storage plugin config:
> > >
> > > * Choose delimiter (tab for TSV, | for PSV, etc.)
> > > * Choose to read or skip the header.
> > >
> > > You’ll want to make sure to use the “,” delimiter, read and use the
> > > header. The docs have an example of the required setup.
> > >
> > > Values are always read as text, so even your numbers will start as
> > > VarChar. You can convert to a numeric type in the query.
> > >
> > > Example using your data:
> > >
> > > Column1,Column2,Column3,Column4,Column5
> > > colonedata1,coltwodata1,-35.924476,138.5987123,
> > > colonedata2,coltwodata2,-27.4372536,153.0304583,137
> > >
> > > Note that if columns are empty (like your first row), you still should
> > > include the comma separators. (Another bug fix in 1.11 fixes this
> > > case;
> > > 1.10 and earlier have problems if trailing columns are missing.)
> > >
> > > Thanks,
> > >
> > > - Paul
> > >
> > >
> > > On Aug 1, 2017, at 11:51 PM, Divya Gehlot <[email protected]<
> > mailto:
> > > [email protected]>> wrote:
> > >
> > > Hi,
> > > My column headers are in single line only i.e.
> > > Column1,Column2,Column3,Column4,Column5
> > > "colonedata1","coltwodata1","-35.924476","138.5987123",""
> > > "colonedata2","coltwodata2","-27.4372536","153.0304583","137"
> > > colonedata3","coltwodata3","-35.2793885","149.1233503","134"
> > > "colonedata4","coltwodata4","-33.8724176","151.2067579",""
> > >
> > > As you advised to put quotes as string delimeter for each column data
> > > and ran the select query.
> > > attaching the data file too .
> > >
> > > Appreciate the help !
> > >
> > > Thanks,
> > > Divya
> > >
> > > On 2 August 2017 at 12:37, Kunal Khatua <[email protected]<mailto:kkhat
> > > [email protected]>> wrote:
> > > So, the way you’ve shown your data is basically in this format:
> > >
> > > <List of column headers, one per line> <actual column data, one row
> > > per line>
> > >
> > > Unfortunately, I don't believe the text reader in Drill is that
> > > advanced as to interpret  the list of column headers across multiple
> > > lines, while the actual data is in a single line per row.
> > >
> > > Typically text data is in CSV (or other delimiters similar to the
> > > comma) and can have the first line representing a header.
> > >
> > > Also, I'm not sure if there was ever an option introduced to allow
> > > skipping of the initial set of lines within a text file being read.
> > >
> > >
> > > -----Original Message-----
> > > From: Divya Gehlot
> > > [mailto:[email protected]<mailto:divya.htconex@
> > > gmail.com>]
> > > Sent: Tuesday, August 01, 2017 7:06 PM
> > > To: [email protected]<mailto:[email protected]>
> > > Subject: Re: delimiter in column values
> > >
> > > For my sample dataset as you advised I surrounded with single columns
> > > also with quotes and the results are as below :
> > > col_Column1
> > > Column2
> > > Column3
> > > Column4
> > > Column5
> > > "Chifley" "coltwodata5" "" "" ""
> > > "colonedata1" "coltwodata1" "-35.924476" "138.5987123" ""
> > > "colonedata2" "coltwodata2" "-27.4372536" "153.0304583" "137"
> > > "colonedata4" "coltwodata4" "-33.8724176" "151.2067579" ""
> > > "colonedata5" "coltwodata5" "" "" ""
> > > "This col6 data" "coltwodata6" "-33.869732" "151.2055553"
> > > "This col7 data yes." "coltwodata7" "1.2845045" "103.8482739"
> > > colonedata3" "coltwodata3" "-35.2793885" "149.1233503" "134"
> > >
> > > Thanks,
> > > Divya
> > >
> > > On 1 August 2017 at 22:39, Kunal Khatua <[email protected]<mailto:kkhat
> > > [email protected]>> wrote:
> > >
> > > > I think you need quotes around the single word datasets as well,
> > > > because the quotes act as String delimiters and help in indicating
> > > > the start and end of a String.
> > > >
> > > > Is there a reason why the single word strings cannot be in quotes as
> > > well?
> > > >
> > > > -----Original Message-----
> > > > From: Divya Gehlot
> > > > [mailto:[email protected]<mailto:divya.htconex@
> > > gmail.com>]
> > > > Sent: Tuesday, August 01, 2017 3:04 AM
> > > > To: [email protected]<mailto:[email protected]>
> > > > Subject: delimiter in column values
> > > >
> > > > Hi,
> > > > I have data set which has delimeter in first column value when I
> > > > read the data set It provides the output below :
> > > >
> > > > col_Column1
> > > > Column2
> > > > Column3
> > > > Column4
> > > > Column5
> > > >
> > > > "This col6 data" coltwodata6 -33.869732 151.2055553 "This col7 data
> > yes."
> > > > coltwodata7 1.2845045 103.8482739 Chifley coltwodata5
> > > > colonedata1 coltwodata1 -35.924476 138.5987123
> > > > colonedata2 coltwodata2 -27.4372536 153.0304583 137
> > > > colonedata3 coltwodata3 -35.2793885 149.1233503 134
> > > > colonedata4 coltwodata4 -33.8724176 151.2067579
> > > > colonedata5 coltwodata5
> > > >
> > > >
> > > >
> > > > How can I read the column1 values as is without getting split into
> > > > two columns for instance the Column values should be
> > > > Column1
> > > > colonedata1,
> > > > colonedata2,
> > > > colonedata3,
> > > > colonedata4,
> > > > colonedata5,
> > > > "This, col6 data"
> > > > "This, col7 data"
> > > > Chifley,
> > > >
> > > > Appreciate the help !
> > > >
> > > > Thanks ,
> > > > Divya
> > > >
> > >
> > > <sample_data.csv>
> > >
> > >
> >
>

Re: delimiter in column values

Reply via email to