Re: delimiter in column values

Divya Gehlot Wed, 02 Aug 2017 23:16:51 -0700

Hi ,
This is my output when run in sqlline on Windows Embedded mode

0: jdbc:drill:zk=local> select * from
`dfs`.`installedsoftwares/ApacheDrill/apache-drill-1.10.0.tar/apache-drill-1.10.0/sample-data/sample_data.csv`;
+----------------+-------------------+----------------+----------------+----------------+
|  col_Column1   |      Column2      |    Column3     |    Column4     |
 Column5     |
+----------------+-------------------+----------------+----------------+----------------+
| "colonedata1"  | "coltwodata1"     | "-35.924476"   | "138.5987123"  | ""
            |
| "colonedata2"  | "coltwodata2"     | "-27.4372536"  | "153.0304583"  |
"137"          |
| "colonedata3"  | "coltwodata3"     | "-35.2793885"  | "149.1233503"  |
"134"          |
| "colonedata4"  | "coltwodata4"     | "-33.8724176"  | "151.2067579"  | ""
            |
| "colonedata5"  | "coltwodata5"     | ""             | ""             | ""
            |
| "This          |  col6 data"       | "coltwodata6"  | "-33.869732"   |
"151.2055553"  |
| "This          |  col7 data yes."  | "coltwodata7"  | "1.2845045"    |
"103.8482739"  |
| "Chifley"      | "coltwodata5"     | ""             | ""             | ""
            |
+----------------+-------------------+----------------+----------------+----------------+
8 rows selected (0.147 seconds)
0: jdbc:drill:zk=local> select `col_Column1` from
`dfs`.`installedsoftwares/ApacheDrill/apache-drill-1.10.0.tar/apache-drill-1.10.0/sample-data/sample_data.csv`;
+----------------+
|  col_Column1   |
+----------------+
| "colonedata1"  |
| "colonedata2"  |
| "colonedata3"  |
| "colonedata4"  |
| "colonedata5"  |
| "This          |
| "This          |
| "Chifley"      |
+----------------+
8 rows selected (0.1 seconds)



The query returning the different results due to host operating system?


Thanks,
Divya

On 3 August 2017 at 12:45, Kunal Khatua <[email protected]> wrote:

> Based on your sample data, which contains this:
> Column1,Column2,Column3,Column4,Column5
> "colonedata1","coltwodata1","-35.924476","138.5987123",""
> "colonedata2","coltwodata2","-27.4372536","153.0304583","137"
> colonedata3","coltwodata3","-35.2793885","149.1233503","134"
> "colonedata4","coltwodata4","-33.8724176","151.2067579",""
> "colonedata5","coltwodata5","","",""
> "This, col6 data","coltwodata6","-33.869732","151.2055553","351"
> "This, col7 data yes.","coltwodata7","1.2845045","103.8482739","80"
> "Chifley","coltwodata5","","",""
>
> I got this and it looks like this...
>
>
> 0: jdbc:drill:schema=dfs.root> select * from `sample_data.csv`;
> +------------------------------------------------------------------------+
> |                                columns                                 |
> +------------------------------------------------------------------------+
> | ["ï»¿Column1","Column2","Column3","Column4","Column5"]
>  |
> | ["colonedata1","coltwodata1","-35.924476","138.5987123",""]            |
> | ["colonedata2","coltwodata2","-27.4372536","153.0304583","137"]        |
> | ["colonedata3\"","coltwodata3","-35.2793885","149.1233503","134"]      |
> | ["colonedata4","coltwodata4","-33.8724176","151.2067579",""]           |
> | ["colonedata5","coltwodata5","","",""]                                 |
> | ["This, col6 data","coltwodata6","-33.869732","151.2055553","351"]     |
> | ["This, col7 data yes.","coltwodata7","1.2845045","103.8482739","80"]  |
> | ["Chifley","coltwodata5","","",""]                                     |
> +------------------------------------------------------------------------+
> 9 rows selected (0.502 seconds)
> 0: jdbc:drill:schema=dfs.root> select columns[0] from `sample_data.csv`;
> +-----------------------+
> |        EXPR$0         |
> +-----------------------+
> | ï»¿Column1              |
> | colonedata1           |
> | colonedata2           |
> | colonedata3"          |
> | colonedata4           |
> | colonedata5           |
> | This, col6 data       |
> | This, col7 data yes.  |
> | Chifley               |
> +-----------------------+
> 9 rows selected (0.581 seconds)
>
> I was wondering if there is something else you're seeing because you're
> running this on Windows. So I tried after converting the Unix format and
> got the exact same result. Is this what you're getting?
> I'm running this on a Linux machine.
>
> -----Original Message-----
> From: Divya Gehlot [mailto:[email protected]]
> Sent: Wednesday, August 02, 2017 6:45 PM
> To: [email protected]
> Subject: Re: delimiter in column values
>
> Hi ,
>
> I am using Drill 1.11  and with all the setting which you have mentioned
> in plugin configurations .
> As Kunal advised to surrounded  the column values with quotes which acts
> as a string delimiter as one of my column value includes delimiter  same as
> field delimiter,  still getting the same results i.e the first column
> values are getting split into two columns as posted my earlier posts.
> I am kind of wondering how to resolve the column split issue  , as the
> received data set is from third party.
>
>
> Appreciate the help!
>
> Thanks,
> Divya
>
> On 3 August 2017 at 00:10, Paul Rogers <[email protected]> wrote:
>
> > Hi Divya,
> >
> > Drill follows the commonly-accepted practice for CSV files. The
> > general rule is:
> >
> > 1. Column headers all on one line, comma separated. (Drill 1.11 has
> > fixes in this area, so you’ll want to use that if you have any problems.
> > 2. Each record on its own line, comma-separated, no leading or
> > trailing spaces.
> > 3. No need for quotes unless your value contains commas.
> >
> > You can customize behavior using the storage plugin config:
> >
> > * Choose delimiter (tab for TSV, | for PSV, etc.)
> > * Choose to read or skip the header.
> >
> > You’ll want to make sure to use the “,” delimiter, read and use the
> > header. The docs have an example of the required setup.
> >
> > Values are always read as text, so even your numbers will start as
> > VarChar. You can convert to a numeric type in the query.
> >
> > Example using your data:
> >
> > Column1,Column2,Column3,Column4,Column5
> > colonedata1,coltwodata1,-35.924476,138.5987123,
> > colonedata2,coltwodata2,-27.4372536,153.0304583,137
> >
> > Note that if columns are empty (like your first row), you still should
> > include the comma separators. (Another bug fix in 1.11 fixes this
> > case;
> > 1.10 and earlier have problems if trailing columns are missing.)
> >
> > Thanks,
> >
> > - Paul
> >
> >
> > On Aug 1, 2017, at 11:51 PM, Divya Gehlot <[email protected]<
> mailto:
> > [email protected]>> wrote:
> >
> > Hi,
> > My column headers are in single line only i.e.
> > Column1,Column2,Column3,Column4,Column5
> > "colonedata1","coltwodata1","-35.924476","138.5987123",""
> > "colonedata2","coltwodata2","-27.4372536","153.0304583","137"
> > colonedata3","coltwodata3","-35.2793885","149.1233503","134"
> > "colonedata4","coltwodata4","-33.8724176","151.2067579",""
> >
> > As you advised to put quotes as string delimeter for each column data
> > and ran the select query.
> > attaching the data file too .
> >
> > Appreciate the help !
> >
> > Thanks,
> > Divya
> >
> > On 2 August 2017 at 12:37, Kunal Khatua <[email protected]<mailto:kkhat
> > [email protected]>> wrote:
> > So, the way you’ve shown your data is basically in this format:
> >
> > <List of column headers, one per line> <actual column data, one row
> > per line>
> >
> > Unfortunately, I don't believe the text reader in Drill is that
> > advanced as to interpret  the list of column headers across multiple
> > lines, while the actual data is in a single line per row.
> >
> > Typically text data is in CSV (or other delimiters similar to the
> > comma) and can have the first line representing a header.
> >
> > Also, I'm not sure if there was ever an option introduced to allow
> > skipping of the initial set of lines within a text file being read.
> >
> >
> > -----Original Message-----
> > From: Divya Gehlot
> > [mailto:[email protected]<mailto:divya.htconex@
> > gmail.com>]
> > Sent: Tuesday, August 01, 2017 7:06 PM
> > To: [email protected]<mailto:[email protected]>
> > Subject: Re: delimiter in column values
> >
> > For my sample dataset as you advised I surrounded with single columns
> > also with quotes and the results are as below :
> > col_Column1
> > Column2
> > Column3
> > Column4
> > Column5
> > "Chifley" "coltwodata5" "" "" ""
> > "colonedata1" "coltwodata1" "-35.924476" "138.5987123" ""
> > "colonedata2" "coltwodata2" "-27.4372536" "153.0304583" "137"
> > "colonedata4" "coltwodata4" "-33.8724176" "151.2067579" ""
> > "colonedata5" "coltwodata5" "" "" ""
> > "This col6 data" "coltwodata6" "-33.869732" "151.2055553"
> > "This col7 data yes." "coltwodata7" "1.2845045" "103.8482739"
> > colonedata3" "coltwodata3" "-35.2793885" "149.1233503" "134"
> >
> > Thanks,
> > Divya
> >
> > On 1 August 2017 at 22:39, Kunal Khatua <[email protected]<mailto:kkhat
> > [email protected]>> wrote:
> >
> > > I think you need quotes around the single word datasets as well,
> > > because the quotes act as String delimiters and help in indicating
> > > the start and end of a String.
> > >
> > > Is there a reason why the single word strings cannot be in quotes as
> > well?
> > >
> > > -----Original Message-----
> > > From: Divya Gehlot
> > > [mailto:[email protected]<mailto:divya.htconex@
> > gmail.com>]
> > > Sent: Tuesday, August 01, 2017 3:04 AM
> > > To: [email protected]<mailto:[email protected]>
> > > Subject: delimiter in column values
> > >
> > > Hi,
> > > I have data set which has delimeter in first column value when I
> > > read the data set It provides the output below :
> > >
> > > col_Column1
> > > Column2
> > > Column3
> > > Column4
> > > Column5
> > >
> > > "This col6 data" coltwodata6 -33.869732 151.2055553 "This col7 data
> yes."
> > > coltwodata7 1.2845045 103.8482739 Chifley coltwodata5
> > > colonedata1 coltwodata1 -35.924476 138.5987123
> > > colonedata2 coltwodata2 -27.4372536 153.0304583 137
> > > colonedata3 coltwodata3 -35.2793885 149.1233503 134
> > > colonedata4 coltwodata4 -33.8724176 151.2067579
> > > colonedata5 coltwodata5
> > >
> > >
> > >
> > > How can I read the column1 values as is without getting split into
> > > two columns for instance the Column values should be
> > > Column1
> > > colonedata1,
> > > colonedata2,
> > > colonedata3,
> > > colonedata4,
> > > colonedata5,
> > > "This, col6 data"
> > > "This, col7 data"
> > > Chifley,
> > >
> > > Appreciate the help !
> > >
> > > Thanks ,
> > > Divya
> > >
> >
> > <sample_data.csv>
> >
> >
>

Re: delimiter in column values

Reply via email to