Notes on line delimiter configuration

Note that there must not be quotes around the lineDelimiter attribute name in
the table function, e.g.:

select * from table(dfs.`my_table`(type=>'text', lineDelimiter=>'\r\n'))
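
For contrast, the form with quotes around the attribute name (as in the
suggestion quoted further down) is the one that did not work for me:

select * from table(dfs.`my_table`(type=>'text', 'lineDelimiter'=>'\r\n'))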

*fieldDelimiter needs to be specified explicitly; otherwise the columns are not
separated properly.*

select columns[0] as `A`, CAST(columns[1] as INTEGER) as `B`,
CAST(columns[2] as INTEGER) as `c` from
table(dfs.myfs.`0/7b/data/*.csv`(type=>'text', lineDelimiter=>'\r\n'));
+--------------+-------+-------+
|      A       |   B   |   c   |
+--------------+-------+-------+
| xxxxxx,50,1  | null  | null  |
| xxxxx,60,2   | null  | null  |
| xxx,70,3     | null  | null  |
+--------------+-------+-------+

After specifying fieldDelimiter:

 select columns[0] as `A`, CAST(columns[1] as INTEGER) as `B`,
CAST(columns[2] as INTEGER) as `c` from
table(dfs.docubefs.`0/7b/data/*.csv`(type=>'text', fieldDelimiter=>',',
lineDelimiter=>'\r\n'));

+---------+-----+----+
|    A    |  B  | c  |
+---------+-----+----+
| xxxxxx  | 50  | 1  |
| xxxxx   | 60  | 2  |
| xxx     | 70  | 3  |
+---------+-----+----+

*Also, the table function does not work on directories; we have to explicitly
specify folder/*.csv*
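
For example (a rough sketch against the same path; the second form is the one
that did not work for me):

-- works: wildcard over the csv files
select columns[0] as `A` from table(dfs.myfs.`0/7b/data/*.csv`(type=>'text',
fieldDelimiter=>',', lineDelimiter=>'\r\n'));

-- does not work: the bare directory
select columns[0] as `A` from table(dfs.myfs.`0/7b/data`(type=>'text',
fieldDelimiter=>',', lineDelimiter=>'\r\n'));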

After creating a new storage plugin with lineDelimiter set to '\r\n', everything
works as expected (no need to specify the *.csv wildcard pattern).
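
For example, assuming the new plugin is registered as crlf with a workspace
myws pointing at the data (both names made up here) and lineDelimiter '\r\n' in
its csv format config, a plain query against the directory works:

select columns[0] as `A`, CAST(columns[1] as INTEGER) as `B`,
CAST(columns[2] as INTEGER) as `c` from crlf.myws.`0/7b/data`;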

Rahul

On Sun, Dec 4, 2016 at 3:09 AM, Abhishek Girish <[email protected]>
wrote:

> Thanks Khurram, this was the attribute I thought didn't exist, as I couldn't
> find it anywhere in our docs.
>
> This can also be added to format config:
>
> "csv": {
>       "type": "text",
>       "extensions": [
>         "csv"
>       ],
>       "lineDelimiter": "\r\n",
>       "extractHeader": true,
>       "delimiter": ","
>     }
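>
> For example, with extractHeader set to true as above, selecting the CSV
> columns by name should also work, something along these lines (just a sketch;
> the path is the one from the repro further down the thread):
>
> select sepalen, sepalwidth, patelen, patelwidth, class from dfs.`/drill/tmp.csv`;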
>
> Tried with 1.9.0 on Windows (using the above format config) and the queries
> work as expected.
>
> We should update [1] with this information.
>
> [1] https://drill.apache.org/docs/plugin-configuration-basics/#list-of-attributes-and-definitions
>
> P.S. So there is no need to file a new JIRA for this.
>
> On Sat, Dec 3, 2016 at 4:12 AM, Khurram Faraaz <[email protected]>
> wrote:
>
> > I am not exactly sure this will work (I haven't tried it on Windows 10), but
> > setting lineDelimiter to '\r\n' should help in your case; can you please try
> > it? For details, look at DRILL-3149
> > <https://issues.apache.org/jira/browse/DRILL-3149>
> >
> > Example:
> >
> > select * from table(dfs.`my_table`(type=>'text', 'lineDelimiter'=>'\r\n'))
> >
> >
> > On Sat, Dec 3, 2016 at 2:28 AM, Kunal Khatua <[email protected]>
> wrote:
> >
> > > There is a dos2unix utility for Linux that allows you to substitute the
> > > multi-character newline with the single-character newline.
> > >
> > > For Windows, you can use either a similar util on SourceForge or the
> > > CygUtils (part of the Cygwin shell, I believe) to achieve the same
> > > conversion.
> > >
> > > In the meantime, like Abhishek suggested, please file a JIRA for this so
> > > that the issue can be tracked and fixed in a future release.
> > >
> > > Thanks
> > > Kunal
> > >
> > > On Wed 30-Nov-2016 8:24:00 PM, Abhishek Girish <[email protected]>
> > > wrote:
> > > Well, the only workaround I got to work is this: I saved the file in
> > > UNIX/OS X format and executed the query successfully (with no blanks in the
> > > class column).
> > >
> > > Some text editors such as Notepad++ let you do this directly on Windows.
> > > Or you could use a utility like iconv or dos2unix for larger files, like
> > > Leon suggested.
> > >
> > > Also, please file an enhancement JIRA for supporting a new 'newline'
> > > attribute (which takes multi-byte values) for text formats. This will be
> > > helpful for files on Windows, which use CRLF (\r\n) as newline characters.
> > >
> > > On Wed, Nov 30, 2016 at 7:56 PM, Abhishek Girish
> > > wrote:
> > >
> > > > Okay, I could reproduce the issue on Windows.
> > > >
> > > > 0: jdbc:drill:zk=local> select A.sepalen, A.sepalwidth, A.patelen,
> > > > A.patelwidth,
> > > > A.class from dfs.`/drill/tmp.csv` as A;
> > > > +----------+-------------+----------+--------------+--------+
> > > > | sepalen  | sepalwidth  | patelen  | patelwidth   | class  |
> > > > +----------+-------------+----------+--------------+--------+
> > > > | 5.1      | 3.5         | 1.4      | Iris-setosa  |        |
> > > > | 4.9      | 3           | 1.4      | Iris-setosa  |        |
> > > > | 4.7      | 3.2         | 1.3      | Iris-setosa  |        |
> > > > | 4.6      | 3.1         | 1.5      | Iris-setosa  |        |
> > > > | 5        | 3.6         | 1.4      | Iris-setosa  |        |
> > > > | 5.4      | 3.9         | 1.7      | Iris-setosa  |        |
> > > > | 4.6      | 3.4         | 1.4      | Iris-setosa  |        |
> > > > | 5        | 3.4         | 1.5      | Iris-setosa  |        |
> > > > | 4.4      | 2.9         | 1.4      | Iris-setosa  |        |
> > > > | 4.9      | 3.1         | 1.5      | Iris-setosa  |        |
> > > > | 5.4      | 3.7         | 1.5      | Iris-setosa  |        |
> > > > | 4.8      | 3.4         | 1.6      | Iris-setosa  |        |
> > > > +----------+-------------+----------+--------------+--------+
> > > > 12 rows selected (0.277 seconds)
> > > >
> > > > I'll get back if and when I have a workaround / solution.
> > > >
> > > > On Wed, Nov 30, 2016 at 7:36 PM, Abhishek Girish <[email protected]> wrote:
> > > >
> > > >> I forgot to mention that the other query works too.
> > > >>
> > > >> > select A.sepalen, A.sepalwidth, A.patelen, A.patelwidth, A.class from
> > > >> dfs.tmp.`tmp.csv` as A;
> > > >> +----------+-------------+----------+--------------+--------+
> > > >> | sepalen  | sepalwidth  | patelen  | patelwidth   | class  |
> > > >> +----------+-------------+----------+--------------+--------+
> > > >> | 5.1      | 3.5         | 1.4      | Iris-setosa  | 0.2    |
> > > >> | 4.9      | 3           | 1.4      | Iris-setosa  | 0.2    |
> > > >> | 4.7      | 3.2         | 1.3      | Iris-setosa  | 0.2    |
> > > >> | 4.6      | 3.1         | 1.5      | Iris-setosa  | 0.2    |
> > > >> | 5        | 3.6         | 1.4      | Iris-setosa  | 0.2    |
> > > >> | 5.4      | 3.9         | 1.7      | Iris-setosa  | 0.4    |
> > > >> | 4.6      | 3.4         | 1.4      | Iris-setosa  | 0.3    |
> > > >> | 5        | 3.4         | 1.5      | Iris-setosa  | 0.2    |
> > > >> | 4.4      | 2.9         | 1.4      | Iris-setosa  | 0.2    |
> > > >> | 4.9      | 3.1         | 1.5      | Iris-setosa  | 0.1    |
> > > >> | 5.4      | 3.7         | 1.5      | Iris-setosa  | 0.2    |
> > > >> | 4.8      | 3.4         | 1.6      | Iris-setosa  | 0.2    |
> > > >> +----------+-------------+----------+--------------+--------+
> > > >> 12 rows selected (0.31 seconds)
> > > >>
> > > >>
> > > >> On Wed, Nov 30, 2016 at 8:15 AM, Leon Clayton
> > > >> wrote:
> > > >>
> > > >>> Are we sure there are no hidden characters at the end of one or more
> > > >>> lines?
> > > >>>
> > > >>> Try a dos2unix on the file to check this theory out.
> > > >>>
> > > >>>
> > > >>> > On 30 Nov 2016, at 10:45, Sanjiv Kumar wrote:
> > > >>> >
> > > >>> > Hello
> > > >>> > Yes, you are right: select * from `tmp.csv` is working fine, but if I
> > > >>> > select columns then the last column's data shows up blank.
> > > >>> > Run this query:-
> > > >>> > select A.`sepalen`, A.`sepalwidth`, A.`patelen`, A.`patelwidth`, A.class
> > > >>> > from dfs.tmp.`copydata.csv` as A;
> > > >>> >
> > > >>> > If you run this query you will get last column data as blank.
> > > >>> >
> > > >>> > On Wed, Nov 30, 2016 at 11:15 AM, Sanjiv Kumar
> > > >>> wrote:
> > > >>> >
> > > >>> >> I am using the latest version, 1.8, on the Windows 10 operating system.
> > > >>> >>
> > > >>> >> On Tue, Nov 29, 2016 at 11:40 AM, Sanjiv Kumar
> > > >>> >> wrote:
> > > >>> >>
> > > >>> >>> I already pasted the csv file. Just copy it and save it as csv. I am
> > > >>> >>> also attaching the csv file. And the query is the same as I mentioned
> > > >>> >>> above.
> > > >>> >>>
> > > >>> >>> select A.`sepalen`, A.`sepalwidth`, A.`patelen`, A.`patelwidth`,
> > > >>> >>> A.class from dfs.tmp.`copydata.csv` as A;
> > > >>> >>>
> > > >>> >>> On Mon, Nov 28, 2016 at 6:30 PM, Sanjiv Kumar
> > > >>> >>> wrote:
> > > >>> >>>
> > > >>> >>>> Yeah, it's working, but what if I am using:-
> > > >>> >>>> select A.`sepalen`, A.`sepalwidth`, A.`patelen`, A.`patelwidth`,
> > > >>> >>>> A.class from dfs.tmp.`copydata.csv` as A;
> > > >>> >>>>
> > > >>> >>>> Why is my last column's data showing blank?
> > > >>> >>>> And one more thing, see my csv file:-
> > > >>> >>>>
> > > >>> >>>> sepalen,sepalwidth,patelen,patelwidth,class
> > > >>> >>>> 5.1,3.5,1.4,Iris-setosa,0.2
> > > >>> >>>> 4.9,3,1.4,Iris-setosa,0.2
> > > >>> >>>> 4.7,3.2,1.3,Iris-setosa,0.2
> > > >>> >>>> 4.6,3.1,1.5,Iris-setosa,0.2
> > > >>> >>>> 5,3.6,1.4,Iris-setosa,0.2
> > > >>> >>>> 5.4,3.9,1.7,Iris-setosa,0.4
> > > >>> >>>> 4.6,3.4,1.4,Iris-setosa,0.3
> > > >>> >>>> 5,3.4,1.5,Iris-setosa,0.2
> > > >>> >>>> 4.4,2.9,1.4,Iris-setosa,0.2
> > > >>> >>>> 4.9,3.1,1.5,Iris-setosa,0.1
> > > >>> >>>> 5.4,3.7,1.5,Iris-setosa,0.2
> > > >>> >>>> 4.8,3.4,1.6,Iris-setosa,0.2
> > > >>> >>>>
> > > >>> >>>> This is my previous file.
> > > >>> >>>> Now, if I add a comma after class, check this new file.
> > > >>> >>>>
> > > >>> >>>> sepalen,sepalwidth,patelen,patelwidth,class,
> > > >>> >>>> 5.1,3.5,1.4,Iris-setosa,0.2
> > > >>> >>>> 4.9,3,1.4,Iris-setosa,0.2
> > > >>> >>>> 4.7,3.2,1.3,Iris-setosa,0.2
> > > >>> >>>> 4.6,3.1,1.5,Iris-setosa,0.2
> > > >>> >>>> 5,3.6,1.4,Iris-setosa,0.2
> > > >>> >>>> 5.4,3.9,1.7,Iris-setosa,0.4
> > > >>> >>>> 4.6,3.4,1.4,Iris-setosa,0.3
> > > >>> >>>> 5,3.4,1.5,Iris-setosa,0.2
> > > >>> >>>> 4.4,2.9,1.4,Iris-setosa,0.2
> > > >>> >>>> 4.9,3.1,1.5,Iris-setosa,0.1
> > > >>> >>>> 5.4,3.7,1.5,Iris-setosa,0.2
> > > >>> >>>> 4.8,3.4,1.6,Iris-setosa,0.2
> > > >>> >>>>
> > > >>> >>>>
> > > >>> >>>> And fire this query:- select A.`sepalen`, A.`sepalwidth`, A.`patelen`,
> > > >>> >>>> A.`patelwidth`, A.class from dfs.tmp.`copydata.csv` as A;
> > > >>> >>>>
> > > >>> >>>> Then the output shows fine, but if the comma is not there, then the
> > > >>> >>>> last column's data shows blank.
> > > >>> >>>>
> > > >>> >>>> Is this a bug?
> > > >>> >>>> ......................
> > > >>> >>>> Thanks & Regards
> > > >>> >>>> *Sanjiv Kumar*
> > > >>> >>>>
> > > >>> >>>
> > > >>> >>>
> > > >>> >>>
> > > >>> >>>
> > > >>> >>
> > > >>> >>
> > > >>> >>
> > > >>> >>
> > > >>> >
> > > >>> >
> > > >>> >
> > > >>>
> > > >>>
> > > >>
> > > >
> > >
> >
>
