Re: Review Request 25468: HIVE-7777: add CSVSerde support

2014-09-10 Thread cheng xu


 On Sept. 9, 2014, 3:07 p.m., Brock Noland wrote:
  serde/pom.xml, line 73
  https://reviews.apache.org/r/25468/diff/1/?file=683466#file683466line73
 
  These should only be indented by two spaces, not four. Have you tried 
  submitting an MR job on a cluster with this patch? The reason I ask is that 
  I think the serde must be in here:
  
  https://github.com/apache/hive/blob/trunk/ql/pom.xml#L563
  
  for it to be available to MR jobs.

I think it does not need to add the class alone because 
org.apache.hive:hive-serde was already included. BTW, I do a test as the 
following steps: 

(1) create a table with the csv format:
create table csv_table(a string, b string)
  row format serde 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES(
 separatorChar = ,,
   quoteChar = ',
   escapeChar= \
) stored as textfile;

(2) load data by:
load data local inpath /root/workspace/data overwrite into table csv_table;

(3) cat /root/workspace/data:
aa,bb
dd,cc

(4) select a from csv_table:
+-+--+
|  a  |
+-+--+
| aa  |
| dd  |
+-+--+

If I am missing anything, please help figure it out. Thanks!


- cheng


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25468/#review52723
---


On Sept. 9, 2014, 2:16 a.m., cheng xu wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/25468/
 ---
 
 (Updated Sept. 9, 2014, 2:16 a.m.)
 
 
 Review request for hive.
 
 
 Bugs: HIVE-
 https://issues.apache.org/jira/browse/HIVE-
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 HIVE-: add CSVSerde support
 
 
 Diffs
 -
 
   serde/pom.xml f8bcc830cfb298d739819db8fbaa2f98f221ccf3 
   serde/src/java/org/apache/hadoop/hive/serde2/CSVSerde.java PRE-CREATION 
   serde/src/test/org/apache/hadoop/hive/serde2/TestCSVSerde.java PRE-CREATION 
 
 Diff: https://reviews.apache.org/r/25468/diff/
 
 
 Testing
 ---
 
 Unit test
 
 
 Thanks,
 
 cheng xu
 




Re: Review Request 25468: HIVE-7777: add CSVSerde support

2014-09-10 Thread cheng xu


 On Sept. 9, 2014, 8:49 a.m., Lars Francke wrote:
  Looks good apart from minor comments.
  
  Maybe add a test for the Serialization part?
  https://issues.apache.org/jira/browse/HIVE-5976 integration might be nice: 
  STORED AS CSV. Unfortunately there's no documentation yet so I'm not sure 
  if it's feasible.

Good point! Why not file it in a new jira ticket as a future work?


- cheng


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25468/#review52688
---


On Sept. 9, 2014, 2:16 a.m., cheng xu wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/25468/
 ---
 
 (Updated Sept. 9, 2014, 2:16 a.m.)
 
 
 Review request for hive.
 
 
 Bugs: HIVE-
 https://issues.apache.org/jira/browse/HIVE-
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 HIVE-: add CSVSerde support
 
 
 Diffs
 -
 
   serde/pom.xml f8bcc830cfb298d739819db8fbaa2f98f221ccf3 
   serde/src/java/org/apache/hadoop/hive/serde2/CSVSerde.java PRE-CREATION 
   serde/src/test/org/apache/hadoop/hive/serde2/TestCSVSerde.java PRE-CREATION 
 
 Diff: https://reviews.apache.org/r/25468/diff/
 
 
 Testing
 ---
 
 Unit test
 
 
 Thanks,
 
 cheng xu
 




Re: Review Request 25468: HIVE-7777: add CSVSerde support

2014-09-10 Thread Lars Francke

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25468/#review52832
---



serde/src/java/org/apache/hadoop/hive/serde2/OpenCSVSerde.java
https://reviews.apache.org/r/25468/#comment91999

Thanks for moving these out. Could you make them static?



serde/src/java/org/apache/hadoop/hive/serde2/OpenCSVSerde.java
https://reviews.apache.org/r/25468/#comment92000

no need to wrap this



serde/src/java/org/apache/hadoop/hive/serde2/OpenCSVSerde.java
https://reviews.apache.org/r/25468/#comment92001

missing spaces



serde/src/test/org/apache/hadoop/hive/serde2/TestOpenCSVSerde.java
https://reviews.apache.org/r/25468/#comment92003

There's a couple more puts in this file that can be replaced with 
setProperty


- Lars Francke


On Sept. 9, 2014, 2:16 a.m., cheng xu wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/25468/
 ---
 
 (Updated Sept. 9, 2014, 2:16 a.m.)
 
 
 Review request for hive.
 
 
 Bugs: HIVE-
 https://issues.apache.org/jira/browse/HIVE-
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 HIVE-: add CSVSerde support
 
 
 Diffs
 -
 
   pom.xml 8973c2b52d0797d1f34859951de7349f7e5b996f 
   serde/pom.xml f8bcc830cfb298d739819db8fbaa2f98f221ccf3 
   serde/src/java/org/apache/hadoop/hive/serde2/OpenCSVSerde.java PRE-CREATION 
   serde/src/test/org/apache/hadoop/hive/serde2/TestOpenCSVSerde.java 
 PRE-CREATION 
 
 Diff: https://reviews.apache.org/r/25468/diff/
 
 
 Testing
 ---
 
 Unit test
 
 
 Thanks,
 
 cheng xu
 




Re: Review Request 25468: HIVE-7777: add CSVSerde support

2014-09-10 Thread Lars Francke


 On Sept. 9, 2014, 8:49 a.m., Lars Francke wrote:
  serde/src/java/org/apache/hadoop/hive/serde2/CSVSerde.java, line 151
  https://reviews.apache.org/r/25468/diff/1/?file=683467#file683467line151
 
  I don't quite get this comment. Looking at the two CSVReader 
  constructors they seem to do the same in this case. From how I understand 
  it this if-statement is not needed. Same for the newWriter method.
  
  Maybe I'm missing something?
 
 cheng xu wrote:
 The CSVParser will do a check work if the separator, quotechar or escape 
 is the same. If so, it will throw an exception. For this reason, we have to 
 replace with CSVParser.DEFAULT_ESCAPE_CHARACTER('\') if the escape is 
 DEFAULT_ESCAPE_CHARACTER('').

Ahh! I see now. That's a bit weird indeed. Thanks for the explanation.


- Lars


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25468/#review52688
---


On Sept. 9, 2014, 2:16 a.m., cheng xu wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/25468/
 ---
 
 (Updated Sept. 9, 2014, 2:16 a.m.)
 
 
 Review request for hive.
 
 
 Bugs: HIVE-
 https://issues.apache.org/jira/browse/HIVE-
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 HIVE-: add CSVSerde support
 
 
 Diffs
 -
 
   pom.xml 8973c2b52d0797d1f34859951de7349f7e5b996f 
   serde/pom.xml f8bcc830cfb298d739819db8fbaa2f98f221ccf3 
   serde/src/java/org/apache/hadoop/hive/serde2/OpenCSVSerde.java PRE-CREATION 
   serde/src/test/org/apache/hadoop/hive/serde2/TestOpenCSVSerde.java 
 PRE-CREATION 
 
 Diff: https://reviews.apache.org/r/25468/diff/
 
 
 Testing
 ---
 
 Unit test
 
 
 Thanks,
 
 cheng xu
 




Re: Review Request 25468: HIVE-7777: add CSVSerde support

2014-09-10 Thread Brock Noland

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25468/#review52883
---


This looks great! I think we are almost ready to commit. Can you add a new test 
(e.g. ql/src/test/queries/clientpositive/serde_csv.q) which runs a couple 
queries? e.g. ql/src/test/queries/clientpositive/serde_regex.q

- Brock Noland


On Sept. 9, 2014, 2:16 a.m., cheng xu wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/25468/
 ---
 
 (Updated Sept. 9, 2014, 2:16 a.m.)
 
 
 Review request for hive.
 
 
 Bugs: HIVE-
 https://issues.apache.org/jira/browse/HIVE-
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 HIVE-: add CSVSerde support
 
 
 Diffs
 -
 
   pom.xml 8973c2b52d0797d1f34859951de7349f7e5b996f 
   serde/pom.xml f8bcc830cfb298d739819db8fbaa2f98f221ccf3 
   serde/src/java/org/apache/hadoop/hive/serde2/OpenCSVSerde.java PRE-CREATION 
   serde/src/test/org/apache/hadoop/hive/serde2/TestOpenCSVSerde.java 
 PRE-CREATION 
 
 Diff: https://reviews.apache.org/r/25468/diff/
 
 
 Testing
 ---
 
 Unit test
 
 
 Thanks,
 
 cheng xu
 




Re: Review Request 25468: HIVE-7777: add CSVSerde support

2014-09-09 Thread Lars Francke

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25468/#review52688
---


Looks good apart from minor comments.

Maybe add a test for the Serialization part?
https://issues.apache.org/jira/browse/HIVE-5976 integration might be nice: 
STORED AS CSV. Unfortunately there's no documentation yet so I'm not sure if 
it's feasible.


serde/src/java/org/apache/hadoop/hive/serde2/CSVSerde.java
https://reviews.apache.org/r/25468/#comment91646

This comment doesn't add value so I suggest removing it.



serde/src/java/org/apache/hadoop/hive/serde2/CSVSerde.java
https://reviews.apache.org/r/25468/#comment91647

* Constants is deprecated. Use serdeConstants instead
* Exceeds maximum line length (100 chars)



serde/src/java/org/apache/hadoop/hive/serde2/CSVSerde.java
https://reviews.apache.org/r/25468/#comment91648

Unused



serde/src/java/org/apache/hadoop/hive/serde2/CSVSerde.java
https://reviews.apache.org/r/25468/#comment91651

Missing spaces around operators



serde/src/java/org/apache/hadoop/hive/serde2/CSVSerde.java
https://reviews.apache.org/r/25468/#comment91650

2 x Unnecessary this



serde/src/java/org/apache/hadoop/hive/serde2/CSVSerde.java
https://reviews.apache.org/r/25468/#comment91653

Missing spaces around operators



serde/src/java/org/apache/hadoop/hive/serde2/CSVSerde.java
https://reviews.apache.org/r/25468/#comment91669

I suggest moving these properties to Constants somewhere



serde/src/java/org/apache/hadoop/hive/serde2/CSVSerde.java
https://reviews.apache.org/r/25468/#comment91649

Method declared final in final class



serde/src/java/org/apache/hadoop/hive/serde2/CSVSerde.java
https://reviews.apache.org/r/25468/#comment91657

long line



serde/src/java/org/apache/hadoop/hive/serde2/CSVSerde.java
https://reviews.apache.org/r/25468/#comment91656

Missing spaces



serde/src/java/org/apache/hadoop/hive/serde2/CSVSerde.java
https://reviews.apache.org/r/25468/#comment91659

I don't quite get this comment. Looking at the two CSVReader constructors 
they seem to do the same in this case. From how I understand it this 
if-statement is not needed. Same for the newWriter method.

Maybe I'm missing something?



serde/src/java/org/apache/hadoop/hive/serde2/CSVSerde.java
https://reviews.apache.org/r/25468/#comment91660

Missing @Override annotation



serde/src/test/org/apache/hadoop/hive/serde2/TestCSVSerde.java
https://reviews.apache.org/r/25468/#comment91661

Can be private too



serde/src/test/org/apache/hadoop/hive/serde2/TestCSVSerde.java
https://reviews.apache.org/r/25468/#comment91662

Properties.put should not be used. Use setProperty instead. Also Constants 
== deprecated


- Lars Francke


On Sept. 9, 2014, 2:16 a.m., cheng xu wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/25468/
 ---
 
 (Updated Sept. 9, 2014, 2:16 a.m.)
 
 
 Review request for hive.
 
 
 Bugs: HIVE-
 https://issues.apache.org/jira/browse/HIVE-
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 HIVE-: add CSVSerde support
 
 
 Diffs
 -
 
   serde/pom.xml f8bcc830cfb298d739819db8fbaa2f98f221ccf3 
   serde/src/java/org/apache/hadoop/hive/serde2/CSVSerde.java PRE-CREATION 
   serde/src/test/org/apache/hadoop/hive/serde2/TestCSVSerde.java PRE-CREATION 
 
 Diff: https://reviews.apache.org/r/25468/diff/
 
 
 Testing
 ---
 
 Unit test
 
 
 Thanks,
 
 cheng xu
 




Re: Review Request 25468: HIVE-7777: add CSVSerde support

2014-09-09 Thread Brock Noland

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25468/#review52723
---


Great work!


serde/pom.xml
https://reviews.apache.org/r/25468/#comment91700

These should only be indented by two spaces, not four. Have you tried 
submitting an MR job on a cluster with this patch? The reason I ask is that I 
think the serde must be in here:

https://github.com/apache/hive/blob/trunk/ql/pom.xml#L563

for it to be available to MR jobs.



serde/src/java/org/apache/hadoop/hive/serde2/CSVSerde.java
https://reviews.apache.org/r/25468/#comment91701

I think we should call this OpenCSVSerde since it's based on OpenCSV and I 
believe we might see multiple implementations of CSVSerde.

I think we should extend AbstractSerde as that is what all the new Serdes 
are supposed to be doing.


- Brock Noland


On Sept. 9, 2014, 2:16 a.m., cheng xu wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/25468/
 ---
 
 (Updated Sept. 9, 2014, 2:16 a.m.)
 
 
 Review request for hive.
 
 
 Bugs: HIVE-
 https://issues.apache.org/jira/browse/HIVE-
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 HIVE-: add CSVSerde support
 
 
 Diffs
 -
 
   serde/pom.xml f8bcc830cfb298d739819db8fbaa2f98f221ccf3 
   serde/src/java/org/apache/hadoop/hive/serde2/CSVSerde.java PRE-CREATION 
   serde/src/test/org/apache/hadoop/hive/serde2/TestCSVSerde.java PRE-CREATION 
 
 Diff: https://reviews.apache.org/r/25468/diff/
 
 
 Testing
 ---
 
 Unit test
 
 
 Thanks,
 
 cheng xu
 




Re: Review Request 25468: HIVE-7777: add CSVSerde support

2014-09-09 Thread Lefty Leverenz


 On Sept. 9, 2014, 8:49 a.m., Lars Francke wrote:
  serde/src/java/org/apache/hadoop/hive/serde2/CSVSerde.java, line 31
  https://reviews.apache.org/r/25468/diff/1/?file=683467#file683467line31
 
  This comment doesn't add value so I suggest removing it.

Or you could expand the comment.


- Lefty


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25468/#review52688
---


On Sept. 9, 2014, 2:16 a.m., cheng xu wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/25468/
 ---
 
 (Updated Sept. 9, 2014, 2:16 a.m.)
 
 
 Review request for hive.
 
 
 Bugs: HIVE-
 https://issues.apache.org/jira/browse/HIVE-
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 HIVE-: add CSVSerde support
 
 
 Diffs
 -
 
   serde/pom.xml f8bcc830cfb298d739819db8fbaa2f98f221ccf3 
   serde/src/java/org/apache/hadoop/hive/serde2/CSVSerde.java PRE-CREATION 
   serde/src/test/org/apache/hadoop/hive/serde2/TestCSVSerde.java PRE-CREATION 
 
 Diff: https://reviews.apache.org/r/25468/diff/
 
 
 Testing
 ---
 
 Unit test
 
 
 Thanks,
 
 cheng xu
 




Re: Review Request 25468: HIVE-7777: add CSVSerde support

2014-09-09 Thread cheng xu


 On Sept. 9, 2014, 8:49 a.m., Lars Francke wrote:
  serde/src/java/org/apache/hadoop/hive/serde2/CSVSerde.java, line 151
  https://reviews.apache.org/r/25468/diff/1/?file=683467#file683467line151
 
  I don't quite get this comment. Looking at the two CSVReader 
  constructors they seem to do the same in this case. From how I understand 
  it this if-statement is not needed. Same for the newWriter method.
  
  Maybe I'm missing something?

The CSVParser will do a check work if the separator, quotechar or escape is the 
same. If so, it will throw an exception. For this reason, we have to replace 
with CSVParser.DEFAULT_ESCAPE_CHARACTER('\') if the escape is 
DEFAULT_ESCAPE_CHARACTER('').


- cheng


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25468/#review52688
---


On Sept. 9, 2014, 2:16 a.m., cheng xu wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/25468/
 ---
 
 (Updated Sept. 9, 2014, 2:16 a.m.)
 
 
 Review request for hive.
 
 
 Bugs: HIVE-
 https://issues.apache.org/jira/browse/HIVE-
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 HIVE-: add CSVSerde support
 
 
 Diffs
 -
 
   serde/pom.xml f8bcc830cfb298d739819db8fbaa2f98f221ccf3 
   serde/src/java/org/apache/hadoop/hive/serde2/CSVSerde.java PRE-CREATION 
   serde/src/test/org/apache/hadoop/hive/serde2/TestCSVSerde.java PRE-CREATION 
 
 Diff: https://reviews.apache.org/r/25468/diff/
 
 
 Testing
 ---
 
 Unit test
 
 
 Thanks,
 
 cheng xu
 




Review Request 25468: HIVE-7777: add CSVSerde support

2014-09-08 Thread cheng xu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25468/
---

Review request for hive.


Bugs: HIVE-
https://issues.apache.org/jira/browse/HIVE-


Repository: hive-git


Description
---

HIVE-: add CSVSerde support


Diffs
-

  serde/pom.xml f8bcc830cfb298d739819db8fbaa2f98f221ccf3 
  serde/src/java/org/apache/hadoop/hive/serde2/CSVSerde.java PRE-CREATION 
  serde/src/test/org/apache/hadoop/hive/serde2/TestCSVSerde.java PRE-CREATION 

Diff: https://reviews.apache.org/r/25468/diff/


Testing
---

Unit test


Thanks,

cheng xu