Re: Review Request 25468: HIVE-7777: add CSVSerde support
On Sept. 9, 2014, 3:07 p.m., Brock Noland wrote: serde/pom.xml, line 73 https://reviews.apache.org/r/25468/diff/1/?file=683466#file683466line73 These should only be indented by two spaces, not four. Have you tried submitting an MR job on a cluster with this patch? The reason I ask is that I think the serde must be in here: https://github.com/apache/hive/blob/trunk/ql/pom.xml#L563 for it to be available to MR jobs. I think it does not need to add the class alone because org.apache.hive:hive-serde was already included. BTW, I do a test as the following steps: (1) create a table with the csv format: create table csv_table(a string, b string) row format serde 'org.apache.hadoop.hive.serde2.OpenCSVSerde' WITH SERDEPROPERTIES( separatorChar = ,, quoteChar = ', escapeChar= \ ) stored as textfile; (2) load data by: load data local inpath /root/workspace/data overwrite into table csv_table; (3) cat /root/workspace/data: aa,bb dd,cc (4) select a from csv_table: +-+--+ | a | +-+--+ | aa | | dd | +-+--+ If I am missing anything, please help figure it out. Thanks! - cheng --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25468/#review52723 --- On Sept. 9, 2014, 2:16 a.m., cheng xu wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25468/ --- (Updated Sept. 9, 2014, 2:16 a.m.) Review request for hive. Bugs: HIVE- https://issues.apache.org/jira/browse/HIVE- Repository: hive-git Description --- HIVE-: add CSVSerde support Diffs - serde/pom.xml f8bcc830cfb298d739819db8fbaa2f98f221ccf3 serde/src/java/org/apache/hadoop/hive/serde2/CSVSerde.java PRE-CREATION serde/src/test/org/apache/hadoop/hive/serde2/TestCSVSerde.java PRE-CREATION Diff: https://reviews.apache.org/r/25468/diff/ Testing --- Unit test Thanks, cheng xu
Re: Review Request 25468: HIVE-7777: add CSVSerde support
On Sept. 9, 2014, 8:49 a.m., Lars Francke wrote: Looks good apart from minor comments. Maybe add a test for the Serialization part? https://issues.apache.org/jira/browse/HIVE-5976 integration might be nice: STORED AS CSV. Unfortunately there's no documentation yet so I'm not sure if it's feasible. Good point! Why not file it in a new jira ticket as a future work? - cheng --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25468/#review52688 --- On Sept. 9, 2014, 2:16 a.m., cheng xu wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25468/ --- (Updated Sept. 9, 2014, 2:16 a.m.) Review request for hive. Bugs: HIVE- https://issues.apache.org/jira/browse/HIVE- Repository: hive-git Description --- HIVE-: add CSVSerde support Diffs - serde/pom.xml f8bcc830cfb298d739819db8fbaa2f98f221ccf3 serde/src/java/org/apache/hadoop/hive/serde2/CSVSerde.java PRE-CREATION serde/src/test/org/apache/hadoop/hive/serde2/TestCSVSerde.java PRE-CREATION Diff: https://reviews.apache.org/r/25468/diff/ Testing --- Unit test Thanks, cheng xu
Re: Review Request 25468: HIVE-7777: add CSVSerde support
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25468/#review52832 --- serde/src/java/org/apache/hadoop/hive/serde2/OpenCSVSerde.java https://reviews.apache.org/r/25468/#comment91999 Thanks for moving these out. Could you make them static? serde/src/java/org/apache/hadoop/hive/serde2/OpenCSVSerde.java https://reviews.apache.org/r/25468/#comment92000 no need to wrap this serde/src/java/org/apache/hadoop/hive/serde2/OpenCSVSerde.java https://reviews.apache.org/r/25468/#comment92001 missing spaces serde/src/test/org/apache/hadoop/hive/serde2/TestOpenCSVSerde.java https://reviews.apache.org/r/25468/#comment92003 There's a couple more puts in this file that can be replaced with setProperty - Lars Francke On Sept. 9, 2014, 2:16 a.m., cheng xu wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25468/ --- (Updated Sept. 9, 2014, 2:16 a.m.) Review request for hive. Bugs: HIVE- https://issues.apache.org/jira/browse/HIVE- Repository: hive-git Description --- HIVE-: add CSVSerde support Diffs - pom.xml 8973c2b52d0797d1f34859951de7349f7e5b996f serde/pom.xml f8bcc830cfb298d739819db8fbaa2f98f221ccf3 serde/src/java/org/apache/hadoop/hive/serde2/OpenCSVSerde.java PRE-CREATION serde/src/test/org/apache/hadoop/hive/serde2/TestOpenCSVSerde.java PRE-CREATION Diff: https://reviews.apache.org/r/25468/diff/ Testing --- Unit test Thanks, cheng xu
Re: Review Request 25468: HIVE-7777: add CSVSerde support
On Sept. 9, 2014, 8:49 a.m., Lars Francke wrote: serde/src/java/org/apache/hadoop/hive/serde2/CSVSerde.java, line 151 https://reviews.apache.org/r/25468/diff/1/?file=683467#file683467line151 I don't quite get this comment. Looking at the two CSVReader constructors they seem to do the same in this case. From how I understand it this if-statement is not needed. Same for the newWriter method. Maybe I'm missing something? cheng xu wrote: The CSVParser will do a check work if the separator, quotechar or escape is the same. If so, it will throw an exception. For this reason, we have to replace with CSVParser.DEFAULT_ESCAPE_CHARACTER('\') if the escape is DEFAULT_ESCAPE_CHARACTER(''). Ahh! I see now. That's a bit weird indeed. Thanks for the explanation. - Lars --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25468/#review52688 --- On Sept. 9, 2014, 2:16 a.m., cheng xu wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25468/ --- (Updated Sept. 9, 2014, 2:16 a.m.) Review request for hive. Bugs: HIVE- https://issues.apache.org/jira/browse/HIVE- Repository: hive-git Description --- HIVE-: add CSVSerde support Diffs - pom.xml 8973c2b52d0797d1f34859951de7349f7e5b996f serde/pom.xml f8bcc830cfb298d739819db8fbaa2f98f221ccf3 serde/src/java/org/apache/hadoop/hive/serde2/OpenCSVSerde.java PRE-CREATION serde/src/test/org/apache/hadoop/hive/serde2/TestOpenCSVSerde.java PRE-CREATION Diff: https://reviews.apache.org/r/25468/diff/ Testing --- Unit test Thanks, cheng xu
Re: Review Request 25468: HIVE-7777: add CSVSerde support
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25468/#review52883 --- This looks great! I think we are almost ready to commit. Can you add a new test (e.g. ql/src/test/queries/clientpositive/serde_csv.q) which runs a couple queries? e.g. ql/src/test/queries/clientpositive/serde_regex.q - Brock Noland On Sept. 9, 2014, 2:16 a.m., cheng xu wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25468/ --- (Updated Sept. 9, 2014, 2:16 a.m.) Review request for hive. Bugs: HIVE- https://issues.apache.org/jira/browse/HIVE- Repository: hive-git Description --- HIVE-: add CSVSerde support Diffs - pom.xml 8973c2b52d0797d1f34859951de7349f7e5b996f serde/pom.xml f8bcc830cfb298d739819db8fbaa2f98f221ccf3 serde/src/java/org/apache/hadoop/hive/serde2/OpenCSVSerde.java PRE-CREATION serde/src/test/org/apache/hadoop/hive/serde2/TestOpenCSVSerde.java PRE-CREATION Diff: https://reviews.apache.org/r/25468/diff/ Testing --- Unit test Thanks, cheng xu
Re: Review Request 25468: HIVE-7777: add CSVSerde support
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25468/#review52688 --- Looks good apart from minor comments. Maybe add a test for the Serialization part? https://issues.apache.org/jira/browse/HIVE-5976 integration might be nice: STORED AS CSV. Unfortunately there's no documentation yet so I'm not sure if it's feasible. serde/src/java/org/apache/hadoop/hive/serde2/CSVSerde.java https://reviews.apache.org/r/25468/#comment91646 This comment doesn't add value so I suggest removing it. serde/src/java/org/apache/hadoop/hive/serde2/CSVSerde.java https://reviews.apache.org/r/25468/#comment91647 * Constants is deprecated. Use serdeConstants instead * Exceeds maximum line length (100 chars) serde/src/java/org/apache/hadoop/hive/serde2/CSVSerde.java https://reviews.apache.org/r/25468/#comment91648 Unused serde/src/java/org/apache/hadoop/hive/serde2/CSVSerde.java https://reviews.apache.org/r/25468/#comment91651 Missing spaces around operators serde/src/java/org/apache/hadoop/hive/serde2/CSVSerde.java https://reviews.apache.org/r/25468/#comment91650 2 x Unnecessary this serde/src/java/org/apache/hadoop/hive/serde2/CSVSerde.java https://reviews.apache.org/r/25468/#comment91653 Missing spaces around operators serde/src/java/org/apache/hadoop/hive/serde2/CSVSerde.java https://reviews.apache.org/r/25468/#comment91669 I suggest moving these properties to Constants somewhere serde/src/java/org/apache/hadoop/hive/serde2/CSVSerde.java https://reviews.apache.org/r/25468/#comment91649 Method declared final in final class serde/src/java/org/apache/hadoop/hive/serde2/CSVSerde.java https://reviews.apache.org/r/25468/#comment91657 long line serde/src/java/org/apache/hadoop/hive/serde2/CSVSerde.java https://reviews.apache.org/r/25468/#comment91656 Missing spaces serde/src/java/org/apache/hadoop/hive/serde2/CSVSerde.java https://reviews.apache.org/r/25468/#comment91659 I don't quite get this comment. Looking at the two CSVReader constructors they seem to do the same in this case. From how I understand it this if-statement is not needed. Same for the newWriter method. Maybe I'm missing something? serde/src/java/org/apache/hadoop/hive/serde2/CSVSerde.java https://reviews.apache.org/r/25468/#comment91660 Missing @Override annotation serde/src/test/org/apache/hadoop/hive/serde2/TestCSVSerde.java https://reviews.apache.org/r/25468/#comment91661 Can be private too serde/src/test/org/apache/hadoop/hive/serde2/TestCSVSerde.java https://reviews.apache.org/r/25468/#comment91662 Properties.put should not be used. Use setProperty instead. Also Constants == deprecated - Lars Francke On Sept. 9, 2014, 2:16 a.m., cheng xu wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25468/ --- (Updated Sept. 9, 2014, 2:16 a.m.) Review request for hive. Bugs: HIVE- https://issues.apache.org/jira/browse/HIVE- Repository: hive-git Description --- HIVE-: add CSVSerde support Diffs - serde/pom.xml f8bcc830cfb298d739819db8fbaa2f98f221ccf3 serde/src/java/org/apache/hadoop/hive/serde2/CSVSerde.java PRE-CREATION serde/src/test/org/apache/hadoop/hive/serde2/TestCSVSerde.java PRE-CREATION Diff: https://reviews.apache.org/r/25468/diff/ Testing --- Unit test Thanks, cheng xu
Re: Review Request 25468: HIVE-7777: add CSVSerde support
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25468/#review52723 --- Great work! serde/pom.xml https://reviews.apache.org/r/25468/#comment91700 These should only be indented by two spaces, not four. Have you tried submitting an MR job on a cluster with this patch? The reason I ask is that I think the serde must be in here: https://github.com/apache/hive/blob/trunk/ql/pom.xml#L563 for it to be available to MR jobs. serde/src/java/org/apache/hadoop/hive/serde2/CSVSerde.java https://reviews.apache.org/r/25468/#comment91701 I think we should call this OpenCSVSerde since it's based on OpenCSV and I believe we might see multiple implementations of CSVSerde. I think we should extend AbstractSerde as that is what all the new Serdes are supposed to be doing. - Brock Noland On Sept. 9, 2014, 2:16 a.m., cheng xu wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25468/ --- (Updated Sept. 9, 2014, 2:16 a.m.) Review request for hive. Bugs: HIVE- https://issues.apache.org/jira/browse/HIVE- Repository: hive-git Description --- HIVE-: add CSVSerde support Diffs - serde/pom.xml f8bcc830cfb298d739819db8fbaa2f98f221ccf3 serde/src/java/org/apache/hadoop/hive/serde2/CSVSerde.java PRE-CREATION serde/src/test/org/apache/hadoop/hive/serde2/TestCSVSerde.java PRE-CREATION Diff: https://reviews.apache.org/r/25468/diff/ Testing --- Unit test Thanks, cheng xu
Re: Review Request 25468: HIVE-7777: add CSVSerde support
On Sept. 9, 2014, 8:49 a.m., Lars Francke wrote: serde/src/java/org/apache/hadoop/hive/serde2/CSVSerde.java, line 31 https://reviews.apache.org/r/25468/diff/1/?file=683467#file683467line31 This comment doesn't add value so I suggest removing it. Or you could expand the comment. - Lefty --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25468/#review52688 --- On Sept. 9, 2014, 2:16 a.m., cheng xu wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25468/ --- (Updated Sept. 9, 2014, 2:16 a.m.) Review request for hive. Bugs: HIVE- https://issues.apache.org/jira/browse/HIVE- Repository: hive-git Description --- HIVE-: add CSVSerde support Diffs - serde/pom.xml f8bcc830cfb298d739819db8fbaa2f98f221ccf3 serde/src/java/org/apache/hadoop/hive/serde2/CSVSerde.java PRE-CREATION serde/src/test/org/apache/hadoop/hive/serde2/TestCSVSerde.java PRE-CREATION Diff: https://reviews.apache.org/r/25468/diff/ Testing --- Unit test Thanks, cheng xu
Re: Review Request 25468: HIVE-7777: add CSVSerde support
On Sept. 9, 2014, 8:49 a.m., Lars Francke wrote: serde/src/java/org/apache/hadoop/hive/serde2/CSVSerde.java, line 151 https://reviews.apache.org/r/25468/diff/1/?file=683467#file683467line151 I don't quite get this comment. Looking at the two CSVReader constructors they seem to do the same in this case. From how I understand it this if-statement is not needed. Same for the newWriter method. Maybe I'm missing something? The CSVParser will do a check work if the separator, quotechar or escape is the same. If so, it will throw an exception. For this reason, we have to replace with CSVParser.DEFAULT_ESCAPE_CHARACTER('\') if the escape is DEFAULT_ESCAPE_CHARACTER(''). - cheng --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25468/#review52688 --- On Sept. 9, 2014, 2:16 a.m., cheng xu wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25468/ --- (Updated Sept. 9, 2014, 2:16 a.m.) Review request for hive. Bugs: HIVE- https://issues.apache.org/jira/browse/HIVE- Repository: hive-git Description --- HIVE-: add CSVSerde support Diffs - serde/pom.xml f8bcc830cfb298d739819db8fbaa2f98f221ccf3 serde/src/java/org/apache/hadoop/hive/serde2/CSVSerde.java PRE-CREATION serde/src/test/org/apache/hadoop/hive/serde2/TestCSVSerde.java PRE-CREATION Diff: https://reviews.apache.org/r/25468/diff/ Testing --- Unit test Thanks, cheng xu
Review Request 25468: HIVE-7777: add CSVSerde support
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25468/ --- Review request for hive. Bugs: HIVE- https://issues.apache.org/jira/browse/HIVE- Repository: hive-git Description --- HIVE-: add CSVSerde support Diffs - serde/pom.xml f8bcc830cfb298d739819db8fbaa2f98f221ccf3 serde/src/java/org/apache/hadoop/hive/serde2/CSVSerde.java PRE-CREATION serde/src/test/org/apache/hadoop/hive/serde2/TestCSVSerde.java PRE-CREATION Diff: https://reviews.apache.org/r/25468/diff/ Testing --- Unit test Thanks, cheng xu