[jira] [Commented] (HIVE-14679) csv2/tsv2 output format disables quoting by default and it's difficult to enable
[ https://issues.apache.org/jira/browse/HIVE-14679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17333091#comment-17333091 ]

Robert Zhang commented on HIVE-14679:
-------------------------------------

Reading the comments, the issue seems to have been resolved, but the status is still "open". Why?

> csv2/tsv2 output format disables quoting by default and it's difficult to enable
> ---------------------------------------------------------------------------------
>
>                 Key: HIVE-14679
>                 URL: https://issues.apache.org/jira/browse/HIVE-14679
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Brock Noland
>            Assignee: Jianguo Tian
>            Priority: Major
>         Attachments: HIVE-14769.1.patch, HIVE-14769.2.patch
>
> Over in HIVE-9788 we made quoting optional for csv2/tsv2.
> However I see the following issues:
> * JIRA doc doesn't mention it's disabled by default, this should be there and in the output of beeline help.
> * The JIRA says the property is {{--disableQuotingForSV}} but it's actually a system property. We should not use a system property as it's non-standard so extremely hard for users to set. For example I must do: {{env HADOOP_CLIENT_OPTS="-Ddisable.quoting.for.sv=false" beeline ...}}
> * The arg {{--disableQuotingForSV}} should be documented in beeline help.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Commented] (HIVE-14679) csv2/tsv2 output format disables quoting by default and it's difficult to enable
[ https://issues.apache.org/jira/browse/HIVE-14679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15593556#comment-15593556 ]

Jianguo Tian commented on HIVE-14679:
-------------------------------------

I have updated the latest patch on the Review Board. [~brocknoland], [~kennethmac2000], [~ngangam], could you please help review it? I look forward to your valuable feedback. Thanks a lot!
[jira] [Commented] (HIVE-14679) csv2/tsv2 output format disables quoting by default and it's difficult to enable
[ https://issues.apache.org/jira/browse/HIVE-14679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15588106#comment-15588106 ]

Jianguo Tian commented on HIVE-14679:
-------------------------------------

I have fixed this issue. You can check the code below:
{code:borderStyle=solid}
unquotedCsvPreference = new CsvPreference.Builder('\u0020', separator, "").surroundingSpacesNeedQuotes(true).build();
{code}
According to the API documentation of *CsvPreference.Builder*, the parameter of the *surroundingSpacesNeedQuotes* method is a flag "indicating whether spaces at the beginning or end of a cell should be ignored if they're not surrounded by quotes".
[jira] [Commented] (HIVE-14679) csv2/tsv2 output format disables quoting by default and it's difficult to enable
[ https://issues.apache.org/jira/browse/HIVE-14679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15587185#comment-15587185 ]

Jianguo Tian commented on HIVE-14679:
-------------------------------------

Agreed. The output really does look confusing and strange with the null character. Let me find a suitable solution. Thanks.
[jira] [Commented] (HIVE-14679) csv2/tsv2 output format disables quoting by default and it's difficult to enable
[ https://issues.apache.org/jira/browse/HIVE-14679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15585589#comment-15585589 ]

Kenneth MacArthur commented on HIVE-14679:
------------------------------------------

Commands like "more" choke on these null characters. View a CSV file with nulls instead of quotes and you'll see - the line is truncated. Even in "vi", you see some bizarre character that makes you think there's something wrong with the character set of the file. It's all very confusing (and, more importantly, time-wasting) for the user.

I would say user convenience should trump implementation convenience. ;) What do you say?
[jira] [Commented] (HIVE-14679) csv2/tsv2 output format disables quoting by default and it's difficult to enable
[ https://issues.apache.org/jira/browse/HIVE-14679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15584710#comment-15584710 ]

Jianguo Tian commented on HIVE-14679:
-------------------------------------

What you said about "not affect the csv2/tsv2 formats" is correct, and that is exactly what I'm working toward. Thanks for your input! Please wait for my updated patch.
[jira] [Commented] (HIVE-14679) csv2/tsv2 output format disables quoting by default and it's difficult to enable
[ https://issues.apache.org/jira/browse/HIVE-14679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15584702#comment-15584702 ]

Jianguo Tian commented on HIVE-14679:
-------------------------------------

Hi, [~Kenneth MacArthur]. It looks difficult to implement "there should simply be no quote character at all when quoting is disabled". As the code below shows, the first parameter of the *Builder* constructor is a character, and unfortunately Java has no empty character analogous to the empty String *""*.
{code:borderStyle=solid}
unquotedCsvPreference = new CsvPreference.Builder('\0', separator, "").build();
{code}
What do you think about this?
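The point about Java having no "empty" character can be illustrated with a small standard-library sketch. It does not use Super CSV or Beeline; the class `NulQuoteDemo` and its `wrap` method are hypothetical names introduced only for this illustration. It shows that '\0' is an ordinary character (code point 0) that occupies a position in the output, which is presumably why a value wrapped with it ends up carrying the NUL bytes described elsewhere in this thread.

```java
/** Hypothetical demo class; it does not use Super CSV or Beeline. */
final class NulQuoteDemo {

    /** Wraps a value in the given quote character, roughly as a quoting writer would. */
    static String wrap(String value, char quote) {
        return quote + value + quote;
    }

    public static void main(String[] args) {
        // '\0' is a real character, not "no character": the result is two chars longer.
        String wrapped = wrap("abc", '\0');
        System.out.println(wrapped.length());        // prints 5
        System.out.println((int) wrapped.charAt(0)); // prints 0
    }
}
```

Any approach that truly emits no quote character therefore has to skip the wrapping step itself rather than pick a different `char` value.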
[jira] [Commented] (HIVE-14679) csv2/tsv2 output format disables quoting by default and it's difficult to enable
[ https://issues.apache.org/jira/browse/HIVE-14679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15580939#comment-15580939 ]

Jianguo Tian commented on HIVE-14679:
-------------------------------------

Thanks for your suggestions. I have finished the part about "Disabling quoting should be possible using a beeline argument". Next, I'll address your third suggestion.
[jira] [Commented] (HIVE-14679) csv2/tsv2 output format disables quoting by default and it's difficult to enable
[ https://issues.apache.org/jira/browse/HIVE-14679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15572121#comment-15572121 ]

Ferdinand Xu commented on HIVE-14679:
-------------------------------------

bq. It optional for the original sv formats and not affect the csv2/tsv2 formats

You can see from the code for HIVE-9788 that quoting is disabled by default. I think it applies to csv2/tsv2, as the release note said, but it has been a while and this needs double-checking. [~JonnyR], can you please confirm this?
{noformat}
+  private boolean isQuotingDisabled() {
+    String quotingDisabledStr = System.getProperty(SeparatedValuesOutputFormat.DISABLE_QUOTING_FOR_SV);
+    if (quotingDisabledStr == null || quotingDisabledStr.isEmpty()) {
+      // default is disabling the double quoting for separated value
+      return true;
+    }
+    String parsedOptionStr = quotingDisabledStr.toLowerCase();
+    if (parsedOptionStr.equals("false") || parsedOptionStr.equals("true")) {
+      return Boolean.valueOf(parsedOptionStr);
+    } else {
+      beeLine.error("System Property disable.quoting.for.sv is now " + parsedOptionStr
+          + " which only accepts boolean value");
+      return true;
+    }
+  }
{noformat}
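To make the default behaviour in the quoted patch easier to see, here is a standalone sketch of the same decision logic. It is not the Hive class: `QuotingFlagParser` is a hypothetical name, the method takes the raw property value as an argument instead of reading it itself, and the error logging is left out. Only the property name `disable.quoting.for.sv` comes from the thread.

```java
import java.util.Locale;

/** Hypothetical stand-in for the parsing logic quoted above; not the Hive class. */
final class QuotingFlagParser {

    /** Returns true when quoting should be disabled, given the raw property value. */
    static boolean isQuotingDisabled(String raw) {
        if (raw == null || raw.isEmpty()) {
            return true; // property unset: quoting is disabled by default
        }
        String value = raw.toLowerCase(Locale.ROOT);
        if (value.equals("true") || value.equals("false")) {
            return Boolean.parseBoolean(value);
        }
        return true; // unrecognised value: fall back to the default
    }

    public static void main(String[] args) {
        // Run with -Ddisable.quoting.for.sv=false to see quoting switched on.
        System.out.println(isQuotingDisabled(System.getProperty("disable.quoting.for.sv")));
    }
}
```

Under this logic the only input that enables quoting is the literal value "false" (in any letter case); an unset, empty, or malformed value leaves quoting disabled.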
[jira] [Commented] (HIVE-14679) csv2/tsv2 output format disables quoting by default and it's difficult to enable
[ https://issues.apache.org/jira/browse/HIVE-14679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15572042#comment-15572042 ]

Naveen Gangam commented on HIVE-14679:
--------------------------------------

It's my understanding that quoting is NOT optional for csv2/tsv2 formats. These formats were introduced specifically to get rid of the quotes around the column values. Since we could not just change the original csv/tsv formats to not wrap values in quotes, for backward compatibility reasons, we had to introduce new output formats.
It's been a while, but I believe HIVE-9788 makes it optional for the original sv formats and does not affect the csv2/tsv2 formats. [~Ferd], please correct me if I am wrong.
[jira] [Commented] (HIVE-14679) csv2/tsv2 output format disables quoting by default and it's difficult to enable
[ https://issues.apache.org/jira/browse/HIVE-14679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15571892#comment-15571892 ]

Kenneth MacArthur commented on HIVE-14679:
------------------------------------------

Section 2.6 of RFC 4180 says: "Fields containing line breaks (CRLF), double quotes, and commas should be enclosed in double-quotes."

It seems strange, then, to disable quoting for the csv2 output format by default.

What's also strange is that when quoting is disabled, values are in fact still 'quoted' with a null character (00), rather than no character at all (as described in [~ngangam]'s comment on HIVE-9788). This doesn't appear to be mentioned anywhere in RFC 4180.

May I suggest that:
- Quoting should be enabled by default for csv2, tsv2 and dsv.
- Disabling quoting should be possible using a beeline argument.
- Disabling quoting should not result in the output of a null character in place of a visible quote - there should simply be no quote character at all in this case.
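The RFC 4180 rule cited above can be sketched in a few lines of standard-library Java. This is only an illustration of the rule, not Beeline's or Super CSV's implementation; the class `Rfc4180Quoting` and its methods are hypothetical names. A field is quoted only when it contains a comma, a double quote, or a line break, and embedded double quotes are doubled as section 2.7 of the RFC requires.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper illustrating RFC 4180 quoting; not part of Beeline or Super CSV. */
final class Rfc4180Quoting {

    /** Quotes a field only when it contains a comma, a double quote, CR or LF. */
    static String quoteIfNeeded(String field) {
        boolean needsQuotes = field.indexOf(',') >= 0
                || field.indexOf('"') >= 0
                || field.indexOf('\r') >= 0
                || field.indexOf('\n') >= 0;
        if (!needsQuotes) {
            return field;
        }
        // Embedded double quotes are escaped by doubling them (RFC 4180, section 2.7).
        return '"' + field.replace("\"", "\"\"") + '"';
    }

    /** Joins the fields into one comma-separated record. */
    static String toCsvLine(List<String> fields) {
        return fields.stream()
                .map(Rfc4180Quoting::quoteIfNeeded)
                .collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        // prints 1,"a,b","say ""hi"""
        System.out.println(toCsvLine(List.of("1", "a,b", "say \"hi\"")));
    }
}
```

With this approach plain values stay unquoted, so the output stays close to what the quote-free formats aimed for while remaining parseable when a value contains the separator.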
[jira] [Commented] (HIVE-14679) csv2/tsv2 output format disables quoting by default and it's difficult to enable
[ https://issues.apache.org/jira/browse/HIVE-14679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15454756#comment-15454756 ]

Ferdinand Xu commented on HIVE-14679:
-------------------------------------

Thanks Brock, I will take a look.