[jira] [Commented] (HIVE-14798) MSCK REPAIR TABLE throws null pointer exception
    [ https://issues.apache.org/jira/browse/HIVE-14798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15880356#comment-15880356 ]

Kenneth MacArthur commented on HIVE-14798:
------------------------------------------

We certainly saw this in Dataproc.

> MSCK REPAIR TABLE throws null pointer exception
> -----------------------------------------------
>
>                 Key: HIVE-14798
>                 URL: https://issues.apache.org/jira/browse/HIVE-14798
>             Project: Hive
>          Issue Type: Bug
>          Components: Metastore
>    Affects Versions: 2.1.0
>            Reporter: Anbu Cheeralan
>
> MSCK REPAIR TABLE statement throws null pointer exception in Hive 2.1
> I have tested the same against external/internal tables created both in HDFS
> and in Google Cloud.
> The error shown in beeline/sql client
> Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask (state=08S01,code=1)
> Hive Logs:
> 2016-09-20T17:28:00,717 ERROR [HiveServer2-Background-Pool: Thread-92]: metadata.HiveMetaStoreChecker (:()) - java.lang.NullPointerException
> 2016-09-20T17:28:00,717 WARN [HiveServer2-Background-Pool: Thread-92]: exec.DDLTask (:()) - Failed to run metacheck:
> org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NullPointerException
>     at org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.getAllLeafDirs(HiveMetaStoreChecker.java:444)
>     at org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.getAllLeafDirs(HiveMetaStoreChecker.java:388)
>     at org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.findUnknownPartitions(HiveMetaStoreChecker.java:309)
>     at org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.checkTable(HiveMetaStoreChecker.java:285)
>     at org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.checkTable(HiveMetaStoreChecker.java:230)
>     at org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.checkMetastore(HiveMetaStoreChecker.java:109)
>     at org.apache.hadoop.hive.ql.exec.DDLTask.msck(DDLTask.java:1814)
>     at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:403)
>     at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:197)
>     at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
>     at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1858)
>     at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1562)
>     at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1313)
>     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1084)
>     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1077)
>     at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:235)
>     at org.apache.hive.service.cli.operation.SQLOperation.access$300(SQLOperation.java:90)
>     at org.apache.hive.service.cli.operation.SQLOperation$2$1.run(SQLOperation.java:299)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>     at org.apache.hive.service.cli.operation.SQLOperation$2.run(SQLOperation.java:312)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.NullPointerException
>     at java.util.concurrent.ConcurrentHashMap.putVal(ConcurrentHashMap.java:1011)
>     at java.util.concurrent.ConcurrentHashMap.put(ConcurrentHashMap.java:1006)
>     at org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker$1.call(HiveMetaStoreChecker.java:432)
>     at org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker$1.call(HiveMetaStoreChecker.java:418)
>     ... 4 more
> Here are the steps to recreate this issue:
> use default;
> DROP TABLE IF EXISTS repairtable;
> CREATE TABLE repairtable(col STRING) PARTITIONED BY (p1 STRING, p2 STRING);
> MSCK REPAIR TABLE default.repairtable;

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
[jira] [Commented] (HIVE-14679) csv2/tsv2 output format disables quoting by default and it's difficult to enable
    [ https://issues.apache.org/jira/browse/HIVE-14679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15585589#comment-15585589 ]

Kenneth MacArthur commented on HIVE-14679:
------------------------------------------

Commands like "more" choke on these null characters. View a CSV file with nulls instead of quotes and you'll see that the line is truncated. Even in "vi", you see some bizarre character that makes you think there's something wrong with the character set of the file. It's all very confusing (and, more importantly, time-wasting) for the user.

I would say user convenience should trump implementation convenience. ;) What do you say?

> csv2/tsv2 output format disables quoting by default and it's difficult to enable
> --------------------------------------------------------------------------------
>
>                 Key: HIVE-14679
>                 URL: https://issues.apache.org/jira/browse/HIVE-14679
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Brock Noland
>            Assignee: Jianguo Tian
>
> Over in HIVE-9788 we made quoting optional for csv2/tsv2.
> However I see the following issues:
> * JIRA doc doesn't mention it's disabled by default; this should be there and in the output of beeline help.
> * The JIRA says the property is {{--disableQuotingForSV}} but it's actually a system property. We should not use a system property as it's non-standard so extremely hard for users to set. For example I must do: {{env HADOOP_CLIENT_OPTS="-Ddisable.quoting.for.sv=false" beeline ...}}
> * The arg {{--disableQuotingForSV}} should be documented in beeline help.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
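To illustrate the complaint about null characters, here is a minimal Python sketch of how NUL bytes in an output line can be detected and made visible. The sample line is a hypothetical stand-in, not captured beeline output, and the helper names `count_nul_bytes` and `nul_to_quote` are made up for this example.

```python
# Hypothetical sample: a CSV line in which NUL bytes (0x00) stand in for the
# quote character. This is an illustrative assumption, not real beeline output.
sample = b'1,\x00hello, world\x00,3\n'


def count_nul_bytes(data: bytes) -> int:
    """Return the number of NUL (0x00) bytes in data."""
    return data.count(b"\x00")


def nul_to_quote(data: bytes) -> bytes:
    """Replace every NUL byte with a visible double quote."""
    return data.replace(b"\x00", b'"')


print(count_nul_bytes(sample))
print(nul_to_quote(sample).decode("ascii"), end="")
```

A blind replacement like this is only a viewing aid: it does not escape double quotes that are already inside a field, so it should not be treated as a substitute for proper quoting at the source.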
[jira] [Commented] (HIVE-14679) csv2/tsv2 output format disables quoting by default and it's difficult to enable
    [ https://issues.apache.org/jira/browse/HIVE-14679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15571892#comment-15571892 ]

Kenneth MacArthur commented on HIVE-14679:
------------------------------------------

Section 2, item 6 of RFC 4180 says: "Fields containing line breaks (CRLF), double quotes, and commas should be enclosed in double-quotes." It seems strange, then, to disable quoting for the csv2 output format by default.

What's also strange is that when quoting is disabled, values are in fact still 'quoted' with a null character (0x00), rather than no character at all (as described in [~ngangam]'s comment on HIVE-9788). This doesn't appear to be mentioned anywhere in RFC 4180.

May I suggest that:
- Quoting should be enabled by default for csv2, tsv2 and dsv.
- Disabling quoting should be possible using a beeline argument.
- Disabling quoting should not result in the output of a null character in place of a visible quote; there should simply be no quote character at all in this case.
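For reference, the quoting behaviour that RFC 4180 describes can be sketched with Python's standard `csv` module. This only illustrates the rule quoted in the comment and says nothing about how beeline implements its csv2 format; the helper name `to_csv_line` and the sample row are assumptions made for the example.

```python
import csv
import io


def to_csv_line(fields):
    """Serialize one row, quoting only the fields that need it."""
    buf = io.StringIO()
    # QUOTE_MINIMAL quotes fields containing the delimiter, the quote
    # character, or a line-terminator character; embedded double quotes
    # are escaped by doubling them.
    writer = csv.writer(buf, quoting=csv.QUOTE_MINIMAL, lineterminator="\r\n")
    writer.writerow(fields)
    return buf.getvalue()


# Hypothetical sample row, not taken from the issue.
row = ["1", "hello, world", 'say "hi"', "plain"]
print(to_csv_line(row), end="")
```

Fields without special characters are left bare, so quoting by default does not clutter simple output.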