[jira] [Commented] (HIVE-14798) MSCK REPAIR TABLE throws null pointer exception

2017-02-23 Thread Kenneth MacArthur (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15880356#comment-15880356
 ] 

Kenneth MacArthur commented on HIVE-14798:
--

We certainly saw this in Dataproc.

> MSCK REPAIR TABLE throws null pointer exception
> ---
>
> Key: HIVE-14798
> URL: https://issues.apache.org/jira/browse/HIVE-14798
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 2.1.0
>Reporter: Anbu Cheeralan
>
> MSCK REPAIR TABLE statement throws null pointer exception in Hive 2.1
> I have tested the same against external/internal tables created both in HDFS 
> and in Google Cloud.
> The error shown in beeline/sql client 
> Error: Error while processing statement: FAILED: Execution Error, return code 
> 1 from org.apache.hadoop.hive.ql.exec.DDLTask (state=08S01,code=1)
> Hive Logs:
> 2016-09-20T17:28:00,717 ERROR [HiveServer2-Background-Pool: Thread-92]: 
> metadata.HiveMetaStoreChecker (:()) - java.lang.NullPointerException
> 2016-09-20T17:28:00,717 WARN  [HiveServer2-Background-Pool: Thread-92]: 
> exec.DDLTask (:()) - Failed to run metacheck: 
> org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.getAllLeafDirs(HiveMetaStoreChecker.java:444)
> at 
> org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.getAllLeafDirs(HiveMetaStoreChecker.java:388)
> at 
> org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.findUnknownPartitions(HiveMetaStoreChecker.java:309)
> at 
> org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.checkTable(HiveMetaStoreChecker.java:285)
> at 
> org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.checkTable(HiveMetaStoreChecker.java:230)
> at 
> org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.checkMetastore(HiveMetaStoreChecker.java:109)
> at org.apache.hadoop.hive.ql.exec.DDLTask.msck(DDLTask.java:1814)
> at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:403)
> at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:197)
>  at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
> at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1858)
> at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1562)
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1313)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1084)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1077)
> at 
> org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:235)
> at 
> org.apache.hive.service.cli.operation.SQLOperation.access$300(SQLOperation.java:90)
> at 
> org.apache.hive.service.cli.operation.SQLOperation$2$1.run(SQLOperation.java:299)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
> at 
> org.apache.hive.service.cli.operation.SQLOperation$2.run(SQLOperation.java:312)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.NullPointerException
> at 
> java.util.concurrent.ConcurrentHashMap.putVal(ConcurrentHashMap.java:1011)
> at 
> java.util.concurrent.ConcurrentHashMap.put(ConcurrentHashMap.java:1006)
> at 
> org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker$1.call(HiveMetaStoreChecker.java:432)
> at 
> org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker$1.call(HiveMetaStoreChecker.java:418)
> ... 4 more
> Here are the steps to recreate this issue:
> use default;
> DROP TABLE IF EXISTS repairtable;
> CREATE TABLE repairtable(col STRING) PARTITIONED BY (p1 STRING, p2 STRING);
> MSCK REPAIR TABLE default.repairtable;



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-14679) csv2/tsv2 output format disables quoting by default and it's difficult to enable

2016-10-18 Thread Kenneth MacArthur (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15585589#comment-15585589
 ] 

Kenneth MacArthur commented on HIVE-14679:
--

Commands like "more" choke on these null characters. View a CSV file with nulls 
instead of quotes and you'll see - the line is truncated. Even in "vi", you see 
some bizarre character that makes you think there's something wrong with the 
character set of the file.

It's all very confusing (and, more importantly, time-wasting) for the user.

I would say user convenience should trump implementation convenience. ;)

What do you say?

> csv2/tsv2 output format disables quoting by default and it's difficult to 
> enable
> 
>
> Key: HIVE-14679
> URL: https://issues.apache.org/jira/browse/HIVE-14679
> Project: Hive
>  Issue Type: Bug
>Reporter: Brock Noland
>Assignee: Jianguo Tian
>
> Over in HIVE-9788 we made quoting optional for csv2/tsv2.
> However I see the following issues:
> * JIRA doc doesn't mention it's disabled by default, this should be there an 
> in the output of beeline help.
> * The JIRA says the property is {{--disableQuotingForSV}} but it's actually a 
> system property. We should not use a system property as it's non-standard so 
> extremely hard for users to set. For example I must do: {{env 
> HADOOP_CLIENT_OPTS="-Ddisable.quoting.for.sv=false" beeline ...}}
> * The arg {{--disableQuotingForSV}} should be documented in beeline help.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14679) csv2/tsv2 output format disables quoting by default and it's difficult to enable

2016-10-13 Thread Kenneth MacArthur (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15571892#comment-15571892
 ] 

Kenneth MacArthur commented on HIVE-14679:
--

Section 2.6 of RFC 4180 says:
"Fields containing line breaks (CRLF), double quotes, and commas should be 
enclosed in double-quotes."

It seems strange, then, to disable quoting for the csv2 output format by 
default.

What's also strange is that when quoting is disabled, values are in fact still 
'quoted' with a null character (00), rather than no character at all (as 
described in [~ngangam]'s comment on HIVE-9788). This doesn't appear to be 
mentioned anywhere in RFC 4180.

May I suggest that:
- Quoting should be enabled by default for csv2, tsv2 and dsv.
- Disabling quoting should be possible using a beeline argument.
- Disabling quoting should not result in the output of a null character in 
place of a visible quote - there should simply be no quote character at all in 
this case.

> csv2/tsv2 output format disables quoting by default and it's difficult to 
> enable
> 
>
> Key: HIVE-14679
> URL: https://issues.apache.org/jira/browse/HIVE-14679
> Project: Hive
>  Issue Type: Bug
>Reporter: Brock Noland
>Assignee: Jianguo Tian
>
> Over in HIVE-9788 we made quoting optional for csv2/tsv2.
> However I see the following issues:
> * JIRA doc doesn't mention it's disabled by default, this should be there an 
> in the output of beeline help.
> * The JIRA says the property is {{--disableQuotingForSV}} but it's actually a 
> system property. We should not use a system property as it's non-standard so 
> extremely hard for users to set. For example I must do: {{env 
> HADOOP_CLIENT_OPTS="-Ddisable.quoting.for.sv=false" beeline ...}}
> * The arg {{--disableQuotingForSV}} should be documented in beeline help.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)