Hi Rajesh, sure, I'll give that setting a try. Thanks. Re: s3 vs. hdfs: indeed. I figured I'd eliminate the s3 angle when posting here, given that msck repair table failed in both cases. But yeah, my real use case is s3.
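For anyone following along, applying that at the session level (assuming hive.mv.files.thread isn't restricted to hive-site.xml) is just:

hive> set hive.mv.files.thread=0;
hive> msck repair table foo;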
OK, just tried that setting and got a slightly different stack trace, but the end result was still the NPE. It's a strange one.

Cheers,
Stephen.

2016-07-15T03:13:08,102 DEBUG [main]: parse.ParseDriver (:()) - Parse Completed
2016-07-15T03:13:08,119 INFO  [main]: ql.Driver (:()) - Semantic Analysis Completed
2016-07-15T03:13:08,119 INFO  [main]: ql.Driver (:()) - Returning Hive schema: Schema(fieldSchemas:null, properties:null)
2016-07-15T03:13:08,119 INFO  [main]: metadata.Hive (:()) - Dumping metastore api call timing information for : compilation phase
2016-07-15T03:13:08,119 DEBUG [main]: metadata.Hive (:()) - Total time spent in each metastore function (ms): {isCompatibleWith_(HiveConf, )=0, getTable_(String, String, )=16, flushCache_()=0}
2016-07-15T03:13:08,119 INFO  [main]: ql.Driver (:()) - Completed compiling command(queryId=ubuntu_20160715031308_bdf29227-ee7e-417f-834d-dae397d4eb9b); Time taken: 0.018 seconds
2016-07-15T03:13:08,119 INFO  [main]: ql.Driver (:()) - Executing command(queryId=ubuntu_20160715031308_bdf29227-ee7e-417f-834d-dae397d4eb9b): msck repair table foo
2016-07-15T03:13:08,119 INFO  [main]: ql.Driver (:()) - Starting task [Stage-0:DDL] in serial mode
2016-07-15T03:13:08,138 DEBUG [main]: ipc.Client (:()) - The ping interval is 60000 ms.
2016-07-15T03:13:08,138 DEBUG [main]: ipc.Client (:()) - Connecting to /10.12.15.12:8020
2016-07-15T03:13:08,140 DEBUG [IPC Parameter Sending Thread #3]: ipc.Client (:()) - IPC Client (1990733619) connection to /10.12.15.12:8020 from ubuntu sending #35
2016-07-15T03:13:08,138 DEBUG [IPC Client (1990733619) connection to /10.12.15.12:8020 from ubuntu]: ipc.Client (:()) - IPC Client (1990733619) connection to /10.12.15.12:8020 from ubuntu: starting, having connections 1
2016-07-15T03:13:08,140 DEBUG [IPC Client (1990733619) connection to /10.12.15.12:8020 from ubuntu]: ipc.Client (:()) - IPC Client (1990733619) connection to /10.12.15.12:8020 from ubuntu got value #35
2016-07-15T03:13:08,144 DEBUG [main]: ipc.ProtobufRpcEngine (:()) - Call: getFileInfo took 7ms
2016-07-15T03:13:08,144 DEBUG [main]: metadata.HiveMetaStoreChecker (:()) - *Not-using threaded version of MSCK-GetPaths*
2016-07-15T03:13:08,144 DEBUG [IPC Parameter Sending Thread #3]: ipc.Client (:()) - IPC Client (1990733619) connection to /10.12.15.12:8020 from ubuntu sending #36
2016-07-15T03:13:08,145 DEBUG [IPC Client (1990733619) connection to /10.12.15.12:8020 from ubuntu]: ipc.Client (:()) - IPC Client (1990733619) connection to /10.12.15.12:8020 from ubuntu got value #36
2016-07-15T03:13:08,145 DEBUG [main]: ipc.ProtobufRpcEngine (:()) - Call: getListing took 1ms
2016-07-15T03:13:08,146 ERROR [main]: exec.DDLTask (:()) - java.lang.NullPointerException
        at java.util.concurrent.ConcurrentHashMap.putVal(ConcurrentHashMap.java:1011)
        at java.util.concurrent.ConcurrentHashMap.put(ConcurrentHashMap.java:1006)
        at org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.getAllLeafDirs(HiveMetaStoreChecker.java:409)
        at org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.getAllLeafDirs(HiveMetaStoreChecker.java:388)
        at org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.findUnknownPartitions(HiveMetaStoreChecker.java:309)
        at org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.checkTable(HiveMetaStoreChecker.java:285)
        at org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.checkTable(HiveMetaStoreChecker.java:230)
        at org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.checkMetastore(HiveMetaStoreChecker.java:109)
        at org.apache.hadoop.hive.ql.exec.DDLTask.msck(DDLTask.java:1814)
        at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:403)
        at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:197)
        at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
        at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1858)
        at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1562)
        at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1313)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1084)
table ddl:

CREATE EXTERNAL TABLE `foo`(
  `a` int)
PARTITIONED BY (
  `date_key` bigint)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.SequenceFileInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat'
LOCATION
  'hdfs://10.12.15.12:8020/tmp/foo'
TBLPROPERTIES (
  'transient_lastDdlTime'='1468469502')

On Thu, Jul 14, 2016 at 6:55 PM, Rajesh Balamohan <rajesh.balamo...@gmail.com> wrote:

> Hi Stephen,
>
> Can you try turning off the multi-threaded approach by setting
> "hive.mv.files.thread=0"? You mentioned that your tables are in s3,
> but the external table created was pointing to HDFS. Was that intentional?
>
> ~Rajesh.B
>
> On Fri, Jul 15, 2016 at 6:58 AM, Stephen Sprague <sprag...@gmail.com> wrote:
>
>> In the meantime, given my tables are in s3, I've written a utility to do an
>> 'aws s3 ls' on the bucket and folder in question, change the folder syntax
>> to partition syntax, and then issue my own 'alter table ... add partition'
>> for each partition.
>>
>> So essentially it does what msck repair table does, but in a non-portable
>> way. Oh well, gotta do what ya gotta do.
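>>
>> Roughly, the utility boils down to something like this (the bucket name and
>> output file are made up for illustration, the parsing of the 'aws s3 ls'
>> output is approximate, and it assumes the folders are already named in
>> partition syntax, e.g. date_key=20160714):
>>
>> # list the partition "folders" under the table location and turn each one
>> # into an ALTER TABLE ... ADD PARTITION statement, then run the lot
>> aws s3 ls "s3://my-bucket/tmp/foo/" | awk '$1 == "PRE" {print $2}' |
>> while read -r part; do
>>   part="${part%/}"    # e.g. date_key=20160714
>>   echo "ALTER TABLE foo ADD IF NOT EXISTS PARTITION (${part%%=*}='${part#*=}');"
>> done > add_partitions.hql
>> hive -f add_partitions.hql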
>> On Wed, Jul 13, 2016 at 9:29 PM, Stephen Sprague <sprag...@gmail.com> wrote:
>>
>>> Hey guys,
>>> I'm using hive version 2.1.0 and I can't seem to get msck repair table
>>> to work. No matter what I try I get the ol' NPE. I've set the log level
>>> to 'DEBUG' but I'm still not seeing any smoking gun.
>>>
>>> Would anyone here have any pointers or suggestions to figure out what's
>>> going wrong?
>>>
>>> thanks,
>>> Stephen.
>>>
>>> hive> create external table foo (a int) partitioned by (date_key bigint) location 'hdfs:/tmp/foo';
>>> OK
>>> Time taken: 3.359 seconds
>>>
>>> hive> msck repair table foo;
>>> FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
>>>
>>> from the log...
>>>
>>> 2016-07-14T04:08:02,431 DEBUG [MSCK-GetPaths-1]: httpclient.RestStorageService (:()) - Found 13 objects in one batch
>>> 2016-07-14T04:08:02,431 DEBUG [MSCK-GetPaths-1]: httpclient.RestStorageService (:()) - Found 0 common prefixes in one batch
>>> 2016-07-14T04:08:02,433 ERROR [main]: metadata.HiveMetaStoreChecker (:()) - java.lang.NullPointerException
>>> 2016-07-14T04:08:02,434 WARN  [main]: exec.DDLTask (:()) - Failed to run metacheck:
>>> org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NullPointerException
>>>         at org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.getAllLeafDirs(HiveMetaStoreChecker.java:444)
>>>         at org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.getAllLeafDirs(HiveMetaStoreChecker.java:448)
>>>         at org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.getAllLeafDirs(HiveMetaStoreChecker.java:388)
>>>         at org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.findUnknownPartitions(HiveMetaStoreChecker.java:309)
>>>         at org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.checkTable(HiveMetaStoreChecker.java:285)
>>>         at org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.checkTable(HiveMetaStoreChecker.java:230)
>>>         at org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.checkMetastore(HiveMetaStoreChecker.java:109)
>>>         at org.apache.hadoop.hive.ql.exec.DDLTask.msck(DDLTask.java:1814)
>>>         at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:403)
>>>         at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:197)
>>>         at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
>>>         at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1858)
>>>         at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1562)
>>>         at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1313)
>>>         at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1084)
>>>         at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1072)
>>>         at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:232)
>>>         at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:183)
>>>         at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:399)
>>>         at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:776)
>>>         at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:714)
>>>         at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:641)
>>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>
>
> --
> ~Rajesh.B