Hi Keith,

Were you able to resolve this, or is this still an issue?
Thanks,
Shreepadma

On Tue, May 28, 2013 at 6:02 AM, Keith Wright <kwri...@nanigans.com> wrote:

> Hi all,
>
> This is my first post to the Hive mailing list and I was hoping to get
> some help with the exception I am getting below. I am using CDH4.2 (Hive
> 0.10.0) to query Snappy-compressed SequenceFiles that are built using
> Flume (the relevant portion of the Flume conf is below as well). Note
> that I'm using SequenceFiles because they were needed for Impala
> integration. Has anyone seen this error before? A few additional points
> to help diagnose:
>
> 1. Queries seem to be able to process some mappers without issues. In
>    fact, I can do a simple SELECT * FROM <table> LIMIT 10 without issue.
>    However, if I make the limit high enough it eventually fails,
>    presumably once it needs to read a file that has this issue.
> 2. The same query runs in Impala without errors but appears to "skip"
>    some data. I can confirm via a custom map/reduce job that the missing
>    data is present.
> 3. I am able to write a map/reduce job that reads through all of the
>    same data without issue and have been unable to identify any data
>    corruption.
> 4. This is a partitioned table, and queries fail that touch ANY of the
>    partitions (and there are hundreds), so this does not appear to be a
>    sporadic data-integrity problem (table definition below).
> 5. We are using '\001' as our field separator. We also capture other
>    data as Snappy-compressed SequenceFiles, but with '|' as the
>    delimiter, and we have no issues querying that data, although it
>    comes through a different Flume source.
>
> My next step for debugging was to disable Snappy compression and see if
> I could query the data; if not, switch from SequenceFile to plain text.
>
> I appreciate the help!!!
>
> CREATE EXTERNAL TABLE ORGANIC_EVENTS (
>   event_id BIGINT,
>   app_id INT,
>   user_id BIGINT,
>   type STRING,
>   name STRING,
>   value STRING,
>   extra STRING,
>   ip_address STRING,
>   user_agent STRING,
>   referrer STRING,
>   event_time BIGINT,
>   install_flag TINYINT,
>   first_for_user TINYINT,
>   cookie STRING)
> PARTITIONED BY (year INT, month INT, day INT, hour INT)
> ROW FORMAT DELIMITED
>   FIELDS TERMINATED BY '\001'
>   COLLECTION ITEMS TERMINATED BY '\002'
>   MAP KEYS TERMINATED BY '\003'
> STORED AS SEQUENCEFILE
> LOCATION '/events/organic';
>
> agent.sinks.exhaustHDFSSink3.type = HDFS
> agent.sinks.exhaustHDFSSink3.channel = exhaustFileChannel
> agent.sinks.exhaustHDFSSink3.hdfs.path = hdfs://lxscdh001.nanigans.com:8020%{path}
> agent.sinks.exhaustHDFSSink3.hdfs.filePrefix = 3.%{hostname}
> agent.sinks.exhaustHDFSSink3.hdfs.rollInterval = 0
> agent.sinks.exhaustHDFSSink3.hdfs.idleTimeout = 600
> agent.sinks.exhaustHDFSSink3.hdfs.rollSize = 0
> agent.sinks.exhaustHDFSSink3.hdfs.rollCount = 0
> agent.sinks.exhaustHDFSSink3.hdfs.batchSize = 5000
> agent.sinks.exhaustHDFSSink3.hdfs.txnEventMax = 5000
> agent.sinks.exhaustHDFSSink3.hdfs.fileType = SequenceFile
> agent.sinks.exhaustHDFSSink3.hdfs.maxOpenFiles = 100
> agent.sinks.exhaustHDFSSink3.hdfs.codeC = snappy
> agent.sinks.exhaustHDFSSink3.hdfs.writeFormat = Text
>
> 2013-05-28 12:29:00,919 WARN org.apache.hadoop.mapred.Child: Error running child
> java.io.IOException: java.io.IOException: java.lang.IndexOutOfBoundsException
>     at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
>     at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
>     at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:330)
>     at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.next(HadoopShimsSecure.java:246)
>     at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:216)
>     at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:201)
>     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:418)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:333)
>     at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:396)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
>     at org.apache.hadoop.mapred.Child.main(Child.java:262)
> Caused by: java.io.IOException: java.lang.IndexOutOfBoundsException
>     at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
>     at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
>     at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:276)
>     at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:101)
>     at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:41)
>     at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:108)
>     at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:328)
>     ... 11 more
> Caused by: java.lang.IndexOutOfBoundsException
>     at java.io.DataInputStream.readFully(DataInputStream.java:175)
>     at org.apache.hadoop.io.Text.readFields(Text.java:284)
>     at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:73)
>     at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:44)
>     at org.apache.hadoop.io.SequenceFile$Reader.deserializeValue(SequenceFile.java:2180)
>     at org.apache.hadoop.io.SequenceFile$Reader.getCurrentValue(SequenceFile.java:2164)
>     at org.apache.hadoop.mapred.SequenceFileRecordReader.getCurrentValue(SequenceFileRecordReader.java:103)
>     at org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:78)
>     at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:274)
>     ... 15 more
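
For readers hitting the same trace: the innermost frames (Text.readFields calling DataInputStream.readFully) suggest a corrupt or misread value-length prefix, i.e. a record-framing or decompression problem rather than bad field contents, which would also be consistent with Impala silently skipping the same records. A standalone reader along the lines of the custom map/reduce job Keith mentions in point 3 can isolate the first unreadable record. The sketch below is not code from the thread; it assumes LongWritable keys and Text values (what the Flume HDFS sink emits with hdfs.writeFormat = Text), that the Snappy codec is on the classpath, and the class name SeqFileScan is made up.

// SeqFileScan.java -- minimal standalone scanner for one of the table's
// SequenceFiles. Hypothetical sketch; the key/value types are an assumption
// based on the hdfs.writeFormat = Text setting in the Flume conf above.
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class SeqFileScan {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        Path path = new Path(args[0]); // one file under /events/organic/...
        FileSystem fs = path.getFileSystem(conf);

        LongWritable key = new LongWritable(); // Flume's event timestamp
        Text value = new Text();               // the delimited event body
        long records = 0;

        SequenceFile.Reader reader = new SequenceFile.Reader(fs, path, conf);
        try {
            // next() deserializes both key and value; this is the same path
            // that throws IndexOutOfBoundsException in the Hive stack trace.
            while (reader.next(key, value)) {
                records++;
            }
            System.out.println(path + ": read " + records + " records cleanly");
        } catch (Exception e) {
            // Reporting the last good record narrows down where the stored
            // length prefix and the actual bytes stop agreeing.
            System.err.println(path + ": failed after record " + records + ": " + e);
        } finally {
            reader.close();
        }
    }
}

Running this (e.g. via hadoop jar) against files from a failing partition separates the two suspects: if every file scans cleanly this way while the Hive query still dies, the problem more likely lies in how the combined splits are read (note the CombineFileRecordReader frames in the trace) than in the files themselves.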