Unfortunately, I can't provide more information. I got this file from our tester, and he has already dropped the table.

On Thu, Aug 4, 2016 at 9:16 PM, Prasanth Jayachandran <pjayachand...@hortonworks.com> wrote:

> Hi
>
> In case of streaming, when a transaction is open, the ORC file is not closed
> and hence may not be flushed completely. Did the transaction commit
> successfully? Or was there any exception thrown during writes/commit?
>
> Thanks
> Prasanth
>
> On Aug 3, 2016, at 6:09 AM, Igor Kuzmenko <f1she...@gmail.com> wrote:
>
> Hello, I've got a malformed ORC file in my Hive table. The file was created by
> the Hive Streaming API and I have no idea under what circumstances it
> became corrupted.
>
> File on Google Drive: link
> <https://drive.google.com/file/d/0ByB92PAoAkrKeFFZRUN4WWVQY1U/view?usp=sharing>
>
> Exception message when trying to perform a select from the table:
>
> ERROR : Vertex failed, vertexName=Map 1,
> vertexId=vertex_1468498236400_1106_6_00,
> diagnostics=[Task failed, taskId=task_1468498236400_1106_6_00_000000,
> diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running
> task:java.lang.RuntimeException: java.lang.RuntimeException:
> java.io.IOException: org.apache.hadoop.hive.ql.io.FileFormatException:
> Malformed ORC file hdfs://sorm-master01.msk.mts.ru:8020/apps/hive/warehouse/pstn_connections/dt=20160711/directory_number_last_digit=5/delta_71700156_71700255/bucket_00000.
> Invalid postscript length 0
>     at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:173)
>     at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:139)
>     at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:344)
>     at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:181)
>     at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:172)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>     at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:172)
>     at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:168)
>     at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.RuntimeException: java.io.IOException:
> org.apache.hadoop.hive.ql.io.FileFormatException: Malformed ORC file
> hdfs://sorm-master01.msk.mts.ru:8020/apps/hive/warehouse/pstn_connections/dt=20160711/directory_number_last_digit=5/delta_71700156_71700255/bucket_00000.
> Invalid postscript length 0
>     at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:196)
>     at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(TezGroupedSplitsInputFormat.java:142)
>     at org.apache.tez.mapreduce.lib.MRReaderMapred.next(MRReaderMapred.java:113)
>     at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:61)
>     at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:326)
>     at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:150)
>     ... 14 more
> Caused by: java.io.IOException:
> org.apache.hadoop.hive.ql.io.FileFormatException: Malformed ORC file
> hdfs://sorm-master01.msk.mts.ru:8020/apps/hive/warehouse/pstn_connections/dt=20160711/directory_number_last_digit=5/delta_71700156_71700255/bucket_00000.
> Invalid postscript length 0
>     at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
>     at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
>     at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:251)
>     at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:193)
>     ... 19 more
> Caused by: org.apache.hadoop.hive.ql.io.FileFormatException: Malformed
> ORC file
> hdfs://sorm-master01.msk.mts.ru:8020/apps/hive/warehouse/pstn_connections/dt=20160711/directory_number_last_digit=5/delta_71700156_71700255/bucket_00000.
> Invalid postscript length 0
>     at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.ensureOrcFooter(ReaderImpl.java:236)
>     at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.extractMetaInfoFromFooter(ReaderImpl.java:376)
>     at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.<init>(ReaderImpl.java:317)
>     at org.apache.hadoop.hive.ql.io.orc.OrcFile.createReader(OrcFile.java:238)
>     at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getReader(OrcInputFormat.java:1259)
>     at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getRecordReader(OrcInputFormat.java:1151)
>     at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:249)
>     ... 20 more
>
> Has anyone encountered such a situation?
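
For anyone who hits the same "Invalid postscript length 0" message and still has the file: as far as I understand the ORC layout, the last byte of the file stores the length of the postscript, and the postscript itself ends with the magic string "ORC". A file that was never closed or fully flushed would not have that tail. Below is a minimal Python sketch of that check, not Hive's actual reader code. It assumes the bucket file has first been copied to local disk (for example with `hdfs dfs -get`); the function names are made up for illustration, and very old ORC files may carry the magic only at the start of the file.

```python
import os

ORC_MAGIC = b"ORC"  # magic string that ends the postscript in current ORC files


def read_postscript_length(path):
    """Return the value of the file's last byte, or None for an empty file.

    ORC readers interpret this byte as the postscript length.
    """
    if os.path.getsize(path) == 0:
        return None
    with open(path, "rb") as f:
        f.seek(-1, os.SEEK_END)
        return f.read(1)[0]


def has_plausible_orc_tail(path):
    """Rough check (hypothetical helper) that the file ends like a closed ORC file."""
    size = os.path.getsize(path)
    ps_len = read_postscript_length(path)
    # The postscript must at least be long enough to hold the magic string.
    if ps_len is None or ps_len < len(ORC_MAGIC) + 1:
        return False
    # The file must be large enough to contain the postscript plus the length byte.
    if size < ps_len + 1:
        return False
    # The bytes just before the length byte should be the magic string.
    with open(path, "rb") as f:
        f.seek(size - 1 - len(ORC_MAGIC))
        return f.read(len(ORC_MAGIC)) == ORC_MAGIC
```

If `read_postscript_length` returns 0 for a delta file, that would be consistent with a writer that never closed the file, which is the situation Prasanth describes above.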