Here is the stack trace...

Caused by: java.io.EOFException
        at java.io.DataInputStream.readByte(DataInputStream.java:267)
        at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:308)
        at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:329)
        at org.apache.hadoop.io.SequenceFile$Reader.readBuffer(SequenceFile.java:2072)
        at org.apache.hadoop.io.SequenceFile$Reader.seekToCurrentValue(SequenceFile.java:2139)
        at org.apache.hadoop.io.SequenceFile$Reader.getCurrentValue(SequenceFile.java:2214)
        at org.apache.hadoop.mapred.SequenceFileRecordReader.getCurrentValue(SequenceFileRecordReader.java:109)
        at org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:84)
        at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:274)
        ... 15 more
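That trace is the SequenceFile reader hitting EOF partway through decoding a record length, which is what you would expect when it opens a file whose last flush landed mid-record. One detail worth noting about the writer code quoted below: SequenceFile.Writer.sync() only writes a sync marker into the stream so that readers can resynchronize; it does not push buffered bytes out to the datanodes. On a Hadoop 2.x client the writer should also expose hflush()/hsync() for that. A minimal sketch of the writer side under those assumptions (the path and the inline record feed are hypothetical stand-ins):

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.SequenceFile.CompressionType;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.compress.DefaultCodec;

public class PeriodicSyncWriter {

  public static final long SYNC_EVERY_LINES = 1000;

  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path p = new Path("/user/beacon/2014072117/example.seq"); // hypothetical path

    DefaultCodec codec = new DefaultCodec();
    codec.setConf(conf);

    SequenceFile.Writer writer = SequenceFile.createWriter(fs, conf, p,
        Text.class, Text.class, CompressionType.BLOCK, codec);
    try {
      long linesWritten = 0;
      for (String line : new String[] { "a", "b", "c" }) { // stand-in feed
        writer.append(new Text(line), new Text(line));
        linesWritten++;
        if (linesWritten % SYNC_EVERY_LINES == 0) {
          writer.sync();   // writes a sync marker only; data may stay buffered
          writer.hflush(); // actually pushes buffered bytes to the datanodes
        }
      }
    } finally {
      writer.close(); // close() flushes and publishes the final file length
    }
  }
}

Even with hflush(), the NameNode typically does not learn the new length until a block completes or the file is closed, so an open file can list as 0 bytes in hadoop dfs -ls yet still serve data, which matches the behavior described below.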
On Tue, Jul 22, 2014 at 6:14 PM, Edward Capriolo <[email protected]> wrote:

> Currently using:
>
> <dependency>
>   <groupId>org.apache.hadoop</groupId>
>   <artifactId>hadoop-hdfs</artifactId>
>   <version>2.3.0</version>
> </dependency>
>
> I have this piece of code that creates the writer:
>
> writer = SequenceFile.createWriter(fs, conf, p, Text.class, Text.class,
>     CompressionType.BLOCK, codec);
>
> Then I have a piece of code like this...
>
> public static final long SYNC_EVERY_LINES = 1000;
> if (meta.getLinesWritten() % SYNC_EVERY_LINES == 0) {
>   meta.getWriter().sync();
> }
>
> And I commonly see:
>
> [ecapriolo@staging-hadoop-cdh-67-14 ~]$ hadoop dfs -ls /user/beacon/2014072117
> DEPRECATED: Use of this script to execute hdfs command is deprecated.
> Instead use the hdfs command for it.
>
> Found 12 items
> -rw-r--r--   3 service-igor supergroup  1065682 2014-07-21 17:50 /user/beacon/2014072117/0bb6cd71-70ac-405a-a8b7-b8caf9af8da1
> -rw-r--r--   3 service-igor supergroup  1029041 2014-07-21 17:40 /user/beacon/2014072117/1b0ef6b3-bd51-4100-9d4b-1cecdd565f93
> -rw-r--r--   3 service-igor supergroup  1002096 2014-07-21 17:10 /user/beacon/2014072117/34e2acb4-2054-44df-bbf7-a4ce7f1e5d1b
> -rw-r--r--   3 service-igor supergroup  1028450 2014-07-21 17:30 /user/beacon/2014072117/41c7aa62-d27f-4d53-bed8-df2fb5803c92
> -rw-r--r--   3 service-igor supergroup        0 2014-07-21 17:50 /user/beacon/2014072117/5450f246-7623-4bbd-8c97-8176a0c30351
> -rw-r--r--   3 service-igor supergroup  1084873 2014-07-21 17:30 /user/beacon/2014072117/8b36fbca-6f5b-48a3-be3c-6df6254c3db2
> -rw-r--r--   3 service-igor supergroup  1043108 2014-07-21 17:20 /user/beacon/2014072117/949da11a-247b-4992-b13a-5e6ce7e51e9b
> -rw-r--r--   3 service-igor supergroup   986866 2014-07-21 17:10 /user/beacon/2014072117/979bba76-4d2e-423f-92f6-031bc41f6fbd
> -rw-r--r--   3 service-igor supergroup        0 2014-07-21 17:50 /user/beacon/2014072117/b76db189-054f-4dac-84a4-a65f39a6c1a9
> -rw-r--r--   3 service-igor supergroup  1040931 2014-07-21 17:50 /user/beacon/2014072117/bba6a677-226c-4982-8fb2-4b136108baf1
> -rw-r--r--   3 service-igor supergroup  1012137 2014-07-21 17:40 /user/beacon/2014072117/be940202-f085-45bb-ac84-51ece2e1ba47
> -rw-r--r--   3 service-igor supergroup  1028467 2014-07-21 17:20 /user/beacon/2014072117/c336e0c8-76e7-40e7-98e2-9f529f25577b
>
> Sometimes, even though they show as 0 bytes, you can read data from them.
> Sometimes it blows up with a stack trace I have lost.
>
> On Tue, Jul 22, 2014 at 5:45 PM, Bertrand Dechoux <[email protected]> wrote:
>
>> I looked at the source out of curiosity: in the latest version (2.4), the
>> header is flushed during writer creation. Of course, the key/value classes
>> are provided at that point. By 0 bytes, do you really mean even without
>> the header? Or 0 bytes of payload?
>>
>> On Tue, Jul 22, 2014 at 11:05 PM, Bertrand Dechoux <[email protected]> wrote:
>>
>>> The header is expected to carry the full names of the key class and the
>>> value class, so if those are only detected with the first record (?) then
>>> indeed the file cannot respect its own format.
>>>
>>> I haven't tried it, but LazyOutputFormat should solve your problem [a
>>> usage sketch follows at the end of this thread]:
>>>
>>> https://hadoop.apache.org/docs/current/api/index.html?org/apache/hadoop/mapred/lib/LazyOutputFormat.html
>>>
>>> Regards
>>>
>>> Bertrand Dechoux
>>>
>>> On Tue, Jul 22, 2014 at 10:39 PM, Edward Capriolo <[email protected]> wrote:
>>>
>>>> I have two processes: one writes sequence files directly to HDFS; the
>>>> other is a Hive table that reads those files.
>>>>
>>>> All works well, with the exception that I am only flushing the files
>>>> periodically. SequenceFileInputFormat gets angry when it encounters
>>>> 0-byte seq files.
>>>>
>>>> I was considering a flush and sync on the first record write. I was also
>>>> thinking it should be possible to hack SequenceFileInputFormat to skip
>>>> 0-byte files rather than throw the exception it sometimes does from
>>>> readFully() [a sketch of that also follows the thread].
>>>>
>>>> Anyone ever tackled this?
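On the LazyOutputFormat suggestion in the thread above: it defers creating the output file, header included, until a task actually emits its first record, so empty part files never appear. It only applies when the sequence files come out of a MapReduce job, though, not when they are written directly with SequenceFile.createWriter as in the quoted code. A minimal usage sketch against the old mapred API, assuming a Text/Text job (the job wiring here is hypothetical):

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.SequenceFileOutputFormat;
import org.apache.hadoop.mapred.lib.LazyOutputFormat;

public class LazyOutputExample {

  public static void main(String[] args) throws Exception {
    JobConf job = new JobConf(LazyOutputExample.class);
    job.setJobName("lazy-seqfile-output"); // hypothetical job name

    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);

    // Wrap the real output format: no part file (and no header) is created
    // until the first record is actually written by the task.
    LazyOutputFormat.setOutputFormatClass(job, SequenceFileOutputFormat.class);

    FileOutputFormat.setOutputPath(job, new Path("/user/beacon/out")); // hypothetical
    // mapper/reducer setup and JobClient.runJob(job) elided
  }
}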
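As for hacking the input format to skip 0-byte files: one way to do it without patching Hadoop is to subclass SequenceFileInputFormat and drop empty files in listStatus() before splits are computed, so the record reader never opens a header-less file. A minimal sketch against the old mapred API (the class name is hypothetical, and for the Hive table it would have to be wired in as the table's INPUTFORMAT):

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.SequenceFileInputFormat;

public class NonEmptySequenceFileInputFormat<K, V>
    extends SequenceFileInputFormat<K, V> {

  @Override
  protected FileStatus[] listStatus(JobConf job) throws IOException {
    FileStatus[] all = super.listStatus(job);
    List<FileStatus> nonEmpty = new ArrayList<FileStatus>();
    for (FileStatus f : all) {
      if (f.getLen() > 0) { // drop files the NameNode still reports as empty
        nonEmpty.add(f);
      }
    }
    return nonEmpty.toArray(new FileStatus[nonEmpty.size()]);
  }
}

The caveat, per the listing above, is that an open file can report a length of 0 while still holding readable data, so this filter may also hide files that are mid-write; pairing it with a writer-side flush on the first record keeps that window small.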
