Just to follow up… This appears to be a bug in the Hive version of the code, and it has been fixed in the ORC library. NOTE: there are two different libraries.
Documentation is a bit lax… but in terms of design, it’s better to do the build completely in the reducer, which keeps the mapper code cleaner.

> On Oct 19, 2016, at 11:00 AM, Michael Segel <msegel_had...@hotmail.com> wrote:
>
> Hi,
> Since I am not on the ORC mailing list… and since the ORC Java code is in the
> Hive APIs… this seems like a good place to start. ;-)
>
> So…
>
> Ran into a little problem…
>
> One of my developers was writing a map/reduce job to read records from a
> source and, after some filtering, write the result set to an ORC file.
> There’s an example of how to do this at:
> http://hadoopcraft.blogspot.com/2014/07/generating-orc-files-using-mapreduce.html
>
> So far, so good.
> But now here’s the problem… Large source data means many mappers, and after
> the filter, the output is only a fraction of the input in terms of size.
> So we want to write to a single reducer (an identity reducer) so that we get
> only a single file.
>
> Here’s the snag.
>
> We were using the OrcSerde class to serialize the data and generate an ORC
> row, which we then wrote to the file.
>
> Looking at the source code for OrcSerde, OrcSerde.serialize() returns an
> OrcSerdeRow.
> See:
> http://grepcode.com/file/repo1.maven.org/maven2/co.cask.cdap/hive-exec/0.13.0/org/apache/hadoop/hive/ql/io/orc/OrcSerde.java
>
> OrcSerdeRow implements Writable, and as we can see in the example code, for a
> map-only job, context.write(Text, Writable) works.
>
> However… if we attempt to make this into a map/reduce job, we run into a
> problem at run time. The context.write() call throws the following exception:
> "Error: java.io.IOException: Type mismatch in value from map: expected
> org.apache.hadoop.io.Writable, received
> org.apache.hadoop.hive.ql.io.orc.OrcSerde$OrcSerdeRow"
>
> The goal was to reduce the ORC rows and then write them out in the reducer.
>
> I’m curious as to why the context.write() fails.
> The error is a bit cryptic: OrcSerdeRow implements Writable, so the
> error message doesn’t make sense.
>
> Now, the quick fix is to borrow the ArrayListWritable from Giraph, put the
> list of fields into an ArrayListWritable, and pass that to the reducer, which
> then uses it to generate the ORC file.
>
> I’m trying to figure out why context.write() fails when sending to the
> reducer, while it works for a map-side write.
>
> The documentation on the ORC site is… well… to be polite… lacking. ;-)
>
> I have some ideas why it doesn’t work; however, I would like to confirm my
> suspicions.
>
> Thx
>
> -Mike
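On why the quoted type-mismatch error looks cryptic: as far as I understand Hadoop's map-side collector, it compares the value's runtime class to the configured map output value class with exact equality, not with an instanceof/assignability test. So if the job declares Writable.class as its value class, every concrete Writable, OrcSerdeRow included, is rejected. The plain-Java sketch below only imitates that check; `Writable`, `FakeOrcSerdeRow`, and `checkValueClass` are hypothetical stand-ins so it runs without Hadoop on the classpath. Even with a matching class, the value would still have to serialize itself across the shuffle.

```java
public class ValueClassCheckSketch {

    // Hypothetical stand-in for org.apache.hadoop.io.Writable.
    interface Writable {}

    // Hypothetical stand-in for OrcSerde$OrcSerdeRow: a concrete Writable.
    static final class FakeOrcSerdeRow implements Writable {}

    // Imitates the map-side check as assumed above: exact class equality,
    // not an instanceof / isAssignableFrom test.
    static String checkValueClass(Object value, Class<?> valClass) {
        if (value.getClass() != valClass) {
            return "Type mismatch in value from map: expected "
                    + valClass.getName() + ", received "
                    + value.getClass().getName();
        }
        return "ok";
    }

    public static void main(String[] args) {
        Writable row = new FakeOrcSerdeRow();
        // Declaring the interface as the value class rejects every implementation...
        System.out.println(checkValueClass(row, Writable.class));
        // ...even though an assignability test would accept the same object.
        System.out.println(Writable.class.isAssignableFrom(row.getClass()));
        // Declaring the concrete class passes the check.
        System.out.println(checkValueClass(row, FakeOrcSerdeRow.class));
    }
}
```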
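The list-of-fields workaround can be sketched in plain Java. The point it illustrates is that whatever crosses from mapper to reducer must write its own fields and rebuild them on the other side. `FieldListWritable` and `shuffle` are hypothetical names: the class only mirrors the write(DataOutput)/readFields(DataInput) shape of Hadoop's Writable and is not Giraph's ArrayListWritable, and treating every field as a non-null String is a simplifying assumption.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class FieldListRoundTrip {

    // Hypothetical list-of-fields value (not Giraph's ArrayListWritable).
    // write() emits the fields; readFields() rebuilds them on the reduce side.
    static final class FieldListWritable {
        final List<String> fields = new ArrayList<>();

        void write(DataOutput out) throws IOException {
            out.writeInt(fields.size());   // field count first
            for (String f : fields) {
                out.writeUTF(f);           // then each field value (must be non-null)
            }
        }

        void readFields(DataInput in) throws IOException {
            fields.clear();                // the instance may be reused between records
            int n = in.readInt();
            for (int i = 0; i < n; i++) {
                fields.add(in.readUTF());
            }
        }
    }

    // Simulates the shuffle: serialize on the map side, deserialize on the reduce side.
    static FieldListWritable shuffle(FieldListWritable mapValue) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        mapValue.write(out);
        out.flush();
        FieldListWritable reduceValue = new FieldListWritable();
        reduceValue.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
        return reduceValue;
    }

    public static void main(String[] args) throws IOException {
        FieldListWritable row = new FieldListWritable();
        row.fields.add("42");
        row.fields.add("alice");
        FieldListWritable received = shuffle(row);
        // A reducer would hand these fields to the ORC writer (not shown here).
        System.out.println(received.fields);
    }
}
```

This shape also fits the design suggested at the top of the thread: the mapper only filters and emits plain field values, and all ORC-specific row building happens in the single reducer.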