Hello all,

Has anybody looked into the topic below? Please share your views.
Thanks,
Ranjith

On Fri, Sep 28, 2012 at 1:57 PM, Ranjithkumar Gampa <[email protected]> wrote:
> Hi,
>
> We are using FSDataOutputStream.writeBytes() from our map/reduce tasks to
> write directly to the Hive table path instead of using context.write(). This
> is working fine, and so far we have had no problems with this approach.
> We make sure the file names are distinct by appending the taskAttemptId to
> them, and we set speculative execution to 'false' so that two map/reduce
> attempts never work on the same data and write inconsistent output to HDFS.
> We went for this approach for the reasons below; please let us know if it
> has any disadvantages.
>
> 1) To avoid cleaning up the _SUCCESS and _logs files created alongside the
> mapper/reducer output, which Hive may not like.
> 2) To write some records from the mappers that do not need to go through the
> reducer logic, which saves part of the sort and shuffle work. We are
> exploring a multiple-output format, but I think point 1 would still need
> to be taken care of.
> 3) We have some special characters in the data, on which we do String
> manipulation using the 'ISO-8859-1' encoding. Writing them through the Text
> class with context.write() does not preserve these characters, because Text
> always encodes as UTF-8.
>
> Please let me know if my understanding is not correct, or if there are other
> ways of handling the three points above; I am happy to hear and learn. Our
> project uses a mix of Hadoop MR and Hive.
>
> Thanks in advance.
>
> Regards,
> Ranjith
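For reference, here is a small stdlib-only sketch of why the writeBytes() approach keeps the ISO-8859-1 characters while Text does not. It assumes only that FSDataOutputStream behaves like its superclass java.io.DataOutputStream. DataOutputStream.writeBytes() writes the low eight bits of each char. For text whose characters are all at or below U+00FF, that produces exactly the ISO-8859-1 bytes. Text, by contrast, stores UTF-8. A ByteArrayOutputStream stands in for the HDFS stream. The partFileName() helper and the attempt ID string are hypothetical, for illustration only.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class WriteBytesDemo {

    // Hypothetical naming scheme: one file per task attempt so that no two
    // attempts write the same path. In a real task the ID would come from
    // context.getTaskAttemptID().toString().
    static String partFileName(String prefix, String taskAttemptId) {
        return prefix + "-" + taskAttemptId;
    }

    // FSDataOutputStream extends java.io.DataOutputStream, so writeBytes()
    // behaves like this: each char is written as its low 8 bits only.
    // An in-memory ByteArrayOutputStream stands in for the HDFS stream.
    static byte[] viaWriteBytes(String s) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeBytes(s);
        } catch (IOException e) {
            // Not expected for an in-memory buffer; rethrow unchecked.
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    public static void main(String[] args) {
        String latin = "caf\u00e9"; // every char is <= U+00FF

        byte[] raw = viaWriteBytes(latin);
        byte[] iso = latin.getBytes(StandardCharsets.ISO_8859_1);
        byte[] utf8 = latin.getBytes(StandardCharsets.UTF_8); // what Text stores

        // writeBytes() output equals the ISO-8859-1 bytes (4 bytes), while
        // UTF-8 needs 2 bytes for U+00E9 (5 bytes in total).
        System.out.println(Arrays.equals(raw, iso) + " " + raw.length + " " + utf8.length);

        // A char outside Latin-1 is silently truncated: U+20AC (euro sign) -> 0xAC.
        byte[] euro = viaWriteBytes("\u20AC");
        System.out.println(euro.length + " " + Integer.toHexString(euro[0] & 0xFF));

        // Made-up attempt ID, only to show the resulting file name.
        System.out.println(partFileName("part", "attempt_201209281357_0001_m_000003_0"));
    }
}
```

One caveat follows from this. writeBytes() acts as an ISO-8859-1 encoder that fails silently on anything above U+00FF. If the data could ever contain such characters, wrapping the stream in an OutputStreamWriter with an explicit charset at least makes the encoding visible in the code. Separately for point 1, the _SUCCESS marker can also be switched off by setting mapreduce.fileoutputcommitter.marksuccessfuljobs to false.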
