Yes, this is what I did when writing the Hive part of the HadoopOffice / HadoopCryptoledger library. Be aware that ORC also uses some internal Hive APIs and extends the existing ones (e.g. VectorizedSerde).
I don’t have access to the Hive Wiki, otherwise I could update it a little bit.

> On 13. May 2018, at 17:08, 侯宗田 <zongtian...@icloud.com> wrote:
>
> Thank you, it makes the concept clearer to me. I think I need to look up the source code for some details.
>
>> On 13. May 2018, at 22:42, Jörn Franke <jornfra...@gmail.com> wrote:
>>
>> For the details you can check the source code, but in short a SerDe needs to translate an object into a Hive object and vice versa. Usually this is very simple (simply passing the object through, or creating a HiveDecimal, etc.). It also provides an ObjectInspector that describes an object in more detail (e.g. so it can be processed by a UDF); for example, it can tell you the precision and scale of an object. In the case of ORC it also describes how a batch of objects (vectorized) is mapped to Hive objects and the other way around. Furthermore, it provides statistics and means to deal with partitions as well as table properties (which are distinct from input/output format properties). Although it sounds complex, Hive provides most of the functionality, so implementing a SerDe is easy most of the time.
>>
>>> On 13. May 2018, at 16:34, 侯宗田 <zongtian...@icloud.com> wrote:
>>>
>>> Hello everyone,
>>> I know the JSON SerDe turns the fields of a row into JSON format, and the CSV SerDe turns them into CSV format, according to their SERDEPROPERTIES. But I wonder what the ORC SerDe does when I choose STORED AS ORC as the file format, and why there are still escape and separator characters in the ORC SERDEPROPERTIES. The same question applies to RCFile and Parquet. I think these formats are just about how the data is stored and compressed by their respective input and output formats, but I don’t know what their SerDes do. Can anyone give a hint?
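To make the translation role described above more concrete, here is a rough conceptual sketch in plain Java. This is not Hive’s actual SerDe/ObjectInspector API (the real interfaces live in org.apache.hadoop.hive.serde2 and take Writable/ObjectInspector types); the class, field list, and delimiter here are illustrative assumptions only, using the \001 field separator that Hive’s default text SerDe happens to use.

```java
import java.util.Arrays;
import java.util.List;

// Conceptual sketch only: a SerDe sits between a storage format's row
// representation and the Java objects the engine (and UDFs) work with.
class ToySerde {

    // The "inspector" side: field names the SerDe would report about a row.
    // In real Hive this descriptive role is played by an ObjectInspector,
    // which can also report type details such as precision and scale.
    static final List<String> FIELD_NAMES = Arrays.asList("id", "name");

    // Serialize: Java objects -> storage row (here \001-delimited text,
    // the default field delimiter of Hive's LazySimpleSerDe).
    String serialize(Object[] row) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < row.length; i++) {
            if (i > 0) sb.append('\u0001');
            sb.append(row[i]);
        }
        return sb.toString();
    }

    // Deserialize: storage row -> Java objects the engine can process.
    Object[] deserialize(String line) {
        return line.split("\u0001", -1);
    }
}
```

A columnar format like ORC would additionally map whole batches of rows (vectorized) rather than one row at a time, but the translate-both-ways contract is the same idea.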