Yes, this is what I did when writing the Hive part of the HadoopOffice /
HadoopCryptoledger library. Be aware that ORC also uses some internal Hive
APIs / extends the existing ones (e.g. VectorizedSerde).

I don’t have access to the Hive Wiki, otherwise I could update it a little bit.
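To make this concrete, below is a rough skeleton of a custom SerDe. This is a
minimal sketch against the org.apache.hadoop.hive.serde2 API as of Hive 2.x;
the class name and the single string column are made up for illustration, not
taken from any real SerDe:

import java.util.Arrays;
import java.util.List;
import java.util.Properties;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hive.serde2.AbstractSerDe;
import org.apache.hadoop.hive.serde2.SerDeException;
import org.apache.hadoop.hive.serde2.SerDeStats;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;

// Hypothetical example: rows with a single string column "line".
public class MySimpleSerDe extends AbstractSerDe {

  private ObjectInspector inspector;

  @Override
  public void initialize(Configuration conf, Properties tbl) throws SerDeException {
    // Table metadata (column names/types, SERDEPROPERTIES) arrives here via tbl.
    List<String> columnNames = Arrays.asList("line");
    List<ObjectInspector> columnOIs =
        Arrays.<ObjectInspector>asList(PrimitiveObjectInspectorFactory.javaStringObjectInspector);
    inspector = ObjectInspectorFactory.getStandardStructObjectInspector(columnNames, columnOIs);
  }

  @Override
  public ObjectInspector getObjectInspector() throws SerDeException {
    return inspector; // describes the deserialized rows to Hive (and UDFs)
  }

  @Override
  public Object deserialize(Writable blob) throws SerDeException {
    // Writable from the InputFormat -> Hive row object (here: a one-field struct).
    return Arrays.asList(blob.toString());
  }

  @Override
  public Class<? extends Writable> getSerializedClass() {
    return Text.class; // what serialize() hands to the OutputFormat
  }

  @Override
  public Writable serialize(Object obj, ObjectInspector objInspector) throws SerDeException {
    // Hive row object -> Writable for the OutputFormat (crude, for illustration).
    return new Text(obj.toString());
  }

  @Override
  public SerDeStats getSerDeStats() {
    return null; // optionally report raw data size and similar statistics
  }
}

Note the division of labour: the InputFormat/OutputFormat decide how bytes are
read from and written to storage, while the SerDe only maps the resulting
Writables to Hive row objects and back.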

> On 13. May 2018, at 17:08, 侯宗田 <zongtian...@icloud.com> wrote:
> 
> Thank you, that makes the concept clearer to me. I think I need to look up the 
> source code for some details.
>> On 13 May 2018, at 10:42 PM, Jörn Franke <jornfra...@gmail.com> wrote:
>> 
>> For the details you can check the source code, but in short a SerDe needs to 
>> translate an object into a Hive object and vice versa. Usually this is very 
>> simple (simply passing the object through or creating a HiveDecimal, etc.). 
>> It also provides an ObjectInspector that basically describes an object in 
>> more detail (e.g. so it can be processed by a UDF). For example, it can tell 
>> you the precision and scale of an object. In the case of ORC it also 
>> describes how a batch of objects (vectorized) can be mapped to Hive objects 
>> and the other way around. Furthermore, it provides statistics and means to 
>> deal with partitions as well as table properties (which are not the same as 
>> input/output format properties). Although it sounds complex, Hive provides 
>> most of the functionality, so implementing a SerDe is usually easy.
>> 
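As a small illustration of the ObjectInspector point above, the sketch below
shows how type information such as precision and scale is exposed. It uses the
standard org.apache.hadoop.hive.serde2 type APIs and is not code from the ORC
SerDe itself; the class name and the decimal(10,2) column are invented:

import org.apache.hadoop.hive.common.type.HiveDecimal;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.typeinfo.DecimalTypeInfo;
import org.apache.hadoop.hive.serde2.typeinfo.TypeInfoFactory;

public class DecimalInspectorDemo {
  public static void main(String[] args) {
    // A decimal(10,2) column: the type info carries precision and scale...
    DecimalTypeInfo typeInfo = (DecimalTypeInfo) TypeInfoFactory.getDecimalTypeInfo(10, 2);
    System.out.println(typeInfo.getPrecision() + "/" + typeInfo.getScale()); // prints 10/2

    // ...and the matching ObjectInspector lets a consumer (e.g. a UDF) read the
    // value without knowing anything about the on-disk representation.
    Object hiveObject = HiveDecimal.create("1234.56");
    System.out.println(
        PrimitiveObjectInspectorFactory
            .getPrimitiveJavaObjectInspector(typeInfo)
            .getPrimitiveJavaObject(hiveObject));
  }
}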
>>> On 13. May 2018, at 16:34, 侯宗田 <zongtian...@icloud.com> wrote:
>>> 
>>> Hello everyone,
>>> I know the JSON SerDe turns the fields in a row into JSON format and the 
>>> CSV SerDe turns them into CSV format, according to their SERDEPROPERTIES. 
>>> But I wonder what the ORC SerDe does when I choose to store as the ORC file 
>>> format, and why there are still escaper and separator characters in the ORC 
>>> SERDEPROPERTIES. The same applies to RCFile and Parquet. I think these 
>>> formats are just about how data is stored and compressed by their input and 
>>> output formats respectively, but I don't know what their SerDes do. Can 
>>> anyone give some hints?
> 
