Re: Unsubscribe

2018-05-13 Thread Lefty Leverenz
Rahul, to unsubscribe please send a message to
user-unsubscr...@hive.apache.org as described here: Mailing Lists.

Thanks.

-- Lefty


On Mon, May 7, 2018 at 4:22 PM Rahul Channe  wrote:

>


Re: Unsubscribe

2018-05-13 Thread Lefty Leverenz
Beth, to unsubscribe please send a message to
user-unsubscr...@hive.apache.org as described here: Mailing Lists.

Thanks.

-- Lefty


On Mon, May 7, 2018 at 4:52 PM Beth Lee  wrote:

>
>


Re: Unsubscribe

2018-05-13 Thread Lefty Leverenz
Roger, to unsubscribe please send a message to
user-unsubscr...@hive.apache.org as described here: Mailing Lists.

Thanks.

-- Lefty


On Tue, May 8, 2018 at 12:49 AM Roger Baatjes  wrote:

>
>


Re: Unsubscribe

2018-05-13 Thread Lefty Leverenz
Dheena, to unsubscribe please send a message to
user-unsubscr...@hive.apache.org as described here: Mailing Lists.

Thanks.

-- Lefty


On Wed, May 9, 2018 at 2:19 AM Dheena Dhayalan  wrote:

>
>


Re: What does the ORC SERDE do

2018-05-13 Thread Lefty Leverenz
Jörn, please do update the wiki; we really need better SerDe documentation.

Getting write access is easy:

About This Wiki -- How to get permission to edit



-- Lefty


On Sun, May 13, 2018 at 10:18 AM Jörn Franke  wrote:

> AbstractSerDe has a method that returns very basic stats related to your
> file format (mostly the size of the data, the number of rows, etc.):
>
>
> https://github.com/apache/hive/blob/master/serde/src/java/org/apache/hadoop/hive/serde2/SerDeStats.java
>
> In the initialize method of your SerDe you can retrieve properties related
> to partitions and include this information in your file format, if needed
> (you don’t need to create folders etc. for partitions; Hive does that).
>
>
> On 13. May 2018, at 19:09, Elliot West  wrote:
>
> Hi Jörn,
>
> I’m curious to know how the SerDe framework provides the means to deal
> with partitions, table properties, and statistics? I was under the
> impression that these were in the domain of the metastore and I’ve not
> found anything in the SerDe interface related to these. I would appreciate
> it if you could point me in the direction of anything I’ve missed.
>
> Thanks,
>
> Elliot.
>
> On Sun, 13 May 2018 at 15:42, Jörn Franke  wrote:
>
>> You can check the source code for the details, but in short a SerDe
>> needs to translate an object to a Hive object and vice versa. Usually
>> this is very simple (simply passing the object through or creating a
>> HiveDecimal, etc.). It also provides an ObjectInspector that describes
>> an object in more detail (e.g. so that it can be processed by a UDF).
>> For example, it can tell you the precision and scale of an object. In
>> the case of ORC it also describes how a batch of objects (vectorized)
>> can be mapped to Hive objects and the other way around. Furthermore, it
>> provides statistics and the means to deal with partitions as well as
>> table properties (!= input/outputformat properties).
>> Although it sounds complex, Hive provides most of the functionality, so
>> implementing a SerDe is easy most of the time.
>>
>> > On 13. May 2018, at 16:34, 侯宗田  wrote:
>> >
>> > Hello everyone,
>> > I know the JSON SerDe turns the fields in a row into JSON format, and
>> > the CSV SerDe turns them into CSV format, according to their
>> > serdeproperties. But I wonder what the ORC SerDe does when I choose
>> > STORED AS ORC. And why are there still escaper and separator settings
>> > in the ORC serdeproperties? The same goes for RC and Parquet. I think
>> > these formats are just about how data is stored and compressed by their
>> > respective input and output formats, but I don’t know what their SerDes
>> > do. Can anyone give me a hint?
>>
>


Re: What does the ORC SERDE do

2018-05-13 Thread Jörn Franke
AbstractSerDe has a method that returns very basic stats related to your
file format (mostly the size of the data, the number of rows, etc.):

https://github.com/apache/hive/blob/master/serde/src/java/org/apache/hadoop/hive/serde2/SerDeStats.java

In the initialize method of your SerDe you can retrieve properties related to
partitions and include this information in your file format, if needed (you
don’t need to create folders etc. for partitions; Hive does that).
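
To make those two points concrete, here is a minimal sketch in plain Java. It does not use the real Hive classes: ToyStatsSerDe and its Stats class are hypothetical stand-ins for a SerDe and for SerDeStats, so that it runs without Hive on the classpath. The property keys "columns" and "partition_columns", and the "/" separator for the latter, are assumptions about what the table properties contain; check them against the Hive source before relying on them.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Properties;

public class ToyStatsSerDe {

    /** Hypothetical stand-in for SerDeStats: just two counters. */
    static final class Stats {
        long rawDataSize; // bytes produced by serialize() so far
        long rowCount;    // rows serialized so far
    }

    private List<String> columnNames = Collections.emptyList();
    private List<String> partitionColumns = Collections.emptyList();
    private final Stats stats = new Stats();

    /** Read what the table definition tells us; the key names are assumptions. */
    void initialize(Properties tableProps) {
        columnNames = split(tableProps.getProperty("columns", ""), ",");
        partitionColumns = split(tableProps.getProperty("partition_columns", ""), "/");
    }

    private static List<String> split(String value, String separator) {
        if (value.isEmpty()) {
            return Collections.emptyList();
        }
        return Arrays.asList(value.split(separator));
    }

    /** Turn one row into a line of text and update the running stats. */
    String serialize(List<String> row) {
        String line = String.join(",", row);
        stats.rowCount++;
        stats.rawDataSize += line.getBytes(StandardCharsets.UTF_8).length;
        return line;
    }

    Stats getStats() { return stats; }
    List<String> getColumnNames() { return columnNames; }
    List<String> getPartitionColumns() { return partitionColumns; }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("columns", "id,name");
        props.setProperty("partition_columns", "year/month");

        ToyStatsSerDe serde = new ToyStatsSerDe();
        serde.initialize(props);
        serde.serialize(Arrays.asList("1", "a"));
        serde.serialize(Arrays.asList("2", "bb"));

        System.out.println(serde.getColumnNames() + " partitioned by "
                + serde.getPartitionColumns());
        System.out.println(serde.getStats().rowCount + " rows, "
                + serde.getStats().rawDataSize + " bytes");
    }
}
```

The SerDe only reads the partition information here; it never creates directories, which matches the division of labour described above.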


> On 13. May 2018, at 19:09, Elliot West  wrote:
> 
> Hi Jörn,
> 
> I’m curious to know how the SerDe framework provides the means to deal with 
> partitions, table properties, and statistics? I was under the impression that 
> these were in the domain of the metastore and I’ve not found anything in the 
> SerDe interface related to these. I would appreciate it if you could point me
> in the direction of anything I’ve missed.
> 
> Thanks,
> 
> Elliot.
> 
>> On Sun, 13 May 2018 at 15:42, Jörn Franke  wrote:
>> You can check the source code for the details, but in short a SerDe
>> needs to translate an object to a Hive object and vice versa. Usually
>> this is very simple (simply passing the object through or creating a
>> HiveDecimal, etc.). It also provides an ObjectInspector that describes
>> an object in more detail (e.g. so that it can be processed by a UDF).
>> For example, it can tell you the precision and scale of an object. In
>> the case of ORC it also describes how a batch of objects (vectorized)
>> can be mapped to Hive objects and the other way around. Furthermore, it
>> provides statistics and the means to deal with partitions as well as
>> table properties (!= input/outputformat properties).
>> Although it sounds complex, Hive provides most of the functionality, so
>> implementing a SerDe is easy most of the time.
>> 
>> > On 13. May 2018, at 16:34, 侯宗田  wrote:
>> > 
>> > Hello everyone,
>> > I know the JSON SerDe turns the fields in a row into JSON format, and
>> > the CSV SerDe turns them into CSV format, according to their
>> > serdeproperties. But I wonder what the ORC SerDe does when I choose
>> > STORED AS ORC. And why are there still escaper and separator settings
>> > in the ORC serdeproperties? The same goes for RC and Parquet. I think
>> > these formats are just about how data is stored and compressed by their
>> > respective input and output formats, but I don’t know what their SerDes
>> > do. Can anyone give me a hint?


Re: What does the ORC SERDE do

2018-05-13 Thread Elliot West
Hi Jörn,

I’m curious to know how the SerDe framework provides the means to deal with
partitions, table properties, and statistics? I was under the impression
that these were in the domain of the metastore and I’ve not found anything
in the SerDe interface related to these. I would appreciate it if you could
point me in the direction of anything I’ve missed.

Thanks,

Elliot.

On Sun, 13 May 2018 at 15:42, Jörn Franke  wrote:

> You can check the source code for the details, but in short a SerDe
> needs to translate an object to a Hive object and vice versa. Usually
> this is very simple (simply passing the object through or creating a
> HiveDecimal, etc.). It also provides an ObjectInspector that describes
> an object in more detail (e.g. so that it can be processed by a UDF).
> For example, it can tell you the precision and scale of an object. In
> the case of ORC it also describes how a batch of objects (vectorized)
> can be mapped to Hive objects and the other way around. Furthermore, it
> provides statistics and the means to deal with partitions as well as
> table properties (!= input/outputformat properties).
> Although it sounds complex, Hive provides most of the functionality, so
> implementing a SerDe is easy most of the time.
>
> > On 13. May 2018, at 16:34, 侯宗田  wrote:
> >
> > Hello everyone,
> > I know the JSON SerDe turns the fields in a row into JSON format, and
> > the CSV SerDe turns them into CSV format, according to their
> > serdeproperties. But I wonder what the ORC SerDe does when I choose
> > STORED AS ORC. And why are there still escaper and separator settings
> > in the ORC serdeproperties? The same goes for RC and Parquet. I think
> > these formats are just about how data is stored and compressed by their
> > respective input and output formats, but I don’t know what their SerDes
> > do. Can anyone give me a hint?
>


Re: What does the ORC SERDE do

2018-05-13 Thread Jörn Franke
Yes, this is what I did when writing the Hive part of the HadoopOffice /
HadoopCryptoledger library. Be aware that ORC also uses some internal Hive
APIs and extends existing ones (e.g. VectorizedSerde).

I don’t have write access to the Hive wiki; otherwise I could update it a little.

> On 13. May 2018, at 17:08, 侯宗田  wrote:
> 
> Thank you, it makes the concept clearer to me. I think I need to look up the 
> source code for some details.
>> On 13 May 2018, at 10:42 PM, Jörn Franke wrote:
>> 
>> You can check the source code for the details, but in short a SerDe
>> needs to translate an object to a Hive object and vice versa. Usually
>> this is very simple (simply passing the object through or creating a
>> HiveDecimal, etc.). It also provides an ObjectInspector that describes
>> an object in more detail (e.g. so that it can be processed by a UDF).
>> For example, it can tell you the precision and scale of an object. In
>> the case of ORC it also describes how a batch of objects (vectorized)
>> can be mapped to Hive objects and the other way around. Furthermore, it
>> provides statistics and the means to deal with partitions as well as
>> table properties (!= input/outputformat properties).
>> Although it sounds complex, Hive provides most of the functionality, so
>> implementing a SerDe is easy most of the time.
>> 
>>> On 13. May 2018, at 16:34, 侯宗田  wrote:
>>> 
>>> Hello everyone,
>>> I know the JSON SerDe turns the fields in a row into JSON format, and
>>> the CSV SerDe turns them into CSV format, according to their
>>> serdeproperties. But I wonder what the ORC SerDe does when I choose
>>> STORED AS ORC. And why are there still escaper and separator settings
>>> in the ORC serdeproperties? The same goes for RC and Parquet. I think
>>> these formats are just about how data is stored and compressed by their
>>> respective input and output formats, but I don’t know what their SerDes
>>> do. Can anyone give me a hint?
> 


Re: What does the ORC SERDE do

2018-05-13 Thread 侯宗田
Thank you, it makes the concept clearer to me. I think I need to look up the 
source code for some details.
> On 13 May 2018, at 10:42 PM, Jörn Franke wrote:
> 
> You can check the source code for the details, but in short a SerDe
> needs to translate an object to a Hive object and vice versa. Usually
> this is very simple (simply passing the object through or creating a
> HiveDecimal, etc.). It also provides an ObjectInspector that describes
> an object in more detail (e.g. so that it can be processed by a UDF).
> For example, it can tell you the precision and scale of an object. In
> the case of ORC it also describes how a batch of objects (vectorized)
> can be mapped to Hive objects and the other way around. Furthermore, it
> provides statistics and the means to deal with partitions as well as
> table properties (!= input/outputformat properties).
> Although it sounds complex, Hive provides most of the functionality, so
> implementing a SerDe is easy most of the time.
> 
>> On 13. May 2018, at 16:34, 侯宗田  wrote:
>> 
>> Hello everyone,
>> I know the JSON SerDe turns the fields in a row into JSON format, and
>> the CSV SerDe turns them into CSV format, according to their
>> serdeproperties. But I wonder what the ORC SerDe does when I choose
>> STORED AS ORC. And why are there still escaper and separator settings
>> in the ORC serdeproperties? The same goes for RC and Parquet. I think
>> these formats are just about how data is stored and compressed by their
>> respective input and output formats, but I don’t know what their SerDes
>> do. Can anyone give me a hint?



Re: What does the ORC SERDE do

2018-05-13 Thread Jörn Franke
You can check the source code for the details, but in short a SerDe
needs to translate an object to a Hive object and vice versa. Usually
this is very simple (simply passing the object through or creating a
HiveDecimal, etc.). It also provides an ObjectInspector that describes
an object in more detail (e.g. so that it can be processed by a UDF).
For example, it can tell you the precision and scale of an object. In
the case of ORC it also describes how a batch of objects (vectorized)
can be mapped to Hive objects and the other way around. Furthermore, it
provides statistics and the means to deal with partitions as well as
table properties (!= input/outputformat properties).
Although it sounds complex, Hive provides most of the functionality, so
implementing a SerDe is easy most of the time.
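
As a rough illustration of the translate-and-describe idea, here is a minimal sketch in plain Java. It deliberately avoids the real Hive interfaces so that it runs on its own: CommaSerDe and ColumnInfo are hypothetical names, ColumnInfo plays the role an ObjectInspector would play (here it only carries the type, precision and scale), and BigDecimal stands in for HiveDecimal. Escaping, nulls and precision checks are left out.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ToySerDeSketch {

    /** Hypothetical stand-in for an ObjectInspector: describes one column. */
    static final class ColumnInfo {
        final String name;
        final String type;   // "string" or "decimal" in this sketch
        final int precision; // only meaningful for decimal; not enforced here
        final int scale;     // only meaningful for decimal

        ColumnInfo(String name, String type, int precision, int scale) {
            this.name = name;
            this.type = type;
            this.precision = precision;
            this.scale = scale;
        }
    }

    /** Hypothetical stand-in for a SerDe over comma-separated text. */
    static final class CommaSerDe {
        private final List<ColumnInfo> columns;

        CommaSerDe(List<ColumnInfo> columns) {
            this.columns = columns;
        }

        /** Describe the rows this SerDe produces. */
        List<ColumnInfo> describe() {
            return columns;
        }

        /** Stored form -> row of typed objects. */
        List<Object> deserialize(String line) {
            String[] parts = line.split(",", -1);
            if (parts.length != columns.size()) {
                throw new IllegalArgumentException("expected " + columns.size() + " fields");
            }
            List<Object> row = new ArrayList<>();
            for (int i = 0; i < parts.length; i++) {
                ColumnInfo column = columns.get(i);
                if (column.type.equals("decimal")) {
                    // Comparable to creating a HiveDecimal with the declared scale.
                    row.add(new BigDecimal(parts[i]).setScale(column.scale, RoundingMode.HALF_UP));
                } else {
                    row.add(parts[i]); // strings are simply passed through
                }
            }
            return row;
        }

        /** Row of typed objects -> stored form. */
        String serialize(List<Object> row) {
            StringBuilder out = new StringBuilder();
            for (int i = 0; i < row.size(); i++) {
                if (i > 0) {
                    out.append(',');
                }
                Object value = row.get(i);
                out.append(value instanceof BigDecimal
                        ? ((BigDecimal) value).toPlainString()
                        : String.valueOf(value));
            }
            return out.toString();
        }
    }

    public static void main(String[] args) {
        CommaSerDe serde = new CommaSerDe(Arrays.asList(
                new ColumnInfo("item", "string", 0, 0),
                new ColumnInfo("price", "decimal", 10, 2)));
        List<Object> row = serde.deserialize("apple,1.5");
        System.out.println(row);
        System.out.println(serde.serialize(row));
    }
}
```

A format like ORC does the same job in principle, except that the stored form is binary and columnar rather than a line of text.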

> On 13. May 2018, at 16:34, 侯宗田  wrote:
> 
> Hello everyone,
> I know the JSON SerDe turns the fields in a row into JSON format, and
> the CSV SerDe turns them into CSV format, according to their
> serdeproperties. But I wonder what the ORC SerDe does when I choose
> STORED AS ORC. And why are there still escaper and separator settings
> in the ORC serdeproperties? The same goes for RC and Parquet. I think
> these formats are just about how data is stored and compressed by their
> respective input and output formats, but I don’t know what their SerDes
> do. Can anyone give me a hint?


What does the ORC SERDE do

2018-05-13 Thread 侯宗田
Hello everyone,
I know the JSON SerDe turns the fields in a row into JSON format, and
the CSV SerDe turns them into CSV format, according to their
serdeproperties. But I wonder what the ORC SerDe does when I choose
STORED AS ORC. And why are there still escaper and separator settings
in the ORC serdeproperties? The same goes for RC and Parquet. I think
these formats are just about how data is stored and compressed by their
respective input and output formats, but I don’t know what their SerDes
do. Can anyone give me a hint?