Re: ORC format

2016-02-02 Thread Lefty Leverenz
as given or endorsed by Peridale Technology > Ltd, its subsidiaries or their employees, unless expressly so stated. It is > the responsibility of the recipient to ensure that this email is virus > free, therefore neither Peridale Technology Ltd, its subsidiaries nor their > employees accept any

RE: ORC format

2016-02-02 Thread Mich Talebzadeh
From: Philip Lee [mailto:philjj...@gmail.com] Sent: 02 February 2016 16:10 To: user@hive.apache.o

RE: ORC format

2016-02-02 Thread Mich Talebzadeh
From: Lefty Leverenz [mailto:leftylever...@gmail.com] Sent: 02 February 2016 10:26 To: user@hive.apache.org Subject: Re: ORC format Can't resist teasing Mich about this: "Indeed one often demoralises data taking advantages of massive parallel processing in Hive."

Re: ORC format

2016-02-02 Thread Philip Lee
> *From:* Lefty Leverenz [mailto:leftylever...@gmail.com] > *Sent:* 02 February 2016 10:26 >

RE: ORC format

2016-02-01 Thread Mich Talebzadeh
From: Mich Talebzadeh [mailto:m...@peridale.co.uk] Sent: 01 February 2016 15:58 To: user@hive.apach

Re: ORC format

2016-02-01 Thread Alan Gates
ORC does not currently expose a primary key to the user, though we have talked of having it do that. As Mich says the indexing on ORC is oriented towards statistics that help the optimizer plan the query. This can be very important in split generation (determining which parts of the input
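The statistics-based planning described above can be illustrated with a small sketch. It assumes each stripe carries a minimum and maximum value for a column, and that a reader skips any stripe whose range cannot satisfy the filter. The names `StripeStats` and `stripes_to_read` are hypothetical and are not part of ORC or Hive; the real metadata layout and reader logic are more involved.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StripeStats:
    """Hypothetical per-stripe summary for one column (not ORC's real layout)."""
    stripe_id: int
    min_value: int
    max_value: int


def stripes_to_read(stats: List[StripeStats], lower: int, upper: int) -> List[int]:
    """Return ids of stripes whose [min, max] range may contain lower <= col <= upper."""
    return [
        s.stripe_id
        for s in stats
        if s.max_value >= lower and s.min_value <= upper
    ]


# Three illustrative stripes with non-overlapping value ranges.
stats = [
    StripeStats(0, 1, 100),
    StripeStats(1, 101, 200),
    StripeStats(2, 201, 300),
]

# A filter such as "WHERE col BETWEEN 150 AND 180" only needs stripe 1;
# the other stripes can be skipped without reading their data.
selected = stripes_to_read(stats, 150, 180)
print(selected)
```

This is not a primary-key lookup: the statistics only rule stripes out, so a stripe that passes the check may still contain no matching rows.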

Re: ORC format

2016-02-01 Thread Philip Lee
Also, when making ORC from CSV, is an index key made on every column, or is a primary key made on the table? If keys are made on each column in a table, accessing any column in operations like filtering should be faster. On Mon, Feb 1, 2016 at 4:21 PM, Philip Lee

Re: ORC format

2016-02-01 Thread Philip Lee
What do you mean by the silver bullet? So you mean it is not stored as a primary key on each column; it is just stored as a storage index, right? "The statistics helps the optimiser. So whether one table or many, the optimiser will take advantage of stats to push down the predicate for faster

RE: ORC format

2016-02-01 Thread Mich Talebzadeh
From: Philip Lee [mailto:philjj...@gmail.com] Sent: 01 February 2016 15:49 To: user@hive.a

RE: ORC format

2016-02-01 Thread Mich Talebzadeh
From: Alan Gates [mailto:alanfga...@gmail.com] Sent: 01 February 2016 17:07 To: user@hive.apache.org Subject: Re: ORC format ORC does not currently expose a primary key to the user, though we have talked of having it do that. As Mich says the i