Re: [ANNOUNCE] New Hive Committer - Thejas Nair

2013-08-20 Thread Prasanth Jayachandran
Congrats Thejas!! On Aug 20, 2013, at 10:01 AM, Daniel Dai wrote: > Congratulation! > > > On Tue, Aug 20, 2013 at 4:56 PM, Shreepadma Venugopalan < > shreepa...@cloudera.com> wrote: > >> Congrats Tejas! >> >> >> On Tue, Aug 20, 2013 at 9:32 AM, Eugene Koifman >> wrote: >> >>> Congrats Thej

Re: [ANNOUNCE] New Hive Committer - Yin Huai

2013-09-03 Thread Prasanth Jayachandran
Congratulations yin!!! On Tuesday, September 3, 2013, Jov wrote: > congratulations! > > Jov > blog: http://amutu.com/blog > > > 2013/9/4 Carl Steinbach > >> The Apache Hive PMC has voted to make Yin Huai a committer on the Apache >> Hive project. >> >> Please join me in con

Re: [ANNOUNCE] New Hive PMC Members - Thejas Nair and Brock Noland

2013-10-24 Thread Prasanth Jayachandran
Congrats Thejas and Brock!! Thanks Prasanth Jayachandran On Oct 24, 2013, at 3:29 PM, Vaibhav Gumashta wrote: > Congrats Brock and Thejas! > > > On Thu, Oct 24, 2013 at 3:25 PM, Prasad Mujumdar wrote: > >Congratulations Thejas and Brock ! > > thanks > Prasa

Re: Predicate pushdown/indexing on ORC file

2013-11-07 Thread Prasanth Jayachandran
AFAIK, ORC uses “hive.optimize.index.filter” hive config to enable predicate pushdown. Can you please try by setting hive.optimize.index.filter to true? Thanks Prasanth Jayachandran On Nov 7, 2013, at 4:04 PM, Avrilia Floratou wrote: > Hi all, > > I'm using hive-12. I ha

Re: RLE in hive ORC

2013-11-11 Thread Prasanth Jayachandran
byte. In the new version 0.12 ORC uses 511 as max run length as it uses 9 bits to store run length. The new version of ORC uses a different encoding if the runs are smaller (<10) which saves a byte. Thanks Prasanth Jayachandran On Nov 11, 2013, at 6:22 AM, qihua wu wrote: > In vertica

Re: RLE in hive ORC

2013-11-11 Thread Prasanth Jayachandran
As Owen noted, max run for version 0.11 is 130. 3 is minimum run for RLE to be used. So max value that can be interpreted from 7 bits is 130. Thanks Prasanth Jayachandran On Nov 11, 2013, at 9:51 AM, Owen O'Malley wrote: > Hi, > The RLE in ORC is a tradeoff (as is all compressi
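The run-length limits quoted in this thread can be checked with a little arithmetic. This is only a sketch: the helper name max_run_length and its parameters are made up for illustration and are not part of ORC, and treating the 9-bit case as having no minimum-run offset is an assumption chosen to match the 511 figure quoted above.

```python
def max_run_length(length_bits, min_run):
    """Largest run representable when the length field stores (run - min_run).

    Hypothetical helper for illustration only; not an ORC API.
    """
    if length_bits <= 0 or min_run < 0:
        raise ValueError("length_bits must be positive and min_run non-negative")
    # The largest unsigned value that fits in length_bits bits is 2**bits - 1.
    return (2 ** length_bits - 1) + min_run


# Hive 0.11: 7 bits for the length, runs start at 3, so 127 + 3.
print(max_run_length(7, 3))
# Hive 0.12: 9 bits for the length (assumed offset of 0), so 2**9 - 1.
print(max_run_length(9, 0))
```

The first call reproduces the 130 limit and the second the 511 limit mentioned in the thread.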

Re: [ANNOUNCE] New Hive Committer - Prasad Mujumdar

2013-11-11 Thread Prasanth Jayachandran
Congrats Prasad! Thanks Prasanth Jayachandran On Nov 10, 2013, at 10:16 PM, Vaibhav Gumashta wrote: > Congrats Prasad! > > > On Sun, Nov 10, 2013 at 8:17 PM, Lefty Leverenz > wrote: > >> Congratulations Prasad! >> >> -- Lefty >> >> &

Re: [ANNOUNCE] New Hive PMC Member - Harish Butani

2013-11-14 Thread Prasanth Jayachandran
Congratulations Harish!! Thanks Prasanth Jayachandran On Nov 14, 2013, at 5:17 PM, Carl Steinbach wrote: > I am pleased to announce that Harish Butani has been elected to the Hive > Project Management Committee. Please join me in congratulating Harish! > > Thank

Re: [ANNOUNCE] New Hive Committers - Jitendra Nath Pandey and Eric Hanson

2013-11-21 Thread Prasanth Jayachandran
Congratulations both of you!! Thanks Prasanth Jayachandran On Nov 21, 2013, at 3:46 PM, Shreepadma Venugopalan wrote: > Congrats guys! > > > On Thu, Nov 21, 2013 at 3:37 PM, Vinod Kumar Vavilapalli > wrote: > >> Congratulations to both! Great job and keep up the

Re: Error inserting data to ORC table

2013-11-27 Thread Prasanth Jayachandran
along with the bug? Thanks Prasanth Jayachandran On Nov 27, 2013, at 5:02 AM, Juan Martin Pampliega wrote: > Hi, > > I am using Hive 0.12 with Hadoop 2.2 and trying to insert data in a new ORC > table with an INSERT SELECT statement from a TEXT file based table and I am > r

Re: Error inserting data to ORC table

2013-12-09 Thread Prasanth Jayachandran
Hi Juan I was able to reproduce this issue with a different dataset. I posted a patch for this bug here https://issues.apache.org/jira/browse/HIVE-5991. Can you use this patch and see if it resolves the issue? Thanks Prasanth Jayachandran On Nov 27, 2013, at 11:01 AM, Prasanth Jayachandran

Re: Hive - Issue Converting Text to Orc

2013-12-16 Thread Prasanth Jayachandran
What version of protobuf are you using? Are you compiling hive from source? Thanks Prasanth Jayachandran On Dec 16, 2013, at 4:30 PM, Bryan Jeffrey wrote: > Hello. > > Running the following version of Hadoop: hadoop-2.2.0 > Running the following version of Hive: hive-0.12.0 &

Re: Hive - Issue Converting Text to Orc

2013-12-16 Thread Prasanth Jayachandran
Prasanth Jayachandran On Dec 16, 2013, at 4:55 PM, Bryan Jeffrey wrote: > Prasanth, > > I am running Hive 0.12.0 downloaded from the Apache Hive site. I did not > compile it. I downloaded protobuf 2.5.0 earlier today from the Google Code > site. I compiled it via the follow

Re: Hive - Issue Converting Text to Orc

2013-12-16 Thread Prasanth Jayachandran
Also what are you doing with steps 2 through 5? Compiling hive or your custom code? Thanks Prasanth Jayachandran On Dec 16, 2013, at 4:55 PM, Bryan Jeffrey wrote: > Prasanth, > > I am running Hive 0.12.0 downloaded from the Apache Hive site. I did not > compile it. I downloa

Re: Hive - Issue Converting Text to Orc

2013-12-16 Thread Prasanth Jayachandran
://mirror.symnds.com/software/Apache/hive/hive-0.12.0/ and running hive directly. After extracting the hive-0.12.0-bin.tar.gz set HIVE_HOME to the extracted directory and run hive. Let me know if you face any issues. Thanks Prasanth Jayachandran On Dec 16, 2013, at 5:19 PM, Bryan Jeffrey wrote: > Prasa

Re: Hive - Issue Converting Text to Orc

2013-12-24 Thread Prasanth Jayachandran
elease. I would recommend >>> re-downloading hive 0.12 binary release from >>> http://mirror.symnds.com/software/Apache/hive/hive-0.12.0/ and running hive >>> directly. After extracting the hive-0.12.0-bin.tar.gz set HIVE_HOME to the >>> extracted directory and ru

Re: [ANNOUNCE] New Hive PMC Member - Gunther Hagleitner

2013-12-27 Thread Prasanth Jayachandran
Congrats Gunther!! Sent from my iPhone > On Dec 27, 2013, at 4:46 PM, Lefty Leverenz wrote: > > Congratulations Gunther, well deserved! > > -- Lefty > > > On Fri, Dec 27, 2013 at 12:00 AM, Jarek Jarcec Cecho wrote: > >> Congratulations Gunther, good job! >> >> Jarcec >> >>> On Thu, Dec 2

Re: [ANNOUNCE] New Hive Committer - Vikram Dixit

2014-01-06 Thread Prasanth Jayachandran
Congratulations Vikram!! Thanks Prasanth Jayachandran On Jan 6, 2014, at 11:50 PM, Eugene Koifman wrote: > Congratulations! > > > On Mon, Jan 6, 2014 at 9:44 AM, Gunther Hagleitner < > ghagleit...@hortonworks.com> wrote: > >> Congratulations Vikram! >>

Re: orc backward compatility

2014-01-06 Thread Prasanth Jayachandran
Yes. ORC reader is backward compatible. ORC footer has version information to decide which readers to use. Thanks Prasanth Jayachandran On Jan 7, 2014, at 5:20 AM, Tongjie Chen wrote: > > If we use ORC in hive 0.11 and later upgrade to hive 0.12 or 0.13, ORC > should be backward c

Re: any standalone utility/tool to read ORC footer/index/data ?

2014-01-07 Thread Prasanth Jayachandran
You can use the ORC file dump utility to analyze ORC files. Use the following command to run the file dump: hive --orcfiledump Thanks Prasanth Jayachandran On Jan 7, 2014, at 3:53 PM, Nitin Pawar wrote: > as of now none that I am aware of. > > You can look at the test cases in hive cod

Re: write orcfile exception

2014-01-08 Thread Prasanth Jayachandran
Does it happen with trunk or any specific version of hive? Can you provide a test data that reproduces this issue? Thanks Prasanth Jayachandran On Jan 9, 2014, at 9:15 AM, bhsc.happy wrote: > write orcfile with compress CompressionKind.ZLIB or CompressionKind.SNAPPY > occur exc

Re: [ANNOUNCE] New Hive Committers - Sergey Shelukhin and Jason Dere

2014-01-27 Thread Prasanth Jayachandran
Congrats!! Sergey and Jason.. Thanks Prasanth Jayachandran On Jan 27, 2014, at 10:19 AM, Sergey Shelukhin wrote: > Thanks guys! > > > On Mon, Jan 27, 2014 at 9:24 AM, Jarek Jarcec Cecho wrote: > Congratulations Sergey and Jason, good job! > > Jarcec > > On Mon,

Re: What are all the factors that go into the number of mappers - ORC

2014-02-02 Thread Prasanth Jayachandran
be only one mapper. Can you provide the value for following configs so that we can understand it better? 1) hive.input.format 2) hive.min.split.size 3) hive.max.split.size 4) total size on disk for the table Thanks Prasanth Jayachandran On Feb 2, 2014, at 5:25 PM, John Omernik wrote: > I h

Re: Optimising mappers for number of nodes

2014-02-03 Thread Prasanth Jayachandran
. Thanks Prasanth Jayachandran On Feb 3, 2014, at 10:20 AM, KingDavies wrote: > Our platform has a 40GB raw data file that was compressed lzo (12GB > compressed) to reduce network IO between S3. > Without indexing the file is unsplittable resulting in 1 map task and poor > cluster

Re: ORC file question

2014-02-10 Thread Prasanth Jayachandran
the input format is set to HiveInputFormat. Thanks Prasanth Jayachandran On Feb 10, 2014, at 12:49 AM, Avrilia Floratou wrote: > Hi all, > > I'm running a query that scans a file stored in ORC format and extracts some > columns. My file is about 92 GB, uncompressed. I kept t

Re: ORC file question

2014-02-10 Thread Prasanth Jayachandran
Hi Avrilia I have few more questions 1) Have you enabled ORC predicate pushdown by setting hive.optimize.index.filter? 2) What is the value for hive.input.format? 3) Which hive version are you using? 4) What query are you using? Thanks Prasanth Jayachandran On Feb 10, 2014, at 1:26 PM

Re: ORC file question

2014-02-10 Thread Prasanth Jayachandran
using hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat. My suspicion is that ORC generates wrong splits because of this bug https://issues.apache.org/jira/browse/HIVE-6326. I will try to reproduce your scenario and see if I hit similar issue. Thanks Prasanth Jayachandran On Feb 10, 2014

Re: ORC file question

2014-02-10 Thread Prasanth Jayachandran
Great to hear! Thanks Prasanth Jayachandran On Feb 10, 2014, at 2:50 PM, Avrilia Floratou wrote: > Hi Prasanth, > > It seems that I was actually using the > hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat and > that was generating 363 map tasks. I tried

Re: Compiling Hive 0.12.0

2014-02-12 Thread Prasanth Jayachandran
regenerated. Can you try regenerating the file and see if it happens again? Thanks Prasanth Jayachandran On Feb 12, 2014, at 11:35 AM, Bryan Jeffrey wrote: > Hello. > > I am running Hive 0.12.0 & Hadoop 2.2.0. I attempted to apply the fix > described in the patc

Re: Compiling Hive 0.12.0

2014-02-12 Thread Prasanth Jayachandran
/browse/HIVE-6382 You can try applying patch from HIVE-6382 on top of HIVE-5991 and see if it solves the issue. HIVE-6382 added some protection against integer overflows. Let me know if it helps. Thanks Prasanth Jayachandran On Feb 12, 2014, at 3:46 PM, Bryan Jeffrey wrote: > Prasanth, >

Re: Compiling Hive 0.12.0

2014-02-13 Thread Prasanth Jayachandran
Glad that it helped! HIVE-6382 is waiting on some other ORC related patches. Will get in soon. Thanks Prasanth Jayachandran On Feb 13, 2014, at 8:04 AM, Bryan Jeffrey wrote: > Prasanth, > > That appears to have resolved the issue. I originally intended to apply > both, but d

Re: Bug - Hive Filer Index & VARCHAR

2014-02-24 Thread Prasanth Jayachandran
Hi Bryan Yes. This is a known issue. Hive-5950 (https://issues.apache.org/jira/browse/HIVE-5950) will address it. It will soon be committed to trunk. Thanks Prasanth Jayachandran On Feb 24, 2014, at 11:21 AM, Bryan Jeffrey wrote: > All, > > I am running Hadoop 2.2.0 &

Re: ORC queries inefficient for sorted field

2014-02-24 Thread Prasanth Jayachandran
required columns. If column statistics exist, then certain queries like min, max, count etc. will be answered without ever scanning the table. This is the JIRA that added the above feature (it's available in hive version 0.13.0) https://issues.apache.org/jira/browse/HIVE-5483 Thanks Prasanth

Re: ORC 'BETWEEN' Error

2014-02-27 Thread Prasanth Jayachandran
Hi Martin This is a known issue and it's fixed in hive trunk. It should be available in the 0.13 release. https://issues.apache.org/jira/browse/HIVE-5601 Thanks Prasanth Jayachandran On Feb 26, 2014, at 8:55 AM, Martin, Nick wrote: > Hi all, > > (Running Hive 12.0) > > I have

Re: [ANNOUNCE] New Hive PMC Member - Xuefu Zhang

2014-02-28 Thread Prasanth Jayachandran
Congratulations Xuefu! Thanks Prasanth Jayachandran On Feb 28, 2014, at 11:04 AM, Vaibhav Gumashta wrote: > Congrats Xuefu! > > > On Fri, Feb 28, 2014 at 9:20 AM, Prasad Mujumdar wrote: > >> Congratulations Xuefu !! >> >> thanks >> Prasad >>

Re: running out of memory with orc

2014-03-17 Thread Prasanth Jayachandran
Try setting the table property “orc.compress.size” to a lower value. The default is 256KB. Thanks Prasanth Jayachandran On Mar 17, 2014, at 8:12 AM, Saltys, Zilvinas wrote: > Hi, > > My write queries using ORC keep dying with out of memory errors. Sample query: > insert ove

Re: Error using ORC Format with Hive

2014-04-05 Thread Prasanth Jayachandran
=org.apache.hadoop.hive.ql.io.HiveInputFormat. Thanks Prasanth Jayachandran On Apr 5, 2014, at 12:48 AM, Amit Tewari wrote: > Thanks for the reply. I did solve protobuf issue by upgrading to 2.5 but then > hive 0.12 also started showing the same issue as 0.13 and 0.14 > > I was working through cli > > Turns

Re: [ANNOUNCE] New Hive Committers - Alan Gates, Daniel Dai, and Sushanth Sowmyan

2014-04-14 Thread Prasanth Jayachandran
Congratulations everyone!! Thanks Prasanth Jayachandran On Apr 14, 2014, at 10:51 AM, Carl Steinbach wrote: > The Apache Hive PMC has voted to make Alan Gates, Daniel Dai, and Sushanth > Sowmyan committers on the Apache Hive Project. > > Please join me in congratulating Alan,

Re: Hive 0.13.0 - IndexOutOfBounds Exception

2014-04-21 Thread Prasanth Jayachandran
Hi Bryan Can you provide more information about the input and output tables? Schema? Partitioning and bucketing information? Explain plan of your insert query? This information will help diagnose the issue. Thanks Prasanth Sent from my iPhone > On Apr 21, 2014, at 7:00 PM, Bryan Jeffre

Re: Hive 0.13.0 - IndexOutOfBounds Exception

2014-04-22 Thread Prasanth Jayachandran
Thanks Bryan. This is more than sufficient. As a workaround, can you try setting hive.optimize.sort.dynamic.partition=false and see if it helps? In the meantime, I will diagnose the issue. Thanks Prasanth Jayachandran On Apr 22, 2014, at 10:36 AM, Bryan Jeffrey wrote: > Prasanth, >

Re: Hive 0.13.0 - IndexOutOfBounds Exception

2014-04-22 Thread Prasanth Jayachandran
0.13. It will go into the next patch release/next release. I will request for a backport to hive 0.13 source as well. Thanks Prasanth Jayachandran On Apr 22, 2014, at 10:36 AM, Bryan Jeffrey wrote: > Prasanth, > > Was this additional information sufficient? This is a large road

Re: Skewed Tables

2014-04-23 Thread Prasanth Jayachandran
can track for progress on this issue https://issues.apache.org/jira/browse/HIVE-6968 Thanks Prasanth Jayachandran On Apr 23, 2014, at 6:52 AM, Mayur Gupta wrote: > Below is my skewedInfo > > skewedInfo:SkewedInfo(skewedColNames:[r2], skewedColValues:[[a]], > skewedColValue

Re: Skewed Tables

2014-04-25 Thread Prasanth Jayachandran
Lefty, I can add this information. Can you please point me to the location to add this? Perhaps, you can help reviewing it. Thanks Prasanth Jayachandran On Apr 24, 2014, at 1:13 PM, Lefty Leverenz wrote: > I'm looking at the docs and thinking of ways to include this information.

Re: Skewed Tables

2014-04-27 Thread Prasanth Jayachandran
to edit the wiki? or alternatively if you can update the docs adding “stored as directories” to the examples, it will be great. Also updating the docs with “CTAS not supported for list bucketing”. Thanks Prasanth Jayachandran On Apr 26, 2014, at 8:03 AM, Mayur Gupta wrote: > Hey Prasa

Re: Hive 0.12 ORC Heap Issues on Write

2014-04-28 Thread Prasanth Jayachandran
expected the latter to fail as it had less memory. Thanks Prasanth Jayachandran On Apr 28, 2014, at 4:45 AM, John Omernik wrote: > Prasanth - > > This is easily the best and most complete explanation I've received to any > online posted question ever. I know that sounds like a

Re: Skewed Tables

2014-04-28 Thread Prasanth Jayachandran
wiki.apache.org/confluence/display/Hive/ListBucketing Thanks Prasanth Jayachandran On Apr 27, 2014, at 11:28 PM, Lefty Leverenz wrote: > Prasanth, Hive's user docs are wiki-only at this point so there's no version > control. We just add notes about which release introduced or change

Re: ORC file in Hive 0.13 throws Java heap space error

2014-05-16 Thread Prasanth Jayachandran
lower value as suggested by John. Thanks Prasanth Jayachandran On May 16, 2014, at 12:31 PM, John Omernik wrote: > When I created the table, I had to reduce the orc.compress.size quite a bit > to make my table with many columns work. This was on Hive 0.12 (I thought it > was supposed to

Re: Query Using Stats

2014-05-16 Thread Prasanth Jayachandran
are not used to answer metadata only queries. Hive considers metastore as the only source of truth for answering such queries. You can look at this jira for further details https://issues.apache.org/jira/browse/HIVE-5483 Thanks Prasanth Jayachandran On May 16, 2014, at 5:35 AM, Bryan Jeffrey

Re: custom table/column statistics

2014-06-08 Thread Prasanth Jayachandran
Column group statistics is not supported in hive yet. Thanks Prasanth Sent from my iPhone > On Jun 8, 2014, at 6:33 PM, Alex Nastetsky wrote: > > Table statistics collection was added in HIVE-33 (numRows, rawDataSize, etc). > Is there anything that lets you create your own statistics gatheri

Re: custom table/column statistics

2014-06-09 Thread Prasanth Jayachandran
are often used to get approximate count. Hive uses such probabilistic algorithms to estimate the distinct count. The error rate of the estimation can be tuned using “hive.stats.ndv.error”. By default the error is set to 20%. Thanks Prasanth Jayachandran On Jun 9, 2014, at 12:33 PM, Alex

Re: ORC with bloom filters

2014-07-17 Thread Prasanth Jayachandran
ORC does not have bloom filters yet. Thanks Prasanth Jayachandran On Jul 17, 2014, at 10:23 PM, Suma Shivaprasad wrote: > Hi, > > I do not see any examples of extending ORC with bloom filters on Hive > documentation. Can someone please explain what exactly needs to be done to &

Re: hive 13: dynamic partition inserts

2014-07-22 Thread Prasanth Jayachandran
y shoot up when there are lots of partition column values and columns. HIVE-6455 addresses this issue. Thanks Prasanth Jayachandran On Jul 22, 2014, at 10:51 AM, Gajendran, Vishnu wrote: > adding user@hive.apache.org for wider audience > From: Gajendran, Vishnu > Sent: Tuesday, July 22, 20

Re: select list for dynamic partition insert

2014-07-22 Thread Prasanth Jayachandran
>equal to that number. Hive uses the last column of select * query as the partition column. Projecting the list of columns in select query is not always mandatory. Thanks Prasanth Jayachandran On Jul 22, 2014, at 1:07 PM, Kristof Vanbecelaere wrote: > While playing with the movielens da
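To picture the point about trailing columns, here is a small sketch of how a selected row divides into data columns and dynamic partition values. The function split_select_row and the sample row are hypothetical and only illustrate the positional rule; this is not how Hive is implemented.

```python
def split_select_row(row, num_partition_cols):
    """Split a selected row into (data columns, dynamic partition values).

    Illustrative only: the trailing num_partition_cols values are treated
    as the partition values, mirroring the positional rule described above.
    """
    if not 0 < num_partition_cols <= len(row):
        raise ValueError("num_partition_cols must be between 1 and len(row)")
    cut = len(row) - num_partition_cols
    return list(row[:cut]), list(row[cut:])


# Made-up row: with one partition column, the last value picks the partition.
data, partition = split_select_row([1, "Toy Story", 1995], 1)
print(data, partition)
```

Because the mapping is positional, the column order of the select matters more than the column names.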

Re: Reg:Column Statistics with Parquet

2014-07-24 Thread Prasanth Jayachandran
ute statistics; To collect column statistics add the column list like below analyze table user_table partition(dt='2014-06-01',hour='00') compute statistics for columns a, b, c; Thanks Prasanth Jayachandran On Jul 24, 2014, at 5:13 AM, Sandeep Samudrala wrote: > I am tr

Re: CREATE TABLE throwing error for large number of columns (very long script)

2014-07-24 Thread Prasanth Jayachandran
What version of hive are you using? What file format are you using? Thanks Prasanth Jayachandran On Jul 24, 2014, at 5:03 PM, wrote: > I am trying to Create a table in Hive. It’s a very long script contained > large number of columns and also contains complex fields like STRUCT,

Re: ORC File IndexOutOfBoundsException error when PPD enabled

2014-08-05 Thread Prasanth Jayachandran
This is probably causing the issue https://issues.apache.org/jira/browse/HIVE-6320 It is fixed in hive 0.13 and trunk.. If you are using older version you probably want to backport this small fix.. Let me know if it helps.. Thanks Prasanth Jayachandran On Aug 5, 2014, at 12:21 PM, Shangzhong

Re: High performance Count Distinct - NO Error

2014-08-06 Thread Prasanth Jayachandran
hen you might need to use https://github.com/prasanthj/hyperloglog (the prototype implementation mentioned above uses this but I don’t think it will print the error) cli tool. And yes, it uses very low memory when compared to full and accurate count distinct. Thanks Prasanth Jayachandran On

Re: ORC String error

2014-08-10 Thread Prasanth Jayachandran
Hi My suspicion is that the error is caused by this issue https://issues.apache.org/jira/browse/HIVE-6320 Applying this patch should resolve the issue. The alternative workaround would be to “set hive.optimize.index.filter=false” Thanks Prasanth Jayachandran On Aug 10, 2014, at 11:45 PM, Woj

Re: How to merge the output file when insert data into a table

2014-08-28 Thread Prasanth Jayachandran
Hi If it's an rcfile you can use "alter table tablename concatenate" to merge them into 1 file. For text files, you may have to reload these 2 files into another table with "Order By". This will force one reducer to generate a total ordering, thereby generating 1 output file. But remember i

Re: Nested types in ORC

2014-09-08 Thread Prasanth Jayachandran
icate pushdown. The performance difference depends mainly on the data layout/if column is sorted or not. Thanks Prasanth Jayachandran On Sep 8, 2014, at 6:16 AM, Abhishek Agarwal wrote: > Hi all, > I have few questions with regards to nested columns in Hive. > > How does ORC inte

Re: Nested types in ORC

2014-09-09 Thread Prasanth Jayachandran
Yes. It does now. Thanks Prasanth Jayachandran On Sep 9, 2014, at 12:30 AM, Abhishek Agarwal wrote: > Thanks Prasanth. Does it also mean that a query reading nested.k column will > invariably read nested.v as well even if nested.v column in not used in the > query? > > On M

Re: [ANNOUNCE] New Hive Committer - Eugene Koifman

2014-09-12 Thread Prasanth Jayachandran
Congrats Eugene! Thanks Prasanth Jayachandran On Sep 12, 2014, at 3:46 PM, Xiaobing Zhou wrote: > Congrats, Eugene! > > On Fri, Sep 12, 2014 at 3:35 PM, Carl Steinbach wrote: > >> The Apache Hive PMC has voted to make Eugene Koifman a committer on the >> Apache Hiv

Re: Join error with ORC Hive tables

2014-10-01 Thread Prasanth Jayachandran
Hi Can you post the exception stacktrace from hadoop execution logs? What version of hive are you using? Can you provide the join query that you are using? Thanks Prasanth Jayachandran On Oct 1, 2014, at 7:41 AM, Thiago Henrique dos Santos Bento wrote: > Hi! > > I’m trying to run

Re: percentile_approx slowness

2014-10-02 Thread Prasanth Jayachandran
You can look for explode(), posexplode() UDF’s in hive. https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-explode Thanks Prasanth Jayachandran On Oct 2, 2014, at 7:15 AM, Kevin Weiler wrote: > Hi all, > > I wanted to note that I figured out

Re: RES: Join error with ORC Hive tables

2014-10-02 Thread Prasanth Jayachandran
he ORC table >loaded? My suspicion is some non-ORC file ended up in ORC table. Thanks Prasanth Jayachandran On Oct 2, 2014, at 12:26 PM, Thiago Henrique dos Santos Bento wrote: > Erro: > > 2014-10-02 15:59:08,654 FATAL [IPC Server

Re: Multitable insert does not work with ORCFiles

2014-10-17 Thread Prasanth Jayachandran
Hi Dmitry Yes. I can confirm that this is an issue. But the issue seems to be with vectorized execution and not with ORC. If I disable vectorization the query seems to work fine even for ORC tables. I will dig through the JIRAs to see if this is a known issue else I will file a bug. Thanks f

Re: Multitable insert does not work with ORCFiles

2014-10-17 Thread Prasanth Jayachandran
Here is the JIRA for tracking https://issues.apache.org/jira/browse/HIVE-8498 - Prasanth On Fri, Oct 17, 2014 at 11:25 AM, Dmitry Tolpeko wrote: > Thank you, Prasanth. If you file a Jira please post its number here for > tracking. > Dmitry > On Fri, Oct 17, 2014 at 8:59

Re: [ANNOUNCE] New Hive PMC Member - Alan Gates

2014-10-27 Thread Prasanth Jayachandran
Congrats! - Prasanth On Mon, Oct 27, 2014 at 3:44 PM, Matthew McCline wrote: > Congratulations! > On Mon, Oct 27, 2014 at 3:38 PM, Carl Steinbach wrote: >> I am pleased to announce that Alan Gates has been elected to the Hive >> Project Management Committee. Please join me in congratulating A

Re: Fwd: Question on ORC file stripe size.

2014-12-03 Thread Prasanth Jayachandran
Stripe size is too low. ORC maintains multiple buffers in memory. ORC’s memory manager flushes a stripe when the in-memory data size (which includes buffers in memory) is greater than specified stripe size. This check happens after every 5000 rows.  This is what is happening in this case There
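The effect of checking the in-memory size only every 5000 rows can be sketched with a toy model. Everything here is an assumption for illustration: the function name, the fixed bytes-per-row figure, and modelling the in-memory size as rows times bytes per row. The real memory manager also accounts for its internal buffers.

```python
def rows_until_first_flush(row_bytes, stripe_size, check_interval=5000):
    """Return (rows, buffered_bytes) at the first check that triggers a flush.

    Toy model only: every row adds row_bytes to the in-memory size, and the
    size is compared with stripe_size once every check_interval rows.
    """
    if row_bytes <= 0 or stripe_size < 0 or check_interval <= 0:
        raise ValueError("row_bytes and check_interval must be positive")
    rows = 0
    buffered = 0
    while True:
        rows += 1
        buffered += row_bytes
        # The size check only happens on every check_interval-th row.
        if rows % check_interval == 0 and buffered > stripe_size:
            return rows, buffered


# With a very small stripe size (64KB here) the first check at 5000 rows
# already finds far more data buffered than the configured stripe size.
print(rows_until_first_flush(100, 64 * 1024))
```

In this model a stripe can never hold fewer than 5000 rows, so a stripe size well below 5000 rows' worth of data is not honoured.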

Re: Concatenating ORC files

2014-12-10 Thread Prasanth Jayachandran
Hi Daniel In your first run, are there some files with “orc.create.index”=“false”? What are the table properties used to create ORC files in both cases? - Prasanth On Wed, Dec 10, 2014 at 7:55 AM, Daniel Haviv wrote: > Hi, > I'm trying to use the new concatenate command merge small ORC fil

Re: Concatenating ORC files

2014-12-10 Thread Prasanth Jayachandran
I can see a bug for the case 2 where orc index is disabled. I have created a jira to track that issue. https://issues.apache.org/jira/browse/HIVE-9067 I am not sure why it fails in case 1 though. Can you create a jira with a reproducible case? I can take a look at it. - Prasanth On W

Re: Concatenating ORC files

2014-12-10 Thread Prasanth Jayachandran
I am unable to reproduce the case that causes exception that you are seeing. Will be great if you can provide a repro. - Prasanth On Wed, Dec 10, 2014 at 1:43 PM, Prasanth Jayachandran wrote: > I can see a bug for the case 2 where orc index is disabled. I have created a > jira to trac

Re: Concatenating ORC files

2014-12-11 Thread Prasanth Jayachandran
-9080 > Thanks! > Daniel > On Thu, Dec 11, 2014 at 12:49 AM, Prasanth Jayachandran < > pjayachand...@hortonworks.com> wrote: >> I am unable to reproduce the case that causes exception that you are >> seeing. Will be great if you can provide a repro. >> >

Re: orc ppd bug report

2015-01-06 Thread Prasanth Jayachandran
Sure. Will look into this. - Prasanth On Mon, Jan 5, 2015 at 9:59 PM, wzc wrote: > @Prasanth would you help me look into this problem? > Thanks. > On Mon Jan 05 2015 at 12:03:42 AM wzc wrote: >> Recently we find a bug with orc ppd, here is the testcase: >> >> use test; >> create table if not

Re: orc ppd bug report

2015-01-06 Thread Prasanth Jayachandran
Hi  Which version of hive are you using? I tried your test case in hive trunk and it seems to work fine. In both cases where PPD enabled and disabled I am getting 3 as the result. - Prasanth On Sun, Jan 4, 2015 at 3:04 PM, wzc wrote: > Recently we find a bug with orc ppd, here is the te

Re: Getting Tez working against cdh 5.3

2015-01-20 Thread Prasanth Jayachandran
My guess is.. "java" binary is not in PATH of the shell script that launches the container.. try creating a symbolic link in /bin/ to point to java.. On Tue, Jan 20, 2015 at 7:22 AM, Edward Capriolo wrote: > It seems that CDH does not ship with enough jars to run tez out of the box. > > I have

Re: sum a double in ORC bug?

2015-01-20 Thread Prasanth Jayachandran
Hi Nick Can you try disabling predicate pushdown in ORC and see if you are getting correct results? set hive.optimize.index.filter=false; This is just to rule out the possibility of bug in predicate pushdown.  - Prasanth On Wed, Jan 14, 2015 at 6:46 PM, Martin, Nick wrote: > *Hive

Re: Getting Tez working against cdh 5.3

2015-01-20 Thread Prasanth Jayachandran
t; On Tue, Jan 20, 2015 at 2:02 PM, Prasanth Jayachandran < > pjayachand...@hortonworks.com> wrote: >> My guess is.. >> "java" binary is not in PATH of the shell script that launches the >> container.. try creating a symbolic link in /bin/ to point to java..

Re: [ANNOUNCE] New Hive PMC Members - Szehon Ho, Vikram Dixit, Jason Dere, Owen O'Malley and Prasanth Jayachandran

2015-01-28 Thread Prasanth Jayachandran
san...@apache.org<mailto:prasan...@apache.org> Subject: [ANNOUNCE] New Hive PMC Members - Szehon Ho, Vikram Dixit, Jason Dere, Owen O'Malley and Prasanth Jayachandran I am pleased to announce that Szehon Ho, Vikram Dixit, Jason Dere, Owen O'Malley and Prasanth Jayachandr

Re: Predicate push-down on nested types ?

2015-02-05 Thread Prasanth Jayachandran
ORC does not support at this point. There are plans to do so. > On Feb 5, 2015, at 10:29 AM, The Watcher wrote: > > I'm wondering if predicates are pushed down when they apply to elements of a > nested struct. More specifically, imagine a table such as > > CREATE TABLE t ( > c1 int, > c2 STRUC

Re: [ANNOUNCE] New Hive Committers -- Chao Sun, Chengxiang Li, and Rui Li

2015-02-09 Thread Prasanth Jayachandran
Congratulations! > On Feb 9, 2015, at 1:57 PM, Na Yang wrote: > > Congratulations! > > On Mon, Feb 9, 2015 at 1:06 PM, Vikram Dixit K > wrote: > >> Congrats guys! >> >> On Mon, Feb 9, 2015 at 12:42 PM, Szehon Ho wrote: >> >>> Congratulations guys ! >>> >>> On Mon, Feb 9, 2015 at 3:38 PM,

Re: [ANNOUNCE] New Hive PMC Member - Sergey Shelukhin

2015-02-25 Thread Prasanth Jayachandran
Congrats Sergey! On Feb 25, 2015, at 1:50 PM, Alexander Pivovarov wrote: Congrats! On Wed, Feb 25, 2015 at 12:33 PM, Vaibhav Gumashta wrote: Congrats Sergey! On 2/25/15, 9:06 AM, "Vikram Dixit" w

Re: hive cli problem

2015-03-09 Thread Prasanth Jayachandran
Hi Garry Try removing jline-0.9.94.jar from hadoop. The exact path is this $HADOOP_PREFIX/share/hadoop/yarn/lib/jline-0.9.94.jar See here for discussion https://issues.apache.org/jira/browse/HIVE-8609?focusedCommentId=14215543&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpane

Re: inserting dynamic partitions - need more reducers

2015-03-12 Thread Prasanth Jayachandran
Hi Can you try with hive.optimize.sort.dynamic.partition set to false? Thanks Prasanth On Thu, Mar 12, 2015 at 9:02 PM -0700, "Alex Bohr" wrote: I'm inserting from an unpartitioned table with 6 hours of data into a table partitioned by hour. The source table

Re: [ANNOUNCE] New Hive Committers - Jimmy Xiang, Matt McCline, and Sergio Pena

2015-03-23 Thread Prasanth Jayachandran
Congratulations everyone! On Mar 23, 2015, at 11:26 AM, Chinna Rao Lalam wrote: Congratulations to all... On Mon, Mar 23, 2015 at 11:38 PM, Carl Steinbach wrote: The Apache Hive PMC has voted to make Jimmy Xiang, Matt McCline, and Se

Re: Sort order in Hive plan?

2015-03-27 Thread Prasanth Jayachandran
Hi In the image that you had posted. There are 4 plus signs which means sorting happens on all four columns in key expression. Number of plus signs indicate the number of columns in key expressions that are used for sorting. Also minus sign indicate that sorting happens in descending order. Th
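The reading of the sort order string described above can be sketched as a small decoder. The function describe_sort_order and the column names are hypothetical; the sketch only pairs each sign with the key column in the same position.

```python
def describe_sort_order(order, key_columns):
    """Pair each key column with ASC ('+') or DESC ('-').

    Illustrative helper only; it is not part of Hive.
    """
    if len(order) != len(key_columns):
        raise ValueError("expected one sign per key column")
    directions = {"+": "ASC", "-": "DESC"}
    result = []
    for sign, column in zip(order, key_columns):
        if sign not in directions:
            raise ValueError("unexpected sort order character: " + repr(sign))
        result.append((column, directions[sign]))
    return result


# Four plus signs: all four key columns are sorted in ascending order.
print(describe_sort_order("++++", ["k1", "k2", "k3", "k4"]))
# A minus sign in the third position marks that column as descending.
print(describe_sort_order("++-+", ["k1", "k2", "k3", "k4"]))
```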

Re: [ANNOUNCE] New Hive Committer - Mithun Radhakrishnan

2015-04-14 Thread Prasanth Jayachandran
Congrats Mithun! Thanks Prasanth On Tue, Apr 14, 2015 at 8:51 PM -0700, "Jimmy Xiang" wrote: Congrats! On Tue, Apr 14, 2015 at 8:46 PM, Lefty Leverenz wrote: Congrats Mithun -- when they gave me the cape, they called it a cloak

Re: hive on Tez - merging orc files

2015-04-24 Thread Prasanth Jayachandran
Hi This has been fixed recently https://issues.apache.org/jira/browse/HIVE-9529. Merging is triggered in two different ways. INSERT/CTAS can trigger merging of small files and CONCATENATE can trigger merging of small files. The latter had a bug which generated an MR task instead of a TEZ task which w

Re: hive on Tez - merging orc files

2015-04-24 Thread Prasanth Jayachandran
AM, patcharee wrote: > > Hi, > > The sandbox 2.2 comes with hive 0.14. Does it also have the bug? If so, how > can I patch hive on sandbox? > > BR, > Patcharee > > On 24. april 2015 09:42, Prasanth Jayachandran wrote: >> Hi >> >> This has been fixed

Re: Set the ORC file HDFS block size to 64MB

2015-07-29 Thread Prasanth Jayachandran
Hi, The OrcFile.createWriter() method accepts WriterOptions, where you can specify the block size with the blockSize() method. Thanks Prasanth On Jul 29, 2015, at 5:22 PM, Ashish Shenoy <ashe...@instartlogic.com> wrote: Hi, I am using the OrcFile.createWriter() function to get an ORC w

Re: Verifying that a query uses orc bloom filters, orc storage indexes

2015-07-30 Thread Prasanth Jayachandran
If you are using tez, you can verify that using the counters that get printed after query execution. You need to set hive.tez.exec.print.summary=true for tez to print counters after execution. Thanks Prasanth On Jul 30, 2015, at 9:31 AM, Jörn Franke <jornfra...@gmail.com> wrote: Hi, Is the
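A minimal sketch of the setting named above, assuming it is issued in the same session as the query being checked. The hive.execution.engine line is an assumption for sessions that are not already running on tez.

```sql
-- Assumption: the session is not already using tez.
set hive.execution.engine=tez;

-- Setting named in the reply; asks tez to print its summary and counters after the query finishes.
set hive.tez.exec.print.summary=true;
```

The query under test would then be run in the same session and the printed counters inspected afterwards.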

Re: Running hive on tez locally

2015-08-07 Thread Prasanth Jayachandran
Hi Can you make sure the following configs are set and appropriately pointing to your corresponding local directories? set hive.user.install.directory=file:///tmp; set fs.default.name=file:///; set fs.defaultFS=file:///; set tez.staging-dir=/tmp; set tez.ignore.lib.uris=true; set tez.runtime.opt

Re: Hive 12 - CDH 5.0.1 - many small files when using ORC table

2015-08-18 Thread Prasanth Jayachandran
Are you using bucketing? If so, those are empty ORC files that contain only metadata and no data. From: Juraj jiv Sent: Tuesday, August 18, 2015 8:28 AM Subject: Hive 12 - CDH 5.0.1 - many small files when using ORC table To:

Re: Hive 12 - CDH 5.0.1 - many small files when using ORC table

2015-08-18 Thread Prasanth Jayachandran
se metada information are required? I cant just delete those 43b files? JV On Tue, Aug 18, 2015 at 5:35 PM, Prasanth Jayachandran <j.prasant...@gmail.com> wrote: Are you using bucketing? If so those are empty ORC files without any data containing only metadata information. _

Re: ORC NPE while writing stats

2015-09-02 Thread Prasanth Jayachandran
The memory manager was made thread local in https://issues.apache.org/jira/browse/HIVE-10191. Can you try the patch from HIVE-10191 and see if that helps? On Sep 2, 2015, at 8:58 PM, David Capwell <dcapw...@gmail.com> wrote: I'll try that out and see if it goes away (not seen this in the past 24

Re: ORC NPE while writing stats

2015-09-03 Thread Prasanth Jayachandran
:34 PM, David Capwell wrote: >> Thanks for the jira, will see if that works for us. >> >> On Sep 2, 2015 7:11 PM, "Prasanth Jayachandran" >> wrote: >>> >>> Memory manager is made thread local >>> https://issues.apache.org/jira/browse/

Re: Error: java.lang.IllegalArgumentE:Column has wrong number of index entries found - when trying to insert from JSON external table to ORC table

2015-09-08 Thread Prasanth Jayachandran
What Hive version are you using? Can you disable automerging of files and see if that works? Set hive.merge.orcfile.stripe.level to false. Also set hive.merge.mapfiles and hive.merge.mapredfiles to false so that slow merging is also disabled. On Tue, Sep 8, 2015 at 5:29 AM -0700, "
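The three settings named above can be sketched as session-level overrides, assuming they are issued before re-running the insert into the ORC table:

```sql
-- Disable the fast, stripe-level ORC file merge.
set hive.merge.orcfile.stripe.level=false;

-- Disable the slow merge as well, for map-only and map-reduce jobs.
set hive.merge.mapfiles=false;
set hive.merge.mapredfiles=false;
```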

Re: [ANNOUNCE] New Hive PMC Chair - Ashutosh Chauhan

2015-09-16 Thread Prasanth Jayachandran
Congratulations Ashutosh! On Wed, Sep 16, 2015 at 12:48 PM -0700, "Xuefu Zhang" <xzh...@cloudera.com> wrote: Congratulations, Ashutosh! Well-deserved. Thanks to Carl also for the hard work in the past few years! --Xuefu On Wed, Sep 16, 2015 at 12:39 PM, Carl Steinbach wrote: > I

Re: hive ORC wrong number of index entries error

2015-09-23 Thread Prasanth Jayachandran
Looks like you are running out of memory. Try increasing the heap memory or reducing the stripe size. How many columns are you writing? Any idea how many record writers are open per map task? - Prasanth On Sep 22, 2015, at 4:32 AM, Patrick Duin <patd...@gmail.com> wrote: Hi all, I

Re: hive ORC wrong number of index entries error

2015-09-24 Thread Prasanth Jayachandran
we have a production cluster that is running same hadoop/hive versions, same code and same data and processing just fine I get this error only in our QA cluster. It's hard to locate the difference :). Anyway thanks for the pointers I'll do some more digging. Cheers, Patrick 2015-09-24 0:51
