Re: Re: [ANNOUNCE] New Committer: Simhadri Govindappa

2024-04-18 Thread Rajesh Balamohan
Congratulations Simhadri. :) ~Rajesh.B On Fri, Apr 19, 2024 at 2:02 AM Aman Sinha wrote: > Congrats Simhadri ! > > On Thu, Apr 18, 2024 at 12:25 PM Naveen Gangam > wrote: > >> Congrats Simhadri. Looking forward to many more contributions in the >> future. >> >> On Thu, Apr 18, 2024 at 12:25 

Re: Tez hook for "INSERT INTO TABLE PARTITION(...)" query

2023-01-03 Thread Rajesh Balamohan
If it is at the end of creating the partition, check whether "HMS::MetaStoreEventListener::onAddPartition" can be of help. This may need a custom listener to be added on the HMS side.

Re: TPCDS query degrade with hive-3.1.2 because of wrong estimation for reducers

2022-10-02 Thread Rajesh Balamohan
Based on the plan, the filtered output in map-1 was mis-estimated, and the groupby operators also have large mis-estimates. This causes the number of reducers to be estimated as "4", which is too low for this query. Due to the partition factor of tez, it ends up with 8 reducer slots at runtime for hive
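
A minimal sketch of the arithmetic behind the "4" estimated reducers turning into 8 slots. It assumes the partition factors are the ones set by hive.tez.min.partition.factor and hive.tez.max.partition.factor, with defaults of 0.25 and 2.0; the helper name is hypothetical and the real planner applies further caps that are not modelled here.

```python
def reducer_slot_range(estimated_reducers, min_factor=0.25, max_factor=2.0):
    """Return (min, max) reducer slots for a compile-time estimate.

    Assumption: the factors mirror hive.tez.min.partition.factor and
    hive.tez.max.partition.factor; other limits are ignored.
    """
    lowest = max(1, int(estimated_reducers * min_factor))
    highest = max(1, int(estimated_reducers * max_factor))
    return lowest, highest


# An estimate of 4 reducers with a max factor of 2.0 gives 8 slots.
print(reducer_slot_range(4))
```

Under these assumptions, a mis-estimate at compile time also caps the runtime parallelism, since the upper bound is derived from the same estimate.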

Re: engine for hive3

2022-04-12 Thread Rajesh Balamohan
Defaults to tez. MR is deprecated and hive on spark isn't under active dev. On Wed, Apr 13, 2022 at 9:02 AM linuxspace wrote: > for hive3, what's the suggested engine? tez, spark or the default mr? > > Thanks. >

Re: Too many S3 API calls for simple queries like select and create external table

2022-02-21 Thread Rajesh Balamohan
If you are using parquet format, HIVE-25827 would be causing additional calls to s3 as the footer is read at least twice. In addition, at least 9 list_status calls are made for split generation. ~Rajesh.B On Mon, Feb 21, 2022 at 10:16 AM Sungwoo Park

Re: Question regarding lock manager

2021-09-06 Thread Rajesh Balamohan
For the specific code you mentioned, check if you have "hive.privilege.synchronizer" enabled or not. If so, disable it explicitly. PrivSync is needed for populating information_schema. ~Rajesh.B On Mon, Sep 6, 2021 at 8:04 PM Antoine DUBOIS wrote: > Hello all > After some digging and remote

Re: Running Hive on Spark

2019-03-13 Thread Rajesh Balamohan
"what does it mean if I do that" > > Best regards > Daniel > > On Tue 12 Mar 2019, 02:21 Rajesh Balamohan, wrote: > >> Not sure why you are using SparkThriftServer. OOTB HiveServer2 would be >> good enough for this. >> >> Is there any specific reaso

Re: Running Hive on Spark

2019-03-11 Thread Rajesh Balamohan
Not sure why you are using SparkThriftServer. OOTB HiveServer2 would be good enough for this. Is there any specific reason for moving from tez to spark as execution engine? ~Rajesh.B On Mon, Mar 11, 2019 at 9:45 PM Daniel Mateus Pires wrote: > Hi there, > > I would like to run Hive using

Re: hive-testbench - Hive + TEZ TPC-DS job gets stuck

2017-09-25 Thread Rajesh Balamohan
'Pending' count of 4 for long time suggests that you may have to check the cluster capacity. ~Rajesh.B On Sun, Sep 24, 2017 at 7:41 AM, Krishnanand Khambadkone < kkhambadk...@yahoo.com> wrote: > Hi, I am trying to run a small 4GB TPC-DS test using the hortonworks > hive-testbench framework. I

Re: Out of Memory while generating ORC Splits

2017-09-13 Thread Rajesh Balamohan
> on this? > > Thanks, > Jayadeep > > On Wed, Sep 13, 2017 at 3:14 PM, Rajesh Balamohan <rbalamo...@apache.org> > wrote: > >> With "HYBRID" can you try with "hive.orc.cache.use.soft.references=true"? >> That should help in preventing O

Re: Out of Memory while generating ORC Splits

2017-09-13 Thread Rajesh Balamohan
With "HYBRID" can you try with "hive.orc.cache.use.soft.references=true"? That should help in preventing OOM with Hybrid strategy. ~Rajesh.B On Wed, Sep 13, 2017 at 2:54 PM, Jay wrote: > Hi All, > > I am running a simple select query as below > > select distinct

Re: Fail to load table via Tez

2017-07-07 Thread Rajesh Balamohan
You can run "yarn logs -applicationId application_1499426430661_0113 > application_1499426430661_0113.log" to get the app logs. I would suggest running your hive query with "hive --hiveconf tez.grouping.max-size=134217728 --hiveconf tez.grouping.min-size=134217728". You may
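
A small sketch showing where the value 134217728 comes from: it is 128 MiB expressed in bytes. The helper below is hypothetical and only assembles the same command-line arguments quoted in the message; it does not run hive.

```python
MIB = 1024 * 1024  # bytes per mebibyte


def hive_grouping_args(size_mib):
    """Build hive arguments that pin the tez grouping min and max size.

    Setting both bounds to the same value, as in the message above,
    asks tez to group splits to roughly that size.
    """
    size_bytes = size_mib * MIB
    return [
        "hive",
        "--hiveconf", f"tez.grouping.max-size={size_bytes}",
        "--hiveconf", f"tez.grouping.min-size={size_bytes}",
    ]


print(" ".join(hive_grouping_args(128)))
```

Changing the argument (for example to 256) would give the corresponding byte value without hand-computing it.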

Re: u...@tez.apache.org

2016-12-25 Thread Rajesh Balamohan
6/10/how-to- > start-hive-llap-functionality.html. > > > On Dec 26, 2016, at 2:34 PM, Rajesh Balamohan <rajesh.balamo...@gmail.com> wrote: > > Much easier option is to make use of https://github.com/ > t3rmin4t0r/tez-autobuild (edit/set args in slider-gen.sh). > > ~Rajesh.B

Re: u...@tez.apache.org

2016-12-25 Thread Rajesh Balamohan
Much easier option is to make use of https://github.com/t3rmin4t0r/tez-autobuild (edit/set args in slider-gen.sh). ~Rajesh.B On Mon, Dec 26, 2016 at 11:02 AM, Rajesh Balamohan <rbalamo...@apache.org> wrote: > Here is an example: > > hive --service llap --instances 1 --arg

Re: u...@tez.apache.org

2016-12-25 Thread Rajesh Balamohan
Here is an example: hive --service llap --instances 1 --args "-XX:+UseG1GC -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=8000" --cache 48000m --executors 8 --iothreads 8 --size 18m --xmx 128000m --loglevel INFO --javaHome /usr/jdk64/jdk1.8.0_77/ This would generate a "run.sh"

Re: tez + union stmt

2016-12-24 Thread Rajesh Balamohan
Are there any exceptions in hive.log? Is the tmp_pv_v4* table part of the select query? Assuming you are creating the table in staging.db, it would have created the table location as staging.db/foo (as you have not specified the location). Adding user@hive.apache.org as this is Hive-related.

Re: [ANNOUNCE] New Hive Committer - Rajesh Balamohan

2016-12-13 Thread Rajesh Balamohan
am.di...@gmail.com> > wrote: > >> Congrats Rajesh! :) >> >> On Tue, Dec 13, 2016 at 9:36 PM, Pengcheng Xiong <pxi...@apache.org> >> wrote: >> >>> Congrats Rajesh! :) >>> >>> On Tue, Dec 13, 2016 at 6:51 PM, Prasanth Jayachan

Re: Trace Key-Value pairs

2016-12-04 Thread Rajesh Balamohan
Hi Robert, Tez deals with bytes and does not understand if the data is coming from Hive/Pig/Cascading etc. So in case you print the content from Hive, you would get mostly binary data. For Hive, the key would be org.apache.hadoop.hive.ql.io.HiveKey, and the value would be org.apache.hadoop.io.BytesWritable. Printing

Re: Some Hive on Tez queries don't finish

2016-11-28 Thread Rajesh Balamohan
Are there any exceptions in the app logs? (You can ignore the Interrupted exceptions in the logs, as you killed the job.) It would be helpful if you can share the app logs. ~Rajesh.B On Mon, Nov 28, 2016 at 2:53 PM, Premal Shah wrote: > Hi, > We've been running

Re: msck repair table and hive v2.1.0

2016-07-14 Thread Rajesh Balamohan
Hi Stephen, Can you try turning off the multi-threaded approach by setting "hive.mv.files.thread=0"? You mentioned that your tables are in s3, but the external table created was pointing to HDFS. Was that intentional? ~Rajesh.B On Fri, Jul 15, 2016 at 6:58 AM, Stephen Sprague

Re: How the actual "sample data" are implemented when using tez reduce auto-parallelism

2016-02-28 Thread Rajesh Balamohan
"tez.shuffle-vertex-manager.desired-task-input-size" - The desired input size per reduce task. Default is around 100 MB. "tez.shuffle-vertex-manager.min-task-parallelism" - Min task parallelism that ShuffleVertexManager should honor. i.e., if the client has set it as 100,
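
A simplified sketch of how these two settings could interact. It is an illustration under stated assumptions, not the actual ShuffleVertexManager logic: the function name is hypothetical, the sampled input is treated as a single known byte count, and parallelism is assumed only ever to be reduced from the configured value, never raised.

```python
import math

MIB = 1024 * 1024


def estimate_reduce_tasks(total_input_bytes,
                          configured_parallelism,
                          desired_task_input_size=100 * MIB,
                          min_task_parallelism=1):
    """Pick a reduce task count from the expected shuffle input size.

    Assumptions: parameters mirror
    tez.shuffle-vertex-manager.desired-task-input-size and
    tez.shuffle-vertex-manager.min-task-parallelism; the result never
    exceeds the parallelism the client configured.
    """
    wanted = math.ceil(total_input_bytes / desired_task_input_size)
    capped = min(configured_parallelism, wanted)
    return max(min_task_parallelism, capped)


# 1 GiB of input at ~100 MiB per task needs 11 tasks out of 50 configured.
print(estimate_reduce_tasks(1024 * MIB, configured_parallelism=50))
```

With min_task_parallelism=20 the same call would return 20, which is the floor the second setting is meant to provide in this sketch.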

Re: Hive on TEZ fails starting

2016-01-06 Thread Rajesh Balamohan
/talebzadehmich.wordpress.com > > > > NOTE: The information in this email is proprietary and confidential. This > message is for the designated recipient only, if you are not the intended > recipient, you should destroy it immediately. Any information in this > message shall not

Re: Hive on TEZ fails starting

2016-01-05 Thread Rajesh Balamohan
he information in this email is proprietary and confidential. This > message is for the designated recipient only, if you are not the intended > recipient, you should destroy it immediately. Any information in this > message shall not be understood as given or endorsed by Peridale Technolo

Re: Hive on TEZ fails starting

2016-01-04 Thread Rajesh Balamohan
Can you try removing double-quotes for "tez.lib.uris" in tez-site.xml (i.e., just use hdfs://rhes564:9000/apps/tez-0.7.1-SNAPSHOT/tez-0.7.1-SNAPSHOT.tar.gz)? ~Rajesh.B On Tue, Jan 5, 2016 at 5:30 AM, Mich Talebzadeh wrote: > Hi, > > > > Trying to run Hive on TEZ for the

Re: Hive on TEZ fails starting

2016-01-04 Thread Rajesh Balamohan
> recipient, you should destroy it immediately. Any information in this > message shall not be understood as given or endorsed by Peridale Technology > Ltd, its subsidiaries or their employees, unless expressly so stated. It is > the responsibility of the recipient to ensure that this email

Re: config recommendations to boost performance

2015-02-25 Thread Rajesh Balamohan
A query like select name,count(id) from table where date='2015-01-01' or date='2015-01-02' group by (name) takes almost forever and needs to be cancelled after ~30min. It should have ideally scanned only the 2 partitions. Do you see any container launches after which you had to kill the job? Or

Re: [ANNOUNCE] New Hive Committers - Gopal Vijayaraghavan and Szehon Ho

2014-06-22 Thread Rajesh Balamohan
Congratulations Gopal and Szehon On Mon, Jun 23, 2014 at 9:12 AM, Carl Steinbach c...@apache.org wrote: The Apache Hive PMC has voted to make Gopal Vijayaraghavan and Szehon Ho committers on the Apache Hive Project. Please join me in congratulating Gopal and Szehon! Thanks. - Carl

Re: [ANNOUNCE] New Hive Committers - Prasanth J and Vaibhav Gumashta

2014-04-25 Thread Rajesh Balamohan
Congrats folks. On Apr 25, 2014 8:52 AM, Sushanth Sowmyan khorg...@gmail.com wrote: Congrats, guys! :) On Fri, Apr 25, 2014 at 12:33 AM, Lefty Leverenz leftylever...@gmail.com wrote: Congratulations! -- Lefty On Fri, Apr 25, 2014 at 12:10 AM, Hari Subramaniyan

Re: Vectorizied execution on RCFile

2014-01-10 Thread Rajesh Balamohan
for vectorized query on ORC. Eric *From:* Rajesh Balamohan [mailto:rajesh.balamo...@gmail.com] *Sent:* Wednesday, January 8, 2014 6:47 PM *To:* user@hive.apache.org *Subject:* Vectorizied execution on RCFile Hi All, Vectorization with ORCFile provides amazing performance. Does vectorization

Vectorizied execution on RCFile

2014-01-08 Thread Rajesh Balamohan
Hi All, Vectorization with ORCFile provides amazing performance. Does vectorization work with RCFile as well? As per explain plan of Hive 0.13 (snapshot), it does not use vectorization with RCFile. Any pointers would be appreciated. -- ~Rajesh.B

Re: Hive skewed tables

2013-11-14 Thread Rajesh Balamohan
to worry about which data is skewed and let the framework handle it. On Thu, Nov 14, 2013 at 11:16 AM, Rajesh Balamohan rajesh.balamo...@gmail.com wrote: Thanks Nitin. I have only one partition in this table for testing. I thought within the partition it will scan only certain files based

Hive skewed tables

2013-11-13 Thread Rajesh Balamohan
Hi All, I have the following skewed table addresses_1 select id, count(*) c from addresses_1 group by id order by c desc limit 10; 1426246531554806 198477395958492 102641838220181 138947865211331 156483436193429 96411677179771 210082076168033 800174765152421

Re: Hive skewed tables

2013-11-13 Thread Rajesh Balamohan
it will look at all partitions. The setting you have kept is only applicable to join queries as it clearly says skewjoin. For non-join queries it does not have an effect. Thanks, Nitin On Thu, Nov 14, 2013 at 6:35 AM, Rajesh Balamohan rajesh.balamo...@gmail.com wrote: Hi All, I have

Re: Impala Query problem

2013-10-22 Thread Rajesh Balamohan
Can you check whether you have connectivity to the port for the metastore? On Oct 22, 2013 1:44 PM, Garg, Rinku rinku.g...@fisglobal.com wrote: Hi All, We have installed Cloudera hadoop-2.0.0-mr1-cdh4.2.0 with hive-0.10.0-cdh4.2.0. Both are working as desired. We can run any

Hive 12 with Hadoop 2.x with ORC

2013-10-22 Thread Rajesh Balamohan
Hi All, When running Hive 12 with Hadoop 2.x with ORC, I get the following error while converting a table with text file to ORC format table. Any help will be greatly appreciated 2013-10-22 06:50:49,563 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child :

Re: Hive 12 with Hadoop 2.x with ORC

2013-10-22 Thread Rajesh Balamohan
of hadoop 2.x . (but i am certain if this is a protobuf version issue). On Tue, Oct 22, 2013 at 6:53 AM, Rajesh Balamohan rajesh.balamo...@gmail.com wrote: Hi All, When running Hive 12 with Hadoop 2.x with ORC, I get the following error while converting a table with text file to ORC format

Re: only one mapper

2013-08-21 Thread Rajesh Balamohan
Create the LZO index after moving the file to the hive directory (i.e., after executing your LOAD DATA statement). The index file is needed only during job execution, and if it's not present in the same directory, it would not split the large file. On Thu, Aug 22, 2013 at 7:11 AM, 闫昆

Re: only one mapper

2013-08-21 Thread Rajesh Balamohan
Good to hear that. On Thu, Aug 22, 2013 at 9:02 AM, 闫昆 yankunhad...@gmail.com wrote: thanks all i move lzo index to hive directory is work fine . thanks 2013/8/22 Rajesh Balamohan rajesh.balamo...@gmail.com Create the LZO index after moving the file to hive directory (i.e after

Re: HBase -- Hive / HCatalog -- PIG

2013-07-10 Thread Rajesh Balamohan
. -Thiruvel From: Rajesh Balamohan rajesh.balamo...@gmail.com Reply-To: user@hive.apache.org user@hive.apache.org Date: Wednesday, July 10, 2013 5:30 PM To: user@hive.apache.org user@hive.apache.org Subject: HBase -- Hive / HCatalog -- PIG Hi All, Has anyone tried out the following

RCFile performance

2013-02-04 Thread Rajesh Balamohan
Hi Experts, I have a large file with 300+ columns. In order to query only a few rows efficiently, I am using RCFile format in Hive. I have tried setting the RCFile rowgroup size from the default size up to 32 MB. e.g.: set hive.io.rcfile.record.buffer.size = 134217728; However, I do not see major