Re: Query Using Stats

2014-05-16 Thread Edward Capriolo
Hive does not know that the values of column `seconds` and partition `range` are related. Hive can only use the WHERE clause to remove partitions that do not match the range criteria. The data inside a partition is not ordered in any way, so the minimum seconds and maximum seconds could be in…
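A minimal sketch of the layout being described, reconstructed from Bryan's original query elsewhere in this digest (the column type is an assumption):

    -- Partitioned table: one directory per `range` value.
    CREATE TABLE data (seconds BIGINT)
    PARTITIONED BY (range BIGINT);

    -- The predicate on the partition column prunes whole partitions, but
    -- every row in each surviving partition must still be scanned, since
    -- rows are not ordered by `seconds` within a partition.
    SELECT MIN(seconds), MAX(seconds), range
    FROM data
    WHERE range > 1400204700
    GROUP BY range;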

Re: Hive UDF error

2014-05-16 Thread Jason Dere
What version of Hive are you running? It looks like the error you're seeing might be from Hive trying to retrieve the error message from the logs, and might not be related to the actual error. Might want to check the logs for the Hadoop task that was run as part of this query, to see if that ha…

Re: Query Using Stats

2014-05-16 Thread Bryan Jeffrey
Prasanth, I had the correct flag enabled (see the query in the original email). The issue is that it does not appear to be correctly using partition stats for the calculation. The table is an ORC table. It appears in the log that stats are being calculated, but the optimization does not appear to be working when queries are run a…
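One way to check whether stats actually landed in the metastore is to inspect the partition parameters (a sketch; the partition value here is hypothetical):

    -- Look for numRows, rawDataSize, etc. under "Partition Parameters".
    DESCRIBE FORMATTED data PARTITION (range=1400204701);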

Re: Running a hive script from Java via API.

2014-05-16 Thread Edward Capriolo
Actually, since Hive 13 you seem to need a driver and a username and password. The username and password can be blank or whatever, but DriverManager.getConnection(url) does not seem to work any more. On Fri, May 16, 2014 at 5:11 PM, Jay Vyas wrote: > So i guess your saying "yes : just use the JDBC driv…
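A minimal sketch of what that looks like against HiveServer2 (assumes the hive-jdbc driver is on the classpath; host, port, and database are placeholders):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class HiveJdbcExample {
        public static void main(String[] args) throws Exception {
            // Older setups may also need:
            // Class.forName("org.apache.hive.jdbc.HiveDriver");

            // Blank username and password satisfy the three-argument form.
            Connection con = DriverManager.getConnection(
                    "jdbc:hive2://server:10000/default", "", "");
            Statement stmt = con.createStatement();
            ResultSet rs = stmt.executeQuery("SHOW TABLES");
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
            con.close();
        }
    }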

Hive UDF error

2014-05-16 Thread Leena Gupta
Hi, I'm trying to create a function that generates a UUID; I want to use it in a query to insert data into another table. Here is the function: package com.udf.example; import java.util.UUID; import org.apache.hadoop.hive.ql.exec.Description; import org.apache.hadoop.hive.ql.exec.UDF; import o…

Re: Query Using Stats

2014-05-16 Thread Prasanth Jayachandran
Bryan, the flag you are looking for is hive.compute.query.using.stats. By default this optimization is disabled; you need to enable it to use it. Also, the min/max/sum metadata are not looked up from the file but from the metastore. Although file formats like ORC contain stats, they a…
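A sketch of enabling the optimization and populating the stats it reads (table and column names are the ones from this thread; ANALYZE syntax and partition-level column stats support vary by Hive version):

    SET hive.compute.query.using.stats=true;

    -- Basic partition stats (row counts, sizes) into the metastore.
    ANALYZE TABLE data PARTITION (range) COMPUTE STATISTICS;

    -- Column-level stats (min/max, NDV) for the `seconds` column.
    ANALYZE TABLE data PARTITION (range) COMPUTE STATISTICS FOR COLUMNS seconds;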

Count(distinct col) in Windowing

2014-05-16 Thread John Omernik
Is there a reason why I can't use select col1, col2, count(distinct col3) over (PARTITION by col4 order by col5 ROWS BETWEEN 5 PRECEDING AND FOLLOWING) as col1 from table? I am trying to see, for any given window, if there is a lot of variability in a col4, and it just doesn't work with count distinct…
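Hive's windowing does not accept count(distinct ...). One workaround that is often suggested, assuming your Hive version allows collect_set as a windowing function, is to count the elements of a de-duplicated set from a subquery (column names follow John's example; the frame bounds are illustrative):

    SELECT col1, col2, SIZE(col3_set) AS distinct_col3_count
    FROM (
        SELECT col1, col2,
               COLLECT_SET(col3) OVER (
                   PARTITION BY col4
                   ORDER BY col5
                   ROWS BETWEEN 5 PRECEDING AND 5 FOLLOWING) AS col3_set
        FROM some_table
    ) t;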

RE: LEFT SEMI JOIN

2014-05-16 Thread java8964
From the Hive manual, there is only "left semi join"; there is no "semi join", nor "inner semi join". From the database world, it is just a traditional name for this kind of join: "LEFT semi join", a reminder to the reader that the result set comes from the LEFT table ONLY. Yong > From: lukas.e..
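A sketch of the semantics with hypothetical tables:

    -- LEFT SEMI JOIN: only left-table columns may appear in the select
    -- list, and each left row is emitted at most once, no matter how
    -- many right rows match.
    SELECT o.id, o.amount
    FROM orders o
    LEFT SEMI JOIN customers c ON (o.cust_id = c.cust_id);

    -- Roughly equivalent to an IN subquery:
    SELECT o.id, o.amount
    FROM orders o
    WHERE o.cust_id IN (SELECT cust_id FROM customers);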

Re: Hive UDF error

2014-05-16 Thread Edward Capriolo
try public class Uuid extends UDF{ On Thu, May 15, 2014 at 2:07 PM, Leena Gupta wrote: > Hi, > > I'm trying to create a function that generates a UUID, want to use it in a > query to insert data into another table. > > Here is the function: > > package com.udf.example; > > import java.util.UUI
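A fleshed-out sketch of that suggestion, built from the imports Leena listed (the evaluate() body is an assumption, not her code): the UDF must be a public class extending UDF, not an interface.

    package com.udf.example;

    import java.util.UUID;
    import org.apache.hadoop.hive.ql.exec.Description;
    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.Text;

    @Description(name = "uuid", value = "_FUNC_() - returns a random UUID string")
    public class Uuid extends UDF {
        // Hive calls evaluate() once per row.
        public Text evaluate() {
            return new Text(UUID.randomUUID().toString());
        }
    }

Once built into a jar, it would be registered with something like ADD JAR /path/to/udf.jar; CREATE TEMPORARY FUNCTION uuid AS 'com.udf.example.Uuid'; (path and function name are placeholders).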

Re: Running a hive script from Java via API.

2014-05-16 Thread Jay Vyas
So I guess you're saying "yes: just use the JDBC driver with jdbc:hive2://", and that is the equivalent of PigServer (with the caveat that it can't run a hive script).

Re: Hive Table: Read or load data in Hive Table from multiple subdirectories

2014-05-16 Thread Matouk IFTISSEN
The files are all the same format (.gz) but they are in different subdirectories. My problem is: I want to do an import by day from Oracle to HDFS (in directories: hdfs_my_parent_directory/import_dir_day1/part_data_import.gz hdfs_my_parent_directory/import_dir_day2/part_data_import.gz ...
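Two common ways to make Hive read a layout like that (a sketch; the table and its columns are hypothetical, the directory names are Matouk's):

    -- Option 1: an external table with one partition per daily directory.
    CREATE EXTERNAL TABLE imports (col1 STRING, col2 STRING)
    PARTITIONED BY (day STRING)
    LOCATION '/hdfs_my_parent_directory';

    ALTER TABLE imports ADD PARTITION (day='day1')
    LOCATION '/hdfs_my_parent_directory/import_dir_day1';

    -- Option 2: let the input format recurse into subdirectories.
    SET mapred.input.dir.recursive=true;
    SET hive.mapred.supports.subdirectories=true;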

Re: ORC file in Hive 0.13 throws Java heap space error

2014-05-16 Thread Prasanth Jayachandran
With Hive 0.13 the ORC memory issue is mitigated by this optimization: https://issues.apache.org/jira/browse/HIVE-6455. The optimization is enabled by default. But having 3283 columns is still huge, so I would still recommend reducing the default compression buffer size (256KB) to a lowe…
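A sketch of what lowering the buffer size looks like in Premal's CTAS (32KB is an illustrative value, not a number from the thread):

    CREATE TABLE orc_table
    STORED AS ORC
    TBLPROPERTIES ("orc.compress.size" = "32768")  -- down from the 256KB default
    AS SELECT * FROM text_table;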

Re: ORC file in Hive 0.13 throws Java heap space error

2014-05-16 Thread John Omernik
When I created the table, I had to reduce orc.compress.size quite a bit to make my table with many columns work. This was on Hive 0.12 (I thought it was supposed to be fixed in Hive 0.13, but 3k+ columns is huge). The default orc.compress.size is quite a bit larger (I think in the 262k range)

Re: ORC file in Hive 0.13 throws Java heap space error

2014-05-16 Thread Premal Shah
Sorry for the double post. It did not show up for a while and then I could not get to the archives page, so I thought I needed to resend. On Fri, May 16, 2014 at 12:54 AM, Premal Shah wrote: > I have a table in hive stored as text file with 3283 columns. All columns > are of string data type. >

Re: Running a hive script from Java via API.

2014-05-16 Thread Szehon Ho
It is not really recommended anymore, as HiveServer2 with a client is where the community is focusing now. And the JDBC client supports a mode where HiveServer2 is embedded. Some older products like HiveCLI or Beeline did what you describe, but again this mode might not be fully supported. For…
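A minimal sketch of that embedded mode (assumes the Hive JDBC driver and Hive libraries are on the classpath; omitting the host and port after the scheme is what selects the in-process server):

    import java.sql.Connection;
    import java.sql.DriverManager;

    public class EmbeddedHiveExample {
        public static void main(String[] args) throws Exception {
            // No host:port => embedded HiveServer2, no round-trip to a
            // remote server.
            Connection con = DriverManager.getConnection("jdbc:hive2://", "", "");
            con.createStatement().execute("SHOW DATABASES");
            con.close();
        }
    }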

ORC file in Hive 0.13 throws Java heap space error

2014-05-16 Thread Premal Shah
I have a table in Hive stored as a text file with 3283 columns. All columns are of string data type. I'm trying to convert that table into an ORC table using this command: *create table orc_table stored as orc as select * from text_table;* This is the setting under mapred-site.xml: mapred.…

Running a hive script from Java via API.

2014-05-16 Thread Jay Vyas
Hi hive. Is there an API akin to PigServer, which allows you to run a hive script from Java directly, using hive embedded mode, without use of JDBC? -- Jay Vyas http://jayunit100.blogspot.com

RE: Metastore service

2014-05-16 Thread Dima Machlin
Thanks a lot Bryan, that did the trick. From: Bryan Jeffrey [mailto:bryan.jeff...@gmail.com] Sent: Wednesday, May 14, 2014 8:47 PM To: user@hive.apache.org Subject: Re: Metastore service Dima, You can simply set the variable in your hive-site.xml: datanucleus.connectionPool.maxPoolSize 20
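The flattened property at the end of Bryan's quoted mail is a hive-site.xml entry; spelled out it would look like:

    <property>
      <name>datanucleus.connectionPool.maxPoolSize</name>
      <value>20</value>
    </property>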

Re: java.lang.NoSuchFieldError: HIVE_ORC_FILE_MEMORY_POOL when inserting data to ORC table

2014-05-16 Thread Edward Capriolo
add jar /home/dguser/hive-0.12.0/lib/hive-exec-0.12.0.jar; Having to run the above command is a strong indication that your setup is not correct. Hive-exec is the map-reduce job jar; you should not need to add it as a secondary jar. On Fri, May 9, 2014 at 9:18 PM, John Zeng wrote: > Hi, A…

Hive 0.13.0 Memory Leak

2014-05-16 Thread Bryan Jeffrey
All, We are running Hadoop 2.2.0 and Hive 0.13.0. One typical application is to load data (as text), and then convert that data to ORC to decrease query time. When running these processes we are seeing significant memory leaks (leaking 4 GB in about 5 days). We're running HiveServer2 with the f…

Query Using Stats

2014-05-16 Thread Bryan Jeffrey
All, I am executing the following query using Hadoop 2.2.0 and Hive 0.13.0. /opt/hadoop/latest-hive/bin/beeline -u jdbc:hive2://server:10002/database -n root --hiveconf hive.compute.query.using.stats=true -e "select min(seconds), max(seconds), range from data where range > 1400204700 group by range…
