RE: Partition performance

2013-07-04 Thread Peter Marron
Hi,

Just to check that I understand this problem: my reading suggests that the
overhead of many partitions is currently unavoidable. Specifically, this means
that any query on a table that has, let’s say, 10,000 partitions will be
significantly slower (than on an unpartitioned table with the “same” data) even
if the query explicitly specifies a single partition.
(I mean, I _could_ actually do the experiments myself…)

Regards,

Z

From: Owen O'Malley [mailto:omal...@apache.org]
Sent: 02 July 2013 15:52
To: user@hive.apache.org
Subject: Re: Partition performance

On Tue, Jul 2, 2013 at 2:34 AM, Peter Marron <peter.mar...@trilliumsoftware.com> wrote:
> Hi Owen,
>
> I’m curious about this advice about partitioning. Is there some fundamental
> reason why Hive is slow when the number of partitions is 10,000 rather than
> 1,000?

The precise numbers don't matter. I wanted to give people a ballpark range that
they should be looking at. Most tables at 1,000 partitions won't cause big
slowdowns, but the cost scales with the number of partitions. By the time you are
at 10,000 the cost is noticeable. I have one customer who has a table with 1.2
million partitions. That causes a lot of slowdowns.

> And the improvements that you mention, are they going to be in version 12?
> Is there a JIRA raised so that I can track them?
> (It’s not currently a problem for me but I can see that I am going to need to
> be able to explain the situation.)

I think this is the one they will use:
https://issues.apache.org/jira/browse/HIVE-4051

-- Owen
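
[Editor's sketch] The experiment Peter mentions could look roughly like the script below. The table name `events`, the partition column `dt`, and the date are hypothetical; only `EXPLAIN EXTENDED` and the `hive -f` invocation are standard Hive. The plan output should show which partition is read, and timing the run on tables with different partition counts would give a rough feel for the per-partition overhead Owen describes.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical table `events`, partitioned by a string column `dt`.
cat > single_partition.q <<'EOF'
-- The filter on the partition column should restrict the scan to one partition;
-- the plan printed by EXPLAIN EXTENDED lists the input paths actually used.
EXPLAIN EXTENDED
SELECT COUNT(*) FROM events WHERE dt = '2013-07-01';
EOF

# Only time the query when the Hive CLI is actually installed.
if command -v hive >/dev/null 2>&1; then
  time hive -f single_partition.q
else
  echo "hive CLI not found; wrote single_partition.q"
fi
```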


Elastic MapReduce Hive Avro SerDe

2013-07-04 Thread Dan Filimon
Hi!

I'm working on a few Avro MapReduce jobs whose output will end up on S3 to
be processed by Hive.
Amazon's latest Hive version [1] is 0.8.1 but Avro support was added in
0.9.1.

I can only find the haivvreo project [2] that supports 0.7.
Is this my only option?

Thanks!

[1] http://aws.amazon.com/elasticmapreduce/faqs/#hive-19
[2] https://github.com/jghoman/haivvreo


RE: Partition performance

2013-07-04 Thread Peter Marron
Sorry, I just caught up with the last couple of days’ email and I feel that this
question has already been answered fairly comprehensively. Apologies.

Z



How Can I store the Hive query result in one file ?

2013-07-04 Thread Matouk IFTISSEN
Hello Hive users,
Is there a way to store a Hive query result (SELECT * ...) in a single,
specific file (with a given file name), similar to INSERT OVERWRITE LOCAL
DIRECTORY '/directory_path_name/'?
Thanks for your answers


RE: metastore security issue

2013-07-04 Thread Shunichi Otsuka
One setting was missing:
hive.metastore.authorization.storage.checks   true

This solves the problem
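
[Editor's sketch] In hive-site.xml form, the setting above would be written as the property shown below. The scratch file name `storage-checks.xml` is hypothetical, and it is an assumption that the property belongs in the hive-site.xml read by the metastore service, which would then need a restart.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Write the property as an XML fragment; it is meant to be pasted inside the
# <configuration> element of hive-site.xml (assumed: the metastore host's copy).
cat > storage-checks.xml <<'EOF'
<property>
  <name>hive.metastore.authorization.storage.checks</name>
  <value>true</value>
</property>
EOF

cat storage-checks.xml
```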



-----Original Message-----
From: Shunichi Otsuka [mailto:sots...@yahoo-corp.jp] 
Sent: Thursday, July 04, 2013 2:28 PM
To: user@hive.apache.org
Subject: metastore security issue

I am trying to set up Hive securely, with authorization done at the metastore.
However, there is a problem.
I relied on Hive JIRA HIVE-3705 to decide the configuration, which was set
as below:

javax.jdo.option.ConnectionURL                  jdbc
javax.jdo.option.ConnectionDriverName           java.database.jdbc.mysql
javax.jdo.option.ConnectionUserName             hive
javax.jdo.option.ConnectionPassword             userpass
hive.metastore.execute.setugi                   true
hive.metastore.uris                             thrift://thriftserver.example.com:9083
hive.metastore.sasl.enabled                     true
hive.metastore.kerberos.keytab.file             /etc/grid-keytabs/hive.keytab
hive.metastore.kerberos.principal               hive/thriftserver.example@example.com
hive.security.metastore.authorization.enabled   true
hive.security.metastore.authenticator.manager   org.apache.hadoop.hive.ql.security.HadoopDefaultMetastoreAuthenticator
hive.security.metastore.authorization.manager   org.apache.hadoop.hive.ql.security.authorization.DefaultHiveMetastoreAuthorizationProvider
hive.security.authorization.enabled             false


However, this still lets an unauthorized user drop a table or database
from the metastore, as below:

alice create database db1 location '/user/alice/warehouse/db1.db';
[The permission of db1.db is drwx-- alice:users]
However,
bob drop database db1;
OK

This should not happen, so why is it happening? Is my setting wrong, or does
the code not cover this case?
If it has not been implemented yet, what measures have you taken to
prevent malicious users from dropping other users' databases/tables?

Java version  is 1.6.0_33
hive version is 0.11

Thanks


RE: Experience of Hive local mode execution style

2013-07-04 Thread Guillaume Allain
 Local mode really helps with those little delays.

It definitely helps for small data sets. But my concerns are about the
consistency of results with distributed mode, and about some requests that fail
only when local mode is triggered (see my description below).



From: Edward Capriolo
Sent: 03 July 2013 00:07
To: user@hive.apache.org
Subject: Re: Experience of Hive local mode execution style

Local mode is fast. In particular, older versions of Hadoop take a lot of time
scheduling tasks, and there is a delay between the map and reduce phases.

Local mode really helps with those little delays.

On Monday, July 1, 2013, Guillaume Allain <guillau...@blinkbox.com> wrote:
 Hi all,

 Would anybody have any comments or feedback about Hive local mode
 execution? It is advertised as providing a performance boost for small
 data sets. It seems to fit nicely when running unit/integration tests on
 a single node or virtual machine.

 My exact questions are the following:

 - How significantly does local mode execution of queries diverge from
 distributed mode? Can the results differ in some way?

 - I have encountered errors when running complex queries (with several
 joins/distinct/group-bys) that seem to relate to configuration (see below). I
 got no exact answers from the ML and I am more or less ready to dive into the
 source code.

 Any idea where I should look in order to solve that particular problem?

 Thanks in advance,

 Guillaume

 
 From: Guillaume Allain
 Sent: 18 June 2013 12:14
 To: user@hive.apache.org
 Subject: FileNotFoundException when using hive local mode execution style

 Hi all,

 I plan to use Hive local mode in order to speed up unit testing on (very) small
 data sets. (Data is still on HDFS.) I switch to local mode by setting the
 following variables:

 SET hive.exec.mode.local.auto=true;
 SET mapred.local.dir=/user;
 SET mapred.tmp.dir=file:///tmp;
 (plus creating needed directories and permissions)

 Simple GROUP BY, INNER and OUTER JOIN queries work just fine (with up to 3 
 jobs) with nice performance improvements.

 Unfortunately I ran into a
 FileNotFoundException: /tmp/vagrant/hive_2013-06-17_16-10-05_614_7672774118904458113/-mr-1/1/emptyFile
 on some more complex queries (4 jobs, distinct on top of several joins; see
 the logs below if needed).

 Any idea about that error? What other option am I missing to have a fully
 functional local mode?

 Thanks in advance, Guillaume

 $ tail -50 
 /tmp/vagrant/vagrant_20130617171313_82baad8b-1961-4055-a52e-d8865b2cd4f8.lo

 2013-06-17 16:10:05,669 INFO  exec.ExecDriver (ExecDriver.java:execute(320)) 
 - Using org.apache.hadoop.hive.ql.io.CombineHiveInputFormat
 2013-06-17 16:10:05,688 INFO  exec.ExecDriver (ExecDriver.java:execute(342)) 
 - adding libjars: 
 file:///opt/events-warehouse/build/jars/joda-time.jar,file:///opt/events-warehouse/build/jars/we7-hive-udfs.jar,file:///usr/lib/hive/lib/hive-json-serde-0.2.jar,file:///usr/lib/hive/lib/hive-builtins-0.9.0-cdh4.1.2.jar,file:///opt/events-warehouse/build/jars/guava.jar
 2013-06-17 16:10:05,688 INFO  exec.ExecDriver 
 (ExecDriver.java:addInputPaths(840)) - Processing alias dc
 2013-06-17 16:10:05,688 INFO  exec.ExecDriver 
 (ExecDriver.java:addInputPaths(858)) - Adding input file 
 hdfs://localhost/user/hive/warehouse/events_super_mart_test.db/dim_cohorts
 2013-06-17 16:10:05,689 INFO  exec.Utilities 
 (Utilities.java:isEmptyPath(1807)) - Content Summary not cached for 
 hdfs://localhost/user/hive/warehouse/events_super_mart_test.db/dim_cohorts
 2013-06-17 16:10:06,185 INFO  exec.ExecDriver 
 (ExecDriver.java:addInputPath(789)) - Changed input file to 
 file:/tmp/vagrant/hive_2013-06-17_16-10-05_614_7672774118904458113/-mr-1/1
 2013-06-17 16:10:06,226 INFO  exec.ExecDriver 
 (ExecDriver.java:addInputPaths(840)) - Processing alias $INTNAME
 2013-06-17 16:10:06,226 INFO  exec.ExecDriver 
 (ExecDriver.java:addInputPaths(858)) - Adding input file 
 hdfs://localhost/tmp/hive-vagrant/hive_2013-06-17_16-09-42_560_407729448242367/-mr-10004
 2013-06-17 16:10:06,226 INFO  exec.Utilities 
 (Utilities.java:isEmptyPath(1807)) - Content Summary not cached for 
 hdfs://localhost/tmp/hive-vagrant/hive_2013-06-17_16-09-42_560_407729448242367/-mr-10004
 2013-06-17 16:10:06,681 WARN  conf.Configuration 
 (Configuration.java:warnOnceIfDeprecated(808)) - 
 session.id is deprecated. Instead, use 
 dfs.metrics.session-id
 2013-06-17 16:10:06,682 INFO  jvm.JvmMetrics (JvmMetrics.java:init(76)) - 
 Initializing JVM Metrics with processName=JobTracker, sessionId=
 2013-06-17 16:10:06,688 INFO  exec.ExecDriver 
 (ExecDriver.java:createTmpDirs(215)) - Making Temp Directory: 
 hdfs://localhost/tmp/hive-vagrant/hive_2013-06-17_16-09-42_560_407729448242367/-mr-10002
 2013-06-17 16:10:06,706 WARN  mapred.JobClient 
 (JobClient.java:copyAndConfigureFiles(704)) - Use 

Re: How Can I store the Hive query result in one file ?

2013-07-04 Thread Nitin Pawar
Will hive -e 'query' > filename or hive -f query.q > filename do?

Do you specifically want it written to a named file on HDFS only?



-- 
Nitin Pawar
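
[Editor's sketch] The stdout redirect suggested above, written out as a small script. The query, the table `customers`, and the output file name are placeholders; `-e` (inline query) and `-S` (silent mode) are standard Hive CLI options. The result rows arrive on stdout, so this produces a local file, not an HDFS one.

```shell
#!/usr/bin/env bash
set -euo pipefail

QUERY='SELECT * FROM customers'   # hypothetical query
OUT=result.tsv                    # hypothetical local file name

# Build the command as an array so the query stays one argument.
cmd=(hive -S -e "$QUERY")

if command -v "${cmd[0]}" >/dev/null 2>&1; then
  # -S suppresses informational messages; result rows are written to stdout.
  "${cmd[@]}" > "$OUT"
else
  echo "hive CLI not found; would run: ${cmd[*]} > $OUT"
fi
```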


Re: How Can I store the Hive query result in one file ?

2013-07-04 Thread Bertrand Dechoux
The question is what the volume of your output is. There is one file per
output task (map or reduce) because that way each task can write its output
independently and in parallel. That's how MapReduce works. And except by
forcing the number of tasks to 1, there is no certain way to have one
output file.

But indeed, if the volume is low enough, you could also capture the standard
output into a local file like Nitin described.

Bertrand



-- 
Bertrand Dechoux


Re: Elastic MapReduce Hive Avro SerDe

2013-07-04 Thread Ruslan Al-Fakikh
Hi.

My guess is that you can try to look it up in their docs or mailing lists
(Amazon EMR). IIRC, CDH had the patch for Avro+Hive before it was included
in Hive itself, so Amazon EMR can have similar patches...

Ruslan




Hortonworks HDP 1.3 vs. HDP 1.1

2013-07-04 Thread Kumar Chinnakali
Hi Hive Team,

I am currently developing and testing Hive queries on HDP 1.1 with Hadoop
1.0.3 and Hive 0.9.0.

However, it seems that my production environment is going to be upgraded to
HDP 1.3 in the near future. Will this have any impact with respect to design
or optimization?

Please suggest.

Regards, Kumar Chinnakali

 CAUTION - Disclaimer *
This e-mail contains PRIVILEGED AND CONFIDENTIAL INFORMATION intended solely
for the use of the addressee(s). If you are not the intended recipient, please
notify the sender by e-mail and delete the original message. Further, you are 
not
to copy, disclose, or distribute this e-mail or its contents to any other 
person and
any such actions are unlawful. This e-mail may contain viruses. Infosys has 
taken
every reasonable precaution to minimize this risk, but is not liable for any 
damage
you may sustain as a result of any virus in this e-mail. You should carry out 
your
own virus checks before opening the e-mail or attachment. Infosys reserves the
right to monitor and review the content of all messages sent to or from this 
e-mail
address. Messages sent to or from this e-mail address may be stored on the
Infosys e-mail system.
***INFOSYS End of Disclaimer INFOSYS***


Re: How Can I store the Hive query result in one file ?

2013-07-04 Thread Michael Malak
I have found that for output larger than a few GB, redirecting stdout results 
in an incomplete file.  For very large output, I do CREATE TABLE MYTABLE AS 
SELECT ... and then copy the resulting HDFS files directly out of 
/user/hive/warehouse.
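
[Editor's sketch] One way to script the approach described above. The table names and the output file are hypothetical, and the warehouse path assumes the default location mentioned in the message. `hadoop fs -getmerge` is a standard Hadoop shell command that concatenates all files under an HDFS directory into one local file. Note that a table created this way uses Hive's default field delimiter unless a row format is specified.

```shell
#!/usr/bin/env bash
set -euo pipefail

TABLE=mytable_export                          # hypothetical table name
WAREHOUSE_DIR=/user/hive/warehouse/${TABLE}   # assumes the default warehouse location
OUT=mytable_export.txt                        # hypothetical local file name

# CREATE TABLE ... AS SELECT materialises the result as files in the warehouse.
cat > export.q <<EOF
CREATE TABLE ${TABLE} AS
SELECT * FROM customers;
EOF

if command -v hive >/dev/null 2>&1 && command -v hadoop >/dev/null 2>&1; then
  hive -f export.q
  # Merge every file in the table directory into a single local file.
  hadoop fs -getmerge "${WAREHOUSE_DIR}" "${OUT}"
else
  echo "hive/hadoop not found; wrote export.q only"
fi
```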
 



Re: How Can I store the Hive query result in one file ?

2013-07-04 Thread Matouk IFTISSEN
Thanks for your responses.
Effectively, Bertrand's answer makes this possible: the Hive properties below
force the job to write the Hive result to one file, without specifying the
name (_0):

set hive.exec.reducers.max = 1;
set mapred.reduce.tasks = 1;

For Nitin: I want to store the results of the SELECT, not the stdout (log) of
the query execution. Is this applicable to the results of a SELECT?
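
[Editor's sketch] A possible way to combine the two properties above with a chosen file name. The output directory, the target file name, the table `customers`, and the column `id` are all hypothetical. The reducer settings only matter for a query that has a reduce stage, which is why the sketch adds an ORDER BY; the copy step at the end simply gives the result a name of your choosing.

```shell
#!/usr/bin/env bash
set -euo pipefail

OUT_DIR=/tmp/hive_single_out   # hypothetical local output directory
TARGET=result.txt              # hypothetical final file name

cat > single_file.q <<EOF
SET hive.exec.reducers.max=1;
SET mapred.reduce.tasks=1;
INSERT OVERWRITE LOCAL DIRECTORY '${OUT_DIR}'
SELECT * FROM customers ORDER BY id;
EOF

if command -v hive >/dev/null 2>&1; then
  hive -f single_file.q
  # Concatenate whatever data files the job wrote (ideally one) into the named file.
  cat "${OUT_DIR}"/* > "${TARGET}"
else
  echo "hive CLI not found; wrote single_file.q only"
fi
```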






Re: How Can I store the Hive query result in one file ?

2013-07-04 Thread Nitin Pawar
The approach I described does not work on HDFS files. It is just one way to
write stdout to a file.

I am not sure whether Hive allows named files for output, and the above
settings will make your query run really slowly if you have a large dataset.

If you specifically need a file name, then for now I am not aware that Hive
supports it. I did a quick search but did not find anything useful.
If you need a quick way to get to your solution, Pig supports the STORE
function, and the output is written to a named file.

I will search in more depth and see if there is anything in the Hive
configuration.




-- 
Nitin Pawar


Re: Experience of Hive local mode execution style

2013-07-04 Thread Edward Capriolo
Since you are launching locally, you have to account for this:
1) If multiple jobs are running, they become a burden on the local memory of
the system.
2) Your local parameters, like the Java heap size (-Xmx) or
mapred.child.java.opts, may be getting applied locally; if you are doing
distinct queries, they may use a lot of memory or spill to disk quite often.

However, what you are reporting does not look like a memory error, although
distinct queries can become fairly intense. If you can repeat the problem
with empty tables, it is likely a bug; if you can't, it just means that the
query takes too much memory for local mode.



Re: How Can I store the Hive query result in one file ?

2013-07-04 Thread Edward Capriolo
Normally, if you use set mapred.reduce.tasks=1 you get one output file. You can
also look at hive.merge.mapfiles, mapred.reduce.tasks, and
hive.merge.mapredfiles, or you can use a separate tool:
https://github.com/edwardcapriolo/filecrush




Re: How Can I store the Hive query result in one file ?

2013-07-04 Thread Raj Hadoop
 

 hive> set hive.io.output.fileformat=CSVTextFile;
 hive> insert overwrite local directory '/usr/home/hadoop/da1/' select * from customers;

*** customers is a Hive table




Re: How Can I store the Hive query result in one file ?

2013-07-04 Thread Raj Hadoop


Adding to that

- Multiple files in the directory can be concatenated, for example:
  cat 0-0 00-1 0-2 > final
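
[Editor's sketch] The concatenation step can be tried without Hive at all. The script below fakes an output directory with three part files (the names `part_0` to `part_2` and the rows are made up for illustration) and joins them into one file named `final`, as in the example above.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Simulate a Hive output directory containing three part files.
OUT_DIR=$(mktemp -d)
printf '1\talice\n' > "${OUT_DIR}/part_0"
printf '2\tbob\n'   > "${OUT_DIR}/part_1"
printf '3\tcarol\n' > "${OUT_DIR}/part_2"

# The shell expands the glob in sorted order, so parts are joined in name order.
cat "${OUT_DIR}"/part_* > final

# Show how many rows ended up in the combined file.
wc -l < final
```

Because the glob is sorted by name, the order of rows in `final` follows the part-file numbering; this only matters if the query output itself was ordered.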





Re: Hortonworks HDP 1.3 vs. HDP 1.1

2013-07-04 Thread Owen O'Malley
For HDP specific questions, you should use the Hortonworks lists:
http://hortonworks.com/community/forums/forum/hive/

Your question is about the difference between Hive 0.9 and Hive 0.11.

The big additions are:
  Decimal type
  ORC files
  Analytics functions - cube roll up
  Windowing functions
  Join improvements

There are some blog entries for Hive 0.10 and Hive 0.11.

http://hortonworks.com/blog/apache-hive-0-10-0-is-now-available/
http://hortonworks.com/blog/apache-hive-0-11-stinger-phase-1-delivered

-- Owen

