Re: Hive too slow?

2011-03-07 Thread Ajo Fod
Hmm, I don't know of such a place ... but if I had to debug, I'd try to understand the following: 1) are the underlying files zipped/compressed ... that usually makes it slower. 2) are the files located on the hard drive or hdfs? 3) are all the cores being used? ... check number of reduce and map

Re: Performance between Hive queries vs. Hive over HBase queries

2011-03-07 Thread John Sichi
For native tables, Hive reads rows directly from HDFS. For HBase tables, it has to go through the HBase region servers, which reconstruct rows from column families (combining cache + HDFS). HBase makes it possible to keep your table up to date in real time, but you have to pay an overhead cost

Re: Hive too slow?

2011-03-07 Thread Igor Tatarinov
Most likely, Hadoop's memory settings are too high and Linux starts swapping. You should be able to detect that too using vmstat. Just a guess. On Mon, Mar 7, 2011 at 10:11 PM, Ajo Fod ajo@gmail.com wrote: hmm I don't know of such a place ... but if I had to debug, I'd try to understand

Re: Date function unix_timestamp() with input values null doesn't work as desired

2011-03-08 Thread Bejoy Ks
Thanks Viral. That was a good piece of info. COALESCE() was something new to me as I'm not from a db background. I googled more on COALESCE() and found it really good. However, that didn't resolve my problem. I got it resolved by reframing my query this way and

How to support SQL NOT IN function in Hive QL

2011-03-08 Thread Bejoy Ks
Hi Experts, I'm facing a hurdle in transforming a SQL query to equivalent Hive QL with SQL NOT IN functionality. My SQL query would look like this: INSERT INTO Table1 (T1_field1,T1_field2_ID,T1_field3,T1_field4) SELECT field1,field2,field3,field4 FROM Table2 JOIN Table3 T3 WHERE

Re: How to support SQL NOT IN function in Hive QL

2011-03-08 Thread Bejoy Ks
Thanks Rekha. I went with your first option 'LEFT OUTER JOIN' and it worked like a charm. The second one was not fitting for my case as it was popping out parse errors due to multiple columns separated by comma coming under the same NOT IN clause {(field1,field2,field3,field4 )NOT IN (SELECT
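
The LEFT OUTER JOIN rewrite mentioned above can be illustrated with a small, self-contained sketch. It runs against SQLite from Python's standard library rather than Hive, and the table and column names (src, dest, field1, field2) are hypothetical stand-ins, since the original query is truncated. The idea is to keep only the source rows for which the join finds no partner, which plays the role of a multi-column NOT IN.

```python
import sqlite3


def anti_join_demo():
    """Return rows of src that have no matching row in dest (an anti-join)."""
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    # Hypothetical tables standing in for the poster's source and target tables.
    cur.execute("CREATE TABLE src (field1 TEXT, field2 INTEGER)")
    cur.execute("CREATE TABLE dest (t1_field1 TEXT, t1_field2 INTEGER)")
    cur.executemany("INSERT INTO src VALUES (?, ?)", [("a", 1), ("b", 2), ("c", 3)])
    cur.executemany("INSERT INTO dest VALUES (?, ?)", [("b", 2)])
    # Unmatched src rows come back with NULLs on the dest side of the join,
    # so filtering on IS NULL keeps exactly the rows "not in" dest.
    rows = cur.execute(
        """
        SELECT s.field1, s.field2
        FROM src s
        LEFT OUTER JOIN dest d
          ON s.field1 = d.t1_field1 AND s.field2 = d.t1_field2
        WHERE d.t1_field1 IS NULL
        ORDER BY s.field1
        """
    ).fetchall()
    conn.close()
    return rows
```

The same join shape should carry over to HiveQL, but the exact query the poster ended up with is not shown in the snippet, so treat this only as an illustration of the technique.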

Re: How to support SQL NOT IN function in Hive QL

2011-03-08 Thread Bejoy Ks
Thanks Rekha for such a quick response. A few more doubts here: if I use the comparison operators on the dates directly, would they give the desired result, given that the dates are stored in Hive tables as String? Also, in the comparison of dates, if we use unix_timestamp() it would consider the

Is it possible to run a query over multiple cores for a (small) dataset in local mode ?

2011-03-08 Thread Philippe Girolami
Hi, I am testing Hive 0.6 on parts of my data set. It's only a couple GB of log files that I am reading through a custom SerDe. The table is partitioned. I am using Hadoop local mode for testing. When I run simple Group By queries (4 MR jobs), I am getting logs such as - map : 100% -

Re: Performance between Hive queries vs. Hive over HBase queries

2011-03-08 Thread Otis Gospodnetic
Hi, John, are there plans or specific JIRA issues related to this particular performance hit that you or somebody else is working on and that those of us interested in performance improvements when Hive points to external tables in HBase should watch? Thanks, Otis Sematext ::

hive hbase handler metadata NullPointerException

2011-03-09 Thread Bennie Schut
Hi All, I was trying out hbase 0.89.20100924 with hive trunk with hadoop 0.20.2 When I'm running a simple insert I get this: java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93) at

HIVE Query : concurrent map reduce jobs

2011-03-09 Thread vaibhav negi
Hi, I am using Hadoop 0.20.2. I have configured the fair scheduler in Hadoop. Now when I fire two queries simultaneously, each running map reduce, the second query waits for map/reduce task slots until the first query releases some. Many times queries are over big data ~ 500 GB, so second query

Re: Performance between Hive queries vs. Hive over HBase queries

2011-03-09 Thread John Sichi
There's one here specifically for the Hive portion, but really a full-stack system profile is needed for deciding where to attack it: https://issues.apache.org/jira/browse/HIVE-1231 I don't know of anyone currently working in this area. JVS On Mar 8, 2011, at 9:51 PM, Otis Gospodnetic wrote:

Re: Performance between Hive queries vs. Hive over HBase queries

2011-03-09 Thread Otis Gospodnetic
Hi, Biju's example shows a factor of 5 decrease in performance when Hive points to HBase tables. Does anyone know how much this factor varies? Is it often closer to 1 or is it more often close to 10? Just trying to get a better feel for this... Thanks, Otis Sematext ::

Re: Performance between Hive queries vs. Hive over HBase queries

2011-03-09 Thread John Sichi
Factor of 5 closely matches the results I got when I was testing. JVS On Mar 9, 2011, at 1:23 PM, Otis Gospodnetic wrote: Hi, Biju's example shows a factor of 5 decrease in performance when Hive points to HBase tables. Does anyone know how much this factor varies? Is it often closer

Re: Performance between Hive queries vs. Hive over HBase queries

2011-03-09 Thread Edward Capriolo
On Wed, Mar 9, 2011 at 4:31 PM, John Sichi jsi...@fb.com wrote: Factor of 5 closely matches the results I got when I was testing. JVS On Mar 9, 2011, at 1:23 PM, Otis Gospodnetic wrote: Hi, Biju's example shows a factor of 5 decrease in performance when Hive points to HBase tables.

Hive not reflecting hdfs data

2011-03-10 Thread abhishek pathak
Hi, I am a Hive newbie. I am managing a setup where data is regularly fed into HDFS using Flume. However, Hive does not show the data that was recently added to HDFS. It used to earlier, but somehow it's not updating now. The queries I fire all give answers based on the old HDFS data and do not reflect the

RE: Hive not reflecting hdfs data

2011-03-10 Thread Vivek Mishra
Please check for metastore_db location. That should help Vivek From: abhishek pathak [mailto:forever_yours_a...@yahoo.co.in] Sent: Thursday, March 10, 2011 5:05 PM To: Hive mailing list Subject: Hive not reflecting hdfs data Hi, I am a hive newbie.I am managing a setup where data is regularly

WARN logs using embedded Hive JDBC on CDH3B4

2011-03-10 Thread Andrew Harrison
We're upgrading our Hadoop cluster (including Hive) to CDH3B4 on our dev cluster. After doing so, we see some concerning WARN logs, when connecting via the embedded hive JDBC connector. It doesn't seem to cause any obvious ill-effects, but I thought I'd post to this list to see if anyone had

Re: WARN logs using embedded Hive JDBC on CDH3B4

2011-03-10 Thread Carl Steinbach
Hi Andrew, You can safely ignore these warnings. Datanucleus is complaining because the metastore mapping file fails to validate against the JDO DTD, apparently because we failed to put the elements in a certain specific order. As far as I can tell Datanucleus ignores the validation errors, so

UDAF documentation

2011-03-10 Thread Christopher, Pat
Hi Guys, I'm writing a UDAF to run against hive 0.5 or hive 0.7. The documentation I can find says to implement UDAFEvaluator and ensure that you implement init() , aggregate() and evaluate(). However, all of the examples I can find implement init(), iterate(), merge(), terminatePartial() and

Re: UDAF documentation

2011-03-10 Thread Edward Capriolo
On Thu, Mar 10, 2011 at 8:27 PM, Christopher, Pat patrick.christop...@hp.com wrote: Hi Guys, I’m writing a UDAF to run against hive 0.5 or hive 0.7.  The documentation I can find says to implement UDAFEvaluator and ensure that you implement init() , aggregate() and evaluate().  However, all

RE: UDAF documentation

2011-03-10 Thread Steven Wong
Take a look at http://wiki.apache.org/hadoop/Hive/GenericUDAFCaseStudy, in case you haven't found it already. -Original Message- From: Edward Capriolo [mailto:edlinuxg...@gmail.com] Sent: Thursday, March 10, 2011 6:18 PM To: user@hive.apache.org Cc: Christopher, Pat Subject: Re: UDAF

OLAP Tool for Hive

2011-03-10 Thread hadoop n00b
Hi, Just curious, has anybody experience on or is aware of an OLAP tool that works directly on Hive? OLAP4Cloud claims that ability on HBase but I don't have much details on that. Cheers!

Re: OLAP Tool for Hive

2011-03-10 Thread Joe Andrew Key
Pentaho On 03/10/2011 09:40 PM, hadoop n00b wrote: Hi, Just curious, has anybody experience on or is aware of an OLAP tool that works directly on Hive? OLAP4Cloud claims that ability on HBase but I don't have much details on that. Cheers!

Re: Hive too slow?

2011-03-11 Thread Ajo Fod
I'd say start with something simpler ... say how about converting all files to tab delimited text files in uncompressed format and run the same query on the new table. If that works, you know the problem is with the .seq files ... if not there is something funky about the configuration or the

Re: Hive too slow?

2011-03-11 Thread Edward Capriolo
You should also check the basics: look at your job tracker web interface. One of your nodes may be misconfigured. At times a job may sit on a node for several minutes before it fails and moves to the other. You also want to make sure none of the Hadoop components are having memory related JVM

Re: OLAP Tool for Hive

2011-03-11 Thread Swinney, Austin
Hi hadoop n00b: Jasperforge has some connectors for Hadoop. I have not tried them out yet as I am new to the JasperServer platform. It is on my todo list. Its presence on JasperForge suggests you can run it with the community edition of their JasperServer (and enterprise too).

Re: UDAF documentation

2011-03-11 Thread Aurora Skarra-Gallagher
Hadoop: The Definitive Guide has a good section on this. Chapter 12: Hive: User Defined Functions. It has a diagram that shows how things are called and when. The example I'm looking at shows this sequence: (first instance) init() iterate(1) iterate(2) iterate(3) terminatePartial() (second

Re: UDAF documentation

2011-03-11 Thread Aurora Skarra-Gallagher
Hi, Did you actually call those functions directly from your unit tests? I'm looking for examples of that working, but all I see reference to are tests to make sure the query produces the expected output (rather than directly testing the UDAF). -Aurora On Mar 11, 2011, at 3:44 PM,

Re: In UDAF, possible for terminatePartial to be called without init?

2011-03-11 Thread Aurora Skarra-Gallagher
Anyone know the answer to this? Thanks, Aurora On Feb 15, 2011, at 8:53 AM, Aurora Skarra-Gallagher wrote: Hi, I wrote a simple UDAF for Hive 0.6 and I had to include null checks in terminatePartial even though the object should never be null if init is always called before

Re: UDAF documentation

2011-03-11 Thread Aurora Skarra-Gallagher
I'm looking for something like this, but for a UDAF instead of a UDF: http://svn.apache.org/repos/asf/hive/branches/branch-0.7/ql/src/test/org/apache/hadoop/hive/ql/udf/TestUDFDateDiff.java -Aurora On Mar 11, 2011, at 4:44 PM, Aurora Skarra-Gallagher wrote: Hi, Did you actually call those

Re: UDAF documentation

2011-03-11 Thread Aurora Skarra-Gallagher
I'll just keep responding to myself. ;) I ended up figuring out how to do it. I just used junit and called init, iterate, terminatePartial, etc from inside the unit test. After knowing a typical flow of function calls (as I mentioned below), the main other gotcha is making sure to have a new
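
The call sequence discussed in this thread (init, iterate, terminatePartial, then merging partial results) can be modelled in a few lines and exercised directly from a test, much like the junit approach described above. The sketch below is a toy Python model of that lifecycle, not Hive's Java UDAF API; the class name MeanEvaluator, the snake_case method names, and the choice of a mean aggregate are all assumptions made for illustration.

```python
class MeanEvaluator:
    """Toy model of a UDAF evaluator lifecycle (not Hive's actual Java API)."""

    def init(self):
        # Reset state; in this model it is called before anything else.
        self.total = 0.0
        self.count = 0

    def iterate(self, value):
        # Called once per input row; NULL inputs are skipped.
        if value is not None:
            self.total += value
            self.count += 1
        return True

    def terminate_partial(self):
        # Hand the partial state to whoever merges it; None if nothing was seen.
        return (self.total, self.count) if self.count else None

    def merge(self, partial):
        # Fold another instance's partial state into this one.
        if partial is not None:
            self.total += partial[0]
            self.count += partial[1]
        return True

    def terminate(self):
        # Final result, or None when no rows contributed.
        return self.total / self.count if self.count else None


def run_two_instances(first_rows, second_rows):
    """Simulate two map-side instances followed by one merging instance."""
    partials = []
    for rows in (first_rows, second_rows):
        evaluator = MeanEvaluator()  # each simulated task gets its own object
        evaluator.init()
        for value in rows:
            evaluator.iterate(value)
        partials.append(evaluator.terminate_partial())
    final = MeanEvaluator()
    final.init()
    for partial in partials:
        final.merge(partial)
    return final.terminate()
```

Calling run_two_instances([1, 2, 3], [4]) walks through the same order of calls as the sequence quoted from the book, with the merge phase added as one plausible continuation.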

Re: UDAF documentation

2011-03-11 Thread Christopher, Pat
Hey sorry, I was moving house all evening and driving around town :) I did not automate my unit tests, I created a small frame app to test each function and make sure it responded appropriately. Good to know junit will do it. For your other question, did you include a call to init() in your

Syntax for using json_tuple() for accessing array elements in JSON

2011-03-12 Thread A A
Hi All, I am using hive-0.7.0. I want to use something like, e.g.: select a.timestamp, b.* from log a lateral view json_tuple(a.appevent, 'eventid', 'eventname') b as f1, f2; for array elements in JSON. So I tried something like json_tuple(a.appevent, 'arr[0].property1', 'arr[1].property1').

Re: UDAF documentation

2011-03-12 Thread Aurora Skarra-Gallagher
No problem. Yeah, I called init first for each instance. -Aurora On Mar 11, 2011, at 11:35 PM, Christopher, Pat patrick.christop...@hp.com wrote: Hey sorry, I was moving house all evening and driving around town :) I did not automate my unit tests, I created a small frame app to test each

Restoring configuration variables after issuing set

2011-03-12 Thread Tim Robertson
Hi all Can someone please tell me how to achieve the following in a single hive script? set original_value = mapred.reduce.tasks; set mapred.reduce.tasks=1; ... do stuff set mapred.reduce.tasks=original_value; It is the first and last lines that don't work - is it possible? Thanks, Tim

Re: Restoring configuration variables after issuing set

2011-03-12 Thread Edward Capriolo
On Sat, Mar 12, 2011 at 1:46 PM, Tim Robertson timrobertson...@gmail.com wrote: Hi all Can someone please tell me how to achieve the following in a single hive script? set original_value = mapred.reduce.tasks; set mapred.reduce.tasks=1; ... do stuff set

Don't use DBCP to connect to MySQL

2011-03-13 Thread lei liu
I know the Hive metastore uses DBCP to connect to MySQL. I want to use short-lived connections to MySQL instead; how can I turn off the DBCP configuration? Thanks, LiuLei

Re: OLAP Tool for Hive

2011-03-14 Thread hadoop n00b
Thanks for the response Joe and Austin. Will check out Pentaho and Jasperforge. I knew about Pentaho's integration with Hadoop on PDI but did not know if I could create reports directly from Hive using the BI server. Will check it out. Cheers! On Fri, Mar 11, 2011 at 8:47 PM, Swinney, Austin

Root/ Fetch Stage

2011-03-14 Thread Joerg Schad
Hi, when exploring the Hive Explain statement we were wondering about the different stages. So here are two questions regarding the below Explain statement: 1. Why are there two root stages? What exactly does root stage mean (I assume it means there are no predecessors)? 2. What exactly is a Fetch

Re: optimizing Hive/Hadoop for latency

2011-03-14 Thread Andrew Hitchcock
Hi, Quick note on #3. In order to make mapred.reduce.tasksperslot work, you need to completely remove all mentions of mapred.reduce.tasks from your configuration (including removing it from the default config file). Tasksperslot only takes effect as a last resort. Andrew On Wed, Mar 9, 2011 at

Re: In UDAF, possible for terminatePartial to be called without init?

2011-03-15 Thread Amareshwari Sri Ramadasu
init() is called for all the aggregation evaluators, then terminatePartial() is called. In your code, init() function is not overriding GenericUDAFEvaluator.init(Mode m, ObjectInspector[] parameters). Hive calls GenericUDAFEvaluator.init. Was the signature of the method wrong in your code?

Re: Specifying a double precision in HiveQL

2011-03-15 Thread Amareshwari Sri Ramadasu
You can use the UDF round(x,d), which rounds x to d decimal places. Thanks Amareshwari On 2/25/11 1:01 AM, Aurora Skarra-Gallagher aur...@yahoo-inc.com wrote: Hi, I have a Hive query that has a statement like this (sum(itemcount) / count(item)). I want to specify only two digits of precision

Changing the Hive Tracking URL in job output

2011-03-15 Thread Swinney, Austin
Hi, I've been searching for info on changing the Hive tracking URL that shows up during processing, for example: Starting Job = job_201103091819_0073, Tracking URL = http://some_internal_cluster_url:50030/jobdetails.jsp?jobid=job_201103091819_0073 In this case, it is giving hostname that

Re: Changing the Hive Tracking URL in job output

2011-03-15 Thread Swinney, Austin
I figured as much. Thanks for the reply, Edward. On Mar 15, 2011, at 11:48 AM, Edward Capriolo wrote: On Tue, Mar 15, 2011 at 11:42 AM, Swinney, Austin austin.swin...@vimeo.com wrote: Hi, I've been searching for a info on changing the Hive tracking url that shows up during processing,

maven package for Hive ?

2011-03-15 Thread Igor Tatarinov
Does anyone know a maven repository with one? I only need the basic stuff for writing UDFs. Right now, I pull Cloudera's maven package but that contains a lot of stuff I don't need. Thanks.

Re: hello everybody,i am fresher,i meet a problem,please help.

2011-03-16 Thread 徐厚道
I have solved the problem. Indeed, commons-lang-2.4.jar conflicts with commons-lang-2.1.jar. My Hive runs with Nutch's (v1.0) Hadoop environment. Thanks everybody. 2011/3/7 徐厚道 xuhou...@gmail.com my eng is very poor. i set up hive env use

Fwd: Hadoop error 2 while joining two large tables

2011-03-16 Thread hadoop n00b
Hello, I am trying to execute a query that joins two large tables (3 million and 20 million records). I am getting the Hadoop error code 2 during execution. This happens mainly while the reducers are running. Sometimes the reducers complete 100% and then the error comes. The logs talk about

RE: Hadoop error 2 while joining two large tables

2011-03-16 Thread Christopher, Pat
Are you using Hive on top of Hadoop or writing a raw Hadoop job? This is the Hive list so I'm going to assume you're running Hive... can you send your HiveQL query along? Pat From: hadoop n00b [mailto:new2h...@gmail.com] Sent: Wednesday, March 16, 2011 3:33 AM To: user@hive.apache.org

Problem with Hive HBase Integration - Running Mapper task

2011-03-16 Thread Abhijit Sharma
Hi, I am trying to connect the hive shell running on my laptop to a remote hadoop / hbase cluster and test out the HBase/Hive integration. I manage to connect and create the table in hbase from remote Hive shell. I am also passing the auxpath parameter to the shell (specifying the Hive/HBase

Re: Problem with Hive HBase Integration - Running Mapper task

2011-03-16 Thread Edward Capriolo
On Wed, Mar 16, 2011 at 12:51 PM, Abhijit Sharma abhijit.sha...@gmail.com wrote: Hi, I am trying to connect the hive shell running on my laptop to a remote hadoop / hbase cluster and test out the HBase/Hive integration. I manage to connect and create the table in hbase from remote Hive shell.

Re: Hadoop error 2 while joining two large tables

2011-03-16 Thread Edward Capriolo
On Wed, Mar 16, 2011 at 12:51 PM, Christopher, Pat patrick.christop...@hp.com wrote: Are you using Hive on top of Hadoop or writing a raw Hadoop job? This is the Hive list so I'm going to assume you're running Hive... can you send your HiveQL query along? Pat From: hadoop n00b

Re: Problem with Hive HBase Integration - Running Mapper task

2011-03-16 Thread Abhijit Sharma
Thanks a ton - That worked like a charm. I have been struggling with this the whole day! I did not need to specify auxlib or auxpath - Just putting the 3 Hive/HBase jars in the HADOOP_HOME/lib on the remote job server worked fine. Btw if I use ADD JAR from hive will that obviate the need to put

Re: Problem with Hive HBase Integration - Running Mapper task

2011-03-16 Thread Edward Capriolo
On Wed, Mar 16, 2011 at 1:21 PM, Abhijit Sharma abhijit.sha...@gmail.com wrote: Thanks a ton - That worked like a charm. I have been struggling with this the whole day! I did not need to specify auxlib or auxpath - Just putting the 3 Hive/HBase jars in the HADOOP_HOME/lib on the remote job

unsubscribe

2011-03-16 Thread Paul Trout

Re: Hadoop error 2 while joining two large tables

2011-03-16 Thread Bejoy Ks
Hey hadoop n00b I second Mark's thought. But definitely you can try out re framing your query to get things rolling. I'm not sure on your hive Query.But still, from my experience with joins on huge tables (record counts in the range of hundreds of millions) you should give join conditions

We had this weird behavior

2011-03-17 Thread Guy Doulberg
Hey guys, I have a hive partitioned table. First I ran a query that look like this: Select count(*) From table Where field like '%bla%' and (partition'10' and partition '20') For this query I got Some records let's say 640 When I ran this query Select count(*) From table Where field like

Re: Hadoop error 2 while joining two large tables

2011-03-17 Thread hadoop n00b
Hello All, Thanks a lot for your response. To clarify a few points - I am on CDH2 with Hive 0.4 (I think). We cannot move to a higher version of Hive as we have to use Cloudera distro only. All records in the smaller table have at least one record in the larger table (of course a few exceptions

Re: Hadoop error 2 while joining two large tables

2011-03-17 Thread Edward Capriolo
I am pretty sure the cloudera distro has an upgrade path to a more recent hive. On Thursday, March 17, 2011, hadoop n00b new2h...@gmail.com wrote: Hello All, Thanks a lot for your response. To clarify a few points - I am on CDH2 with Hive 0.4 (I think). We cannot move to a higher version of

Re: We had this weird behavior

2011-03-17 Thread Edward Capriolo
2011/3/17 Guy Doulberg guy.doulb...@conduit.com: Strings. I actually simplified the scenario so I could ask the question. Our partitions are actually strings of dates with hour, so the query was actually Partition = '20110301_20' and Partition = '2011030223'. Still using a single quote wouldn't

Re: Hadoop error 2 while joining two large tables

2011-03-17 Thread bejoy_ks
Try out CDH3b4 it has hive 0.7 and the latest of other hadoop tools. When you work with open source it is definitely a good practice to upgrade those with latest versions. With newer versions bugs would be minimal , performance would be better and you get more functionalities. Your query looks

Re: Building Custom RCFiles

2011-03-17 Thread yongqiang he
You need to customize Hive's ColumnarSerde (maybe functions in LazySerde)'s serde and deserialize function (depends you want to read or write.). And the main thing is that you need to use your own type def (not LazyInt/LazyLong). If your type is int or long (not double/float), casting it to

Re: Building Custom RCFiles

2011-03-17 Thread yongqiang he
A side note, in hive, we make all columns saved as Text internally (even the column's type is int or double etc). And with some experiments, string is more friendly to compression. But it needs CPU to decode to its original type. Thanks Yongqiang On Thu, Mar 17, 2011 at 4:04 PM, yongqiang he

RE: Building Custom RCFiles

2011-03-17 Thread Severance, Steve
Thanks Yongqiang. So for more complex types like map do I just setup a ROW FORMAT DELIMITED KEYS TERMINATED BY '|' etc... Thanks. Steve -Original Message- From: yongqiang he [mailto:heyongqiang...@gmail.com] Sent: Thursday, March 17, 2011 4:35 PM To: user@hive.apache.org Subject:

i want to load data into table from nutch segments data but....

2011-03-17 Thread 徐厚道
Good morning everybody. I want to load data into a table from Nutch segments data, but I don't understand the Python script in the wiki Getting Started page http://wiki.apache.org/hadoop/Hive/GettingStarted#Apache_Weblog_Data . What does "for line in sys.stdin" mean? Does it mean the cell value? or just a
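
For readers with the same question about "for line in sys.stdin": when a script is used with Hive's TRANSFORM ... USING clause, Hive streams rows to the script's standard input, one row per line with columns separated by tabs, so each iteration of the loop sees one whole row rather than a single cell. The sketch below shows that shape; the transformation itself (upper-casing the first column) and the function names are hypothetical and chosen only for illustration.

```python
import sys


def transform(lines):
    """Yield one tab-separated output row for each tab-separated input row."""
    for line in lines:
        # One line is one row; splitting on tabs recovers the column values.
        fields = line.rstrip("\n").split("\t")
        # Hypothetical transformation: upper-case the first column.
        fields[0] = fields[0].upper()
        yield "\t".join(fields)


def main():
    # In a streaming script this would be the entry point: read rows from
    # stdin and print the transformed rows to stdout for Hive to read back.
    for out in transform(sys.stdin):
        print(out)
```

Keeping the row logic in transform() and the I/O in main() means the per-row behaviour can be checked on a plain list of strings before the script is wired into a query.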

Wrong file format error

2011-03-18 Thread abhishek pathak
Hi all, I have created an external Hive table with the STORED AS SEQUENCEFILE option. However, when I try to load a blank file into that table it gives me the following error: Failed with exception Wrong file format. Please check the file's format. FAILED: Execution Error, return code 1 from

End User Clients for Hive?

2011-03-18 Thread Sunderlin, Mark
As we prepare Hive for use by general business analysts and other end-users, I am wondering what the community's experience is with clients for end users? My user base currently is using tools such as WinSQL and Toad on their Windows machines to access current systems, such as MySQL, Oracle and

RE: Building Custom RCFiles

2011-03-18 Thread Severance, Steve
One more question. I have everything working except a Map<String,String>. I understand that the whole Map will be physically stored as a single Text object in the RCFile. I have had considerable trouble setting up the delimiters for this Map. I want to have MAP KEYS TERMINATED BY '='

Apache Web Log Question

2011-03-18 Thread bichonfrise74
Hi, I am trying to use this: add jar ../build/contrib/hive_contrib.jar; CREATE TABLE apachelog ( host STRING, identity STRING, user STRING, time STRING, request STRING, status STRING, size STRING, referer STRING, agent STRING) ROW FORMAT SERDE

Re: End User Clients for Hive?

2011-03-18 Thread Kirk True
Hi Mark, Karmasphere Analyst[1] is a powerful, yet easy-to-use UI for Hive. Disclaimer: Karmasphere is one of my clients and I work on Analyst. Thanks, Kirk [1] http://www.karmasphere.com/Products-Information/karmasphere-analyst.html On 3/18/11 1:24 PM, Sunderlin, Mark wrote: As we prepare

RE: Building Custom RCFiles

2011-03-18 Thread Severance, Steve
Got it working using the columnar serde with the default separators. Steve -Original Message- From: yongqiang he [mailto:heyongqiang...@gmail.com] Sent: Friday, March 18, 2011 3:50 PM To: user@hive.apache.org Subject: Re: Building Custom RCFiles what's your table definition?

skew join optimization

2011-03-20 Thread Igor Tatarinov
I have the following join that takes 4.5 hours (with 12 nodes) mostly because of a single reduce task that gets the bulk of the work: SELECT ... FROM T LEFT OUTER JOIN S ON T.timestamp = S.timestamp and T.id = S.id This is a 1:0/1 join so the size of the output is exactly the same as the size of

Re: skew join optimization

2011-03-20 Thread Jov
2011/3/20 Igor Tatarinov i...@decide.com: I have the following join that takes 4.5 hours (with 12 nodes) mostly because of a single reduce task that gets the bulk of the work: SELECT ... FROM T LEFT OUTER JOIN S ON T.timestamp = S.timestamp and T.id = S.id This is a 1:0/1 join so the size

Re: skew join optimization

2011-03-20 Thread bharath vissapragada
Hi Igor, See http://wiki.apache.org/hadoop/Hive/JoinOptimization and see the jira 1642 which automatically converts a normal join into map-join (Otherwise you can specify the mapjoin hints in the query itself.). Because your 'S' table is very small , it can be replicated across all the mappers
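
The map-join idea described here, replicating the small table to every mapper so that no reducer has to absorb the skewed keys, can be sketched outside Hive as a plain hash join. The following is a minimal Python model under stated assumptions: rows are tuples, the small side fits in memory, the join is 1:0/1 as in the original question, and the function name is hypothetical. It illustrates the technique only and says nothing about how Hive implements it.

```python
def map_join_left_outer(big_rows, small_rows, key):
    """Left outer join of big_rows against an in-memory copy of small_rows.

    key is a function that extracts the join key from a row. Because the
    relationship is assumed to be 1:0/1, each key maps to at most one
    small-side row.
    """
    # Build the lookup once; this plays the role of the replicated table.
    lookup = {key(row): row for row in small_rows}
    for row in big_rows:
        # Every big-side row is emitted; unmatched rows pair with None.
        yield row, lookup.get(key(row))
```

Each big-side row is handled independently with a dictionary lookup, which is why this style of join avoids the single overloaded reduce task described in the original post.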

Re: skew join optimization

2011-03-20 Thread Ted Yu
Can someone re-attach the missing figures for that wiki ? Thanks On Sun, Mar 20, 2011 at 7:15 AM, bharath vissapragada bharathvissapragada1...@gmail.com wrote: Hi Igor, See http://wiki.apache.org/hadoop/Hive/JoinOptimization and see the jira 1642 which automatically converts a normal join

Re: skew join optimization

2011-03-20 Thread Ted Yu
How about link to http://imageshack.us/ or TinyPic ? Thanks On Sun, Mar 20, 2011 at 7:56 AM, Edward Capriolo edlinuxg...@gmail.comwrote: On Sun, Mar 20, 2011 at 10:30 AM, Ted Yu yuzhih...@gmail.com wrote: Can someone re-attach the missing figures for that wiki ? Thanks On Sun, Mar

Re: skew join optimization

2011-03-20 Thread Edward Capriolo
On Sun, Mar 20, 2011 at 11:20 AM, Ted Yu yuzhih...@gmail.com wrote: How about link to http://imageshack.us/ or TinyPic ? Thanks On Sun, Mar 20, 2011 at 7:56 AM, Edward Capriolo edlinuxg...@gmail.com wrote: On Sun, Mar 20, 2011 at 10:30 AM, Ted Yu yuzhih...@gmail.com wrote: Can someone

Problems with MetaStore

2011-03-20 Thread Anja Gruenheid
Hi! I'm trying to set up a test environment locally on my laptop and it works if I use the standard embedded derby driver. As soon as I add a hive-site.xml, I tried both MySQL and Derby with servers definitely running and the respective parameters set in that hive-site.xml, I get the

Re: skew join optimization

2011-03-20 Thread yongqiang he
Skew join does not work together with map join. Map join does not require any reducer. Please double check that the Hive you use has the auto map join feature. If auto convert join is in your Hive, only "set hive.auto.convert.join = true;" should do the work. thanks yongqiang On Sun, Mar

Re: skew join optimization

2011-03-20 Thread Igor Tatarinov
Thanks everyone! I had a typo when setting auto convert to true. You can actually see it in my first email ('set' was repeated twice but there was no syntax error). With map joins enabled, my join finished in 30 minutes. Sweet! Looks like 'true' should be the default option for auto.convert

Wrong file format error

2011-03-21 Thread abhishek pathak
Hi all, I have created an external Hive table with the STORED AS SEQUENCEFILE option. However, when I try to load a blank file into that table it gives me the following error: Failed with exception Wrong file format. Please check the file's format. FAILED: Execution Error, return code 1 from

Re: Hi,all.Is it possible to generate mutiple records in one SerDe?

2011-03-21 Thread Ted Yu
I don't think so: Object deserialize(Writable blob) throws SerDeException; On Mon, Mar 21, 2011 at 4:55 AM, 幻 ygnhz...@gmail.com wrote: Hi, all. Is it possible to generate multiple records in one SerDe? I mean, can I return more than one row in deserialize? Thanks!

Dynamic Configuration support in Hive SQL

2011-03-21 Thread amit jaiswal
Hi, Does hive support dynamic configuration? For example: is it possible to write a hive script with some ${PARAM} variables and let hive replace these parameters with their values at runtime. Eg. Original hive script: select * from person where age ${MIN_AGE}; Config file: MIN_AGE=18 And

Re: Dynamic Configuration support in Hive SQL

2011-03-21 Thread Andrew Wilson
We use the Cloudera Oozie distro which has a Hive action, and provides that kind of template support for Hive scripts. On Mar 21, 2011, at 11:35 AM, Lenin Gali wrote: Best way to do this is to write shell or python scripts and echo with substitute variables and call hive from command line
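
The script-based approach quoted here (substitute the variables yourself, then hand the result to hive on the command line) can be sketched with Python's standard string.Template, which already understands ${NAME} placeholders. The helper names and the simple KEY=VALUE config format below are assumptions for illustration, and the comparison operator in the usage note is a guess because it is missing from the query in the original post.

```python
from string import Template


def parse_params(config_text):
    """Parse KEY=VALUE lines into a dict, skipping blanks and # comments."""
    params = {}
    for line in config_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, value = line.partition("=")
        params[name.strip()] = value.strip()
    return params


def render_hive_script(script_text, params):
    """Replace ${NAME} placeholders; raises KeyError if a name is missing."""
    return Template(script_text).substitute(params)
```

With a config of MIN_AGE=18, rendering "select * from person where age >= ${MIN_AGE};" yields the query with 18 filled in, and the rendered text can then be written to a file or passed to the hive command line. Using substitute() rather than safe_substitute() is a deliberate choice so that a missing parameter fails loudly instead of leaving a placeholder in the query.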

RE: Problems with MetaStore

2011-03-21 Thread Christopher, Pat
That message is saying, 'found all drivers, trying to connect to specified metastore db. Can't'. Check the following: - MySQL/Derby server running - correct ip/hostname specified (I've had troubles with dynamic ips) - userid/pwd exists on server and has correct grants - will need

Re: Dynamic Configuration support in Hive SQL

2011-03-21 Thread Edward Capriolo
On Mon, Mar 21, 2011 at 11:43 AM, Andrew Wilson awil...@conductor.com wrote: We use the Cloudera Oozie distro which has a Hive action, and provides that kind of template support for Hive scripts. On Mar 21, 2011, at 11:35 AM, Lenin Gali wrote: Best way to do this is to write shell or

Re: Dynamic Configuration support in Hive SQL

2011-03-21 Thread Sameer Kalburgi
Any plans, if possible, to allow you to set the variable from a query result e.g. 'set zzz = (select count(1) from t)'? On Mon, Mar 21, 2011 at 5:20 PM, Edward Capriolo edlinuxg...@gmail.comwrote: On Mon, Mar 21, 2011 at 11:43 AM, Andrew Wilson awil...@conductor.com wrote: We use the

Create Table Error

2011-03-21 Thread Anja Gruenheid
Hi! I've tried to run a create table statement via the HiveCLI in Eclipse and it fails with the following exception: FAILED: Error in metadata: MetaException(message:Got exception: java.io.FileNotFoundException File file:/user/hive/warehouse/customer does not exist.) 11/03/21 19:05:36 ERROR

Re: Dynamic Configuration support in Hive SQL

2011-03-21 Thread Avram Aelony
You can probably do this in Bash now, try something like this: zz=`hive -S -e "select count(1) from t"` ~Avram ~ Avram Aelony|Sr. Analyst|eHarmony.com|(424) 258-1199|x1098|skype: avram.aelony ~ On Mar 21, 2011, at 3:55 PM, Sameer Kalburgi wrote: Any plans, if possible, to allow you to set
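
The same capture-the-output trick can be written in Python for anyone driving Hive from a script. This is a minimal sketch that assumes a hive executable on the PATH and relies only on the -S (silent) and -e (execute query string) options used in the Bash one-liner above; the function names and the injectable run parameter are hypothetical conveniences so the helper can be tried without a cluster.

```python
import subprocess


def hive_command(query):
    """Build the argument list for running one query in silent mode."""
    return ["hive", "-S", "-e", query]


def query_scalar(query, run=subprocess.run):
    """Run the query and return its trimmed standard output as a string.

    run defaults to subprocess.run; passing a stand-in makes the function
    usable in tests where no hive binary is available.
    """
    result = run(hive_command(query), capture_output=True, text=True, check=True)
    return result.stdout.strip()
```

Passing the query as a single list element avoids shell quoting problems, and check=True turns a failed hive invocation into an exception instead of an empty result.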

Error while reading from task log url ?

2011-03-22 Thread
Hi, all. I have run into a problem; here is the exception: select a.mid from origin_performance a; Total MapReduce jobs = 1 Launching Job 1 out of 1 Number of reduce tasks is set to 0 since there's no reduce operator Starting Job = job_201103220234_0003, Tracking URL =

Re: Return code 2 from org.apache.hadoop.hive.ql.exec.ExecDRiver error

2011-03-22 Thread
Have you tried querying like this? select a.id from table1 a; On March 22, 2011 at 4:31 PM, 王世森 wangshi...@oraro.net wrote: Hi, My hive version is 0.6.0, I can query data like this: select * from table1; OK 1 wss 2 chenliang Time taken: 7.366 seconds But when the sql is ‘select id from table1;’, hive

Re: Create Table Error

2011-03-22 Thread Anja Gruenheid
Hi! Where (and what) exactly do I have to set it? I added HADOOP_CONF_DIR to the path variables and to the source folders in the build path, but it doesn't work. Thanks, Anja On 03/22/2011 04:48 AM, Thiruvel Thirumoolan wrote: Looks like your IDE classpath does not include

Re: An error occurred while using RCFile on S3

2011-03-22 Thread Shunsuke Mikami
Thank you! I tried with set hive.optimize.cp=false; and then it works! However, it reduces the benefit of RCFile, which skips unnecessary data. Might it be better to use SequenceFile in the present state of things? -- Shunsuke Mikami 2011/3/23 yongqiang he heyongqiang...@gmail.com Don't know the

Re: Hive/hbase integration - Rebuild the Storage Handler

2011-03-22 Thread amit jaiswal
Hi, I am also trying the same but don't know the exact build steps. Someone please tell the same. -regards Amit From: Jean-Charles Thomas jctho...@autoscout24.com To: Hive mailing list user@hive.apache.org Sent: Tue, 22 March, 2011 11:40:18 AM Subject:

Re: Return code 2 from org.apache.hadoop.hive.ql.exec.ExecDRiver error

2011-03-23 Thread sangeetha s
Hi, Did you check the field names in the table properly? Actually from the log file it is clear that there is no element named id in the table 'table1'. Kindly check if there is any typo. Also the alias is not required if you are dealing with a single table with simple queries. Kindly execute

Re: Return code 2 from org.apache.hadoop.hive.ql.exec.ExecDRiver error

2011-03-23 Thread 王世森
Hi, Thanks for your reply. Here is the result of ‘Describe tables;’ hive Describe table1; OK id int name string Time taken: 18.104 seconds Jack From: sangeetha s [mailto:sangee@gmail.com] Sent: March 23, 2011 16:40 To: user@hive.apache.org Subject: Re: Return code 2 from

Re: Return code 2 from org.apache.hadoop.hive.ql.exec.ExecDRiver error

2011-03-23 Thread 王世森
Hi, My environment is win7 + cygwin + hadoop-0.20.2 hive-config.sh: export HIVE_HOME=`dirname $bin` export JAVA_HOME=/cygdrive/d/Cygwin/home/Administrator/Java/jdk1.6.0_07 export HADOOP_HOME=/cygdrive/d/Cygwin/home/Wangss/hadoop-0.20.2 ext\util\execHiveCmd.sh: HADOOP_HEAPSIZE=256

Re: can somebody help me ?eclipse compile hive

2011-03-23 Thread MIS
Checkout from the hive trunk and once the checkout is complete then run the build script with targets such as compile, compile-test and eclipse-files. On Wed, Mar 23, 2011 at 2:01 PM, 徐厚道 xuhou...@gmail.com wrote: i down load hive 0.6 source tar.gz package. unzip and create project by

Recommended approaches for large data import from RDBMS into hive (Approx Terabyte of data)

2011-03-23 Thread Ryan Greenhall
My desire is to be able to run a query on the existing RDBMS (SQLServer) to de-normalise the required data into single rows that can then be imported into a hive table partitioned by date. Previously, on a much smaller scale, I have achieved data import into Hive by copying a tsv of my data
