Re: Efficient mechanism to simulate the row level updates in Hive

2011-02-16 Thread Ashish Thusoo
This is quite difficult to do in Hive on Hadoop. Hive over Hadoop really does not support row level updates so basically you are reduced to periodically merging the raw stream of updates with the main table and generating a new snapshot of the table. Another possible approach could be to use

LazySimpleSerDe: last column takes rest

2011-02-16 Thread Aurora Skarra-Gallagher
Hi, What does setting the serialization.last.column.takes.rest SERDEPROPERTIES do for the LazySimpleSerDe? http://hive.apache.org/docs/r0.6.0/api/org/apache/hadoop/hive/serde2/lazy/LazySimpleSerDe.SerDeParameters.html#isLastColumnTakesRest() I came across that in considering a blob table

Question about Transform and M/R scripts

2011-02-16 Thread Vijay
Hi, I'm trying this use case: do a simple select from an existing table and pass the results through a reduce script to do some analysis. The table has web logs so the select uses a pseudo user ID as the key and the rest of the data as values. My expectation is that a single reduce script should

[WAS Re: periodic execution] Oozie Hive action

2011-02-16 Thread Alejandro Abdelnur
An update on this. I've finished doing changes in Oozie Hive-action to work with Hive 0.7. As mentioned before the problem is that not all needed Hive dependent JARs are available in public Maven repos. Early next week the Cloudera Maven repositories should have beta versions of these JARs

hive with hibernate(ONLY SELECT)

2011-02-16 Thread Amlan Mandal
Is there a way to use Hibernate with Hive ONLY for select queries? Amlan

[no subject]

2011-02-16 Thread Stuart Scott
Thanks for the reply. (I'm new to Hive.) I can't find the driver class. Do you know which files I should be looking for? Regards Stuart By the sound of the error ... it sounds like you don't have HiveDriver in your path. Can you locate the class that supposedly has the HiveDriver class?

Re:

2011-02-16 Thread Amlan Mandal
$HIVE_HOME/lib/ will contain the jar hive-jdbc-0.6.0.jar On Thu, Feb 17, 2011 at 12:06 PM, Stuart Scott stuart.sc...@e-mis.com wrote: Thanks for the reply.. (I’m new to Hive). I can’t find the driver class. Do you know which files I should be looking for? Regards Stuart by the sound

Re: does hive support Sequence File format ?

2011-02-17 Thread Ted Yu
Look under http://wiki.apache.org/hadoop/Hive/LanguageManual/DDL#Create_Table On Thu, Feb 17, 2011 at 12:00 PM, Mapred Learn mapred.le...@gmail.com wrote: Hi, I was wondering if Hive supports the Sequence File format. If yes, could you point me to some documentation about how to use Seq files in

Re: does hive support Sequence File format ?

2011-02-17 Thread Mapred Learn
Thanks Ted ! Just found it few minutes ago. On Feb 17, 2011, at 1:46 PM, Ted Yu yuzhih...@gmail.com wrote: Look under http://wiki.apache.org/hadoop/Hive/LanguageManual/DDL#Create_Table On Thu, Feb 17, 2011 at 12:00 PM, Mapred Learn mapred.le...@gmail.com wrote: Hi, I was wondering if hive

Re: does hive support Sequence File format ?

2011-02-17 Thread Karthik
I have a requirement to expose data from the SequenceFile KEY (not the VALUE) in a Hive table. How can I do this? From the code, it looks like only the VALUE part is available to Hive. Please help. Regards. From: Mapred Learn mapred.le...@gmail.com

RE: Hive Not Returning Column Names, even what not using 'When'??

2011-02-17 Thread Sirota, Peter
Hi Mark, You can use JDBC driver provided by Amazon Elastic MapReduce service. When you use that driver with SQL Squirrel it returns column names. Here are the docs on how to get that driver:

problem while performing union on twotables

2011-02-17 Thread sangeetha s
Hi, I am trying to perform a union of two tables that have identical schemas and distinct data. There are two tables, 'oldtable' and 'newtable'. The old table contains the information of old users and the new table will contain the information of new users. I am trying to update the new entry

left outer join and nulls

2011-02-18 Thread Cam Bazz
Hello, When we do a left outer join and the right table does not have a matching row, it returns NULLs for those values. Is there any way to turn those NULLs into 0s? Since it is a counting operation, a missing row in the right table means 0, not NULL. Best regards, -c.b.
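
A minimal sketch of the usual answer to this question: wrap the right-hand column in COALESCE so that unmatched rows report 0. COALESCE is standard SQL and is among Hive's conditional functions. SQLite (through Python's sqlite3) only stands in for Hive here so the example runs anywhere, and the tables `items` and `item_hits` are hypothetical.

```python
import sqlite3

# SQLite is a stand-in for Hive; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (item_sid INTEGER);
    CREATE TABLE item_hits (item_sid INTEGER, hits INTEGER);
    INSERT INTO items VALUES (1), (2), (3);
    INSERT INTO item_hits VALUES (1, 10), (3, 4);
""")

# COALESCE returns its first non-NULL argument, so an item with no
# matching row on the right side gets 0 instead of NULL.
QUERY = """
    SELECT i.item_sid, COALESCE(h.hits, 0) AS hits
    FROM items i
    LEFT OUTER JOIN item_hits h ON (i.item_sid = h.item_sid)
    ORDER BY i.item_sid
"""

rows = conn.execute(QUERY).fetchall()
print(rows)  # item 2 has no hits row and is reported with 0
```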

Re: problem while performing union on twotables

2011-02-18 Thread Jov
Hive 0.4.1 does not support UNION; it only supports UNION ALL. On 2011-2-18 at 3:12 PM, sangeetha s sangee@gmail.com wrote: Hi, I am trying to perform union of two tables which are having identical schemas and distinct data. There are two tables 'oldtable' and 'newtable'. The old table contains the
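
Since the two tables are said to hold distinct data, UNION ALL on its own may already be enough. Where duplicates do have to be removed, a common workaround is SELECT DISTINCT over a UNION ALL subquery. The sketch below rests on assumptions: SQLite stands in for Hive, and the columns `user_id` and `name` are hypothetical because the thread does not show the schema.

```python
import sqlite3

# SQLite is a stand-in for Hive; the column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE oldtable (user_id INTEGER, name TEXT);
    CREATE TABLE newtable (user_id INTEGER, name TEXT);
    INSERT INTO oldtable VALUES (1, 'ann'), (2, 'bob');
    INSERT INTO newtable VALUES (2, 'bob'), (3, 'cy');
""")

# UNION ALL keeps every row from both inputs; the outer DISTINCT then
# removes the duplicates that a plain UNION would have removed.
UNION_DISTINCT = """
    SELECT DISTINCT u.user_id, u.name
    FROM (
        SELECT user_id, name FROM oldtable
        UNION ALL
        SELECT user_id, name FROM newtable
    ) u
    ORDER BY u.user_id
"""

merged = conn.execute(UNION_DISTINCT).fetchall()
print(merged)
```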

OutOfMemory errors on joining 2 large tables.

2011-02-18 Thread Bennie Schut
When we try to join two large tables some of the reducers stop with an OutOfMemory exception. Error: java.lang.OutOfMemoryError: Java heap space at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1508) at

Unit testing Hive script

2011-02-18 Thread Radek Maciaszek
Hello, I was wondering if anyone managed to unit test Hive scripts and share his/her experience? My first thought was to prepare sample data, run hive scripts in order to generate output and then compare the generated output with the expected output. Sounds fairly simple but it may be a bit

RE: Hive Not Returning Column Names, even what not using 'When'??

2011-02-18 Thread Sunderlin, Mark
Hey Peter, this looks like it ought to work for me, but the link to the Hive 0.5 drivers ... seems broken? http://buyitnw.appspot.com/aws.amazon.com/developertools/Elastic-MapReduce/0196055244487017 seems to be the link from the site you mention below, but it returns a blank page?

RE: Hive Not Returning Column Names, even what not using 'When'??

2011-02-18 Thread Sirota, Peter
Hi Mark, Try this link for Hive .5 JDBC driver: http://aws.amazon.com/developertools/Elastic-MapReduce/0196055244487017 We are actually in Seattle office. Best Regards, Peter- From: Sunderlin, Mark [mailto:mark.sunder...@teamaol.com] Sent: Friday, February 18, 2011 6:26 AM To:

Re: Unit testing Hive script

2011-02-18 Thread Edward Capriolo
On Fri, Feb 18, 2011 at 6:58 AM, Radek Maciaszek radek.macias...@gmail.com wrote: Hello, I was wondering if anyone managed to unit test Hive scripts and share his/her experience? My first thought was to prepare sample data, run hive scripts in order to generate output and then compare the

Re: Unit testing Hive script

2011-02-18 Thread Kirk True
Hi Radek, I'm actually in the process of running the map-join unit tests against EMR as we speak. It's possible but dog slow :) Thanks, Kirk On 2/18/11 11:09 AM, Edward Capriolo wrote: On Fri, Feb 18, 2011 at 6:58 AM, Radek Maciaszek radek.macias...@gmail.com wrote: Hello, I was wondering

Re: date-time functions in hive

2011-02-18 Thread Edward Capriolo
On Fri, Feb 18, 2011 at 3:47 PM, Viral Bajaria viral.baja...@gmail.com wrote: Hi, I have a question regarding the existing date functions in Hive (http://wiki.apache.org/hadoop/Hive/LanguageManual/UDF#Date_Functions) The unix_timestamp() functions return a bigint while the from_unixtime()

RE: TOAD for hive

2011-02-20 Thread Guy Doulberg
I ran into some problems with this; maybe you can help me out. I have aux jars that contain a custom writable object, and I put the jars in auxlib. Using Hive interactive mode it works perfectly, but using TOAD for Hive the jobs fail. Looking in the jobtracker I see that my custom writable class

Re: hive on mutinode hadoop

2011-02-20 Thread Amlan Mandal
Thanks Mafish. Can you please point me to which config needs to be set correctly? Amlan On Mon, Feb 21, 2011 at 12:45 PM, Mafish Liu maf...@gmail.com wrote: It seems you did not configure your HDFS properly. Caused by: java.lang.IllegalArgumentException: Wrong FS: hdfs://

Re: hive on mutinode hadoop

2011-02-21 Thread MIS
Please add the host-name and IP address mapping to the /etc/hosts file on both nodes that are running the Hadoop cluster. One more thing: I hope the secondary namenode is also running alongside the namenode, but you may have forgotten to mention it. Thanks, MIS On Mon, Feb 21, 2011 at 12:47 PM, Amlan Mandal

calculating unique views based on ip, session_id

2011-02-21 Thread Cam Bazz
Hello, So I have a table of item views with item_sid, ip_number, session_id. I know it will not be exact, but I want to get unique views per item, and I will accept an (ip_number, session_id) tuple as a unique view. When I want to query just item hits I say: select item_sid, count(*) from

RE: TOAD for hive

2011-02-21 Thread Guy Doulberg
I think I found a lead. The following code is taken from hiveserver.sh:
if [ $minor_ver -lt 20 ]; then
  exec $HADOOP jar $AUX_JARS_CMD_LINE $JAR $CLASS $HIVE_PORT $@
else
  # hadoop 20 or newer - skip the aux_jars option and hiveconf
  exec $HADOOP jar $JAR $CLASS $HIVE_PORT $@

Re: Database/Schema , INTERVAL and SQL IN usages in Hive

2011-02-21 Thread Jov
On 2011-2-21 at 10:54 PM, Bejoy Ks bejoy...@yahoo.com wrote: Hi Experts, I'm using Hive for a few projects and I found it a great tool in Hadoop to process end-to-end structured data. Unfortunately I'm facing a few challenges, as follows. Availability of database/schemas in Hive: I'm having

Re: calculating unique views based on ip, session_id

2011-02-21 Thread Ajo Fod
You can group by item_sid (drop session_id and ip_number from group by clause) and then join with the parent table to get session_id and ip_number. -Ajo On Mon, Feb 21, 2011 at 3:07 AM, Cam Bazz camb...@gmail.com wrote: Hello, So I have table of item views with item_sid, ip_number,

Re: Database/Schema , INTERVAL and SQL IN usages in Hive

2011-02-21 Thread Ajo Fod
On using SQL IN ... what would happen if you created a short table with the entries in the IN clause and used an inner join? -Ajo On Mon, Feb 21, 2011 at 7:57 AM, Bejoy Ks bejoy...@yahoo.com wrote: Thanks Jov for the quick response. Could you please let me know which is the latest stable

Re: calculating unique views based on ip, session_id

2011-02-21 Thread Cam Bazz
Hello, I did not understand this: when I do a: select item_sid, count(*) from item_raw group by item_sid i get hits per item. how do we join this to the master table? best regards, -c.b. On Mon, Feb 21, 2011 at 6:28 PM, Ajo Fod ajo@gmail.com wrote: You can group by item_sid (drop

Re: calculating unique views based on ip, session_id

2011-02-21 Thread Ajo Fod
Oh, I think I see what you are getting at .. basically you are getting duplicate item_sids because they represent different views. ... try this: select item_sid, ip_number, session_id, count(*) from item_raw group by item_sid, ip_number, session_id; On Mon, Feb 21, 2011 at 11:54 AM, Cam Bazz

Extract Create Table statement from Hive

2011-02-21 Thread Jay Ramadorai
Does anyone have a way of generating the create table statement for a table that is in Hive? I see a jira for this https://issues.apache.org/jira/browse/HIVE-967 and it appears that Ed Capriolo might have a solution for this. Ed, are you able to share this solution? My goal is to copy a

Re: Extract Create Table statement from Hive

2011-02-21 Thread Edward Capriolo
On Mon, Feb 21, 2011 at 6:42 PM, Jay Ramadorai jramado...@tripadvisor.com wrote: Does anyone have a way of generating the create table statement for a table that is in Hive?  I see a jira for this https://issues.apache.org/jira/browse/HIVE-967 and it appears that Ed Capriolo might have a

Re: hive on mutinode hadoop

2011-02-21 Thread sangeetha s
Yes, what Jeff said is correct. You should not map different IPs to a common name. Map the IPs and host names correctly and try again. Cheers! On Mon, Feb 21, 2011 at 7:43 PM, Jeff Bean jwfb...@cloudera.com wrote: One thing I notice is that /etc/hosts is different on each host: amlan-laptop is

Re: calculating unique views based on ip, session_id

2011-02-21 Thread Cam Bazz
The query you gave produced multiple item_sids. This is what I did instead: select u.item_sid, count(*) cc from (select distinct item_sid, ip_number, session_id from item_raw where date_day='20110202') u group by u.item_sid; date_day is a partition and this produced the results I wanted,
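
A small runnable check of the distinct-then-count approach described in this message. It is a sketch under assumptions: SQLite stands in for Hive, date_day is an ordinary column rather than a partition, and the sample rows are invented for illustration.

```python
import sqlite3

# SQLite is a stand-in for Hive; the sample rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item_raw (
        item_sid INTEGER, ip_number TEXT, session_id TEXT, date_day TEXT
    );
    INSERT INTO item_raw VALUES
        (1, '10.0.0.1', 'a', '20110202'),
        (1, '10.0.0.1', 'a', '20110202'),
        (1, '10.0.0.2', 'b', '20110202'),
        (2, '10.0.0.1', 'a', '20110202'),
        (2, '10.0.0.1', 'a', '20110203');
""")

# The inner query collapses repeated (item, ip, session) tuples into one
# row each; the outer query counts the surviving tuples per item.
UNIQUE_VIEWS = """
    SELECT u.item_sid, COUNT(*) AS cc
    FROM (
        SELECT DISTINCT item_sid, ip_number, session_id
        FROM item_raw
        WHERE date_day = '20110202'
    ) u
    GROUP BY u.item_sid
    ORDER BY u.item_sid
"""

unique_views = conn.execute(UNIQUE_VIEWS).fetchall()
print(unique_views)
```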

implementing moving average as a UDF

2011-02-21 Thread Igor Tatarinov
I would like to implement the moving average as a UDF (instead of a streaming reducer). Here is what I am thinking. Please let me know if I am missing something here: SELECT product, date, mavg(product, price, 10) FROM ( SELECT * FROM prices DISTRIBUTE BY product SORT BY product, date )
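
A rough model, in Python rather than as an actual Hive UDF, of the state such a function would have to carry between calls. The class name `MovingAverage` and its `evaluate` method are hypothetical. The numbers only come out as intended if rows arrive grouped by product and ordered by date, which is what the DISTRIBUTE BY / SORT BY in the query above is meant to arrange.

```python
from collections import deque


class MovingAverage:
    """Trailing average of the last `window` prices, reset per product."""

    def __init__(self):
        self.current_product = None
        self.values = deque()
        self.total = 0.0

    def evaluate(self, product, price, window):
        # A new product starts a fresh window.
        if product != self.current_product:
            self.current_product = product
            self.values.clear()
            self.total = 0.0
        self.values.append(price)
        self.total += price
        # Drop the oldest price once the window is full.
        if len(self.values) > window:
            self.total -= self.values.popleft()
        return self.total / len(self.values)


# Rows as they would arrive after sorting by product, then date.
rows = [
    ("a", "2011-02-01", 10.0),
    ("a", "2011-02-02", 20.0),
    ("a", "2011-02-03", 30.0),
    ("b", "2011-02-01", 5.0),
]
mavg = MovingAverage()
averages = [mavg.evaluate(product, price, 2) for product, _date, price in rows]
print(averages)
```

Because the function is stateful, any reordering of rows or sharing of one instance across unrelated row streams would change the result, which is the concern raised in the follow-up messages.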

Re: Extract Create Table statement from Hive

2011-02-22 Thread Jay Ramadorai
Thank you, Ed. Works like a charm after I remove the Hive2rdbms references. I've uploaded the jar to the JIRA for those who want to use it. On Feb 22, 2011, at 1:13 PM, Edward Capriolo wrote: On Tue, Feb 22, 2011 at 1:09 PM, Jay Ramadorai jramado...@tripadvisor.com wrote: Thank you, Ed.

Re: implementing moving average as a UDF

2011-02-22 Thread Igor Tatarinov
Thank you, John. It's not quite clear from the page whether my solution: 1. makes sense 2. works now 3. will work in the future if the issue is resolved/implemented. Could you elaborate? Also, there is no mention of UDF object sharing (between mappers) in the current implementation. Is this a

Re: why this query gives wrong results

2011-02-23 Thread Cam Bazz
Hello, Here are the table descriptions. They only have the identifier, hits, uniques and date_day, which is the partition.
hive> describe selection_daily_hits;
OK
sel_sid int
hits int
date_day string
hive> describe selection_daily_uniques;
OK
sel_sid int
uniques int
date_day

Re: metastore with mysql

2011-02-24 Thread hive1
Hello, thank you for your quick responses! It seems my root MySQL user wasn't really 'root'. GRANT ALL PRIVILEGES ... with a new user got it running. However, I don't understand it, because my root user has the same privileges as the new one... but whatever. Malte

FW: Query Regarding HIVE-1535.

2011-02-24 Thread Mohit

Re: How to add hours/minutes to a timestamp column in Hive Query

2011-02-24 Thread Thiruvel Thirumoolan
You can use unix_timestamp(), do the math and convert the result to timestamp. something like from_unixtime(unix_timestamp(Arrival) + n). Use the proper units though. Will that not work for you? On Feb 24, 2011, at 7:57 PM, Bejoy Ks wrote: Hi Experts Could some one please help me out
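
The arithmetic behind this suggestion, sketched in Python: convert to epoch seconds, add the offset in seconds, convert back. Hive's unix_timestamp() and from_unixtime() are not called here; the helper `add_offset` is hypothetical, and the conversion is modelled in UTC, which is an assumption since time zone handling in Hive may differ.

```python
from datetime import datetime, timezone

FMT = "%Y-%m-%d %H:%M:%S"


def add_offset(timestamp, hours=0, minutes=0):
    """Shift a 'yyyy-MM-dd HH:mm:ss' string by the given hours and minutes."""
    parsed = datetime.strptime(timestamp, FMT).replace(tzinfo=timezone.utc)
    epoch_seconds = int(parsed.timestamp())  # plays the role of unix_timestamp(col)
    # "Use the proper units": the offset has to be expressed in seconds.
    shifted = epoch_seconds + hours * 3600 + minutes * 60
    # Plays the role of from_unixtime(...).
    return datetime.fromtimestamp(shifted, tz=timezone.utc).strftime(FMT)


print(add_offset("2011-02-24 19:57:00", hours=5, minutes=30))
```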

Altering partition fileformat

2011-02-24 Thread charlie w
I am trying to query against a partitioned Hive table where the input format of different partitions may be different. I'd like to change the partition file format, and reading the language manual at http://wiki.apache.org/hadoop/Hive/LanguageManual, it seems to indicate that I should be able to

Specifying a double precision in HiveQL

2011-02-24 Thread Aurora Skarra-Gallagher
Hi, I have a Hive query that has a statement like this (sum(itemcount) / count(item)). I want to specify only two digits of precision (i.e. 53.55). The result is stored inside of a string, not its own column, so I'd need to set the precision in the statement. Is this possible? Thanks, Aurora

RE: Specifying a double precision in HiveQL

2011-02-24 Thread Paul Yang
Hacky, but maybe something like select concat( cast(num as int), '.' , cast(abs(num)*100 as int) % 100) from (select 1.234 as num from src limit 1) a; ? -Original Message- From: Aurora Skarra-Gallagher [mailto:aur...@yahoo-inc.com] Sent: Thursday, February 24, 2011 11:31 AM To:
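
A Python model of the arithmetic in this expression, to show where it behaves as hoped and where it does not. `hack_format` mirrors the concat/cast/modulo steps and `two_decimals` is a plain two-digit format for comparison; both names are hypothetical and neither is Hive code.

```python
def hack_format(num):
    # Mirrors concat(cast(num as int), '.', cast(abs(num)*100 as int) % 100):
    # integer part, a dot, then the first two fractional digits as an integer.
    return f"{int(num)}.{int(abs(num) * 100) % 100}"


def two_decimals(num):
    # Ordinary fixed-point formatting with two digits after the point.
    return f"{num:.2f}"


print(hack_format(1.234))     # the example value from the message
print(hack_format(3.03125))   # fractional part below 0.10
print(two_decimals(3.03125))
```

When the fractional part is below 0.10, the modulo step yields a single digit, so the leading zero is lost (3.03125 comes out as 3.3). The expression also truncates rather than rounds, so some zero padding of the fractional part would be needed before using it.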

Thrift Java Client - TTransportException (SocketException: Connection reset)

2011-02-24 Thread Ayush Gupta
Hi! I'm having some trouble running queries from a Java client against a remote Thrift Hive server. It's all set up and quicker queries do run through fine. But queries which run longer than about 10 minutes disconnect the client with a TTransportException: Connection reset exception. The query

Re: FW: Query Regarding HIVE-1535.

2011-02-24 Thread Carl Steinbach
Hi Mohit, The fix for HIVE-1535 did not include a testcase. See the discussion in the ticket for an explanation of why this was the case. The steps you outlined in your email seem to indicate that HIVE-1535 was not actually fixed, or that the problem was reintroduced later. Please file a JIRA

Re: Thrift Java Client - TTransportException (SocketException: Connection reset)

2011-02-24 Thread Ayush Gupta
Probing this further reveals that the connection is reset by the server in exactly 10 minutes every time. I'm running Hive 0.6. I do not see anything relevant at http://wiki.apache.org/hadoop/Hive/AdminManual/Configuration but is there some configuration property which controls this? -ayush On

Re: Thrift Java Client - TTransportException (SocketException: Connection reset)

2011-02-24 Thread Adarsh Sharma
Did you start the hiveserver service before running the client program? Cheers, Adarsh Ayush Gupta wrote: Probing this further reveals that the connection is reset by the server in exactly 10 minutes every time. I'm running Hive 0.6. I do not see anything relevant at

Re: Thrift Java Client - TTransportException (SocketException: Connection reset)

2011-02-24 Thread Viral Bajaria
What do the logs of the Thrift server say? If they do not give any relevant information, I would enable DEBUG level logging on the console. Also, a point to remember is the single-threaded nature of the Hive Thrift server (at least up to v0.5). But looking at the logs is what will be the first

Re: Thrift Java Client - TTransportException (SocketException: Connection reset)

2011-02-24 Thread Ayush Gupta
Yes, the hiveserver was started and running before the client program was run. -ayush On Fri, Feb 25, 2011 at 12:14 PM, Adarsh Sharma adarsh.sha...@orkash.com wrote: Did you start the hiveserver service before running the client program? Cheers, Adarsh Ayush Gupta wrote: Probing

Re: Thrift Java Client - TTransportException (SocketException: Connection reset)

2011-02-24 Thread Ayush Gupta
On Fri, Feb 25, 2011 at 12:17 PM, Viral Bajaria viral.baja...@gmail.com wrote: What do the logs of the thrift server say? If it does not give any relevant information, I would enable DEBUG level logging on the console. The hiveserver is pretty quiet; the connection appears to be terminated

Re: Thrift Java Client - TTransportException (SocketException: Connection reset)

2011-02-24 Thread Ayush Gupta
Thanks Carl, I'll check that. But, surely, I can't be the only one running Hive queries which last more than 10 minutes over a Thrift client! The Hive model is intended to work with large data sets, and long-running queries should be expected. I wonder why there is no discussion around

Re: Thrift Java Client - TTransportException (SocketException: Connection reset)

2011-02-24 Thread Viral Bajaria
Carl, Do you think this issue was not there before 0.6 ? We run our thrift servers for hours and have never faced this issue. I don't think I have restarted any of my thrift servers for days. My hive wrapper does have logic to handle timeouts, it reconnects whenever it sees that the thrift

Re: Thrift Java Client - TTransportException (SocketException: Connection reset)

2011-02-24 Thread Carl Steinbach
Hi Viral, Hive 0.5.0 and 0.6.0 use the same version of libthrift, so the problem is more likely related to some difference in the way 0.5.0 and 0.6.0 configure/initialize Thrift, or to some other issue related to the way the Thrift connection is managed on the client or server side (though it

HiveODBC build error

2011-02-25 Thread Paul Trout
25 Feb 2011 We're interested in using the HiveODBC interface, so I've been trying to build it. I'm using Hadoop 0.20.2+320, from Cloudera, with Hive 0.7.0CDH3B4. Initially I was trying this with Hive 0.5.0+32, but we intend to upgrade to CDH3B4 very soon, so I decided to try with the newer

Re: Thrift Java Client - TTransportException (SocketException: Connection reset)

2011-02-25 Thread Ning Zhang
I tried on the latest trunk (through CLI connecting to Hive Server) and there is no disconnection after 10 mins for a long query. @Ayush, is this Java client using JDBC connection? If so the client may have set a timeout for JDBC queries. I'm suspecting the disconnection is from the Java

Re: TOAD for hive

2011-02-25 Thread Otis Gospodnetic
Hi, I've had a quick look at Toad for Cloud the other day, too. * One complaint I heard (but have not verified) is that it crashed. I don't have the details. Anyone seen any crashes? * The other complaint I heard is that just like it allows easy querying, it allows the person using it easy

implementing a UDF

2011-02-25 Thread Igor Tatarinov
I am trying to implement a simple UDF but that doesn't seem to work for some reason. It looks like Hive is not able to cast the arguments to the right type. select price, mavg(0, price, 2) from prices limit 1; FAILED: Error in semantic analysis: line 1:14 Wrong Arguments 2: No matching method for

RE: TOAD for hive

2011-02-27 Thread Peter Hall
Hi Otis, If you have any details regarding crashes we’d be most interested in collecting more information about what led to the crash. Toad for Cloud forums http://toadforcloud.com/forumindex.jspa?categoryID=735 would be the best place to post any such information. The credentials supplied

Re: Not able to run Hive

2011-02-28 Thread bharath vissapragada
I am also getting this error. Any suggestions? Hive: 0.6, Hadoop: 0.20.2 On Mon, Jun 7, 2010 at 1:03 AM, Shuja Rehman shujamug...@gmail.com wrote: Hi all, Thanks for the reply. I have changed the heap size to 1024, then 512, then even 100 in the specified file. But I am still getting this

Re: Hbase Hive Intergration (for the latest versions)

2011-02-28 Thread Vivek Krishna
In short, I am trying to make hbase_handler to work with hive-0.6 and hbase-0.90.1. I am trying to integrate Hbase and Hive. There is a pretty good documentation at http://wiki.apache.org/hadoop/Hive/HBaseIntegration . But looks like they have become old. The hbase_handler was written

Re: about User scripte in HiveQL

2011-02-28 Thread Jianhua Wang
Thanks a lot for Roberto Congiu and wil's help. The problem has been solved with your assistance. I think I should read the wiki guide more carefully! Thank you very much! Best regards! 2011-03-01 Jianhua Wang

cannot start the transform script. reason : argument list too long

2011-03-01 Thread Irfan Mohammed
Hi, I have a hive script [given below] which calls a python script using transform and for large datasets [ 100M rows ], the reducer is not able to start the python process and the error message is argument list too long. The detailed error stack is given below. The python script takes only 1

Re: counting impressions strategy

2011-03-01 Thread Dave Viner
I am not super familiar with lists inside a column for Hive, but that might let you define a table that has a schema of page-type, page-name, items-displayed, and then query for a count of individual items ( http://wiki.apache.org/hadoop/Hive/LanguageManual/DDL and

Percent Rank Calculation

2011-03-01 Thread Sameer Kalburgi
Hello, Does anyone have any experience calculating the percentile / percentrank for each row in a table? I see that there are built in UDAFs to calculate the percentile, but that would only return a single value for the entire table. Essentially, I'm trying to recreate the Excel PercentRank

Re: Percent Rank Calculation

2011-03-01 Thread Ajo Fod
I know that this type of call would give you a subset of the table ... also I think you can use a group by clause to get it for groups of data. SELECT PERCENTILE(val, 0.5) FROM pct_test WHERE val 100; Couldn't you use this call a few times to get the value for each percentile value? I think

Re: cannot start the transform script. reason : argument list too long

2011-03-01 Thread Ajo Fod
instead of using 'python2.6 user_id_output.py hbase' try something like this: using 'user_id_output.py' ... and a #! line with the location of the python binary. I think you can include a parameter too in the call like : using 'user_id_output.py hbase' Cheers, Ajo. On Tue, Mar 1, 2011 at

RE: cannot start the transform script. reason : argument list too long

2011-03-01 Thread Steven Wong
Looks like this is the command line it was executing: 2011-03-01 14:46:13,733 INFO org.apache.hadoop.hive.ql.exec.ScriptOperator: Executing [/usr/bin/python2.6, user_id_output.py, hbase] From: Irfan Mohammed [mailto:irfan...@gmail.com] Sent: Tuesday, March 01, 2011 1:39 PM To:

Re: cannot start the transform script. reason : argument list too long

2011-03-01 Thread Irfan Mohammed
Yes. That is the command it is executing but what I do not understand is why I am getting argument list too long when I am running the same sql with the same python script with a large dataset. Thanks. On Tue, Mar 1, 2011 at 2:53 PM, Steven Wong sw...@netflix.com wrote: Looks like this is the

Re: Dynamic partition - support for distribute by

2011-03-01 Thread Wil -
Thanks, the query works as expected. I guess the query on the wiki is out of date. - Original Message From: Thiruvel Thirumoolan thiru...@yahoo-inc.com To: user@hive.apache.org user@hive.apache.org Sent: Tue, March 1, 2011 3:26:13 AM Subject: Re: Dynamic partition - support for

Oozie Hive action patch

2011-03-01 Thread Alejandro Abdelnur
[crosspost on Oozie and Hive aliases as there are threads in both] I've just posted a pull request (patch) for Oozie that adds support for Hive actions in Oozie workflows. IMPORTANT: * The pull requests have an additional commit, GH-0226, that fixes groupId/artifacts of Hadoop/Pig/Hive to the

Associative Arrays in Hive?

2011-03-02 Thread Sunderlin, Mark
Let us say I have log data that I want to place into Hive, and the log file itself looks something like this: Event_time, event_type, event_data_blob. And the blob data looks like Key1=value1;key2=value2;key3=value3 ... keyn=valuen. It looks like maybe I start like this: Create table
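
A minimal sketch of turning such a blob into an associative array, written in Python as it might appear in a preprocessing or transform script. The function name `parse_blob` and its separator parameters are hypothetical; on the Hive side the natural target would be a MAP-typed column, which is an assumption about the table design rather than something stated in the thread.

```python
def parse_blob(blob, pair_sep=";", kv_sep="="):
    """Split 'k1=v1;k2=v2' into a dict, skipping empty segments."""
    result = {}
    for pair in blob.split(pair_sep):
        if not pair:
            continue  # tolerate a trailing separator
        key, _, value = pair.partition(kv_sep)
        result[key.strip()] = value.strip()
    return result


event = parse_blob("Key1=value1;key2=value2;key3=value3")
print(event["key2"])
```

A key that has no separator is kept with an empty string as its value, so malformed segments do not raise an error.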

Re: cannot start the transform script. reason : argument list too long

2011-03-02 Thread Dave Brondsema
We've gotten this error a couple of times too - it is very misleading, not correct at all. IIRC, I determined the root cause is selecting too many input files (even though those do NOT get passed as arguments to transform script). For example, this happened once we had a lot of dynamic

Re: Trouble using mysql metastore

2011-03-02 Thread Bennie Schut
Usually this is caused by not having the MySQL JDBC driver on the classpath (it's not included in Hive by default). Just put the MySQL JDBC driver in the Hive folder under lib/. On 03/02/2011 03:15 PM, Ajo Fod wrote: I've checked the mysql connection with a separate java file with the same string.

Re: Associative Arrays in Hive?

2011-03-02 Thread Edward Capriolo
On Wed, Mar 2, 2011 at 9:27 AM, Sunderlin, Mark mark.sunder...@teamaol.com wrote: Let us say I have log data that I want to place into Hive. And the log file itself looks something like this: Event_time, event_type, event_data_blob And the blob data looks like

Re: Associative Arrays in Hive?

2011-03-02 Thread 김영우
Refer to this: http://dev.bizo.com/2011/02/columns-in-hive.html HTH - Youngwoo 2011/3/2 Sunderlin, Mark mark.sunder...@teamaol.com Let us say I have log data that I want to place into Hive. And the log file itself looks something like

Re: Trouble using mysql metastore

2011-03-02 Thread Ajo Fod
Hi Bennie, Thanks for the response ! I had CLASSPATH set to include /usr/share/java/mysql.jar ... in addition, I just copied the mysql.jar to the lib directory of hive. I still get the same bug. Any other ideas? Thanks, -Ajo On Wed, Mar 2, 2011 at 7:01 AM, Bennie Schut bsc...@ebuddy.com

Re: Trouble using mysql metastore

2011-03-02 Thread Viral Bajaria
This definitely looks like a CLASSPATH error. Where did you get the mysql.jar from ? Can you open it up and make sure that it includes the com.mysql.jdbc.Driver namespace ? I am guessing the mysql.jar is not the one that you need. you can download a new one from the mysql website. To be clear,

RCfile is not working with BZip2. Interesting in using LZO in general.

2011-03-02 Thread phil young
I'm wondering if my configuration/stack is wrong, or if I'm trying to do something that is not supported in Hive. My goal is to choose a compression scheme for Hadoop/Hive and while comparing configurations, I'm finding that I can't get BZip2 or Gzip to work with the RCfile format. Is that

Re: Stats Gathering Problems

2011-03-04 Thread Ajo Fod
The good news is that this is a simple XML section, and this looks like an XML read error. Try copying one of the existing property sections and pasting in just the name and value strings from the message. Cheers, Ajo On Fri, Mar 4, 2011 at 6:40 AM, Anja Gruenheid

Date function unix_timestamp() with input values null doen't work as desired

2011-03-04 Thread Bejoy Ks
Hi Everyone I'm facing an issue with hive on a relatively larger query which involves joins on six hive tables. My query is running fine without any errors, all the map reduce jobs run to completion but unfortunately it is not showing up any results. I tried debugging the query and to

Re: Stats Gathering Problems

2011-03-04 Thread Anja Gruenheid
I fixed the XML problem and wrote everything into hive-site.xml. The update error still exists though. Anja On 03/04/2011 09:47 AM, Ajo Fod wrote: The good news is that this is a simple XML section .. and this looks like a XML read error. Try to copy-paste one of the existing properties

Re: [Oozie-users] Oozie Hive action patch

2011-03-05 Thread Alejandro Abdelnur
Andreas, Well, that is not entirely true, Oozie consumes Yahoo distributions of Hadoop and Pig (from Yahoo GH maven). [BTW, this brings up again the GH-0226 issue] Thanks for reviewing the patch. In the mean time, anybody wanting to use Oozie with Hive action can use CDH Oozie CDH3b4 which

Re: Stats Gathering Problems

2011-03-05 Thread Ning Zhang
Can you search your /tmp/username/hive.log for 'Stats' and see if there is any error message? You can also log on to mysql and see if the database you specified in the JDBC URI has been created and if there is any table in the database. On Mar 5, 2011, at 7:35 AM, Anja Gruenheid wrote: I also

Re: Stats Gathering Problems

2011-03-05 Thread Anja Gruenheid
I tried to use the default settings and with that it works (at least it doesn't throw an error). What's weird is that it collects the data on files/files size etc., but it doesn't compute the row count. Do you have any idea why that could be? The table is based on a textfile and is handled as

hello everybody,i am fresher,i meet a problem,please help.

2011-03-06 Thread 徐厚道
My English is very poor. I set up a Hive environment following http://wiki.apache.org/hadoop/Hive/GettingStarted#Apache_Weblog_Data but I get an exception when I run the SHOW TABLES; statement. Can somebody help me? Thanks a lot! hive> SHOW TABLES; Exception in thread "main" java.lang.NoSuchMethodError:

Hive too slow?

2011-03-07 Thread abhishek pathak
Hi, I am a Hive newbie. I just finished setting up Hive on a cluster of two servers for my organisation. As a test drill, we ran some simple queries. It took the standard map-reduce job around 4 minutes just to execute this query: count(1) from tablename; The answer returned was

RE: hello everybody,i am fresher,i meet a problem,please help.

2011-03-07 Thread Chinna
Check whether commons-lang-2.4.jar is in the lib path. From: 徐厚道 [mailto:xuhou...@gmail.com] Sent: Monday, March 07, 2011 11:54 AM To: user@hive.apache.org Subject: hello everybody,i am fresher,i meet a problem,please help. My English is very poor. I set up a Hive environment

RE: hello everybody,i am fresher,i meet a problem,please help.

2011-03-07 Thread Chinna

Re: hello everybody,i am fresher,i meet a problem,please help.

2011-03-07 Thread 徐厚道
Thank you for the reply! Yes, it is. The Hadoop lib dir also has commons-lang-2.1.jar; do they conflict? 2011/3/7 Chinna chinna...@huawei.com Check whether commons-lang-2.4.jar is in the lib path. -- *From:* 徐厚道 [mailto:xuhou...@gmail.com] *Sent:*

RE: hello everybody,i am fresher,i meet a problem,please help.

2011-03-07 Thread Chinna
No, it won't be a conflict. If commons-lang-2.4.jar is present in your Hive installation's lib directory, it will be added to the classpath when Hive starts. I think you are using Hive version 0.5.0 or above. If the problem persists, send the details, such as how you are starting Hive and which

Re: Hive too slow?

2011-03-07 Thread Ajo Fod
In my experience, Hive is not instantaneous like other DBs, but 4 minutes to count 2200 rows seems unreasonable. For comparison, my query over 169k rows on one computer with 4 cores running at approximately 1 GHz took 20 seconds. Cheers, Ajo. On Mon, Mar 7, 2011 at 1:19 AM, abhishek pathak

Re: Pointers for talking directly to Hive server to execute queries

2011-03-07 Thread Ryan LeCompte
Nevermind, looks like this has already been done using the Thrift APIs! https://github.com/forward/rbhive On Mon, Mar 7, 2011 at 1:24 PM, Ryan LeCompte lecom...@gmail.com wrote: Hey guys, I'm thinking about writing a native Ruby client that can be used to connect to a running Hive server

Loading data into a Clustered/bucketed table

2011-03-07 Thread Jay Ramadorai
I am Sqooping data from an external source into a bucketed Hive table. Sqoop seems completely bucket-unaware, it simply used LOAD INPATH which moves the single file containing Sqooped data into the Hive warehouse location. My question: - is there any way to get data into an empty

Re: hello everybody,i am fresher,i meet a problem,please help.

2011-03-07 Thread 徐厚道
Sorry, I did not reply immediately. I have confirmed that commons-lang-2.4.jar is in installation/lib. My installation is Hive 0.6.0 and Hadoop 0.20.2 with Nutch 1.1. I have looked at the bin/hive script and echoed CLASSPATH and HADOOP_CLASSPATH; they both contain commons-lang-2.4.jar, but it throws

Re: Hive too slow?

2011-03-07 Thread abhishek pathak
I suspected as much. My system is a Core2Duo, 1.86 GHz. I understand that map-reduce is not instantaneous; I just wanted to confirm that 2200 rows in 4 minutes is indeed not normal behaviour. Could you point me to some places where I can get some info on how to tune this? Regards, Abhishek

Performance between Hive queries vs. Hive over HBase queries

2011-03-07 Thread Biju Kaimal
Hi, I loaded a data set which has 1 million rows into both Hive and HBase tables. For the HBase table, I created a corresponding Hive table so that the data in HBase can be queried from Hive QL. Both tables have a key column and a value column For the same query (select value, count(*) from

Re: Performance between Hive queries vs. Hive over HBase queries

2011-03-07 Thread John Sichi
Yes. JVS On Mar 7, 2011, at 9:59 PM, Biju Kaimal wrote: Hi, I loaded a data set which has 1 million rows into both Hive and HBase tables. For the HBase table, I created a corresponding Hive table so that the data in HBase can be queried from Hive QL. Both tables have a key column and a

Re: Hive too slow?

2011-03-07 Thread Vijay
If you go to the jobtracker's web UI, it provides plenty of details about each job. Even with all the default settings of a typical hadoop/hive installation, 4 minutes for 2200 rows is extremely slow. It feels like there is some kind of problem but it is hard to guess what that could be. Digging
