Setting up stats database

2011-08-15 Thread wd
hi, I'm trying to use Postgres as the stats database, and made the following settings in hive-site.xml: <property> <name>hive.stats.dbclass</name> <value>jdbc:postgresql</value> <description>The default database that stores temporary hive statistics.</description> </property> <property>

Re: Setting up stats database

2011-08-15 Thread wd
oh, I found that Hive only supports MySQL and HBase. I'll try HBase. On Mon, Aug 15, 2011 at 3:09 PM, wd w...@wdicc.com wrote: hi, I'm trying to use Postgres as the stats database, and made the following settings in hive-site.xml: <property> <name>hive.stats.dbclass</name> <value>jdbc:postgresql</value>

Re: Setting up stats database

2011-08-15 Thread wd
HBase Publisher/Aggregator classes cannot be loaded. Need to configure a publisher/aggregator for HBase... so there is only one way, which is to use MySQL. Will the stats database optimize Hive queries? I'm considering whether or not to set up MySQL for this. On Mon, Aug 15, 2011 at 3:17 PM, wd w...@wdicc.com

Re: slow performance when using udf

2011-08-15 Thread Carl Steinbach
Converting it to a GenericUDF (i.e. extending GenericUDF instead of UDF) should help some with performance. On Mon, Aug 15, 2011 at 1:49 AM, wd w...@wdicc.com wrote: hi, I created a UDF to decode URL-encoded values, but found that the MapReduce job now takes about 3 times as long as before (from 73 sec to 213 sec). How to
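For reference, the decoding step itself needs nothing beyond the JDK's java.net.URLDecoder. Below is a minimal sketch of that core logic, assuming UTF-8 input; the class and method names are hypothetical, and it deliberately leaves out the Hive wrapper (UDF or GenericUDF) that would call it once per row. The early return for strings containing no '%' or '+' is an assumed optimization, and returning null for malformed input is a choice of this sketch, not documented Hive behavior.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical helper: the core URL-decoding step, independent of any Hive UDF wrapper.
final class UrlDecodeSketch {

    private UrlDecodeSketch() {}

    /** Decodes a form-urlencoded string as UTF-8; returns null for null or malformed input. */
    static String decode(String s) {
        if (s == null) {
            return null;
        }
        // Assumed fast path: URLDecoder only rewrites '%' escapes and '+',
        // so without either the decoded text equals the input.
        if (s.indexOf('%') < 0 && s.indexOf('+') < 0) {
            return s;
        }
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every JVM is required to support UTF-8, so this should not happen.
            throw new IllegalStateException(e);
        } catch (IllegalArgumentException e) {
            // Malformed escape such as "%zz"; returning null is this sketch's assumption.
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(decode("hello%20world%21")); // hello world!
        System.out.println(decode("a+b"));              // a b
        System.out.println(decode("%zz"));              // null
    }
}
```

Whichever wrapper ends up calling it, keeping the per-row work this small (and avoiding a fresh allocation per row where the wrapper allows it) is a common way to limit overhead.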

Re: slow performance when using udf

2011-08-15 Thread Edward Capriolo
On Monday, August 15, 2011, Carl Steinbach c...@cloudera.com wrote: Converting it to a GenericUDF (i.e. extending GenericUDF instead of UDF) should help some with performance. On Mon, Aug 15, 2011 at 1:49 AM, wd w...@wdicc.com wrote: hi, I created a UDF to decode URL-encoded values, but found

copy table, change serde

2011-08-15 Thread Jonathan Grimm
Hi, I'm trying to do what I think should be a simple task, but I'm running into some issues with carrying through column names. All I want to do is essentially copy an existing table but change the serialization format (if you're curious, this is to help integrate with some existing map

Re: Single Map task for Hive queries

2011-08-15 Thread Loren Siebert
Is your external file compressed with GZip or BZip? Those file formats aren’t splittable, so they get assigned to one mapper. On Aug 15, 2011, at 10:23 AM, Jon Bender wrote: Hello, I have external tables in Hive stored in a single flat text file. When I execute queries against it, all

Re: Single Map task for Hive queries

2011-08-15 Thread Jon Bender
It's actually just an uncompressed UTF-8 text file. This was essentially the CREATE TABLE statement: CREATE EXTERNAL TABLE foo ROW FORMAT DELIMITED STORED AS TEXTFILE LOCATION '/data/foo'. I'm using Hive 0.7. On Mon, Aug 15, 2011 at 10:37 AM, Loren Siebert lo...@siebert.org wrote: Is your external

Re: Single Map task for Hive queries

2011-08-15 Thread Loren Siebert
You should not have to do anything special to Hive to make it use all of your TaskTrackers (TTs). The actual MR job should be governed by your mapred-site.xml file. When you run sample MR jobs (like the Pi example) and look at the JobTracker, are you seeing all your TTs getting used? On Aug 15, 2011, at

Re: Single Map task for Hive queries

2011-08-15 Thread Jon Bender
Yeah, MapReduce itself is set up to use all of my TaskTrackers; only one map task gets created on the external table queries. I tried querying another external table (composed of some 20 files) and it created 20 map tasks in turn during the query. I will try the LINES TERMINATED BY clause next
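The split counts reported in this thread can be reasoned about with a simplified model of how Hadoop's FileInputFormat carves input into splits: each file is split on its own, a non-splittable file (such as a gzip-compressed one) yields exactly one split, and a splittable file is cut into chunks of splitSize = max(minSize, min(maxSize, blockSize)), with the last chunk allowed to be up to 10% larger. The sketch below follows that rule from the newer org.apache.hadoop.mapreduce API. The class and method names are hypothetical, the 64 MB block size is an assumed dfs.block.size, and it ignores the goal-size hint used by the older mapred API and any input combining Hive may do, so treat its numbers as illustrative only. Under this model, a single uncompressed file smaller than one block yields one split, while 20 small files yield 20.

```java
import java.util.List;

// Hypothetical, simplified model of FileInputFormat split counting.
final class SplitEstimate {

    // Mirrors FileInputFormat's 10% slop: a final chunk up to 1.1x splitSize is not split further.
    static final double SPLIT_SLOP = 1.1;

    private SplitEstimate() {}

    static long splitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    static int splitsForFile(long length, boolean splittable, long splitSize) {
        if (splitSize <= 0) {
            throw new IllegalArgumentException("splitSize must be positive");
        }
        // Empty and non-splittable (e.g. gzip) files each produce a single split.
        if (length == 0 || !splittable) {
            return 1;
        }
        long remaining = length;
        int splits = 0;
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            remaining -= splitSize;
            splits++;
        }
        if (remaining != 0) {
            splits++; // the tail, at most 1.1x splitSize
        }
        return splits;
    }

    static int totalSplits(List<Long> fileLengths, boolean splittable, long splitSize) {
        int total = 0;
        for (long len : fileLengths) {
            total += splitsForFile(len, splittable, splitSize); // files are never merged here
        }
        return total;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long split = splitSize(64 * mb, 1L, Long.MAX_VALUE);
        System.out.println(splitsForFile(200 * mb, true, split));  // 4
        System.out.println(splitsForFile(200 * mb, false, split)); // 1 (e.g. gzip)
        System.out.println(splitsForFile(10 * mb, true, split));   // 1
    }
}
```

Note the 70 MB case: 70/64 is below the 1.1 slop, so the model keeps it as one split rather than a 64 MB split plus a 6 MB sliver.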

Wiki write access, please

2011-08-15 Thread Jakob Homan
The current DDL page doesn't have documentation about the DESCRIBE DATABASE command. I'd like to add that. I'm listed under my Apache address: jgho...@apache.org Thanks, Jakob

Re: slow performance when using udf

2011-08-15 Thread wd
Thanks for all your advice, I'll try it out. On Mon, Aug 15, 2011 at 9:02 PM, Edward Capriolo edlinuxg...@gmail.com wrote: On Monday, August 15, 2011, Carl Steinbach c...@cloudera.com wrote: Converting it to a GenericUDF (i.e. extending GenericUDF instead of UDF) should help some with