Alter location of database in Hive
Hey all,

I'm on Hive 0.10.0 on one of my clusters. We had a namenode hostname change, so I'm trying to point all of our tables, partitions and databases to the new locations. When I run DESCRIBE DATABASE mydb, the location shows up as hdfs://old_hostname/user/hive/warehouse/mydb.db, and I want to set it to hdfs://new_hostname/user/hive/warehouse/mydb.db. Is there a way to do this, or do I need to go poking around in the MySQL metadata to actually carry this out?

Regards,
Jon
Re: Alter location of database in Hive
Answered my own question: no, there is not. The way to do it is to modify the DB_LOCATION_URI field in metastore.DBS (at least if you're using MySQL).

On Mon, Jun 30, 2014 at 5:14 PM, Jon Bender jonathan.ben...@gmail.com wrote:
> Hey all, I'm on Hive 0.10.0 on one of my clusters. We had a namenode hostname change, so I'm trying to point all of our tables, partitions and databases to the new locations. [...]
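Concretely, against a MySQL-backed metastore the fix amounts to a couple of UPDATE statements along these lines. This is a sketch only, assuming the stock Hive 0.10 schema (table and column names can vary between versions), so back up the metastore database first. Note that tables and partitions carry their own URIs in SDS.LOCATION, so they need the same treatment as DBS:

    -- Sketch, assuming the stock Hive 0.10 MySQL metastore schema.
    -- Back up the metastore database before running anything like this.
    UPDATE DBS
       SET DB_LOCATION_URI = REPLACE(DB_LOCATION_URI,
                                     'hdfs://old_hostname', 'hdfs://new_hostname');

    -- Table and partition storage descriptors keep their own URIs:
    UPDATE SDS
       SET LOCATION = REPLACE(LOCATION,
                              'hdfs://old_hostname', 'hdfs://new_hostname');

Later Hive releases also ship a metatool (hive --service metatool -updateLocation new-uri old-uri) that performs this rewrite without hand-editing the schema, if your distribution includes it.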
Passing MapReduce configuration parameters to a Hive UDF
Hi there,

I'm trying to pass some external properties to a UDF. In the MapReduce world I'm used to extending Configured in my classes, but in my UDF class, initializing a new Configuration or HiveConf object doesn't inherit any of those properties. I can see them in the job configuration XML when the job runs, but my UDF can't pick them up when it creates a new instance.

Are there any other suggested ways of doing this? I could probably just add a conf file to the distributed cache and load the properties on UDF initialization, but I figured I should be able to get at the configuration through other means.

Thanks in advance,
Jon
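One approach that works on newer Hive releases (0.11 and later, where MapredContext exists): a GenericUDF gets a configure() callback carrying the task's JobConf, so properties set in the Hive session are visible there. A minimal sketch; the class and property name here are hypothetical:

    import org.apache.hadoop.hive.ql.exec.MapredContext;
    import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
    import org.apache.hadoop.hive.ql.metadata.HiveException;
    import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
    import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
    import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;

    // Sketch: a UDF that reads a job property set in the Hive session,
    // e.g. via "set my.custom.property=foo;" before the query runs.
    public class ConfAwareUDF extends GenericUDF {
      private String propertyValue;

      @Override
      public void configure(MapredContext context) {
        // Invoked at task setup; not called in local/fetch-only execution,
        // so guard against a missing context.
        if (context != null && context.getJobConf() != null) {
          propertyValue = context.getJobConf().get("my.custom.property");
        }
      }

      @Override
      public ObjectInspector initialize(ObjectInspector[] arguments)
          throws UDFArgumentException {
        return PrimitiveObjectInspectorFactory.javaStringObjectInspector;
      }

      @Override
      public Object evaluate(DeferredObject[] arguments) throws HiveException {
        return propertyValue; // echoes the property back for demonstration
      }

      @Override
      public String getDisplayString(String[] children) {
        return "conf_aware_udf()";
      }
    }

On 0.10 and earlier, falling back to a properties file in the distributed cache, as the question suggests, is the usual workaround.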
Re: Single Map task for Hive queries
It's actually just an uncompressed UTF-8 text file. This was essentially the CREATE TABLE clause:

    CREATE EXTERNAL TABLE foo
    ROW FORMAT DELIMITED
    STORED AS TEXTFILE
    LOCATION '/data/foo';

Using Hive 0.7.

On Mon, Aug 15, 2011 at 10:37 AM, Loren Siebert lo...@siebert.org wrote:
> Is your external file compressed with GZip or BZip? Those file formats aren't splittable, so they get assigned to one mapper.
>
> On Aug 15, 2011, at 10:23 AM, Jon Bender wrote:
>> Hello,
>>
>> I have external tables in Hive stored in a single flat text file. When I execute queries against it, all of my jobs run as a single map task, even on very large tables. What steps do I need to take to ensure that these queries are split up and pushed out to multiple TTs? Do I need to store the Hive tables in a different internal file format? Make some configuration changes?
>>
>> Thanks!
>> Jon
Re: Single Map task for Hive queries
Yeah, MapReduce itself is set up to use all of my task trackers; only one map task gets created on the external table queries. I tried querying another external table (composed of some 20 files) and it created 20 map tasks in turn during the query.

I will try the LINES TERMINATED BY clause next to try to parallelize within a single file.

On Mon, Aug 15, 2011 at 11:00 AM, Loren Siebert lo...@siebert.org wrote:
> You should not have to do anything special to Hive to make it use all of your TTs. The actual MR job should be governed by your mapred-site.xml file. When you run sample MR jobs (like the Pi example) and look at the job tracker, are you seeing all your TTs getting used?
>
> On Aug 15, 2011, at 10:47 AM, Jon Bender wrote:
>> It's actually just an uncompressed UTF-8 text file. [...]
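For a single large, splittable file, the mapper count usually comes down to split size rather than anything Hive-specific. A sketch, assuming Hadoop 0.20-era property names (the exact knobs honored depend on the Hive version and the input format in use, and the 64 MB figure is just an illustration):

    -- Sketch: shrink the split size per session so one big file
    -- fans out across more map tasks.
    SET mapred.min.split.size=1;
    SET mapred.max.split.size=67108864;   -- aim for ~64 MB splits
    SELECT COUNT(*) FROM foo;

For what it's worth, LINES TERMINATED BY controls row parsing, not input splitting, so it is unlikely to change the mapper count.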
Rename Hive partition
Hey all,

Just wondering what the best way is to rename specific Hive table partitions. Is there some HiveQL command for this, or will I need to insert into new partitions to reflect the new naming convention?

Cheers,
Jon
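For the record, later Hive releases (0.9 onward) added a DDL statement for exactly this. A minimal sketch; the table name and partition values below are made up for illustration:

    -- Sketch, assuming Hive 0.9+; table and partition spec are hypothetical.
    ALTER TABLE events PARTITION (dt='2014-06-30')
      RENAME TO PARTITION (dt='20140630');

On older releases the fallback is what the question describes: INSERT OVERWRITE into partitions with the new spec, then drop the old ones.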