Fwd: Error message for PigPen

2009-05-20 Thread George Pang
Dear users, When I try to use the Pig editor in Eclipse (with the PigPen plugin), an error message appears on the console: "*org.apache.hadoop.dfs.DistributedFileSystem cannot be cast to org.apache.hadoop.fs.FileSystem*" Does this have something to do with the Hadoop version? Thank you! George

RE: More replication of map reduce output

2009-05-20 Thread Puri, Aseem
Yes, I have marked it as final. Now one more exception arises: my map reduce program for word count is throwing an exception. 09/05/21 11:31:37 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same. 09/05/21 11:31:37 INFO hdfs.DFSClient

Re: More replication of map reduce output

2009-05-20 Thread Michael C. Toren
On Thu, May 21, 2009 at 11:16:59AM +0530, Puri, Aseem wrote: > I mean when my reduce tasks are set to 1, the part-0 file shows a replication > factor of 3. But I set the replication factor to 1 in hadoop-site.xml Did you mark the replication factor configuration option as "final"? e.g.: dfs.repl
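
For reference, marking a property as "final" in hadoop-site.xml looks like the fragment below. This is a sketch: dfs.replication is the standard property name, but which machine's config file must carry it depends on where the writing client runs.

```xml
<!-- hadoop-site.xml fragment: force replication factor 1 and
     prevent job/client configurations from overriding it. -->
<property>
  <name>dfs.replication</name>
  <value>1</value>
  <final>true</final>
</property>
```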

RE: More replication of map reduce output

2009-05-20 Thread Puri, Aseem
I mean when my reduce tasks are set to 1, the part-0 file shows a replication factor of 3. But I set the replication factor to 1 in hadoop-site.xml -Original Message- From: edw...@udanax.org [mailto:edw...@udanax.org] On Behalf Of Edward J. Yoon Sent: Thursday, May 21, 2009 11:14 AM To: core-user@had

Re: More replication of map reduce output

2009-05-20 Thread Edward J. Yoon
Do you mean the three files such as part-0? If so, you can set the number of reduce tasks to 1. On Thu, May 21, 2009 at 2:39 PM, Puri, Aseem wrote: > Hi > > I am running a map reduce program on two nodes. My DFS > replication factor is one. All input files have one replication

More replication of map reduce output

2009-05-20 Thread Puri, Aseem
Hi, I am running a map reduce program on two nodes. My DFS replication factor is one. All input files have one replica, but the output from reduce always has replication 3. Can anyone please tell me why this is so? Thanks & Regards Aseem Puri

Re: Number of maps and reduces not obeying my configuration

2009-05-20 Thread Foss User
On Wed, May 20, 2009 at 3:18 PM, Tom White wrote: > The number of maps to use is calculated on the client, since splits > are computed on the client, so changing the value of mapred.map.tasks > only on the jobtracker will not have any effect. > > Note that the number of map tasks that you set is o

Re: Constantly getting DiskErrorExceptions - but logged as INFO

2009-05-20 Thread Lance Riedel
Thanks, found it: http://issues.apache.org/jira/browse/HADOOP-4963 Lance On Wed, May 20, 2009 at 8:15 AM, Lance Riedel wrote: > We're still seeing this error in our log files. Is this an expected > output? (the fact that it is INFO makes it seem not so bad, but anything to > do with DiskChecker

Re: Username in Hadoop cluster

2009-05-20 Thread Alex Loddengaard
Ah ha! Good point, Todd. Pankil, with Todd's suggestion, you can ignore the first option I proposed. Thanks, Alex On Wed, May 20, 2009 at 4:30 PM, Todd Lipcon wrote: > On Wed, May 20, 2009 at 4:14 PM, Alex Loddengaard > wrote: > > > First of all, if you can get all machines to have the same

Re: Username in Hadoop cluster

2009-05-20 Thread Todd Lipcon
On Wed, May 20, 2009 at 4:14 PM, Alex Loddengaard wrote: > First of all, if you can get all machines to have the same user, that would > greatly simplify things. > > If, for whatever reason, you absolutely can't get the same user on all > machines, then you could do either of the following: > > 1

Re: Username in Hadoop cluster

2009-05-20 Thread Alex Loddengaard
First of all, if you can get all machines to have the same user, that would greatly simplify things. If, for whatever reason, you absolutely can't get the same user on all machines, then you could do either of the following: 1) Change the *-all.sh scripts to read from a slaves file that has two f

Username in Hadoop cluster

2009-05-20 Thread Pankil Doshi
Hello everyone, Till now I was using the same username on all my hadoop cluster machines. But now I am building my new cluster and face a situation in which I have different usernames for different machines. So what changes will I have to make in configuring hadoop? Using the same username, ssh was easy. no

Re: Shutdown in progress exception

2009-05-20 Thread Stas Oskin
> > You should only use this if you plan on manually closing FileSystems > yourself from within your own shutdown hook. It's somewhat of an advanced > feature, and I wouldn't recommend using this patch unless you fully > understand the ramifications of modifying the shutdown sequence. Standard df

Re: Shutdown in progress exception

2009-05-20 Thread Todd Lipcon
On Wed, May 20, 2009 at 2:07 PM, Stas Oskin wrote: > Hi. > > 2009/5/20 Tom White > > > Looks like you are trying to copy a file to HDFS in a shutdown hook. > > Since you can't control the order in which shutdown hooks run, this > > won't work. There is a patch to allow Hadoop's FileSystem shutd

Re: Shutdown in progress exception

2009-05-20 Thread Stas Oskin
Hi. 2009/5/20 Tom White > Looks like you are trying to copy a file to HDFS in a shutdown hook. > Since you can't control the order in which shutdown hooks run, this > won't work. There is a patch to allow Hadoop's FileSystem shutdown > hook to be disabled so it doesn't close filesystems on exit

Re: How to Rename & Create Mysql DB Table in Hadoop?

2009-05-20 Thread dealmaker
Should I use LOCK TABLE then? How do I prevent my_table from being accessed before I re-create my_table? And how do I run these mysql commands in Hadoop? Thanks. Todd Lipcon-4 wrote: > > On Wed, May 20, 2009 at 10:59 AM, dealmaker wrote: > >> >> Your second option is similar to what I had

Re: How to Rename & Create Mysql DB Table in Hadoop?

2009-05-20 Thread Todd Lipcon
On Wed, May 20, 2009 at 10:59 AM, dealmaker wrote: > > Your second option is similar to what I had in my original post, following > are my mysql commands: > BEGIN; > RENAME TABLE my_table TO backup_table; > CREATE TABLE my_table LIKE backup_table; > COMMIT; > FYI, the "BEGIN" and "COMMIT" there

Re: How to Rename & Create Mysql DB Table in Hadoop?

2009-05-20 Thread dealmaker
Your second option is similar to what I had in my original post; following are my mysql commands: BEGIN; RENAME TABLE my_table TO backup_table; CREATE TABLE my_table LIKE backup_table; COMMIT; I just want to know how to run these commands in Hadoop code. Do I use DBInputFormat.setInput ( )? How

Re: How to Rename & Create Mysql DB Table in Hadoop?

2009-05-20 Thread Todd Lipcon
On Wed, May 20, 2009 at 10:52 AM, Aaron Kimball wrote: > You said that you're concerned with the performance of DELETE, but I don't > know a better way around this if all your input sources are forced to write > to the same table. Ideally you could have a "current" table and a "frozen" > table; w

Re: Setting up another machine as secondary node

2009-05-20 Thread Aaron Kimball
See this regarding instructions on configuring a 2NN on a separate machine from the NN: http://www.cloudera.com/blog/2009/02/10/multi-host-secondarynamenode-configuration/ - Aaron On Thu, May 14, 2009 at 10:42 AM, Koji Noguchi wrote: > Before 0.19, fsimage/edits were on the same directory. > So
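
The Cloudera post Aaron links describes pointing the secondary namenode at the primary's HTTP address so it can fetch the fsimage and edits. A hadoop-site.xml sketch for the 2NN machine might look like the following; the hostname and path are placeholders, and property names may differ between Hadoop versions.

```xml
<!-- On the secondary namenode host (hostname and path are placeholders) -->
<property>
  <name>dfs.http.address</name>
  <value>namenode.example.com:50070</value>
</property>
<property>
  <name>fs.checkpoint.dir</name>
  <value>/var/hadoop/checkpoint</value>
</property>
```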

Re: How to Rename & Create Mysql DB Table in Hadoop?

2009-05-20 Thread Aaron Kimball
For your use case, you'll need to just do a ranged import (i.e., SELECT * FROM foo WHERE id > X and id < Y), and then delete the same records after the import succeeds (DELETE FROM foo WHERE id > X and id < Y). Before the import, you can SELECT max(id) FROM foo to establish what Y should be; X is i
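
Aaron's ranged import/delete can be sketched as plain SQL strings; the Java wrapper below is hypothetical, while table foo and column id come from his example.

```java
// Sketch of the ranged import/delete Aaron describes. Table "foo" and
// column "id" are from his example; this helper class is hypothetical.
public class RangedImport {
    // SELECT that pulls the rows in (x, y) for the import.
    static String importQuery(long x, long y) {
        return "SELECT * FROM foo WHERE id > " + x + " AND id < " + y;
    }

    // DELETE that removes the same rows once the import has succeeded.
    static String deleteQuery(long x, long y) {
        return "DELETE FROM foo WHERE id > " + x + " AND id < " + y;
    }

    public static void main(String[] args) {
        // y would come from "SELECT max(id) FROM foo" before the import;
        // x is the upper bound of the previous run.
        System.out.println(importQuery(100, 200));
        System.out.println(deleteQuery(100, 200));
    }
}
```

Running the DELETE only after the import succeeds is what makes the move safe against a failed job.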

Re: Linking against Hive in Hadoop development tree

2009-05-20 Thread Aaron Kimball
I've worked around needing any compile-time dependencies for now. :) No longer an issue. - Aaron On Wed, May 20, 2009 at 10:29 AM, Ashish Thusoo wrote: > You could either do what Owen suggested and put the plugin in hive contrib, > or you could just put the whole thing in hive contrib as then y

RE: Linking against Hive in Hadoop development tree

2009-05-20 Thread Ashish Thusoo
You could either do what Owen suggested and put the plugin in hive contrib, or you could just put the whole thing in hive contrib as then you would have access to all the lower level api (core, hdfs, hive etc.). Owen's approach makes a lot of sense if you think that the hive dependency is a loos

Re: How to Rename & Create Mysql DB Table in Hadoop?

2009-05-20 Thread dealmaker
No, my prime objective is not to back up the db. I am trying to move the records from the mysql db to hadoop for processing. Hadoop itself doesn't keep any records. After that, I will remove the same mysql records processed in hadoop from the mysql db. The main point isn't about getting the mysql recor

Re: How to Rename & Create Mysql DB Table in Hadoop?

2009-05-20 Thread dealmaker
No, sooner or later it will run out of auto-increment primary keys because new records are added constantly. A datetime column will force me to use a DELETE command, which may be slow as well. He Yongqiang wrote: > > I think the simplest one would be finding some key (an incremental primary > ke

RE: Hadoop User Group (Bay Area) May 20th

2009-05-20 Thread Ajay Anand
Reminder: the Bay Area Hadoop User Group meeting is today at 6 pm at the Yahoo! Sunnyvale campus - http://upcoming.yahoo.com/event/2659418/ From: Ajay Anand Sent: Wednesday, May 13, 2009 1:12 PM To: 'core-user@hadoop.apache.org'; 'gene...@hadoop.apache.org'; '

Re: How to Rename & Create DB Table in Hadoop?

2009-05-20 Thread Edward J. Yoon
Oh.. According to my understanding, to maintain a steady DB size, you delete and back up the old records. If so, I guess you can continuously do that using WHERE and LIMIT clauses. Then you can reduce the I/O costs.. Does it have to be dumped all at once? On Thu, May 21, 2009 at 12:48 AM, dealmaker wrote:

Re: How to Rename & Create DB Table in Hadoop?

2009-05-20 Thread He Yongqiang
I think the simplest approach would be finding some key (an incremental primary key, a datetime column, etc.) to partition your data. On 09-5-20 11:48 PM, "dealmaker" wrote: > > Other parts of the non-hadoop system will continue to add records to mysql db > when I move those records (and remove the very

Re: How to Rename & Create DB Table in Hadoop?

2009-05-20 Thread dealmaker
Other parts of the non-hadoop system will continue to add records to mysql db when I move those records (and remove the very same records from mysql db at the same time) to hadoop for processing. That's why I am doing those mysql commands. What are you suggesting? If I do it like you suggest,

Re: How to Rename & Create DB Table in Hadoop?

2009-05-20 Thread Edward J. Yoon
Hadoop is a distributed filesystem. If you want to back up your table data to HDFS, you can use SELECT * INTO OUTFILE 'file_name' FROM tbl_name; Then, put it into Hadoop DFS. Edward On Thu, May 21, 2009 at 12:08 AM, dealmaker wrote: > > No, actually I am using mysql.  So it doesn't belong to Hive

Re: Constantly getting DiskErrorExceptions - but logged as INFO

2009-05-20 Thread Lance Riedel
We're still seeing this error in our log files. Is this an expected output? (the fact that it is INFO makes it seem not so bad, but anything to do with DiskChecker exceptions scares me). I posted this over a week ago but haven't had a response... any help? Thanks! lance On Mon, May 11, 2009 at 10:

Re: Hadoop & Python

2009-05-20 Thread s d
Thanks. What number of servers and range of file sizes would keep the performance hit minor? I am concerned about implementing it all only to rewrite it later to scale economically. Thanks for all the information. On Tue, May 19, 2009 at 1:30 PM, Amr Awadallah wrote: > S d, > > It

Re: How to Rename & Create DB Table in Hadoop?

2009-05-20 Thread dealmaker
No, actually I am using mysql. So it doesn't belong to Hive, I think. owen.omalley wrote: > > > On May 19, 2009, at 11:48 PM, dealmaker wrote: > >> >> Hi, >> I want to backup a table and then create a new empty one with >> following >> commands in Hadoop. How do I do it in java? Thanks.

Re: Hama Problem

2009-05-20 Thread Edward J. Yoon
Hi, > I am not sure this code can easily be computed in parallel, or how to change > this code to add the parallel computation. Any advice will be > appreciated. Thanks in advance. OK, I'm sure it could be run on Hama/Hadoop. According to my understanding of your code, it's a PCA. If you have an M im

Re: How to Rename & Create DB Table in Hadoop?

2009-05-20 Thread Min Zhou
In Hive, alter table old_name rename to new_name; create table ( ... ) can solve your problem. On Wed, May 20, 2009 at 8:30 PM, Owen O'Malley wrote: > > On May 19, 2009, at 11:48 PM, dealmaker wrote: > > >> Hi, >> I want to backup a table and then create a new empty one with following >> comm

Re: Optimal Filesystem (and Settings) for HDFS

2009-05-20 Thread Steve Loughran
Bryan Duxbury wrote: We use XFS for our data drives, and we've had somewhat mixed results. Thanks for that. I've just created a wiki page to put some of these notes up; extensions and some hard data would be welcome: http://wiki.apache.org/hadoop/DiskSetup One problem we have for hard data

Re: Shutdown in progress exception

2009-05-20 Thread Tom White
Looks like you are trying to copy a file to HDFS in a shutdown hook. Since you can't control the order in which shutdown hooks run, this won't work. There is a patch to allow Hadoop's FileSystem shutdown hook to be disabled so it doesn't close filesystems on exit. See https://issues.apache.org/jir

Re: Linking against Hive in Hadoop development tree

2009-05-20 Thread Owen O'Malley
On May 15, 2009, at 3:25 PM, Aaron Kimball wrote: Yikes. So part of sqoop would wind up in one source repository, and part in another? This makes my head hurt a bit. I'd say rather that Sqoop is in Mapred and the adapter to Hive is in Hive. I'm also not convinced how that helps. Clea

Re: Linking against Hive in Hadoop development tree

2009-05-20 Thread Owen O'Malley
On May 20, 2009, at 3:07 AM, Tom White wrote: Why does mapred depend on hdfs? MapReduce should only depend on the FileSystem interface, shouldn't it? Yes, I should have been consistent. In terms of compile-time dependences, mapred only depends on core. -- Owen

Re: How to Rename & Create DB Table in Hadoop?

2009-05-20 Thread Owen O'Malley
On May 19, 2009, at 11:48 PM, dealmaker wrote: Hi, I want to backup a table and then create a new empty one with following commands in Hadoop. How do I do it in java? Thanks. Since this is a question about Hive, you should be asking on hive-u...@hadoop.apache.org . -- Owen

Re: Linking against Hive in Hadoop development tree

2009-05-20 Thread Tom White
On Fri, May 15, 2009 at 11:06 PM, Owen O'Malley wrote: > > On May 15, 2009, at 2:05 PM, Aaron Kimball wrote: > >> In either case, there's a dependency there. > > You need to split it so that there are no cycles in the dependency tree. In > the short term it looks like: > > avro: > core: avro > hd

Re: multiple results for each input line

2009-05-20 Thread Tom White
Hi John, You could do this with a map-only job (using NLineInputFormat, and setting the number of reducers to 0), and write the output key as docnameN,stat1,stat2,stat3,stat12 and a null value. This assumes that you calculate all 12 statistics in one map. Each output file would have a single l
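
A minimal sketch of the output record Tom describes: one comma-separated key holding the document name and all 12 statistics. The helper class and method names are hypothetical; in the real job a mapper would emit this string as its key with a null value.

```java
import java.util.StringJoiner;

// Builds the comma-separated output key Tom describes: the document name
// followed by its statistics, emitted from a map-only job with a null value.
public class StatsKey {
    static String outputKey(String docName, double[] stats) {
        StringJoiner joiner = new StringJoiner(",");
        joiner.add(docName);
        for (double s : stats) {
            joiner.add(Double.toString(s));
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        double[] stats = new double[12];
        for (int i = 0; i < 12; i++) stats[i] = i + 1;
        System.out.println(outputKey("doc1", stats));
    }
}
```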

Re: Number of maps and reduces not obeying my configuration

2009-05-20 Thread Tom White
The number of maps to use is calculated on the client, since splits are computed on the client, so changing the value of mapred.map.tasks only on the jobtracker will not have any effect. Note that the number of map tasks that you set is only a suggestion, and depends on the number of splits actual

multiple results for each input line

2009-05-20 Thread John Clarke
Hi, I'm having some trouble implementing what I want to achieve... essentially I have a large input list of documents that I want to get statistics on. For each document I have 12 different stats to work out. So my input file is a text file with one document filepath on each line. The documents a

Re: Hama Problem

2009-05-20 Thread Robert Burrell Donkin
On Wed, May 20, 2009 at 8:08 AM, ykj wrote: > > Hello,everyone hi >       I am new to hama. in our project ,my team leader let me upload  old > code, run it on hadoop with parallel matrix computation. Hama has its own mailing list and this question is probably better asked there. See http://in

how to change this code with hama

2009-05-20 Thread ykj
Hello everyone, I am new to Hama. In our project, my team leader let me upgrade old code and run it on Hadoop with parallel matrix computation. This is the old code: public class EigenFaceGenerator { Matrix averageFace; //stores the average face, useful when probing the datab

Hama Problem

2009-05-20 Thread ykj
Hello everyone, I am new to Hama. In our project, my team leader let me upload old code and run it on Hadoop with parallel matrix computation. This is the old code: public class EigenFaceGenerator { Matrix averageFace; //stores the average face, useful when probing t
