Dear users,
When I try to use the Pig editor in Eclipse (with the PigPen plugin), an error
message appears on the console: "*org.apache.hadoop.dfs.DistributedFileSystem
cannot be cast to org.apache.hadoop.fs.FileSystem*"
Does this have something to do with the Hadoop version?
Thank you!
George
Yes, I have marked it as final. Now one more exception arises; my map reduce
program for word count throws the following:
09/05/21 11:31:37 WARN mapred.JobClient: Use GenericOptionsParser for
parsing the arguments. Applications should implement Tool for the same.
09/05/21 11:31:37 INFO hdfs.DFSClient
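That WARN line on its own is harmless; it only means the driver is not being run through ToolRunner, so generic options like -D and -conf are not parsed. A minimal sketch of a word-count driver that implements Tool (class and job names here are illustrative, and the mapper/reducer are left out):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class WordCountDriver extends Configured implements Tool {
    public int run(String[] args) throws Exception {
        // ToolRunner has already stripped the generic options (-D, -conf, ...)
        JobConf conf = new JobConf(getConf(), WordCountDriver.class);
        conf.setJobName("wordcount");
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);
        // conf.setMapperClass(...); conf.setReducerClass(...);  // your classes
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        JobClient.runJob(conf);
        return 0;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new Configuration(), new WordCountDriver(), args));
    }
}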
On Thu, May 21, 2009 at 11:16:59AM +0530, Puri, Aseem wrote:
> I mean when my reduce tasks is set to 1, the part-0 file shows the replication
> factor as 3. But I set the replication factor to 1 in hadoop-site.xml.
Did you mark the replication factor configuration option as "final"? e.g.:
dfs.repl
I mean when my reduce tasks is set to 1, the part-0 file shows the replication
factor as 3. But I set the replication factor to 1 in hadoop-site.xml.
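For what it's worth, the replication of the reduce output is taken from the configuration the job itself runs with, not from the datanodes' hadoop-site.xml. So besides marking the option final, a hedged sketch of forcing it from the job driver (class name illustrative, assuming the standard dfs.replication property governs this in your version):

import org.apache.hadoop.mapred.JobConf;

public class SingleReplicaConf {
    public static JobConf create(Class<?> jobClass) {
        JobConf conf = new JobConf(jobClass);
        // Ask HDFS for a single replica of whatever this job writes; the reduce
        // tasks create their output files with the job's configuration.
        conf.setInt("dfs.replication", 1);
        return conf;
    }
}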
-----Original Message-----
From: edw...@udanax.org [mailto:edw...@udanax.org] On Behalf Of Edward J. Yoon
Sent: Thursday, May 21, 2009 11:14 AM
To: core-user@had
Do you mean the three files such as part-0? If so, you can set
the number of reduce tasks to 1.
On Thu, May 21, 2009 at 2:39 PM, Puri, Aseem wrote:
> Hi
>
> I am running a map reduce program on two nodes. My DFS
> replication factor is one. All files for input have one replica
Hi
I am running a map reduce program on two nodes. My DFS
replication factor is one. All files for input have one replica, but
the output from reduce always has replication 3. Can anyone please tell me
why this is so?
Thanks & Regards
Aseem Puri
On Wed, May 20, 2009 at 3:18 PM, Tom White wrote:
> The number of maps to use is calculated on the client, since splits
> are computed on the client, so changing the value of mapred.map.tasks
> only on the jobtracker will not have any effect.
>
> Note that the number of map tasks that you set is o
Thanks, found it:
http://issues.apache.org/jira/browse/HADOOP-4963
Lance
On Wed, May 20, 2009 at 8:15 AM, Lance Riedel wrote:
> We're still seeing this error in our log files. Is this an expected
> output? (the fact that it is INFO makes it seem not so bad, but anything to
> do with DiskChecker
Ah ha! Good point, Todd. Pankil, with Todd's suggestion, you can ignore
the first option I proposed.
Thanks,
Alex
On Wed, May 20, 2009 at 4:30 PM, Todd Lipcon wrote:
> On Wed, May 20, 2009 at 4:14 PM, Alex Loddengaard
> wrote:
>
> > First of all, if you can get all machines to have the same
On Wed, May 20, 2009 at 4:14 PM, Alex Loddengaard wrote:
> First of all, if you can get all machines to have the same user, that would
> greatly simplify things.
>
> If, for whatever reason, you absolutely can't get the same user on all
> machines, then you could do either of the following:
>
> 1
First of all, if you can get all machines to have the same user, that would
greatly simplify things.
If, for whatever reason, you absolutely can't get the same user on all
machines, then you could do either of the following:
1) Change the *-all.sh scripts to read from a slaves file that has two
f
Hello everyone,
Till now I was using the same username on all my Hadoop cluster machines.
But now I am building my new cluster and face a situation in which I have
different usernames on different machines. So what changes will I have to
make when configuring Hadoop? Using the same username, ssh was easy. no
>
> You should only use this if you plan on manually closing FileSystems
> yourself from within your own shutdown hook. It's somewhat of an advanced
> feature, and I wouldn't recommend using this patch unless you fully
> understand the ramifications of modifying the shutdown sequence.
Standard df
On Wed, May 20, 2009 at 2:07 PM, Stas Oskin wrote:
> Hi.
>
> 2009/5/20 Tom White
>
> > Looks like you are trying to copy a file to HDFS in a shutdown hook.
> > Since you can't control the order in which shutdown hooks run, this
> > won't work. There is a patch to allow Hadoop's FileSystem shutd
Hi.
2009/5/20 Tom White
> Looks like you are trying to copy a file to HDFS in a shutdown hook.
> Since you can't control the order in which shutdown hooks run, this
> won't work. There is a patch to allow Hadoop's FileSystem shutdown
> hook to be disabled so it doesn't close filesystems on exit
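For reference, the pattern being discussed looks roughly like the sketch below (paths and names are illustrative, not from the original mail). The copy can fail because Hadoop registers its own shutdown hook that closes cached FileSystem instances, and there is no guaranteed ordering between the two hooks:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ShutdownCopy {
    public static void main(String[] args) {
        Runtime.getRuntime().addShutdownHook(new Thread() {
            public void run() {
                try {
                    // By the time this hook fires, Hadoop's FileSystem may already
                    // have been closed by its own shutdown hook; doing the copy
                    // before the JVM starts exiting avoids the race entirely.
                    FileSystem fs = FileSystem.get(new Configuration());
                    fs.copyFromLocalFile(new Path("/tmp/local-file"),
                                         new Path("/user/stas/remote-file"));
                } catch (Exception e) {
                    e.printStackTrace();
                }
            }
        });
    }
}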
Should I use LOCK TABLE then? How do I prevent my_table from being accessed
before I recreate my_table?
And how do I run these mysql commands in hadoop?
Thanks.
Todd Lipcon-4 wrote:
>
> On Wed, May 20, 2009 at 10:59 AM, dealmaker wrote:
>
>>
>> Your second option is similar to what I had
On Wed, May 20, 2009 at 10:59 AM, dealmaker wrote:
>
> Your second option is similar to what I had in my original post, following
> are my mysql commands:
> BEGIN;
> RENAME TABLE my_table TO backup_table;
> CREATE TABLE my_table LIKE backup_table;
> COMMIT;
>
FYI, the "BEGIN" and "COMMIT" there
Your second option is similar to what I had in my original post; the following
are my mysql commands:
BEGIN;
RENAME TABLE my_table TO backup_table;
CREATE TABLE my_table LIKE backup_table;
COMMIT;
I just want to know how to run these commands in Hadoop code. Do I use
DBInputFormat.setInput()? How
On Wed, May 20, 2009 at 10:52 AM, Aaron Kimball wrote:
> You said that you're concerned with the performance of DELETE, but I don't
> know a better way around this if all your input sources are forced to write
> to the same table. Ideally you could have a "current" table and a "frozen"
> table; w
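One way to run those two statements from Hadoop code is simply to issue them over plain JDBC in the job driver, before submitting the job; a hedged sketch, with the driver class, URL, and credentials as placeholders:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class RotateTable {
    public static void rotate() throws Exception {
        Class.forName("com.mysql.jdbc.Driver");
        Connection conn = DriverManager.getConnection(
            "jdbc:mysql://dbhost/mydb", "user", "password"); // placeholders
        try {
            Statement st = conn.createStatement();
            // Each RENAME TABLE is atomic in MySQL, but there is still a short
            // window between these two statements in which my_table does not exist.
            st.execute("RENAME TABLE my_table TO backup_table");
            st.execute("CREATE TABLE my_table LIKE backup_table");
            st.close();
        } finally {
            conn.close();
        }
    }
}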
See this regarding instructions on configuring a 2NN on a separate machine
from the NN:
http://www.cloudera.com/blog/2009/02/10/multi-host-secondarynamenode-configuration/
- Aaron
On Thu, May 14, 2009 at 10:42 AM, Koji Noguchi wrote:
> Before 0.19, fsimage/edits were on the same directory.
> So
For your use case, you'll need to just do a ranged import (i.e., SELECT *
FROM foo WHERE id > X and id < Y), and then delete the same records after
the import succeeds (DELETE FROM foo WHERE id > X and id < Y). Before the
import, you can SELECT max(id) FROM foo to establish what Y should be; X is
i
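A hedged sketch of what that ranged import could look like with the old mapred DBInputFormat; the connection details, the payload column, and the MyRecord class are illustrative, while the table foo and the id predicate come from the example above:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.db.DBConfiguration;
import org.apache.hadoop.mapred.lib.db.DBInputFormat;
import org.apache.hadoop.mapred.lib.db.DBWritable;

public class RangedImport {

    // Minimal record type: one id column and one payload column (illustrative).
    public static class MyRecord implements Writable, DBWritable {
        long id;
        String payload;

        public void readFields(ResultSet rs) throws SQLException {
            id = rs.getLong(1);
            payload = rs.getString(2);
        }
        public void write(PreparedStatement ps) throws SQLException {
            ps.setLong(1, id);
            ps.setString(2, payload);
        }
        public void readFields(DataInput in) throws IOException {
            id = in.readLong();
            payload = in.readUTF();
        }
        public void write(DataOutput out) throws IOException {
            out.writeLong(id);
            out.writeUTF(payload);
        }
    }

    public static void configure(JobConf conf, long x, long y) {
        conf.setInputFormat(DBInputFormat.class);
        DBConfiguration.configureDB(conf, "com.mysql.jdbc.Driver",
            "jdbc:mysql://dbhost/mydb", "user", "password"); // placeholders
        // Only rows with x < id < y are read; the same predicate can drive the
        // DELETE that runs after the job has succeeded.
        DBInputFormat.setInput(conf, MyRecord.class, "foo",
            "id > " + x + " AND id < " + y, "id", "id", "payload");
    }
}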
I've worked around needing any compile-time dependencies for now. :) No
longer an issue.
- Aaron
On Wed, May 20, 2009 at 10:29 AM, Ashish Thusoo wrote:
> You could either do what Owen suggested and put the plugin in hive contrib,
> or you could just put the whole thing in hive contrib as then y
You could either do what Owen suggested and put the plugin in hive contrib, or
you could just put the whole thing in hive contrib as then you would have
access to all the lower-level APIs (core, hdfs, hive, etc.). Owen's approach
makes a lot of sense if you think that the hive dependency is a loos
No, my prime objective is not to back up the db. I am trying to move the records
from the mysql db to hadoop for processing. Hadoop itself doesn't keep any
records. After that, I will remove from the mysql db the same records that were
processed in hadoop. The main point isn't about getting the mysql
recor
No, sooner or later it will run out of auto-increment primary keys because
new records are added constantly. A datetime column will force me to
use a DELETE command, which may be slow as well.
He Yongqiang wrote:
>
> I think the simplest one would be finding some key (incremental primary
> ke
Reminder: the Bay Area Hadoop User Group meeting is today at 6 pm at the
Yahoo! Sunnyvale campus - http://upcoming.yahoo.com/event/2659418/
From: Ajay Anand
Sent: Wednesday, May 13, 2009 1:12 PM
To: 'core-user@hadoop.apache.org'; 'gene...@hadoop.apache.org';
'
Oh, according to my understanding, to maintain a steady DB size you want to
delete and back up the old records. If so, I guess you can continuously
do that using WHERE and LIMIT clauses, and thereby reduce the I/O
costs. Or does it have to be dumped all at once?
On Thu, May 21, 2009 at 12:48 AM, dealmaker wrote:
I think the simplest one would be finding some key (incremental primary key
or datetime column, etc.) to partition your data.
On 09-5-20 11:48 PM, "dealmaker" wrote:
>
> Other parts of the non-hadoop system will continue to add records to mysql db
> when I move those records (and remove the very
Other parts of the non-hadoop system will continue to add records to the mysql
db while I move those records (and remove the very same records from the mysql
db at the same time) to hadoop for processing. That's why I am doing those
mysql commands.
What are you suggesting? If I do it like you suggest,
Hadoop is a distributed filesystem. If you want to back up your table
data to HDFS, you can use SELECT * INTO OUTFILE 'file_name' FROM
tbl_name; then put the file into the Hadoop DFS.
Edward
On Thu, May 21, 2009 at 12:08 AM, dealmaker wrote:
>
> No, actually I am using mysql. So it doesn't belong to Hive
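Once the SELECT * INTO OUTFILE dump exists on local disk, copying it into HDFS is one call from Java (paths below are placeholders); the command-line equivalent is hadoop fs -put:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class PutDump {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        // Copy the MySQL dump produced by SELECT ... INTO OUTFILE into HDFS.
        fs.copyFromLocalFile(new Path("/tmp/my_table_dump.txt"),
                             new Path("/user/hadoop/my_table_dump.txt"));
    }
}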
We're still seeing this error in our log files. Is this an expected output?
(The fact that it is INFO makes it seem not so bad, but anything to do with
DiskChecker exceptions scares me.) I posted this over a week ago but haven't
had a response. Any help?
Thanks!
lance
On Mon, May 11, 2009 at 10:
Thanks. For what range of server counts and file sizes would the performance
hit be minor? I am concerned about implementing it all only
to rewrite it later to scale economically.
Thanks for all the information.
On Tue, May 19, 2009 at 1:30 PM, Amr Awadallah wrote:
> S d,
>
> It
No, actually I am using mysql. So it doesn't belong to Hive, I think.
owen.omalley wrote:
>
>
> On May 19, 2009, at 11:48 PM, dealmaker wrote:
>
>>
>> Hi,
>> I want to backup a table and then create a new empty one with
>> following
>> commands in Hadoop. How do I do it in java? Thanks.
Hi,
> I am not sure this code can easily be computed in parallel, or how to change
> this code to add the parallel computation. Any advice will be
> appreciated. Thanks in advance.
OK, I'm sure it could be run on Hama/Hadoop.
According to my understanding of your code, it's a PCA. If you have an
M im
In Hive,
alter table old_name rename to new_name;
create table (
...
)
can solve your problem.
On Wed, May 20, 2009 at 8:30 PM, Owen O'Malley wrote:
>
> On May 19, 2009, at 11:48 PM, dealmaker wrote:
>
>
>> Hi,
>> I want to backup a table and then create a new empty one with following
>> comm
Bryan Duxbury wrote:
We use XFS for our data drives, and we've had somewhat mixed results.
Thanks for that. I've just created a wiki page to put some of these
notes up - extensions and some hard data would be welcome:
http://wiki.apache.org/hadoop/DiskSetup
One problem we have for hard data
Looks like you are trying to copy a file to HDFS in a shutdown hook.
Since you can't control the order in which shutdown hooks run, this
won't work. There is a patch to allow Hadoop's FileSystem shutdown
hook to be disabled so it doesn't close filesystems on exit. See
https://issues.apache.org/jir
On May 15, 2009, at 3:25 PM, Aaron Kimball wrote:
Yikes. So part of sqoop would wind up in one source repository, and
part in
another? This makes my head hurt a bit.
I'd say rather that Sqoop is in Mapred and the adapter to Hive is in
Hive.
I'm also not convinced how that helps.
Clea
On May 20, 2009, at 3:07 AM, Tom White wrote:
Why does mapred depend on hdfs? MapReduce should only depend on the
FileSystem interface, shouldn't it?
Yes, I should have been consistent. In terms of compile-time
dependences, mapred only depends on core.
-- Owen
On May 19, 2009, at 11:48 PM, dealmaker wrote:
Hi,
I want to back up a table and then create a new empty one with the
following commands in Hadoop. How do I do it in Java? Thanks.
Since this is a question about Hive, you should be asking on hive-u...@hadoop.apache.org.
-- Owen
On Fri, May 15, 2009 at 11:06 PM, Owen O'Malley wrote:
>
> On May 15, 2009, at 2:05 PM, Aaron Kimball wrote:
>
>> In either case, there's a dependency there.
>
> You need to split it so that there are no cycles in the dependency tree. In
> the short term it looks like:
>
> avro:
> core: avro
> hd
Hi John,
You could do this with a map-only job (using NLineInputFormat, and
setting the number of reducers to 0), and write the output key as
docnameN,stat1,stat2,stat3,stat12 and a null value. This assumes
that you calculate all 12 statistics in one map. Each output file
would have a single l
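A hedged sketch of that setup with the old mapred API; the class names and the lines-per-map value are illustrative, and the mapper (which would compute the 12 statistics and emit the comma-separated line as the key with a null value) is left out:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.NLineInputFormat;

public class DocStatsJob {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(DocStatsJob.class);
        conf.setJobName("doc-stats");

        // Each map task gets N lines (document paths) of the input file.
        conf.setInputFormat(NLineInputFormat.class);
        conf.setInt("mapred.line.input.format.linespermap", 1);

        // Map-only job: no reduce phase, each map writes its own output file.
        conf.setNumReduceTasks(0);
        // conf.setMapperClass(DocStatsMapper.class);  // your mapper goes here

        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(NullWritable.class);

        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        JobClient.runJob(conf);
    }
}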
The number of maps to use is calculated on the client, since splits
are computed on the client, so changing the value of mapred.map.tasks
only on the jobtracker will not have any effect.
Note that the number of map tasks that you set is only a suggestion,
and depends on the number of splits actual
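In code terms (a sketch, not from the original message), the map-task setting is only a hint, which is why changing it on the jobtracker alone has no effect:

import org.apache.hadoop.mapred.JobConf;

public class MapTaskHint {
    public static void apply(JobConf conf) {
        // Only a hint: the actual number of map tasks equals the number of
        // input splits, which is computed on the client at submission time.
        conf.setNumMapTasks(10); // illustrative value
    }
}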
Hi,
I'm having some trouble implementing what I want to achieve... essentially I
have a large input list of documents that I want to get statistics on. For
each document I have 12 different stats to work out.
So my input file is a text file with one document filepath on each line. The
documents a
On Wed, May 20, 2009 at 8:08 AM, ykj wrote:
>
> Hello, everyone,
Hi,
> I am new to Hama. In our project, my team leader asked me to upload old
> code and run it on Hadoop with parallel matrix computation.
Hama has its own mailing list, and this question is probably better
asked there. See http://in
Hello, everyone,
I am new to Hama. In our project, my team leader asked me to upgrade old
code and run it on Hadoop with parallel matrix computation. This is the old code:
public class EigenFaceGenerator {
    Matrix averageFace; // stores the average face, useful when probing
the datab
Hello, everyone,
I am new to Hama. In our project, my team leader asked me to upload old
code and run it on Hadoop with parallel matrix computation. This is the old code:
public class EigenFaceGenerator {
    Matrix averageFace; // stores the average face, useful when
probing t