Hi,
I recently migrated to hadoop-0.20.0 and I am facing
https://issues.apache.org/jira/browse/HADOOP-5254
Failed to set setXIncludeAware(true) for parser
org.apache.xerces.jaxp.DocumentBuilderFactoryImpl@1e9e5c73: java.lang.UnsupportedOperationException:
This parser does not support
Hi Saptarshi,
The group permissions open the firewall ports to enable access, but
there are no shared keys on the cluster by default. See
https://issues.apache.org/jira/browse/HADOOP-4131 for a patch to the
scripts that shares keys to allow SSH access between machines in the
cluster.
Cheers,
Tom
Hi Kun,
The book's code is for 0.20.0. In Hadoop 0.17.x WritableComparable was
not generic, so you need a declaration like:
public class IntPair implements WritableComparable {
}
And the compareTo() method should look like this:
public int compareTo(Object o) {
IntPair ip = (IntPair) o;
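// The rest of the method is not in the original message; this completion
// assumes IntPair holds two int fields named first and second, as in the
// book's example.
if (first != ip.first) {
  return first < ip.first ? -1 : 1;
}
return second < ip.second ? -1 : (second == ip.second ? 0 : 1);
}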
Hi All,
Can I map/reduce logs that have the .bz2 extension in Hadoop 18.3?
I tried but interestingly the output was not what I expected versus what
I got when my data was in uncompressed format.
Thanks,
Usman
You might be interested in
https://issues.apache.org/jira/browse/HDFS-385, where there is
discussion about how to add pluggable block placement to HDFS.
Cheers,
Tom
On Tue, Jun 23, 2009 at 5:50 PM, Alex Loddengaard a...@cloudera.com wrote:
Hi Hyunsik,
Unfortunately you can't control the
I believe the Cloudera 18.3 distribution supports bzip2.
On Wed, Jun 24, 2009 at 3:45 AM, Usman Waheed usm...@opera.com wrote:
Hi All,
Can I map/reduce logs that have the .bz2 extension in Hadoop 18.3?
I tried but interestingly the output was not what I expected versus what I
got when my data was in
Hi,
The exception is a result of having xerces in the classpath. To resolve,
make sure you are using Java 6 and set the following system property:
-Djavax.xml.parsers.DocumentBuilderFactory=com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderFactoryImpl
This can also be resolved by the
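If it helps, the property can also be set programmatically; here is a
minimal sketch (my own, not from this thread), assuming it runs before
Hadoop's Configuration class first parses an XML resource:

public class ParserWorkaround {
  public static void main(String[] args) {
    // Must execute before the first org.apache.hadoop.conf.Configuration
    // object loads its XML resources.
    System.setProperty("javax.xml.parsers.DocumentBuilderFactory",
        "com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderFactoryImpl");
    // ... normal job setup would follow here ...
  }
}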
Thanks Ram,
Setting that system property solved the issue.
Thanks,
Murali Krishna
From: Ram Kulbak ram.kul...@gmail.com
To: core-user@hadoop.apache.org
Sent: Wednesday, 24 June, 2009 7:29:37 PM
Subject: Re: Hadoop 0.20.0, xml parsing related error
Hi,
Hi Usman,
I'm running 0.18.3 from hadoop.apache.org, and have no issues with bz2
files. My experiments with these files have been through Pig. Hope
this is useful to you.
Best regards,
Danny Gross
-Original Message-
From: Usman Waheed [mailto:usm...@opera.com]
Sent: Wednesday, June
Hi Danny,
Hmmm, makes me wonder that I might be doing something wrong here. I
imported just one .bz2 file into HDFS and then launched a map/reduce
task executing the following command:
/home/hadoop/hadoop/bin/hadoop jar
/home/hadoop/hadoop/contrib/streaming/hadoop-streaming.jar -input
Hello, list!
I'm happy to announce Zohmg, a data store for aggregation of multi-dimensional
time series data built on top of Hadoop, Dumbo and HBase. Data is imported
with a mapreduce job and is exported through an HTTP API.
A typical use-case for Zohmg is the analysis of Apache log files. The
Hi,
I have binary files in the HDFS that I am creating an InputFormat (and
RecordReader) for. The binary format is something like [X of length 4
bytes][Y of X size], where X evaluates to an int, and the pattern
continues as XYXYXYXY. I use X (size) to know the length of the next
record to read
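For what it's worth, a minimal old-API sketch of such a reader follows.
Everything in it is my assumption rather than code from this thread, and
since records cannot be resynchronized mid-stream, the matching
InputFormat would normally return false from isSplitable so that each
file is read by a single task:

import java.io.IOException;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.mapred.FileSplit;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RecordReader;

public class LengthPrefixedRecordReader
    implements RecordReader<LongWritable, BytesWritable> {

  private final FSDataInputStream in;
  private final long start;
  private final long end;
  private long pos;

  public LengthPrefixedRecordReader(JobConf conf, FileSplit split)
      throws IOException {
    Path file = split.getPath();
    FileSystem fs = file.getFileSystem(conf);
    in = fs.open(file);
    start = split.getStart();
    end = start + split.getLength();
    in.seek(start);
    pos = start;
  }

  public boolean next(LongWritable key, BytesWritable value)
      throws IOException {
    if (pos >= end) {
      return false;
    }
    key.set(pos);                  // record offset as the key
    int length = in.readInt();     // X: 4-byte length prefix
    byte[] payload = new byte[length];
    in.readFully(payload);         // Y: the X-byte record body
    value.set(payload, 0, length);
    pos = in.getPos();
    return true;
  }

  public LongWritable createKey() { return new LongWritable(); }
  public BytesWritable createValue() { return new BytesWritable(); }
  public long getPos() { return pos; }
  public void close() throws IOException { in.close(); }
  public float getProgress() {
    return end == start ? 0.0f : (pos - start) / (float) (end - start);
  }
}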
You may also need to set
mapred.job.tracker in mapred-site.xml
-seunghwa
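For example, the equivalent per-job setting in code looks like this (a
sketch; the hostname and port are placeholders, and cluster-wide you
would put the same key in mapred-site.xml):

import org.apache.hadoop.mapred.JobConf;

public class TrackerConfig {
  public static void main(String[] args) {
    JobConf conf = new JobConf();
    // Placeholder hostname:port for the JobTracker.
    conf.set("mapred.job.tracker", "master.example.com:9001");
  }
}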
On Thu, 2009-06-18 at 10:42 -0700, llpind wrote:
Hey All,
I'm able to start my master server, but none of the slave nodes come up
(unless I list the master as the slave). After searching a bit, it seems
people have
On Wed, 24 Jun 2009 12:45:59 +0200, Usman Waheed wrote:
Hi All,
Can I map/reduce logs that have the .bz2 extension in Hadoop 18.3?
I tried but interestingly the output was not what I expected versus
what I got when my data was in uncompressed format.
Thanks,
Usman
Not AFAIK, but we have added
Hi Usman,
I believe your issue is specifically in the contrib/hadoop-streaming.jar.
I ran a test Python job with hadoop-streaming.jar on a bz2 file with no
errors. However, the output
was junk.
Pig has no issue with bz2 files.
According to
Cool, thanks, will check it out.
-Usman
Hi Usman,
I believe your issue is specifically in the contrib/hadoop-streaming.jar.
I ran a test Python job with hadoop-streaming.jar on a bz2 file with no
errors. However, the output
was junk.
Pig has no issue with bz2 files.
According to
Thank you! Just to confirm: consider a JVM (that is being reused) that has
to reduce K1,{V11,V12,V13,...} and K2,{V21,V22,V23,...}. Are the
configure and close methods then called once each for both K1,{V11,...}
and K2,{V21,...}?
Is my understanding correct?
Once again, there is no combiner, and it
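One way to check the lifecycle empirically (a sketch of my own, not from
this thread; the stderr lines end up in the task logs) is a reducer that
logs its configure() and close() calls, the expectation being once per
task attempt rather than once per key group:

import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class LifecycleReducer extends MapReduceBase
    implements Reducer<Text, IntWritable, Text, IntWritable> {

  public void configure(JobConf job) {
    System.err.println("configure() called");
  }

  public void reduce(Text key, Iterator<IntWritable> values,
      OutputCollector<Text, IntWritable> output, Reporter reporter)
      throws IOException {
    System.err.println("reduce() called for key " + key);
  }

  public void close() throws IOException {
    System.err.println("close() called");
  }
}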
Hello,
Thank you. This is quite useful.
Regards
Saptarshi
On Wed, Jun 24, 2009 at 6:16 AM, Tom White t...@cloudera.com wrote:
Hi Saptarshi,
The group permissions open the firewall ports to enable access, but
there are no shared keys on the cluster by default. See
This is correct - thanks for the note Jason. You can see the current
patch list for Cloudera's Distribution (based on 18.3) at:
http://www.cloudera.com/hadoop-manifest
In addition to Bzip2, we have patched in: DBInputFormat, the fair
scheduler, job level task limiting, soft fd leak fix, a fix for
Very cool, we are using Debian and I checked Cloudera's website. You have
packages for the Debian platform.
Will check it out and install on a test cluster.
Thanks much,
Usman
This is correct - thanks for the note Jason. You can see the current
patch list for Cloudera's Distribution (based
Hi everyone,
Me again. The problem has been solved: I cleaned the stuff in /tmp and
everything works. Thanks.
Boyu
-Original Message-
From: Boyu Zhang [mailto:bzh...@cs.utsa.edu]
Sent: Wednesday, June 24, 2009 3:54 PM
To: core-user@hadoop.apache.org
Subject: DataNode Cannot Start
I have two files, FileA (with 600K records) and FileB (with 2 million records).
FileA has a key which is the same for all the records:
123724101722493
123781676672721
FileB has the same key as FileA:
1235026328101569
1235026328001562
Using the hadoop join package I can create an output file
Hello everyone,
I have added 7 nodes to my 3-node cluster. I followed these steps to
do this:
1. added each node's IP to conf/slaves at the master
2. ran bin/start-balancer.sh at each node
I loaded the data when the size of the cluster was three, and it is now
TEN. Can I do anything to
Hi,
Running the rebalancer script (by the way, you only need to run it once)
redistributes all of your data for you. That is, after you've run the
rebalancer, your data should be stored evenly among your 10 nodes.
Alex
On Wed, Jun 24, 2009 at 2:50 PM, asif md asif.d...@gmail.com wrote:
Hello @Alex,
Thanks.
http://wiki.apache.org/hadoop/FAQ#6
Has anyone any experience with this?
Please suggest.
On Wed, Jun 24, 2009 at 5:44 PM, Alex Loddengaard a...@cloudera.com wrote:
Hi,
Running the rebalancer script (by the way, you only need to run it once)
redistributes all of
Bay Area Hadoop Fans,
We're excited to hold our first Hadoop User Group at Cloudera's office
in Burlingame (just south of SFO). We pushed the start time back 30
minutes to allow a little extra time to drive further north, and we
hope the mid-way location brings more users from San Francisco.
These links should help you to rebalance the nodes:
http://developer.yahoo.com/hadoop/tutorial/module2.html#rebalancing
http://hadoop.apache.org/core/docs/current/hdfs_user_guide.html#Rebalancer
http://hadoop.apache.org/core/docs/current/commands_manual.html#balancer
@Konstantin
I'll try those. Thanks.
More comments are welcome.
On Wed, Jun 24, 2009 at 7:27 PM, Konstantin Shvachko s...@yahoo-inc.com wrote:
These links should help you to rebalance the nodes:
http://developer.yahoo.com/hadoop/tutorial/module2.html#rebalancing
The join package does a streaming merge sort between each part-X in your
input directories:
part-0000 will be handled in a single task,
part-0001 will be handled in a single task,
and so on.
These jobs are essentially I/O bound, and hard to beat for performance.
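As an illustration, wiring the old-API join package up looks roughly
like the sketch below (the paths, input format, and driver class are my
placeholders, not details from this thread); the map function then
receives (key, TupleWritable) pairs:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.KeyValueTextInputFormat;
import org.apache.hadoop.mapred.join.CompositeInputFormat;

public class JoinSetup {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(JoinSetup.class);
    conf.setInputFormat(CompositeInputFormat.class);
    // "inner" keeps only keys present in both inputs; each pair of
    // part-N files is merge-sorted by a single map task, as noted above.
    conf.set("mapred.join.expr", CompositeInputFormat.compose(
        "inner", KeyValueTextInputFormat.class,
        new Path("/data/fileA"), new Path("/data/fileB")));
    // ... set the mapper and output path, then JobClient.runJob(conf) ...
  }
}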
On Wed, Jun 24, 2009 at 2:09 PM, pmg
And what decides part-0000, part-0001: the input split, the block size?
So, for example, would 1G of data on HDFS with a 64M block size get 16
blocks mapped to different map tasks?
jason hadoop wrote:
The join package does a streaming merge sort between each part-X in your
input directories,
HRoger wrote:
Hi
As you know, in org.apache.hadoop.mapred.jobcontrol.Job there is a
method called addDependingJob but not in
org.apache.hadoop.mapreduce.Job. Is there some method that works like
addDependingJob in the mapreduce package?
org.apache.hadoop.mapred.jobcontrol.Job is moved to
The input split size is Long.MAX_VALUE. And in actual fact, the contents
of each directory are sorted separately.
The number of directory entries for each has to be identical,
and all files in index position I, where I varies from 0 to the number of
files in a directory, become the input to 1
Thanks for your answer. I am using 0.20 and programming with the new API, so
how can I make one job run after the other job in one class with the new
API?
Amareshwari Sriramadasu wrote:
HRoger wrote:
Hi
As you know in the org.apache.hadoop.mapred.jobcontrol.Job there is a
method called
You can use 0.21-dev.
If not, you can try using the old-API jobcontrol to create depending jobs by
getting the conf from
org.apache.hadoop.mapreduce.Job.getConfiguration(), as sketched below.
Thanks
Amareshwari
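For example, a hedged sketch of that approach (job configuration is
omitted, and the class names come from 0.20's mapred.jobcontrol
package):

import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.jobcontrol.Job;
import org.apache.hadoop.mapred.jobcontrol.JobControl;

public class DependentJobs {
  public static void main(String[] args) throws Exception {
    org.apache.hadoop.mapreduce.Job first =
        new org.apache.hadoop.mapreduce.Job();
    org.apache.hadoop.mapreduce.Job second =
        new org.apache.hadoop.mapreduce.Job();
    // ... configure mappers, reducers, and paths on both jobs ...

    Job firstStep = new Job(new JobConf(first.getConfiguration()));
    Job secondStep = new Job(new JobConf(second.getConfiguration()));
    secondStep.addDependingJob(firstStep); // secondStep waits for firstStep

    JobControl control = new JobControl("dependent-jobs");
    control.addJob(firstStep);
    control.addJob(secondStep);
    new Thread(control).start();      // run the control loop in the background
    while (!control.allFinished()) {  // poll until both jobs complete
      Thread.sleep(1000);
    }
    control.stop();
  }
}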
HRoger wrote:
Thanks for your answer. I am using 0.20 and programming with the new API, so
how can I make
Hi all, I'm a student and I have been trying to set up a Hadoop cluster for
a while
but have been unsuccessful till now.
I'm trying to set up a 4-node cluster:
1 - namenode
1 - job tracker
2 - datanode / task tracker
version of hadoop - 0.18.3
*config in hadoop-site.xml* (which I have replicated