Hadoop 0.20.0, XML parsing related error

2009-06-24 Thread murali krishna
Hi, Recently migrated to hadoop-0.20.0 and I am facing https://issues.apache.org/jira/browse/HADOOP-5254: Failed to set setXIncludeAware(true) for parser org.apache.xerces.jaxp.DocumentBuilderFactoryImpl@1e9e5c73: java.lang.UnsupportedOperationException: This parser does not support

Re: EC2, Hadoop, copy file from CLUSTER_MASTER to CLUSTER, failing

2009-06-24 Thread Tom White
Hi Saptarshi, The group permissions open the firewall ports to enable access, but there are no shared keys on the cluster by default. See https://issues.apache.org/jira/browse/HADOOP-4131 for a patch to the scripts that shares keys to allow SSH access between machines in the cluster. Cheers, Tom

Re: Looking for the correct way to implement WritableComparable in Hadoop-0.17

2009-06-24 Thread Tom White
Hi Kun, The book's code is for 0.20.0. In Hadoop 0.17.x WritableComparable was not generic, so you need a declaration like: public class IntPair implements WritableComparable { } And the compareTo() method should look like this: public int compareTo(Object o) { IntPair ip = (IntPair) o;
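
A minimal sketch of what such a class might look like under the 0.17 non-generic interface (the IntPair fields and serialization shown here are illustrative assumptions, since the original message is truncated):

    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;

    import org.apache.hadoop.io.WritableComparable;

    // Sketch: a pair of ints under the non-generic 0.17 WritableComparable.
    public class IntPair implements WritableComparable {
        private int first;
        private int second;

        public void write(DataOutput out) throws IOException {
            out.writeInt(first);
            out.writeInt(second);
        }

        public void readFields(DataInput in) throws IOException {
            first = in.readInt();
            second = in.readInt();
        }

        // Non-generic compareTo, as described above: takes Object and casts.
        public int compareTo(Object o) {
            IntPair ip = (IntPair) o;
            if (first != ip.first) {
                return first < ip.first ? -1 : 1;
            }
            return second == ip.second ? 0 : (second < ip.second ? -1 : 1);
        }
    }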

Are .bz2 extensions supported in Hadoop 18.3

2009-06-24 Thread Usman Waheed
Hi All, Can I map/reduce logs that have the .bz2 extension in Hadoop 18.3? I tried, but interestingly the output was not what I expected versus what I got when my data was in uncompressed format. Thanks, Usman

Re: Is it possible? I want to group data blocks.

2009-06-24 Thread Tom White
You might be interested in https://issues.apache.org/jira/browse/HDFS-385, where there is discussion about how to add pluggable block placement to HDFS. Cheers, Tom On Tue, Jun 23, 2009 at 5:50 PM, Alex Loddengaard a...@cloudera.com wrote: Hi Hyunsik, Unfortunately you can't control the

Re: Are .bz2 extensions supported in Hadoop 18.3

2009-06-24 Thread jason hadoop
I believe the Cloudera 18.3 distribution supports bzip2. On Wed, Jun 24, 2009 at 3:45 AM, Usman Waheed usm...@opera.com wrote: Hi All, Can I map/reduce logs that have the .bz2 extension in Hadoop 18.3? I tried, but interestingly the output was not what I expected versus what I got when my data was in

Re: Hadoop 0.20.0, XML parsing related error

2009-06-24 Thread Ram Kulbak
Hi, The exception is a result of having xerces in the classpath. To resolve, make sure you are using Java 6 and set the following system property: -Djavax.xml.parsers.DocumentBuilderFactory=com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderFactoryImpl This can also be resolved by the
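
For anyone who prefers setting the property from code rather than on the command line, a minimal sketch (the Launcher class and entry point are hypothetical; the property must be set before Hadoop's Configuration parses any XML):

    // Sketch: pin the JDK 6 built-in parser factory so a stray xerces jar
    // on the classpath is never consulted when Hadoop parses its XML config.
    public class Launcher {
        public static void main(String[] args) throws Exception {
            System.setProperty("javax.xml.parsers.DocumentBuilderFactory",
                "com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderFactoryImpl");
            // ... continue with normal job setup, e.g. via ToolRunner
        }
    }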

Re: Hadoop 0.20.0, XML parsing related error

2009-06-24 Thread Murali Krishna. P
Thanks Ram, Setting that system property solved the issue. Thanks, Murali Krishna From: Ram Kulbak ram.kul...@gmail.com To: core-user@hadoop.apache.org Sent: Wednesday, 24 June, 2009 7:29:37 PM Subject: Re: Hadoop 0.20.0, XML parsing related error Hi,

RE: Are .bz2 extensions supported in Hadoop 18.3

2009-06-24 Thread Gross, Danny
Hi Usman, I'm running 0.18.3 from hadoop.apache.org, and have no issues with bz2 files. My experiments with these files have been through Pig. Hope this is useful to you. Best regards, Danny Gross -Original Message- From: Usman Waheed [mailto:usm...@opera.com] Sent: Wednesday, June

Re: Are .bz2 extensions supported in Hadoop 18.3

2009-06-24 Thread Usman Waheed
Hi Danny, Hmmm, makes me wonder that I might be doing something wrong here. I imported just one .bz2 file into HDFS and then launched a map/reduce task executing the following command: /home/hadoop/hadoop/bin/hadoop jar /home/hadoop/hadoop/contrib/streaming/hadoop-streaming.jar -input

[ANN] Zohmg, a time series data store backed by HBase

2009-06-24 Thread Fredrik Möllerstrand
Hello, list! I'm happy to announce Zohmg, a data store for aggregation of multi-dimensional time series data built on top of Hadoop, Dumbo and HBase. Data is imported with a mapreduce job and is exported through an HTTP API. A typical use-case for Zohmg is the analysis of Apache log files. The

Custom Binary FileInputFormat, splitting

2009-06-24 Thread william kinney
Hi, I have binary files in HDFS that I am creating an InputFormat (and RecordReader) for. The binary format is something like [X of length 4 bytes][Y of X size], where X evaluates to an int, and the pattern continues as XYXYXYXY. I use X (size) to know the length of the next record to read
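
A minimal sketch of the read loop such a RecordReader might use (the class name is hypothetical, and this assumes the reader has already been positioned at a record boundary; split alignment is the harder part and is not shown):

    import java.io.IOException;

    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.io.BytesWritable;
    import org.apache.hadoop.io.LongWritable;

    // Sketch: core read loop for an XYXYXY... length-prefixed binary file.
    public class LengthPrefixedReader {
        private final FSDataInputStream in;
        private final long end; // end of the byte range this reader owns

        public LengthPrefixedReader(FSDataInputStream in, long end) {
            this.in = in;
            this.end = end;
        }

        /** Reads one [4-byte length][payload] record; false at range end. */
        public boolean next(LongWritable key, BytesWritable value) throws IOException {
            long pos = in.getPos();
            if (pos >= end) {
                return false;
            }
            int length = in.readInt();     // X: 4-byte length of the record
            byte[] buf = new byte[length]; // Y: payload of X bytes
            in.readFully(buf);
            key.set(pos);                  // key each record by its offset
            value.set(buf, 0, length);
            return true;
        }
    }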

Re: Upgrading from .19 to .20 problems

2009-06-24 Thread Seunghwa Kang
You may also need to set mapred.job.tracker in mapred-site.xml. -seunghwa On Thu, 2009-06-18 at 10:42 -0700, llpind wrote: Hey All, I'm able to start my master server, but none of the slave nodes come up (unless I list the master as the slave). After searching a bit, it seems people have

Re: Are .bz2 extensions supported in Hadoop 18.3

2009-06-24 Thread John Heidemann
On Wed, 24 Jun 2009 12:45:59 +0200, Usman Waheed wrote: Hi All, Can I map/reduce logs that have the .bz2 extension in Hadoop 18.3? I tried, but interestingly the output was not what I expected versus what I got when my data was in uncompressed format. Thanks, Usman Not AFAIK, but we have added

RE: Are .bz2 extensions supported in Hadoop 18.3

2009-06-24 Thread Gross, Danny
Hi Usman, I believe your issue is specifically in the contrib/hadoop-streaming.jar. I ran a test Python job with hadoop-streaming.jar on a bz2 file with no errors. However, the output was junk. Pig has no issue with bz2 files. According to

Re: Are .bz2 extensions supported in Hadoop 18.3

2009-06-24 Thread Usman Waheed
Cool, thanks, will check it out. -Usman Hi Usman, I believe your issue is specifically in the contrib/hadoop-streaming.jar. I ran a test Python job with hadoop-streaming.jar on a bz2 file with no errors. However, the output was junk. Pig has no issue with bz2 files. According to

Re: When is configure and close run

2009-06-24 Thread Saptarshi Guha
Thank you! Just to confirm. Consider a JVM (that is being reused) that has to reduce K1,{V11,V12,V13,...} and K2,{V21,V22,V23,...}. Then the configure and close methods are called once each for both K1,{V11,...} and K2,{V21,...}? Is my understanding correct? Once again, there is no combiner, and it
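
For reference, a minimal sketch of where those hooks sit in the old API (the class and its key/value types are illustrative, not from this thread); the point is that configure() and close() bracket the whole task attempt rather than each key's reduce() call, even when the JVM is reused, because each task gets a fresh Reducer instance:

    import java.io.IOException;
    import java.util.Iterator;

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reducer;
    import org.apache.hadoop.mapred.Reporter;

    // Sketch: configure() runs once before the first reduce() call of the
    // task, close() once after the last one.
    public class LifecycleReducer extends MapReduceBase
            implements Reducer<Text, IntWritable, Text, IntWritable> {

        public void configure(JobConf job) {
            // per-task setup: read job parameters, open side files, ...
        }

        public void reduce(Text key, Iterator<IntWritable> values,
                           OutputCollector<Text, IntWritable> output,
                           Reporter reporter) throws IOException {
            // called once per key group: K1,{V11,V12,...}, then K2,{V21,...}
        }

        public void close() throws IOException {
            // per-task teardown: flush and close side files, ...
        }
    }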

Re: EC2, Hadoop, copy file from CLUSTER_MASTER to CLUSTER, failing

2009-06-24 Thread Saptarshi Guha
Hello, Thank you. This is quite useful. Regards Saptarshi On Wed, Jun 24, 2009 at 6:16 AM, Tom White t...@cloudera.com wrote: Hi Saptarshi, The group permissions open the firewall ports to enable access, but there are no shared keys on the cluster by default. See

Re: Are .bz2 extensions supported in Hadoop 18.3

2009-06-24 Thread Christophe Bisciglia
This is correct - thanks for the note Jason. You can see the current patch list for Cloudera's Distribution (based on 18.3) at: http://www.cloudera.com/hadoop-manifest In addition to Bzip2, we have patched in: DBInputFormat, the fair scheduler, job level task limiting, soft fd leak fix, a fix for

Re: Are .bz2 extensions supported in Hadoop 18.3

2009-06-24 Thread Usman Waheed
Very cool, we are using Debian and I checked Cloudera's website. You have packages for the Debian platform. Will check it out and install on a test cluster. Thanks much, Usman This is correct - thanks for the note Jason. You can see the current patch list for Cloudera's Distribution (based

RE: DataNode Cannot Start Incompatible namespaceIDs

2009-06-24 Thread Boyu Zhang
Hi everyone, me again. The problem has been solved: I cleaned the stuff in /tmp and everything works. Thanks. Boyu -Original Message- From: Boyu Zhang [mailto:bzh...@cs.utsa.edu] Sent: Wednesday, June 24, 2009 3:54 PM To: core-user@hadoop.apache.org Subject: DataNode Cannot Start

CompositeInputFormat scalability

2009-06-24 Thread pmg
I have two files, FileA (with 600K records) and FileB (with 2 million records). FileA has a key which is the same for all the records 123724101722493 123781676672721 FileB has the same key as FileA 1235026328101569 1235026328001562 Using the hadoop join package I can create output file
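
For context, a minimal sketch of how the join package is typically wired up for two such inputs (the paths, the input format, and the JoinSetup class are assumptions; both inputs must be sorted and identically partitioned on the join key):

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.KeyValueTextInputFormat;
    import org.apache.hadoop.mapred.join.CompositeInputFormat;

    // Sketch: inner-join FileA and FileB on their shared leading key.
    public class JoinSetup {
        public static void configure(JobConf conf) {
            conf.setInputFormat(CompositeInputFormat.class);
            conf.set("mapred.join.expr", CompositeInputFormat.compose(
                "inner", KeyValueTextInputFormat.class,
                new Path("/data/fileA"), new Path("/data/fileB")));
        }
    }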

Add new Datanodes: Is redistribution of previous data required?

2009-06-24 Thread asif md
hello everyone, I have added 7 nodes to my 3 node cluster. I followed the following steps to do this: 1. added the node's ip to conf/slaves at master 2. ran bin/start-balance.sh at each node I loaded the data when the size of the cluster was three, and it is now TEN. Can I do anything to

Re: Add new Datanodes: Is redistribution of previous data required?

2009-06-24 Thread Alex Loddengaard
Hi, Running the rebalancer script (by the way, you only need to run it once) redistributes all of your data for you. That is, after you've run the rebalancer, your data should be stored evenly among your 10 nodes. Alex On Wed, Jun 24, 2009 at 2:50 PM, asif md asif.d...@gmail.com wrote: hello

Re: Add new Datanodes: Is redistribution of previous data required?

2009-06-24 Thread asif md
@Alex Thanks. http://wiki.apache.org/hadoop/FAQ#6 Has anyone any experience with this? Please suggest. On Wed, Jun 24, 2009 at 5:44 PM, Alex Loddengaard a...@cloudera.com wrote: Hi, Running the rebalancer script (by the way, you only need to run it once) redistributes all of

Next Bay Area Hadoop User Group - Focus on Hadoop 0.20 and Core Project Split

2009-06-24 Thread Christophe Bisciglia
Bay Area Hadoop Fans, We're excited to hold our first Hadoop User Group at Cloudera's office in Burlingame (just south of SFO). We pushed the start time back 30 minutes to allow a little extra time to drive further north, and we hope the mid-way location brings more users from San Francisco.

Re: Add new Datanodes: Is redistribution of previous data required?

2009-06-24 Thread Konstantin Shvachko
These links should help you to rebalance the nodes: http://developer.yahoo.com/hadoop/tutorial/module2.html#rebalancing http://hadoop.apache.org/core/docs/current/hdfs_user_guide.html#Rebalancer http://hadoop.apache.org/core/docs/current/commands_manual.html#balancer

Re: Add new Datanodes: Is redistribution of previous data required?

2009-06-24 Thread asif md
@Konstantin I'll try those. Thanks. More comments are welcome. On Wed, Jun 24, 2009 at 7:27 PM, Konstantin Shvachko s...@yahoo-inc.com wrote: These links should help you to rebalance the nodes: http://developer.yahoo.com/hadoop/tutorial/module2.html#rebalancing

Re: CompositeInputFormat scalability

2009-06-24 Thread jason hadoop
The join package does a streaming merge sort between each part-X in your input directories: part-0000 will be handled in a single task, part-0001 will be handled in a single task, and so on. These jobs are essentially IO bound, and hard to beat for performance. On Wed, Jun 24, 2009 at 2:09 PM, pmg

Re: CompositeInputFormat scalability

2009-06-24 Thread pmg
And what decides the part-0000, part-0001 input split, block size? So for example, does 1G of data on HDFS with a 64MB block size get 16 blocks mapped to different map tasks? jason hadoop wrote: The join package does a streaming merge sort between each part-X in your input directories,

Re: where is the addDependingJob?

2009-06-24 Thread Amareshwari Sriramadasu
HRoger wrote: Hi, As you know, in org.apache.hadoop.mapred.jobcontrol.Job there is a method called addDependingJob, but not in org.apache.hadoop.mapreduce.Job. Is there some method that works like addDependingJob in the mapreduce package? org.apache.hadoop.mapred.jobcontrol.Job is moved to

Re: CompositeInputFormat scalability

2009-06-24 Thread jason hadoop
The input split size is Long.MAX_VALUE, and in actual fact the contents of each directory are sorted separately. The number of directory entries for each has to be identical, and all files in index position I, where I varies from 0 to the number of files in a directory, become the input to 1

Re: where is the addDependingJob?

2009-06-24 Thread HRoger
Thanks for your answer. I am using 0.20 and programming with the new API, so how can I make one job run after the other job in one class with the new API? Amareshwari Sriramadasu wrote: HRoger wrote: Hi As you know in the org.apache.hadoop.mapred.jobcontrol.Job there is a method called

Re: where is the addDependingJob?

2009-06-24 Thread Amareshwari Sriramadasu
You can use 0.21-dev. If not, you can try using the old-API JobControl to create depending jobs by getting the conf from org.apache.hadoop.mapreduce.Job.getConfiguration(). Thanks Amareshwari HRoger wrote: Thanks for your answer. I am using 0.20 and programming with the new API, so how can I make
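
A minimal sketch of that workaround (the Chain class and surrounding method are illustrative): wrap each new-API job's configuration in a JobConf, hand both to the old-API JobControl, and declare the dependency there.

    import java.util.ArrayList;

    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.jobcontrol.Job;
    import org.apache.hadoop.mapred.jobcontrol.JobControl;

    // Sketch: run 'second' only after 'first' completes successfully.
    public class Chain {
        public static void run(org.apache.hadoop.mapreduce.Job first,
                               org.apache.hadoop.mapreduce.Job second)
                throws Exception {
            Job j1 = new Job(new JobConf(first.getConfiguration()), new ArrayList<Job>());
            Job j2 = new Job(new JobConf(second.getConfiguration()), new ArrayList<Job>());
            j2.addDependingJob(j1); // j2 waits until j1 succeeds

            JobControl control = new JobControl("dependent-jobs");
            control.addJob(j1);
            control.addJob(j2);

            Thread runner = new Thread(control); // JobControl is a Runnable
            runner.start();
            while (!control.allFinished()) {
                Thread.sleep(1000); // poll until both jobs have completed
            }
            control.stop();
        }
    }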

Problem with setting up the cluster

2009-06-24 Thread .ke. sivakumar
Hi all, I'm a student and I have been trying to set up the hadoop cluster for a while but have been unsuccessful till now. I'm trying to set up a 4-node cluster: 1 - namenode 1 - job tracker 2 - datanode / task tracker version of hadoop - 0.18.3 *config in hadoop-site.xml* (which I have replicated