Re: Tasktracker fails
Any update on the below issue? Thanks Adarsh Sharma wrote: Dear all, Today I am trying to configure hadoop-0.20.205.0 on a 4-node cluster. When I start my cluster, all daemons start except the tasktracker; I don't know why the tasktracker fails with the following error logs. The cluster is on a private network. My /etc/hosts file contains IP-to-hostname mappings on all nodes. 2012-02-21 17:48:33,056 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source TaskTrackerMetrics registered. 2012-02-21 17:48:33,094 ERROR org.apache.hadoop.mapred.TaskTracker: Can not start task tracker because java.net.SocketException: Invalid argument at sun.nio.ch.Net.bind(Native Method) at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:119) at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59) at org.apache.hadoop.ipc.Server.bind(Server.java:225) at org.apache.hadoop.ipc.Server$Listener.init(Server.java:301) at org.apache.hadoop.ipc.Server.init(Server.java:1483) at org.apache.hadoop.ipc.RPC$Server.init(RPC.java:545) at org.apache.hadoop.ipc.RPC.getServer(RPC.java:506) at org.apache.hadoop.mapred.TaskTracker.initialize(TaskTracker.java:772) at org.apache.hadoop.mapred.TaskTracker.init(TaskTracker.java:1428) at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:3673) Any comments on the issue? Thanks
Hadoop MR jobs failed: Owner 'uid' for path */jobcache/job_*/attempt_*/output/file.out.index did not match expected owner 'username'
Hello Hadoop mailing list, we have problems running a Hadoop M/R job on HDFS. It is a 2-node test system running 0.20.203, driven by a Pig script. The map tasks run through, but most map attempt outputs from one of the machines are rejected by the reducer and rescheduled. This is the stack trace/error message: Map output lost, rescheduling: getMapOutput(attempt_201202210928_0005_m_08_0,129) failed : java.io.IOException: Owner 'MYUID' for path LOCAL_PATH/jobcache/job_201202210928_0005/attempt_201202210928_0005_m_08_0/output/file.out did not match expected owner 'MYUSERNAME' at org.apache.hadoop.io.SecureIOUtils.checkStat(SecureIOUtils.java:177) at org.apache.hadoop.io.SecureIOUtils.openForRead(SecureIOUtils.java:110) at org.apache.hadoop.mapred.TaskTracker$MapOutputServlet.doGet(TaskTracker.java:3837) at javax.servlet.http.HttpServlet.service(HttpServlet.java:707) at javax.servlet.http.HttpServlet.service(HttpServlet.java:820) at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221) at org.apache.hadoop.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:835) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399) at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182) at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766) at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450) at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230) at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152) at org.mortbay.jetty.Server.handle(Server.java:326) at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542) at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928) at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549) at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212) at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404) at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410) at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582) The numeric user id is correct, but it seems there is a problem mapping the number back to a valid username. The machines run Scientific Linux 6.1 with NIS/yp for usernames/passwords. However, as far as I understand the source, the owner username is obtained by running ls -ld (class RawFileSystemStatus). Running ls -l locally on the files returns a correctly resolved username. The expected owner is obtained from the process system property user.name (class TaskController). After around 2k failed attempts the Pig task is aborted. * Can anyone help me or give me a hint about what went wrong here? * Is it possible to disable these security checks via a configuration? I would really appreciate any help. Thank you, Dirk
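For anyone hitting the same mismatch, a minimal diagnostic sketch comparing the two lookups the check performs (paths are illustrative, following the pattern in the stack trace):

    # What SecureIOUtils sees: the on-disk owner of the map output
    ls -ld LOCAL_PATH/jobcache/job_201202210928_0005/attempt_*/output

    # What the TaskTracker expects: the user of its own JVM process
    id -un    # username of the current process
    id -u     # numeric uid

    # Whether NIS can map the uid back to a name on this box
    getent passwd "$(id -u)"

If getent returns nothing (or a different name) on the affected node, the uid-to-name mapping is the problem rather than Hadoop itself.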
Re: HBase/HDFS very high iowait
We observe about 50% iowait before even starting clients - that is, when there is actually no load from clients on the system. So only internal work in HBase/HDFS can cause this - HBase compaction? HDFS? Regards, Per Steffensen Per Steffensen wrote: Hi We have a system with, among other things, an HBase cluster and an HDFS cluster (primarily for HBase persistence). Depending on the environment we have between 3 and 8 machines running an HBase RegionServer and an HDFS DataNode. The OS is Ubuntu 10.04. On those machines we see very high iowait, very little real usage of the CPU, and unexpectedly low throughput (HBase creates, updates, reads and short scans). We do not get more throughput by putting more parallel load from the HBase clients on the HBase servers, so it is a real iowait problem. Any idea what might be wrong, and what we can do to improve throughput and lower iowait? Regards, Per Steffensen
Re: HBase/HDFS very high iowait
Per Steffensen wrote: We observe about 50% iowait before even starting clients - that is, when there is actually no load from clients on the system. So only internal work in HBase/HDFS can cause this - HBase compaction? HDFS? Ahh, OK, that was only for half a minute after restart. So basically down to 100% idle when there is no load from clients. Regards, Per Steffensen Per Steffensen wrote: Hi We have a system with, among other things, an HBase cluster and an HDFS cluster (primarily for HBase persistence). Depending on the environment we have between 3 and 8 machines running an HBase RegionServer and an HDFS DataNode. The OS is Ubuntu 10.04. On those machines we see very high iowait, very little real usage of the CPU, and unexpectedly low throughput (HBase creates, updates, reads and short scans). We do not get more throughput by putting more parallel load from the HBase clients on the HBase servers, so it is a real iowait problem. Any idea what might be wrong, and what we can do to improve throughput and lower iowait? Regards, Per Steffensen
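For others debugging similar symptoms, a few standard Linux commands help separate sustained disk pressure from a transient blip like this one (all from the stock sysstat/procps tools):

    # Per-device utilization and wait times, sampled every 5 seconds
    iostat -x 5

    # Per-CPU breakdown including %iowait
    mpstat -P ALL 5

    # Processes currently blocked on IO (state D)
    ps -eo state,pid,cmd | awk '$1=="D"'

Sustained high utilization on the data disks during compactions would point at HBase; a spike only right after restart, as seen here, is harmless.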
Security at file level in Hadoop
Hi I want to implement security at the file level in Hadoop, essentially restricting certain data to certain users. E.g. file A can be accessed only by user X; file B can be accessed only by user X and user Y. Is this possible in Hadoop, and how do we do it? At what level are these permissions applied (before copying to HDFS or after putting the files in HDFS)? When a file gets replicated, does it retain these permissions? Thanks Shreya
Re: Security at file level in Hadoop
Hi Shreya, A permissions guide for HDFS is available at: http://hadoop.apache.org/common/docs/current/hdfs_permissions_guide.html The permissions system is much the same as on Unix-like systems, with users and groups. Though I have not worked with this, I think it is likely that all permissions will need to be applied after putting files into HDFS. Hope that helps, Ben On 22 February 2012 10:41, shreya@cognizant.com wrote: Hi I want to implement security at the file level in Hadoop, essentially restricting certain data to certain users. E.g. file A can be accessed only by user X; file B can be accessed only by user X and user Y. Is this possible in Hadoop, and how do we do it? At what level are these permissions applied (before copying to HDFS or after putting the files in HDFS)? When a file gets replicated, does it retain these permissions? Thanks Shreya
Re: Security at file level in Hadoop
You can probably use hadoop fs -chmod <permission> <filename> as suggested above. You can set r/w permissions just as you would for ordinary Unix files. Can you please share your experience with this? Thanks, Praveenesh On Wed, Feb 22, 2012 at 4:37 PM, Ben Smithers smithers@googlemail.com wrote: Hi Shreya, A permissions guide for HDFS is available at: http://hadoop.apache.org/common/docs/current/hdfs_permissions_guide.html The permissions system is much the same as on Unix-like systems, with users and groups. Though I have not worked with this, I think it is likely that all permissions will need to be applied after putting files into HDFS. Hope that helps, Ben On 22 February 2012 10:41, shreya@cognizant.com wrote: Hi I want to implement security at the file level in Hadoop, essentially restricting certain data to certain users. E.g. file A can be accessed only by user X; file B can be accessed only by user X and user Y. Is this possible in Hadoop, and how do we do it? At what level are these permissions applied (before copying to HDFS or after putting the files in HDFS)? When a file gets replicated, does it retain these permissions? Thanks Shreya
Re: Optimized Hadoop
Great work folks! Very interesting. PS: did you notice that if you google for hanborq or HDH it's very hard to find your website, hanborq.com? Dieter On Tue, 21 Feb 2012 02:17:31 +0800 Schubert Zhang zson...@gmail.com wrote: We just updated the slides on these improvements: http://www.slideshare.net/hanborq/hanborq-optimizations-on-hadoop-mapreduce-20120216a Updates: (1) revised some descriptions to make things clearer and more accurate. (2) added some benchmarks to support the claims. On Sat, Feb 18, 2012 at 11:12 PM, Anty anty@gmail.com wrote: On Fri, Feb 17, 2012 at 3:27 AM, Todd Lipcon t...@cloudera.com wrote: Hey Schubert, Looking at the code on github, it looks like your rewritten shuffle is in fact just a backport of the shuffle from MR2. I didn't look closely Additionally, the rewritten shuffle in MR2 has some bugs which harm the overall performance, for which I have already filed a JIRA, with a patch available: MAPREDUCE-3685 https://issues.apache.org/jira/browse/MAPREDUCE-3685 - are there any distinguishing factors? Also, the OOB heartbeat and adaptive heartbeat code seems to be the same as what's in 1.0? -Todd On Thu, Feb 16, 2012 at 9:44 AM, Schubert Zhang zson...@gmail.com wrote: Here is the presentation describing our work, http://www.slideshare.net/hanborq/hanborq-optimizations-on-hadoop-mapreduce-20120216a We welcome your advice. It's just a small step, and we are continuing to make more improvements; thanks for your help. On Thu, Feb 16, 2012 at 11:01 PM, Anty anty@gmail.com wrote: Hi Guys: We have just delivered an optimized Hadoop; if you are interested, please refer to https://github.com/hanborq/hadoop -- Best Regards Anty Rao -- Todd Lipcon Software Engineer, Cloudera -- Best Regards Anty Rao
mapred.map.tasks and mapred.reduce.tasks parameter meaning
Hello, Could someone please help me understand these configuration parameters in depth: mapred.map.tasks and mapred.reduce.tasks. It is mentioned that the default values of these parameters are 2 and 1. *What does that mean?* Does it mean 2 maps and 1 reduce per node? Does it mean 2 maps and 1 reduce in total (for the cluster)? Or does it mean 2 maps and 1 reduce per job? Can we change maps and reduces for the default example jobs such as Wordcount etc. too? At the same time, I believe that the total number of maps depends on the input data size? Please help me understand these two parameters clearly. Thanks in advance, Amit - Sangroya -- View this message in context: http://lucene.472066.n3.nabble.com/mapred-map-tasks-and-mapred-reduce-tasks-parameter-meaning-tp3766224p3766224.html Sent from the Hadoop lucene-users mailing list archive at Nabble.com.
Re: mapred.map.tasks and mapred.reduce.tasks parameter meaning
Amit, On Wed, Feb 22, 2012 at 5:08 PM, sangroya sangroyaa...@gmail.com wrote: Hello, Could someone please help me understand these configuration parameters in depth: mapred.map.tasks and mapred.reduce.tasks. It is mentioned that the default values of these parameters are 2 and 1. *What does that mean?* Does it mean 2 maps and 1 reduce per node? Does it mean 2 maps and 1 reduce in total (for the cluster)? Or does it mean 2 maps and 1 reduce per job? These are set per-job, and therefore mean 2 maps and 1 reducer for the single job you notice the value in. Can we change maps and reduces for the default example jobs such as Wordcount etc. too? You can tweak the # of reducers at will. With the default HashPartitioner, scaling reducers is as easy as increasing the number. At the same time, I believe that the total number of maps depends on the input data size? Yes, maps depend on the # of input files and their size (if they are splittable). At minimum, with FileInputFormat derivatives, you will have at least one map per file. You can have multiple maps per file if it extends beyond a single block and can be split. For some more info, take a look at http://wiki.apache.org/hadoop/HowManyMapsAndReduces -- Harsh J Customer Ops. Engineer Cloudera | http://tiny.cloudera.com/about
Re: mapred.map.tasks and mapred.reduce.tasks parameter meaning
If I am correct: For setting mappers/node --- mapred.tasktracker.map.tasks.maximum For setting reducers/node --- mapred.tasktracker.reduce.tasks.maximum For setting mappers/job --- mapred.map.tasks (a per-job hint, as Harsh notes, not a hard limit) For setting reducers/job --- mapred.reduce.tasks (also per job) You can change these values in your M/R code using the Job / Configuration object, or from the command line as sketched below. Thanks, Praveenesh On Wed, Feb 22, 2012 at 5:08 PM, sangroya sangroyaa...@gmail.com wrote: Hello, Could someone please help me understand these configuration parameters in depth: mapred.map.tasks and mapred.reduce.tasks. It is mentioned that the default values of these parameters are 2 and 1. *What does that mean?* Does it mean 2 maps and 1 reduce per node? Does it mean 2 maps and 1 reduce in total (for the cluster)? Or does it mean 2 maps and 1 reduce per job? Can we change maps and reduces for the default example jobs such as Wordcount etc. too? At the same time, I believe that the total number of maps depends on the input data size? Please help me understand these two parameters clearly. Thanks in advance, Amit - Sangroya -- View this message in context: http://lucene.472066.n3.nabble.com/mapred-map-tasks-and-mapred-reduce-tasks-parameter-meaning-tp3766224p3766224.html Sent from the Hadoop lucene-users mailing list archive at Nabble.com.
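A hedged sketch of the command-line route, using the bundled examples jar (jar name and paths are illustrative; this works for the examples because their drivers parse generic options):

    # Hint 10 maps and request exactly 4 reducers for this one job
    hadoop jar hadoop-examples.jar wordcount \
      -Dmapred.map.tasks=10 \
      -Dmapred.reduce.tasks=4 \
      /input /output

mapred.reduce.tasks is honored exactly; mapred.map.tasks is only a hint, since the InputFormat's split computation has the final say.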
Re: Changing into Replication factor
Hi, You need to use hadoop fs -setrep to change the replication of existing files. See the manual at http://hadoop.apache.org/common/docs/r0.20.2/hdfs_shell.html#setrep on how to use it. On Wed, Feb 22, 2012 at 1:03 PM, hadoop hive hadooph...@gmail.com wrote: Hi Folks, Right now I have replication factor 2, but I want to make it three for some tables. How can I do that for specific tables, so that whenever data is loaded into those tables it is automatically replicated onto three nodes? Or do I need to change replication for all the tables? And can I do that by simply changing the parameter to 3 and running -refreshNodes, or is there another way to do it? Regards hadoopHive -- Harsh J Customer Ops. Engineer Cloudera | http://tiny.cloudera.com/about
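A short hedged sketch (the warehouse path is illustrative): -setrep fixes existing files, while future loads take the client's dfs.replication value, so both knobs matter for your use case.

    # Bump existing files under one table's directory to 3 replicas
    hadoop fs -setrep -R 3 /user/hive/warehouse/my_table

    # Verify: the second column of the listing is the replication factor
    hadoop fs -ls /user/hive/warehouse/my_table

    # Files loaded later take the client's dfs.replication, so set it to 3
    # in the loading client's hdfs-site.xml, or per command:
    hadoop fs -D dfs.replication=3 -put data.txt /user/hive/warehouse/my_table/

Note that -refreshNodes only re-reads the datanode include/exclude host lists; it has nothing to do with replication.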
Re: Security at file level in Hadoop
HDFS supports POSIX-style file and directory permissions (read, write, execute) for the owner, group and world. You can change the permissions with hadoop fs -chmod <permissions> <path> -Joey On Feb 22, 2012, at 5:32, shreya@cognizant.com wrote: Hi I want to implement security at the file level in Hadoop, essentially restricting certain data to certain users. E.g. file A can be accessed only by user X; file B can be accessed only by user X and user Y. Is this possible in Hadoop, and how do we do it? At what level are these permissions applied (before copying to HDFS or after putting the files in HDFS)? When a file gets replicated, does it retain these permissions? Thanks Shreya
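To make that concrete for the original example, a hedged sketch (group names are illustrative; HDFS resolves group membership server-side, so users X and Y must share a group there):

    # File A: readable and writable by owner X only
    hadoop fs -chown X:xonly /data/fileA
    hadoop fs -chmod 600 /data/fileA

    # File B: owner X plus group member Y (read-only via the group)
    hadoop fs -chown X:xy /data/fileB
    hadoop fs -chmod 640 /data/fileB

    # Verify owner, group and mode
    hadoop fs -ls /data

Permissions live in the namenode's metadata, so they are applied after the files are in HDFS and are unaffected by block replication.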
Re: Tasktracker fails
Hm.. I would first try to stop all the daemons with $HADOOP_HOME/bin/stop-all.sh. Afterwards check that no daemons are running on the master and on one of the slaves (jps). Maybe you could also check whether the tasktrackers' conf points to the right jobtracker (mapred-site.xml). Do you see any error in the jobtracker log too? On 22 February 2012 09:44, Adarsh Sharma adarsh.sha...@orkash.com wrote: Any update on the below issue? Thanks Adarsh Sharma wrote: Dear all, Today I am trying to configure hadoop-0.20.205.0 on a 4-node cluster. When I start my cluster, all daemons start except the tasktracker; I don't know why the tasktracker fails with the following error logs. The cluster is on a private network. My /etc/hosts file contains IP-to-hostname mappings on all nodes. 2012-02-21 17:48:33,056 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source TaskTrackerMetrics registered. 2012-02-21 17:48:33,094 ERROR org.apache.hadoop.mapred.TaskTracker: Can not start task tracker because java.net.SocketException: Invalid argument at sun.nio.ch.Net.bind(Native Method) at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:119) at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59) at org.apache.hadoop.ipc.Server.bind(Server.java:225) at org.apache.hadoop.ipc.Server$Listener.init(Server.java:301) at org.apache.hadoop.ipc.Server.init(Server.java:1483) at org.apache.hadoop.ipc.RPC$Server.init(RPC.java:545) at org.apache.hadoop.ipc.RPC.getServer(RPC.java:506) at org.apache.hadoop.mapred.TaskTracker.initialize(TaskTracker.java:772) at org.apache.hadoop.mapred.TaskTracker.init(TaskTracker.java:1428) at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:3673) Any comments on the issue? Thanks
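A sketch of those checks as commands, assuming a standard tarball layout ($HADOOP_HOME and the paths are illustrative):

    # Stop everything, then confirm nothing is still running
    $HADOOP_HOME/bin/stop-all.sh
    jps    # should not list TaskTracker, JobTracker, DataNode or NameNode

    # Confirm the tasktracker points at the right jobtracker
    grep -A1 mapred.job.tracker $HADOOP_HOME/conf/mapred-site.xml

    # "Invalid argument" from bind() is usually an address problem:
    # check what this node's hostname resolves to
    hostname
    getent hosts "$(hostname)"

If the hostname resolves to an address that is not configured on any local interface, the RPC server's bind fails exactly like the log above.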
ClassNotFoundException: -libjars not working?
Hello, I'm trying to run a map-reduce job and I get ClassNotFoundException, but I have the class submitted with -libjars. What's wrong with how I do things? Please help. I'm running hadoop-0.20.2-cdh3u1, and I have everything on the -libjars line. The job is submitted via a java app like: exec /usr/lib/jvm/java-6-sun/bin/java -Dproc_jar -Xmx200m -server -Dhadoop.log.dir=/opt/ui/var/log/mailsearch -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/usr/lib/hadoop -Dhadoop.id.str=hbase -Dhadoop.root.logger=INFO,console -Dhadoop.policy.file=hadoop-policy.xml -classpath '/usr/lib/hadoop/conf:/usr/lib/jvm/java-6-sun/lib/tools.jar:/usr/lib/hadoop:/usr/lib/hadoop/hadoop-core-0.20.2-cdh3u1.jar:/usr/lib/hadoop/lib/ant-contrib-1.0b3.jar:/usr/lib/hadoop/lib/apache-log4j-extras-1.1.jar:/usr/lib/hadoop/lib/aspectjrt-1.6.5.jar:/usr/lib/hadoop/lib/aspectjtools-1.6.5.jar:/usr/lib/hadoop/lib/commons-cli-1.2.jar:/usr/lib/hadoop/lib/commons-codec-1.4.jar:/usr/lib/hadoop/lib/commons-daemon-1.0.1.jar:/usr/lib/hadoop/lib/commons-el-1.0.jar:/usr/lib/hadoop/lib/commons-httpclient-3.0.1.jar:/usr/lib/hadoop/lib/commons-logging-1.0.4.jar:/usr/lib/hadoop/lib/commons-logging-api-1.0.4.jar:/usr/lib/hadoop/lib/commons-net-1.4.1.jar:/usr/lib/hadoop/lib/core-3.1.1.jar:/usr/lib/hadoop/lib/hadoop-fairscheduler-0.20.2-cdh3u1.jar:/usr/lib/hadoop/lib/hsqldb-1.8.0.10.jar:/usr/lib/hadoop/lib/jackson-core-asl-1.5.2.jar:/usr/lib/hadoop/lib/jackson-mapper-asl-1.5.2.jar:/usr/lib/hadoop/lib/jasper-compiler-5.5.12.jar:/usr/lib/hadoop/lib/jasper-runtime-5.5.12.jar:/usr/lib/hadoop/lib/jcl-over-slf4j-1.6.1.jar:/usr/lib/hadoop/lib/jets3t-0.6.1.jar:/usr/lib/hadoop/lib/jetty-6.1.26.jar:/usr/lib/hadoop/lib/jetty-servlet-tester-6.1.26.jar:/usr/lib/hadoop/lib/jetty-util-6.1.26.jar:/usr/lib/hadoop/lib/jsch-0.1.42.jar:/usr/lib/hadoop/lib/junit-4.5.jar:/usr/lib/hadoop/lib/kfs-0.2.2.jar:/usr/lib/hadoop/lib/log4j-1.2.15.jar:/usr/lib/hadoop/lib/mockito-all-1.8.2.jar:/usr/lib/hadoop/lib/oro-2.0.8.jar:/usr/lib/hadoop/lib/servlet-api-2.5-20081211.jar:/usr/lib/hadoop/lib/servlet-api-2.5-6.1.14.jar:/usr/lib/hadoop/lib/slf4j-api-1.6.1.jar:/usr/lib/hadoop/lib/slf4j-log4j12-1.6.1.jar:/usr/lib/hadoop/lib/xmlenc-0.52.jar:/usr/lib/hadoop/lib/jsp-2.1/jsp-2.1.jar:/usr/lib/hadoop/lib/jsp-2.1/jsp-api-2.1.jar:/usr/share/mailbox-convertor/lib/*:/usr/lib/hadoop/contrib/capacity-scheduler/hadoop-capacity-scheduler-0.20.2-cdh3u1.jar:/usr/lib/hbase/lib/hadoop-lzo-0.4.13.jar:/usr/lib/hbase/hbase.jar:/etc/hbase/conf:/usr/lib/hbase/lib:/usr/lib/zookeeper/zookeeper.jar:/usr/lib/hadoop/contrib/capacity-scheduler/hadoop-capacity-scheduler-0.20.2-cdh3u1.jar:/usr/lib/hbase/lib/hadoop-lzo-0.4.13.jar:/usr/lib/hbase/hbase.jar:/etc/hbase/conf:/usr/lib/hbase/lib:/usr/lib/zookeeper/zookeeper.jar' org.apache.hadoop.util.RunJar /usr/share/mailbox-convertor/mailbox-convertor-0.1-SNAPSHOT.jar 
-libjars=/usr/share/mailbox-convertor/lib/antlr-2.7.7.jar,/usr/share/mailbox-convertor/lib/aopalliance-1.0.jar,/usr/share/mailbox-convertor/lib/asm-3.1.jar,/usr/share/mailbox-convertor/lib/backport-util-concurrent-3.1.jar,/usr/share/mailbox-convertor/lib/cglib-2.2.jar,/usr/share/mailbox-convertor/lib/hadoop-ant-3.0-u1.pom,/usr/share/mailbox-convertor/lib/speed4j-0.9.jar,/usr/share/mailbox-convertor/lib/jamm-0.2.2.jar,/usr/share/mailbox-convertor/lib/uuid-3.2.0.jar,/usr/share/mailbox-convertor/lib/high-scale-lib-1.1.1.jar,/usr/share/mailbox-convertor/lib/jsr305-1.3.9.jar,/usr/share/mailbox-convertor/lib/guava-11.0.1.jar,/usr/share/mailbox-convertor/lib/protobuf-java-2.4.0a.jar,/usr/share/mailbox-convertor/lib/concurrentlinkedhashmap-lru-1.1.jar,/usr/share/mailbox-convertor/lib/json-simple-1.1.jar,/usr/share/mailbox-convertor/lib/itext-2.1.5.jar,/usr/share/mailbox-convertor/lib/jmxtools-1.2.1.jar,/usr/share/mailbox-convertor/lib/jersey-client-1.4.jar,/usr/share/mailbox-convertor/lib/jersey-core-1.4.jar,/usr/share/mailbox-convertor/lib/jersey-json-1.4.jar,/usr/share/mailbox-convertor/lib/jersey-server-1.4.jar,/usr/share/mailbox-convertor/lib/jmxri-1.2.1.jar,/usr/share/mailbox-convertor/lib/jaxb-impl-2.1.12.jar,/usr/share/mailbox-convertor/lib/xstream-1.2.2.jar,/usr/share/mailbox-convertor/lib/commons-metrics-1.3.jar,/usr/share/mailbox-convertor/lib/commons-monitoring-2.9.1.jar,/usr/share/mailbox-convertor/lib/html-utils-2.3.jar,/usr/share/mailbox-convertor/lib/mailstore-client-1.0.3.jar,/usr/share/mailbox-convertor/lib/newsearch-commons-1.0.28eo-SNAPSHOT.jar,/usr/share/mailbox-convertor/lib/spring-hbase-gateway-1.0.15.jar,/usr/share/mailbox-convertor/lib/newsearch-deleter-filtering-1.0.11.jar,/usr/share/mailbox-convertor/lib/newsearch-indexing-1.0.28eo-SNAPSHOT.jar,/usr/share/mailbox-convertor/lib/ums-commons-2.1.4.jar,/usr/share/mailbox-convertor/lib/trinity-config-1.2.0.jar,/usr/share/mailbox-convertor/lib/trinity-message-6.7.0.jar,/usr/share/mail
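For what it's worth, -libjars is a generic option: it only takes effect when the job's main class passes its arguments through GenericOptionsParser, which usually means implementing Tool and launching via ToolRunner. When invoking org.apache.hadoop.util.RunJar directly like this, a main class that ignores generic options never sees those jars. A hedged sketch of the usual invocation (the main class name here is hypothetical):

    # Generic options go after the main class and before the job's own args;
    # the main class must use ToolRunner/GenericOptionsParser for -libjars
    # to reach the distributed cache and the task classpaths.
    hadoop jar /usr/share/mailbox-convertor/mailbox-convertor-0.1-SNAPSHOT.jar \
      com.example.MailboxConvertor \
      -libjars /usr/share/mailbox-convertor/lib/guava-11.0.1.jar,/usr/share/mailbox-convertor/lib/protobuf-java-2.4.0a.jar \
      <job args>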
Re: Security at file level in Hadoop
According to this (http://goo.gl/rfwy4), prior to 0.22, Hadoop uses the 'whoami' and 'id' commands to determine the user and groups of the running process. How does this work now? Praveen On Wed, Feb 22, 2012 at 6:03 PM, Joey Echeverria j...@cloudera.com wrote: HDFS supports POSIX-style file and directory permissions (read, write, execute) for the owner, group and world. You can change the permissions with hadoop fs -chmod <permissions> <path> -Joey On Feb 22, 2012, at 5:32, shreya@cognizant.com wrote: Hi I want to implement security at the file level in Hadoop, essentially restricting certain data to certain users. E.g. file A can be accessed only by user X; file B can be accessed only by user X and user Y. Is this possible in Hadoop, and how do we do it? At what level are these permissions applied (before copying to HDFS or after putting the files in HDFS)? When a file gets replicated, does it retain these permissions? Thanks Shreya
Backupnode in 1.0.0?
It looks as if backupnode isn't supported in 1.0.0? Any chance it's in 1.0.1? Thanks -jeremy
Runtime Comparison of Hadoop 0.21.0 and 1.0.1
All, I saw the announcement that the hadoop 1.0.1 micro release was available. I have been waiting for this because I need the MultipleOutputs capability, which 1.0.0 didn't support. I grabbed a copy of the release candidate. I was happy to see that the directory structure once again conforms (mostly) to the older releases, as opposed to what was in the 1.0.0 release. I did a comparison of run times between 1.0.1 and 0.21.0, which is my production version. It seems that 1.0.1 runs about four times slower than 0.21.0. With the same code, same hardware, same configuration, and the same data set, end-to-end times are: 0.21.0 = 8.83 minutes. 1.0.1 = 30.26 minutes. Is this a known condition? Thanks -- Geoffry Roberts
Re: Splitting files on new line using hadoop fs
Hi Mohit AFAIK there is no default mechanism available for this in Hadoop. A file is split into blocks just based on the configured block size during the hdfs copy. While processing the file with MapReduce, the record reader takes care of the new lines even if a line spans multiple blocks. Could you explain more about the use case that demands such a requirement during the hdfs copy itself? --Original Message-- From: Mohit Anchlia To: common-user@hadoop.apache.org ReplyTo: common-user@hadoop.apache.org Subject: Splitting files on new line using hadoop fs Sent: Feb 23, 2012 01:45 How can I copy large text files using hadoop fs such that the split occurs based on blocks + new lines instead of blocks alone? Is there a way to do this? Regards Bejoy K S From handheld, Please excuse typos.
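If you want to convince yourself of that, a small hedged experiment (file names are illustrative; the fs shell should accept -D since it parses generic options): force a tiny block size on the copy and check that the line counts still agree even though lines straddle block boundaries.

    # 1 MB blocks (the value must be a multiple of 512)
    hadoop fs -D dfs.block.size=1048576 -put bigfile.txt /tmp/bigfile.txt

    # Same count locally and through HDFS: the reader stitches split lines
    wc -l bigfile.txt
    hadoop fs -cat /tmp/bigfile.txt | wc -l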
Re: Tasktracker fails
I do not know exactly how the distribution and splitting of deflate files works, if that is your question, but you will probably find something useful in the *Codec classes, where the implementations of the various compression formats live. Deflate files are just one type of compressed file you can use for storing data in your system. There are several other types, depending on your needs and the trade-offs you are dealing with (space versus time spent compressing). Globs, I think, are just a matching strategy for matching files/folders with wildcard expressions. On 22 February 2012 19:29, Jay Vyas jayunit...@gmail.com wrote: Hi guys! I'm trying to understand the way globStatus / deflate files work in hdfs. I can't read them using the globStatus API in the hadoop FileSystem from Java. The specifics are here if anyone wants some easy stackoverflow points :) http://stackoverflow.com/questions/9400739/hadoop-globstatus-and-deflate-files On Wed, Feb 22, 2012 at 7:39 AM, Merto Mertek masmer...@gmail.com wrote: Hm.. I would first try to stop all the daemons with $HADOOP_HOME/bin/stop-all.sh. Afterwards check that no daemons are running on the master and on one of the slaves (jps). Maybe you could also check whether the tasktrackers' conf points to the right jobtracker (mapred-site.xml). Do you see any error in the jobtracker log too? On 22 February 2012 09:44, Adarsh Sharma adarsh.sha...@orkash.com wrote: Any update on the below issue? Thanks Adarsh Sharma wrote: Dear all, Today I am trying to configure hadoop-0.20.205.0 on a 4-node cluster. When I start my cluster, all daemons start except the tasktracker; I don't know why the tasktracker fails with the following error logs. The cluster is on a private network. My /etc/hosts file contains IP-to-hostname mappings on all nodes. 2012-02-21 17:48:33,056 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source TaskTrackerMetrics registered. 2012-02-21 17:48:33,094 ERROR org.apache.hadoop.mapred.TaskTracker: Can not start task tracker because java.net.SocketException: Invalid argument at sun.nio.ch.Net.bind(Native Method) at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:119) at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59) at org.apache.hadoop.ipc.Server.bind(Server.java:225) at org.apache.hadoop.ipc.Server$Listener.init(Server.java:301) at org.apache.hadoop.ipc.Server.init(Server.java:1483) at org.apache.hadoop.ipc.RPC$Server.init(RPC.java:545) at org.apache.hadoop.ipc.RPC.getServer(RPC.java:506) at org.apache.hadoop.mapred.TaskTracker.initialize(TaskTracker.java:772) at org.apache.hadoop.mapred.TaskTracker.init(TaskTracker.java:1428) at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:3673) Any comments on the issue? Thanks -- Jay Vyas MMSB/UCHC
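One quick hedged check from the shell (the output path is illustrative): glob patterns in the fs shell go through the same globStatus machinery as the Java API, so this verifies the pattern itself matches; quote it so the local shell does not expand it first.

    # List everything the glob matches on HDFS
    hadoop fs -ls '/output/part-*.deflate'

On the Java side, org.apache.hadoop.io.compress.CompressionCodecFactory can pick the right codec from the file extension before you read the matched files.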
Re: Splitting files on new line using hadoop fs
On Wed, Feb 22, 2012 at 12:23 PM, bejoy.had...@gmail.com wrote: Hi Mohit AFAIK there is no default mechanism available for this in Hadoop. A file is split into blocks just based on the configured block size during the hdfs copy. While processing the file with MapReduce, the record reader takes care of the new lines even if a line spans multiple blocks. Could you explain more about the use case that demands such a requirement during the hdfs copy itself? I am using pig's XMLLoader in piggybank to read xml files concatenated into a text file. But the Pig script doesn't work when the file is big enough that Hadoop splits it. Any suggestions on how I can make it work? Below is my simple script that I would like to enhance, if only it would start working. Please note this works for small files. register '/root/pig-0.8.1-cdh3u3/contrib/piggybank/java/piggybank.jar'; raw = LOAD '/examples/testfile5.txt' using org.apache.pig.piggybank.storage.XMLLoader('abc') as (document:chararray); dump raw; --Original Message-- From: Mohit Anchlia To: common-user@hadoop.apache.org ReplyTo: common-user@hadoop.apache.org Subject: Splitting files on new line using hadoop fs Sent: Feb 23, 2012 01:45 How can I copy large text files using hadoop fs such that the split occurs based on blocks + new lines instead of blocks alone? Is there a way to do this? Regards Bejoy K S From handheld, Please excuse typos.
Re: Splitting files on new line using hadoop fs
Hi Mohit I'm not an expert in Pig, and it'd be better to use the Pig user group for Pig-specific queries, but I'll try to help you with some basic troubleshooting. It sounds strange that Pig's XMLLoader can't load larger XML files that consist of multiple blocks. Or is it that Pig is not able to load the concatenated files you are trying with? If that is the case, it could be because of issues arising from simply appending multiple xml file contents into a single file. Pig users can give you some workarounds for how they deal with loading many small xml files efficiently. Regards Bejoy K S From handheld, Please excuse typos. -Original Message- From: Mohit Anchlia mohitanch...@gmail.com Date: Wed, 22 Feb 2012 12:29:26 To: common-user@hadoop.apache.org; bejoy.had...@gmail.com Subject: Re: Splitting files on new line using hadoop fs On Wed, Feb 22, 2012 at 12:23 PM, bejoy.had...@gmail.com wrote: Hi Mohit AFAIK there is no default mechanism available for this in Hadoop. A file is split into blocks just based on the configured block size during the hdfs copy. While processing the file with MapReduce, the record reader takes care of the new lines even if a line spans multiple blocks. Could you explain more about the use case that demands such a requirement during the hdfs copy itself? I am using pig's XMLLoader in piggybank to read xml files concatenated into a text file. But the Pig script doesn't work when the file is big enough that Hadoop splits it. Any suggestions on how I can make it work? Below is my simple script that I would like to enhance, if only it would start working. Please note this works for small files. register '/root/pig-0.8.1-cdh3u3/contrib/piggybank/java/piggybank.jar'; raw = LOAD '/examples/testfile5.txt' using org.apache.pig.piggybank.storage.XMLLoader('abc') as (document:chararray); dump raw; --Original Message-- From: Mohit Anchlia To: common-user@hadoop.apache.org ReplyTo: common-user@hadoop.apache.org Subject: Splitting files on new line using hadoop fs Sent: Feb 23, 2012 01:45 How can I copy large text files using hadoop fs such that the split occurs based on blocks + new lines instead of blocks alone? Is there a way to do this? Regards Bejoy K S From handheld, Please excuse typos.
Re: OSX starting hadoop error
For those interested, you can prevent this error by setting the following in hadoop-env.sh (the quotes matter, since the value contains a space): export HADOOP_OPTS="-Djava.security.krb5.realm= -Djava.security.krb5.kdc=" On Jul 28, 2011, at 11:51 AM, Bryan Keller wrote: FYI, I logged a bug for this: https://issues.apache.org/jira/browse/HADOOP-7489 On Jul 28, 2011, at 11:36 AM, Bryan Keller wrote: I am also seeing this error upon startup. I am guessing you are using OS X Lion? It started happening to me after I upgraded to 10.7. Hadoop seems to function properly despite this error showing up, though it is annoying. On Jul 27, 2011, at 12:37 PM, Ben Cuthbert wrote: All, when starting hadoop on OSX I am getting this error; is there a fix for it? java[22373:1c03] Unable to load realm info from SCDynamicStore
Re: Backupnode in 1.0.0?
Check out the Apache Bigtop project. I believe they have 0.22 RPMs. Out of curiosity, why are you interested in BackupNode? -Joey Sent from my iPhone On Feb 22, 2012, at 14:56, Jeremy Hansen jer...@skidrow.la wrote: Any possibility of getting spec files to create packages for 0.22? Thanks -jeremy On Feb 22, 2012, at 11:50 AM, Suresh Srinivas wrote: BackupNode is major functionality, with changes required in RPC protocols, configuration etc. Hence it will not be available in the bug fix release 1.0.1. It is also unlikely to be available in minor releases in the 1.x release stream. Regards, Suresh On Wed, Feb 22, 2012 at 11:40 AM, Jeremy Hansen jer...@skidrow.la wrote: It looks as if backupnode isn't supported in 1.0.0? Any chance it's in 1.0.1? Thanks -jeremy
Re: Splitting files on new line using hadoop fs
Thanks, I did post this question to that group. All xml documents are separated by a new line, so that shouldn't be the issue, I think. On Wed, Feb 22, 2012 at 12:44 PM, bejoy.had...@gmail.com wrote: Hi Mohit I'm not an expert in Pig, and it'd be better to use the Pig user group for Pig-specific queries, but I'll try to help you with some basic troubleshooting. It sounds strange that Pig's XMLLoader can't load larger XML files that consist of multiple blocks. Or is it that Pig is not able to load the concatenated files you are trying with? If that is the case, it could be because of issues arising from simply appending multiple xml file contents into a single file. Pig users can give you some workarounds for how they deal with loading many small xml files efficiently. Regards Bejoy K S From handheld, Please excuse typos. From: Mohit Anchlia mohitanch...@gmail.com Date: Wed, 22 Feb 2012 12:29:26 -0800 To: common-user@hadoop.apache.org; bejoy.had...@gmail.com Subject: Re: Splitting files on new line using hadoop fs On Wed, Feb 22, 2012 at 12:23 PM, bejoy.had...@gmail.com wrote: Hi Mohit AFAIK there is no default mechanism available for this in Hadoop. A file is split into blocks just based on the configured block size during the hdfs copy. While processing the file with MapReduce, the record reader takes care of the new lines even if a line spans multiple blocks. Could you explain more about the use case that demands such a requirement during the hdfs copy itself? I am using pig's XMLLoader in piggybank to read xml files concatenated into a text file. But the Pig script doesn't work when the file is big enough that Hadoop splits it. Any suggestions on how I can make it work? Below is my simple script that I would like to enhance, if only it would start working. Please note this works for small files. register '/root/pig-0.8.1-cdh3u3/contrib/piggybank/java/piggybank.jar'; raw = LOAD '/examples/testfile5.txt' using org.apache.pig.piggybank.storage.XMLLoader('abc') as (document:chararray); dump raw; --Original Message-- From: Mohit Anchlia To: common-user@hadoop.apache.org ReplyTo: common-user@hadoop.apache.org Subject: Splitting files on new line using hadoop fs Sent: Feb 23, 2012 01:45 How can I copy large text files using hadoop fs such that the split occurs based on blocks + new lines instead of blocks alone? Is there a way to do this? Regards Bejoy K S From handheld, Please excuse typos.
RE: Dynamic changing of slaves
Sounds like what you're looking for is a custom scheduler, along the lines of: <property> <name>mapred.jobtracker.taskScheduler</name> <value>org.apache.hadoop.mapred.FairScheduler</value> </property> Obviously not the FairScheduler itself, but it could give you some ideas. -Original Message- From: theta glynisdso...@email.arizona.edu Sent: Wednesday, February 22, 2012 10:32am To: core-u...@hadoop.apache.org Subject: Dynamic changing of slaves Hi, I am working on a project which requires a setup as follows: one master with four slaves. However, when a map-only program is run, the master dynamically selects the slave to run the map. For example, when the program is run for the first time, slave 2 is selected to run the map and reduce programs, and the output is stored on dfs. When the program is run a second time, slave 3 is selected, and so on. I am currently using Hadoop 0.20.2 with Ubuntu 11.10. Any ideas on creating the setup as described above? Regards -- View this message in context: http://old.nabble.com/Dynamic-changing-of-slaves-tp33365922p33365922.html Sent from the Hadoop core-user mailing list archive at Nabble.com.
Re: Backupnode in 1.0.0?
I guess I thought that backupnode would provide some level of namenode redundancy. Perhaps I don't fully understand. I'll check out Bigtop. I looked at it a while ago and forgot about it. Thanks -jeremy On Feb 22, 2012, at 2:43 PM, Joey Echeverria wrote: Check out the Apache Bigtop project. I believe they have 0.22 RPMs. Out of curiosity, why are you interested in BackupNode? -Joey Sent from my iPhone On Feb 22, 2012, at 14:56, Jeremy Hansen jer...@skidrow.la wrote: Any possibility of getting spec files to create packages for 0.22? Thanks -jeremy On Feb 22, 2012, at 11:50 AM, Suresh Srinivas wrote: BackupNode is major functionality, with changes required in RPC protocols, configuration etc. Hence it will not be available in the bug fix release 1.0.1. It is also unlikely to be available in minor releases in the 1.x release stream. Regards, Suresh On Wed, Feb 22, 2012 at 11:40 AM, Jeremy Hansen jer...@skidrow.la wrote: It looks as if backupnode isn't supported in 1.0.0? Any chance it's in 1.0.1? Thanks -jeremy
Re: Backupnode in 1.0.0?
By the way, I don't see anything 0.22-based in the bigtop repos. Thanks -jeremy On Feb 22, 2012, at 3:58 PM, Jeremy Hansen wrote: I guess I thought that backupnode would provide some level of namenode redundancy. Perhaps I don't fully understand. I'll check out Bigtop. I looked at it a while ago and forgot about it. Thanks -jeremy On Feb 22, 2012, at 2:43 PM, Joey Echeverria wrote: Check out the Apache Bigtop project. I believe they have 0.22 RPMs. Out of curiosity, why are you interested in BackupNode? -Joey Sent from my iPhone On Feb 22, 2012, at 14:56, Jeremy Hansen jer...@skidrow.la wrote: Any possibility of getting spec files to create packages for 0.22? Thanks -jeremy On Feb 22, 2012, at 11:50 AM, Suresh Srinivas wrote: BackupNode is major functionality, with changes required in RPC protocols, configuration etc. Hence it will not be available in the bug fix release 1.0.1. It is also unlikely to be available in minor releases in the 1.x release stream. Regards, Suresh On Wed, Feb 22, 2012 at 11:40 AM, Jeremy Hansen jer...@skidrow.la wrote: It looks as if backupnode isn't supported in 1.0.0? Any chance it's in 1.0.1? Thanks -jeremy
Re: Backupnode in 1.0.0?
Check out this branch for the 0.22 version of Bigtop: https://svn.apache.org/repos/asf/incubator/bigtop/branches/hadoop-0.22/ However, I don't think BackupNode is what you want. It sounds like you want HA, which is (hopefully) coming in 0.23.2 and is also available today in CDH4b1. -Joey On Wed, Feb 22, 2012 at 7:09 PM, Jeremy Hansen jer...@skidrow.la wrote: By the way, I don't see anything 0.22-based in the bigtop repos. Thanks -jeremy On Feb 22, 2012, at 3:58 PM, Jeremy Hansen wrote: I guess I thought that backupnode would provide some level of namenode redundancy. Perhaps I don't fully understand. I'll check out Bigtop. I looked at it a while ago and forgot about it. Thanks -jeremy On Feb 22, 2012, at 2:43 PM, Joey Echeverria wrote: Check out the Apache Bigtop project. I believe they have 0.22 RPMs. Out of curiosity, why are you interested in BackupNode? -Joey Sent from my iPhone On Feb 22, 2012, at 14:56, Jeremy Hansen jer...@skidrow.la wrote: Any possibility of getting spec files to create packages for 0.22? Thanks -jeremy On Feb 22, 2012, at 11:50 AM, Suresh Srinivas wrote: BackupNode is major functionality, with changes required in RPC protocols, configuration etc. Hence it will not be available in the bug fix release 1.0.1. It is also unlikely to be available in minor releases in the 1.x release stream. Regards, Suresh On Wed, Feb 22, 2012 at 11:40 AM, Jeremy Hansen jer...@skidrow.la wrote: It looks as if backupnode isn't supported in 1.0.0? Any chance it's in 1.0.1? Thanks -jeremy -- Joseph Echeverria Cloudera, Inc. 443.305.9434
Streaming job hanging
My streaming job just seems to be hanging at 12/02/22 17:35:50 INFO streaming.StreamJob: map 0% reduce 0% - On the admin page I see that it created 551 input splits. Could someone suggest a way to find out what might be causing it to hang? I increased io.sort.mb to 200 MB. I am using 5 data nodes with 12 CPUs and 96 GB RAM each.
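A few hedged first steps for a job stuck at map 0% (the log path pattern assumes the standard layout):

    # Is the job registered and making any progress?
    hadoop job -list
    hadoop job -status <job_id>    # substitute the id printed by -list

    # On a slave: did any map task attempt actually launch?
    tail -f $HADOOP_HOME/logs/hadoop-*-tasktracker-*.log

map 0% with splits already computed usually means no task has been scheduled at all, so the free map slot count on the jobtracker page and the tasktracker logs are the places to look.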
Re: Clickstream and video Analysis
http://www.wibidata.com/ Only it's not open source :) You can research the story by looking at http://www.youtube.com/watch?v=pUogubA9CEA to start Mark On Wed, Feb 22, 2012 at 11:30 PM, shreya@cognizant.com wrote: Hi, Could someone provide some links on clickstream and video analysis in Hadoop? Thanks and Regards, Shreya Pal
Re: Clickstream and video Analysis
Tubemogul is one of them. On Thu, Feb 23, 2012 at 11:00 AM, shreya@cognizant.com wrote: Hi, Could someone provide some links on clickstream and video analysis in Hadoop? Thanks and Regards, Shreya Pal
Problem in setting up Hadoop Multi-Node Cluster using a ROUTER
Hello everyone. I am facing a problem with the installation of a Hadoop multi-node cluster when I connect all the Linux boxes through a router. I have succeeded in installing a single-node cluster and a multi-node cluster over the internet (LAN). I want to test the multi-node cluster by establishing a private network between all the nodes through a router which assigns private IP addresses. I am keeping the same configuration which I used to test the multi-node cluster over the internet (LAN). Error: With the router setup, when I run the wordcount example, the program hangs in the middle: it completes map 100% and stops when reduce reaches around 20%. I formatted the namenode many times and tried to run the program, but still no luck :(. One unlucky thing is that there is no error in the log files of either the master or the slave machines. Please let me know what could be the problem. A quick response is very much appreciated. Thanks Regards, Guruprasad
Re: Backupnode in 1.0.0?
Joey, Can you please answer the question in the context of Apache releases? Not sure CDH4b1 needs to be mentioned in the context of this mailing list. Regards, Suresh On Wed, Feb 22, 2012 at 5:24 PM, Joey Echeverria j...@cloudera.com wrote: Check out this branch for the 0.22 version of Bigtop: https://svn.apache.org/repos/asf/incubator/bigtop/branches/hadoop-0.22/ However, I don't think BackupNode is what you want. It sounds like you want HA, which is (hopefully) coming in 0.23.2 and is also available today in CDH4b1. -Joey On Wed, Feb 22, 2012 at 7:09 PM, Jeremy Hansen jer...@skidrow.la wrote: By the way, I don't see anything 0.22-based in the bigtop repos. Thanks -jeremy On Feb 22, 2012, at 3:58 PM, Jeremy Hansen wrote: I guess I thought that backupnode would provide some level of namenode redundancy. Perhaps I don't fully understand. I'll check out Bigtop. I looked at it a while ago and forgot about it. Thanks -jeremy On Feb 22, 2012, at 2:43 PM, Joey Echeverria wrote: Check out the Apache Bigtop project. I believe they have 0.22 RPMs. Out of curiosity, why are you interested in BackupNode? -Joey Sent from my iPhone On Feb 22, 2012, at 14:56, Jeremy Hansen jer...@skidrow.la wrote: Any possibility of getting spec files to create packages for 0.22? Thanks -jeremy On Feb 22, 2012, at 11:50 AM, Suresh Srinivas wrote: BackupNode is major functionality, with changes required in RPC protocols, configuration etc. Hence it will not be available in the bug fix release 1.0.1. It is also unlikely to be available in minor releases in the 1.x release stream. Regards, Suresh On Wed, Feb 22, 2012 at 11:40 AM, Jeremy Hansen jer...@skidrow.la wrote: It looks as if backupnode isn't supported in 1.0.0? Any chance it's in 1.0.1? Thanks -jeremy -- Joseph Echeverria Cloudera, Inc. 443.305.9434
RE: Problem in setting up Hadoop Multi-Node Cluster using a ROUTER
Hi Guruprasad, Do you have valid IP-to-hostname mappings in /etc/hosts so that each node can be reached by hostname? I guess the configuration over the public network worked because hostnames got resolved by DNS there. Thanks, Junping -Original Message- From: Guruprasad B [mailto:guruprasadk...@gmail.com] Sent: Thursday, February 23, 2012 2:43 PM To: core-u...@hadoop.apache.org Cc: Robin Mueller-Bady Subject: Problem in setting up Hadoop Multi-Node Cluster using a ROUTER Hello everyone. I am facing a problem with the installation of a Hadoop multi-node cluster when I connect all the Linux boxes through a router. I have succeeded in installing a single-node cluster and a multi-node cluster over the internet (LAN). I want to test the multi-node cluster by establishing a private network between all the nodes through a router which assigns private IP addresses. I am keeping the same configuration which I used to test the multi-node cluster over the internet (LAN). Error: With the router setup, when I run the wordcount example, the program hangs in the middle: it completes map 100% and stops when reduce reaches around 20%. I formatted the namenode many times and tried to run the program, but still no luck :(. One unlucky thing is that there is no error in the log files of either the master or the slave machines. Please let me know what could be the problem. A quick response is very much appreciated. Thanks Regards, Guruprasad
Re: Problem in setting up Hadoop Multi-Node Cluster using a ROUTER
I'm able to make it work with a simple router without issues (works just as well as vmnet8). Beyond Junping's point, also check that you don't have a firewall acting between your nodes. On Thu, Feb 23, 2012 at 12:45 PM, Jun Ping Du j...@vmware.com wrote: Hi Guruprasad, Do you have valid IP-to-hostname mappings in /etc/hosts so that each node can be reached by hostname? I guess the configuration over the public network worked because hostnames got resolved by DNS there. Thanks, Junping -Original Message- From: Guruprasad B [mailto:guruprasadk...@gmail.com] Sent: Thursday, February 23, 2012 2:43 PM To: core-u...@hadoop.apache.org Cc: Robin Mueller-Bady Subject: Problem in setting up Hadoop Multi-Node Cluster using a ROUTER Hello everyone. I am facing a problem with the installation of a Hadoop multi-node cluster when I connect all the Linux boxes through a router. I have succeeded in installing a single-node cluster and a multi-node cluster over the internet (LAN). I want to test the multi-node cluster by establishing a private network between all the nodes through a router which assigns private IP addresses. I am keeping the same configuration which I used to test the multi-node cluster over the internet (LAN). Error: With the router setup, when I run the wordcount example, the program hangs in the middle: it completes map 100% and stops when reduce reaches around 20%. I formatted the namenode many times and tried to run the program, but still no luck :(. One unlucky thing is that there is no error in the log files of either the master or the slave machines. Please let me know what could be the problem. A quick response is very much appreciated. Thanks Regards, Guruprasad -- Harsh J Customer Ops. Engineer Cloudera | http://tiny.cloudera.com/about
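A hedged sketch of those two checks (hostnames and addresses are illustrative); a reduce phase stuck around 20% is classically the shuffle failing because slaves cannot resolve or reach each other:

    # On every node: each hostname should resolve to its private IP
    getent hosts master slave1 slave2

    # Reverse lookup should return the same hostname
    getent hosts 192.168.1.10

    # Rule out a host firewall between the nodes (run as root)
    iptables -L -n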