Re: Tasktracker fails

2012-02-22 Thread Adarsh Sharma

Any update on the issue below?

Thanks

Adarsh Sharma wrote:

Dear all,

Today I am trying to configure hadoop-0.20.205.0 on a 4-node cluster.
When I start the cluster, all daemons start except the TaskTracker;
I don't know why the TaskTracker fails with the following error logs.


The cluster is on a private network. The /etc/hosts file on every node
contains the IP-to-hostname mappings for all nodes.


2012-02-21 17:48:33,056 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source TaskTrackerMetrics registered.
2012-02-21 17:48:33,094 ERROR org.apache.hadoop.mapred.TaskTracker: Can not start task tracker because java.net.SocketException: Invalid argument

   at sun.nio.ch.Net.bind(Native Method)
   at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:119)
   at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59)
   at org.apache.hadoop.ipc.Server.bind(Server.java:225)
   at org.apache.hadoop.ipc.Server$Listener.<init>(Server.java:301)
   at org.apache.hadoop.ipc.Server.<init>(Server.java:1483)
   at org.apache.hadoop.ipc.RPC$Server.<init>(RPC.java:545)
   at org.apache.hadoop.ipc.RPC.getServer(RPC.java:506)
   at org.apache.hadoop.mapred.TaskTracker.initialize(TaskTracker.java:772)
   at org.apache.hadoop.mapred.TaskTracker.<init>(TaskTracker.java:1428)
   at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:3673)


Any comments on the issue?


Thanks





Hadoop MR jobs failed: Owner 'uid' for path */jobcache/job_*/attempt_*/output/file.out.index did not match expected owner 'username'

2012-02-22 Thread Dirk Meister
Hello Hadoop mailing list,

We have problems running a Hadoop M/R job on HDFS. It is a 2-node test
system running 0.20.203, driven by a Pig script.

The map tasks run through, but most map attempt outputs from one of the
machines are rejected by the reducer and rescheduled. This is the
stack trace/error message:

Map output lost, rescheduling:
getMapOutput(attempt_201202210928_0005_m_08_0,129) failed :
java.io.IOException: Owner 'MYUID' for path
LOCAL_PATH/jobcache/job_201202210928_0005/attempt_201202210928_0005_m_08_0/output/file.out
did not match expected owner 'MYUSERNAME'
at org.apache.hadoop.io.SecureIOUtils.checkStat(SecureIOUtils.java:177)
at org.apache.hadoop.io.SecureIOUtils.openForRead(SecureIOUtils.java:110)
at org.apache.hadoop.mapred.TaskTracker$MapOutputServlet.doGet(TaskTracker.java:3837)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)
at org.apache.hadoop.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:835)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326)
at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)

The user ID (as a number) is correct, but it seems there is a problem
mapping the number back to a valid username. The machines use
Scientific Linux 6.1 with NIS/yp for usernames/passwords.
However, as far as I understand the source, the owner username is
obtained by running ls -ld (class RawFileSystemStatus), and running ls
-l locally on the files returns a correctly resolved username.
The expected owner is obtained from the process system property
user.name (class TaskController).

After around 2k failed attempts, the Pig task is aborted.

* Can anyone help me or give me a hint about what went wrong here?
* Is it possible to disable these security checks via a configuration?

I would really appreciate any help. Thank you,
Dirk


Re: HBase/HDFS very high iowait

2012-02-22 Thread Per Steffensen
We observe about 50% iowait before even starting the clients, i.e. when 
there is actually no load from clients on the system. So only internal 
activity in HBase/HDFS can cause this. HBase compaction? HDFS?


Regards, Per Steffensen

Per Steffensen skrev:

Hi

We have a system with, among other things, an HBase cluster and an HDFS 
cluster (primarily for HBase persistence). Depending on the environment we 
have between 3 and 8 machines running an HBase RegionServer and an HDFS 
DataNode. The OS is Ubuntu 10.04. On those machines we see very high iowait, 
very little real CPU usage, and unexpectedly low throughput 
(HBase creates, updates, reads and short scans). We do not get more 
throughput by putting more parallel load from the HBase clients on the 
HBase servers, so it is a real iowait problem. Any idea what might 
be wrong, and what we can do to improve throughput and lower iowait?


Regards, Per Steffensen




Re: HBase/HDFS very high iowait

2012-02-22 Thread Per Steffensen

Per Steffensen skrev:
Observe about 50% iowait before even starting clients - that is when 
there is actually no load from clients on the system. So only 
internal stuff in HBase/HDFS can cause this - HBase compaction? HDFS?
Ah, OK, that was only for half a minute after restart. So it is basically down 
to 100% idle when there is no load from clients.


Regards, Per Steffensen

Per Steffensen skrev:

Hi

We have a system a.o. with a HBase cluster and a HDFS cluster 
(primarily for HBase persistence). Depending on the environment we 
have between 3 and 8 machine running a HBase RegionServer and a HDFS 
DataNode. OS is Ubuntu 10.04. On those machine we see very high 
iowait and very little real usage of the CPU, and unexpected low 
throughput (HBase creates, updates, reads and short scans). We do not 
get more throughput by putting more parallel load from the HBase 
clients on the HBase servers, so it is a real iowait problem. Any 
idea what might be wrong, and what we can do to improve throughput 
and lower iowait.


Regards, Per Steffensen







Security at file level in Hadoop

2012-02-22 Thread Shreya.Pal
Hi





I want to implement security at file level in Hadoop, essentially
restricting certain data to certain users.

Ex: File A can be accessed only by user X.

File B can be accessed only by users X and Y.


Is this possible in Hadoop, and how do we do it? At what level are these
permissions applied (before copying to HDFS or after putting it in HDFS)?

When the file gets replicated, does it retain these permissions?



Thanks

Shreya


This e-mail and any files transmitted with it are for the sole use of the 
intended recipient(s) and may contain confidential and privileged information.
If you are not the intended recipient, please contact the sender by reply 
e-mail and destroy all copies of the original message.
Any unauthorized review, use, disclosure, dissemination, forwarding, printing 
or copying of this email or any action taken in reliance on this e-mail is 
strictly prohibited and may be unlawful.



Re: Security at file level in Hadoop

2012-02-22 Thread Ben Smithers
Hi Shreya,

A permissions guide for HDFS is available at:
http://hadoop.apache.org/common/docs/current/hdfs_permissions_guide.html

The permissions system is much the same as on Unix-like systems, with users and
groups. Though I have not worked with this, I think it is likely that all
permissions will need to be applied after putting files into HDFS.

Hope that helps,

Ben

On 22 February 2012 10:41, shreya@cognizant.com wrote:



 Hi





 I want to implement security at file level in Hadoop, essentially
 restricting certain data to certain users.

 Ex - File A can be accessed only by a user X

 File B can be accessed by only user X and user Y



 Is this possible in Hadoop, how do we do it? At what level are these
 permissions applied (before copying to HDFS or after putting in HDFS)?

 When the file gets replicated does it retain these permissions?



 Thanks

 Shreya


 This e-mail and any files transmitted with it are for the sole use of the
 intended recipient(s) and may contain confidential and privileged
 information.
 If you are not the intended recipient, please contact the sender by reply
 e-mail and destroy all copies of the original message.
 Any unauthorized review, use, disclosure, dissemination, forwarding,
 printing or copying of this email or any action taken in reliance on this
 e-mail is strictly prohibited and may be unlawful.



Re: Security at file level in Hadoop

2012-02-22 Thread praveenesh kumar
You can probably use hadoop fs -chmod <permission> <filename> as suggested
above. You can set r/w permissions just as you would for regular Unix
files.
Can you please share your experience with this?

Thanks,
Praveenesh
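As a concrete sketch of that (the paths, user and group names below are made
up for illustration, and changing a file's owner normally requires HDFS
superuser privileges):

  hadoop fs -chown alice:analysts /data/fileA
  hadoop fs -chmod 600 /data/fileA      # only alice can read/write fileA
  hadoop fs -chown alice:teamxy /data/fileB
  hadoop fs -chmod 640 /data/fileB      # alice rw, group teamxy read-only
  hadoop fs -ls /data                   # verify owner, group and mode

Replication does not change any of this; owner, group and mode are NameNode
metadata and apply to the file regardless of how many block replicas exist.
Also note that without Kerberos security enabled, HDFS trusts the
client-supplied identity, so these permissions protect against accidents
rather than determined users.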


On Wed, Feb 22, 2012 at 4:37 PM, Ben Smithers
smithers@googlemail.comwrote:

 Hi Shreya,

 A permissions guide for HDFS is available at:
 http://hadoop.apache.org/common/docs/current/hdfs_permissions_guide.html

 The permissions system is much the same as unix-like systems with users and
 groups. Though I have not worked with this, I think it is likely that all
 permissions will need to be applied after putting files into HDFS.

 Hope that helps,

 Ben

 On 22 February 2012 10:41, shreya@cognizant.com wrote:

 
 
  Hi
 
 
 
 
 
  I want to implement security at file level in Hadoop, essentially
  restricting certain data to certain users.
 
  Ex - File A can be accessed only by a user X
 
  File B can be accessed by only user X and user Y
 
 
 
  Is this possible in Hadoop, how do we do it? At what level are these
  permissions applied (before copying to HDFS or after putting in HDFS)?
 
  When the file gets replicated does it retain these permissions?
 
 
 
  Thanks
 
  Shreya
 
 
  This e-mail and any files transmitted with it are for the sole use of the
  intended recipient(s) and may contain confidential and privileged
  information.
  If you are not the intended recipient, please contact the sender by reply
  e-mail and destroy all copies of the original message.
  Any unauthorized review, use, disclosure, dissemination, forwarding,
  printing or copying of this email or any action taken in reliance on this
  e-mail is strictly prohibited and may be unlawful.
 



Re: Optimized Hadoop

2012-02-22 Thread Dieter Plaetinck
Great work folks! Very interesting.

PS: did you notice that if you google for hanborq or HDH it is very hard to find 
your website, hanborq.com?

Dieter

On Tue, 21 Feb 2012 02:17:31 +0800
Schubert Zhang zson...@gmail.com wrote:

 We just update the slides of this improvements:
 http://www.slideshare.net/hanborq/hanborq-optimizations-on-hadoop-mapreduce-20120216a
 
 Updates:
 (1) modified some descriptions to make things clearer and more accurate.
 (2) added some benchmarks to back the claims up.
 
 On Sat, Feb 18, 2012 at 11:12 PM, Anty anty@gmail.com wrote:
 
 
 
  On Fri, Feb 17, 2012 at 3:27 AM, Todd Lipcon t...@cloudera.com wrote:
 
  Hey Schubert,
 
  Looking at the code on github, it looks like your rewritten shuffle is
  in fact just a backport of the shuffle from MR2. I didn't look closely
 
 
  Additionally, the rewritten shuffle in MR2 has some bugs which harm the
  overall performance, for which I have already filed a JIRA to report this,
  with a patch available.
  MAPREDUCE-3685 https://issues.apache.org/jira/browse/MAPREDUCE-3685
 
 
 
  - are there any distinguishing factors?
  Also, the OOB heartbeat and adaptive heartbeat code seems to be the
  same as what's in 1.0?
 
  -Todd
 
  On Thu, Feb 16, 2012 at 9:44 AM, Schubert Zhang zson...@gmail.com
  wrote:
   Here is the presentation to describe our job,
  
  http://www.slideshare.net/hanborq/hanborq-optimizations-on-hadoop-mapreduce-20120216a
   Welcome to give your advice.
   It's just a little step, and we are continuing to do more improvements;
   thanks for your help.
  
  
  
  
   On Thu, Feb 16, 2012 at 11:01 PM, Anty anty@gmail.com wrote:
  
   Hi guys,
   We just delivered an optimized Hadoop. If you are interested, please
   refer to https://github.com/hanborq/hadoop
  
   --
   Best Regards
   Anty Rao
  
  
 
 
 
  --
  Todd Lipcon
  Software Engineer, Cloudera
 
 
 
 
  --
  Best Regards
  Anty Rao
 



mapred.map.tasks and mapred.reduce.tasks parameter meaning

2012-02-22 Thread sangroya
Hello,

Could someone please help me to understand these configuration parameters in
depth.

mapred.map.tasks and mapred.reduce.tasks

It is mentioned that the default values of these parameters are 2 and 1.

*What does it mean?*

Does it mean 2 maps and 1 reduce per node?

Does it mean 2 maps and 1 reduce in total (for the cluster)? Or

Does it mean 2 maps and 1 reduce per job?

Can we change the number of maps and reduces for the default example jobs,
such as WordCount, too?

At the same time, I believe that the total number of maps depends on the
input data size?


Please help me understand these two parameters clearly.

Thanks in advance,
Amit

-
Sangroya
--
View this message in context: 
http://lucene.472066.n3.nabble.com/mapred-map-tasks-and-mapred-reduce-tasks-parameter-meaning-tp3766224p3766224.html
Sent from the Hadoop lucene-users mailing list archive at Nabble.com.


Re: mapred.map.tasks and mapred.reduce.tasks parameter meaning

2012-02-22 Thread Harsh J
Amit,

On Wed, Feb 22, 2012 at 5:08 PM, sangroya sangroyaa...@gmail.com wrote:
 Hello,

 Could someone please help me to understand these configuration parameters in
 depth.

 mapred.map.tasks and mapred.reduce.tasks

 It is mentioned that default value of these parameters is 2 and 1.

 *What does it mean?*

 Does it mean 2 maps and 1 reduce per node.

 Does it mean 2 maps and 1 reduce in total (for the cluster). Or

 Does it mean 2 maps and 1 reduce per Job.

These are set per-job, and therefore mean 2 maps and 1 reducer for the
single job you notice the value in.

 Can we change maps and reduce for default example Jobs such as Wordcount
 etc. too?

You can tweak the # of reducers at will. With the default
HashPartitioner, scaling reducers is easy by just increasing the #s.

 At the same time, I believe that total number of maps are dependent upon
 input data size?

Yes, maps are dependent on the # of input files and their sizes (if they
are splittable). With FileInputFormat derivatives, you will have at
least one map per file. You can have multiple maps per file if it
extends beyond a single block and can be split.

For some more info, take a look at
http://wiki.apache.org/hadoop/HowManyMapsAndReduces
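As a quick sketch, to run the stock WordCount example with, say, 4 reducers
(this assumes the driver parses generic options via
ToolRunner/GenericOptionsParser; the jar name and paths are illustrative):

  hadoop jar hadoop-examples.jar wordcount \
      -D mapred.reduce.tasks=4 \
      /user/amit/input /user/amit/output

mapred.map.tasks, on the other hand, is only a hint: the InputFormat
ultimately derives the number of map tasks from the input files and the
block size, as described in the wiki page above.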

-- 
Harsh J
Customer Ops. Engineer
Cloudera | http://tiny.cloudera.com/about


Re: mapred.map.tasks and mapred.reduce.tasks parameter meaning

2012-02-22 Thread praveenesh kumar
If I am correct:

For setting mappers/node: mapred.tasktracker.map.tasks.maximum
For setting reducers/node: mapred.tasktracker.reduce.tasks.maximum

For setting mappers/job: mapred.map.tasks (applicable to the whole cluster)
For setting reducers/job: mapred.reduce.tasks (same)


You can change these values in your M/R code using the Job / Configuration
object.
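The two per-node maxima above go into mapred-site.xml on each TaskTracker and
need a TaskTracker restart to take effect; the values below are just an
example sized for a small machine:

  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>4</value>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>2</value>
  </property>

mapred.map.tasks / mapred.reduce.tasks, by contrast, can be set per job at
submit time (e.g. with -D on the command line or on the job's Configuration).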


Thanks,
Praveenesh


On Wed, Feb 22, 2012 at 5:08 PM, sangroya sangroyaa...@gmail.com wrote:

 Hello,

 Could someone please help me to understand these configuration parameters
 in
 depth.

 mapred.map.tasks and mapred.reduce.tasks

 It is mentioned that default value of these parameters is 2 and 1.

 *What does it mean?*

 Does it mean 2 maps and 1 reduce per node.

 Does it mean 2 maps and 1 reduce in total (for the cluster). Or

 Does it mean 2 maps and 1 reduce per Job.

 Can we change maps and reduce for default example Jobs such as Wordcount
 etc. too?

 At the same time, I believe that total number of maps are dependent upon
 input data size?


 Please help me understand these two parameters clearly.

 Thanks in advance,
 Amit

 -
 Sangroya
 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/mapred-map-tasks-and-mapred-reduce-tasks-parameter-meaning-tp3766224p3766224.html
 Sent from the Hadoop lucene-users mailing list archive at Nabble.com.



Re: Changing into Replication factor

2012-02-22 Thread Harsh J
Hi,

You need to use hadoop fs -setrep to change replication of existing
files. See the manual at
http://hadoop.apache.org/common/docs/r0.20.2/hdfs_shell.html#setrep on
how to use it.
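A sketch of both halves of this (the warehouse path is only an example; where
your Hive tables live depends on your setup):

  # raise replication of the files already under a table's directory
  hadoop fs -setrep -R 3 /user/hive/warehouse/my_table

New files take the client-side default (dfs.replication) at write time, so if
future loads into those tables should also be 3x replicated, set
dfs.replication to 3 for the client doing the load as well. -refreshNodes is
unrelated here; it re-reads the DataNode include/exclude lists and has no
effect on replication.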

On Wed, Feb 22, 2012 at 1:03 PM, hadoop hive hadooph...@gmail.com wrote:
 Hi folks,

 Right now I have replication factor 2, but I want to make it three
 for some tables. How can I do that for specific tables, so that whenever
 data is loaded into those tables it is automatically replicated
 to three nodes?

 Or do I need to change replication for all the tables?

 Also, can I do that by simply changing the parameter to 3 and running
 -refreshNodes, or is there another way to do it?


 Regards
 hadoopHive



-- 
Harsh J
Customer Ops. Engineer
Cloudera | http://tiny.cloudera.com/about


Re: Security at file level in Hadoop

2012-02-22 Thread Joey Echeverria
HDFS supports POSIX-style file and directory permissions (read, write, execute) 
for the owner, group and world. You can change the permissions with hadoop fs 
-chmod <permissions> <path>.

-Joey


On Feb 22, 2012, at 5:32, shreya@cognizant.com wrote:

 Hi
 
 
 
 
 
 I want to implement security at file level in Hadoop, essentially
 restricting certain data to certain users.
 
 Ex - File A can be accessed only by a user X
 
 File B can be accessed by only user X and user Y
 
 
 
 Is this possible in Hadoop, how do we do it? At what level are these
 permissions applied (before copying to HDFS or after putting in HDFS)?
 
 When the file gets replicated does it retain these permissions?
 
 
 
 Thanks
 
 Shreya
 
 
 This e-mail and any files transmitted with it are for the sole use of the 
 intended recipient(s) and may contain confidential and privileged information.
 If you are not the intended recipient, please contact the sender by reply 
 e-mail and destroy all copies of the original message.
 Any unauthorized review, use, disclosure, dissemination, forwarding, printing 
 or copying of this email or any action taken in reliance on this e-mail is 
 strictly prohibited and may be unlawful.


Re: Tasktracker fails

2012-02-22 Thread Merto Mertek
Hm, I would first try to stop all the daemons with
$HADOOP_HOME/bin/stop-all.sh. Afterwards, check that no daemons are running on
the master and on one of the slaves (jps). Maybe you could also check whether
the TaskTrackers' configuration for the JobTracker is pointing to the right
place (mapred-site.xml). Do you see any error in the JobTracker log too?
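For reference, the entry in question looks like this (hostname and port are
placeholders; the value must resolve to the same, reachable address on every
node):

  <property>
    <name>mapred.job.tracker</name>
    <value>master:9001</value>
  </property>

A java.net.SocketException: Invalid argument on bind quite often comes down to
an address the daemon cannot actually bind to, e.g. a stale or malformed
/etc/hosts entry for the local hostname, so it is worth re-checking those
entries on the failing node as well.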


On 22 February 2012 09:44, Adarsh Sharma adarsh.sha...@orkash.com wrote:

 Any update on the below issue.

 Thanks


 Adarsh Sharma wrote:

 Dear all,

 Today I am trying  to configure hadoop-0.20.205.0 on a 4  node Cluster.
 When I start my cluster , all daemons got started except tasktracker,
 don't know why task tracker fails due to following error logs.

 Cluster is in private network.My /etc/hosts file contains all IP hostname
 resolution commands in all  nodes.

 2012-02-21 17:48:33,056 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter:
 MBean for source TaskTrackerMetrics registered.
 2012-02-21 17:48:33,094 ERROR org.apache.hadoop.mapred.TaskTracker:
 Can not start task tracker because java.net.SocketException: Invalid
 argument
   at sun.nio.ch.Net.bind(Native Method)
   at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:119)
   at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59)
   at org.apache.hadoop.ipc.Server.bind(Server.java:225)
   at org.apache.hadoop.ipc.Server$Listener.<init>(Server.java:301)
   at org.apache.hadoop.ipc.Server.<init>(Server.java:1483)
   at org.apache.hadoop.ipc.RPC$Server.<init>(RPC.java:545)
   at org.apache.hadoop.ipc.RPC.getServer(RPC.java:506)
   at org.apache.hadoop.mapred.TaskTracker.initialize(TaskTracker.java:772)
   at org.apache.hadoop.mapred.TaskTracker.<init>(TaskTracker.java:1428)
   at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:3673)

 Any comments on the issue.


 Thanks





ClassNotFoundException: -libjars not working?

2012-02-22 Thread Ioan Eugen Stan

Hello,

I'm trying to run a map-reduce job and I get ClassNotFoundException, but 
I have the class submitted with -libjars. What's wrong with how I do 
things? Please help.


I'm running hadoop-0.20.2-cdh3u1, and I have everything on the -libjars 
line. The job is submitted via a Java app like:


 exec /usr/lib/jvm/java-6-sun/bin/java -Dproc_jar -Xmx200m -server 
-Dhadoop.log.dir=/opt/ui/var/log/mailsearch -Dhadoop.log.file=hadoop.log 
-Dhadoop.home.dir=/usr/lib/hadoop -Dhadoop.id.str=hbase 
-Dhadoop.root.logger=INFO,console -Dhadoop.policy.file=hadoop-policy.xml 
-classpath 
'/usr/lib/hadoop/conf:/usr/lib/jvm/java-6-sun/lib/tools.jar:/usr/lib/hadoop:/usr/lib/hadoop/hadoop-core-0.20.2-cdh3u1.jar:/usr/lib/hadoop/lib/ant-contrib-1.0b3.jar:/usr/lib/hadoop/lib/apache-log4j-extras-1.1.jar:/usr/lib/hadoop/lib/aspectjrt-1.6.5.jar:/usr/lib/hadoop/lib/aspectjtools-1.6.5.jar:/usr/lib/hadoop/lib/commons-cli-1.2.jar:/usr/lib/hadoop/lib/commons-codec-1.4.jar:/usr/lib/hadoop/lib/commons-daemon-1.0.1.jar:/usr/lib/hadoop/lib/commons-el-1.0.jar:/usr/lib/hadoop/lib/commons-httpclient-3.0.1.jar:/usr/lib/hadoop/lib/commons-logging-1.0.4.jar:/usr/lib/hadoop/lib/commons-logging-api-1.0.4.jar:/usr/lib/hadoop/lib/commons-net-1.4.1.jar:/usr/lib/hadoop/lib/core-3.1.1.jar:/usr/lib/hadoop/lib/hadoop-fairscheduler-0.20.2-cdh3u1.jar:/usr/lib/hadoop/lib/hsqldb-1.8.0.10.jar:/usr/lib/hadoop/lib/jackson-core-asl-1.5.2.jar:/usr/lib/hadoop/lib/jackson-mapper-asl-1.5.2.jar:/usr/lib/hadoop/lib/jasper-compiler-5.5.12.jar:/usr/lib/hadoop/lib/jasper-runtime-5.5.12.jar:/usr/lib/hadoop/lib

/jcl-over-slf4j-1.6.1.jar:/usr/lib/hadoop/lib/jets3t-0.6.1.jar:/usr/lib/hadoop/lib/jetty-6.1.26.jar:/usr/lib/hadoop/lib/jetty-servlet-tester-6.1.26.jar:/usr/lib/hadoop/lib/jetty-util-6.1.26.jar:/usr/lib/hadoop/lib/jsch-0.1.42.jar:/usr/lib/hadoop/lib/junit-4.5.jar:/usr/lib/hadoop/lib/kfs-0.2.2.jar:/usr/lib/hadoop/lib/log4j-1.2.15.jar:/usr/lib/hadoop/lib/mockito-all-1.8.2.jar:/usr/lib/hadoop/lib/oro-2.0.8.jar:/usr/lib/hadoop/lib/servlet-api-2.5-20081211.jar:/usr/lib/hadoop/lib/servlet-api-2.5-6.1.14.jar:/usr/lib/hadoop/lib/slf4j-api-1.6.1.jar:/usr/lib/hadoop/lib/slf4j-log4j12-1.6.1.jar:/usr/lib/hadoop/lib/xmlenc-0.52.jar:/usr/lib/hadoop/lib/jsp-2.1/jsp-2.1.jar:/usr/lib/hadoop/lib/jsp-2.1/jsp-api-2.1.jar:/usr/share/mailbox-convertor/lib/*:/usr/lib/hadoop/contrib/capacity-scheduler/hadoop-capacity-scheduler-0.20.2-cdh3u1.jar:/usr/lib/hbase/lib/hadoop-lzo-0.4.13.jar:/usr/lib/hbase/hbase.jar:/etc/hbase/conf:/usr/lib/hbase/lib:/usr/lib/zookeeper/zookeeper.jar:/usr/lib/hadoop/contrib
/capacity-scheduler/hadoop-capacity-scheduler-0.20.2-cdh3u1.jar:/usr/lib/hbase/lib/hadoop-lzo-0.4.13.jar:/usr/lib/hbase/hbase.jar:/etc/hbase/conf:/usr/lib/hbase/lib:/usr/lib/zookeeper/zookeeper.jar' 
org.apache.hadoop.util.RunJar 
/usr/share/mailbox-convertor/mailbox-convertor-0.1-SNAPSHOT.jar 
-libjars=/usr/share/mailbox-convertor/lib/antlr-2.7.7.jar,/usr/share/mailbox-convertor/lib/aopalliance-1.0.jar,/usr/share/mailbox-convertor/lib/asm-3.1.jar,/usr/share/mailbox-convertor/lib/backport-util-concurrent-3.1.jar,/usr/share/mailbox-convertor/lib/cglib-2.2.jar,/usr/share/mailbox-convertor/lib/hadoop-ant-3.0-u1.pom,/usr/share/mailbox-convertor/lib/speed4j-0.9.jar,/usr/share/mailbox-convertor/lib/jamm-0.2.2.jar,/usr/share/mailbox-convertor/lib/uuid-3.2.0.jar,/usr/share/mailbox-convertor/lib/high-scale-lib-1.1.1.jar,/usr/share/mailbox-convertor/lib/jsr305-1.3.9.jar,/usr/share/mailbox-convertor/lib/guava-11.0.1.jar,/usr/share/mailbox-convertor/lib/protobuf-java-2.4.0a.jar,/usr/share/mailbox-convertor/lib/concurrentlinkedhashmap-lru-1.1.jar,/usr/share/mailbox-convertor/lib/json-simple-1.1.jar,/usr/share/mailbox-convertor/lib/itext-2.1.5.jar,/usr/share/mailbox-convertor/lib/jmxtools-1.2.1.jar,/usr/share/mailbox-convertor/lib/jersey-client-1.4.jar,/usr/share/mailbox-converto

r/lib/jersey-core-1.4.jar,/usr/share/mailbox-convertor/lib/jersey-json-1.4.jar,/usr/share/mailbox-convertor/lib/jersey-server-1.4.jar,/usr/share/mailbox-convertor/lib/jmxri-1.2.1.jar,/usr/share/mailbox-convertor/lib/jaxb-impl-2.1.12.jar,/usr/share/mailbox-convertor/lib/xstream-1.2.2.jar,/usr/share/mailbox-convertor/lib/commons-metrics-1.3.jar,/usr/share/mailbox-convertor/lib/commons-monitoring-2.9.1.jar,/usr/share/mailbox-convertor/lib/html-utils-2.3.jar,/usr/share/mailbox-convertor/lib/mailstore-client-1.0.3.jar,/usr/share/mailbox-convertor/lib/newsearch-commons-1.0.28eo-SNAPSHOT.jar,/usr/share/mailbox-convertor/lib/spring-hbase-gateway-1.0.15.jar,/usr/share/mailbox-convertor/lib/newsearch-deleter-filtering-1.0.11.jar,/usr/share/mailbox-convertor/lib/newsearch-indexing-1.0.28eo-SNAPSHOT.jar,/usr/share/mailbox-convertor/lib/ums-commons-2.1.4.jar,/usr/share/mailbox-convertor/lib/trinity-config-1.2.0.jar,/usr/share/mailbox-convertor/lib/trinity-message-6.7.0.jar,/usr/share/mail
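For comparison, -libjars is usually passed through the hadoop jar wrapper,
separated from its value by a space rather than '=', and it is only honoured
when the main class runs through ToolRunner/GenericOptionsParser. A minimal
sketch (the driver class name and the argument list are illustrative):

  hadoop jar /usr/share/mailbox-convertor/mailbox-convertor-0.1-SNAPSHOT.jar \
      com.example.mailbox.ConvertorDriver \
      -libjars /usr/share/mailbox-convertor/lib/guava-11.0.1.jar,/usr/share/mailbox-convertor/lib/speed4j-0.9.jar \
      <input> <output>

Jars passed via -libjars are shipped to the tasks through the distributed
cache; anything the driver itself needs on the client side still has to be on
the local classpath (e.g. via HADOOP_CLASSPATH).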

Re: Security at file level in Hadoop

2012-02-22 Thread Praveen Sripati
According to this (http://goo.gl/rfwy4)

 Prior to 0.22, Hadoop uses the 'whoami' and id commands to determine the
user and groups of the running process.

How does this work now?

Praveen

On Wed, Feb 22, 2012 at 6:03 PM, Joey Echeverria j...@cloudera.com wrote:

 HDFS supports POSIX style file and directory permissions (read, write,
 execute) for the owner, group and world. You can change the permissions
 with hadoop fs -chmod permissions path

 -Joey


 On Feb 22, 2012, at 5:32, shreya@cognizant.com wrote:

  Hi
 
 
 
 
 
  I want to implement security at file level in Hadoop, essentially
  restricting certain data to certain users.
 
  Ex - File A can be accessed only by a user X
 
  File B can be accessed by only user X and user Y
 
 
 
  Is this possible in Hadoop, how do we do it? At what level are these
  permissions applied (before copying to HDFS or after putting in HDFS)?
 
  When the file gets replicated does it retain these permissions?
 
 
 
  Thanks
 
  Shreya
 
 
  This e-mail and any files transmitted with it are for the sole use of
 the intended recipient(s) and may contain confidential and privileged
 information.
  If you are not the intended recipient, please contact the sender by
 reply e-mail and destroy all copies of the original message.
  Any unauthorized review, use, disclosure, dissemination, forwarding,
 printing or copying of this email or any action taken in reliance on this
 e-mail is strictly prohibited and may be unlawful.



Backupnode in 1.0.0?

2012-02-22 Thread Jeremy Hansen

It looks as if backupnode isn't supported in 1.0.0?  Any chance it's in 1.0.1?

Thanks
-jeremy

Runtime Comparison of Hadoop 0.21.0 and 1.0.1

2012-02-22 Thread Geoffry Roberts
All,

I saw the announcement that the hadoop 1.0.1 micro release was available.  I
have been waiting for this because I need the MultipleOutputs capability,
which 1.0.0 didn't support.  I grabbed a copy of the release candidate.  I
was happy to see that the directory structure once again conforms (mostly)
to the older releases as opposed to what was in the 1.0.0 release.

I did a comparison of run times between 1.0.1 and 0.21.0, which is my
production version.  It seems that 1.0.1 runs about four times slower than
0.21.0.

With the same code, same hardware, same configuration, and the same data
set; end to end times are:

0.21.0 =   8.83 minutes.
1.0.1   = 30.26 minutes.

Is this a known condition?

Thanks

-- 
Geoffry Roberts


Re: Splitting files on new line using hadoop fs

2012-02-22 Thread bejoy . hadoop
Hi Mohit
AFAIK there is no default mechanism available for this in Hadoop. 
The file is split into blocks based only on the configured block size during the 
HDFS copy. While processing the file with MapReduce, the record reader takes care 
of the new lines even if a line spans multiple blocks. 

Could you explain more about the use case that demands such a requirement at 
HDFS copy time itself?

--Original Message--
From: Mohit Anchlia
To: common-user@hadoop.apache.org
ReplyTo: common-user@hadoop.apache.org
Subject: Splitting files on new line using hadoop fs
Sent: Feb 23, 2012 01:45

How can I copy large text files using hadoop fs such that split occurs
based on blocks + new lines instead of blocks alone? Is there a way to do
this?



Regards
Bejoy K S

From handheld, Please excuse typos.


Re: Tasktracker fails

2012-02-22 Thread Merto Mertek
I do not know exactly how the distribution and splitting of deflate files
works, if that is your question, but you will probably find something useful
in the *Codec classes, where the implementations of the various compression
formats are located. Deflate files are just one type of compressed file that
you can use for storing data in your system. There are several other types,
depending on your needs and the trade-offs you are dealing with (space versus
time spent compressing).

 Globs, I think, are just a matching strategy for matching files/folders
with regular expressions.
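Shell-style globs also work on the command line, which is a quick way to
sanity-check what a pattern matches before handing it to globStatus from Java
(the output path below is made up):

  hadoop fs -ls '/user/jay/output/part-*.deflate'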


On 22 February 2012 19:29, Jay Vyas jayunit...@gmail.com wrote:

 Hi guys!

 I'm trying to understand the way globStatus / deflate files work in HDFS. I
 can't read them using the globStatus API in the Hadoop FileSystem from
 Java. The specifics are here if anyone wants some easy Stack Overflow
 points :)


 http://stackoverflow.com/questions/9400739/hadoop-globstatus-and-deflate-files

 On Wed, Feb 22, 2012 at 7:39 AM, Merto Mertek masmer...@gmail.com wrote:

  Hm.. I would try first to stop all the deamons wtih
  $haddop_home/bin/stop-all.sh. Afterwards check that on the master and one
  of the slaves no deamons are running (jps). Maybe you could try to check
 if
  your conf on tasktrackers for the jobtracker is pointing to the right
 place
  (mapred-site.xml). Do you see any error in the jobtracker log too?
 
 
  On 22 February 2012 09:44, Adarsh Sharma adarsh.sha...@orkash.com
 wrote:
 
   Any update on the below issue.
  
   Thanks
  
  
   Adarsh Sharma wrote:
  
   Dear all,
  
   Today I am trying  to configure hadoop-0.20.205.0 on a 4  node
 Cluster.
   When I start my cluster , all daemons got started except tasktracker,
   don't know why task tracker fails due to following error logs.
  
   Cluster is in private network.My /etc/hosts file contains all IP
  hostname
   resolution commands in all  nodes.
  
    2012-02-21 17:48:33,056 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter:
    MBean for source TaskTrackerMetrics registered.
    2012-02-21 17:48:33,094 ERROR org.apache.hadoop.mapred.TaskTracker:
    Can not start task tracker because java.net.SocketException: Invalid
    argument
      at sun.nio.ch.Net.bind(Native Method)
      at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:119)
      at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59)
      at org.apache.hadoop.ipc.Server.bind(Server.java:225)
      at org.apache.hadoop.ipc.Server$Listener.<init>(Server.java:301)
      at org.apache.hadoop.ipc.Server.<init>(Server.java:1483)
      at org.apache.hadoop.ipc.RPC$Server.<init>(RPC.java:545)
      at org.apache.hadoop.ipc.RPC.getServer(RPC.java:506)
      at org.apache.hadoop.mapred.TaskTracker.initialize(TaskTracker.java:772)
      at org.apache.hadoop.mapred.TaskTracker.<init>(TaskTracker.java:1428)
      at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:3673)
  
   Any comments on the issue.
  
  
   Thanks
  
  
  
 



 --
 Jay Vyas
 MMSB/UCHC



Re: Splitting files on new line using hadoop fs

2012-02-22 Thread Mohit Anchlia
On Wed, Feb 22, 2012 at 12:23 PM, bejoy.had...@gmail.com wrote:

 Hi Mohit
AFAIK there is no default mechanism available for the same in
 hadoop. File is split into blocks just based on the configured block size
 during hdfs copy. While processing the file using Mapreduce the record
 reader takes care of the new lines even if a line spans across multiple
 blocks.

 Could you explain more on the use case that demands such a requirement
 while hdfs copy itself?


 I am using Pig's XMLLoader in piggybank to read XML files concatenated in
a text file. But the Pig script doesn't work when the file is big enough that
Hadoop splits it.

Any suggestions on how I can make it work? Below is my simple script that I
would like to enhance, once it starts working. Please note this works
for small files.


register '/root/pig-0.8.1-cdh3u3/contrib/piggybank/java/piggybank.jar';

raw = LOAD '/examples/testfile5.txt' using
org.apache.pig.piggybank.storage.XMLLoader('abc') as (document:chararray);

dump raw;


 --Original Message--
 From: Mohit Anchlia
 To: common-user@hadoop.apache.org
 ReplyTo: common-user@hadoop.apache.org
 Subject: Splitting files on new line using hadoop fs
 Sent: Feb 23, 2012 01:45

 How can I copy large text files using hadoop fs such that split occurs
 based on blocks + new lines instead of blocks alone? Is there a way to do
 this?



 Regards
 Bejoy K S

 From handheld, Please excuse typos.



Re: Splitting files on new line using hadoop fs

2012-02-22 Thread bejoy . hadoop
Hi Mohit
I'm not an expert in Pig, and it'd be better to use the Pig user group 
for Pig-specific queries. I'll try to help you with some basic troubleshooting 
of the issue.

It sounds strange that Pig's XMLLoader can't load larger XML files that 
consist of multiple blocks. Or is it that Pig is not able to load the 
concatenated files that you are trying with? If that is the case, it could 
be because of some issue with the way you are just appending the contents of 
multiple XML files into a single file.

Pig users can give you some workarounds for how they deal with loading 
small XML files that are stored efficiently.

Regards
Bejoy K S

From handheld, Please excuse typos.

-Original Message-
From: Mohit Anchlia mohitanch...@gmail.com
Date: Wed, 22 Feb 2012 12:29:26 
To: common-user@hadoop.apache.org; bejoy.had...@gmail.com
Subject: Re: Splitting files on new line using hadoop fs

On Wed, Feb 22, 2012 at 12:23 PM, bejoy.had...@gmail.com wrote:

 Hi Mohit
AFAIK there is no default mechanism available for the same in
 hadoop. File is split into blocks just based on the configured block size
 during hdfs copy. While processing the file using Mapreduce the record
 reader takes care of the new lines even if a line spans across multiple
 blocks.

 Could you explain more on the use case that demands such a requirement
 while hdfs copy itself?


 I am using pig's XMLLoader in piggybank to read xml files concatenated in
a text file. But pig script doesn't work when file is big that causes
hadoop to split the files.

Any suggestions on how I can make it work? Below is my simple script that I
would like to enhance, only if it starts working. Please note this works
for small files.


register '/root/pig-0.8.1-cdh3u3/contrib/piggybank/java/piggybank.jar'

raw = LOAD '/examples/testfile5.txt using
org.apache.pig.piggybank.storage.XMLLoader('abc') as (document:chararray);

dump raw;


 --Original Message--
 From: Mohit Anchlia
 To: common-user@hadoop.apache.org
 ReplyTo: common-user@hadoop.apache.org
 Subject: Splitting files on new line using hadoop fs
 Sent: Feb 23, 2012 01:45

 How can I copy large text files using hadoop fs such that split occurs
 based on blocks + new lines instead of blocks alone? Is there a way to do
 this?



 Regards
 Bejoy K S

 From handheld, Please excuse typos.




Re: OSX starting hadoop error

2012-02-22 Thread Bryan Keller
For those interested, you can prevent this error by setting the following in 
hadoop-env.sh:

export HADOOP_OPTS="-Djava.security.krb5.realm= -Djava.security.krb5.kdc="


On Jul 28, 2011, at 11:51 AM, Bryan Keller wrote:

 FYI, I logged a bug for this:
 https://issues.apache.org/jira/browse/HADOOP-7489
 
 On Jul 28, 2011, at 11:36 AM, Bryan Keller wrote:
 
 I am also seeing this error upon startup. I am guessing you are using OS X 
 Lion? It started happening to me after I upgraded to 10.7. Hadoop seems to 
 function properly despite this error showing up, though it is annoying.
 
 
 On Jul 27, 2011, at 12:37 PM, Ben Cuthbert wrote:
 
 All
 When starting hadoop on OSX I am getting this error. is there a fix for it
 
 java[22373:1c03] Unable to load realm info from SCDynamicStore
 
 



Re: Backupnode in 1.0.0?

2012-02-22 Thread Joey Echeverria
Check out the Apache Bigtop project. I believe they have 0.22 RPMs. 

Out of curiosity, why are you interested in BackupNode?

-Joey

Sent from my iPhone

On Feb 22, 2012, at 14:56, Jeremy Hansen jer...@skidrow.la wrote:

 Any possibility of getting spec files to create packages for 0.22?
 
 Thanks
 -jeremy
 
 On Feb 22, 2012, at 11:50 AM, Suresh Srinivas wrote:
 
BackupNode is major functionality with changes required in RPC protocols,
configuration, etc. Hence it will not be available in the bug fix release 1.0.1.

It is also unlikely to be available in minor releases in the 1.x release
stream.
 
 Regards,
 Suresh
 
 On Wed, Feb 22, 2012 at 11:40 AM, Jeremy Hansen jer...@skidrow.la wrote:
 
 
 It looks as if backupnode isn't supported in 1.0.0?  Any chances it's in
 1.0.1?
 
 Thanks
 -jeremy
 


Re: Splitting files on new line using hadoop fs

2012-02-22 Thread Mohit Anchlia
Thanks I did post this question to that group. All xml document are
separated by a new line so that shouldn't be the issue, I think.

On Wed, Feb 22, 2012 at 12:44 PM, bejoy.had...@gmail.com wrote:

 **
 Hi Mohit
 I'm not an expert in pig and it'd be better using the pig user group for
 pig specific queries. I'd try to help you with some basic trouble shooting
 of the same

 It sounds strange that pig's XML Loader can't load larger XML files that
 consists of multiple blocks. Or is it like, pig is not able to load the
 concatenated files that you are trying with? If that is the case then it
 could be because of some issues since you are just appending multiple xml
 file contents into a single file.

 Pig users can give you some workarounds how they are dealing with loading
 of small xml files that are stored efficiently.

 Regards
 Bejoy K S

 From handheld, Please excuse typos.
 --
 *From: *Mohit Anchlia mohitanch...@gmail.com
 *Date: *Wed, 22 Feb 2012 12:29:26 -0800
 *To: *common-user@hadoop.apache.org; bejoy.had...@gmail.com
 *Subject: *Re: Splitting files on new line using hadoop fs


 On Wed, Feb 22, 2012 at 12:23 PM, bejoy.had...@gmail.com wrote:

 Hi Mohit
AFAIK there is no default mechanism available for the same in
 hadoop. File is split into blocks just based on the configured block size
 during hdfs copy. While processing the file using Mapreduce the record
 reader takes care of the new lines even if a line spans across multiple
 blocks.

 Could you explain more on the use case that demands such a requirement
 while hdfs copy itself?


  I am using pig's XMLLoader in piggybank to read xml files concatenated
 in a text file. But pig script doesn't work when file is big that causes
 hadoop to split the files.

 Any suggestions on how I can make it work? Below is my simple script that
 I would like to enhance, only if it starts working. Please note this works
 for small files.


 register '/root/pig-0.8.1-cdh3u3/contrib/piggybank/java/piggybank.jar'

 raw = LOAD '/examples/testfile5.txt using
 org.apache.pig.piggybank.storage.XMLLoader('abc') as (document:chararray);

 dump raw;


 --Original Message--
 From: Mohit Anchlia
 To: common-user@hadoop.apache.org
 ReplyTo: common-user@hadoop.apache.org
 Subject: Splitting files on new line using hadoop fs
 Sent: Feb 23, 2012 01:45

 How can I copy large text files using hadoop fs such that split occurs
 based on blocks + new lines instead of blocks alone? Is there a way to do
 this?



 Regards
 Bejoy K S

 From handheld, Please excuse typos.





RE: Dynamic changing of slaves

2012-02-22 Thread kaveh
Sounds like what you are looking for is a custom scheduler, along the lines of:

<!--
<property>
  <name>mapred.jobtracker.taskScheduler</name>
  <value>org.apache.hadoop.mapred.FairScheduler</value>
</property>
-->

Obviously not the FairScheduler itself, but it could give you some idea.

-Original Message-
From: theta glynisdso...@email.arizona.edu
Sent: Wednesday, February 22, 2012 10:32am
To: core-u...@hadoop.apache.org
Subject: Dynamic changing of slaves


Hi,

I am working on a project which requires a setup as follows:

One master with four slaves. However, when a map-only program is run, the
master dynamically selects the slave to run the map. For example, when the
program is run for the first time, slave 2 is selected to run the map and
reduce programs, and the output is stored on DFS. When the program is run
the second time, slave 3 is selected, and so on.

I am currently using Hadoop 0.20.2 with Ubuntu 11.10.

Any ideas on creating the setup as described above?

Regards

-- 
View this message in context: 
http://old.nabble.com/Dynamic-changing-of-slaves-tp33365922p33365922.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.





Re: Backupnode in 1.0.0?

2012-02-22 Thread Jeremy Hansen
I guess I thought that backupnode would provide some level of namenode 
redundancy.  Perhaps I don't fully understand.

I'll check out Bigtop.  I looked at it a while ago and forgot about it.

Thanks
-jeremy

On Feb 22, 2012, at 2:43 PM, Joey Echeverria wrote:

 Check out the Apache Bigtop project. I believe they have 0.22 RPMs. 
 
 Out of curiosity, why are you interested in BackupNode?
 
 -Joey
 
 Sent from my iPhone
 
 On Feb 22, 2012, at 14:56, Jeremy Hansen jer...@skidrow.la wrote:
 
 Any possibility of getting spec files to create packages for 0.22?
 
 Thanks
 -jeremy
 
 On Feb 22, 2012, at 11:50 AM, Suresh Srinivas wrote:
 
 BackupNode is major functionality with change in required in RPC protocols,
 configuration etc. Hence it will not be available in bug fix release 1.0.1.
 
 It is also unlikely to be not available on minor releases in 1.x release
 streams.
 
 Regards,
 Suresh
 
 On Wed, Feb 22, 2012 at 11:40 AM, Jeremy Hansen jer...@skidrow.la wrote:
 
 
 It looks as if backupnode isn't supported in 1.0.0?  Any chances it's in
 1.0.1?
 
 Thanks
 -jeremy
 



Re: Backupnode in 1.0.0?

2012-02-22 Thread Jeremy Hansen
By the way, I don't see anything 0.22 based in the bigtop repos.

Thanks
-jeremy

On Feb 22, 2012, at 3:58 PM, Jeremy Hansen wrote:

 I guess I thought that backupnode would provide some level of namenode 
 redundancy.  Perhaps I don't fully understand.
 
 I'll check out Bigtop.  I looked at it a while ago and forgot about it.
 
 Thanks
 -jeremy
 
 On Feb 22, 2012, at 2:43 PM, Joey Echeverria wrote:
 
 Check out the Apache Bigtop project. I believe they have 0.22 RPMs. 
 
 Out of curiosity, why are you interested in BackupNode?
 
 -Joey
 
 Sent from my iPhone
 
 On Feb 22, 2012, at 14:56, Jeremy Hansen jer...@skidrow.la wrote:
 
 Any possibility of getting spec files to create packages for 0.22?
 
 Thanks
 -jeremy
 
 On Feb 22, 2012, at 11:50 AM, Suresh Srinivas wrote:
 
 BackupNode is major functionality with change in required in RPC protocols,
 configuration etc. Hence it will not be available in bug fix release 1.0.1.
 
 It is also unlikely to be not available on minor releases in 1.x release
 streams.
 
 Regards,
 Suresh
 
 On Wed, Feb 22, 2012 at 11:40 AM, Jeremy Hansen jer...@skidrow.la wrote:
 
 
 It looks as if backupnode isn't supported in 1.0.0?  Any chances it's in
 1.0.1?
 
 Thanks
 -jeremy
 
 



Re: Backupnode in 1.0.0?

2012-02-22 Thread Joey Echeverria
Check out this branch for the 0.22 version of Bigtop:

https://svn.apache.org/repos/asf/incubator/bigtop/branches/hadoop-0.22/

However, I don't think BackupNode is what you want. It sounds like you
want HA which is coming in (hopefully) 0.23.2 and is also available
today in CDH4b1.

-Joey

On Wed, Feb 22, 2012 at 7:09 PM, Jeremy Hansen jer...@skidrow.la wrote:
 By the way, I don't see anything 0.22 based in the bigtop repos.

 Thanks
 -jeremy

 On Feb 22, 2012, at 3:58 PM, Jeremy Hansen wrote:

 I guess I thought that backupnode would provide some level of namenode 
 redundancy.  Perhaps I don't fully understand.

 I'll check out Bigtop.  I looked at it a while ago and forgot about it.

 Thanks
 -jeremy

 On Feb 22, 2012, at 2:43 PM, Joey Echeverria wrote:

 Check out the Apache Bigtop project. I believe they have 0.22 RPMs.

 Out of curiosity, why are you interested in BackupNode?

 -Joey

 Sent from my iPhone

 On Feb 22, 2012, at 14:56, Jeremy Hansen jer...@skidrow.la wrote:

 Any possibility of getting spec files to create packages for 0.22?

 Thanks
 -jeremy

 On Feb 22, 2012, at 11:50 AM, Suresh Srinivas wrote:

 BackupNode is major functionality with change in required in RPC 
 protocols,
 configuration etc. Hence it will not be available in bug fix release 
 1.0.1.

 It is also unlikely to be not available on minor releases in 1.x release
 streams.

 Regards,
 Suresh

 On Wed, Feb 22, 2012 at 11:40 AM, Jeremy Hansen jer...@skidrow.la wrote:


 It looks as if backupnode isn't supported in 1.0.0?  Any chances it's in
 1.0.1?

 Thanks
 -jeremy






-- 
Joseph Echeverria
Cloudera, Inc.
443.305.9434


Streaming job hanging

2012-02-22 Thread Mohit Anchlia
Streaming job just seems to be hanging

12/02/22 17:35:50 INFO streaming.StreamJob: map 0% reduce 0%

-

On the admin page I see that it created 551 input splits. Could someone
suggest a way to find out what might be causing it to hang? I increased
io.sort.mb to 200 MB.

I am using 5 data nodes with 12 CPUs, 96 GB RAM.
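A few generic things that help narrow a hang like this down (the job id below
is made up; take the real one from the JobTracker page):

  hadoop job -list                           # is the job actually running?
  hadoop job -status job_201202221730_0001   # completion %, counters, tracking URL

It is also worth checking on the JobTracker/TaskTracker web UIs whether any
map attempts have been scheduled at all, and tailing a task attempt's stderr
under logs/userlogs on a slave; with streaming, a mapper command that is
missing on the nodes or blocking on stdin usually shows up there first.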


Re: Clickstream and video Analysis

2012-02-22 Thread Mark Kerzner
http://www.wibidata.com/

Only it's not open source :)

You can research the story by looking at
http://www.youtube.com/watch?v=pUogubA9CEA to start

Mark

On Wed, Feb 22, 2012 at 11:30 PM, shreya@cognizant.com wrote:

 Hi,



 Could someone provide some links on Clickstream and video Analysis in
 Hadoop.



 Thanks and Regards,

 Shreya Pal




 This e-mail and any files transmitted with it are for the sole use of the
 intended recipient(s) and may contain confidential and privileged
 information.
 If you are not the intended recipient, please contact the sender by reply
 e-mail and destroy all copies of the original message.
 Any unauthorized review, use, disclosure, dissemination, forwarding,
 printing or copying of this email or any action taken in reliance on this
 e-mail is strictly prohibited and may be unlawful.



Re: Clickstream and video Analysis

2012-02-22 Thread Prashant Sharma
Tubemogul is one of them.

On Thu, Feb 23, 2012 at 11:00 AM, shreya@cognizant.com wrote:

 Hi,



 Could someone provide some links on Clickstream and video Analysis in
 Hadoop.



 Thanks and Regards,

 Shreya Pal




 This e-mail and any files transmitted with it are for the sole use of the
 intended recipient(s) and may contain confidential and privileged
 information.
 If you are not the intended recipient, please contact the sender by reply
 e-mail and destroy all copies of the original message.
 Any unauthorized review, use, disclosure, dissemination, forwarding,
 printing or copying of this email or any action taken in reliance on this
 e-mail is strictly prohibited and may be unlawful.



Problem in setting up Hadoop Multi-Node Cluster using a ROUTER

2012-02-22 Thread Guruprasad B
Hello everyone.

I am facing a problem with the installation of a Hadoop multi-node cluster when I
connect all the Linux boxes through a router. I have succeeded in
installing a single-node cluster and a multi-node cluster over the internet (LAN). I
want to test the multi-node cluster by establishing a private network between
all the nodes through a router which assigns private IP addresses.

I am keeping the same configuration which I used to test the multi-node cluster
over the internet (LAN).

Error:
With the router setup, when I run the word count example, the program hangs
in the middle. It completes map 100% and then stops when reduce
reaches around 20%.
I have formatted the namenode many times and tried to run the program, but still no
luck :(. One unlucky thing is that there is no error in the log files of either the
master or the slave machines.

Please let me know what could be the problem.

Quick response is very much appreciated.

Thanks  Regards,
Guruprasad


Re: Backupnode in 1.0.0?

2012-02-22 Thread Suresh Srinivas
Joey,

Can you please answer the question in the context of Apache releases?
Not sure CDH4b1 needs to be mentioned in the context of this mailing
list.

Regards,
Suresh

On Wed, Feb 22, 2012 at 5:24 PM, Joey Echeverria j...@cloudera.com wrote:

 Check out this branch for the 0.22 version of Bigtop:

 https://svn.apache.org/repos/asf/incubator/bigtop/branches/hadoop-0.22/

 However, I don't think BackupNode is what you want. It sounds like you
 want HA which is coming in (hopefully) 0.23.2 and is also available
 today in CDH4b1.

 -Joey

 On Wed, Feb 22, 2012 at 7:09 PM, Jeremy Hansen jer...@skidrow.la wrote:
  By the way, I don't see anything 0.22 based in the bigtop repos.
 
  Thanks
  -jeremy
 
  On Feb 22, 2012, at 3:58 PM, Jeremy Hansen wrote:
 
  I guess I thought that backupnode would provide some level of namenode
 redundancy.  Perhaps I don't fully understand.
 
  I'll check out Bigtop.  I looked at it a while ago and forgot about it.
 
  Thanks
  -jeremy
 
  On Feb 22, 2012, at 2:43 PM, Joey Echeverria wrote:
 
  Check out the Apache Bigtop project. I believe they have 0.22 RPMs.
 
  Out of curiosity, why are you interested in BackupNode?
 
  -Joey
 
  Sent from my iPhone
 
  On Feb 22, 2012, at 14:56, Jeremy Hansen jer...@skidrow.la wrote:
 
  Any possibility of getting spec files to create packages for 0.22?
 
  Thanks
  -jeremy
 
  On Feb 22, 2012, at 11:50 AM, Suresh Srinivas wrote:
 
  BackupNode is major functionality with change in required in RPC
 protocols,
  configuration etc. Hence it will not be available in bug fix release
 1.0.1.
 
  It is also unlikely to be not available on minor releases in 1.x
 release
  streams.
 
  Regards,
  Suresh
 
  On Wed, Feb 22, 2012 at 11:40 AM, Jeremy Hansen jer...@skidrow.la
 wrote:
 
 
  It looks as if backupnode isn't supported in 1.0.0?  Any chances
 it's in
  1.0.1?
 
  Thanks
  -jeremy
 
 
 



 --
 Joseph Echeverria
 Cloudera, Inc.
 443.305.9434



RE: Problem in setting up Hadoop Multi-Node Cluster using a ROUTER

2012-02-22 Thread Jun Ping Du
Hi Guruprasad,
    Do you have valid IP-to-hostname mappings in /etc/hosts so that
each node can be accessed by hostname? I guess the configuration over the
public network works because hostnames can be resolved there by DNS. 

Thanks,

Junping
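For reference, a minimal /etc/hosts along these lines, identical on every
node, usually does it (the names and private addresses are only an example):

  192.168.1.10   master
  192.168.1.11   slave1
  192.168.1.12   slave2
  192.168.1.13   slave3

Also make sure each machine's own hostname does not resolve only to
127.0.0.1/127.0.1.1 (a common distro default); that tends to break exactly
the reduce-phase copy between nodes that you are seeing.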

-Original Message-
From: Guruprasad B [mailto:guruprasadk...@gmail.com] 
Sent: Thursday, February 23, 2012 2:43 PM
To: core-u...@hadoop.apache.org
Cc: Robin Mueller-Bady
Subject: Problem in setting up Hadoop Multi-Node Cluster using a ROUTER

Hello everyone.

I have facing a problem on installation of hadoop multinode cluster when I
connecte all the linux box througth a router. I have succeeded in
installing single node cluster and multi node cluster over internet(LAN).
I want to test the multi node cluster by establishing private network
between all the nodes through a router which assigns private IP address.

I am keeping the same configuration which i used to test multi node
cluster over internet(LAN).

Error:
With the router setup wen i run word count example, the program gets
hanged in the middle. it will complete map 100% and will get stop when
reduce reaches around 20%.
I gave namenode format many times and tried to run the program but still
no luck :(. one unlucky thing is there is no error in log files of both
master and slave machine.

Please let me know what could be the problem.

Quick response is very much appreciated.

Thanks  Regards,
Guruprasad


Re: Problem in setting up Hadoop Multi-Node Cluster using a ROUTER

2012-02-22 Thread Harsh J
I'm able to make it work with a simple router without issues (Works
just as well as vmnet8). Beyond Junping's point, also check that you
don't have a firewall acting between your nodes.
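A quick way to check for that (the ports are the usual defaults for the
NameNode, JobTracker and TaskTracker HTTP server; adjust to your
configuration):

  sudo iptables -L -n        # any DROP/REJECT rules between the nodes?
  telnet master 9000         # NameNode RPC reachable from a slave?
  telnet master 9001         # JobTracker RPC reachable from a slave?
  telnet slave1 50060        # TaskTracker HTTP, which serves map output

If the TaskTracker HTTP port is blocked between slaves, you get exactly this
pattern: maps complete, reducers stall in the copy phase.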

On Thu, Feb 23, 2012 at 12:45 PM, Jun Ping Du j...@vmware.com wrote:
 Hi Guruprasad,
    Do you have the valid IP--hostname setting in /etc/hosts so that
 each nodes can be accessed by hostname? I guess the configuration over
 public network can work may because it can get hostname resolved by DNS.

 Thanks,

 Junping

 -Original Message-
 From: Guruprasad B [mailto:guruprasadk...@gmail.com]
 Sent: Thursday, February 23, 2012 2:43 PM
 To: core-u...@hadoop.apache.org
 Cc: Robin Mueller-Bady
 Subject: Problem in setting up Hadoop Multi-Node Cluster using a ROUTER

 Hello everyone.

 I have facing a problem on installation of hadoop multinode cluster when I
 connecte all the linux box througth a router. I have succeeded in
 installing single node cluster and multi node cluster over internet(LAN).
 I want to test the multi node cluster by establishing private network
 between all the nodes through a router which assigns private IP address.

 I am keeping the same configuration which i used to test multi node
 cluster over internet(LAN).

 Error:
 With the router setup wen i run word count example, the program gets
 hanged in the middle. it will complete map 100% and will get stop when
 reduce reaches around 20%.
 I gave namenode format many times and tried to run the program but still
 no luck :(. one unlucky thing is there is no error in log files of both
 master and slave machine.

 Please let me know what could be the problem.

 Quick response is very much appreciated.

 Thanks  Regards,
 Guruprasad



-- 
Harsh J
Customer Ops. Engineer
Cloudera | http://tiny.cloudera.com/about