Yarn AppMaster request for containers not working

2015-04-16 Thread Antonescu Andrei Bogdan
Hello,

I'm writing a YARN client for my distributed processing framework and I'm
not able to request containers for workers from the AppMaster's
addContainerRequest method.

Please find here a more detailed explanation:
http://stackoverflow.com/questions/29668132/yarn-appmaster-request-for-containers-not-working

Let me know if more information is needed about configuration, server logs
or client code.

Many thanks,

Best,
Andrei


How to import custom Python module in MapReduce job?

2013-08-12 Thread Andrei
(cross-posted from StackOverflow:
http://stackoverflow.com/questions/18150208/how-to-import-custom-module-in-mapreduce-job?noredirect=1#comment26584564_18150208
)

I have a MapReduce job defined in file *main.py*, which imports module lib from
file *lib.py*. I use Hadoop Streaming to submit this job to Hadoop cluster
as follows:

hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming.jar \
    -files lib.py,main.py \
    -mapper "./main.py map" -reducer "./main.py reduce" \
    -input input -output output

In my understanding, this should put both main.py and lib.py into the *distributed
cache folder* on each computing machine and thus make module lib available
to main. But that doesn't happen: from the log files I can see that the files *are
really copied* to the same directory, yet main can't import lib and throws an
*ImportError*.

Adding the current directory to the path didn't work:

import os
import sys
sys.path.append(os.path.realpath(__file__))
import lib  # ImportError

though loading the module manually did the trick:

import imp
lib = imp.load_source('lib', 'lib.py')

But that's not what I want. So why can the Python interpreter see other .py files
in the same directory but not import them? Note that I have already tried
adding an empty __init__.py file to the same directory, without effect.
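
For reference, the layout the command above assumes is a main.py that acts as
both mapper and reducer, selected by its first argument. A minimal sketch (the
function bodies and the lib.transform helper are placeholders, not the actual
job):

#!/usr/bin/env python
import sys

import lib  # the helper module shipped via -files; this is the import that fails


def run_map(stream):
    # placeholder mapper: tag each input line using the (hypothetical) helper
    for line in stream:
        sys.stdout.write(lib.transform(line))


def run_reduce(stream):
    # placeholder reducer: pass records through unchanged
    for line in stream:
        sys.stdout.write(line)


if __name__ == '__main__':
    mode = sys.argv[1]  # "map" or "reduce", as passed on the streaming command line
    if mode == 'map':
        run_map(sys.stdin)
    else:
        run_reduce(sys.stdin)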


Re: How to import custom Python module in MapReduce job?

2013-08-12 Thread Andrei
Hi Binglin,

thanks for your explanation, now it makes sense. However, I'm not sure how
to implement the suggested method.

First of all, I found out that the `-cacheArchive` option is deprecated, so I
had to use `-archives` instead. I put my `lib.py` into a directory `lib` and
then zipped it to `lib.zip`. After that I uploaded the archive to HDFS and
linked it in the call to the Streaming API as follows:

  hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming.jar -files main.py \
      *-archives hdfs://hdfs-namenode/user/me/lib.jar* \
      -mapper "./main.py map" -reducer "./main.py reduce" -combiner "./main.py combine" \
      -input input -output output

But the script failed, and from the logs I see that lib.jar hasn't been unpacked.
What am I missing?




On Mon, Aug 12, 2013 at 11:33 AM, Binglin Chang decst...@gmail.com wrote:

 Hi,

 The problem seems to be caused by symlinks: hadoop uses a file cache, so every
 file is in fact a symlink.

 lrwxrwxrwx 1 root root 65 Aug 12 15:22 lib.py ->
 /root/hadoop3/data/nodemanager/usercache/root/filecache/13/lib.py
 lrwxrwxrwx 1 root root 66 Aug 12 15:23 main.py ->
 /root/hadoop3/data/nodemanager/usercache/root/filecache/12/main.py
 [root@master01 tmp]# ./main.py
 Traceback (most recent call last):
   File "./main.py", line 3, in ?
 import lib
 ImportError: No module named lib

 This is probably a Python bug: when using import, it can't handle the symlink.

 You can try to put lib.py inside a directory and use -cacheArchive,
 so the symlink actually points to a directory; Python may handle this case
 well.

 Thanks,
 Binglin



 On Mon, Aug 12, 2013 at 2:50 PM, Andrei faithlessfri...@gmail.com wrote:

 (cross-posted from 
 StackOverflowhttp://stackoverflow.com/questions/18150208/how-to-import-custom-module-in-mapreduce-job?noredirect=1#comment26584564_18150208
 )

 I have a MapReduce job defined in file *main.py*, which imports module
 lib from file *lib.py*. I use Hadoop Streaming to submit this job to
 Hadoop cluster as follows:

 hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming.jar

 -files lib.py,main.py
 -mapper ./main.py map -reducer ./main.py reduce
 -input input -output output

  In my understanding, this should put both main.py and lib.py into 
 *distributed
 cache folder* on each computing machine and thus make module lib available
 to main. But it doesn't happen - from log file I see, that files *are
 really copied* to the same directory, but main can't import lib, throwing
 *ImportError*.

 Adding current directory to the path didn't work:

 import sys
 sys.path.append(os.path.realpath(__file__))import lib# ImportError

 though, loading module manually did the trick:

 import imp
 lib = imp.load_source('lib', 'lib.py')

  But that's not what I want. So why Python interpreter can see other .py 
 files
 in the same directory, but can't import them? Note, I have already tried
 adding empty __init__.py file to the same directory without effect.
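
A quick way to confirm the symlink situation described above, from inside a
task's working directory (a small diagnostic sketch, not part of the original
thread):

import os

# Shows whether the localized files are symlinks and where they point;
# in the output quoted above they resolve into different filecache directories.
for name in ('main.py', 'lib.py'):
    print '%s islink=%s -> %s' % (name, os.path.islink(name), os.path.realpath(name))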






Re: How to import custom Python module in MapReduce job?

2013-08-12 Thread Andrei
For some reason using the -archives option leads to "Error in configuring
object" without any further information. However, I found out that the -files
option works pretty well for this purpose. I was able to run my example as
follows.

1. I put `main.py` and `lib.py` into `app` directory.
2. In `main.py` I used `lib.py` directly, that is, the import statement is just

import lib

3. Instead of uploading to HDFS and using the -archives option, I just pointed
to the `app` directory in the -files option:

hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming.jar *-files app* \
    -mapper "*app/*main.py map" -reducer "*app/*main.py reduce" \
    -input input -output output

It did the trick. Note that I tested with both CPython (2.6) and PyPy (1.9),
so I think it's quite safe to assume this approach is correct for Python
scripts.

Thanks for your help, Binglin, without it I wouldn't have been able to figure
it out.




On Mon, Aug 12, 2013 at 1:12 PM, Binglin Chang decst...@gmail.com wrote:

 Maybe you didn't specify a symlink name in your command line, so the symlink
 name will be just lib.jar, and I am not sure how you import the lib module in
 your main.py file.
 Please try this:
 put main.py and lib.py in the same archive file, e.g. app.zip
 *-archives hdfs://hdfs-namenode/user/me/app.zip#app* \
 -mapper "app/main.py map" -reducer "app/main.py reduce"
 in main.py:
 import app.lib
 or:
 import .lib
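
As a side note, `import .lib` is not valid Python syntax (relative imports are
written `from . import lib` and only work inside a package). With the archive
localized under the `app` symlink as suggested above, the import in main.py
would more likely look something like this (a hedged sketch, assuming the
task's working directory contains the `app` symlink):

import os
import sys

# Make the unpacked archive directory importable, then import lib from it.
sys.path.insert(0, os.path.join(os.getcwd(), 'app'))
import lib

# Alternatively, ship an (empty) app/__init__.py inside the archive and use:
#     from app import lib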




Re: Large-scale collection of logs from multiple Hadoop nodes

2013-08-06 Thread Andrei
We have similar requirements and built our log collection system around
RSyslog and Flume. It is not in production yet, but tests so far look
pretty good. We rejected the idea of using AMQP since it introduces a large
overhead for log events.

You can probably use Flume interceptors to do real-time processing on your
events, though I haven't tried anything like that myself. Alternatively,
you can use Twitter Storm to handle your logs. Either way, I wouldn't recommend
using Hadoop MapReduce for real-time processing of logs, and there's at
least one important reason for this.

As you probably know, a Flume source obtains new events and puts them into a
channel, from which a sink then pulls them. If we are talking about the HDFS
sink, it has a roll interval (normally time-based, but you can also roll by the
total size of events in the channel). If this interval is large, you won't get
real-time processing. And if it is small, Flume will produce a large number of
small files in HDFS, say of size 10-100KB. HDFS, however, handles large numbers
of small files badly: each file occupies its own block (default block size 64M)
and adds metadata pressure on the NameNode, so a growing pile of 10-100KB log
files quickly becomes a problem (multiplied by the # of replicas!).

Of course, you can use some ad-hoc solution like deleting small files from
time to time or combining them into a larger file, but monitoring such a
system becomes much harder and may lead to unexpected results. So processing
log events before they get to HDFS seems to be the better idea.



On Tue, Aug 6, 2013 at 7:54 AM, Inder Pall inder.p...@gmail.com wrote:

 We have been using a Flume-like system for such use cases at significantly
 large scale and it has been working quite well.

 Would like to hear thoughts/challenges around using ZeroMQ-like systems
 at good enough scale.

 inder
 you are the average of 5 people you spend the most time with
 On Aug 5, 2013 11:29 PM, Public Network Services 
 publicnetworkservi...@gmail.com wrote:

 Hi...

 I am facing a large-scale usage scenario of log collection from a Hadoop
 cluster and examining ways as to how it should be implemented.

 More specifically, imagine a cluster that has hundreds of nodes, each of
 which constantly produces Syslog events that need to be gathered and
 analyzed at another point. The total amount of logs could be tens of
 gigabytes per day, if not more, and the reception rate in the order of
 thousands of events per second, if not more.

 One solution is to send those events over the network (e.g., using
 Flume) and collect them in one or more (fewer than 5) nodes in the cluster,
 or in another location, where the logs will be processed either by a
 constantly running MapReduce job or by non-Hadoop servers running some log
 processing application.

 Another approach could be to deposit all these events into a queuing
 system like ActiveMQ or RabbitMQ, or whatever.

 In all cases, the main objective is to be able to do real-time log
 analysis.

 What would be the best way of implementing the above scenario?

 Thanks!

 PNS




Re: ConnectionException in container, happens only sometimes

2013-07-11 Thread Andrei
Here are logs of RM and 2 NMs:

RM (master-host): http://pastebin.com/q4qJP8Ld
NM where AM ran (slave-1-host): http://pastebin.com/vSsz7mjG
NM where slave container ran (slave-2-host): http://pastebin.com/NMFi6gRp

The only related error I've found in them is the following (from RM logs):

...
2013-07-11 07:46:06,225 ERROR
org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService:
AppAttemptId doesnt exist in cache appattempt_1373465780870_0005_01
2013-07-11 07:46:06,227 WARN org.apache.hadoop.ipc.Server: IPC Server
Responder, call org.apache.hadoop.yarn.api.AMRMProtocolPB.allocate from
10.128.40.184:47101: output error
2013-07-11 07:46:06,228 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 0 on 8030 caught an exception
java.nio.channels.ClosedChannelException
at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:265)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:456)
at org.apache.hadoop.ipc.Server.channelWrite(Server.java:2140)
at org.apache.hadoop.ipc.Server.access$2000(Server.java:108)
at org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:939)
at org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:1005)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1747)
2013-07-11 07:46:11,238 INFO org.apache.hadoop.yarn.util.RackResolver:
Resolved my_user to /default-rack
2013-07-11 07:46:11,283 INFO
org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService:
NodeManager from node my_user(cmPort: 59267 httpPort: 8042) registered with
capability: 8192, assigned nodeId my_user:59267
...

Though from the stack trace it's hard to tell where this error came from.

Let me know if you need any more information.










On Thu, Jul 11, 2013 at 1:00 AM, Andrei faithlessfri...@gmail.com wrote:

 Hi Omkar,

 I'm out of the office now, so I'll post them as soon as I get back there.

 Thanks


 On Thu, Jul 11, 2013 at 12:39 AM, Omkar Joshi ojo...@hortonworks.com wrote:

 Can you post the RM/NM logs too?

 Thanks,
 Omkar Joshi
 *Hortonworks Inc.* http://www.hortonworks.com




ConnectionException in container, happens only sometimes

2013-07-10 Thread Andrei
Hi,

I'm running a CDH4.3 installation of Hadoop with the following simple setup:

master-host: runs NameNode, ResourceManager and JobHistoryServer
slave-1-host and slave-2-hosts: DataNodes and NodeManagers.

When I run a simple MapReduce job (either using the streaming API or the Pi
example from the distribution) on the client, I see that some tasks fail:

13/07/10 14:40:10 INFO mapreduce.Job:  map 60% reduce 0%
13/07/10 14:40:14 INFO mapreduce.Job: Task Id :
attempt_1373454026937_0005_m_03_0, Status : FAILED
13/07/10 14:40:14 INFO mapreduce.Job: Task Id :
attempt_1373454026937_0005_m_05_0, Status : FAILED
...
13/07/10 14:40:23 INFO mapreduce.Job:  map 60% reduce 20%
...

Every time a different set of tasks/attempts fails. In some cases the number of
failed attempts becomes critical and the whole job fails; in other cases the
job finishes successfully. I can't see any pattern, but I noticed the
following.

Let's say the ApplicationMaster runs on _slave-1-host_. In this case, on
_slave-2-host_ there will be a corresponding syslog with the following
contents:

...
2013-07-10 11:06:10,986 INFO [main] org.apache.hadoop.ipc.Client: Retrying
connect to server: slave-2-host/127.0.0.1:11812. Already tried 0 time(s);
retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10,
sleepTime=1 SECONDS)
2013-07-10 11:06:11,989 INFO [main] org.apache.hadoop.ipc.Client: Retrying
connect to server: slave-2-host/127.0.0.1:11812. Already tried 1 time(s);
retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10,
sleepTime=1 SECONDS)
...
2013-07-10 11:06:20,013 INFO [main] org.apache.hadoop.ipc.Client: Retrying
connect to server: slave-2-host/127.0.0.1:11812. Already tried 9 time(s);
retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10,
sleepTime=1 SECONDS)
2013-07-10 11:06:20,019 WARN [main] org.apache.hadoop.mapred.YarnChild:
Exception running child : java.net.ConnectException: Call From slave-2-host/
127.0.0.1 to slave-2-host:11812 failed on connection exception:
java.net.ConnectException: Connection refused; For more details see:
http://wiki.apache.org/hadoop/ConnectionRefused
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
Method)
at
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:782)
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:729)
at org.apache.hadoop.ipc.Client.call(Client.java:1229)
at
org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:225)
at com.sun.proxy.$Proxy6.getTask(Unknown Source)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:131)
Caused by: java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:708)
at
org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:207)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:528)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:492)
at
org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:499)
at
org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:593)
at
org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:241)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1278)
at org.apache.hadoop.ipc.Client.call(Client.java:1196)
... 3 more


Notice several things:

1. This exception always happens on a different host than the one the
ApplicationMaster runs on.
2. It always tries to connect to localhost, not to another host in the cluster.
3. The port number (11812 in this case) is always different.

My questions are:

1. I assume it is the task (container) that tries to establish the
connection, but what does it want to connect to?
2. Why does this error happen and how can I fix it?

Any suggestions are welcome.

Thanks,
Andrei


Re: ConnectionException in container, happens only sometimes

2013-07-10 Thread Andrei
Hi Devaraj,

thanks for your answer. Yes, I suspected it could be because of host
mapping, so I have already checked (and have just re-checked) the settings in
/etc/hosts on each machine, and they are all ok. I use both fully-qualified
names (e.g. `master-host.company.com`) and their short forms (e.g.
`master-host`), so it shouldn't depend on the notation either.

I have also checked the AM syslog. There's nothing about the network, but there
are several messages like the following:

ERROR [RMCommunicator Allocator]
org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Container
complete event for unknown container id
container_1373460572360_0001_01_88


I understand that the container just doesn't get registered in the AM (probably
because of the same issue), is that correct? So I wonder: who sends the
container complete event to the ApplicationMaster?





On Wed, Jul 10, 2013 at 3:19 PM, Devaraj k devara...@huawei.com wrote:

  1. I assume this is the task (container) that tries to establish
 connection, but what it wants to connect to?

 It is trying to connect to the MRAppMaster for executing the actual task.

  2. Why this error happens and how can I fix it?

 It seems the Container is not getting the correct MRAppMaster address due to
 some reason, or the AM is crashing before giving the task to the Container.
 Probably it is coming due to an invalid host mapping. Can you check that the
 host mapping is proper on both machines, and also check the AM log from that
 time for any clue.

 Thanks

 Devaraj k

 *From:* Andrei [mailto:faithlessfri...@gmail.com]
 *Sent:* 10 July 2013 17:32
 *To:* user@hadoop.apache.org
 *Subject:* ConnectionException in container, happens only sometimes

 Hi,

 I'm running CDH4.3 installation of Hadoop with the following simple setup:

 master-host: runs NameNode, ResourceManager and JobHistoryServer

 slave-1-host and slave-2-hosts: DataNodes and NodeManagers. 

 When I run simple MapReduce job (both - using streaming API or Pi example
 from distribution) on client I see that some tasks fail:

 13/07/10 14:40:10 INFO mapreduce.Job:  map 60% reduce 0%

 13/07/10 14:40:14 INFO mapreduce.Job: Task Id :
 attempt_1373454026937_0005_m_03_0, Status : FAILED

 13/07/10 14:40:14 INFO mapreduce.Job: Task Id :
 attempt_1373454026937_0005_m_05_0, Status : FAILED

 ...

 13/07/10 14:40:23 INFO mapreduce.Job:  map 60% reduce 20%

 ...

 Every time different set of tasks/attempts fails. In some cases number of
 failed attempts becomes critical, and the whole job fails, in other cases
 job is finished successfully. I can't see any dependency, but I noticed the
 following.

 Let's say, ApplicationMaster runs on _slave-1-host_. In this case on
 _slave-2-host_ there will be corresponding syslog with the following
 contents:

 ... 

 2013-07-10 11:06:10,986 INFO [main] org.apache.hadoop.ipc.Client: Retrying
 connect to server: slave-2-host/127.0.0.1:11812. Already tried 0 time(s);
 retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10,
 sleepTime=1 SECONDS)

 2013-07-10 11:06:11,989 INFO [main] org.apache.hadoop.ipc.Client: Retrying
 connect to server: slave-2-host/127.0.0.1:11812. Already tried 1 time(s);
 retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10,
 sleepTime=1 SECONDS)

 ...

 2013-07-10 11:06:20,013 INFO [main] org.apache.hadoop.ipc.Client: Retrying
 connect to server: slave-2-host/127.0.0.1:11812. Already tried 9 time(s);
 retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10,
 sleepTime=1 SECONDS)

 2013-07-10 11:06:20,019 WARN [main] org.apache.hadoop.mapred.YarnChild:
 Exception running child : java.net.ConnectException: Call From slave-2-host/
 127.0.0.1 to slave-2-host:11812 failed on connection exception:
 java.net.ConnectException: Connection refused; For more details see:
 http://wiki.apache.org/hadoop/ConnectionRefused

 at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
 Method)

 at
 sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
 

 at
 sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
 

 at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
 

 at
 org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:782)

 at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:729)
 

 at org.apache.hadoop.ipc.Client.call(Client.java:1229)

 at
 org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:225)
 

 at com.sun.proxy.$Proxy6.getTask(Unknown Source)

 at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:131)

 Caused by: java.net.ConnectException

Re: ConnectionException in container, happens only sometimes

2013-07-10 Thread Andrei
If it helps, the full log of the AM can be found here:
http://pastebin.com/zXTabyvv


On Wed, Jul 10, 2013 at 4:21 PM, Andrei faithlessfri...@gmail.com wrote:

 Hi Devaraj,

 thanks for your answer. Yes, I suspected it could be because of host
 mapping, so I have already checked (and have just re-checked) settings in
 /etc/hosts of each machine, and they all are ok. I use both fully-qualified
 names (e.g. `master-host.company.com`) and their shortcuts (e.g.
 `master-host`), so it shouldn't depend on notation too.

 I have also checked AM syslog. There's nothing about network, but there
 are several messages like the following:

 ERROR [RMCommunicator Allocator] 
 org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Container 
 complete event for unknown container id container_1373460572360_0001_01_88


 I understand container just doesn't get registered in AM (probably because
 of the same issue), is it correct? So I wonder who sends container
 complete event to ApplicationMaster?





 On Wed, Jul 10, 2013 at 3:19 PM, Devaraj k devara...@huawei.com wrote:

  1. I assume this is the task (container) that tries to establish
 connection, but what it wants to connect to?

 It is trying to connect to the MRAppMaster for executing the actual task.

  2. Why this error happens and how can I fix it?

 It seems the Container is not getting the correct MRAppMaster address due to
 some reason, or the AM is crashing before giving the task to the Container.
 Probably it is coming due to an invalid host mapping. Can you check that the
 host mapping is proper on both machines, and also check the AM log from that
 time for any clue.

 Thanks

 Devaraj k

 *From:* Andrei [mailto:faithlessfri...@gmail.com]
 *Sent:* 10 July 2013 17:32
 *To:* user@hadoop.apache.org
 *Subject:* ConnectionException in container, happens only sometimes

 Hi,

 I'm running CDH4.3 installation of Hadoop with the following simple
 setup:

 master-host: runs NameNode, ResourceManager and JobHistoryServer

 slave-1-host and slave-2-hosts: DataNodes and NodeManagers.

 When I run simple MapReduce job (both - using streaming API or Pi example
 from distribution) on client I see that some tasks fail:

 13/07/10 14:40:10 INFO mapreduce.Job:  map 60% reduce 0%

 13/07/10 14:40:14 INFO mapreduce.Job: Task Id :
 attempt_1373454026937_0005_m_03_0, Status : FAILED

 13/07/10 14:40:14 INFO mapreduce.Job: Task Id :
 attempt_1373454026937_0005_m_05_0, Status : FAILED

 ...

 13/07/10 14:40:23 INFO mapreduce.Job:  map 60% reduce 20%

 ...

 Every time different set of tasks/attempts fails. In some cases number of
 failed attempts becomes critical, and the whole job fails, in other cases
 job is finished successfully. I can't see any dependency, but I noticed the
 following.

 Let's say, ApplicationMaster runs on _slave-1-host_. In this case on
 _slave-2-host_ there will be corresponding syslog with the following
 contents:

 ... 

 2013-07-10 11:06:10,986 INFO [main] org.apache.hadoop.ipc.Client:
 Retrying connect to server: slave-2-host/127.0.0.1:11812. Already tried
 0 time(s); retry policy is
 RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)

 2013-07-10 11:06:11,989 INFO [main] org.apache.hadoop.ipc.Client:
 Retrying connect to server: slave-2-host/127.0.0.1:11812. Already tried
 1 time(s); retry policy is
 RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)

 ...

 2013-07-10 11:06:20,013 INFO [main] org.apache.hadoop.ipc.Client:
 Retrying connect to server: slave-2-host/127.0.0.1:11812. Already tried
 9 time(s); retry policy is
 RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)

 2013-07-10 11:06:20,019 WARN [main] org.apache.hadoop.mapred.YarnChild:
 Exception running child : java.net.ConnectException: Call From slave-2-host/
 127.0.0.1 to slave-2-host:11812 failed on connection exception:
 java.net.ConnectException: Connection refused; For more details see:
 http://wiki.apache.org/hadoop/ConnectionRefused

 at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
 Method)

 at
 sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
 

 at
 sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
 

 at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
 

 at
 org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:782)

 at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:729)
 

 at org.apache.hadoop.ipc.Client.call(Client.java:1229)

 at
 org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:225

hadoop mapreduce and contrib.lucene.index: ClassNotFoundException: org.apache.lucene.index.IndexDeletionPolicy

2011-02-10 Thread Andrei Krichevskiy
Hi,

I tried to run the example org.apache.hadoop.contrib.index.main.UpdateIndex:

hadoop-0.21.0$ ./bin/hadoop  jar hadoop-0.21.0-index.jar   -inputPaths
di/input1/ -outputPath di/output/ -indexPath di/ -numShards 1
-numMapTasks 2 -conf conf/index-config.xml

and got

11/02/10 16:35:52 INFO mapreduce.Job: Task Id :
attempt_201102080006_0067_m_01_2, Status : FAILED
Error: java.lang.ClassNotFoundException:
org.apache.lucene.index.IndexDeletionPolicy

I double-checked that the hadoop-0.21.0/lib folder contains
lucene-core-2.3.1.jar and also tried to pass it with -libjars:

./bin/hadoop  jar hadoop-0.21.0-index.jar  -libjars
lib/lucene-core-2.3.1.jar  -inputPaths di/input1/ -outputPath
di/output/ -indexPath di/ -numShards 1 -numMapTasks 2 -conf
conf/index-config.xml

but the result is still the same.

Thanks  in advance,
Andrey


Re: Question about rest interface

2010-09-30 Thread Andrei Savu
The latest version of the REST gateway, now available in trunk, works
the way you want it to.

I had the same problem you have while working on the code. There is
also a simple start/stop script available (src/contrib/rest/rest.sh).

You should check out the trunk [1] or [2]. Run "ant jar" in the root
folder and "ant tar" in src/contrib/rest. After running these steps
you will find in build/contrib/rest/ a .tar.gz archive that contains
everything you need to run a standalone REST gateway for ZooKeeper.

The config file should be pretty much self explanatory but if you need
more help let me know.

The version in the trunk is now session aware and you can use it even
to implement things like leader election (you can find some python
examples in  src/contrib/rest/src/python).

I'm planning to add more features to it, things like ACLs and session
authentication, but unfortunately I haven't had the time. I should be
able to do this in the near future.

[1] http://hadoop.apache.org/zookeeper/version_control.html
[2] http://github.com/apache/zookeeper
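
For anyone who wants to poke at the gateway from Python without the bundled
examples, here is a minimal sketch using only the standard library. The
/znodes/v1/... resource path, the ?view=data query and the default port are
assumptions based on the contrib documentation, so check src/contrib/rest for
the actual interface:

import json
import urllib2

GATEWAY = 'http://localhost:9998'  # assumed default host/port of the REST gateway

def get_znode(path):
    # Fetch a znode as JSON; the URL layout is an assumption, verify it
    # against the contrib REST documentation before relying on it.
    url = '%s/znodes/v1%s?view=data' % (GATEWAY, path)
    return json.loads(urllib2.urlopen(url).read())

if __name__ == '__main__':
    print get_znode('/')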

On Thu, Sep 30, 2010 at 7:01 PM, Patrick Hunt ph...@apache.org wrote:
 Hi Marc, you should checkout the REST interface that's on the svn trunk, it
 includes new functionality and numerous fixes that might be interesting to
 you, this will be part of 3.4.0. CCing Andrei who worked on this as part of
 his GSOC project this summer.
 If you look at this file:
 src/contrib/rest/src/java/org/apache/zookeeper/server/jersey/RestMain.java
 you'll see how we start the server. Looks like we need an option to run as a
 process w/o assuming interactive use. It should be pretty easy for someone
 to patch this (if you do please consider submitting a patch via our JIRA
 process, others would find it interesting). With the current code you might
 get away with something like java ... < /dev/null -- basically turn off
 stdin.
 Patrick
 On Wed, Sep 29, 2010 at 3:09 PM, marc slayton gangofn...@yahoo.com wrote:

 Hey all --

 Having a great time with Zookeeper and recently started testing
 the RESTful interface in src/contrib.

 'ant runrestserver' creates a test instance attached to stdin
 which works well but any input kills it. How does one configure
 Jersey to run for real i.e. not attached to my terminal's
 stdin?

 I've tried altering log4j settings without much luck.

 If there are example setup docs for Linux, could somebody point
 me there? FWIW, I'm running zookeeper-3.3.1 with openjdk-1.6.

 Cheers, and thanks in advance --







-- 
Andrei Savu -- http://www.andreisavu.ro/


Re: ZK monitoring

2010-08-17 Thread Andrei Savu
It's not possible. You need to query all the servers in order to know
who is the current leader.

It should be pretty simple to implement this by parsing the output
from the 'stat' 4-letter command.
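
A minimal sketch of that approach (assuming the default client port 2181 and
that each server answers the 4-letter 'stat' command with a
"Mode: leader/follower/standalone" line):

import socket

def zk_mode(host, port=2181):
    # Send the 4-letter 'stat' command and pull out the "Mode:" line.
    s = socket.create_connection((host, port), timeout=5)
    try:
        s.sendall('stat')
        data = ''
        while True:
            chunk = s.recv(4096)
            if not chunk:
                break
            data += chunk
    finally:
        s.close()
    for line in data.splitlines():
        if line.startswith('Mode:'):
            return line.split(':', 1)[1].strip()
    return None

if __name__ == '__main__':
    for host in ['zk1', 'zk2', 'zk3']:  # replace with your ensemble's hosts
        print host, zk_mode(host)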

On Tue, Aug 17, 2010 at 9:50 PM, Jun Rao jun...@gmail.com wrote:
 Hi,

 Is there a way to see the current leader and a list of followers from a
 single node in the ZK quorum? It seems that ZK monitoring (JMX, 4-letter
 commands) only provides info local to a node.

 Thanks,

 Jun




-- Andrei Savu


Re: ZK monitoring

2010-08-17 Thread Andrei Savu
You should also take a look at ZOOKEEPER-744 [1] and ZOOKEEPER-799 [2]

The archive from 799 contains ready-to-use scripts for monitoring
ZooKeeper using Ganglia, Nagios and Cacti.

Let me know if you need more help.

[1] https://issues.apache.org/jira/browse/ZOOKEEPER-744
[2] https://issues.apache.org/jira/browse/ZOOKEEPER-799

On Tue, Aug 17, 2010 at 9:50 PM, Jun Rao jun...@gmail.com wrote:
 Hi,

 Is there a way to see the current leader and a list of followers from a
 single node in the ZK quorum? It seems that ZK monitoring (JMX, 4-letter
 commands) only provides info local to a node.

 Thanks,

 Jun




-- Andrei Savu


Re: building client tools

2010-07-13 Thread Andrei Savu
Hi,

In this case I think you have to install libcppunit (installing it via apt-get
should work). I believe that should be enough, but I don't really remember
what else I installed the first time I compiled the C client.

Let me know what else was needed. I would like to submit a patch to
update the README file in order to avoid this problem in the future.

Thanks.

On Tue, Jul 13, 2010 at 8:09 PM, Martin Waite waite@gmail.com wrote:
 Hi,

 I am trying to build the c client on debian lenny for zookeeper 3.3.1.

 autoreconf -if
 configure.ac:33: warning: macro `AM_PATH_CPPUNIT' not found in library
 configure.ac:33: warning: macro `AM_PATH_CPPUNIT' not found in library
 configure.ac:33: error: possibly undefined macro: AM_PATH_CPPUNIT
      If this token and others are legitimate, please use m4_pattern_allow.
      See the Autoconf documentation.
 autoreconf: /usr/bin/autoconf failed with exit status: 1

 I probably need to install some required tools.   Is there a list of what
 tools are needed to build this please ?

 regards,
 Martin




-- 
Andrei Savu - http://andreisavu.ro/


Re: Starting zookeeper in replicated mode

2010-06-21 Thread Andrei Savu
As Luka Stojanovic suggested, you need to add a file called
/var/zookeeper/myid on each node, containing only that node's server id:

$ echo 1 > /var/zookeeper/myid    (and 2 ... 6 on the other nodes)

I want to make a few more comments related to your setup and to your questions:

- there is no configured master node in a zookeeper cluster. the
leader is automatically elected at runtime
- you can write and read from any node at any time

 Am I supposed to have an instance of ZooKeeper on each node started before 
 running in replication mode?
- you start the cluster by starting one node at a time

 Should I have each node that will be running ZK listed in the config file?
- yes. you need to have all nodes running ZK listed in the config file.

 Should I be using an IP address to point to a server instead of a hostname?
- it doesn't really make a difference whether you use hostnames or IP addresses.

I hope this will help you.

Andrei

On Mon, Jun 21, 2010 at 10:04 PM, Erik Test erik.shi...@gmail.com wrote:
 Hi All,

 I'm having a problem with installing zookeeper on a cluster with 6 nodes in
 replicated mode. I was able to install and run zookeeper in standalone mode
 but I'm unable to run zookeeper in replicated mode.

 I've added a list of servers in zoo.cfg as suggested by the ZooKeeper
 Getting Started Guide but I get these logs displayed to screen:

 *[r...@master1 bin]# ./zkServer.sh start
 JMX enabled by default
 Using config: /root/zookeeper-3.2.2/bin/../conf/zoo.cfg
 Starting zookeeper ...
 STARTED
 [r...@master1 bin]# 2010-06-21 12:25:23,738 - INFO
 [main:quorumpeercon...@80] - Reading configuration from:
 /root/zookeeper-3.2.2/bin/../conf/zoo.cfg
 2010-06-21 12:25:23,743 - INFO  [main:quorumpeercon...@232] - Defaulting to
 majority quorums
 2010-06-21 12:25:23,745 - FATAL [main:quorumpeerm...@82] - Invalid config,
 exiting abnormally
 org.apache.zookeeper.server.quorum.QuorumPeerConfig$ConfigException: Error
 processing /root/zookeeper-3.2.2/bin/../conf/zoo.cfg
        at
 org.apache.zookeeper.server.quorum.QuorumPeerConfig.parse(QuorumPeerConfig.java:100)
        at
 org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:98)
        at
 org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:75)
 Caused by: java.lang.IllegalArgumentException: /var/zookeeper/myid file is
 missing
        at
 org.apache.zookeeper.server.quorum.QuorumPeerConfig.parseProperties(QuorumPeerConfig.java:238)
        at
 org.apache.zookeeper.server.quorum.QuorumPeerConfig.parse(QuorumPeerConfig.java:96)
        ... 2 more
 Invalid config, exiting abnormally*

 And here is my config file:
 *
 # The number of milliseconds of each tick
 tickTime=2000
 # The number of ticks that the initial
 # synchronization phase can take
 initLimit=5
 # The number of ticks that can pass between
 # sending a request and getting an acknowledgement
 syncLimit=2
 # the directory where the snapshot is stored.
 dataDir=/var/zookeeper
 # the port at which the clients will connect
 clientPort=2181
 server.1=master1:2888:3888
 server.2=slave2:2888:3888
 server.3=slave3:2888:3888
 *
 I'm a little confused as to why this doesn't work and I haven't had any luck
 finding answers to some questions I have.

 Am I supposed to have an instance of ZooKeeper on each node started before
 running in replication mode? Should I have each node that will be running ZK
 listed in the config file? Should I be using an IP address to point to a
 server instead of a hostname?

 Thanks for your time.
 Erik




-- 
Andrei Savu

http://www.andreisavu.ro/


GSoC 2010: ZooKeeper Monitoring Recipes and Web-based Administrative Interface

2010-05-13 Thread Andrei Savu
Hi all,

My name is Andrei Savu and I am one of the GSoC 2010 accepted students.
My mentor is Patrick Hunt.

My objective in the next 4 months is to write tools and recipes for
monitoring ZooKeeper and to implement a web-based administrative
interface.

I have created a wiki page for this project:
 - http://wiki.apache.org/hadoop/ZooKeeper/GSoCMonitoringAndWebInterface

Are there any HBase  / Hadoop  specific ZooKeeper monitoring requirements?

Regards

-- 
Savu Andrei

Website: http://www.andreisavu.ro/


Sample Application: Feed Aggregator

2009-09-25 Thread Andrei Savu
Hi,

I have just finished the first version of a small python / thrift demo
application: a basic feed aggregator. I want to share this with you
because I believe it could be useful for a beginner (I have detailed
install instructions). Someone new to HBase should be able to
understand how to build an index table.

You can find the source code on github:
http://github.com/andreisavu/feedaggregator

Thank you for your attention. I would highly appreciate your feedback.

-- 
Savu Andrei

Website: http://www.andreisavu.ro/


unable to start hbase 0.20. zookeeper server not found.

2009-08-28 Thread Andrei Savu
Hi,

I have downloaded the release candidate from here: http://su.pr/1NHIlM
and I am unable to make it start standalone. It seems like the
zookeeper server does not start.

2009-08-28 10:43:49,872 INFO org.apache.zookeeper.ZooKeeper:
Initiating client connection, host=localhost:2181 sessionTimeout=6
watcher=Thread[Thread-0,5,main]
2009-08-28 10:43:49,876 INFO org.apache.zookeeper.ClientCnxn:
zookeeper.disableAutoWatchReset is false
2009-08-28 10:43:49,911 INFO org.apache.zookeeper.ClientCnxn:
Attempting connection to server localhost/127.0.0.1:2181
2009-08-28 10:43:49,926 WARN org.apache.zookeeper.ClientCnxn:
Exception closing session 0x0 to sun.nio.ch.selectionkeyi...@7d2452e8
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:885)
2009-08-28 10:43:49,933 WARN org.apache.zookeeper.ClientCnxn: Ignoring
exception during shutdown input

Should the zookeeper server be installed as a standalone application?

I'm running  bin/start-hbase.sh . On the same machine hbase 0.19.3 works fine.

Sorry if this is a silly question :)

-- 
Savu Andrei

Website: http://www.andreisavu.ro/


Re: unable to start hbase 0.20. zookeeper server not found.

2009-08-28 Thread Andrei Savu
While trying to write a response I found the solution :)

It seems like the OS environment is not what I expected it to be when running
a command over ssh.

This tutorial helped me understand why JAVA_HOME is not set and how to fix it.
http://www.netexpertise.eu/en/ssh/environment-variables-and-ssh.html

Thanks for your time.

On Fri, Aug 28, 2009 at 5:06 PM, Jean-Daniel Cryansjdcry...@apache.org wrote:
 What's in the Zookeeper log? It's kept with the other HBase logs.

 J-D

 On Fri, Aug 28, 2009 at 3:59 AM, Andrei Savusavu.and...@gmail.com wrote:
 Hi,

 I have downloaded the release candidate from here: http://su.pr/1NHIlM
 and I am unable to make it start standalone. It seems like the
 zookeeper server does not start.

 2009-08-28 10:43:49,872 INFO org.apache.zookeeper.ZooKeeper:
 Initiating client connection, host=localhost:2181 sessionTimeout=6
 watcher=Thread[Thread-0,5,main]
 2009-08-28 10:43:49,876 INFO org.apache.zookeeper.ClientCnxn:
 zookeeper.disableAutoWatchReset is false
 2009-08-28 10:43:49,911 INFO org.apache.zookeeper.ClientCnxn:
 Attempting connection to server localhost/127.0.0.1:2181
 2009-08-28 10:43:49,926 WARN org.apache.zookeeper.ClientCnxn:
 Exception closing session 0x0 to sun.nio.ch.selectionkeyi...@7d2452e8
 java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at 
 sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:885)
 2009-08-28 10:43:49,933 WARN org.apache.zookeeper.ClientCnxn: Ignoring
 exception during shutdown input

 The zookeeper server should be installed as a standalone application?

 I'm running  bin/start-hbase.sh . On the same machine hbase 0.19.3 works 
 fine.

 Sorry if this is a silly question :)

 --
 Savu Andrei

 Website: http://www.andreisavu.ro/





-- 
Savu Andrei

Website: http://www.andreisavu.ro/


Re: hbase/jython outdated

2009-08-27 Thread Andrei Savu
See comments below.

On Thu, Aug 27, 2009 at 7:58 PM, stackst...@duboce.net wrote:
 On Wed, Aug 26, 2009 at 3:29 AM, Andrei Savu savu.and...@gmail.com wrote:

 I have fixed the code samples and opened a feature request on JIRA for
 the jython command.

 https://issues.apache.org/jira/browse/HBASE-1796


 Thanks.  Patch looks good.  Will commit soon.   Did you update the jython
 wiki page?  It seems to be using old API still.

I have updated the Jython wiki page to use the latest API. After the
commit I will also update the instructions for running the sample code.



 Is there any python library for REST interface? How stable is the REST
 interface?


 Not that I know of (a ruby one, yes IIRC).  Write against stargate if you
 are going to do one since o.a.h.h.rest is deprecated in 0.20.0.


I am going to give it a try and post the results back here.

What about thrift? Is it going to be deprecated?

 St.Ack


-- 
Savu Andrei

Website: http://www.andreisavu.ro/


Re: hbase/jython outdated

2009-08-26 Thread Andrei Savu
I have fixed the code samples and opened a feature request on JIRA for
the jython command.

https://issues.apache.org/jira/browse/HBASE-1796

Until recently I used the python thrift interface, but it has some
serious issues with unicode. Currently I am searching for alternatives.

Is there any python library for REST interface? How stable is the REST
interface?

On Tue, Aug 25, 2009 at 4:18 PM, Jean-Daniel Cryansjdcry...@apache.org wrote:
 I can edit this page just fine but you have to be logged in to do
 that, anyone can sign in.

 Thx!

 J-D

 On Tue, Aug 25, 2009 at 7:02 AM, Andrei Savusavu.and...@gmail.com wrote:
 Hi,

 The Hbase/Jython ( http://wiki.apache.org/hadoop/Hbase/Jython ) wiki
 page is outdated.
 I want to edit it but the page is marked as immutable.

 I have attached a working sample and a patched version of bin/hbase
 with the jython command added.

 --
 Savu Andrei

 Website: http://www.andreisavu.ro/





-- 
Savu Andrei

Website: http://www.andreisavu.ro/


hbase/jython outdated

2009-08-25 Thread Andrei Savu
Hi,

The Hbase/Jython ( http://wiki.apache.org/hadoop/Hbase/Jython ) wiki
page is outdated.
I want to edit it but the page is marked as immutable.

I have attached a working sample and a patched version of bin/hbase
with the jython command added.

-- 
Savu Andrei

Website: http://www.andreisavu.ro/

import java.lang
from org.apache.hadoop.hbase import HBaseConfiguration, HTableDescriptor, HColumnDescriptor, HConstants
from org.apache.hadoop.hbase.client import HBaseAdmin, HTable
from org.apache.hadoop.hbase.io import BatchUpdate, Cell, RowResult

# First get a conf object.  This will read in the configuration 
# that is out in your hbase-*.xml files such as location of the
# hbase master node.
conf = HBaseConfiguration()

# Create a table named 'test' that has two column families,
# one named 'content', and the other 'anchor'.  The colons
# are required for column family names.
tablename = "test"

desc = HTableDescriptor(tablename)
desc.addFamily(HColumnDescriptor("content:"))
desc.addFamily(HColumnDescriptor("anchor:"))
admin = HBaseAdmin(conf)

# Drop and recreate if it exists
if admin.tableExists(tablename):
    admin.disableTable(tablename)
    admin.deleteTable(tablename)
admin.createTable(desc)

tables = admin.listTables()
table = HTable(conf, tablename)

# Add content to 'column:' on a row named 'row_x'
row = 'row_x'
update = BatchUpdate(row)
update.put('content:', 'some content')
table.commit(update)

# Now fetch the content just added, returns a byte[]
data_row = table.get(row, "content:")
data = java.lang.String(data_row.value, "UTF8")

print "The fetched row contains the value '%s'" % data

# Delete the table.
admin.disableTable(desc.getName())
admin.deleteTable(desc.getName())



Re: Feed Aggregator Schema

2009-08-17 Thread Andrei Savu
Thanks for your answer Peter.

I will give it a try using this approach and I will let you know how it works.
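
A small sketch of how the row keys from the suggestion quoted below might be
built (the yyyyMMddHHmmssSSS timestamp layout, the three-digit instance id and
the exact day-marker format are assumptions here, and the names are only
illustrative):

import datetime

INSTANCE_ID = '001'  # three-digit id of this running application instance

def event_key(event_list_name, when, seq=None):
    # Ascending-by-time key: eventListName/yyyyMMddHHmmssSSS-III[-NNN]
    ts = when.strftime('%Y%m%d%H%M%S') + '%03d' % (when.microsecond // 1000)
    key = '%s/%s-%s' % (event_list_name, ts, INSTANCE_ID)
    if seq is not None:
        key += '-%03d' % seq  # disambiguate events within the same millisecond
    return key

def day_marker_key(event_list_name, day):
    # Artificial row marking the start of a day, used as a scan start key.
    return '%s/%s' % (event_list_name, day.strftime('%Y%m%d') + '000000000')

print event_key('myfeed', datetime.datetime.utcnow())
print day_marker_key('myfeed', datetime.date.today())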

On Mon, Aug 17, 2009 at 10:26 AM, Peter
Rietzlerpeter.rietz...@smarter-ecommerce.com wrote:

 Hi

 In our project we are handling event lists where we have similar
 requirements. We do ordering by choosing our row keys wisely. We use the
 following key for our events (they should be ordered by time in ascending
 order):

 eventListName/MMddHHmmssSSS-000[-111]

 where eventListName is the name of the event list and 000 is a three-digit
 instance id to disambiguate between different running instances of the
 application, and -111 is an optional suffix to disambiguate events that
 occurred in the same millisecond on one instance.

 We additionally insert an artificial row for each day with the id

 eventListName/MMddHHmmssSSS

 This allows us to start scanning at the beginning of each day without
 searching through the event list.

 You need to be aware of the fact that if you have a very high load of
 inserts, then always one hbase region server is busy inserting while the
 others are idle ... if that's a problem for you, you have to find different
 keys for your purpose.

 You could also use an HBase index table but I have no experience with it and
 I remember an email on the mailing list that this would double all requests
 because the API would first lookup the index table and then the original
 table ??? (please correct me if this is not right ...)

 Kind regards,
 Peter



 Andrei Savu wrote:

 Hello,

 I am working on a project involving monitoring a large number of
 rss/atom feeds. I want to use hbase for data storage and I have some
 problems designing the schema. For the first iteration I want to be
 able to generate an aggregated feed (last 100 posts from all feeds in
 reverse chronological order).

 Currently I am using two tables:

 Feeds: column families Content and Meta : raw feed stored in Content:raw
 Urls: column families Content and Meta : raw post version stored in
 Content:raw and the rest of the data found in RSS stored in Meta

 I need some sort of index table for the aggregated feed. How should I
 build that? Is hbase a good choice for this kind of application?

 In other words: Is it possible( in hbase) to design a schema that
 could efficiently answer to queries like the one listed bellow?

 SELECT data FROM Urls ORDER BY date DESC LIMIT 100

 Thanks.

 --
 Savu Andrei

 Website: http://www.andreisavu.ro/



 --
 View this message in context: 
 http://www.nabble.com/Feed-Aggregator-Schema-tp24974071p25002264.html
 Sent from the HBase User mailing list archive at Nabble.com.





-- 
Savu Andrei

Website: http://www.andreisavu.ro/


Feed Aggregator Schema

2009-08-14 Thread Andrei Savu
Hello,

I am working on a project involving monitoring a large number of
rss/atom feeds. I want to use hbase for data storage and I have some
problems designing the schema. For the first iteration I want to be
able to generate an aggregated feed (last 100 posts from all feeds in
reverse chronological order).

Currently I am using two tables:

Feeds: column families Content and Meta : raw feed stored in Content:raw
Urls: column families Content and Meta : raw post version stored in
Content:raw and the rest of the data found in RSS stored in Meta

I need some sort of index table for the aggregated feed. How should I
build that? Is hbase a good choice for this kind of application?

In other words: is it possible (in hbase) to design a schema that can
efficiently answer queries like the one listed below?

SELECT data FROM Urls ORDER BY date DESC LIMIT 100

Thanks.
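
One common HBase pattern for this kind of newest-first query (a sketch of a
possible approach, not something settled in this thread) is to give the index
table a reverse-timestamp row key, so that a plain scan from the start of the
table returns the most recent posts first:

import time

MAX_MILLIS = 10 ** 13  # any constant larger than the largest timestamp you will store

def index_row_key(post_timestamp_millis, url):
    # Zero-padded (MAX_MILLIS - timestamp) sorts newest posts first;
    # the url suffix keeps keys unique for posts in the same millisecond.
    reverse_ts = MAX_MILLIS - post_timestamp_millis
    return '%013d/%s' % (reverse_ts, url)

# Scanning the index table from its first row and stopping after 100 rows
# then yields the 100 most recent posts across all feeds.
print index_row_key(int(time.time() * 1000), 'http://example.com/post-1')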

--
Savu Andrei

Website: http://www.andreisavu.ro/