[ http://issues.apache.org/jira/browse/NUTCH-368?page=all ]
Andrzej Bialecki updated NUTCH-368:
------------------------------------
Attachment: msg.tgz
Implementation + JUnit tests.
> Message queueing system
> -----------------------
>
> Key: NUTCH-368
> URL: http://issues.apache.org/jira/browse/NUTCH-368
> Project: Nutch
> Issue Type: New Feature
> Affects Versions: 0.9.0
> Reporter: Andrzej Bialecki
> Assigned To: Andrzej Bialecki
> Attachments: msg.tgz
>
>
> This is an implementation of a filesystem-based message queueing system. The
> motivation for this functionality is explained in HADOOP-490 - there is
> nothing Nutch-specific in this implementation, so if it's considered
> generally useful it could be moved there.
> Below are excerpts from the included javadocs.
> The model of the system is as follows:
> * applications (including map-reduce jobs) may create their own separate
> message queueing area. Alternatively, they can explicitly ask for a named
> message queue, belonging to a different application or existing as a
> system-wide queue. Message queues are created under "/mq", followed by the
> message queue id (for map-reduce jobs this is the job id, or it can be any
> other name passed as job id to the constructor).
> Please see the example for more information.
> * a single unit of information passing through queues is a Msg, which has
> a unique identifier (consisting of creation time and publisher name), string
> subject, and content (Writable).
> * a single MsgQueue in fact consists of any number of topics. There are
> four predefined ones: in, out, err, and ctrl.
> * messages are published to topics, which present a sequential view of
> messages, sorted by msgId (which corresponds to their order of arrival).
> * each message queue may periodically poll for changes
> (MsgQueue.startPolling()), using a separate thread. Polling updates the list
> of topics and messages. The poll interval is configurable and defaults to
> 5 seconds.
> * each detected change in the queue (add/remove topic, add/remove
> message) may be communicated to registered listeners. Out-of-band messages
> are not supported in this version, but it's not too complicated to add them.
> Applications can create listeners watching queues for newly added messages,
> or deleted messages, added topics or deleted topics, etc.
> * each instance of MsgQueue using the same physical queue maintains its
> own view of the queue, keeping track of topics and messages that it considers
> "processed and discarded". In other words, multiple readers and creators may
> modify queues, and each knows which messages it already processed and which
> ones are new. In a similar fashion, instances may willfully "remove" certain
> topics from their view, even though these topics still physically exist and
> are available for other instances (and later on they can "add" them to their
> view again).
> This somewhat complicated feature was implemented in order to support
> multiple readers for the same message (e.g. many tasks per one mapred job).
> Each task needs to register for the same queue, and if they didn't have their
> own views of the queue, messages would be consumed by the first task that got
> to them. As it is implemented now, each task may consume messages at its own
> pace. At the end of the job applications may elect to keep the queue around
> or to destroy it (and thus remove all topics and messages in it).
> * messages, topics and queues may be destroyed by any user, at which
> point they are physically removed from the filesystem. All other users
> update their views gradually, each during its next poll operation.
> * there is a command-line tool to examine and modify queues, and also to
> retrieve and send simple text messages. You can run it like this:
> bin/nutch org.apache.nutch.util.msg.MsgQueueTool ...many options...
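
The per-instance view described above (several readers consuming the same messages, each at its own pace) can be illustrated with a small in-memory sketch. Everything in it is an assumption made for illustration: the class names SharedTopic and ReaderView, the zero-padded id format, and the use of a TreeMap in place of files under "/mq" are hypothetical and are not taken from the API in msg.tgz.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class QueueViewSketch {

    /** Hypothetical stand-in for one topic's physical storage (not the real MsgQueue). */
    static class SharedTopic {
        // A TreeMap keeps entries sorted by key, mirroring the "sorted by msgId" view.
        private final TreeMap<String, String> messages = new TreeMap<>();

        /** Builds an id from creation time and publisher name, then stores the body. */
        synchronized void publish(long creationTime, String publisher, String body) {
            // Zero-padding the time makes lexicographic order match creation order.
            String msgId = String.format("%020d-%s", creationTime, publisher);
            messages.put(msgId, body);
        }

        /** Physically removes a message; every reader notices on its next poll. */
        synchronized void destroy(long creationTime, String publisher) {
            messages.remove(String.format("%020d-%s", creationTime, publisher));
        }

        /** Returns a copy so that readers never iterate over the live map. */
        synchronized TreeMap<String, String> snapshot() {
            return new TreeMap<>(messages);
        }
    }

    /** One reader's private view: it remembers which ids it has already processed. */
    static class ReaderView {
        private final SharedTopic topic;
        private final Set<String> processed = new HashSet<>();

        ReaderView(SharedTopic topic) {
            this.topic = topic;
        }

        /** Returns unseen message bodies in id order and marks them for this view only. */
        List<String> pollNew() {
            TreeMap<String, String> current = topic.snapshot();
            // Forget ids that were physically destroyed since the last poll.
            processed.retainAll(current.keySet());
            List<String> fresh = new ArrayList<>();
            for (Map.Entry<String, String> e : current.entrySet()) {
                // Set.add returns true only for ids this view has not seen before.
                if (processed.add(e.getKey())) {
                    fresh.add(e.getValue());
                }
            }
            return fresh;
        }
    }

    public static void main(String[] args) {
        SharedTopic in = new SharedTopic();
        in.publish(2L, "taskB", "second");
        in.publish(1L, "taskA", "first");

        ReaderView one = new ReaderView(in);
        ReaderView two = new ReaderView(in);

        System.out.println(one.pollNew()); // [first, second]
        in.publish(3L, "taskA", "third");
        System.out.println(one.pollNew()); // [third]
        System.out.println(two.pollNew()); // [first, second, third]
    }
}
```

Because the "processed" set lives in each ReaderView rather than in the shared topic, a message read by one task stays visible to the others until someone physically destroys it, which is the behaviour the description attributes to separate MsgQueue instances.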