[ http://issues.apache.org/jira/browse/NUTCH-368?page=all ]

Andrzej Bialecki updated NUTCH-368:
------------------------------------

    Attachment: msg.tgz

Implementation + JUnit tests.

> Message queueing system
> -----------------------
>
>                 Key: NUTCH-368
>                 URL: http://issues.apache.org/jira/browse/NUTCH-368
>             Project: Nutch
>          Issue Type: New Feature
>    Affects Versions: 0.9.0
>            Reporter: Andrzej Bialecki 
>         Assigned To: Andrzej Bialecki 
>         Attachments: msg.tgz
>
>
> This is an implementation of a filesystem-based message queueing system. The 
> motivation for this functionality is explained in HADOOP-490 - there is 
> nothing Nutch-specific in this implementation, so if it's considered 
> generally useful it could be moved there.
> Below are excerpts from the included javadocs.
> The model of the system is as follows:
>     * applications (including map-reduce jobs) may create their own separate 
> message queueing area. Alternatively, they can specifically ask for a named 
> message queue, belonging to a different application or existing as a 
> system-wide queue. Message queues are created under "/mq", in a subdirectory 
> named after the message queue id (for map-reduce jobs this is the job id; 
> otherwise it can be any other name passed as the job id to the constructor).
>       Please see the example for more information.
>     * a single unit of information passing through queues is a Msg, which has 
> a unique identifier (consisting of creation time and publisher name), a 
> string subject, and content (a Writable).
>     * a single MsgQueue in fact consists of any number of topics. There are 
> four predefined ones: in, out, err, and ctrl.
>     * messages are published to topics, which present a sequential view of 
> messages, sorted by msgId (which corresponds to their order of arrival).
>     * each message queue may periodically poll for changes 
> (MsgQueue.startPolling()), using a separate thread. Polling updates the list 
> of topics and messages. The poll interval is configurable and defaults to 5 seconds.
>     * each detected change in the queue (add/remove topic, add/remove 
> message) may be communicated to registered listeners. Out-of-band messages 
> are not supported in this version, but it's not too complicated to add them. 
> Applications can create listeners watching queues for added or deleted 
> messages, added or deleted topics, etc.
>     * each instance of MsgQueue using the same physical queue maintains its 
> own view of the queue, keeping track of topics and messages that it considers 
> "processed and discarded". In other words, multiple readers and creators may 
> modify queues, and each knows which messages it already processed and which 
> ones are new. In a similar fashion, instances may explicitly "remove" certain 
> topics from their view, even though these topics still physically exist and 
> are available for other instances (and later on they can "add" them to their 
> view again).
>       This somewhat complicated feature was implemented in order to support 
> multiple readers for the same message (e.g. many tasks per one mapred job). 
> Each task needs to register for the same queue, and if they didn't have their 
> own views of the queue, messages would be consumed by the first task that got 
> to them. As it is implemented now, each task may consume messages at its own 
> pace. At the end of the job, applications may elect to keep the queue around 
> or to destroy it (and thus remove all topics and messages in it).
>     * messages, topics and queues may be destroyed by any user, at which 
> point they are physically removed from the filesystem. All users update their 
> views gradually, each during its next poll operation.
>     * there is a command-line tool to examine and modify queues, and also to 
> retrieve and send simple text messages. You can run it like this:
>          bin/nutch org.apache.nutch.util.msg.MsgQueueTool ...many options...
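
The per-instance view semantics can be illustrated with a small in-memory sketch. This is not the NUTCH-368 code: the class names MsgViewSketch, MsgId, Topic and View are hypothetical, a topic is modelled as a TreeMap keyed by (creation time, publisher name) so that iteration follows msgId order, and each View keeps its own set of already-processed ids. No filesystem, polling or Writable content is involved; subjects stand in for whole messages.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/**
 * Hypothetical, single-threaded sketch (not the NUTCH-368 classes): one shared
 * topic, several readers, each with a private record of what it has consumed.
 */
public class MsgViewSketch {

    /** Message id: creation time plus publisher name; ordered by time first. */
    public static final class MsgId implements Comparable<MsgId> {
        final long created;
        final String publisher;

        public MsgId(long created, String publisher) {
            this.created = created;
            this.publisher = publisher;
        }

        @Override
        public int compareTo(MsgId o) {
            int c = Long.compare(created, o.created);
            return c != 0 ? c : publisher.compareTo(o.publisher);
        }
    }

    /** The shared "physical" topic: message subjects sorted by msgId. */
    public static final class Topic {
        final TreeMap<MsgId, String> messages = new TreeMap<>();

        public void publish(MsgId id, String subject) {
            messages.put(id, subject);
        }
    }

    /** One reader's private view: remembers which messages it already processed. */
    public static final class View {
        private final Topic topic;
        private final Set<MsgId> processed = new TreeSet<>();

        public View(Topic topic) {
            this.topic = topic;
        }

        /** Returns subjects this view has not seen yet, in msgId order. */
        public List<String> takeNew() {
            List<String> fresh = new ArrayList<>();
            for (Map.Entry<MsgId, String> e : topic.messages.entrySet()) {
                // Set.add returns false if this view already processed the id.
                if (processed.add(e.getKey())) {
                    fresh.add(e.getValue());
                }
            }
            return fresh;
        }
    }

    public static void main(String[] args) {
        Topic in = new Topic();
        View taskA = new View(in);
        View taskB = new View(in);

        in.publish(new MsgId(2000L, "job"), "second");
        in.publish(new MsgId(1000L, "job"), "first");
        System.out.println(taskA.takeNew()); // [first, second]

        in.publish(new MsgId(3000L, "job"), "third");
        System.out.println(taskA.takeNew()); // [third]
        System.out.println(taskB.takeNew()); // [first, second, third]
    }
}
```

Because consuming a message only updates the reader's own set instead of deleting the message, many tasks of one job can read the same topic independently, each at its own pace.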

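The polling and listener behaviour described above can be sketched as follows. Again this is a hypothetical illustration, not the MsgQueue API: an in-memory Set stands in for the queue directory, pollOnce() diffs the current listing against the previous snapshot and notifies QueueListener callbacks, and start(long) runs the poll on a separate thread through a ScheduledExecutorService. The description gives 5 seconds as the default interval; the names and signatures here are assumptions.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical sketch of poll-based change detection (not the NUTCH-368 API).
 * The shared "store" set stands in for the physical queue on the filesystem.
 */
public class PollingSketch {

    /** Callbacks for two of the change types named in the description. */
    public interface QueueListener {
        void messageAdded(String msgId);
        void messageRemoved(String msgId);
    }

    private final Set<String> store;                  // stand-in for the queue directory
    private final Set<String> lastSeen = new HashSet<>();
    private final List<QueueListener> listeners = new CopyOnWriteArrayList<>();
    private ScheduledExecutorService poller;

    public PollingSketch(Set<String> store) {
        this.store = store;
    }

    public void addListener(QueueListener l) {
        listeners.add(l);
    }

    /** One poll: compare the current listing with the previous snapshot. */
    public synchronized void pollOnce() {
        Set<String> current = new HashSet<>(store);
        for (String id : current) {
            if (!lastSeen.contains(id)) {
                for (QueueListener l : listeners) l.messageAdded(id);
            }
        }
        for (String id : lastSeen) {
            if (!current.contains(id)) {
                for (QueueListener l : listeners) l.messageRemoved(id);
            }
        }
        lastSeen.clear();
        lastSeen.addAll(current);
    }

    /** Runs pollOnce() repeatedly on a separate thread. */
    public void start(long intervalMillis) {
        poller = Executors.newSingleThreadScheduledExecutor();
        poller.scheduleWithFixedDelay(this::pollOnce, 0, intervalMillis, TimeUnit.MILLISECONDS);
    }

    public void stop() {
        if (poller != null) poller.shutdownNow();
    }

    public static void main(String[] args) {
        Set<String> store = ConcurrentHashMap.newKeySet();
        PollingSketch q = new PollingSketch(store);
        q.addListener(new QueueListener() {
            public void messageAdded(String id) { System.out.println("added " + id); }
            public void messageRemoved(String id) { System.out.println("removed " + id); }
        });
        store.add("1000-publisher");
        q.pollOnce();                 // added 1000-publisher
        store.remove("1000-publisher");
        q.pollOnce();                 // removed 1000-publisher
    }
}
```

Diffing snapshots on each poll means a listener only learns about changes at poll time, which matches the note that users update their views during their next poll operation.
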
-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

_______________________________________________
Nutch-developers mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-developers