Hello Mat (again),

I followed your advice and wrote my own Scheduler and used some of your
aeolus code too! It works just fine and it makes my testing much easier
(fewer terminals :-D ). However, now I see that my input rate is limited
by the disk bandwidth. Matters get worse since I am working on AWS and the
nodes that have the files use EBS storage.

Do you happen to have any advice on how I can avoid the disk latency and
achieve higher input rates? I know that buffering is one way to go.
However, I am afraid that even if I add additional threads to my code, they
will be blocked every time the worker context switches my task.
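
A minimal sketch of the buffering idea, assuming plain line-oriented input
files. The class PrefetchingLineReader and its methods are hypothetical
names; it uses only the JDK and no Storm API. A background thread reads
ahead into a bounded queue, and the consumer polls without blocking:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Hypothetical helper: prefetches lines from a file on a background thread. */
public class PrefetchingLineReader implements AutoCloseable {
    private final BlockingQueue<String> buffer;
    private final Thread reader;
    // Set by the reader thread once it has stopped producing lines.
    private volatile boolean finished = false;

    public PrefetchingLineReader(Path file, int capacity) {
        this.buffer = new ArrayBlockingQueue<>(capacity);
        this.reader = new Thread(() -> {
            try (BufferedReader in = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
                String line;
                while ((line = in.readLine()) != null) {
                    buffer.put(line); // blocks while the buffer is full
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // close() was called
            } finally {
                finished = true;
            }
        }, "file-prefetcher");
        reader.setDaemon(true);
        reader.start();
    }

    /** Non-blocking: returns the next line, or null if nothing is buffered right now. */
    public String poll() {
        return buffer.poll();
    }

    /** True once the reader thread has stopped and the buffer is drained. */
    public boolean isExhausted() {
        return finished && buffer.isEmpty();
    }

    @Override
    public void close() {
        reader.interrupt();
    }
}
```

The idea would be to call poll() from the spout's nextTuple() and return
right away when it yields null. This does not make the disk any faster; it
only overlaps the reads with tuple processing, so it helps only if the disk
can sustain the target rate on average.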

Thanks,
Nick

On Tue, Sep 15, 2015 at 10:20 AM, Nick R. Katsipoulakis <
[email protected]> wrote:

> Hello again,
>
> Thank you for the link and the info. I am going to look into this in more
> detail.
>
> Cheers,
> Nick
>
> On Tue, Sep 15, 2015 at 9:43 AM, Matthias J. Sax <[email protected]> wrote:
>
>> Hi Nick,
>>
>> thanks. I like Aeolus, too ;)
>>
>> If you want to make sure that a specific spout/bolt is scheduled to a
>> specific node, you need to provide a custom scheduler.
>>
>> See here for an example:
>>
>> https://xumingming.sinaapp.com/885/twitter-storm-how-to-develop-a-pluggable-scheduler/
>>
>>
>> -Matthias
>>
>> On 09/15/2015 03:31 PM, Nick R. Katsipoulakis wrote:
>> > Hey Matthias,
>> >
>> > I apologize for the late response, but I was busy with some additional
>> > changes to my code. Thank you very much for your reply and the code
>> > snippets you provided (love the name "aeolus" by the way :-) ). The only
>> > reason that I have not created my File-provider as a spout is because I
>> > do not always know on which node my spout is spawned. Therefore, there
>> > might be a setting in which the file with the data is not co-located
>> > with the spout. Do you have any work-around for this problem?
>> >
>> > Thanks again,
>> > Nick
>> >
>> > On Thu, Sep 10, 2015 at 5:15 PM, Matthias J. Sax <[email protected]
>> > <mailto:[email protected]>> wrote:
>> >
>> >     Hi,
>> >
>> >     You can simply read the file directly in your Spout. This is an
>> >     implementation that reads multiple files concurrently (with respect
>> >     to a timestamp attribute that is included in the input record). Of
>> >     course, you can simplify the code if you don't have a timestamp
>> >     attribute and just want to read a single file or multiple files one
>> >     after another:
>> >
>> >
>> https://github.com/mjsax/aeolus/blob/master/queries/lrb/src/main/java/de/hub/cs/dbis/lrb/operators/FileReaderSpout.java
>> >
>> >     Furthermore, I use a Spout-Wrapper for controlling the ingestion
>> >     rate (i.e., spout output rate). If you want to get rid of
>> >     nested/layered/wrapped Spouts, just merge the code of both
>> >     implementations. I personally prefer the wrapper approach as it is
>> >     very flexible...
>> >
>> >
>> https://github.com/mjsax/aeolus/blob/master/queries/utils/src/main/java/de/hub/cs/dbis/aeolus/spouts/FixedStreamRateDriverSpout.java
>> >
>> >     Feel free to use and/or modify both.
>> >
>> >     -Matthias
>> >
>> >
>> >     On 09/10/2015 10:18 PM, Nick R. Katsipoulakis wrote:
>> >     > Hello,
>> >     >
>> >     > I am currently running some experiments and in order to send data
>> >     > to my spouts, I do the following:
>> >     >
>> >     > I spawn external processes which read the data from files (on
>> >     > disk) and send them through TCP sockets to Spouts. I do this
>> >     > because (a) I want to control the input rate of the spouts, and
>> >     > (b) so that I can use previously gathered data for my experiments.
>> >     >
>> >     > Unfortunately, when I want to maintain input rates greater than 16
>> >     > thousand tuples per second, I see that my scheme is not fast
>> >     > enough, and the input rate is capped. Do you think that there is a
>> >     > better way to send (replay) previously gathered data into my
>> >     > topology?
>> >     >
>> >     > Thanks,
>> >     > Nick
>> >
>> >
>> >
>> >
>> > --
>> > Nikolaos Romanos Katsipoulakis,
>> > University of Pittsburgh, PhD candidate
>>
>>
>
>
> --
> Nikolaos Romanos Katsipoulakis,
> University of Pittsburgh, PhD candidate
>



-- 
Nikolaos Romanos Katsipoulakis,
University of Pittsburgh, PhD student
