Re: Introduce lag into a topology spout to process the events with a delay.

2017-02-23 Thread Ankur Garg
Thanks Sandeep for explaining the context. I believe Storm does not provide
a solution for this out of the box (unless you use Trident and process
events in batches). I am not very familiar with the Kafka spout, so I can't
say for sure. So I am going to suggest some things which can be done
(again, I may be assuming things).

It appears (combining all the information) that stream B can be real time
with respect to A or delayed by up to 24 hours. While I do not know the SLA
or the complete use case, adding a delay or sleep should on average be fine
(stream B is already delayed; does it matter if it gets delayed by another
minute or two?).


Alternatively, since you are using a normal spout, I assume each event is
largely independent of the others (as you are processing one event at a
time). Assuming the only bottleneck is that your writes to the key/value
store are not fast enough (you have the data for enriching from Stream A,
you just cannot write it in time for it to be available for the data from
Stream B), you may write the events which could not be enriched back to the
same or a different topic instead of sending them to the deferred queue. By
the time they come around again, the chances of the data being available in
the key/value store are higher. The events which are enriched can be emitted
to a different bolt. This does not work if the enriching data is not present
in Stream A at all.

If the above solution does not suit you, do you know beforehand which
stream is delayed and by how much? To elaborate: based on the delay you can
read them in different spouts from different Kafka topics, or process them
in separate bolts, and introduce a sleep/delay in each bolt according to its
delay.

With the information you have provided, though, I would rather check after
each pass (after enriching B with A and before writing to the deferred
queue) the percentage of events enriched (if it can be computed) and retry
enriching until the desired percentage is reached. The rest can then be
pushed to the deferred queue.
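
A minimal, untested sketch of that retry-then-defer idea (illustration only:
the KeyValueStore interface, the stream names, the retry count and the
backoff are all assumptions, and the package names assume Storm 1.x; 0.9.x
releases use backtype.storm instead of org.apache.storm):

import java.util.Map;
import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;
import org.apache.storm.utils.Utils;

public class EnrichmentBolt extends BaseRichBolt {

    /** Hypothetical client for the key/value store that holds Stream A data. */
    public interface KeyValueStore extends java.io.Serializable {
        String lookup(String key);
    }

    public static final String ENRICHED_STREAM = "enriched";
    public static final String DEFERRED_STREAM = "deferred";

    private final KeyValueStore store;
    private OutputCollector collector;

    public EnrichmentBolt(KeyValueStore store) {
        this.store = store;
    }

    @Override
    public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
    }

    @Override
    public void execute(Tuple tuple) {
        String key = tuple.getStringByField("key");
        String enrichment = null;
        // Retry the lookup a few times before declaring the event "not enrichable yet".
        for (int attempt = 0; attempt < 3 && enrichment == null; attempt++) {
            enrichment = store.lookup(key);
            if (enrichment == null) {
                Utils.sleep(100);   // small backoff between attempts
            }
        }
        if (enrichment != null) {
            collector.emit(ENRICHED_STREAM, tuple, new Values(key, enrichment));
        } else {
            // Route to the stream that feeds the deferred Kafka topic.
            collector.emit(DEFERRED_STREAM, tuple, new Values(key));
        }
        collector.ack(tuple);
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declareStream(ENRICHED_STREAM, new Fields("key", "enrichment"));
        declarer.declareStream(DEFERRED_STREAM, new Fields("key"));
    }
}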

Thanks
Ankur

On Thu, 23 Feb 2017 at 14:04 Sandeep Samudrala <sandys...@gmail.com> wrote:

Running both streams in same topology doesn't work for me as the stream B
events can come very late up to 24 hours.

Sleep doesn't work as it will slow down the topology, I want the topology
to be running as fast as possible but only with delay so as to ensure
enrichment in the first trial itself.

To give more context: the events coming from Stream B can come as late as up
to 24 hours and hence I keep the events in a key-value store. I am using
Normal Spout and not Trident as I will be handling event by event(record by
record). I don't think blocking at a tuple level is a good idea as it will
slow down the processing of events.

For now, I am working with a hack to check for the current backlog with the
message header in the kafka event to look for events to be processed with a
delay. Although it works to some extent I am still not able to get it fully
working.

Please let me know If I can add more context.

On Wed, Feb 22, 2017 at 3:23 PM, Ankur Garg <ankurga...@gmail.com> wrote:

Are u processing the events in Storm topology in batch (Trident Spout) or
Normal Spout .

The way I see (this is very trivial and am sure you would have thought
about it)  is if u can introduce sleep in the nextTuple method for Stream B
(in case of Normal Spout) or increasing the value *topology.max.spout.pending
in case of Trident can help you achieve better %age . You can also think of
making nextTuple blocking (although not recommended in general as
everything runs in a single thread so ur ack/fail/emit can get delayed but
I believe it can be fine in your case). *
*Alternatively , since almost both the streams are real time , u could read
from both streams in the same spout and then do enriching instead of
writing the stream A into some key value store and then perform enriching .*

*Obviously , I am making lot of assumptions here since they are not
mentioned in the question and I am not aware of full context of the problem
too . *

*Hope this helps*
*Ankur*

On Wed, 22 Feb 2017 at 11:22 Sandeep Samudrala <sandys...@gmail.com> wrote:

Yes. I am reading both the streams from kafka as part of a topology.

On Wed, Feb 22, 2017 at 12:39 AM, Ankur Garg <ankurga...@gmail.com> wrote:

Hi Sandeep ,

One question :- how are you reading Streams B and A . Are u reading from
some messaging queue (Kafka , Rabbit Mq etc.) with some spout (as part of
some topology) reading from them . Please confirm .

Thanks
Ankur

On Tue, 21 Feb 2017 at 15:28 Sandeep Samudrala <sandys...@gmail.com> wrote:

Hello,
 I have two streams A and B. I need to enrich events coming from stream B
with events coming from A and I store events coming from A in a key-value
store to enrich events from B. Events that doesn't get enriched are sent to
a deferred queue(kafka stream) and are read back later.

Most of the time the events from Stream B are sent to the defer queue
because of a slight delay in storing the events into the key-value store from
Stream A, and events coming into A and B are almost real time.

Re: Introduce lag into a topology spout to process the events with a delay.

2017-02-22 Thread Ankur Garg
Are you processing the events in the Storm topology in batches (with a
Trident spout) or with a normal spout?

The way I see it (this is very trivial and I am sure you have already
thought about it), you can introduce a sleep in the nextTuple method for
Stream B (in the case of a normal spout), or increase the value of
topology.max.spout.pending in the case of Trident; either can help you
achieve a better enrichment percentage. You can also think of making
nextTuple blocking (not recommended in general, since everything runs in a
single thread and your ack/fail/emit handling can get delayed, but I believe
it could be fine in your case).
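
A rough, untested sketch of those two knobs (the delay value, the pending
limit and the pollStreamB placeholder are illustrative assumptions; package
names assume Storm 1.x, older releases use backtype.storm):

import java.util.Map;
import org.apache.storm.Config;
import org.apache.storm.spout.SpoutOutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichSpout;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Values;
import org.apache.storm.utils.Utils;

public class DelayedStreamBSpout extends BaseRichSpout {

    private SpoutOutputCollector collector;

    @Override
    public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
        this.collector = collector;
    }

    @Override
    public void nextTuple() {
        // Illustrative throttle: a spout runs on a single thread, so this
        // delay also postpones ack/fail handling for this spout instance.
        Utils.sleep(100);
        String event = pollStreamB();
        if (event != null) {
            // The event string doubles as the message id here.
            collector.emit(new Values(event), event);
        }
    }

    private String pollStreamB() {
        return null;   // placeholder; a real spout would consume Stream B from Kafka here
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("event"));
    }

    // The other knob, applied when building the topology configuration
    // (the value 500 is illustrative):
    public static Config throttledConfig() {
        Config conf = new Config();
        conf.setMaxSpoutPending(500);   // caps un-acked tuples in flight per spout task
        return conf;
    }
}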
Alternatively, since both streams are almost real time, you could read from
both streams in the same spout and do the enriching there, instead of
writing Stream A into a key/value store and then enriching.

Obviously, I am making a lot of assumptions here, since they are not
mentioned in the question and I am not aware of the full context of the
problem either.

Hope this helps
Ankur

On Wed, 22 Feb 2017 at 11:22 Sandeep Samudrala <sandys...@gmail.com> wrote:

> Yes. I am reading both the streams from kafka as part of a topology.
>
> On Wed, Feb 22, 2017 at 12:39 AM, Ankur Garg <ankurga...@gmail.com> wrote:
>
> Hi Sandeep ,
>
> One question :- how are you reading Streams B and A . Are u reading from
> some messaging queue (Kafka , Rabbit Mq etc.) with some spout (as part of
> some topology) reading from them . Please confirm .
>
> Thanks
> Ankur
>
> On Tue, 21 Feb 2017 at 15:28 Sandeep Samudrala <sandys...@gmail.com>
> wrote:
>
> Hello,
>  I have two streams A and B. I need to enrich events coming from stream B
> with events coming from A and I store events coming from A in a key-value
> store to enrich events from B. Events that doesn't get enriched are sent to
> a deferred queue(kafka stream) and are read back later.
>
> Most of the the time the events from Stream B are sent to defer queue
> because of bit delay in storing the events into a key-value store from
> Stream A and events coming into A and B are almost real time.
>
> I want to introduce a delay into reading into my spout reading from Stream
> B so as to make sure higher % of events get enriched in first shot rather
> than getting enriched post reading from defer queue. I tried putting a
> check on the lag and controlling on the backlog queue to get a hold but
> didn't seemed right and would enter into draining and other issues.
>
> Is there a way in the kafka consumer or Storm spout to control the data in
> flow to come with delay for processing?
>
> Thanks,
> -sandeep.
>
>
>


Re: Introduce lag into a topology spout to process the events with a delay.

2017-02-21 Thread Ankur Garg
Hi Sandeep ,

One question: how are you reading Streams B and A? Are you reading from
some messaging queue (Kafka, RabbitMQ, etc.) with a spout (as part of some
topology)? Please confirm.

Thanks
Ankur

On Tue, 21 Feb 2017 at 15:28 Sandeep Samudrala  wrote:

> Hello,
>  I have two streams A and B. I need to enrich events coming from stream B
> with events coming from A and I store events coming from A in a key-value
> store to enrich events from B. Events that doesn't get enriched are sent to
> a deferred queue(kafka stream) and are read back later.
>
> Most of the the time the events from Stream B are sent to defer queue
> because of bit delay in storing the events into a key-value store from
> Stream A and events coming into A and B are almost real time.
>
> I want to introduce a delay into reading into my spout reading from Stream
> B so as to make sure higher % of events get enriched in first shot rather
> than getting enriched post reading from defer queue. I tried putting a
> check on the lag and controlling on the backlog queue to get a hold but
> didn't seemed right and would enter into draining and other issues.
>
> Is there a way in the kafka consumer or Storm spout to control the data in
> flow to come with delay for processing?
>
> Thanks,
> -sandeep.
>


Re: Storm-related interview on Linux.com

2016-10-13 Thread Ankur Garg
Thanks Julien for sharing this, and all the best for the conference :).

On Wed, 12 Oct 2016 at 22:24 Julien Nioche 
wrote:

> Hi,
>
> FYI, Linux.com have just published an interview [1] I gave on StormCrawler
>  in advance of the presentation I'll be giving
> at ApacheCon [2] in November.
>
> Anyone going to ApacheCon in the Storm community, feel free to find me
> there and have a chat.
>
> Have a nice day
>
> Julien
>
> [1]
> https://www.linux.com/news/stormcrawler-open-source-sdk-building-web-crawlers-apachestorm
>
> [2]
> https://apachebigdataeu2016.sched.org/event/8Tzb/low-latency-web-crawling-on-apache-storm
>
> --
>
> *Open Source Solutions for Text Engineering*
>
> http://www.digitalpebble.com
> http://digitalpebble.blogspot.com/
> #digitalpebble 
>


Apache Storm integration with Spring

2016-05-25 Thread Ankur Garg
Hi All ,

Some time back I was working with Apache Storm on one of our projects.

Basically, the need was to run a topology on a Storm cluster which consumes
data from RabbitMQ and does some processing on it.

The processing also involved ingesting the feed from RabbitMQ into
relational and non-relational databases, among other things.

Traditionally, we use Spring and other Pivotal frameworks heavily in our
projects. For example, to make connections and ingest data into MySQL and
Mongo we used Spring Data JPA.

Similarly, to read feeds from RabbitMQ we used the Spring AMQP framework,
which internally uses the RabbitMQ Java client.

Considering the above, we thought of using Spring with Storm to accomplish
all of this.

Unfortunately, at the moment there is no real integration of Storm and
Spring documented anywhere.

There are some examples on GitHub and elsewhere (for example
https://github.com/granthenke/storm-spring-sample), but all of them use
Spring to create and inject topology definitions, whereas our need was to
create a Spring context (one which holds all information about database
connections, broker connections and some bean injections) that is available
throughout the lifecycle of a topology.

Finally, I found a way to create the context and integrate with the Spring
framework.

I have started a short project
(https://github.com/ankurgarg1986/Spring-Storm) to write up our integration
so that it can be useful to other developers who wish to use Apache Storm
with Spring.

Please share your thoughts and use cases (if any) so that I can use them to
drive this short project.

Thanks
Ankur


Re: Multiple output streams: all go through the same connection?

2016-01-06 Thread Ankur Garg
Hi John ,

I am not aware of Storm's internal workings, so I am not sure about this.
But since no one has answered this so far, I will try my hand.

I believe spouts and bolts communicate through an intermediate queue
(ZeroMQ). Assuming this, it should always be a single connection to ZeroMQ.

But like I said, this is what I think should be the case. Waiting for one of
the members to validate or invalidate this.

Thanks
Ankur

On Wed, Jan 6, 2016 at 7:47 AM, John Yost  wrote:

> Hi Everyone,
>
> Had a co-worker ask me today if, I define two output streams for a spout
> or bolt, do they both go out over the same connection to a downstream bolt,
> or are there separate connections for each output stream.
>
> Anyone know?
>
> Thanks
>
> --John
>


Re: Sending Some Context when Failing a tuple.

2016-01-04 Thread Ankur Garg
My bad, Ravi, that I could not explain my point properly.

Like you said, when you call outputCollector.fail(tuple),
ISpout.fail(Object messageId) gets invoked.

Maybe you are passing a String as this messageId, but the argument itself is
an Object.

So what I meant to say is that you can pass your own object as the messageId
(perhaps named better, for clarity).

To elaborate, let's say I pass a simple Java bean from spout to bolt.

Here is my sample bean:

import java.io.Serializable;

public class Bean implements Serializable {

    private String a;              // your custom attributes, which must be serializable
    private int b;
    private int c;
    private String msgId;
    private String failureReason;  // failure reason, to be populated inside the bolt when the tuple fails

    // getter/setter methods

}

In your spout, inside nextTuple (taking the example from the word-count
topology in the storm-starter project):


public void nextTuple() {

    try {
        final String[] words = new String[] { "nathan", "mike", "jackson",
                "golda", "bertels" };
        final Random rand = new Random();
        final String word = words[rand.nextInt(words.length)];

        String msgId = word + UUID.randomUUID().toString();

        Bean b = new Bean();
        b.setA("String A");
        b.setB(123);
        b.setC(456);
        b.setMsgId(msgId);   // not necessary to do

        _collector.emit(new Values(word, b), b);

        LOG.info("Exit nextTuple Method ");
    } catch (Exception ex) {
        ex.printStackTrace();
    }

    LOG.info("Final Exit nextTuple method ");
}


See the _collector.emit call: I am passing the same bean object as the
messageId object.

And the declareOutputFields method is:

public void declareOutputFields(OutputFieldsDeclarer declarer) {
    declarer.declare(new Fields("word", "correlationObject"));
}


Now, in my bolt, in the execute method:

@Override
public void execute(Tuple tuple) {
    Bean b1 = (Bean) tuple.getValue(1);
    b1.setFailureReason("It failed ");
    _collector.fail(tuple);
}


Now, when the _collector.fail method is called, the spout's fail method gets
invoked:

public void fail(Object msgId) {
    Bean b1 = (Bean) msgId;
    String failureReason = b1.getFailureReason();
}


You will see the failureReason you set inside your bolt received here inside
the fail method.

Again, I am not saying that this is the best way to achieve what you want; I
am just proposing one way it can be done.

Hope this helps.


Thanks

Ankur


On Mon, Jan 4, 2016 at 2:24 PM, Ravi Sharma <ping2r...@gmail.com> wrote:

> Hi Ankur,
> Various Storm API for this are like this
>
> Bolts recieve Tuple, which is immutable object.
> Once something fails in Bolt we call outputCollector.fail(tuple)
>
> which in turn invoke Spout's ISpout.fail(Object messageId) method.
>
>
>
> now spout gets only the messageId back(which i created when processing
> started from that spout), so not sure how some other info either some
> object or boolean or integer or String can be passed from Bolts to Spout
> after failing.
>
> Thanks
> Ravi
>
>
>
>
>
>
>
>
>
> On Sun, Jan 3, 2016 at 7:00 PM, Ankur Garg <ankurga...@gmail.com> wrote:
>
>> Hi Ravi,
>>
>> May be a very naive answer here but still posting it  :
>>
>> I am assuming that once u catch the failed Tuple inside the fail method
>> of your spout  all you need to decide is whether this tuple should be
>> replayed or not .
>>
>> I am assuming the object you are sending between spout and bolts are
>> already serializable .  How about adding this extra information fore
>> replaying the tuple to the same Object  (Since you already thinking of
>> storing it in some external storage I am assuming this info too is
>> serializable) . It may be a simple boolean flag too .
>>
>> For ex :
>>
>> Original Tuple u  r sending may be
>>
>> OrigTuple implements Serializable
>> {
>> ObjectA a;
>> ObjectB b;
>> }
>>
>> I am assuming a , b are all serializable or marked transient .
>>
>> Now in case of failure you can attach Object C too which contains failure
>> information or simple boolean Flag which implies to the spout that it needs
>> to be played . For the ones which dont need to be played it takes default
>> value as false .
>>
>>
>> Like I said before , it is a very simple thought  but I could think of
>> this may work based on info u provided and assumptions I made.
>>
>> Thanks
>> Ankur
>>
>>
>>
>> On Sun, Jan 3, 2016 at 3:59 PM, Ravi Sharma <ping2r...@gmail.com> wrote:
>>
>>> Hi All,
>>> I would like to sen

Re: Sending Some Context when Failing a tuple.

2016-01-03 Thread Ankur Garg
Hi Ravi,

Maybe a very naive answer here, but still posting it:

I am assuming that once you catch the failed tuple inside the fail method of
your spout, all you need to decide is whether this tuple should be replayed
or not.

I am assuming the objects you are sending between spout and bolts are
already serializable. How about adding this extra information for replaying
the tuple to the same object (since you are already thinking of storing it
in some external store, I am assuming this info is serializable too)? It
could even be a simple boolean flag.

For example, the original tuple you are sending may be:

public class OrigTuple implements Serializable {
    ObjectA a;
    ObjectB b;
}

I am assuming a and b are serializable or marked transient.

Now, in case of failure, you can attach an object C too which contains the
failure information, or a simple boolean flag which tells the spout that the
tuple needs to be replayed. For the ones which do not need to be replayed it
keeps the default value of false.

Like I said before, it is a very simple thought, but I think it may work
based on the info you provided and the assumptions I made.

Thanks
Ankur



On Sun, Jan 3, 2016 at 3:59 PM, Ravi Sharma  wrote:

> Hi All,
> I would like to send some extra information back to spout when a tuple is
> failed in some Bolt, so that Spout can decide if it want it to replay or
> just put the message into queue outside storm for admins to view.
>
> So is there any way i can attach some more information when sending back
> failed tuple to spout.?
>
> One way i can think of is keeping such information outside storm in some
> datastore, with Tuple id and spout can lookup that, but looking for some
> way to do it via storm without bringing in other integration/datastore.
>
>
> Thanks
> Ravi.
>


Re: Log Location for Storm UI

2015-10-24 Thread Ankur Garg
Please ignore my earlier mail; perhaps I just pasted the message without
reading it fully. Changing *instance-1.c.apache-storm.internal* to my
hostname did the trick :P. I can now see those log messages.

On Sat, Oct 24, 2015 at 7:34 PM, Ankur Garg <ankurga...@gmail.com> wrote:

> Thanks Satish for replying .
>
> My logviewer is running but still when i click on the port on the storm ui
> it returns  "The server at *instance-1.c.apache-storm.internal* can't be
> found, because the DNS lookup failed" .
>
> Also , can u  or anyone here tell me why the exceptions which can be seen
> in storm ui page cant be seen inside worker log files :( . Does Storm UI
> reads these exceptions from a different source than worker.*.log file .
>
> Thanks
> Ankur
>
> On Sat, Oct 24, 2015 at 7:12 PM, Satish Mittal <satish.mit...@inmobi.com>
> wrote:
>
>> You probably need to set up logviewer daemon.
>>
>> On Sat, Oct 24, 2015 at 5:28 PM, Ankur Garg <ankurga...@gmail.com> wrote:
>>
>>> Hi ,
>>>
>>> I have set up a single node Storm Cluster to run my topologies .
>>> Unfortunately , due to reason unknown I am not seeing logs inside
>>> worker-*.log  when I start a spring application inside the Storm Spout and
>>> bolts .
>>>
>>> However , in the Storm UI I do see any exceptions occuring inside my
>>> spouts and bolts and log messages  present for the same in the UI (image
>>> attached below) . Moreover,  If i click on the error port below (6703) , I
>>> get page not found error .
>>>
>>> Can someone tell me where does StormUI print or gets its log from . I
>>> believe
>>>
>>> this should be the page where rest of my missing logs can be present .
>>>
>>> [image: Inline image 1]
>>>
>>>
>>> Please help .
>>>
>>> Thanks
>>> Ankur
>>>
>>
>>
>
>
>


Re: Log Location for Storm UI

2015-10-24 Thread Ankur Garg
Thanks Satish for replying .

My logviewer is running, but when I click on the port in the Storm UI it
still returns "The server at *instance-1.c.apache-storm.internal* can't be
found, because the DNS lookup failed".

Also, can you or anyone here tell me why the exceptions which can be seen on
the Storm UI page can't be seen inside the worker log files :( . Does the
Storm UI read these exceptions from a different source than the
worker-*.log files?

Thanks
Ankur

On Sat, Oct 24, 2015 at 7:12 PM, Satish Mittal <satish.mit...@inmobi.com>
wrote:

> You probably need to set up logviewer daemon.
>
> On Sat, Oct 24, 2015 at 5:28 PM, Ankur Garg <ankurga...@gmail.com> wrote:
>
>> Hi ,
>>
>> I have set up a single node Storm Cluster to run my topologies .
>> Unfortunately , due to reason unknown I am not seeing logs inside
>> worker-*.log  when I start a spring application inside the Storm Spout and
>> bolts .
>>
>> However , in the Storm UI I do see any exceptions occuring inside my
>> spouts and bolts and log messages  present for the same in the UI (image
>> attached below) . Moreover,  If i click on the error port below (6703) , I
>> get page not found error .
>>
>> Can someone tell me where does StormUI print or gets its log from . I
>> believe
>>
>> this should be the page where rest of my missing logs can be present .
>>
>> [image: Inline image 1]
>>
>>
>> Please help .
>>
>> Thanks
>> Ankur
>>
>
>


Re: binding to port 0.0.0.0/0.0.0.0:2181 !!

2015-10-24 Thread Ankur Garg
kill -9 <pid>

Ex :

sudo netstat -netulp | grep 2181

tcp    0    0 0.0.0.0:2181    0.0.0.0:*    LISTEN    0    919692    17029/java

then kill -9 17029




On Sat, Oct 24, 2015 at 8:04 PM, researcher cs 
wrote:

> Thanks for replying , but the last command kill -9 what ?
>
> On Sat, Oct 24, 2015 at 4:08 PM, Youzha  wrote:
>
>> Maybe u can try this command.
>>
>> Netstat -netulp | grep 2181
>>
>> Then check the pid and kill with kill -9 pid.
>> Cmiiw
>> On Oct 24, 2015 8:53 PM, "researcher cs" 
>> wrote:
>>
>>> I'm new to zookeeper and tried to start server by using
>>>
>>> ./zkServer.sh start
>>>
>>> but got this
>>>
>>> binding to port 0.0.0.0/0.0.0.0:2181 2015-10-23 22:08:09,952 - FATAL
>>> [main:ZooKeeperServerMain@62] - Unexpected exception, exiting abnormally
>>> java.net.BindException: Address already in use
>>>
>>> is there any port which i can use it ?
>>>
>>> or is there any solution to solve this problem ?
>>>
>>> i tried to kill process in the port 2181 by using this
>>>
>>> sudo fuser -k 2181/tcp
>>>
>>> but problem is still
>>>
>>> my zoo.cfg is
>>>
>>> tickTime=2000
>>> initLimit=10
>>> syncLimit=5
>>> dataDir=/var/zookeeper
>>> clientPort=2181
>>>
>>>
>


Application Hangs inside Spout and Bolts

2015-10-20 Thread Ankur Garg
Hi ,

In my spout I am starting a Spring Boot application which basically
initialises all the beans and classes.

Whenever I start the topology on the remote cluster, the moment this Spring
application runs inside the open method of my spout, the topology hangs and
I see no errors and no output from my application in the worker logs. After
some time the same messages appear in my log file (albeit with a new
timestamp). What may be causing this issue in the cluster?

On the local cluster, though, which I run inside Eclipse, everything works
fine.

Also, where can I see the log messages from my application? I can only see
log messages from spouts and bolts inside the worker logs. But what about
the other logs which my application prints? Is there something I need to
configure at my end?

This is what I am doing inside the open method of the spout:

LOG.info("Inside Synchrnized block for Storm Context");

  ApplicationContext context = new
AnnotationConfigWebApplicationContext();

  LOG.info("Inside Create Application Context block for Storm
Context");

  SpringApplicationBuilder appBuilder = new
SpringApplicationBuilder(Application.class);

  LOG.info("Loading Profiles ");

  context = (ApplicationContext) appBuilder.profiles("common",
"common_rabbitmq","common_mongo_db", "common_mysql_db",


"common_topology").run();

  LOG.info("Set the Storm Context ");


In my worker logs I can only see the logs generated by

LOG.info("Inside Synchrnized block for Storm Context");
LOG.info("Loading Profiles ");

The application hangs after printing these. It never returns from the run
method and never prints LOG.info("Set the Storm Context ");

PS: the run() call here, appBuilder.profiles("common", "common_rabbitmq",
"common_mongo_db", "common_mysql_db", "common_topology").run(), does not
launch the application on a different thread; it does its work on the same
thread the spout is on.


Any help is appreciated and required :)


Thanks

Ankur


Not able to see my application Logs in workerLogs

2015-10-20 Thread Ankur Garg
Hi ,

I have deployed my topology to a remote cluster.

Inside the open and prepare methods of my spouts and bolts, I launch a
separate application (a Spring application) which runs on a different port
(just like ZooKeeper etc.), and I get an instance of this application for
use in my spouts and bolts.

However, once this application is launched, I no longer see any of my logs
in the worker logs.

So, can we launch a separate application from spouts and bolts?

Also, if we can do the above, how do I get the logs printed in my worker
logs?

Thanks
Ankur


Re: Not able to see my application Logs in workerLogs

2015-10-20 Thread Ankur Garg
Any ideas, people?

Even though the application is running and my spouts and bolts are
functioning, the worker logs are stuck and nothing is getting printed there.



On Tue, Oct 20, 2015 at 8:42 PM, Ankur Garg <ankurga...@gmail.com> wrote:

> Hi ,
>
> I have deployed my topology in remote cluster .
>
> Inside the open and prepare method for my spouts and bolts , I launch a
> separate application (Spring Application) which  runs on a different port
> (just like zookeeper etc) and I get the instance of this application for
> use in my Spouts and bolts .
>
> However , once this application gets launched , I no longer see any of my
> logs in worker logs .
>
> So  , can we launch a seperate application from spouts and bolts ?
>
> Also , if we can do the above , how to get the logs printed in my Worker
> Logs .
>
> Thanks
> Ankur
>


Re: Not able to see my application Logs in workerLogs

2015-10-20 Thread Ankur Garg
Thanks Javier for the suggestion. Let me try this.

Thanks
Ankur

On Wed, Oct 21, 2015 at 5:41 AM, Javier Gonzalez <jagon...@gmail.com> wrote:

> Configure your cluster.xml to debug level for your own packages, set storm
> debug to true, and retry.
> On Oct 20, 2015 3:37 PM, "Ankur Garg" <ankurga...@gmail.com> wrote:
>
>> Any idea ppl .
>>
>> Even though application is running and my spouts and bolts are
>> functioning , worker logs are stuck and nothing is getting printed there .
>>
>>
>>
>> On Tue, Oct 20, 2015 at 8:42 PM, Ankur Garg <ankurga...@gmail.com> wrote:
>>
>>> Hi ,
>>>
>>> I have deployed my topology in remote cluster .
>>>
>>> Inside the open and prepare method for my spouts and bolts , I launch a
>>> separate application (Spring Application) which  runs on a different port
>>> (just like zookeeper etc) and I get the instance of this application for
>>> use in my Spouts and bolts .
>>>
>>> However , once this application gets launched , I no longer see any of
>>> my logs in worker logs .
>>>
>>> So  , can we launch a seperate application from spouts and bolts ?
>>>
>>> Also , if we can do the above , how to get the logs printed in my Worker
>>> Logs .
>>>
>>> Thanks
>>> Ankur
>>>
>>
>>


Re: Does Storm work with Spring

2015-10-19 Thread Ankur Garg
Hi Ravi ,

Need your help. So I created a cluster and deployed my topology to it.
Inside my spouts and bolts, I am launching a Spring Boot application wrapped
inside a singleton to initialise my context. Unfortunately, it appears that
it is not working: annotations like @EnableAutoConfiguration are not picking
up yml files from the classpath and injecting their values into the beans,
and I am getting exceptions like:

Error creating bean with name 'inputQueueManager': Injection of autowired
dependencies failed; nested exception is
org.springframework.beans.factory.BeanCreationException: Could not autowire
field: private int
mqclient.rabbitmq.manager.impl.InputQueueManagerImpl.rabbitMqPort; nested
exception is org.springframework.beans.TypeMismatchException: Failed to
convert value of type 'java.lang.String' to required type 'int'; nested
exception is java.lang.NumberFormatException: For input string:
"${input.rabbitmq.port}" at

Has anyone here ever tried injecting dependencies with Spring? I am not sure
why this is not working.

It works like a charm in the local cluster, and now I am not passing the
context as a constructor argument, rather declaring and initializing it
inside each spout and bolt :( .

Is there any reason why Spring annotations would not work inside a remote
cluster?

Need help urgently here .

Thanks
Ankur
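
For reference, a rough sketch of the per-JVM singleton context holder
discussed in the thread quoted below (Application stands for the project's
@SpringBootApplication class, the profile names are illustrative, and
InputQueueManager is just an example bean; this is a sketch of the approach,
not actual project code):

import org.springframework.boot.builder.SpringApplicationBuilder;
import org.springframework.context.ApplicationContext;

public final class SpringContextHolder {

    private static volatile ApplicationContext context;

    private SpringContextHolder() { }

    // Lazily builds the Spring context once per worker JVM; later callers
    // in the same JVM reuse the same instance.
    public static ApplicationContext get() {
        if (context == null) {
            synchronized (SpringContextHolder.class) {
                if (context == null) {
                    context = new SpringApplicationBuilder(Application.class)
                            .profiles("common", "common_rabbitmq")
                            .run();
                }
            }
        }
        return context;
    }
}

// Inside a spout's open() or a bolt's prepare(), instead of a constructor argument:
//   ApplicationContext ctx = SpringContextHolder.get();
//   InputQueueManager manager = ctx.getBean(InputQueueManager.class);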

On Sun, Oct 11, 2015 at 1:01 PM, Ankur Garg <ankurga...@gmail.com> wrote:

> I think I don't  need to Autowire beans inside my spout and bolts .
>
> All I want my context to be available . Since I use Spring Boot , I am
> delegating it to initialise all the beans and set up every bean (reading
> yml file and create DB connections , connections to Message brokers etc ) .
>
> On my local cluster I am passing it as a constructor argument to Spouts
> and Bolts . Since all r running in same jvm its available to all spouts and
> bolts .
>
> But in a distributed cluster , this will blow up as Context is not
> serializable and cannot be passed like above .
>
> So the problem is only to make this context available once per jvm . Hence
> I thought I will wrap it under a singleton and make this available to all
> spouts and bolts per jvm.
>
> Once I have this context initialized and loaded all I need to do is to get
> the bean which I will do the same way I am doing inside local cluster
> spouts and bolts .
>
>
>
>
>
> On Sun, Oct 11, 2015 at 12:46 PM, Ravi Sharma <ping2r...@gmail.com> wrote:
>
>> Yes ur assumption is right
>> Jvm1 will create application contexts say ac1
>>
>> And jvm2 will create another application instance ac2
>>
>> And all of it can be done via singleton classes.
>>
>> All bolts and spouts in same jvm instance need to access same application
>> context.
>>
>> I have done same in cluster and it works
>>
>> Remember all spring beans need to be transient and also u need to set
>> required=false in case u r going create spout and bolt using spring
>>
>> Public class mybolt  {
>> @aurowired(required=false)
>> Private transient MyServiceBean myServiceBean;
>>
>> 
>> ...
>> }
>>
>> Ravi
>> On 11 Oct 2015 07:59, "Ankur Garg" <ankurga...@gmail.com> wrote:
>>
>>> Also , I think there can be some instances of spouts/bolts running on
>>> JVM 1 and some on JVM 2 and so on...
>>>
>>> Is it possible for spouts and bolts running on same jvm to access same
>>> applicationContext .
>>>
>>> I am thinking that I can make the place where I  launch my spring Boot
>>> application  inside a singleton class , and so all the spouts and bolts
>>> running on say JVM1 will have access to same context  (instead of launching
>>> it in all spouts and bolts) . And for those in JVM 2 they will still
>>> initialise it once and all the rest will get the same application Context .
>>>
>>> But all above is theoretical assumption  . I still need to try it out
>>>  (unfortunately i dont have a cluster setup at my end) but if possible
>>> please let me know if this can work .
>>>
>>> Thanks
>>> Ankur
>>>
>>> On Sun, Oct 11, 2015 at 11:48 AM, Ankur Garg <ankurga...@gmail.com>
>>> wrote:
>>>
>>>> Thanks for replying Ravi .
>>>>
>>>> I think your suggestion to make wrapper to read json or xml is a very
>>>> nice Idea indeed .
>>>>
>>>> But , the problem for me here is to have the context (with all beans
>>>> loaded and initialized ) available inside the Spouts and Bolts and that
>>>> means inside every running instance of Spouts and 

Re: Does Storm work with Spring

2015-10-19 Thread Ankur Garg
Actually it is not only the yml; in fact none of the dependencies are
getting injected. It appears to me that it is not able to read the Spring
annotations at all.

By the way, do you know how to debug my application, deployed on the remote
cluster, remotely from Eclipse?

Thanks
Ankur

On Mon, Oct 19, 2015 at 8:22 PM, Ravi Sharma <ping2r...@gmail.com> wrote:

> you may have to tell Spring that ur .yaml file is ur resource file.
>
> Ravi.
>
> On Mon, Oct 19, 2015 at 3:25 PM, Ankur Garg <ankurga...@gmail.com> wrote:
>
>> Hi Ravi ,
>>
>> Need your help . So I created a local cluster and deployed my topology to
>> it . Inside my Spout and Bolts , I am launching a Spring Boot application
>> wrapped inside a singleton to initialise my context . Unfortunately , it
>> appears to me that it is not working :  and annotations like
>> @EnableAutoConfiguration is not picking up yml files from the classpath and
>> injecting their values in the bean. And I am getting exceptions like
>>
>> Error creating bean with name 'inputQueueManager': Injection of autowired
>> dependencies failed; nested exception is
>> org.springframework.beans.factory.BeanCreationException: Could not autowire
>> field: private int
>> mqclient.rabbitmq.manager.impl.InputQueueManagerImpl.rabbitMqPort; nested
>> exception is org.springframework.beans.TypeMismatchException: Failed to
>> convert value of type 'java.lang.String' to required type 'int'; nested
>> exception is java.lang.NumberFormatException: For input string:
>> "${input.rabbitmq.port}" at
>>
>> has anyone here ever tried injecting dependencies from Spring . I am not
>> sure why this is not working .
>>
>> It works like a charm in Local Cluster and now I am not passing context
>> as a constructor argument , rather declaring and initializing it inside
>> each spout and bolts :( .
>>
>> Is there any reason why Spring Annotations dont work inside a Remote
>> Cluster .
>>
>> Need help urgently here .
>>
>> Thanks
>> Ankur
>>
>> On Sun, Oct 11, 2015 at 1:01 PM, Ankur Garg <ankurga...@gmail.com> wrote:
>>
>>> I think I don't  need to Autowire beans inside my spout and bolts .
>>>
>>> All I want my context to be available . Since I use Spring Boot , I am
>>> delegating it to initialise all the beans and set up every bean (reading
>>> yml file and create DB connections , connections to Message brokers etc ) .
>>>
>>> On my local cluster I am passing it as a constructor argument to Spouts
>>> and Bolts . Since all r running in same jvm its available to all spouts and
>>> bolts .
>>>
>>> But in a distributed cluster , this will blow up as Context is not
>>> serializable and cannot be passed like above .
>>>
>>> So the problem is only to make this context available once per jvm .
>>> Hence I thought I will wrap it under a singleton and make this available to
>>> all spouts and bolts per jvm.
>>>
>>> Once I have this context initialized and loaded all I need to do is to
>>> get the bean which I will do the same way I am doing inside local cluster
>>> spouts and bolts .
>>>
>>>
>>>
>>>
>>>
>>> On Sun, Oct 11, 2015 at 12:46 PM, Ravi Sharma <ping2r...@gmail.com>
>>> wrote:
>>>
>>>> Yes ur assumption is right
>>>> Jvm1 will create application contexts say ac1
>>>>
>>>> And jvm2 will create another application instance ac2
>>>>
>>>> And all of it can be done via singleton classes.
>>>>
>>>> All bolts and spouts in same jvm instance need to access same
>>>> application context.
>>>>
>>>> I have done same in cluster and it works
>>>>
>>>> Remember all spring beans need to be transient and also u need to set
>>>> required=false in case u r going create spout and bolt using spring
>>>>
>>>> Public class mybolt  {
>>>> @aurowired(required=false)
>>>> Private transient MyServiceBean myServiceBean;
>>>>
>>>> 
>>>> ...
>>>> }
>>>>
>>>> Ravi
>>>> On 11 Oct 2015 07:59, "Ankur Garg" <ankurga...@gmail.com> wrote:
>>>>
>>>>> Also , I think there can be some instances of spouts/bolts running on
>>>>> JVM 1 and some on JVM 2 and so on...
>>>>>
>>>>> Is it possible for spouts and bolts running on same jvm to access same
>>

Shutting and Starting Storm Cluster

2015-10-16 Thread Ankur Garg
Hi ,

I have a single-node Storm cluster set up. Currently, to start the Nimbus
and supervisor daemons, I use the storm nimbus and storm supervisor
commands.

To stop them, currently I am doing a kill -9 on those processes manually.

Is there something I can use to restart the cluster with a single command?

Thanks
Ankur


Re: Topology not working + Found multiple defaults.yaml resources when deploying to remote cluster

2015-10-15 Thread Ankur Garg
Hi John ,

Thanks a lot for your reply. You pointed me in the right direction.

Here is a link that explains this:
http://kennethjorgensen.com/blog/2014/fat-jars-with-excluded-dependencies-in-gradle

Maybe we should add this to the Storm documentation somewhere, if it is not
there already, so that people do not end up wasting time.

Also, if possible, we should fix this issue.

Thanks
Ankur

On Thu, Oct 15, 2015 at 3:47 AM, John Reilly <j...@inconspicuous.org> wrote:

>
> I haven't tried this in gradle, but you basically need to do one of 2
> things.
>
> * Mark storm-core as provided
> or
> * Exclude storm-core.jar from the fat jar packaging.
>
> IIRC, you can change your storm dependency to
> provided group: 'org.apache.storm' ,
> name: 'storm-core',version: stormVersion
>
> I've tried doing this in my project (in sbt) and ran into the issue that
> when provided is used, the project appears to be broken in the IDE since
> the provided jar is not available in the IDE.  Also, iirc if you try to run
> local cluster in the same project that will fail.  Usually this is not a
> problem since it will usually be for a test where we can just specify
> storm-core as a test dependency.
>
> I ended up going with the second option above and as part of the
> packaging, I omit the storm-core jar from the fat jar.  I'm not sure how
> you would do that in gradle though.
>
>
>
>
> On Wed, Oct 14, 2015 at 3:00 PM Ankur Garg <ankurga...@gmail.com> wrote:
>
>> Hi ,
>>
>> I have a single node remote cluster set up (all nimbus , supervisor ,
>> zookeeper) running on the same machine  . I deployed my Topology (simple
>> Exclamation Topology) to this remote cluster . While topology and jar got
>> submitted successfully , nothing is happening in the cluster .
>>
>> When I checked the supervisor logs , I could see this :
>>
>> 2015-10-14T21:24:26.340+ b.s.d.supervisor [INFO]
>> 42dd0337-1182-45b0-9385-14570c7e0b09 still hasn't started
>>
>> Worker log files are empty .
>>
>> On debugging a little in supervisor logs , I could see
>>
>> Launching worker with command: (Some java command) ..Firing this java
>> command I can see this error :
>>
>> Caused by: java.lang.RuntimeException: Found multiple defaults.yaml
>> resources. You're probably bundling the Storm jars with your topology jar.
>>
>> I debugged more on internet and other stuff , and modified my
>> build.gradle file too but still the same error whenever I deploy my
>> topology .
>>
>> This is my gradle file
>>
>>
>> dependencies {
>>
>> compile group: 'org.springframework.boot', name:
>> 'spring-boot-starter-actuator', version: springBootVersion
>>
>> compile group: 'org.quartz-scheduler', name: 'quartz', version:
>> quartzVersion
>>
>> compile group: 'clj-stacktrace' , name: 'clj-stacktrace',version:
>> cljStackTrace
>>
>> compile group: 'org.apache.storm' , name: 'storm-core',version:
>> stormVersion
>>
>> ext {
>>
>>   fatJarExclude = true
>>
>>   }
>>
>> }
>>
>>
>> task uberjar(type: Jar) {
>>
>> from files(sourceSets.main.output.classesDir)
>>
>> from {configurations.compile.collect {zipTree(it)}} {
>>
>> exclude "META-INF/*.SF"
>>
>> exclude "META-INF/*.DSA"
>>
>> exclude "META-INF/*.RSA"
>>
>> exclude "META-INF/LICENSE"
>>
>> }
>>
>>
>> manifest {
>>
>> attributes 'Main-Class': 'storm.topology.ExclamationTopology'
>>
>> }
>>
>> }
>>
>> Nybody has ny idea
>>
>


Re: Building fat jar for my test Topology to be deployed to remote Storm Cluster

2015-10-14 Thread Ankur Garg
So, I found a workaround: I added exclude "META-INF/LICENSE" to my Gradle
file after checking the stack trace of the exception.

If anyone has faced a similar situation and has a better solution, please
let me know.

Thanks
Ankur

On Wed, Oct 14, 2015 at 5:59 PM, Ankur Garg <ankurga...@gmail.com> wrote:

> Hi ,
>
> I have set up a single node cluster and trying to deploy my sample
> topology to it .
>
> I believe to deploy my topology to cluster I have to submit the jar with
> all dependencies to the Cluster .
>
> For that I created a sample project and added a simple topology to it .
>
> While generating the fat jar using gradle I am seeing this error
>
> gradle fatjar gives below error
>
> Could not expand ZIP
> '/Users/agarg/.gradle/caches/modules-2/files-2.1/org.apache.storm/storm-core/0.9.5/d2bf27db853347dcf66990b4514db20a7897303e/storm-core-0.9.5.jar'.
>
> > Could not copy zip entry
> /Users/agarg/.gradle/caches/modules-2/files-2.1/org.apache.storm/storm-core/0.9.5/d2bf27db853347dcf66990b4514db20a7897303e/storm-core-0.9.5.jar!META-INF/license/LICENSE.base64.txt
> to
> '/Users/agarg/Documents/notificationRepo/sample/build/tmp/expandedArchives/storm-core-0.9.5.jar_366us3312tpl54tci2fld83fij/META-INF/license/LICENSE.base64.txt'.
>
>
> I am attaching my build.gradle with this mail .
>
>
> Can anyone help please.
>


Building fat jar for my test Topology to be deployed to remote Storm Cluster

2015-10-14 Thread Ankur Garg
Hi ,

I have set up a single-node cluster and am trying to deploy my sample
topology to it.

I believe that to deploy my topology to the cluster I have to submit a jar
with all its dependencies to the cluster.

For that I created a sample project and added a simple topology to it.

While generating the fat jar using Gradle I am seeing this error;
gradle fatjar gives:

Could not expand ZIP
'/Users/agarg/.gradle/caches/modules-2/files-2.1/org.apache.storm/storm-core/0.9.5/d2bf27db853347dcf66990b4514db20a7897303e/storm-core-0.9.5.jar'.

> Could not copy zip entry
/Users/agarg/.gradle/caches/modules-2/files-2.1/org.apache.storm/storm-core/0.9.5/d2bf27db853347dcf66990b4514db20a7897303e/storm-core-0.9.5.jar!META-INF/license/LICENSE.base64.txt
to
'/Users/agarg/Documents/notificationRepo/sample/build/tmp/expandedArchives/storm-core-0.9.5.jar_366us3312tpl54tci2fld83fij/META-INF/license/LICENSE.base64.txt'.


I am attaching my build.gradle with this mail .


Can anyone help please.


build.gradle
Description: Binary data


Setting up cluster on Google Cloud

2015-10-13 Thread Ankur Garg
Hi ,

I am trying to set up my cluster on Google Cloud, following the link below:

http://datadventures.ghost.io/2013/12/29/deploying-storm-on-gce/

But it looks like it is two years old. While I try it out at my end, do any
of you have an idea how to deploy it on GCE, and maybe some working scripts
(so that I don't have to do any work :P :P)?

Any help is appreciated .


Thanks
Ankur


Re: Multiple Spouts in Same topology or Topology per spout

2015-10-12 Thread Ankur Garg
Hi Ravi,

Thanks for the reply. I got your point about using different bolts for MySQL
and Mongo.

One thing though: is it a good idea to use different topologies within the
same cluster?

The rationale is that if I use the same topology but different bolts to do
the processing, I believe a failure in any one of the bolts will cause the
entire message to be replayed. This may not be a real problem for either
database (multiple inserts won't cause any issue), but the overall
throughput of the topology will be affected.

With different topologies, the idea is to separate execution into different
sets of spouts and bolts. So if the topology which has been given one task
fails, it won't affect the other topologies.

If my rationale is correct, what does it cost to maintain different
topologies? Also, for simulating and testing this at my end, can I test this
on a local cluster?

Thanks
Ankur
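
For reference, a rough wiring sketch of the "one spout and one bolt per
destination, both spouts reading the same topic" layout that Ravi suggests
further down this thread (RabbitMqSpout, MysqlWriterBolt and MongoWriterBolt
are placeholder classes, the parallelism and worker counts are made up, and
package names assume Storm 1.x; this is an illustration, not tested code):

import org.apache.storm.Config;
import org.apache.storm.StormSubmitter;
import org.apache.storm.topology.TopologyBuilder;

public class EventPersistenceTopology {

    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();

        // One spout per destination, both subscribed to the same topic, so a
        // replay triggered by the Mongo side never re-runs the MySQL side.
        builder.setSpout("mysql-spout", new RabbitMqSpout("events"), 2);
        builder.setSpout("mongo-spout", new RabbitMqSpout("events"), 2);

        builder.setBolt("mysql-bolt", new MysqlWriterBolt(), 2)
               .shuffleGrouping("mysql-spout");
        builder.setBolt("mongo-bolt", new MongoWriterBolt(), 2)
               .shuffleGrouping("mongo-spout");

        Config conf = new Config();
        conf.setNumWorkers(2);
        StormSubmitter.submitTopology("event-persistence", conf, builder.createTopology());
    }
}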

On Mon, Oct 12, 2015 at 3:22 PM, Ravi Sharma <ping2r...@gmail.com> wrote:

> Hi Ankur,
>
> Storm's design is stateless, so storm cant store any info about what bolts
> were successful and which one failed.
> Idea is to replay the message again without affecting the final outcome.
> (means if mysql was success, it shudnt add two rows in case its replayed)
>
> From looking at far i would say you may be fixing an issue which hasnt
> happened yet. Assumption is that one DB will be failing a lot, i guess this
> may not be real case.
> Any of the DB can fail once in a while and replaying them shudnt affect ur
> performance. (say less then 10% Message failed) , you will be planning
> atleast 50% more capacity then ur max load.
>
>
> If you really want it to be very effective, i say use something like redis
> and store your bolt status with message id there, so every time you plan to
> start a bolt proessing check if you have already completed it succesfully,
> if yes then skip it.
> I have defined my own MessageId object and always put a retry count in it.
> So first one goes with 0, and at that moment you can avoid the redis/nosql
> checks.
> But then u r adding one more technology and it just increased the
> complexity.
>
>
> Whatever design you choose, i will still suggest to use two bolts, Monogo
> and mysql both are different cluster(hardware) and technology(software),
> they both will have different throughput and scalability. And as per your
> requirment you dont care if data hasnt reached to one exactly at same time,
> no atomicity (basically its not one transaction), so you dont want to slow
> down one system because other is slower.
>
>
> Last suggestion is to go with two spouts  both will read from same
> topic(not queue), so all messages will be delivered to both Spouts. One
> Spout will send message to Mysql Bolt other will send to Mongo Bolt.
>
>
> Ravi.
>
>
>
>
> Ravi.
>
>
>
>
>
>
>
>
>
>
>
>
>
> On Mon, Oct 12, 2015 at 10:14 AM, Ankur Garg <ankurga...@gmail.com> wrote:
>
>> LOL .. I was looking for something better :) ..If you see then having
>> multiple bolts here do not help much .. It would have helped had there been
>> a provision to skip the already executed Bolts .
>>
>>
>> I believe this should be there in Storm .
>>
>> Thanks
>> Ankur
>>
>> On Mon, Oct 12, 2015 at 2:42 PM, Susheel Kumar Gadalay <
>> skgada...@gmail.com> wrote:
>>
>>> Check and insert
>>>
>>> On 10/12/15, Ankur Garg <ankurga...@gmail.com> wrote:
>>> > But what if MongoDb bolt has some error , in that case I suppose the
>>> entire
>>> > tuple will be replayed from Spout meaning it will have to redo the
>>> > operation of inserting into sql . Is there a way I can skip inserting
>>> into
>>> > mysql ?
>>> >
>>> > On Mon, Oct 12, 2015 at 1:54 PM, Susheel Kumar Gadalay
>>> > <skgada...@gmail.com>
>>> > wrote:
>>> >
>>> >> It is better to have 2 bolts - mysql bolt and mongodb bolt.
>>> >>
>>> >> Let mysql bolt forward the tuple to mongodb bolt, so in case of error
>>> >> it won't  emit.
>>> >>
>>> >> On 10/12/15, Ankur Garg <ankurga...@gmail.com> wrote:
>>> >> > So I have a situation where the tuple received on Spout has to be
>>> saved
>>> >> to
>>> >> > mysql database and mongoDb as well .
>>> >> >
>>> >> > What should be better . Using 1 bolt to save it into mysql and
>>> MongoDb
>>> >> or 2
>>> >&

Re: Multiple Spouts in Same topology or Topology per spout

2015-10-12 Thread Ankur Garg
So I have a situation where the tuple received by the spout has to be saved
to a MySQL database and to MongoDB as well.

What would be better: using one bolt to save it into both MySQL and MongoDB,
or two separate bolts (one for saving into MySQL and the other for saving
into Mongo)?

What happens when an exception occurs while saving into MySQL? I believe I
will get the acknowledgement inside the fail method of my spout. So if I
reprocess it, with two bolts I believe it will again be sent to the bolt
that saves into the Mongo database.

If the above is true, will having two separate bolts be of any advantage?
How can I configure things so that a failure while inserting into MySQL does
not impact inserting into MongoDB?

Thanks
Ankur
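
As an aside, a rough sketch combining the "check and insert" and "let the
MySQL bolt forward the tuple to the MongoDB bolt" suggestions that come up
elsewhere in this thread: the MySQL bolt does an idempotent insert keyed by
a message id carried in the tuple, and forwards the anchored tuple to the
Mongo bolt only after the MySQL write succeeds, so a replay never duplicates
either write. The JDBC URL, table, field names and the downstream Mongo bolt
are all assumptions for illustration, and package names assume Storm 1.x:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.util.Map;
import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;

public class MysqlWriterBolt extends BaseRichBolt {

    private OutputCollector collector;
    private transient Connection connection;

    @Override
    public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
        try {
            // connection details are illustrative
            connection = DriverManager.getConnection(
                    "jdbc:mysql://localhost:3306/events", "user", "pass");
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    @Override
    public void execute(Tuple tuple) {
        String msgId = tuple.getStringByField("msgId");
        String payload = tuple.getStringByField("payload");
        try {
            // Check-and-insert: msg_id is the primary key, so a replayed
            // insert of the same message becomes a no-op.
            PreparedStatement ps = connection.prepareStatement(
                    "INSERT IGNORE INTO events (msg_id, payload) VALUES (?, ?)");
            ps.setString(1, msgId);
            ps.setString(2, payload);
            ps.executeUpdate();
            ps.close();
            // Forward the anchored tuple to the Mongo bolt only after MySQL succeeded.
            collector.emit(tuple, new Values(msgId, payload));
            collector.ack(tuple);
        } catch (Exception e) {
            collector.fail(tuple);   // the spout replays; INSERT IGNORE keeps MySQL idempotent
        }
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("msgId", "payload"));
    }
}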

On Sun, Oct 11, 2015 at 10:57 PM, Ravi Sharma <ping2r...@gmail.com> wrote:

> That depends if ur spout error has affected jvm or normal application error
>
> performance issue in case of lot of errors, I don't think there is any
> issue be coz of errors themselves but ofcourse if u r retrying these
> messages on failure then that means u will be processing lot of messages
> then normal and overall throughput will go down
>
> Ravi
>
> If ur topology has enabled acknowledgment that means spout will always
> receive
> On 11 Oct 2015 18:15, "Ankur Garg" <ankurga...@gmail.com> wrote:
>
>>
>> Thanks for the reply Abhishek and Ravi .
>>
>> One question though , going with One topology with multiple spouts
>> ...What if something goes wrong in One spout or its associated bolts ..
>> Does it impact other Spout as well?
>>
>> Thanks
>> Ankur
>>
>> On Sun, Oct 11, 2015 at 10:21 PM, Ravi Sharma <ping2r...@gmail.com>
>> wrote:
>>
>>> No 100% right ansers , u will have to test and see what will fit..
>>>
>>> persoanlly i wud suggest Multiple spouts in one Topology and if you have
>>> N node where topology will be running then each Spout(reading from one
>>> queue) shud run N times in parallel.
>>>
>>> if 2 Queues and say 4 Nodes
>>> then one topolgy
>>> 4 Spouts reading from Queue1 in different nodes
>>> 4 spouts reading from Queue2 in different nodes
>>>
>>> Ravi.
>>>
>>> On Sun, Oct 11, 2015 at 5:25 PM, Abhishek priya <
>>> abhishek.pr...@gmail.com> wrote:
>>>
>>>> I guess this is a question where there r no really correct answers.
>>>> I'll certainly avoid#1 as it is better to keep logic separate and
>>>> lightweight.
>>>>
>>>> If your downstream bolts are same, then it makes senses to keep them in
>>>> same topology but if they r totally different, I'll keep them in two
>>>> different topologies. That will allow me to independently deploy and scale
>>>> the topology. But if the rest of logic is same I topology scaling and
>>>> resource utilization will be better with one topology.
>>>>
>>>> I hope this helps..
>>>>
>>>> Sent somehow
>>>>
>>>> > On Oct 11, 2015, at 9:07 AM, Ankur Garg <ankurga...@gmail.com> wrote:
>>>> >
>>>> > Hi ,
>>>> >
>>>> > So I have a situation where I want to read messages from different
>>>> queues hosted in a Rabbitmq Server .
>>>> >
>>>> > Now , there are three ways which I can think to leverage Apache Storm
>>>> here :-
>>>> >
>>>> > 1) Use the same Spout (say Spout A) to read messages from different
>>>> queues and based on the messages received emit it to different Bolts.
>>>> >
>>>> > 2) Use different Spout (Spout A and Spout B and so on) within the
>>>> same topology (say Topology A) to read messages from different queues .
>>>> >
>>>> > 3) Use Different Spouts one within eachTopology (Topology A ,
>>>> Topology B and so on) to read messages from different queues .
>>>> >
>>>> > Which is the best way to process this considering I want high
>>>> throughput (more no of queue messages to be processed concurrently) .
>>>> >
>>>> > Also , If In use same Topology for all Spouts (currently though
>>>> requirement is for 2 spouts)  will failure in one Spout (or its associated
>>>> Bolts) effect the second or will they both continue working separately even
>>>> if some failure is in Spout B ?
>>>> >
>>>> > Cost wise , how much would it be to maintain two different topologies
>>>> .
>>>> >
>>>> > Looking for inputs from members here.
>>>> >
>>>> > Thanks
>>>> > Ankur
>>>> >
>>>> >
>>>>
>>>
>>>
>>


Re: Multiple Spouts in Same topology or Topology per spout

2015-10-12 Thread Ankur Garg
LOL. I was looking for something better :). If you think about it, having
multiple bolts here does not help much by itself. It would have helped had
there been a provision to skip the already-executed bolts.

I believe this should be there in Storm.

Thanks
Ankur

On Mon, Oct 12, 2015 at 2:42 PM, Susheel Kumar Gadalay <skgada...@gmail.com>
wrote:

> Check and insert
>
> On 10/12/15, Ankur Garg <ankurga...@gmail.com> wrote:
> > But what if MongoDb bolt has some error , in that case I suppose the
> entire
> > tuple will be replayed from Spout meaning it will have to redo the
> > operation of inserting into sql . Is there a way I can skip inserting
> into
> > mysql ?
> >
> > On Mon, Oct 12, 2015 at 1:54 PM, Susheel Kumar Gadalay
> > <skgada...@gmail.com>
> > wrote:
> >
> >> It is better to have 2 bolts - mysql bolt and mongodb bolt.
> >>
> >> Let mysql bolt forward the tuple to mongodb bolt, so in case of error
> >> it won't  emit.
> >>
> >> On 10/12/15, Ankur Garg <ankurga...@gmail.com> wrote:
> >> > So I have a situation where the tuple received on Spout has to be
> saved
> >> to
> >> > mysql database and mongoDb as well .
> >> >
> >> > What should be better . Using 1 bolt to save it into mysql and MongoDb
> >> or 2
> >> > seperate Bolts (One for saving into mysql and other for saving into
> >> Mongo).
> >> >
> >> > What happens when the exception occurs while saving into mysql ? I
> >> believe
> >> > I will get acknowledgement inside the fail method in my Spout . So If
> I
> >> > reprocess it using 2 bolts , I believe it will again be sent to Bolt
> >> > for
> >> > saving into Mongo database .
> >> >
> >> > If the above is true , will having 2 seperate bolts be of any
> advantage
> >> > ?
> >> > how can I configure things so that Failure while inserting into mysql
> >> does
> >> > not impact inserting into MongoDb .
> >> >
> >> > Thanks
> >> > Ankur
> >> >
> >> > On Sun, Oct 11, 2015 at 10:57 PM, Ravi Sharma <ping2r...@gmail.com>
> >> wrote:
> >> >
> >> >> That depends if ur spout error has affected jvm or normal application
> >> >> error
> >> >>
> >> >> performance issue in case of lot of errors, I don't think there is
> any
> >> >> issue be coz of errors themselves but ofcourse if u r retrying these
> >> >> messages on failure then that means u will be processing lot of
> >> >> messages
> >> >> then normal and overall throughput will go down
> >> >>
> >> >> Ravi
> >> >>
> >> >> If ur topology has enabled acknowledgment that means spout will
> always
> >> >> receive
> >> >> On 11 Oct 2015 18:15, "Ankur Garg" <ankurga...@gmail.com> wrote:
> >> >>
> >> >>>
> >> >>> Thanks for the reply Abhishek and Ravi .
> >> >>>
> >> >>> One question though , going with One topology with multiple spouts
> >> >>> ...What if something goes wrong in One spout or its associated bolts
> >> >>> ..
> >> >>> Does it impact other Spout as well?
> >> >>>
> >> >>> Thanks
> >> >>> Ankur
> >> >>>
> >> >>> On Sun, Oct 11, 2015 at 10:21 PM, Ravi Sharma <ping2r...@gmail.com>
> >> >>> wrote:
> >> >>>
> >> >>>> No 100% right ansers , u will have to test and see what will fit..
> >> >>>>
> >> >>>> persoanlly i wud suggest Multiple spouts in one Topology and if you
> >> >>>> have
> >> >>>> N node where topology will be running then each Spout(reading from
> >> >>>> one
> >> >>>> queue) shud run N times in parallel.
> >> >>>>
> >> >>>> if 2 Queues and say 4 Nodes
> >> >>>> then one topolgy
> >> >>>> 4 Spouts reading from Queue1 in different nodes
> >> >>>> 4 spouts reading from Queue2 in different nodes
> >> >>>>
> >> >>>> Ravi.
> >> >>>>
> >> >>>> On Sun, Oct 11, 2015 at 5:25 PM, Abhishek priya <
> >> >>>> abhishek.pr...@gma

Re: Multiple Spouts in Same topology or Topology per spout

2015-10-12 Thread Ankur Garg
But what if the MongoDb bolt has some error ? In that case I suppose the entire
tuple will be replayed from the Spout , meaning it will have to redo the
operation of inserting into mysql . Is there a way I can skip re-inserting into
mysql ?

On Mon, Oct 12, 2015 at 1:54 PM, Susheel Kumar Gadalay <skgada...@gmail.com>
wrote:

> It is better to have 2 bolts - mysql bolt and mongodb bolt.
>
> Let mysql bolt forward the tuple to mongodb bolt, so in case of error
> it won't  emit.
>
> On 10/12/15, Ankur Garg <ankurga...@gmail.com> wrote:
> > So I have a situation where the tuple received on Spout has to be saved
> to
> > mysql database and mongoDb as well .
> >
> > What should be better . Using 1 bolt to save it into mysql and MongoDb
> or 2
> > seperate Bolts (One for saving into mysql and other for saving into
> Mongo).
> >
> > What happens when the exception occurs while saving into mysql ? I
> believe
> > I will get acknowledgement inside the fail method in my Spout . So If I
> > reprocess it using 2 bolts , I believe it will again be sent to Bolt for
> > saving into Mongo database .
> >
> > If the above is true , will having 2 seperate bolts be of any advantage ?
> > how can I configure things so that Failure while inserting into mysql
> does
> > not impact inserting into MongoDb .
> >
> > Thanks
> > Ankur
> >
> > On Sun, Oct 11, 2015 at 10:57 PM, Ravi Sharma <ping2r...@gmail.com>
> wrote:
> >
> >> That depends if ur spout error has affected jvm or normal application
> >> error
> >>
> >> performance issue in case of lot of errors, I don't think there is any
> >> issue be coz of errors themselves but ofcourse if u r retrying these
> >> messages on failure then that means u will be processing lot of messages
> >> then normal and overall throughput will go down
> >>
> >> Ravi
> >>
> >> If ur topology has enabled acknowledgment that means spout will always
> >> receive
> >> On 11 Oct 2015 18:15, "Ankur Garg" <ankurga...@gmail.com> wrote:
> >>
> >>>
> >>> Thanks for the reply Abhishek and Ravi .
> >>>
> >>> One question though , going with One topology with multiple spouts
> >>> ...What if something goes wrong in One spout or its associated bolts ..
> >>> Does it impact other Spout as well?
> >>>
> >>> Thanks
> >>> Ankur
> >>>
> >>> On Sun, Oct 11, 2015 at 10:21 PM, Ravi Sharma <ping2r...@gmail.com>
> >>> wrote:
> >>>
> >>>> No 100% right ansers , u will have to test and see what will fit..
> >>>>
> >>>> persoanlly i wud suggest Multiple spouts in one Topology and if you
> >>>> have
> >>>> N node where topology will be running then each Spout(reading from one
> >>>> queue) shud run N times in parallel.
> >>>>
> >>>> if 2 Queues and say 4 Nodes
> >>>> then one topolgy
> >>>> 4 Spouts reading from Queue1 in different nodes
> >>>> 4 spouts reading from Queue2 in different nodes
> >>>>
> >>>> Ravi.
> >>>>
> >>>> On Sun, Oct 11, 2015 at 5:25 PM, Abhishek priya <
> >>>> abhishek.pr...@gmail.com> wrote:
> >>>>
> >>>>> I guess this is a question where there r no really correct answers.
> >>>>> I'll certainly avoid#1 as it is better to keep logic separate and
> >>>>> lightweight.
> >>>>>
> >>>>> If your downstream bolts are same, then it makes senses to keep them
> >>>>> in
> >>>>> same topology but if they r totally different, I'll keep them in two
> >>>>> different topologies. That will allow me to independently deploy and
> >>>>> scale
> >>>>> the topology. But if the rest of logic is same I topology scaling and
> >>>>> resource utilization will be better with one topology.
> >>>>>
> >>>>> I hope this helps..
> >>>>>
> >>>>> Sent somehow
> >>>>>
> >>>>> > On Oct 11, 2015, at 9:07 AM, Ankur Garg <ankurga...@gmail.com>
> >>>>> > wrote:
> >>>>> >
> >>>>> > Hi ,
> >>>>> >
> >>>>> > So I have a situation where I want to read messages from different
> >>>>> queues hosted in

Processing failed message inside Storm

2015-10-12 Thread Ankur Garg
Hi ,

So I wanted to understand how people process messages which failed inside
Storm .

Do you store them in memory in some data structure (say a queue) or use any
other technique to store them ?
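
For example , is it the usual in-memory pending map keyed by the message id ,
roughly like the sketch below (QueueMessage and pollBroker() are just
placeholders for whatever the spout actually reads) , or do you push failed
messages somewhere more durable like a retry topic / dead letter queue ?

import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

import backtype.storm.spout.SpoutOutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichSpout;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Values;

public class ReliableSpout extends BaseRichSpout {

    private SpoutOutputCollector collector;
    // emitted but not yet acked messages, keyed by the message id passed to emit()
    private Map<UUID, QueueMessage> pending;

    @SuppressWarnings("rawtypes")
    @Override
    public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
        this.collector = collector;
        this.pending = new ConcurrentHashMap<>();
    }

    @Override
    public void nextTuple() {
        QueueMessage message = pollBroker(); // placeholder for the actual broker read
        if (message != null) {
            UUID msgId = UUID.randomUUID();
            pending.put(msgId, message);
            collector.emit(new Values(message), msgId);
        }
    }

    @Override
    public void ack(Object msgId) {
        pending.remove(msgId);
    }

    @Override
    public void fail(Object msgId) {
        // replay from the in-memory map ; this is lost if the worker JVM itself dies
        QueueMessage replay = pending.get(msgId);
        if (replay != null) {
            collector.emit(new Values(replay), msgId);
        }
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("message"));
    }

    private QueueMessage pollBroker() {
        return null; // placeholder: read the next message from the broker, or null if none
    }
}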

Please provide your inputs as to how you deal with this in a production environment.

Thanks
Ankur


Exception causing Cluster to fail

2015-10-12 Thread Ankur Garg
Hi ,

I am testing out my topologies on my local cluster . Unfortunately ,
whenever some exception occurs , my cluster stops with an error message like
this :

{"timeStamp":1444673373185,"host":"mum-1agrag-m.local","tid":"Thread-11-eventSaver","loglevel":"error","msg":"Async
loop died!","source":"backtype.storm.utils/DisruptorQueue.java:128"}

{"timeStamp":1444673373199,"host":"mum-1agrag-m.local","tid":"Thread-11-eventSaver","loglevel":"error","msg":"","source":"backtype.storm.utils/DisruptorQueue.java:128"}

{"timeStamp":1444673373208,"host":"mum-1agrag-m.local","tid":"Thread-11-eventSaver","loglevel":"error","msg":"Halting
process: (\"Worker died\")","source":"backtype.storm/util.clj:325"}


Will the same happen when I deploy my topology to an actual distributed
cluster too ?


Also , if I catch these exceptions and do not rethrow them , the cluster
continues to run . But the stack traces for the errors are not getting printed
in the console . I am catching all exceptions like below :

catch (Exception ex) {

LOG.error("Could not publish to the output Exchange ",ex);

}


Unfortunately , though the log message is getting printed , the stack trace is
not being printed :( .
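
What I would ideally like is for the error to at least reach Storm , something
like this (a sketch , assuming collector and input are the usual
OutputCollector / Tuple arguments of the bolt) :

catch (Exception ex) {
    LOG.error("Could not publish to the output Exchange ", ex);

    // reportError() surfaces the exception (with its stack trace) in the Storm UI / worker log
    collector.reportError(ex);

    // fail the tuple explicitly so it gets replayed, instead of the whole worker dying
    collector.fail(input);
}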


Looking for answers to above questions .


Thanks

Ankur


Re: Does Storm work with Spring

2015-10-11 Thread Ankur Garg
Thanks for replying Ravi .

I think your suggestion to make a wrapper to read json or xml is a very nice
idea indeed .

But , the problem for me here is to have the context (with all beans loaded
and initialized) available inside the Spouts and Bolts , and that means
inside every running instance of the Spouts and Bolts , which may be running
on different machines and different JVMs .

Agreed that when defining the topology I don't need the Spring context , as I
just have to define spouts and bolts there . I used the context here to send
it to the spouts and bolts through the constructor , but it appears from the
comments above that it won't work on a distributed cluster .

So , is there some way that once the topology gets submitted to run in a
distributed cluster , I can initialize my context there in such a way that it
is available to all Spouts and Bolts .. basically some shared location where
my application context can be initialized (once and only once) and then
accessed by all instances of Spouts and Bolts ?
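
Something like this holder class is what I have in mind , just a sketch
(SpringContextHolder and StormAppConfig are placeholders ; StormAppConfig
stands for whatever @Configuration class pulls in the yml files , database
connections etc.) . Every spout would call getContext() in open() and every
bolt in prepare() , and only the first call in a worker actually builds the
context :

import org.springframework.context.ApplicationContext;
import org.springframework.context.annotation.AnnotationConfigApplicationContext;

public final class SpringContextHolder {

    private static volatile ApplicationContext context;

    private SpringContextHolder() {
    }

    public static ApplicationContext getContext() {
        if (context == null) {
            synchronized (SpringContextHolder.class) {
                if (context == null) {
                    // built once per worker JVM, the first time any spout or bolt asks for it
                    context = new AnnotationConfigApplicationContext(StormAppConfig.class);
                }
            }
        }
        return context;
    }
}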

Thanks

On Sun, Oct 11, 2015 at 11:20 AM, Ravi Sharma <ping2r...@gmail.com> wrote:

> Basically u will have two context defined at different time/phase
>
> When u r about to submit the topology, u need to build topology, that
> context only need information about spouts and bolts.  You don't need any
> application bean like database accessories or ur services etc, as at this
> level u r not running ur application but u r just creating a topology and
> defining how bolts and spouts are connected to each other etc etc
>
> Now once topology is submitted, topology will be moved to one of the
> supervisor node and will start running, all spouts and bolts will be
> initialized,  at this moment u will need ur application context, which
> doesn't need ur earlier topology context
>
> So I will suggest keep both context separate.
>
> Topology is not complex to build, smaller topology can be built via code
> only, I. E. Which bolt listening to which spout, but if u want to go with
> good design, I say just write a small wrapper to read some json where u can
> define ur bolts and spouts and use that to build topology (u can use spring
> but it's not much needed)
>
> In past I have done it using both json setting (without spring) and xml
> setting (with spring) both works good
>
> Ravi
> On 11 Oct 2015 06:38, "Ankur Garg" <ankurga...@gmail.com> wrote:
>
>> Oh The problem here is I have many beans and which need to be initialized
>> (some are reading conf from yml files , database connection , thread pool
>> initialization etc) .
>>
>>
>> Now , I have written a spring boot application which takes care of all
>> the above and I define my topology inside one of the beans , Here is my
>> bean
>>
>> @Autowired
>> ApplicationContext appContext;
>>
>> @Bean
>> public void submitTopology() throws
>> AlreadyAliveException,InvalidTopologyException {
>>
>>TopologyBuilder builder = new TopologyBuilder();
>>
>>builder.setSpout("rabbitMqSpout", new RabbitListnerSpout(appContext),
>> 10);
>>
>>builder.setBolt("mapBolt", new GroupingBolt(appContext),
>> 10).shuffleGrouping("rabbitMqSpout");
>>
>> builder.setBolt("reduceBolt", new PublishingBolt(appContext),
>> 10).shuffleGrouping("mapBolt");
>>
>> Config conf = new Config();
>>
>> conf.registerSerialization(EventBean.class); // To be registered with
>> Kyro for Storm
>>
>> conf.registerSerialization(InputQueueManagerImpl.class);
>>
>> conf.setDebug(true);
>>
>>  conf.setMessageTimeoutSecs(200);
>>
>>LocalCluster cluster = new LocalCluster();
>>
>>   cluster.submitTopology("test", conf, builder.createTopology());
>>
>> }
>>
>>
>> When this bean is initialized , I already have appContext initialized by
>> my Spring Boot Application . So , the thing is , I am using SpringBoot to
>> initialize and load my context with all beans .
>>
>> Now this is the context which I want to leverage in my spouts and bolts .
>>
>>
>> So , if what I suggested earlier does  not work on Storm Distributed
>> Cluster , I need to find a way of initializing my AppContext somehow:(
>>
>> I would be really thankful if anyone here can help me :(
>>
>>
>> Thanks
>>
>> Ankur
>>
>> On Sun, Oct 11, 2015 at 5:54 AM, Javier Gonzalez <jagon...@gmail.com>
>> wrote:
>>
>>> The local cluster runs completely within a single JVM AFAIK. The local
>>> cluster is useful for development, testing your topology, etc. The real
>>> deployment has to go through

Re: Multiple Spouts in Same topology or Topology per spout

2015-10-11 Thread Ankur Garg
Thanks for the reply Abhishek and Ravi .

One question though , going with one topology with multiple spouts ... what
if something goes wrong in one spout or its associated bolts , does it
impact the other Spout as well ?

Thanks
Ankur

On Sun, Oct 11, 2015 at 10:21 PM, Ravi Sharma <ping2r...@gmail.com> wrote:

> No 100% right ansers , u will have to test and see what will fit..
>
> persoanlly i wud suggest Multiple spouts in one Topology and if you have N
> node where topology will be running then each Spout(reading from one queue)
> shud run N times in parallel.
>
> if 2 Queues and say 4 Nodes
> then one topolgy
> 4 Spouts reading from Queue1 in different nodes
> 4 spouts reading from Queue2 in different nodes
>
> Ravi.
>
> On Sun, Oct 11, 2015 at 5:25 PM, Abhishek priya <abhishek.pr...@gmail.com>
> wrote:
>
>> I guess this is a question where there r no really correct answers. I'll
>> certainly avoid#1 as it is better to keep logic separate and lightweight.
>>
>> If your downstream bolts are same, then it makes senses to keep them in
>> same topology but if they r totally different, I'll keep them in two
>> different topologies. That will allow me to independently deploy and scale
>> the topology. But if the rest of logic is same I topology scaling and
>> resource utilization will be better with one topology.
>>
>> I hope this helps..
>>
>> Sent somehow
>>
>> > On Oct 11, 2015, at 9:07 AM, Ankur Garg <ankurga...@gmail.com> wrote:
>> >
>> > Hi ,
>> >
>> > So I have a situation where I want to read messages from different
>> queues hosted in a Rabbitmq Server .
>> >
>> > Now , there are three ways which I can think to leverage Apache Storm
>> here :-
>> >
>> > 1) Use the same Spout (say Spout A) to read messages from different
>> queues and based on the messages received emit it to different Bolts.
>> >
>> > 2) Use different Spout (Spout A and Spout B and so on) within the same
>> topology (say Topology A) to read messages from different queues .
>> >
>> > 3) Use Different Spouts one within eachTopology (Topology A , Topology
>> B and so on) to read messages from different queues .
>> >
>> > Which is the best way to process this considering I want high
>> throughput (more no of queue messages to be processed concurrently) .
>> >
>> > Also , If In use same Topology for all Spouts (currently though
>> requirement is for 2 spouts)  will failure in one Spout (or its associated
>> Bolts) effect the second or will they both continue working separately even
>> if some failure is in Spout B ?
>> >
>> > Cost wise , how much would it be to maintain two different topologies .
>> >
>> > Looking for inputs from members here.
>> >
>> > Thanks
>> > Ankur
>> >
>> >
>>
>
>


Multiple Spouts in Same topology or Topology per spout

2015-10-11 Thread Ankur Garg
Hi ,

So I have a situation where I want to read messages from different queues
hosted in a Rabbitmq Server .

Now , there are three ways which I can think to leverage Apache Storm here
:-

1) Use the same Spout (say Spout A) to read messages from different queues
and based on the messages received emit them to different Bolts.

2) Use different Spouts (Spout A and Spout B and so on) within the same
topology (say Topology A) to read messages from different queues (see the
sketch after this list) .

3) Use different Spouts , one within each Topology (Topology A , Topology B
and so on) , to read messages from different queues .
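
To make option 2 concrete , the wiring would roughly be like this (just a
sketch , all the names are placeholders) :

TopologyBuilder builder = new TopologyBuilder();

// one spout per RabbitMQ queue, both inside the same topology
builder.setSpout("spoutA", new RabbitMqSpout("queueA"), 4);
builder.setSpout("spoutB", new RabbitMqSpout("queueB"), 4);

// each spout feeds its own chain of bolts
builder.setBolt("boltA", new ProcessingBoltA(), 4).shuffleGrouping("spoutA");
builder.setBolt("boltB", new ProcessingBoltB(), 4).shuffleGrouping("spoutB");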

Which is the best way to process this considering I want high throughput
(more queue messages to be processed concurrently) ?

Also , if I use the same Topology for all Spouts (currently though the
requirement is for 2 spouts) , will failure in one Spout (or its associated
Bolts) affect the second , or will they both continue working separately even
if some failure is in Spout B ?

Cost-wise , how much would it be to maintain two different topologies ?

Looking for inputs from members here.

Thanks
Ankur


Re: Does Storm work with Spring

2015-10-10 Thread Ankur Garg
Oh ... so I will have to test it in a cluster.

Having said that , how is the local cluster which we use so different from a
normal cluster ? Ideally , it should simulate a normal cluster ..
On Oct 10, 2015 7:51 PM, "Ravi Sharma" <ping2r...@gmail.com> wrote:

> Hi Ankur,
> local it may be working but It wont work in Actual cluster.
>
> Think about SpringContext is collection of your so many resoucres, like
> Database connections , may be HTTP connections , Thread pools etc.
> These things wont get serialised and just go to other machines and start
> working.
>
> SO basically in init methods of bolt and spout, you need to call some
> singloton class like this
>
> ApplicationContext ac = SingletonApplicationContext.getContext();
>
> SingletonApplicationContext will have a static variable ApplicationContext
> and in getContext you will check if static variable has been initialised if
> not then u will initilize it, and then return it(normal Singleton class)
>
>
> Now when Topolgy will move to any other node, Bolt and spouts will start
> and first init call will initialize it and other bolt/spouts will just use
> that.
>
> As John mentioned, its very important to mark all Spring beans and Context
> as transient.
>
> Hope it helps.
>
> Ravi.
>
>
>
>
>
> On Sat, Oct 10, 2015 at 6:25 AM, Ankur Garg <ankurga...@gmail.com> wrote:
>
>> Hi Javier ,
>>
>> So , I am using a Local cluster on my dev machine where I am using
>> Eclipse . Here , I am passing Springs ApplicationContext as constructor
>> argument to spouts and bolts .
>>
>> TopologyBuilder builder = new TopologyBuilder();
>>
>> builder.setSpout("rabbitMqSpout", new RabbitListnerSpout(appContext),
>> 10);
>>
>> builder.setBolt("mapBolt", new GroupingBolt(appContext),
>> 10).shuffleGrouping("rabbitMqSpout");
>>
>> builder.setBolt("reduceBolt", new PublishingBolt(appContext),
>> 10).shuffleGrouping("mapBolt");
>>
>> Config conf = new Config();
>>
>> conf.registerSerialization(EventBean.class); /
>>
>> conf.registerSerialization(InputQueueManagerImpl.class);
>>
>> conf.setDebug(true);
>>
>>  LocalCluster cluster = new LocalCluster();
>>
>> cluster.submitTopology("test", conf, builder.createTopology());
>>
>>
>> And in my spouts and Bolts ,
>>
>> I make my Application Context variable as static  . So when it is
>> launched by c;uster.submitTopology , my context is still avalilable
>>
>>
>> private static ApplicationContext ctx;
>>
>> public RabbitListnerSpout(ApplicationContext appContext) {
>>
>> LOG.info("RabbitListner Constructor called");
>>
>> ctx = appContext;
>>
>> }
>>
>>
>> @SuppressWarnings("rawtypes")
>>
>> @Override
>>
>> public void open(Map conf, TopologyContext context,SpoutOutputCollector
>> collector) {
>>
>> LOG.info("Inside the open Method for RabbitListner Spout");
>>
>> inputManager = (InputQueueManagerImpl) ctx.getBean(InputQueueManagerImpl.
>> class);
>>
>> notificationManager = (NotificationQueueManagerImpl) ctx
>> .getBean(NotificationQueueManagerImpl.class);
>>
>> eventExchange = ctx.getEnvironment().getProperty(
>> "input.rabbitmq.events.exchange");
>>
>> routingKey = ctx.getEnvironment().getProperty(
>> "input.rabbitmq.events.routingKey");
>>
>> eventQueue = ctx.getEnvironment().getProperty(
>> "input.rabbitmq.events.queue");
>>
>> _collector = collector;
>>
>> LOG.info("Exiting the open Method for RabbitListner Spout");
>>
>> }
>>
>>
>> This is working like a charm (my ApplicationContext is initialized
>> seperately ) . As we all know , ApplicationContext is not serializable .
>> But this works well in LocalCluster.
>>
>> My assumption is that it will work in a seperate Cluster too . Is my
>> assumption correct ??
>>
>> On Fri, Oct 9, 2015 at 9:04 PM, Javier Gonzalez <jagon...@gmail.com>
>> wrote:
>>
>>> IIRC, only if everything you use in your spouts and bolts is
>>> serializable.
>>> On Oct 6, 2015 11:29 PM, "Ankur Garg" <ankurga...@gmail.com> wrote:
>>>
>>>> Hi Ravi ,
>>>>
>>>> I was able to make an Integration with Spring but the problem is that I
>>>> have to autowire for every bolt and spout . That means that even if i
>>>> parallelize spout and bolt i

Re: Does Storm work with Spring

2015-10-10 Thread Ankur Garg
Oh , the problem here is I have many beans which need to be initialized
(some are reading conf from yml files , database connections , thread pool
initialization etc) .


Now , I have written a spring boot application which takes care of all the
above and I define my topology inside one of the beans , Here is my bean

@Autowired
ApplicationContext appContext;

@Bean
public void submitTopology() throws
AlreadyAliveException,InvalidTopologyException {

   TopologyBuilder builder = new TopologyBuilder();

   builder.setSpout("rabbitMqSpout", new RabbitListnerSpout(appContext),
10);

   builder.setBolt("mapBolt", new GroupingBolt(appContext),
10).shuffleGrouping("rabbitMqSpout");

builder.setBolt("reduceBolt", new PublishingBolt(appContext),
10).shuffleGrouping("mapBolt");

Config conf = new Config();

conf.registerSerialization(EventBean.class); // To be registered with Kyro
for Storm

conf.registerSerialization(InputQueueManagerImpl.class);

conf.setDebug(true);

 conf.setMessageTimeoutSecs(200);

   LocalCluster cluster = new LocalCluster();

  cluster.submitTopology("test", conf, builder.createTopology());

}


When this bean is initialized , I already have appContext initialized by my
Spring Boot Application . So , the thing is , I am using SpringBoot to
initialize and load my context with all beans .

Now this is the context which I want to leverage in my spouts and bolts .

So , if what I suggested earlier does not work on a Storm distributed
cluster , I need to find a way of initializing my AppContext somehow :(

I would be really thankful if anyone here can help me :(


Thanks

Ankur

On Sun, Oct 11, 2015 at 5:54 AM, Javier Gonzalez <jagon...@gmail.com> wrote:

> The local cluster runs completely within a single JVM AFAIK. The local
> cluster is useful for development, testing your topology, etc. The real
> deployment has to go through nimbus, run on workers started by supervisors
> on one or more nodes, etc. Kind of difficult to simulate all that on a
> single box.
>
> On Sat, Oct 10, 2015 at 1:45 PM, Ankur Garg <ankurga...@gmail.com> wrote:
>
>> Oh ...So I will have to test it in a cluster.
>>
>> Having said that, how is local cluster which we use is too different from
>> normal cluster.. Ideally ,it shud simulate normal cluster..
>> On Oct 10, 2015 7:51 PM, "Ravi Sharma" <ping2r...@gmail.com> wrote:
>>
>>> Hi Ankur,
>>> local it may be working but It wont work in Actual cluster.
>>>
>>> Think about SpringContext is collection of your so many resoucres, like
>>> Database connections , may be HTTP connections , Thread pools etc.
>>> These things wont get serialised and just go to other machines and start
>>> working.
>>>
>>> SO basically in init methods of bolt and spout, you need to call some
>>> singloton class like this
>>>
>>> ApplicationContext ac = SingletonApplicationContext.getContext();
>>>
>>> SingletonApplicationContext will have a static variable
>>> ApplicationContext and in getContext you will check if static variable has
>>> been initialised if not then u will initilize it, and then return it(normal
>>> Singleton class)
>>>
>>>
>>> Now when Topolgy will move to any other node, Bolt and spouts will start
>>> and first init call will initialize it and other bolt/spouts will just use
>>> that.
>>>
>>> As John mentioned, its very important to mark all Spring beans and
>>> Context as transient.
>>>
>>> Hope it helps.
>>>
>>> Ravi.
>>>
>>>
>>>
>>>
>>>
>>> On Sat, Oct 10, 2015 at 6:25 AM, Ankur Garg <ankurga...@gmail.com>
>>> wrote:
>>>
>>>> Hi Javier ,
>>>>
>>>> So , I am using a Local cluster on my dev machine where I am using
>>>> Eclipse . Here , I am passing Springs ApplicationContext as constructor
>>>> argument to spouts and bolts .
>>>>
>>>> TopologyBuilder builder = new TopologyBuilder();
>>>>
>>>> builder.setSpout("rabbitMqSpout", new RabbitListnerSpout(appContext),
>>>> 10);
>>>>
>>>> builder.setBolt("mapBolt", new GroupingBolt(appContext),
>>>> 10).shuffleGrouping("rabbitMqSpout");
>>>>
>>>> builder.setBolt("reduceBolt", new PublishingBolt(appContext),
>>>> 10).shuffleGrouping("mapBolt");
>>>>
>>>> Config conf = new Config();
>>>>
>>>> conf.registerSerialization(EventBean.class); /
>>>>
>&

Re: Does Storm work with Spring

2015-10-09 Thread Ankur Garg
Hi Javier ,

So , I am using a Local cluster on my dev machine where I am using Eclipse .
Here , I am passing Spring's ApplicationContext as a constructor argument to
spouts and bolts .

TopologyBuilder builder = new TopologyBuilder();

builder.setSpout("rabbitMqSpout", new RabbitListnerSpout(appContext), 10);

builder.setBolt("mapBolt", new GroupingBolt(appContext),
10).shuffleGrouping("rabbitMqSpout");

builder.setBolt("reduceBolt", new PublishingBolt(appContext),
10).shuffleGrouping("mapBolt");

Config conf = new Config();

conf.registerSerialization(EventBean.class); // To be registered with Kryo for Storm

conf.registerSerialization(InputQueueManagerImpl.class);

conf.setDebug(true);

 LocalCluster cluster = new LocalCluster();

cluster.submitTopology("test", conf, builder.createTopology());


And in my spouts and Bolts ,

I make my Application Context variable static . So when it is launched
by cluster.submitTopology , my context is still available .


private static ApplicationContext ctx;

public RabbitListnerSpout(ApplicationContext appContext) {

LOG.info("RabbitListner Constructor called");

ctx = appContext;

}


@SuppressWarnings("rawtypes")

@Override

public void open(Map conf, TopologyContext context,SpoutOutputCollector
collector) {

LOG.info("Inside the open Method for RabbitListner Spout");

inputManager = (InputQueueManagerImpl) ctx.getBean(InputQueueManagerImpl.
class);

notificationManager = (NotificationQueueManagerImpl) ctx
.getBean(NotificationQueueManagerImpl.class);

eventExchange = ctx.getEnvironment().getProperty(
"input.rabbitmq.events.exchange");

routingKey = ctx.getEnvironment().getProperty(
"input.rabbitmq.events.routingKey");

eventQueue = ctx.getEnvironment().getProperty("input.rabbitmq.events.queue"
);

_collector = collector;

LOG.info("Exiting the open Method for RabbitListner Spout");

}


This is working like a charm (my ApplicationContext is initialized
separately) . As we all know , ApplicationContext is not serializable .
But this works well in LocalCluster.

My assumption is that it will work in a separate cluster too . Is my
assumption correct ??

On Fri, Oct 9, 2015 at 9:04 PM, Javier Gonzalez <jagon...@gmail.com> wrote:

> IIRC, only if everything you use in your spouts and bolts is serializable.
> On Oct 6, 2015 11:29 PM, "Ankur Garg" <ankurga...@gmail.com> wrote:
>
>> Hi Ravi ,
>>
>> I was able to make an Integration with Spring but the problem is that I
>> have to autowire for every bolt and spout . That means that even if i
>> parallelize spout and bolt it will get started to each instance  . Is there
>> some way that I only have to do for bolts and spouts once (I mean if I
>> parallelize bolts or spouts individually it can share the conf from
>> somewhere) . IS this possible??
>>
>> Thanks
>> Ankur
>>
>> On Tue, Sep 29, 2015 at 7:57 PM, Ravi Sharma <ping2r...@gmail.com> wrote:
>>
>>> Yes this is for annotation also...
>>>
>>> you can call this method in prepare()  method of bolt and onOpen() method
>>> in every Spout and make sure you don't use any autowire bean before this
>>> call.
>>>
>>>
>>>
>>>
>>> Ravi.
>>>
>>>
>>>
>>>
>>> On Tue, Sep 29, 2015 at 2:22 PM, Ankur Garg <ankurga...@gmail.com>
>>> wrote:
>>>
>>> > Hi Ravi ,
>>> >
>>> > Thanks for your reply . I am using annotation based configuration and
>>> using
>>> > Spring Boot.
>>> >
>>> > Any idea how to do it using annotations ?
>>> >
>>> >
>>> >
>>> > On Tue, Sep 29, 2015 at 6:41 PM, Ravi Sharma <ping2r...@gmail.com>
>>> wrote:
>>> >
>>> > > Bolts and Spouts are created by Storm and not known to Spring
>>> Context.
>>> > You
>>> > > need to manually add them to SpringContext, there are few methods
>>> > available
>>> > > i.e.
>>> > >
>>> > >
>>> > >
>>> >
>>> SpringContext.getContext().getAutowireCapableBeanFactory().autowireBeanProperties(this,
>>> > > AutowireCapableBeanFactory.AUTOWIRE_AUTODETECT, false);
>>> > >
>>> > > SpringContext is my own class where i have injected SpringContext so
>>> > > SpringContext.getContext() returns the actuall Spring Context
>>> > >
>>> > >
>>> > >
>>> > >
>>> > > Ravi.
>>> > >
>>> > >
>>> > > On Tue, Sep 29, 2015 at 1:03 PM, Ankur Garg <ankurga...@gmail.com>
>>> > wrote:
>>> > >
>>> > > > Hi ,
>>> > > >
>>> > > > I am building a Storm topology with set of Spouts and Bolts  and
>>> also
>>> > > using
>>> > > > Spring for Dependency Injection .
>>> > > >
>>> > > > Unfortunately , none of my fields are getting autowired even
>>> though I
>>> > > have
>>> > > > declared all my spouts and Bolts as @Components .
>>> > > >
>>> > > > However the place where I am declaring my topology , Spring is
>>> working
>>> > > fine
>>> > > > .
>>> > > >
>>> > > > Is it because cluster.submitTopology("test", conf,
>>> > > > builder.createTopology())
>>> > > >  submits the topology to a cluster (locally it spawns different
>>> thread
>>> > > for
>>> > > > Spouts and Bolts) that Autowiring is not working?
>>> > > >
>>> > > > Please suggest .
>>> > > >
>>> > >
>>> >
>>>
>>
>>


Re: Exception Stack Trace for Local Cluster

2015-10-06 Thread Ankur Garg
Thanks Javier for the reply .

You say "If you mean your local desktop machine, you probably need to
configure your logging correctly." .What do I need to do here , my logging
works fine for other programs and only does not show up when  running local
cluster .

On Wed, Oct 7, 2015 at 5:08 AM, Javier Gonzalez <jagon...@gmail.com> wrote:

> If you mean your local desktop machine, you probably need to configure
> your logging correctly.
>
> If you mean running a topology with local submitter in a dev server...
> Why? :) just run a 1 node storm cluster if you want to do that
> On Oct 6, 2015 2:07 PM, "Ankur Garg" <ankurga...@gmail.com> wrote:
>
>> Hi ,
>>
>> I am running a local cluster on my dev machine . I see whenever something
>> fails due to some exception in my code or library that I am using , I just
>> get this message on console
>>
>> {"timeStamp":1444154764709,"host":"mum-1agrag-m.local","tid":"Thread-11-rabbitMqSpout","loglevel":"error","msg":"Async
>> loop
>> died!","source":"org.springframework.beans.factory.annotation/AutowiredAnnotationBeanPostProcessor.java:334"}
>>
>>
>> {"timeStamp":1444154764764,"host":"mum-1agrag-m.local","tid":"Thread-11-rabbitMqSpout","loglevel":"error","msg":"","source":"org.springframework.beans.factory.annotation/AutowiredAnnotationBeanPostProcessor.java:334"}
>>
>> {"timeStamp":1444154764779,"host":"mum-1agrag-m.local","tid":"Thread-11-rabbitMqSpout","loglevel":"error","msg":"Halting
>> process: (\"Worker died\")","source":"backtype.storm/util.clj:325"}
>>
>>
>> Is there any way I  can see the stack trace (other than debugging
>> manually :P)
>>
>>
>> Thanks
>>
>> Ankur
>>
>


Re: Does Storm work with Spring

2015-10-06 Thread Ankur Garg
Hi Ravi ,

I was able to make an integration with Spring , but the problem is that I
have to autowire for every bolt and spout . That means that even if I
parallelize a spout or bolt , the autowiring has to happen again for each
instance . Is there some way that I only have to do it for bolts and spouts
once (I mean if I parallelize bolts or spouts individually , they can share
the configuration from somewhere) ? Is this possible ?
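
Right now every bolt and spout repeats something like this in its prepare() /
open() , and that is the part I would like to do only once (a sketch ,
SpringContext being the holder class from earlier in this thread) :

@SuppressWarnings("rawtypes")
@Override
public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
    this.collector = collector;

    // re-attach this Storm-created instance to the Spring context so the @Autowired fields get filled
    // (or the autowireBeanProperties(...) call suggested earlier in the thread)
    SpringContext.getContext()
            .getAutowireCapableBeanFactory()
            .autowireBean(this);
}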

Thanks
Ankur

On Tue, Sep 29, 2015 at 7:57 PM, Ravi Sharma <ping2r...@gmail.com> wrote:

> Yes this is for annotation also...
>
> you can call this method in prepare()  method of bolt and onOpen() method
> in every Spout and make sure you don't use any autowire bean before this
> call.
>
>
>
>
> Ravi.
>
>
>
>
> On Tue, Sep 29, 2015 at 2:22 PM, Ankur Garg <ankurga...@gmail.com> wrote:
>
> > Hi Ravi ,
> >
> > Thanks for your reply . I am using annotation based configuration and
> using
> > Spring Boot.
> >
> > Any idea how to do it using annotations ?
> >
> >
> >
> > On Tue, Sep 29, 2015 at 6:41 PM, Ravi Sharma <ping2r...@gmail.com>
> wrote:
> >
> > > Bolts and Spouts are created by Storm and not known to Spring Context.
> > You
> > > need to manually add them to SpringContext, there are few methods
> > available
> > > i.e.
> > >
> > >
> > >
> >
> SpringContext.getContext().getAutowireCapableBeanFactory().autowireBeanProperties(this,
> > > AutowireCapableBeanFactory.AUTOWIRE_AUTODETECT, false);
> > >
> > > SpringContext is my own class where i have injected SpringContext so
> > > SpringContext.getContext() returns the actuall Spring Context
> > >
> > >
> > >
> > >
> > > Ravi.
> > >
> > >
> > > On Tue, Sep 29, 2015 at 1:03 PM, Ankur Garg <ankurga...@gmail.com>
> > wrote:
> > >
> > > > Hi ,
> > > >
> > > > I am building a Storm topology with set of Spouts and Bolts  and also
> > > using
> > > > Spring for Dependency Injection .
> > > >
> > > > Unfortunately , none of my fields are getting autowired even though I
> > > have
> > > > declared all my spouts and Bolts as @Components .
> > > >
> > > > However the place where I am declaring my topology , Spring is
> working
> > > fine
> > > > .
> > > >
> > > > Is it because cluster.submitTopology("test", conf,
> > > > builder.createTopology())
> > > >  submits the topology to a cluster (locally it spawns different
> thread
> > > for
> > > > Spouts and Bolts) that Autowiring is not working?
> > > >
> > > > Please suggest .
> > > >
> > >
> >
>


Exception Stack Trace for Local Cluster

2015-10-06 Thread Ankur Garg
Hi ,

I am running a local cluster on my dev machine . I see whenever something
fails due to some exception in my code or library that I am using , I just
get this message on console

{"timeStamp":1444154764709,"host":"mum-1agrag-m.local","tid":"Thread-11-rabbitMqSpout","loglevel":"error","msg":"Async
loop
died!","source":"org.springframework.beans.factory.annotation/AutowiredAnnotationBeanPostProcessor.java:334"}

{"timeStamp":1444154764764,"host":"mum-1agrag-m.local","tid":"Thread-11-rabbitMqSpout","loglevel":"error","msg":"","source":"org.springframework.beans.factory.annotation/AutowiredAnnotationBeanPostProcessor.java:334"}

{"timeStamp":1444154764779,"host":"mum-1agrag-m.local","tid":"Thread-11-rabbitMqSpout","loglevel":"error","msg":"Halting
process: (\"Worker died\")","source":"backtype.storm/util.clj:325"}


Is there any way I can see the stack trace (other than debugging manually
:P) ?


Thanks

Ankur


Re: Does Storm work with Spring

2015-09-29 Thread Ankur Garg
Hi Ravi ,

Thanks for your reply . I am using annotation based configuration and using
Spring Boot.

Any idea how to do it using annotations ?



On Tue, Sep 29, 2015 at 6:41 PM, Ravi Sharma <ping2r...@gmail.com> wrote:

> Bolts and Spouts are created by Storm and not known to Spring Context. You
> need to manually add them to SpringContext, there are few methods available
> i.e.
>
>
> SpringContext.getContext().getAutowireCapableBeanFactory().autowireBeanProperties(this,
> AutowireCapableBeanFactory.AUTOWIRE_AUTODETECT, false);
>
> SpringContext is my own class where i have injected SpringContext so
> SpringContext.getContext() returns the actuall Spring Context
>
>
>
>
> Ravi.
>
>
> On Tue, Sep 29, 2015 at 1:03 PM, Ankur Garg <ankurga...@gmail.com> wrote:
>
> > Hi ,
> >
> > I am building a Storm topology with set of Spouts and Bolts  and also
> using
> > Spring for Dependency Injection .
> >
> > Unfortunately , none of my fields are getting autowired even though I
> have
> > declared all my spouts and Bolts as @Components .
> >
> > However the place where I am declaring my topology , Spring is working
> fine
> > .
> >
> > Is it because cluster.submitTopology("test", conf,
> > builder.createTopology())
> >  submits the topology to a cluster (locally it spawns different thread
> for
> > Spouts and Bolts) that Autowiring is not working?
> >
> > Please suggest .
> >
>


Re: Spring AMQP Integration with Storm

2015-09-29 Thread Ankur Garg
Hi Stephen,

I want to use Spring AMQP . All the examples you have quoted are not using
Spring anywhere , which we are using heavily in our project .

If nothing works out , then I will have to use the example implementations
:(



On Tue, Sep 29, 2015 at 5:52 PM, Stephen Powis <spo...@salesforce.com>
wrote:

> I'd recommend making use of a spout that others have already built and
> battle tested.  A quick google search shows several including this one:
> https://github.com/ppat/storm-rabbitmq
>
> Unless you have a very special use case, I'm not sure re-inventing the
> wheel is worth the time and effort.
>
> Stephen
>
> On Tue, Sep 29, 2015 at 1:39 AM, Ankur Garg <ankurga...@gmail.com> wrote:
>
>> Hi,
>>
>> I want to consume the messages in my Storm Spout from a rabbitMq Queue.
>>
>> Now , we are using Spring AMQP to send and receive messages from RabbitMq
>> asynchronously.
>>
>> Spring AMQP provides mechanism(either creating a listener or using
>> annotation @RabbitListner) to read message from the queue aysnchronously.
>>
>> The problem is I can have a Listener to read the message from the Queue.
>> But how do I send this message to my Storm Spout which is running on storm
>> cluster ?
>>
>> The topology will start a cluster, but in my nextTuple() method of my
>> spout , I need to read message from this Queue. Can Spring AMQP be used
>> here ?
>>
>> I have a listener configured to read message from the queue:
>>
>> @RabbitListener(queues = "queueName")
>> public void processMessage(QueueMessage message) {
>>
>> }
>>
>> How can the above message received at the listener be sent to my spout
>> running on a cluster .
>>
>> Alternatively , how can a spout's nextTuple() method have this method
>> inside it ? Is it possible
>>
>> Is there any integration there between Spring AMQP and Storm?
>>
>>
>> Thanks
>>
>> Ankur
>>
>>
>>
>>
>


Does Storm work with Spring

2015-09-29 Thread Ankur Garg
Hi ,

I am building a Storm topology with a set of Spouts and Bolts and am also
using Spring for Dependency Injection .

Unfortunately , none of my fields are getting autowired even though I have
declared all my Spouts and Bolts as @Components .

However , in the place where I am declaring my topology , Spring is working
fine .

Is it because cluster.submitTopology("test", conf, builder.createTopology())
submits the topology to a cluster (locally it spawns different threads for
Spouts and Bolts) that autowiring is not working ?

Please suggest .


Spring AMQP Integration with Storm

2015-09-28 Thread Ankur Garg
Hi,

I want to consume the messages in my Storm Spout from a rabbitMq Queue.

Now , we are using Spring AMQP to send and receive messages from RabbitMq
asynchronously.

Spring AMQP provides a mechanism (either creating a listener or using the
annotation @RabbitListener) to read messages from the queue asynchronously.

The problem is I can have a Listener to read the message from the Queue.
But how do I send this message to my Storm Spout which is running on the
storm cluster ?

The topology will start a cluster , but in the nextTuple() method of my spout
, I need to read messages from this Queue. Can Spring AMQP be used here ?

I have a listener configured to read message from the queue:

@RabbitListener(queues = "queueName")
public void processMessage(QueueMessage message) {

}

How can the above message received at the listener be sent to my spout
running on a cluster .

Alternatively , how can a spout's nextTuple() method have this method
inside it ? Is it possible ?

Is there any integration there between Spring AMQP and Storm?
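
To make the question concrete , the kind of spout I am trying to end up with
would look roughly like this . It is just a sketch (SpringContextHolder , the
bean and the queue name are placeholders) , polling with RabbitTemplate
instead of the @RabbitListener :

import java.util.Map;

import org.springframework.amqp.core.Message;
import org.springframework.amqp.rabbit.core.RabbitTemplate;

import backtype.storm.spout.SpoutOutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichSpout;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Values;

public class RabbitMqSpout extends BaseRichSpout {

    private SpoutOutputCollector collector;
    private transient RabbitTemplate rabbitTemplate;

    @SuppressWarnings("rawtypes")
    @Override
    public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
        this.collector = collector;
        // the Spring context has to be (re)built on the worker, e.g. through a singleton holder
        this.rabbitTemplate = SpringContextHolder.getContext().getBean(RabbitTemplate.class);
    }

    @Override
    public void nextTuple() {
        // receive() returns null immediately when the queue is empty, so nextTuple() never blocks
        Message message = rabbitTemplate.receive("queueName");
        if (message != null) {
            collector.emit(new Values(new String(message.getBody())));
        }
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("payload"));
    }
}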


Thanks

Ankur


Emitting Tuple to different Bolts

2015-09-23 Thread Ankur Garg
Hi ,

I am trying out a scenario in which I have a Spout which reads data from a
Message Broker and emits the message as a tuple to a Bolt for some
processing.

The Bolt , post processing , converts it into separate messages and each
sub-message has to be sent to a different Broker , which can be hosted on
different machines .

Assuming I have finite recipients (in my case there are 3 Message Brokers
for output) .

So , Bolt1 , post processing , can drop the messages directly to these 3
Message Brokers .

Now , if I use a single Bolt here which drops the messages to these three
brokers by itself , and let's say one of them fails (due to unavailability
etc) , on which I call the collector's fail method .

Once the fail method is called in the bolt , the fail method in my Spout gets
invoked .

Here , I believe I will have to process the entire message again (I have to
make sure every message gets processed) even though 2 out of 3 sub-messages
got successfully delivered .

Alternatively , even if I emit these 3 sub-messages to different bolts , I
think even in that case the Spout will have to process the entire message
again .

This is because I am appending a unique Guid to the message while emitting it
the first time in the spout's nextTuple() method .

Is there a way to ensure that only the failed sub-message is reprocessed and
not the entire one ?
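
To make the alternative concrete , this is roughly what I mean by emitting
the 3 sub-messages to different bolts on separate streams (a sketch , the
stream ids and the subMessage helpers are placeholders) . Since every emit is
anchored to the same input tuple , a failure on any one stream still replays
the whole original message from the Spout :

// in the splitting bolt
@Override
public void declareOutputFields(OutputFieldsDeclarer declarer) {
    declarer.declareStream("brokerA", new Fields("subMessage"));
    declarer.declareStream("brokerB", new Fields("subMessage"));
    declarer.declareStream("brokerC", new Fields("subMessage"));
}

@Override
public void execute(Tuple input) {
    // each sub-message goes to its own downstream bolt, but all three are anchored to the same input
    collector.emit("brokerA", input, new Values(subMessageA(input)));
    collector.emit("brokerB", input, new Values(subMessageB(input)));
    collector.emit("brokerC", input, new Values(subMessageC(input)));
    collector.ack(input);
}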


Thanks
Ankur


Emitting Custom Object as Tuple from spout

2015-09-16 Thread Ankur Garg
Hi ,

I am new to apache Storm . To understand it I was looking at storm examples
provided in the storm tutorial (
https://github.com/apache/storm/tree/master/examples/storm-starter)  .


In all the examples , we are emitting primitive types (String , int etc) as
Tuples .

In my use case I have a Java object which I want to emit as a tuple to
bolts .

I am not sure how to do it . It seems I have to implement a custom Tuple
producer to convert the Java object to Values .

Can anyone provide me some example how to do that :

For ex my Java Object is :

class Bean
{
    int A;
    String B;
    Bean2 b;
    // setters and getters
}

and class Bean2
{
    // some attributes
}

Now , in the nextTuple() method of my Spout , I have an instance of the Bean
object .

How do I translate it into a Tuple , emit it , and consume it in my bolt ?
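
From what I have read so far , something like the following should work .
This is just a sketch (the field name "bean" and readNextBean() are
placeholders) :

// while building the topology: register the beans with Kryo so Storm can serialize them
conf.registerSerialization(Bean.class);
conf.registerSerialization(Bean2.class);

// in the spout: declare a single output field and emit the bean as its value
public void declareOutputFields(OutputFieldsDeclarer declarer) {
    declarer.declare(new Fields("bean"));
}

public void nextTuple() {
    Bean bean = readNextBean(); // placeholder for wherever the spout gets its data
    if (bean != null) {
        collector.emit(new Values(bean));
    }
}

// in the bolt: read it back by field name and cast
public void execute(Tuple input) {
    Bean bean = (Bean) input.getValueByField("bean");
    // use bean's getters (A, B, the nested Bean2) here
    collector.ack(input);
}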

Any ideas please.

Thanks
ankur


Significance of boolean direct Output Fields declare (boolean direct, Fields fields)

2015-09-11 Thread Ankur Garg
Hi ,

Looking at the OutputFieldsDeclarer class , I see there is an overloaded
declare method with a boolean flag direct .

If I use the plain declare method , i.e. declare(Fields fields) , it sets this
boolean flag to false .

I am not sure how Storm interprets this boolean field internally while
processing with Spouts and Bolts .

Can somebody explain the significance of this flag to me ?

Thanks
Ankur


Deploying Apache Storm on Google Cloud

2015-09-04 Thread Ankur Garg
Hi ,

I wish to deploy storm cluster on Google Cloud .

Browsing on the internet , I could find this  =>
http://datadventures.ghost.io/2013/12/29/deploying-storm-on-gce/


Now , the above was written in 2013 . I read somewhere that Storm no longer
needs ZeroMQ and instead uses Netty .

Can someone here confirm if the above is a good source (I will try it
anyway) .

In case any of you have some other document where this is explained , please
share it .

PS-> I am new to apache Storm and  Google Cloud .

Any input is appreciated .

Thanks
Ankur


Evaluating Apache Storm

2015-08-31 Thread Ankur Garg
Hi ,

We in our organization are building a system to send real-time
notifications (Pub-Sub) to mobile users (Android and iOS) .

Broadly , the use case is that mobile users subscribe to some topics and
whenever there are events related to a topic , we need to send those
users a notification for that message .

The estimate is we are going to have a large number of users over a period of
time (maybe millions) getting these notifications .

Since Storm is used for real-time processing of data , I am planning to
explore whether it fits our requirement .

Does Storm fit the bill here ? Is it being used somewhere to do this ?

Looking for some pointers .

Thanks
Ankur


Re: Evaluating Apache Storm

2015-08-31 Thread Ankur Garg
Thanks Deepak for the reply .

Yes , here as well Storm will be acting as a subscriber for incoming events
and we plan to use this for aggregation (Map and Reduce) in real time . The
volume of events can be huge here .

I can give you a rough idea (sorry if I may sound opaque at some points , as
we are just trying to connect the dots at this moment) .

So , the idea here is , we are trying to build a classic Pub/Sub model
where we will have a message Bus (Kafka or RabbitMQ) on which we will get
some events .

To elaborate on an event , there will be various Topics (which will be some
String value) and we will get events for topics , which will be nothing but
some metadata for that event . Now , we will figure out which mobile users
have subscribed to this particular event and append them to the event msg .

To give a very rough example (since we are also not sure what an actual
event will look like , as at this point we are just finding ways to
define/solve this problem) :

Say there is a Topic , say TomCruise , and we get an event in json (containing
metadata for his upcoming movie release) . This event we will receive in
our Message Bus (say Kafka or RabbitMQ)

Event -> {TomCruise , {EventBody} } .

We plan to feed this stream to an Apache Storm Spout , which in turn calls 2
bolts (one for Map and one for Reduce) .

In Bolt1 , we may be doing a lookup as to which mobile users have
subscribed to this Topic's Event , and their type , and then change the above
to

{ {Event1},{MobileUser1} ,{iOS} } .
{ {Event2},{MobileUser2} ,{GCM} } .
{ {Event1} , {MobileUser3},{iOS}}
{ {Event2} , {MobileUser4},{GCM}}

.. and so on .
iOS and GCM are delivery channels (GCM stands for Google Cloud Messaging) so
that we can push to Android devices.

And in Bolt2 , we do the aggregation of which Event needs to be delivered to
which set of users :

{
Messages {
message { 'iOS', 'Event1', listOfMobileUsers {'MobileUser1',
'MobileUser3'} },
message { 'GCM', 'Event2', listOfMobileUsers {'MobileUser2',
'MobileUser4'} },
 and so on
}
}

From here we will again push this to a notification Message Bus and write
plugins which will be responsible for delivering the notifications to the end
user .

Now , since we will potentially have millions of subscribers , we want
to build a platform which can handle this scale reliably . Hence , we are
thinking of evaluating Storm for this .
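
In Storm terms , the wiring I have in mind is roughly this (a sketch , all
the component names are placeholders) :

TopologyBuilder builder = new TopologyBuilder();

// reads the raw topic events from the message bus (Kafka / RabbitMQ)
builder.setSpout("eventSpout", new EventBusSpout(), 4);

// Bolt1: looks up the subscribers of the event's topic and emits one tuple per (event, user, channel)
builder.setBolt("subscriberLookupBolt", new SubscriberLookupBolt(), 8)
        .shuffleGrouping("eventSpout");

// Bolt2: grouped by delivery channel (iOS / GCM) so the aggregation per channel happens in one task
builder.setBolt("aggregationBolt", new AggregationBolt(), 8)
        .fieldsGrouping("subscriberLookupBolt", new Fields("channel"));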

Hope I could describe the use case here .

Thanks
Ankur



On Mon, Aug 31, 2015 at 11:04 PM, Deepak Sharma <deepakmc...@gmail.com>
wrote:

> Hi Ankur
> I have not seen this kind of setup earlier.But i would say that storm is
> basically the subscriber in majority of implementations that i have seen or
> worked upon.
> So here you can use storm to process the events/notifications and then
> push it from storm to different topics where the mobile users are
> subscribed to.
>
> HTH
> --Deepak
>
> On Mon, Aug 31, 2015 at 10:59 PM, Ankur Garg <ankurga...@gmail.com> wrote:
>
>> Hi ,
>>
>> We in our organization are building a system to send  real time
>> notification (Pub-Sub) to Mobile Users(Android and IOS) .
>>
>> Broadly the use case is that  mobile users subscribe to some topics and
>> whenever there is/are events related to this topic , we need to send those
>> users a Notification for that message .
>>
>> The estimate is we are going to have a large no of users over a period of
>> time (may be millions) to get these notifications .
>>
>> Since Storm is used for real -time processing of data, I am planning to
>> explore this whether it fits our requirement .
>>
>> Does storm fits the bill here ? Is it being used somewhere to do this ?
>>
>> Looking for some pointers .
>>
>> Thanks
>> Ankur
>>
>
>
>
> --
> Thanks
> Deepak
> www.bigdatabig.com
> www.keosha.net
>