Fwd: Rule Based Approach for Real Time Stream Processing

2016-08-18 Thread Ambud Sharma
I have built and open-sourced something similar: https://github.com/symantec/hendrix. Please let me know if we could start building on top of this. Thanks and regards, Ambud

Re: Is there a way to tell if a tuple has timed out in 0.10

2016-08-22 Thread Ambud Sharma
Zack, that is how you tell that a tuple has timed out: you infer it from the fact that there are no bolt failures, yet the spout has failures. On Aug 22, 2016 1:38 PM, "Zachary Smith" wrote: > Hi all, > > I have a topology which occasionally has a lot of tuples that fail at the > spout level, with all

RE: Running a long task in bolt prepare() method

2016-08-24 Thread Ambud Sharma
I would start by increasing the task timeouts, Config.NIMBUS_TASK_LAUNCH_SECS and Config.NIMBUS_TASK_TIMEOUT_SECS, so the task isn't marked as dead and restarted. On Aug 24, 2016 2:17 AM, "Simon Cooper" wrote: > We’re decompressing and deserializing several hundreds-of-megabytes fi

Re: Re: How will storm replay the tuple tree?

2016-09-13 Thread Ambud Sharma
No. As per the code, only individual messages are replayed. On Sep 13, 2016 6:09 PM, "fanxi...@travelsky.com" wrote: > Hi: > > I'd like to make clear on something about Kafka-spout referring to ack. > > For example, kafka-spout fetches offset 5000-6000 from Kafka server, but > one tuple whose offs

Re: How will storm replay the tuple tree?

2016-09-13 Thread Ambud Sharma
Here is a post on it: https://bryantsai.com/fault-tolerant-message-processing-in-storm/. Point-to-point tracking is expensive unless you are using transactions; Flume does point-to-point transfers using transactions. On Sep 13, 2016 3:27 PM, "Tech Id" wrote: > I agree with this statement about c

Re: How will storm replay the tuple tree?

2016-09-14 Thread Ambud Sharma
When a new tupletree is born, the spout sends the XORed edge-ids of each tuple recipient, which the acker records in its pending ledger" in Acking-framework-implementation.html <http://storm.apache.org/releases/current/Acking-fram
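The pending-ledger mechanism from the quoted acking documentation can be illustrated with a small, dependency-free Java model. This is a sketch, not Storm's acker: the class `XorAckLedger` and its methods are hypothetical, and small constant edge ids are used for readability where Storm uses random 64-bit ids (which makes an accidental zero extremely unlikely).

```java
import java.util.HashMap;
import java.util.Map;

// Simplified, hypothetical model of XOR-based ack tracking; not Storm's acker class.
class XorAckLedger {
    // root tuple id -> XOR of all edge ids created but not yet acked
    private final Map<Long, Long> pending = new HashMap<>();

    // The spout emits a root tuple: XOR in the id of each edge it creates.
    void init(long rootId, long... edgeIds) {
        long v = 0L;
        for (long e : edgeIds) {
            v ^= e;
        }
        pending.put(rootId, v);
    }

    // A bolt acks the edge it consumed and reports any child edges it anchored.
    // Each id is XORed in once when created and once when acked, so the value
    // returns to zero exactly when every edge in the tree has been acked.
    boolean ack(long rootId, long consumedEdge, long... childEdges) {
        Long current = pending.get(rootId);
        if (current == null) {
            return false; // unknown root, or tree already completed
        }
        long v = current ^ consumedEdge;
        for (long c : childEdges) {
            v ^= c;
        }
        if (v == 0L) {
            pending.remove(rootId);
            return true; // tree complete: this is when the spout's ack(msgId) would fire
        }
        pending.put(rootId, v);
        return false;
    }

    boolean isPending(long rootId) {
        return pending.containsKey(rootId);
    }
}
```

For example, if a bolt acks edge 11 while anchoring two children (21 and 22), the root stays pending until both children have been acked.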

Re: Join stream with historical data

2016-09-14 Thread Ambud Sharma
Yes, you can build something for data enrichment, as long as you use a fairly sizable LRU cache in the bolt and your event volume is reasonable, so that the lookups don't become a bottleneck in the topology. On Sep 13, 2016 10:43 AM, "Daniela S" wrote: > Dear all, > > is it possible
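A bolt-side LRU cache for enrichment lookups can be sketched with the JDK's `LinkedHashMap` in access order, evicting through `removeEldestEntry`. The class `LruCache` and the `loader` function (standing in for a query against the historical data store) are hypothetical; the capacity would need tuning against the event volume.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical LRU cache for enrichment lookups inside a bolt.
class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    LruCache(int maxEntries) {
        super(16, 0.75f, true); // accessOrder = true: get() moves the entry to the tail
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries; // drop the least recently used entry when over capacity
    }

    // Serve from memory when possible; only hit the slow historical store on a miss.
    V getOrLoad(K key, Function<K, V> loader) {
        V v = get(key);
        if (v == null) {
            v = loader.apply(key);
            if (v != null) {
                put(key, v);
            }
        }
        return v;
    }
}
```

Because every hit refreshes an entry's position, frequently joined keys stay resident while rarely seen ones age out, which keeps memory bounded without a separate eviction thread.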

Re: Fwd: Serialize error when storm supervisor starting executor

2016-09-14 Thread Ambud Sharma
Can you post the snippet of your pom.xml file, especially around where storm-core is imported? I suspect you are not explicitly excluding conflicting dependencies in Maven. What gets serialized is your bolt instance, so its fields either need to be serializable objects or must be marked transient and

Re: storm-kafka-client 1.1.0 tag - Multiple Spouts Eventually Freeze

2016-09-14 Thread Ambud Sharma
I have seen that behavior only when running Storm in local mode with no data flowing in. This sounds like it might have something to do with ZooKeeper, i.e. your offsets in ZooKeeper are either not being updated or the watches are not being triggered for the spout to consume. Try using the z

Re: SpoutConfig zkRoot argument causing KafkaSpout exception

2016-09-17 Thread Ambud Sharma
zkRoot should be an empty string, not a "/". That config refers to the path where the consumer offsets will be stored. On Sep 17, 2016 12:20 AM, "Dominik Safaric" wrote: > Hi, > > I’ve set up a topology consisting of a Kafka spout. But unfortunately, I > keep getting the exception *Caused b

Re: SpoutConfig zkRoot argument causing KafkaSpout exception

2016-09-17 Thread Ambud Sharma
it actually > refer to? > > Dominik > > On 17 Sep 2016, at 09:40, Ambud Sharma wrote: > > Zkroot should be empty string not a /. > > Basically that config refers to the path where the consumer offsets will > be stored. > > On Sep 17, 2016 12:20 AM, "

Re: Storm 1.0.2 - KafkaSpout cannot find partition information

2016-09-17 Thread Ambud Sharma
zkRoot should be an empty string. On Sep 17, 2016 9:09 AM, "Dominik Safaric" wrote: > Hi, > > I’ve deployed a topology consisting of a KafkaSpout using Kafka 0.10.0.1 > and Zookeeper 3.4.6. All of the services, including the Nimbus and > Supervisor, run on the same instance. > > However, by exa

Re: Storm 1.0.2 - KafkaSpout cannot find partition information

2016-09-17 Thread Ambud Sharma
minik Šafarić > > On 17 Sep 2016, at 18:11, Ambud Sharma wrote: > > The Zkroot should be empty string. > > On Sep 17, 2016 9:09 AM, "Dominik Safaric" > wrote: > >> Hi, >> >> I’ve deployed a topology consisting of a KafkaSpout using Kafka

Re: Syncing multiple streams to compute final result from a bolt

2016-09-20 Thread Ambud Sharma
The correct way is to perform time-window aggregation using bucketing. Use the timestamp on your event, computed from the various stages, and send it to a single bolt where the aggregation happens. You only emit from this bolt once you have received results from both parts. It's like creating a barrier or th
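The bucketing-plus-barrier idea can be sketched in plain Java, leaving out the Storm plumbing. `WindowBarrier`, the "A"/"B" source labels, and summing as the combine step are all hypothetical choices for illustration; only the logic a single aggregating bolt would run per tuple is shown.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch of the "barrier" bolt: partial results from two upstream
// parts (A and B) are bucketed by event timestamp, and a combined result is
// emitted only once both parts have arrived for the same window.
class WindowBarrier {
    private final long windowMillis;
    private final Map<Long, Double> partA = new HashMap<>(); // bucket start -> sum from A
    private final Map<Long, Double> partB = new HashMap<>(); // bucket start -> sum from B

    WindowBarrier(long windowMillis) {
        this.windowMillis = windowMillis;
    }

    // Bucketing: align an event timestamp to the start of its window.
    long bucketOf(long eventTimeMillis) {
        return eventTimeMillis - Math.floorMod(eventTimeMillis, windowMillis);
    }

    // Called for each incoming tuple; any source other than "A" is treated as B.
    Optional<Double> onEvent(String source, long eventTimeMillis, double value) {
        long bucket = bucketOf(eventTimeMillis);
        Map<Long, Double> target = "A".equals(source) ? partA : partB;
        target.merge(bucket, value, Double::sum); // aggregate within the window
        if (partA.containsKey(bucket) && partB.containsKey(bucket)) {
            // Barrier reached: both parts present, emit and forget the bucket.
            return Optional.of(partA.remove(bucket) + partB.remove(bucket));
        }
        return Optional.empty(); // keep waiting for the other part
    }
}
```

Buckets for which one part never arrives stay in memory in this sketch; a real bolt would also need to expire them (for example on a tick tuple), which is where the timeout handling comes in.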

Re: Syncing multiple streams to compute final result from a bolt

2016-09-20 Thread Ambud Sharma
to manage the queue and all those > complexities of timeout. If Storm is not the right place to do this then > what else? > > > > On Tue, Sep 20, 2016 at 8:25 PM, Ambud Sharma > wrote: > >> The correct way is to perform time window aggregation using bucketing. >&g

Re: Who needs more memory?

2016-09-20 Thread Ambud Sharma
Allocate RAM for the workers that are launched on supervisor nodes. Workers do the heavy lifting and are the components that actually run your topology. On Sep 20, 2016 11:51 AM, "Thomas Cristanis" wrote: > I am using the storm for an academic experiment and have a question. Where > it is necessary t

Re: Syncing multiple streams to compute final result from a bolt

2016-09-20 Thread Ambud Sharma

Re: Large number of very small streams

2016-09-21 Thread Ambud Sharma
Two solutions: 1. You can group users by some sort of classification and create topics based on that; then, for each user, the consumer can check whether it's interested in the topic and consume or reject the messages. 2. If each user writes a lot of data, then you can use the concept of key based custom
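The key-based idea in option 2 can be illustrated with a tiny partitioner: every message for the same user key maps to the same partition, so one consumer sees that user's whole stream. `KeyPartitioner` is hypothetical, and `String.hashCode()` stands in for a real hash only to keep the sketch dependency-free (Kafka's default partitioner hashes the serialized key bytes with murmur2).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical key-based partitioner: same key -> same partition, every time.
class KeyPartitioner {
    private final int numPartitions;

    KeyPartitioner(int numPartitions) {
        this.numPartitions = numPartitions;
    }

    // floorMod keeps the result in [0, numPartitions) even for negative hash codes.
    int partitionFor(String userKey) {
        return Math.floorMod(userKey.hashCode(), numPartitions);
    }

    // Groups a batch of user keys by the partition they would be written to.
    Map<Integer, List<String>> route(List<String> userKeys) {
        Map<Integer, List<String>> byPartition = new TreeMap<>();
        for (String k : userKeys) {
            byPartition.computeIfAbsent(partitionFor(k), p -> new ArrayList<>()).add(k);
        }
        return byPartition;
    }
}
```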

Re: Submitting Topology Questions (2 small questions)

2016-09-21 Thread Ambud Sharma
Just the Nimbus address and port. On Sep 21, 2016 6:50 PM, "Joaquin Menchaca" wrote: > What is the minimal storm.yaml configuration do I need for `storm jar ... > remote`? > > Is there a command line option or way to specify locally crafted > storm.yaml, e.g. /tmp/storm.yaml?

Re: Does storm UI show the YAML file for a topology?

2016-09-26 Thread Ambud Sharma
Do you mean the flux.yml file? You can see the topology visualization, which might be useful. On Sep 26, 2016 10:40 AM, "Kevin" wrote: > I think you have to drill down into the topology to see that > > > On 09/26/16, S G wrote: > > Hi, > > I am unable to see the YAML file (with properties reduced to

Re: Issue with Storm 1.0.2 on docker

2016-10-14 Thread Ambud Sharma
TL;DR: here's a working Docker Compose template for Storm with Kafka and ZooKeeper. Remove the project-specific components and you should be able to run the whole thing with scaling: https://github.com/Symantec/hendrix/blob/current/install/conf/local/docker-compose-backup.yml On Oct 13, 2016 9:26 PM,

Re: Is it possible to emit different number of OutputFields

2016-10-17 Thread Ambud Sharma
You should declare two streams. Bolt configuration can't be passed to the declareOutputFields method call. On Oct 16, 2016 1:47 PM, "Darsh" wrote: > Is it possible to emit different number of output fields from storm a bolt? > > For ex: > > public void declareOutputFields(OutputFieldsDeclarer dec

Re: Cant connect to remote cluster

2016-11-18 Thread Ambud Sharma
Please use IRC for this; the mailing list is not for instant messaging. On Nov 18, 2016 2:21 PM, "Mostafa Gomaa" wrote: Also is nimbus by any chance hosted on AWS or Azure? On Nov 19, 2016 12:18 AM, "Ohad Edelstein" wrote: > Try add the zookeeper part to the client storm.yaml > > I think that the

Re: Containerising Storm

2016-11-22 Thread Ambud Sharma
https://github.com/Symantec/hendrix/blob/current/install/scripts/local-build-install-docker.sh Here's a Docker Compose way of doing it. ZooKeeper in this case is not scalable. I had to rebuild some of the Storm images to fix issues with environment variables and DNS lookups. Those images shou

Re: ack in downstream when using all grouping method

2016-12-19 Thread Ambud Sharma
Yes, that is correct. All downstream tuples must be processed for the root tuple to be acknowledged; the type of grouping does not change the acking behavior. On Dec 19, 2016 3:53 PM, "Xunyun Liu" wrote: > Hi there, > > As some grouping methods allow sending multiple copies of emitted data to > down

Re: ack in downstream when using all grouping method

2016-12-19 Thread Ambud Sharma
I forgot to answer your specific question. The Storm message id is internal and will be different, so you will see a duplicate tuple with a different id. On Dec 19, 2016 3:59 PM, "Ambud Sharma" wrote: > Yes that is correct. All downstream tuples must be processed for the root

Re: ack in downstream when using all grouping method

2016-12-19 Thread Ambud Sharma
hat I need them responding to the signal through proper acknowledgment. However, the rest of them are non-critical which are preferably not to interfere the normal ack process, much like receiving an unanchored tuple. Is there any way that I can achieve this? On 20 December 2016 at 11:01, Ambu

Re: How to send multi tick tuple?

2016-12-19 Thread Ambud Sharma
Counting ticks with the modulo operator is the ideal way to do it. Here's an example for you: https://github.com/Symantec/hendrix/blob/current/hendrix-alerts/src/main/java/io/symcpe/hendrix/alerts/SuppressionBolt.java#L136 Some slides explaining what's going on: http://www.slideshare.net/HadoopSummi
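The modulo approach can be sketched as follows, assuming the bolt is configured to receive one tick tuple per second (for example via `topology.tick.tuple.freq.secs`). Recognizing the tick tuple itself is left out so the sketch stays dependency-free; `TickScheduler`, `flushes`, and `reloads` are hypothetical names for two periodic jobs driven by the same tick stream.

```java
// Hypothetical sketch: one tick stream drives several periodic jobs via modulo.
class TickScheduler {
    private long tickCount = 0;
    int flushes = 0; // job that runs every 5 ticks
    int reloads = 0; // job that runs every 60 ticks

    // Call once per received tick tuple.
    void onTick() {
        tickCount++;
        if (tickCount % 5 == 0) {
            flushes++; // e.g. flush in-memory aggregates every 5 seconds
        }
        if (tickCount % 60 == 0) {
            reloads++; // e.g. refresh rules or lookup tables every minute
        }
    }
}
```

Since a component gets a single tick frequency, ticking at the smallest interval needed and deriving the longer periods from a counter avoids any extra timers inside the bolt.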

Re: Worker's Behavior With Heap Limit

2016-12-19 Thread Ambud Sharma
LRU caches are an effective memory management technique for Storm bolts if lookup is what you are trying to do. However, if you are doing in-memory aggregations, I highly recommend sticking with standard Java maps and then checkpointing state to an external data store (HBase, Redis, etc.). Note: * Storm
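For aggregations, silently evicting entries would lose partial results, so a plain map plus periodic checkpoints is the safer pattern. The sketch below is hypothetical: `CheckpointingCounter` is an illustrative name, and the external store (HBase, Redis, etc.) is represented by a `Consumer` so the example runs without any client library.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical sketch: aggregate in a standard HashMap and periodically
// checkpoint a snapshot of the state to an external store.
class CheckpointingCounter {
    private final Map<String, Long> counts = new HashMap<>();
    private final Consumer<Map<String, Long>> store; // stand-in for an HBase/Redis writer

    CheckpointingCounter(Consumer<Map<String, Long>> store) {
        this.store = store;
    }

    void add(String key, long delta) {
        counts.merge(key, delta, Long::sum); // no eviction: every partial sum is kept
    }

    // Call on a schedule (e.g. from a tick tuple); hands the store a copy of the state.
    void checkpoint() {
        store.accept(new HashMap<>(counts));
    }

    long get(String key) {
        return counts.getOrDefault(key, 0L);
    }
}
```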

Re: topology.debug always true

2016-12-19 Thread Ambud Sharma
Yes, your reasoning is correct. The topology is overriding the debug configuration; Storm allows a topology to override all of the topology-specific settings. On Fri, Dec 2, 2016 at 3:41 AM, Mostafa Gomaa wrote: > I think nimbus configuration is what you have on your nimbus machine, > "topology.d

Re: help on Consuming the data from SSL enabled 0.9 kafka topic

2016-12-19 Thread Ambud Sharma
Not sure if this helps: On Thu, Dec 1, 2016 at 4:11 AM, Srinivas.Veerabomma < srinivas.veerabo...@target.com> wrote: > Hi, > > > > I need some help. Basically looking for some sample Storm code or some > suggestions. > > > > My requirement is to develop a code in Apache Storm latest version to >

Re: help on Consuming the data from SSL enabled 0.9 kafka topic

2016-12-19 Thread Ambud Sharma
https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.4.2/bk_storm-user-guide/content/stormkafka-secure-config.html On Mon, Dec 19, 2016 at 4:49 PM, Ambud Sharma wrote: > Not sure if this helps: > > On Thu, Dec 1, 2016 at 4:11 AM, Srinivas.Veerabomma < > srinivas.veerabo...@ta

Re: Storm blobstore expiry

2016-12-19 Thread Ambud Sharma
What type of Blobstore is it? On Thu, Dec 1, 2016 at 1:57 AM, Mostafa Gomaa wrote: > Hello All, > > I am using the storm blobstore to store some dynamic dictionaries, however > I found that the blobstore had been cleared when I restarted the machine. > Is there a way to make the blobstore non-

Re: Does the bolt in between have the ability to re-emit a failed tuple?

2016-12-19 Thread Ambud Sharma
Tuples are replayed from the spout, not on a point-to-point basis as in Apache Flume. Either a tuple is completely processed, i.e. acked by every single bolt in the pipeline, or it's not; if it's not, it will be replayed by the spout (if the spout implements replay logic when th
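Spout-side replay can be modeled as a map of in-flight messages keyed by message id: the entry is dropped on ack and re-queued on fail (a tuple timeout also surfaces as a fail on the spout). This is a hypothetical, dependency-free sketch, not the Kafka spout's implementation; a real spout would emit with `SpoutOutputCollector.emit(values, msgId)` so that Storm calls back `ack`/`fail`.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of spout-side replay; bolts never re-emit on their own.
class ReplayingSpoutSketch {
    private final Deque<String> toEmit = new ArrayDeque<>();
    private final Map<Long, String> pending = new HashMap<>(); // msgId -> in-flight message
    private long nextId = 0;

    void enqueue(String msg) {
        toEmit.add(msg);
    }

    // Mirrors nextTuple(): take the next message and track it until it is acked.
    Long emitNext() {
        String msg = toEmit.poll();
        if (msg == null) {
            return null; // nothing to emit right now
        }
        long id = nextId++;
        pending.put(id, msg);
        return id;
    }

    // Whole tuple tree processed: forget the message.
    void ack(long msgId) {
        pending.remove(msgId);
    }

    // Tree failed or timed out: replay only this message, not the whole batch.
    void fail(long msgId) {
        String msg = pending.remove(msgId);
        if (msg != null) {
            toEmit.addFirst(msg);
        }
    }

    int inFlight() {
        return pending.size();
    }

    int queued() {
        return toEmit.size();
    }
}
```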

Re: Clarity on external/storm-kafka-client

2016-12-19 Thread Ambud Sharma
The Storm external project has the Kafka spouts and bolts. Storm doesn't directly control compatibility with Kafka; that being said, the default versions of the Kafka integrations will work according to your list, so the answer about "compatibility" is yes. However, it's still possible to us

Re: Apache Storm 1.0.2 integration with ElasticSearch 5.0.0

2016-12-19 Thread Ambud Sharma
That points to a wire-level protocol incompatibility with ES, Zen discovery being disabled, or nodes not being reachable in Elasticsearch. On Mon, Nov 28, 2016 at 7:21 PM, Zhechao Ma wrote: > As far as I know, storm-elasticsearch still doesn't support elasticsearch > 5.0. You can use *Elasticsearch-Hadoop *5.x instead,

Re: Support of topic wildcards in Kafka spout

2016-12-19 Thread Ambud Sharma
No, this is currently not supported. Please open a feature request: https://issues.apache.org/jira/browse/STORM/ so we can vote on it from a community perspective and see if others would be interested in developing this feature. On Tue, Nov 22, 2016 at 5:53 AM, Wijekoon, Manusha < manusha.wijek...

Re: deploy bolts to a specific supervisor

2016-12-19 Thread Ambud Sharma
Storm workers are supposed to be identical for the most part. You can tune things a little by setting an odd number of executors relative to the worker count. To accomplish what you are trying to do, you can: 1. make these "cpu intensive" bolts register their task ids in ZooKeeper or other KV s

Re: Is there a Storm hosted/cloud solution somewhere available?

2017-01-24 Thread Ambud Sharma
Hi Andre, try Hortonworks Cloudbreak: http://hortonworks.com/apache/cloudbreak/. It provides a nice web interface to spin up clusters on demand based on sizing and instance flavors. It's not hosted, but it is fairly easy to deploy. On Tue, Jan 24, 2017 at 4:59 PM, Joaquin Menchaca wrote: > clo

Re: problem in running topology in local mode

2017-01-27 Thread Ambud Sharma
Your dependencies need to be overridden. If you are using Maven, look at your dependency hierarchy, check what is pulling in slf4j/log4j, and fix it to match the version of Storm you are using. On Jan 25, 2017 3:35 AM, "sam mohel" wrote: is there any help , please ? On Tue, Jan 24, 2017 at 8:53 AM,

Re: Storm deployment in wide-area network

2017-01-27 Thread Ambud Sharma
NiFi is a better tool for WAN data movement. There is no particular benefit to deploying Storm either facing the internet or moving data over the internet. On Jan 25, 2017 9:43 AM, "Fan Jiang" wrote: Hi all, I have a general question - is there any Storm deployment in wide-area network? If so, what ar

Re: Why aren't storm topology send and receive buffers equal sized?

2017-02-01 Thread Ambud Sharma
ArrayList is a resizable data structure; it will resize when more tuples come in. More importantly, this is simply a Disruptor optimization to avoid calling next sequence for each tuple; rather, it calls it once for 8 tuple slots at a time. On Jan 27, 2017 1:53 AM, "Navin Ipe" wrote:

Storm Flux Viewer

2017-02-02 Thread Ambud Sharma
I put together a simple web page to visualize Flux YAML files, to help with troubleshooting and development of Flux-based topologies. https://github.com/ambud/flux-viewer

Re: Storm Flux Viewer

2017-02-04 Thread Ambud Sharma
project. > > - Jungtaek Lim (HeartSaVioR) > > On Fri, Feb 3, 2017 at 12:03 PM, Xin Wang wrote: > >> Hi Ambud, >> >> Thanks for your nice work. I tested it. Looks good. This can be a useful >> tool for flux users. >> >> - Xin >> >> 2017-02-03 5:08 GM

Re: Storm Kafka offset monitoring

2017-02-24 Thread Ambud Sharma
https://github.com/srotya/kafka-monitoring-tool You can use this or the original project as a standalone monitoring tool for offsets. On Thu, Feb 23, 2017 at 10:52 AM, pradeep s wrote: > Hi Priyank, > The confusion is on whats the proper implementation of spout . > > *Method 1* > *=* >

Re: Architecture Bolt Design Questions

2017-03-26 Thread Ambud Sharma
Please have a look at Storm DRPC for this to see if that would help make things easier for you. On Mar 16, 2017 6:45 AM, "Steve Robert" wrote: Hi guys , i have an simple question with an simple scenario is it a good practice to use the bolts for this connects to an external data source ? He

Re: Stream Sequencing and Synchronization

2017-05-06 Thread Ambud Sharma
1. If messages from two spouts can trigger updates to the same row, you will ideally need to process them using a single thread. If there is a possibility that updates can be triggered at the same time, you will need some sort of master epoch and will have to compare timestamps to understand the seque
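A minimal sketch of the epoch comparison, assuming all updates for a row already reach one task (for example through a `fieldsGrouping` on the row key) and that each update carries an epoch or timestamp. `EpochGuardedTable` is hypothetical, and treating an equal epoch as stale is an arbitrary tie-break; a real system would need an explicit rule.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a single task owns each row and only applies updates
// that are newer than what the row already holds, so out-of-order arrivals
// from two spouts cannot overwrite fresher data.
class EpochGuardedTable {
    private static final class Versioned {
        final long epoch;
        final String value;

        Versioned(long epoch, String value) {
            this.epoch = epoch;
            this.value = value;
        }
    }

    private final Map<String, Versioned> rows = new HashMap<>();

    // Returns true if applied; false if the update is older than (or ties with) the current one.
    boolean apply(String rowKey, long epoch, String value) {
        Versioned current = rows.get(rowKey);
        if (current != null && current.epoch >= epoch) {
            return false; // stale or same-epoch update: ignore it
        }
        rows.put(rowKey, new Versioned(epoch, value));
        return true;
    }

    String get(String rowKey) {
        Versioned v = rows.get(rowKey);
        return v == null ? null : v.value;
    }
}
```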

Re: Apache storm HW recommendation

2017-05-06 Thread Ambud Sharma
Kafka is NOT CPU intensive, whereas Storm, depending on what you are trying to accomplish, is. Memory, however, is going to be a point of contention; with physical machines this will be even more evident, as the OS page cache will be affected. Virtualizing will not provide you any additional bandwidth,

Re: Storm topology freezes and does not process tuples from Kafka

2017-07-16 Thread Ambud Sharma
Please check if you have orphan workers. Orphan workers happen when a topology is redeployed within a short period of time and the old workers haven't yet been cleaned up. Check for them by running ps aux | grep java (or grep for a specific jar keyword if you have one). On Jul 14, 2017 11:17 PM, "Sreeram" wrote: > Thank

Re: Decreasing value of Complete Latency in Storm UI

2017-07-16 Thread Ambud Sharma
If I may add, it is also explained by the potential surge of tuples when the topology starts, which will eventually reach an equilibrium with the normal latency of your topology components. On Jul 14, 2017 4:29 AM, "preethini v" wrote: > Hi, > > I am running WordCountTopology with 3 worker nodes. Th

Re: Automatic scaling and downsizing of storm cluster

2017-07-16 Thread Ambud Sharma
Note: be careful if your topology has state. Storm topologies may be stateful, in which case autoscaling (up or down) could lead to data inconsistencies. You can use Hortonworks Cloudbreak to perform autoscaling based on configurable thresholds; you can also try to put something toget

Re: Is Java8 supported by Storm?

2017-07-16 Thread Ambud Sharma
Java 8 and Storm 1.x work just fine. I have used Java 8 with 0.10 in production as well. On Mon, Jul 10, 2017 at 7:06 AM, Bobby Evans wrote: > I filed https://issues.apache.org/jira/browse/STORM-2620 to update the > docs and put up some pull requests to clarify things for the 1.x and 2.x > branch

Re: Apache Storm - help in investigating cause of failures in ~20% of total events

2017-10-24 Thread Ambud Sharma
Without anchoring, at-least-once semantics are not honored, i.e. if an event is lost, the Kafka spout doesn't replay it. On Oct 1, 2017 6:12 AM, "Yovav Waichman" wrote: > Hi, > > > > We are using Apache Storm for a couple of years, and everything was fine > till now. > > For our spout we are using “storm-

unsubscribe

2019-03-31 Thread Ambud Sharma
unsubscribe