Re: Avoiding bottleneck in fields grouping

2020-08-17 Thread Rui Abreu
If maintaining the order of the messages is a requirement, fields grouping seems to be the only strategy that ensures that all tuples of the same partition will be sent to the same task ID. https://storm.apache.org/releases/current/Concepts.html Stream groupings > Part of defining a topol

Avoiding bottleneck in fields grouping

2020-08-13 Thread Jayant Sharma
Hi, Is there a standard way of avoiding bottleneck which arises due to fields grouping from one bolt to another. I have a use case where half a million tuples have the same field and go to the same bolt task because of field grouping. I cannot use shuffle grouping here because it is important
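
A rough illustration of the bottleneck described in this question, assuming that fields grouping picks the target task as a hash of the field modulo the number of tasks (as a later reply in this listing describes). The class and method names are hypothetical and nothing here calls Storm itself; the sketch only counts how many tuples each task index would receive.

```java
import java.util.Collections;
import java.util.List;

class HotKeySkewSketch {
    // Counts how many tuples each task index would receive, assuming
    // hash-modulo routing on a single string field (an assumption, not Storm code).
    static long[] countPerTask(List<String> keys, int numTasks) {
        long[] counts = new long[numTasks];
        for (String key : keys) {
            counts[Math.floorMod(key.hashCode(), numTasks)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // Half a million tuples that all carry the same field value.
        List<String> keys = Collections.nCopies(500_000, "hot-key");
        long[] counts = countPerTask(keys, 4);
        for (int i = 0; i < counts.length; i++) {
            System.out.println("task index " + i + ": " + counts[i] + " tuples");
        }
    }
}
```

Under that assumption every tuple with the shared value lands on one task index, however many tasks the consuming bolt has, so adding parallelism alone does not relieve the hot task.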

Re: Fields grouping consistency for separate streams

2018-11-26 Thread bogun . dmitriy
hashCode(t1["key"]) == hashCode(t2["key"]), > then t1 and t2 will go to the same task in "gamma". > > On Fri, 23 Nov 2018 at 09:06, wrote: > >> Hi. >> >> I have a question, regarding fields grouping. In documentation we have:

Re: Fields grouping consistency for separate streams

2018-11-23 Thread Stig Rohde Døssing
Hi. > > I have a question, regarding fields grouping. In documentation we have: > ``` > Fields grouping: The stream is partitioned by the fields specified in the > grouping. For example, if the stream is grouped by the "user-id" field, > tuples with the same "user-id"

Fields grouping consistency for separate streams

2018-11-23 Thread bogun . dmitriy
Hi. I have a question, regarding fields grouping. In documentation we have: ``` Fields grouping: The stream is partitioned by the fields specified in the grouping. For example, if the stream is grouped by the "user-id" field, tuples with the same "user-id" will always

Re: Fields grouping

2017-08-09 Thread Jakes John
" Using direct grouping will let the bolt upstream of the ES writing bolt decide which ES bolt receives a given message. So you could have spouts -> sorter bolts -> ES bolts, where sorter bolts use direct grouping to partition the stream by index id in whatever way you need. " What is

Re: Fields grouping

2017-08-08 Thread Stig Rohde Døssing
You can implement your own grouping by using direct grouping (from http://storm.apache.org/releases/2.0.0-SNAPSHOT/Concepts.html): "*Direct grouping*: This is a special kind of grouping. A stream grouped this way means that the *producer* of the tuple decides which task of the consumer will
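
A minimal sketch of the producer-side decision that direct grouping calls for, written as plain Java so it runs without Storm. The class name, method name, index ids, and task IDs are all made up for illustration. In an actual bolt the consumer's task IDs would presumably come from `TopologyContext.getComponentTasks(...)` and the tuple would be sent with `OutputCollector.emitDirect(...)`; neither is called here.

```java
import java.util.List;

class DirectGroupingSketch {
    // Hypothetical routing rule: map an index id onto one of the
    // consumer's task IDs. Any deterministic rule could be used instead.
    static int pickConsumerTask(String indexId, List<Integer> consumerTaskIds) {
        if (consumerTaskIds.isEmpty()) {
            throw new IllegalArgumentException("no consumer tasks");
        }
        int slot = Math.floorMod(indexId.hashCode(), consumerTaskIds.size());
        return consumerTaskIds.get(slot);
    }

    public static void main(String[] args) {
        List<Integer> esBoltTaskIds = List.of(7, 8, 9); // made-up task IDs
        for (String indexId : List.of("logs-2017-08", "metrics-2017-08", "logs-2017-08")) {
            System.out.println(indexId + " -> task " + pickConsumerTask(indexId, esBoltTaskIds));
        }
    }
}
```

Because the rule lives in the producer, it can be swapped for something other than a plain hash, for example a lookup table that pins busy indices to dedicated tasks.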

Fields grouping

2017-08-07 Thread Jakes John
Hi, I need to have a streaming pipeline Kafka->storm-> ElasticSearch. The volume of message produced to Kafka is in order of millions. Hence, I need to have maximum throughput in Elasticsearch writes. Each message has an id which is mapped to a Elasticsearch index. The number of possible

Fields Grouping Calculation

2016-10-10 Thread Jackson, Aubrey
Hello, I’ve tried to use the information below to generate the hash, mod it, and in turn calculate the correct consuming destination task index, but without success. I’ve scoured the Internet where someone as an example of a hand calculation of this nature and have turned up empty. I must be

Re: Question about Fields grouping

2016-09-18 Thread Navin Ipe
ey 1 will continue going to task 1 > when it is restarted. > It'd help if someone could confirm. > > On Wed, Jun 22, 2016 at 8:39 PM, Evgeniy Khyst <evgeniy.kh...@gmail.com> > wrote: > >> Hi, >> >> I can't find information how fields grouping works in case of wo

Re: How long until fields grouping gets overwhelmed with data?

2016-08-11 Thread Navin Ipe
is the field name based on which the fields grouping is done. Thanks everyone! On Thu, Aug 11, 2016 at 3:52 PM, Erik Weathers <eweath...@groupon.com> wrote: > I think these are the appropriate code pointers: > > Original Clojure-based storm-core: > > https://github.com/apache/storm/b

Re: How long until fields grouping gets overwhelmed with data?

2016-08-11 Thread Erik Weathers
It does not matter who hashes it as long as they all use the same hash >> function it will go to the same bolt >> >> >> -- >> *From:* Navin Ipe <navin@searchlighthealth.com> >> *To:* user@storm.apache.org >> *Sent:* Thursday,

Re: How long until fields grouping gets overwhelmed with data?

2016-08-11 Thread Navin Ipe
ghthealth.com> > *To:* user@storm.apache.org > *Sent:* Thursday, August 11, 2016 4:56 PM > *Subject:* Re: How long until fields grouping gets overwhelmed with data? > > If the hash is dynamically computed and is stateless, then that brings up > one more question. > > Let's say there ar

Re: How long until fields grouping gets overwhelmed with data?

2016-08-11 Thread Gireesh Ramji
It does not matter who hashes it as long as they all use the same hash function it will go to the same bolt From: Navin Ipe <navin@searchlighthealth.com> To: user@storm.apache.org Sent: Thursday, August 11, 2016 4:56 PM Subject: Re: How long until fields grouping gets overw

Re: How long until fields grouping gets overwhelmed with data?

2016-08-11 Thread Navin Ipe
ion#Hashing_ > uniformly_distributed_data > > On Wed, Aug 10, 2016 at 6:23 PM, Nathan Leung <ncle...@gmail.com> wrote: > >> It's based on a modulo of a hash of the field. The fields grouping is >> stateless. >> >> On Aug 10, 2016 8:18 AM, "Navin

Re: How long until fields grouping gets overwhelmed with data?

2016-08-10 Thread Navin Ipe
Oh that's good to know. I assume it works like this: https://en.wikipedia.org/wiki/Hash_function#Hashing_uniformly_distributed_data On Wed, Aug 10, 2016 at 6:23 PM, Nathan Leung <ncle...@gmail.com> wrote: > It's based on a modulo of a hash of the field. The fields grouping is

Re: How long until fields grouping gets overwhelmed with data?

2016-08-10 Thread Nathan Leung
It's based on a modulo of a hash of the field. The fields grouping is stateless. On Aug 10, 2016 8:18 AM, "Navin Ipe" <navin@searchlighthealth.com> wrote: > Hi, > > For spouts to be able to continuously send a fields grouped tuple to the > same bolt, it would h
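
The "modulo of a hash" description can be sketched in a few lines of plain Java. This is an illustration under stated assumptions, not Storm's actual source: the class and method names are hypothetical, and the choice of `Arrays.deepHashCode` as the hash is an assumption. The point it shows is that the target index is recomputed from the field values each time, so no key-to-task map has to be stored.

```java
import java.util.Arrays;
import java.util.List;

class FieldsGroupingSketch {
    // Hypothetical helper: derives a consumer task index from the values
    // of the grouping fields. Nothing is remembered between calls.
    static int chooseTaskIndex(List<?> groupingValues, int numTasks) {
        if (numTasks <= 0) {
            throw new IllegalArgumentException("numTasks must be positive");
        }
        int hash = Arrays.deepHashCode(groupingValues.toArray());
        // floorMod keeps the result in [0, numTasks) even for negative hashes.
        return Math.floorMod(hash, numTasks);
    }

    public static void main(String[] args) {
        int numTasks = 4;
        for (String userId : List.of("user-1", "user-2", "user-1")) {
            System.out.println(userId + " -> task index "
                    + chooseTaskIndex(List.of(userId), numTasks));
        }
    }
}
```

Any producer that applies the same function with the same task count reaches the same index for a given key, which is why the mapping needs no shared state.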

How long until fields grouping gets overwhelmed with data?

2016-08-10 Thread Navin Ipe
Hi, For spouts to be able to continuously send a fields grouped tuple to the same bolt, it would have to store a key value map something like this, right? field1023 ---> Bolt1 field1343 ---> Bolt3 field1629 ---> Bolt5 field1726 ---> Bolt1 field1481 ---> Bolt3 So if my topology runs for a very

Re: Question about Fields grouping

2016-06-23 Thread Navin Ipe
it is restarted. It'd help if someone could confirm. On Wed, Jun 22, 2016 at 8:39 PM, Evgeniy Khyst <evgeniy.kh...@gmail.com> wrote: > Hi, > > I can't find information how fields grouping works in case of worker fail. > > With fields grouping tuples are partitioned by

Question about Fields grouping

2016-06-22 Thread Evgeniy Khyst
Hi, I can't find information how fields grouping works in case of worker fail. With fields grouping tuples are partitioned by some key and are sent to different tasks. Tuples with the same key goes to the same task. When a worker dies, the supervisor will restart it. If it continuously fails

Re: How are multiple spouts and fields grouping planned out?

2016-04-25 Thread Navin Ipe
topology has enough TASKS > > (the parallelism cannot be larger as the number of tasks). > > > > Thus, you need to "prepare" your topology during setup for dynamic > > scaling via .setNumTasks(); > > > > > builder.setSpout(spoutName, new MongoSpout(

Re: How are multiple spouts and fields grouping planned out?

2016-04-25 Thread Matthias J. Sax
er of parallel instances you can run, > and the initial deployment will start 5. Using rebalance you can change > the parallelism to up to 100 now. > > Hope this makes sens. > > > -Matthias > > > > > On 04/25/2016 09:11 AM, N

Re: How are multiple spouts and fields grouping planned out?

2016-04-25 Thread Navin Ipe
instances you can run, > and the initial deployment will start 5. Using rebalance you can change > the parallelism to up to 100 now. > > Hope this makes sens. > > > -Matthias > > > > > On 04/25/2016 09:11 AM, Navin Ipe wrote: > > Thank you Matthias for your time

Re: How are multiple spouts and fields grouping planned out?

2016-04-25 Thread Matthias J. Sax
> Thank you Matthias for your time and patient explanation. I'm now clear > about the Fields grouping (an answer on Stackoverflow had confused me > <http://stackoverflow.com/questions/33512554/multiple-fields-grouping-in-storm>). > The first question still stands, where I'm unable to

Re: How are multiple spouts and fields grouping planned out?

2016-04-25 Thread Navin Ipe
Thank you Matthias for your time and patient explanation. I'm now clear about the Fields grouping (an answer on Stackoverflow had confused me <http://stackoverflow.com/questions/33512554/multiple-fields-grouping-in-storm> ). The first question still stands, where I'm unable to understan

Re: How are multiple spouts and fields grouping planned out?

2016-04-24 Thread Matthias J. Sax
code, I considered having this topology. The single > [Spout] or [Bolt] represent multiple Spouts or Bolts. > > *[Spout]--emit--->[Bolt A]--emit--->[Bolt B]* > > If any of the bolts in Bolt A emit a Tuple of value 1, and it gets > processed by a certain bolt in Bolt B, then

How are multiple spouts and fields grouping planned out?

2016-04-24 Thread Navin Ipe
Bolt B, then it is imperative that if any of the bolts in Bolt A again emits the value 1, it should compulsorily be processed by the same bolt in Bolt B. I assume fields grouping can handle this. To have many spouts work in parallel, my initial thoughts were to have: *Integer numberOfSpout

Re: multiple fields grouping in storm

2015-11-03 Thread Priyank Shah
Hi Shuo, Seeing a lot of group errors in log file is expected. From http://storm.apache.org/documentation/Concepts.html the description of Field Grouping says 1. Fields grouping: The stream is partitioned by the fields specified in the grouping. For example, if the stream is grouped

multiple fields grouping in storm

2015-11-03 Thread Shuo Chen
f BoltY. However, the log of topology shows lots of "group error". So how to group outputs with same fields "A" and "B" to the same task of BoltY? The question is also asked in http://stackoverflow.com/questions/33512554/multiple-fields-grouping-in-storm -- *Shuo Chen* chenatu2...@gmail.com chens...@whaty.com