If maintaining the order of the messages is a requirement, fields grouping
seems to be the only strategy that ensures that all tuples of the same
partition will be sent to the same task ID.
https://storm.apache.org/releases/current/Concepts.html
Stream groupings
> Part of defining a topology is specifying for each bolt which streams it
> should receive as input. A stream grouping defines how that stream should
> be partitioned among the bolt's tasks.
Hi,
Is there a standard way of avoiding the bottleneck that arises due to fields
grouping from one bolt to another? I have a use case where half a million
tuples have the same field and go to the same bolt task because of fields
grouping. I cannot use shuffle grouping here because it is important that
tuples with the same field are processed by the same task.
> If hashCode(t1["key"]) == hashCode(t2["key"]),
> then t1 and t2 will go to the same task in "gamma".
>
> On Fri, 23 Nov 2018 at 09:06, wrote:
Hi.
I have a question, regarding fields grouping. In documentation we have:
```
Fields grouping: The stream is partitioned by the fields specified in the
grouping. For example, if the stream is grouped by the "user-id" field,
tuples with the same "user-id" will always go to the same task, but tuples
with different "user-id"'s may go to different tasks.
```
" Using direct grouping will let the bolt upstream of the ES writing bolt
decide which ES bolt receives a given message. So you could have spouts ->
sorter bolts -> ES bolts, where sorter bolts use direct grouping to
partition the stream by index id in whatever way you need. "
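The suggested layout can be sketched as follows. All component classes and names here (KafkaMsgSpout, SorterBolt, EsWriterBolt) are hypothetical, not from the thread; only the spout -> sorter -> ES-writer shape and the direct grouping come from the suggestion above.

```java
import org.apache.storm.topology.TopologyBuilder;

public class EsTopologySketch {
    public static void main(String[] args) {
        TopologyBuilder builder = new TopologyBuilder();
        // Hypothetical components: a Kafka-reading spout, a sorter bolt,
        // and an Elasticsearch-writing bolt.
        builder.setSpout("kafka-spout", new KafkaMsgSpout(), 4);
        builder.setBolt("sorter", new SorterBolt(), 4)
               .shuffleGrouping("kafka-spout");
        // Direct grouping: each sorter task decides, per tuple, which
        // es-writer task receives it.
        builder.setBolt("es-writer", new EsWriterBolt(), 16)
               .directGrouping("sorter");
    }
}
```

The key point of the design is that the partitioning policy lives entirely in the sorter bolt, so it can be anything you need rather than Storm's default hash.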
What is
You can implement your own grouping by using direct grouping (from
http://storm.apache.org/releases/2.0.0-SNAPSHOT/Concepts.html): "*Direct
grouping*: This is a special kind of grouping. A stream grouped this way
means that the *producer* of the tuple decides which task of the consumer
will receive this tuple."
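Inside such a producer bolt, the direct stream and the per-tuple target could look like this. This is a sketch of a fragment only; the stream id, field names, and the "es-writer" component id are assumptions, and `context`/`collector` are assumed saved in `prepare()`.

```java
// Fragment of a hypothetical SorterBolt (extends BaseRichBolt).

@Override
public void declareOutputFields(OutputFieldsDeclarer declarer) {
    // Direct streams must be declared with direct = true.
    declarer.declareStream("to-es", true, new Fields("indexId", "doc"));
}

@Override
public void execute(Tuple input) {
    String indexId = input.getStringByField("indexId");
    // Choose a consumer task for this index id in whatever way you need;
    // a simple hash-mod is shown here.
    List<Integer> esTasks = context.getComponentTasks("es-writer");
    int target = esTasks.get(Math.floorMod(indexId.hashCode(), esTasks.size()));
    collector.emitDirect(target, "to-es", input,
                         new Values(indexId, input.getValueByField("doc")));
    collector.ack(input);
}
```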
Hi,
I need to have a streaming pipeline Kafka -> Storm -> Elasticsearch. The
volume of messages produced to Kafka is on the order of millions. Hence, I
need maximum throughput on the Elasticsearch writes. Each message has an id
which is mapped to an Elasticsearch index. The number of possible
Hello,
I’ve tried to use the information below to generate the hash, mod it, and in
turn calculate the correct consuming destination task index, but without
success. I’ve scoured the Internet for an example of a hand calculation of
this nature and have turned up empty. I must be missing something.
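For reference, the task index can be hand-calculated with the recipe recent Storm versions use. This is a sketch based on my reading of Storm's TupleUtils (hash the selected grouping values with Arrays.deepHashCode, then floorMod by the consumer's task count); verify against the source of your Storm version before relying on it.

```java
import java.util.Arrays;
import java.util.List;

public class FieldsGroupingCalc {
    // Mirrors the fields-grouping routing in recent Storm versions (hedged):
    // deep-hash the grouping values, then floorMod by the task count.
    static int chooseTaskIndex(List<Object> groupingValues, int numTasks) {
        return Math.floorMod(Arrays.deepHashCode(groupingValues.toArray()), numTasks);
    }

    public static void main(String[] args) {
        int numTasks = 8;
        int a = chooseTaskIndex(Arrays.asList((Object) "user-42"), numTasks);
        int b = chooseTaskIndex(Arrays.asList((Object) "user-42"), numTasks);
        // The same key always maps to the same, in-range task index.
        System.out.println(a == b);                  // true
        System.out.println(a >= 0 && a < numTasks);  // true
    }
}
```

Note that the resulting index is an offset into the consumer's sorted task-id list, not a task id itself.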
is the field name based on which the fields grouping is done.
Thanks everyone!
On Thu, Aug 11, 2016 at 3:52 PM, Erik Weathers <eweath...@groupon.com>
wrote:
> I think these are the appropriate code pointers:
>
> Original Clojure-based storm-core:
>
> https://github.com/apache/storm/b
> *From:* Navin Ipe <navin@searchlighthealth.com>
> *To:* user@storm.apache.org
> *Sent:* Thursday, August 11, 2016 4:56 PM
> *Subject:* Re: How long until fields grouping gets overwhelmed with data?
>
> If the hash is dynamically computed and is stateless, then that brings up
> one more question.
>
> Let's say there ar
It does not matter who hashes it; as long as they all use the same hash
function, it will go to the same bolt.
From: Navin Ipe <navin@searchlighthealth.com>
To: user@storm.apache.org
Sent: Thursday, August 11, 2016 4:56 PM
Subject: Re: How long until fields grouping gets overwhelmed with data?
Oh that's good to know. I assume it works like this:
https://en.wikipedia.org/wiki/Hash_function#Hashing_uniformly_distributed_data
On Wed, Aug 10, 2016 at 6:23 PM, Nathan Leung <ncle...@gmail.com> wrote:
> It's based on a modulo of a hash of the field. The fields grouping is
> stateless.
It's based on a modulo of a hash of the field. The fields grouping is
stateless.
On Aug 10, 2016 8:18 AM, "Navin Ipe" <navin@searchlighthealth.com>
wrote:
> Hi,
>
> For spouts to be able to continuously send a fields grouped tuple to the
> same bolt, it would have to store a key value map something like this, right?
Hi,
For spouts to be able to continuously send a fields grouped tuple to the
same bolt, it would have to store a key value map something like this,
right?
field1023 ---> Bolt1
field1343 ---> Bolt3
field1629 ---> Bolt5
field1726 ---> Bolt1
field1481 ---> Bolt3
So if my topology runs for a very long time, won't this map keep growing?
Tuples with key 1 will continue going to task 1 when it is restarted.
It'd help if someone could confirm.
Hi,
I can't find information how fields grouping works in case of worker fail.
With fields grouping tuples are partitioned by some key and are sent to
different tasks.
Tuples with the same key go to the same task.
When a worker dies, the supervisor will restart it. If it continuously
fails
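On the restart question: because fields grouping is a pure function of the key and the consumer's task count (as discussed elsewhere in this thread), a restarted worker recomputes the same route. A minimal illustration, assuming the hash-mod scheme:

```java
import java.util.Arrays;

public class RestartRouting {
    // Stateless routing: the index depends only on the key and task count.
    static int taskFor(String key, int numTasks) {
        return Math.floorMod(Arrays.deepHashCode(new Object[]{key}), numTasks);
    }

    public static void main(String[] args) {
        int before = taskFor("1", 4);
        int after = taskFor("1", 4); // recomputed "after a worker restart"
        System.out.println(before == after); // true: routing survives restarts
        // Caveat: a rebalance that changes the task count changes the modulus,
        // so keys may be remapped to different tasks.
        System.out.println(taskFor("1", 5));
    }
}
```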
topology has enough TASKS
> > (the parallelism cannot be larger than the number of tasks).
> >
> > Thus, you need to "prepare" your topology during setup for dynamic
> > scaling via .setNumTasks();
> >
> > > builder.setSpout(spoutName, new MongoSpout(
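Concretely, the "prepare for scaling" step looks like this. The component name and spout class follow the quoted snippet; the numbers are illustrative.

```java
// Start with 5 executors but reserve 100 tasks, so a later rebalance
// can raise the parallelism up to 100.
builder.setSpout("mongo-spout", new MongoSpout(), 5).setNumTasks(100);
```

Later, `storm rebalance <topology-name> -e mongo-spout=100` raises the executor count at runtime; it can never exceed the task count fixed at submission.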
> The number of tasks is the maximum number of parallel instances you can run,
> and the initial deployment will start 5. Using rebalance you can change
> the parallelism to up to 100 now.
>
> Hope this makes sense.
>
>
> -Matthias
>
>
>
>
> On 04/25/2016 09:11 AM, Navin Ipe wrote:
> > Thank you Matthias for your time
Thank you Matthias for your time and patient explanation. I'm now clear
about the Fields grouping (an answer on Stackoverflow had confused me
<http://stackoverflow.com/questions/33512554/multiple-fields-grouping-in-storm>).
The first question still stands, where I'm unable to understand
code, I considered having this topology. The single
> [Spout] or [Bolt] represent multiple Spouts or Bolts.
>
> *[Spout]--emit--->[Bolt A]--emit--->[Bolt B]*
>
> If any of the bolts in Bolt A emit a Tuple of value 1, and it gets
> processed by a certain bolt in Bolt B, then
Bolt B, then it is imperative that if any of
the bolts in Bolt A again emits the value 1, it should compulsorily be
processed by the same bolt in Bolt B. I assume fields grouping can handle
this.
To have many spouts work in parallel, my initial thoughts were to have:
*Integer numberOfSpout
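The per-key guarantee described above (the same value from any Bolt A task always reaching the same Bolt B task) is exactly what fields grouping declares. A sketch of the wiring; class and field names are assumptions:

```java
TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("spout", new MySpout(), 4);
builder.setBolt("bolt-a", new BoltA(), 4).shuffleGrouping("spout");
// Every tuple Bolt A emits with the same "value" field (e.g. 1) is routed
// to the same Bolt B task, regardless of which Bolt A task emitted it.
builder.setBolt("bolt-b", new BoltB(), 4)
       .fieldsGrouping("bolt-a", new Fields("value"));
```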
Hi Shuo,
Seeing a lot of group errors in log file is expected. From
http://storm.apache.org/documentation/Concepts.html the description of Field
Grouping says
1. Fields grouping: The stream is partitioned by the fields specified in the
grouping. For example, if the stream is grouped by the "user-id" field, tuples
with the same "user-id" will always go to the same task.
f BoltY.
However, the log of topology shows lots of "group error".
So how to group outputs with same fields "A" and "B" to the same task of
BoltY?
The question is also asked in
http://stackoverflow.com/questions/33512554/multiple-fields-grouping-in-storm
--
*Shuo Chen*
chenatu2...@gmail.com
chens...@whaty.com
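For the record, grouping on the combination of two fields is declared by passing both field names to the grouping; a sketch using the component names from the question:

```java
// Tuples with the same (A, B) pair always reach the same BoltY task;
// BoltX must declare both "A" and "B" among its output fields.
builder.setBolt("boltY", new BoltY(), 8)
       .fieldsGrouping("boltX", new Fields("A", "B"));
```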