[jira] [Updated] (NIFI-6970) Add DistributeRecord processor for distribute data by key hash

2020-05-28 Thread Ilya Kovalev (Jira)


 [ 
https://issues.apache.org/jira/browse/NIFI-6970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ilya Kovalev updated NIFI-6970:
---
Attachment: cluster_distribution.png

> Add DistributeRecord processor for distribute data by key hash
> --
>
> Key: NIFI-6970
> URL: https://issues.apache.org/jira/browse/NIFI-6970
> Project: Apache NiFi
> Issue Type: New Feature
> Components: Extensions
> Affects Versions: 1.10.0
> Reporter: Ilya Kovalev
> Priority: Minor
> Attachments: cluster_distribution.png
>
> Time Spent: 40m
> Remaining Estimate: 0h
>
> A processor is needed that distributes data across user-specified 
> relationships by one or more distribution keys. Data is distributed across 
> relationships in proportion to each relationship's weight. For example, if 
> there are two relationships and the first has a weight of 9 while the second 
> has a weight of 10, the first will be sent 9/19 of the rows and the second 
> will be sent 10/19.
> A row is sent to the relationship whose half-open interval of remainders, 
> from 'prev_weights' to 'prev_weights + weight', contains the row's remainder 
> (the key hash modulo the total weight), where 'prev_weights' is the total 
> weight of the relationships with lower numbers and 'weight' is the weight of 
> this relationship. For example, if there are two relationships, and the first 
> has a weight of 9 while the second has a weight of 10, the row is sent to the 
> first relationship for remainders in the range [0, 9), and to the second for 
> remainders in the range [9, 19).
>  
> This will help when loading data into distributed databases such as ClickHouse 
> [https://clickhouse.tech/docs/en/]
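
The interval rule quoted above can be sketched in a few lines of Python. This is only an illustration of the description, not the code from the attached patch; the names build_bounds and pick_relationship are hypothetical, and the remainder is assumed to have been computed already as a non-negative integer below the total weight.

```python
from bisect import bisect_right
from itertools import accumulate


def build_bounds(weights):
    """Return cumulative upper bounds, e.g. [9, 10] -> [9, 19]."""
    if not weights or any(w <= 0 for w in weights):
        raise ValueError("weights must be a non-empty list of positive integers")
    return list(accumulate(weights))


def pick_relationship(remainder, bounds):
    """Return index i with prev_weights <= remainder < prev_weights + weight."""
    if not 0 <= remainder < bounds[-1]:
        raise ValueError("remainder must lie in [0, sum(weights))")
    # bisect_right skips bounds equal to the remainder, which makes each
    # interval closed on the left and open on the right.
    return bisect_right(bounds, remainder)


bounds = build_bounds([9, 10])
counts = [0, 0]
for r in range(bounds[-1]):
    counts[pick_relationship(r, bounds)] += 1
print(counts)  # [9, 10]
```

Counting every possible remainder once reproduces the 9/19 and 10/19 split from the example, provided the remainders themselves are evenly spread.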



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (NIFI-6970) Add DistributeRecord processor for distribute data by key hash

2020-05-28 Thread Ilya Kovalev (Jira)


 [ 
https://issues.apache.org/jira/browse/NIFI-6970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ilya Kovalev updated NIFI-6970:
---
Description: 
A processor is needed that distributes data across user-specified 
relationships by one or more distribution keys. Data is distributed across 
relationships in proportion to each relationship's weight. For example, if 
there are two relationships and the first has a weight of 9 while the second 
has a weight of 10, the first will be sent 9/19 of the rows and the second 
will be sent 10/19.

A row is sent to the relationship whose half-open interval of remainders, 
from 'prev_weights' to 'prev_weights + weight', contains the row's remainder 
(the key hash modulo the total weight), where 'prev_weights' is the total 
weight of the relationships with lower numbers and 'weight' is the weight of 
this relationship. For example, if there are two relationships, and the first 
has a weight of 9 while the second has a weight of 10, the row is sent to the 
first relationship for remainders in the range [0, 9), and to the second for 
remainders in the range [9, 19).

This will help when loading data into distributed databases such as ClickHouse 
[https://clickhouse.tech/docs/en/]

  was:
A processor is needed that distributes data across user-specified 
relationships by one or more distribution keys. Data is distributed across 
relationships in proportion to each relationship's weight. For example, if 
there are two relationships and the first has a weight of 9 while the second 
has a weight of 10, the first will be sent 9/19 of the rows and the second 
will be sent 10/19.

A row is sent to the relationship whose half-open interval of remainders, 
from 'prev_weights' to 'prev_weights + weight', contains the row's remainder 
(the key hash modulo the total weight), where 'prev_weights' is the total 
weight of the relationships with lower numbers and 'weight' is the weight of 
this relationship. For example, if there are two relationships, and the first 
has a weight of 9 while the second has a weight of 10, the row is sent to the 
first relationship for remainders in the range [0, 9), and to the second for 
remainders in the range [9, 19).







[jira] [Updated] (NIFI-6970) Add DistributeRecord processor for distribute data by key hash

2020-01-14 Thread Ilya Kovalev (Jira)


 [ 
https://issues.apache.org/jira/browse/NIFI-6970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ilya Kovalev updated NIFI-6970:
---
External issue URL: https://github.com/apache/nifi/pull/3984






[jira] [Updated] (NIFI-6970) Add DistributeRecord processor for distribute data by key hash

2020-01-14 Thread Ilya Kovalev (Jira)


 [ 
https://issues.apache.org/jira/browse/NIFI-6970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ilya Kovalev updated NIFI-6970:
---
Description: 
A processor is needed that distributes data across user-specified 
relationships by one or more distribution keys. Data is distributed across 
relationships in proportion to each relationship's weight. For example, if 
there are two relationships and the first has a weight of 9 while the second 
has a weight of 10, the first will be sent 9/19 of the rows and the second 
will be sent 10/19.

A row is sent to the relationship whose half-open interval of remainders, 
from 'prev_weights' to 'prev_weights + weight', contains the row's remainder 
(the key hash modulo the total weight), where 'prev_weights' is the total 
weight of the relationships with lower numbers and 'weight' is the weight of 
this relationship. For example, if there are two relationships, and the first 
has a weight of 9 while the second has a weight of 10, the row is sent to the 
first relationship for remainders in the range [0, 9), and to the second for 
remainders in the range [9, 19).

  was:
A processor is needed for Record distribution.
The processor must have the following properties:
reader, writer, keys (a list of Record fields used for hashing), and hash 
function name,
plus an arbitrary number of dynamic properties representing relationships. 
Each relationship also has a weight.

If there is a single key and that key is an integer, the hash function is not 
evaluated. If there are several keys, the record values are cast to String and 
joined with a "-" delimiter,
for example "34-NY-open".
The hash function must return an int value. The processor then computes 
target relationship = (hashResult % sum(weights)) and routes the record 
according to the integer ranges of the relationships.
For example, with 2 relationships with weights 4 and 7 respectively, there are 
2 intervals, [0, 4) and [4, 11), for the first and second relationship (the 
order of relationships matters).
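
The key handling in this earlier description can be sketched as follows. It is a rough illustration under stated assumptions, not the processor's actual code: the function names are hypothetical, CRC32 from Python's zlib module stands in for whatever hash function the user would name, a single integer key is assumed to be used directly as the hash result, and a single non-integer key is assumed to be hashed like a joined key, since the ticket does not cover that case.

```python
import zlib


def distribution_key(values):
    """Turn the key field values of one record into an integer."""
    single_int = (
        len(values) == 1
        and isinstance(values[0], int)
        and not isinstance(values[0], bool)
    )
    if single_int:
        # One integer key: the hash function is not evaluated.
        return values[0]
    # Several keys: cast to strings and join with "-", e.g. "34-NY-open".
    joined = "-".join(str(v) for v in values)
    # CRC32 is an assumed stand-in for the configured hash function.
    return zlib.crc32(joined.encode("utf-8"))


def target_relationship(values, weights):
    """Return the index of the relationship whose interval holds the remainder."""
    remainder = distribution_key(values) % sum(weights)
    upper = 0
    for index, weight in enumerate(weights):
        upper += weight
        if remainder < upper:
            return index
    raise AssertionError("unreachable when all weights are positive")


print(target_relationship([3], [4, 7]))  # 0, because 3 lies in [0, 4)
print(target_relationship([5], [4, 7]))  # 1, because 5 lies in [4, 11)
```

The order of the weights list determines which interval belongs to which relationship, which is why the description notes that the order of relationships matters.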

 







[jira] [Updated] (NIFI-6970) Add DistributeRecord processor for distribute data by key hash

2020-01-14 Thread Ilya Kovalev (Jira)


 [ 
https://issues.apache.org/jira/browse/NIFI-6970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ilya Kovalev updated NIFI-6970:
---
Status: Patch Available  (was: Open)






[jira] [Updated] (NIFI-6970) Add DistributeRecord processor for distribute data by key hash

2020-01-06 Thread Joe Witt (Jira)


 [ 
https://issues.apache.org/jira/browse/NIFI-6970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe Witt updated NIFI-6970:
---
Fix Version/s: (was: 1.11.0)






[jira] [Updated] (NIFI-6970) Add DistributeRecord processor for distribute data by key hash

2019-12-24 Thread Ilya Kovalev (Jira)


 [ 
https://issues.apache.org/jira/browse/NIFI-6970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ilya Kovalev updated NIFI-6970:
---
Description: 
A processor is needed for Record distribution.
The processor must have the following properties:
reader, writer, keys (a list of Record fields used for hashing), and hash 
function name,
plus an arbitrary number of dynamic properties representing relationships. 
Each relationship also has a weight.

If there is a single key and that key is an integer, the hash function is not 
evaluated. If there are several keys, the record values are cast to String and 
joined with a "-" delimiter,
for example "34-NY-open".
The hash function must return an int value. The processor then computes 
target relationship = (hashResult % sum(weights)) and routes the record 
according to the integer ranges of the relationships.
For example, with 2 relationships with weights 4 and 7 respectively, there are 
2 intervals, [0, 4) and [4, 11), for the first and second relationship (the 
order of relationships matters).

 

  was:
A processor is needed for record distribution.
The processor must have the following properties:
reader, writer, keys (a list of record fields used for hashing), and hash function name,
plus an arbitrary number of dynamic properties representing relationships.
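
One detail worth checking in the updated description above is the step hashResult % sum(weights). NiFi processors are written in Java, where % on a negative int yields a negative remainder that would fall outside every interval. The sketch below mimics both behaviours in Python; it assumes the hash is a signed integer, and the function names are hypothetical.

```python
def truncated_rem(a, b):
    """Remainder whose sign follows the dividend, as Java's % does for int."""
    r = abs(a) % abs(b)
    return -r if a < 0 else r


def floor_mod(a, b):
    """Remainder in [0, b) for positive b, like Java's Math.floorMod."""
    return a % b  # Python's % already floors for a positive divisor


total = 4 + 7      # weights from the example in the description
hash_result = -7   # an assumed negative hash value

print(truncated_rem(hash_result, total))  # -7, outside [0, 4) and [4, 11)
print(floor_mod(hash_result, total))      # 4, inside [4, 11)
```

Whether the patch needs this depends on whether the chosen hash function can return negative values; the ticket does not say.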


> Add DistributeRecord processor for distribute data by key hash
> --
>
> Key: NIFI-6970
> URL: https://issues.apache.org/jira/browse/NIFI-6970
> Project: Apache NiFi
> Issue Type: New Feature
> Components: Extensions
> Affects Versions: 1.10.0
> Reporter: Ilya Kovalev
> Priority: Minor
> Fix For: 1.11.0


