[jira] [Commented] (APEXCORE-332) Support Even Distribution Of Tuples To A Non Power Of 2 Number Of Partitions
[ https://issues.apache.org/jira/browse/APEXCORE-332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15303566#comment-15303566 ] ASF GitHub Bot commented on APEXCORE-332: - Github user ilganeli commented on the pull request: https://github.com/apache/incubator-apex-core/pull/347#issuecomment-222066531 Makes sense / that's why I didn't want to dive too deep. Just enough to get a feel for what it would look like. > Support Even Distribution Of Tuples To A Non Power Of 2 Number Of Partitions > > > Key: APEXCORE-332 > URL: https://issues.apache.org/jira/browse/APEXCORE-332 > Project: Apache Apex Core > Issue Type: Improvement >Reporter: Timothy Farkas >Assignee: Ilya Ganelin >Priority: Minor > > Currently partitions masks must be defined as a binary mask. As a result the > number of partitions must be a power of 2, otherwise the distribution of > tuples will be uneven. If we support the modulus operation instead of a > binary mask, we could support an even distribution of tuples to a non-power > of 2 number of partitions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (APEXCORE-332) Support Even Distribution Of Tuples To A Non Power Of 2 Number Of Partitions
[ https://issues.apache.org/jira/browse/APEXCORE-332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15303463#comment-15303463 ] ASF GitHub Bot commented on APEXCORE-332: - Github user vrozov commented on a diff in the pull request: https://github.com/apache/incubator-apex-core/pull/347#discussion_r64856234 --- Diff: .idea/codeStyleSettings.xml --- @@ -104,5 +104,6 @@ + --- End diff -- This file is created by IntelliJ and is supposed not to have final line terminator. > Support Even Distribution Of Tuples To A Non Power Of 2 Number Of Partitions > > > Key: APEXCORE-332 > URL: https://issues.apache.org/jira/browse/APEXCORE-332 > Project: Apache Apex Core > Issue Type: Improvement >Reporter: Timothy Farkas >Assignee: Ilya Ganelin >Priority: Minor > > Currently partitions masks must be defined as a binary mask. As a result the > number of partitions must be a power of 2, otherwise the distribution of > tuples will be uneven. If we support the modulus operation instead of a > binary mask, we could support an even distribution of tuples to a non-power > of 2 number of partitions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] incubator-apex-core pull request: [APEXCORE-332][WIP] Improve part...
Github user amberarrow commented on a diff in the pull request: https://github.com/apache/incubator-apex-core/pull/347#discussion_r64847617 --- Diff: .idea/codeStyleSettings.xml --- @@ -104,5 +104,6 @@ + --- End diff -- Missing final line terminator --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-apex-core pull request: [APEXCORE-332][WIP] Improve part...
Github user vrozov commented on a diff in the pull request: https://github.com/apache/incubator-apex-core/pull/347#discussion_r64846471 --- Diff: .idea/codeStyleSettings.xml --- @@ -104,5 +104,6 @@ + --- End diff -- Please keep original IntelliJ code style settings. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Updated] (APEXMALHAR-2086) Kafka Output Operator with Kafka 0.9 API
[ https://issues.apache.org/jira/browse/APEXMALHAR-2086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandesh updated APEXMALHAR-2086: Description: Goal : 2 Operartors for Kafka Output 1. Simple Kafka Output Operator - Supports Atleast Once - Expose most used producer properties as class properties 2. Exactly Once Kafka Output ( Not possible in all the cases, will be documented later ) Design for Exactly Once Window Data Manager - Stores the Kafka partitions offsets. Kafka Key - Used by the operator = AppID#OperatorId During recovery. Partially written window is re-created using the following approach: Tuples between the largest recovery offsets and the current offset are checked. Based on the key, tuples written by the other entities are discarded. Only tuples which are not in the recovered set are emitted. was: Goal : 2 Operartors for Kafka Output 1. Simple Kafka Output Operator - Supports Atleast Once - Expose most used producer properties as class properties 2. Exactly Once Kafka Output ( Not possible in all the cases, will be documented later ) > Kafka Output Operator with Kafka 0.9 API > > > Key: APEXMALHAR-2086 > URL: https://issues.apache.org/jira/browse/APEXMALHAR-2086 > Project: Apache Apex Malhar > Issue Type: New Feature >Reporter: Sandesh >Assignee: Sandesh > > Goal : 2 Operartors for Kafka Output > 1. Simple Kafka Output Operator > - Supports Atleast Once > - Expose most used producer properties as class properties > 2. Exactly Once Kafka Output ( Not possible in all the cases, will be > documented later ) > > Design for Exactly Once > Window Data Manager - Stores the Kafka partitions offsets. > Kafka Key - Used by the operator = AppID#OperatorId > During recovery. Partially written window is re-created using the following > approach: > Tuples between the largest recovery offsets and the current offset are > checked. Based on the key, tuples written by the other entities are > discarded. > Only tuples which are not in the recovered set are emitted. > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Kafka Exactly once output operator
Current design: Window Data Manager - Stores the Kafka partitions offsets. Kafka Key - Used by the operator = AppID#OperatorId During recovery. Partially written window is re-created using the following approach: Tuples between the largest recovery offsets and the current offset are checked. Based on the key, tuples written by the other entities are discarded. Only tuples which are not in the recovered set are emitted. Here is the first cut of the design https://github.com/apache/incubator-apex-malhar/pull/298 Please give your feedbacks on the design. @Bright, Recovery data needs to be present in the Key, to distinguish the tuple coming from the different instances of the output operator or external applications. Thanks On Fri, May 13, 2016 at 2:14 PM Bright Chenwrote: > Hi Sandesh, > I think it’s maybe better to keep it into Jira. > > Do you mean keep the key in other Kafka topic or the key is in fact the > key of Kafka Message which represent user tuple? > If it is separate key, how to keep the relation between key and value? > If Key is the key of Kafka message, basically, it will change the expected > data. As I understand, the key here is just used for recovery, it’s not the > data user required. And the data which write to the Kafka probably need to > be decided by the customer logic. > > Think about a customer build two applications with our operator, the first > application write data to Kafka, the second one read data from Kafka. And > at the very beginning, the first application was implemented by a > none-exactly once output operator, and then changed to exactly once > operator. I think the customer don’t expect to change the second > application. But the second application has to be changed if it’s logic > depended on key. > > thanks > Bright > > > On May 13, 2016, at 12:37 PM, Sandesh Hegde > wrote: > > > > Hi All, > > > > I am working on Kafka 0.9 output operator and one of the requirement is > to > > implement Exactly Once Output operator. Here is the one possible idea, > > please give your feedback or suggest new design. > > > > > - > > > > Use *Key* to store meta information which is used during recovery. > > > > Operator users will use *Value* to store their key-value pair and > implement > > the Kafka partitioning accordingly. > > > > Format of the *Key* is as specified below: > > > > > > > > Key = 1. OperatorName#ApexPartitionId#WindowId#Message#MessageId ( During > > message write ) > > > > 2. OperatorName#ApexPartitionId#WindowId#CheckPoint ( During end > > Window ) > > > > During End window, checkpoint marker is written to all the Kafka > partitions > > of the topic. > > > > Every message is given a message id, counter-reset every window, and then > > written to Kafka. > > > > During recovery, Kafka partitions are read until the last checkpoint > > message from this operator is reached and the partially written window is > > constructed. > > > > > > > > > Note: Existing Kafka exactly once operator, ( Kafka 0.8 ) also needs to > be > > re-written. > > > > Thanks > >
[GitHub] incubator-apex-malhar pull request: *For review only* [APEXMALHAR-...
GitHub user sandeshh opened a pull request: https://github.com/apache/incubator-apex-malhar/pull/298 *For review only* [APEXMALHAR-2086] Kafka output: 0.9.1 first cut Kafka output exactly once operator and the regular output operator. You can merge this pull request into a Git repository by running: $ git pull https://github.com/sandeshh/incubator-apex-malhar APEXMALHAR-2086 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-apex-malhar/pull/298.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #298 commit ea26306b1bc4a3d13ef3e28a222032ad7d1f6955 Author: sandeshhDate: 2016-05-25T15:56:56Z Kafka output: 0.9.1 first cut --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (APEXCORE-208) StreamingContainer.java Contains Violations of Style Guide
[ https://issues.apache.org/jira/browse/APEXCORE-208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15303221#comment-15303221 ] ASF GitHub Bot commented on APEXCORE-208: - Github user ilganeli closed the pull request at: https://github.com/apache/incubator-apex-core/pull/346 > StreamingContainer.java Contains Violations of Style Guide > -- > > Key: APEXCORE-208 > URL: https://issues.apache.org/jira/browse/APEXCORE-208 > Project: Apache Apex Core > Issue Type: Sub-task >Reporter: Ilya Ganelin > > StreamingContainer.java needs to be updated to conform to CodeStyle checks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] incubator-apex-core pull request: [APEXCORE-208] Fixed checkstyle ...
Github user davidyan74 commented on the pull request: https://github.com/apache/incubator-apex-core/pull/346#issuecomment-222028058 @ilganeli I see that you're changing the code by wrapping the long lines. We some time ago decided on the Apex dev mailing list that we are not enforcing the character limit per line. See [here](http://mail-archives.apache.org/mod_mbox/apex-dev/201512.mbox/%3ccakjfldmw68-afrmtxj_7l8y36scojrzyhbkw3axcdmc8po0...@mail.gmail.com%3E). If you're using IntelliJ IDEA, you can turn on the "Soft Wrap" feature so that you don't have to fiddle with the horizontal scrollbar. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-apex-malhar pull request: APEXMALHAR-2096: Add property ...
Github user gauravgopi123 commented on a diff in the pull request: https://github.com/apache/incubator-apex-malhar/pull/287#discussion_r64840077 --- Diff: library/src/main/java/com/datatorrent/lib/io/fs/FSInputModule.java --- @@ -59,6 +58,9 @@ private long blockSize; private boolean sequencialFileRead = false; private int readersCount; + @NotNull + @Min(1) + protected Integer blocksThreshold; --- End diff -- why Integer type and not primitive int type? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (APEXMALHAR-2096) Add blockThreshold parameter to FSInputModule
[ https://issues.apache.org/jira/browse/APEXMALHAR-2096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15303198#comment-15303198 ] ASF GitHub Bot commented on APEXMALHAR-2096: Github user gauravgopi123 commented on a diff in the pull request: https://github.com/apache/incubator-apex-malhar/pull/287#discussion_r64840077 --- Diff: library/src/main/java/com/datatorrent/lib/io/fs/FSInputModule.java --- @@ -59,6 +58,9 @@ private long blockSize; private boolean sequencialFileRead = false; private int readersCount; + @NotNull + @Min(1) + protected Integer blocksThreshold; --- End diff -- why Integer type and not primitive int type? > Add blockThreshold parameter to FSInputModule > - > > Key: APEXMALHAR-2096 > URL: https://issues.apache.org/jira/browse/APEXMALHAR-2096 > Project: Apache Apex Malhar > Issue Type: Improvement >Reporter: Priyanka Gugale >Assignee: Priyanka Gugale > > FileSplitter is very fast, the downstream operators can't work at that speed, > so we need to limit the speed with which fileSplitter works. > One way to do is limit number of blocks emitted per window. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (APEXMALHAR-2096) Add blockThreshold parameter to FSInputModule
[ https://issues.apache.org/jira/browse/APEXMALHAR-2096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15303179#comment-15303179 ] ASF GitHub Bot commented on APEXMALHAR-2096: Github user chandnisingh commented on a diff in the pull request: https://github.com/apache/incubator-apex-malhar/pull/287#discussion_r64839243 --- Diff: library/src/main/java/com/datatorrent/lib/io/fs/AbstractFileSplitter.java --- @@ -53,6 +53,7 @@ * This is a threshold on the no. of blocks emitted per window. A lot of blocks emitted * per window can overwhelm the downstream operators. This setting helps to control that. */ + @NotNull --- End diff -- Please don't change the type. Having the constraint min(1) is sufficient > Add blockThreshold parameter to FSInputModule > - > > Key: APEXMALHAR-2096 > URL: https://issues.apache.org/jira/browse/APEXMALHAR-2096 > Project: Apache Apex Malhar > Issue Type: Improvement >Reporter: Priyanka Gugale >Assignee: Priyanka Gugale > > FileSplitter is very fast, the downstream operators can't work at that speed, > so we need to limit the speed with which fileSplitter works. > One way to do is limit number of blocks emitted per window. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (APEXCORE-208) StreamingContainer.java Contains Violations of Style Guide
[ https://issues.apache.org/jira/browse/APEXCORE-208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15303135#comment-15303135 ] ASF GitHub Bot commented on APEXCORE-208: - GitHub user ilganeli opened a pull request: https://github.com/apache/incubator-apex-core/pull/346 [APEXCORE-208] Fixed checkstyle violations in StreamingContainer.java - Fixed checkstyle violations in StreamingContainer.java. - Updated apex-style configuration in .idea to be consistent with CheckStyle settings in regards to line wrapping You can merge this pull request into a Git repository by running: $ git pull https://github.com/ilganeli/incubator-apex-core APEXCORE-208 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-apex-core/pull/346.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #346 commit 32570003bcff059ae9a8f70f7129baf9681ac8e6 Author: Ilya GanelinDate: 2016-05-26T22:58:32Z Fixed checkstyle violations in StreamingContainer.java. Updated apex-style configuration in .idea to be consistent with checkstyle settings. > StreamingContainer.java Contains Violations of Style Guide > -- > > Key: APEXCORE-208 > URL: https://issues.apache.org/jira/browse/APEXCORE-208 > Project: Apache Apex Core > Issue Type: Sub-task >Reporter: Ilya Ganelin > > StreamingContainer.java needs to be updated to conform to CodeStyle checks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] incubator-apex-core pull request: [APEXCORE-208] Fixed checkstyle ...
GitHub user ilganeli opened a pull request: https://github.com/apache/incubator-apex-core/pull/346 [APEXCORE-208] Fixed checkstyle violations in StreamingContainer.java - Fixed checkstyle violations in StreamingContainer.java. - Updated apex-style configuration in .idea to be consistent with CheckStyle settings in regards to line wrapping You can merge this pull request into a Git repository by running: $ git pull https://github.com/ilganeli/incubator-apex-core APEXCORE-208 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-apex-core/pull/346.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #346 commit 32570003bcff059ae9a8f70f7129baf9681ac8e6 Author: Ilya GanelinDate: 2016-05-26T22:58:32Z Fixed checkstyle violations in StreamingContainer.java. Updated apex-style configuration in .idea to be consistent with checkstyle settings. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-apex-malhar pull request: APEXMALHAR-2096: Add property ...
Github user chandnisingh commented on a diff in the pull request: https://github.com/apache/incubator-apex-malhar/pull/287#discussion_r64835548 --- Diff: library/src/main/java/com/datatorrent/lib/io/fs/AbstractFileSplitter.java --- @@ -53,6 +53,7 @@ * This is a threshold on the no. of blocks emitted per window. A lot of blocks emitted * per window can overwhelm the downstream operators. This setting helps to control that. */ + @NotNull --- End diff -- @DT-Priyanka It is of type primitive int. So default value will be 0. Since the min is 1, it will be necessary for the user to set it. @NotNull is not relevant here. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (APEXMALHAR-2096) Add blockThreshold parameter to FSInputModule
[ https://issues.apache.org/jira/browse/APEXMALHAR-2096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15303116#comment-15303116 ] ASF GitHub Bot commented on APEXMALHAR-2096: Github user DT-Priyanka commented on a diff in the pull request: https://github.com/apache/incubator-apex-malhar/pull/287#discussion_r64835068 --- Diff: library/src/main/java/com/datatorrent/lib/io/fs/AbstractFileSplitter.java --- @@ -53,6 +53,7 @@ * This is a threshold on the no. of blocks emitted per window. A lot of blocks emitted * per window can overwhelm the downstream operators. This setting helps to control that. */ + @NotNull --- End diff -- As per discussions on pull request #283 we decided to make blocksThreshold as compulsory parameter, so user always sets it. > Add blockThreshold parameter to FSInputModule > - > > Key: APEXMALHAR-2096 > URL: https://issues.apache.org/jira/browse/APEXMALHAR-2096 > Project: Apache Apex Malhar > Issue Type: Improvement >Reporter: Priyanka Gugale >Assignee: Priyanka Gugale > > FileSplitter is very fast, the downstream operators can't work at that speed, > so we need to limit the speed with which fileSplitter works. > One way to do is limit number of blocks emitted per window. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] incubator-apex-malhar pull request: APEXMALHAR-2096: Add property ...
Github user DT-Priyanka commented on a diff in the pull request: https://github.com/apache/incubator-apex-malhar/pull/287#discussion_r64835068 --- Diff: library/src/main/java/com/datatorrent/lib/io/fs/AbstractFileSplitter.java --- @@ -53,6 +53,7 @@ * This is a threshold on the no. of blocks emitted per window. A lot of blocks emitted * per window can overwhelm the downstream operators. This setting helps to control that. */ + @NotNull --- End diff -- As per discussions on pull request #283 we decided to make blocksThreshold as compulsory parameter, so user always sets it. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Updated] (APEXCORE-172) Fix License Header In StatefulStreamCodec
[ https://issues.apache.org/jira/browse/APEXCORE-172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Weise updated APEXCORE-172: -- Assignee: Ilya Ganelin (was: Timothy Farkas) > Fix License Header In StatefulStreamCodec > - > > Key: APEXCORE-172 > URL: https://issues.apache.org/jira/browse/APEXCORE-172 > Project: Apache Apex Core > Issue Type: Bug >Reporter: Timothy Farkas >Assignee: Ilya Ganelin > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (APEXCORE-172) Fix License Header In StatefulStreamCodec
[ https://issues.apache.org/jira/browse/APEXCORE-172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Weise closed APEXCORE-172. - Resolution: Fixed > Fix License Header In StatefulStreamCodec > - > > Key: APEXCORE-172 > URL: https://issues.apache.org/jira/browse/APEXCORE-172 > Project: Apache Apex Core > Issue Type: Bug >Reporter: Timothy Farkas >Assignee: Ilya Ganelin > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] incubator-apex-core pull request: [APEXCORE-172] Fixed license hea...
Github user asfgit closed the pull request at: https://github.com/apache/incubator-apex-core/pull/345 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (APEXCORE-172) Fix License Header In StatefulStreamCodec
[ https://issues.apache.org/jira/browse/APEXCORE-172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15303082#comment-15303082 ] ASF GitHub Bot commented on APEXCORE-172: - GitHub user ilganeli opened a pull request: https://github.com/apache/incubator-apex-core/pull/345 [APEXCORE-172] Fixed license header in StatefulStreamCodec.java * Fixed license header to be correct. You can merge this pull request into a Git repository by running: $ git pull https://github.com/ilganeli/incubator-apex-core APEXCORE-172 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-apex-core/pull/345.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #345 commit b12e8b01c7f471a91d7891a11e04cdae93c03a90 Author: Ilya GanelinDate: 2016-05-26T22:30:15Z Fixed license header in StatefulStreamCodec.java > Fix License Header In StatefulStreamCodec > - > > Key: APEXCORE-172 > URL: https://issues.apache.org/jira/browse/APEXCORE-172 > Project: Apache Apex Core > Issue Type: Bug >Reporter: Timothy Farkas >Assignee: Timothy Farkas > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] incubator-apex-malhar pull request: APEXMALHAR-2096: Add property ...
Github user gauravgopi123 commented on a diff in the pull request: https://github.com/apache/incubator-apex-malhar/pull/287#discussion_r64828487 --- Diff: library/src/main/java/com/datatorrent/lib/io/fs/FSInputModule.java --- @@ -59,6 +58,9 @@ private long blockSize; private boolean sequencialFileRead = false; private int readersCount; + @NotNull --- End diff -- Is this required? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (APEXMALHAR-2066) Add jdbc poller input operator
[ https://issues.apache.org/jira/browse/APEXMALHAR-2066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15303005#comment-15303005 ] ASF GitHub Bot commented on APEXMALHAR-2066: Github user gauravgopi123 commented on a diff in the pull request: https://github.com/apache/incubator-apex-malhar/pull/282#discussion_r64826973 --- Diff: library/src/main/java/com/datatorrent/lib/db/jdbc/AbstractJdbcPollInputOperator.java --- @@ -0,0 +1,652 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package com.datatorrent.lib.db.jdbc; + +import java.io.IOException; +import java.sql.PreparedStatement; +import java.sql.ResultSet; +import java.sql.SQLException; +import java.util.ArrayList; +import java.util.Collection; +import java.util.HashMap; +import java.util.List; +import java.util.Map; +import java.util.concurrent.ArrayBlockingQueue; +import java.util.concurrent.atomic.AtomicReference; + +import javax.validation.constraints.Min; + +import org.slf4j.LoggerFactory; + +import org.apache.apex.malhar.lib.wal.FSWindowDataManager; +import org.apache.apex.malhar.lib.wal.WindowDataManager; +import org.apache.commons.lang3.tuple.MutablePair; + +import com.datatorrent.api.Context; +import com.datatorrent.api.Context.OperatorContext; +import com.datatorrent.api.DefaultPartition; +import com.datatorrent.api.Operator.ActivationListener; +import com.datatorrent.api.Operator.IdleTimeHandler; +import com.datatorrent.api.Partitioner; +import com.datatorrent.api.annotation.Stateless; +import com.datatorrent.lib.db.AbstractStoreInputOperator; +import com.datatorrent.lib.util.KeyValPair; +import com.datatorrent.lib.util.KryoCloneUtils; +import com.datatorrent.netlet.util.DTThrowable; + +/** + * Abstract operator for for consuming data using JDBC interface + * User needs User needs to provide + * tableName,dbConnection,setEmitColumnList,look-up key + * Optionally batchSize,pollInterval,Look-up key and a where clause can be given + * + * This operator uses static partitioning to arrive at range queries for exactly + * once reads + * Assumption is that there is an ordered column using which range queries can + * be formed + * If an emitColumnList is provided, please ensure that the keyColumn is the + * first column in the list + * Range queries are formed using the {@link JdbcMetaDataUtility}} Output - + * comma separated list of the emit columns eg columnA,columnB,columnC + * + * @displayName Jdbc Polling Input Operator + * @category Input + * @tags database, sql, jdbc, partitionable,exactlyOnce + */ +public abstract class AbstractJdbcPollInputOperator extends AbstractStoreInputOperator--- End diff -- How is AbstractJdbcPollInputOperator different from AbstractJdbcInputOperator? > Add jdbc poller input operator > -- > > Key: APEXMALHAR-2066 > URL: https://issues.apache.org/jira/browse/APEXMALHAR-2066 > Project: Apache Apex Malhar > Issue Type: Task >Reporter: Ashwin Chandra Putta >Assignee: devendra tagare > > Create a JDBC poller input operator that has the following features. > 1. poll from external jdbc store asynchronously in the input operator. > 2. polling frequency and batch size should be configurable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (APEXMALHAR-2066) Add jdbc poller input operator
[ https://issues.apache.org/jira/browse/APEXMALHAR-2066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15303002#comment-15303002 ] ASF GitHub Bot commented on APEXMALHAR-2066: Github user devtagare commented on a diff in the pull request: https://github.com/apache/incubator-apex-malhar/pull/282#discussion_r64826542 --- Diff: library/src/main/java/com/datatorrent/lib/db/jdbc/AbstractJdbcPollInputOperator.java --- @@ -0,0 +1,652 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package com.datatorrent.lib.db.jdbc; + +import java.io.IOException; +import java.sql.PreparedStatement; +import java.sql.ResultSet; +import java.sql.SQLException; +import java.util.ArrayList; +import java.util.Collection; +import java.util.HashMap; +import java.util.List; +import java.util.Map; +import java.util.concurrent.ArrayBlockingQueue; +import java.util.concurrent.atomic.AtomicReference; + +import javax.validation.constraints.Min; + +import org.slf4j.LoggerFactory; + +import org.apache.apex.malhar.lib.wal.FSWindowDataManager; +import org.apache.apex.malhar.lib.wal.WindowDataManager; +import org.apache.commons.lang3.tuple.MutablePair; + +import com.datatorrent.api.Context; +import com.datatorrent.api.Context.OperatorContext; +import com.datatorrent.api.DefaultPartition; +import com.datatorrent.api.Operator.ActivationListener; +import com.datatorrent.api.Operator.IdleTimeHandler; +import com.datatorrent.api.Partitioner; +import com.datatorrent.api.annotation.Stateless; +import com.datatorrent.lib.db.AbstractStoreInputOperator; +import com.datatorrent.lib.util.KeyValPair; +import com.datatorrent.lib.util.KryoCloneUtils; +import com.datatorrent.netlet.util.DTThrowable; + +/** + * Abstract operator for for consuming data using JDBC interface + * User needs User needs to provide + * tableName,dbConnection,setEmitColumnList,look-up key + * Optionally batchSize,pollInterval,Look-up key and a where clause can be given + * + * This operator uses static partitioning to arrive at range queries for exactly + * once reads + * Assumption is that there is an ordered column using which range queries can + * be formed + * If an emitColumnList is provided, please ensure that the keyColumn is the + * first column in the list + * Range queries are formed using the {@link JdbcMetaDataUtility}} Output - + * comma separated list of the emit columns eg columnA,columnB,columnC + * + * @displayName Jdbc Polling Input Operator + * @category Input + * @tags database, sql, jdbc, partitionable,exactlyOnce + */ +public abstract class AbstractJdbcPollInputOperator extends AbstractStoreInputOperator+implements ActivationListener, IdleTimeHandler, Partitioner +{ + /* + * poll interval in milliseconds + */ + private int pollInterval; + + @Min(1) + private int partitionCount = 1; + protected transient int operatorId; + protected transient boolean isReplayed; + protected transient boolean isPollable; + protected int batchSize; + protected int fetchSize; + /** + * Map of windowId to of the range key + */ + protected transient MutablePair currentWindowRecoveryState; + + /** + * size of the emit queue used to hold polled records before emit + */ + private int queueCapacity = 4 * 1024 * 1024; + private transient volatile boolean execute; + private transient AtomicReference cause; + protected transient int spinMillis; + private transient OperatorContext context; + protected String tableName; + protected String key; + protected long currentWindowId; + protected KeyValPair rangeQueryPair; + protected String lower; + protected String upper; + protected boolean
[GitHub] incubator-apex-malhar pull request: APEXMALHAR-2066 JdbcPolling,id...
Github user gauravgopi123 commented on a diff in the pull request: https://github.com/apache/incubator-apex-malhar/pull/282#discussion_r64825894 --- Diff: library/src/main/java/com/datatorrent/lib/db/jdbc/AbstractJdbcPollInputOperator.java --- @@ -0,0 +1,652 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package com.datatorrent.lib.db.jdbc; + +import java.io.IOException; +import java.sql.PreparedStatement; +import java.sql.ResultSet; +import java.sql.SQLException; +import java.util.ArrayList; +import java.util.Collection; +import java.util.HashMap; +import java.util.List; +import java.util.Map; +import java.util.concurrent.ArrayBlockingQueue; +import java.util.concurrent.atomic.AtomicReference; + +import javax.validation.constraints.Min; + +import org.slf4j.LoggerFactory; + +import org.apache.apex.malhar.lib.wal.FSWindowDataManager; +import org.apache.apex.malhar.lib.wal.WindowDataManager; +import org.apache.commons.lang3.tuple.MutablePair; + +import com.datatorrent.api.Context; +import com.datatorrent.api.Context.OperatorContext; +import com.datatorrent.api.DefaultPartition; +import com.datatorrent.api.Operator.ActivationListener; +import com.datatorrent.api.Operator.IdleTimeHandler; +import com.datatorrent.api.Partitioner; +import com.datatorrent.api.annotation.Stateless; +import com.datatorrent.lib.db.AbstractStoreInputOperator; +import com.datatorrent.lib.util.KeyValPair; +import com.datatorrent.lib.util.KryoCloneUtils; +import com.datatorrent.netlet.util.DTThrowable; + +/** + * Abstract operator for for consuming data using JDBC interface + * User needs User needs to provide + * tableName,dbConnection,setEmitColumnList,look-up key + * Optionally batchSize,pollInterval,Look-up key and a where clause can be given + * + * This operator uses static partitioning to arrive at range queries for exactly + * once reads + * Assumption is that there is an ordered column using which range queries can + * be formed + * If an emitColumnList is provided, please ensure that the keyColumn is the + * first column in the list + * Range queries are formed using the {@link JdbcMetaDataUtility}} Output - + * comma separated list of the emit columns eg columnA,columnB,columnC + * + * @displayName Jdbc Polling Input Operator + * @category Input + * @tags database, sql, jdbc, partitionable,exactlyOnce + */ +public abstract class AbstractJdbcPollInputOperator extends AbstractStoreInputOperator+implements ActivationListener, IdleTimeHandler, Partitioner +{ + /* + * poll interval in milliseconds + */ + private int pollInterval; + + @Min(1) + private int partitionCount = 1; + protected transient int operatorId; + protected transient boolean isReplayed; + protected transient boolean isPollable; + protected int batchSize; + protected int fetchSize; + /** + * Map of windowId to of the range key + */ + protected transient MutablePair currentWindowRecoveryState; + + /** + * size of the emit queue used to hold polled records before emit + */ + private int queueCapacity = 4 * 1024 * 1024; + private transient volatile boolean execute; + private transient AtomicReference cause; + protected transient int spinMillis; + private transient OperatorContext context; + protected String tableName; + protected String key; + protected long currentWindowId; + protected KeyValPair rangeQueryPair; + protected String lower; + protected String upper; + protected boolean recovered; + protected boolean isPolled; + protected String whereCondition = null; + protected String previousUpperBound; + protected String highestPolled; + private static final String user = "user"; + private static final
[jira] [Commented] (APEXMALHAR-2094) Quantiles sketch operator
[ https://issues.apache.org/jira/browse/APEXMALHAR-2094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15302894#comment-15302894 ] ASF GitHub Bot commented on APEXMALHAR-2094: Github user sandeep-n commented on a diff in the pull request: https://github.com/apache/incubator-apex-malhar/pull/296#discussion_r64818531 --- Diff: sketches/src/main/java/org/apache/apex/malhar/sketches/QuantilesEstimator.java --- @@ -0,0 +1,188 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.apex.malhar.sketches; + +import com.yahoo.sketches.quantiles.QuantilesSketch; + +import com.datatorrent.api.DefaultInputPort; +import com.datatorrent.api.DefaultOutputPort; +import com.datatorrent.api.annotation.OperatorAnnotation; +import com.datatorrent.common.util.BaseOperator; + +/** + * An implementation of BaseOperator that computes a "sketch" (a representation of the probability distribution using + * a low memory footprint) of the incoming numeric data, and evaluates/outputs the cumulative distribution function and + * quantiles of the probability distribution. Leverages the quantiles sketch implementation from the Yahoo Datasketches + * Library. + * + * Input Port(s) : + * data : Data values input port. + * + * Output Port(s) : + * cdfOutput : cumulative distribution function output port. + * quantilesOutput : quantiles output port. + * + * Partitions : No, no will yield wrong results. + * + + */ +@OperatorAnnotation(partitionable = false) +public class QuantilesEstimator extends BaseOperator +{ + private transient QuantilesSketch quantilesSketch = QuantilesSketch.builder().build(); + + /** + * Constructor that allows non-default initialization of the quantile sketch object + * + * @param k:Parameter that determines accuracy and memory usage of quantile sketch. See QuantilesSketch documentation + * for details + * @param seed: The quantile sketch algorithm is inherently random. Set seed to 0 for reproducibility in testing, but + * do not set otherwise. + */ + public QuantilesEstimator(int k, short seed) + { +quantilesSketch = QuantilesSketch.builder().setSeed(seed).build(k); + } + + /** + * Output port that emits cdf estimated at the current data point + */ + public final transient DefaultOutputPort cdfOutput = new DefaultOutputPort<>(); + /** + * Emits quantiles of stream seen thus far + */ + public final transient DefaultOutputPortquantilesOutput = new DefaultOutputPort<>(); + /** + * Emits probability masses on specified intervals + */ + public final transient DefaultOutputPort pmfOutput = new DefaultOutputPort<>(); + /** + * This operator computes three different quantities which are output on separate output ports. If not using any of + * these quantities, these variables can be set to avoid unnecessary computation. + */ + + private boolean computeCdf = true; + private boolean computeQuantiles = true; + private boolean computePmf = true; + + public boolean isComputeCdf() + { +return computeCdf; + } + + public void setComputeCdf(boolean computeCdf) + { +this.computeCdf = computeCdf; + } + + public boolean isComputeQuantiles() + { +return computeQuantiles; + } + + public void setComputeQuantiles(boolean computeQuantiles) + { +this.computeQuantiles = computeQuantiles; + } + + public boolean isComputePmf() + { +return computePmf; + } + + public void setComputePmf(boolean computePmf) + { +this.computePmf = computePmf; + } + + public int getK() + { +return quantilesSketch.getK(); + } + + public double[] getFractions()
[jira] [Commented] (APEXMALHAR-2094) Quantiles sketch operator
[ https://issues.apache.org/jira/browse/APEXMALHAR-2094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15302895#comment-15302895 ] ASF GitHub Bot commented on APEXMALHAR-2094: Github user sandeep-n commented on a diff in the pull request: https://github.com/apache/incubator-apex-malhar/pull/296#discussion_r64818557 --- Diff: sketches/src/main/java/org/apache/apex/malhar/sketches/QuantilesEstimator.java --- @@ -0,0 +1,188 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.apex.malhar.sketches; + +import com.yahoo.sketches.quantiles.QuantilesSketch; + +import com.datatorrent.api.DefaultInputPort; +import com.datatorrent.api.DefaultOutputPort; +import com.datatorrent.api.annotation.OperatorAnnotation; +import com.datatorrent.common.util.BaseOperator; + +/** + * An implementation of BaseOperator that computes a "sketch" (a representation of the probability distribution using + * a low memory footprint) of the incoming numeric data, and evaluates/outputs the cumulative distribution function and + * quantiles of the probability distribution. Leverages the quantiles sketch implementation from the Yahoo Datasketches + * Library. + * + * Input Port(s) : + * data : Data values input port. + * + * Output Port(s) : + * cdfOutput : cumulative distribution function output port. + * quantilesOutput : quantiles output port. + * + * Partitions : No, no will yield wrong results. + * + + */ +@OperatorAnnotation(partitionable = false) +public class QuantilesEstimator extends BaseOperator +{ + private transient QuantilesSketch quantilesSketch = QuantilesSketch.builder().build(); + + /** + * Constructor that allows non-default initialization of the quantile sketch object + * + * @param k:Parameter that determines accuracy and memory usage of quantile sketch. See QuantilesSketch documentation + * for details + * @param seed: The quantile sketch algorithm is inherently random. Set seed to 0 for reproducibility in testing, but + * do not set otherwise. + */ + public QuantilesEstimator(int k, short seed) + { +quantilesSketch = QuantilesSketch.builder().setSeed(seed).build(k); + } + + /** + * Output port that emits cdf estimated at the current data point + */ + public final transient DefaultOutputPort cdfOutput = new DefaultOutputPort<>(); + /** + * Emits quantiles of stream seen thus far + */ + public final transient DefaultOutputPortquantilesOutput = new DefaultOutputPort<>(); + /** + * Emits probability masses on specified intervals + */ + public final transient DefaultOutputPort pmfOutput = new DefaultOutputPort<>(); + /** + * This operator computes three different quantities which are output on separate output ports. If not using any of + * these quantities, these variables can be set to avoid unnecessary computation. + */ + + private boolean computeCdf = true; + private boolean computeQuantiles = true; + private boolean computePmf = true; + + public boolean isComputeCdf() + { +return computeCdf; + } + + public void setComputeCdf(boolean computeCdf) + { +this.computeCdf = computeCdf; + } + + public boolean isComputeQuantiles() + { +return computeQuantiles; + } + + public void setComputeQuantiles(boolean computeQuantiles) + { +this.computeQuantiles = computeQuantiles; + } + + public boolean isComputePmf() + { +return computePmf; + } + + public void setComputePmf(boolean computePmf) + { +this.computePmf = computePmf; + } + + public int getK() + { +return quantilesSketch.getK(); + } + + public double[] getFractions()
[GitHub] incubator-apex-malhar pull request: APEXMALHAR-2094 Quantiles Oper...
Github user sandeep-n commented on a diff in the pull request: https://github.com/apache/incubator-apex-malhar/pull/296#discussion_r64818557 --- Diff: sketches/src/main/java/org/apache/apex/malhar/sketches/QuantilesEstimator.java --- @@ -0,0 +1,188 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.apex.malhar.sketches; + +import com.yahoo.sketches.quantiles.QuantilesSketch; + +import com.datatorrent.api.DefaultInputPort; +import com.datatorrent.api.DefaultOutputPort; +import com.datatorrent.api.annotation.OperatorAnnotation; +import com.datatorrent.common.util.BaseOperator; + +/** + * An implementation of BaseOperator that computes a "sketch" (a representation of the probability distribution using + * a low memory footprint) of the incoming numeric data, and evaluates/outputs the cumulative distribution function and + * quantiles of the probability distribution. Leverages the quantiles sketch implementation from the Yahoo Datasketches + * Library. + * + * Input Port(s) : + * data : Data values input port. + * + * Output Port(s) : + * cdfOutput : cumulative distribution function output port. + * quantilesOutput : quantiles output port. + * + * Partitions : No, no will yield wrong results. + * + + */ +@OperatorAnnotation(partitionable = false) +public class QuantilesEstimator extends BaseOperator +{ + private transient QuantilesSketch quantilesSketch = QuantilesSketch.builder().build(); + + /** + * Constructor that allows non-default initialization of the quantile sketch object + * + * @param k:Parameter that determines accuracy and memory usage of quantile sketch. See QuantilesSketch documentation + * for details + * @param seed: The quantile sketch algorithm is inherently random. Set seed to 0 for reproducibility in testing, but + * do not set otherwise. + */ + public QuantilesEstimator(int k, short seed) + { +quantilesSketch = QuantilesSketch.builder().setSeed(seed).build(k); + } + + /** + * Output port that emits cdf estimated at the current data point + */ + public final transient DefaultOutputPort cdfOutput = new DefaultOutputPort<>(); + /** + * Emits quantiles of stream seen thus far + */ + public final transient DefaultOutputPortquantilesOutput = new DefaultOutputPort<>(); + /** + * Emits probability masses on specified intervals + */ + public final transient DefaultOutputPort pmfOutput = new DefaultOutputPort<>(); + /** + * This operator computes three different quantities which are output on separate output ports. If not using any of + * these quantities, these variables can be set to avoid unnecessary computation. + */ + + private boolean computeCdf = true; + private boolean computeQuantiles = true; + private boolean computePmf = true; + + public boolean isComputeCdf() + { +return computeCdf; + } + + public void setComputeCdf(boolean computeCdf) + { +this.computeCdf = computeCdf; + } + + public boolean isComputeQuantiles() + { +return computeQuantiles; + } + + public void setComputeQuantiles(boolean computeQuantiles) + { +this.computeQuantiles = computeQuantiles; + } + + public boolean isComputePmf() + { +return computePmf; + } + + public void setComputePmf(boolean computePmf) + { +this.computePmf = computePmf; + } + + public int getK() + { +return quantilesSketch.getK(); + } + + public double[] getFractions() + { +return fractions; + } + + public void setFractions(double[] fractions) + { +this.fractions = fractions; + } + + public double[] getPmfIntervals() + { +return pmfIntervals; + }
Re: IntelliJ and Netbeans code styles are missing
Another question - are CheckStyle settings packaged anywhere? There is obviously a CheckStyle run by Jenkins - is this CheckStyle published for Apex or Malhar? On 5/24/16, 2:19 PM, "Ganelin, Ilya"wrote: >I just realized that these were moved to subdirectories but the associated >documentation was never updated. >I’ll fix that. > > > > >On 5/24/16, 2:16 PM, "Ganelin, Ilya" wrote: > >>Hi all – I’ve been setting up a new dev environment and noticed that the >>apex-style.jar and apex-style.zip are missing from: >>https://github.com/apache/incubator-apex-core/tree/master/misc/ide-templates/intellij >>https://github.com/apache/incubator-apex-core/tree/master/misc/ide-templates/netbeans >> >>Did these get lost in the migration from Apache or did we decide to approach >>this a different way? >> >>In recent work I did in the Alluxio project, I created a utility that would >>package/unpack code-style settings allowing changes to be tracked in GitHub >>but still provide a relatively automated process for setting up >>configurations. Would something like that make sense for Apex? >> >>https://github.com/Alluxio/alluxio/pull/3168 >> >> >> >>The information contained in this e-mail is confidential and/or proprietary >>to Capital One and/or its affiliates and may only be used solely in >>performance of work or services for Capital One. The information transmitted >>herewith is intended only for use by the individual or entity to which it is >>addressed. If the reader of this message is not the intended recipient, you >>are hereby notified that any review, retransmission, dissemination, >>distribution, copying or other use of, or taking of any action in reliance >>upon this information is strictly prohibited. If you have received this >>communication in error, please contact the sender and delete the material >>from your computer. > > >The information contained in this e-mail is confidential and/or proprietary to >Capital One and/or its affiliates and may only be used solely in performance >of work or services for Capital One. The information transmitted herewith is >intended only for use by the individual or entity to which it is addressed. If >the reader of this message is not the intended recipient, you are hereby >notified that any review, retransmission, dissemination, distribution, copying >or other use of, or taking of any action in reliance upon this information is >strictly prohibited. If you have received this communication in error, please >contact the sender and delete the material from your computer. The information contained in this e-mail is confidential and/or proprietary to Capital One and/or its affiliates and may only be used solely in performance of work or services for Capital One. The information transmitted herewith is intended only for use by the individual or entity to which it is addressed. If the reader of this message is not the intended recipient, you are hereby notified that any review, retransmission, dissemination, distribution, copying or other use of, or taking of any action in reliance upon this information is strictly prohibited. If you have received this communication in error, please contact the sender and delete the material from your computer.
[jira] [Commented] (APEXMALHAR-2094) Quantiles sketch operator
[ https://issues.apache.org/jira/browse/APEXMALHAR-2094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15302823#comment-15302823 ] ASF GitHub Bot commented on APEXMALHAR-2094: Github user ilganeli commented on a diff in the pull request: https://github.com/apache/incubator-apex-malhar/pull/296#discussion_r64810211 --- Diff: sketches/src/main/java/org/apache/apex/malhar/sketches/QuantilesEstimator.java --- @@ -0,0 +1,188 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.apex.malhar.sketches; + +import com.yahoo.sketches.quantiles.QuantilesSketch; + +import com.datatorrent.api.DefaultInputPort; +import com.datatorrent.api.DefaultOutputPort; +import com.datatorrent.api.annotation.OperatorAnnotation; +import com.datatorrent.common.util.BaseOperator; + +/** + * An implementation of BaseOperator that computes a "sketch" (a representation of the probability distribution using + * a low memory footprint) of the incoming numeric data, and evaluates/outputs the cumulative distribution function and + * quantiles of the probability distribution. Leverages the quantiles sketch implementation from the Yahoo Datasketches + * Library. + * + * Input Port(s) : + * data : Data values input port. + * + * Output Port(s) : + * cdfOutput : cumulative distribution function output port. + * quantilesOutput : quantiles output port. + * + * Partitions : No, no will yield wrong results. + * + + */ +@OperatorAnnotation(partitionable = false) +public class QuantilesEstimator extends BaseOperator +{ + private transient QuantilesSketch quantilesSketch = QuantilesSketch.builder().build(); + + /** + * Constructor that allows non-default initialization of the quantile sketch object + * + * @param k:Parameter that determines accuracy and memory usage of quantile sketch. See QuantilesSketch documentation + * for details + * @param seed: The quantile sketch algorithm is inherently random. Set seed to 0 for reproducibility in testing, but + * do not set otherwise. + */ + public QuantilesEstimator(int k, short seed) + { +quantilesSketch = QuantilesSketch.builder().setSeed(seed).build(k); + } + + /** + * Output port that emits cdf estimated at the current data point + */ + public final transient DefaultOutputPort cdfOutput = new DefaultOutputPort<>(); + /** + * Emits quantiles of stream seen thus far + */ + public final transient DefaultOutputPortquantilesOutput = new DefaultOutputPort<>(); + /** + * Emits probability masses on specified intervals + */ + public final transient DefaultOutputPort pmfOutput = new DefaultOutputPort<>(); + /** + * This operator computes three different quantities which are output on separate output ports. If not using any of + * these quantities, these variables can be set to avoid unnecessary computation. + */ + + private boolean computeCdf = true; + private boolean computeQuantiles = true; + private boolean computePmf = true; + + public boolean isComputeCdf() + { +return computeCdf; + } + + public void setComputeCdf(boolean computeCdf) + { +this.computeCdf = computeCdf; + } + + public boolean isComputeQuantiles() + { +return computeQuantiles; + } + + public void setComputeQuantiles(boolean computeQuantiles) + { +this.computeQuantiles = computeQuantiles; + } + + public boolean isComputePmf() + { +return computePmf; + } + + public void setComputePmf(boolean computePmf) + { +this.computePmf = computePmf; + } + + public int getK() + { +return quantilesSketch.getK(); + } + + public double[] getFractions()
[jira] [Commented] (APEXMALHAR-2094) Quantiles sketch operator
[ https://issues.apache.org/jira/browse/APEXMALHAR-2094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15302821#comment-15302821 ] ASF GitHub Bot commented on APEXMALHAR-2094: Github user ilganeli commented on a diff in the pull request: https://github.com/apache/incubator-apex-malhar/pull/296#discussion_r64810147 --- Diff: sketches/src/main/java/org/apache/apex/malhar/sketches/QuantilesEstimator.java --- @@ -0,0 +1,188 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.apex.malhar.sketches; + +import com.yahoo.sketches.quantiles.QuantilesSketch; + +import com.datatorrent.api.DefaultInputPort; +import com.datatorrent.api.DefaultOutputPort; +import com.datatorrent.api.annotation.OperatorAnnotation; +import com.datatorrent.common.util.BaseOperator; + +/** + * An implementation of BaseOperator that computes a "sketch" (a representation of the probability distribution using + * a low memory footprint) of the incoming numeric data, and evaluates/outputs the cumulative distribution function and + * quantiles of the probability distribution. Leverages the quantiles sketch implementation from the Yahoo Datasketches + * Library. + * + * Input Port(s) : + * data : Data values input port. + * + * Output Port(s) : + * cdfOutput : cumulative distribution function output port. + * quantilesOutput : quantiles output port. + * + * Partitions : No, no will yield wrong results. + * + + */ +@OperatorAnnotation(partitionable = false) +public class QuantilesEstimator extends BaseOperator +{ + private transient QuantilesSketch quantilesSketch = QuantilesSketch.builder().build(); + + /** + * Constructor that allows non-default initialization of the quantile sketch object + * + * @param k:Parameter that determines accuracy and memory usage of quantile sketch. See QuantilesSketch documentation + * for details + * @param seed: The quantile sketch algorithm is inherently random. Set seed to 0 for reproducibility in testing, but + * do not set otherwise. + */ + public QuantilesEstimator(int k, short seed) + { +quantilesSketch = QuantilesSketch.builder().setSeed(seed).build(k); + } + + /** + * Output port that emits cdf estimated at the current data point + */ + public final transient DefaultOutputPort cdfOutput = new DefaultOutputPort<>(); + /** + * Emits quantiles of stream seen thus far + */ + public final transient DefaultOutputPortquantilesOutput = new DefaultOutputPort<>(); + /** + * Emits probability masses on specified intervals + */ + public final transient DefaultOutputPort pmfOutput = new DefaultOutputPort<>(); + /** + * This operator computes three different quantities which are output on separate output ports. If not using any of + * these quantities, these variables can be set to avoid unnecessary computation. + */ + + private boolean computeCdf = true; + private boolean computeQuantiles = true; + private boolean computePmf = true; + + public boolean isComputeCdf() + { +return computeCdf; + } + + public void setComputeCdf(boolean computeCdf) + { +this.computeCdf = computeCdf; + } + + public boolean isComputeQuantiles() + { +return computeQuantiles; + } + + public void setComputeQuantiles(boolean computeQuantiles) + { +this.computeQuantiles = computeQuantiles; + } + + public boolean isComputePmf() + { +return computePmf; + } + + public void setComputePmf(boolean computePmf) + { +this.computePmf = computePmf; + } + + public int getK() + { +return quantilesSketch.getK(); + } + + public double[] getFractions()
[GitHub] incubator-apex-malhar pull request: APEXMALHAR-2094 Quantiles Oper...
Github user ilganeli commented on a diff in the pull request: https://github.com/apache/incubator-apex-malhar/pull/296#discussion_r64810211 --- Diff: sketches/src/main/java/org/apache/apex/malhar/sketches/QuantilesEstimator.java --- @@ -0,0 +1,188 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.apex.malhar.sketches; + +import com.yahoo.sketches.quantiles.QuantilesSketch; + +import com.datatorrent.api.DefaultInputPort; +import com.datatorrent.api.DefaultOutputPort; +import com.datatorrent.api.annotation.OperatorAnnotation; +import com.datatorrent.common.util.BaseOperator; + +/** + * An implementation of BaseOperator that computes a "sketch" (a representation of the probability distribution using + * a low memory footprint) of the incoming numeric data, and evaluates/outputs the cumulative distribution function and + * quantiles of the probability distribution. Leverages the quantiles sketch implementation from the Yahoo Datasketches + * Library. + * + * Input Port(s) : + * data : Data values input port. + * + * Output Port(s) : + * cdfOutput : cumulative distribution function output port. + * quantilesOutput : quantiles output port. + * + * Partitions : No, no will yield wrong results. + * + + */ +@OperatorAnnotation(partitionable = false) +public class QuantilesEstimator extends BaseOperator +{ + private transient QuantilesSketch quantilesSketch = QuantilesSketch.builder().build(); + + /** + * Constructor that allows non-default initialization of the quantile sketch object + * + * @param k:Parameter that determines accuracy and memory usage of quantile sketch. See QuantilesSketch documentation + * for details + * @param seed: The quantile sketch algorithm is inherently random. Set seed to 0 for reproducibility in testing, but + * do not set otherwise. + */ + public QuantilesEstimator(int k, short seed) + { +quantilesSketch = QuantilesSketch.builder().setSeed(seed).build(k); + } + + /** + * Output port that emits cdf estimated at the current data point + */ + public final transient DefaultOutputPort cdfOutput = new DefaultOutputPort<>(); + /** + * Emits quantiles of stream seen thus far + */ + public final transient DefaultOutputPortquantilesOutput = new DefaultOutputPort<>(); + /** + * Emits probability masses on specified intervals + */ + public final transient DefaultOutputPort pmfOutput = new DefaultOutputPort<>(); + /** + * This operator computes three different quantities which are output on separate output ports. If not using any of + * these quantities, these variables can be set to avoid unnecessary computation. + */ + + private boolean computeCdf = true; + private boolean computeQuantiles = true; + private boolean computePmf = true; + + public boolean isComputeCdf() + { +return computeCdf; + } + + public void setComputeCdf(boolean computeCdf) + { +this.computeCdf = computeCdf; + } + + public boolean isComputeQuantiles() + { +return computeQuantiles; + } + + public void setComputeQuantiles(boolean computeQuantiles) + { +this.computeQuantiles = computeQuantiles; + } + + public boolean isComputePmf() + { +return computePmf; + } + + public void setComputePmf(boolean computePmf) + { +this.computePmf = computePmf; + } + + public int getK() + { +return quantilesSketch.getK(); + } + + public double[] getFractions() + { +return fractions; + } + + public void setFractions(double[] fractions) + { +this.fractions = fractions; + } + + public double[] getPmfIntervals() + { +return pmfIntervals; + }
[GitHub] incubator-apex-malhar pull request: APEXMALHAR-2094 Quantiles Oper...
Github user ilganeli commented on a diff in the pull request: https://github.com/apache/incubator-apex-malhar/pull/296#discussion_r64810147 --- Diff: sketches/src/main/java/org/apache/apex/malhar/sketches/QuantilesEstimator.java --- @@ -0,0 +1,188 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.apex.malhar.sketches; + +import com.yahoo.sketches.quantiles.QuantilesSketch; + +import com.datatorrent.api.DefaultInputPort; +import com.datatorrent.api.DefaultOutputPort; +import com.datatorrent.api.annotation.OperatorAnnotation; +import com.datatorrent.common.util.BaseOperator; + +/** + * An implementation of BaseOperator that computes a "sketch" (a representation of the probability distribution using + * a low memory footprint) of the incoming numeric data, and evaluates/outputs the cumulative distribution function and + * quantiles of the probability distribution. Leverages the quantiles sketch implementation from the Yahoo Datasketches + * Library. + * + * Input Port(s) : + * data : Data values input port. + * + * Output Port(s) : + * cdfOutput : cumulative distribution function output port. + * quantilesOutput : quantiles output port. + * + * Partitions : No, no will yield wrong results. + * + + */ +@OperatorAnnotation(partitionable = false) +public class QuantilesEstimator extends BaseOperator +{ + private transient QuantilesSketch quantilesSketch = QuantilesSketch.builder().build(); + + /** + * Constructor that allows non-default initialization of the quantile sketch object + * + * @param k:Parameter that determines accuracy and memory usage of quantile sketch. See QuantilesSketch documentation + * for details + * @param seed: The quantile sketch algorithm is inherently random. Set seed to 0 for reproducibility in testing, but + * do not set otherwise. + */ + public QuantilesEstimator(int k, short seed) + { +quantilesSketch = QuantilesSketch.builder().setSeed(seed).build(k); + } + + /** + * Output port that emits cdf estimated at the current data point + */ + public final transient DefaultOutputPort cdfOutput = new DefaultOutputPort<>(); + /** + * Emits quantiles of stream seen thus far + */ + public final transient DefaultOutputPortquantilesOutput = new DefaultOutputPort<>(); + /** + * Emits probability masses on specified intervals + */ + public final transient DefaultOutputPort pmfOutput = new DefaultOutputPort<>(); + /** + * This operator computes three different quantities which are output on separate output ports. If not using any of + * these quantities, these variables can be set to avoid unnecessary computation. + */ + + private boolean computeCdf = true; + private boolean computeQuantiles = true; + private boolean computePmf = true; + + public boolean isComputeCdf() + { +return computeCdf; + } + + public void setComputeCdf(boolean computeCdf) + { +this.computeCdf = computeCdf; + } + + public boolean isComputeQuantiles() + { +return computeQuantiles; + } + + public void setComputeQuantiles(boolean computeQuantiles) + { +this.computeQuantiles = computeQuantiles; + } + + public boolean isComputePmf() + { +return computePmf; + } + + public void setComputePmf(boolean computePmf) + { +this.computePmf = computePmf; + } + + public int getK() + { +return quantilesSketch.getK(); + } + + public double[] getFractions() + { +return fractions; + } + + public void setFractions(double[] fractions) + { +this.fractions = fractions; + } + + public double[] getPmfIntervals() + { +return pmfIntervals; + }
[jira] [Updated] (APEXMALHAR-2102) Add A Clone Partitioner Which Sends The Same Data To Each Partition
[ https://issues.apache.org/jira/browse/APEXMALHAR-2102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Weise updated APEXMALHAR-2102: - Assignee: Ilya Ganelin > Add A Clone Partitioner Which Sends The Same Data To Each Partition > --- > > Key: APEXMALHAR-2102 > URL: https://issues.apache.org/jira/browse/APEXMALHAR-2102 > Project: Apache Apex Malhar > Issue Type: New Feature >Reporter: Timothy Farkas >Assignee: Ilya Ganelin > > This should go into com.datatorrent.common.partitioner and would be very > similar to the StatelessPartitioner -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Moved] (APEXMALHAR-2102) Add A Clone Partitioner Which Sends The Same Data To Each Partition
[ https://issues.apache.org/jira/browse/APEXMALHAR-2102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Weise moved APEXCORE-146 to APEXMALHAR-2102: --- Assignee: (was: Ilya Ganelin) Workflow: Default workflow, editable Closed status (was: jira) Key: APEXMALHAR-2102 (was: APEXCORE-146) Project: Apache Apex Malhar (was: Apache Apex Core) > Add A Clone Partitioner Which Sends The Same Data To Each Partition > --- > > Key: APEXMALHAR-2102 > URL: https://issues.apache.org/jira/browse/APEXMALHAR-2102 > Project: Apache Apex Malhar > Issue Type: New Feature >Reporter: Timothy Farkas > > This should go into com.datatorrent.common.partitioner and would be very > similar to the StatelessPartitioner -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (APEXCORE-146) Add A Clone Partitioner Which Sends The Same Data To Each Partition
[ https://issues.apache.org/jira/browse/APEXCORE-146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15302786#comment-15302786 ] ASF GitHub Bot commented on APEXCORE-146: - Github user ilganeli commented on the pull request: https://github.com/apache/incubator-apex-core/pull/344#issuecomment-221973428 Moved to https://github.com/apache/incubator-apex-malhar/pull/297 > Add A Clone Partitioner Which Sends The Same Data To Each Partition > --- > > Key: APEXCORE-146 > URL: https://issues.apache.org/jira/browse/APEXCORE-146 > Project: Apache Apex Core > Issue Type: New Feature >Reporter: Timothy Farkas >Assignee: Ilya Ganelin > > This should go into com.datatorrent.common.partitioner and would be very > similar to the StatelessPartitioner -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (APEXCORE-146) Add A Clone Partitioner Which Sends The Same Data To Each Partition
[ https://issues.apache.org/jira/browse/APEXCORE-146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15302783#comment-15302783 ] ASF GitHub Bot commented on APEXCORE-146: - GitHub user ilganeli opened a pull request: https://github.com/apache/incubator-apex-malhar/pull/297 [APEXCORE-146] Add ClonePartitioner **Note:** This is still marked as an Apex issue even though it should live in Malhar. Once the JIRA is updated I'll update title and reference. * Created a Clone partitioner similar to the StatelessPartitioner that assigns all data to all partitions * Added a simple unit test suite to test scale up and scale down **Questions** Is there a good place/way to add a test to verify that data is partitioned appropriately? For example, the stateless partitioner assigns keys based on the nearest power of two - is that validated anywhere? How is the serialVersionUUID generated? Does the manual assignment of ports to PartitionKeys need to happen for each Partition? There is a comment in the Partition class that by default all data is sent to all partitions. However, digging into the implementation in depth doesn't show that behavior. Instead, it seems data is assigned based on the mask associated with each Partition. This references: https://issues.apache.org/jira/browse/APEXCORE-146 You can merge this pull request into a Git repository by running: $ git pull https://github.com/ilganeli/incubator-apex-malhar APEXCORE-146 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-apex-malhar/pull/297.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #297 commit 74f6fd5113a3d488015a9817790018dd63b5e8cb Author: Ilya GanelinDate: 2016-05-26T19:39:24Z Added a new partitioner that replicates data across all partitions by default. > Add A Clone Partitioner Which Sends The Same Data To Each Partition > --- > > Key: APEXCORE-146 > URL: https://issues.apache.org/jira/browse/APEXCORE-146 > Project: Apache Apex Core > Issue Type: New Feature >Reporter: Timothy Farkas >Assignee: Ilya Ganelin > > This should go into com.datatorrent.common.partitioner and would be very > similar to the StatelessPartitioner -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (APEXCORE-146) Add A Clone Partitioner Which Sends The Same Data To Each Partition
[ https://issues.apache.org/jira/browse/APEXCORE-146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15302784#comment-15302784 ] ASF GitHub Bot commented on APEXCORE-146: - Github user ilganeli closed the pull request at: https://github.com/apache/incubator-apex-core/pull/344 > Add A Clone Partitioner Which Sends The Same Data To Each Partition > --- > > Key: APEXCORE-146 > URL: https://issues.apache.org/jira/browse/APEXCORE-146 > Project: Apache Apex Core > Issue Type: New Feature >Reporter: Timothy Farkas >Assignee: Ilya Ganelin > > This should go into com.datatorrent.common.partitioner and would be very > similar to the StatelessPartitioner -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] incubator-apex-malhar pull request: [APEXCORE-146] Add ClonePartit...
GitHub user ilganeli opened a pull request: https://github.com/apache/incubator-apex-malhar/pull/297 [APEXCORE-146] Add ClonePartitioner **Note:** This is still marked as an Apex issue even though it should live in Malhar. Once the JIRA is updated I'll update title and reference. * Created a Clone partitioner similar to the StatelessPartitioner that assigns all data to all partitions * Added a simple unit test suite to test scale up and scale down **Questions** Is there a good place/way to add a test to verify that data is partitioned appropriately? For example, the stateless partitioner assigns keys based on the nearest power of two - is that validated anywhere? How is the serialVersionUUID generated? Does the manual assignment of ports to PartitionKeys need to happen for each Partition? There is a comment in the Partition class that by default all data is sent to all partitions. However, digging into the implementation in depth doesn't show that behavior. Instead, it seems data is assigned based on the mask associated with each Partition. This references: https://issues.apache.org/jira/browse/APEXCORE-146 You can merge this pull request into a Git repository by running: $ git pull https://github.com/ilganeli/incubator-apex-malhar APEXCORE-146 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-apex-malhar/pull/297.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #297 commit 74f6fd5113a3d488015a9817790018dd63b5e8cb Author: Ilya GanelinDate: 2016-05-26T19:39:24Z Added a new partitioner that replicates data across all partitions by default. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-apex-core pull request: [APEXCORE-146] Add ClonePartitio...
Github user ilganeli commented on the pull request: https://github.com/apache/incubator-apex-core/pull/344#issuecomment-221973428 Moved to https://github.com/apache/incubator-apex-malhar/pull/297 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (APEXMALHAR-2063) Integrate WAL to FS WindowDataManager
[ https://issues.apache.org/jira/browse/APEXMALHAR-2063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15301685#comment-15301685 ] Chandni Singh commented on APEXMALHAR-2063: --- Window Data Manager supports dynamic partitioning in operator by allowing an instance to read data saved for a particular window id by all operator instances. The following method provides that support {code} /** * When an operator can partition itself dynamically then there is no guarantee that an input state which was being * handled by one instance previously will be handled by the same instance after partitioning. * For eg. An {@link AbstractFileInputOperator} instance which reads a File X till offset l (not check-pointed) may no * longer be the instance that handles file X after repartitioning as no. of instances may have changed and file X * is re-hashed to another instance. * The new instance wouldn't know from what point to read the File X unless it reads the idempotent storage of all the * operators for the window being replayed and fix it's state. * * @param windowId window id. * @return mapping of operator id to the corresponding state * @throws IOException */ Mapload(long windowId) throws IOException; {code} To provide the support for above with FileSystemWAL becomes complicated. Currently the FileSystemWAL reader and writer are assumed to be in the same physical partition. However, supporting above requires multiple readers which are in different physical partitions than the writer. So the FileSystem WAL needs to be changed, in order to be used in read-only mode. > Integrate WAL to FS WindowDataManager > - > > Key: APEXMALHAR-2063 > URL: https://issues.apache.org/jira/browse/APEXMALHAR-2063 > Project: Apache Apex Malhar > Issue Type: Improvement >Reporter: Chandni Singh >Assignee: Chandni Singh > > FS Window Data Manager is used to save meta-data that helps in replaying > tuples every completed application window after failure. For this it saves > meta-data in a file per window. Having multiple small size files on hdfs > cause issues as highlighted here: > http://blog.cloudera.com/blog/2009/02/the-small-files-problem/ > Instead FS Window Data Manager can utilize the WAL to write data and maintain > a mapping of how much data was flushed to WAL each window. -- This message was sent by Atlassian JIRA (v6.3.4#6332)