[jira] [Created] (APEXMALHAR-2367) CassandraTransactionalStore should handle erroneous records
Priyanka Gugale created APEXMALHAR-2367: --- Summary: CassandraTransactionalStore should handle erroneous records Key: APEXMALHAR-2367 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2367 Project: Apache Apex Malhar Issue Type: Improvement Reporter: Priyanka Gugale Priority: Minor If CassandraTransactionalStore batch command execution fails, there should be a way to configure the number of retry attempts; once those are exhausted, the statement / batch should be dropped or emitted to an error port without blocking application execution. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
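The behaviour requested in this issue can be sketched roughly as follows. This is only an illustration of bounded retries followed by a hand-off to an error sink, not the CassandraTransactionalStore API: the class RetryingBatchExecutor, the BatchAction interface, the maxRetries setting and the errorSink consumer (standing in for an operator error port) are all hypothetical names.

```java
import java.util.function.Consumer;

/** Hypothetical helper: retries a batch a bounded number of times, then hands it to an error sink. */
class RetryingBatchExecutor<B> {

    /** Stand-in for whatever actually executes the batch against the store. */
    interface BatchAction<B> {
        void execute(B batch) throws Exception;
    }

    private final int maxRetries;        // number of retries after the first attempt (assumed setting)
    private final Consumer<B> errorSink; // stand-in for an error port

    RetryingBatchExecutor(int maxRetries, Consumer<B> errorSink) {
        if (maxRetries < 0) {
            throw new IllegalArgumentException("maxRetries must be >= 0");
        }
        this.maxRetries = maxRetries;
        this.errorSink = errorSink;
    }

    /** Returns true if the batch succeeded, false if it was routed to the error sink. */
    boolean run(B batch, BatchAction<B> action) {
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                action.execute(batch);
                return true;
            } catch (Exception e) {
                // Swallow the failure and try again until the attempts are used up.
            }
        }
        // All attempts failed: emit the batch to the error sink instead of blocking.
        errorSink.accept(batch);
        return false;
    }
}
```

Passing a no-op consumer as errorSink would correspond to the "dropped" option; a real operator would more likely emit to a port and log the last exception.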
Re: [GitHub] apex-core pull request #426: APEXCORE-558 Change highlight color to red and ...
I filed https://issues.apache.org/jira/browse/INFRA-13027 for the Apache Jenkins build failures. Thank you, Vlad On 12/5/16 13:57, Sanjay Pujare wrote: The build triggered by this PR failed with this weird error (see https://builds.apache.org/job/Apex_Core_PR/204/console) git clean -fdx # timeout=10 Parsing POMs Modules changed, recalculating dependency graph Build timed out (after 20 minutes). Marking the build as failed. Build was aborted Putting comment on the pull request Finished: FAILURE Does anyone know what the problem is? How does one fix it? On 12/2/16, 2:19 PM, "sanjaypujare" wrote: GitHub user sanjaypujare opened a pull request: https://github.com/apache/apex-core/pull/426 APEXCORE-558 Change highlight color to red and implement quit command @amberarrow could you review and merge pls? You can merge this pull request into a Git repository by running: $ git pull https://github.com/sanjaypujare/apex-core APEXCORE-558.sanjay Alternatively you can review and apply these changes as the patch at: https://github.com/apache/apex-core/pull/426.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #426 commit 2f3bf30e34fe24367663530133e20b3626784878 Author: Sanjay Pujare Date: 2016-12-02T22:17:24Z APEXCORE-558 Change highlight color to red and implement quit command --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (APEXCORE-586) Enhancement to persist logical and physical plan snapshots in HDFS
[ https://issues.apache.org/jira/browse/APEXCORE-586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15723970#comment-15723970 ] Thomas Weise commented on APEXCORE-586: --- There are already points at which the physical plan is checkpointed, and the same should be used here also. There is no need to do anything separate for the logical plan, it is part of the physical plan. Also please do not add any more code to StreamingContainerManager, see how you can modularize this. > Enhancement to persist logical and physical plan snapshots in HDFS > -- > > Key: APEXCORE-586 > URL: https://issues.apache.org/jira/browse/APEXCORE-586 > Project: Apache Apex Core > Issue Type: Improvement >Reporter: Sanjay M Pujare >Assignee: Sanjay M Pujare > > Pls refer to the discussion on dev@apex > http://apache-apex-developers-list.78491.x6.nabble.com/Proposing-a-new-feature-to-persist-logical-and-physical-plan-snapshots-in-HDFS-td11592.html
[jira] [Commented] (APEXCORE-586) Enhancement to persist logical and physical plan snapshots in HDFS
[ https://issues.apache.org/jira/browse/APEXCORE-586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15723879#comment-15723879 ] Sanjay M Pujare commented on APEXCORE-586: -- The following is the implementation plan. We will start the implementation soon, so pls send your feedback as soon as possible.
We will refactor the code to move the logic in com.datatorrent.stram.webapp.StramWebServices.getLogicalPlan(String) and com.datatorrent.stram.webapp.StramWebServices.getPhysicalPlan() to the class com.datatorrent.stram.StreamingContainerManager. These 2 methods will return the JSONObject representation of the logical or physical plan.
2 new methods will be added to StreamingContainerManager: writePhysicalPlan() and writeLogicalPlan(). The first method will create an HDFS file with the name physicalPlan_NNN_MMM.json under the current app's app directory, where NNN is the current app attempt id and MMM is a running count (maintained in the StreamingContainerManager object) that is incremented on every call to the method. The method will call getPhysicalPlan() to get the JSON object and write it out to the HDFS file. writeLogicalPlan() will be written similarly and will write the logical plan to the file logicalPlan_NNN_MMM.json. There is a separate MMM count for logical plans.
The method com.datatorrent.stram.plan.physical.PhysicalPlan.onStatusUpdate(PTOperator) will be modified to accept a parameter for the StreamingContainerManager object. Inside this method, after redoPartition() is called, writePhysicalPlan() will be called on the StreamingContainerManager object, which will write a new physicalPlan_NNN_MMM.json file. Similarly, writeLogicalPlan() will be called at the end of com.datatorrent.stram.StreamingContainerManager.LogicalPlanChangeRunnable.call(), which will write a new logicalPlan_NNN_MMM.json file.
Unit tests will be written to cover both the existing and the new functionality, to minimize the possibility of regressions.
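The file naming scheme in this plan (physicalPlan_NNN_MMM.json and logicalPlan_NNN_MMM.json with separate running counts) can be sketched as below. This is an illustration only: the class PlanSnapshotNamer is hypothetical, and the plan does not say whether the counts start at 0 or 1 or whether the numbers are zero-padded, so starting at 1 without padding is an assumption. Writing the JSON to HDFS is left out.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical helper that produces snapshot file names following the proposed pattern. */
class PlanSnapshotNamer {
    private final int appAttemptId;                                  // NNN in the proposal
    private final AtomicInteger physicalCount = new AtomicInteger(); // MMM for physical plans
    private final AtomicInteger logicalCount = new AtomicInteger();  // separate MMM for logical plans

    PlanSnapshotNamer(int appAttemptId) {
        this.appAttemptId = appAttemptId;
    }

    /** Increments the physical-plan count and returns the next file name. */
    String nextPhysicalPlanFile() {
        return "physicalPlan_" + appAttemptId + "_" + physicalCount.incrementAndGet() + ".json";
    }

    /** Increments the logical-plan count and returns the next file name. */
    String nextLogicalPlanFile() {
        return "logicalPlan_" + appAttemptId + "_" + logicalCount.incrementAndGet() + ".json";
    }
}
```

Keeping the counters in a small class of their own, rather than in StreamingContainerManager, is one way to keep the naming logic testable in isolation.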
[jira] [Commented] (APEXCORE-405) Provide an API to launch DAG on the cluster
[ https://issues.apache.org/jira/browse/APEXCORE-405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15723850#comment-15723850 ] ASF GitHub Bot commented on APEXCORE-405: - Github user asfgit closed the pull request at: https://github.com/apache/apex-core/pull/429 > Provide an API to launch DAG on the cluster > --- > > Key: APEXCORE-405 > URL: https://issues.apache.org/jira/browse/APEXCORE-405 > Project: Apache Apex Core > Issue Type: Improvement >Reporter: Pramod Immaneni >Assignee: Pramod Immaneni > Fix For: 3.5.0 > > > Today API exists to launch a DAG in local mode but such an API is not > available to launch the app on the cluster, only a CLI tool is available. > Provide an API to be able to do this.
[GitHub] apex-core pull request #429: APEXCORE-405 Make getDAG/prepareDAG available t...
Github user asfgit closed the pull request at: https://github.com/apache/apex-core/pull/429
[jira] [Updated] (APEXCORE-586) Enhancement to persist logical and physical plan snapshots in HDFS
[ https://issues.apache.org/jira/browse/APEXCORE-586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sanjay M Pujare updated APEXCORE-586: - Description: Pls refer to the discussion on dev@apex http://apache-apex-developers-list.78491.x6.nabble.com/Proposing-a-new-feature-to-persist-logical-and-physical-plan-snapshots-in-HDFS-td11592.html was: Pls refer to the discussion on dev@apex http://mail-archives.apache.org/mod_mbox/apex-dev/201611.mbox/browser
Re: [GitHub] apex-core pull request #426: APEXCORE-558 Change highlight color to red and ...
The build triggered by this PR failed with this weird error (see https://builds.apache.org/job/Apex_Core_PR/204/console) > git clean -fdx # timeout=10 Parsing POMs Modules changed, recalculating dependency graph Build timed out (after 20 minutes). Marking the build as failed. Build was aborted Putting comment on the pull request Finished: FAILURE Does anyone know what the problem is? How does one fix it? On 12/2/16, 2:19 PM, "sanjaypujare" wrote: GitHub user sanjaypujare opened a pull request: https://github.com/apache/apex-core/pull/426 APEXCORE-558 Change highlight color to red and implement quit command @amberarrow could you review and merge pls? You can merge this pull request into a Git repository by running: $ git pull https://github.com/sanjaypujare/apex-core APEXCORE-558.sanjay Alternatively you can review and apply these changes as the patch at: https://github.com/apache/apex-core/pull/426.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #426 commit 2f3bf30e34fe24367663530133e20b3626784878 Author: Sanjay Pujare Date: 2016-12-02T22:17:24Z APEXCORE-558 Change highlight color to red and implement quit command
[jira] [Commented] (APEXMALHAR-2366) Apply BloomFilter to Bucket
[ https://issues.apache.org/jira/browse/APEXMALHAR-2366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15723172#comment-15723172 ] Munagala V. Ramanath commented on APEXMALHAR-2366: -- As the number of elements increases, the probability of false positives increases; are we planning to recreate the filter with a larger bit vector, or is this done automatically by the Hadoop implementation? > Apply BloomFilter to Bucket > --- > > Key: APEXMALHAR-2366 > URL: https://issues.apache.org/jira/browse/APEXMALHAR-2366 > Project: Apache Apex Malhar > Issue Type: Improvement >Reporter: bright chen >Assignee: bright chen > Original Estimate: 192h > Remaining Estimate: 192h > > The bucket get() checks the cache first and, if the entry is not in the cache, checks the stored files. > Checking the files is a heavy operation because of the file seeks, and the chance of having to check a file is very high if the key range is large. > Suggest applying a BloomFilter to each bucket to reduce the chance of reading from file. > If the buckets are managed by ManagedStateImpl, the number of entries in a bucket can become very large and the BloomFilter may stop being useful after a while. > But if the buckets are managed by ManagedTimeUnifiedStateImpl, each bucket keeps a certain number of entries and the BloomFilter would be very useful. > For implementation: > Guava already has a BloomFilter whose interface is simple and fits our case. > But Guava 11 is not compatible with Guava 14 (Guava 11 uses Sink while Guava 14 uses PrimitiveSink).
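The growth in false positives raised in this comment follows from the standard approximation for a Bloom filter with m bits, k hash functions and n inserted elements: p is roughly (1 - e^(-k*n/m))^k. The sketch below only evaluates that textbook formula; it says nothing about how the Hadoop or Guava implementations size themselves, and the class and method names are made up for illustration.

```java
/** Textbook Bloom filter estimates; not tied to any particular library. */
class BloomFilterMath {

    /** Approximate false-positive probability for the given number of bits, hash functions and inserted elements. */
    static double falsePositiveRate(long bits, int hashes, long elements) {
        double exponent = -(double) hashes * elements / bits;
        return Math.pow(1.0 - Math.exp(exponent), hashes);
    }

    /** Bits needed to hold the given number of elements at a target rate, assuming an optimal hash count. */
    static long bitsFor(long elements, double targetRate) {
        double ln2 = Math.log(2.0);
        return (long) Math.ceil(-elements * Math.log(targetRate) / (ln2 * ln2));
    }
}
```

With a fixed bit vector the estimated rate only goes up as elements are added, which is why a bucket with an unbounded number of entries would eventually need a larger filter.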
[jira] [Commented] (APEXMALHAR-2366) Apply BloomFilter to Bucket
[ https://issues.apache.org/jira/browse/APEXMALHAR-2366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15723125#comment-15723125 ] bright chen commented on APEXMALHAR-2366: - The BloomFilter allocates its bits in memory, and this memory should be released when the Bucket is purged.
[jira] [Commented] (APEXMALHAR-2366) Apply BloomFilter to Bucket
[ https://issues.apache.org/jira/browse/APEXMALHAR-2366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15723065#comment-15723065 ] bright chen commented on APEXMALHAR-2366: - It would be better to use the Guava BloomFilter, as it is type safe and can support different types of key. But because of the compatibility problem mentioned earlier, I'll integrate the Hadoop BloomFilter, in which the type of key is 'Key'.
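The lookup path discussed in this issue (cache first, then files, with a Bloom filter in front of the file read) might look roughly like the sketch below. It is not the Malhar Bucket code and uses neither the Hadoop nor the Guava filter: SimpleBloomFilter is a small BitSet-based stand-in, GuardedBucket is hypothetical, the "files" are an in-memory map, and the filter size and hash count are arbitrary.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

/** Minimal BitSet-based Bloom filter, used here only as a stand-in for a library filter. */
class SimpleBloomFilter {
    private final BitSet bits;
    private final int size;
    private final int hashes;

    SimpleBloomFilter(int size, int hashes) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashes = hashes;
    }

    /** Double hashing: the i-th index is derived from two hash values of the key. */
    private int index(String key, int i) {
        int h1 = key.hashCode();
        int h2 = (h1 * 0x9E3779B9) | 1; // odd multiplier so successive indices differ
        return Math.floorMod(h1 + i * h2, size);
    }

    void add(String key) {
        for (int i = 0; i < hashes; i++) {
            bits.set(index(key, i));
        }
    }

    /** False means the key was definitely never added; true means it may have been. */
    boolean mightContain(String key) {
        for (int i = 0; i < hashes; i++) {
            if (!bits.get(index(key, i))) {
                return false;
            }
        }
        return true;
    }
}

/** Hypothetical bucket: cache lookup, then a filter check, then the expensive file lookup. */
class GuardedBucket {
    private final Map<String, String> cache = new HashMap<>();
    private final Map<String, String> files = new HashMap<>(); // stand-in for stored files
    private final SimpleBloomFilter filter = new SimpleBloomFilter(1 << 16, 3);
    int fileReads = 0; // counts how often the expensive path is taken

    void put(String key, String value, boolean keepInCache) {
        filter.add(key);
        files.put(key, value);
        if (keepInCache) {
            cache.put(key, value);
        }
    }

    String get(String key) {
        String cached = cache.get(key);
        if (cached != null) {
            return cached;
        }
        if (!filter.mightContain(key)) {
            return null; // definitely absent, so the file read is skipped
        }
        fileReads++;
        return files.get(key);
    }
}
```

Dropping the GuardedBucket instance also drops its filter, which is one simple way to release the filter's memory when a bucket is purged.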
[jira] [Updated] (APEXMALHAR-2365) LogParser - Operator to parse byte array using log format and emit a POJO
[ https://issues.apache.org/jira/browse/APEXMALHAR-2365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shraddha updated APEXMALHAR-2365: - Description: A generic parser which takes the logFileFormat (schema with field and regex) as a parameter and parses the log Input: Byte Array Output: POJO Parameters: -logFileFormat - schema containing fields and regex -encoding - for converting tuple to String -tuple class POJO Schema Configuration is required and the order of the schema should match the log. was: A generic parser which takes the logFileFormat (schema with field and regex) as a parameter and parses the log Input: Byte Array Output: POJO Parameters: -logFileFormat - schema containing fields and regex -tuple class POJO Schema Configuration is required and the order of the schema should match the log. > LogParser - Operator to parse byte array using log format and emit a POJO > - > > Key: APEXMALHAR-2365 > URL: https://issues.apache.org/jira/browse/APEXMALHAR-2365 > Project: Apache Apex Malhar > Issue Type: Improvement >Reporter: Shraddha >Assignee: Shraddha > > A generic parser which takes the logFileFormat (schema with field and regex) > as a parameter and parses the log > Input: Byte Array > Output: POJO > Parameters: > -logFileFormat - schema containing fields and regex > -encoding - for converting tuple to String > -tuple class > POJO Schema Configuration is required and the order of the schema should match > the log.
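The idea in this description (an ordered schema of fields and regexes, an encoding to turn the byte array into a String, and a structured result) can be sketched as follows. This is not the LogParser operator from the pull request: SimpleLogParser is a hypothetical class, fields are assumed to be separated by whitespace, each field regex is assumed to contain no capturing groups of its own, and a Map stands in for the POJO.

```java
import java.nio.charset.Charset;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical parser: an ordered field-to-regex schema applied to one log line. */
class SimpleLogParser {
    private final String[] fieldNames;
    private final Pattern linePattern;
    private final Charset encoding;

    /** The schema's iteration order must match the order of the fields in the log line. */
    SimpleLogParser(LinkedHashMap<String, String> schema, String encodingName) {
        this.fieldNames = schema.keySet().toArray(new String[0]);
        StringBuilder regex = new StringBuilder();
        boolean first = true;
        for (String fieldRegex : schema.values()) {
            if (!first) {
                regex.append("\\s+"); // assumed field separator
            }
            regex.append('(').append(fieldRegex).append(')');
            first = false;
        }
        this.linePattern = Pattern.compile(regex.toString());
        this.encoding = Charset.forName(encodingName);
    }

    /** Returns field values keyed by name, or null if the line does not match the schema. */
    Map<String, String> parse(byte[] tuple) {
        String line = new String(tuple, encoding);
        Matcher m = linePattern.matcher(line);
        if (!m.matches()) {
            return null;
        }
        Map<String, String> fields = new LinkedHashMap<>();
        for (int i = 0; i < fieldNames.length; i++) {
            fields.put(fieldNames[i], m.group(i + 1));
        }
        return fields;
    }
}
```

A real operator would populate the configured tuple class (for example by reflection) and would probably emit non-matching lines on an error port rather than return null.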
[GitHub] apex-malhar pull request #520: APEXMALHAR-2365-Creation of generic log parse...
GitHub user jogshraddha opened a pull request: https://github.com/apache/apex-malhar/pull/520 APEXMALHAR-2365-Creation of generic log parser Jira : (https://issues.apache.org/jira/browse/APEXMALHAR-2365) APEXMALHAR-2365-Creation of generic log parser You can merge this pull request into a Git repository by running: $ git pull https://github.com/jogshraddha/apex-malhar APEXMALHAR-2365-LogParser Alternatively you can review and apply these changes as the patch at: https://github.com/apache/apex-malhar/pull/520.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #520 commit c61835fee785c69c9fd04a98bec9bcc193c10d5c Author: jogshraddha Date: 2016-12-02T08:25:53Z Creation of generic log parser
[jira] [Commented] (APEXMALHAR-2365) LogParser - Operator to parse byte array using log format and emit a POJO
[ https://issues.apache.org/jira/browse/APEXMALHAR-2365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15721871#comment-15721871 ] ASF GitHub Bot commented on APEXMALHAR-2365: GitHub user jogshraddha opened a pull request: https://github.com/apache/apex-malhar/pull/520 APEXMALHAR-2365-Creation of generic log parser