[jira] [Created] (APEXMALHAR-2367) CassandraTransactionalStore should handle erroneous records

2016-12-05 Thread Priyanka Gugale (JIRA)
Priyanka Gugale created APEXMALHAR-2367:
---

 Summary: CassandraTransactionalStore should handle erroneous records
 Key: APEXMALHAR-2367
 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2367
 Project: Apache Apex Malhar
 Issue Type: Improvement
 Reporter: Priyanka Gugale
 Priority: Minor


If batch command execution in CassandraTransactionalStore fails, there should 
be a way to configure the number of retry attempts; once those are exhausted, 
the statement or batch should be dropped or emitted to an error port without 
blocking application execution.
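
A minimal sketch of the requested behavior, written in plain Java rather than against the actual CassandraTransactionalStore code. The class name RetryingBatchExecutor, the BatchAction interface, the maxRetries setting, and the errorSink callback (standing in for an error port) are all hypothetical names chosen for illustration.

```java
import java.util.function.Consumer;

/** Hypothetical sketch: bounded retries, then divert the batch instead of blocking. */
final class RetryingBatchExecutor<B> {

  /** Hypothetical stand-in for whatever executes the batch against the store. */
  interface BatchAction<B> {
    void execute(B batch) throws Exception;
  }

  private final int maxRetries;        // assumed to be a configurable property
  private final Consumer<B> errorSink; // stands in for an error output port

  RetryingBatchExecutor(int maxRetries, Consumer<B> errorSink) {
    this.maxRetries = maxRetries;
    this.errorSink = errorSink;
  }

  /** Returns true if the batch succeeded, false if it was handed to the error sink. */
  boolean run(B batch, BatchAction<B> action) {
    // One initial attempt plus up to maxRetries retries.
    for (int attempt = 0; attempt <= maxRetries; attempt++) {
      try {
        action.execute(batch);
        return true;
      } catch (Exception e) {
        // Swallow the failure and try again until attempts are exhausted.
      }
    }
    // All attempts failed: divert the batch so processing can continue.
    errorSink.accept(batch);
    return false;
  }
}
```

Passing a no-op errorSink would correspond to dropping the batch; passing a callback that emits on a port would correspond to the error-port option.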



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [GitHub] apex-core pull request #426: APEXCORE-558 Change highlight color to red and ...

2016-12-05 Thread Vlad Rozov
I filed https://issues.apache.org/jira/browse/INFRA-13027 for the Apache 
Jenkins build failures.


Thank you,

Vlad

On 12/5/16 13:57, Sanjay Pujare wrote:

The build triggered by this PR failed with this weird error (see 
https://builds.apache.org/job/Apex_Core_PR/204/console)

git clean -fdx # timeout=10
Parsing POMs
Modules changed, recalculating dependency graph
Build timed out (after 20 minutes). Marking the build as failed.
Build was aborted
Putting comment on the pull request
Finished: FAILURE

Does anyone know what the problem is? How does one fix it?

On 12/2/16, 2:19 PM, "sanjaypujare" wrote:

 GitHub user sanjaypujare opened a pull request:

 https://github.com/apache/apex-core/pull/426

 APEXCORE-558 Change highlight color to red and implement quit command

 @amberarrow could you review and merge pls?

 You can merge this pull request into a Git repository by running:

 $ git pull https://github.com/sanjaypujare/apex-core APEXCORE-558.sanjay

 Alternatively you can review and apply these changes as the patch at:

 https://github.com/apache/apex-core/pull/426.patch

 To close this pull request, make a commit to your master/trunk branch
 with (at least) the following in the commit message:

 This closes #426

 commit 2f3bf30e34fe24367663530133e20b3626784878
 Author: Sanjay Pujare
 Date:   2016-12-02T22:17:24Z

 APEXCORE-558 Change highlight color to red and implement quit command

 ---
 If your project is set up for it, you can reply to this email and have your
 reply appear on GitHub as well. If your project does not have this feature
 enabled and wishes so, or if the feature is enabled but not working, please
 contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
 with INFRA.
 ---

 







[jira] [Commented] (APEXCORE-586) Enhancement to persist logical and physical plan snapshots in HDFS

2016-12-05 Thread Thomas Weise (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXCORE-586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15723970#comment-15723970
 ] 

Thomas Weise commented on APEXCORE-586:
---

There are already points at which the physical plan is checkpointed, and the 
same points should be used here.

There is no need to do anything separate for the logical plan; it is part of 
the physical plan.

Also, please do not add any more code to StreamingContainerManager; see how you 
can modularize this. 

> Enhancement to persist logical and physical plan snapshots in HDFS
> --
>
> Key: APEXCORE-586
> URL: https://issues.apache.org/jira/browse/APEXCORE-586
> Project: Apache Apex Core
>  Issue Type: Improvement
>Reporter: Sanjay M Pujare
>Assignee: Sanjay M Pujare
>
> Pls refer to the discussion on dev@apex 
> http://apache-apex-developers-list.78491.x6.nabble.com/Proposing-a-new-feature-to-persist-logical-and-physical-plan-snapshots-in-HDFS-td11592.html





[jira] [Commented] (APEXCORE-586) Enhancement to persist logical and physical plan snapshots in HDFS

2016-12-05 Thread Sanjay M Pujare (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXCORE-586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15723879#comment-15723879
 ] 

Sanjay M Pujare commented on APEXCORE-586:
--

The following is the implementation plan. We will start the implementation 
soon, so please send your feedback as soon as possible.

We will refactor code to move the logic in

com.datatorrent.stram.webapp.StramWebServices.getLogicalPlan(String)

and

com.datatorrent.stram.webapp.StramWebServices.getPhysicalPlan()

to the class

com.datatorrent.stram.StreamingContainerManager

So these two methods will return a JSONObject representation of the logical 
and physical plan, respectively.

Two new methods will be added to StreamingContainerManager:

writePhysicalPlan()
writeLogicalPlan()

The first method will create an HDFS file named physicalPlan_NNN_MMM.json 
under the current app's app directory, where NNN is the current app attempt 
id and MMM is a running count (maintained in the StreamingContainerManager 
object) that is incremented on every call to the method. The method will call 
getPhysicalPlan() to get the JSON object and write it out to the HDFS file. 
writeLogicalPlan() will similarly write the logical plan to the file 
logicalPlan_NNN_MMM.json. A separate MMM count is kept for logical plans.

The method 
com.datatorrent.stram.plan.physical.PhysicalPlan.onStatusUpdate(PTOperator) 
will be modified to accept the StreamingContainerManager object as a 
parameter. Inside this method, after redoPartition() is called, 
writePhysicalPlan() will be called on the StreamingContainerManager object to 
write a new physicalPlan_NNN_MMM.json file.

Similarly, at the end of 
com.datatorrent.stram.StreamingContainerManager.LogicalPlanChangeRunnable.call(), 
writeLogicalPlan() will be called to write a new logicalPlan_NNN_MMM.json file.

Unit tests will be written to cover both existing and new functionality to 
minimize the possibility of regressions.
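
The naming scheme described above could be sketched as follows. This is only an illustration under stated assumptions: PlanSnapshotWriter is a hypothetical helper class, local files via java.nio stand in for HDFS, the plan is passed in as an already serialized JSON string, and the running counts are assumed to start at 1.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical helper; local files stand in for HDFS in this sketch. */
final class PlanSnapshotWriter {
  private final Path appDir;   // stands in for the app's directory
  private final int attemptId; // NNN in the file name
  // Separate running counts (MMM) for physical and logical plans.
  private final AtomicInteger physicalCount = new AtomicInteger();
  private final AtomicInteger logicalCount = new AtomicInteger();

  PlanSnapshotWriter(Path appDir, int attemptId) {
    this.appDir = appDir;
    this.attemptId = attemptId;
  }

  Path writePhysicalPlan(String json) throws IOException {
    return write("physicalPlan", physicalCount.incrementAndGet(), json);
  }

  Path writeLogicalPlan(String json) throws IOException {
    return write("logicalPlan", logicalCount.incrementAndGet(), json);
  }

  // Builds <prefix>_NNN_MMM.json under the app directory and writes the JSON text.
  private Path write(String prefix, int count, String json) throws IOException {
    Path file = appDir.resolve(prefix + "_" + attemptId + "_" + count + ".json");
    return Files.write(file, json.getBytes(StandardCharsets.UTF_8));
  }
}
```

Keeping the counters and file naming in a small class of their own is one possible way to avoid growing StreamingContainerManager; it is a design choice of this sketch, not part of the plan above.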


> Enhancement to persist logical and physical plan snapshots in HDFS
> --
>
> Key: APEXCORE-586
> URL: https://issues.apache.org/jira/browse/APEXCORE-586
> Project: Apache Apex Core
>  Issue Type: Improvement
>Reporter: Sanjay M Pujare
>Assignee: Sanjay M Pujare
>
> Pls refer to the discussion on dev@apex 
> http://apache-apex-developers-list.78491.x6.nabble.com/Proposing-a-new-feature-to-persist-logical-and-physical-plan-snapshots-in-HDFS-td11592.html





[jira] [Commented] (APEXCORE-405) Provide an API to launch DAG on the cluster

2016-12-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXCORE-405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15723850#comment-15723850
 ] 

ASF GitHub Bot commented on APEXCORE-405:
-

Github user asfgit closed the pull request at:

https://github.com/apache/apex-core/pull/429


> Provide an API to launch DAG on the cluster
> ---
>
> Key: APEXCORE-405
> URL: https://issues.apache.org/jira/browse/APEXCORE-405
> Project: Apache Apex Core
>  Issue Type: Improvement
>Reporter: Pramod Immaneni
>Assignee: Pramod Immaneni
> Fix For: 3.5.0
>
>
> Today an API exists to launch a DAG in local mode, but no such API is 
> available to launch the app on the cluster; only a CLI tool is available. 
> Provide an API to do this. 





[GitHub] apex-core pull request #429: APEXCORE-405 Make getDAG/prepareDAG available t...

2016-12-05 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/apex-core/pull/429




[jira] [Updated] (APEXCORE-586) Enhancement to persist logical and physical plan snapshots in HDFS

2016-12-05 Thread Sanjay M Pujare (JIRA)

 [ 
https://issues.apache.org/jira/browse/APEXCORE-586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sanjay M Pujare updated APEXCORE-586:
-
Description: 
Pls refer to the discussion on dev@apex 

http://apache-apex-developers-list.78491.x6.nabble.com/Proposing-a-new-feature-to-persist-logical-and-physical-plan-snapshots-in-HDFS-td11592.html

  was:
Pls refer to the discussion on dev@apex 

http://mail-archives.apache.org/mod_mbox/apex-dev/201611.mbox/browser


> Enhancement to persist logical and physical plan snapshots in HDFS
> --
>
> Key: APEXCORE-586
> URL: https://issues.apache.org/jira/browse/APEXCORE-586
> Project: Apache Apex Core
>  Issue Type: Improvement
>Reporter: Sanjay M Pujare
>Assignee: Sanjay M Pujare
>
> Pls refer to the discussion on dev@apex 
> http://apache-apex-developers-list.78491.x6.nabble.com/Proposing-a-new-feature-to-persist-logical-and-physical-plan-snapshots-in-HDFS-td11592.html





Re: [GitHub] apex-core pull request #426: APEXCORE-558 Change highlight color to red and ...

2016-12-05 Thread Sanjay Pujare
The build triggered by this PR failed with this weird error (see 
https://builds.apache.org/job/Apex_Core_PR/204/console)

> git clean -fdx # timeout=10
Parsing POMs
Modules changed, recalculating dependency graph
Build timed out (after 20 minutes). Marking the build as failed.
Build was aborted
Putting comment on the pull request
Finished: FAILURE

Does anyone know what the problem is? How does one fix it?

On 12/2/16, 2:19 PM, "sanjaypujare" wrote:

    GitHub user sanjaypujare opened a pull request:

    https://github.com/apache/apex-core/pull/426

    APEXCORE-558 Change highlight color to red and implement quit command

    @amberarrow could you review and merge pls?

    You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/sanjaypujare/apex-core APEXCORE-558.sanjay

    Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/apex-core/pull/426.patch

    To close this pull request, make a commit to your master/trunk branch
    with (at least) the following in the commit message:

    This closes #426

    commit 2f3bf30e34fe24367663530133e20b3626784878
    Author: Sanjay Pujare
    Date:   2016-12-02T22:17:24Z

    APEXCORE-558 Change highlight color to red and implement quit command




[jira] [Commented] (APEXMALHAR-2366) Apply BloomFilter to Bucket

2016-12-05 Thread Munagala V. Ramanath (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXMALHAR-2366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15723172#comment-15723172
 ] 

Munagala V. Ramanath commented on APEXMALHAR-2366:
--

As the number of elements increases, the probability of false positives 
increases; are we planning to recreate the filter with a larger bit vector, or 
is this done automatically by the Hadoop implementation?
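
For reference, the textbook approximation of the false positive rate of a Bloom filter with m bits, k hash functions, and n inserted elements is (1 - e^(-kn/m))^k. The sketch below only evaluates that formula; BloomFilterMath and its method are hypothetical names and are not part of Guava, Hadoop, or Malhar.

```java
/** Hypothetical helper that evaluates the standard Bloom filter approximation. */
final class BloomFilterMath {
  private BloomFilterMath() {
  }

  /**
   * Approximate false positive rate (1 - e^(-k*n/m))^k for a filter with
   * m = bits, k = hashes, and n = elements already inserted.
   */
  static double falsePositiveRate(long bits, int hashes, long elements) {
    double exponent = -(double) hashes * elements / bits;
    return Math.pow(1.0 - Math.exp(exponent), hashes);
  }
}
```

With the bit vector size held fixed, the value returned grows toward 1 as the element count grows, which is the effect the comment above refers to.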

> Apply BloomFilter to Bucket
> ---
>
> Key: APEXMALHAR-2366
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2366
> Project: Apache Apex Malhar
>  Issue Type: Improvement
>Reporter: bright chen
>Assignee: bright chen
>   Original Estimate: 192h
>  Remaining Estimate: 192h
>
> The bucket get() checks the cache first and, if the entry is not in the 
> cache, checks the stored files. Checking the files is a pretty heavy 
> operation due to file seeks.
> The chance of checking the files is very high if the key range is large.
> Suggest applying a BloomFilter to the bucket to reduce the chance of reading 
> from files.
> If the buckets are managed by ManagedStateImpl, the number of entries per 
> bucket could become very large and the BloomFilter may not be useful after a 
> while. But if the buckets are managed by ManagedTimeUnifiedStateImpl, each 
> bucket keeps a certain number of entries and the BloomFilter would be very 
> useful.
> For implementation:
> Guava already has a BloomFilter, and its interface is pretty simple and fits 
> our case. But Guava 11 is not compatible with Guava 14 (Guava 11 uses Sink 
> while Guava 14 uses PrimitiveSink).





[jira] [Commented] (APEXMALHAR-2366) Apply BloomFilter to Bucket

2016-12-05 Thread bright chen (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXMALHAR-2366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15723125#comment-15723125
 ] 

bright chen commented on APEXMALHAR-2366:
-

The BloomFilter allocates its bit vector in memory, and this memory should be 
released when the Bucket is purged.

> Apply BloomFilter to Bucket
> ---
>
> Key: APEXMALHAR-2366
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2366
> Project: Apache Apex Malhar
>  Issue Type: Improvement
>Reporter: bright chen
>Assignee: bright chen
>   Original Estimate: 192h
>  Remaining Estimate: 192h
>
> The bucket get() checks the cache first and, if the entry is not in the 
> cache, checks the stored files. Checking the files is a pretty heavy 
> operation due to file seeks.
> The chance of checking the files is very high if the key range is large.
> Suggest applying a BloomFilter to the bucket to reduce the chance of reading 
> from files.
> If the buckets are managed by ManagedStateImpl, the number of entries per 
> bucket could become very large and the BloomFilter may not be useful after a 
> while. But if the buckets are managed by ManagedTimeUnifiedStateImpl, each 
> bucket keeps a certain number of entries and the BloomFilter would be very 
> useful.
> For implementation:
> Guava already has a BloomFilter, and its interface is pretty simple and fits 
> our case. But Guava 11 is not compatible with Guava 14 (Guava 11 uses Sink 
> while Guava 14 uses PrimitiveSink).





[jira] [Commented] (APEXMALHAR-2366) Apply BloomFilter to Bucket

2016-12-05 Thread bright chen (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXMALHAR-2366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15723065#comment-15723065
 ] 

bright chen commented on APEXMALHAR-2366:
-

It would be better to use the Guava BloomFilter, as it is type safe and can 
support different key types. But due to the compatibility problem mentioned in 
the previous comment, I'll integrate the Hadoop BloomFilter, in which the key 
type is 'Key'.

> Apply BloomFilter to Bucket
> ---
>
> Key: APEXMALHAR-2366
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2366
> Project: Apache Apex Malhar
>  Issue Type: Improvement
>Reporter: bright chen
>Assignee: bright chen
>   Original Estimate: 192h
>  Remaining Estimate: 192h
>
> The bucket get() checks the cache first and, if the entry is not in the 
> cache, checks the stored files. Checking the files is a pretty heavy 
> operation due to file seeks.
> The chance of checking the files is very high if the key range is large.
> Suggest applying a BloomFilter to the bucket to reduce the chance of reading 
> from files.
> If the buckets are managed by ManagedStateImpl, the number of entries per 
> bucket could become very large and the BloomFilter may not be useful after a 
> while. But if the buckets are managed by ManagedTimeUnifiedStateImpl, each 
> bucket keeps a certain number of entries and the BloomFilter would be very 
> useful.
> For implementation:
> Guava already has a BloomFilter, and its interface is pretty simple and fits 
> our case. But Guava 11 is not compatible with Guava 14 (Guava 11 uses Sink 
> while Guava 14 uses PrimitiveSink).





[jira] [Updated] (APEXMALHAR-2365) LogParser - Operator to parse byte array using log format and emit a POJO

2016-12-05 Thread Shraddha (JIRA)

 [ 
https://issues.apache.org/jira/browse/APEXMALHAR-2365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shraddha updated APEXMALHAR-2365:
-
Description: 
A generic parser that takes the logFileFormat (a schema with fields and 
regexes) as a parameter and parses the log.
Input: Byte Array
Output: POJO
Parameters: 
-logFileFormat - schema containing fields and regex
-encoding - for converting tuple to String
-tuple class
 A POJO schema configuration is required, and the order of the schema must 
match the log.

  was:
A generic parser that takes the logFileFormat (a schema with fields and 
regexes) as a parameter and parses the log.
Input: Byte Array
Output: POJO
Parameters: 
-logFileFormat - schema containing fields and regex
-tuple class
 A POJO schema configuration is required, and the order of the schema must 
match the log.


> LogParser - Operator to parse byte array using log format and emit a POJO
> -
>
> Key: APEXMALHAR-2365
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2365
> Project: Apache Apex Malhar
>  Issue Type: Improvement
>Reporter: Shraddha
>Assignee: Shraddha
>
> A generic parser that takes the logFileFormat (a schema with fields and 
> regexes) as a parameter and parses the log.
> Input: Byte Array
> Output: POJO
> Parameters: 
> -logFileFormat - schema containing fields and regex
> -encoding - for converting tuple to String
> -tuple class
>  A POJO schema configuration is required, and the order of the schema must 
> match the log.
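
The description could be read as the following sketch, which is an assumption about the approach rather than the code in the pull request. RegexLogParser is a hypothetical class, the schema is modeled as an ordered map from field name to regex, fields are assumed to be separated by whitespace, and a Map stands in for the emitted POJO.

```java
import java.nio.charset.Charset;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of a schema-driven log line parser. */
final class RegexLogParser {
  private final String[] fields;
  private final Pattern line;
  private final Charset encoding;

  /**
   * schema maps field name to regex, in the order the fields appear in the log.
   * Assumption: the per-field regexes contain no capturing groups of their own.
   */
  RegexLogParser(LinkedHashMap<String, String> schema, Charset encoding) {
    this.fields = schema.keySet().toArray(new String[0]);
    StringBuilder sb = new StringBuilder();
    for (String regex : schema.values()) {
      if (sb.length() > 0) {
        sb.append("\\s+"); // assumed whitespace separator between fields
      }
      sb.append('(').append(regex).append(')');
    }
    this.line = Pattern.compile(sb.toString());
    this.encoding = encoding;
  }

  /** Decodes the tuple and returns field values by name, or null if the line does not match. */
  Map<String, String> parse(byte[] tuple) {
    Matcher m = line.matcher(new String(tuple, encoding));
    if (!m.matches()) {
      return null;
    }
    Map<String, String> out = new LinkedHashMap<>();
    for (int i = 0; i < fields.length; i++) {
      out.put(fields[i], m.group(i + 1));
    }
    return out;
  }
}
```

Because the combined pattern is built in schema order, a schema whose field order differs from the log would fail to match, which is why the ordering requirement in the description matters.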





[GitHub] apex-malhar pull request #520: APEXMALHAR-2365-Creation of generic log parse...

2016-12-05 Thread jogshraddha
GitHub user jogshraddha opened a pull request:

https://github.com/apache/apex-malhar/pull/520

APEXMALHAR-2365-Creation of generic log parser

Jira : (https://issues.apache.org/jira/browse/APEXMALHAR-2365)
APEXMALHAR-2365-Creation of generic log parser



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jogshraddha/apex-malhar APEXMALHAR-2365-LogParser

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/apex-malhar/pull/520.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #520


commit c61835fee785c69c9fd04a98bec9bcc193c10d5c
Author: jogshraddha 
Date:   2016-12-02T08:25:53Z

Creation of generic log parser






[jira] [Commented] (APEXMALHAR-2365) LogParser - Operator to parse byte array using log format and emit a POJO

2016-12-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXMALHAR-2365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15721871#comment-15721871
 ] 

ASF GitHub Bot commented on APEXMALHAR-2365:


GitHub user jogshraddha opened a pull request:

https://github.com/apache/apex-malhar/pull/520

APEXMALHAR-2365-Creation of generic log parser

Jira : (https://issues.apache.org/jira/browse/APEXMALHAR-2365)
APEXMALHAR-2365-Creation of generic log parser



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jogshraddha/apex-malhar APEXMALHAR-2365-LogParser

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/apex-malhar/pull/520.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #520


commit c61835fee785c69c9fd04a98bec9bcc193c10d5c
Author: jogshraddha 
Date:   2016-12-02T08:25:53Z

Creation of generic log parser




> LogParser - Operator to parse byte array using log format and emit a POJO
> -
>
> Key: APEXMALHAR-2365
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2365
> Project: Apache Apex Malhar
>  Issue Type: Improvement
>Reporter: Shraddha
>Assignee: Shraddha
>
> A generic parser that takes the logFileFormat (a schema with fields and 
> regexes) as a parameter and parses the log.
> Input: Byte Array
> Output: POJO
> Parameters: 
> -logFileFormat - schema containing fields and regex
> -encoding - for converting tuple to String
> -tuple class
>  A POJO schema configuration is required, and the order of the schema must 
> match the log.


