[jira] [Commented] (APEXMALHAR-2284) POJOInnerJoinOperatorTest fails in Travis CI

2016-10-23 Thread Chaitanya (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXMALHAR-2284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15601145#comment-15601145
 ] 

Chaitanya commented on APEXMALHAR-2284:
---

I looked into PR #434. It uses ManagedTimeUnifiedStateImpl, and its spillable 
data structure implementations are built over BucketedState. 

InnerJoinOperator needs the ManagedTimeStateImpl store, which has both a key 
bucket and a time bucket, because we need the put(keyBucket, timeBucket, Key, 
Value) call. Here we need TimeSlicedBucketedState, where the bucketed data is 
further divided into time buckets.
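To make the put(keyBucket, timeBucket, Key, Value) shape concrete, here is a toy sketch of a time-sliced bucketed store. All names below are hypothetical and only illustrate the two-level bucketing; this is not the actual TimeSlicedBucketedState API:

```java
import java.util.HashMap;
import java.util.Map;

// Toy illustration of a time-sliced bucketed store: data is first
// partitioned by key bucket, then subdivided into time buckets.
// Names are hypothetical, for illustration only.
class ToyTimeSlicedStore<K, V> {
    // keyBucket -> (timeBucket -> (key -> value))
    private final Map<Long, Map<Long, Map<K, V>>> buckets = new HashMap<>();

    public void put(long keyBucket, long timeBucket, K key, V value) {
        buckets.computeIfAbsent(keyBucket, b -> new HashMap<>())
               .computeIfAbsent(timeBucket, t -> new HashMap<>())
               .put(key, value);
    }

    public V get(long keyBucket, long timeBucket, K key) {
        Map<Long, Map<K, V>> byTime = buckets.get(keyBucket);
        if (byTime == null) {
            return null;
        }
        Map<K, V> byKey = byTime.get(timeBucket);
        return byKey == null ? null : byKey.get(key);
    }
}
```

The point of the extra time dimension is that a whole time bucket can be expired or purged at once without scanning every key.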

I am planning to remove ManagedTimeStateMultiValue and implement a new 
spillable data structure that uses TimeSlicedBucketedState as the store and is 
backed by ManagedTimeStateImpl.

Does this make sense? Can we go ahead with this plan? 
[~csingh] [~thw] [~bhupesh] Please share your suggestions.

> POJOInnerJoinOperatorTest fails in Travis CI
> 
>
> Key: APEXMALHAR-2284
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2284
> Project: Apache Apex Malhar
>  Issue Type: Bug
>Reporter: Thomas Weise
>Assignee: Chaitanya
>Priority: Blocker
> Fix For: 3.6.0
>
>
> https://s3.amazonaws.com/archive.travis-ci.org/jobs/166322754/log.txt
> {code}
> Failed tests: 
>   POJOInnerJoinOperatorTest.testEmitMultipleTuplesFromStream2:337 Number of 
> tuple emitted  expected:<2> but was:<4>
>   POJOInnerJoinOperatorTest.testInnerJoinOperator:184 Number of tuple emitted 
>  expected:<1> but was:<2>
>   POJOInnerJoinOperatorTest.testMultipleValues:236 Number of tuple emitted  
> expected:<2> but was:<3>
>   POJOInnerJoinOperatorTest.testUpdateStream1Values:292 Number of tuple 
> emitted  expected:<1> but was:<2>
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (APEXCORE-564) We should be able to remove loggers level settings

2016-10-23 Thread Priyanka Gugale (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXCORE-564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15601132#comment-15601132
 ] 

Priyanka Gugale commented on APEXCORE-564:
--

Our initial thought was that we would set it to NULL so that either the parent 
or root logger's level would be picked up. As you explained, patterns are 
cumulative, so taking the example of com.datatorrent.engine.*, say the 
following levels were set earlier:

com.datatorrent.engine.*=INFO
com.datatorrent.engine.*=DEBUG

Now, to remove it, as LoggerUtil doesn't have a way to remove these patterns, 
we thought of setting it to null, so the entries look like:
com.datatorrent.engine.*=INFO
com.datatorrent.engine.*=DEBUG
com.datatorrent.engine.*=null

After doing this, the logger level for com.datatorrent.engine.* will be picked 
up either from a parent pattern, say com.datatorrent.*, or from the root 
logger. That was our understanding. Also, after this, if someone wishes to set 
a new level on the logger, that is also feasible.
Please let us know if this approach looks good.
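As a sanity check, the JDK's own logging API shows the same "null means inherit from the parent" semantics. This uses java.util.logging rather than the log4j-backed LoggerUtil, so it is only an analogy, not the Apex code path:

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class LoggerNullLevelDemo {
    public static void main(String[] args) {
        Logger parent = Logger.getLogger("com.datatorrent");
        Logger child = Logger.getLogger("com.datatorrent.engine");
        parent.setLevel(Level.INFO);
        child.setLevel(Level.FINE);   // an explicit level set on the child

        // "Removing" the setting: a null level makes the effective level
        // come from the nearest ancestor with a non-null level.
        child.setLevel(null);
        System.out.println(child.getLevel());             // null
        System.out.println(child.isLoggable(Level.INFO)); // true, from parent
        System.out.println(child.isLoggable(Level.FINE)); // false
    }
}
```

A new level can still be set on the child afterwards, which matches the "set some new level" case mentioned above.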


> We should be able to remove loggers level settings
> --
>
> Key: APEXCORE-564
> URL: https://issues.apache.org/jira/browse/APEXCORE-564
> Project: Apache Apex Core
>  Issue Type: New Feature
>Reporter: Priyanka Gugale
>Assignee: Priyanka Gugale
>Priority: Minor
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> StramWebServices has an API to set logger levels (setLoggersLevel), but there 
> is no way to remove already-set logger levels. There should be an API to 
> remove a logger level that was set earlier.





[jira] [Commented] (APEXMALHAR-2309) TimeBasedDedupOperator marks new tuples as duplicates if expired tuples exist

2016-10-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXMALHAR-2309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15601113#comment-15601113
 ] 

ASF GitHub Bot commented on APEXMALHAR-2309:


GitHub user francisf reopened a pull request:

https://github.com/apache/apex-malhar/pull/464

APEXMALHAR-2309 Comparing times for newer tuples with existing key

@bhupeshchawda please review.
Marking a tuple as unique if the time found for the key in asyncEvents is < 
current tuple's time

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/francisf/apex-malhar 
APEXMALHAR-2309_Deduper_valid_as_duplicates

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/apex-malhar/pull/464.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #464


commit c56e5c36c46f90fb0fee7cb6558bf860dbf6e181
Author: francisf 
Date:   2016-10-21T13:08:39Z

APEXMALHAR-2309 Comparing times for newer tuples with existing key




> TimeBasedDedupOperator marks new tuples as duplicates if expired tuples exist
> -
>
> Key: APEXMALHAR-2309
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2309
> Project: Apache Apex Malhar
>  Issue Type: Bug
>Affects Versions: 3.5.0
>Reporter: Francis Fernandes
>Assignee: Francis Fernandes
>
> The deduper marks valid tuples outside the expiry window as duplicates. 
> Consider the following configuration (number of buckets = 1 )
> {code}
> <property>
>   <name>dt.application.DedupTestApp.operator.Deduper.prop.expireBefore</name>
>   <value>10</value>
> </property>
> <property>
>   <name>dt.application.DedupTestApp.operator.Deduper.prop.bucketSpan</name>
>   <value>10</value>
> </property>
> {code}
> The data piped in is: 
> {code}
> "10",1474614305000,"Test"
> "11",1474614315000,"Test"
> "10",1474614325000,"Test"
> {code}
> The 3rd tuple is valid since it is outside of the expiry window, but it is 
> marked as a duplicate because the first tuple, although expired, is 
> still present in Bucket.flash.
> The issue occurs when the expiry duration is less than the checkpointing 
> duration.
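The fix proposed in the PR (treating a tuple as unique when the stored time for its key is older than the incoming tuple's time) can be sketched as a toy check. The class and method names below are hypothetical, not the actual Malhar deduper API:

```java
import java.util.HashMap;
import java.util.Map;

// Toy dedup check: a previously seen key only marks the incoming tuple
// as a duplicate if the stored time is >= the tuple's time; an older
// (expired) entry must not mask a newer, valid tuple.
class ToyDeduper {
    private final Map<String, Long> lastSeenTime = new HashMap<>();

    /** Returns true if the tuple is unique and should be emitted. */
    public boolean isUnique(String key, long time) {
        Long previous = lastSeenTime.get(key);
        if (previous == null || previous < time) {
            lastSeenTime.put(key, time);
            return true;  // never seen, or only an older/expired entry exists
        }
        return false;     // genuine duplicate at the same or a later time
    }
}
```

With the sample data above, the third tuple for key "10" arrives with a strictly larger timestamp, so this check classifies it as unique.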





[jira] [Commented] (APEXMALHAR-2309) TimeBasedDedupOperator marks new tuples as duplicates if expired tuples exist

2016-10-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXMALHAR-2309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15601112#comment-15601112
 ] 

ASF GitHub Bot commented on APEXMALHAR-2309:


Github user francisf closed the pull request at:

https://github.com/apache/apex-malhar/pull/464


> TimeBasedDedupOperator marks new tuples as duplicates if expired tuples exist
> -
>
> Key: APEXMALHAR-2309
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2309
> Project: Apache Apex Malhar
>  Issue Type: Bug
>Affects Versions: 3.5.0
>Reporter: Francis Fernandes
>Assignee: Francis Fernandes
>
> The deduper marks valid tuples outside the expiry window as duplicates. 
> Consider the following configuration (number of buckets = 1 )
> {code}
> <property>
>   <name>dt.application.DedupTestApp.operator.Deduper.prop.expireBefore</name>
>   <value>10</value>
> </property>
> <property>
>   <name>dt.application.DedupTestApp.operator.Deduper.prop.bucketSpan</name>
>   <value>10</value>
> </property>
> {code}
> The data piped in is: 
> {code}
> "10",1474614305000,"Test"
> "11",1474614315000,"Test"
> "10",1474614325000,"Test"
> {code}
> The 3rd tuple is valid since it is outside of the expiry window, but it is 
> marked as a duplicate because the first tuple, although expired, is 
> still present in Bucket.flash.
> The issue occurs when the expiry duration is less than the checkpointing 
> duration.





[GitHub] apex-malhar pull request #464: APEXMALHAR-2309 Comparing times for newer tup...

2016-10-23 Thread francisf
GitHub user francisf reopened a pull request:

https://github.com/apache/apex-malhar/pull/464

APEXMALHAR-2309 Comparing times for newer tuples with existing key

@bhupeshchawda please review.
Marking a tuple as unique if the time found for the key in asyncEvents is < 
current tuple's time

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/francisf/apex-malhar 
APEXMALHAR-2309_Deduper_valid_as_duplicates

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/apex-malhar/pull/464.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #464


commit c56e5c36c46f90fb0fee7cb6558bf860dbf6e181
Author: francisf 
Date:   2016-10-21T13:08:39Z

APEXMALHAR-2309 Comparing times for newer tuples with existing key




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] apex-malhar pull request #464: APEXMALHAR-2309 Comparing times for newer tup...

2016-10-23 Thread francisf
Github user francisf closed the pull request at:

https://github.com/apache/apex-malhar/pull/464




[jira] [Commented] (APEXMALHAR-2302) Exposing few properties of FSSplitter and BlockReader operators to FSRecordReaderModule to tune Application

2016-10-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXMALHAR-2302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15601061#comment-15601061
 ] 

ASF GitHub Bot commented on APEXMALHAR-2302:


GitHub user deepak-narkhede reopened a pull request:

https://github.com/apache/apex-malhar/pull/457

APEXMALHAR-2302 Exposing few properties of FSSplitter and BlockReader 
operators to FSRecordReaderModule

This change adds the blockSize property from FileSplitter to 
FSRecordReaderModule.
Tested with the RecordReader application.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/deepak-narkhede/apex-malhar APEXMALHAR-2302

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/apex-malhar/pull/457.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #457


commit 7d07dd32c95546a6c4570453163f9ff47b8a7893
Author: deepak-narkhede 
Date:   2016-10-21T12:27:01Z

APEXMALHAR-2302 Exposing a few properties of the FSSplitter and BlockReader 
operators to FSRecordReaderModule to tune the application.
This change includes:
1) Expose the blockSize property of the FileSplitter operator.
2) Expose minReaders and maxReaders for dynamic partitioning of the 
BlockReader operator.
3) Deprecate readersCount from FSRecordReaderModule.




> Exposing few properties of FSSplitter and BlockReader operators to 
> FSRecordReaderModule to tune Application
> ---
>
> Key: APEXMALHAR-2302
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2302
> Project: Apache Apex Malhar
>  Issue Type: Improvement
>Reporter: Deepak Narkhede
>Assignee: Deepak Narkhede
>Priority: Minor
>
> Exposing the blockSize property of the FSSplitter operator to 
> FSRecordReaderModule. This will help end users tune the blockSize value 
> based on application needs.
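Once exposed, such a property could presumably be set like any other operator/module property in the application configuration. The property name below is hypothetical, purely for illustration:

```xml
<property>
  <name>dt.operator.recordReader.prop.blockSize</name>
  <value>1048576</value>
</property>
```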





[GitHub] apex-malhar pull request #457: APEXMALHAR-2302 Exposing few properties of FS...

2016-10-23 Thread deepak-narkhede
GitHub user deepak-narkhede reopened a pull request:

https://github.com/apache/apex-malhar/pull/457

APEXMALHAR-2302 Exposing few properties of FSSplitter and BlockReader 
operators to FSRecordReaderModule

This change adds the blockSize property from FileSplitter to 
FSRecordReaderModule.
Tested with the RecordReader application.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/deepak-narkhede/apex-malhar APEXMALHAR-2302

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/apex-malhar/pull/457.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #457


commit 7d07dd32c95546a6c4570453163f9ff47b8a7893
Author: deepak-narkhede 
Date:   2016-10-21T12:27:01Z

APEXMALHAR-2302 Exposing a few properties of the FSSplitter and BlockReader 
operators to FSRecordReaderModule to tune the application.
This change includes:
1) Expose the blockSize property of the FileSplitter operator.
2) Expose minReaders and maxReaders for dynamic partitioning of the 
BlockReader operator.
3) Deprecate readersCount from FSRecordReaderModule.






[New Feature] Apex SQL Support - Calcite Integration

2016-10-23 Thread Chinmay Kolhatkar
Dear Community,

Thank you for all the contributions that got the first PR merged for
adding SQL support in Apex.

The support for SQL in Apex is present here:
https://github.com/apache/apex-malhar/tree/master/sql

As part of the PR review and other discussions, a number of follow-up items
came up, and I've created JIRAs for them. Here is the complete list:
https://issues.apache.org/jira/browse/APEXMALHAR-2311?jql=project%20%3D%20APEXMALHAR%20AND%20component%20%3D%20sql

We're looking for folks who are interested in contributing and
providing feedback on SQL support in Apex.

So if you have time and interest, please let us know.

Thanks,
Chinmay.


[jira] [Commented] (APEXMALHAR-2312) NullPointerException in FileSplitterInput only if the file path is specified for attribute instead of directory path

2016-10-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXMALHAR-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15600896#comment-15600896
 ] 

ASF GitHub Bot commented on APEXMALHAR-2312:


Github user deepak-narkhede closed the pull request at:

https://github.com/apache/apex-malhar/pull/463


> NullPointerException in FileSplitterInput only if the file path is specified 
> for attribute  instead of directory path
> 
>
> Key: APEXMALHAR-2312
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2312
> Project: Apache Apex Malhar
>  Issue Type: Bug
>Reporter: Deepak Narkhede
>Assignee: Deepak Narkhede
>Priority: Minor
>
> Problem Statement:
> ==
> A NullPointerException is seen in FileSplitterInput only if a file path is 
> specified for the attribute instead of a directory path.
> Description:
> ===
> 1) TimeBasedDirectoryScanner threads, part of the scan service, scan the 
> directories/files.
> 2) Each thread checks, with the help of the isIterationCompleted() method 
> (via referenceTimes), whether the files scanned in the last iteration have 
> been processed by the operator thread.
> 3) Previously this worked because HashMap (referenceTimes) returns null 
> even when the last scanned directory path is null.
> 4) Recently referenceTimes was changed to ConcurrentHashMap, whose get() 
> method doesn't allow null keys.
> 5) Hence a NullPointerException is seen: if only a file path is provided, 
> the directory path is null, so the key is null.
> Solution:
> 
> Pre-check whether the directory path is null; if only a file path is 
> provided, treat the last iteration as complete.
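The HashMap-vs-ConcurrentHashMap difference described in points 3) and 4) is easy to reproduce with the JDK alone:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class NullKeyDemo {
    public static void main(String[] args) {
        // HashMap tolerates a null key: get(null) simply returns null.
        Map<String, Long> plain = new HashMap<>();
        System.out.println(plain.get(null)); // null

        // ConcurrentHashMap forbids null keys: get(null) throws NPE.
        Map<String, Long> concurrent = new ConcurrentHashMap<>();
        try {
            concurrent.get(null);
        } catch (NullPointerException e) {
            System.out.println("NPE from ConcurrentHashMap.get(null)");
        }
    }
}
```

So code that silently relied on HashMap.get(null) returning null breaks the moment the map is swapped for a ConcurrentHashMap, which is exactly what happened to referenceTimes.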





[jira] [Commented] (APEXMALHAR-2312) NullPointerException in FileSplitterInput only if the file path is specified for attribute instead of directory path

2016-10-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXMALHAR-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15600897#comment-15600897
 ] 

ASF GitHub Bot commented on APEXMALHAR-2312:


GitHub user deepak-narkhede reopened a pull request:

https://github.com/apache/apex-malhar/pull/463

APEXMALHAR-2312 Fix NullPointerException for FileSplitterInput Operat…

Problem Statement:
-
A NullPointerException is seen in FileSplitterInput only if a file path is 
specified for the attribute instead of a directory path.

Description:
---
1) TimeBasedDirectoryScanner threads, part of the scan service, scan the 
directories/files.
2) Each thread checks, with the help of the isIterationCompleted() method 
(via referenceTimes), whether the files scanned in the last iteration have 
been processed by the operator thread.
3) Previously this worked because HashMap (referenceTimes) returns null 
even when the last scanned directory path is null.
4) Recently referenceTimes was changed to ConcurrentHashMap, whose get() 
method doesn't allow null keys.
5) Hence a NullPointerException is seen: if only a file path is provided, 
the directory path is null, so the key is null.

Solution:
---
Pre-check whether the directory path is null; if only a file path is 
provided, treat the last iteration as complete.

Testing logs with fix for files/directories/sub-directories:
-
2016-10-21 11:20:38,382 DEBUG com.datatorrent.lib.io.fs.FileSplitterInput: 
Directory path: /user/deepak/files Sub-Directory or File path: 
/user/deepak/files/CustomerTxnData2
2016-10-21 11:20:38,382 DEBUG com.datatorrent.lib.io.fs.FileSplitterInput: 
Scan started for input /user/deepak/files
2016-10-21 11:20:38,386 DEBUG com.datatorrent.lib.io.fs.FileSplitterInput: 
scan /user/deepak/files
2016-10-21 11:20:33,372 DEBUG com.datatorrent.lib.io.fs.FileSplitterInput: 
discovered /user/deepak/files/CustomerTxnData 1477028632605
2016-10-21 11:20:33,372 DEBUG com.datatorrent.lib.io.fs.FileSplitterInput: 
discovered /user/deepak/files/CustomerTxnData1 1477028642067
2016-10-21 11:20:33,373 DEBUG com.datatorrent.lib.io.fs.FileSplitterInput: 
discovered /user/deepak/files/CustomerTxnData2 1477028645290
2016-10-21 11:20:33,373 DEBUG com.datatorrent.lib.io.fs.FileSplitterInput: 
scan complete 0 3



2016-10-21 11:25:50,697 DEBUG com.datatorrent.lib.io.fs.FileSplitterInput: 
Directory path: null Sub-Directory or File path: 
/user/deepak/files/CustomerTxnData
2016-10-21 11:25:50,697 DEBUG com.datatorrent.lib.io.fs.FileSplitterInput: 
Scan started for input /user/deepak/files/CustomerTxnData
2016-10-21 11:25:50,702 DEBUG com.datatorrent.lib.io.fs.FileSplitterInput: 
scan /user/deepak/files/CustomerTxnData
2016-10-21 11:25:50,704 DEBUG com.datatorrent.lib.io.fs.FileSplitterInput: 
scan complete

   

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/deepak-narkhede/apex-malhar APEXMALHAR-2312

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/apex-malhar/pull/463.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #463


commit 47f29f39393a4e43c8423153d32d12c9622872b5
Author: deepak-narkhede 
Date:   2016-10-21T06:44:34Z

APEXMALHAR-2312 Fix NullPointerException for FileSplitterInput Operator if 
filepath is specified.

Problem Description:
---
1) TimeBasedDirectoryScanner threads, part of the scan service, scan the 
directories/files.
2) Each thread checks, with the help of the isIterationCompleted() method 
(via referenceTimes), whether the files scanned in the last iteration have 
been processed by the operator thread.
3) Previously this worked because HashMap (referenceTimes) returns null 
even when the last scanned directory path is null.
4) Recently referenceTimes was changed to ConcurrentHashMap, whose get() 
method doesn't allow null keys.
5) Hence a NullPointerException is seen: if only a file path is provided, 
the directory path is null, so the key is null.

Solution:
-
Pre-check whether the directory path is null; if only a file path is 
provided, treat the last iteration as complete.




> NullPointerException in FileSplitterInput only if the file path is specified 
> for attribute  instead of directory path
> 
>
>   

[GitHub] apex-malhar pull request #463: APEXMALHAR-2312 Fix NullPointerException for ...

2016-10-23 Thread deepak-narkhede
GitHub user deepak-narkhede reopened a pull request:

https://github.com/apache/apex-malhar/pull/463

APEXMALHAR-2312 Fix NullPointerException for FileSplitterInput Operat…

Problem Statement:
-
A NullPointerException is seen in FileSplitterInput only if a file path is 
specified for the attribute instead of a directory path.

Description:
---
1) TimeBasedDirectoryScanner threads, part of the scan service, scan the 
directories/files.
2) Each thread checks, with the help of the isIterationCompleted() method 
(via referenceTimes), whether the files scanned in the last iteration have 
been processed by the operator thread.
3) Previously this worked because HashMap (referenceTimes) returns null 
even when the last scanned directory path is null.
4) Recently referenceTimes was changed to ConcurrentHashMap, whose get() 
method doesn't allow null keys.
5) Hence a NullPointerException is seen: if only a file path is provided, 
the directory path is null, so the key is null.

Solution:
---
Pre-check whether the directory path is null; if only a file path is 
provided, treat the last iteration as complete.

Testing logs with fix for files/directories/sub-directories:
-
2016-10-21 11:20:38,382 DEBUG com.datatorrent.lib.io.fs.FileSplitterInput: 
Directory path: /user/deepak/files Sub-Directory or File path: 
/user/deepak/files/CustomerTxnData2
2016-10-21 11:20:38,382 DEBUG com.datatorrent.lib.io.fs.FileSplitterInput: 
Scan started for input /user/deepak/files
2016-10-21 11:20:38,386 DEBUG com.datatorrent.lib.io.fs.FileSplitterInput: 
scan /user/deepak/files
2016-10-21 11:20:33,372 DEBUG com.datatorrent.lib.io.fs.FileSplitterInput: 
discovered /user/deepak/files/CustomerTxnData 1477028632605
2016-10-21 11:20:33,372 DEBUG com.datatorrent.lib.io.fs.FileSplitterInput: 
discovered /user/deepak/files/CustomerTxnData1 1477028642067
2016-10-21 11:20:33,373 DEBUG com.datatorrent.lib.io.fs.FileSplitterInput: 
discovered /user/deepak/files/CustomerTxnData2 1477028645290
2016-10-21 11:20:33,373 DEBUG com.datatorrent.lib.io.fs.FileSplitterInput: 
scan complete 0 3



2016-10-21 11:25:50,697 DEBUG com.datatorrent.lib.io.fs.FileSplitterInput: 
Directory path: null Sub-Directory or File path: 
/user/deepak/files/CustomerTxnData
2016-10-21 11:25:50,697 DEBUG com.datatorrent.lib.io.fs.FileSplitterInput: 
Scan started for input /user/deepak/files/CustomerTxnData
2016-10-21 11:25:50,702 DEBUG com.datatorrent.lib.io.fs.FileSplitterInput: 
scan /user/deepak/files/CustomerTxnData
2016-10-21 11:25:50,704 DEBUG com.datatorrent.lib.io.fs.FileSplitterInput: 
scan complete

   

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/deepak-narkhede/apex-malhar APEXMALHAR-2312

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/apex-malhar/pull/463.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #463


commit 47f29f39393a4e43c8423153d32d12c9622872b5
Author: deepak-narkhede 
Date:   2016-10-21T06:44:34Z

APEXMALHAR-2312 Fix NullPointerException for FileSplitterInput Operator if 
filepath is specified.

Problem Description:
---
1) TimeBasedDirectoryScanner threads, part of the scan service, scan the 
directories/files.
2) Each thread checks, with the help of the isIterationCompleted() method 
(via referenceTimes), whether the files scanned in the last iteration have 
been processed by the operator thread.
3) Previously this worked because HashMap (referenceTimes) returns null 
even when the last scanned directory path is null.
4) Recently referenceTimes was changed to ConcurrentHashMap, whose get() 
method doesn't allow null keys.
5) Hence a NullPointerException is seen: if only a file path is provided, 
the directory path is null, so the key is null.

Solution:
-
Pre-check whether the directory path is null; if only a file path is 
provided, treat the last iteration as complete.






[GitHub] apex-malhar pull request #463: APEXMALHAR-2312 Fix NullPointerException for ...

2016-10-23 Thread deepak-narkhede
Github user deepak-narkhede closed the pull request at:

https://github.com/apache/apex-malhar/pull/463




[jira] [Commented] (APEXMALHAR-2269) AbstractFileInputOperator: During replay, IO errors not handled

2016-10-23 Thread Matt Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXMALHAR-2269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15600713#comment-15600713
 ] 

Matt Zhang commented on APEXMALHAR-2269:


The function failureHandling() adds the failed file to failedFiles. So if we 
hit this during replay(), it will break idempotency, as failedFiles will 
differ from the original run. It's better to just throw a RuntimeException 
and restart replay() from the beginning. Reverting to the original code. 

> AbstractFileInputOperator: During replay, IO errors not handled
> ---
>
> Key: APEXMALHAR-2269
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2269
> Project: Apache Apex Malhar
>  Issue Type: Bug
>Reporter: Munagala V. Ramanath
>Assignee: Matt Zhang
>
> In AbstractFileInputOperator, during replay(), if any IOExceptions occur, they
> are not handled gracefully -- the code simply throws a RuntimeException.
> Code similar to the behavior of emitTuples(), where the failureHandling()
> method is called, needs to be added.





[jira] [Commented] (APEXCORE-560) Logical plan is not changed when all physical partitions of operator are removed from DAG

2016-10-23 Thread Vlad Rozov (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXCORE-560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15600183#comment-15600183
 ] 

Vlad Rozov commented on APEXCORE-560:
-

Such an approach may lead to confusion and hard-to-debug issues. IMO, it would 
be better to mark some parts of the DAG as inactive and allow 
activation/deactivation of operators in the logical DAG.

> Logical plan is not changed when all physical partitions of operator are 
> removed from DAG
> -
>
> Key: APEXCORE-560
> URL: https://issues.apache.org/jira/browse/APEXCORE-560
> Project: Apache Apex Core
>  Issue Type: Bug
>Reporter: Bhupesh Chawda
>
> Throwing a ShutdownException() from an input operator removes it from the 
> physical plan, but it can still be seen in the logical plan. Ideally the 
> corresponding logical operator should also be removed.


