Re: impersonation and application path

2017-05-18 Thread Priyanka Gugale
+1 for the proposal.

Can we make the new behaviour of writing to the user's own directory the
default? Most likely users will upgrade the gateway along with apex-core. If
not, they always have the option to set the flag and fall back to the legacy
behaviour.

-Priyanka

On Fri, May 19, 2017 at 7:52 AM, Chinmay Kolhatkar 
wrote:

> +1 for Pramod's proposal.


[jira] [Commented] (APEXCORE-724) Support for Kubernetes

2017-05-18 Thread Deepak Narkhede (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXCORE-724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16016862#comment-16016862
 ] 

Deepak Narkhede commented on APEXCORE-724:
--

Hi Thomas,

I would like to contribute to this feature. I have also done some
investigation earlier using the Kubernetes client API and pods (with single
or multiple Docker containers) on Kubernetes.

Thanks,
Deepak
 

> Support for Kubernetes
> --
>
> Key: APEXCORE-724
> URL: https://issues.apache.org/jira/browse/APEXCORE-724
> Project: Apache Apex Core
>  Issue Type: New Feature
>Reporter: Thomas Weise
>  Labels: roadmap
>
> It should be possible to run Apex applications on Kubernetes. This will also 
> require that Apex applications can be packaged as containers (Docker or other 
> supported container).



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (APEXCORE-724) Support for Kubernetes

2017-05-18 Thread Thomas Weise (JIRA)
Thomas Weise created APEXCORE-724:
-

 Summary: Support for Kubernetes
 Key: APEXCORE-724
 URL: https://issues.apache.org/jira/browse/APEXCORE-724
 Project: Apache Apex Core
  Issue Type: New Feature
Reporter: Thomas Weise


It should be possible to run Apex applications on Kubernetes. This will also 
require that Apex applications can be packaged as containers (Docker or other 
supported container).





[jira] [Updated] (APEXMALHAR-2495) Apex SQL: Add support for windowing

2017-05-18 Thread Thomas Weise (JIRA)

 [ 
https://issues.apache.org/jira/browse/APEXMALHAR-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Weise updated APEXMALHAR-2495:
-
Labels: roadmap  (was: )

> Apex SQL: Add support for windowing
> ---
>
> Key: APEXMALHAR-2495
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2495
> Project: Apache Apex Malhar
>  Issue Type: New Feature
>Reporter: Thomas Weise
>  Labels: roadmap
>






[jira] [Updated] (APEXMALHAR-2296) Apex SQL: Add support for SQL GROUP BY (Aggregate RelNode)

2017-05-18 Thread Thomas Weise (JIRA)

 [ 
https://issues.apache.org/jira/browse/APEXMALHAR-2296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Weise updated APEXMALHAR-2296:
-
Labels:   (was: roadmap)

> Apex SQL: Add support for SQL GROUP BY (Aggregate RelNode)
> --
>
> Key: APEXMALHAR-2296
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2296
> Project: Apache Apex Malhar
>  Issue Type: New Feature
>  Components: sql
>Reporter: Chinmay Kolhatkar
>
> Add support for SQL GROUP BY (Aggregate RelNode)





[jira] [Updated] (APEXMALHAR-2296) Apex SQL: Add support for SQL GROUP BY (Aggregate RelNode)

2017-05-18 Thread Thomas Weise (JIRA)

 [ 
https://issues.apache.org/jira/browse/APEXMALHAR-2296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Weise updated APEXMALHAR-2296:
-
Labels: roadmap  (was: )

> Apex SQL: Add support for SQL GROUP BY (Aggregate RelNode)
> --
>
> Key: APEXMALHAR-2296
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2296
> Project: Apache Apex Malhar
>  Issue Type: New Feature
>  Components: sql
>Reporter: Chinmay Kolhatkar
>  Labels: roadmap
>
> Add support for SQL GROUP BY (Aggregate RelNode)





[jira] [Created] (APEXMALHAR-2495) Apex SQL: Add support for windowing

2017-05-18 Thread Thomas Weise (JIRA)
Thomas Weise created APEXMALHAR-2495:


 Summary: Apex SQL: Add support for windowing
 Key: APEXMALHAR-2495
 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2495
 Project: Apache Apex Malhar
  Issue Type: New Feature
Reporter: Thomas Weise








[jira] [Closed] (APEXCORE-721) Announcement section on the website is not uptodate

2017-05-18 Thread Thomas Weise (JIRA)

 [ 
https://issues.apache.org/jira/browse/APEXCORE-721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Weise closed APEXCORE-721.
-
Resolution: Done

> Announcement section on the website is not uptodate
> ---
>
> Key: APEXCORE-721
> URL: https://issues.apache.org/jira/browse/APEXCORE-721
> Project: Apache Apex Core
>  Issue Type: Bug
>  Components: Website
>Reporter: Pramod Immaneni
>Assignee: Pramod Immaneni
>
> Announcement section on the main page on the website still lists malhar 3.6.0 
> and core 3.5.0 as the latest releases.





[GitHub] apex-site pull request #75: APEXCORE-721 Updated announcements to malhar 3.7...

2017-05-18 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/apex-site/pull/75


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (APEXCORE-721) Announcement section on the website is not uptodate

2017-05-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXCORE-721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16016809#comment-16016809
 ] 

ASF GitHub Bot commented on APEXCORE-721:
-

Github user asfgit closed the pull request at:

https://github.com/apache/apex-site/pull/75


> Announcement section on the website is not uptodate
> ---
>
> Key: APEXCORE-721
> URL: https://issues.apache.org/jira/browse/APEXCORE-721
> Project: Apache Apex Core
>  Issue Type: Bug
>  Components: Website
>Reporter: Pramod Immaneni
>Assignee: Pramod Immaneni
>
> Announcement section on the main page on the website still lists malhar 3.6.0 
> and core 3.5.0 as the latest releases.





Re: impersonation and application path

2017-05-18 Thread Chinmay Kolhatkar
+1 for Pramod's proposal.

On 19-May-2017 4:51 AM, "Sanjay Pujare"  wrote:

> +1 for Pramod's proposal for impersonation.
>
> I have an issue with Sandesh's suggestion about making the new behavior the
> default (or only) behavior. This will introduce an incompatibility with
> other legacy tools (e.g. DataTorrent's dtGateway) that assume user A's HDFS
> path is the application path. Because the legacy tools will continue to
> assume the old path (user A's path), they will not work with an Apex core
> that has this change.
>
> The current behavior might also be preferable to certain users or their
> administrators because they do not have to deal with multiple HDFS user
> directories (for administration, logging, backup, etc.).


Re: impersonation and application path

2017-05-18 Thread Sanjay Pujare
+1 for Pramod's proposal for impersonation.

I have an issue with Sandesh's suggestion about making the new behavior the
default (or only) behavior. This will introduce an incompatibility with
other legacy tools (e.g. DataTorrent's dtGateway) that assume user A's HDFS
path is the application path. Because the legacy tools will continue to
assume the old path (user A's path), they will not work with an Apex core
that has this change.

The current behavior might also be preferable to certain users or their
administrators because they do not have to deal with multiple HDFS user
directories (for administration, logging, backup, etc.).

On Thu, May 18, 2017 at 4:01 PM, Sandesh Hegde 
wrote:

> My vote is to make the new proposal the default behavior. Is there a use
> case for the current behavior? If not, there is no need to add the
> configuration setting.


Re: impersonation and application path

2017-05-18 Thread Sandesh Hegde
My vote is to make the new proposal the default behavior. Is there a use
case for the current behavior? If not, there is no need to add the
configuration setting.

On Thu, May 18, 2017 at 3:47 PM Pramod Immaneni 
wrote:

> Sorry, typo in the sentence "as we are not asking for permissions for a
> lower privilege"; please read it as "as we are now asking for permissions
> for a lower privilege".


Re: impersonation and application path

2017-05-18 Thread Pramod Immaneni
Sorry, typo in the sentence "as we are not asking for permissions for a
lower privilege"; please read it as "as we are now asking for permissions
for a lower privilege".

On Thu, May 18, 2017 at 3:44 PM, Pramod Immaneni 
wrote:

> This may not be allowed in certain environments as it may be a policy
> violation for the following reason. Because user A is able to impersonate
> as user B to launch the application, A is considered to be a higher
> privileged user than B and is given necessary privileges in hadoop to do
> so. But after launch B needs to access folders belonging to A which could
> constitute a violation as we are not asking for permissions for a lower
> privilege user to access resources of a higher privilege user.


impersonation and application path

2017-05-18 Thread Pramod Immaneni
Apex CLI supports impersonation in secure mode. With impersonation, the
user running the CLI or the user authenticating with Hadoop (henceforth
referred to as the login user) can be different from the effective user
under which actions are performed in Hadoop. For example, an application
can be launched by user A to run in Hadoop as user B. This is similar to
the sudo functionality in Unix. You can find more details about this
functionality in the Impersonation section here:
https://apex.apache.org/docs/apex/security/

What happens today when launching an application with impersonation, using
the above launch example, is that even though the application runs as user
B, it still uses user A's HDFS path for the application path. The
application path is where the artifacts necessary to run the application
are stored, as well as runtime files such as checkpoints. This means that
user B needs read and write access to user A's application path folders.

This may not be allowed in certain environments, as it may be a policy
violation for the following reason. Because user A is able to impersonate
user B to launch the application, A is considered a higher privileged user
than B and is given the necessary privileges in Hadoop to do so. But after
launch, B needs to access folders belonging to A, which could constitute a
violation, as we are now asking for permissions for a lower privilege user
to access resources of a higher privilege user.

I would like to propose adding a configuration setting which, when set,
will use the application path in the impersonated user's home directory
(user B) as opposed to the impersonating user's home directory (user A). If
this setting is not specified, the behavior can default to what it is today
for backwards compatibility.

Comments, suggestions, concerns?

Thanks
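The decision the proposed setting makes can be sketched as follows. This is a minimal illustration, not Apex code: the flag name `useImpersonatedUserHome`, the `/user/NAME` home directory layout and the `apps/<appId>` sub-path are all assumptions made for the example. With the flag unset it reproduces today's behavior (user A's home); with it set, the application path moves to user B's home.

```java
import java.util.Objects;

/**
 * Hypothetical sketch, not an Apex API: decides which user's HDFS home
 * directory hosts the application path when a login user (A) launches an
 * application that runs as an impersonated user (B).
 */
final class AppPathResolver {

  private AppPathResolver() {
  }

  /** Assumed Hadoop-style home directory layout: /user/NAME. */
  static String homeDirectory(String user) {
    return "/user/" + user;
  }

  /**
   * @param loginUser               user running the CLI (user A)
   * @param effectiveUser           user the application runs as (user B);
   *                                same as loginUser without impersonation
   * @param useImpersonatedUserHome hypothetical flag; false keeps today's behavior
   * @param appId                   application identifier
   */
  static String resolveAppPath(String loginUser, String effectiveUser,
      boolean useImpersonatedUserHome, String appId) {
    Objects.requireNonNull(loginUser, "loginUser");
    Objects.requireNonNull(effectiveUser, "effectiveUser");
    Objects.requireNonNull(appId, "appId");
    // Legacy behavior uses the login user's home even when impersonating.
    String owner = useImpersonatedUserHome ? effectiveUser : loginUser;
    return homeDirectory(owner) + "/apps/" + appId;
  }

  public static void main(String[] args) {
    // prints /user/alice/apps/app_1 (legacy: user A's home)
    System.out.println(resolveAppPath("alice", "bob", false, "app_1"));
    // prints /user/bob/apps/app_1 (proposed: user B's home)
    System.out.println(resolveAppPath("alice", "bob", true, "app_1"));
  }
}
```

Defaulting the flag to false mirrors the backwards-compatible default suggested in the proposal; without impersonation both branches yield the same path, so the flag only matters when the login and effective users differ.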


[GitHub] apex-malhar pull request #604: Apexmalhar 2467 Move JMS related examples fro...

2017-05-18 Thread prasannapramod
Github user prasannapramod closed the pull request at:

https://github.com/apache/apex-malhar/pull/604




[GitHub] apex-malhar pull request #604: Apexmalhar 2467 Move JMS related examples fro...

2017-05-18 Thread prasannapramod
GitHub user prasannapramod reopened a pull request:

https://github.com/apache/apex-malhar/pull/604

Apexmalhar 2467 Move JMS related examples from datatorrent examples to 
apex-malhar examples

@amberarrow @tweise @ashwinchandrap please review

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/prasannapramod/apex-malhar APEXMALHAR-2467

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/apex-malhar/pull/604.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #604


commit 5ed84d5284777a3a842003bb5e840ac39eed7d94
Author: Sanjay Pujare 
Date:   2017-04-04T22:42:26Z

New JMS ActiveMQ example to remove duplicate jdbcIngest line.

commit a00f26be4aa74df455bde754c343b0610f0a200c
Author: Sanjay Pujare 
Date:   2017-04-04T22:45:39Z

SPOI-8863 New example for using jmsInput operator for reading from SQS use 
elasticmq jar for unit testing

commit a77a67b915f2e3e4df627c4b7232e22aecb4bf61
Author: Lakshmi Prasanna Velineni 
Date:   2017-04-04T23:06:18Z

Changes completed.

commit 4684ccf65ec0d9dec50f922fc000f9ebebdffa3c
Author: Apex Dev 
Date:   2017-04-13T21:35:36Z

License Headers and checkstyle.

commit 4c148f45123b5ef96e786f8dcb02f0b7ba966347
Author: Apex Dev 
Date:   2017-04-13T21:35:36Z

License Headers and checkstyle.






[jira] [Resolved] (APEXMALHAR-2475) CacheStore needn't expire data if it read-only data

2017-05-18 Thread Pramod Immaneni (JIRA)

 [ 
https://issues.apache.org/jira/browse/APEXMALHAR-2475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pramod Immaneni resolved APEXMALHAR-2475.
-
   Resolution: Fixed
Fix Version/s: 3.8.0

> CacheStore needn't expire data if it read-only data
> ---
>
> Key: APEXMALHAR-2475
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2475
> Project: Apache Apex Malhar
>  Issue Type: Sub-task
>Reporter: Pramod Immaneni
>Assignee: Oliver Winke
> Fix For: 3.8.0
>
>
> The db CacheStore implementation supports expiry of data on read or write
> after a configurable expiry period. The default is one minute. If the data is
> read-only, there is no need to expire it. The max cache size property will
> ensure anyway that the cache size doesn't grow indefinitely. The CacheManager
> can provide the meta-information on whether the data is read-only.
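A minimal sketch of the idea, assuming a simple in-memory LRU store; `BoundedCache` and its constructor parameters are hypothetical and are not the Malhar `CacheStore` API. Expiry is checked only for mutable data, while the size bound (via `LinkedHashMap.removeEldestEntry`) still caps growth for read-only data. The clock is injected so the one-minute default can be exercised without sleeping.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.LongSupplier;

/**
 * Hypothetical sketch, not Malhar's CacheStore: a bounded LRU cache whose
 * time-based expiry is skipped when the data is declared read-only.
 */
final class BoundedCache<K, V> {

  /** A cached value plus the time it was written. */
  private static final class Timed<T> {
    final T value;
    final long writtenAtMillis;

    Timed(T value, long writtenAtMillis) {
      this.value = value;
      this.writtenAtMillis = writtenAtMillis;
    }
  }

  private final boolean readOnly;
  private final long expiryMillis;
  private final LongSupplier clock;
  private final LinkedHashMap<K, Timed<V>> map;

  BoundedCache(final int maxSize, long expiryMillis, boolean readOnly, LongSupplier clock) {
    if (maxSize <= 0) {
      throw new IllegalArgumentException("maxSize must be positive");
    }
    this.readOnly = readOnly;
    this.expiryMillis = expiryMillis;
    this.clock = clock;
    // accessOrder = true keeps entries in least-recently-used order
    this.map = new LinkedHashMap<K, Timed<V>>(16, 0.75f, true) {
      @Override
      protected boolean removeEldestEntry(Map.Entry<K, Timed<V>> eldest) {
        // Evict the least-recently-used entry once the size bound is exceeded.
        return size() > maxSize;
      }
    };
  }

  void put(K key, V value) {
    map.put(key, new Timed<>(value, clock.getAsLong()));
  }

  V get(K key) {
    Timed<V> entry = map.get(key);
    if (entry == null) {
      return null;
    }
    // Read-only data cannot go stale, so the expiry check is skipped for it.
    if (!readOnly && clock.getAsLong() - entry.writtenAtMillis > expiryMillis) {
      map.remove(key);
      return null;
    }
    return entry.value;
  }

  int size() {
    return map.size();
  }
}
```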





[jira] [Resolved] (APEXMALHAR-2474) FSLoader only returns value at the beginning

2017-05-18 Thread Pramod Immaneni (JIRA)

 [ 
https://issues.apache.org/jira/browse/APEXMALHAR-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pramod Immaneni resolved APEXMALHAR-2474.
-
   Resolution: Fixed
Fix Version/s: 3.8.0

> FSLoader only returns value at the beginning
> 
>
> Key: APEXMALHAR-2474
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2474
> Project: Apache Apex Malhar
>  Issue Type: Sub-task
>Reporter: Pramod Immaneni
>Assignee: Oliver Winke
> Fix For: 3.8.0
>
>
> FSLoader implements the backup store for the db CacheManager. On the initial
> load, it reads the file line by line and returns a Map containing one
> key-value pair per line. It returns data only on the initial load; thereafter
> it returns null for any key lookup. Also, there is no need to load and return
> all the data in the file if the primary cache cannot hold all the entries.
> These issues need to be addressed, and it would also help if the CacheManager
> supplied meta-information such as how much data should be loaded and returned
> on the initial load.
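The two fixes described in this issue can be sketched as follows, under stated assumptions: the file is modeled as an in-memory list of lines, each line is assumed to be `key,value`, and `LineBackupStore` is a hypothetical name rather than the actual FSLoader class. The initial load stops at a caller-supplied limit (such as the primary cache size), and `get` keeps resolving keys after the initial load instead of returning null.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch, not Malhar's FSLoader: a line-oriented backup store
 * whose initial load is capped and whose lookups keep working afterwards.
 * Each line is assumed to have the form "key,value".
 */
final class LineBackupStore {
  private final List<String> lines; // stands in for the file contents

  LineBackupStore(List<String> lines) {
    this.lines = lines;
  }

  /** Loads at most {@code limit} entries, e.g. the primary cache capacity. */
  Map<String, String> initialLoad(int limit) {
    Map<String, String> result = new LinkedHashMap<>();
    for (String line : lines) {
      if (result.size() >= limit) {
        break; // the primary cache could not hold more entries anyway
      }
      int comma = line.indexOf(',');
      if (comma > 0) { // lines without a key are skipped
        // First occurrence wins, matching get() below.
        result.putIfAbsent(line.substring(0, comma), line.substring(comma + 1));
      }
    }
    return result;
  }

  /** Resolves a key at any time, not only during the initial load. */
  String get(String key) {
    for (String line : lines) {
      int comma = line.indexOf(',');
      if (comma > 0 && line.substring(0, comma).equals(key)) {
        return line.substring(comma + 1);
      }
    }
    return null; // key genuinely absent from the backup data
  }
}
```

The linear scan in `get` keeps the sketch short; a real backup store would more likely index the file or remember line offsets.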





[jira] [Resolved] (APEXMALHAR-2473) Support for global cache meta information in db CacheManager

2017-05-18 Thread Pramod Immaneni (JIRA)

 [ 
https://issues.apache.org/jira/browse/APEXMALHAR-2473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pramod Immaneni resolved APEXMALHAR-2473.
-
   Resolution: Fixed
Fix Version/s: 3.8.0

> Support for global cache meta information in db CacheManager
> 
>
> Key: APEXMALHAR-2473
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2473
> Project: Apache Apex Malhar
>  Issue Type: Improvement
>Reporter: Pramod Immaneni
>Assignee: Oliver Winke
> Fix For: 3.8.0
>
>
> Currently the db CacheManager has no knowledge of the characteristics of the
> data or the cache stores, so it handles all scenarios uniformly. This may not
> be optimal in all cases; better optimizations can be performed in the manager
> if this information is known. For example, if the data is read-only, the keys
> in the primary cache need not be refreshed daily as they are today, and if the
> primary cache size is known, the number of initial entries loaded from the
> backup store need not exceed it. Add support for such general cache meta
> information in the manager.
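As an illustration only, the meta information and the two optimizations named in this issue could look like the sketch below; `CacheMetaInfo` and `MetaAwareCacheManager` are hypothetical names and do not reflect the API that was actually added to Malhar.

```java
/** Hypothetical global meta information about the cached data. */
final class CacheMetaInfo {
  final boolean readOnly;
  final int primaryCacheSize; // maximum entries the primary cache can hold

  CacheMetaInfo(boolean readOnly, int primaryCacheSize) {
    if (primaryCacheSize <= 0) {
      throw new IllegalArgumentException("primaryCacheSize must be positive");
    }
    this.readOnly = readOnly;
    this.primaryCacheSize = primaryCacheSize;
  }
}

/** Hypothetical manager that adapts its behavior to the meta information. */
final class MetaAwareCacheManager {
  private final CacheMetaInfo meta;

  MetaAwareCacheManager(CacheMetaInfo meta) {
    this.meta = meta;
  }

  /** Read-only data never changes, so the periodic (e.g. daily) key refresh can be skipped. */
  boolean shouldScheduleRefresh() {
    return !meta.readOnly;
  }

  /** Never load more initial entries from the backup store than the primary cache can hold. */
  int initialLoadLimit(int entriesAvailableInBackup) {
    return Math.min(entriesAvailableInBackup, meta.primaryCacheSize);
  }
}
```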





[jira] [Commented] (APEXMALHAR-2473) Support for global cache meta information in db CacheManager

2017-05-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXMALHAR-2473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16016435#comment-16016435
 ] 

ASF GitHub Bot commented on APEXMALHAR-2473:


Github user asfgit closed the pull request at:

https://github.com/apache/apex-malhar/pull/605


> Support for global cache meta information in db CacheManager
> 
>
> Key: APEXMALHAR-2473
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2473
> Project: Apache Apex Malhar
>  Issue Type: Improvement
>Reporter: Pramod Immaneni
>Assignee: Oliver Winke
>
> Currently the db CacheManager has no knowledge of the characteristics of the
> data or the cache stores, so it handles all scenarios uniformly. This may not
> be optimal in all cases; better optimizations can be performed in the manager
> if this information is known. For example, if the data is read-only, the keys
> in the primary cache need not be refreshed daily as they are today, and if the
> primary cache size is known, the number of initial entries loaded from the
> backup store need not exceed it. Add support for such general cache meta
> information in the manager.





[GitHub] apex-malhar pull request #605: APEXMALHAR-2473 Support for global cache meta...

2017-05-18 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/apex-malhar/pull/605




[jira] [Created] (APEXMALHAR-2494) Update demo apps with description

2017-05-18 Thread Chaitanya (JIRA)
Chaitanya created APEXMALHAR-2494:
-

 Summary: Update demo apps with description
 Key: APEXMALHAR-2494
 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2494
 Project: Apache Apex Malhar
  Issue Type: Improvement
Reporter: Chaitanya
Assignee: Chaitanya
Priority: Minor


Add a README for the demo apps under the examples package.





[jira] [Commented] (APEXMALHAR-2493) KafkaSinglePortExactlyOnceOutputOperator going to the blocked state during recovery

2017-05-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXMALHAR-2493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16015556#comment-16015556
 ] 

ASF GitHub Bot commented on APEXMALHAR-2493:


GitHub user chaithu14 opened a pull request:

https://github.com/apache/apex-malhar/pull/622

APEXMALHAR-2493 Fixed the issue of KafkaSinglePortExactlyOnceOutputOperator 
going to the blocked state during recovery

@sandeshh @tushargosavi Please review and merge.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chaithu14/incubator-apex-malhar APEXMALHAR-2493-KafkaExactlyCBBug

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/apex-malhar/pull/622.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #622


commit c784f4da46d1cf594aa4156135b9c196aa66d931
Author: chaitanya 
Date:   2017-05-18T10:37:52Z

APEXMALHAR-2493 Fixed the issue of KafkaSinglePortExactlyOnceOutputOperator 
going to the blocked state during recovery




> KafkaSinglePortExactlyOnceOutputOperator going to the blocked state during 
> recovery
> ---
>
> Key: APEXMALHAR-2493
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2493
> Project: Apache Apex Malhar
>  Issue Type: Bug
>Reporter: Chaitanya
>Assignee: Chaitanya
>
> Steps to reproduce the issue:
> ---
> - Created a Kafka topic with a single partition.
> - Created an application with the following DAG:
>   BatchSequenceGenerator -> KafkaSinglePortExactlyOnceOutputOperator
>   # of partitions of KafkaSinglePortExactlyOnceOutputOperator = 2.
>   Let's say KO1 and KO2 are the two instances.
> - Launched the app and, after some time, manually killed one of the
>   instances of the "KafkaSinglePortExactlyOnceOutputOperator" operator (KO2).
> - During recovery, the instance came up and, after some time, went into the
>   blocked state. The App Master then killed this instance.
> Observation:
> 
> * The while loop in the rebuildPartialWindow() method never terminates.
> * The while loop breaks only on the following 2 conditions:
>    a) # of trials for "polled records from Kafka is empty" = 10
>    b) Crossed boundary (consumerRecord.offset() >= currentOffset)
> In this scenario, KO1 keeps writing data to Kafka, so the first condition
> is never satisfied.
> Records written by KO1 do not belong to KO2, so the following continue
> statement skips them before the 2nd condition is ever checked:
>   if (!doesKeyBelongsToThisInstance(operatorId, consumerRecord.key())) {
>     continue;
>   }
> Solution: First check the boundary-crossing condition and then check
> doesKeyBelongsToThisInstance(..).





[GitHub] apex-malhar pull request #622: APEXMALHAR-2493 Fixed the issue of KafkaSingl...

2017-05-18 Thread chaithu14
GitHub user chaithu14 opened a pull request:

https://github.com/apache/apex-malhar/pull/622

APEXMALHAR-2493 Fixed the issue of KafkaSinglePortExactlyOnceOutputOperator 
going to the blocked state during recovery

@sandeshh @tushargosavi Please review and merge.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chaithu14/incubator-apex-malhar APEXMALHAR-2493-KafkaExactlyCBBug

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/apex-malhar/pull/622.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #622


commit c784f4da46d1cf594aa4156135b9c196aa66d931
Author: chaitanya 
Date:   2017-05-18T10:37:52Z

APEXMALHAR-2493 Fixed the issue of KafkaSinglePortExactlyOnceOutputOperator 
going to the blocked state during recovery






[jira] [Created] (APEXMALHAR-2493) KafkaSinglePortExactlyOnceOutputOperator going to the blocked state during recovery

2017-05-18 Thread Chaitanya (JIRA)
Chaitanya created APEXMALHAR-2493:
-

 Summary: KafkaSinglePortExactlyOnceOutputOperator going to the 
blocked state during recovery
 Key: APEXMALHAR-2493
 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2493
 Project: Apache Apex Malhar
  Issue Type: Bug
Reporter: Chaitanya
Assignee: Chaitanya


Steps to reproduce the issue:
---
- Created a Kafka topic with a single partition.
- Created an application with the following DAG:
  BatchSequenceGenerator -> KafkaSinglePortExactlyOnceOutputOperator
  # of partitions of KafkaSinglePortExactlyOnceOutputOperator = 2.
  Let's say KO1 and KO2 are the two instances.
- Launched the app and, after some time, manually killed one of the instances
  of the "KafkaSinglePortExactlyOnceOutputOperator" operator (KO2).
- During recovery, the instance came up and, after some time, went into the
  blocked state. The App Master then killed this instance.

Observation:

* The while loop in the rebuildPartialWindow() method never terminates.
* The while loop breaks only on the following 2 conditions:
   a) # of trials for "polled records from Kafka is empty" = 10
   b) Crossed boundary (consumerRecord.offset() >= currentOffset)

In this scenario, KO1 keeps writing data to Kafka, so the first condition is
never satisfied.
Records written by KO1 do not belong to KO2, so the following continue
statement skips them before the 2nd condition is ever checked:
  if (!doesKeyBelongsToThisInstance(operatorId, consumerRecord.key())) {
    continue;
  }

Solution: First check the boundary-crossing condition and then check
doesKeyBelongsToThisInstance(..).
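A minimal Java sketch of the proposed reordering, under stated assumptions: `Rec` and `Poller` are hypothetical stand-ins for Kafka's `ConsumerRecord` and consumer polling, `PartialWindowRebuilder.rebuild` is a hypothetical simplification of `rebuildPartialWindow()`, and `ownsKey` plays the role of `doesKeyBelongsToThisInstance(..)`. The real operator's code may differ; the point is only that the boundary check runs before the ownership check.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical stand-in for a Kafka ConsumerRecord: just an offset and a key. */
final class Rec {
  final long offset;
  final String key;

  Rec(long offset, String key) {
    this.offset = offset;
    this.key = key;
  }
}

/** Hypothetical poll source: returns the next batch of records, possibly empty. */
interface Poller {
  List<Rec> poll();
}

final class PartialWindowRebuilder {
  static final int MAX_EMPTY_POLLS = 10; // condition (a) from the issue

  /** Replays records below currentOffset and keeps the ones this instance owns. */
  static List<Rec> rebuild(Poller poller, long currentOffset, Predicate<String> ownsKey) {
    List<Rec> mine = new ArrayList<>();
    int emptyPolls = 0; // the issue does not say whether this resets; here it does not
    while (true) {
      List<Rec> batch = poller.poll();
      if (batch.isEmpty()) {
        if (++emptyPolls >= MAX_EMPTY_POLLS) {
          return mine; // condition (a): nothing more to read
        }
        continue;
      }
      for (Rec r : batch) {
        // Condition (b) is checked first, so records from other instances can end the loop.
        if (r.offset >= currentOffset) {
          return mine;
        }
        if (!ownsKey.test(r.key)) {
          continue; // another instance's record: skip it only after the boundary check
        }
        mine.add(r);
      }
    }
  }
}
```

In a scenario like the one reported, where every record at or past `currentOffset` was written by KO1, the original ordering (ownership check first) would never return: polls never come back empty, and the `continue` skips condition (b) for every KO1 record. With the boundary check first, the loop stops at the first record that reaches `currentOffset`, regardless of which instance wrote it.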



