Re: Adding new log4j appender to Apex core

2016-11-23 Thread David Yan
Vlad,

The feature only works *if the log file name at error does not change later*.

In this case, Priyanka proposes a default appender that the user can use,
which has this behavior while still supporting log rotation.

If the user has a custom log appender, the feature can still work if it
satisfies the above requirement. Otherwise, there is no way for this
feature to work.

We can support a configuration option to force the inclusion of log
location and offset in the error STRAM event even if the Apex appender is
not used, if the user knows that they are using a custom appender that
satisfies the requirement.

David

On Wed, Nov 23, 2016 at 8:47 PM, Vlad Rozov  wrote:

> Additionally, I think that it is necessary to re-evaluate the
> requirements. Custom logging is quite common and many enterprises/DevOps
> have their own preferences/policies for log rotation and logging format. I saw
> instances when logging was redirected to stdout. By enforcing specific
> rotation policy or log format, the feature is more likely not to be used.
>
> Thank you,
>
> Vlad
>
>
> On 11/23/16 14:21, Vlad Rozov wrote:
>
>> Both approaches look quite "hacky" to me.
>>
>> Thank you,
>>
>> Vlad
>>
>> On 11/23/16 00:01, Mohit Jotwani wrote:
>>
>>> +1 - Approach 2
>>>
>>> Regards,
>>> Mohit
>>>
>>> On Wed, Nov 23, 2016 at 12:35 PM, AJAY GUPTA 
>>> wrote:
>>>
>>>> +1 for approach 2.
>>>>
>>>> Regards,
>>>> Ajay
>>>>
>>>> On Wed, Nov 23, 2016 at 12:16 PM, David Yan wrote:
>>>>
>>>>> The goal of this log4j appender is to provide a log offset and the fixed
>>>>> name of the container log file (instead of apex.log becoming apex.log.1
>>>>> and then apex.log.2, etc. due to rotation) as part of an error STRAM
>>>>> event so users can easily locate the log entries around the error.
>>>>>
>>>>> The user can override the appender, but in that case, the engine detects
>>>>> that and will not include the log location as part of the STRAM event.
>>>>>
>>>>> David
>>>>>
>>>>> On Tue, Nov 22, 2016 at 7:10 PM, Priyanka Gugale <
>>>>> priya...@datatorrent.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> Thomas,
>>>>>> Yes, log4j is ultimately owned by the user, and they should be able to
>>>>>> override it. What I am trying to do is provide a default behavior for
>>>>>> Apex. In case the user isn't using any logger of their own, we should
>>>>>> use this new appender of Apex rather than the standard log4j appender
>>>>>> as per the hadoop config.
>>>>>>
>>>>>> Sanjay,
>>>>>> The archetype is a good place to put this and I will add it there, but
>>>>>> many times people won't use it. So I wanted to keep it at ~/.dt as
>>>>>> well. Is there any other default config folder for Apex?
>>>>>>
>>>>>> Also, I am not relying on anything. If we fail to find the config in
>>>>>> the app jar or ~/.dt, we are going to skip usage of this new appender.
>>>>>>
>>>>>> -Priyanka
>>>>>>
>>>>>> On Wed, Nov 23, 2016 at 5:58 AM, Sanjay Pujare <
>>>>>> san...@datatorrent.com> wrote:
>>>>>>
>>>>>>> The only way to "enforce" this new appender is to update the
>>>>>>> archetypes (apex-app-archetype and apex-conf-archetype under
>>>>>>> apex-core/) to use the new ones as default. But there does not seem
>>>>>>> to be a way to enforce this for anyone not using the archetypes.
>>>>>>>
>>>>>>> I agree with not relying on ~/.dt in apex-core.
>>>>>>>
>>>>>>> On 11/22/16, 1:08 PM, "Thomas Weise" wrote:
>>>>>>>
>>>>>>>> The log4j configuration is ultimately owned by the user, so how do
>>>>>>>> you want to enforce a custom appender?
>>>>>>>>
>>>>>>>> I don't think that this should rely on anything in ~/.dt either.
>>>>>>>>
>>>>>>>> Thomas
>>>>>>>>
>>>>>>>> On Tue, Nov 22, 2016 at 10:00 AM, Priyanka Gugale <
>>>>>>>> priya...@datatorrent.com> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I am working on APEXCORE-563.
>>>>>>>>> As per this Jira we should put the log file name in
>>>>>>>>> container/operator events. The problem is that the current
>>>>>>>>> RollingFileAppender keeps renaming files from 1 to 2 to ... n as
>>>>>>>>> files reach the maximum allowed file size. Because of this
>>>>>>>>> constant renaming of files, we can't put a fixed file name in the
>>>>>>>>> stram event.
>>>>>>>>>
>>>>>>>>> To overcome this I would like to add a new log4j appender to
>>>>>>>>> ApexCore. There are two ways I can implement this:
>>>>>>>>> 1. Have a daily rolling file appender. The current file will be
>>>>>>>>> recognized based on the timestamp in the file name.

Re: Adding new log4j appender to Apex core

2016-11-23 Thread Vlad Rozov
Additionally, I think that it is necessary to re-evaluate the 
requirements. Custom logging is quite common and many enterprises/DevOps
have their own preferences/policies for log rotation and logging format. I saw
instances when logging was redirected to stdout. By enforcing specific 
rotation policy or log format, the feature is more likely not to be used.


Thank you,

Vlad

On 11/23/16 14:21, Vlad Rozov wrote:

> Both approaches look quite "hacky" to me.
>
> Thank you,
>
> Vlad
>
> On 11/23/16 00:01, Mohit Jotwani wrote:
>
>> +1 - Approach 2
>>
>> Regards,
>> Mohit
>>
>> On Wed, Nov 23, 2016 at 12:35 PM, AJAY GUPTA wrote:
>>
>>> +1 for approach 2.
>>>
>>> Regards,
>>> Ajay
>>>
>>> On Wed, Nov 23, 2016 at 12:16 PM, David Yan wrote:
>>>
>>>> The goal of this log4j appender is to provide a log offset and the fixed
>>>> name of the container log file (instead of apex.log becoming apex.log.1
>>>> and then apex.log.2, etc. due to rotation) as part of an error STRAM
>>>> event so users can easily locate the log entries around the error.
>>>>
>>>> The user can override the appender, but in that case, the engine detects
>>>> that and will not include the log location as part of the STRAM event.
>>>>
>>>> David
>>>>
>>>> On Tue, Nov 22, 2016 at 7:10 PM, Priyanka Gugale <
>>>> priya...@datatorrent.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> Thomas,
>>>>> Yes, log4j is ultimately owned by the user, and they should be able to
>>>>> override it. What I am trying to do is provide a default behavior for
>>>>> Apex. In case the user isn't using any logger of their own, we should
>>>>> use this new appender of Apex rather than the standard log4j appender
>>>>> as per the hadoop config.
>>>>>
>>>>> Sanjay,
>>>>> The archetype is a good place to put this and I will add it there, but
>>>>> many times people won't use it. So I wanted to keep it at ~/.dt as
>>>>> well. Is there any other default config folder for Apex?
>>>>>
>>>>> Also, I am not relying on anything. If we fail to find the config in
>>>>> the app jar or ~/.dt, we are going to skip usage of this new appender.
>>>>>
>>>>> -Priyanka
>>>>>
>>>>> On Wed, Nov 23, 2016 at 5:58 AM, Sanjay Pujare wrote:
>>>>>
>>>>>> The only way to "enforce" this new appender is to update the
>>>>>> archetypes (apex-app-archetype and apex-conf-archetype under
>>>>>> apex-core/) to use the new ones as default. But there does not seem
>>>>>> to be a way to enforce this for anyone not using the archetypes.
>>>>>>
>>>>>> I agree with not relying on ~/.dt in apex-core.
>>>>>>
>>>>>> On 11/22/16, 1:08 PM, "Thomas Weise" wrote:
>>>>>>
>>>>>>> The log4j configuration is ultimately owned by the user, so how do
>>>>>>> you want to enforce a custom appender?
>>>>>>>
>>>>>>> I don't think that this should rely on anything in ~/.dt either.
>>>>>>>
>>>>>>> Thomas
>>>>>>>
>>>>>>> On Tue, Nov 22, 2016 at 10:00 AM, Priyanka Gugale <
>>>>>>> priya...@datatorrent.com> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I am working on APEXCORE-563.
>>>>>>>> As per this Jira we should put the log file name in
>>>>>>>> container/operator events. The problem is that the current
>>>>>>>> RollingFileAppender keeps renaming files from 1 to 2 to ... n as
>>>>>>>> files reach the maximum allowed file size. Because of this constant
>>>>>>>> renaming of files, we can't put a fixed file name in the stram
>>>>>>>> event.
>>>>>>>>
>>>>>>>> To overcome this I would like to add a new log4j appender to
>>>>>>>> ApexCore. There are two ways I can implement this:
>>>>>>>> 1. Have a daily rolling file appender. The current file will be
>>>>>>>> recognized based on the timestamp in the file name. Also, to control
>>>>>>>> the max file size, we need to keep rolling files based on size as
>>>>>>>> well.
>>>>>>>> 2. Have a rolling file appender that does not rename files. When the
>>>>>>>> max file size is reached, create a new file with the next number,
>>>>>>>> e.g. create log file dt.log.2 after dt.log.1 is full. Also, to
>>>>>>>> recognize the latest file, keep a softlink named dt.log pointing to
>>>>>>>> the current log file.
>>>>>>>>
>>>>>>>> I would prefer to implement approach 2. Please provide your
>>>>>>>> comments/feedback if you feel otherwise.
>>>>>>>>
>>>>>>>> Also, to use this new appender we need to use our custom
>>>>>>>> log4j.properties file instead of the one present in the hadoop conf.
>>>>>>>> For that we need to set the JVM option -Dlog4j.configuration. I am
>>>>>>>> planning to update the file dt-site.xml in the ~/.dt folder and the
>>>>>>>> default properties file available in the apex archetype to set JVM
>>>>>>>> options as follows:
>>>>>>>>
>>>>>>>> <property>
>>>>>>>>   <name>dt.attr.CONTAINER_JVM_OPTIONS</name>
>>>>>>>>   <value>-Dlog4j.configuration=log4j.props</value>
>>>>>>>> </property>
>>>>>>>>
>>>>>>>> And I will copy the log4j.props file into the ~/.dt folder as well
>>>>>>>> as the apex archetypes.
>>>>>>>>
>>>>>>>> Lastly, if someone still misses this new log4j properties file or
>>>>>>>> the JVM option -Dlog4j.configuration, we will not put the log file
>>>>>>>> name in the event raised by the container or operator.
>>>>>>>>
>>>>>>>> Please provide your feedback on this approach.
>>>>>>>>
>>>>>>>> -Priyanka
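[Editor's note: the rotation scheme of approach 2 above — roll to a new numbered file without renaming, and keep a dt.log softlink pointing at the newest file — can be sketched as follows. This is an illustrative Python sketch, not the actual Apex/log4j appender; the class name, the size accounting, and the symlink handling are assumptions, and the softlink requires a POSIX filesystem.]

```python
import os
import tempfile


class NonRenamingRotator:
    """Sketch of approach 2: when the current log file reaches max_bytes,
    open the next numbered file (dt.log.1, dt.log.2, ...) instead of
    renaming existing files, and repoint a 'dt.log' symlink at it.
    Names and sizes are illustrative, not Apex's actual behavior."""

    def __init__(self, log_dir, base="dt.log", max_bytes=1024):
        self.log_dir = log_dir
        self.base = base
        self.max_bytes = max_bytes
        self.index = 1
        self._open_current()

    def _current_path(self):
        return os.path.join(self.log_dir, "%s.%d" % (self.base, self.index))

    def _open_current(self):
        self.fh = open(self._current_path(), "a")
        link = os.path.join(self.log_dir, self.base)
        # Repoint the softlink so 'dt.log' always names the newest file;
        # this is why a fixed name can be reported in the STRAM event.
        if os.path.islink(link):
            os.remove(link)
        os.symlink(self._current_path(), link)

    def write(self, line):
        if self.fh.tell() + len(line) > self.max_bytes:
            self.fh.close()
            self.index += 1  # never rename; just move on to the next number
            self._open_current()
        self.fh.write(line)
```

Because no file is ever renamed, a (file name, offset) pair recorded at error time remains valid after any number of rotations.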










[jira] [Commented] (APEXCORE-462) StramAgent class initiates a lot of unnecessary YARN connections

2016-11-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXCORE-462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15691870#comment-15691870
 ] 

ASF GitHub Bot commented on APEXCORE-462:
-

Github user davidyan74 closed the pull request at:

https://github.com/apache/apex-core/pull/336


> StramAgent class initiates a lot of unnecessary YARN connections
> 
>
> Key: APEXCORE-462
> URL: https://issues.apache.org/jira/browse/APEXCORE-462
> Project: Apache Apex Core
>  Issue Type: Bug
>Reporter: David Yan
>Assignee: David Yan
>
> A bug has been reported that the usage of StramAgent class can create a lot 
> of unnecessary YARN connections if the application queried is not running or 
> has errors. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Proposing a new feature to persist logical and physical plan snapshots in HDFS

2016-11-23 Thread Amol Kekre
Persisted plan on DFS is good. I am +1 for it. This could be both of the
following:

1. Attribute: if set, then upon a change in the plan, persist it to DFS
2. On demand

Thks
Amol


On Wed, Nov 23, 2016 at 4:15 PM, Sanjay Pujare 
wrote:

> Okay, but this “state” is gone after the app is “dead” isn’t that true?
> Also the reason for this enhancement is debuggability/troubleshooting of
> Apex apps so it is good to have separate explicit user visible files that
> contain the plan information instead of overloading the state for this
> purpose (in my opinion).
>
> In terms of on-demand, it sounds like a good idea - I didn’t think of it.
> But I would like to drill down the use cases. In most cases,
> logical/physical plan changes are spontaneous or rather internal to the app
> so an external entity making a REST call to save the plan on demand might
> not sync up with when the plan changes took place inside the app. So saving
> the plan JSON files on events described previously seems to be the most
> efficient thing to do (as discussed with @Ashwin Putta) but if there are
> use cases I think it is a good idea to do it on demand as well.
>
> On 11/23/16, 3:00 PM, "Amol Kekre"  wrote:
>
> Good idea. Stram does save state, and maybe a script that translates may
> work. But explicit plan saving is also a good idea. Could this be "on
> demand"? A REST call that writes out the plan(s) to specified HDFS files?
>
> We could do both (write on any change/set call) and/or on-demand.
>
> Thks
> Amol
>
>
> On Wed, Nov 23, 2016 at 2:40 PM, Sanjay Pujare  >
> wrote:
>
> > To help Apex developers/users with debugging or troubleshooting
> “dead”
> > applications, I am proposing a new feature to persist logical and
> physical
> > plan snapshots in HDFS.
> >
> >
> >
> > Similar to how the Apex engine persists container data per
> application
> > attempt in HDFS as containers_NNN.json (where NNN is 1 for first app
> > attempt, 2 for the second app attempt and so on), we will create 2
> more
> > sets of files under the …/apps/{appId} directory for an application:
> >
> >
> >
> > logicalPlan_NNN_MMM.json
> >
> > physicalPlan_NNN_MMM.json
> >
> >
> >
> > where NNN stands for the app attempt index (similar to NNN above 1,
> 2, 3
> > and so on) and MMM is a running index starting at 1 which stands for
> a
> > snapshot within an app attempt. Note that a logical or physical plan
> may
> > change within an app attempt for any number of reasons.
> >
> >
> >
> > The StreamingContainerManager class maintains the current
> logical/physical
> > plans in the “plan” member variable. New methods will be added in
> > StreamingContainerManager to save the logical or physical plan as
> JSON
> > representations in the app directory (as described above). The logic
> is
> > similar to com.datatorrent.stram.webapp.StramWebServices.
> getLogicalPlan(String)
> > and com.datatorrent.stram.webapp.StramWebServices.getPhysicalPlan()
> used
> > inside the Stram Web service. There will be running indexes in
> > StreamingContainerManager to keep track of MMM for the logical plan
> and
> > physical plan. The appropriate save method will be called on the
> occurrence
> > of any event that updates the logical or physical plan for example:
> >
> >
> >
> > inside com.datatorrent.stram.StreamingContainerManager.
> > LogicalPlanChangeRunnable.call()  for logical plan change event
> >
> >
> >
> > inside com.datatorrent.stram.plan.physical.PhysicalPlan.
> redoPartitions(PMapping,
> > String) for physical plan change event (i.e. redoing partitioning)
> >
> >
> >
> > Once these files are created, any user or a tool (such as the Apex
> CLI or
> > the DT Gateway) can look up these files for
> troubleshooting/researching of
> > “dead” applications and significant events in their lifetime in
> terms of
> > logical or physical plan changes. Pls send me your feedback.
> >
> >
> >
> > Sanjay
> >
> >
> >
> >
>
>
>
>


Re: Proposing a new feature to persist logical and physical plan snapshots in HDFS

2016-11-23 Thread Sanjay Pujare
Okay, but this “state” is gone after the app is “dead” isn’t that true? Also 
the reason for this enhancement is debuggability/troubleshooting of Apex apps 
so it is good to have separate explicit user visible files that contain the 
plan information instead of overloading the state for this purpose (in my 
opinion).

In terms of on-demand, it sounds like a good idea - I didn’t think of it. But I 
would like to drill down the use cases. In most cases, logical/physical plan 
changes are spontaneous or rather internal to the app so an external entity 
making a REST call to save the plan on demand might not sync up with when the 
plan changes took place inside the app. So saving the plan JSON files on events 
described previously seems to be the most efficient thing to do (as discussed 
with @Ashwin Putta) but if there are use cases I think it is a good idea to do 
it on demand as well.

On 11/23/16, 3:00 PM, "Amol Kekre"  wrote:

Good idea. Stram does save state, and maybe a script that translates may
work. But explicit plan saving is also a good idea. Could this be "on
demand"? A REST call that writes out the plan(s) to specified HDFS files?

We could do both (write on any change/set call) and/or on-demand.

Thks
Amol


On Wed, Nov 23, 2016 at 2:40 PM, Sanjay Pujare 
wrote:

> To help Apex developers/users with debugging or troubleshooting “dead”
> applications, I am proposing a new feature to persist logical and physical
> plan snapshots in HDFS.
>
>
>
> Similar to how the Apex engine persists container data per application
> attempt in HDFS as containers_NNN.json (where NNN is 1 for first app
> attempt, 2 for the second app attempt and so on), we will create 2 more
> sets of files under the …/apps/{appId} directory for an application:
>
>
>
> logicalPlan_NNN_MMM.json
>
> physicalPlan_NNN_MMM.json
>
>
>
> where NNN stands for the app attempt index (similar to NNN above 1, 2, 3
> and so on) and MMM is a running index starting at 1 which stands for a
> snapshot within an app attempt. Note that a logical or physical plan may
> change within an app attempt for any number of reasons.
>
>
>
> The StreamingContainerManager class maintains the current logical/physical
> plans in the “plan” member variable. New methods will be added in
> StreamingContainerManager to save the logical or physical plan as JSON
> representations in the app directory (as described above). The logic is
> similar to 
com.datatorrent.stram.webapp.StramWebServices.getLogicalPlan(String)
> and com.datatorrent.stram.webapp.StramWebServices.getPhysicalPlan() used
> inside the Stram Web service. There will be running indexes in
> StreamingContainerManager to keep track of MMM for the logical plan and
> physical plan. The appropriate save method will be called on the 
occurrence
> of any event that updates the logical or physical plan for example:
>
>
>
> inside com.datatorrent.stram.StreamingContainerManager.
> LogicalPlanChangeRunnable.call()  for logical plan change event
>
>
>
> inside 
com.datatorrent.stram.plan.physical.PhysicalPlan.redoPartitions(PMapping,
> String) for physical plan change event (i.e. redoing partitioning)
>
>
>
> Once these files are created, any user or a tool (such as the Apex CLI or
> the DT Gateway) can look up these files for troubleshooting/researching of
> “dead” applications and significant events in their lifetime in terms of
> logical or physical plan changes. Pls send me your feedback.
>
>
>
> Sanjay
>
>
>
>





Re: Proposing a new feature to persist logical and physical plan snapshots in HDFS

2016-11-23 Thread Amol Kekre
Good idea. Stram does save state, and maybe a script that translates may
work. But explicit plan saving is also a good idea. Could this be "on
demand"? A REST call that writes out the plan(s) to specified HDFS files?

We could do both (write on any change/set call) and/or on-demand.

Thks
Amol


On Wed, Nov 23, 2016 at 2:40 PM, Sanjay Pujare 
wrote:

> To help Apex developers/users with debugging or troubleshooting “dead”
> applications, I am proposing a new feature to persist logical and physical
> plan snapshots in HDFS.
>
>
>
> Similar to how the Apex engine persists container data per application
> attempt in HDFS as containers_NNN.json (where NNN is 1 for first app
> attempt, 2 for the second app attempt and so on), we will create 2 more
> sets of files under the …/apps/{appId} directory for an application:
>
>
>
> logicalPlan_NNN_MMM.json
>
> physicalPlan_NNN_MMM.json
>
>
>
> where NNN stands for the app attempt index (similar to NNN above 1, 2, 3
> and so on) and MMM is a running index starting at 1 which stands for a
> snapshot within an app attempt. Note that a logical or physical plan may
> change within an app attempt for any number of reasons.
>
>
>
> The StreamingContainerManager class maintains the current logical/physical
> plans in the “plan” member variable. New methods will be added in
> StreamingContainerManager to save the logical or physical plan as JSON
> representations in the app directory (as described above). The logic is
> similar to 
> com.datatorrent.stram.webapp.StramWebServices.getLogicalPlan(String)
> and com.datatorrent.stram.webapp.StramWebServices.getPhysicalPlan() used
> inside the Stram Web service. There will be running indexes in
> StreamingContainerManager to keep track of MMM for the logical plan and
> physical plan. The appropriate save method will be called on the occurrence
> of any event that updates the logical or physical plan for example:
>
>
>
> inside com.datatorrent.stram.StreamingContainerManager.
> LogicalPlanChangeRunnable.call()  for logical plan change event
>
>
>
> inside 
> com.datatorrent.stram.plan.physical.PhysicalPlan.redoPartitions(PMapping,
> String) for physical plan change event (i.e. redoing partitioning)
>
>
>
> Once these files are created, any user or a tool (such as the Apex CLI or
> the DT Gateway) can look up these files for troubleshooting/researching of
> “dead” applications and significant events in their lifetime in terms of
> logical or physical plan changes. Pls send me your feedback.
>
>
>
> Sanjay
>
>
>
>


Proposing a new feature to persist logical and physical plan snapshots in HDFS

2016-11-23 Thread Sanjay Pujare
To help Apex developers/users with debugging or troubleshooting “dead” 
applications, I am proposing a new feature to persist logical and physical plan 
snapshots in HDFS.

 

Similar to how the Apex engine persists container data per application attempt 
in HDFS as containers_NNN.json (where NNN is 1 for first app attempt, 2 for the 
second app attempt and so on), we will create 2 more sets of files under the 
…/apps/{appId} directory for an application:

 

logicalPlan_NNN_MMM.json

physicalPlan_NNN_MMM.json

 

where NNN stands for the app attempt index (similar to NNN above 1, 2, 3 and so 
on) and MMM is a running index starting at 1 which stands for a snapshot within 
an app attempt. Note that a logical or physical plan may change within an app 
attempt for any number of reasons.

 

The StreamingContainerManager class maintains the current logical/physical 
plans in the “plan” member variable. New methods will be added in 
StreamingContainerManager to save the logical or physical plan as JSON 
representations in the app directory (as described above). The logic is similar 
to com.datatorrent.stram.webapp.StramWebServices.getLogicalPlan(String) and 
com.datatorrent.stram.webapp.StramWebServices.getPhysicalPlan() used inside the 
Stram Web service. There will be running indexes in StreamingContainerManager 
to keep track of MMM for the logical plan and physical plan. The appropriate 
save method will be called on the occurrence of any event that updates the 
logical or physical plan for example:

 

inside 
com.datatorrent.stram.StreamingContainerManager.LogicalPlanChangeRunnable.call()
  for logical plan change event

 

inside 
com.datatorrent.stram.plan.physical.PhysicalPlan.redoPartitions(PMapping, 
String) for physical plan change event (i.e. redoing partitioning)

 

Once these files are created, any user or a tool (such as the Apex CLI or the 
DT Gateway) can look up these files for troubleshooting/researching of “dead” 
applications and significant events in their lifetime in terms of logical or 
physical plan changes. Please send me your feedback.

 

Sanjay
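[Editor's note: the naming scheme proposed above (NNN = app attempt index, MMM = per-attempt running index starting at 1) can be sketched as follows. The class name, the unpadded index format, and the local-filesystem writes are illustrative assumptions; the proposal targets HDFS via new methods on StreamingContainerManager.]

```python
import json
import os
import tempfile


class PlanSnapshotWriter:
    """Sketch of the proposed snapshot naming: logicalPlan_NNN_MMM.json and
    physicalPlan_NNN_MMM.json under the app directory, where NNN is the app
    attempt and MMM is a running index starting at 1 within the attempt.
    (The padding/format of NNN and MMM is an assumption here.)"""

    def __init__(self, app_dir, attempt):
        self.app_dir = app_dir
        self.attempt = attempt
        # One running index per plan kind, as described in the proposal.
        self.counters = {"logicalPlan": 0, "physicalPlan": 0}

    def save(self, kind, plan):
        """kind is 'logicalPlan' or 'physicalPlan'; called on every plan change."""
        self.counters[kind] += 1
        name = "%s_%d_%d.json" % (kind, self.attempt, self.counters[kind])
        path = os.path.join(self.app_dir, name)
        with open(path, "w") as f:
            json.dump(plan, f)
        return path
```

Each call on a plan-change event produces a new immutable snapshot file, so a tool inspecting a dead application can replay the sequence of plan changes per attempt.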

 



Re: Adding new log4j appender to Apex core

2016-11-23 Thread Vlad Rozov

Both approaches look quite "hacky" to me.

Thank you,

Vlad


[jira] [Commented] (APEXMALHAR-2352) Improve performance of keyed windowed operators

2016-11-23 Thread bright chen (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXMALHAR-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15691426#comment-15691426
 ] 

bright chen commented on APEXMALHAR-2352:
-

I am not very sure about the reason.
Maybe in equals(), ImmutablePair uses instanceof, which needs to compare the
class hierarchy, while com.datatorrent.common.util.Pair just uses getClass().
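[Editor's note: the two equals() styles mentioned can be mimicked in Python as a loose analogy, with isinstance playing the role of Java's instanceof and type(...) that of getClass(). The classes below are illustrative, not the actual Java implementations, and say nothing about the relative performance of the two Java checks.]

```python
class PairInstanceof:
    """equals() in the ImmutablePair style: accepts subclasses (instanceof)."""

    def __init__(self, first, second):
        self.first, self.second = first, second

    def __eq__(self, other):
        # isinstance walks the class hierarchy, like Java's instanceof.
        return isinstance(other, PairInstanceof) and \
            (self.first, self.second) == (other.first, other.second)


class PairGetClass:
    """equals() in the com.datatorrent.common.util.Pair style: exact class match."""

    def __init__(self, first, second):
        self.first, self.second = first, second

    def __eq__(self, other):
        # type(...) is an exact comparison, like Java's getClass() check.
        return type(other) is type(self) and \
            (self.first, self.second) == (other.first, other.second)


class SubPairInstanceof(PairInstanceof):
    pass


class SubPairGetClass(PairGetClass):
    pass
```

The observable difference is that the instanceof style treats a subclass instance with equal fields as equal, while the getClass() style does not.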

> Improve performance of keyed windowed operators
> ---
>
> Key: APEXMALHAR-2352
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2352
> Project: Apache Apex Malhar
>  Issue Type: Task
>Reporter: bright chen
>Assignee: bright chen
>
> Refer to https://issues.apache.org/jira/browse/APEXMALHAR-2339 for the keyed
> windowed operator benchmark.





FOSDEM 2017 HPC, Bigdata and Data Science DevRoom CFP is closing soon

2016-11-23 Thread Roman Shaposhnik
Hi!

apologies for the extra wide distribution (this exhausts my once
a year ASF mail-to-all-bigdata-projects quota ;-)) but I wanted
to suggest that all of you should consider submitting talks
to FOSDEM 2017 HPC, Bigdata and Data Science DevRoom:
https://hpc-bigdata-fosdem17.github.io/

It was a great success this year and we hope to make it an even
bigger success in 2017.

Besides -- FOSDEM is the biggest gathering of open source
developers on the face of the earth -- don't miss it!

Thanks,
Roman.

P.S. If you have any questions -- please email me directly and
see you all in Brussels!


[jira] [Updated] (APEXCORE-563) Have a pointer to container log file name and offset in stram events that deliver a container or operator failure event.

2016-11-23 Thread Vlad Rozov (JIRA)

 [ 
https://issues.apache.org/jira/browse/APEXCORE-563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vlad Rozov updated APEXCORE-563:

Issue Type: New Feature  (was: Bug)

> Have a pointer to container log file name and offset in stram events that 
> deliver a container or operator failure event.
> 
>
> Key: APEXCORE-563
> URL: https://issues.apache.org/jira/browse/APEXCORE-563
> Project: Apache Apex Core
>  Issue Type: New Feature
>Reporter: Sanjay M Pujare
>Assignee: Priyanka Gugale
>
> The default DailyRollingFileAppender does not take into account how many
> backup files to keep, which will result in unbounded growth of log files,
> especially for long-running applications.
> The link below is an interesting add-on to the default DailyRollingFileAppender
> that supports maxBackupIndex:
> http://wiki.apache.org/logging-log4j/DailyRollingFileAppender





[jira] [Commented] (APEXCORE-526) Publish javadoc for releases on ASF infrastructure

2016-11-23 Thread Munagala V. Ramanath (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXCORE-526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15690911#comment-15690911
 ] 

Munagala V. Ramanath commented on APEXCORE-526:
---

Anything that can be generated with a sequence of shell commands can be added.
The Python code (and hence the documentation for it) can change as we add more
branches and features, but here is a quick summary of the current state:

*Buildbot* is a Python-based CI tool. Build services using buildbot are 
available
on Apache infrastructure with some very basic documentation at
[https://ci.apache.org/buildbot.html]
Included on that page are various links to view the status of recent builds.

Buildbot is configured via configuration files with the *{{.conf}}* extension; 
these files
are not just key-value pairs but rather full Python scripts.

The configuration files for Apache projects are in an SVN repo at:
[https://svn.apache.org/repos/infra/infrastructure/buildbot/aegis/buildmaster/master1/projects]
There are files for around 70 projects and looking at those files is a good way
to figure out how to set up a build.

The Apex build is in *{{apex.conf}}*. All configuration files are catenated 
together before
execution, so we need to be mindful of potential name clashes. The result of 
running the
configuration file should be to append suitable objects to the global map named 
simply *{{c}}*.
Commonly, we create and append a builder object to *{{c['builders']}}* and a 
scheduler object to
*{{c['schedulers']}}*.

A builder is just a key-value map containing things like the builder name, list 
of slaves
to run the build on, environment variable settings and, most importantly, a 
*BuildFactory*
which holds the various steps of the build.

In the Apex case, we define a function named *{{add_apex_malhar_builders()}}* 
and invoke it;
this minimizes the likelihood of global name clashes. We create a build factory 
containing
these steps:

{quote}
(a) Run git to check out the desired branch.
(b) Run mvn to generate the javadocs.
(c) Clean up the tmp directory.
(d) Upload files to master.
(e) Publish files to the appropriate directory on master.
(f) Clean up the tmp directory again.
{quote}

We explicitly set *JAVA_HOME* to point to 1.8 to avoid some build problems. 
This path
is not uniform on all the slaves and causes the build to fail sometimes 
depending
on which slave it runs on; a fix is pending.

We add a single scheduler which schedules a build every 24hrs.

Once the *{{apex.conf}}* file is checked back into the SVN repository after
changes are made, it should take effect a few minutes later.
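[Editor's note: as a rough illustration of the structure described above — a BuildFactory of steps, a builder appended to c['builders'], and a nightly scheduler appended to c['schedulers'] — an apex.conf fragment might look like the following. This is a hypothetical config fragment in the style of the old buildbot 0.8 API, not the actual apex.conf; the repo URL, step commands, slave names, and JAVA_HOME path are all illustrative assumptions.]

```python
# Hypothetical apex.conf fragment (buildbot 0.8-style API; names illustrative).
# The global map 'c' is provided by the buildmaster configuration.
from buildbot.process.factory import BuildFactory
from buildbot.steps.source import Git
from buildbot.steps.shell import ShellCommand
from buildbot.schedulers.timed import Nightly
from buildbot.config import BuilderConfig


def add_apex_malhar_builders():
    # Wrapping everything in a function minimizes global name clashes,
    # since all project .conf files are catenated before execution.
    f = BuildFactory()
    # (a) check out the desired branch
    f.addStep(Git(repourl='https://git-wip-us.apache.org/repos/asf/apex-core.git',
                  branch='master'))
    # (b) generate the javadocs; JAVA_HOME pinned to 1.8 (path is an assumption)
    f.addStep(ShellCommand(command=['mvn', 'javadoc:aggregate'],
                           env={'JAVA_HOME': '/usr/lib/jvm/java-8'}))
    # steps (c)-(f): cleanup, upload to master, publish, cleanup again ...

    c['builders'].append(BuilderConfig(name='apex-javadoc',
                                       slavenames=['slave1'],
                                       factory=f))
    # A single scheduler triggering the build roughly every 24 hours.
    c['schedulers'].append(Nightly(name='apex-javadoc-nightly',
                                   builderNames=['apex-javadoc'],
                                   hour=3))


add_apex_malhar_builders()
```

This is a configuration fragment rather than a standalone program: it only runs inside the buildmaster, which supplies the global map *{{c}}*.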


> Publish javadoc for releases on ASF infrastructure 
> ---
>
> Key: APEXCORE-526
> URL: https://issues.apache.org/jira/browse/APEXCORE-526
> Project: Apache Apex Core
>  Issue Type: Documentation
>Reporter: Thomas Weise
>
> Every release should have the javadocs published and we should have it linked 
> from the download page, as is the case with user docs.





[jira] [Comment Edited] (APEXCORE-526) Publish javadoc for releases on ASF infrastructure

2016-11-23 Thread Munagala V. Ramanath (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXCORE-526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15690911#comment-15690911
 ] 

Munagala V. Ramanath edited comment on APEXCORE-526 at 11/23/16 6:04 PM:
-

Anything that can be generated with a sequence of shell commands can be added. 
The Python code (and hence the documentation for it) can change as we add more 
branches and features, but here is a quick summary of the current state:

*Buildbot* is a Python-based CI tool. Build services using buildbot are 
available on Apache infrastructure, with some very basic documentation at
[https://ci.apache.org/buildbot.html]
Included on that page are various links to view the status of recent builds.

Buildbot is configured via configuration files with the *{{.conf}}* extension; 
these files
are not just key-value pairs but rather full Python scripts.

The configuration files for Apache projects are in an SVN repo at:
[https://svn.apache.org/repos/infra/infrastructure/buildbot/aegis/buildmaster/master1/projects]
There are files for around 70 projects, and looking at those files is a good way
to figure out how to set up a build.

The Apex build is in *{{apex.conf}}*. All configuration files are concatenated 
together before
execution, so we need to be mindful of potential name clashes. The result of 
running the
configuration file should be to append suitable objects to the global map named 
simply *{{c}}*.
Commonly, we create and append a builder object to *{{c['builders']}}* and a 
scheduler object to
*{{c['schedulers']}}*.

A builder is just a key-value map containing things like the builder name, list 
of slaves
to run the build on, environment variable settings and, most importantly, a 
*BuildFactory*
which holds the various steps of the build.

In the Apex case, we define a function named *{{add_apex_malhar_builders()}}* 
and invoke it;
this minimizes the likelihood of global name clashes. We create a build factory 
containing
these steps:

{quote}
(a) Run git to checkout desired branch.
(b) Run mvn to generate the javadocs.
(c) Cleanup tmp directory.
(d) Upload files to master.
(e) Publish files to appropriate directory on master.
(f) Cleanup tmp directory again.
{quote}

We explicitly set *JAVA_HOME* to point to JDK 1.8 to avoid some build problems. 
This path is not uniform across all the slaves and sometimes causes the build 
to fail depending on which slave it runs on; a fix is pending.

We add a single scheduler which schedules a build every 24 hours.
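The structure described above (a uniquely named function appending a key-value builder map and a scheduler to the shared global map *{{c}}*) can be sketched in plain Python. This is an illustration only: the step commands, slave names, JDK path, and field names below are assumptions, not the actual contents of *{{apex.conf}}*.

```python
# Illustrative sketch of the apex.conf pattern. All .conf files are
# concatenated and executed together, so each project wraps its additions
# in a uniquely named function to avoid global name clashes.
# Builder/scheduler field names and values here are assumptions.

def add_apex_malhar_builders(c):
    # Build factory: the ordered steps (a)-(f) from the summary above.
    factory_steps = [
        'git checkout <branch>',         # (a) checkout desired branch
        'mvn javadoc:javadoc',           # (b) generate the javadocs
        'rm -rf /tmp/apex-javadoc',      # (c) clean tmp directory
        'upload files to master',        # (d)
        'publish files to target dir',   # (e)
        'rm -rf /tmp/apex-javadoc',      # (f) clean tmp again
    ]
    # A builder is just a key-value map: name, slaves, env, factory.
    c['builders'].append({
        'name': 'apex-javadoc',
        'slavenames': ['slave1', 'slave2'],           # assumed slave names
        'env': {'JAVA_HOME': '/usr/lib/jvm/java-8'},  # force JDK 1.8
        'factory': factory_steps,
    })
    # Single scheduler: one build every 24 hours.
    c['schedulers'].append({
        'name': 'apex-javadoc-nightly',
        'builderNames': ['apex-javadoc'],
        'periodicBuildTimer': 24 * 60 * 60,           # seconds
    })

# The buildmaster supplies the global map c; simulated here.
c = {'builders': [], 'schedulers': []}
add_apex_malhar_builders(c)
```

Since all the *.conf* files run in one shared namespace, keeping everything inside {{add_apex_malhar_builders()}} is what prevents clashes with the roughly 70 other project files.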

Once the *{{apex.conf}}* file is checked back into the SVN repository after 
changes are made, they should take effect a few minutes later.



was (Author: dtram):
Anything that can be generated with a sequence of shell commands can be added. 
The Python code (and hence the documentation for it) can change as we add more 
branches and features, but here is a quick summary of the current state:

*Buildbot* is a Python-based CI tool. Build services using buildbot are 
available on Apache infrastructure, with some very basic documentation at
[https://ci.apache.org/buildbot.html]
Included on that page are various links to view the status of recent builds.

Buildbot is configured via configuration files with the *{{.conf}}* extension; 
these files
are not just key-value pairs but rather full Python scripts.

The configuration files for Apache projects are in an SVN repo at:
[https://svn.apache.org/repos/infra/infrastructure/buildbot/aegis/buildmaster/master1/projects]
There are files for around 70 projects, and looking at those files is a good way
to figure out how to set up a build.

The Apex build is in *{{apex.conf}}*. All configuration files are concatenated 
together before
execution, so we need to be mindful of potential name clashes. The result of 
running the
configuration file should be to append suitable objects to the global map named 
simply *{{c}}*.
Commonly, we create and append a builder object to *{{c['builders']}}* and a 
scheduler object to
*{{c['schedulers']}}*.

A builder is just a key-value map containing things like the builder name, list 
of slaves
to run the build on, environment variable settings and, most importantly, a 
*BuildFactory*
which holds the various steps of the build.

In the Apex case, we define a function named *{{add_apex_malhar_builders()}}* 
and invoke it;
this minimizes the likelihood of global name clashes. We create a build factory 
containing
these steps:

{quote}
(a) Run git to checkout desired branch.
(b) Run mvn to generate the javadocs.
(c) Cleanup tmp directory.
(d) Upload files to master.
(e) Publish files to appropriate directory on master.
(f) Cleanup tmp directory again.
{quote}

We explicitly set *JAVA_HOME* to point to JDK 1.8 to avoid some build problems. 
This path is not uniform across all the slaves and sometimes causes the build 
to fail depending on which slave it runs on; a fix is pending.

We add a single scheduler which schedules a build every 24 hours.

[jira] [Commented] (APEXCORE-563) Have a pointer to container log file name and offset in stram events that deliver a container or operator failure event.

2016-11-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXCORE-563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15689980#comment-15689980
 ] 

ASF GitHub Bot commented on APEXCORE-563:
-

GitHub user DT-Priyanka opened a pull request:

https://github.com/apache/apex-core/pull/421

APEXCORE-563: Adding APEXRFA appender to roll log files without renam…

…ing them, also adding log file name and offset in events

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/DT-Priyanka/incubator-apex-core 
APEXCORE-563-log-appender

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/apex-core/pull/421.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #421


commit f6039b232df9aa6c463871c109b93a87147e268a
Author: Priyanka Gugale 
Date:   2016-11-23T08:43:12Z

APEXCORE-563: Adding APEXRFA appender to roll log files without renaming 
them, also adding log file name and offset in events




> Have a pointer to container log file name and offset in stram events that 
> deliver a container or operator failure event.
> 
>
> Key: APEXCORE-563
> URL: https://issues.apache.org/jira/browse/APEXCORE-563
> Project: Apache Apex Core
>  Issue Type: Bug
>Reporter: Sanjay M Pujare
>Assignee: Priyanka Gugale
>
> The default DailyRollingFileAppender does not take into account how many 
> backup files to keep, which will result in unbounded growth of log files, 
> especially for long-running applications.
> Below is an interesting add-on to the default DailyRollingFileAppender 
> that supports maxBackupIndex:
> http://wiki.apache.org/logging-log4j/DailyRollingFileAppender





[GitHub] apex-core pull request #421: APEXCORE-563: Adding APEXRFA appender to roll l...

2016-11-23 Thread DT-Priyanka
GitHub user DT-Priyanka opened a pull request:

https://github.com/apache/apex-core/pull/421

APEXCORE-563: Adding APEXRFA appender to roll log files without renam…

…ing them, also adding log file name and offset in events

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/DT-Priyanka/incubator-apex-core 
APEXCORE-563-log-appender

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/apex-core/pull/421.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #421


commit f6039b232df9aa6c463871c109b93a87147e268a
Author: Priyanka Gugale 
Date:   2016-11-23T08:43:12Z

APEXCORE-563: Adding APEXRFA appender to roll log files without renaming 
them, also adding log file name and offset in events




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Resolved] (APEXMALHAR-2353) timeExpression should not be null for time based Dedup

2016-11-23 Thread Chinmay Kolhatkar (JIRA)

 [ 
https://issues.apache.org/jira/browse/APEXMALHAR-2353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chinmay Kolhatkar resolved APEXMALHAR-2353.
---
   Resolution: Fixed
Fix Version/s: 3.6.0

> timeExpression should not be null for time based Dedup  
> 
>
> Key: APEXMALHAR-2353
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2353
> Project: Apache Apex Malhar
>  Issue Type: Bug
>Reporter: Bhupesh Chawda
>Assignee: Bhupesh Chawda
>Priority: Minor
> Fix For: 3.6.0
>
>
> Time Based Dedup has timeExpression as optional. It has to be supplied by the 
> user.
> In the current setting, if the user does not specify a timeExpression, then a 
> different time (System time) will be passed for each tuple, irrespective of 
> whether the tuple is a duplicate or a unique. This is a bug since even the 
> duplicate tuples may fall in different buckets and will be concluded as 
> unique. 





[jira] [Commented] (APEXMALHAR-2353) timeExpression should not be null for time based Dedup

2016-11-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXMALHAR-2353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15689945#comment-15689945
 ] 

ASF GitHub Bot commented on APEXMALHAR-2353:


Github user asfgit closed the pull request at:

https://github.com/apache/apex-malhar/pull/508


> timeExpression should not be null for time based Dedup  
> 
>
> Key: APEXMALHAR-2353
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2353
> Project: Apache Apex Malhar
>  Issue Type: Bug
>Reporter: Bhupesh Chawda
>Assignee: Bhupesh Chawda
>Priority: Minor
> Fix For: 3.6.0
>
>
> Time Based Dedup has timeExpression as optional. It has to be supplied by the 
> user.
> In the current setting, if the user does not specify a timeExpression, then a 
> different time (System time) will be passed for each tuple, irrespective of 
> whether the tuple is a duplicate or a unique. This is a bug since even the 
> duplicate tuples may fall in different buckets and will be concluded as 
> unique. 





[GitHub] apex-malhar pull request #508: APEXMALHAR-2353 Made timeExpression as @NotNu...

2016-11-23 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/apex-malhar/pull/508




[jira] [Commented] (APEXMALHAR-2340) Initialize the list of JdbcFieldInfo in JdbcPOJOInsertOutput from properties.xml

2016-11-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXMALHAR-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15689355#comment-15689355
 ] 

ASF GitHub Bot commented on APEXMALHAR-2340:


GitHub user Hitesh-Scorpio reopened a pull request:

https://github.com/apache/apex-malhar/pull/507

APEXMALHAR-2340 code changes to initialize the list of JdbcFieldInfo …

…in JdbcPOJOInsertOutput from properties.xml @bhupeshchawda please review

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Hitesh-Scorpio/apex-malhar 
APEXMALHAR-2340_ConfigureJdbcExternally

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/apex-malhar/pull/507.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #507


commit 6a1d722007aa69aa16e01d6a27a2d81ee98a5d06
Author: Hitesh-Scorpio 
Date:   2016-11-17T05:19:20Z

APEXMALHAR-2340 code changes to initialize the list of JdbcFieldInfo in 
JdbcPOJOInsertOutput from properties.xml




> Initialize the list of JdbcFieldInfo in JdbcPOJOInsertOutput from 
> properties.xml
> 
>
> Key: APEXMALHAR-2340
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2340
> Project: Apache Apex Malhar
>  Issue Type: Improvement
>Reporter: Hitesh Kapoor
>Assignee: Hitesh Kapoor
>
> Currently the list of JdbcFieldInfo is populated using java code.
> This should be done using properties.xml file.





[jira] [Commented] (APEXMALHAR-2340) Initialize the list of JdbcFieldInfo in JdbcPOJOInsertOutput from properties.xml

2016-11-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXMALHAR-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15689356#comment-15689356
 ] 

ASF GitHub Bot commented on APEXMALHAR-2340:


Github user Hitesh-Scorpio closed the pull request at:

https://github.com/apache/apex-malhar/pull/507


> Initialize the list of JdbcFieldInfo in JdbcPOJOInsertOutput from 
> properties.xml
> 
>
> Key: APEXMALHAR-2340
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2340
> Project: Apache Apex Malhar
>  Issue Type: Improvement
>Reporter: Hitesh Kapoor
>Assignee: Hitesh Kapoor
>
> Currently the list of JdbcFieldInfo is populated using java code.
> This should be done using properties.xml file.





[GitHub] apex-malhar pull request #507: APEXMALHAR-2340 code changes to initialize th...

2016-11-23 Thread Hitesh-Scorpio
GitHub user Hitesh-Scorpio reopened a pull request:

https://github.com/apache/apex-malhar/pull/507

APEXMALHAR-2340 code changes to initialize the list of JdbcFieldInfo …

…in JdbcPOJOInsertOutput from properties.xml @bhupeshchawda please review

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Hitesh-Scorpio/apex-malhar 
APEXMALHAR-2340_ConfigureJdbcExternally

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/apex-malhar/pull/507.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #507


commit 6a1d722007aa69aa16e01d6a27a2d81ee98a5d06
Author: Hitesh-Scorpio 
Date:   2016-11-17T05:19:20Z

APEXMALHAR-2340 code changes to initialize the list of JdbcFieldInfo in 
JdbcPOJOInsertOutput from properties.xml






[GitHub] apex-malhar pull request #507: APEXMALHAR-2340 code changes to initialize th...

2016-11-23 Thread Hitesh-Scorpio
Github user Hitesh-Scorpio closed the pull request at:

https://github.com/apache/apex-malhar/pull/507




[jira] [Updated] (APEXMALHAR-2353) timeExpression should not be null for time based Dedup

2016-11-23 Thread Bhupesh Chawda (JIRA)

 [ 
https://issues.apache.org/jira/browse/APEXMALHAR-2353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bhupesh Chawda updated APEXMALHAR-2353:
---
Description: 
Time Based Dedup has timeExpression as optional. It has to be supplied by the 
user.

In the current setting, if the user does not specify a timeExpression, then a 
different time (System time) will be passed for each tuple, irrespective of 
whether the tuple is a duplicate or a unique. This is a bug since even the 
duplicate tuples may fall in different buckets and will be concluded as unique. 

  was:Time Based Dedup has timeExpression as optional. It has to be supplied by 
the user.


> timeExpression should not be null for time based Dedup  
> 
>
> Key: APEXMALHAR-2353
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2353
> Project: Apache Apex Malhar
>  Issue Type: Bug
>Reporter: Bhupesh Chawda
>Assignee: Bhupesh Chawda
>Priority: Minor
>
> Time Based Dedup has timeExpression as optional. It has to be supplied by the 
> user.
> In the current setting, if the user does not specify a timeExpression, then a 
> different time (System time) will be passed for each tuple, irrespective of 
> whether the tuple is a duplicate or a unique. This is a bug since even the 
> duplicate tuples may fall in different buckets and will be concluded as 
> unique. 





[jira] [Commented] (APEXMALHAR-2353) timeExpression should not be null for time based Dedup

2016-11-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXMALHAR-2353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15689342#comment-15689342
 ] 

ASF GitHub Bot commented on APEXMALHAR-2353:


GitHub user bhupeshchawda opened a pull request:

https://github.com/apache/apex-malhar/pull/508

APEXMALHAR-2353 Made timeExpression as @NotNull

@chinmaykolhatkar Please review and merge

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/bhupeshchawda/apex-malhar 
APEXMALHAR-2353-dedup-timeExpression

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/apex-malhar/pull/508.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #508


commit f617d5e35b08412bcee8fdf0a40e0bc4d317329a
Author: bhupeshchawda 
Date:   2016-11-23T08:25:50Z

APEXMALHAR-2353 Made timeExpression as @NotNull




> timeExpression should not be null for time based Dedup  
> 
>
> Key: APEXMALHAR-2353
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2353
> Project: Apache Apex Malhar
>  Issue Type: Bug
>Reporter: Bhupesh Chawda
>Assignee: Bhupesh Chawda
>Priority: Minor
>
> Time Based Dedup has timeExpression as optional. It has to be supplied by the 
> user.
> In the current setting, if the user does not specify a timeExpression, then a 
> different time (System time) will be passed for each tuple, irrespective of 
> whether the tuple is a duplicate or a unique. This is a bug since even the 
> duplicate tuples may fall in different buckets and will be concluded as 
> unique. 





[GitHub] apex-malhar pull request #508: APEXMALHAR-2353 Made timeExpression as @NotNu...

2016-11-23 Thread bhupeshchawda
GitHub user bhupeshchawda opened a pull request:

https://github.com/apache/apex-malhar/pull/508

APEXMALHAR-2353 Made timeExpression as @NotNull

@chinmaykolhatkar Please review and merge

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/bhupeshchawda/apex-malhar 
APEXMALHAR-2353-dedup-timeExpression

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/apex-malhar/pull/508.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #508


commit f617d5e35b08412bcee8fdf0a40e0bc4d317329a
Author: bhupeshchawda 
Date:   2016-11-23T08:25:50Z

APEXMALHAR-2353 Made timeExpression as @NotNull






[jira] [Created] (APEXMALHAR-2353) timeExpression should not be null for time based Dedup

2016-11-23 Thread Bhupesh Chawda (JIRA)
Bhupesh Chawda created APEXMALHAR-2353:
--

 Summary: timeExpression should not be null for time based Dedup  
 Key: APEXMALHAR-2353
 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2353
 Project: Apache Apex Malhar
  Issue Type: Bug
Reporter: Bhupesh Chawda
Assignee: Bhupesh Chawda
Priority: Minor


Time Based Dedup has timeExpression as optional. It has to be supplied by the 
user.





Re: Adding new log4j appender to Apex core

2016-11-23 Thread Mohit Jotwani
+1 - Approach 2

Regards,
Mohit

On Wed, Nov 23, 2016 at 12:35 PM, AJAY GUPTA  wrote:

> +1 for approach 2.
>
>
> Regards,
> Ajay
>
> On Wed, Nov 23, 2016 at 12:16 PM, David Yan  wrote:
>
> > The goal of this log4j appender is to provide a log offset and the fixed
> > name of the container log file (instead of apex.log becoming apex.log.1 and
> > then apex.log.2, etc. due to rotation) as part of an error STRAM event so
> > users can easily locate the log entries around the error.
> >
> > The user can override the appender, but in that case, the engine detects
> > that and will not include the log location as part of the STRAM event.
> >
> > David
> >
> > On Tue, Nov 22, 2016 at 7:10 PM, Priyanka Gugale <
> priya...@datatorrent.com
> > >
> > wrote:
> >
> > > Hi,
> > >
> > > Thomas,
> > > Yes, log4j is ultimately owned by the user, and they should be able to
> > > override it. What I am trying to do is provide a default behavior for
> > > Apex. In case the user isn't using any logger of their own, we should use
> > > this new Apex appender rather than the standard log4j appender as per the
> > > Hadoop config.
> > >
> > > Sanjay,
> > > The archetype is a good place to put this and I will add it there, but
> > > many times people won't use it, so I wanted to keep it at ~/.dt as well.
> > > Is there any other default config folder for Apex?
> > >
> > > Also, I am not relying on anything. If we fail to find the config in the
> > > app jar or ~/.dt, we are going to skip usage of this new appender.
> > >
> > > -Priyanka
> > >
> > > On Wed, Nov 23, 2016 at 5:58 AM, Sanjay Pujare  >
> > > wrote:
> > >
> > > > The only way to “enforce” this new appender is to update the archetypes
> > > > (apex-app-archetype and apex-conf-archetype under apex-core/) to use
> > > > the new ones as default. But there does not seem to be a way to enforce
> > > > this for anyone not using the archetypes.
> > > >
> > > > I agree with not relying on ~/.dt in apex-core.
> > > >
> > > > On 11/22/16, 1:08 PM, "Thomas Weise"  wrote:
> > > >
> > > > The log4j configuration is ultimately owned by the user, so how do you
> > > > want to enforce a custom appender?
> > > >
> > > > I don't think that this should rely on anything in ~/.dt either
> > > >
> > > > Thomas
> > > >
> > > > On Tue, Nov 22, 2016 at 10:00 AM, Priyanka Gugale <
> > > > priya...@datatorrent.com>
> > > > wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > I am working on APEXCORE-563
> > > > > 
> > > > > As per this Jira we should put the log file name in
> > > > > container/operator events.
> > > > > The problem is that the current RollingFileAppender keeps renaming
> > > > > files from 1 to 2 to ... n as files reach the maximum allowed file
> > > > > size. Because of this constant renaming of files, we can't put a
> > > > > fixed file name in the stram event.
> > > > >
> > > > > To overcome this, I would like to add a new log4j appender to
> > > > > ApexCore.
> > > > > There are two ways I can implement this:
> > > > > 1. Have a daily rolling file appender. The current file will be
> > > > > recognized based on the timestamp in the file name. Also, to control
> > > > > the max file size, we need to keep rolling files based on size as
> > > > > well.
> > > > > 2. Have a rolling file appender but do not rename files. When the
> > > > > max file size is reached, create a new file with the next number,
> > > > > e.g. create log file dt.log.2 after dt.log.1 is full. Also, to
> > > > > recognize the latest file, keep a softlink named dt.log pointing to
> > > > > the current log file.
> > > > >
> > > > > I would prefer to implement approach 2. Please provide your
> > > > > comments/feedback if you feel otherwise.
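The roll-without-rename scheme of approach 2 can be sketched in a few lines of Python. This is only an illustration of the numbering-plus-softlink idea, not the actual APEXRFA Java appender; the base name and the size threshold here are assumptions.

```python
import os
import tempfile

def roll(log_dir, base='dt.log', max_bytes=1024):
    """Open the next numbered log file once the current one is full.
    Files are never renamed (dt.log.1 stays dt.log.1), so a file name and
    offset recorded in a STRAM event at error time stay valid; a softlink
    named dt.log always points at the newest file."""
    link = os.path.join(log_dir, base)
    # Find the highest existing dt.log.N (the symlink itself is skipped).
    nums = [int(f.rsplit('.', 1)[1]) for f in os.listdir(log_dir)
            if f.startswith(base + '.') and f.rsplit('.', 1)[1].isdigit()]
    current = max(nums) if nums else 0
    path = os.path.join(log_dir, '%s.%d' % (base, current)) if current else None
    if current == 0 or os.path.getsize(path) >= max_bytes:
        current += 1
        path = os.path.join(log_dir, '%s.%d' % (base, current))
        open(path, 'a').close()          # create the new numbered file
        if os.path.islink(link):
            os.remove(link)              # repoint the softlink, never rename
        os.symlink(os.path.basename(path), link)
    return path

# Demonstration in a throwaway directory.
d = tempfile.mkdtemp()
first = roll(d)                          # creates dt.log.1
with open(first, 'w') as f:
    f.write('x' * 2048)                  # exceed max_bytes
second = roll(d)                         # creates dt.log.2, repoints dt.log
```

Because dt.log.1 is never renamed to dt.log.2, an offset recorded against it remains valid after later rolls; only the dt.log softlink moves.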
> > > > >
> > > > > Also, to use this new appender we need to use our custom
> > > > > log4j.properties file instead of the one present in the Hadoop conf.
> > > > > For that we need to set the JVM option -Dlog4j.configuration. I am
> > > > > planning to update the file dt-site.xml in the folder ~/.dt and the
> > > > > default properties file available in the apex archetype to set JVM
> > > > > options as follows:
> > > > >  <property>
> > > > >    <name>dt.attr.CONTAINER_JVM_OPTIONS</name>
> > > > >    <value>-Dlog4j.configuration=log4j.props</value>
> > > > >  </property>
> > > > >
> > > > > And I will copy the log4j.props file into the ~/.dt folder as well
> > > > > as the apex archetypes.
> > > > >
> > > > > Lastly, if someone still misses this new log4j properties file or
> > > > > the JVM option to set -Dlog4j.configuration, we will not put the log
> > > > > file name in events raised by containers or operators.
> > > > >
> > > > > Please provide your feedback on this approach.
> > > > >
>