[jira] [Created] (APEXMALHAR-2363) Kafka input operator needs a force exit in case it can't recover from application offsets

2016-12-01 Thread Siyuan Hua (JIRA)
Siyuan Hua created APEXMALHAR-2363:
--

 Summary: Kafka input operator needs a force exit in case it can't 
recover from application offsets
 Key: APEXMALHAR-2363
 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2363
 Project: Apache Apex Malhar
  Issue Type: Improvement
Reporter: Siyuan Hua


Nowadays, when the initial offset of kafka is set to APPLICATION_OR_LATEST, and 
if some reason the consumer is temporarily not able to resume from application 
offset, it will fall into latest offset immediately.  
We should 1. give it a retry option
2. give a force exit option



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [VOTE] Apache Apex Malhar Release 3.6.0 (RC1)

2016-11-30 Thread Siyuan Hua
+1

Verified checksums
Verified compilation
Verified build and test
Verified pi demo

On Wed, Nov 30, 2016 at 9:50 AM, Tushar Gosavi 
wrote:

> +1
>
> Verified checksums
> Verified compilation
>
> - Tushar.
>
>
> On Wed, Nov 30, 2016 at 7:43 PM, Thomas Weise  wrote:
> > Can folks please verify the release.
> >
> > Thanks
> >
> > --
> > sent from mobile
> > On Nov 26, 2016 6:32 PM, "Thomas Weise"  wrote:
> >
> >> Dear Community,
> >>
> >> Please vote on the following Apache Apex Malhar 3.6.0 release candidate.
> >>
> >> This is a source release with binary artifacts published to Maven.
> >>
> >> This release is based on Apex Core 3.4 and resolves 69 issues.
> >>
> >> The release adds first iteration of SQL support via Apache Calcite, an
> >> alternative Cassandra output operator (non-transactional, upsert based),
> >> enrichment operator, improvements to window storage and new user
> >> documentation for several operators along with many other enhancements
> and
> >> bug fixes.
> >>
> >> List of all issues fixed: https://s.apache.org/9b0t
> >> User documentation: http://apex.apache.org/docs/malhar-3.6/
> >>
> >> Staging directory:
> >> https://dist.apache.org/repos/dist/dev/apex/apache-apex-
> malhar-3.6.0-RC1/
> >> Source zip:
> >> https://dist.apache.org/repos/dist/dev/apex/apache-apex-
> >> malhar-3.6.0-RC1/apache-apex-malhar-3.6.0-source-release.zip
> >> Source tar.gz:
> >> https://dist.apache.org/repos/dist/dev/apex/apache-apex-
> >> malhar-3.6.0-RC1/apache-apex-malhar-3.6.0-source-release.tar.gz
> >> Maven staging repository:
> >> https://repository.apache.org/content/repositories/orgapacheapex-1020/
> >>
> >> Git source:
> >> https://git-wip-us.apache.org/repos/asf?p=apex-malhar.git;a=
> >> commit;h=refs/tags/v3.6.0-RC1
> >>  (commit: 43d524dc5d5326b8d94593901cad026528bb62a1)
> >>
> >> PGP key:
> >> http://pgp.mit.edu:11371/pks/lookup?op=vindex=t...@apache.org
> >> KEYS file:
> >> https://dist.apache.org/repos/dist/release/apex/KEYS
> >>
> >> More information at:
> >> http://apex.apache.org
> >>
> >> Please try the release and vote; vote will be open util Wed, 11/30 EOD
> PST
> >> considering the US holiday weekend.
> >>
> >> [ ] +1 approve (and what verification was done)
> >> [ ] -1 disapprove (and reason why)
> >>
> >> http://www.apache.org/foundation/voting.html
> >>
> >> How to verify release candidate:
> >>
> >> http://apex.apache.org/verification.html
> >>
> >> Thanks,
> >> Thomas
> >>
> >>
>


Re: Malhar release 3.6

2016-11-20 Thread Siyuan Hua
Yes yes,
We need to put the clear plan upfront

This ticket was thought and scheduled to be done after the APEXMALHAR-2271
SpillableSetMultimap which bucketize elements on the time from values.
But per discussion with David,  we realized this ticket is more important
and affect the general purging functionality so I just jumped to this issue
late last week.
Now the status is the pull request is there. I feel it can be merged soon.
https://github.com/apache/apex-malhar/pull/503

Regards,
Siyuan

On Sat, Nov 19, 2016 at 12:56 AM, Thomas Weise <t...@apache.org> wrote:

> David,
>
> APEXMALHAR-2301 is still open. Are we on track to complete the JIRAs by
> Monday or should we move to next release?
>
> Siyuan,
>
> There is a question on the ticket from a month ago that wasn't addressed. I
> think there needs to be more visibility on the thought process and
> decisions made, otherwise it is very difficult for others to follow. Open
> collaboration is key.
>
> Thanks
>
>
> On Thu, Nov 17, 2016 at 8:29 PM, David Yan <da...@datatorrent.com> wrote:
>
> > Hi Thomas,
> >
> > We would like to finish the following tickets if possible for the 3.6.0
> > release:
> >
> > https://issues.apache.org/jira/browse/APEXMALHAR-2301 (ETA: 11/18)
> > https://issues.apache.org/jira/browse/APEXMALHAR-2339 (ETA: 11/21)
> > https://issues.apache.org/jira/browse/APEXMALHAR-2345 (ETA: 11/21)
> >
> > For https://issues.apache.org/jira/browse/APEXMALHAR-2271, since this
> only
> > affects Session Windows and still quite some work needs to be done there,
> > we would like to defer that to the next release.
> >
> > Siyuan and Bright, please also chime in if I'm missing anything.
> >
> > David
> >
> > On Thu, Nov 17, 2016 at 2:49 AM, Thomas Weise <t...@apache.org> wrote:
> >
> > > David,
> > >
> > > Any update WRT APEXMALHAR-2130
> > > <https://issues.apache.org/jira/browse/APEXMALHAR-2130> /
> > APEXMALHAR-2301
> > > <https://issues.apache.org/jira/browse/APEXMALHAR-2301> ?
> > >
> > > I would like to cut first RC by end of the week.
> > >
> > > Thanks
> > >
> > > On Wed, Nov 16, 2016 at 12:44 AM, Thomas Weise <t...@apache.org> wrote:
> > >
> > > > It has been a while and the issues that we were waiting for are still
> > > > unresolved.
> > > >
> > > > https://issues.apache.org/jira/issues/?jql=fixVersion%
> > > > 20%3D%203.6.0%20AND%20project%20%3D%20APEXMALHAR%20and%
> > > 20resolution%20%3D%
> > > > 20unresolved%20ORDER%20BY%20status%20ASC
> > > >
> > > > I would suggest:
> > > >
> > > > APEXMALHAR-2300 <https://issues.apache.org/
> jira/browse/APEXMALHAR-2300
> > >
> > > -
> > > > defer
> > > > APEXMALHAR-2130 <https://issues.apache.org/
> jira/browse/APEXMALHAR-2130
> > >
> > > /
> > > > APEXMALHAR-2301 <https://issues.apache.org/
> jira/browse/APEXMALHAR-2301
> > >
> > > -
> > > > David/Siyuan can you please give an update and recommendation.
> > > > APEXMALHAR-2284 <https://issues.apache.org/
> jira/browse/APEXMALHAR-2284
> > >
> > > -
> > > > see discussion on the PR and JIRA. Also discussed it with Bhupesh and
> > > > Chinmay today and we think that this join operator should probably be
> > > > replaced with the new operator that is based on WindowOperator.
> > > > APEXMALHAR-2203 <https://issues.apache.org/
> jira/browse/APEXMALHAR-2203
> > >
> > > -
> > > > defer
> > > >
> > > > Please provide any feedback you may have within a day so that we can
> > > > unblock the release.
> > > >
> > > > Thanks,
> > > > Thomas
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > On Mon, Nov 7, 2016 at 6:36 AM, Chaitanya Chebolu <
> > > > chaita...@datatorrent.com> wrote:
> > > >
> > > >> Hi Thomas,
> > > >>
> > > >>I am working on APEXMALHAR-2284 and will open a PR in couple of
> > days.
> > > >>
> > > >> Regards,
> > > >> Chaitanya
> > > >>
> > > >> On Sun, Nov 6, 2016 at 10:51 PM, Thomas Weise <t...@apache.org>
> wrote:
> > > >>
> > > >> &g

[jira] [Commented] (APEXMALHAR-2301) Implement another TimeBucketAssigner to work with any time

2016-11-20 Thread Siyuan Hua (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXMALHAR-2301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15682657#comment-15682657
 ] 

Siyuan Hua commented on APEXMALHAR-2301:


Ok, there are couple things here, the current implementation(Assigner) is not 
easy to use to resolve the unbounded time buckets. You have to set the 
reference to infinite time and the expireduration to Long.MAX - 0 which is not 
very obvious. 
What we do is just assign time to a rounded time buckets(just by bucket 
duration)
For example if a bucket span is 1000, time 1001,1002,1003 will all be in bucket 
1000

Also the current "moving/sliding" mechanism is something we don't need in event 
time handling 

Another issue we need to fix is the bucket metadata current is stored in a 
array which is not good enough for a unbounded time buckets. I also need to 
change it to a map to make it suitable  for monotonically unlimited bucket ids. 

> Implement another TimeBucketAssigner to work with any time
> --
>
> Key: APEXMALHAR-2301
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2301
> Project: Apache Apex Malhar
>  Issue Type: Sub-task
>Reporter: Siyuan Hua
>Assignee: Siyuan Hua
> Fix For: 3.6.0
>
>
> The current TimeBucketAssigner only works for time that is close to current 
> system timestamp. But to support arbitrary event time, we need another 
> TimeBucketAssigner 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [DISCUSSION] Custom Control Tuples

2016-11-02 Thread Siyuan Hua
I will vote for approach 1.

First of all that one sounds easier to do to me. And I think idempotency is
important. It may run at the cost of higher latency but I think it is ok

And in addition, when in the future if users do need realtime control tuple
processing, we can always add the option on top of it.

So I vote for 1

Thanks,
Siyuan

On Wed, Nov 2, 2016 at 1:28 PM, Pradeep A. Dalvi  wrote:

> As a rule of thumb in any real time operating system, control tuples should
> always be handled using Priority Queues.
>
> We may try to control priorities by defining levels. And shall not
> be delivered at window boundaries.
>
> In short, control tuples shall never be treated as any other tuples in real
> time systems.
>
> On Thursday, November 3, 2016, David Yan  wrote:
>
> > Hi all,
> >
> > I would like to renew the discussion of control tuples.
> >
> > Last time, we were in a debate about whether:
> >
> > 1) the platform should enforce that control tuples are delivered at
> window
> > boundaries only
> >
> > or:
> >
> > 2) the platform should deliver control tuples just as other tuples and
> it's
> > the operator developers' choice whether to handle the control tuples as
> > they arrive or delay the processing till the next window boundary.
> >
> > To summarize the pros and cons:
> >
> > Approach 1: If processing control tuples results in changes of the
> behavior
> > of the operator, if idempotency needs to be preserved, the processing
> must
> > be done at window boundaries. This approach will save the operator
> > developers headache to ensure that. However, this will take away the
> > choices from the operator developer if they just need to process the
> > control tuples as soon as possible.
> >
> > Approach 2: The operator has a chance to immediately process control
> > tuples. This would be useful if latency is more valued than correctness.
> > However, if this would open the possibility for operator developers to
> > shoot themselves in the foot. This is especially true if there are
> multiple
> > input ports. as there is no easy way to guarantee processing order for
> > multiple input ports.
> >
> > We would like to arrive to a consensus and close this discussion soon
> this
> > time so we can start the work on this important feature.
> >
> > Thanks!
> >
> > David
> >
> > On Tue, Jun 28, 2016 at 10:04 AM, Vlad Rozov  > >
> > wrote:
> >
> > > It is not clear how operator will emit custom control tuple at window
> > > boundaries. One way is to cache/accumulate control tuples in the
> operator
> > > output port till window closes (END_WINDOW is inserted into the output
> > > sink) or only allow an operator to emit control tuples inside the
> > > endWindow(). The later is a slight variation of the operator output
> port
> > > caching behavior with the only difference that now the operator itself
> is
> > > responsible for caching/accumulating control tuples. Note that in many
> > > cases it will be necessary to postpone emitting payload tuples that
> > > logically come after the custom control tuple till the next window
> > begins.
> > >
> > > IMO, that too restrictive and in a case where input operator uses a
> push
> > > instead of a poll (for example, it provides an end point where remote
> > > agents may connect and publish/push data), control tuples may be used
> for
> > > connect/disconnect/watermark broadcast to (partitioned) downstream
> > > operators. In this case the platform just need to guarantee order
> barrier
> > > (any tuple emitted prior to a control tuple needs to be delivered prior
> > to
> > > the control tuple).
> > >
> > > Thank you,
> > >
> > > Vlad
> > >
> > >
> > >
> > > On 6/27/16 19:36, Amol Kekre wrote:
> > >
> > >> I agree with David. Allowing control tuples within a window (along
> with
> > >> data tuples) creates very dangerous situation where guarantees are
> > >> impacted. It is much safer to enable control tuples (send/receive) at
> > >> window boundaries (after END_WINDOW of window N, and before
> BEGIN_WINDOW
> > >> for window N+1). My take on David's list is
> > >>
> > >> 1. -> window boundaries -> Strong +1; there will be a big issue with
> > >> guarantees for operators with multiple ports. (see Thomas's response)
> > >> 2. -> All downstream windows -> +1, but there are situations; a caveat
> > >> could be "only to operators that implement control tuple
> > >> interface/listeners", which could effectively translates to "all
> > >> interested
> > >> downstream operators"
> > >> 3. Only Input operator can create control tuples -> -1; is restrictive
> > >> even
> > >> though most likely 95% of the time it will be input operators
> > >>
> > >> Thks,
> > >> Amol
> > >>
> > >>
> > >> On Mon, Jun 27, 2016 at 4:37 PM, Thomas Weise  > >
> > >> wrote:
> > >>
> > >> The windowing we discuss here is in general event time based, arrival
> > time
> > >>> is a 

Re: Malhar release 3.6

2016-10-28 Thread Siyuan Hua
+1

Thanks!

Sent from my iPhone

> On Oct 26, 2016, at 13:11, Thomas Weise  wrote:
> 
> Hi,
> 
> I'm proposing another release of Malhar in November. There are 49 issues
> marked for the release, including important bug fixes, new documentation,
> SQL support and the work for windowed operator state management:
> 
> https://issues.apache.org/jira/issues/?jql=fixVersion%20%3D%203.6.0%20AND%20project%20%3D%20APEXMALHAR%20ORDER%20BY%20status%20ASC
> 
> Currently there is at least one blocker, the join operator is broken after
> change in managed state. It also affects the SQL feature.
> 
> Thanks,
> Thomas


[jira] [Updated] (APEXMALHAR-2203) Control tuple port and watermark support in high-level API (version 1)

2016-10-20 Thread Siyuan Hua (JIRA)

 [ 
https://issues.apache.org/jira/browse/APEXMALHAR-2203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siyuan Hua updated APEXMALHAR-2203:
---
Summary: Control tuple port and watermark support in high-level API 
(version 1)  (was: Control tuple port and watermark support in high-level API)

> Control tuple port and watermark support in high-level API (version 1)
> --
>
> Key: APEXMALHAR-2203
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2203
> Project: Apache Apex Malhar
>  Issue Type: Improvement
>Affects Versions: 3.6.0
>        Reporter: Siyuan Hua
>    Assignee: Siyuan Hua
> Fix For: 3.6.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (APEXMALHAR-2203) Control tuple port and watermark support in high-level API (version 1)

2016-10-20 Thread Siyuan Hua (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXMALHAR-2203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15592494#comment-15592494
 ] 

Siyuan Hua commented on APEXMALHAR-2203:


Support _single_ control tuple port

Support both watermark in operator itself and/or injecting watermark generation 
after operator based on time or data tuple itself


> Control tuple port and watermark support in high-level API (version 1)
> --
>
> Key: APEXMALHAR-2203
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2203
> Project: Apache Apex Malhar
>  Issue Type: Improvement
>Affects Versions: 3.6.0
>        Reporter: Siyuan Hua
>    Assignee: Siyuan Hua
> Fix For: 3.6.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Make cli support high-level API and SQL

2016-10-20 Thread Siyuan Hua
Given we already have first version of high-level API and SQL PR is getting
close to be merged. And also we have ability to launch application
grammatically. I think it's nice to have our cli support both high-level
API and SQL directly so people can simply prototype what they need and/or
try out Apex with more flexibility.

Any thoughts?


Regards,
Siyuan


[jira] [Created] (APEXMALHAR-2301) Implement another TimeBucketAssigner to work with any time

2016-10-14 Thread Siyuan Hua (JIRA)
Siyuan Hua created APEXMALHAR-2301:
--

 Summary: Implement another TimeBucketAssigner to work with any time
 Key: APEXMALHAR-2301
 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2301
 Project: Apache Apex Malhar
  Issue Type: Sub-task
Reporter: Siyuan Hua
Assignee: Siyuan Hua


The current TimeBucketAssigner only works for time that is close to current 
system timestamp. But to support arbitrary event time, we need another 
TimeBucketAssigner 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (APEXMALHAR-2220) Move the FunctionOperator to Malhar library

2016-10-13 Thread Siyuan Hua (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXMALHAR-2220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15573091#comment-15573091
 ] 

Siyuan Hua commented on APEXMALHAR-2220:


[~d9liang] The malhar-library should be depends on malhar-stream. The classes 
related to this tickets can be moved from malhar-stream to malhar-library 
including the function interface and function operator. They should not depend 
on anything from malhar-stream

> Move the FunctionOperator to Malhar library
> ---
>
> Key: APEXMALHAR-2220
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2220
> Project: Apache Apex Malhar
>  Issue Type: Improvement
>    Reporter: Siyuan Hua
>Assignee: Dongming Liang
>
> FunctionOperator initially is just designed for high-level API and we think 
> it can also useful if people want to build stateless transformation and work 
> with other operator directly. FunctionOperator can be reused. Thus we should 
> move FO to malhar library



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (APEXMALHAR-2283) Refactor kafka output operator

2016-10-12 Thread Siyuan Hua (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXMALHAR-2283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15569248#comment-15569248
 ] 

Siyuan Hua commented on APEXMALHAR-2283:


There are couple of solutions to "exactly-once". To me they are all different 
and they all have different assumptions. 
First of all, if we assume there is unique message id, we can definitely use 
that for dedup, but that is not always the case, then we need appid and 
operatorid to do dedup.
Then this information can be store in either key or extra topic, I wouldn't say 
either of them is better than the other, really depends on how user's 
requirement.
And no matter what we do, as long as there are number of operators writes to 
same kafka partition, I'm afraid there is no way to do perfect dedup because we 
don't know the safe place to do dedup from or too late to do dedup(too much 
noise from the safe place if other operator instances are fully loaded)
I don't remember hashcode, but I think hashcode will include some false 
positive?

And there is other solution like, create topic and number of partitions 
automatically based on kafka operator instances, in this case, it is much 
easier, we are always use what messages needs to be dedup because one operator 
only write to one kafka partition. This solution, my understanding, is most 
reliable and some user might want it. But the metadata of that kafka topic is 
kind of automatic created and it's very hard to support dynamic partition in 
this case.

That's the reason why I say there is no general solution for exactly-once kafka 
output operator. We may need to provide different solutions in "examples" for 
people to choose from. 

Anyways, Sandesh, can you wrap up the current solution, post and discuss it in 
mailing list?

 

> Refactor kafka output operator
> --
>
> Key: APEXMALHAR-2283
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2283
> Project: Apache Apex Malhar
>  Issue Type: Improvement
>Reporter: Siyuan Hua
>Assignee: Siyuan Hua
>
> The abstract kafka output operator needs to be refactored
> 1. Needs to set some mandatory properties on operator level instead of kafka 
> property level.
> 2. More document and examples
> 3. Find a standard way to achieve exactly once in both 0.8 and 0.9
> More will be added when working on the ticket



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (APEXMALHAR-2283) Refactor kafka output operator

2016-10-07 Thread Siyuan Hua (JIRA)

 [ 
https://issues.apache.org/jira/browse/APEXMALHAR-2283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siyuan Hua updated APEXMALHAR-2283:
---
Assignee: Siyuan Hua

> Refactor kafka output operator
> --
>
> Key: APEXMALHAR-2283
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2283
> Project: Apache Apex Malhar
>  Issue Type: Improvement
>    Reporter: Siyuan Hua
>        Assignee: Siyuan Hua
>
> The abstract kafka output operator needs to be refactored
> 1. Needs to set some mandatory properties on operator level instead of kafka 
> property level.
> 2. More document.
> 3. Find a standard way to achieve exactly once in both 0.8 and 0.9
> More will be added when working on the ticket



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (APEXMALHAR-2283) Refactor kafka output operator

2016-10-07 Thread Siyuan Hua (JIRA)

 [ 
https://issues.apache.org/jira/browse/APEXMALHAR-2283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siyuan Hua updated APEXMALHAR-2283:
---
Description: 
The abstract kafka output operator needs to be refactored

1. Needs to set some mandatory properties on operator level instead of kafka 
property level.

2. More document and examples

3. Find a standard way to achieve exactly once in both 0.8 and 0.9

More will be added when working on the ticket

  was:
The abstract kafka output operator needs to be refactored

1. Needs to set some mandatory properties on operator level instead of kafka 
property level.

2. More document.

3. Find a standard way to achieve exactly once in both 0.8 and 0.9

More will be added when working on the ticket


> Refactor kafka output operator
> --
>
> Key: APEXMALHAR-2283
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2283
> Project: Apache Apex Malhar
>  Issue Type: Improvement
>    Reporter: Siyuan Hua
>        Assignee: Siyuan Hua
>
> The abstract kafka output operator needs to be refactored
> 1. Needs to set some mandatory properties on operator level instead of kafka 
> property level.
> 2. More document and examples
> 3. Find a standard way to achieve exactly once in both 0.8 and 0.9
> More will be added when working on the ticket



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Many FileSplitterInputTest.testRecoveryOfPartialFile test failures

2016-10-06 Thread Siyuan Hua
There are many com.datatorrent.lib.io.fs.FileSplitterInputTest.
testRecoveryOfPartialFile

 test  failures in Jenkins CI build. Can anyone take a look? Thanks!


https://builds.apache.org/job/Apex_Malhar_PR/33/
https://builds.apache.org/job/Apex_Malhar_PR/27/
https://builds.apache.org/job/Apex_Malhar_PR/35/


Regards,
Siyuan


[jira] [Commented] (APEXMALHAR-2276) ManagedState: value of a key does not get over-written in the same time bucket

2016-10-04 Thread Siyuan Hua (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXMALHAR-2276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15546598#comment-15546598
 ] 

Siyuan Hua commented on APEXMALHAR-2276:


Time will be mapped to a time bucket anyways, correct?

> ManagedState: value of a key does not get over-written in the same time bucket
> --
>
> Key: APEXMALHAR-2276
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2276
> Project: Apache Apex Malhar
>  Issue Type: Bug
>    Reporter: Siyuan Hua
>Assignee: Chandni Singh
> Fix For: 3.6.0
>
>
> For example:
> ManagedTimeUnifiedStateImpl mtus;
> mtus.put(1, key1, val1)
> mtus.put(1, key1, val2)
> mtus.get(1, key1).equals(val2) will return false



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (APEXMALHAR-2220) Move the FunctionOperator to Malhar library

2016-10-03 Thread Siyuan Hua (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXMALHAR-2220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15543057#comment-15543057
 ] 

Siyuan Hua commented on APEXMALHAR-2220:


Sorry, I mean org.apache.apex.malhar.lib.function would be good

> Move the FunctionOperator to Malhar library
> ---
>
> Key: APEXMALHAR-2220
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2220
> Project: Apache Apex Malhar
>  Issue Type: Improvement
>    Reporter: Siyuan Hua
>Assignee: Dongming Liang
>
> FunctionOperator initially is just designed for high-level API and we think 
> it can also useful if people want to build stateless transformation and work 
> with other operator directly. FunctionOperator can be reused. Thus we should 
> move FO to malhar library



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (APEXMALHAR-2220) Move the FunctionOperator to Malhar library

2016-10-03 Thread Siyuan Hua (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXMALHAR-2220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15542972#comment-15542972
 ] 

Siyuan Hua commented on APEXMALHAR-2220:


Maybe call them com.datatorrent.lib.function?

> Move the FunctionOperator to Malhar library
> ---
>
> Key: APEXMALHAR-2220
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2220
> Project: Apache Apex Malhar
>  Issue Type: Improvement
>    Reporter: Siyuan Hua
>Assignee: Dongming Liang
>
> FunctionOperator initially is just designed for high-level API and we think 
> it can also useful if people want to build stateless transformation and work 
> with other operator directly. FunctionOperator can be reused. Thus we should 
> move FO to malhar library



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (APEXMALHAR-2275) ManagedTimeUnifiedStateImpl doesn't support time events properly

2016-09-30 Thread Siyuan Hua (JIRA)
Siyuan Hua created APEXMALHAR-2275:
--

 Summary: ManagedTimeUnifiedStateImpl doesn't support time events 
properly
 Key: APEXMALHAR-2275
 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2275
 Project: Apache Apex Malhar
  Issue Type: Bug
Reporter: Siyuan Hua


For example:

ManagedTimeUnifiedStateImpl mtus;
mtus.put(1, key1, val1)
mtus.put(1, key1, val2)
mtus.get(1, key1).equals(val2) will return false




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (APEXMALHAR-2276) ManagedTimeUnifiedStateImpl doesn't support time events properly

2016-09-30 Thread Siyuan Hua (JIRA)
Siyuan Hua created APEXMALHAR-2276:
--

 Summary: ManagedTimeUnifiedStateImpl doesn't support time events 
properly
 Key: APEXMALHAR-2276
 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2276
 Project: Apache Apex Malhar
  Issue Type: Bug
Reporter: Siyuan Hua


For example:

ManagedTimeUnifiedStateImpl mtus;
mtus.put(1, key1, val1)
mtus.put(1, key1, val2)
mtus.get(1, key1).equals(val2) will return false




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (APEXMALHAR-2267) Remove the word "Byte" in the spillable data structures because it's implied

2016-09-27 Thread Siyuan Hua (JIRA)

 [ 
https://issues.apache.org/jira/browse/APEXMALHAR-2267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siyuan Hua resolved APEXMALHAR-2267.

Resolution: Fixed

> Remove the word "Byte" in the spillable data structures because it's implied
> 
>
> Key: APEXMALHAR-2267
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2267
> Project: Apache Apex Malhar
>  Issue Type: Improvement
>Reporter: David Yan
>Assignee: David Yan
> Fix For: 3.6.0
>
>
> Spillable means part of the data is on disk and they need to be serialized, 
> and part of the data is in cache as objects. That already means the 
> serialized form (byte array) of the key needs to agree with the equals method 
> on the key object, and no need to add the word "Byte" in all the DS names. 
> Removing the word "Byte" also will remove the confusion caused by names like 
> "SpillableByteArrayListMultimap".



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (APEXMALHAR-2130) Scalable windowed storage

2016-09-27 Thread Siyuan Hua (JIRA)

 [ 
https://issues.apache.org/jira/browse/APEXMALHAR-2130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siyuan Hua resolved APEXMALHAR-2130.

Resolution: Fixed

> Scalable windowed storage
> -
>
> Key: APEXMALHAR-2130
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2130
> Project: Apache Apex Malhar
>  Issue Type: Task
>Reporter: bright chen
>Assignee: David Yan
>  Labels: roadmap
> Fix For: 3.6.0
>
>
> This feature is used for supporting windowing.
> The storage needs to have the following features:
> 1. Spillable key value storage (integrate with APEXMALHAR-2026)
> 2. Upon checkpoint, it saves a snapshot for the entire data set with the 
> checkpointing window id.  This should be done incrementally (ManagedState) to 
> avoid wasting space with unchanged data
> 3. When recovering, it takes the recovery window id and restores to that 
> snapshot
> 4. When a window is committed, all windows with a lower ID should be purged 
> from the store.
> 5. It should implement the WindowedStorage and WindowedKeyedStorage 
> interfaces, and because of 2 and 3, we may want to add methods to the 
> WindowedStorage interface so that the implementation of WindowedOperator can 
> notify the storage of checkpointing, recovering and committing of a window.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Reopened] (APEXMALHAR-2130) Scalable windowed storage

2016-09-27 Thread Siyuan Hua (JIRA)

 [ 
https://issues.apache.org/jira/browse/APEXMALHAR-2130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siyuan Hua reopened APEXMALHAR-2130:


> Scalable windowed storage
> -
>
> Key: APEXMALHAR-2130
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2130
> Project: Apache Apex Malhar
>  Issue Type: Task
>Reporter: bright chen
>Assignee: David Yan
>  Labels: roadmap
> Fix For: 3.6.0
>
>
> This feature is used for supporting windowing.
> The storage needs to have the following features:
> 1. Spillable key value storage (integrate with APEXMALHAR-2026)
> 2. Upon checkpoint, it saves a snapshot for the entire data set with the 
> checkpointing window id.  This should be done incrementally (ManagedState) to 
> avoid wasting space with unchanged data
> 3. When recovering, it takes the recovery window id and restores to that 
> snapshot
> 4. When a window is committed, all windows with a lower ID should be purged 
> from the store.
> 5. It should implement the WindowedStorage and WindowedKeyedStorage 
> interfaces, and because of 2 and 3, we may want to add methods to the 
> WindowedStorage interface so that the implementation of WindowedOperator can 
> notify the storage of checkpointing, recovering and committing of a window.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (APEXMALHAR-2130) Scalable windowed storage

2016-09-27 Thread Siyuan Hua (JIRA)

 [ 
https://issues.apache.org/jira/browse/APEXMALHAR-2130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siyuan Hua updated APEXMALHAR-2130:
---
Fix Version/s: 3.6.0

> Scalable windowed storage
> -
>
> Key: APEXMALHAR-2130
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2130
> Project: Apache Apex Malhar
>  Issue Type: Task
>Reporter: bright chen
>Assignee: David Yan
>  Labels: roadmap
> Fix For: 3.6.0
>
>
> This feature is used for supporting windowing.
> The storage needs to have the following features:
> 1. Spillable key value storage (integrate with APEXMALHAR-2026)
> 2. Upon checkpoint, it saves a snapshot for the entire data set with the 
> checkpointing window id.  This should be done incrementally (ManagedState) to 
> avoid wasting space with unchanged data
> 3. When recovering, it takes the recovery window id and restores to that 
> snapshot
> 4. When a window is committed, all windows with a lower ID should be purged 
> from the store.
> 5. It should implement the WindowedStorage and WindowedKeyedStorage 
> interfaces, and because of 2 and 3, we may want to add methods to the 
> WindowedStorage interface so that the implementation of WindowedOperator can 
> notify the storage of checkpointing, recovering and committing of a window.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (APEXMALHAR-2230) Intermittent test failure in Kafka module

2016-09-27 Thread Siyuan Hua (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXMALHAR-2230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15526808#comment-15526808
 ] 

Siyuan Hua commented on APEXMALHAR-2230:


Another test failure that could be useful to diagnose
https://api.travis-ci.org/jobs/163154951/log.txt?deansi=true

> Intermittent test failure in Kafka module
> -
>
> Key: APEXMALHAR-2230
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2230
> Project: Apache Apex Malhar
>  Issue Type: Bug
>Affects Versions: 3.5.0
>Reporter: Thomas Weise
>    Assignee: Siyuan Hua
>
> Test fails intermittently in Travis CI. Could be a race condition in the test?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (APEXMALHAR-2248) Create SpillableSet and SpillableSetMultimap interfaces and implementation

2016-09-22 Thread Siyuan Hua (JIRA)

 [ 
https://issues.apache.org/jira/browse/APEXMALHAR-2248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siyuan Hua resolved APEXMALHAR-2248.

Resolution: Fixed

> Create SpillableSet and SpillableSetMultimap interfaces and implementation
> --
>
> Key: APEXMALHAR-2248
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2248
> Project: Apache Apex Malhar
>  Issue Type: Sub-task
>Reporter: David Yan
>Assignee: David Yan
> Fix For: 3.6.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (APEXMALHAR-2248) Create SpillableSet and SpillableSetMultimap interfaces and implementation

2016-09-22 Thread Siyuan Hua (JIRA)

 [ 
https://issues.apache.org/jira/browse/APEXMALHAR-2248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siyuan Hua updated APEXMALHAR-2248:
---
Fix Version/s: 3.6.0

> Create SpillableSet and SpillableSetMultimap interfaces and implementation
> --
>
> Key: APEXMALHAR-2248
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2248
> Project: Apache Apex Malhar
>  Issue Type: Sub-task
>Reporter: David Yan
>Assignee: David Yan
> Fix For: 3.6.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (APEXMALHAR-2244) Optimize WindowedStorage and Spillable data structures for time series

2016-09-22 Thread Siyuan Hua (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXMALHAR-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15513770#comment-15513770
 ] 

Siyuan Hua commented on APEXMALHAR-2244:


Each spillable DS implementation use a SpillableStateStore to store things and 
we can make ManagedTimeUnifiedStateImpl implement the store as well and it can 
take some time extract function to get/calculate time and time buckets from 
each V/KV data.  And the Store can be setup by the WindowedOperator, correct? 

> Optimize WindowedStorage and Spillable data structures for time series
> --
>
> Key: APEXMALHAR-2244
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2244
> Project: Apache Apex Malhar
>  Issue Type: Sub-task
>Reporter: David Yan
>        Assignee: Siyuan Hua
>
> The spillable data structures currently does not make any assumption about 
> the key that is used in Managed State, and as a result, it uses 
> ManagedStateImpl to interface with Managed State and uses time buckets that 
> are based on the apex window id. But for WindowedStorage used by 
> WindowedOperator, the key to the storage is a window, which is event time 
> based. Using the default ManagedStateImpl would be very inefficient for event 
> time based keys, since it would write data that would belong to the same 
> window to different time buckets.
> On a high level, the below summarizes roughly what needs to be done:
> 1. a way to tell the spillable data structures to use the 
> ManagedTimeUnifiedStateImpl
> 2. a way to tell the spillable data structures how to extract the timestamp 
> from the key. Note that in the case of WindowedOperator, the timestamp should 
> be the end timestamp of the window (beginTimeMillis + durationMillis), not 
> the begin timestamp.
> 3. a way to tell the spillable data structures how to assign the time bucket 
> given that timestamp
> 4. with point 3, the spillable implementations of WindowedStorage will need 
> to take a config parameter that says how much time (in millis) is each time 
> bucket
> 5. only purge a time bucket when all keys that belong to that time bucket are 
> removed and the apex window id of the first window in which the keys are all 
> removed has been committed



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (APEXMALHAR-2253) Iterator for spillable data structure should throw ConcurrentModificationException during the iteration

2016-09-20 Thread Siyuan Hua (JIRA)

 [ 
https://issues.apache.org/jira/browse/APEXMALHAR-2253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siyuan Hua updated APEXMALHAR-2253:
---
Assignee: David Yan

> Iterator for spillable data structure should throw 
> ConcurrentModificationException during the iteration
> ---
>
> Key: APEXMALHAR-2253
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2253
> Project: Apache Apex Malhar
>  Issue Type: Improvement
>        Reporter: Siyuan Hua
>Assignee: David Yan
>
> Right now, the iterator leaves the spillable data structure mutable during 
> iteration which violates java collection agreement. We should try to fix this 
> when there are more users.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (APEXMALHAR-2253) Iterator for spillable data structure should throw ConcurrentModificationException during the iteration

2016-09-20 Thread Siyuan Hua (JIRA)
Siyuan Hua created APEXMALHAR-2253:
--

 Summary: Iterator for spillable data structure should throw 
ConcurrentModificationException during the iteration
 Key: APEXMALHAR-2253
 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2253
 Project: Apache Apex Malhar
  Issue Type: Improvement
Reporter: Siyuan Hua


Right now, the iterator leaves the spillable data structure mutable during 
iteration which violates java collection agreement. We should try to fix this 
when there are more users.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (APEXMALHAR-2241) The metadata kafka consumer should also pickup the properties setting on the kafka input operator.

2016-09-16 Thread Siyuan Hua (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXMALHAR-2241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15496632#comment-15496632
 ] 

Siyuan Hua commented on APEXMALHAR-2241:


Hey [~Venkatesh Kottapalli] Would you like to fix this as you already know the 
root cause. And also a small test case please.

> The metadata kafka consumer should also pickup the properties setting on the 
> kafka input operator. 
> ---
>
> Key: APEXMALHAR-2241
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2241
> Project: Apache Apex Malhar
>  Issue Type: Bug
>        Reporter: Siyuan Hua
>Assignee: Venkatesh Kottapalli
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (APEXMALHAR-2241) The metadata kafka consumer should also pickup the properties setting on the kafka input operator.

2016-09-16 Thread Siyuan Hua (JIRA)
Siyuan Hua created APEXMALHAR-2241:
--

 Summary: The metadata kafka consumer should also pickup the 
properties setting on the kafka input operator. 
 Key: APEXMALHAR-2241
 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2241
 Project: Apache Apex Malhar
  Issue Type: Bug
Reporter: Siyuan Hua






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (APEXMALHAR-2241) The metadata kafka consumer should also pickup the properties setting on the kafka input operator.

2016-09-16 Thread Siyuan Hua (JIRA)

 [ 
https://issues.apache.org/jira/browse/APEXMALHAR-2241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siyuan Hua updated APEXMALHAR-2241:
---
Assignee: Venkatesh Kottapalli

> The metadata kafka consumer should also pickup the properties setting on the 
> kafka input operator. 
> ---
>
> Key: APEXMALHAR-2241
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2241
> Project: Apache Apex Malhar
>  Issue Type: Bug
>        Reporter: Siyuan Hua
>Assignee: Venkatesh Kottapalli
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (APEXMALHAR-2230) Intermittent test failure in Kafka module

2016-09-14 Thread Siyuan Hua (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXMALHAR-2230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15491639#comment-15491639
 ] 

Siyuan Hua commented on APEXMALHAR-2230:


Good news is this time we can be sure this is the Kafka test issue. Because we 
don't see NullPointerException before this assertion error

> Intermittent test failure in Kafka module
> -
>
> Key: APEXMALHAR-2230
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2230
> Project: Apache Apex Malhar
>  Issue Type: Bug
>Affects Versions: 3.5.0
>Reporter: Thomas Weise
>    Assignee: Siyuan Hua
>
> Test fails intermittently in Travis CI. Could be a race condition in the test?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (APEXMALHAR-2230) Intermittent test failure in Kafka module

2016-09-14 Thread Siyuan Hua (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXMALHAR-2230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15491628#comment-15491628
 ] 

Siyuan Hua commented on APEXMALHAR-2230:


The code looks impossible to receive extra END_TUPLE but possible lose some 
END_TUPLE. [~brightchen] Could you also take a look since you change the test?

> Intermittent test failure in Kafka module
> -
>
> Key: APEXMALHAR-2230
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2230
> Project: Apache Apex Malhar
>  Issue Type: Bug
>Affects Versions: 3.5.0
>Reporter: Thomas Weise
>    Assignee: Siyuan Hua
>
> Test fails intermittently in Travis CI. Could be a race condition in the test?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (APEXCORE-529) Support annotation to automate lifecycle callbacks like setup() teardown() etc

2016-09-13 Thread Siyuan Hua (JIRA)
Siyuan Hua created APEXCORE-529:
---

 Summary: Support annotation to automate lifecycle callbacks like 
setup() teardown() etc 
 Key: APEXCORE-529
 URL: https://issues.apache.org/jira/browse/APEXCORE-529
 Project: Apache Apex Core
  Issue Type: Improvement
Reporter: Siyuan Hua


When we work on Windowed operator, it's very common that operators depend on 
several sub components align with the operator/component's life cycle and have 
to manually call setup and teardown method in operator code which is all 
duplicate code and error-prone. It's nice to have a annotation on the field so 
if it is set, those method will be called automatically.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [ANNOUNCE] New Apache Apex PMC Member: Chandni Singh

2016-09-12 Thread Siyuan Hua
Congrats Chandni!

Regards,
Siyuan

On Mon, Sep 12, 2016 at 9:33 AM, Thomas Weise  wrote:

> The Apache Apex PMC is pleased to announce that Chandni Singh is now a PMC
> member. We appreciate all her contributions to the project so far, and are
> looking forward to more.
>
> Congrats Chandni!
> Thomas, for the Apache Apex PMC.
>


[jira] [Commented] (APEXMALHAR-2230) Intermittent test failure in Kafka module

2016-09-08 Thread Siyuan Hua (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXMALHAR-2230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15474990#comment-15474990
 ] 

Siyuan Hua commented on APEXMALHAR-2230:


Not necessarily related to Kafka. This is caused by another exception 
2016-09-07 20:13:14,825 [6/Kafka 
inputtesttopic6.outputPort#unifier:DefaultUnifier] ERROR 
engine.StreamingContainer run - Operator set 
[OperatorDeployInfo.UnifierDeployInfo[id=6,name=Kafka 
inputtesttopic6.outputPort#unifier,type=UNIFIER,checkpoint={57d074d30001, 
0, 
0},inputs=[OperatorDeployInfo.InputDeployInfo[portName=<merge#outputPort>(1.outputPort),streamId=Kafka
 
messagetesttopic6,sourceNodeId=1,sourcePortName=outputPort,locality=,partitionMask=0,partitionKeys=],
 
OperatorDeployInfo.InputDeployInfo[portName=<merge#outputPort>(2.outputPort),streamId=Kafka
 
messagetesttopic6,sourceNodeId=2,sourcePortName=outputPort,locality=,partitionMask=0,partitionKeys=],
 
OperatorDeployInfo.InputDeployInfo[portName=<merge#outputPort>(3.outputPort),streamId=Kafka
 
messagetesttopic6,sourceNodeId=3,sourcePortName=outputPort,locality=,partitionMask=0,partitionKeys=],
 
OperatorDeployInfo.InputDeployInfo[portName=<merge#outputPort>(4.outputPort),streamId=Kafka
 
messagetesttopic6,sourceNodeId=4,sourcePortName=outputPort,locality=,partitionMask=0,partitionKeys=]],outputs=[OperatorDeployInfo.OutputDeployInfo[portName=outputPort,streamId=Kafka
 
messagetesttopic6,bufferServer=testing-worker-linux-docker-c25f223a-3424-linux-7
 stopped running due to an exception.
java.lang.NullPointerException
at com.datatorrent.bufferserver.packet.Tuple.getTuple(Tuple.java:54)
at 
com.datatorrent.stram.stream.BufferServerSubscriber$BufferReservoir.sweep(BufferServerSubscriber.java:309)
at com.datatorrent.stram.engine.GenericNode.run(GenericNode.java:259)
at 
com.datatorrent.stram.engine.StreamingContainer$2.run(StreamingContainer.java:1407)
2016-09-07 20:13:14,826 [6/Kafka 
inputtesttopic6.outputPort#unifier:DefaultUnifier] INFO  
stram.StramLocalCluster log - container-1 msg: Stopped running due to an 
exception. java.lang.NullPointerException
at com.datatorrent.bufferserver.packet.Tuple.getTuple(Tuple.java:54)
at 
com.datatorrent.stram.stream.BufferServerSubscriber$BufferReservoir.sweep(BufferServerSubscriber.java:309)
at com.datatorrent.stram.engine.GenericNode.run(GenericNode.java:259)
at 
com.datatorrent.stram.engine.StreamingContainer$2.run(StreamingContainer.java:1407)This
 is related to some 

Basically on the bufferserver side it already received some unexpected data.  

> Intermittent test failure in Kafka module
> -
>
> Key: APEXMALHAR-2230
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2230
> Project: Apache Apex Malhar
>  Issue Type: Bug
>Affects Versions: 3.5.0
>    Reporter: Thomas Weise
>Assignee: Siyuan Hua
>
> Test fails intermittently in Travis CI. Could be a race condition in the test?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (APEXMALHAR-2230) Intermittent test failure in Kafka module

2016-09-08 Thread Siyuan Hua (JIRA)

 [ 
https://issues.apache.org/jira/browse/APEXMALHAR-2230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siyuan Hua reassigned APEXMALHAR-2230:
--

Assignee: Siyuan Hua

> Intermittent test failure in Kafka module
> -
>
> Key: APEXMALHAR-2230
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2230
> Project: Apache Apex Malhar
>  Issue Type: Bug
>Affects Versions: 3.5.0
>Reporter: Thomas Weise
>    Assignee: Siyuan Hua
>
> Test fails intermittently in Travis CI. Could be a race condition in the test?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (APEXMALHAR-2220) Move the FunctionOperator to Malhar library

2016-09-06 Thread Siyuan Hua (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXMALHAR-2220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15468074#comment-15468074
 ] 

Siyuan Hua commented on APEXMALHAR-2220:


[~d9liang]
The function operators  are all under 
org.apache.apex.malhar.stream.api.operator and the function interface is under 
org.apache.apex.malhar.stream.api.function. 
I think they can be merged into one package.

Thanks!

> Move the FunctionOperator to Malhar library
> ---
>
> Key: APEXMALHAR-2220
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2220
> Project: Apache Apex Malhar
>  Issue Type: Improvement
>    Reporter: Siyuan Hua
>Assignee: Dongming Liang
>
> FunctionOperator initially is just designed for high-level API and we think 
> it can also useful if people want to build stateless transformation and work 
> with other operator directly. FunctionOperator can be reused. Thus we should 
> move FO to malhar library



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (APEXMALHAR-2220) Move the FunctionOperator to Malhar library

2016-09-06 Thread Siyuan Hua (JIRA)

 [ 
https://issues.apache.org/jira/browse/APEXMALHAR-2220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siyuan Hua updated APEXMALHAR-2220:
---
Assignee: Dongming Liang  (was: Siyuan Hua)

> Move the FunctionOperator to Malhar library
> ---
>
> Key: APEXMALHAR-2220
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2220
> Project: Apache Apex Malhar
>  Issue Type: Improvement
>    Reporter: Siyuan Hua
>Assignee: Dongming Liang
>
> FunctionOperator initially is just designed for high-level API and we think 
> it can also useful if people want to build stateless transformation and work 
> with other operator directly. FunctionOperator can be reused. Thus we should 
> move FO to malhar library



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (APEXMALHAR-2220) Move the FunctionOperator to Malhar library

2016-09-01 Thread Siyuan Hua (JIRA)
Siyuan Hua created APEXMALHAR-2220:
--

 Summary: Move the FunctionOperator to Malhar library
 Key: APEXMALHAR-2220
 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2220
 Project: Apache Apex Malhar
  Issue Type: Improvement
Reporter: Siyuan Hua
Assignee: Siyuan Hua


FunctionOperator initially is just designed for high-level API and we think it 
can also useful if people want to build stateless transformation and work with 
other operator directly. FunctionOperator can be reused. Thus we should move FO 
to malhar library



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (APEXMALHAR-2142) High-level API window support

2016-09-01 Thread Siyuan Hua (JIRA)

 [ 
https://issues.apache.org/jira/browse/APEXMALHAR-2142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siyuan Hua updated APEXMALHAR-2142:
---
Attachment: Java-High-level-API-Siyuan.pdf
Apache Beam AutoComplete Example with Apex High-Level API.pdf

Some write-up about high-level API

> High-level API window support
> -
>
> Key: APEXMALHAR-2142
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2142
> Project: Apache Apex Malhar
>  Issue Type: New Feature
>    Reporter: Siyuan Hua
>        Assignee: Siyuan Hua
>  Labels: roadmap
> Fix For: 3.5.0
>
> Attachments: Apache Beam AutoComplete Example with Apex High-Level 
> API.pdf, Java-High-level-API-Siyuan.pdf
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (APEXMALHAR-1939) Stream API

2016-09-01 Thread Siyuan Hua (JIRA)

 [ 
https://issues.apache.org/jira/browse/APEXMALHAR-1939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siyuan Hua updated APEXMALHAR-1939:
---
Attachment: Java-High-level-API-Siyuan.pdf
Apache Beam AutoComplete Example with Apex High-Level API.pdf

Some writeups about high-level API

> Stream API
> --
>
> Key: APEXMALHAR-1939
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-1939
> Project: Apache Apex Malhar
>  Issue Type: New Feature
>    Reporter: Siyuan Hua
>Priority: Critical
>  Labels: roadmap
> Attachments: Apache Beam AutoComplete Example with Apex High-Level 
> API.pdf, Java-High-level-API-Siyuan.pdf
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (APEXMALHAR-2217) Remove some redundant code in WindowedStorage and WindowedKeyedStorage

2016-08-31 Thread Siyuan Hua (JIRA)
Siyuan Hua created APEXMALHAR-2217:
--

 Summary: Remove some redundant code in WindowedStorage and 
WindowedKeyedStorage
 Key: APEXMALHAR-2217
 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2217
 Project: Apache Apex Malhar
  Issue Type: Bug
Reporter: Siyuan Hua
Assignee: David Yan


WindowedStorage has an entrySet() method which return the windowstorage itself 
which is redundant. WindowedKeyedStorage has 2 methods remove and migrateWindow 
can both inherit from parent interface



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [VOTE] Apache Apex Malhar Release 3.5.0 (RC2)

2016-08-31 Thread Siyuan Hua
+1
File integrity check
Build and license check
Pi-demo for 10min
All above look good

Thanks,
Siyuan

On Tue, Aug 30, 2016 at 11:37 PM, Thomas Weise  wrote:

> Dear Community,
>
> Please vote on the following Apache Apex Malhar 3.5.0 release candidate.
>
> RC2 fixes the copyright related issues with some test data files.
>
> This is a source release with binary artifacts published to Maven.
>
> This release is based on Apex Core 3.4 and comes with 61 resolved issues.
>
> The release advances the high level stream API to support stateful
> transformations with Beam style windowing semantics. The demo package has
> examples for usage of the API. There are also important improvements to
> underlying operator state management components, which are functional first
> cut and will be enhanced in upcoming releases, such as WindowOperator,
> spillable collections and incremental state saving.
>
> The release also adds several new operators.
>
> List of all issues fixed: https://s.apache.org/5vQi
>
> Staging directory:
> https://dist.apache.org/repos/dist/dev/apex/apache-apex-malhar-3.5.0-RC2/
> Source zip:
> https://dist.apache.org/repos/dist/dev/apex/apache-apex-
> malhar-3.5.0-RC2/apache-apex-malhar-3.5.0-source-release.zip
> Source tar.gz:
> https://dist.apache.org/repos/dist/dev/apex/apache-apex-
> malhar-3.5.0-RC2/apache-apex-malhar-3.5.0-source-release.tar.gz
> Maven staging repository:
> https://repository.apache.org/content/repositories/orgapacheapex-1017/
>
> Git source:
> https://git-wip-us.apache.org/repos/asf?p=apex-malhar.git;a=
> commit;h=refs/tags/v3.5.0-RC2
>  (commit: f4c975a1ba9e7e4c68cd02924e526e612906b3e7)
>
> PGP key:
> http://pgp.mit.edu:11371/pks/lookup?op=vindex=t...@apache.org
> KEYS file:
> https://dist.apache.org/repos/dist/release/apex/KEYS
>
> More information at:
> http://apex.apache.org
>
> Please try the release and vote; vote will be open for at least 72 hours.
>
> [ ] +1 approve (and what verification was done)
> [ ] -1 disapprove (and reason why)
>
> http://www.apache.org/foundation/voting.html
>
> How to verify release candidate:
>
> http://apex.apache.org/verification.html
>
> Thanks,
> Thomas
>


Re: [VOTE] Apache Apex Malhar Release 3.5.0 (RC1)

2016-08-29 Thread Siyuan Hua
+1

Checked for
- File integration
- Source code verification
- Check for compilation and license
- Run pi demo for 10 min


On Sun, Aug 28, 2016 at 9:59 PM, Thomas Weise 
wrote:

> Dear Community,
>
> Please vote on the following Apache Apex Malhar 3.5.0 release candidate.
>
> This is a source release with binary artifacts published to Maven.
>
> This release is based on Apex Core 3.4 and comes with 61 resolved issues.
>
> The release advances the high level stream API to support stateful
> transformations with Beam style windowing semantics. The demo package has
> examples for usage of the API. There are also important improvements to
> underlying operator state management components, which are functional first
> cut and will be enhanced in upcoming releases, such as WindowOperator,
> spillable collections and incremental state saving.
>
> The release also adds several new operators.
>
> List of all issues fixed: https://s.apache.org/5vQi
>
> Staging directory (new dist directories don't have access sorted out yet):
> https://dist.apache.org/repos/dist/dev/apex/apache-apex-malhar-3.5.0-RC1/
> Source zip:
> https://dist.apache.org/repos/dist/dev/apex/apache-apex-malh
> ar-3.5.0-RC1/apache-apex-malhar-3.5.0-source-release.zip
> Source tar.gz:
> https://dist.apache.org/repos/dist/dev/apex/apache-apex-malh
> ar-3.5.0-RC1/apache-apex-malhar-3.5.0-source-release.tar.gz
> Maven staging repository:
> https://repository.apache.org/content/repositories/orgapacheapex-1016/
>
> Git source:
> https://git-wip-us.apache.org/repos/asf?p=apex-malhar.git;a=
> commit;h=refs/tags/v3.5.0-RC1
>  (commit: f96f0025f2bc27dff79dd95e9f88d7a43bba6c41)
>
> PGP key:
> http://pgp.mit.edu:11371/pks/lookup?op=vindex=t...@apache.org
> KEYS file:
> https://dist.apache.org/repos/dist/release/apex/KEYS
>
> More information at:
> http://apex.apache.org
>
> Please try the release and vote; vote will be open for at least 72 hours.
>
> [ ] +1 approve (and what verification was done)
> [ ] -1 disapprove (and reason why)
>
> http://www.apache.org/foundation/voting.html
>
> How to verify release candidate:
>
> http://apex.apache.org/verification.html
>
> Thanks,
> Thomas
>


[jira] [Updated] (APEXMALHAR-2214) Use tuple internally so it can carry over the window semantics

2016-08-29 Thread Siyuan Hua (JIRA)

 [ 
https://issues.apache.org/jira/browse/APEXMALHAR-2214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siyuan Hua updated APEXMALHAR-2214:
---
Assignee: Siyuan Hua

> Use tuple internally so it can carry over the window semantics
> --
>
> Key: APEXMALHAR-2214
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2214
> Project: Apache Apex Malhar
>  Issue Type: Bug
>    Reporter: Siyuan Hua
>        Assignee: Siyuan Hua
>
> If some windowed operation followed by some unwidowed operation. We need to 
> carry the window semantic over the dag. That's why even the unwindowed 
> operations are on the value of a tuple, we still need to use Tuple type in 
> network. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (APEXMALHAR-2214) Use tuple internally so it can carry over the window semantics

2016-08-29 Thread Siyuan Hua (JIRA)
Siyuan Hua created APEXMALHAR-2214:
--

 Summary: Use tuple internally so it can carry over the window 
semantics
 Key: APEXMALHAR-2214
 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2214
 Project: Apache Apex Malhar
  Issue Type: Bug
Reporter: Siyuan Hua


If some windowed operation followed by some unwidowed operation. We need to 
carry the window semantic over the dag. That's why even the unwindowed 
operations are on the value of a tuple, we still need to use Tuple type in 
network. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (APEXMALHAR-2213) Refactor Stream API to make it easier to specify keys for keyed transformation

2016-08-29 Thread Siyuan Hua (JIRA)
Siyuan Hua created APEXMALHAR-2213:
--

 Summary: Refactor Stream API to make it easier to specify keys for 
keyed transformation 
 Key: APEXMALHAR-2213
 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2213
 Project: Apache Apex Malhar
  Issue Type: Bug
Reporter: Siyuan Hua
Assignee: Siyuan Hua


Right now, the keyed transformation enforce you to specify the MapToKeyVal 
interface. It's not flexible if the output of upstream transformation is 
already KeyaluePair or people want to specify an explicit Map before keyed 
operation.
 





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (APEXMALHAR-2212) Tuple codec to make application build by High-level api more efficient

2016-08-29 Thread Siyuan Hua (JIRA)
Siyuan Hua created APEXMALHAR-2212:
--

 Summary: Tuple codec to make application build by High-level api 
more efficient
 Key: APEXMALHAR-2212
 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2212
 Project: Apache Apex Malhar
  Issue Type: Bug
Reporter: Siyuan Hua


We should have a tuple codec to make the application build by high-level api 
much faster



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (APEXMALHAR-2134) Catch NullPointerException if some Kafka partition has no leader broker

2016-08-26 Thread Siyuan Hua (JIRA)

 [ 
https://issues.apache.org/jira/browse/APEXMALHAR-2134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siyuan Hua closed APEXMALHAR-2134.
--
Resolution: Fixed

> Catch NullPointerException if some Kafka partition has no leader broker
> ---
>
> Key: APEXMALHAR-2134
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2134
> Project: Apache Apex Malhar
>  Issue Type: Bug
>    Reporter: Siyuan Hua
>Assignee: Chaitanya
>  Labels: newbie
> Fix For: 3.5.0
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Kafka partition could have no leader broker some time and we need to catch 
> exception and skip that partition for the time until new leader is elected
> Here is the exception we see in the stacktrace
> 2016-07-05 14:00:46,087 ERROR kafka.SimpleKafkaConsumer 
> (SimpleKafkaConsumer.java:run(481)) - Exception {}
> java.lang.NullPointerException
> at java.util.concurrent.ConcurrentHashMap.putVal(ConcurrentHashMap.java:1011)
> at java.util.concurrent.ConcurrentHashMap.put(ConcurrentHashMap.java:1006)
> at 
> com.datatorrent.contrib.kafka.SimpleKafkaConsumer$MetaDataMonitorTask.monitorMetadata(SimpleKafkaConsumer.java:511)
> at 
> com.datatorrent.contrib.kafka.SimpleKafkaConsumer$MetaDataMonitorTask.run(SimpleKafkaConsumer.java:477)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> 2016-07-05 14:01:15,999 ERROR kafka.SimpleKafkaConsumer 
> (SimpleKafkaConsumer.java:run(481)) - Exception {}
> java.lang.NullPointerException
> at java.util.concurrent.ConcurrentHashMap.putVal(ConcurrentHashMap.java:1011)
> at java.util.concurrent.ConcurrentHashMap.put(ConcurrentHashMap.java:1006)
> at 
> com.datatorrent.contrib.kafka.SimpleKafkaConsumer$MetaDataMonitorTask.monitorMetadata(SimpleKafkaConsumer.java:511)
> at 
> com.datatorrent.contrib.kafka.SimpleKafkaConsumer$MetaDataMonitorTask.run(SimpleKafkaConsumer.java:477)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (APEXMALHAR-2208) High-level API beam examples

2016-08-26 Thread Siyuan Hua (JIRA)
Siyuan Hua created APEXMALHAR-2208:
--

 Summary: High-level API beam examples
 Key: APEXMALHAR-2208
 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2208
 Project: Apache Apex Malhar
  Issue Type: Task
Affects Versions: 3.5.0
Reporter: Siyuan Hua
Assignee: Shunxin Lu
 Fix For: 3.5.0


A set of beam examples written in Apex High-Level API

The original examples can be found here
https://github.com/apache/incubator-beam/tree/master/examples/java/src/main/java/org/apache/beam/examples

Code is done with other pull request
https://github.com/apache/apex-malhar/pull/339



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (APEXMALHAR-2208) High-level API beam examples

2016-08-26 Thread Siyuan Hua (JIRA)

 [ 
https://issues.apache.org/jira/browse/APEXMALHAR-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siyuan Hua closed APEXMALHAR-2208.
--
Resolution: Fixed

> High-level API beam examples
> 
>
> Key: APEXMALHAR-2208
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2208
> Project: Apache Apex Malhar
>  Issue Type: Task
>Affects Versions: 3.5.0
>        Reporter: Siyuan Hua
>Assignee: Shunxin Lu
> Fix For: 3.5.0
>
>
> A set of beam examples written in Apex High-Level API
> The original examples can be found here
> https://github.com/apache/incubator-beam/tree/master/examples/java/src/main/java/org/apache/beam/examples
> Code is done with other pull request
> https://github.com/apache/apex-malhar/pull/339



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (APEXMALHAR-2199) 0.8 kafka input operator doesn't support chroot zookeeper path (multitenant kafka support)

2016-08-26 Thread Siyuan Hua (JIRA)

 [ 
https://issues.apache.org/jira/browse/APEXMALHAR-2199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siyuan Hua updated APEXMALHAR-2199:
---
Affects Version/s: (was: 3.4.0)

> 0.8 kafka input operator doesn't support chroot zookeeper path (multitenant 
> kafka support)
> --
>
> Key: APEXMALHAR-2199
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2199
> Project: Apache Apex Malhar
>  Issue Type: Bug
>Affects Versions: 3.5.0
>    Reporter: Siyuan Hua
>Assignee: Chaitanya
>
> If you set the zookeeper path to node1:2181,node2:2181/chroot/, it will fail 
> with IllegalArgumentException
> The reason is because this zookeeper path will be parsed as node1:2181   and 
> node2:2181/chroot/   which in node1:2181 the topics are not there



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (APEXMALHAR-2199) 0.8 kafka input operator doesn't support chroot zookeeper path (multitenant kafka support)

2016-08-26 Thread Siyuan Hua (JIRA)

 [ 
https://issues.apache.org/jira/browse/APEXMALHAR-2199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siyuan Hua updated APEXMALHAR-2199:
---
Assignee: Siyuan Hua  (was: Chaitanya)

> 0.8 kafka input operator doesn't support chroot zookeeper path (multitenant 
> kafka support)
> --
>
> Key: APEXMALHAR-2199
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2199
> Project: Apache Apex Malhar
>  Issue Type: Bug
>Affects Versions: 3.5.0
>    Reporter: Siyuan Hua
>Assignee: Siyuan Hua
>
> If you set the zookeeper path to node1:2181,node2:2181/chroot/, it will fail 
> with IllegalArgumentException
> The reason is because this zookeeper path will be parsed as node1:2181   and 
> node2:2181/chroot/   which in node1:2181 the topics are not there



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (APEXMALHAR-2154) Update kafka 0.9 input operator to use new CheckpointNotificationListener

2016-08-25 Thread Siyuan Hua (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXMALHAR-2154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15437102#comment-15437102
 ] 

Siyuan Hua commented on APEXMALHAR-2154:


I don't think so




> Update kafka 0.9 input operator to use new CheckpointNotificationListener
> -
>
> Key: APEXMALHAR-2154
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2154
> Project: Apache Apex Malhar
>  Issue Type: Improvement
>    Reporter: Siyuan Hua
>Assignee: Chaitanya
> Fix For: 3.5.0
>
>
> CheckpointListener interface has been deprecated. We should upgrade the 
> operator to use the new one



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (APEXMALHAR-2199) 0.8 kafka input operator doesn't support chroot zookeeper path (multitenant kafka support)

2016-08-24 Thread Siyuan Hua (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXMALHAR-2199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15435402#comment-15435402
 ] 

Siyuan Hua commented on APEXMALHAR-2199:


0.9 operator should not have this problem

> 0.8 kafka input operator doesn't support chroot zookeeper path (multitenant 
> kafka support)
> --
>
> Key: APEXMALHAR-2199
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2199
> Project: Apache Apex Malhar
>  Issue Type: Bug
>Affects Versions: 3.4.0, 3.5.0
>    Reporter: Siyuan Hua
>Assignee: Chaitanya
>
> If you set the zookeeper path to node1:2181,node2:2181/chroot/, it will fail 
> with IllegalArgumentException
> The reason is because this zookeeper path will be parsed as node1:2181   and 
> node2:2181/chroot/   which in node1:2181 the topics are not there



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (APEXMALHAR-2199) 0.8 kafka input operator doesn't support chroot zookeeper path (multitenant kafka support)

2016-08-24 Thread Siyuan Hua (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXMALHAR-2199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15435399#comment-15435399
 ] 

Siyuan Hua commented on APEXMALHAR-2199:


[~chaithu] Can you please take a look? The bug is in parseZookeeperStr method. 
It doesn't handle chroot zookeeper path. And also can you add more debug logs 
in the operator itself? Thanks!

> 0.8 kafka input operator doesn't support chroot zookeeper path (multitenant 
> kafka support)
> --
>
> Key: APEXMALHAR-2199
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2199
> Project: Apache Apex Malhar
>  Issue Type: Bug
>Affects Versions: 3.4.0, 3.5.0
>    Reporter: Siyuan Hua
>Assignee: Chaitanya
>
> If you set the zookeeper path to node1:2181,node2:2181/chroot/, it will fail 
> with IllegalArgumentException
> The reason is because this zookeeper path will be parsed as node1:2181   and 
> node2:2181/chroot/   which in node1:2181 the topics are not there



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (APEXMALHAR-2199) 0.8 kafka input operator doesn't support chroot zookeeper path (multitenant kafka support)

2016-08-24 Thread Siyuan Hua (JIRA)

 [ 
https://issues.apache.org/jira/browse/APEXMALHAR-2199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siyuan Hua updated APEXMALHAR-2199:
---
Affects Version/s: 3.5.0
   3.4.0

> 0.8 kafka input operator doesn't support chroot zookeeper path (multitenant 
> kafka support)
> --
>
> Key: APEXMALHAR-2199
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2199
> Project: Apache Apex Malhar
>  Issue Type: Bug
>Affects Versions: 3.4.0, 3.5.0
>    Reporter: Siyuan Hua
>Assignee: Chaitanya
>
> If you set the zookeeper path to node1:2181,node2:2181/chroot/, it will fail 
> with IllegalArgumentException
> The reason is because this zookeeper path will be parsed as node1:2181   and 
> node2:2181/chroot/   which in node1:2181 the topics are not there



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (APEXMALHAR-2199) 0.8 kafka input operator doesn't support chroot zookeeper path (multitenant kafka support)

2016-08-24 Thread Siyuan Hua (JIRA)
Siyuan Hua created APEXMALHAR-2199:
--

 Summary: 0.8 kafka input operator doesn't support chroot zookeeper 
path (multitenant kafka support)
 Key: APEXMALHAR-2199
 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2199
 Project: Apache Apex Malhar
  Issue Type: Bug
Reporter: Siyuan Hua
Assignee: Chaitanya


If you set the zookeeper path to node1:2181,node2:2181/chroot/, it will fail 
with IllegalArgumentException

The reason is because this zookeeper path will be parsed as node1:2181   and 
node2:2181/chroot/   which in node1:2181 the topics are not there



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (APEXMALHAR-2180) KafkaInput Operator partitions has to be unchanged in case of dynamic scaling of ONE_TO_MANY strategy.

2016-08-23 Thread Siyuan Hua (JIRA)

 [ 
https://issues.apache.org/jira/browse/APEXMALHAR-2180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siyuan Hua updated APEXMALHAR-2180:
---
Fix Version/s: 3.5.0

> KafkaInput Operator partitions has to be unchanged in case of dynamic scaling 
> of ONE_TO_MANY strategy.
> --
>
> Key: APEXMALHAR-2180
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2180
> Project: Apache Apex Malhar
>  Issue Type: Improvement
>Reporter: Chaitanya
>Assignee: Chaitanya
> Fix For: 3.5.0
>
>
> Operator creates a new partition whenever the topic metadata changes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (APEXMALHAR-2180) KafkaInput Operator partitions has to be unchanged in case of dynamic scaling of ONE_TO_MANY strategy.

2016-08-23 Thread Siyuan Hua (JIRA)

 [ 
https://issues.apache.org/jira/browse/APEXMALHAR-2180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siyuan Hua resolved APEXMALHAR-2180.

Resolution: Fixed

> KafkaInput Operator partitions has to be unchanged in case of dynamic scaling 
> of ONE_TO_MANY strategy.
> --
>
> Key: APEXMALHAR-2180
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2180
> Project: Apache Apex Malhar
>  Issue Type: Improvement
>Reporter: Chaitanya
>Assignee: Chaitanya
> Fix For: 3.5.0
>
>
> Operator creates a new partition whenever the topic metadata changes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (APEXMALHAR-2169) KafkaInputoperator: Remove the stuff related to Partition Based on throughput.

2016-08-23 Thread Siyuan Hua (JIRA)

 [ 
https://issues.apache.org/jira/browse/APEXMALHAR-2169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siyuan Hua resolved APEXMALHAR-2169.

Resolution: Fixed

> KafkaInputoperator: Remove the stuff related to Partition Based on throughput.
> --
>
> Key: APEXMALHAR-2169
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2169
> Project: Apache Apex Malhar
>  Issue Type: Bug
>Reporter: Chaitanya
>Assignee: Chaitanya
> Fix For: 3.5.0
>
>
> Dynamic Partition is not working in case of ONE_TO_MANY partition strategy.
> Affected Operator: AbstractKafkaInputOperator (0.8 version)
> Steps to reproduce: 
> (1) Created a topic with 3 partitions
> (2) Created an application as  KAFKA -> Console with below configuration:
>strategy : one_to_many
>initialPartitionCount: 2
> (3) Launched the above application.
> (4) After some time, re-partition the topic to 5
> Observations:
> (1) Operator re-partitioning is not happened.
> (2) Operator is not emitting the messages.
> (3) Following warning messages in log:
> INFO com.datatorrent.contrib.kafka.AbstractKafkaInputOperator: [ONE_TO_MANY]: 
> Repartition the operator(s) under 9223372036854775807 msgs/s and 
> 9223372036854775807 bytes/s hard limit
> WARN com.datatorrent.stram.plan.physical.PhysicalPlan: Empty partition list 
> after repartition: OperatorMeta{name=Input, 
> operator=com.datatorrent.contrib.kafka.KafkaSinglePortByteArrayInputOperator@1b822fcc,
>  attributes={Attribute{defaultValue=1024, 
> name=com.datatorrent.api.Context.OperatorContext.MEMORY_MB, 
> codec=com.datatorrent.api.StringCodec$Integer2String@4682eba5}=1024}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (APEXMALHAR-2169) KafkaInputoperator: Remove the stuff related to Partition Based on throughput.

2016-08-23 Thread Siyuan Hua (JIRA)

 [ 
https://issues.apache.org/jira/browse/APEXMALHAR-2169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siyuan Hua updated APEXMALHAR-2169:
---
Fix Version/s: 3.5.0

> KafkaInputoperator: Remove the stuff related to Partition Based on throughput.
> --
>
> Key: APEXMALHAR-2169
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2169
> Project: Apache Apex Malhar
>  Issue Type: Bug
>Reporter: Chaitanya
>Assignee: Chaitanya
> Fix For: 3.5.0
>
>
> Dynamic Partition is not working in case of ONE_TO_MANY partition strategy.
> Affected Operator: AbstractKafkaInputOperator (0.8 version)
> Steps to reproduce: 
> (1) Created a topic with 3 partitions
> (2) Created an application as  KAFKA -> Console with below configuration:
>strategy : one_to_many
>initialPartitionCount: 2
> (3) Launched the above application.
> (4) After some time, re-partition the topic to 5
> Observations:
> (1) Operator re-partitioning is not happened.
> (2) Operator is not emitting the messages.
> (3) Following warning messages in log:
> INFO com.datatorrent.contrib.kafka.AbstractKafkaInputOperator: [ONE_TO_MANY]: 
> Repartition the operator(s) under 9223372036854775807 msgs/s and 
> 9223372036854775807 bytes/s hard limit
> WARN com.datatorrent.stram.plan.physical.PhysicalPlan: Empty partition list 
> after repartition: OperatorMeta{name=Input, 
> operator=com.datatorrent.contrib.kafka.KafkaSinglePortByteArrayInputOperator@1b822fcc,
>  attributes={Attribute{defaultValue=1024, 
> name=com.datatorrent.api.Context.OperatorContext.MEMORY_MB, 
> codec=com.datatorrent.api.StringCodec$Integer2String@4682eba5}=1024}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (APEXMALHAR-2154) Update kafka 0.9 input operator to use new CheckpointNotificationListener

2016-08-19 Thread Siyuan Hua (JIRA)

 [ 
https://issues.apache.org/jira/browse/APEXMALHAR-2154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siyuan Hua updated APEXMALHAR-2154:
---
Fix Version/s: 3.5.0

> Update kafka 0.9 input operator to use new CheckpointNotificationListener
> -
>
> Key: APEXMALHAR-2154
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2154
> Project: Apache Apex Malhar
>  Issue Type: Improvement
>    Reporter: Siyuan Hua
>Assignee: Chaitanya
> Fix For: 3.5.0
>
>
> CheckpointListener interface has been deprecated. We should upgrade the 
> operator to use the new one



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (APEXMALHAR-2154) Update kafka 0.9 input operator to use new CheckpointNotificationListener

2016-08-19 Thread Siyuan Hua (JIRA)

 [ 
https://issues.apache.org/jira/browse/APEXMALHAR-2154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siyuan Hua resolved APEXMALHAR-2154.

Resolution: Fixed

> Update kafka 0.9 input operator to use new CheckpointNotificationListener
> -
>
> Key: APEXMALHAR-2154
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2154
> Project: Apache Apex Malhar
>  Issue Type: Improvement
>    Reporter: Siyuan Hua
>Assignee: Chaitanya
> Fix For: 3.5.0
>
>
> CheckpointListener interface has been deprecated. We should upgrade the 
> operator to use the new one



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (APEXMALHAR-2168) The setter method for double field is not generated correctly in JdbcPOJOInputOperator.

2016-08-18 Thread Siyuan Hua (JIRA)

 [ 
https://issues.apache.org/jira/browse/APEXMALHAR-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siyuan Hua updated APEXMALHAR-2168:
---
Summary: The setter method for double field is not generated correctly in 
JdbcPOJOInputOperator.  (was: The  in JdbcPOJOInputOperator.)

> The setter method for double field is not generated correctly in 
> JdbcPOJOInputOperator.
> ---
>
> Key: APEXMALHAR-2168
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2168
> Project: Apache Apex Malhar
>  Issue Type: Bug
>Reporter: Shunxin Lu
>Assignee: Shunxin Lu
> Fix For: 3.5.0
>
>
> Bug found in JdbcPOJOInputOperator: handling double values in database will 
> cause the following exception:
> java.lang.ClassCastException: SC cannot be cast to 
> com.datatorrent.lib.util.PojoUtils$SetterDouble.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (APEXMALHAR-2168) The in JdbcPOJOInputOperator.

2016-08-18 Thread Siyuan Hua (JIRA)

 [ 
https://issues.apache.org/jira/browse/APEXMALHAR-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siyuan Hua updated APEXMALHAR-2168:
---
Summary: The  in JdbcPOJOInputOperator.  (was: Bug Found in 
JdbcPOJOInputOperator.)

> The  in JdbcPOJOInputOperator.
> --
>
> Key: APEXMALHAR-2168
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2168
> Project: Apache Apex Malhar
>  Issue Type: Bug
>Reporter: Shunxin Lu
>Assignee: Shunxin Lu
> Fix For: 3.5.0
>
>
> Bug found in JdbcPOJOInputOperator: handling double values in database will 
> cause the following exception:
> java.lang.ClassCastException: SC cannot be cast to 
> com.datatorrent.lib.util.PojoUtils$SetterDouble.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (APEXMALHAR-2154) Update kafka 0.9 input operator to use new CheckpointNotificationListener

2016-08-17 Thread Siyuan Hua (JIRA)

 [ 
https://issues.apache.org/jira/browse/APEXMALHAR-2154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siyuan Hua updated APEXMALHAR-2154:
---
Fix Version/s: (was: 3.5.0)

> Update kafka 0.9 input operator to use new CheckpointNotificationListener
> -
>
> Key: APEXMALHAR-2154
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2154
> Project: Apache Apex Malhar
>  Issue Type: Improvement
>    Reporter: Siyuan Hua
>        Assignee: Siyuan Hua
>
> CheckpointListener interface has been deprecated. We should upgrade the 
> operator to use the new one



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (APEXMALHAR-2154) Update kafka 0.9 input operator to use new CheckpointNotificationListener

2016-08-17 Thread Siyuan Hua (JIRA)

 [ 
https://issues.apache.org/jira/browse/APEXMALHAR-2154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siyuan Hua updated APEXMALHAR-2154:
---
Assignee: Chaitanya  (was: Siyuan Hua)

> Update kafka 0.9 input operator to use new CheckpointNotificationListener
> -
>
> Key: APEXMALHAR-2154
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2154
> Project: Apache Apex Malhar
>  Issue Type: Improvement
>    Reporter: Siyuan Hua
>Assignee: Chaitanya
>
> CheckpointListener interface has been deprecated. We should upgrade the 
> operator to use the new one



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (APEXMALHAR-2186) Configurable start offset for kafka input operator (0.9)

2016-08-11 Thread Siyuan Hua (JIRA)
Siyuan Hua created APEXMALHAR-2186:
--

 Summary: Configurable start offset for kafka input operator (0.9)
 Key: APEXMALHAR-2186
 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2186
 Project: Apache Apex Malhar
  Issue Type: Bug
Reporter: Siyuan Hua
Assignee: Siyuan Hua


There are users ask for 2 configurable start offset for kafka input operator
1. The configurable topic name for consumer offsets(default is application name)
2. Being able to specify offset manual 
It would be good for debugging or running same application in parallel.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (APEXMALHAR-2186) Configurable start offset for kafka input operator (0.9)

2016-08-11 Thread Siyuan Hua (JIRA)

 [ 
https://issues.apache.org/jira/browse/APEXMALHAR-2186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siyuan Hua updated APEXMALHAR-2186:
---
Issue Type: Improvement  (was: Bug)

> Configurable start offset for kafka input operator (0.9)
> 
>
> Key: APEXMALHAR-2186
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2186
> Project: Apache Apex Malhar
>  Issue Type: Improvement
>    Reporter: Siyuan Hua
>        Assignee: Siyuan Hua
>
> There are users ask for 2 configurable start offset for kafka input operator
> 1. The configurable topic name for consumer offsets(default is application 
> name)
> 2. Being able to specify offset manual 
> It would be good for debugging or running same application in parallel.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (APEXMALHAR-2168) Bug Found in JdbcPOJOInputOperator.

2016-08-10 Thread Siyuan Hua (JIRA)

 [ 
https://issues.apache.org/jira/browse/APEXMALHAR-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siyuan Hua updated APEXMALHAR-2168:
---
Fix Version/s: 3.5.0

> Bug Found in JdbcPOJOInputOperator.
> ---
>
> Key: APEXMALHAR-2168
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2168
> Project: Apache Apex Malhar
>  Issue Type: Bug
>Reporter: Shunxin Lu
>Assignee: Shunxin Lu
> Fix For: 3.5.0
>
>
> Bug found in JdbcPOJOInputOperator: handling double values in database will 
> cause the following exception:
> java.lang.ClassCastException: SC cannot be cast to 
> com.datatorrent.lib.util.PojoUtils$SetterDouble.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (APEXMALHAR-2028) Add System.err to ConsoleOutputOperator

2016-08-10 Thread Siyuan Hua (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXMALHAR-2028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15415943#comment-15415943
 ] 

Siyuan Hua commented on APEXMALHAR-2028:


For high-level API as well

> Add System.err to ConsoleOutputOperator 
> 
>
> Key: APEXMALHAR-2028
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2028
> Project: Apache Apex Malhar
>  Issue Type: Improvement
>    Reporter: Siyuan Hua
>        Assignee: Siyuan Hua
>Priority: Minor
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (APEXMALHAR-2169) KafkaInputOperator: Dynamic partition is not working in case of ONE_TO_MANY partition strategy

2016-08-08 Thread Siyuan Hua (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXMALHAR-2169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15411388#comment-15411388
 ] 

Siyuan Hua commented on APEXMALHAR-2169:


[~chaithu]  
Then I think the problem is in softConstraint and hardConstraint code, it 
should never return true because default limit is Long.MAX_VALUE.  

There is something in backlog that I didn't track in Jira(my bad). But since 
you have issue here, can you please do some refactor here. 
We want to actually simplify the operator code instead of making it more and 
more complicate. And kafka input operator is there for awhile and I don't see 
any requirement/asking for dynamic partition based on throughput.
Can we take away the hardConstraint and softConstraint condition check and make 
the 2 upperbound property deprecated.  So dynamic partition by default should 
only happen when kafka partition changes. 
And for ONE_TO_MANY partition strategy, the number of operator partitions 
should stay unchanged for the whole application with the specified 
initialPartitionCount. I think there is still bug there that if new kafka 
partition is added, we always start a new partition. That is not correct.

And you can create another ticket to move all repartition based on throughput 
to a separate Partitioner so the operator code would be simple and easy to 
understand/debug



> KafkaInputOperator: Dynamic partition is not working in case of ONE_TO_MANY 
> partition strategy
> --
>
> Key: APEXMALHAR-2169
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2169
> Project: Apache Apex Malhar
>  Issue Type: Bug
>Reporter: Chaitanya
>Assignee: Chaitanya
>
> Dynamic Partition is not working in case of ONE_TO_MANY partition strategy.
> Affected Operator: AbstractKafkaInputOperator (0.8 version)
> Steps to reproduce: 
> (1) Created a topic with 3 partitions
> (2) Created an application as  KAFKA -> Console with below configuration:
>strategy : one_to_many
>initialPartitionCount: 2
> (3) Launched the above application.
> (4) After some time, re-partition the topic to 5
> Observations:
> (1) Operator re-partitioning is not happened.
> (2) Operator is not emitting the messages.
> (3) Following warning messages in log:
> INFO com.datatorrent.contrib.kafka.AbstractKafkaInputOperator: [ONE_TO_MANY]: 
> Repartition the operator(s) under 9223372036854775807 msgs/s and 
> 9223372036854775807 bytes/s hard limit
> WARN com.datatorrent.stram.plan.physical.PhysicalPlan: Empty partition list 
> after repartition: OperatorMeta{name=Input, 
> operator=com.datatorrent.contrib.kafka.KafkaSinglePortByteArrayInputOperator@1b822fcc,
>  attributes={Attribute{defaultValue=1024, 
> name=com.datatorrent.api.Context.OperatorContext.MEMORY_MB, 
> codec=com.datatorrent.api.StringCodec$Integer2String@4682eba5}=1024}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (APEXMALHAR-2169) KafkaInputOperator: Dynamic partition is not working in case of ONE_TO_MANY partition strategy

2016-08-07 Thread Siyuan Hua (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXMALHAR-2169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1546#comment-1546
 ] 

Siyuan Hua commented on APEXMALHAR-2169:


[~chaithu]
I'm still not convinced. In your setup both msgRateUpperBound and 
byteRateUpperBound are unlimited. The isPartitionRequired method should not 
return true based on throughput. If isPartitionRequired return true because of 
the new kafka partition, then it will go to line 599 in definePartition method. 
 

> KafkaInputOperator: Dynamic partition is not working in case of ONE_TO_MANY 
> partition strategy
> --
>
> Key: APEXMALHAR-2169
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2169
> Project: Apache Apex Malhar
>  Issue Type: Bug
>Reporter: Chaitanya
>Assignee: Chaitanya
>
> Dynamic Partition is not working in case of ONE_TO_MANY partition strategy.
> Affected Operator: AbstractKafkaInputOperator (0.8 version)
> Steps to reproduce: 
> (1) Created a topic with 3 partitions
> (2) Created an application as  KAFKA -> Console with below configuration:
>strategy : one_to_many
>initialPartitionCount: 2
> (3) Launched the above application.
> (4) After some time, re-partition the topic to 5
> Observations:
> (1) Operator re-partitioning is not happened.
> (2) Operator is not emitting the messages.
> (3) Following warning messages in log:
> INFO com.datatorrent.contrib.kafka.AbstractKafkaInputOperator: [ONE_TO_MANY]: 
> Repartition the operator(s) under 9223372036854775807 msgs/s and 
> 9223372036854775807 bytes/s hard limit
> WARN com.datatorrent.stram.plan.physical.PhysicalPlan: Empty partition list 
> after repartition: OperatorMeta{name=Input, 
> operator=com.datatorrent.contrib.kafka.KafkaSinglePortByteArrayInputOperator@1b822fcc,
>  attributes={Attribute{defaultValue=1024, 
> name=com.datatorrent.api.Context.OperatorContext.MEMORY_MB, 
> codec=com.datatorrent.api.StringCodec$Integer2String@4682eba5}=1024}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (APEXMALHAR-2169) KafkaInputOperator: Dynamic partition is not working in case of ONE_TO_MANY partition strategy

2016-07-27 Thread Siyuan Hua (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXMALHAR-2169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15396353#comment-15396353
 ] 

Siyuan Hua commented on APEXMALHAR-2169:


[~chaithu] 

Can you elaborate more on this ticket? Is it dynamic partition based on 
throughput or metadata change or both?
If you want dynamic partition happen because more kafka partitions are added, I 
think the code should jump into line 599 where newWaitingPartition should be 
non-empty, right?



> KafkaInputOperator: Dynamic partition is not working in case of ONE_TO_MANY 
> partition strategy
> --
>
> Key: APEXMALHAR-2169
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2169
> Project: Apache Apex Malhar
>  Issue Type: Bug
>Reporter: Chaitanya
>Assignee: Chaitanya
>
> Dynamic Partition is not working in case of ONE_TO_MANY partition strategy.
> Affected Operator: AbstractKafkaInputOperator (0.8 version)
> Steps to reproduce: 
> (1) Created a topic with 3 partitions
> (2) Created an application as  KAFKA -> Console with below configuration:
>strategy : one_to_many
>initialPartitionCount: 2
> (3) Launched the above application.
> (4) After some time, re-partition the topic to 5
> Observations:
> (1) Operator re-partitioning is not happened.
> (2) Operator is not emitting the messages.
> (3) Following warning messages in log:
> INFO com.datatorrent.contrib.kafka.AbstractKafkaInputOperator: [ONE_TO_MANY]: 
> Repartition the operator(s) under 9223372036854775807 msgs/s and 
> 9223372036854775807 bytes/s hard limit
> WARN com.datatorrent.stram.plan.physical.PhysicalPlan: Empty partition list 
> after repartition: OperatorMeta{name=Input, 
> operator=com.datatorrent.contrib.kafka.KafkaSinglePortByteArrayInputOperator@1b822fcc,
>  attributes={Attribute{defaultValue=1024, 
> name=com.datatorrent.api.Context.OperatorContext.MEMORY_MB, 
> codec=com.datatorrent.api.StringCodec$Integer2String@4682eba5}=1024}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (APEXMALHAR-2168) Bug Found in JdbcPOJOInputOperator.

2016-07-26 Thread Siyuan Hua (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXMALHAR-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15394635#comment-15394635
 ] 

Siyuan Hua commented on APEXMALHAR-2168:


Hey [~shubhamp],

[~lushun...@gmail.com] found a bug in JdbcPOJOInputOperator for getting data 
from double-type columns. 
I saw you worked on this lately. Would you please help review his code? Thanks!

> Bug Found in JdbcPOJOInputOperator.
> ---
>
> Key: APEXMALHAR-2168
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2168
> Project: Apache Apex Malhar
>  Issue Type: Bug
>Reporter: Shunxin Lu
>Assignee: Shunxin Lu
>
> Bug found in JdbcPOJOInputOperator: handling double values in database will 
> cause the following exception:
> java.lang.ClassCastException: SC cannot be cast to 
> com.datatorrent.lib.util.PojoUtils$SetterDouble.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (APEXMALHAR-2165) Implement a useful default Partitioner and/or Unifier for Keyed Windowed Operator

2016-07-21 Thread Siyuan Hua (JIRA)
Siyuan Hua created APEXMALHAR-2165:
--

 Summary: Implement a useful default Partitioner and/or Unifier for 
Keyed Windowed Operator
 Key: APEXMALHAR-2165
 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2165
 Project: Apache Apex Malhar
  Issue Type: Improvement
Reporter: Siyuan Hua


Based on first discussion, we can have some default partitioner to partition 
KeyedWindowedOperator into different instance and the possible Unifier to unify 
the result together.

For a non-keyed windowed operator, on the high-level API level, we can 
translate transform to 2 operators one KeyedWindowedOperator with a specified 
PartitionKey(not logically partition key) connect with another WindowedOperator 
which do final accumulation. This will be tracked in another ticket



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Using DSL api to construct sql queries

2016-07-21 Thread Siyuan Hua
But is it a duplication of integration with Calcite?

On Thu, Jul 21, 2016 at 9:26 AM, Timothy Farkas <
timothytiborfar...@gmail.com> wrote:

> I see, cool :)
>
> On Thu, Jul 21, 2016 at 9:21 AM, Priyanka Gugale 
> wrote:
>
> > Hi Tim,
> >
> > We are not creating our own DSL, the jooq is just another query
> > parser/builder like JsqlParser. I am trying to use one of these query DSL
> > libraries to replace the existing code in operator which is written to
> > construct the queries.
> >
> > -Priyanka
> >
> > On Thu, Jul 21, 2016 at 9:42 PM, Timothy Farkas <
> > timothytiborfar...@gmail.com> wrote:
> >
> > > I don't know the exact context here so please forgive me if I'm
> > mistaken. I
> > > don't think creating our own DSL is the way to go. Creating a generic
> DSL
> > > is hard. We should support setting the flavor of SQL being used as a
> > > property and then allow standard sql to be specified. There are already
> > > mature Apache License SQL parsers which support many different SQL
> > > implementations.
> > >
> > > https://github.com/JSQLParser/JSqlParser
> > >
> > > Thanks,
> > > Tim
> > >
> > > On Thu, Jul 21, 2016 at 2:19 AM, Priyanka Gugale 
> > > wrote:
> > >
> > > > Looking closely at licensing, it says it *depends but doesn't bundle*
> > > those
> > > > non ASL license dependencies. As per my understanding those will be
> > > > included only if we explicitly include them using our application
> pom.
> > > > Right away we are not using any of those features which depend of
> such
> > > > third party licenses.
> > > >
> > > > Anyone have any suggestion over including this library?
> > > >
> > > > Dev,
> > > > Yes querydsl is an option, but jooq seems more promising. If there we
> > see
> > > > license is a problem then may be we can go to querydsl.
> > > >
> > > > -Priyanka
> > > >
> > > >
> > > >
> > > > On Wed, Jul 20, 2016 at 8:58 PM, Devendra Tagare <
> > > > devend...@datatorrent.com>
> > > > wrote:
> > > >
> > > > > +1 for using DSL constructs that are vendor agnostic.
> > > > >
> > > > > Checkout https://github.com/querydsl/querydsl (Apache licensed) as
> > > well
> > > > > in-case it fits better in terms of implementation.
> > > > >
> > > > > Also, once the DSL work is done, please test and document the
> > behavior
> > > > > (exactly once, at-least once ..)the operator has with different
> > > > databases.
> > > > >
> > > > > Thanks,
> > > > > Dev
> > > > >
> > > > > On Wed, Jul 20, 2016 at 4:04 AM, Bhupesh Chawda <
> bhup...@apache.org>
> > > > > wrote:
> > > > >
> > > > > > It is a good idea to get rid of vendor specific implementation
> > > > > differences
> > > > > > for SQL.
> > > > > >
> > > > > > However, the licensing does not seem to be straightforward.
> Please
> > > > check:
> > > > > > http://www.jooq.org/legal/licensing. Can this be used as a
> > > dependency
> > > > in
> > > > > > Apex?
> > > > > >
> > > > > > ~ Bhupesh
> > > > > >
> > > > > > On Wed, Jul 20, 2016 at 3:06 AM, Priyanka Gugale <
> > pri...@apache.org>
> > > > > > wrote:
> > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > Malhar JDBC operator does lots of string manipulation and other
> > > > > handling
> > > > > > to
> > > > > > > construct sql queries as per user inputs. Instead of
> constructing
> > > > > queries
> > > > > > > on our own we should use some dsl api library which will let us
> > > write
> > > > > DB
> > > > > > > agnostic code and take care of all other complexities.
> > > > > > >
> > > > > > > I am trying out JOOQ library to write sql query in
> > > > > > > AbstractJdbcPollInputOperator. Would like to hear about
> > communities
> > > > > > > feedback and suggestions.
> > > > > > >
> > > > > > > -Priyanka
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>


Re: Bleeding edge branch ?

2016-07-20 Thread Siyuan Hua
Ok, whether branches or forks. I still think we should have at least some
materialized version of malhar/core for the big influencer like java,
hadoop or even kafka. Java 8, for example, is actually not new.  We don't
have to be aggressive to try out new features from those right now. But we
can at least have some CI run build/test periodically and make sure our
current code is future-prove and avoid some future-deprecated code when we
add new features. Also if people ask for it, we can have a link to point
them to.  BTW, High-level API can definitely benefit from java 8.  :)

Regards,
Siyuan

On Wed, Jul 20, 2016 at 8:30 AM, Sandesh Hegde 
wrote:

> Our current model of supporting the oldest supported Hadoop, penalizes the
> users of latest Hadoop versions by favoring the slow movers.
> Also, we won't benefit from the increased maturity of the Hadoop platform,
> as we will be working on the many years old version of Hadoop.
> We also need to incentivize our customers to upgrade their Hadoop version,
> by making use of new features.
>
> My vote goes to start the work on the Hadoop 2.6 ( or any other version )
> in a different branch, without waiting for the EOL policies.
>
> On Tue, Jul 12, 2016 at 1:16 AM Thomas Weise 
> wrote:
>
> > -0
> >
> > I read the thread twice, it is not clear to me what benefit Apex users
> > derive from this exercise. A branch normally contains development work
> that
> > is eventually brought back to the main line and into a release. Here, the
> > suggestion seems to be an open ended effort to play with latest tech,
> isn't
> > that something anyone (including a group of folks) can do in a fork. I
> > don't see value in a permanent branch for that, who is going to maintain
> > such code and who will ever use it?
> >
> > There was a point that we can find out about potential problems with
> later
> > versions. The way to find such issues is to take the releases and run
> them
> > on these later versions (that's what users do), not by changing the code!
> >
> > Regarding Java version: Our users don't use Apex in a vacuum. Please
> have a
> > look at ASF Hadoop and the distros EOL policies. That will answer the
> > question what Java version is appropriate. I would be surprised if
> > something that works on Java 7 falls flat on the face with Java 8 as a
> lot
> > of diligence goes into backward compatibility. Again the way to tests
> this
> > is to run verification with existing Apex releases on Java 8 based stack.
> >
> > Regarding Hadoop version: This has been discussed off record several
> times
> > and there are actual JIRA tickets marked accordingly so that the work is
> > done when we move. It is a separate discussion, no need to mix Java
> > versions and branching with it. I agree with what David said, if someone
> > can show that we can move up to 2.6 based on EOL policies and what known
> > Apex users have in production, then we should work on that upgrade. The
> way
> > I imagine it would work is that we have a Hadoop-2.6 (or whatever
> version)
> > branch, make all the upgrade related changes there (which should be a
> list
> > of JIRAs) and then merge it back to master when we are satisfied. After
> > that, the branch can be deleted.
> >
> > Thomas
> >
> >
> >
> > On Tue, Jul 12, 2016 at 8:36 AM, Chinmay Kolhatkar <
> > chin...@datatorrent.com>
> > wrote:
> >
> > > I'm -0 on this idea.
> > >
> > > Here is the reason:
> > > Unless we see a real case where users want to see everything on latest,
> > > this branch might quickly become low hanging fruit and eventually get
> > > obsolete because its anyway a "no gaurantee" branch.
> > >
> > > We have a bunch of dependencies which we'll have to take care of to
> > really
> > > make it bleeding edge. Specially about malhar, its a long list. That
> > looks
> > > like quite significant work.
> > > Moreover, if this branch is going to be in "may or may not work" state;
> > I,
> > > as a user or developer, would bank on what certainly works.
> > >
> > > I also think that, if its going to be "no gaurantee" then its worth
> > > spending time contributions towards master rather than bleeding-edge
> > > branch.
> > >
> > > If a question of "should we upgrade?" comes, the community is mature to
> > > take that call then and work accordingly.
> > >
> > > -Chinmay.
> > >
> > >
> > >
> > > On Tue, Jul 12, 2016 at 11:42 AM, Priyanka Gugale 
> > > wrote:
> > >
> > > > +1 for creating such branch.
> > > > One of us will have to rebase it with master branch at intervals. I
> > don't
> > > > think everyone will cherry-pick their commits here. We can make it
> once
> > > in
> > > > a month activity. Are we considering updating all dependency library
> > > > version as well?
> > > >
> > > > -Priyanka
> > > >
> > > > On Tue, Jul 12, 2016 at 2:34 AM, Munagala Ramanath <
> > r...@datatorrent.com>
> > > > wrote:
> > > >
> > > > > Following up on some comments, wanted to clarify 

[jira] [Updated] (APEXMALHAR-2135) Upgrade Kafka 0.8 input operator to support 0.8.2 client

2016-07-19 Thread Siyuan Hua (JIRA)

 [ 
https://issues.apache.org/jira/browse/APEXMALHAR-2135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siyuan Hua updated APEXMALHAR-2135:
---
Fix Version/s: 3.5.0

> Upgrade Kafka 0.8 input operator to support 0.8.2 client
> 
>
> Key: APEXMALHAR-2135
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2135
> Project: Apache Apex Malhar
>  Issue Type: Bug
>    Reporter: Siyuan Hua
>Assignee: Thomas Weise
> Fix For: 3.5.0
>
>
> Right now, if you are using 0.8.2 client with 0.8 kafka inputoperator.
> You will get exception:
> *java.lang.NoSuchMethodError:
> > kafka.cluster.Broker.getConnectionString()Ljava/lang/String;*
> >
> >
> >
> > *at
> > com.datatorrent.contrib.kafka.KafkaMetadataUtil.getBrokers(KafkaMetadataUtil.java:114)*
> >
> > *at
> > com.datatorrent.contrib.kafka.KafkaConsumer.initBrokers(KafkaConsumer.java:131)*
> >
> > *at
> > com.datatorrent.contrib.kafka.AbstractKafkaInputOperator.definePartitions(AbstractKafkaInputOperator.java:488)*
> >
> > 
> We should support both.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


The first cut of high-level API for windowed transform is ready, please review

2016-07-15 Thread Siyuan Hua
The first cut of high-level API for windowed transform is ready, please
review.

https://github.com/apache/apex-malhar/pull/339

Thanks!


Regards,
Siyuan


[jira] [Created] (APEXCORE-492) The exception from ExitCondition should not be swallowed

2016-07-14 Thread Siyuan Hua (JIRA)
Siyuan Hua created APEXCORE-492:
---

 Summary: The exception from ExitCondition should not be swallowed
 Key: APEXCORE-492
 URL: https://issues.apache.org/jira/browse/APEXCORE-492
 Project: Apache Apex Core
  Issue Type: New Feature
Reporter: Siyuan Hua


When you run application in local mode if there is some exception from the  
ExitCondition call method, it will be swallowed silently which is not good for 
debugging



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (APEXMALHAR-2154) Update kafka 0.9 input operator to use new CheckpointNotificationListener

2016-07-14 Thread Siyuan Hua (JIRA)
Siyuan Hua created APEXMALHAR-2154:
--

 Summary: Update kafka 0.9 input operator to use new 
CheckpointNotificationListener
 Key: APEXMALHAR-2154
 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2154
 Project: Apache Apex Malhar
  Issue Type: Improvement
Reporter: Siyuan Hua
Assignee: Siyuan Hua
 Fix For: 3.5.0


CheckpointListener interface has been deprecated. We should upgrade the 
operator to use the new one



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (APEXMALHAR-2155) Make high-level api support multiple ports

2016-07-14 Thread Siyuan Hua (JIRA)
Siyuan Hua created APEXMALHAR-2155:
--

 Summary: Make high-level api support multiple ports
 Key: APEXMALHAR-2155
 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2155
 Project: Apache Apex Malhar
  Issue Type: New Feature
Reporter: Siyuan Hua
Assignee: Siyuan Hua


Right now there is no easy way to support multiple ports connection in between 
operators in high-level API. We need to support that for watermark ports for 
now and for other possible use cases



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (APEXMALHAR-2149) ApexStream.filter() not working properly.

2016-07-13 Thread Siyuan Hua (JIRA)

 [ 
https://issues.apache.org/jira/browse/APEXMALHAR-2149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siyuan Hua updated APEXMALHAR-2149:
---
Assignee: Shunxin Lu

> ApexStream.filter() not working properly.
> -
>
> Key: APEXMALHAR-2149
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2149
> Project: Apache Apex Malhar
>  Issue Type: Bug
>Reporter: Shunxin Lu
>Assignee: Shunxin Lu
>
> By adding filter to the stream, all tuples will be filter out no matter how 
> the f() in FilterFunction is overridden.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (APEXMALHAR-2148) Reduce the noise of kafka input operator

2016-07-13 Thread Siyuan Hua (JIRA)
Siyuan Hua created APEXMALHAR-2148:
--

 Summary: Reduce the noise of kafka input operator
 Key: APEXMALHAR-2148
 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2148
 Project: Apache Apex Malhar
  Issue Type: Improvement
Reporter: Siyuan Hua
Assignee: Siyuan Hua
 Fix For: 3.5.0




Acceptance Criteria:
Reduce logging level

—

Even when there are no tuples flowing, the Kafka input operator generates a lot 
of noise in the logs as shown below; at the lease the config values should be 
switched to DEBUG level.
-
2016-04-23 14:16:49,179 INFO org.apache.kafka.common.utils.AppInfoParser: Kafka 
version : 0.9.0.1
2016-04-23 14:16:49,179 INFO org.apache.kafka.common.utils.AppInfoParser: Kafka 
commitId : 23c69d62a0cabf06
2016-04-23 14:16:49,296 INFO com.datatorrent.stram.FSRecoveryHandler: Creating 
hdfs://localhost:9000/user/dtadmin/datatorrent/apps/application_1461443973082_0002/recovery/log016-04-23
 14:17:19,460 INFO org.apache.kafka.clients.consumer.ConsumerConfig: 
ConsumerConfig values:
request.timeout.ms = 4
check.crcs = true
retry.backoff.ms = 100
ssl.truststore.password = null
ssl.keymanager.algorithm = SunX509
receive.buffer.bytes = 32768
ssl.cipher.suites = null
ssl.key.password = null
sasl.kerberos.ticket.renew.jitter = 0.05
ssl.provider = null
sasl.kerberos.service.name = null
session.timeout.ms = 3
sasl.kerberos.ticket.renew.window.factor = 0.8
bootstrap.servers = [localhost:9092]
client.id =
fetch.max.wait.ms = 500
fetch.min.bytes = 1
key.deserializer = class 
org.apache.kafka.common.serialization.ByteArrayDeserializer
sasl.kerberos.kinit.cmd = /usr/bin/kinit
auto.offset.reset = latest
value.deserializer = class 
org.apache.kafka.common.serialization.ByteArrayDeserializer
ssl.enabled.protocols = [TLSv1.2, TLSv1.1, TLSv1]
partition.assignment.strategy = 
[org.apache.kafka.clients.consumer.RangeAssignor]
ssl.endpoint.identification.algorithm = null
max.partition.fetch.bytes = 1048576
ssl.keystore.location = null
ssl.truststore.location = null
ssl.keystore.password = null
metrics.sample.window.ms = 3
metadata.max.age.ms = 30
security.protocol = PLAINTEXT
auto.commit.interval.ms = 5000
ssl.protocol = TLS
sasl.kerberos.min.time.before.relogin = 6
connections.max.idle.ms = 54
ssl.trustmanager.algorithm = PKIX
group.id = org.apache.apex.malhar.kafka.AbstractKafkaInputOperatorMETA_GROUP
enable.auto.commit = false
metric.reporters = []
ssl.truststore.type = JKS
send.buffer.bytes = 131072
reconnect.backoff.ms = 50
metrics.num.samples = 2
ssl.keystore.type = JKS
heartbeat.interval.ms = 3000




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (APEXMALHAR-2142) High-level API window support

2016-07-12 Thread Siyuan Hua (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXMALHAR-2142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373783#comment-15373783
 ] 

Siyuan Hua commented on APEXMALHAR-2142:


https://github.com/apache/apex-malhar/pull/339

> High-level API window support
> -
>
> Key: APEXMALHAR-2142
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2142
> Project: Apache Apex Malhar
>  Issue Type: New Feature
>    Reporter: Siyuan Hua
>        Assignee: Siyuan Hua
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (APEXMALHAR-2142) High-level API window support

2016-07-12 Thread Siyuan Hua (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXMALHAR-2142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373760#comment-15373760
 ] 

Siyuan Hua commented on APEXMALHAR-2142:


Basically use windowed operator to do windowed transformations.


> High-level API window support
> -
>
> Key: APEXMALHAR-2142
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2142
> Project: Apache Apex Malhar
>  Issue Type: New Feature
>    Reporter: Siyuan Hua
>        Assignee: Siyuan Hua
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (APEXMALHAR-2142) High-level API window support

2016-07-12 Thread Siyuan Hua (JIRA)
Siyuan Hua created APEXMALHAR-2142:
--

 Summary: High-level API window support
 Key: APEXMALHAR-2142
 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2142
 Project: Apache Apex Malhar
  Issue Type: New Feature
Reporter: Siyuan Hua
Assignee: Siyuan Hua






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (APEXMALHAR-2141) An efficient Accumulation interface for primitive type

2016-07-12 Thread Siyuan Hua (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXMALHAR-2141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373694#comment-15373694
 ] 

Siyuan Hua commented on APEXMALHAR-2141:


A stateless accumulation vs stateful accumulation.

> An efficient Accumulation interface for primitive type
> --
>
> Key: APEXMALHAR-2141
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2141
> Project: Apache Apex Malhar
>  Issue Type: Improvement
>    Reporter: Siyuan Hua
>Assignee: David Yan
>
> It is possible that we can refine/add new interface for Accumulation for 
> primitive types



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (APEXMALHAR-2141) An efficient Accumulation interface for primitive type

2016-07-12 Thread Siyuan Hua (JIRA)
Siyuan Hua created APEXMALHAR-2141:
--

 Summary: An efficient Accumulation interface for primitive type
 Key: APEXMALHAR-2141
 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2141
 Project: Apache Apex Malhar
  Issue Type: Improvement
Reporter: Siyuan Hua
Assignee: David Yan


It is possible that we can refine/add new interface for Accumulation for 
primitive types



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Bleeding edge branch ?

2016-07-11 Thread Siyuan Hua
+1

On Mon, Jul 11, 2016 at 8:59 AM, Munagala Ramanath 
wrote:

> We've had a number of issues recently related to dependencies on old
> versions
> of various packages/libraries such as Hadoop itself, Google guava,
> HTTPClient,
> mbassador, etc.
>
> How about we create a "bleeding-edge" branch in both Core and Malhar which
> will use the latest versions of these various dependencies, upgrade to Java
> 8 so
> we can use the new Java features, etc. ?
>
> This will give us an opportunity to discover these sorts of problems early
> and,
> when we are ready to pull the trigger for a major version, we have a branch
> ready
> for merge with, hopefully, minimal additional effort.
>
> There will be no guarantees w.r.t. this branch so people using it use it at
> their own
> risk.
>
> Ram
>


Re: Windowed Operator PR

2016-07-08 Thread Siyuan Hua
+1

We should merge this ASAP.
I don't think we could solve all the problems in one PR and I think David's
PR is good enough that we can keep working on this incrementally and in
parallel.

Regards,
Siyuan

On Fri, Jul 8, 2016 at 4:10 PM, David Yan  wrote:

> Hi all,
>
> The Windowed Operator PR is ready to be merged. Thank you very much for all
> your feedback so far.
>
> https://github.com/apache/apex-malhar/pull/319
>
> Merging this PR will make projects related to the WindowedOperator go on
> more easily, which includes High level API, Apache Calcite support, Apex
> runner in Beam and Dedup operator. Please speak up now If you think there
> are reasons for not merging it.
>
> Also please note that all the classes and interfaces are marked "Evolving"
> so we can always change them later.
>
> Thanks,
>
> David
>


[jira] [Created] (APEXMALHAR-2135) Upgrade Kafka 0.8 input operator to support both 0.8.1 and 0.8.2

2016-07-06 Thread Siyuan Hua (JIRA)
Siyuan Hua created APEXMALHAR-2135:
--

 Summary: Upgrade Kafka 0.8 input operator to support both 0.8.1 
and 0.8.2
 Key: APEXMALHAR-2135
 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2135
 Project: Apache Apex Malhar
  Issue Type: Bug
Reporter: Siyuan Hua


Right now, if you are using 0.8.2 client with 0.8 kafka inputoperator.
You will get exception:

*java.lang.NoSuchMethodError:
> kafka.cluster.Broker.getConnectionString()Ljava/lang/String;*
>
>
>
> *at
> com.datatorrent.contrib.kafka.KafkaMetadataUtil.getBrokers(KafkaMetadataUtil.java:114)*
>
> *at
> com.datatorrent.contrib.kafka.KafkaConsumer.initBrokers(KafkaConsumer.java:131)*
>
> *at
> com.datatorrent.contrib.kafka.AbstractKafkaInputOperator.definePartitions(AbstractKafkaInputOperator.java:488)*
>
> 

We should support both.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Suggestions for WindowedStream interface in the next phase

2016-07-01 Thread Siyuan Hua
Hi Community,

Right now, the WindowedStream in the pull request has following "window"
transformations:

count
countByKey
top
topByKey
combine
combineByKey
fold
foldByKey
group
groupByKey


Do you have other suggestions that are nice to have in the WindowedStream
interface.

Also I think it might be a good idea to further divide WindowedStream to
KeyedWindowedStream so developer can define some custom function to convert
normal tuple to KeyValuePair that is required by xxxByKey methods. Any
ideas on that?


Regards,
Siyuan


[jira] [Created] (APEXMALHAR-2132) Rename Manifest attributes from DT-App-xxxx to Apex-App-xxxx

2016-07-01 Thread Siyuan Hua (JIRA)
Siyuan Hua created APEXMALHAR-2132:
--

 Summary: Rename Manifest attributes from DT-App- to 
Apex-App-
 Key: APEXMALHAR-2132
 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2132
 Project: Apache Apex Malhar
  Issue Type: Task
Affects Versions: 4.0.0
Reporter: Siyuan Hua
 Fix For: 4.0.0


This ticket is to track the same work as 
https://issues.apache.org/jira/browse/APEXCORE-481  that needs to be done for 
malhar



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (APEXMALHAR-2131) Change the default container log name from dt.log to apex.log

2016-07-01 Thread Siyuan Hua (JIRA)

 [ 
https://issues.apache.org/jira/browse/APEXMALHAR-2131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siyuan Hua closed APEXMALHAR-2131.
--
Resolution: Won't Fix

Sorry wrong project name

> Change the default container log name from dt.log to apex.log
> -
>
> Key: APEXMALHAR-2131
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2131
> Project: Apache Apex Malhar
>  Issue Type: Task
>  Components: build
>        Reporter: Siyuan Hua
> Fix For: 4.0.0
>
>
> Rename dt.log to apex.loig



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


  1   2   >