[GitHub] apex-malhar pull request #545: APEXMALHAR-2376 Add Common Log support in Log...

2017-01-30 Thread akshay-harale
GitHub user akshay-harale opened a pull request:

https://github.com/apache/apex-malhar/pull/545

APEXMALHAR-2376 Add Common Log support in LogParser operator

https://issues.apache.org/jira/browse/APEXMALHAR-2376


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/akshay-harale/apex-malhar APEXMALHAR-2376-COMMON_LOG

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/apex-malhar/pull/545.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #545


commit e5a6fd35ded1560755dbdf2c8363ea4629458c62
Author: akshay 
Date:   2017-01-31T07:06:38Z

APEXMALHAR-2376 Add Common Log support in LogParser operator






Re: Schema Discovery Support in Apex Applications

2017-01-30 Thread Chinmay Kolhatkar
The consumer of the output port operator schema is going to be the next downstream operator.


On Tue, Jan 31, 2017 at 4:01 AM, Sergey Golovko wrote:

> Sorry, I'm new to the Apex team, and I don't clearly understand who the
> consumers of the output port operator schema(s) are.
>
> 1. If the consumers are non-runtime callers like the application manager
> or a UI designer, maybe it makes sense to use Java static method(s) to
> retrieve the output port operator schema(s). I guess the performance cost
> of a single call to a static method via reflection can be ignored.
>
> 2. If the consumer is the next downstream operator, maybe it makes sense
> to send the output port operator schema from the upstream operator to the
> next downstream operator via the stream. The corresponding methods that
> send and receive the schema should be declared in the interface/abstract
> class of the upstream and downstream operators. The sending/receiving of
> the output schema should happen right before the first data record is
> sent via the stream.
>
> One example of a typical implementation that sends metadata along with a
> regular result set is JDBC, which sends its metadata as part of the JDBC
> result set. The output schema (the metadata of the streamed data) in such
> an implementation should contain not only the signature of the streamed
> objects (field names and data types) but also any other properties of the
> data that can be useful to the schema receiver for processing the data
> (for instance, the delimiter for a CSV record stream).
>
> Thanks,
> Sergey
>
> On 2017-01-25 01:47 (-0800), Chinmay Kolhatkar wrote:
> > Thank you all for the feedback.
> >
> > I've created a Jira for this: APEXCORE-623, and I'll attach the same
> > document and a link to this mail thread there.
> >
> > As the first part of this Jira, there are 2 steps I would like to propose:
> > 1. Add the following interface at com.datatorrent.common.util.SchemaAware:
> >
> > interface SchemaAware {
> >   Map<OutputPort, Schema> registerSchema(Map<InputPort, Schema> inputSchema);
> > }
> >
> > This interface can be implemented by operators to communicate their output
> > schema(s) to the engine.
> > The input to this method will be the schema at the operator's input port(s).
> >
> > 2. After the LogicalPlan is created, call the SchemaAware method from
> > upstream to downstream operators in the DAG to propagate the schema.
> >
> > Once this is done, changes can be done in Malhar for the operators in
> > question.
> >
> > Please share your opinion on this approach.
> >
> > Thanks,
> > Chinmay.
> >
> >
> >
> >
> > On Wed, Jan 18, 2017 at 2:31 PM, Priyanka Gugale wrote:
> >
> > > +1 to have this feature.
> > >
> > > -Priyanka
> > >
> > > On Tue, Jan 17, 2017 at 9:18 PM, Pramod Immaneni <pra...@datatorrent.com> wrote:
> > >
> > > > +1
> > > >
> > > > On Mon, Jan 16, 2017 at 1:23 AM, Chinmay Kolhatkar <chin...@apache.org> wrote:
> > > >
> > > > > Hi All,
> > > > >
> > > > > Currently, if a DAG generated by a user contains any POJOfied
> > > > > operators, the TUPLE_CLASS attribute needs to be set on each and
> > > > > every port that receives or sends a POJO.
> > > > >
> > > > > For example, if a DAG is File -> Parser -> Transform -> Dedup ->
> > > > > Formatter -> Kafka, then the TUPLE_CLASS attribute needs to be set
> > > > > by the user on both the input and output ports of the transform and
> > > > > dedup operators, and also on the parser output and the formatter
> > > > > input.
> > > > >
> > > > > The proposal here is to reduce the work required by the user to
> > > > > configure the DAG. Technically speaking, if an operator knows its
> > > > > input schema and its processing properties, it can determine its
> > > > > output schema and convey it to downstream operators. This way the
> > > > > complete pipeline can be configured without the user setting
> > > > > TUPLE_CLASS or even creating POJOs and adding them to the classpath.
> > > > >
> > > > > Based on this idea, here is a document that explains the concept
> > > > > and a high-level design:
> > > > > https://docs.google.com/document/d/1ibLQ1KYCLTeufG7dLoHyN_tRQXEM3LR-7o_S0z_porQ/edit?usp=sharing
> > > > >
> > > > > I would like to get the community's opinion about the feasibility
> > > > > and applications of this proposal.
> > > > > Once we reach some consensus, we can discuss the design in detail.
> > > > >
> > > > > Thanks,
> > > > > Chinmay.
> > > > >
> > > >
> > >
> >
>


[GitHub] apex-malhar pull request #544: APEXMALHAR-2397 #resolve Removing DAG.GATEWAY...

2017-01-30 Thread sashadt
GitHub user sashadt opened a pull request:

https://github.com/apache/apex-malhar/pull/544

APEXMALHAR-2397 #resolve Removing DAG.GATEWAY_CONNECT_ADDRESS which i…

…s causing evaluation failures during apex get-app-package-info call

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sashadt/apex-malhar PiDemoAppData-DAG-null.APEXMALHAR-2397

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/apex-malhar/pull/544.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #544


commit b61d4fc21f2cec23be0643a11e1f1533d65fa5e5
Author: sashadt 
Date:   2017-01-31T02:48:18Z

APEXMALHAR-2397 #resolve Removing DAG.GATEWAY_CONNECT_ADDRESS which is 
causing evaluation failures during apex get-app-package-info call






[GitHub] apex-core pull request #461: APEXCORE-504 - Possible race condition in Strea...

2017-01-30 Thread vrozov
GitHub user vrozov opened a pull request:

https://github.com/apache/apex-core/pull/461

APEXCORE-504 - Possible race condition in 
StreamingContainerAgent.getStreamCodec()

@PramodSSImmaneni or @tweise Please review

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vrozov/apex-core APEXCORE-504

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/apex-core/pull/461.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #461


commit 29ca3ef1966b1ca2071136dd57ce860f05dfcf21
Author: Vlad Rozov 
Date:   2017-01-31T01:24:45Z

APEXCORE-504 - Possible race condition in 
StreamingContainerAgent.getStreamCodec()






Re: Schema Discovery Support in Apex Applications

2017-01-30 Thread Sergey Golovko
Sorry, I'm new to the Apex team, and I don't clearly understand who the
consumers of the output port operator schema(s) are.

1. If the consumers are non-runtime callers like the application manager or a
UI designer, maybe it makes sense to use Java static method(s) to retrieve the
output port operator schema(s). I guess the performance cost of a single call
to a static method via reflection can be ignored.

2. If the consumer is the next downstream operator, maybe it makes sense to
send the output port operator schema from the upstream operator to the next
downstream operator via the stream. The corresponding methods that send and
receive the schema should be declared in the interface/abstract class of the
upstream and downstream operators. The sending/receiving of the output schema
should happen right before the first data record is sent via the stream.
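
For illustration only (these names are hypothetical, not an existing Apex API;
a sketch of the StreamSchema holder appears after the next paragraph), the
upstream side of such a handshake could look like this:

import com.datatorrent.api.DefaultOutputPort;
import com.datatorrent.common.util.BaseOperator;

public abstract class SchemaEmittingOperator<T> extends BaseOperator
{
  public final transient DefaultOutputPort<Object> out = new DefaultOutputPort<>();
  private boolean schemaSent;

  // Subclasses describe the schema of the tuples they emit.
  protected abstract StreamSchema describeOutputSchema();

  // Emits the schema exactly once, right before the first data record.
  protected void emitRecord(T tuple)
  {
    if (!schemaSent) {
      out.emit(describeOutputSchema());
      schemaSent = true;
    }
    out.emit(tuple);
  }
}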

One example of a typical implementation that sends metadata along with a
regular result set is JDBC, which sends its metadata as part of the JDBC result
set. The output schema (the metadata of the streamed data) in such an
implementation should contain not only the signature of the streamed objects
(field names and data types) but also any other properties of the data that can
be useful to the schema receiver for processing the data (for instance, the
delimiter for a CSV record stream).
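
To make that concrete, a minimal sketch of such a schema holder (this is the
hypothetical StreamSchema referenced in the sketch above, not an existing
class):

import java.util.Map;

public class StreamSchema
{
  // Signature of the streamed objects: field name -> data type.
  private final Map<String, Class<?>> fields;
  // Additional processing hints for the receiver, e.g. "csv.delimiter" -> ",".
  private final Map<String, String> properties;

  public StreamSchema(Map<String, Class<?>> fields, Map<String, String> properties)
  {
    this.fields = fields;
    this.properties = properties;
  }

  public Map<String, Class<?>> getFields() { return fields; }
  public Map<String, String> getProperties() { return properties; }
}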

Thanks,
Sergey

On 2017-01-25 01:47 (-0800), Chinmay Kolhatkar wrote:
> Thank you all for the feedback.
> 
> I've created a Jira for this: APEXCORE-623, and I'll attach the same
> document and a link to this mail thread there.
> 
> As the first part of this Jira, there are 2 steps I would like to propose:
> 1. Add the following interface at com.datatorrent.common.util.SchemaAware:
> 
> interface SchemaAware {
>   Map<OutputPort, Schema> registerSchema(Map<InputPort, Schema> inputSchema);
> }
> 
> This interface can be implemented by operators to communicate their output
> schema(s) to the engine.
> The input to this method will be the schema at the operator's input port(s).
> 
> 2. After the LogicalPlan is created, call the SchemaAware method from
> upstream to downstream operators in the DAG to propagate the schema.
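>
> For example (a sketch only; Schema here stands for whatever schema
> representation the engine adopts, and the exact signature is still open
> for discussion), a transform-like operator could implement it as:
>
> public class MyTransformOperator extends BaseOperator implements SchemaAware
> {
>   public final transient DefaultInputPort<Object> input = new DefaultInputPort<Object>()
>   {
>     @Override
>     public void process(Object tuple) { /* processing logic */ }
>   };
>   public final transient DefaultOutputPort<Object> output = new DefaultOutputPort<>();
>
>   @Override
>   public Map<OutputPort, Schema> registerSchema(Map<InputPort, Schema> inputSchema)
>   {
>     // Derive the output schema from the schema at the input port plus this
>     // operator's processing properties, and report it back to the engine.
>     Schema in = inputSchema.get(input);
>     Schema out = deriveOutputSchema(in);
>     return Collections.singletonMap(output, out);
>   }
>
>   private Schema deriveOutputSchema(Schema in)
>   {
>     return in; // e.g. pass-through; real operators apply their own logic
>   }
> }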
> 
> Once this is done, changes can be done in Malhar for the operators in
> question.
> 
> Please share your opinion on this approach.
> 
> Thanks,
> Chinmay.
> 
> 
> 
> 
> On Wed, Jan 18, 2017 at 2:31 PM, Priyanka Gugale wrote:
> 
> > +1 to have this feature.
> >
> > -Priyanka
> >
> > On Tue, Jan 17, 2017 at 9:18 PM, Pramod Immaneni wrote:
> >
> > > +1
> > >
> > > On Mon, Jan 16, 2017 at 1:23 AM, Chinmay Kolhatkar wrote:
> > >
> > > > Hi All,
> > > >
> > > > Currently, if a DAG generated by a user contains any POJOfied
> > > > operators, the TUPLE_CLASS attribute needs to be set on each and every
> > > > port that receives or sends a POJO.
> > > >
> > > > For example, if a DAG is File -> Parser -> Transform -> Dedup ->
> > > > Formatter -> Kafka, then the TUPLE_CLASS attribute needs to be set by
> > > > the user on both the input and output ports of the transform and dedup
> > > > operators, and also on the parser output and the formatter input.
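> > > >
> > > > For reference, today that per-port setup looks like this in
> > > > populateDAG (a sketch; the operator fields and MyPojo are
> > > > hypothetical):
> > > >
> > > > dag.setOutputPortAttribute(parser.out, Context.PortContext.TUPLE_CLASS,
> > > >     MyPojo.class);
> > > > dag.setInputPortAttribute(transform.input, Context.PortContext.TUPLE_CLASS,
> > > >     MyPojo.class);
> > > > // ...and so on for every port that sends or receives a POJO.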
> > > >
> > > > The proposal here is to reduce the work required by the user to
> > > > configure the DAG. Technically speaking, if an operator knows its input
> > > > schema and its processing properties, it can determine its output
> > > > schema and convey it to downstream operators. This way the complete
> > > > pipeline can be configured without the user setting TUPLE_CLASS or even
> > > > creating POJOs and adding them to the classpath.
> > > >
> > > > Based on this idea, here is a document that explains the concept and a
> > > > high-level design:
> > > > https://docs.google.com/document/d/1ibLQ1KYCLTeufG7dLoHyN_tRQXEM3LR-7o_S0z_porQ/edit?usp=sharing
> > > >
> > > > I would like to get the community's opinion about the feasibility and
> > > > applications of this proposal.
> > > > Once we reach some consensus, we can discuss the design in detail.
> > > >
> > > > Thanks,
> > > > Chinmay.
> > > >
> > >
> >
> 


[GitHub] apex-core pull request #446: APEXCORE-610 Avoid multiple calls to getBytes.

2017-01-30 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/apex-core/pull/446




Re: One Yarn with Multiple Apex Applications

2017-01-30 Thread Munagala Ramanath
Are you running on the sandbox (if so, what version?) or your own cluster?

In either case, please check the following configuration item in
capacity-scheduler.xml:

<property>
  <name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
  <value>0.1</value>
  <description>
    Maximum percent of resources in the cluster which can be used to run
    application masters i.e. controls number of concurrent running
    applications.
  </description>
</property>

Try increasing the value from 0.1 to 0.5, restart YARN, and try launching
multiple applications again. (With 0.1, all application masters combined may
use at most 10% of the cluster's memory, which is often too little to start a
second application master.)

Ram

On Mon, Jan 30, 2017 at 12:54 AM, Santhosh Kumari G <santhosh.kum...@qolsys.com> wrote:

> Hi,
>
>   Can we launch more than one (multiple) Apex engine on one node, with
> multiple terminals and one YARN running? If yes, what is the process?
>
> I tried launching 2 Apex apps with 2 Apex engines. The first Apex app runs
> without any issue using port 8042, configured in yarn-default.xml. Then I
> tried to launch the 2nd app; it says accepted but is not running, as port
> 8042 is already in use. When I killed the 1st app, the 2nd app got launched
> automatically.
>
> So can we manage one YARN with multiple Apex engine apps?
>
> Thank you,
> Santhosh Kumari G.
>


Re: One Yarn with Multiple Apex Applications

2017-01-30 Thread AJAY GUPTA
Hi Santhosh,

We can definitely run multiple Apex applications on a single YARN instance.
The behaviour in your case is most probably due to a shortage of the resources
required by the second application. Once the first application was killed, the
resources were released, and the second application got all its required
resources and started running.


Ajay

On Mon, 30 Jan 2017 at 8:47 PM, Santhosh Kumari G <santhosh.kum...@qolsys.com> wrote:

> Hi,
>
>   Can we launch more than one (multiple) Apex engine on one node, with
> multiple terminals and one YARN running? If yes, what is the process?
>
> I tried launching 2 Apex apps with 2 Apex engines. The first Apex app runs
> without any issue using port 8042, configured in yarn-default.xml. Then I
> tried to launch the 2nd app; it says accepted but is not running, as port
> 8042 is already in use. When I killed the 1st app, the 2nd app got launched
> automatically.
>
> So can we manage one YARN with multiple Apex engine apps?
>
> Thank you,
> Santhosh Kumari G.
>


Re: One Yarn with Multiple Apex Applications

2017-01-30 Thread Chinmay Kolhatkar
Hi Santhosh,

It seems that your YARN does not have enough resources available to allocate
memory for 2 applications.
When you kill the first application, the memory is regained by YARN and then
allocated to the second application.

You can try to give more memory to YARN if your system has enough RAM.

You can add a property as follows in yarn-site.xml and restart the YARN
services:

<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>8192</value>
</property>

Set the value in MB according to the RAM available on your machine.

-Chinmay.



On Mon, Jan 30, 2017 at 2:24 PM, Santhosh Kumari G <santhosh.kum...@qolsys.com> wrote:

> Hi,
>
>   Can we launch more than one (multiple) Apex engine on one node, with
> multiple terminals and one YARN running? If yes, what is the process?
>
> I tried launching 2 Apex apps with 2 Apex engines. The first Apex app runs
> without any issue using port 8042, configured in yarn-default.xml. Then I
> tried to launch the 2nd app; it says accepted but is not running, as port
> 8042 is already in use. When I killed the 1st app, the 2nd app got launched
> automatically.
>
> So can we manage one YARN with multiple Apex engine apps?
>
> Thank you,
> Santhosh Kumari G.
>


Re: [DISCUSS] Policy for patches

2017-01-30 Thread Pramod Immaneni
You make some fair points; a contributor may not want to submit patches for
all release branches, but the community can adopt that as a policy. In many
cases, it might be as simple as the reviewer cherry-picking the fix onto the
other branches. In cases where it is not trivial and the reviewer or somebody
in the community cannot help out at that time, we can put it in the backlog
till somebody picks it up, and possibly use JIRA to track this backlog.
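
For the trivial case, the cherry-pick itself is just (a sketch; the branch
name, remote name, and commit SHA are placeholders):

$ git checkout release-3.5
$ git cherry-pick -x <commit-sha>   # -x records the original commit id
$ git push apache release-3.5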

On Sun, Jan 29, 2017 at 11:22 AM, Thomas Weise wrote:

> The problem with this discussion is that it assumes a policy could be
> established to apply patches. Any form of contribution to the project is
> volunteer work, so this is a non-starter.
>
> For example, if someone contributes a patch, then there is no way to
> enforce contribution of the patch for multiple branches. A contributor may
> do it due to other interests (like the vendor having to support a customer
> on that code base), while another contributor may have no such incentive.
> Likewise the committer reviewing the work cannot be forced to repeat the
> same for multiple branches.
>
> I would like to see vendor motivation cleanly separated from community
> concerns.
>
> Perhaps it makes sense to come up with recommendations or guidelines around
> this though. For example there is in general little incentive for the
> community to release from outdated branches or to release code with known
> issues such as CVEs. And there is a release process and a vote to deal with
> it.
>
> For example, one could recommend not maintaining a minor release branch
> after there have been n (2?) more recent minor or major releases. Or that
> we don't want maintenance releases for minor releases that have security
> issues.
>
> Looking at the current situation, I think that 3.2 and 3.3 could be
> considered obsolete from the community's perspective (which does not stop
> a vendor from adding patches and consuming them for their own purposes).
>
> Soon (maybe when 3.6 is out?) there should be little reason to maintain 3.4
> (3.5 is backward compatible and users should be incentivised to move up).
>
> I also think that under the current contribution guidelines there is no
> need to remove branches (even when they are fully reflected in tags). See
> apex-core repository.
>
> I do think, however, that it may be good to clean up the pre-ASF branches
> in apex-malhar.
>
> Thomas
>
>
>
> On Fri, Jan 27, 2017 at 11:04 AM, Vlad Rozov wrote:
>
> > I prefer to go with the second approach as well.
> >
> > My preference is not a strict end-of-life policy, but going by the
> > severity of an issue and the complexity of providing fixes for all
> > subsequent releases. In case a contributor decides to fix a bug in an old
> > release, she will need to provide the fix for many branches; it is
> > unlikely that such work will be done without justification.
> >
> > I am strongly against deleting old branches:
> > - They preserve history.
> > - I am not 100% sure, but deleting them is likely against ASF policy. Any
> > contribution to a project needs to be preserved (including the author of
> > a commit).
> > - It does not cost much to keep branches in the remote git repository,
> > and it does not affect git operations.
> > - It is not necessary to load all branches into a local repository.
> >
> > Thank you,
> >
> > Vlad
> >
> >
> > On 1/27/17 10:16, Sanjay Pujare wrote:
> >
> >> A strong +1 for the second approach for the reasons Pramod mentioned.
> >>
> >> Is it also possible to “prune” branches so that we have less of this
> >> activity of merging fixes across branches? If we can ascertain that a
> >> certain branch is not used by any user/customer (by asking in the
> >> community), we should be able to remove it. For example, apex-malhar has
> >> release-3.6, which is definitely required, but 3-year-old branches like
> >> release-0.8.5, release-0.9.0, … telecom most probably are not being used
> >> by anybody.
> >>
> >> On 1/27/17, 8:43 AM, "Pramod Immaneni" wrote:
> >>
> >>  Hi,
> >>
> >>  I wanted to bring up the topic of patches for issues discovered in
> >>  older releases and start a discussion to come up with a policy on how
> >>  to apply them.
> >>
> >>  One approach is that the patch gets applied only to the release it was
> >>  discovered in and master. Another approach is that it gets applied to
> >>  all release branches >= the discovered release and master. There may
> >>  be other approaches as well, which can come up in this discussion.
> >>
> >>  The advantage of the first approach is that the immediate work is
> >>  limited to a single fix and merge. The second approach requires more
> >>  work initially, as the patch needs to get applied in one or more
> >>  additional places.
> >>
> >>  I am tending towards the second approach of applying the fix to all
> >>  release branches >= the discovered release, while also having some sort of an
> 

One Yarn with Multiple Apex Applications

2017-01-30 Thread Santhosh Kumari G
Hi,

  Can we launch more than one (multiple) apex engine in one node with 
multiple terminals and one yarn running. If yes, what is the process.

I tried launching 2 apex apps with 2 apex engine's. First apex app is running 
without any issue using the port 8042 configured in yarn-default.xml Then I 
tried to launch 2nd app it is saying accepted but not running as 8042 port is 
already in use. When I killed the 1st app,2nd app is getting launched 
automatically.

So can we manage one yarn with multiple apex engine app's?.

Thank you,
Santhosh Kumari G.


Re: APEXMALHAR-2261 Python Binding for HighLevel APIs

2017-01-30 Thread vikram patil
Hi Thomas,

I had looked at APEXMALHAR-2260 as well, and it will also be part of this
development. Though Apex provides a Python script operator, it is a very
limited script implementation. Lambda functions or custom Python functions,
which may have to run as scripts in the Python operator, can be serialized
using CloudPickle and run on various nodes.
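
One possible shape for that (purely a sketch; worker.py and the stdin framing
are hypothetical) is an operator that holds the pickled function as bytes and
hands it to an external Python worker process at setup time:

import java.io.IOException;

import com.datatorrent.api.Context;
import com.datatorrent.common.util.BaseOperator;

public class PickledFunctionOperator extends BaseOperator
{
  // CloudPickle-serialized function, set when the DAG is built.
  private byte[] pickledFunction;

  private transient Process pythonWorker;

  @Override
  public void setup(Context.OperatorContext context)
  {
    try {
      // Assumes worker.py (which unpickles the function and applies it to
      // each record read from stdin) is present on every node; that is
      // exactly the library-distribution question discussed here.
      pythonWorker = new ProcessBuilder("python", "worker.py").start();
      pythonWorker.getOutputStream().write(pickledFunction);
      pythonWorker.getOutputStream().flush();
    } catch (IOException e) {
      throw new RuntimeException(e);
    }
  }

  @Override
  public void teardown()
  {
    if (pythonWorker != null) {
      pythonWorker.destroy();
    }
  }
}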

I am still investigating how to ensure that all libraries required by the
Python code are made available to operators running on different nodes. One of
the approaches suggested by Cloudera is to make sure all libraries are
available on each node of the cluster. This was suggested with respect to
PySpark jobs.

Please do suggest better alternatives for making the Python environment
available as required, even in a cluster environment.

Thanks & Regards,
Vikram

On Sun, Jan 29, 2017 at 1:11 AM, Thomas Weise wrote:

> Hi,
>
> Python support would be great to have. Users look for the ability to use
> Python with its library ecosystem. How will that be possible with this API
> proposal?
>
> I suspect that just being able to wire operators in Python is of limited
> impact when operators cannot execute Python. Have you looked
> at APEXMALHAR-2260 as well?
>
> Thanks
>
>
> On Fri, Jan 27, 2017 at 11:39 PM, vikram patil wrote:
>
> > Hi All,
> >
> > I would like to take up development of the Python binding implementation
> > for the high-level APIs (APEXMALHAR-2261). I went over the High-Level APIs
> > from the Apache Malhar Stream API project. It can be initiated as a
> > separate project within Apache Malhar, just like the sql or stream
> > projects.
> >
> > In the first phase I would like to focus on providing Python bindings for
> > the following APIs:
> >
> > 1) StreamFactory.fromFolder
> > 2) StreamFactory.fromKafka*
> > 3) StreamFactory.fromLocal
> > 4) StreamFactory.fromInput
> > 5) ApexStream.map
> > 6) ApexStream.flatMap
> > 7) ApexStream.filter
> > 9) ApexStream.endWith
> > 11) ApexStream.setGlobalAttribute
> > 12) Custom functions in Python
> >
> >
> > The rest of the Apex high-level APIs, such as addStream and addOperator,
> > can be implemented as part of phase II.
> >
> >
> > For this integration, I would like to use Py4J as the Python-Java binding
> > due to its wide acceptance and very good community support. Py4J also
> > allows callbacks to Python code from Java, which can make certain
> > functionalities easier to implement.
> >
> > Py4j Version: 0.10.4
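> >
> > On the Java side, that would mean exposing an entry point through a Py4J
> > gateway, roughly like this (a sketch; ApexStreamEntryPoint is a made-up
> > name):
> >
> > import py4j.GatewayServer;
> >
> > public class ApexStreamEntryPoint
> > {
> >   public static void main(String[] args)
> >   {
> >     // Python connects to this gateway (default port 25333) and drives
> >     // the Java stream API through it.
> >     GatewayServer server = new GatewayServer(new ApexStreamEntryPoint());
> >     server.start();
> >   }
> > }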
> >
> > Please share your suggestions about this implementation.
> >
> > Thanks & Regards,
> > Vikram
> >
>

