Re: [ANNOUNCE] New Apache Apex PMC: Tushar Gosavi

2017-11-03 Thread Chaitanya Chebolu
Congratulations Tushar!!!

On Fri, Nov 3, 2017 at 10:06 PM, Venkatesh Kottapalli <
venkat...@datatorrent.com> wrote:

> Congratulations Tushar!
>
> -Venky
>
> On Fri, Nov 3, 2017 at 9:24 AM Hitesh Kapoor 
> wrote:
>
> > Congratulations Tushar.
> >
> > Regards,
> > Hitesh Kapoor
> >
> > On 03-Nov-2017 9:53 PM, "Ashwin Chandra Putta"  >
> > wrote:
> >
> > > Congratulations Tushar!!
> > >
> > > Regards,
> > > Ashwin.
> > >
> > > On Fri, Nov 3, 2017 at 9:17 AM, Pramod Immaneni <
> pra...@datatorrent.com>
> > > wrote:
> > >
> > > > A bit delayed but nevertheless important announcement: the Apache Apex
> > > > PMC is pleased to announce Tushar Gosavi as a new PMC member.
> > > >
> > > > Tushar has been contributing to Apex from the beginning of the project
> > > > and has been working on the codebase for over 3 years. He is among the
> > > > few who have a wide breadth of contribution, including both core and
> > > > malhar, from internal changes to user-facing API, and from input/output
> > > > operators to components that support operators, and he has a good
> > > > overall understanding of the codebase and how it works.
> > > >
> > > > His salient contributions over the years are:
> > > >
> > > > - Module support in Apex
> > > > - Operator additions and improvements such as S3, file input and
> > > >   output, partitionable unique count, and dynamic partitioning
> > > >   improvements
> > > > - The initial WAL implementation from which subsequent implementations
> > > >   were derived for different use cases
> > > > - Plugin support for Apex
> > > > - Various bug fixes and improvements in both malhar and core that you
> > > >   can find in JIRA
> > > > - Participation in long-term project maintenance tasks such as
> > > >   refactoring operators and demos
> > > > - Participation in important feature discussions
> > > > - Reviewing and committing pull requests from contributors
> > > > - Conducting and teaching in an Apex workshop at a university and
> > > >   speaking at the Apex conference organized by DataTorrent
> > > >
> > > > Conference talks & presentations:
> > > >
> > > > 1. Presentations at VIIT and PICT Pune
> > > > 2. http://www.apexbigdata.com/pune-platform-talk-9.html
> > > > 3. http://www.apexbigdata.com/pune-integration-talk-4.html
> > > > 4. Webinar on Smart Partitioning with Apex
> > > >    (https://www.youtube.com/watch?v=HCATB1zlLE4)
> > > > 5. Presentation about a customer use case at the Pune Meetup in 2016
> > > >
> > > > Pramod for the Apache Apex PMC.
> > > >
> > >
> > >
> > >
> > > --
> > >
> > > Regards,
> > > Ashwin.
> > >
> >
>



-- 

*Chaitanya*

Software Engineer

E: chaita...@datatorrent.com | Twitter: @chaithu1403

www.datatorrent.com  |  apex.apache.org


Re: [ANNOUNCE] New Apache Apex Committer: Ananth Gundabattula

2017-11-03 Thread Chaitanya Chebolu
Congrats Ananth!!

On Fri, Nov 3, 2017 at 9:49 PM, Bhupesh Chawda 
wrote:

> Congratulations Ananth!
>
> ~ Bhupesh
>
>
> ___
>
> Bhupesh Chawda
>
> E: bhup...@datatorrent.com | Twitter: @bhupeshsc
>
> www.datatorrent.com  |  apex.apache.org
>
>
>
> On Fri, Nov 3, 2017 at 9:37 PM, Tushar Gosavi 
> wrote:
>
> > Congratulations Ananth.
> >
> > - Tushar.
> >
> >
> > On Fri, Nov 3, 2017 at 9:28 PM, Ashwin Chandra Putta <
> > ashwinchand...@gmail.com> wrote:
> >
> > > Congratulations Ananth. Well deserved!
> > >
> > > On Fri, Nov 3, 2017 at 2:08 AM, Chinmay Kolhatkar <
> > chin...@datatorrent.com
> > > >
> > > wrote:
> > >
> > > > Congrats Ananth!!
> > > >
> > > > On Fri, Nov 3, 2017 at 2:37 PM, Priyanka Gugale 
> > > wrote:
> > > >
> > > > > Congrats Ananth!!
> > > > >
> > > > > -Priyanka
> > > > >
> > > > > On Fri, Nov 3, 2017 at 2:20 PM, Thomas Weise 
> wrote:
> > > > >
> > > > > > The Project Management Committee (PMC) for Apache Apex is pleased
> > to
> > > > > > announce Ananth Gundabattula as new committer.
> > > > > >
> > > > > > Ananth has been contributing to the project for about a year.
> > > > Highlights:
> > > > > >
> > > > > > * Cassandra and Kudu operators with in-depth analysis/design work
> > > > > > * Good collaboration, adherence to contributor guidelines and
> > > ownership
> > > > > of
> > > > > > work
> > > > > > * Work beyond feature focus such as fixing pre-existing test
> issues
> > > > that
> > > > > > impact CI
> > > > > > * Presented at YOW Data and Dataworks Summit Australia
> > > > > > * Enthusiast, contributes on his own time
> > > > > >
> > > > > > Welcome, Ananth, and congratulations!
> > > > > > Thomas, for the Apache Apex PMC.
> > > > > >
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > >
> > > Regards,
> > > Ashwin.
> > >
> >
>





Re: Malhar release 3.8.0

2017-10-25 Thread Chaitanya Chebolu
+1 on new release.

Thanks,

On Wed, Oct 25, 2017 at 9:09 PM, Vlad Rozov  wrote:

> +1.
>
> Thank you,
>
> Vlad
>
>
> On 10/25/17 08:21, Amol Kekre wrote:
>
>> +1 on a new malhar release.
>>
>> Thks,
>> Amol
>>
>>
>> E:a...@datatorrent.com | M: 510-449-2606 | Twitter: @*amolhkekre*
>>
>> www.datatorrent.com
>>
>>
>> On Tue, Oct 24, 2017 at 9:12 PM, Tushar Gosavi 
>> wrote:
>>
>> +1 on creating a new malhar release.
>>>
>>> - Tushar.
>>>
>>>
>>> On Wed, Oct 25, 2017 at 4:39 AM, Pramod Immaneni >> >
>>> wrote:
>>>
>>> +1 on creating a new release. I, unfortunately, do not have the time
 currently to participate in the release activities.

 On Mon, Oct 23, 2017 at 7:15 PM, Thomas Weise  wrote:

> The last release was back in March; there are quite a few JIRAs that have
> been completed since and should be released.
>
> https://issues.apache.org/jira/issues/?jql=fixVersion%
> 20%3D%203.8.0%20AND%20project%20%3D%20APEXMALHAR%20ORDER%
> 20BY%20status%20ASC
>
> From looking at the list there is nothing that should stand in the way of a
> release?
>
> Also, once the release is out it would be a good opportunity to effect the
> major version change.
>
> Anyone interested to be the release manager?
>
> Thanks,
> Thomas
>
>
>




Re: [VOTE] Major version change for Apex Library (Malhar)

2017-09-01 Thread Chaitanya Chebolu
I think the backward incompatibility is there in both options. I think we
should give community members/users a sufficient time frame for major
changes.
So, I'm -1 on both options.

On Fri, Sep 1, 2017 at 5:43 PM, Sandeep Deshmukh  wrote:

> -1 from my side for "1. Version 4.0 as major version change from 3.x"
>
>
> I was away for the festival season in India, hence the delay in voting and
> replying.
>
> Reason:
> I don't see a compelling reason for a major change at this moment. As a
> user of Apex, I have developed a system using Apex, but now I have to worry
> about a potentially disruptive change. I have to worry about porting to a
> new version, or sticking to the older version and missing out on new
> features.
>
> I found features like Control Tuples very useful. They were added recently,
> and I am sure the newer features would also be useful. My worry is whether,
> in the startup mode I am working in, I have the bandwidth to move to a
> newer version, or will have to explore other options and leave Apex out.
>
> Regards
> Sandeep
>
>
>
>
> On Fri, Sep 1, 2017 at 5:32 PM, Priyanka Gugale  wrote:
>
> > Apologies for being late in the discussions. I wanted to understand one
> > thing. As Thomas mentioned, some of our operators are not mature enough or
> > lack operability. Do we know if such operators need any backward
> > incompatible changes, e.g. modifications to the API? Do we have a plan to
> > promote operators from Evolving to Stable during the major version change?
> > I think we should think it through. We should list all possible backward
> > incompatible changes, and then cut the release. Let's give developers some
> > time to come up with such issues in a given time window. A sudden change in
> > major version doesn't give us enough time to identify and address all such
> > issues, and we may warrant one more major version change shortly.
> >
> > I propose we keep this open for some time and focus on identifying changes
> > which should go in the next major version, and then go for it.
> >
> > -1 for an immediate release or even making a release branch now.
> >
> > -Priyanka
> >
> > On Fri, Sep 1, 2017 at 11:41 AM, Milind Barve  wrote:
> >
> > > Hi
> > >
> > > First of all, my apologies for voting late. However, I will still do it
> > > since the mail says the vote would remain open for *at least* 72 hours :)
> > > I believe the objective is to do the right things in the right way.
> > >
> > > Moving to a new version is something that is a part of any product
> > > lifecycle. While doing so, what is taken into account is -
> > >
> > > 1. What value the proposed changes are adding to the product.
> > > 2. Are the changes big enough to warrant a major version change
> > > 3. How big a disruption would it cause to the existing users or
> dependent
> > > products
> > > 4. Is enough of a heads up and time given to the users/dependent
> products
> > > to plan for the changes being introduced
> > >
> > > There could be other factors as well, but I think the above are the
> most
> > > critical ones.
> > >
> > > As regards the proposed changes, I don't think they satisfy any of the
> > > above criteria (except maybe #2, which is a purely engineering decision).
> > >
> > > Given the changes proposed,
> > >
> > > 1. It is going to break backward compatibility.
> > > 2. This is a disruptive change which is going to require existing users
> > > to plan for and make changes in their product(s). They cannot move to
> > > 4.0 without making these changes.
> > > 3. I don't think enough of a heads up has been given for what the users
> > > will have to do. As an industry standard, a heads up of at least 2
> > > releases is given before the change happens.
> > >
> > > Due to the above reasons, I am voting a -1
> > >
> > > Regards,
> > >
> > > On Sat, Aug 26, 2017 at 10:35 PM, Thomas Weise  wrote:
> > >
> > > > Being against something is not a valid reason. I also want to repeat
> a
> > > > point made earlier regarding discussion style:
> > > >
> > > > To facilitate a constructive, continuous discussion and make
> progress,
> > it
> > > > is necessary that participants take into account what was already
> > > > addressed. A few folks that never participated in the discussions
> > leading
> > > > to this vote don't make or don't want to make that effort.
> > > >
> > > > The list of functional features added by a release is determined when
> > the
> > > > release comes out, not before work starts. Furthermore, unless you
> own
> > a
> > > > crystal ball, you won't know what contributors decide to take up in
> the
> > > > future, as that's entirely up to them.
> > > >
> > > > The reason to start a major release line is to enable breaking
> changes,
> > > as
> > > > stated in the previous response. The top issues on my list don't include
> > > > functional features, but the overall lack of consistency, modularity, too
> > > > many 

Re: [jira] [Commented] (APEXMALHAR-2537) POM bundling issue in apex-malhar/kafka/kafka010

2017-08-11 Thread Chaitanya Chebolu
Hi Srikanth,

  The below dependency also doesn't work, because the abstract operator is
under the kafka-common module:

<dependency>
  <groupId>org.apache.apex</groupId>
  <artifactId>malhar-kafka</artifactId>
  <version>3.8.0-SNAPSHOT</version>
  <exclusions>
    <exclusion>
      <groupId>*</groupId>
      <artifactId>*</artifactId>
    </exclusion>
  </exclusions>
</dependency>

  You need to include the dependency as follows:

<dependency>
  <groupId>org.apache.apex</groupId>
  <artifactId>malhar-kafka</artifactId>
  <version>3.8.0-SNAPSHOT</version>
  <exclusions>
    <exclusion>
      <groupId>org.apache.apex</groupId>
      <artifactId>malhar-library</artifactId>
    </exclusion>
  </exclusions>
</dependency>

  (or)

<dependency>
  <groupId>org.apache.apex</groupId>
  <artifactId>malhar-kafka</artifactId>
  <version>3.8.0-SNAPSHOT</version>
</dependency>

Regards,

On Fri, Aug 11, 2017 at 6:26 AM, Vinay Bangalore Srikanth (JIRA) <
j...@apache.org> wrote:

>
> [ https://issues.apache.org/jira/browse/APEXMALHAR-2537?
> page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&
> focusedCommentId=16122624#comment-16122624 ]
>
> Vinay Bangalore Srikanth commented on APEXMALHAR-2537:
> --
>
> Update:
>
> <dependency>
>   <groupId>org.apache.apex</groupId>
>   <artifactId>malhar-kafka010</artifactId>
>   <version>3.8.0-SNAPSHOT</version>
> </dependency>
>
> works.
>
> <dependency>
>   <groupId>org.apache.apex</groupId>
>   <artifactId>malhar-kafka010</artifactId>
>   <version>3.8.0-SNAPSHOT</version>
>   <exclusions>
>     <exclusion>
>       <groupId>*</groupId>
>       <artifactId>*</artifactId>
>     </exclusion>
>   </exclusions>
> </dependency>
>
> does not work.
>
> <dependency>
>   <groupId>org.apache.apex</groupId>
>   <artifactId>malhar-kafka</artifactId>
>   <version>3.8.0-SNAPSHOT</version>
>   <exclusions>
>     <exclusion>
>       <groupId>*</groupId>
>       <artifactId>*</artifactId>
>     </exclusion>
>   </exclusions>
> </dependency>
>
> works!
>
> Just wondering why this difference between two kafka versions.
>
> > POM bundling issue in apex-malhar/kafka/kafka010
> > 
> >
> > Key: APEXMALHAR-2537
> > URL: https://issues.apache.org/
> jira/browse/APEXMALHAR-2537
> > Project: Apache Apex Malhar
> >  Issue Type: Bug
> >Reporter: Vinay Bangalore Srikanth
> >
> > This error is seen in the file -
> > *_org.apache.apex.malhar.kafka.AbstractKafkaInputOperator cannot be
> > resolved. It is indirectly referenced from required .class files_*
> > Even though I have included this in the pom.xml:
> >
> > <dependency>
> >   <groupId>org.apache.apex</groupId>
> >   <artifactId>malhar-kafka010</artifactId>
> >   <version>3.8.0-SNAPSHOT</version>
> >   <exclusions>
> >     <exclusion>
> >       <groupId>*</groupId>
> >       <artifactId>*</artifactId>
> >     </exclusion>
> >   </exclusions>
> > </dependency>
> >
> > I had to add this explicitly to resolve it:
> >
> > <dependency>
> >   <groupId>org.apache.apex</groupId>
> >   <artifactId>malhar-kafka-common</artifactId>
> >   <version>3.8.0-SNAPSHOT</version>
> >   <exclusions>
> >     <exclusion>
> >       <groupId>org.apache.kafka</groupId>
> >       <artifactId>kafka_2.11</artifactId>
> >     </exclusion>
> >   </exclusions>
> > </dependency>
> >
> > However, I don't see such an issue with:
> >
> > <dependency>
> >   <groupId>org.apache.apex</groupId>
> >   <artifactId>malhar-kafka</artifactId>
> >   <version>${malhar.version}</version>
> >   <exclusions>
> >     <exclusion>
> >       <groupId>*</groupId>
> >       <artifactId>*</artifactId>
> >     </exclusion>
> >   </exclusions>
> > </dependency>
>
>
>
> --
> This message was sent by Atlassian JIRA
> (v6.4.14#64029)
>





Re: [VOTE] Apache Apex Core Release 3.6.0 (RC1)

2017-05-04 Thread Chaitanya Chebolu
+1

1) Verified file integrity.
2) Verified the source code using tar file.
- Extracted tar file and compiled.
- No binary files
- LICENSE, NOTICE, README, CHANGELOG.md exist.
- rat check
3) Launched PI demo.

Regards,
Chaitanya

On Wed, May 3, 2017 at 7:07 AM, Pramod Immaneni 
wrote:

> +1 (binding)
>
> verified file integrity
> no unexpected binary files
> existence of README.md, NOTICE, LICENSE and CHANGELOG.md files
> checked build and licenses
> launched and ran pi demo
>
> Minor nitpicks
>
> CHANGELOG.md has a line in the beginning of the file that has a date in
> future "Version 3.6.0 - 2017-05-04"
> Please have your key signed by some of the others in the KEYS file, right
> now it is self-signed
>
> Thanks
>
> On Mon, May 1, 2017 at 11:49 AM, Tushar Gosavi 
> wrote:
>
> > Dear Community,
> >
> > Please vote on the following Apache Apex Core 3.6.0 release candidate 1.
> >
> > This release adds the support for custom control tuples, experimental
> > support for plugins
> > and other improvements and important bug fixes.
> >
> > This is a source release with binary artifacts published to Maven.
> >
> > List of all issues fixed: https://s.apache.org/HQ0r
> >
> > Staging directory
> > https://dist.apache.org/repos/dist/dev/apex/apache-apex-core-3.6.0-RC1/
> > Source zip:
> > https://dist.apache.org/repos/dist/dev/apex/apache-apex-
> > core-3.6.0-RC1/apache-apex-core-3.6.0-source-release.zip
> > Source tar.gz:
> > https://dist.apache.org/repos/dist/dev/apex/apache-apex-
> > core-3.6.0-RC1/apache-apex-core-3.6.0-source-release.tar.gz
> > Maven staging repository:
> > https://repository.apache.org/content/repositories/orgapacheapex-1028
> >
> > Git source:
> > https://git-wip-us.apache.org/repos/asf?p=apex-core.git;a=
> > commit;h=refs/tags/v3.6.0-RC1
> > (commit: 5a517348ae497c06150f32ce39b6915588e92510)
> >
> > PGP key:
> > http://pgp.mit.edu:11371/pks/lookup?op=vindex=tus...@apache.org
> > KEYS file:
> > https://dist.apache.org/repos/dist/release/apex/KEYS
> >
> > More information at:
> > http://apex.apache.org
> >
> > Please try the release and vote; vote will be open for at least 72 hours.
> >
> > [ ] +1 approve (and what verification was done)
> > [ ] -1 disapprove (and reason why)
> >
> > http://www.apache.org/foundation/voting.html
> >
> > How to verify release candidate:
> >
> > http://apex.apache.org/verification.html
> >
> > Thanks,
> > Tushar.
> >
>





Re: Kafka Input Operator for 0.10.x

2017-03-29 Thread Chaitanya Chebolu
Created JIRA : https://issues.apache.org/jira/browse/APEXMALHAR-2459


On Wed, Mar 22, 2017 at 4:11 PM, Chaitanya Chebolu <
chaita...@datatorrent.com> wrote:

> Hi All,
>
>I would like to propose the Kafka Input Operator using 0.10.0.0
> consumer APIs.
>
> Changes in the 0.10.0.0 version of Kafka
> -
> The message format in 0.10.0 has changed. Messages now include a timestamp
> field, and compressed messages include relative offsets.
>
> Please refer to the below link about the changes in Kafka 0.10.0.0:
> https://kafka.apache.org/documentation/#upgrade_10_breaking
>
> Design
> -
>   I'm proposing the below design for the new Kafka connectors, refactoring
> the existing Kafka connector (0.9.* version):
> 1) Split up malhar-kafka into malhar-kafka-common, malhar-kafka_0.9, and
> malhar-kafka_0.10. The directory structure would be as follows:
>   malhar-kafka
>   |--malhar-kafka-common
>   |--malhar-kafka_0.9
>   |--malhar-kafka_0.10
> If possible, will add the connectors for 0.8.x version.
> 2) Convert the KafkaConsumerWrapper to AbstractKafkaConsumerWrapper by
> adding abstract methods over the consumer APIs.
> 3) The "malhar-kafka-common" package consists of the abstract classes and by
> default builds over the 0.9.x version of Kafka.
> 4) "malhar-kafka-0.*" consists of the concrete classes and the corresponding
> KafkaConsumerWrapper.
>
>   Please share your thoughts on design.
>
> Regards,
> Chaitanya
>
> --
>
> *Chaitanya*
>
> Software Engineer
>
> E: chaita...@datatorrent.com | Twitter: @chaithu1403
>
> www.datatorrent.com  |  apex.apache.org
>
>
>
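As a minimal sketch of points (2)-(4) of the quoted design, the abstract-wrapper pattern over version-specific consumer APIs can be illustrated as follows. This is not the Malhar code: apart from the AbstractKafkaConsumerWrapper name taken from the proposal, all class and method names here are illustrative assumptions.

```python
from abc import ABC, abstractmethod

class AbstractKafkaConsumerWrapper(ABC):
    """Version-agnostic surface used by the common operator code; the
    abstract methods hide differences between consumer API versions."""

    @abstractmethod
    def poll_records(self, timeout_ms):
        """Return a list of (offset, payload) tuples."""

    @abstractmethod
    def commit_offset(self, offset):
        """Commit the given offset using the version-specific API."""

class Kafka09ConsumerWrapper(AbstractKafkaConsumerWrapper):
    """Stand-in for a 0.9.x wrapper; records come from memory here."""
    def __init__(self, records):
        self.records = records
        self.committed = None

    def poll_records(self, timeout_ms):
        return self.records

    def commit_offset(self, offset):
        self.committed = offset

def read_all(wrapper: AbstractKafkaConsumerWrapper):
    # Common operator code depends only on the abstract wrapper, so a
    # 0.9 or 0.10 module just plugs in its own concrete wrapper.
    records = wrapper.poll_records(timeout_ms=100)
    if records:
        wrapper.commit_offset(records[-1][0])
    return [payload for _, payload in records]
```

The design choice this illustrates: everything in malhar-kafka-common calls only the abstract methods, so supporting a new Kafka version means writing one new wrapper, not touching the common operator logic.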




Re: Redshift Output Operator

2017-02-20 Thread Chaitanya Chebolu
Created JIRA for this task: APEXMALHAR-2416

On Mon, Feb 13, 2017 at 4:14 PM, Chaitanya Chebolu <
chaita...@datatorrent.com> wrote:

> Hi All,
>
>   I am proposing an Amazon Redshift output module.
>   Please refer to the below link about Redshift: https://aws.amazon.com/
> redshift/
>
>   The primary functionality of this module is to load data into Redshift
> tables from data files using the COPY command. Refer to the below link
> about the COPY command:
> http://docs.aws.amazon.com/redshift/latest/dg/r_COPY.html
>
> The input type to this module is byte[].
>
>   I am proposing the below design:
> 1) Write the tuples into EMR/S3. By default, it writes to S3.
> 2) Once the file is rolled, load the file into Redshift using the COPY
> command.
>
> Please share your thoughts on design.
>
> Regards,
> Chaitanya
>





Redshift Output Operator

2017-02-13 Thread Chaitanya Chebolu
Hi All,

  I am proposing an Amazon Redshift output module.
  Please refer to the below link about Redshift:
https://aws.amazon.com/redshift/

  The primary functionality of this module is to load data into Redshift
tables from data files using the COPY command. Refer to the below link about
the COPY command:
http://docs.aws.amazon.com/redshift/latest/dg/r_COPY.html

The input type to this module is byte[].

  I am proposing the below design:
1) Write the tuples into EMR/S3. By default, it writes to S3.
2) Once the file is rolled, load the file into Redshift using the COPY
command.
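As an illustration of step (2), the COPY statement for a rolled file could be built as sketched below. This is only a sketch: the table name, S3 path, and the IAM_ROLE credential style are assumptions, and a real operator would issue the statement over a JDBC connection rather than just build the string.

```python
def build_copy_sql(table, s3_path, iam_role, delimiter=","):
    """Build a Redshift COPY statement that loads a rolled file from S3.
    All argument values are caller-supplied; nothing is validated here."""
    return (
        f"COPY {table} "
        f"FROM '{s3_path}' "
        f"IAM_ROLE '{iam_role}' "
        f"DELIMITER '{delimiter}'"
    )

# Hypothetical table, bucket path, and role ARN for illustration:
sql = build_copy_sql("events", "s3://my-bucket/rolled/part-0001",
                     "arn:aws:iam::123456789012:role/redshift-load")
```

See the r_COPY.html page linked above for the full set of COPY options (file formats, compression, error handling).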

Please share your thoughts on design.

Regards,
Chaitanya


Re: [VOTE] Apache Apex Malhar Release 3.6.0 (RC1)

2016-12-01 Thread Chaitanya Chebolu
+1

Verified the following:
1) File Integrity Check.
2) Source Code Verification using tar.gz
- Extracted tar file and compiled.
- No binary files
- LICENSE, NOTICE, README, CHANGELOG.md exist.
- rat check
3) Launched PI demo.

On Thu, Dec 1, 2016 at 1:19 PM, Vlad Rozov  wrote:

> +1 (binding)
>
> Verified checksums
> Verified LICENSE, NOTICE and README.md.
> Build with:
>
> mvn clean apache-rat:check verify -Dlicense.skip=false -Pall-modules
> install -DskipTests
>
> Thank you,
>
> Vlad
>
>
>
> On 11/30/16 22:56, Bhupesh Chawda wrote:
>
>> +1
>>
>>   - Verified signatures
>>   - Build and test successful
>>   - LICENSE, NOTICE, README, CHANGELOG.md exist
>>
>> ~ Bhupesh
>>
>>
>>
>> On Thu, Dec 1, 2016 at 11:24 AM, David Yan  wrote:
>>
>> +1 (binding)
>>>
>>> Verified existance of LICENSE, NOTICE, README.md and CHANGELOG.md files
>>> Built with this command:
>>>
>>>mvn clean apache-rat:check verify -Dlicense.skip=false -Pall-modules
>>> install
>>>
>>> with no errors.
>>> Verified pi demo
>>>
>>>
>>> On Wed, Nov 30, 2016 at 11:37 AM, Siyuan Hua 
>>> wrote:
>>>
>>> +1

 Verified checksums
 Verified compilation
 Verified build and test
 Verified pi demo

 On Wed, Nov 30, 2016 at 9:50 AM, Tushar Gosavi 
 wrote:

 +1
>
> Verified checksums
> Verified compilation
>
> - Tushar.
>
>
> On Wed, Nov 30, 2016 at 7:43 PM, Thomas Weise  wrote:
>
>> Can folks please verify the release.
>>
>> Thanks
>>
>> --
>> sent from mobile
>> On Nov 26, 2016 6:32 PM, "Thomas Weise"  wrote:
>>
>>> Dear Community,
>>>
>>> Please vote on the following Apache Apex Malhar 3.6.0 release candidate.
>>>
>>> This is a source release with binary artifacts published to Maven.
>>>
>>> This release is based on Apex Core 3.4 and resolves 69 issues.
>>>
>>> The release adds the first iteration of SQL support via Apache Calcite,
>>> an alternative Cassandra output operator (non-transactional, upsert
>>> based), an enrichment operator, improvements to window storage, and new
>>> user documentation for several operators, along with many other
>>> enhancements and bug fixes.
>>>
>>> List of all issues fixed: https://s.apache.org/9b0t
>>> User documentation: http://apex.apache.org/docs/malhar-3.6/
>>>
>>> Staging directory:
>>> https://dist.apache.org/repos/dist/dev/apex/apache-apex-malhar-3.6.0-RC1/
>>> Source zip:
>>> https://dist.apache.org/repos/dist/dev/apex/apache-apex-malhar-3.6.0-RC1/apache-apex-malhar-3.6.0-source-release.zip
>>> Source tar.gz:
>>> https://dist.apache.org/repos/dist/dev/apex/apache-apex-malhar-3.6.0-RC1/apache-apex-malhar-3.6.0-source-release.tar.gz
>>> Maven staging repository:
>>> https://repository.apache.org/content/repositories/orgapacheapex-1020/
>>>
>>> Git source:
>>> https://git-wip-us.apache.org/repos/asf?p=apex-malhar.git;a=commit;h=refs/tags/v3.6.0-RC1
>>> (commit: 43d524dc5d5326b8d94593901cad026528bb62a1)
>>>
>>> PGP key:
>>> http://pgp.mit.edu:11371/pks/lookup?op=vindex=t...@apache.org
>>> KEYS file:
>>> https://dist.apache.org/repos/dist/release/apex/KEYS
>>>
>>> More information at:
>>> http://apex.apache.org
>>>
>>> Please try the release and vote; the vote will be open until Wed, 11/30
>>> EOD PST considering the US holiday weekend.
>>>
>>> [ ] +1 approve (and what verification was done)
>>> [ ] -1 disapprove (and reason why)
>>>
>>> http://www.apache.org/foundation/voting.html
>>>
>>> How to verify release candidate:
>>>
>>> http://apex.apache.org/verification.html
>>>
>>> Thanks,
>>> Thomas
>>>
>


Re: Malhar release 3.6

2016-11-06 Thread Chaitanya Chebolu
Hi Thomas,

   I am working on APEXMALHAR-2284 and will open a PR in a couple of days.

Regards,
Chaitanya

On Sun, Nov 6, 2016 at 10:51 PM, Thomas Weise  wrote:

> Is anyone working on APEXMALHAR-2284 ?
>
> On Fri, Oct 28, 2016 at 11:00 AM, Bhupesh Chawda 
> wrote:
>
> > +1
> >
> > ~ Bhupesh
> >
> > On Fri, Oct 28, 2016 at 2:29 PM, Tushar Gosavi 
> > wrote:
> >
> > > +1
> > >
> > > - Tushar.
> > >
> > >
> > > On Fri, Oct 28, 2016 at 12:52 PM, Aniruddha Thombare
> > >  wrote:
> > > > +1
> > > >
> > > > Thanks,
> > > >
> > > > A
> > > >
> > > > _
> > > > Sent with difficulty, I mean handheld ;)
> > > >
> > > > On 28 Oct 2016 12:30 pm, "Siyuan Hua" 
> wrote:
> > > >
> > > >> +1
> > > >>
> > > >> Thanks!
> > > >>
> > > >> Sent from my iPhone
> > > >>
> > > >> > On Oct 26, 2016, at 13:11, Thomas Weise  wrote:
> > > >> >
> > > >> > Hi,
> > > >> >
> > > >> > I'm proposing another release of Malhar in November. There are 49
> > > issues
> > > >> > marked for the release, including important bug fixes, new
> > > documentation,
> > > >> > SQL support and the work for windowed operator state management:
> > > >> >
> > > >> > https://issues.apache.org/jira/issues/?jql=fixVersion%
> > > >> 20%3D%203.6.0%20AND%20project%20%3D%20APEXMALHAR%20ORDER%
> > > >> 20BY%20status%20ASC
> > > >> >
> > > >> > Currently there is at least one blocker, the join operator is
> broken
> > > >> after
> > > >> > change in managed state. It also affects the SQL feature.
> > > >> >
> > > >> > Thanks,
> > > >> > Thomas
> > > >>
> > >
> >
>


Re: S3 Output Module

2016-10-27 Thread Chaitanya Chebolu
Hi All,

  I am planning to implement approach (2) of the S3 Output Module, which I
proposed in my previous email. Performance would be better compared to
approach (1) because the blocks are uploaded without first being saved to
HDFS.

  Please share your opinions.

Regards,
Chaitanya

On Thu, Oct 20, 2016 at 8:11 PM, Chaitanya Chebolu <
chaita...@datatorrent.com> wrote:

> Hi All,
>
> I am proposing the below new design for S3 Output Module using multi part
> upload feature:
>
> Input to this Module: FileMetadata, FileBlockMetadata, ReaderRecord
>
> Steps for uploading files using S3 multipart feature:
>
> =
>
> 1. Initiate the upload. S3 will return an upload id.
>
>    Mandatory: bucket name, file path
>
>    Note: the upload id is the unique identifier for the multipart upload
>    of a file.
>
> 2. Upload each block using the received upload id. S3 will return an ETag
>    in response to each upload.
>
>    Mandatory: block number, upload id
>
> 3. Send the merge request by providing the upload id and the list of ETags.
>
>    Mandatory: upload id, file path, block ETags
>
> Here
> <http://docs.aws.amazon.com/AmazonS3/latest/dev/llJavaUploadFile.html> is
> an example link for uploading a file using the multipart feature.
>
>
> I am proposing the below two approaches for S3 output module.
>
>
> (Solution 1)
>
> S3 Output Module consists of the below two operators:
>
> 1) BlockWriter: Writes the blocks into HDFS. Once a block is successfully
> written into HDFS, it emits the BlockMetadata.
>
> 2) S3MultiPartUpload: This consists of two parts:
>
>  a) If the number of blocks of a file is > 1, then upload the blocks
> using the multipart feature. Otherwise, upload the single block using
> putObject().
>
>  b) Once all the blocks are successfully uploaded, send the merge
> complete request.
>
>
> (Solution 2)
>
> DAG for this solution as follows:
>
> 1) InitateS3Upload:
>
> Input: FileMetadata
>
> Initiates the upload. This operator emits (filemetadata, uploadId) to
> S3FileMerger and (filePath, uploadId) to S3BlockUpload.
>
> 2) S3BlockUpload:
>
> Input: FileBlockMetadata, ReaderRecord
>
> Upload the blocks into S3. S3 will return ETag for each upload.
> S3BlockUpload emits (path, ETag) to S3FileMerger.
>
> 3) S3FileMerger: Sends the file merge request to S3.
>
> Pros:
>
> (1) Supports files of up to 5 TB.
>
> (2) Reduces the end-to-end latency, because we do not wait to upload until
> all the blocks of a file are written to HDFS.
>
> Please vote and share your thoughts on these approaches.
>
> Regards,
> Chaitanya
>
> On Tue, Mar 29, 2016 at 2:35 PM, Chaitanya Chebolu <
> chaita...@datatorrent.com> wrote:
>
>> @ Tushar
>>
>>   S3 Copy Output Module consists of following operators:
>> 1) BlockWriter : Writes the blocks into the HDFS.
>> 2) Synchronizer: Sends trigger to downstream operator, when all the
>> blocks for a file written to HDFS.
>> 3) FileMerger: Merges all the blocks into a file and will upload the
>> merged file into S3 bucket.
>>
>> @ Ashwin
>>
>> Good suggestion. In the first iteration, I will add the proposed
>> design.
>> Multipart support will be added in the next iteration.
>>
>> Regards,
>> Chaitanya
>>
>> On Thu, Mar 24, 2016 at 2:44 AM, Ashwin Chandra Putta <
>> ashwinchand...@gmail.com> wrote:
>>
>>> +1 regarding the s3 upload functionality.
>>>
>>> However, I think we should just focus on multipart upload directly as it
>>> comes with various advantages like higher throughput, faster recovery,
>>> not
>>> needing to wait for entire file being created before uploading each part.
>>> See: http://docs.aws.amazon.com/AmazonS3/latest/dev/uploadobjusin
>>> gmpu.html
>>>
>>> Also, seems like we can do multipart upload if the file size is more than
>>> 5MB. They do recommend using multipart if the file size is more than
>>> 100MB.
>>> I am not sure if there is a hard lower limit though. See:
>>> http://docs.aws.amazon.com/AmazonS3/latest/dev/UploadingObjects.html
>>>
>>> This way, it seems like we don't have to wait until a file is completely
>>> written to HDFS before performing the upload operation.
>>>
>>> Regards,
>>> Ashwin.
>>>
>>> On Wed, Mar 23, 2016 at 5:10 AM, Tushar Gosavi <tus...@datatorrent.com>
>>> wrote:
>>>
>>> > +1 , we need this functionality.
>

Re: [jira] [Assigned] (APEXMALHAR-2303) S3 Line By Line Module

2016-10-20 Thread Chaitanya Chebolu
+1 for the new approach, i.e., adding the file length to FileBlockMetadata.

On Thu, Oct 20, 2016 at 12:00 PM, Tushar Gosavi 
wrote:

> I think this approach is cleaner compared to the previous two approaches
> you have mentioned. Depending on an exception/non-standard error code to
> determine EOF is not a good approach, as we might treat another valid
> exception as EOF and not take corrective action. Also, this will avoid
> multiple requests to get the file length from each reader.
>
> - Tushar.
>
>
> On Thu, Oct 20, 2016 at 11:45 AM, AJAY GUPTA  wrote:
> > Hi
> >
> > Following is another approach for getting information regarding the file
> > length for S3.
> >
> > We have an existing class FileBlockMetadata which currently contains only
> > filePath. To this, we can add the fileLength field which will then get
> > passed to the module. This approach will be a lot cleaner and no
> additional
> > requests will be made to S3 in this case.
> >
> > Kindly provide your opinion on which approach would be best suited.
> >
> >
> > Regards,
> > Ajay
> >
> > On Wed, Oct 19, 2016 at 6:43 PM, AJAY GUPTA 
> wrote:
> >
> >> Hi
> >>
> >> I need suggestion of Apex dev community on the following.
> >>
> >> For the S3RecordReader approach mentioned in previous mail, I am facing
> an
> >> issue with determining the end of file.
> >> Note that the input to this operator will not contain the file size.
> >>
> >> Following approaches are possible
> >>
> >> 1) The S3 getObject() call which fetches file data within a range will
> >> throw an AmazonS3Exception if the range provided is out of bounds. Hence,
> >> if the file size is 10 bytes and I make a getObject request for bytes 11
> >> to 15, I will get this exception:
> >> Exception in thread "main" com.amazonaws.services.s3.model.AmazonS3Exception:
> >> The requested range is not satisfiable (Service: Amazon S3; Status Code:
> >> 416; Error Code: InvalidRange; Request ID:
> >> If this exception gets thrown, I can catch it in the code and conclude
> >> that the end of file has been reached.
> >>
> >> 2) For every container running this application, maintain a
> >> map<filename, filesize>. If the filesize already exists in this map, use
> >> it from there. If not, fetch the filesize information from S3 and add it
> >> to this map.
> >>
> >> My own opinion is to go with the first approach since the number of
> calls
> >> to S3 for getting file length will be less.
> >> Kindly provide with any other approaches you can think of.
> >>
> >>
> >> Thanks,
> >> Ajay
> >>
> >>
> >>
> >> On Wed, Oct 19, 2016 at 11:53 AM, AJAY GUPTA 
> wrote:
> >>
> >>> Hi Apex Dev community,
> >>>
> >>> Kindly provide with feedback if any for the following approach for
> >>> implementing S3RecordReader.
> >>>
> >>> *S3RecordReader(delimited records)*
> >>> *Input *: BlockMetaData containing offset and length
> >>> *Expected Output :* Records in the block
> >>> *Approach : *
> >>> Similar to the approach currently followed in FSRecordReader.
> >>> 1) Fetch the block from S3. The S3 block fetch size should ideally be
> >>> large enough, say 64 MB, to avoid unnecessary network delays.
> >>> 2) Search for the newline character in the block and emit each record.
> >>> 3) The last record in the current block might overflow into the
> >>> subsequent block. For this, we will get a small part of the subsequent
> >>> block, say 1 MB, search for the newline character, and emit the record
> >>> if a newline character is found. We will fetch additional 1 MB chunks
> >>> till a newline character is found.
> >>> 4) We will also avoid reading the first record from all blocks (except
> >>> the first block), as this set of bytes is part of the last record of the
> >>> previous block.
> >>>
> >>>
> >>> Regards,
> >>> Ajay
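[Editor's sketch] Steps 2-4 above can be illustrated with the following self-contained code. The class name is invented, a byte[] stands in for S3 object data, and the block and read-ahead sizes in the usage are tiny compared with the 64 MB / 1 MB figures in the mail; a real reader would issue ranged S3 reads instead of array indexing.

```java
import java.util.ArrayList;
import java.util.List;

public class BlockRecordReaderDemo
{
  static List<String> recordsInBlock(byte[] file, int blockStart, int blockLen, int readAheadChunk)
  {
    List<String> records = new ArrayList<>();
    int blockEnd = Math.min(blockStart + blockLen, file.length);

    // Step 4: skip the leading partial record -- it belongs to the previous
    // block (unless that block ended exactly on a newline).
    int recordStart = blockStart;
    while (recordStart > 0 && recordStart < blockEnd && file[recordStart - 1] != '\n') {
      recordStart++;
    }

    // Step 2: emit every record that completes inside the block.
    int start = recordStart;
    for (int pos = recordStart; pos < blockEnd; pos++) {
      if (file[pos] == '\n') {
        records.add(new String(file, start, pos - start));
        start = pos + 1;
      }
    }

    // Step 3: the last record may overflow into the next block; keep reading
    // small chunks until a newline (or the real end of the data) is found.
    if (start < blockEnd) {
      StringBuilder tail = new StringBuilder(new String(file, start, blockEnd - start));
      boolean done = false;
      int p = blockEnd;
      while (!done && p < file.length) {
        int chunkEnd = Math.min(p + readAheadChunk, file.length);
        for (; p < chunkEnd; p++) {
          if (file[p] == '\n') {
            done = true;
            break;
          }
          tail.append((char)file[p]);
        }
      }
      records.add(tail.toString());
    }
    return records;
  }
}
```

Running the three 5-byte blocks of "aaa\nbbbb\ncc\ndddd\n" through this yields ["aaa", "bbbb"], ["cc"], and ["dddd"], with no record emitted twice.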
> >>>
> >>>
> >>>
> >>> On Wed, Oct 19, 2016 at 7:31 AM, Ajay Gupta (JIRA) 
> >>> wrote:
> >>>
> 
>   [ https://issues.apache.org/jira/browse/APEXMALHAR-2303?page=c
>  om.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
> 
>  Ajay Gupta reassigned APEXMALHAR-2303:
>  --
> 
>  Assignee: Ajay Gupta
> 
>  > S3 Line By Line Module
>  > --
>  >
>  > Key: APEXMALHAR-2303
>  > URL: https://issues.apache.org/jira
>  /browse/APEXMALHAR-2303
>  > Project: Apache Apex Malhar
>  >  Issue Type: Bug
>  >Reporter: Ajay Gupta
>  >Assignee: Ajay Gupta
>  >   Original Estimate: 336h
>  >  Remaining Estimate: 336h
>  >
>  > This is a new module which will consist of 2 operators
>  > 1) File Splitter -- Already existing in Malhar library
>  > 2) S3RecordReader -- Read a file from S3 and output the records
>  (delimited or fixed width)
> 
> 
> 
>  --
>  This message was sent by Atlassian JIRA
>  (v6.3.4#6332)
> 
> 

Re: S3 Output Module

2016-10-20 Thread Chaitanya Chebolu
Hi All,

I am proposing the below new design for the S3 Output Module, using the
multipart upload feature:

Input to this Module: FileMetadata, FileBlockMetadata, ReaderRecord

Steps for uploading files using S3 multipart feature:

=========

1. Initiate the upload. S3 returns an upload id.

   Mandatory: bucket name, file path

   Note: the upload id is the unique identifier for the multipart upload of
   a file.

2. Upload each block using the received upload id. S3 returns an ETag in
   response to each upload.

   Mandatory: block number, upload id

3. Send the merge request by providing the upload id and the list of ETags.

   Mandatory: upload id, file path, block ETags

Here is an example of uploading a file using the multipart feature:
http://docs.aws.amazon.com/AmazonS3/latest/dev/llJavaUploadFile.html
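[Editor's sketch] The three steps above can be illustrated with an in-memory stand-in for the S3 multipart API. FakeS3, uploadBlocks, and the ETag format here are invented for illustration; a real implementation would use the AWS SDK's initiateMultipartUpload, uploadPart, and completeMultipartUpload calls.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class MultipartDemo
{
  static class FakeS3
  {
    private final Map<String, TreeMap<Integer, byte[]>> uploads = new HashMap<>();
    final Map<String, byte[]> objects = new HashMap<>();
    private int nextId = 0;

    // Step 1: initiate -- returns the upload id that ties the parts together.
    String initiate(String key)
    {
      String uploadId = key + "#" + (nextId++);
      uploads.put(uploadId, new TreeMap<>());
      return uploadId;
    }

    // Step 2: upload one part -- returns an ETag for that part.
    String uploadPart(String uploadId, int partNumber, byte[] data)
    {
      uploads.get(uploadId).put(partNumber, data);
      return "etag-" + partNumber; // the real ETag is an MD5 of the part
    }

    // Step 3: complete -- merges the parts, in part-number order, into one
    // object. A real implementation validates the supplied ETags.
    void complete(String uploadId, String key, List<String> etags)
    {
      int total = 0;
      for (byte[] part : uploads.get(uploadId).values()) {
        total += part.length;
      }
      byte[] merged = new byte[total];
      int off = 0;
      for (byte[] part : uploads.get(uploadId).values()) {
        System.arraycopy(part, 0, merged, off, part.length);
        off += part.length;
      }
      objects.put(key, merged);
      uploads.remove(uploadId);
    }
  }

  static byte[] uploadBlocks(FakeS3 s3, String key, List<byte[]> blocks)
  {
    String uploadId = s3.initiate(key);
    List<String> etags = new ArrayList<>();
    int partNumber = 1; // S3 part numbers start at 1
    for (byte[] block : blocks) {
      etags.add(s3.uploadPart(uploadId, partNumber++, block));
    }
    s3.complete(uploadId, key, etags);
    return s3.objects.get(key);
  }
}
```

Because each part upload is independent once the upload id exists, the S3BlockUpload step in Solution 2 can run partitioned, with only the complete request serialized in S3FileMerger.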


I am proposing the below two approaches for S3 output module.


(Solution 1)

S3 Output Module consists of the below two operators:

1) BlockWriter: writes the blocks into HDFS. Once a block is successfully
written into HDFS, it emits the BlockMetadata.

2) S3MultiPartUpload: this consists of two parts:

 a) If the number of blocks of a file is > 1, upload the blocks
using the multipart feature. Otherwise, upload the single block using
putObject().

 b) Once all the blocks are successfully uploaded, send the
merge (complete) request.


(Solution 2)

DAG for this solution as follows:

1) InitiateS3Upload:

Input: FileMetadata

Initiates the upload. This operator emits (fileMetadata, uploadId) to
S3FileMerger and (filePath, uploadId) to S3BlockUpload.

2) S3BlockUpload:

Input: FileBlockMetadata, ReaderRecord

Uploads the blocks into S3. S3 returns an ETag for each upload.
S3BlockUpload emits (path, ETag) to S3FileMerger.

3) S3FileMerger: sends the file merge request to S3.

Pros:

(1) Supports file sizes of up to 5 TB.

(2) Reduces end-to-end latency, because we do not wait to upload until all
the blocks of a file are written to HDFS.

Please vote and share your thoughts on these approaches.

Regards,
Chaitanya

On Tue, Mar 29, 2016 at 2:35 PM, Chaitanya Chebolu <
chaita...@datatorrent.com> wrote:

> @ Tushar
>
>   S3 Copy Output Module consists of following operators:
> 1) BlockWriter : Writes the blocks into the HDFS.
> 2) Synchronizer: Sends trigger to downstream operator, when all the blocks
> for a file written to HDFS.
> 3) FileMerger: Merges all the blocks into a file and will upload the
> merged file into S3 bucket.
>
> @ Ashwin
>
> Good suggestion. In the first iteration, I will add the proposed
> design.
> Multipart support will add it in the next iteration.
>
> Regards,
> Chaitanya
>
> On Thu, Mar 24, 2016 at 2:44 AM, Ashwin Chandra Putta <
> ashwinchand...@gmail.com> wrote:
>
>> +1 regarding the s3 upload functionality.
>>
>> However, I think we should just focus on multipart upload directly as it
>> comes with various advantages like higher throughput, faster recovery, not
>> needing to wait for entire file being created before uploading each part.
>> See: http://docs.aws.amazon.com/AmazonS3/latest/dev/uploadobjusin
>> gmpu.html
>>
>> Also, seems like we can do multipart upload if the file size is more than
>> 5MB. They do recommend using multipart if the file size is more than
>> 100MB.
>> I am not sure if there is a hard lower limit though. See:
>> http://docs.aws.amazon.com/AmazonS3/latest/dev/UploadingObjects.html
>>
>> This way, it seems like we don't to have to wait until a file is
>> completely
>> written to hdfs before performing the upload operation.
>>
>> Regards,
>> Ashwin.
>>
>> On Wed, Mar 23, 2016 at 5:10 AM, Tushar Gosavi <tus...@datatorrent.com>
>> wrote:
>>
>> > +1 , we need this functionality.
>> >
>> > Is it going to be a single operator or multiple operators? If multiple
>> > operators, then can you explain what functionality each operator will
>> > provide?
>> >
>> >
>> > Regards,
>> > -Tushar.
>> >
>> >
>> > On Wed, Mar 23, 2016 at 5:01 PM, Yogi Devendra <yogideven...@apache.org
>> >
>> > wrote:
>> >
>> > > Writing to S3 is a common use-case for applications.
>> > > This module will be definitely helpful.
>> > >
>> > > +1 for adding this module.
>> > >
>> > >
>> > > ~ Yogi
>> > >
>> > > On 22 March 2016 at 13:52, Chaitanya Chebolu <
>> chaita...@datatorrent.com>
>> > > wrote:
>> > >
>> > > > Hi All,
>> > > >
>> > > >   I am proposing S3 output copy 

Outer Join

2016-10-05 Thread Chaitanya Chebolu
Hi David,

  I am working on Outer Join using Managed State. This will support event
time as well as tuple processing time, and covers Left Outer, Right Outer,
and Full Outer joins. I am planning to start a design discussion on dev@apex.

  However, I have seen a PR (#414) for a windowed Join
operator, and I would like to know whether the outer join is also included
in it.

  I am not very familiar with that PR, so I would like to know whether my
task overlaps with the Windowed Join operator.

Regards,
Chaitanya


Re: Kudu store operators

2016-10-03 Thread Chaitanya Chebolu
+1

Regards,
Chaitanya

On Mon, Oct 3, 2016 at 6:01 PM, Sanjay Pujare 
wrote:

> +1
>
> On Oct 3, 2016 5:33 PM, "Sandeep Deshmukh" 
> wrote:
>
> > +1
> >
> > Regards,
> > Sandeep
> >
> > On Mon, Oct 3, 2016 at 10:16 AM, Tushar Gosavi 
> > wrote:
> >
> > > +1, It will be great to have this operator.
> > >
> > > - Tushar.
> > >
> > > On Mon, Oct 3, 2016 at 8:15 AM, Chinmay Kolhatkar
> > >  wrote:
> > > > +1.
> > > >
> > > > - Chinmay.
> > > >
> > > > On 3 Oct 2016 7:25 a.m., "Amol Kekre"  wrote:
> > > >
> > > >> Ananth,
> > > >> This would be great to have. +1
> > > >>
> > > >> Thks
> > > >> Amol
> > > >>
> > > >> On Sun, Oct 2, 2016 at 8:38 AM, Munagala Ramanath <
> > r...@datatorrent.com>
> > > >> wrote:
> > > >>
> > > >> > +1
> > > >> >
> > > >> > Kudu looks impressive from the overview, though it seems to still
> be
> > > >> > maturing.
> > > >> >
> > > >> > Ram
> > > >> >
> > > >> >
> > > >> > On Sat, Oct 1, 2016 at 11:42 PM, ananth 
> > > wrote:
> > > >> >
> > > >> > > Hello All,
> > > >> > >
> > > >> > > I was wondering if it would be worthwhile for the community to
> > > consider
> > > >> > > support for Apache Kudu as a store ( as a contrib operator
> inside
> > > >> Apache
> > > >> > > Malhar ) .
> > > >> > >
> > > >> > > Here are some benefits I see:
> > > >> > >
> > > >> > > 1. Kudu 1.0 has just been released and declared production
> > > >> > >    ready.
> > > >> > > 2. Kudu as a store might be a good fit for many architectures
> > > >> > >    in the years to come because of its capabilities to provide
> > > >> > >    mutability of data (unlike HDFS) and optimized storage
> > > >> > >    formats for scans.
> > > >> > > 3. It also seems to withstand high-throughput write patterns,
> > > >> > >    which makes it a stable sink for Apex workflows that operate
> > > >> > >    at very high volumes.
> > > >> > >
> > > >> > >
> > > >> > > Here are some links
> > > >> > >
> > > >> > >  *  From the recent Strata conference
> > > >> > >https://kudu.apache.org/2016/09/26/strata-nyc-kudu-talks.
> html
> > > >> > >  * https://kudu.apache.org/overview.html
> > > >> > >
> > > >> > > I can implement this operator if the community feels it is worth
> > > adding
> > > >> > it
> > > >> > > to our code base. If so, could someone please assign the JIRA to
> > > me. I
> > > >> > have
> > > >> > > created this JIRA to track this :
> https://issues.apache.org/jira
> > > >> > > /browse/APEXMALHAR-2278
> > > >> > >
> > > >> > >
> > > >> > > Regards,
> > > >> > >
> > > >> > > Ananth
> > > >> > >
> > > >> > >
> > > >> >
> > > >>
> > >
> >
>


Re: [ANNOUNCE] New Apache Apex PMC Member: Chandni Singh

2016-09-12 Thread Chaitanya Chebolu
Congrats Chandni!

Regards,
Chaitanya

On Tue, Sep 13, 2016 at 7:08 AM, Dongming Liang 
wrote:

> Congrats, Chandni!
>
> Thanks,
> - Dongming
>
> Dongming LIANG
> 
> dongming.li...@gmail.com
>
> On Mon, Sep 12, 2016 at 11:28 AM, Pradeep A. Dalvi 
> wrote:
>
> > Congratulations, Chandni!
> >
> > Thanks,
> > Pradeep A. Dalvi
> >
> > On Mon, Sep 12, 2016 at 9:33 AM, Thomas Weise  wrote:
> >
> > > The Apache Apex PMC is pleased to announce that Chandni Singh is now a
> > PMC
> > > member. We appreciate all her contributions to the project so far, and
> > are
> > > looking forward to more.
> > >
> > > Congrats Chandni!
> > > Thomas, for the Apache Apex PMC.
> > >
> >
>


Re: GenericFileOutputOperator doesn't work for all Hadoop file systems

2016-09-02 Thread Chaitanya Chebolu
Hi All,

  Thanks Priyanka and Yogi for your suggestions.

  @Yogi: The 1st option you suggested is not feasible, because later
versions of the Hadoop library may support the append operation. I feel the
2nd is the best option.

  If there are no comments/suggestions from the community, I will go with
the 2nd option that Yogi suggested.

Regards,
Chaitanya

On Fri, Aug 26, 2016 at 12:21 PM, Yogi Devendra <yogideven...@apache.org>
wrote:

> I propose alternate approach to than the 3 options mentioned above:
>
> In AbstractFileOutputOperator we can introduce one flag saying
> isFileSystemAppendSupported.
> This flag should be set based on the filePath in setup or activate method.
>
> It can be done in 2 ways:
> 1. Adding if else rules based on filesystem (e.g. true for HDFS, false for
> S3 etc.)
> 2. Attempt for append to temp file and catch the exception.
>
> This flag will decide openStream behavior. Advantage here is that the flow
> is predetermined rather than based on the exception handling.
>
>
> ~ Yogi
>
> On 25 August 2016 at 11:17, Priyanka Gugale <priya...@datatorrent.com>
> wrote:
>
> > I would suggest, we override "openStream" in GenericFileOutputOpeator, as
> > suggested in option 2 and then handle "append" in different way for FS
> > which doesn't support append. Or else create concrete classes for all
> file
> > systems which don't support append and override the required functions.
> >
> > -1 for modifying Abstract class to take care of unsupported operations.
> >
> > -Priyanka
> >
> > On Wed, Aug 24, 2016 at 6:21 PM, Chaitanya Chebolu <
> > chaita...@datatorrent.com> wrote:
> >
> > > Hi All,
> > >
> > > GenericFileOutputOpeator which is in Malhar repository works only
> for
> > > few file systems. GenericFileOutputOpeator is extended from
> > > AbstractFileOutputOperator.
> > >
> > > Reason: openStream() method which is in AbstractFileOutputOperator
> calls
> > > append operation. But, all the file systems doesn't support append
> > > operation. Some of the file systems which are not supported append()
> > > operation are FTP, S3.
> > >
> > >   If the GenericFileOutputOpeator used for file systems which are not
> > > supported append() operation and operator goes down & comes back then
> > file
> > > system throws exception "Not Supported".
> > >
> > > Solution: Following method needs to be called instead of fs.append():
> > >
> > >
> > > protected FSDataOutputStream openStreamForNonAppendFS(Path filepath)
> > throws
> > > IOException{
> > >
> > > Path appendTmpFile = new Path(filepath + “_APPENDING”);
> > >
> > > rename(filepath, appendTmpFile);
> > >
> > > FSDataInputStream fsIn = fs.open(appendTmpFile);
> > >
> > > FSDataOutputStream fsOut = fs.create(filepath);
> > >
> > > IOUtils.copy(fsIn, fsOut);
> > >
> > > flush(fsOut);
> > >
> > > fs.delete(appendTmpFile);
> > >
> > > return fsOut;
> > >
> > > }
> > >
> > >
> > > Below are the options to fix this issue.
> > >
> > > (1) Fix it in AbstractFileOutputOperator - Catch the "Not Supported"
> > > exception and then call the openStreamForNonAppendFS() method.
> > >
> > > (2) Fix it in GenericFileOutputOpeator (Same as approach (1))
> > >
> > > (3) Create a new operator which extends from AbstractFileOutputOperator
> > and
> > > override the openStream() method. This new operator could be used only
> > for
> > > file systems which are not supported append operation.
> > >
> > > Please share your thoughts and vote on above approaches.
> > >
> > > Regards,
> > > Chaitanya
> > >
> >
>


Re: Malhar 3.5.0 release

2016-08-25 Thread Chaitanya Chebolu
Hi Thomas,

   I will fix APEXMALHAR-2134.

Regards,
Chaitanya

On Thu, Aug 25, 2016 at 9:13 PM, Thomas Weise <tho...@datatorrent.com>
wrote:

> We are almost ready to get the RC out.
>
> Can someone fix APEXMALHAR-2134 for the release?
>
>
> On Wed, Aug 17, 2016 at 12:56 PM, Thomas Weise <tho...@datatorrent.com>
> wrote:
>
> > Friendly reminder.. There are still a few unresolved tickets for 3.5:
> >
> > https://issues.apache.org/jira/issues/?jql=fixVersion%
> > 20%3D%203.5.0%20AND%20project%20%3D%20APEXMALHAR%20AND%
> 20resolution%20%3D%
> > 20Unresolved%20ORDER%20BY%20priority%20DESC
> >
> > Please remove the fix version unless you expect to complete the work in
> > the next few days.
> >
> > And we need to complete the PR for high level API:
> >
> > https://issues.apache.org/jira/browse/APEXMALHAR-2142
> >
> > Thanks,
> > Thomas
> >
> >
> > On Fri, Aug 5, 2016 at 2:12 AM, Chaitanya Chebolu <
> > chaita...@datatorrent.com> wrote:
> >
> >> APEXMALHAR-2100. PR is open: https://github.com/apache/apex
> >> -malhar/pull/330
> >>
> >> Regards,
> >> Chaitanya
> >>
> >> On Fri, Aug 5, 2016 at 1:10 PM, Priyanka Gugale <
> priya...@datatorrent.com
> >> >
> >> wrote:
> >>
> >> > APEXMALHAR-2171
> >> > PR is open: https://github.com/apache/apex-malhar/pull/358
> >> >
> >> > -Priyanka
> >> >
> >> > On Fri, Aug 5, 2016 at 12:32 PM, Yogi Devendra <
> >> > devendra.vyavah...@gmail.com
> >> > > wrote:
> >> >
> >> > > Fix for APEXMALHAR-2176
> >> > > PR open: https://github.com/apache/apex-malhar/pull/361
> >> > >
> >> > > ~ Yogi
> >> > >
> >> > > On 1 August 2016 at 11:59, Chinmay Kolhatkar <
> chin...@datatorrent.com
> >> >
> >> > > wrote:
> >> > >
> >> > > > APEXMALHAR-2128 is already done. Somehow the status did not go to
> >> > > resolved.
> >> > > > Changed it now.
> >> > > >
> >> > > > On Fri, Jul 29, 2016 at 1:53 PM, Yogi Devendra <
> >> > > > devendra.vyavah...@gmail.com
> >> > > > > wrote:
> >> > > >
> >> > > > > APEXMALHAR-2116
> >> > > > > https://github.com/apache/apex-malhar/pull/326
> >> > > > >
> >> > > > > @Ram
> >> > > > > Your review comments are incorporated. Could you please have a
> >> look
> >> > at
> >> > > > > this?
> >> > > > >
> >> > > > > ~ Yogi
> >> > > > >
> >> > > > > On 29 July 2016 at 13:49, Thomas Weise <tho...@datatorrent.com>
> >> > wrote:
> >> > > > >
> >> > > > > > Hi,
> >> > > > > >
> >> > > > > > I would suggest to target the 3.5.0 release with an RC out
> >> perhaps
> >> > 2
> >> > > > > weeks
> >> > > > > > from now, as there is a good amount of work that we should
> make
> >> > > > available
> >> > > > > > to the users:
> >> > > > > >
> >> > > > > >
> >> > > > > >
> >> > > > > https://issues.apache.org/jira/browse/APEXMALHAR/
> >> > > > fixforversion/12335815/?selectedTab=com.atlassian.
> >> > > > jira.jira-projects-plugin:version-issues-panel
> >> > > > > >
> >> > > > > > I also think that the following should go in:
> >> > > > > >
> >> > > > > > https://issues.apache.org/jira/browse/APEXMALHAR-2128
> >> > > > > > https://issues.apache.org/jira/browse/APEXMALHAR-2142
> >> > > > > > https://issues.apache.org/jira/browse/APEXMALHAR-2063
> >> > > > > >
> >> > > > > > Any other candidates?
> >> > > > > >
> >> > > > > > There are quite a few other PRs open:
> >> > > > > >
> >> > > > > > https://github.com/apache/apex-malhar/pulls
> >> > > > > >
> >> > > > > > Would be good to complete some more reviews for this release.
> >> > > > > >
> >> > > > > > Thanks,
> >> > > > > > Thomas
> >> > > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> >
> >
>


GenericFileOutputOperator doesn't work for all Hadoop file systems

2016-08-24 Thread Chaitanya Chebolu
Hi All,

GenericFileOutputOperator, which is in the Malhar repository, works for
only a few file systems. GenericFileOutputOperator extends
AbstractFileOutputOperator.

Reason: the openStream() method in AbstractFileOutputOperator calls the
append operation, but not all file systems support append. Some of the file
systems that do not support the append() operation are FTP and S3.

  If GenericFileOutputOperator is used with a file system that does not
support the append() operation, and the operator goes down and comes back,
the file system throws a "Not Supported" exception.

Solution: the following method needs to be called instead of fs.append():


protected FSDataOutputStream openStreamForNonAppendFS(Path filepath) throws IOException
{
  // Emulate append: rename the existing file aside, create a new file at
  // the original path, copy the old contents into it, and return the still
  // open stream positioned after the copied data.
  Path appendTmpFile = new Path(filepath + "_APPENDING");
  fs.rename(filepath, appendTmpFile);
  FSDataInputStream fsIn = fs.open(appendTmpFile);
  FSDataOutputStream fsOut = fs.create(filepath);
  IOUtils.copy(fsIn, fsOut);
  flush(fsOut);
  fsIn.close();
  fs.delete(appendTmpFile, false);
  return fsOut;
}


Below are the options to fix this issue.

(1) Fix it in AbstractFileOutputOperator - catch the "Not Supported"
exception and then call the openStreamForNonAppendFS() method.

(2) Fix it in GenericFileOutputOperator (same approach as (1)).

(3) Create a new operator that extends AbstractFileOutputOperator and
overrides the openStream() method. This new operator could be used only for
file systems that do not support the append operation.

Please share your thoughts and vote on above approaches.

Regards,
Chaitanya


Re: Malhar 3.5.0 release

2016-08-05 Thread Chaitanya Chebolu
APEXMALHAR-2100. PR is open: https://github.com/apache/apex-malhar/pull/330

Regards,
Chaitanya

On Fri, Aug 5, 2016 at 1:10 PM, Priyanka Gugale 
wrote:

> APEXMALHAR-2171
> PR is open: https://github.com/apache/apex-malhar/pull/358
>
> -Priyanka
>
> On Fri, Aug 5, 2016 at 12:32 PM, Yogi Devendra <
> devendra.vyavah...@gmail.com
> > wrote:
>
> > Fix for APEXMALHAR-2176
> > PR open: https://github.com/apache/apex-malhar/pull/361
> >
> > ~ Yogi
> >
> > On 1 August 2016 at 11:59, Chinmay Kolhatkar 
> > wrote:
> >
> > > APEXMALHAR-2128 is already done. Somehow the status did not go to
> > resolved.
> > > Changed it now.
> > >
> > > On Fri, Jul 29, 2016 at 1:53 PM, Yogi Devendra <
> > > devendra.vyavah...@gmail.com
> > > > wrote:
> > >
> > > > APEXMALHAR-2116
> > > > https://github.com/apache/apex-malhar/pull/326
> > > >
> > > > @Ram
> > > > Your review comments are incorporated. Could you please have a look
> at
> > > > this?
> > > >
> > > > ~ Yogi
> > > >
> > > > On 29 July 2016 at 13:49, Thomas Weise 
> wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > I would suggest to target the 3.5.0 release with an RC out perhaps
> 2
> > > > weeks
> > > > > from now, as there is a good amount of work that we should make
> > > available
> > > > > to the users:
> > > > >
> > > > >
> > > > >
> > > > https://issues.apache.org/jira/browse/APEXMALHAR/
> > > fixforversion/12335815/?selectedTab=com.atlassian.
> > > jira.jira-projects-plugin:version-issues-panel
> > > > >
> > > > > I also think that the following should go in:
> > > > >
> > > > > https://issues.apache.org/jira/browse/APEXMALHAR-2128
> > > > > https://issues.apache.org/jira/browse/APEXMALHAR-2142
> > > > > https://issues.apache.org/jira/browse/APEXMALHAR-2063
> > > > >
> > > > > Any other candidates?
> > > > >
> > > > > There are quite a few other PRs open:
> > > > >
> > > > > https://github.com/apache/apex-malhar/pulls
> > > > >
> > > > > Would be good to complete some more reviews for this release.
> > > > >
> > > > > Thanks,
> > > > > Thomas
> > > > >
> > > >
> > >
> >
>


Re: Dynamic partition is not working in Kafka Input Operator

2016-07-19 Thread Chaitanya Chebolu
Hi Sandesh,

No, it is not resolved yet.
The Kafka Input Operator (0.8 version) supports dynamic partitioning based
on Kafka partitions.
I have created a JIRA (APEXCORE-494
<https://issues.apache.org/jira/browse/APEXCORE-494>) for tracking this
issue.

Regards,
Chaitanya

On Tue, Jul 19, 2016 at 9:47 AM, Sandesh Hegde <sand...@datatorrent.com>
wrote:

> Was this resolved?
>
> My understanding is that, Kafka Input operator doesn't support the changes
> in Kafka partitions after the initial launch.
>
> On Mon, Jul 18, 2016 at 1:54 AM Chaitanya Chebolu <
> chaita...@datatorrent.com>
> wrote:
>
> > Hi All,
> >
> >I am facing dynamic partition issues in 0.8 version of Kafka Input
> > Operator. My application has the following DAG:
> >
> >KafkaSinglePortStringInputOperator(Input) ->
> > ConsoleOutputOperator(Output)
> >
> >I launched the application with below configuration:
> > Kafka topic created with single partition and replication factor as 1.
> > Partition Strategy: ONE_TO_ONE
> >
> >Launched the application successfully. After some time, I increased
> the
> > topic partitions to 2. After re-partition, the window of down stream
> > operator is not moving. By looking into the app Physical DAG, it looks
> like
> > there is an issue in construction of Physical DAG after re-partition.
> >
> > Please let me know if any one observed the same behavior. Do we have JIRA
> > for tracking this issue.
> > I am attaching some of the screenshots of this application.
> >
> > Regards,
> > Chaitanya
> >
> >
>


new method emitMessage() in Kafka Operator

2016-06-13 Thread Chaitanya Chebolu
Hi All,

  I am proposing to add a new API to the 0.8 version of
AbstractKafkaInputOperator:
void emitMessage(KafkaMessage message)

  "message" carries details such as the offset, the Kafka partition, and
the value of the message.

By adding this, users have more control over each message. This method
would be called back from the emitTuples() API.

   To maintain backward compatibility, the default definition of this new
method is as below:

void emitMessage(KafkaMessage message)
{
  emitTuple(message.msg);
}

   Please share your thoughts on this approach.

Regards,
Chaitanya
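[Editor's sketch] One way the proposed hook could be used, with simplified stand-in classes (KafkaMessage and the operator skeleton below are invented for illustration, not the actual Malhar classes): the default emitMessage preserves the old emitTuple behavior, while a subclass overrides it to access the offset and partition metadata.

```java
import java.util.ArrayList;
import java.util.List;

public class EmitMessageDemo
{
  static class KafkaMessage
  {
    final long offset;
    final int kafkaPartition;
    final String msg;

    KafkaMessage(long offset, int kafkaPartition, String msg)
    {
      this.offset = offset;
      this.kafkaPartition = kafkaPartition;
      this.msg = msg;
    }
  }

  static class BaseOperator
  {
    final List<String> emitted = new ArrayList<>();

    void emitTuple(String value) { emitted.add(value); }

    // Proposed hook; the default keeps the old behavior, so existing
    // subclasses are unaffected.
    void emitMessage(KafkaMessage message) { emitTuple(message.msg); }

    // emitTuples() calls back into emitMessage() for every fetched message.
    void emitTuples(List<KafkaMessage> batch)
    {
      for (KafkaMessage m : batch) {
        emitMessage(m);
      }
    }
  }

  // A user subclass gains access to the offset and partition metadata.
  static class OffsetTaggingOperator extends BaseOperator
  {
    @Override
    void emitMessage(KafkaMessage message)
    {
      emitTuple(message.kafkaPartition + "@" + message.offset + ":" + message.msg);
    }
  }
}
```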


Re: Inner Join Operator Using Managed State

2016-05-24 Thread Chaitanya Chebolu
Thanks Chandni and Tim for the valuable inputs.
I will use SpillableComplexComponent as common abstraction.

Created a JIRA for this task:
https://issues.apache.org/jira/browse/APEXMALHAR-2100.

Regards,
Chaitanya

On Fri, May 20, 2016 at 11:39 AM, Timothy Farkas <
timothytiborfar...@gmail.com> wrote:

> Hi Chandni,
>
> That is correct. I will provide some explanation about the motivation and
> usage of that interface:
>
> Goals:
>
> The goal of SpillableComplexComponent is to provide an interface for
> creating Spillable datastructures. It is essentially the interface for a
> factory class which produces Spillable datastructures. By having a factory
> interface it allows different backends to be plugged into operators very
> easily.
>
> Going into more detail for your use case SpillableComplexComponent has two
> factory methods newSpillableByteMap and newSpillableByteArrayListMultimap.
> These two methods are factory methods which return an implementation of the
> SpillabeByteMap and SpillableArrayListMultimap interfaces.
>
> Usage:
>
> Setting backend on an operator:
>
> myOperator.setStore(new InMemorySpillableComplexComponent())
> //myOperator.setStore(new ManagedStateSpillableComplexComponent())
> //myOperator.setStore(new HbaseSpillableComplexComponent())
>
> Using the factory in an operator
>
> setup() {
>map = store.newSpillableByteMap()
> }
>
> As you can see you can set the factory on an operator, then the operator
> can use the factory to create a Spillable datastructure. The operator is
> agnostic to the store which manages the data for Spillable datastructures.
> If you want the data to be stored in managed state simply set a
> ManagedState implementation of SpillableComplexComponent, if you want the
> data to be stored in Cassandra simply set a Cassandra implementation of
> SpillableComplexComponent on the operator. The code using the spillable
> datastructures is independent of the backend used to store the data with
> this design.
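[Editor's sketch] The factory pattern described above can be illustrated with a minimal stand-in (the interface and class names below are simplified, not the actual Malhar APIs): the operator depends only on the factory interface, and the backend is chosen by whoever wires the operator.

```java
import java.util.HashMap;
import java.util.Map;

public class SpillableFactoryDemo
{
  // Factory interface for spillable data structures.
  interface SpillableComplexComponent
  {
    Map<String, String> newSpillableByteMap();
  }

  // One pluggable backend; a ManagedState- or Cassandra-backed component
  // would implement the same interface and spill entries to its store.
  static class InMemorySpillableComplexComponent implements SpillableComplexComponent
  {
    public Map<String, String> newSpillableByteMap()
    {
      return new HashMap<>();
    }
  }

  // The operator never names a concrete backend.
  static class MyOperator
  {
    private SpillableComplexComponent store;
    Map<String, String> map;

    void setStore(SpillableComplexComponent store) { this.store = store; }

    void setup() { map = store.newSpillableByteMap(); }

    void process(String key, String value) { map.put(key, value); }
  }
}
```

Usage mirrors the thread: call setStore(new InMemorySpillableComplexComponent()) when wiring, and the operator's setup() and process() code stays identical when the backend is swapped.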
>
> Thanks,
> Tim
>
> On Thu, May 19, 2016 at 9:29 PM, Chandni Singh <singh.chan...@gmail.com>
> wrote:
>
> > Hey Tim,
> >
> > As I understand, for the Join operator there needs to be a common
> > abstraction for a SpillableMap and a SpillableArrayListMultimap.
> >
> > I suggested using SpillableComplexComponent. Is this correct?
> >
> > Thanks,
> > Chandni
> >
> > On Wed, May 18, 2016 at 1:34 AM, Chandni Singh <singh.chan...@gmail.com>
> > wrote:
> >
> > > Hi Chaitanya,
> > >
> > > I am NOT suggesting that you use the interface TimeSlicedBucketedState.
> > >
> > > I don't see the need of having a JoinStore abstraction.
> > >
> > > There will be SpillableArrayListMultimap implementation on which you
> can
> > > set "ManagedTimeUnifiedStateImpl" as the persistent store.
> > > This API of SpillableArrayListMultimap is sufficient for the use case.
> > >
> > > You can directly use this implementation of SpillableArrayListMultimap
> in
> > > the Join operator.  Here is a simple example:
> > >
> > > class InnerJoinOperator
> > > {
> > >SpillableArrayListMultiMap stream1Data = new
> > > SpillableArrayListMultiMap(ManagedTimeUnifiedStateImp);
> > >
> > >port1.process (tuple) {
> > >stream1Data.put(tuple.getKey(), tuple.getVal());
> > >}
> > > }
> > >
> > >
> > > Chandni
> > >
> > >
> > >
> > >
> > > On Wed, May 18, 2016 at 1:00 AM, Chaitanya Chebolu <
> > > chaita...@datatorrent.com> wrote:
> > >
> > >> Hi Chandni,
> > >>
> > >>I think you are suggesting the interface "TimeSlicedBucketedState".
> > >> I feel this is tightly coupled with Managed State.
> > >>In the "TimeSlicedBucketedState" abstraction, the bucketId parameter
> > >> relates to Managed State and is not needed for the join operator.
> > >>
> > >> Regards,
> > >> Chaitanya
> > >>
> > >> On Wed, May 18, 2016 at 12:51 PM, Chandni Singh <
> > singh.chan...@gmail.com>
> > >> wrote:
> > >>
> > >> > Chaitanya,
> > >> >
> > >> > SpillableArrayListMultimap will give you a similar
> > >> > abstraction.
> > >> >
> > >> > Why do we need  another abstraction "Join Store" ?
> > >> >
> > >> > Chandni
> > >&