[jira] [Resolved] (APEXMALHAR-2369) S3 output module for tuple based output

2017-02-25 Thread Chaitanya (JIRA)

 [ 
https://issues.apache.org/jira/browse/APEXMALHAR-2369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chaitanya resolved APEXMALHAR-2369.
---
   Resolution: Fixed
Fix Version/s: 3.7.0

> S3 output module for tuple based output
> ---
>
> Key: APEXMALHAR-2369
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2369
> Project: Apache Apex Malhar
>  Issue Type: Task
>Reporter: Yogi Devendra
>Assignee: Yogi Devendra
> Fix For: 3.7.0
>
>
> Currently, S3 output is available through S3OutputModule, which is restricted
> to copying files from a FileSystem to S3. Use cases where all the
> tuples/records are to be written directly to S3 cannot use this approach, so
> we need an alternative module that takes care of writing tuples to S3.
> Design:
> Sending a separate request to S3 for each tuple would be too expensive.
> Instead, this module can write tuples to HDFS and then upload the HDFS files
> to S3. This adds some end-to-end latency, but that should be acceptable for
> the S3 output case.
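A minimal sketch of the design described above: buffer incoming tuples into files and upload each completed file in one request, instead of issuing one S3 request per tuple. Everything here is an assumption for illustration, not the Malhar operator API — the `TupleFileWriter` name, the local temp directory standing in for HDFS, and the injected `upload_fn`.

```python
import os
import tempfile

class TupleFileWriter:
    """Buffer incoming tuples into a local file and hand each completed
    file to an uploader once it reaches a rollover size.

    Hypothetical sketch: a real operator would write to HDFS and roll
    files on operator windows.
    """

    def __init__(self, upload_fn, rollover_bytes=64 * 1024 * 1024):
        self.upload_fn = upload_fn          # e.g. wraps an S3 putObject call
        self.rollover_bytes = rollover_bytes
        self.dir = tempfile.mkdtemp()
        self.part_no = 0
        self._open_new()

    def _open_new(self):
        self.path = os.path.join(self.dir, "part-%05d" % self.part_no)
        self.fh = open(self.path, "wb")

    def write(self, tup):
        """Append one tuple; roll and upload when the file is big enough."""
        self.fh.write(tup + b"\n")
        if self.fh.tell() >= self.rollover_bytes:
            self.flush()

    def flush(self):
        """Close the current file, upload it if non-empty, start a new one."""
        self.fh.close()
        if os.path.getsize(self.path) > 0:
            self.upload_fn(self.path)
        self.part_no += 1
        self._open_new()
```

Batching this way trades a little end-to-end latency (a tuple is not visible on S3 until its file rolls) for far fewer S3 requests, which is exactly the trade-off the description accepts.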



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (APEXMALHAR-2369) S3 output module for tuple based output

2017-02-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXMALHAR-2369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15884302#comment-15884302
 ] 

ASF GitHub Bot commented on APEXMALHAR-2369:


Github user asfgit closed the pull request at:

https://github.com/apache/apex-malhar/pull/542







[jira] [Commented] (APEXMALHAR-2369) S3 output module for tuple based output

2017-01-24 Thread Yogi Devendra (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXMALHAR-2369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15837265#comment-15837265
 ] 

Yogi Devendra commented on APEXMALHAR-2369:
---

[~chaithu] Could you please review this?






[jira] [Updated] (APEXMALHAR-2369) S3 output module for tuple based output

2016-12-14 Thread Yogi Devendra (JIRA)

 [ 
https://issues.apache.org/jira/browse/APEXMALHAR-2369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yogi Devendra updated APEXMALHAR-2369:
--
Description: 
Currently, S3 output is available through S3OutputModule, which is restricted
to copying files from a FileSystem to S3. Use cases where all the
tuples/records are to be written directly to S3 cannot use this approach, so we
need an alternative module that takes care of writing tuples to S3.

Design:
Sending a separate request to S3 for each tuple would be too expensive.
Instead, this module can write tuples to HDFS and then upload the HDFS files to
S3. This adds some end-to-end latency, but that should be acceptable for the S3
output case.

  was:Currently, S3 output is available using S3OutputModule which is 
restricted for copying files from FileSystem to S3. Use-cases where all the 
tuples/records to be written to S3 cannot use this approach. Thus, we need to 
develop alternative module which would take care of writing tuples on S3. 
Design: Sending separate requests to S3 for each tuple would be too expensive. 
This module can choose to write tuples to HDFS. And then upload HDFS files to 
S3. This would lead to some end-to-end latency. But, it should OK for the S3 
output case.







[jira] [Created] (APEXMALHAR-2369) S3 output module for tuple based output

2016-12-14 Thread Yogi Devendra (JIRA)
Yogi Devendra created APEXMALHAR-2369:
-

 Summary: S3 output module for tuple based output
 Key: APEXMALHAR-2369
 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2369
 Project: Apache Apex Malhar
  Issue Type: Task
Reporter: Yogi Devendra
Assignee: Yogi Devendra


Currently, S3 output is available through S3OutputModule, which is restricted
to copying files from a FileSystem to S3. Use cases where all the
tuples/records are to be written directly to S3 cannot use this approach, so we
need an alternative module that takes care of writing tuples to S3.

Design: Sending a separate request to S3 for each tuple would be too expensive.
Instead, this module can write tuples to HDFS and then upload the HDFS files to
S3. This adds some end-to-end latency, but that should be acceptable for the S3
output case.





[jira] [Resolved] (APEXMALHAR-2022) S3 Output Module for file copy

2016-11-30 Thread Bhupesh Chawda (JIRA)

 [ 
https://issues.apache.org/jira/browse/APEXMALHAR-2022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bhupesh Chawda resolved APEXMALHAR-2022.

   Resolution: Fixed
Fix Version/s: 3.7.0

> S3 Output Module for file copy
> --
>
> Key: APEXMALHAR-2022
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2022
> Project: Apache Apex Malhar
>  Issue Type: Task
>Reporter: Chaitanya
>Assignee: Chaitanya
> Fix For: 3.7.0
>
>
> The primary functionality of this module is to copy files into an S3 bucket
> using a block-by-block approach.
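The block-by-block approach mentioned above boils down to planning fixed-size reads over the source file. A small hypothetical helper (for illustration only, not part of the module) makes the idea concrete; block numbers start at 1, matching S3 multipart part numbering:

```python
def plan_blocks(file_size, block_size):
    """Return (block_number, offset, length) tuples covering a file of
    file_size bytes in block_size chunks; the final block may be short."""
    blocks = []
    offset = 0
    number = 1
    while offset < file_size:
        length = min(block_size, file_size - offset)
        blocks.append((number, offset, length))
        offset += length
        number += 1
    return blocks
```

For a 10-byte file and 4-byte blocks this yields (1, 0, 4), (2, 4, 4), (3, 8, 2): three blocks covering the file, the last one short.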





[jira] [Commented] (APEXMALHAR-2022) S3 Output Module for file copy

2016-11-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXMALHAR-2022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15710953#comment-15710953
 ] 

ASF GitHub Bot commented on APEXMALHAR-2022:


Github user asfgit closed the pull request at:

https://github.com/apache/apex-malhar/pull/483







[GitHub] apex-malhar pull request #483: APEXMALHAR-2022 Developed S3 Output Module

2016-11-30 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/apex-malhar/pull/483


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (APEXMALHAR-2022) S3 Output Module for file copy

2016-11-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXMALHAR-2022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15708479#comment-15708479
 ] 

ASF GitHub Bot commented on APEXMALHAR-2022:


Github user chaithu14 closed the pull request at:

https://github.com/apache/apex-malhar/pull/483







[jira] [Commented] (APEXMALHAR-2022) S3 Output Module for file copy

2016-11-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXMALHAR-2022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15708481#comment-15708481
 ] 

ASF GitHub Bot commented on APEXMALHAR-2022:


GitHub user chaithu14 reopened a pull request:

https://github.com/apache/apex-malhar/pull/483

APEXMALHAR-2022 Developed S3 Output Module



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chaithu14/incubator-apex-malhar APEXMALHAR-2022-S3Output-multiPart

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/apex-malhar/pull/483.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #483


commit a5e8fa3facca750f5d7402c2c29e7cbabe53bd9e
Author: chaitanya <chai...@apache.org>
Date:   2016-11-30T05:17:36Z

APEXMALHAR-2022 Development of S3 Output Module

----







[GitHub] apex-malhar pull request #483: APEXMALHAR-2022 Developed S3 Output Module

2016-11-30 Thread chaithu14
GitHub user chaithu14 reopened a pull request:

https://github.com/apache/apex-malhar/pull/483

APEXMALHAR-2022 Developed S3 Output Module



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chaithu14/incubator-apex-malhar APEXMALHAR-2022-S3Output-multiPart

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/apex-malhar/pull/483.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #483


commit a5e8fa3facca750f5d7402c2c29e7cbabe53bd9e
Author: chaitanya <chai...@apache.org>
Date:   2016-11-30T05:17:36Z

APEXMALHAR-2022 Development of S3 Output Module






[GitHub] apex-malhar pull request #483: APEXMALHAR-2022 Developed S3 Output Module

2016-11-30 Thread chaithu14
Github user chaithu14 closed the pull request at:

https://github.com/apache/apex-malhar/pull/483




[jira] [Commented] (APEXMALHAR-2022) S3 Output Module for file copy

2016-11-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXMALHAR-2022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15708392#comment-15708392
 ] 

ASF GitHub Bot commented on APEXMALHAR-2022:


Github user chaithu14 closed the pull request at:

https://github.com/apache/apex-malhar/pull/483







[GitHub] apex-malhar pull request #483: APEXMALHAR-2022 Developed S3 Output Module

2016-11-30 Thread chaithu14
GitHub user chaithu14 reopened a pull request:

https://github.com/apache/apex-malhar/pull/483

APEXMALHAR-2022 Developed S3 Output Module



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chaithu14/incubator-apex-malhar APEXMALHAR-2022-S3Output-multiPart

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/apex-malhar/pull/483.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #483


commit a5e8fa3facca750f5d7402c2c29e7cbabe53bd9e
Author: chaitanya <chai...@apache.org>
Date:   2016-11-30T05:17:36Z

APEXMALHAR-2022 Development of S3 Output Module






[jira] [Commented] (APEXMALHAR-2022) S3 Output Module for file copy

2016-11-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXMALHAR-2022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15708393#comment-15708393
 ] 

ASF GitHub Bot commented on APEXMALHAR-2022:


GitHub user chaithu14 reopened a pull request:

https://github.com/apache/apex-malhar/pull/483

APEXMALHAR-2022 Developed S3 Output Module



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chaithu14/incubator-apex-malhar APEXMALHAR-2022-S3Output-multiPart

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/apex-malhar/pull/483.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #483


commit a5e8fa3facca750f5d7402c2c29e7cbabe53bd9e
Author: chaitanya <chai...@apache.org>
Date:   2016-11-30T05:17:36Z

APEXMALHAR-2022 Development of S3 Output Module

----







[GitHub] apex-malhar pull request #483: APEXMALHAR-2022 Developed S3 Output Module

2016-11-30 Thread chaithu14
Github user chaithu14 closed the pull request at:

https://github.com/apache/apex-malhar/pull/483




[jira] [Commented] (APEXMALHAR-2022) S3 Output Module for file copy

2016-11-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXMALHAR-2022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15701535#comment-15701535
 ] 

ASF GitHub Bot commented on APEXMALHAR-2022:


Github user chaithu14 closed the pull request at:

https://github.com/apache/apex-malhar/pull/483







[GitHub] apex-malhar pull request #483: APEXMALHAR-2022 Developed S3 Output Module

2016-11-28 Thread chaithu14
GitHub user chaithu14 reopened a pull request:

https://github.com/apache/apex-malhar/pull/483

APEXMALHAR-2022 Developed S3 Output Module



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chaithu14/incubator-apex-malhar APEXMALHAR-2022-S3Output-multiPart

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/apex-malhar/pull/483.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #483


commit 6ab63bd92dc93ac4ddb3d6ce70d310cfa9322f82
Author: chaitanya <chai...@apache.org>
Date:   2016-11-28T08:54:54Z

APEXMALHAR-2022 Development of S3 Output Module






[GitHub] apex-malhar pull request #483: APEXMALHAR-2022 Developed S3 Output Module

2016-11-28 Thread chaithu14
Github user chaithu14 closed the pull request at:

https://github.com/apache/apex-malhar/pull/483




[jira] [Updated] (APEXMALHAR-2022) S3 Output Module for file copy

2016-11-25 Thread Hitesh Kapoor (JIRA)

 [ 
https://issues.apache.org/jira/browse/APEXMALHAR-2022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Kapoor updated APEXMALHAR-2022:
--
Assignee: Chaitanya  (was: Hitesh Kapoor)






[jira] [Commented] (APEXMALHAR-2022) S3 Output Module for file copy

2016-11-04 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXMALHAR-2022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15636249#comment-15636249
 ] 

ASF GitHub Bot commented on APEXMALHAR-2022:


GitHub user chaithu14 opened a pull request:

https://github.com/apache/apex-malhar/pull/483

APEXMALHAR-2022 Developed S3 Output Module



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chaithu14/incubator-apex-malhar APEXMALHAR-2022-S3Output-multiPart

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/apex-malhar/pull/483.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #483


commit 24fb5638ecb6f0e45edb5d5f640b220ad9372fcc
Author: chaitanya <chai...@apache.org>
Date:   2016-11-04T12:48:21Z

APEXMALHAR-2022 Developed S3 Output Module

----







[GitHub] apex-malhar pull request #483: APEXMALHAR-2022 Developed S3 Output Module

2016-11-04 Thread chaithu14
GitHub user chaithu14 opened a pull request:

https://github.com/apache/apex-malhar/pull/483

APEXMALHAR-2022 Developed S3 Output Module



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chaithu14/incubator-apex-malhar APEXMALHAR-2022-S3Output-multiPart

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/apex-malhar/pull/483.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #483


commit 24fb5638ecb6f0e45edb5d5f640b220ad9372fcc
Author: chaitanya <chai...@apache.org>
Date:   2016-11-04T12:48:21Z

APEXMALHAR-2022 Developed S3 Output Module






Re: S3 Output Module

2016-10-27 Thread Mohit Jotwani
+1 for Solution 2

Regards,
Mohit
On 27 Oct 2016 2:02 p.m., "Sandeep Deshmukh" <sand...@datatorrent.com>
wrote:

> +1
>
> Regards,
> Sandeep
>
> On Thu, Oct 27, 2016 at 1:53 PM, Chaitanya Chebolu <
> chaita...@datatorrent.com> wrote:
>
> > Hi All,
> >
> >   I am planning to implement approach (2) of the S3 Output Module, which I
> > proposed in my previous email. Performance would be better than approach
> > (1) because the blocks are uploaded without first being saved to HDFS.
> >
> >   Please share your opinions.
> >
> > Regards,
> > Chaitanya
> >
Re: S3 Output Module

2016-10-27 Thread Sandeep Deshmukh
+1

Regards,
Sandeep

On Thu, Oct 27, 2016 at 1:53 PM, Chaitanya Chebolu <
chaita...@datatorrent.com> wrote:

> Hi All,
>
>   I am planning to implement approach (2) of the S3 Output Module, which I
> proposed in my previous email. Performance would be better than approach (1)
> because the blocks are uploaded without first being saved to HDFS.
>
>   Please share your opinions.
>
> Regards,
> Chaitanya
>

Re: S3 Output Module

2016-10-27 Thread Chaitanya Chebolu
Hi All,

  I am planning to implement approach (2) of the S3 Output Module, which I
proposed in my previous email. Performance would be better than approach (1)
because the blocks are uploaded without first being saved to HDFS.

  Please share your opinions.

Regards,
Chaitanya

On Thu, Oct 20, 2016 at 8:11 PM, Chaitanya Chebolu <
chaita...@datatorrent.com> wrote:

> Hi All,
>
> I am proposing the following new design for the S3 Output Module, using the
> multipart upload feature:
>
> Input to this Module: FileMetadata, FileBlockMetadata, ReaderRecord
>
> Steps for uploading a file using the S3 multipart feature:
>
> 1. Initiate the upload. S3 returns an upload id.
>    Mandatory: bucket name, file path
>    Note: the upload id is the unique identifier for the multipart upload of
>    a file.
>
> 2. Upload each block using the received upload id. S3 returns an ETag in
>    response to each upload.
>    Mandatory: block number, upload id
>
> 3. Send the merge request, providing the upload id and the list of ETags.
>    Mandatory: upload id, file path, block ETags
>
> Here is an example of uploading a file using the multipart feature:
> <http://docs.aws.amazon.com/AmazonS3/latest/dev/llJavaUploadFile.html>
>
>
> I am proposing the following two approaches for the S3 output module.
>
>
> (Solution 1)
>
> S3 Output Module consists of the below two operators:
>
> 1) BlockWriter: Writes the blocks to HDFS. Once a block is successfully
> written to HDFS, this operator emits its BlockMetadata.
>
> 2) S3MultiPartUpload: This consists of two parts:
>
>  a) If a file has more than one block, upload the blocks using the
> multipart feature. Otherwise, upload the single block using
> putObject().
>
>  b) Once all the blocks are successfully uploaded, send the merge
> (complete) request.
>
>
> (Solution 2)
>
> The DAG for this solution is as follows:
>
> 1) InitiateS3Upload:
>
> Input: FileMetadata
>
> Initiates the upload. This operator emits (filemetadata, uploadId) to
> S3FileMerger and (filePath, uploadId) to S3BlockUpload.
>
> 2) S3BlockUpload:
>
> Input: FileBlockMetadata, ReaderRecord
>
> Uploads the blocks to S3. S3 will return an ETag for each upload.
> S3BlockUpload emits (path, ETag) to S3FileMerger.
>
> 3) S3FileMerger: Sends the file merge request to S3.
>
> Pros:
>
> (1) Supports uploading files of up to 5 TB.
>
> (2) Reduces end-to-end latency, because uploading does not wait until all
> the blocks of a file are written to HDFS.
>
> Please vote and share your thoughts on these approaches.
>
> Regards,
> Chaitanya
>
> On Tue, Mar 29, 2016 at 2:35 PM, Chaitanya Chebolu <
> chaita...@datatorrent.com> wrote:
>
>> @ Tushar
>>
>>   S3 Copy Output Module consists of following operators:
>> 1) BlockWriter : Writes the blocks into the HDFS.
>> 2) Synchronizer: Sends a trigger to the downstream operator when all the
>> blocks of a file have been written to HDFS.
>> 3) FileMerger: Merges all the blocks into a file and uploads the merged
>> file to the S3 bucket.
>>
>> @ Ashwin
>>
>> Good suggestion. In the first iteration, I will add the proposed
>> design.
>> Multipart support will be added in the next iteration.
>>
>> Regards,
>> Chaitanya
>>
>> On Thu, Mar 24, 2016 at 2:44 AM, Ashwin Chandra Putta <
>> ashwinchand...@gmail.com> wrote:
>>
>>> +1 regarding the s3 upload functionality.
>>>
>>> However, I think we should just focus on multipart upload directly as it
>>> comes with various advantages like higher throughput, faster recovery, and
>>> not needing to wait for the entire file to be created before uploading each
>>> part.
>>> See: http://docs.aws.amazon.com/AmazonS3/latest/dev/uploadobjusin
>>> gmpu.html
>>>
>>> Also, seems like we can do multipart upload if the file size is more than
>>> 5MB. They do recommend using multipart if the file size is more than
>>> 100MB.
>>> I am not sure if there is a hard lower limit though. See:
>>> http://docs.aws.amazon.com/AmazonS3/latest/dev/UploadingObjects.html
>>>
>>> This way, it seems like we don't have to wait until a file is
>>> completely
>>> written to HDFS before performing the upload operation.
>>>
>>> Regards,
>>> Ashwin.
>>>
>>> On Wed, Mar 23, 2016 at 5:10 AM, Tushar Gosavi <tus...@datatorrent.com>
>>> wrote:
>>>
>>> > +1 , we need this functionality.
>

Re: S3 Output Module

2016-10-20 Thread Chaitanya Chebolu
Hi All,

I am proposing the below new design for the S3 Output Module, using the
multipart upload feature:

Input to this Module: FileMetadata, FileBlockMetadata, ReaderRecord

Steps for uploading files using S3 multipart feature:

=

   1. Initiate the upload. S3 will return an upload id.

      Mandatory: bucket name, file path

      Note: The upload id is the unique identifier for the multipart upload
      of a file.

   2. Upload each block using the received upload id. S3 will return an ETag
      in response to each upload.

      Mandatory: block number, upload id

   3. Send the merge request by providing the upload id and the list of
      ETags.

      Mandatory: upload id, file path, block ETags.

Here <http://docs.aws.amazon.com/AmazonS3/latest/dev/llJavaUploadFile.html>
is an example of uploading a file using the multipart feature.
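The three numbered steps above imply some per-part bookkeeping: S3 returns an ETag for every uploaded part, and the final merge (complete) request must list each (part number, ETag) pair in part order. A minimal sketch of that bookkeeping, as a hypothetical helper (not Malhar code, and independent of any AWS SDK):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical helper: tracks the ETag S3 returned for each uploaded part,
// so the final merge request can list them in part-number order.
class PartTracker {
    private final TreeMap<Integer, String> eTagsByPart = new TreeMap<>();

    // Record the ETag returned for one uploaded part (part numbers are 1-based).
    void recordPart(int partNumber, String eTag) {
        eTagsByPart.put(partNumber, eTag);
    }

    // The merge request may only be sent once parts 1..expectedParts
    // have all reported an ETag.
    boolean readyToComplete(int expectedParts) {
        if (expectedParts < 1 || eTagsByPart.size() != expectedParts) {
            return false;
        }
        return eTagsByPart.firstKey() == 1 && eTagsByPart.lastKey() == expectedParts;
    }

    // ETags in part-number order, as the merge request requires.
    List<String> orderedETags() {
        return new ArrayList<>(eTagsByPart.values());
    }
}
```

In the real operator this state would have to be checkpointed, so that a recovered instance does not lose ETags for parts that were already uploaded.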


I am proposing the below two approaches for S3 output module.


(Solution 1)

S3 Output Module consists of the below two operators:

1) BlockWriter: Writes the blocks to HDFS. Once a block is successfully
written to HDFS, this operator emits its BlockMetadata.

2) S3MultiPartUpload: This consists of two parts:

 a) If a file has more than one block, upload the blocks using the
multipart feature. Otherwise, upload the single block using
putObject().

 b) Once all the blocks are successfully uploaded, send the merge
(complete) request.
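The dispatch rule in (a) can be sketched as a pure function; names here are illustrative, assuming the block count is available from the file's metadata:

```java
// Sketch of the putObject()-versus-multipart decision described in (a).
// Names are hypothetical; this is not the Malhar implementation.
class UploadDispatch {
    enum Mode { PUT_OBJECT, MULTIPART }

    // blockCount: number of blocks recorded for the file.
    static Mode chooseMode(int blockCount) {
        if (blockCount < 1) {
            throw new IllegalArgumentException("a file has at least one block");
        }
        return blockCount == 1 ? Mode.PUT_OBJECT : Mode.MULTIPART;
    }
}
```

Taking the putObject() path for single-block files avoids the extra initiate and complete round trips that a multipart upload would add.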


(Solution 2)

The DAG for this solution is as follows:

1) InitiateS3Upload:

Input: FileMetadata

Initiates the upload. This operator emits (filemetadata, uploadId) to
S3FileMerger and (filePath, uploadId) to S3BlockUpload.

2) S3BlockUpload:

Input: FileBlockMetadata, ReaderRecord

Uploads the blocks to S3. S3 will return an ETag for each upload.
S3BlockUpload emits (path, ETag) to S3FileMerger.

3) S3FileMerger: Sends the file merge request to S3.

Pros:

(1) Supports uploading files of up to 5 TB.

(2) Reduces end-to-end latency, because uploading does not wait until all
the blocks of a file are written to HDFS.
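On pro (1): the 5 TB ceiling is S3's maximum object size, and a multipart upload allows at most 10,000 parts, each between 5 MB and 5 GB (the last part may be smaller). A small hypothetical helper showing how a part size can be chosen within those limits:

```java
// Sketch (not Malhar code) of choosing a part size within S3's
// multipart limits: at most 10,000 parts, each 5 MB to 5 GB.
class PartSizing {
    static final long MIN_PART = 5L * 1024 * 1024;        // 5 MB minimum (non-final parts)
    static final long MAX_PART = 5L * 1024 * 1024 * 1024; // 5 GB maximum per part
    static final int MAX_PARTS = 10_000;                  // parts per upload

    // Smallest part size that keeps the upload within 10,000 parts.
    static long minimumPartSize(long fileSize) {
        long size = Math.max(MIN_PART, (fileSize + MAX_PARTS - 1) / MAX_PARTS);
        if (size > MAX_PART) {
            throw new IllegalArgumentException("file exceeds the 5 TB S3 object limit");
        }
        return size;
    }
}
```

For a 5 TB file this yields roughly 550 MB per part, comfortably under the 5 GB per-part maximum.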

Please vote and share your thoughts on these approaches.

Regards,
Chaitanya

On Tue, Mar 29, 2016 at 2:35 PM, Chaitanya Chebolu <
chaita...@datatorrent.com> wrote:

> @ Tushar
>
>   S3 Copy Output Module consists of following operators:
> 1) BlockWriter : Writes the blocks into the HDFS.
> 2) Synchronizer: Sends a trigger to the downstream operator when all the
> blocks of a file have been written to HDFS.
> 3) FileMerger: Merges all the blocks into a file and uploads the merged
> file to the S3 bucket.
>
> @ Ashwin
>
> Good suggestion. In the first iteration, I will add the proposed
> design.
> Multipart support will be added in the next iteration.
>
> Regards,
> Chaitanya
>
> On Thu, Mar 24, 2016 at 2:44 AM, Ashwin Chandra Putta <
> ashwinchand...@gmail.com> wrote:
>
>> +1 regarding the s3 upload functionality.
>>
>> However, I think we should just focus on multipart upload directly as it
>> comes with various advantages like higher throughput, faster recovery, and not
>> needing to wait for the entire file to be created before uploading each part.
>> See: http://docs.aws.amazon.com/AmazonS3/latest/dev/uploadobjusin
>> gmpu.html
>>
>> Also, seems like we can do multipart upload if the file size is more than
>> 5MB. They do recommend using multipart if the file size is more than
>> 100MB.
>> I am not sure if there is a hard lower limit though. See:
>> http://docs.aws.amazon.com/AmazonS3/latest/dev/UploadingObjects.html
>>
>> This way, it seems like we don't have to wait until a file is
>> completely
>> written to HDFS before performing the upload operation.
>>
>> Regards,
>> Ashwin.
>>
>> On Wed, Mar 23, 2016 at 5:10 AM, Tushar Gosavi <tus...@datatorrent.com>
>> wrote:
>>
>> > +1 , we need this functionality.
>> >
>> > Is it going to be a single operator or multiple operators? If multiple
>> > operators, then can you explain what functionality each operator will
>> > provide?
>> >
>> >
>> > Regards,
>> > -Tushar.
>> >
>> >
>> > On Wed, Mar 23, 2016 at 5:01 PM, Yogi Devendra <yogideven...@apache.org
>> >
>> > wrote:
>> >
>> > > Writing to S3 is a common use-case for applications.
>> > > This module will be definitely helpful.
>> > >
>> > > +1 for adding this module.
>> > >
>> > >
>> > > ~ Yogi
>> > >
>> > > On 22 March 2016 at 13:52, Chaitanya Chebolu <
>> chaita...@datatorrent.com>
>> > > wrote:
>> > >
>> > > > Hi All,
>> > > >
>> > > >   I am proposing S3 output copy 

[jira] [Updated] (APEXMALHAR-2022) S3 Output Module for file copy

2016-10-03 Thread Chaitanya (JIRA)

 [ 
https://issues.apache.org/jira/browse/APEXMALHAR-2022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chaitanya updated APEXMALHAR-2022:
--
Assignee: Hitesh Kapoor  (was: Chaitanya)

> S3 Output Module for file copy
> --
>
> Key: APEXMALHAR-2022
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2022
> Project: Apache Apex Malhar
>  Issue Type: Task
>Reporter: Chaitanya
>Assignee: Hitesh Kapoor
>
> Primary functionality of this module is to copy files into an S3 bucket
> using a block-by-block approach.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)