[jira] [Resolved] (BAHIR-214) Improve KuduConnector speed

2019-09-02 Thread Luciano Resende (Jira)


 [ 
https://issues.apache.org/jira/browse/BAHIR-214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luciano Resende resolved BAHIR-214.
---
Fix Version/s: Flink-Next
   Resolution: Fixed

> Improve KuduConnector speed
> ---
>
> Key: BAHIR-214
> URL: https://issues.apache.org/jira/browse/BAHIR-214
> Project: Bahir
> Issue Type: Improvement
> Components: Flink Streaming Connectors
> Reporter: Joao Boto
> Assignee: Joao Boto
> Priority: Major
> Fix For: Flink-Next
>
>
> The Kudu connector has issues in the Kudu sink: with some flush modes, the 
> sink is killed over time.
>  
> This is a refactor to resolve those issues and improve speed under eventual 
> consistency.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (BAHIR-214) Improve KuduConnector speed

2019-09-02 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/BAHIR-214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16921089#comment-16921089
 ] 

ASF subversion and git services commented on BAHIR-214:
---

Commit 55240a993df999d66aefa36e587be719c29be92a in bahir-flink's branch 
refs/heads/master from Joao Boto
[ https://gitbox.apache.org/repos/asf?p=bahir-flink.git;h=55240a9 ]

[BAHIR-214] Improve speed and solve eventual consistence issues (#64)

* resolve eventual consistency issues
* improve speed, especially on eventual consistency streams
* Update Readme







[jira] [Commented] (BAHIR-214) Improve KuduConnector speed

2019-09-02 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/BAHIR-214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16921088#comment-16921088
 ] 

ASF GitHub Bot commented on BAHIR-214:
--

lresende commented on pull request #64: [BAHIR-214]: improve speed and solve 
issues on eventual consistence
URL: https://github.com/apache/bahir-flink/pull/64
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org







[jira] [Resolved] (BAHIR-172) Avoid FileInputStream/FileOutputStream

2019-09-02 Thread Luciano Resende (Jira)


 [ 
https://issues.apache.org/jira/browse/BAHIR-172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luciano Resende resolved BAHIR-172.
---
Fix Version/s: Spark-2.4.0
   Resolution: Fixed

> Avoid FileInputStream/FileOutputStream
> --
>
> Key: BAHIR-172
> URL: https://issues.apache.org/jira/browse/BAHIR-172
> Project: Bahir
> Issue Type: Bug
> Reporter: Ted Yu
> Priority: Minor
> Fix For: Spark-2.4.0
>
>
> FileInputStream and FileOutputStream rely on finalizers (before Java 11), 
> which create unnecessary GC load.
> The alternatives, {{Files.newInputStream}} and {{Files.newOutputStream}}, are 
> as easy to use and don't have this issue.
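
As an illustration of the replacement the issue describes, here is a minimal sketch. It is not the code from the Bahir patch: the class name `NioStreamsExample`, the method `roundTrip`, and the temporary file are hypothetical and exist only for this example. It writes a string with `Files.newOutputStream` and reads it back with `Files.newInputStream`, where older code would typically construct `FileOutputStream` and `FileInputStream` directly.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical example class; not taken from the Bahir code base.
public class NioStreamsExample {

    // Writes the text to the file and reads it back using the java.nio.file
    // factory methods instead of new FileOutputStream(...) / new FileInputStream(...).
    public static String roundTrip(Path file, String text) throws IOException {
        // With no options given, Files.newOutputStream creates or truncates the file.
        try (OutputStream out = Files.newOutputStream(file)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
        // try-with-resources closes the stream deterministically.
        try (InputStream in = Files.newInputStream(file)) {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
            return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("nio-streams", ".txt");
        try {
            System.out.println(roundTrip(file, "hello"));
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

Both factory methods return plain `InputStream` / `OutputStream` instances, so call sites that only depend on those types should need no other changes.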





[jira] [Commented] (BAHIR-172) Avoid FileInputStream/FileOutputStream

2019-09-02 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/BAHIR-172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16921086#comment-16921086
 ] 

ASF subversion and git services commented on BAHIR-172:
---

Commit 68ac1be22ddccecf105aee355a8c2652868e9f7d in bahir's branch 
refs/heads/master from Like
[ https://gitbox.apache.org/repos/asf?p=bahir.git;h=68ac1be ]

[BAHIR-172 ] Replace FileInputStream with Files.newInputStream (#92)








[jira] [Commented] (BAHIR-172) Avoid FileInputStream/FileOutputStream

2019-09-02 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/BAHIR-172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16921085#comment-16921085
 ] 

ASF GitHub Bot commented on BAHIR-172:
--

lresende commented on pull request #92: [BAHIR-172 ] Create input stream and 
output stream of file with Files
URL: https://github.com/apache/bahir/pull/92
 
 
   
 








[jira] [Commented] (BAHIR-213) Faster S3 file Source for Structured Streaming with SQS

2019-09-02 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/BAHIR-213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16921083#comment-16921083
 ] 

ASF GitHub Bot commented on BAHIR-213:
--

lresende commented on issue #91: [BAHIR-213] Faster S3 file Source for 
Structured Streaming with SQS
URL: https://github.com/apache/bahir/pull/91#issuecomment-527259701
 
 
   @steveloughran could you please review this.
 



> Faster S3 file Source for Structured Streaming with SQS
> ---
>
> Key: BAHIR-213
> URL: https://issues.apache.org/jira/browse/BAHIR-213
> Project: Bahir
> Issue Type: New Feature
> Components: Spark Structured Streaming Connectors
> Affects Versions: Spark-2.4.0
> Reporter: Abhishek Dixit
> Priority: Major
>
> Using FileStreamSource to read files from an S3 bucket has problems in terms 
> of both cost and latency:
>  * *Latency:* Listing all the files in an S3 bucket every microbatch can be 
> both slow and resource intensive.
>  * *Costs:* Making List API requests to S3 every microbatch can be costly.
> The solution is to use Amazon Simple Queue Service (SQS), which lets you find 
> new files written to an S3 bucket without listing all the files every 
> microbatch.
> S3 buckets can be configured to send notifications to an Amazon SQS queue on 
> Object Create / Object Delete events. For details, see the AWS documentation: 
> [Configuring S3 Event 
> Notifications|https://docs.aws.amazon.com/AmazonS3/latest/dev/NotificationHowTo.html]
>  
> Spark can leverage this to find new files written to an S3 bucket by reading 
> notifications from the SQS queue instead of listing files every microbatch.
> I hope to contribute the changes proposed in [this pull 
> request|https://github.com/apache/spark/pull/24934] to Apache Bahir, as 
> suggested by [gaborgsomogyi|https://github.com/gaborgsomogyi] 
> [here|https://github.com/apache/spark/pull/24934#issuecomment-511389130].
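
The idea in the description can be sketched with a small in-memory model. This is only an illustration and makes several assumptions: it does not call S3 or SQS, it is not the source proposed in the pull request, and the class `QueueFileDiscovery` and its methods are hypothetical names. A plain `Queue` stands in for the SQS queue, and each microbatch drains it instead of listing the whole bucket.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical in-memory model; it does not talk to S3 or SQS.
public class QueueFileDiscovery {

    // Stands in for the SQS queue that receives "object created" notifications.
    private final Queue<String> notifications = new ArrayDeque<>();

    // Models S3 publishing a notification when a new object is written.
    public void objectCreated(String key) {
        notifications.add(key);
    }

    // Returns the keys announced since the previous microbatch by draining the queue.
    // The work is proportional to the number of new files, not to the bucket size.
    public List<String> nextBatch() {
        List<String> batch = new ArrayList<>();
        String key;
        while ((key = notifications.poll()) != null) {
            batch.add(key);
        }
        return batch;
    }

    public static void main(String[] args) {
        QueueFileDiscovery discovery = new QueueFileDiscovery();
        discovery.objectCreated("logs/part-0001.json");
        discovery.objectCreated("logs/part-0002.json");
        System.out.println(discovery.nextBatch());
        // No new notifications arrived, so the next microbatch finds nothing to read.
        System.out.println(discovery.nextBatch());
    }
}
```

A real source would also have to handle concerns this model ignores, such as parsing the notification payload, deleting messages after processing, and tolerating duplicate deliveries.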


