[jira] [Updated] (BAHIR-110) Replace use of _all_docs API with _changes API in all receivers

2017-07-23 Thread Esteban Laver (JIRA)

 [ 
https://issues.apache.org/jira/browse/BAHIR-110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Esteban Laver updated BAHIR-110:

Description: 
Today we use the _changes API for the Spark streaming receiver and the 
_all_docs API for the non-streaming receiver. The _all_docs API supports 
parallel reads (using offset and range), but the _changes API still performs 
better in most cases (even with only single-threaded support).
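
As a rough illustration of the parallel-read idea mentioned above, the sketch below splits a document count into one offset/range pair per partition. It assumes that "offset and range" correspond to the `skip` and `limit` query parameters of `_all_docs`; the function name `partition_ranges` is hypothetical and is not taken from the Bahir code.

```python
def partition_ranges(total_docs, num_partitions):
    """Split total_docs into (skip, limit) pairs, one per partition.

    Hypothetical helper: 'skip' plays the role of the offset and
    'limit' the role of the range size for one parallel read.
    """
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    base, extra = divmod(total_docs, num_partitions)
    ranges = []
    skip = 0
    for i in range(num_partitions):
        # The first `extra` partitions take one additional document each.
        limit = base + (1 if i < extra else 0)
        ranges.append((skip, limit))
        skip += limit
    return ranges
```

Each pair could then be read by a separate task, which is the kind of parallelism the description attributes to _all_docs and not to _changes.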

With this ticket we want to:
a) implement _changes API for non-streaming receivers
b) allow customers to pick either the _all_docs (default) or the _changes API 
endpoint, with documentation of the pros and cons
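
One way item b) could look from the caller's side is sketched below: a single setting selects the endpoint, with _all_docs as the default. The function name `build_read_url`, its parameters, and the example host are hypothetical; only the endpoint names and the default come from this ticket. The `include_docs` and `feed` query parameters are standard CouchDB/Cloudant options, but whether the receivers use them this way is an assumption.

```python
from urllib.parse import urlencode

SUPPORTED_ENDPOINTS = ("_all_docs", "_changes")


def build_read_url(base_url, database, endpoint="_all_docs"):
    """Build the read URL for the chosen endpoint (hypothetical helper)."""
    if endpoint not in SUPPORTED_ENDPOINTS:
        raise ValueError(
            f"endpoint must be one of {SUPPORTED_ENDPOINTS}, got {endpoint!r}"
        )
    # Ask for full document bodies with either endpoint.
    params = {"include_docs": "true"}
    if endpoint == "_changes":
        # Assumption: a one-shot (non-continuous) feed for batch loads.
        params["feed"] = "normal"
    return f"{base_url.rstrip('/')}/{database}/{endpoint}?{urlencode(params)}"


# Default stays on _all_docs; callers opt in to _changes explicitly.
print(build_read_url("http://localhost:5984", "sales"))
print(build_read_url("http://localhost:5984", "sales", endpoint="_changes"))
```

Keeping _all_docs as the default means existing users see no change in behavior unless they opt in.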

_changes performance details:
Successfully loaded Cloudant docs (using a local cloudant-developer Docker 
image) into Spark (local standalone) with the following database sizes: 15 GB 
(8.5 min), 20 GB (17 min), 46 GB (25 min), and 75 GB (48.5 min).
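
To make the figures above easier to compare, the short sketch below converts each run into an approximate load rate. It uses only the sizes and times reported in this ticket; the variable and function names are illustrative.

```python
# (database size in GB, load time in minutes), as reported in this ticket
RUNS = [(15, 8.5), (20, 17), (46, 25), (75, 48.5)]


def throughput_gb_per_min(size_gb, minutes):
    """Average load rate for one run, in GB per minute."""
    return size_gb / minutes


for size_gb, minutes in RUNS:
    rate = throughput_gb_per_min(size_gb, minutes)
    print(f"{size_gb} GB in {minutes} min: {rate:.2f} GB/min")
```

Across the four runs this works out to roughly 1.2 to 1.8 GB per minute, so load time does not scale strictly linearly with database size in these measurements.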

  was:
Today we use the _changes API for the Spark streaming receiver and the 
_all_docs API for the non-streaming receiver. The _all_docs API supports 
parallel reads (using offset and range), but the _changes API still performs 
better in most cases (even with only single-threaded support).

With this ticket we want to:
a) re-implement all receivers using the _changes API
b) compare performance between the two implementations based on _changes and 
_all_docs

Based on the results in b) we could decide to either
- replace the _all_docs implementation with a _changes-based implementation OR
- allow customers to pick one (with solid documentation of the pros and cons)


> Replace use of _all_docs API with _changes API in all receivers
> ---
>
> Key: BAHIR-110
> URL: https://issues.apache.org/jira/browse/BAHIR-110
> Project: Bahir
> Issue Type: Improvement
> Reporter: Esteban Laver
> Original Estimate: 216h
> Remaining Estimate: 216h
>
> Today we use the _changes API for the Spark streaming receiver and the 
> _all_docs API for the non-streaming receiver. The _all_docs API supports 
> parallel reads (using offset and range), but the _changes API still performs 
> better in most cases (even with only single-threaded support).
> With this ticket we want to:
> a) implement _changes API for non-streaming receivers
> b) allow customers to pick either the _all_docs (default) or the _changes API 
> endpoint, with documentation of the pros and cons
> _changes performance details:
> Successfully loaded Cloudant docs (using a local cloudant-developer Docker 
> image) into Spark (local standalone) with the following database sizes: 15 GB 
> (8.5 min), 20 GB (17 min), 46 GB (25 min), and 75 GB (48.5 min).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (BAHIR-110) Replace use of _all_docs API with _changes API in all receivers

2017-05-25 Thread Esteban Laver (JIRA)

 [ 
https://issues.apache.org/jira/browse/BAHIR-110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Esteban Laver updated BAHIR-110:

Remaining Estimate: 216h
 Original Estimate: 216h

> Replace use of _all_docs API with _changes API in all receivers
> ---
>
> Key: BAHIR-110
> URL: https://issues.apache.org/jira/browse/BAHIR-110
> Project: Bahir
> Issue Type: Improvement
> Reporter: Esteban Laver
> Original Estimate: 216h
> Remaining Estimate: 216h
>
> Today we use the _changes API for the Spark streaming receiver and the 
> _all_docs API for the non-streaming receiver. The _all_docs API supports 
> parallel reads (using offset and range), but the _changes API still performs 
> better in most cases (even with only single-threaded support).
> With this ticket we want to:
> a) re-implement all receivers using the _changes API
> b) compare performance between the two implementations based on _changes and 
> _all_docs
> Based on the results in b) we could decide to either
> - replace the _all_docs implementation with a _changes-based implementation OR
> - allow customers to pick one (with solid documentation of the pros and 
> cons)


