[ https://issues.apache.org/jira/browse/BAHIR-110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Luciano Resende resolved BAHIR-110.
-----------------------------------
    Resolution: Fixed
      Assignee: Esteban Laver
 Fix Version/s: Spark-2.2.0

> Implement _changes API for non-streaming receiver
> -------------------------------------------------
>
>                 Key: BAHIR-110
>                 URL: https://issues.apache.org/jira/browse/BAHIR-110
>             Project: Bahir
>          Issue Type: Improvement
>            Reporter: Esteban Laver
>            Assignee: Esteban Laver
>             Fix For: Spark-2.2.0
>
>   Original Estimate: 216h
>  Remaining Estimate: 216h
>
> Today we use the _changes API for the Spark streaming receiver and the
> _all_docs API for the non-streaming receiver. The _all_docs API supports
> parallel reads (using offset and range), but the _changes API still performs
> better in most cases, even though it only supports single-threaded reads.
> With this ticket we want to:
> a) implement the _changes API for non-streaming receivers
> b) allow customers to pick either the _all_docs (default) or the _changes API
> endpoint, with documentation about the pros and cons of each
> _changes performance details:
> Successfully loaded Cloudant docs (using the local cloudant-developer Docker
> image) into Spark (local standalone) with the following database sizes:
> 15 GB (8 1/2 min), 20 GB (17 min), 46 GB (25 min), and 75 GB (48 1/2 min).

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
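The _changes load times reported in the ticket can be turned into average throughput figures, which makes the four runs easier to compare. The Python sketch below is illustrative only: it uses just the four data points quoted above, and the helper name throughput_gb_per_min is hypothetical, not part of Bahir or Cloudant.

```python
# Load times reported in BAHIR-110 for the _changes endpoint
# (local cloudant-developer Docker image, local standalone Spark).
# Each entry is (database size in GB, load time in minutes).
RESULTS = [
    (15, 8.5),   # 15 GB in 8 1/2 min
    (20, 17.0),  # 20 GB in 17 min
    (46, 25.0),  # 46 GB in 25 min
    (75, 48.5),  # 75 GB in 48 1/2 min
]


def throughput_gb_per_min(size_gb: float, minutes: float) -> float:
    """Hypothetical helper: average load rate in GB per minute."""
    return size_gb / minutes


if __name__ == "__main__":
    for size, minutes in RESULTS:
        rate = throughput_gb_per_min(size, minutes)
        print(f"{size:>3} GB in {minutes:>4} min -> {rate:.2f} GB/min")
```

By this arithmetic the runs average roughly 1.2 to 1.8 GB/min, and the 20 GB run is the slowest per gigabyte, so load time in these tests did not scale strictly linearly with database size.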