Github user emlaver commented on a diff in the pull request:
https://github.com/apache/bahir/pull/57#discussion_r157533814
--- Diff:
sql-cloudant/src/main/scala/org/apache/bahir/cloudant/internal/ChangesReceiver.scala
---
@@ -39,56 +37,38 @@ class ChangesReceiver(config: CloudantChangesConfig)
}
private def receive(): Unit = {
- // Get total number of docs in database using _all_docs endpoint
- val limit = new JsonStoreDataAccess(config)
- .getTotalRows(config.getTotalUrl, queryUsed = false)
-
- // Get continuous _changes url
+ // Get normal _changes url
--- End diff --
For our internal implementation, Mayya and I wanted the user to
have a snapshot of data to load into Spark. To make that possible, we
decided to use the `continuous` style feed with a doc limit. With the new _changes
implementation from Mike's project, the `normal` feed is stable and works as
expected. I've also reduced the number of requests and the load time by removing the
HTTP request for the doc limit, since it's not needed with the `normal` style
_changes feed.
To work with data in "real-time", you can use `CloudantReceiver`, which
creates a never-ending changes feed within the Spark Streaming context.
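
To illustrate the difference between the two feed styles, here is a rough sketch of how the request URLs differ. This is not the actual `ChangesReceiver` code: `ChangesUrlSketch`, `changesUrl`, the host name, and the database name are all hypothetical, and only the `feed`, `limit`, and `include_docs` query parameters come from the CouchDB/Cloudant `_changes` API.

```scala
// Hypothetical helper for illustration only; not part of sql-cloudant.
object ChangesUrlSketch {
  // Build a _changes URL for the given feed style.
  // `limit` is only needed when a continuous feed has to be cut off
  // after a fixed number of docs to emulate a snapshot.
  def changesUrl(
      baseUrl: String,
      db: String,
      feed: String,
      limit: Option[Long] = None): String = {
    val params = Seq("feed" -> feed, "include_docs" -> "true") ++
      limit.map(n => "limit" -> n.toString)
    val query = params.map { case (k, v) => s"$k=$v" }.mkString("&")
    s"$baseUrl/$db/_changes?$query"
  }
}
```

With the old approach, the total row count had to be fetched first (the extra HTTP request) and passed in, e.g. `changesUrl(host, "mydb", "continuous", Some(totalRows))`. With the `normal` feed, `changesUrl(host, "mydb", "normal")` is enough, because the response ends on its own once all current changes have been returned.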
---