[GitHub] bahir pull request #57: [BAHIR-128][WIP] fix failing sql-cloudant test
Github user emlaver commented on a diff in the pull request: https://github.com/apache/bahir/pull/57#discussion_r157553098 --- Diff: sql-cloudant/src/main/scala/org/apache/bahir/cloudant/CloudantChangesConfig.scala --- @@ -67,7 +67,8 @@ class CloudantChangesConfig(protocol: String, host: String, dbName: String, } def getChangesReceiverUrl: String = { -var url = dbUrl + "/" + defaultIndex + "?include_docs=true=continuous=" + timeout +var url = dbUrl + "/" + defaultIndex + "?include_docs=true=normal" + + "_interval=1=" + timeout --- End diff -- Yes, I think that's a good idea. Fixed in f758996. ---
[GitHub] bahir pull request #57: [BAHIR-128][WIP] fix failing sql-cloudant test
Github user emlaver commented on a diff in the pull request: https://github.com/apache/bahir/pull/57#discussion_r157552982 --- Diff: sql-cloudant/src/main/scala/org/apache/bahir/cloudant/internal/ChangesReceiver.scala --- @@ -39,56 +37,38 @@ class ChangesReceiver(config: CloudantChangesConfig) } private def receive(): Unit = { -// Get total number of docs in database using _all_docs endpoint -val limit = new JsonStoreDataAccess(config) - .getTotalRows(config.getTotalUrl, queryUsed = false) - -// Get continuous _changes url +// Get normal _changes url val url = config.getChangesReceiverUrl.toString val selector: String = { "{\"selector\":" + config.getSelector + "}" } -var count = 0 +// var count = 0 --- End diff -- Remove in f758996. ---
[GitHub] bahir pull request #57: [BAHIR-128][WIP] fix failing sql-cloudant test
Github user emlaver commented on a diff in the pull request: https://github.com/apache/bahir/pull/57#discussion_r157533814 --- Diff: sql-cloudant/src/main/scala/org/apache/bahir/cloudant/internal/ChangesReceiver.scala --- @@ -39,56 +37,38 @@ class ChangesReceiver(config: CloudantChangesConfig) } private def receive(): Unit = { -// Get total number of docs in database using _all_docs endpoint -val limit = new JsonStoreDataAccess(config) - .getTotalRows(config.getTotalUrl, queryUsed = false) - -// Get continuous _changes url +// Get normal _changes url --- End diff -- For our internal implementation, we (myself and Mayya) wanted the user to have a snapshot of data to load into Spark. For that to be possible, we decided to use `continuous` style feed with a doc limit. With the new _changes implementation from Mike's project, the `normal` feed is stable and works as expected. I've also lowered the amount of requests/load time by removing the HTTP request for the doc limit since it's not needed with `normal` style _changes feed. To work with data in "real-time", you can use `CloudantReciever` which creates an eternal changes feed within the Spark Streaming context. ---
[GitHub] bahir pull request #57: [BAHIR-128][WIP] fix failing sql-cloudant test
Github user ricellis commented on a diff in the pull request: https://github.com/apache/bahir/pull/57#discussion_r157480164 --- Diff: sql-cloudant/src/main/scala/org/apache/bahir/cloudant/DefaultSource.scala --- @@ -125,7 +125,7 @@ class DefaultSource extends RelationProvider /* Create a streaming context to handle transforming docs in * larger databases into Spark datasets */ - val ssc = new StreamingContext(sqlContext.sparkContext, Seconds(10)) + val ssc = new StreamingContext(sqlContext.sparkContext, Seconds(8)) --- End diff -- I'm amazed this parameter was hard-coded as it seems to be a fairly critical tuning parameter for streaming. I guess this is one for another PR. ---
[GitHub] bahir pull request #57: [BAHIR-128][WIP] fix failing sql-cloudant test
Github user ricellis commented on a diff in the pull request: https://github.com/apache/bahir/pull/57#discussion_r157484722 --- Diff: sql-cloudant/src/main/scala/org/apache/bahir/cloudant/internal/ChangesReceiver.scala --- @@ -39,56 +37,38 @@ class ChangesReceiver(config: CloudantChangesConfig) } private def receive(): Unit = { -// Get total number of docs in database using _all_docs endpoint -val limit = new JsonStoreDataAccess(config) - .getTotalRows(config.getTotalUrl, queryUsed = false) - -// Get continuous _changes url +// Get normal _changes url --- End diff -- I'm a bit confused about this change. Since Spark Streaming is the basis for "real-time" or "continuous applications" doesn't this need to keep listening to the changes feed to wait for more changes? ---
[GitHub] bahir pull request #57: [BAHIR-128][WIP] fix failing sql-cloudant test
Github user ricellis commented on a diff in the pull request: https://github.com/apache/bahir/pull/57#discussion_r157478713 --- Diff: sql-cloudant/src/main/scala/org/apache/bahir/cloudant/CloudantChangesConfig.scala --- @@ -67,7 +67,8 @@ class CloudantChangesConfig(protocol: String, host: String, dbName: String, } def getChangesReceiverUrl: String = { -var url = dbUrl + "/" + defaultIndex + "?include_docs=true=continuous=" + timeout +var url = dbUrl + "/" + defaultIndex + "?include_docs=true=normal" + + "_interval=1=" + timeout --- End diff -- WDYT about making the `seq_interval` the `bulkSize` instead of hard-coding? ---
[GitHub] bahir pull request #57: [BAHIR-128][WIP] fix failing sql-cloudant test
Github user ricellis commented on a diff in the pull request: https://github.com/apache/bahir/pull/57#discussion_r157479639 --- Diff: sql-cloudant/src/main/scala/org/apache/bahir/cloudant/internal/ChangesReceiver.scala --- @@ -39,56 +37,38 @@ class ChangesReceiver(config: CloudantChangesConfig) } private def receive(): Unit = { -// Get total number of docs in database using _all_docs endpoint -val limit = new JsonStoreDataAccess(config) - .getTotalRows(config.getTotalUrl, queryUsed = false) - -// Get continuous _changes url +// Get normal _changes url val url = config.getChangesReceiverUrl.toString val selector: String = { "{\"selector\":" + config.getSelector + "}" } -var count = 0 +// var count = 0 --- End diff -- delete? ---
[GitHub] bahir pull request #57: [BAHIR-128][WIP] fix failing sql-cloudant test
Github user emlaver commented on a diff in the pull request: https://github.com/apache/bahir/pull/57#discussion_r156206675 --- Diff: sql-cloudant/src/main/scala/org/apache/bahir/cloudant/common/ChangesRow.java --- @@ -0,0 +1,74 @@ +/* + * Copyright (c) 2017 IBM Cloudant. All rights reserved. + * + * Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file + * except in compliance with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software distributed under the + * License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, + * either express or implied. See the License for the specific language governing permissions + * and limitations under the License. + */ +package org.apache.bahir.cloudant.common; + +import com.google.gson.JsonElement; +import com.google.gson.JsonObject; + +import java.util.List; + +/** + * Class representing a single row in a changes feed. Structure: + * + * { + * last_seq": 5 + * "results": [ + * ---*** This next items is the ChangesRow ***--- + * { + * "changes": [ {"rev": "2-eec205a9d413992850a6e32678485900"}, ... ], + * "deleted": true, + * "id": "deleted", + * "seq": 5, + * "doc": ... structure ... + * } + * ] + * } + */ +public class ChangesRow { --- End diff -- Fixed in 5da88e8. ---
[GitHub] bahir pull request #57: [BAHIR-128][WIP] fix failing sql-cloudant test
Github user emlaver commented on a diff in the pull request: https://github.com/apache/bahir/pull/57#discussion_r156123828 --- Diff: sql-cloudant/src/main/scala/org/apache/bahir/cloudant/common/ChangesRow.java --- @@ -0,0 +1,74 @@ +/* + * Copyright (c) 2017 IBM Cloudant. All rights reserved. --- End diff -- @ckadner It's valid and the author is OK with this existing in the open-source. From my understanding, we're (Cloudant) allowed to retain the copyright on code we open source, but the license effectively grants free use. ---
[GitHub] bahir pull request #57: [BAHIR-128][WIP] fix failing sql-cloudant test
Github user ckadner commented on a diff in the pull request: https://github.com/apache/bahir/pull/57#discussion_r155934996 --- Diff: sql-cloudant/src/main/scala/org/apache/bahir/cloudant/common/ChangesRowScanner.java --- @@ -0,0 +1,92 @@ +/* + * Copyright (c) 2017 IBM Cloudant. All rights reserved. + * --- End diff -- same [comment](https://github.com/apache/bahir/pull/57/commits/5e554103bee8162b85948e219dd4b7fdd7707a30#r155934979) as above ---
[GitHub] bahir pull request #57: [BAHIR-128][WIP] fix failing sql-cloudant test
Github user ckadner commented on a diff in the pull request: https://github.com/apache/bahir/pull/57#discussion_r155934979 --- Diff: sql-cloudant/src/main/scala/org/apache/bahir/cloudant/common/ChangesRow.java --- @@ -0,0 +1,74 @@ +/* + * Copyright (c) 2017 IBM Cloudant. All rights reserved. --- End diff -- @emlaver -- is this an outdated copyright statement? If it is still valid you may need to check with the author and/or IBM if you can contribute this code (or variations of it) to open-source. I am surprised the RAT check did not catch this. CC @lresende ---
[GitHub] bahir pull request #57: [BAHIR-128][WIP] fix failing sql-cloudant test
Github user emlaver commented on a diff in the pull request: https://github.com/apache/bahir/pull/57#discussion_r155699572 --- Diff: pom.xml --- @@ -458,7 +458,7 @@ .gitignore .repository/ - .idea/ + **/.idea/** --- End diff -- Yea, I think it's extra files in my IntelliJ setup. I'll rebased my changes. ---
[GitHub] bahir pull request #57: [BAHIR-128][WIP] fix failing sql-cloudant test
Github user ckadner commented on a diff in the pull request: https://github.com/apache/bahir/pull/57#discussion_r155697683 --- Diff: pom.xml --- @@ -458,7 +458,7 @@ .gitignore .repository/ - .idea/ + **/.idea/** --- End diff -- this change should not be necessary. the `.idea/` folder should only get created at the project root level. any nested files and folders are already covered. maybe a mishap when setting up IntelliJ? ---
[GitHub] bahir pull request #57: [BAHIR-128][WIP] fix failing sql-cloudant test
GitHub user emlaver opened a pull request: https://github.com/apache/bahir/pull/57 [BAHIR-128][WIP] fix failing sql-cloudant test _What_ Fix test that's failing sporadically in sql-cloudant's `CloudantChangesDFSuite`. _How_ - Call stop in receiver when _changes feed completes - Improve performance and decrease testing time by setting batch size to 8 seconds - Use getResource to load json files path See [BAHIR-128](https://issues.apache.org/jira/browse/BAHIR-128) You can merge this pull request into a Git repository by running: $ git pull https://github.com/emlaver/bahir 128-fix-failing-sql-cloudant-test Alternatively you can review and apply these changes as the patch at: https://github.com/apache/bahir/pull/57.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #57 commit cab516bbf42a5a40d7fa4f7d1813772ecd39105c Author: Esteban LaverDate: 2017-09-08T14:33:26Z - Call stop in receiver when _changes feed completes - Improved performance and decrease testing time by setting batch size to 8 seconds - Use getResource to load json files path ---