[GitHub] bahir pull request #57: [BAHIR-128][WIP] fix failing sql-cloudant test

2017-12-18 Thread emlaver
Github user emlaver commented on a diff in the pull request:

https://github.com/apache/bahir/pull/57#discussion_r157553098
  
--- Diff: 
sql-cloudant/src/main/scala/org/apache/bahir/cloudant/CloudantChangesConfig.scala
 ---
@@ -67,7 +67,8 @@ class CloudantChangesConfig(protocol: String, host: 
String, dbName: String,
   }
 
   def getChangesReceiverUrl: String = {
-var url = dbUrl + "/" + defaultIndex + 
"?include_docs=true=continuous=" + timeout
+var url = dbUrl + "/" + defaultIndex + 
"?include_docs=true=normal" +
+  "_interval=1=" + timeout
--- End diff --

Yes, I think that's a good idea.  Fixed in f758996.


---


[GitHub] bahir pull request #57: [BAHIR-128][WIP] fix failing sql-cloudant test

2017-12-18 Thread emlaver
Github user emlaver commented on a diff in the pull request:

https://github.com/apache/bahir/pull/57#discussion_r157552982
  
--- Diff: 
sql-cloudant/src/main/scala/org/apache/bahir/cloudant/internal/ChangesReceiver.scala
 ---
@@ -39,56 +37,38 @@ class ChangesReceiver(config: CloudantChangesConfig)
   }
 
   private def receive(): Unit = {
-// Get total number of docs in database using _all_docs endpoint
-val limit = new JsonStoreDataAccess(config)
-  .getTotalRows(config.getTotalUrl, queryUsed = false)
-
-// Get continuous _changes url
+// Get normal _changes url
 val url = config.getChangesReceiverUrl.toString
 val selector: String = {
   "{\"selector\":" + config.getSelector + "}"
 }
 
-var count = 0
+// var count = 0
--- End diff --

Remove in f758996.


---


[GitHub] bahir pull request #57: [BAHIR-128][WIP] fix failing sql-cloudant test

2017-12-18 Thread emlaver
Github user emlaver commented on a diff in the pull request:

https://github.com/apache/bahir/pull/57#discussion_r157533814
  
--- Diff: 
sql-cloudant/src/main/scala/org/apache/bahir/cloudant/internal/ChangesReceiver.scala
 ---
@@ -39,56 +37,38 @@ class ChangesReceiver(config: CloudantChangesConfig)
   }
 
   private def receive(): Unit = {
-// Get total number of docs in database using _all_docs endpoint
-val limit = new JsonStoreDataAccess(config)
-  .getTotalRows(config.getTotalUrl, queryUsed = false)
-
-// Get continuous _changes url
+// Get normal _changes url
--- End diff --

For our internal implementation, we (myself and Mayya) wanted the user to 
have a snapshot of data to load into Spark.  For that to be possible, we 
decided to use `continuous` style feed with a doc limit.  With the new _changes 
implementation from Mike's project, the `normal` feed is stable and works as 
expected.  I've also lowered the amount of requests/load time by removing the 
HTTP request for the doc limit since it's not needed with `normal` style 
_changes feed.
To work with data in "real-time", you can use `CloudantReciever` which 
creates an eternal changes feed within the Spark Streaming context.


---


[GitHub] bahir pull request #57: [BAHIR-128][WIP] fix failing sql-cloudant test

2017-12-18 Thread ricellis
Github user ricellis commented on a diff in the pull request:

https://github.com/apache/bahir/pull/57#discussion_r157480164
  
--- Diff: 
sql-cloudant/src/main/scala/org/apache/bahir/cloudant/DefaultSource.scala ---
@@ -125,7 +125,7 @@ class DefaultSource extends RelationProvider
   /* Create a streaming context to handle transforming docs in
   * larger databases into Spark datasets
   */
-  val ssc = new StreamingContext(sqlContext.sparkContext, 
Seconds(10))
+  val ssc = new StreamingContext(sqlContext.sparkContext, 
Seconds(8))
--- End diff --

I'm amazed this parameter was hard-coded as it seems to be a fairly 
critical tuning parameter for streaming. I guess this is one for another PR.


---


[GitHub] bahir pull request #57: [BAHIR-128][WIP] fix failing sql-cloudant test

2017-12-18 Thread ricellis
Github user ricellis commented on a diff in the pull request:

https://github.com/apache/bahir/pull/57#discussion_r157484722
  
--- Diff: 
sql-cloudant/src/main/scala/org/apache/bahir/cloudant/internal/ChangesReceiver.scala
 ---
@@ -39,56 +37,38 @@ class ChangesReceiver(config: CloudantChangesConfig)
   }
 
   private def receive(): Unit = {
-// Get total number of docs in database using _all_docs endpoint
-val limit = new JsonStoreDataAccess(config)
-  .getTotalRows(config.getTotalUrl, queryUsed = false)
-
-// Get continuous _changes url
+// Get normal _changes url
--- End diff --

I'm a bit confused about this change. Since Spark Streaming is the basis 
for "real-time" or "continuous applications" doesn't this need to keep 
listening to the changes feed to wait for more changes?


---


[GitHub] bahir pull request #57: [BAHIR-128][WIP] fix failing sql-cloudant test

2017-12-18 Thread ricellis
Github user ricellis commented on a diff in the pull request:

https://github.com/apache/bahir/pull/57#discussion_r157478713
  
--- Diff: 
sql-cloudant/src/main/scala/org/apache/bahir/cloudant/CloudantChangesConfig.scala
 ---
@@ -67,7 +67,8 @@ class CloudantChangesConfig(protocol: String, host: 
String, dbName: String,
   }
 
   def getChangesReceiverUrl: String = {
-var url = dbUrl + "/" + defaultIndex + 
"?include_docs=true=continuous=" + timeout
+var url = dbUrl + "/" + defaultIndex + 
"?include_docs=true=normal" +
+  "_interval=1=" + timeout
--- End diff --

WDYT about making the `seq_interval` the `bulkSize` instead of hard-coding?


---


[GitHub] bahir pull request #57: [BAHIR-128][WIP] fix failing sql-cloudant test

2017-12-18 Thread ricellis
Github user ricellis commented on a diff in the pull request:

https://github.com/apache/bahir/pull/57#discussion_r157479639
  
--- Diff: 
sql-cloudant/src/main/scala/org/apache/bahir/cloudant/internal/ChangesReceiver.scala
 ---
@@ -39,56 +37,38 @@ class ChangesReceiver(config: CloudantChangesConfig)
   }
 
   private def receive(): Unit = {
-// Get total number of docs in database using _all_docs endpoint
-val limit = new JsonStoreDataAccess(config)
-  .getTotalRows(config.getTotalUrl, queryUsed = false)
-
-// Get continuous _changes url
+// Get normal _changes url
 val url = config.getChangesReceiverUrl.toString
 val selector: String = {
   "{\"selector\":" + config.getSelector + "}"
 }
 
-var count = 0
+// var count = 0
--- End diff --

delete?


---


[GitHub] bahir pull request #57: [BAHIR-128][WIP] fix failing sql-cloudant test

2017-12-11 Thread emlaver
Github user emlaver commented on a diff in the pull request:

https://github.com/apache/bahir/pull/57#discussion_r156206675
  
--- Diff: 
sql-cloudant/src/main/scala/org/apache/bahir/cloudant/common/ChangesRow.java ---
@@ -0,0 +1,74 @@
+/*
+ * Copyright (c) 2017 IBM Cloudant. All rights reserved.
+ * 
+ * Licensed under the Apache License, Version 2.0 (the "License"); you may 
not use this file
+ * except in compliance with the License. You may obtain a copy of the 
License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software 
distributed under the
+ * License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR 
CONDITIONS OF ANY KIND,
+ * either express or implied. See the License for the specific language 
governing permissions
+ * and limitations under the License.
+ */
+package org.apache.bahir.cloudant.common;
+
+import com.google.gson.JsonElement;
+import com.google.gson.JsonObject;
+
+import java.util.List;
+
+/**
+ * Class representing a single row in a changes feed. Structure:
+ *
+ * {
+ *   last_seq": 5
+ *   "results": [
+ * ---*** This next items is the ChangesRow ***---
+ * {
+ *   "changes": [ {"rev": "2-eec205a9d413992850a6e32678485900"}, ... ],
+ *   "deleted": true,
+ *   "id": "deleted",
+ *   "seq": 5,
+ *   "doc": ... structure ...
+ * }
+ *   ]
+ * }
+ */
+public class ChangesRow {
--- End diff --

Fixed in 5da88e8.


---


[GitHub] bahir pull request #57: [BAHIR-128][WIP] fix failing sql-cloudant test

2017-12-11 Thread emlaver
Github user emlaver commented on a diff in the pull request:

https://github.com/apache/bahir/pull/57#discussion_r156123828
  
--- Diff: 
sql-cloudant/src/main/scala/org/apache/bahir/cloudant/common/ChangesRow.java ---
@@ -0,0 +1,74 @@
+/*
+ * Copyright (c) 2017 IBM Cloudant. All rights reserved.
--- End diff --

@ckadner It's valid and the author is OK with this existing in the 
open-source.  From my understanding, we're (Cloudant) allowed to retain the 
copyright on code we open source, but the license effectively grants free use.


---


[GitHub] bahir pull request #57: [BAHIR-128][WIP] fix failing sql-cloudant test

2017-12-09 Thread ckadner
Github user ckadner commented on a diff in the pull request:

https://github.com/apache/bahir/pull/57#discussion_r155934996
  
--- Diff: 
sql-cloudant/src/main/scala/org/apache/bahir/cloudant/common/ChangesRowScanner.java
 ---
@@ -0,0 +1,92 @@
+/*
+ * Copyright (c) 2017 IBM Cloudant. All rights reserved.
+ * 
--- End diff --

same 
[comment](https://github.com/apache/bahir/pull/57/commits/5e554103bee8162b85948e219dd4b7fdd7707a30#r155934979)
 as above


---


[GitHub] bahir pull request #57: [BAHIR-128][WIP] fix failing sql-cloudant test

2017-12-09 Thread ckadner
Github user ckadner commented on a diff in the pull request:

https://github.com/apache/bahir/pull/57#discussion_r155934979
  
--- Diff: 
sql-cloudant/src/main/scala/org/apache/bahir/cloudant/common/ChangesRow.java ---
@@ -0,0 +1,74 @@
+/*
+ * Copyright (c) 2017 IBM Cloudant. All rights reserved.
--- End diff --

@emlaver -- is this an outdated copyright statement? If it is still valid 
you may need to check with the author and/or IBM if you can contribute this 
code (or variations of it) to open-source.

I am surprised the RAT check did not catch this.

CC @lresende


---


[GitHub] bahir pull request #57: [BAHIR-128][WIP] fix failing sql-cloudant test

2017-12-07 Thread emlaver
Github user emlaver commented on a diff in the pull request:

https://github.com/apache/bahir/pull/57#discussion_r155699572
  
--- Diff: pom.xml ---
@@ -458,7 +458,7 @@
 
   .gitignore
   .repository/
-  .idea/
+  **/.idea/**
--- End diff --

Yea, I think it's extra files in my IntelliJ setup.  I'll rebased my 
changes.


---


[GitHub] bahir pull request #57: [BAHIR-128][WIP] fix failing sql-cloudant test

2017-12-07 Thread ckadner
Github user ckadner commented on a diff in the pull request:

https://github.com/apache/bahir/pull/57#discussion_r155697683
  
--- Diff: pom.xml ---
@@ -458,7 +458,7 @@
 
   .gitignore
   .repository/
-  .idea/
+  **/.idea/**
--- End diff --

this change should not be necessary. the `.idea/` folder should only get 
created at the project root level. any nested files and folders are already 
covered. maybe a mishap when setting up IntelliJ?


---


[GitHub] bahir pull request #57: [BAHIR-128][WIP] fix failing sql-cloudant test

2017-12-07 Thread emlaver
GitHub user emlaver opened a pull request:

https://github.com/apache/bahir/pull/57

[BAHIR-128][WIP] fix failing sql-cloudant test

_What_
Fix test that's failing sporadically in sql-cloudant's 
`CloudantChangesDFSuite`.

_How_
- Call stop in receiver when _changes feed completes
- Improve performance and decrease testing time by setting batch size to 8 
seconds
- Use getResource to load json files path

See [BAHIR-128](https://issues.apache.org/jira/browse/BAHIR-128)

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/emlaver/bahir 
128-fix-failing-sql-cloudant-test

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/bahir/pull/57.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #57


commit cab516bbf42a5a40d7fa4f7d1813772ecd39105c
Author: Esteban Laver 
Date:   2017-09-08T14:33:26Z

- Call stop in receiver when _changes feed completes
- Improved performance and decrease testing time by setting batch size to 8 
seconds
- Use getResource to load json files path




---