[jira] [Commented] (BAHIR-138) Fix sql-cloudant deprecation messages

2017-12-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BAHIR-138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16295606#comment-16295606
 ] 

ASF GitHub Bot commented on BAHIR-138:
--

Github user ApacheBahir commented on the issue:

https://github.com/apache/bahir/pull/59
  

Refer to this link for build results (access rights to CI server needed): 
http://169.45.79.58:8080/job/bahir_spark_pr_builder/143/



> Fix sql-cloudant deprecation messages
> -
>
> Key: BAHIR-138
> URL: https://issues.apache.org/jira/browse/BAHIR-138
> Project: Bahir
>  Issue Type: Task
>  Components: Spark Structured Streaming Connectors
>Affects Versions: Spark-2.2.0
>Reporter: Esteban Laver
>Assignee: Esteban Laver
>Priority: Minor
>  Labels: warnings
>
> Deprecation warnings in {{DefaultSource}}:
> {code}
> [INFO] --- scala-maven-plugin:3.2.2:compile (scala-compile-first) @ spark-sql-cloudant_2.11 ---
> [INFO] Compiling 11 Scala sources to sql-cloudant/target/scala-2.11/classes...
> [WARNING] sql-cloudant/src/main/scala/org/apache/bahir/cloudant/DefaultSource.scala:59: method json in class DataFrameReader is deprecated: Use json(Dataset[String]) instead.
> [WARNING] val df = sqlContext.read.json(cloudantRDD)
> [WARNING]  ^
> [WARNING] sql-cloudant/src/main/scala/org/apache/bahir/cloudant/DefaultSource.scala:115: method json in class DataFrameReader is deprecated: Use json(Dataset[String]) instead.
> [WARNING] dataFrame = sqlContext.read.json(cloudantRDD)
> [WARNING] ^
> [WARNING] sql-cloudant/src/main/scala/org/apache/bahir/cloudant/DefaultSource.scala:121: method json in class DataFrameReader is deprecated: Use json(Dataset[String]) instead.
> [WARNING] sqlContext.read.json(aRDD)
> [WARNING] ^
> [WARNING] sql-cloudant/src/main/scala/org/apache/bahir/cloudant/DefaultSource.scala:152: method json in class DataFrameReader is deprecated: Use json(Dataset[String]) instead.
> [WARNING]   dataFrame = sqlContext.sparkSession.read.json(globalRDD)
> [WARNING]^
> [WARNING] four warnings found
> {code}
> Deprecation warnings in {{CloudantStreaming}} and 
> {{CloudantStreamingSelector}} examples:
> {code}
> [INFO] --- scala-maven-plugin:3.2.2:testCompile (scala-test-compile-first) @ spark-sql-cloudant_2.11 ---
> [INFO] Compiling 11 Scala sources to sql-cloudant/target/scala-2.11/test-classes...
> [WARNING] sql-cloudant/examples/src/main/scala/org/apache/spark/examples/sql/cloudant/CloudantStreaming.scala:46: method json in class DataFrameReader is deprecated: Use json(Dataset[String]) instead.
> [WARNING]   val changesDataFrame = spark.read.json(rdd)
> [WARNING] ^
> [WARNING] sql-cloudant/examples/src/main/scala/org/apache/spark/examples/sql/cloudant/CloudantStreaming.scala:67: method registerTempTable in class Dataset is deprecated: Use createOrReplaceTempView(viewName) instead.
> [WARNING]   changesDataFrame.registerTempTable("airportcodemapping")
> [WARNING]^
> [WARNING] sql-cloudant/examples/src/main/scala/org/apache/spark/examples/sql/cloudant/CloudantStreamingSelector.scala:50: method json in class DataFrameReader is deprecated: Use json(Dataset[String]) instead.
> [WARNING]   val changesDataFrame = spark.read.json(rdd)
> [WARNING] ^
> [WARNING] three warnings found
> {code}
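For context, the fix the warnings point at is mechanical: wrap the `RDD[String]` of JSON documents in a `Dataset[String]` before calling `json()`, and swap `registerTempTable` for `createOrReplaceTempView`. A minimal sketch (class and variable names here are illustrative, not the actual Bahir code):

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

object JsonDeprecationFixSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("json-dataset-migration")
      .getOrCreate()
    import spark.implicits._ // provides the Encoder[String] for createDataset

    // Stand-in for the RDD[String] of JSON docs produced by the connector.
    val cloudantRDD = spark.sparkContext.parallelize(Seq(
      """{"_id": "1", "airportName": "SFO"}""",
      """{"_id": "2", "airportName": "JFK"}"""))

    // Deprecated: spark.read.json(cloudantRDD)
    // Replacement: convert the RDD to a Dataset[String] first.
    val ds: Dataset[String] = spark.createDataset(cloudantRDD)
    val df = spark.read.json(ds)

    // Likewise, registerTempTable is replaced by createOrReplaceTempView.
    df.createOrReplaceTempView("airportcodemapping")
    spark.sql("SELECT airportName FROM airportcodemapping").show()
  }
}
```

The same pattern covers all seven warnings, since each call site only differs in which RDD is being read.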



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (BAHIR-138) Fix sql-cloudant deprecation messages

2017-12-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BAHIR-138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16295605#comment-16295605
 ] 

ASF GitHub Bot commented on BAHIR-138:
--

Github user ApacheBahir commented on the issue:

https://github.com/apache/bahir/pull/59
  
:white_check_mark: Build successful
 





[jira] [Commented] (BAHIR-138) Fix sql-cloudant deprecation messages

2017-12-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BAHIR-138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16295596#comment-16295596
 ] 

ASF GitHub Bot commented on BAHIR-138:
--

Github user emlaver commented on a diff in the pull request:

https://github.com/apache/bahir/pull/59#discussion_r157584262
  
--- Diff: sql-cloudant/examples/src/main/scala/org/apache/spark/examples/sql/cloudant/CloudantStreaming.scala ---
@@ -27,59 +27,57 @@ import org.apache.bahir.cloudant.CloudantReceiver
 
 object CloudantStreaming {
   def main(args: Array[String]) {
-val sparkConf = new SparkConf().setAppName("Cloudant Spark SQL External Datasource in Scala")
+val sparkConf = new SparkConf().setMaster("local[*]")
+  .setAppName("Cloudant Spark SQL External Datasource in Scala")
 // Create the context with a 10 seconds batch size
 val ssc = new StreamingContext(sparkConf, Seconds(10))
 
 val changes = ssc.receiverStream(new CloudantReceiver(sparkConf, Map(
-  "cloudant.host" -> "ACCOUNT.cloudant.com",
-  "cloudant.username" -> "USERNAME",
-  "cloudant.password" -> "PASSWORD",
-  "database" -> "n_airportcodemapping")))
-
+  "cloudant.host" -> "examples.cloudant.com",
+  "database" -> "sales")))
 changes.foreachRDD((rdd: RDD[String], time: Time) => {
   // Get the singleton instance of SparkSession
   val spark = SparkSessionSingleton.getInstance(rdd.sparkContext.getConf)
--- End diff --

Added your changes in ff02171.



[jira] [Updated] (BAHIR-137) Load performance improvements for _changes API in sql-cloudant

2017-12-18 Thread Esteban Laver (JIRA)

 [ 
https://issues.apache.org/jira/browse/BAHIR-137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Esteban Laver updated BAHIR-137:

Description: 
Items for improving _changes feed load:
- Make Spark streaming batch interval visible to the user for tuning based on 
type/size of document and number of docs in database
- Merge BAHIR-128: Improve stability of _changes receiver
- Merge BAHIR-154: refactor sql-cloudant to use java-cloudant library
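The first item, making the streaming batch interval visible to the user, could be sketched as a config-driven value rather than a hard-coded `Seconds(10)`. The option name `spark.cloudant.batchInterval` below is hypothetical, chosen only for illustration:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object ConfigurableBatchIntervalSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[*]").setAppName("batch-interval-demo")
    // Hypothetical option name: read the interval from the Spark config
    // instead of hard-coding Seconds(10), defaulting to 10 seconds.
    val intervalSeconds = conf.getInt("spark.cloudant.batchInterval", 10)
    val ssc = new StreamingContext(conf, Seconds(intervalSeconds))
    println(s"Batch interval: $intervalSeconds s")
  }
}
```

Users with many small documents could then shorten the interval, and users with large documents could lengthen it, without code changes.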

> Load performance improvements for _changes API in sql-cloudant
> --
>
> Key: BAHIR-137
> URL: https://issues.apache.org/jira/browse/BAHIR-137
> Project: Bahir
>  Issue Type: Improvement
>Affects Versions: Spark-2.2.0
>Reporter: Esteban Laver
>Assignee: Esteban Laver
>
> Items for improving _changes feed load:
> - Make Spark streaming batch interval visible to the user for tuning based on 
> type/size of document and number of docs in database
> - Merge BAHIR-128: Improve stability of _changes receiver
> - Merge BAHIR-154: refactor sql-cloudant to use java-cloudant library





[jira] [Commented] (BAHIR-128) Test failing sporadically in sql-cloudant's CloudantChangesDFSuite

2017-12-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BAHIR-128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16295347#comment-16295347
 ] 

ASF GitHub Bot commented on BAHIR-128:
--

Github user ApacheBahir commented on the issue:

https://github.com/apache/bahir/pull/57
  

Refer to this link for build results (access rights to CI server needed): 
http://169.45.79.58:8080/job/bahir_spark_pr_builder/142/



> Test failing sporadically in sql-cloudant's CloudantChangesDFSuite
> --
>
> Key: BAHIR-128
> URL: https://issues.apache.org/jira/browse/BAHIR-128
> Project: Bahir
>  Issue Type: Bug
>Reporter: Esteban Laver
>Assignee: Esteban Laver
>Priority: Minor
>
> This failure happened during pre-release testing for Bahir RC 2.2.0:
> CloudantChangesDFSuite:
> - load and save data from Cloudant database *** FAILED ***
>   0 did not equal 1967 (CloudantChangesDFSuite.scala:49)
> Partial stack trace:
> {code:java}
> Exception in thread "Cloudant Receiver" org.apache.spark.SparkException: 
> Cannot add data as BlockGenerator has not been started or has been stopped
> at 
> org.apache.spark.streaming.receiver.BlockGenerator.addData(BlockGenerator.scala:173)
> at 
> org.apache.spark.streaming.receiver.ReceiverSupervisorImpl.pushSingle(ReceiverSupervisorImpl.scala:120)
> at org.apache.spark.streaming.receiver.Receiver.store(Receiver.scala:119)
> at 
> org.apache.bahir.cloudant.internal.ChangesReceiver$$anonfun$org$apache$bahir$cloudant$internal$ChangesReceiver$$receive$1$$anonfun$apply$1.apply(ChangesReceiver.scala:82)
> {code}





[jira] [Commented] (BAHIR-128) Test failing sporadically in sql-cloudant's CloudantChangesDFSuite

2017-12-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BAHIR-128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16295346#comment-16295346
 ] 

ASF GitHub Bot commented on BAHIR-128:
--

Github user ApacheBahir commented on the issue:

https://github.com/apache/bahir/pull/57
  
:white_check_mark: Build successful
 





[jira] [Commented] (BAHIR-128) Test failing sporadically in sql-cloudant's CloudantChangesDFSuite

2017-12-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BAHIR-128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16295322#comment-16295322
 ] 

ASF GitHub Bot commented on BAHIR-128:
--

Github user emlaver commented on a diff in the pull request:

https://github.com/apache/bahir/pull/57#discussion_r157553098
  
--- Diff: sql-cloudant/src/main/scala/org/apache/bahir/cloudant/CloudantChangesConfig.scala ---
@@ -67,7 +67,8 @@ class CloudantChangesConfig(protocol: String, host: String, dbName: String,
   }
 
   def getChangesReceiverUrl: String = {
-var url = dbUrl + "/" + defaultIndex + "?include_docs=true&feed=continuous&timeout=" + timeout
+var url = dbUrl + "/" + defaultIndex + "?include_docs=true&feed=normal" +
+  "&seq_interval=1&timeout=" + timeout
--- End diff --

Yes, I think that's a good idea.  Fixed in f758996.




[jira] [Commented] (BAHIR-128) Test failing sporadically in sql-cloudant's CloudantChangesDFSuite

2017-12-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BAHIR-128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16295320#comment-16295320
 ] 

ASF GitHub Bot commented on BAHIR-128:
--

Github user emlaver commented on a diff in the pull request:

https://github.com/apache/bahir/pull/57#discussion_r157552982
  
--- Diff: sql-cloudant/src/main/scala/org/apache/bahir/cloudant/internal/ChangesReceiver.scala ---
@@ -39,56 +37,38 @@ class ChangesReceiver(config: CloudantChangesConfig)
   }
 
   private def receive(): Unit = {
-// Get total number of docs in database using _all_docs endpoint
-val limit = new JsonStoreDataAccess(config)
-  .getTotalRows(config.getTotalUrl, queryUsed = false)
-
-// Get continuous _changes url
+// Get normal _changes url
 val url = config.getChangesReceiverUrl.toString
 val selector: String = {
   "{\"selector\":" + config.getSelector + "}"
 }
 
-var count = 0
+// var count = 0
--- End diff --

Remove in f758996.




[jira] [Commented] (BAHIR-138) Fix sql-cloudant deprecation messages

2017-12-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BAHIR-138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16295295#comment-16295295
 ] 

ASF GitHub Bot commented on BAHIR-138:
--

Github user lresende commented on a diff in the pull request:

https://github.com/apache/bahir/pull/59#discussion_r157549138
  
--- Diff: sql-cloudant/examples/src/main/scala/org/apache/spark/examples/sql/cloudant/CloudantStreaming.scala ---
@@ -27,59 +27,57 @@ import org.apache.bahir.cloudant.CloudantReceiver
 
 object CloudantStreaming {
   def main(args: Array[String]) {
-val sparkConf = new SparkConf().setAppName("Cloudant Spark SQL External Datasource in Scala")
+val sparkConf = new SparkConf().setMaster("local[*]")
+  .setAppName("Cloudant Spark SQL External Datasource in Scala")
 // Create the context with a 10 seconds batch size
 val ssc = new StreamingContext(sparkConf, Seconds(10))
 
 val changes = ssc.receiverStream(new CloudantReceiver(sparkConf, Map(
-  "cloudant.host" -> "ACCOUNT.cloudant.com",
-  "cloudant.username" -> "USERNAME",
-  "cloudant.password" -> "PASSWORD",
-  "database" -> "n_airportcodemapping")))
-
+  "cloudant.host" -> "examples.cloudant.com",
+  "database" -> "sales")))
 changes.foreachRDD((rdd: RDD[String], time: Time) => {
   // Get the singleton instance of SparkSession
   val spark = SparkSessionSingleton.getInstance(rdd.sparkContext.getConf)
--- End diff --

I would just create the "spark" instance of SparkSession at the top of the 
method (where we create SparkConf) and import the implicits there instead 
of inside the forEach. This would also remove the need for the singleton 
object at the bottom.
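A sketch of the restructuring described here, under the assumption that the example keeps its current shape (the queue-backed stream below stands in for the real `CloudantReceiver` stream, and is not the committed code):

```scala
import scala.collection.mutable.Queue

import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession
import org.apache.spark.streaming.{Seconds, StreamingContext, Time}

object CloudantStreamingSketch {
  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf().setMaster("local[*]")
      .setAppName("Cloudant Spark SQL External Datasource in Scala")

    // Build the SparkSession once, up front, instead of resolving a
    // singleton inside foreachRDD.
    val spark = SparkSession.builder().config(sparkConf).getOrCreate()
    import spark.implicits._

    val ssc = new StreamingContext(spark.sparkContext, Seconds(10))

    // Stand-in for ssc.receiverStream(new CloudantReceiver(...)).
    val queue = Queue(spark.sparkContext.parallelize(Seq("""{"amount": 10}""")))
    val changes = ssc.queueStream(queue)

    changes.foreachRDD((rdd: RDD[String], time: Time) => {
      // The closure simply captures the already-created session.
      val changesDataFrame = spark.read.json(spark.createDataset(rdd))
      changesDataFrame.createOrReplaceTempView("sales")
    })
  }
}
```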



[jira] [Commented] (BAHIR-128) Test failing sporadically in sql-cloudant's CloudantChangesDFSuite

2017-12-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BAHIR-128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16295212#comment-16295212
 ] 

ASF GitHub Bot commented on BAHIR-128:
--

Github user emlaver commented on a diff in the pull request:

https://github.com/apache/bahir/pull/57#discussion_r157533814
  
--- Diff: sql-cloudant/src/main/scala/org/apache/bahir/cloudant/internal/ChangesReceiver.scala ---
@@ -39,56 +37,38 @@ class ChangesReceiver(config: CloudantChangesConfig)
   }
 
   private def receive(): Unit = {
-// Get total number of docs in database using _all_docs endpoint
-val limit = new JsonStoreDataAccess(config)
-  .getTotalRows(config.getTotalUrl, queryUsed = false)
-
-// Get continuous _changes url
+// Get normal _changes url
--- End diff --

For our internal implementation, we (myself and Mayya) wanted the user to 
have a snapshot of data to load into Spark. For that to be possible, we 
decided to use the `continuous` style feed with a doc limit. With the new _changes 
implementation from Mike's project, the `normal` feed is stable and works as 
expected. I've also reduced the number of requests and the load time by removing the 
HTTP request for the doc limit, since it's not needed with the `normal` style 
_changes feed.
To work with data in "real time", you can use `CloudantReceiver`, which 
creates an eternal changes feed within the Spark Streaming context.
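Roughly, the difference between the two feed styles comes down to the query parameters on the _changes request. The helper below is illustrative only, loosely modeled on the URL-building code in the diff above, and not the actual connector API:

```scala
// Illustrative sketch: a normal feed returns one finite response (a
// snapshot), while a continuous feed keeps the connection open and streams
// changes as they happen. Names loosely mirror CloudantChangesConfig.
object ChangesFeedUrls {
  def continuousUrl(dbUrl: String, index: String, timeout: Int): String =
    dbUrl + "/" + index + "?include_docs=true&feed=continuous&timeout=" + timeout

  def normalUrl(dbUrl: String, index: String, timeout: Int): String =
    dbUrl + "/" + index + "?include_docs=true&feed=normal" +
      "&seq_interval=1&timeout=" + timeout

  def main(args: Array[String]): Unit = {
    println(normalUrl("https://examples.cloudant.com/sales", "_changes", 60000))
  }
}
```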




[jira] [Commented] (BAHIR-128) Test failing sporadically in sql-cloudant's CloudantChangesDFSuite

2017-12-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BAHIR-128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16295203#comment-16295203
 ] 

ASF GitHub Bot commented on BAHIR-128:
--

Github user emlaver commented on a diff in the pull request:

https://github.com/apache/bahir/pull/57#discussion_r157532385
  
--- Diff: sql-cloudant/src/main/scala/org/apache/bahir/cloudant/DefaultSource.scala ---
@@ -125,7 +125,7 @@ class DefaultSource extends RelationProvider
   /* Create a streaming context to handle transforming docs in
   * larger databases into Spark datasets
   */
-  val ssc = new StreamingContext(sqlContext.sparkContext, Seconds(10))
+  val ssc = new StreamingContext(sqlContext.sparkContext, Seconds(8))
--- End diff --

Yes, this was not visible as part of the initial internal `_changes` work.  
I will create a new JIRA item and PR for this.




[jira] [Commented] (BAHIR-128) Test failing sporadically in sql-cloudant's CloudantChangesDFSuite

2017-12-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BAHIR-128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16294965#comment-16294965
 ] 

ASF GitHub Bot commented on BAHIR-128:
--

Github user ricellis commented on a diff in the pull request:

https://github.com/apache/bahir/pull/57#discussion_r157480164
  
--- Diff: sql-cloudant/src/main/scala/org/apache/bahir/cloudant/DefaultSource.scala ---
@@ -125,7 +125,7 @@ class DefaultSource extends RelationProvider
   /* Create a streaming context to handle transforming docs in
   * larger databases into Spark datasets
   */
-  val ssc = new StreamingContext(sqlContext.sparkContext, Seconds(10))
+  val ssc = new StreamingContext(sqlContext.sparkContext, Seconds(8))
--- End diff --

I'm amazed this parameter was hard-coded as it seems to be a fairly 
critical tuning parameter for streaming. I guess this is one for another PR.




[jira] [Commented] (BAHIR-128) Test failing sporadically in sql-cloudant's CloudantChangesDFSuite

2017-12-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BAHIR-128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16294963#comment-16294963
 ] 

ASF GitHub Bot commented on BAHIR-128:
--

Github user ricellis commented on a diff in the pull request:

https://github.com/apache/bahir/pull/57#discussion_r157478713
  
--- Diff: sql-cloudant/src/main/scala/org/apache/bahir/cloudant/CloudantChangesConfig.scala ---
@@ -67,7 +67,8 @@ class CloudantChangesConfig(protocol: String, host: 
String, dbName: String,
   }
 
   def getChangesReceiverUrl: String = {
-var url = dbUrl + "/" + defaultIndex + "?include_docs=true&feed=continuous&timeout=" + timeout
+var url = dbUrl + "/" + defaultIndex + "?include_docs=true&feed=normal" +
+  "&seq_interval=1&timeout=" + timeout
--- End diff --

WDYT about making the `seq_interval` the `bulkSize` instead of hard-coding?
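A hedged sketch of that suggestion: derive `seq_interval` from the configured `bulkSize` rather than hard-coding `1`. The class below is a simplified stand-in for `CloudantChangesConfig`, not the actual implementation:

```scala
// Simplified stand-in for CloudantChangesConfig, sketching a seq_interval
// driven by bulkSize instead of a hard-coded 1 (an assumption about the
// proposed change, not the merged behaviour).
class ChangesUrlBuilder(dbUrl: String, defaultIndex: String,
                        bulkSize: Int, timeout: Long) {
  def changesReceiverUrl: String =
    dbUrl + "/" + defaultIndex +
      "?include_docs=true&feed=normal" +
      "&seq_interval=" + bulkSize + "&timeout=" + timeout
}
```

A larger `seq_interval` reduces the work CouchDB/Cloudant spends computing update sequences, so tying it to the receiver's batch size is a plausible tuning knob.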




[jira] [Commented] (BAHIR-128) Test failing sporadically in sql-cloudant's CloudantChangesDFSuite

2017-12-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BAHIR-128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16294966#comment-16294966
 ] 

ASF GitHub Bot commented on BAHIR-128:
--

Github user ricellis commented on a diff in the pull request:

https://github.com/apache/bahir/pull/57#discussion_r157484722
  
--- Diff: sql-cloudant/src/main/scala/org/apache/bahir/cloudant/internal/ChangesReceiver.scala ---
@@ -39,56 +37,38 @@ class ChangesReceiver(config: CloudantChangesConfig)
   }
 
   private def receive(): Unit = {
-// Get total number of docs in database using _all_docs endpoint
-val limit = new JsonStoreDataAccess(config)
-  .getTotalRows(config.getTotalUrl, queryUsed = false)
-
-// Get continuous _changes url
+// Get normal _changes url
--- End diff --

I'm a bit confused about this change. Since Spark Streaming is the basis for "real-time" or "continuous" applications, doesn't this need to keep listening to the changes feed to wait for more changes?
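For readers unfamiliar with the CouchDB `_changes` API this question hinges on: `feed=normal` returns the changes available at request time and then closes the response, while `feed=continuous` holds the connection open and streams new changes as they arrive. A tiny illustrative helper (hypothetical, not connector code):

```scala
// Hypothetical helper contrasting the two feed modes discussed above:
// "continuous" keeps the HTTP connection open and listens for new changes,
// "normal" returns the current batch of changes once and closes.
def changesFeedParam(keepListening: Boolean): String =
  if (keepListening) "feed=continuous" else "feed=normal"
```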




[jira] [Commented] (BAHIR-128) Test failing sporadically in sql-cloudant's CloudantChangesDFSuite

2017-12-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BAHIR-128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16294964#comment-16294964
 ] 

ASF GitHub Bot commented on BAHIR-128:
--

Github user ricellis commented on a diff in the pull request:

https://github.com/apache/bahir/pull/57#discussion_r157479639
  
--- Diff: sql-cloudant/src/main/scala/org/apache/bahir/cloudant/internal/ChangesReceiver.scala ---
@@ -39,56 +37,38 @@ class ChangesReceiver(config: CloudantChangesConfig)
   }
 
   private def receive(): Unit = {
-// Get total number of docs in database using _all_docs endpoint
-val limit = new JsonStoreDataAccess(config)
-  .getTotalRows(config.getTotalUrl, queryUsed = false)
-
-// Get continuous _changes url
+// Get normal _changes url
 val url = config.getChangesReceiverUrl.toString
 val selector: String = {
   "{\"selector\":" + config.getSelector + "}"
 }
 
-var count = 0
+// var count = 0
--- End diff --

delete?

