Re: failing build question BAHIR-100 PR

2017-07-07 Thread Ted Yu
Maybe we should upgrade Java 8 - the crash log shows 8u91 and the latest is 8u131.

Other than that, someone with admin privilege can add the suggested ulimit
command at the beginning of the QA run.

Cheers

On Thu, Jul 6, 2017 at 11:16 PM, Rosenstark, David <
david.rosenst...@intel.com> wrote:

> http://169.45.79.58:8080/job/bahir_spark_pr_builder/69/console
>
> On 06/07/2017, 23:08, "Ted Yu"  wrote:
>
> Can you give the URL where the error was reported ?
>
> Thanks
>
> On Thu, Jul 6, 2017 at 1:05 PM, Rosenstark, David <
> david.rosenst...@intel.com> wrote:
>
> > Hi, my build passes fine on my machine running Oracle Java 8.
> > But on Jenkins I see it failed in a different sub-project that is not
> > affected by my PR. This is the output (see below):
> > Is this a known intermittent issue? Is there some issue on the build
> > machine? Any other hints?
> >
> > 12:48:40 - Send & Receive messages of size 128000 bytes.
> > 12:48:40 AkkaStreamSourceSuite:
> > 12:48:40 BasicAkkaSourceSuite:
> > 12:48:50 - basic usage
> > 12:49:00 - Send and receive 100 messages.
> > 12:49:00 - params not provided
> > 12:49:10 - Recovering offset from the last processed offset
> > 12:49:10 #
> > 12:49:10 # A fatal error has been detected by the Java Runtime Environment:
> > 12:49:10 #
> > 12:49:10 #  SIGSEGV (0xb) at pc=0x7f3970105120, pid=24784, tid=139896151623424
> > 12:49:10 #
> > 12:49:10 # JRE version: OpenJDK Runtime Environment (8.0_91-b14) (build 1.8.0_91-b14)
> > 12:49:10 # Java VM: OpenJDK 64-Bit Server VM (25.91-b14 mixed mode linux-amd64 compressed oops)
> > 12:49:10 # Problematic frame:
> > 12:49:10 # C  0x7f3970105120
> > 12:49:10 #
> > 12:49:10 # Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
> > 12:49:10 #
> > 12:49:10 # An error report file with more information is saved as:
> > 12:49:10 # /var/lib/jenkins/workspace/bahir_spark_pr_builder/sql-streaming-akka/hs_err_pid24784.log
> > 12:49:10 #
> > 12:49:10 # If you would like to submit a bug report, please visit:
> > 12:49:10 #   http://bugreport.java.com/bugreport/crash.jsp
> > 12:49:10 # The crash happened outside the Java Virtual Machine in native code.
> > 12:49:10 # See problematic frame for where to report the bug.
> >
> >
> >
> >
> >


[jira] [Created] (BAHIR-121) Review details in sql-cloudant README relating to RDD persistence

2017-07-07 Thread Esteban Laver (JIRA)
Esteban Laver created BAHIR-121:
---

 Summary: Review details in sql-cloudant README relating to RDD persistence
 Key: BAHIR-121
 URL: https://issues.apache.org/jira/browse/BAHIR-121
 Project: Bahir
  Issue Type: Improvement
Reporter: Esteban Laver
Priority: Minor


We have one customer that needs to load a Cloudant database of ~15 GB into
Spark. As we may get more customers loading similar or larger databases, we
should add details and tips on RDD persistence and performance tuning to the
sql-cloudant README.
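
As a starting point for such a tip, here is a minimal sketch (illustrative only, not from the issue): it assumes a SparkSession `spark` already configured with the Cloudant connection options from the sql-cloudant README, and the database name "large-db" and the MEMORY_AND_DISK storage level are placeholder assumptions rather than measured recommendations.

```scala
import org.apache.spark.storage.StorageLevel

// Load the large database once (data source name as in the sql-cloudant README).
val df = spark.read.format("org.apache.bahir.cloudant").load("large-db")

// Persist so that repeated actions reuse the loaded data instead of
// re-reading ~15 GB from Cloudant; partitions that do not fit in executor
// memory spill to local disk rather than being recomputed.
df.persist(StorageLevel.MEMORY_AND_DISK)

println(df.count())   // first action materializes the cached data
df.unpersist()        // release storage when the data is no longer needed
```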





[jira] [Commented] (BAHIR-110) Replace use of _all_docs API with _changes API in all receivers

2017-07-07 Thread ASF GitHub Bot (JIRA)

[ https://issues.apache.org/jira/browse/BAHIR-110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16078105#comment-16078105 ]

ASF GitHub Bot commented on BAHIR-110:
--

Github user mayya-sharipova commented on a diff in the pull request:

https://github.com/apache/bahir/pull/45#discussion_r126151630
  
--- Diff: sql-cloudant/README.md ---
@@ -52,39 +51,71 @@ Here each subsequent configuration overrides the previous one. Thus, configurati
 
 
 ### Configuration in application.conf
-Default values are defined in [here](cloudant-spark-sql/src/main/resources/application.conf).
+Default values are defined in [here](src/main/resources/application.conf).
 
 ### Configuration on SparkConf
 
 Name | Default | Meaning
 --- |:---:| ---
+cloudant.apiReceiver|"_all_docs"| API endpoint for RelationProvider when loading or saving data from Cloudant to DataFrames or SQL temporary tables. Select between "_all_docs" or "_changes" endpoint.
 cloudant.protocol|https|protocol to use to transfer data: http or https
-cloudant.host||cloudant host url
-cloudant.username||cloudant userid
-cloudant.password||cloudant password
+cloudant.host| |cloudant host url
+cloudant.username| |cloudant userid
+cloudant.password| |cloudant password
 cloudant.useQuery|false|By default, _all_docs endpoint is used if configuration 'view' and 'index' (see below) are not set. When useQuery is enabled, _find endpoint will be used in place of _all_docs when query condition is not on primary key field (_id), so that query predicates may be driven into datastore.
 cloudant.queryLimit|25|The maximum number of results returned when querying the _find endpoint.
 jsonstore.rdd.partitions|10|the number of partitions intent used to drive JsonStoreRDD loading query result in parallel. The actual number is calculated based on total rows returned and satisfying maxInPartition and minInPartition
 jsonstore.rdd.maxInPartition|-1|the max rows in a partition. -1 means unlimited
 jsonstore.rdd.minInPartition|10|the min rows in a partition.
 jsonstore.rdd.requestTimeout|90| the request timeout in milliseconds
 bulkSize|200| the bulk save size
-schemaSampleSize| "-1" | the sample size for RDD schema discovery. 1 means we are using only first document for schema discovery; -1 means all documents; 0 will be treated as 1; any number N means min(N, total) docs
-createDBOnSave|"false"| whether to create a new database during save operation. If false, a database should already exist. If true, a new database will be created. If true, and a database with a provided name already exists, an error will be raised.
+schemaSampleSize|-1| the sample size for RDD schema discovery. 1 means we are using only first document for schema discovery; -1 means all documents; 0 will be treated as 1; any number N means min(N, total) docs
+createDBOnSave|false| whether to create a new database during save operation. If false, a database should already exist. If true, a new database will be created. If true, and a database with a provided name already exists, an error will be raised.
+
+The `cloudant.apiReceiver` option allows for _changes or _all_docs API endpoint to be called while loading Cloudant data into Spark DataFrames or SQL Tables,
+or saving data from DataFrames or SQL Tables to a Cloudant database.
+
+**Note:** When using `_changes` API, please consider:
+1. Results are partially ordered and may not be presented in the order in
+which documents were updated.
+2. In case of shards' unavailability, you may see duplicate results (changes that have been seen already)
+3. Can use `selector` option to filter Cloudant docs during load
+4. Supports a real snapshot of the database and represents it in a single point of time.
+5. Only supports single threaded
+
+
+When using `_all_docs` API:
+1. Supports parallel reads (using offset and range)
+2. Using partitions may not represent the true snapshot of a database.  Some docs
+   may be added or deleted in the database between loading data into different
+   Spark partitions.
+
+Performance of `_changes` API is still better in most cases (even with single threaded support).
+During several performance tests using 200 MB to 15 GB Cloudant databases, load time from Cloudant to Spark using
+`_changes` feed was faster to complete every time compared to `_all_docs`.
+
+See [CloudantChangesDFSuite](src/test/scala/org/apache/bahir/cloudant/CloudantChangesDFSuite.scala)
+for examples of loading data into a Spark DataFrame with `_changes` API.
 
 ### Configuration on Spark SQL Temporary Table or DataFrame
 
 Besides all the configurations passed to a temporary table or dataframe through SparkConf, it is
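
To make the new `cloudant.apiReceiver` option concrete, here is a minimal sketch (not part of the diff above) of loading a Cloudant database through the `_changes` feed into a DataFrame. The data source name `org.apache.bahir.cloudant` follows the sql-cloudant README, and the host, credentials and the database name "sales" are placeholders.

```scala
import org.apache.spark.sql.SparkSession

// Illustrative sketch: select the _changes receiver instead of the default
// _all_docs endpoint via SparkConf options (placeholder credentials).
val spark = SparkSession.builder()
  .appName("cloudant-changes-example")
  .config("cloudant.host", "ACCOUNT.cloudant.com")
  .config("cloudant.username", "USERNAME")
  .config("cloudant.password", "PASSWORD")
  .config("cloudant.apiReceiver", "_changes")   // default is "_all_docs"
  .getOrCreate()

val df = spark.read.format("org.apache.bahir.cloudant").load("sales")
df.printSchema()
println(df.count())
```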

[jira] [Commented] (BAHIR-100) Providing MQTT Spark Streaming to return encoded Byte[] message without corruption

2017-07-07 Thread ASF GitHub Bot (JIRA)

[ https://issues.apache.org/jira/browse/BAHIR-100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16077686#comment-16077686 ]

ASF GitHub Bot commented on BAHIR-100:
--

Github user ApacheBahir commented on the issue:

https://github.com/apache/bahir/pull/47
  

Refer to this link for build results (access rights to CI server needed): 
http://169.45.79.58:8080/job/bahir_spark_pr_builder/71/



> Providing MQTT Spark Streaming to return encoded Byte[] message without 
> corruption
> --
>
> Key: BAHIR-100
> URL: https://issues.apache.org/jira/browse/BAHIR-100
> Project: Bahir
>  Issue Type: New Feature
>  Components: Spark Streaming Connectors
>Reporter: Anntinu Josy
>Assignee: Anntinu Josy
>  Labels: mqtt, spark, streaming
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Nowadays network bandwidth is becoming a scarce resource that needs to be
> conserved in the IoT ecosystem. For this purpose we use different byte[]-based
> encodings such as Protocol Buffers and FlatBuffers. Once such an encoded
> message is converted into a String, the data becomes corrupted, so the same
> byte[] format needs to be preserved when it is forwarded.
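
For illustration (not part of the issue text), a minimal self-contained Scala sketch of the corruption described above: round-tripping an arbitrary binary payload through a String is lossy, which is why the raw byte[] needs to be preserved. The payload bytes below are arbitrary, not a real Protocol Buffers message.

```scala
import java.nio.charset.StandardCharsets

// Arbitrary binary payload, standing in for an encoded message.
val payload: Array[Byte] = Array(0x08, 0x96, 0x01, 0xFF, 0x00).map(_.toByte)

// Decoding as UTF-8 replaces invalid sequences (e.g. a lone 0x96) with U+FFFD,
// so encoding the String back to bytes no longer matches the original payload.
val roundTripped = new String(payload, StandardCharsets.UTF_8)
  .getBytes(StandardCharsets.UTF_8)

println(payload.sameElements(roundTripped))   // prints: false
```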





[jira] [Commented] (BAHIR-100) Providing MQTT Spark Streaming to return encoded Byte[] message without corruption

2017-07-07 Thread ASF GitHub Bot (JIRA)

[ https://issues.apache.org/jira/browse/BAHIR-100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16077685#comment-16077685 ]

ASF GitHub Bot commented on BAHIR-100:
--

Github user ApacheBahir commented on the issue:

https://github.com/apache/bahir/pull/47
  
Build successful
 



> Providing MQTT Spark Streaming to return encoded Byte[] message without 
> corruption
> --
>
> Key: BAHIR-100
> URL: https://issues.apache.org/jira/browse/BAHIR-100
> Project: Bahir
>  Issue Type: New Feature
>  Components: Spark Streaming Connectors
>Reporter: Anntinu Josy
>Assignee: Anntinu Josy
>  Labels: mqtt, spark, streaming
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Nowadays network bandwidth is becoming a scarce resource that needs to be
> conserved in the IoT ecosystem. For this purpose we use different byte[]-based
> encodings such as Protocol Buffers and FlatBuffers. Once such an encoded
> message is converted into a String, the data becomes corrupted, so the same
> byte[] format needs to be preserved when it is forwarded.


