Re:RE: spark 1.5 SQL slows down dramatically by 50%+ compared with spark 1.4.1 SQL

2015-09-11 Thread Todd
Thanks Hao for the reply. I turned the sort merge join off; the physical plan is below, but the performance is roughly the same as with it on... == Physical Plan == TungstenProject [ss_quantity#10,ss_list_price#12,ss_coupon_amt#19,ss_cdemo_sk#4,ss_item_sk#2,ss_promo_sk#8,ss_sold_date_sk#0]

RE: Re:RE: spark 1.5 SQL slows down dramatically by 50%+ compared with spark 1.4.1 SQL

2015-09-11 Thread Cheng, Hao
You mean the performance is still as slow as with the SMJ in Spark 1.5? Can you set spark.shuffle.reduceLocality.enabled=false when you start the spark-shell/spark-sql? It’s a new feature in Spark 1.5, and it’s true by default, but we found it can cause performance to drop dramatically.
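For reference, this flag can be passed at launch time (e.g. spark-shell --conf spark.shuffle.reduceLocality.enabled=false) or set programmatically. A minimal sketch of the programmatic route, assuming you build your own SparkContext (app name is a placeholder):

    import org.apache.spark.{SparkConf, SparkContext}

    // The flag must be set before the SparkContext is created.
    val conf = new SparkConf()
      .setAppName("tpcds-test")
      .set("spark.shuffle.reduceLocality.enabled", "false")
    val sc = new SparkContext(conf)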

Re:RE: Re:RE: spark 1.5 SQL slows down dramatically by 50%+ compared with spark 1.4.1 SQL

2015-09-11 Thread Todd
Yes... At 2015-09-11 14:34:46, "Cheng, Hao" wrote: You mean the performance is still slow as the SMJ in Spark 1.5? Can you set the spark.shuffle.reduceLocality.enabled=false when you start the spark-shell/spark-sql? It’s a new feature in Spark 1.5, and it’s true by

Data lost in spark streaming

2015-09-11 Thread Bin Wang
I'm using Spark Streaming 1.4.0 and have a DStream that holds all the data it has received. But today the historical data in the DStream seems to have been lost suddenly, and the application UI also lost the streaming processing time and all the related data. Could anyone give some hints on how to debug this? Thanks.

Help with collect() in Spark Streaming

2015-09-11 Thread Holden Karau
A common practice to do this is to use foreachRDD with a local var to accumulate the data (you can see it in the Spark Streaming test code). That being said, I am a little curious why you want the driver to create the file specifically. On Friday, September 11, 2015, allonsy
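A minimal sketch of that foreachRDD pattern, assuming a DStream[String] named stream and data small enough to collect to the driver:

    import scala.collection.mutable.ArrayBuffer

    val collected = ArrayBuffer[String]()   // driver-side buffer
    stream.foreachRDD { rdd =>
      // the foreachRDD body runs on the driver, so mutating a local var is safe
      collected ++= rdd.collect()
    }
    // after streaming stops (or periodically), write `collected` to a file from the driver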

I'd like to add our company to the Powered by Spark page

2015-09-11 Thread Timothy Snyder
Hello, I'm interested in adding our company to this Powered by Spark page. I've included some information below, but if you have any questions or need any additional information please let me know. Organization name: Hawk

Re: RE: Re:Re:RE: Re:RE: spark 1.5 SQL slows down dramatically by 50%+ compared with spark 1.4.1 SQL

2015-09-11 Thread Davies Liu
I had run a similar benchmark for 1.5, doing a self join on a fact table with a join key that has many duplicate rows (say N rows per join key); after the join, there will be N*N rows for each join key. Generating the joined row is slower in 1.5 than 1.4 (it needs to copy left and right

Re: Help with collect() in Spark Streaming

2015-09-11 Thread Luca
Hi, thanks for answering. With the *coalesce() *transformation a single worker is in charge of writing to HDFS, but I noticed that the single write operation usually takes too much time, slowing down the whole computation (this is particularly true when 'unified' is made of several partitions).

Re: Help with collect() in Spark Streaming

2015-09-11 Thread Holden Karau
Having the driver write the data instead of a worker probably won't speed it up; you still need to copy all of the data to a single node. Is there something which forces you to only write from a single node? On Friday, September 11, 2015, Luca wrote: > Hi, > thanks for

Spark monitoring

2015-09-11 Thread prk77
Is there a way to fetch the current Spark cluster memory & CPU usage programmatically? I know that the default Spark master web UI has these details, but I want to retrieve them through a program and store them for analysis.
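One option, assuming a standalone master: its web UI also serves a machine-readable snapshot at /json (host and port below are placeholders), which includes per-worker core and memory usage. A rough sketch:

    import scala.io.Source

    // poll periodically and persist the snapshots for later analysis
    val snapshot = Source.fromURL("http://spark-master:8080/json").mkString
    println(snapshot)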

Re: Spark 1.5.0 java.lang.OutOfMemoryError: PermGen space

2015-09-11 Thread Davies Liu
Did this happen immediately after you started the cluster, or after running some queries? Is this in local mode or cluster mode? On Fri, Sep 11, 2015 at 3:00 AM, Jagat Singh wrote: > Hi, > > We have queries which were running fine on 1.4.1 system. > > We are testing upgrade and

Re: Realtime Data Visualization Tool for Spark

2015-09-11 Thread Jo Sunad
I've found Apache Zeppelin to be a good start if you want to visualize Spark data. It doesn't come with streaming visualizations, although I've seen people tweak the code so it does let you do real-time visualizations with Spark Streaming. Other tools I've heard about are python notebook and spark

Re: Realtime Data Visualization Tool for Spark

2015-09-11 Thread Silvio Fiorito
So if you want to build your own from the ground up, then yes, you could go the d3.js route. As Feynman also responded, you could use something like Spark Notebook or Zeppelin to create some charts as well. It really depends on your intended audience and ultimate goal. If you just want some

Re: Realtime Data Visualization Tool for Spark

2015-09-11 Thread Dean Wampler
Here's a demonstration video from @noootsab himself (creator of Spark Notebook) showing live charting in Spark Notebook. It's one reason I prefer it over the other options. https://twitter.com/noootsab/status/638489244160401408 Dean Wampler, Ph.D. Author: Programming Scala, 2nd Edition

Re: Spark based Kafka Producer

2015-09-11 Thread Atul Kulkarni
Folks, Any help on this? Regards, Atul. On Fri, Sep 11, 2015 at 8:39 AM, Atul Kulkarni wrote: > Hi Raghavendra, > > Thanks for your answers, I am passing 10 executors and I am not sure if > that is the problem. It is still hung. > > Regards, > Atul. > > > On Fri, Sep

Help with collect() in Spark Streaming

2015-09-11 Thread allonsy
Hi everyone, I have a JavaPairDStream object and I'd like the Driver to create a txt file (on HDFS) containing all of its elements. At the moment, I use the coalesce(1, true) method: JavaPairDStream unified = [partitioned stuff] unified.foreachRDD(new
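The preview cuts off; in Scala, the pattern described looks roughly like this (the output path is a placeholder, and a per-batch path avoids collisions between batches):

    unified.foreachRDD { (rdd, time) =>
      // one partition => one HDFS file per batch
      rdd.coalesce(1, shuffle = true)
         .saveAsTextFile(s"hdfs:///output/batch-${time.milliseconds}")
    }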

Re: Cassandra row count grouped by multiple columns

2015-09-11 Thread Eric Walker
Hi Chirag, Maybe something like this? import org.apache.spark.sql._ import org.apache.spark.sql.types._ val rdd = sc.parallelize(Seq( Row("A1", "B1", "C1"), Row("A2", "B2", "C2"), Row("A3", "B3", "C2"), Row("A1", "B1", "C1") )) val schema = StructType(Seq("a", "b", "c").map(c =>
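The archive truncates the snippet; a plausible completion of the same idea (counting rows grouped by multiple columns via DataFrames) might be:

    val schema = StructType(Seq("a", "b", "c").map(c => StructField(c, StringType)))
    val df = sqlContext.createDataFrame(rdd, schema)
    df.groupBy("a", "b", "c").count().show()
    // e.g. ("A1", "B1", "C1") -> count 2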

Re: Model summary for linear and logistic regression.

2015-09-11 Thread Feynman Liang
Sorry! The documentation is not the greatest thing in the world, but these features are documented here. On Fri, Sep 11, 2015 at 6:25 AM, Sebastian Kuepers < sebastian.kuep...@publicispixelpark.de> wrote: > Hey, > > > the 1.5.0 release

Re: Few Conceptual Questions on Spark-SQL and HiveQL

2015-09-11 Thread Narayanan K
Hi there, any replies? :) -Narayanan On Fri, Sep 11, 2015 at 1:51 AM, Narayanan K wrote: > Hi all, > > We are migrating from Hive to Spark. We used Spark-SQL CLI to run our > Hive Queries for performance testing. I am new to Spark and had few > clarifications. We have :

SparkR connection string to Cassandra

2015-09-11 Thread Austin Trombley
Spark, Do you have a SparkR connection string example of an RJDBC connection to a Cassandra cluster? Thanks -- regards, Austin Trombley, MBA Senior Manager – Business Intelligence cell: 415-767-6179

sparksql query hive data error

2015-09-11 Thread stark_summer
I start the Hive metastore service OK. The Hadoop I/O compression codec is LZO, configured in core-site.xml via io.compression.codecs

Exception in Spark-sql insertIntoJDBC command

2015-09-11 Thread Baljeet Singh
Hi, I’m using Spark-SQL to insert data from a CSV into a table in SQL Server as the database. The createJDBCTable command is working fine. But when I try to insert more records into the same table that I created in the database using insertIntoJDBC, it throws an error message –

RE: Re:Re:RE: Re:RE: spark 1.5 SQL slows down dramatically by 50%+ compared with spark 1.4.1 SQL

2015-09-11 Thread Cheng, Hao
Can you confirm whether the query really runs in cluster mode, not local mode? Can you print the call stack of the executor when the query is running? BTW: spark.shuffle.reduceLocality.enabled is a Spark configuration, not a Spark SQL one. From: Todd [mailto:bit1...@163.com] Sent: Friday,

Re: MongoDB and Spark

2015-09-11 Thread Sandeep Giri
use map-reduce. On Fri, Sep 11, 2015, 14:32 Mishra, Abhishek wrote: > Hello , > > > > Is there any way to query multiple collections from mongodb using spark > and java. And i want to create only one Configuration Object. Please help > if anyone has something

Spark 1.5.0 java.lang.OutOfMemoryError: PermGen space

2015-09-11 Thread Jagat Singh
Hi, We have queries which were running fine on a 1.4.1 system. We are testing an upgrade, and even a simple query like val t1= sqlContext.sql("select count(*) from table") t1.show works perfectly fine on 1.4.1 but throws an OOM error on 1.5.0. Are there any changes in default memory settings from
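A common first step for PermGen OOMs on Spark 1.5 (Java 7) is raising the PermGen cap; the 256m value below is an assumption to tune, not a recommendation from this thread:

    import org.apache.spark.SparkConf

    // The executor side can be set programmatically; the driver JVM is already
    // running by then, so pass the driver flag via spark-submit instead:
    //   --driver-java-options "-XX:MaxPermSize=256m"
    val conf = new SparkConf()
      .set("spark.executor.extraJavaOptions", "-XX:MaxPermSize=256m")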

Multilabel classification support

2015-09-11 Thread Yasemin Kaya
Hi, I want to use MLlib for multilabel classification, but I found http://spark.apache.org/docs/latest/mllib-classification-regression.html, and it is not what I mean. Is there a way to use multilabel classification? Thanks a lot. Best, yasemin -- hiç ender hiç

MongoDB and Spark

2015-09-11 Thread Mishra, Abhishek
Hello, Is there any way to query multiple collections from MongoDB using Spark and Java? And I want to create only one Configuration object. Please help if anyone has something regarding this. Thank You Abhishek

RE: MongoDB and Spark

2015-09-11 Thread Mishra, Abhishek
Anything using Spark RDDs? Abhishek From: Sandeep Giri [mailto:sand...@knowbigdata.com] Sent: Friday, September 11, 2015 3:19 PM To: Mishra, Abhishek; user@spark.apache.org; d...@spark.apache.org Subject: Re: MongoDB and Spark use map-reduce. On Fri, Sep 11, 2015, 14:32 Mishra, Abhishek

Re: Spark based Kafka Producer

2015-09-11 Thread Raghavendra Pandey
You can pass the number of executors via the command line option --num-executors. You need more than 2 executors to make Spark Streaming work. For more details on command line options, please go through http://spark.apache.org/docs/latest/running-on-yarn.html. On Fri, Sep 11, 2015 at 10:52 AM,
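For example (class and jar names are placeholders), a launch with more than 2 executors might look like:

    spark-submit --master yarn --num-executors 4 \
      --class com.example.StreamingKafkaProducer my-app.jar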

Re:Re:RE: Re:RE: spark 1.5 SQL slows down dramatically by 50%+ compared with spark 1.4.1 SQL

2015-09-11 Thread Todd
I added the following two options: spark.sql.planner.sortMergeJoin=false spark.shuffle.reduceLocality.enabled=false But it still performs the same as without setting these two. One thing is that on the Spark UI, when I click the SQL tab, it shows an empty page except for the header title 'SQL'; there is no

Re: Multilabel classification support

2015-09-11 Thread Alexis Gillain
You can try these packages for adaboost.mh : https://github.com/BaiGang/spark_multiboost (scala) or https://github.com/tizfa/sparkboost (java) 2015-09-11 15:29 GMT+08:00 Yasemin Kaya : > Hi, > > I want to use Mllib for multilabel classification, but I find >

Few Conceptual Questions on Spark-SQL and HiveQL

2015-09-11 Thread Narayanan K
Hi all, We are migrating from Hive to Spark. We used the Spark-SQL CLI to run our Hive queries for performance testing. I am new to Spark and had a few clarifications. We have : 1. Set up 10 boxes, one master and 9 slaves, in standalone mode. Each of the boxes is a launcher to our external Hadoop

Re: Spark on Mesos with Jobs in Cluster Mode Documentation

2015-09-11 Thread Tim Chen
Yes, you can create an issue, or actually contribute a patch to update it :) Sorry the docs are a bit light; I'm going to make them more complete along the way. Tim On Fri, Sep 11, 2015 at 11:11 AM, Tom Waterhouse (tomwater) < tomwa...@cisco.com> wrote: > Tim, > > Thank you for the explanation.

Re: selecting columns with the same name in a join

2015-09-11 Thread Michael Armbrust
Here is what I get on branch-1.5: x = sc.parallelize([dict(k=1, v="Evert"), dict(k=2, v="Erik")]).toDF() y = sc.parallelize([dict(k=1, v="Ruud"), dict(k=3, v="Vincent")]).toDF() x.registerTempTable('x') y.registerTempTable('y') sqlContext.sql("select y.v, x.v FROM x INNER JOIN y ON

Re: New JavaRDD Inside JavaPairDStream

2015-09-11 Thread Cody Koeninger
No, in general you can't make new RDDs in code running on the executors. It looks like your properties file is a constant, why not process it at the beginning of the job and broadcast the result? On Fri, Sep 11, 2015 at 2:09 PM, Rachana Srivastava < rachana.srivast...@markmonitor.com> wrote: >
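A minimal sketch of that suggestion (file path and key are placeholders): parse the properties file once on the driver, broadcast the immutable result, and read it inside closures on the executors:

    import java.io.FileInputStream
    import java.util.Properties
    import scala.collection.JavaConverters._

    val props = new Properties()
    props.load(new FileInputStream("/path/to/app.properties")) // once, on the driver
    val bcProps = sc.broadcast(props.asScala.toMap)            // shipped to executors
    // inside any transformation: bcProps.value("some.key")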

Error - Calling a package (com.databricks:spark-csv_2.10:1.0.3) with spark-submit

2015-09-11 Thread Subhajit Purkayastha
I am on Spark 1.3.1. When I do the following with spark-shell, it works: spark-shell --packages com.databricks:spark-csv_2.10:1.0.3 Then I can create a DF using the spark-csv package: import sqlContext.implicits._ import org.apache.spark.sql._ // Return the dataset specified by
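For reference, loading a CSV as a DataFrame with spark-csv on Spark 1.3.x looks roughly like this (path and options are assumptions):

    val df = sqlContext.load(
      "com.databricks.spark.csv",
      Map("path" -> "hdfs:///data/file.csv", "header" -> "true"))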

which install package type for cassandra use

2015-09-11 Thread beakesland
Hello, Which install package type is suggested to add Spark nodes to an existing Cassandra cluster? I will be using it to deal with data already stored in Cassandra via the connector. I am not currently running any Hadoop/CDH. Thank you. Phil

Re: Re:Re:RE: Re:RE: spark 1.5 SQL slows down dramatically by 50%+ compared with spark 1.4.1 SQL

2015-09-11 Thread Davies Liu
Thanks, I'm surprised to see such a large difference (4x); there could be something wrong in Spark (some contention between tasks). On Fri, Sep 11, 2015 at 11:47 AM, Jesse F Chen wrote: > > @Davies...good question.. > > > Just be curious how the difference would be if you

Re: Re:Re:RE: Re:RE: spark 1.5 SQL slows down dramatically by 50%+ compared with spark 1.4.1 SQL

2015-09-11 Thread Jesse F Chen
@Davies...good question.. > Just be curious how the difference would be if you use 20 executors > and 20G memory for each executor.. So I tried the following combinations:

    (GB x # executors)    query response time (secs)
    20 x 20               415
    10 x 40               230

New JavaRDD Inside JavaPairDStream

2015-09-11 Thread Rachana Srivastava
Hello all, Can we create a JavaRDD while processing a stream from Kafka, for example? The following code is throwing a serialization exception; not sure if this is feasible. JavaStreamingContext jssc = new JavaStreamingContext(jsc, Durations.seconds(5)); JavaPairReceiverInputDStream

Re: java.util.NoSuchElementException: key not found

2015-09-11 Thread Yin Huai
Looks like you hit https://issues.apache.org/jira/browse/SPARK-10422, it has been fixed in branch 1.5. 1.5.1 release will have it. On Fri, Sep 11, 2015 at 3:35 AM, guoqing0...@yahoo.com.hk < guoqing0...@yahoo.com.hk> wrote: > Hi all , > After upgrade spark to 1.5 , Streaming throw >

Re: Training the MultilayerPerceptronClassifier

2015-09-11 Thread Feynman Liang
Rory, I just sent a PR (https://github.com/avulanov/ann-benchmark/pull/1) to bring that benchmark up to date. Hope it helps. On Fri, Sep 11, 2015 at 6:39 AM, Rory Waite wrote: > Hi, > > I’ve been trying to train the new MultilayerPerceptronClassifier in spark > 1.5 for the

countApproxDistinctByKey in python

2015-09-11 Thread LucaMartinetti
Hi, I am trying to use countApproxDistinctByKey in pyspark but cannot find it. https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala#L417 Am I missing something, or has it not been ported/wrapped yet? Thanks

SIGTERM 15 Issue : Spark Streaming for ingesting huge text files using custom Receiver

2015-09-11 Thread Varadhan, Jawahar
Hi all, I have coded a custom receiver which receives Kafka messages. These Kafka messages have FTP server credentials in them. The receiver then opens the message and uses the FTP credentials in it to connect to the FTP server. It then streams this huge text file (3.3G). Finally this

UserDefinedTypes

2015-09-11 Thread Richard Eggert
Greetings, I have recently started using Spark SQL and ran up against two rather odd limitations related to UserDefinedTypes. The first is that there appears to be no way to register a UserDefinedType other than by adding the @SQLUserDefinedType annotation to the class being mapped. This makes

Re: MongoDB and Spark

2015-09-11 Thread Sandeep Giri
I think it should be possible by loading the collections as RDDs and then doing a union on them. Regards, Sandeep Giri, +1 347 781 4573 (US) +91-953-899-8962 (IN) www.KnowBigData.com. Phone: +1-253-397-1945 (Office)
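A sketch of the union idea (the loader helper and collection names are hypothetical; use whatever mechanism materializes each Mongo collection as an RDD):

    val users  = loadCollectionAsRDD("mydb.users")   // hypothetical helper
    val orders = loadCollectionAsRDD("mydb.orders")
    val all = users.union(orders)                    // query both in one pass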

Training the MultilayerPerceptronClassifier

2015-09-11 Thread Rory Waite
Hi, I’ve been trying to train the new MultilayerPerceptronClassifier in spark 1.5 for the MNIST digit recognition task. I’m trying to reproduce the work here: https://github.com/avulanov/ann-benchmark The API has changed since this work, so I’m not sure that I’m setting up the task correctly.

Re: Multilabel classification support

2015-09-11 Thread Yanbo Liang
LogisticRegression in the MLlib (not ML) package supports both multiclass and multilabel classification. 2015-09-11 16:21 GMT+08:00 Alexis Gillain : > You can try these packages for adaboost.mh : > > https://github.com/BaiGang/spark_multiboost (scala) > or >

Re: MLlib LDA implementation questions

2015-09-11 Thread Carsten Schnober
Hi, I don't have practical experience with the MLlib LDA implementation, but regarding the variations in the topic matrix: LDA makes use of stochastic processes. If you use setSeed(seed) with the same value for seed during initialization, your results should be identical, though. May I ask what
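A minimal sketch of fixing the seed for reproducible MLlib LDA runs (k and iteration counts are placeholders):

    import org.apache.spark.mllib.clustering.LDA

    val lda = new LDA()
      .setK(100)
      .setMaxIterations(50)
      .setSeed(42L)           // same seed + same corpus => identical topics
    val model = lda.run(corpus) // corpus: RDD[(Long, Vector)] of doc id -> term counts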

Re: Is it required to remove checkpoint when submitting a code change?

2015-09-11 Thread Cody Koeninger
Yeah, it makes sense that parameters that are read only during your getOrCreate function wouldn't be re-read, since that function isn't called if a checkpoint is loaded. I would have thought changing the number of executors and other things used by spark-submit would work on checkpoint restart.

Re: Realtime Data Visualization Tool for Spark

2015-09-11 Thread Feynman Liang
Spark Notebook does something similar; take a look at their line chart code. On Fri, Sep 11, 2015 at 8:56 AM, Shashi Vishwakarma < shashi.vish...@gmail.com> wrote: > Hi > > I have

Re: Re:Re:RE: Re:RE: spark 1.5 SQL slows down dramatically by 50%+ compared with spark 1.4.1 SQL

2015-09-11 Thread Davies Liu
On Fri, Sep 11, 2015 at 10:31 AM, Jesse F Chen wrote: > > Thanks Hao! > > I tried your suggestion of setting spark.shuffle.reduceLocality.enabled=false > and my initial tests showed queries are on par between 1.5 and 1.4.1. > > Results: > > tpcds-query39b-141.out:query time:

Re: Spark on Mesos with Jobs in Cluster Mode Documentation

2015-09-11 Thread Tom Waterhouse (tomwater)
Tim, Thank you for the explanation. You are correct, my Mesos experience is very light, and I haven’t deployed anything via Marathon yet. What you have stated here makes sense, I will look into doing this. Adding this info to the docs would be great. Is the appropriate action to create an

Re: Implement "LIKE" in SparkSQL

2015-09-11 Thread Richard Eggert
concat and locate are available as of version 1.5.0, according to the Scaladocs. For earlier versions of Spark, and for the operations that are still not supported, it's pretty straightforward to define your own UserDefinedFunctions in either Scala or Java (I don't know about other languages).
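A minimal sketch of the UDF route in Scala (the function name and semantics are illustrative, not a standard function):

    // case-insensitive substring match, registered for use from SQL
    sqlContext.udf.register("contains_ci", (s: String, sub: String) =>
      s != null && s.toLowerCase.contains(sub.toLowerCase))
    // usage: SELECT * FROM t WHERE contains_ci(name, 'foo')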

Re: Is there any Spark SQL reference manual?

2015-09-11 Thread Ted Yu
You may have seen this: https://spark.apache.org/docs/latest/sql-programming-guide.html Please suggest what should be added. Cheers On Fri, Sep 11, 2015 at 3:43 AM, vivek bhaskar wrote: > Hi all, > > I am looking for a reference manual for Spark SQL some thing like many >

Re: Is there any Spark SQL reference manual?

2015-09-11 Thread vivek bhaskar
Hi Ted, The link you mention does not have a complete list of supported syntax. For example, some supported syntax is listed as "Supported Hive features", but that does not claim to be exhaustive (even if it is so, one has to filter out a lot of lines from the Hive QL reference and still will not be sure

Re: Spark does not yet support its JDBC component for Scala 2.11.

2015-09-11 Thread Ted Yu
Have you looked at: https://issues.apache.org/jira/browse/SPARK-8013 > On Sep 11, 2015, at 4:53 AM, Petr Novak wrote: > > Does it still apply for 1.5.0? > > What actual limitation does it mean when I switch to 2.11? No JDBC > Thriftserver? No JDBC DataSource? No

selecting columns with the same name in a join

2015-09-11 Thread Evert Lammerts
Am I overlooking something? This doesn't seem right: x = sc.parallelize([dict(k=1, v="Evert"), dict(k=2, v="Erik")]).toDF() y = sc.parallelize([dict(k=1, v="Ruud"), dict(k=3, v="Vincent")]).toDF() x.registerTempTable('x') y.registerTempTable('y') sqlContext.sql("select y.v, x.v FROM x INNER JOIN

Re: Spark 1.5.0 java.lang.OutOfMemoryError: PermGen space

2015-09-11 Thread Ted Yu
Have you seen this thread ? http://search-hadoop.com/m/q3RTtPPuSvBu0rj2 > On Sep 11, 2015, at 3:00 AM, Jagat Singh wrote: > > Hi, > > We have queries which were running fine on 1.4.1 system. > > We are testing upgrade and even simple query like > val t1=

Is there any Spark SQL reference manual?

2015-09-11 Thread vivek bhaskar
Hi all, I am looking for a reference manual for Spark SQL, something like the ones many database vendors have. I could find one for Hive QL (https://cwiki.apache.org/confluence/display/Hive/LanguageManual) but not anything specific to Spark SQL. Please suggest. A SQL reference specific to the latest release will

Spark does not yet support its JDBC component for Scala 2.11.

2015-09-11 Thread Petr Novak
Does it still apply for 1.5.0? What actual limitation does it mean when I switch to 2.11? No JDBC Thriftserver? No JDBC DataSource? No JdbcRDD (which is already obsolete I believe)? Some more? What library is the blocker to upgrade JDBC component to 2.11? Is there any estimate when it could be

java.util.NoSuchElementException: key not found

2015-09-11 Thread guoqing0...@yahoo.com.hk
Hi all, After upgrading Spark to 1.5, Streaming occasionally throws java.util.NoSuchElementException: key not found. Could the data be causing this error? Please help if anyone has encountered a similar problem before. Thanks very much. The exception occurs when writing into the database.

Re: Spark based Kafka Producer

2015-09-11 Thread Atul Kulkarni
Slight update: The following code with "spark context" works with wildcard file paths in hard-coded strings, but it won't work with a value parsed out of the program arguments as above: val sc = new SparkContext(sparkConf) val zipFileTextRDD =

Re: Multilabel classification support

2015-09-11 Thread Alexis Gillain
Do you mean by running a model on every label? That's another solution, of course. If you mean LogisticRegression natively "supports" multilabel, can you provide some references? From what I see in the code, it uses LabeledPoint, which has only one label. 2015-09-11 21:54 GMT+08:00 Yanbo Liang

Re: countApproxDistinctByKey in python

2015-09-11 Thread Ted Yu
It has not been ported yet. On Fri, Sep 11, 2015 at 4:13 PM, LucaMartinetti wrote: > Hi, > > I am trying to use countApproxDistinctByKey in pyspark but cannot find it. > > >

Model summary for linear and logistic regression.

2015-09-11 Thread Sebastian Kuepers
Hey, the 1.5.0 release notes say that model summaries for logistic regression are now available, but I can't find them in the current documentation. Any help very much appreciated! Thanks Sebastian

Fwd: MLlib LDA implementation questions

2015-09-11 Thread Marko Asplund
Hi, We're considering using the Spark MLlib (v >= 1.5) LDA implementation for topic modelling. We plan to train the model using a data set of about 12M documents and a vocabulary size of 200-300k items. Documents are relatively short, typically containing fewer than 10 words, but the number can range

Re: Exception Handling : Spark Streaming

2015-09-11 Thread Ted Yu
Was your intention that the exception from rdd.saveToCassandra() be caught? In that case you can place a try/catch around that call. Cheers On Fri, Sep 11, 2015 at 7:30 AM, Samya wrote: > Hi Team, > > I am facing this issue where in I can't figure out why the exception is
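A sketch of that placement, assuming the spark-cassandra-connector's saveToCassandra and placeholder keyspace/table names:

    import scala.util.control.NonFatal

    try {
      rdd.saveToCassandra("my_keyspace", "my_table")
    } catch {
      case NonFatal(e) =>
        // log and decide whether to rethrow so the batch fails visibly
        System.err.println(s"Cassandra write failed: $e")
    }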

Re: MongoDB and Spark

2015-09-11 Thread Corey Nolet
Unfortunately, MongoDB does not directly expose its locality via its client API, so the problem with trying to schedule Spark tasks against it is that the tasks themselves cannot be scheduled locally on nodes containing query results, which means you can only assume most results will be sent over

Exception Handling : Spark Streaming

2015-09-11 Thread Samya
Hi Team, I am facing this issue wherein I can't figure out why the exception is handled the first time an exception is thrown in the stream processing action, but is ignored the second time. PFB my code base. object Boot extends App { //Load the configuration val config =

A way to kill laggard jobs?

2015-09-11 Thread Dmitry Goldenberg
Is there a way to kill a laggard Spark job manually, and more importantly, is there a way to do it programmatically based on a configurable timeout value? Thanks.
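There is no single built-in timeout, but the public job-group API can support one; a rough sketch (the watchdog and job runner are my own illustration, not a built-in):

    // tag all work submitted from this thread
    sc.setJobGroup("laggard-candidates", "cancellable work", interruptOnCancel = true)
    runMyJob() // hypothetical long-running action

    // from a watchdog thread, once the configured timeout elapses:
    sc.cancelJobGroup("laggard-candidates")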

Re: Is there any Spark SQL reference manual?

2015-09-11 Thread Richard Hillegas
The latest Derby SQL Reference manual (version 10.11) can be found here: https://db.apache.org/derby/docs/10.11/ref/index.html. It is, indeed, very useful to have a comprehensive reference guide. The Derby build scripts can also produce a BNF description of the grammar--but that is not part of

Re: Is there any Spark SQL reference manual?

2015-09-11 Thread Peyman Mohajerian
http://docs.datastax.com/en/datastax_enterprise/4.6/datastax_enterprise/spark/sparkSqlSupportedSyntax.html On Fri, Sep 11, 2015 at 8:15 AM, Richard Hillegas wrote: > The latest Derby SQL Reference manual (version 10.11) can be found here: >

Re: Spark based Kafka Producer

2015-09-11 Thread Atul Kulkarni
Hi Raghavendra, Thanks for your answers, I am passing 10 executors and I am not sure if that is the problem. It is still hung. Regards, Atul. On Fri, Sep 11, 2015 at 12:40 AM, Raghavendra Pandey < raghavendra.pan...@gmail.com> wrote: > You can pass the number of executors via command line

Re: Is there any Spark SQL reference manual?

2015-09-11 Thread Ted Yu
Very nice suggestion, Richard. I logged SPARK-10561 referencing this discussion. On Fri, Sep 11, 2015 at 8:15 AM, Richard Hillegas wrote: > The latest Derby SQL Reference manual (version 10.11) can be found here: > https://db.apache.org/derby/docs/10.11/ref/index.html. It

Realtime Data Visualization Tool for Spark

2015-09-11 Thread Shashi Vishwakarma
Hi I have got streaming data which needs to be processed and sent for visualization. I am planning to use Spark Streaming for this, but I'm a little bit confused about choosing a visualization tool. I read somewhere that D3.js can be used, but I wanted to know which is the best tool for visualization while dealing

RE: Re:Re:RE: Re:RE: spark 1.5 SQL slows down dramatically by 50%+ compared with spark 1.4.1 SQL

2015-09-11 Thread Jesse F Chen
Thanks Hao! I tried your suggestion of setting spark.shuffle.reduceLocality.enabled=false and my initial tests showed queries are on par between 1.5 and 1.4.1. Results:

    tpcds-query39b-141.out: query time: 129.106478631 sec
    tpcds-query39b-150-reduceLocality-false.out: query time: