Re: spark2.0 how to use sparksession and StreamingContext same time

2016-07-25 Thread kevin
Thanks a lot, Terry. 2016-07-26 12:03 GMT+08:00 Terry Hoo: > Kevin, > Try to create the StreamingContext as follows: > val ssc = new StreamingContext(spark.sparkContext, Seconds(2)) > On Tue, Jul 26, 2016 at 11:25 AM, kevin wrote:
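A minimal, self-contained sketch of the pattern Terry describes, assuming the Spark 2.0 APIs (the app name and master are placeholders, not from the thread):

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    // Build the SparkSession first, then reuse its underlying
    // SparkContext for the StreamingContext instead of creating
    // a second, conflicting context.
    val spark = SparkSession.builder
      .appName("SessionPlusStreaming")   // placeholder app name
      .master("local[2]")                // placeholder master
      .getOrCreate()

    val ssc = new StreamingContext(spark.sparkContext, Seconds(2))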

spark2.0 how to use sparksession and StreamingContext same time

2016-07-25 Thread kevin
Hi, all: I want to read data from Kafka and register it as a table, then join it with a JDBC table. My sample is like this: val spark = SparkSession .builder .config(sparkConf) .getOrCreate() val jdbcDF = spark.read.format("jdbc").options(Map("url" -> "jdbc:mysql://master1:3306/demo",
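Kevin's snippet is cut off above; a hedged sketch of how the JDBC side might continue under Spark 2.0 (the table name and credentials below are hypothetical placeholders, not from his mail):

    // Hypothetical completion of the JDBC read; the "dbtable",
    // "user" and "password" values are illustrative only.
    val jdbcDF = spark.read.format("jdbc")
      .options(Map(
        "url"      -> "jdbc:mysql://master1:3306/demo",
        "dbtable"  -> "demo_table",
        "user"     -> "root",
        "password" -> "secret"))
      .load()

    // Register as a temp view so the Kafka stream can join against it.
    jdbcDF.createOrReplaceTempView("jdbc_table")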

Re: spark2.0 can't run SqlNetworkWordCount

2016-07-25 Thread kevin
Thanks a lot. After changing to Scala 2.11, it works. 2016-07-25 17:40 GMT+08:00 Tomasz Gawęda: > Hi, > Please change the Scala version to 2.11. As far as I know, Spark packages are now built with Scala 2.11, and I've got the other - 2.10 - version

Re: Outer Explode needed

2016-07-25 Thread Michael Armbrust
I don't think this would be hard to implement. The physical explode operator supports it (for our HiveQL compatibility). Perhaps comment on this JIRA? https://issues.apache.org/jira/browse/SPARK-13721 It could probably just be another argument to explode(). Michael On Mon, Jul 25, 2016 at 6:12
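For readers unfamiliar with the issue, a small sketch of the current behavior the JIRA wants to extend (assumes a SparkSession named spark):

    import spark.implicits._
    import org.apache.spark.sql.functions.explode

    // explode() today acts like an inner join against the array
    // elements: rows with an empty (or null) array vanish entirely.
    val df = Seq((1, Seq("a", "b")), (2, Seq.empty[String])).toDF("id", "items")
    df.select($"id", explode($"items").as("item")).show()
    // Only id = 1 survives; an "outer" explode would keep id = 2
    // with a null item instead.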

Re: where I can find spark-streaming-kafka for spark2.0

2016-07-25 Thread kevin
Thank you. I can't find a spark-streaming-kafka_2.10 jar for Spark 2 on Maven Central, so I tried version 1.6.2; it didn't work, since it needs the class org.apache.spark.Logging, which can't be found in Spark 2. So I built the spark-streaming-kafka_2.10 jar for Spark 2 from the source code. It works now. 2016-07-26 2:12

Fwd: Outer Explode needed

2016-07-25 Thread Don Drake
No response on the Users list, so I thought I would repost here. See below. -Don -- Forwarded message -- From: Don Drake Date: Sun, Jul 24, 2016 at 2:18 PM Subject: Outer Explode needed To: user I have a nested data structure (array of

Re: Potential Change in Kafka's Partition Assignment Semantics when Subscription Changes

2016-07-25 Thread Cody Koeninger
This seems really low risk to me. In order to be impacted, it'd have to be someone who was using the Kafka integration in Spark 2.0, which isn't even officially released yet. On Mon, Jul 25, 2016 at 7:23 PM, Vahid S Hashemian wrote: > Sorry, meant to ask if any Apache

Re: Cartesian join between DataFrames

2016-07-25 Thread Nicholas Chammas
Oh, sorry, you’re right. I looked at the doc for join() and didn’t realize you could do a cartesian join. But it turns out that df1.join(df2) does the job and matches the SQL equivalent too. On Mon, Jul
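A quick sketch of the point, assuming a SparkSession named spark (note that in 2.0, spark.sql.crossJoin.enabled may need to be true for an unconditioned join to be allowed):

    import spark.implicits._

    val df1 = Seq(1, 2).toDF("a")
    val df2 = Seq("x", "y").toDF("b")

    // join() with no condition yields the cross product -- the
    // DataFrame equivalent of RDD.cartesian and of SQL's CROSS JOIN.
    df1.join(df2).show()  // 4 rows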

Re: Cartesian join between DataFrames

2016-07-25 Thread Reynold Xin
DataFrame can do cartesian joins. On July 25, 2016 at 3:43:19 PM, Nicholas Chammas (nicholas.cham...@gmail.com) wrote: It appears that RDDs can do a cartesian join, but not DataFrames. Is there a fundamental reason why not, or is this just waiting for someone to implement? I know you can get

Cartesian join between DataFrames

2016-07-25 Thread Nicholas Chammas
It appears that RDDs can do a cartesian join, but not DataFrames. Is there a fundamental reason why not, or is this just waiting for someone to implement? I know you can get the RDDs underlying the DataFrames and do the cartesian join that way, but you lose the schema of course. Nick

[build system] jenkins downtime friday afternoon, july 29th 2016

2016-07-25 Thread shane knapp
around 1pm friday, july 29th, we will be taking jenkins down for a rack move and celebrating national systems administrator day. the outage should only last a couple of hours at most, and will be concluded with champagne toasts. yes, the outage and holiday are real, but the champagne in the

Re: [VOTE] Release Apache Spark 2.0.0 (RC5)

2016-07-25 Thread Luciano Resende
When are we planning to push the release maven artifacts? We are waiting for this in order to push an official Apache Bahir release supporting Spark 2.0. On Sat, Jul 23, 2016 at 7:05 AM, Reynold Xin wrote: > The vote has passed with the following +1 votes and no -1 votes.

Re: where I can find spark-streaming-kafka for spark2.0

2016-07-25 Thread Cody Koeninger
For 2.0, the Kafka dstream support is in two separate subprojects, depending on which version of Kafka you are using: spark-streaming-kafka-0-10 or spark-streaming-kafka-0-8, corresponding to brokers that are version 0.10+ or 0.8+. On Mon, Jul 25, 2016 at 12:29 PM, Reynold Xin
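In sbt terms, the choice Cody describes looks roughly like this (a sketch; the 2.0.0 artifacts were not yet on Maven Central when this thread ran, so verify the coordinates there):

    // build.sbt -- pick ONE, matching your broker version:
    libraryDependencies += "org.apache.spark" %% "spark-streaming-kafka-0-8"  % "2.0.0"  // brokers 0.8+
    libraryDependencies += "org.apache.spark" %% "spark-streaming-kafka-0-10" % "2.0.0"  // brokers 0.10+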

Re: Nested/Chained case statements generate codegen over 64k exception

2016-07-25 Thread Jonathan Gray
I came back to this to try and investigate further using the latest version of the project. However, I don't have enough experience with the code base to fully understand what is now happening. Could someone take a look at the test case attached to this JIRA and run it on the latest version of the
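For context, a sketch of the kind of deeply nested CASE WHEN expression that can push generated code past the JVM's 64KB method limit (the depth and column names are illustrative, not from the JIRA's test case; assumes a SparkSession named spark):

    import org.apache.spark.sql.functions.{col, lit, when}

    val df = spark.range(10).toDF("x")

    // Fold 200 chained conditions into one nested CASE WHEN; deep
    // expressions like this can exceed the 64KB limit on a single
    // generated Java method during codegen.
    val nested = (1 to 200).foldLeft(lit("default")) { (acc, i) =>
      when(col("x") === i, lit(s"v$i")).otherwise(acc)
    }
    df.select(nested.as("out")).show()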

Re: where I can find spark-streaming-kafka for spark2.0

2016-07-25 Thread Reynold Xin
The presentation at Spark Summit SF was probably referring to Structured Streaming. The existing Spark Streaming (dstream) in Spark 2.0 has the same production stability level as Spark 1.6. There is also Kafka 0.10 support in dstream. On July 25, 2016 at 10:26:49 AM, Andy Davidson (

Spark RC5 - OutOfMemoryError: Requested array size exceeds VM limit

2016-07-25 Thread Ovidiu-Cristian MARCU
Hi, I am running some TPC-DS queries (data is Parquet stored in HDFS) with Spark 2.0 RC5, and for some queries I get this OOM: java.lang.OutOfMemoryError: Requested array size exceeds VM limit at org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder.grow(BufferHolder.java:73)

Re: orc/parquet sql conf

2016-07-25 Thread Ovidiu-Cristian MARCU
Thank you! Any chance of this work being reviewed and integrated into the next Spark release? Best, Ovidiu > On 25 Jul 2016, at 12:20, Hyukjin Kwon wrote: > For question 1, it is possible but not supported yet. Please refer to > https://github.com/apache/spark/pull/13775

Re: orc/parquet sql conf

2016-07-25 Thread Hyukjin Kwon
For question 1, it is possible but not supported yet. Please refer to https://github.com/apache/spark/pull/13775 Thanks! 2016-07-25 19:01 GMT+09:00 Ovidiu-Cristian MARCU < ovidiu-cristian.ma...@inria.fr>: > Hi, > Assuming I have some data in both ORC/Parquet formats, and some complex
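Concretely, the asymmetry in Spark 2.0's defaults can be checked like this (a sketch; the property name should be verified against SQLConf, and spark is an assumed SparkSession):

    // Parquet has a vectorized reader, on by default in 2.0:
    spark.conf.get("spark.sql.parquet.enableVectorizedReader")  // "true"
    // ORC has no vectorized reader in 2.0; the linked PR
    // (apache/spark#13775) proposes adding one.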

orc/parquet sql conf

2016-07-25 Thread Ovidiu-Cristian MARCU
Hi, Assuming I have some data in both ORC/Parquet formats, and some complex workflow that eventually combines the results of some queries on these datasets, I would like to get the best execution. Looking at the default configs, I noticed: 1) Vectorized query execution is possible with Parquet

Re: spark2.0 can't run SqlNetworkWordCount

2016-07-25 Thread Tomasz Gawęda
Hi, Please change the Scala version to 2.11. As far as I know, Spark packages are now built with Scala 2.11, and I've got the other - 2.10 - version. From: kevin Sent: 25 July 2016 11:33 To: user.spark; dev.spark Subject: spark2.0 can't run
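In build terms, the fix Tomasz suggests amounts to something like this in build.sbt (a sketch with illustrative version numbers):

    // Match the Scala version of the pre-built Spark 2.0 binaries:
    scalaVersion := "2.11.8"

    libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-sql"       % "2.0.0" % "provided",
      "org.apache.spark" %% "spark-streaming" % "2.0.0" % "provided"
    )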

spark2.0 can't run SqlNetworkWordCount

2016-07-25 Thread kevin
Hi, all: I downloaded the Spark 2.0 pre-built package. I can run the SqlNetworkWordCount test using: bin/run-example org.apache.spark.examples.streaming.SqlNetworkWordCount master1 but when I take the Spark 2.0 example source code SqlNetworkWordCount.scala and build it into a jar package with dependencies (JDK 1.8 AND

Re: where I can find spark-streaming-kafka for spark2.0

2016-07-25 Thread kevin
I have compiled it from source code. 2016-07-25 12:05 GMT+08:00 kevin: > Hi, all: > I tried to run the example org.apache.spark.examples.streaming.KafkaWordCount, > and I got this error: > Exception in thread "main" java.lang.NoClassDefFoundError: >