Can someone look at my questions? Thanks again!
From: Haopu Wang
Sent: June 12, 2016 16:40
To: u...@spark.apache.org
Subject: Should I avoid "state" in a Spark application?
I have a Spark application whose structure is below:
var ts:
Can someone help? Thank you!
From: Haopu Wang
Sent: Monday, June 15, 2015 3:36 PM
To: user; dev@spark.apache.org
Subject: [SparkStreaming] NPE in DStreamCheckPointData.scala:125
I use the attached program to test checkpoint. It's quite simple.
When I run the program a second time, it loads the checkpoint data as
expected; however, I see an NPE in the driver log.
Do you have any idea about this issue? I'm on Spark 1.4.0. Thank you very
much!
== logs ==
15/
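(The attached program isn't reproduced in the archive. A minimal sketch of the kind of checkpointed streaming job described here, assuming the Spark 1.x streaming API and a hypothetical socket source and checkpoint path, might look like this:)

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object CheckpointTest {
  val checkpointDir = "/tmp/checkpoint-test"   // hypothetical path

  def createContext(): StreamingContext = {
    val conf = new SparkConf().setAppName("CheckpointTest").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(5))
    ssc.checkpoint(checkpointDir)

    // A simple stateful pipeline so that checkpointing has something to persist.
    val lines = ssc.socketTextStream("localhost", 9999)
    val counts = lines.flatMap(_.split(" "))
      .map((_, 1))
      .updateStateByKey[Int]((values: Seq[Int], state: Option[Int]) =>
        Some(values.sum + state.getOrElse(0)))
    counts.print()
    ssc
  }

  def main(args: Array[String]): Unit = {
    // On the second run this restores the context from the checkpoint directory.
    val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
    ssc.start()
    ssc.awaitTermination()
  }
}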
, 2015 2:41 AM
To: Haopu Wang
Cc: user; dev@spark.apache.org
Subject: Re: [SparkSQL] cannot filter by a DateType column
What version of Spark are you using? It appears that at least in master
we are doing the conversion correctly, but it's possible older versions
of applySchema do not. If you can
I want to filter a DataFrame based on a Date column.
If the DataFrame is constructed from a Scala case class, it works (comparing
either as String or as Date). But if the DataFrame is generated by applying a
schema to an RDD, it doesn't work. Below are the exception and the test code.
D
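(The original test code isn't reproduced here. A minimal sketch of the two construction paths being compared, with hypothetical column names and assuming the DataFrame API of Spark 1.3+, could look like this:)

import java.sql.Date
import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.sql.functions.lit
import org.apache.spark.sql.types.{DateType, IntegerType, StructField, StructType}

// Hypothetical reconstruction of the two ways of building the DataFrame.
case class Event(id: Int, d: Date)

def demo(sqlContext: SQLContext): Unit = {
  import sqlContext.implicits._
  val sc = sqlContext.sparkContext

  // 1) From a case class: filtering on the Date column works.
  val fromCaseClass = sc.parallelize(Seq(Event(1, Date.valueOf("2015-01-01")))).toDF()
  fromCaseClass.filter($"d" >= lit(Date.valueOf("2015-01-01"))).show()

  // 2) From an RDD[Row] plus an explicit schema (the reported problem case).
  val schema = StructType(Seq(
    StructField("id", IntegerType, nullable = false),
    StructField("d", DateType, nullable = false)))
  val rows = sc.parallelize(Seq(Row(1, Date.valueOf("2015-01-01"))))
  val fromSchema = sqlContext.createDataFrame(rows, schema)
  fromSchema.filter($"d" >= lit(Date.valueOf("2015-01-01"))).show()
}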
Cheng, Hao [mailto:hao.ch...@intel.com]
Sent: Monday, March 02, 2015 9:05 PM
To: Haopu Wang; user
Subject: RE: Is SQLContext thread-safe?
Yes, it is thread-safe, at least it's supposed to be.
-----Original Message-----
From: Haopu Wang [mailto:hw...@qilinsoft.com]
Sent: Monday, March 2, 2015 4
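(For reference, a minimal sketch of the usage pattern in question, a single SQLContext shared by several threads; the table name "src" is hypothetical:)

import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
import org.apache.spark.sql.SQLContext

// Assumes `sqlContext` already exists (e.g. in spark-shell).
def runConcurrently(sqlContext: SQLContext): Unit = {
  val queries = Seq(
    "SELECT COUNT(*) FROM src",
    "SELECT MAX(key) FROM src",
    "SELECT MIN(key) FROM src")

  // Submit the queries from separate threads against the same SQLContext.
  val futures = queries.map(q => Future { sqlContext.sql(q).collect() })
  futures.foreach(f => Await.result(f, 10.minutes))
}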
Great! Thank you!
From: Reynold Xin [mailto:r...@databricks.com]
Sent: Thursday, April 02, 2015 8:11 AM
To: Haopu Wang
Cc: user; dev@spark.apache.org
Subject: Re: Can I call aggregate UDF in DataFrame?
You totally can.
https://github.com/apache/spark
Specifically, there are only five aggregate functions in class
org.apache.spark.sql.GroupedData: sum/max/min/mean/count.
Can I plug in a function to calculate stddev?
Thank you!
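(A minimal sketch of what this looks like in later Spark versions, assuming Spark 1.6+ where stddev was added to org.apache.spark.sql.functions; column names are hypothetical. On older versions one would fall back to a Hive UDAF or compute it from sum/count/sum-of-squares.)

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{avg, stddev}

// Group by a key column and compute mean and standard deviation of a value column.
def stddevPerKey(df: DataFrame): DataFrame =
  df.groupBy("key").agg(avg("value"), stddev("value"))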
SELECT key,value FROM src")
scala> output.saveAsTable("outputtable")
From: Cheng, Hao [mailto:hao.ch...@intel.com]
Sent: Wednesday, March 11, 2015 8:25 AM
To: Haopu Wang; user; dev@spark.apache.org
Subject: RE: [SparkSQL] Reuse HiveContext to different warehouse locations
I'm using a Spark 1.3.0 RC3 build with Hive support.
In the Spark shell, I want to reuse the HiveContext instance across different
warehouse locations. Below are the steps for my test (assume I have already
loaded a file into the table "src").
==
15/03/10 18:22:59 INFO SparkILoop: Created sql context (with
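(A sketch of what the test steps amount to, assuming the Spark 1.3-era API and the spark-shell's `sc`; the warehouse paths and output table names are hypothetical. Whether switching the warehouse dir on a live HiveContext actually takes effect for subsequent saveAsTable calls is the question at hand.)

import org.apache.spark.sql.hive.HiveContext

val hiveContext = new HiveContext(sc)

// First warehouse location.
hiveContext.setConf("hive.metastore.warehouse.dir", "/tmp/warehouse1")
val output = hiveContext.sql("SELECT key, value FROM src")
output.saveAsTable("outputtable1")

// Switch to a second warehouse location on the same HiveContext.
hiveContext.setConf("hive.metastore.warehouse.dir", "/tmp/warehouse2")
hiveContext.sql("SELECT key, value FROM src").saveAsTable("outputtable2")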
Hi, in the roadmap of Spark in 2015 (link:
http://files.meetup.com/3138542/Spark%20in%202015%20Talk%20-%20Wendell.pptx),
I saw that SchemaRDD is designed to be the basis of BOTH Spark Streaming and
Spark SQL.
My question is: what's the typical usage of SchemaRDD in a Spark Streaming
application? Thank you!
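(One common pattern, sketched below under the assumption of Spark 1.3+ where SchemaRDD became DataFrame: convert each micro-batch into a DataFrame inside foreachRDD and query it with SQL. The socket source and table/column names are hypothetical.)

import org.apache.spark.sql.SQLContext
import org.apache.spark.streaming.StreamingContext

case class Word(word: String)

def wordsAsTables(ssc: StreamingContext, sqlContext: SQLContext): Unit = {
  import sqlContext.implicits._
  val lines = ssc.socketTextStream("localhost", 9999)
  lines.foreachRDD { rdd =>
    // Turn each micro-batch into a DataFrame and query it with SQL.
    val words = rdd.flatMap(_.split(" ")).map(w => Word(w)).toDF()
    words.registerTempTable("words")
    sqlContext.sql("SELECT word, COUNT(*) AS total FROM words GROUP BY word").show()
  }
}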
To: Michael Armbrust
Cc: Haopu Wang; dev@spark.apache.org
Subject: Re: HiveContext cannot be serialized
I submitted a patch
https://github.com/apache/spark/pull/4628
On Mon, Feb 16, 2015 at 10:59 AM, Michael Armbrust wrote:
I was suggesting you mark the variable that is holding the
While investigating this issue (at the end of this email), I took a look at
HiveContext's code and found this change
(https://github.com/apache/spark/commit/64945f868443fbc59cb34b34c16d782dda0fb63d#diff-ff50aea397a607b79df9bec6f2a841db):
- @transient protected[hive] lazy val hiveconf = new
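(For context, here is a minimal illustration of the pattern the diff touches, not the actual HiveContext code:)

import org.apache.hadoop.hive.conf.HiveConf

// A field that is not serializable, or should not be shipped to executors,
// can be marked @transient and made lazy: it is dropped on serialization and
// re-created on first access afterwards.
class ContextLike extends Serializable {
  @transient protected lazy val hiveconf: HiveConf = new HiveConf(classOf[ContextLike])
}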
Hi, I think a modeling tool may be helpful because sometimes it's
hard/tricky to program Spark. I don't know if there is already such a
tool.
Thanks!
Liquan, yes, for a full outer join, a hash table on both sides is more
efficient.
For a left/right outer join, it looks like one hash table should be enough.
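(A minimal, non-Spark sketch of the point, using plain Scala collections rather than Spark's actual HashOuterJoin operator: for a left outer join it is enough to build a hash table on the right side and stream the left side over it.)

def leftOuterHashJoin[K, L, R](
    left: Iterator[(K, L)],
    right: Iterable[(K, R)]): Iterator[(K, (L, Option[R]))] = {
  // Build side: only the right relation is materialized into a hash table.
  val buildTable: Map[K, Seq[R]] =
    right.groupBy(_._1).map { case (k, vs) => k -> vs.map(_._2).toSeq }

  // Stream side: the left relation is consumed once, row by row.
  left.flatMap { case (k, l) =>
    buildTable.get(k) match {
      case Some(rs) => rs.iterator.map(r => (k, (l, Some(r): Option[R])))
      case None     => Iterator((k, (l, Option.empty[R])))
    }
  }
}

For a full outer join, by contrast, unmatched rows on the build side must also be emitted, which is why keeping a hash table on both sides helps there.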
From: Liquan Pei [mailto:liquan...@gmail.com]
Sent: September 30, 2014 18:34
To: Haopu Wang
Cc: dev
Thanks again!
From: Liquan Pei [mailto:liquan...@gmail.com]
Sent: September 30, 2014 12:31
To: Haopu Wang
Cc: dev@spark.apache.org; user
Subject: Re: Spark SQL question: why build hashtable for both sides in
HashOuterJoin?
Hi Haopu,
My understanding is that the hashtable on both left and
I took a look at HashOuterJoin and it builds a hash table for both sides.
This consumes quite a lot of memory when the partition is big, and it doesn't
reduce the iteration over the streamed relation, right?
Thanks!
Forwarding to the dev mailing list for help.
From: Haopu Wang
Sent: September 22, 2014 16:35
To: u...@spark.apache.org
Subject: Spark SQL 1.1.0: NPE when join two cached table
I have two data sets and want to join them on their first fields. Sample
data are below:
data set
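(The sample data isn't reproduced in the archive. A sketch of the scenario with hypothetical table and column names, assuming two temp tables t1 and t2 have already been registered from the data sets:)

import org.apache.spark.sql.SQLContext

def joinCached(sqlContext: SQLContext): Unit = {
  // Cache both tables, then join them on their first field.
  sqlContext.sql("CACHE TABLE t1")
  sqlContext.sql("CACHE TABLE t2")
  val joined = sqlContext.sql("SELECT * FROM t1 JOIN t2 ON t1.f1 = t2.f1")
  joined.collect().foreach(println)
}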