I have seen a similar error message when connecting to Hive through JDBC.
This is just a guess on my part, but check your query. The error occurs if
you have a select that includes a null literal with an alias like this:
select a, b, null as c, d from foo
In my case, rewriting the query to use
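The message is cut off here. A common rewrite for this Hive limitation (an assumption on my part, not necessarily what the author used) is to cast the null literal to an explicit type so the driver can report a column type:

```sql
-- Hypothetical rewrite: give the null literal an explicit type
select a, b, cast(null as string) as c, d from foo
```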
I'm guessing EC2 support is not there yet?
I was able to build using the binary download on both Windows 7 and RHEL 6
without issues.
I tried to create an EC2 cluster, but saw this:
~/spark-ec2
Initializing spark
~ ~/spark-ec2
ERROR: Unknown Spark version
Initializing shark
~ ~/spark-ec2
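The "Unknown Spark version" error typically means the requested version isn't in spark-ec2's known-version list. A sketch of pinning a known version explicitly (the flag names are from the spark-ec2 script of that era; treat the exact values and options as assumptions):

```
# Hypothetical invocation: pin a version spark-ec2 knows about
./spark-ec2 -k my-keypair -i my-key.pem -s 2 \
  --spark-version=1.0.0 launch my-cluster
```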
I just built rc5 on Windows 7 and tried to reproduce the problem described in
https://issues.apache.org/jira/browse/SPARK-1712
It works on my machine:
14/05/13 21:06:47 INFO DAGScheduler: Stage 1 (sum at <console>:17) finished
in 4.548 s
14/05/13 21:06:47 INFO TaskSchedulerImpl: Removed TaskSet
I built rc5 using sbt/sbt assembly on Linux without any problems.
There used to be an sbt.cmd for the Windows build; has that been deprecated?
If so, I can document the Windows build steps that worked for me.
with it efficiently and reliably.
Is there another solution for sorting arbitrarily large partitions? If not,
I don't mind developing and contributing a solution.
-
--
Madhu
https://www.linkedin.com/in/msiddalingaiah
--
View this message in context:
http://apache-spark-developers-list
Andrew mentioned covers
the rdd.sortPartitions() use case. Can someone comment on the scope of
SPARK-983?
Thanks!
#ContributingtoSpark-IDESetup
I can't seem to edit that page.
Confluence usually has an Edit button in the upper right, but it does
not appear for me, even though I am logged in.
Am I missing something?
to build *core* in Eclipse Kepler?
In my view, tool independence is a good thing.
I'll do what I can to support Eclipse.
How long does it take to get a spark context?
I found that if you don't have a network connection (most likely due to a
reverse DNS lookup), it can take up to 30 seconds to start up locally. I think
a hosts file entry is sufficient.
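For reference, a minimal hosts-file entry sketch that avoids the reverse-DNS timeout (assuming the machine's hostname is `myhost`; adjust for your system):

```
127.0.0.1   localhost myhost
```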
, overlapping Jira issues, we probably have to create a meta issue
and assign resources to fix it. I don't mind helping with that also.
might be a reason for their
success.
Just my $0.02
View this message in context:
http://apache-spark-developers-list.1001551.n3.nabble.com/Jira-tickets-for-starter-tasks-tp8102p8127.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com
Thanks Patrick.
I've been testing some 1.2 features; it looks good so far.
I have some example code that I think will be helpful for certain MR-style
use cases (secondary sort).
Can I still add that to the 1.2 documentation, or is that frozen at this
point?
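For context, the MR-style secondary sort I have in mind builds on `repartitionAndSortWithinPartitions`, new in 1.2. A minimal sketch (the class and field names are illustrative, not the actual example code):

```scala
import org.apache.spark.Partitioner

// Partition by the first element of the composite key only, so that the
// sort within each partition (which uses the full (key, value) tuple
// ordering) groups values by key and orders them within each key.
class FirstFieldPartitioner(val numPartitions: Int) extends Partitioner {
  def getPartition(key: Any): Int = {
    val (k, _) = key.asInstanceOf[(String, Int)]
    // non-negative modulus, safe even for Int.MinValue hash codes
    (k.hashCode % numPartitions + numPartitions) % numPartitions
  }
}

// val pairs: RDD[((String, Int), String)] = ...
// pairs.repartitionAndSortWithinPartitions(new FirstFieldPartitioner(8))
```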
-hadoop1.0.4.jar
Ran some of my 1.2 code successfully.
Reviewed some of the docs; they look good.
spark-shell.cmd works as expected.
Env details:
sbtconfig.txt:
-Xmx1024M
-XX:MaxPermSize=256m
-XX:ReservedCodeCacheSize=128m
sbt --version
sbt launcher version 0.13.1
The
declaration of Partition is throwing me off.
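For reference, the declaration in question is tiny. From memory (check the current source for the exact comments and any equals override), core's Partition trait is roughly:

```scala
package org.apache.spark

// An identifier for a partition of an RDD; concrete RDDs subclass this to
// carry whatever per-partition metadata they need (file splits, ranges, ...).
trait Partition extends Serializable {
  /** The partition's index within its parent RDD. */
  def index: Int

  override def hashCode(): Int = index
}
```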
Thanks!
I'll add this to the docs.
Thanks Patrick!
and
raise an alarm if it's getting too high. Even a warning on the console would
be better than a catastrophic OOM.
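As a rough illustration of the kind of check I mean (a sketch using plain JVM APIs, not anything Spark currently does):

```scala
// Warn when heap usage crosses a threshold instead of failing later with OOM.
def warnIfHeapTight(threshold: Double = 0.9): Unit = {
  val rt = Runtime.getRuntime
  val used = rt.totalMemory() - rt.freeMemory()
  val frac = used.toDouble / rt.maxMemory()
  if (frac > threshold)
    System.err.println(
      f"WARNING: heap usage at ${frac * 100}%.0f%% of ${rt.maxMemory() >> 20} MB")
}
```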
, but that should be about
it.
Does 1.5.0 pick up HADOOP_INSTALL?
Wouldn't spark-shell --master local override that?
1.5 seemed to completely ignore --master local
<console>:10: error: not found: value sqlContext
       import sqlContext.implicits._
              ^
<console>:10: error: not found: value sqlContext
       import sqlContext.sql
              ^
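When spark-shell fails to create sqlContext (often due to Hive metastore or Derby lock problems), a workaround I've seen is constructing it by hand. A sketch, assuming `sc` did initialize:

```scala
// Manually create the SQL context the shell failed to provide
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.implicits._
```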
Hi,
As I was going through spark source code, SizeEstimator
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/util/SizeEstimator.scala
caught my eye. It's a very useful tool for estimating object sizes on the
JVM, which helps in use cases like a memory-bounded cache.
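A quick sketch of using it (SizeEstimator.estimate is a @DeveloperApi, so treat the exact numbers and API stability as caveats):

```scala
import org.apache.spark.util.SizeEstimator

// Approximate the in-memory footprint of an object graph, e.g. before
// deciding whether a dataset fits in a memory-bounded cache.
val sample = Array.fill(1000)("some representative record")
val bytes = SizeEstimator.estimate(sample)
println(s"Estimated size: $bytes bytes")
```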
It
wrote:
I think that your own tutorials and such should live on your blog. The
goal isn't to pull in a bunch of external docs to the site.
On Fri, Apr 24, 2015 at 12:57 AM, madhu phatak phatak@gmail.com
wrote:
Hi,
As I was reading the Contributing to Spark wiki, it was mentioned that we can
contribute external links to Spark tutorials. I have written many of them on
my blog: http://blog.madhukaraphatak.com/categories/spark/ It would be great
if someone could add them to the Spark website.
Regards,
Madhukara
Hi,
I provided a PR around 2 months back to improve the performance of decision
trees by allowing a flexible, user-provided storage class for intermediate
data. I have posted a few questions about handling backward compatibility,
but there have been no answers for a long time.
Can anybody help me move this
Hi,
I am testing RandomForestClassification with 50 GB of data, which is cached
in memory. I have 64 GB of RAM, of which 28 GB is used for caching the
original dataset.
When I run random forest, it caches around 300 GB of intermediate data, which
evicts the original dataset from the cache. This caching is triggered
Hi,
I opened a jira.
https://issues.apache.org/jira/browse/SPARK-20723
Can someone have a look?
On Fri, Apr 28, 2017 at 1:34 PM, madhu phatak <phatak@gmail.com> wrote:
> Hi,
>
> I am testing RandomForestClassification with 50gb of data which is cached
> in memory.
Hi,
As I am playing with structured streaming, I observed that the window
function always requires a time column in the input data, which means it's
event time. Is it possible to get the old Spark Streaming style window
function based on processing time? I don't see any documentation on the same.
--
Regards,
>
> import org.apache.spark.sql.functions._
>
> ds.withColumn("processingTime", current_timestamp())
>   .groupBy(window(col("processingTime"), "1 minute"))
>   .count()
>
>
> On Mon, Aug 28, 2017 at 5:46 AM, madhu phatak <phatak@gmail.com>
> wrote: