Hi all
I run Spark on a Mesos cluster and have hit a problem: when I submit 6 Spark
drivers *at the same time*, the UI on node3:8081 shows 4 drivers under
"Launched Drivers" and 2 under "Queued Drivers". On mesos:5050, I can see
that 4 active tasks are running, but each task
We are happy to announce the availability of Spark 1.6.2! This maintenance
release includes fixes across several areas of Spark. You can find the list
of changes here: https://s.apache.org/spark-1.6.2
And download the release here: http://spark.apache.org/downloads.html
I tried setting the classpath explicitly in the settings. The classpath gets
printed properly and includes the Scala jars, e.g. scala-compiler-2.10.4.jar
and scala-library-2.10.4.jar.
It did not help. It still runs fine from IntelliJ, but runs into issues when
run from the command line.
val cl = t
Hello,
I'm trying to run Scala code in a web application.
It runs fine when I run it from IntelliJ, but I run into an error when I run
it from the command line.
Command used to run:
--
java -Dscala.usejavacp=true -jar target/XYZ.war
--spring.config.name=application,db,l
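In case it is an embedded Scala interpreter that fails to pick up the compiler
and library jars (an assumption about the setup), here is a minimal sketch of
setting use-java-cp on the interpreter Settings directly, rather than relying
only on the system property:

import scala.tools.nsc.Settings
import scala.tools.nsc.interpreter.IMain

// Assumption: the web app embeds the Scala interpreter/compiler.
val settings = new Settings
settings.usejavacp.value = true   // pick up the JVM classpath explicitly
val interpreter = new IMain(settings)
interpreter.interpret("""println("interpreter sees the classpath")""")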
I guess it has to do with the Tungsten explicit memory management that
builds on sun.misc.Unsafe. The "ConvertToUnsafe" class converts
Java-object-based rows into UnsafeRows, which use Spark's internal
memory-efficient format.
Here is the related code in 1.6:
ConvertToUnsafe is defined in:
http
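For illustration only, a minimal sketch of that conversion using the private,
1.6-era Catalyst API (the schema below is made up):

import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.expressions.UnsafeProjection
import org.apache.spark.sql.types.{LongType, StringType, StructField, StructType}
import org.apache.spark.unsafe.types.UTF8String

// Build a projection for a toy schema and convert one object-based row into an
// UnsafeRow, i.e. a row backed by raw bytes in Tungsten's memory-efficient format.
val schema = StructType(Seq(StructField("id", LongType), StructField("name", StringType)))
val toUnsafe = UnsafeProjection.create(schema)
val unsafeRow = toUnsafe(InternalRow(1L, UTF8String.fromString("spark")))
println(unsafeRow.getSizeInBytes)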
I notice that there is a dependency from SparkContext on the
"createLiveUI" functionality.
Is that really required? Or is there a more minimal JavaSparkContext we
can create?
I'm packaging a jar with a Spark client and would rather avoid resources/
dependencies as they might be trickier to maintain
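As a sketch of one possible mitigation (assuming the goal is just to avoid
starting the UI and its resources), the live UI can be disabled via
configuration; the master below is a placeholder:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.api.java.JavaSparkContext

val conf = new SparkConf()
  .setAppName("minimal-client")
  .setMaster("local[*]")              // placeholder master, for illustration only
  .set("spark.ui.enabled", "false")   // skip starting the live web UI
val jsc = new JavaSparkContext(new SparkContext(conf))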
Yes, Alluxio (http://www.alluxio.org/) can be used to store data in-memory
between stages in a pipeline.
Here is more information about running Spark with Alluxio:
http://www.alluxio.org/documentation/v1.1.0/en/Running-Spark-on-Alluxio.html
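As a quick illustration (host, port and paths below are placeholders, and the
Alluxio client jar is assumed to be on the Spark classpath), a job can read and
write Alluxio paths directly:

// Read from and write to Alluxio like any other Hadoop-compatible filesystem,
// assuming an existing SparkContext `sc`.
val rdd = sc.textFile("alluxio://alluxio-master:19998/input/data.txt")
rdd.saveAsTextFile("alluxio://alluxio-master:19998/output/data_out")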
Hope that helps,
Gene
On Mon, Jun 27, 2016 at 10:38 AM,
MapWithState would not restore from a checkpoint. The MapRDD code requires a
non-empty SparkContext, while the context is empty.
ERROR 2016-06-27 11:06:33,236 0 org.apache.spark.streaming.StreamingContext
[run-main-0] Error starting the context, marking it as stopped
org.apache.spark.SparkException: RD
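For reference, the pattern that usually avoids this is to build the whole
DStream graph, including mapWithState and the checkpoint directory, inside the
function passed to StreamingContext.getOrCreate, so the restored RDDs get a
live context. A minimal sketch (paths, source and batch interval are
placeholders):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, State, StateSpec, StreamingContext}

def createContext(): StreamingContext = {
  val conf = new SparkConf().setAppName("mapWithState-restore")
  val ssc = new StreamingContext(conf, Seconds(10))
  ssc.checkpoint("hdfs:///tmp/checkpoints")            // placeholder checkpoint dir

  val lines = ssc.socketTextStream("localhost", 9999)  // placeholder input source
  val mappingFunc = (word: String, one: Option[Int], state: State[Int]) => {
    val sum = one.getOrElse(0) + state.getOption.getOrElse(0)
    state.update(sum)
    (word, sum)
  }
  lines.map(w => (w, 1)).mapWithState(StateSpec.function(mappingFunc)).print()
  ssc
}

// The graph above is only built when no checkpoint exists; otherwise it is restored.
val ssc = StreamingContext.getOrCreate("hdfs:///tmp/checkpoints", createContext _)
ssc.start()
ssc.awaitTermination()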
Hi all,
We are having some problems implementing custom Transformers in Java (Spark
1.6.1).
We do override the copy method, but it crashes with an AbstractMethodError.
If we extend UnaryTransformer and do not override the copy method, it works
without any error.
We tried to wri
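For reference, a minimal Scala sketch (the real code is in Java, but the shape
is the same; class and column names are made up) of a custom Transformer whose
copy simply delegates to defaultCopy:

import org.apache.spark.ml.Transformer
import org.apache.spark.ml.param.ParamMap
import org.apache.spark.ml.util.Identifiable
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.upper
import org.apache.spark.sql.types.{StringType, StructType}

// Placeholder transformer: adds an upper-cased copy of a "text" column.
class UpperCaseTransformer(override val uid: String) extends Transformer {
  def this() = this(Identifiable.randomUID("upperCase"))

  override def transform(df: DataFrame): DataFrame =
    df.withColumn("text_upper", upper(df("text")))

  override def transformSchema(schema: StructType): StructType =
    schema.add("text_upper", StringType)

  // Delegating to defaultCopy re-creates the instance and copies the params.
  override def copy(extra: ParamMap): UpperCaseTransformer = defaultCopy(extra)
}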
Alluxio off-heap memory would help to share cached objects.
On Mon, Jun 27, 2016 at 11:14 AM Everett Anderson
wrote:
> Hi,
>
> We have a pipeline of components strung together via Airflow running on
> AWS. Some of them are implemented in Spark, but some aren't. Generally they
> can all talk to a J
Yes, I have just realized that the code I was reading was in the
org.apache.spark package, related to custom receiver implementations.
Thanks.
Paolo Patierno
Senior Software Engineer (IoT) @ Red Hat
Microsoft MVP on Windows Embedded & IoT
Microsoft Azure Advisor
Twitter : @ppatierno
Linkedin : pa
AFAICT Utils is private:
private[spark] object Utils extends Logging {
So is Logging:
private[spark] trait Logging {
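Since both are private[spark], a possible workaround is to use the plain JDK
and SLF4J equivalents instead of the Spark internals; a minimal sketch:

import java.nio.file.Files
import org.slf4j.LoggerFactory

// Instead of Spark's private Logging trait and Utils.createTempDir():
val log = LoggerFactory.getLogger(getClass)
val tempDir = Files.createTempDirectory("my-app").toFile
tempDir.deleteOnExit()
log.info("Created temp dir at {}", tempDir.getAbsolutePath)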
FYI
On Mon, Jun 27, 2016 at 8:20 AM, Paolo Patierno wrote:
> Hello,
>
> I'm trying to use the Utils.createTempDir() method importing
> org.apache.spark.util.Utils but the scal
Can you show the stack trace for the encoding error(s)?
Have you looked at the following test, which involves NestedArray of
primitive type?
./sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoderSuite.scala
Cheers
On Mon, Jun 27, 2016 at 8:50 AM, Daniel Imberman
wr
Hi,
We have a pipeline of components strung together via Airflow running on
AWS. Some of them are implemented in Spark, but some aren't. Generally they
can all talk to a JDBC/ODBC endpoint or read/write files from S3.
Ideally, we wouldn't suffer the I/O cost of writing all the data to HDFS or
S3
Hi all,
So I've been attempting to reformat a project I'm working on to use the
Dataset API and have been having some issues with encoding errors. From
what I've read, I think that I should be able to store Arrays of primitive
values in a dataset. However, the following class gives me encoding err
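For reference, a self-contained sketch (class and field names are made up) of
a top-level case class holding an Array of primitives, encoded with the
implicit product encoder:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Defined at the top level (not inside a method or inner class), which matters
// for the encoders.
case class Record(id: Long, values: Array[Int])

object ArrayEncodingExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("array-encoding").setMaster("local[*]"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    val ds = sqlContext.createDataset(Seq(Record(1L, Array(1, 2, 3)), Record(2L, Array(4))))
    ds.show()
    sc.stop()
  }
}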
Hi all,
Recently we have been trying to compare Spark SQL with Hive on MR, and I
have tried to run Spark SQL (Spark 1.6 RC2) with a script transformation; the
Spark job failed with an error message like:
16/06/26 11:01:28 INFO codegen.GenerateUnsafeProjection: Code
generated in 19.054534 ms
16
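For context, a script transformation in Spark SQL generally looks roughly like
the following (a simplified sketch through HiveContext; table, column and
script names are placeholders):

import org.apache.spark.sql.hive.HiveContext

// Assumption: an existing SparkContext `sc`; the script must be shipped with the job.
val hiveContext = new HiveContext(sc)
hiveContext.sql("ADD FILE /path/to/my_script.py")
val transformed = hiveContext.sql(
  """SELECT TRANSFORM (key, value)
    |USING 'python my_script.py'
    |AS (key, value)
    |FROM src
  """.stripMargin)
transformed.show()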
Hello,
I'm trying to use the Utils.createTempDir() method, importing
org.apache.spark.util.Utils, but the Scala compiler tells me:
object Utils in package util cannot be accessed in package org.apache.spark.util
I'm facing the same problem with Logging.
My sbt file has the following dependency
Hi All,
I have worked with Spark installed on a Hadoop cluster, but never with Spark
on a standalone cluster.
My question is: how do I set the number of partitions in Spark when it is
running on a standalone cluster?
With Spark on Hadoop I calculate my formula using the HDFS block sizes, but how
do I calculate it with
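A sketch of the knobs involved (the numbers and paths are placeholders; a
common rule of thumb is 2-3 tasks per core across the cluster, rather than
deriving the count from HDFS block sizes):

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("partitioning-example")
  .setMaster("spark://master-host:7077")             // placeholder standalone master
  .set("spark.default.parallelism", "48")            // default for shuffles / parallelize

val sc = new SparkContext(conf)
val data = sc.textFile("hdfs:///data/input", 48)     // minPartitions hint at read time
val repartitioned = data.repartition(96)             // or change partitioning afterwards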
This might not help, but I once tried Spark's Random Forest on a Kaggle
competition, and its predictions were terrible compared to R. So maybe you
should rather look for an external library instead of using MLlib's Random
Forest.
—
http://mariussoutier.com/blog
> On 27.06.2016, at 07:47, Ne
Hi,
I have been trying to run some algorithms I have implemented using GraphX
and Spark.
I have been running these algorithms locally by starting a local Spark
instance through IntelliJ (in Scala).
However, when I try to run them on a cluster with 10 machines I get
java.lang.ClassNotFoundException
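A common cause is that the jar containing your classes never reaches the
executors when moving from a local master to a real cluster. A sketch of
shipping it explicitly (paths and host are placeholders; spark-submit with an
assembly jar achieves the same):

import org.apache.spark.{SparkConf, SparkContext}

// Assumption: the assembly ("fat") jar below contains the GraphX algorithm classes.
val conf = new SparkConf()
  .setAppName("graphx-job")
  .setMaster("spark://master-host:7077")
  .setJars(Seq("target/scala-2.10/my-graphx-assembly.jar"))  // shipped to the executors
val sc = new SparkContext(conf)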
Can't you use `transform` instead of `foreachRDD`?
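Or, since two streams are involved, transformWith would give a per-batch
subtract. A minimal sketch (element type and names are placeholders):

import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.dstream.DStream

// Per-batch set difference: elements in the original stream but not in the modified one.
def subtractStreams(original: DStream[String], modified: DStream[String]): DStream[String] =
  original.transformWith(modified, (orig: RDD[String], mod: RDD[String]) => orig.subtract(mod))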
> On 15.06.2016, at 15:18, Matthias Niehoff
> wrote:
>
> Hi,
>
> i want to subtract 2 DStreams (based on the same Input Stream) to get all
> elements that exist in the original stream, but not in the modified stream
> (the modified Stream i
I'm using Spark SQL to build a fact table out of 5 dimensions. I'm facing a
performance issue (the job takes several hours to complete), and even after
exhaustive googling I see no solution. These are the settings I have tried
tuning, but with no success.
sqlContext.sql("set spark.sql.shuffle.partitions=10")
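The same setting can also be applied via setConf; for large dimension joins
the value usually needs to be well above 10 (the number below is just a
placeholder to tune):

// Equivalent to the SQL "set" statement above; tune the value to the data volume.
sqlContext.setConf("spark.sql.shuffle.partitions", "200")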
Hi,
I have done some extensive tests with Spark querying Hive tables.
It appears to me that Spark does not rely on the statistics that are collected
by Hive on, say, ORC tables. It seems that Spark uses its own optimization to
query the Hive tables irrespective of what Hive has collected by way of
statistic
I see the stack trace below when trying to run the beeline command. I'm using
JDK 7.
Anything wrong? Many thanks!
==
D:\spark\download\spark-1.6.1-bin-hadoop2.4>bin\beeline
Beeline version 1.6.1 by Apache Hive
Exception in thread "main" java.lang.NoSuchMethodError:
org.fusesource.jansi.interna
Hi all!
I am learning Spark SQL and window functions.
The behavior of the last() window function was unexpected for me in one
case (for someone without any previous experience with window functions).
I define my window specification as follows:
Window.partitionBy('transportType, 'route).orderBy
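In case it is the frame that is surprising: with an orderBy in the spec the
default frame ends at the current row, so last() returns the current row's
value. A sketch of making the frame explicit (the ordering and output column
names below are placeholders):

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.last

def addLastStop(df: DataFrame): DataFrame = {
  val w = Window
    .partitionBy("transportType", "route")
    .orderBy("departureTime")                       // placeholder ordering column
    .rowsBetween(Long.MinValue, Long.MaxValue)      // whole partition, not the default frame
  df.withColumn("last_stop", last("stop").over(w))  // "stop" / "last_stop" are placeholders
}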
OK. I see that, but the current (provided) implementations are very naive -
Sum, Count, Average. Let's take Max for example: I guess zero() would be set
to some value like Long.MIN_VALUE, but what if you trigger (I assume that in
the future Spark Streaming will support time-based triggers) for a result
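For what it's worth, a Max in the Dataset Aggregator style would be roughly
the following (a sketch; how it interacts with triggers is exactly the open
question):

import org.apache.spark.sql.expressions.Aggregator

// zero() as the guessed Long.MinValue; reduce/merge keep the running maximum.
object MaxAgg extends Aggregator[Long, Long, Long] {
  def zero: Long = Long.MinValue
  def reduce(buffer: Long, value: Long): Long = math.max(buffer, value)
  def merge(b1: Long, b2: Long): Long = math.max(b1, b2)
  def finish(reduction: Long): Long = reduction
}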