Yes, I mean local here. Thanks for pointing this out. Also thanks for
explaining the problem.
--
View this message in context:
http://apache-spark-developers-list.1001551.n3.nabble.com/java-lang-NoClassDefFoundError-is-this-a-bug-tp18972p19011.html
Sent from the Apache Spark Developers List
Did you try the proposed fix? Would be good to know whether it fixes the
issue.
On Thu, Sep 22, 2016 at 2:49 PM, Asher Krim wrote:
> Does anyone know what the status of SPARK-15717 is? It's a simple enough
> looking PR, but there has been no activity on it since June 16th.
>
Does anyone know what the status of SPARK-15717 is? It's a simple enough
looking PR, but there has been no activity on it since June 16th.
I believe that we are hitting that bug with checkpointed distributed LDA.
It's a blocker for us and we would really appreciate getting it fixed.
Jira:
Hash codes should try to avoid collisions of objects that are not
equal. Integer overflowing is not an issue by itself
On Wed, Sep 21, 2016 at 10:49 PM, WangJianfei wrote:
> Thank you very much, sir! But what I want to know is whether the hashCode
> overflow will
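The point above — that integer overflow inside a hashCode is harmless as long as unequal objects tend to get different codes — can be simulated outside the JVM. The sketch below is a pure-Python illustration (the field values are made up); it mimics JVM 32-bit Int wrap-around by masking:

```python
# Simulate JVM 32-bit Int wrap-around. Java/Scala hashCode arithmetic
# silently overflows, which is fine: the hash contract only requires
# equal objects to produce equal codes, not that arithmetic never wraps.

def to_int32(n):
    """Wrap an integer into the signed 32-bit range, like a JVM Int."""
    n &= 0xFFFFFFFF
    return n - 0x100000000 if n >= 0x80000000 else n

def jvm_hash(fields):
    """31-based polynomial hash, the idiom used by java.util.Arrays.hashCode."""
    h = 1
    for f in fields:
        h = to_int32(31 * h + f)
    return h

# The intermediate product overflows 32 bits, yet equal inputs still
# map to equal codes, and the result stays a valid signed 32-bit Int:
a = jvm_hash([2_000_000_000, 7])
b = jvm_hash([2_000_000_000, 7])
print(a == b)               # True: overflow does not break the contract
print(-2**31 <= a < 2**31)  # True: wrapped back into Int range
```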
I looked into this and found the problem. Will send a PR now to fix this.
If you are curious about what is happening here: When we build the
docs separately we don't have the JAR files from the Spark build in
the same tree. We added a new set of docs recently in SparkR called an
R vignette that
FWIW it worked for me, but I may not be executing the same thing. I
was running the commands given in R/DOCUMENTATION.md
It succeeded for me in creating the vignette, on branch-2.0.
Maybe it's a version or library issue? Which version of R do you have
installed, and are you up to date with packages like
Hi,
I have a Spark resource scheduling order question when I read this code:
github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/master/Master.scala
In the schedule() function, Spark starts drivers first, then starts executors.
I'm wondering why we schedule in this order?
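One plausible reading of that ordering: executors are requested by applications, and a standalone application only registers with the master once its driver is running, so there is nothing to hand executors to until drivers are placed. The sketch below is a simplified Python model of that two-phase loop (illustration only, not Spark's actual Master.scala code; all field names are made up):

```python
# Simplified model of a two-phase scheduling pass: place waiting
# drivers on workers first, then hand out executors to registered
# applications. Not Spark's real implementation.

def schedule(waiting_drivers, workers, waiting_apps):
    launched = []
    # Phase 1: launch drivers on workers with enough free cores.
    for driver in list(waiting_drivers):
        for w in workers:
            if w["free_cores"] >= driver["cores"]:
                w["free_cores"] -= driver["cores"]
                launched.append(("driver", driver["id"], w["id"]))
                waiting_drivers.remove(driver)
                break
    # Phase 2: only now allocate executors, from whatever cores remain.
    for app in waiting_apps:
        for w in workers:
            if w["free_cores"] >= app["cores_per_executor"]:
                w["free_cores"] -= app["cores_per_executor"]
                launched.append(("executor", app["id"], w["id"]))
    return launched

workers = [{"id": "w1", "free_cores": 4}]
drivers = [{"id": "d1", "cores": 1}]
apps = [{"id": "a1", "cores_per_executor": 2}]
print(schedule(drivers, workers, apps))
# [('driver', 'd1', 'w1'), ('executor', 'a1', 'w1')]
```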
Hi,
I've just discovered* that I can SerDe my case classes. What a nice
feature which I can use in spark-shell, too! Thanks a lot for offering
me so much fun!
What I don't really like about the code is the following part (esp.
that it conflicts with the implicit for Column):
import
I am planning to write a thesis on certain aspects (i.e., testing, performance
optimisation, security) of Apache Spark. I need to study some projects that
are based on Apache Spark and are available as open source.
If you know any such project (an open-source Spark-based project), please
share it
There can be just one published version of the Spark artifacts and they
have to depend on something, though in truth they'd be binary-compatible
with anything 2.2+. So you merely manage the dependency versions up to the
desired version in your .
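The build-file detail is elided above, but pinning the Spark artifact version in a dependency-managed build typically looks like the hypothetical Maven snippet below (2.2.0 is only an example version, chosen because the message mentions binary compatibility with anything 2.2+):

```xml
<!-- Hypothetical Maven snippet: pin the Spark version your application
     compiles against; "provided" keeps it off the runtime classpath
     when the cluster supplies its own Spark. -->
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.11</artifactId>
  <version>2.2.0</version>
  <scope>provided</scope>
</dependency>
```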
On Thu, Sep 22, 2016 at 7:05 AM, Olivier Girardot wrote:
You should also take into account that Spark has different options for
representing data in memory, such as Java serialized objects, Kryo serialized
objects, Tungsten (columnar, optionally compressed), etc. The Tungsten format
depends heavily on the underlying data and sorting, especially if compressed.
Then,
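For reference, the RDD serializer choice mentioned above is a configuration setting; the spark-defaults.conf fragment below is an illustrative sketch (the buffer size is an arbitrary example value), and DataFrames use Tungsten's binary format regardless of it:

```
# spark-defaults.conf (illustrative): use Kryo instead of Java
# serialization for RDD data. DataFrames/Datasets use Tungsten's
# binary columnar format independently of this setting.
spark.serializer                  org.apache.spark.serializer.KryoSerializer
spark.kryoserializer.buffer.max   64m
```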
zipWithIndex is fine. It will give you unique row IDs across your various
partitions.
You can also use zipWithUniqueId, which saves the extra job fired by
zipWithIndex. However, there are some differences in how indexes are
assigned to rows. You can read more about the two APIs in the
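The difference between the two id schemes can be modeled without a cluster. The pure-Python sketch below mirrors the assignment logic: zipWithIndex produces contiguous global indexes, which is why it needs an extra job to learn each partition's size first, while zipWithUniqueId computes k * numPartitions + partitionIndex locally (ids are unique but not contiguous):

```python
# Pure-Python model of the two RDD id-assignment schemes.

def zip_with_index(partitions):
    """Contiguous ids 0..n-1 across partitions, like RDD.zipWithIndex.
    Needs every earlier partition's size -- the 'extra job'."""
    out, offset = [], 0
    for part in partitions:
        out.append([(x, offset + k) for k, x in enumerate(part)])
        offset += len(part)
    return out

def zip_with_unique_id(partitions):
    """Unique but non-contiguous ids, like RDD.zipWithUniqueId:
    the k-th item of partition p gets k * numPartitions + p,
    computable with no communication between partitions."""
    n = len(partitions)
    return [[(x, k * n + p) for k, x in enumerate(part)]
            for p, part in enumerate(partitions)]

parts = [["a", "b"], ["c"], ["d", "e"]]
print(zip_with_index(parts))
# [[('a', 0), ('b', 1)], [('c', 2)], [('d', 3), ('e', 4)]]
print(zip_with_unique_id(parts))
# [[('a', 0), ('b', 3)], [('c', 1)], [('d', 2), ('e', 5)]]
```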
I'm working on packaging the 2.0.1 RC but encountered a problem: the R doc
fails to build. Can somebody take a look at the issue ASAP?
** knitting documentation of write.parquet
** knitting documentation of write.text
** knitting documentation of year
~/workspace/spark-release-docs/spark/R
I am working on profiling TPCH queries for Spark 2.0. I see a lot of
temporary object creation (sometimes as large as the data itself), which
is justified by the kind of processing Spark does. But from a production
perspective, is there a guideline on how much memory should be allocated
for
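As a reference point for the sizing question above, Spark 2.0's unified memory model reserves about 300MB of heap and then splits spark.memory.fraction of the rest between execution and storage. The spark-defaults.conf fragment below is only an illustrative starting point (the executor size is an arbitrary example; 0.6 and 0.5 are the documented defaults), not a tuning recommendation:

```
# Illustrative starting point; actual sizing depends on the workload.
spark.executor.memory         8g
spark.memory.fraction         0.6
spark.memory.storageFraction  0.5
```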