You are essentially doing document clustering. K-means will do it. You do have to
specify the number of clusters up front.
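For illustration, a rough (untested) Java sketch of the MLlib RDD-based KMeans API; docVectors is an assumed JavaRDD<Vector> of document feature vectors (e.g. TF-IDF), and k is the cluster count you have to pick in advance:

    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.mllib.clustering.KMeans;
    import org.apache.spark.mllib.clustering.KMeansModel;
    import org.apache.spark.mllib.linalg.Vector;

    // docVectors: assumed JavaRDD<Vector> of document feature vectors
    int k = 10;              // number of clusters, fixed up front
    int maxIterations = 20;
    KMeansModel model = KMeans.train(docVectors.rdd(), k, maxIterations);
    int clusterId = model.predict(docVectors.first());  // assign a document to a cluster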
From: "Donni Khan"
mailto:prince.don...@googlemail.com>>
Date: Monday, November 27, 2017 at 7:27:33
Thanks. I wonder why this is not widely reported in the user forum. The REPL
shell is basically broken in 1.5.0 and 1.5.1.
-Yao
From: Ted Yu [mailto:yuzhih...@gmail.com]
Sent: Sunday, October 25, 2015 12:01 PM
To: Ge, Yao (Y.)
Cc: user
Subject: Re: Spark scala REPL - Unable to create sqlContext
I have not been able to run spark-shell in yarn-cluster mode since 1.5.0 due to
the same issue described by [SPARK-9776]. Did this pull request fix the issue?
https://github.com/apache/spark/pull/8947
I still have the same problem with 1.5.1 (I am running on HDP 2.2.6 with Hadoop
2.6)
Thanks.
-Y
I'm managing Spark Streaming applications which run on Cloud Dataproc
(https://cloud.google.com/dataproc/). Spark Streaming applications running
on a Cloud Dataproc cluster seem to run in client mode on YARN.
Some of my applications sometimes stop due to application failures.
I'd like YARN to
When I access the following URL, I often get a 404 error and I cannot get the
POM file of "spark-streaming_2.10-1.5.0.pom".
http://central.maven.org/maven2/org/apache/spark/spark-streaming_2.10/1.5.0/spark-streaming_2.10-1.5.0.pom
Are there any problems with the Maven Central repository? Are there any
Thank you for your reply.
I'm sorry for the slow confirmation.
I'll try tuning 'spark.yarn.executor.memoryOverhead'.
Thanks,
Yuichiro Sakamoto
On 2015/03/25 0:56, Sandy Ryza wrote:
Hi Yuichiro,
The way to avoid this is to boost spark.yarn.executor.memoryOverhead until the
executors have enough memory.
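For reference, an untested sketch of one way to raise that setting from a Java driver (the values are placeholders; the same property can also be passed as a --conf on spark-submit):

    import org.apache.spark.SparkConf;

    SparkConf conf = new SparkConf()
        .setAppName("my-yarn-job")                          // placeholder app name
        .set("spark.executor.memory", "4g")                 // placeholder heap size
        .set("spark.yarn.executor.memoryOverhead", "1024"); // extra YARN container memory per executor, in MB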
Hello.
I tried `count()`, then `userJavaRDD` and `productJavaRDD` were cached,
and the speed became faster.
Thank you.
On 2015/03/10 4:05, Xiangrui Meng wrote:
cache() is lazy. The data is stored into memory after the first time
it gets materialized. So the first time you call `predict` after
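In other words, something like this untested sketch (the path and variable names are made up, and sc is an existing JavaSparkContext): cache() alone only marks the RDD, and the first action is what actually populates the cache:

    JavaRDD<String> data = sc.textFile("hdfs:///path/to/data");  // hypothetical input
    data.cache();           // lazy: only marks the RDD for caching
    long n = data.count();  // first action: computes the RDD and stores it in memory
    // later passes over `data` (e.g. inside predict) now read the cached copy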
Sent: Thursday, January 29, 2015 8:02 PM
To: Sun, Vincent Y
Cc: user@spark.apache.org
Subject: Re: Connecting Cassandra by unknown host
Hi,
I am no expert but have a small application working with Spark and Cassandra.
I faced these issues when we were deploying our cluster on EC2 instances with
some machines on public
Thanks. The data is there; I have checked the row count and dumped it to a file.
-Vincent
From: Ted Yu [mailto:yuzhih...@gmail.com]
Sent: Thursday, February 05, 2015 2:28 PM
To: Sun, Vincent Y
Cc: user
Subject: Re: get null pointer exception in newAPIHadoopRDD.map()
Is it possible that value.get
Can anyone provide example code for using categorical features in
DecisionTree?
Thanks!
-Yao
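Not verbatim from the docs, but a rough (untested) Java sketch of the Spark 1.x MLlib API: categorical features are declared through the categoricalFeaturesInfo map passed to DecisionTree.trainClassifier. Here trainingData is an assumed JavaRDD<LabeledPoint>, and the map says feature 0 has 3 categories and feature 4 has 10:

    import java.util.HashMap;
    import java.util.Map;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.mllib.regression.LabeledPoint;
    import org.apache.spark.mllib.tree.DecisionTree;
    import org.apache.spark.mllib.tree.model.DecisionTreeModel;

    Map<Integer, Integer> categoricalFeaturesInfo = new HashMap<Integer, Integer>();
    categoricalFeaturesInfo.put(0, 3);   // feature 0: categories 0, 1, 2
    categoricalFeaturesInfo.put(4, 10);  // feature 4: categories 0 .. 9

    int numClasses = 3;
    DecisionTreeModel model = DecisionTree.trainClassifier(
        trainingData, numClasses, categoricalFeaturesInfo,
        "gini", 5 /* maxDepth */, 32 /* maxBins */);

Note that maxBins has to be at least as large as the largest number of categories declared in the map.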
I am testing decision tree using iris.scale data set
(http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multiclass.html#iris)
In the data set there are three class labels: 1, 2, and 3. However, in the
following code I have to set numClasses = 4; otherwise I get an
ArrayIndexOutOfBoundsException at
(RemoteTestRunner.java:197)
From: Wang, Daoyuan [mailto:daoyuan.w...@intel.com]
Sent: Sunday, October 19, 2014 10:31 AM
To: Ge, Yao (Y.); user@spark.apache.org
Subject: RE: scala.MatchError: class java.sql.Timestamp
Can you provide the exception stack?
Thanks,
Daoyuan
From: Ge, Yao (Y.) [mailto:y...@ford.com]
I am working with Spark 1.1.0 and I believe Timestamp is a supported data type
for Spark SQL. However, I keep getting this MatchError for java.sql.Timestamp
when I try to use reflection to register a Java Bean with a Timestamp field.
Anything wrong with my code below?
public static
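For what it's worth, the bean shape that reflection-based registration expects generally looks like the illustrative sketch below (not the original poster's code): a Serializable class with a public no-arg constructor and getters/setters, here with a java.sql.Timestamp field. Whether that field type is handled by schema inference depends on the Spark version, which is what the MatchError suggests:

    import java.io.Serializable;
    import java.sql.Timestamp;

    // Illustrative bean only, not the original poster's code.
    public class Event implements Serializable {
        private String name;
        private Timestamp createdAt;

        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
        public Timestamp getCreatedAt() { return createdAt; }
        public void setCreatedAt(Timestamp createdAt) { this.createdAt = createdAt; }
    }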
I need help to better trap exceptions in my map functions. What is the best way
to catch an exception and provide some helpful diagnostic information, such as
the source of the input, e.g. the file name (and ideally the line number if I am
processing a text file)?
-Yao
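One common pattern, as an untested sketch (parseRecord, Record, the path, and sourceName are all hypothetical, and sc is an existing JavaSparkContext): wrap the per-record work in try/catch and attach the offending input to the rethrown exception. If you need the file name per record, sc.wholeTextFiles can pair each file path with its content.

    JavaRDD<String> lines = sc.textFile("hdfs:///input/data.txt");  // assumed input
    final String sourceName = "data.txt";                           // assumed; carry it however fits your job
    JavaRDD<Record> parsed = lines.map(line -> {
        try {
            return parseRecord(line);                               // hypothetical parser
        } catch (Exception e) {
            throw new RuntimeException(
                "Failed on input from " + sourceName + ": " + line, e);
        }
    });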
Thanks so much, Sean!
-Yao
-Original Message-
From: Sean Owen [mailto:so...@cloudera.com]
Sent: Thursday, October 09, 2014 3:04 AM
To: Ge, Yao (Y.)
Cc: user@spark.apache.org
Subject: Re: Dedup
I think the question is about copying the argument. If it's an immutable value
like String, yes
I need to do deduplication processing in Spark. The current plan is to generate
a tuple where the key is the dedup criterion and the value is the original input. I am
thinking of using reduceByKey to discard duplicate values. If I do that, can I
simply return the first argument or should I return a copy of
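As an untested sketch (dedupKey() is a hypothetical function that builds the dedup key, and sc is an existing JavaSparkContext): key each record, keep the first value per key with reduceByKey, then take the values. With an immutable value type like String, returning the first argument directly is fine; no copy is needed.

    import scala.Tuple2;
    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaRDD;

    JavaRDD<String> input = sc.textFile("hdfs:///input/records");      // assumed input
    JavaPairRDD<String, String> keyed =
        input.mapToPair(rec -> new Tuple2<>(dedupKey(rec), rec));      // dedupKey() is hypothetical
    JavaRDD<String> deduped = keyed.reduceByKey((a, b) -> a).values();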
Hello,
I'm currently using spark-core 1.1 and hbase 0.98.5, and I simply want to read
from HBase. The Java code is attached. However, the problem is that TableInputFormat
does not even exist in the hbase-client API. Is there any other way I can read from
HBase? Thanks
SparkConf sconf = new SparkConf().set
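As far as I know, in the 0.98 line TableInputFormat (org.apache.hadoop.hbase.mapreduce.TableInputFormat) ships in the hbase-server artifact rather than hbase-client, so adding that dependency is usually enough. A rough, untested sketch of the usual read pattern (the app and table names are placeholders):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.TableInputFormat;
    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaSparkContext;

    SparkConf sconf = new SparkConf().setAppName("hbase-read");        // placeholder app name
    JavaSparkContext sc = new JavaSparkContext(sconf);

    Configuration hbaseConf = HBaseConfiguration.create();
    hbaseConf.set(TableInputFormat.INPUT_TABLE, "my_table");           // placeholder table name

    JavaPairRDD<ImmutableBytesWritable, Result> rows = sc.newAPIHadoopRDD(
        hbaseConf, TableInputFormat.class,
        ImmutableBytesWritable.class, Result.class);
    System.out.println("row count: " + rows.count());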
The indices array will need to be in ascending order.
In many cases, it is probably easier to use the other two forms of the Vectors.sparse
functions if the indices and value positions are not naturally sorted.
-Yao
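Concretely, an untested sketch of the (size, indices, values) form, which expects the indices array to already be in increasing order:

    import org.apache.spark.mllib.linalg.Vector;
    import org.apache.spark.mllib.linalg.Vectors;

    int size = 10;
    int[] indices = {1, 4, 7};          // must be in ascending order
    double[] values = {0.5, 2.0, 1.5};  // values[i] pairs with indices[i]
    Vector v = Vectors.sparse(size, indices, values);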
From: Ge, Yao (Y.)
Sent: Monday, August 11, 2014 11:44 PM
To: 'u...@spark.incubator.apach
I am trying to train a KMeans model with sparse vectors in Spark 1.0.1.
When I run the training I got the following exception:
java.lang.IllegalArgumentException: requirement failed
at scala.Predef$.require(Predef.scala:221)
at
org.apache.spark.mllib.util.MLUtils$.