And it is a NoSuchMethodError, not a ClassNotFoundException.
And by default, I think Spark is only compiled against Hadoop 2.2?
For this issue itself, I just checked the latest Spark (1.3.0); its version can
work (because it is packaged with a newer version of httpclient, and I can see the
method is
See https://issues.apache.org/jira/browse/SPARK-6351
~ Jonathan
From: Shuai Zheng szheng.c...@gmail.com
Date: Monday, March 16, 2015 at 11:46 AM
To: user@spark.apache.org
Subject:
I'm attempting to use the Spark Kinesis Connector, so I've added the following
dependency in my build.sbt:
libraryDependencies += "org.apache.spark" %% "spark-streaming-kinesis-asl" % "1.3.0"
My app works fine with sbt run, but I can't seem to get sbt assembly to
work without failing with different
You can compute the standard deviations of the training data using
Statistics.colStats and then compare them with model coefficients to
compute feature importance. -Xiangrui
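A minimal sketch of that idea, assuming an RDD[LabeledPoint] of training data and
the weight array of an already fitted linear model (featureImportance, data, and
weights are placeholder names, not from the thread):

import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.stat.Statistics
import org.apache.spark.rdd.RDD

def featureImportance(data: RDD[LabeledPoint], weights: Array[Double]): Array[Double] = {
  // Column statistics (mean, variance, ...) over the feature vectors.
  val summary = Statistics.colStats(data.map(_.features))
  val stdDevs = summary.variance.toArray.map(math.sqrt)
  // Scale each coefficient by its feature's standard deviation.
  weights.zip(stdDevs).map { case (w, sd) => math.abs(w) * sd }
}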
On Fri, Mar 13, 2015 at 11:35 AM, Natalia Connolly
natalia.v.conno...@gmail.com wrote:
Hello,
While running an
Here's my use case:
I read an array into an RDD and I use a hash partitioner to partition the RDD.
This is the array type: Array[(String, Iterable[(Long, Int)])]
topK: Array[(String, Iterable[(Long, Int)])] = ...
import org.apache.spark.HashPartitioner
val hashPartitioner = new HashPartitioner(10)
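A hedged sketch of where this usually goes next, assuming sc is the SparkContext
(as in spark-shell):

// Parallelize the array into a pair RDD, then repartition it by key.
val topKRdd = sc.parallelize(topK)
val partitioned = topKRdd.partitionBy(hashPartitioner)
// All pairs with the same String key now land in the same one of the 10 partitions.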
Actually, they should be INFO or DEBUG. Line search steps are
expected. You can configure log4j.properties to ignore those. A better
solution would be reporting this at
https://github.com/scalanlp/breeze/issues -Xiangrui
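A hedged sketch of such a log4j.properties entry; the exact logger name is an
assumption (match it to the class names appearing in your log output):

# Raise the threshold for breeze's optimizer so line-search messages are dropped
log4j.logger.breeze.optimize=ERROR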
On Thu, Mar 12, 2015 at 5:46 PM, cjwang c...@cjwang.us wrote:
I am running
Try this:
val ratings = purchase.map { line =>
  line.split(',') match { case Array(user, item, rate) =>
    (user.toInt, item.toInt, rate.toFloat)
  }
}.toDF("user", "item", "rate")
Doc for DataFrames:
http://spark.apache.org/docs/latest/sql-programming-guide.html
-Xiangrui
On Mon, Mar 16, 2015 at 9:08 AM,
Hi All,
I just upgraded the system to use version 1.3.0, but then
sqlContext.parquetFile doesn't work with s3n. I have tested the same code with
1.2.1 and it works.
A simple test running in spark-shell:
val parquetFile = sqlContext.parquetFile("s3n:///test/2.parq")
I see, but this is really a… big issue. Is there any way for me to work around it? I tried
to set fs.default.name = s3n, but it looks like it doesn't work.
I must upgrade to 1.3.0 because I face a package incompatibility issue in
1.2.1, and if I must patch something, I would rather go with the latest version.
Hi,
You're right; that is, GraphX is already included in the default Spark
package.
As a first step, 'Analytics' seems to be suitable for your objective.
# ./bin/run-example graphx.Analytics pagerank graph-file
On Tue, Mar 17, 2015 at 2:21 AM, Khaled Ammar khaled.am...@gmail.com
wrote:
We will be including this fix in Spark 1.3.1 which we hope to make in the
next week or so.
On Mon, Mar 16, 2015 at 12:01 PM, Shuai Zheng szheng.c...@gmail.com wrote:
I see, but this is really a… big issue. Is there any way for me to work around it? I
tried to set fs.default.name = s3n, but looks like it
If you are creating an assembly, make sure spark-streaming is marked as
provided. spark-streaming is already part of the Spark installation, so it will
be present at run time. That might solve some of these, maybe!?
TD
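A one-line sketch of what that looks like in build.sbt (the version shown is the
one used elsewhere in this thread; adjust to match yours):

libraryDependencies += "org.apache.spark" %% "spark-streaming" % "1.3.0" % "provided"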
On Mon, Mar 16, 2015 at 11:30 AM, Kelly, Jonathan jonat...@amazon.com
wrote:
I just used random numbers. (My ML lib was spark-mllib_2.10-1.2.1.) Please see
the attached log. In the middle of the log, I dumped the data set before feeding
it into LogisticRegressionWithLBFGS. The first column false/true was the label
(attribute “a”), and columns 2-5 (attributes “x”, “y”, “z”, and
Can you give us your SBT project? Minus the source code if you don't wish
to expose it.
TD
On Mon, Mar 16, 2015 at 12:54 PM, Kelly, Jonathan jonat...@amazon.com
wrote:
Yes, I do have the following dependencies marked as provided:
libraryDependencies += "org.apache.spark" %% "spark-core" %
Hi, were you ever able to determine a satisfactory approach for this problem?
I have a similar situation and would prefer to execute the job directly from
Java code within my JMS listener and/or servlet container.
Yes, I do have the following dependencies marked as provided:
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.3.0" % "provided"
libraryDependencies += "org.apache.spark" %% "spark-hive" % "1.3.0" % "provided"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "1.3.0" % "provided"
Here's build.sbt, minus blank lines for brevity, and without any of the
exclude/excludeAll options that I've attempted:
name := "spark-sandbox"
version := "1.0"
scalaVersion := "2.10.4"
resolvers += "Akka Repository" at "http://repo.akka.io/releases/"
run in Compile <<= Defaults.runTask(fullClasspath in
Hi,
I am trying my first steps with Spark, but I have problems accessing Spark
running on my Linux server from my Mac.
I start Spark with sbin/start-all.sh.
When I open the website on port 8080 I see that everything is running and that I
should be able to access Spark at port 7077, but this doesn't work.
I scanned the
Yes, auto-restart is enabled in my low-level consumer when some unhandled
exception comes up.
Also, if you look at KafkaConsumer.java, for some cases (like broker failure,
Kafka leader changes, etc.) it can even refresh the Consumer (the
Coordinator which talks to a Leader), which will
I used local[*]. The CPU hits about 80% when there are active jobs, then
it drops to about 13% and hangs for a very long time.
Thanks,
David
On Mon, 16 Mar 2015 17:46 Akhil Das ak...@sigmoidanalytics.com wrote:
How many threads are you allocating while creating the SparkContext? Like
local[4]?
Hi,
The current implementation of the map function in Spark Streaming looks as below.
def map[U: ClassTag](mapFunc: T => U): DStream[U] = {
  new MappedDStream(this, context.sparkContext.clean(mapFunc))
}
It creates an instance of MappedDStream which is a subclass of DStream.
The same function can
Try setting SPARK_MASTER_IP, and you need to use the Spark URI
(spark://yourlinuxhost:7077) as displayed in the top left corner of the Spark
UI (running on port 8080). Also, when you are connecting from your Mac, make
sure your network/firewall isn't blocking any ports between the two machines.
Thanks
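A hedged sketch of the application side, assuming the master URI displayed in
the UI is spark://yourlinuxhost:7077 (the host name is the placeholder from the
advice above, and the app name is hypothetical):

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("remote-master-test")         // hypothetical app name
  .setMaster("spark://yourlinuxhost:7077")  // exact URI from the master web UI
val sc = new SparkContext(conf)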
Open sbin/slaves.sh and sbin/spark-daemon.sh, look for the ssh command,
pass the port argument to that command (in your case *-p 58518*), save
those files, and do a start-all.sh :)
Thanks
Best Regards
On Mon, Mar 16, 2015 at 1:37 PM, ZhuGe t...@outlook.com wrote:
Hi all:
I am new to spark
I have a 30GB gzip file (originally a text file where each line
represents a text document) in HDFS, and Spark 1.2.0 under a YARN cluster with 3
worker nodes with 64GB RAM and 4 cores on each node.
The replication factor for my file is 3.
I tried to implement a simple PySpark script to parse this file
I have checked Dibyendu's code, and it looks like his implementation has an
auto-restart mechanism:
I set it in code, not by configuration. I submit my jar file to local. I am
working in my development environment.
On Mon, 16 Mar 2015 18:28 Akhil Das ak...@sigmoidanalytics.com wrote:
How are you setting it? and how are you submitting the job?
Thanks
Best Regards
On Mon, Mar 16, 2015 at
By default spark.executor.memory is set to 512m. I'm assuming that since you are
submitting the job using spark-submit, it is not able to override the
value because you are running in local mode. Can you try it without using
spark-submit, as a standalone project?
Thanks
Best Regards
On Mon, Mar 16,
I think these two ways are both OK for writing a streaming job; `transform`
is a more general way to transform from one DStream to another when
there's no related DStream API (but there is a related RDD API). But using map may be
more straightforward and easier to understand.
Thanks
Jerry
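A hedged sketch of the two equivalent styles Jerry describes, assuming a
hypothetical lines: DStream[String]:

// 1. The DStream API directly.
val upper1 = lines.map(_.toUpperCase)
// 2. transform drops down to the RDD API; handy when an operation exists on
//    RDDs but has no DStream counterpart.
val upper2 = lines.transform(rdd => rdd.map(_.toUpperCase))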
Hi Akhil,
Yes, you are right. If I run the program from the IDE as a normal Java program,
the executor's memory is increased...but not to 2048m; it is set to
6.7GB... Looks like there's some formula to calculate this value.
Thanks,
David
On Mon, Mar 16, 2015 at 7:36 PM Akhil Das
1. I don't think textFile is capable of unpacking a .gz file. You need to
use hadoopFile or newAPIHadoopFile for this.
2. Instead of map, do a mapPartitions.
3. You need to open the driver UI and see what's really taking time. If
that is running on a remote machine and you are not able to access
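A hedged illustration of point 2, in Scala for brevity, assuming a hypothetical
docs: RDD[String]; the point is that per-partition setup happens once per
partition instead of once per record:

val tokenCounts = docs.mapPartitions { iter =>
  // compiled once per partition rather than once per line
  val splitter = java.util.regex.Pattern.compile("\\s+")
  iter.map(line => splitter.split(line).length)
}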
How much memory do you have on your machine? I think the default value is
0.6 of spark.executor.memory, as you can see from here:
http://spark.apache.org/docs/1.2.1/configuration.html#execution-behavior.
Thanks
Best Regards
On Mon, Mar 16, 2015 at 2:26 PM, Xi Shen davidshe...@gmail.com wrote:
I set spark.executor.memory to 2048m. If the executor storage memory is
0.6 of executor memory, it should be 2g * 0.6 = 1.2g.
My machine has 56GB memory, and 0.6 of that should be 33.6G...I hate math xD
On Mon, Mar 16, 2015 at 7:59 PM Akhil Das ak...@sigmoidanalytics.com
wrote:
How much
Strange, even I'm having it while running in local mode.
I set it as .set("spark.executor.memory", "1g")
Thanks
Best Regards
On Mon, Mar 16, 2015 at 2:43 PM, Xi Shen davidshe...@gmail.com wrote:
I set spark.executor.memory to 2048m. If the executor storage memory
is 0.6
Hi,
I created a JIRA and a PR for supporting an S3-friendly output committer for
saveAsParquetFile:
https://issues.apache.org/jira/browse/SPARK-6352
https://github.com/apache/spark/pull/5042
My approach is to add a DirectParquetOutputCommitter class in the spark-sql
package and use a boolean config