Thanks, Ted.
Util.Connection.close() should be called only once, so it can NOT be in a
map function
val result = rdd.map(line => {
  val table = Util.Connection.getTable(user)
  ...
  Util.Connection.close()
})
As you mentioned:
Calling table.close() is the recommended approach.
I have a Spark cluster on Mesos, and when I run long-running GraphX processing
I receive a lot of the following two errors, and one by one my slaves stop
doing any work for the process until it's idle. Any idea what is happening?
First type of error message:
INFO SendingConnection: Initiating
I may have misunderstood your point.
val result = rdd.map(line => {
  val table = Util.Connection.getTable(user)
  ...
  table.close()
})
Did you mean this is enough, and there’s no need to call
Util.Connection.close(),
or HConnectionManager.deleteAllConnections()?
Where is the documentation that
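For what it's worth, a minimal sketch of the pattern I understand from this thread: keep the shared connection alive for the life of the JVM and only open/close the lightweight HTable handle, here once per partition. Util.Connection stands for your own wrapper around a long-lived HConnection, and the table name and processing are illustrative:

val result = rdd.mapPartitions { lines =>
  val table = Util.Connection.getTable("user")
  // materialize the partition's results before closing the table handle
  val out = lines.map { line =>
    // ... per-record gets/puts against `table` ...
    line
  }.toList
  table.close()            // closes only the table handle, not the shared connection
  out.iterator
}
// Util.Connection.close() / HConnectionManager.deleteAllConnections(), if needed at all,
// would run once per JVM, e.g. from a shutdown hook.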
Indeed it was a problem on the executor side… I have to figure out how to fix
it now ;-)
Thanks!
Mehdi
From: Yana Kadiyska [mailto:yana.kadiy...@gmail.com]
Sent: Wednesday, 15 October 2014 18:32
To: Mehdi Singer
Cc: user@spark.apache.org
Subject: Re: Problem executing Spark via JBoss
Do you create the application in the context of the web service call? Then the
application may be killed after you return from the web service call.
However, we would need to see what you do during the web service call, how
you invoke the spark application
On 16 Oct 2014 08:50, Mehdi Singer
Thanks, Soumitra Kumar,
I didn’t know why you put hbase-protocol.jar in SPARK_CLASSPATH while adding
hbase-protocol.jar, hbase-common.jar, hbase-client.jar, and htrace-core.jar in
--jars, but it did work.
Actually, I put all these four jars in SPARK_CLASSPATH along with HBase conf
directory.
I solved my problem. It was due to a library version used by Spark
(snappy-java) that is apparently not compatible with JBoss... I updated the lib
version and it's working now.
Jörn, this is what I'm doing in my web service call:
- Create the Spark context
- Create my JavaJdbcRDD
- Count the
Wow, it really was that easy! The implicit joining works a treat.
Many thanks,
Jon
On 13 October 2014 22:58, Stephen Boesch java...@gmail.com wrote:
is the following what you are looking for?
scala> sc.parallelize(myMap.map { case (k, v) => (k, v) }.toSeq)
res2:
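A minimal sketch of the same conversion, assuming a small driver-side Map (names are illustrative); note that myMap.toSeq alone is enough here, so the map { case (k, v) => (k, v) } step is optional:

val myMap = Map("a" -> 1, "b" -> 2)
val rdd = sc.parallelize(myMap.toSeq)   // RDD[(String, Int)]
rdd.collect().foreach(println)          // (a,1) (b,2)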
Hi,
I have created a JIRA
(SPARK-3967: https://issues.apache.org/jira/browse/SPARK-3967), can you please
confirm that you are hit by the same issue?
Thanks,
Christophe.
On 15/10/2014 09:49, Christophe Préaud wrote:
Hi Jimmy,
Did you try my patch?
The problem on my side was that the
Hi,
I have been able to reproduce this problem on our dev environment, I am fairly
sure now that it is indeed a bug.
As a consequence, I have created a JIRA
(SPARK-3967: https://issues.apache.org/jira/browse/SPARK-3967) for this issue,
which is triggered when yarn.nodemanager.local-dirs (not
Hi,
I don't think anybody answered this question...
fintis wrote
How do I match the principal components to the actual features since there
is some sorting?
Would anybody be able to shed a little light on it since I too am struggling
with this?
Many thanks!!
I'm running Spark1.1.0 on YARN(Hadoop-2.4.1) and try to use
spark.yarn.appMasterEnv.* to execute some scripts.
In spark-defaults.conf, I set environment variables like this, but this
description is redundant.
spark.yarn.appMasterEnv.SCRIPT_DIR /home/kuromtsu/spark-1.1.0/scripts
you can do score.print to see the values, and if you want to do some
operations with these values then you have to do a map on that dstream
(score.map(myInt => myInt + 5))
Thanks
Best Regards
On Thu, Oct 16, 2014 at 5:19 AM, SK skrishna...@gmail.com wrote:
Hi,
As a result of a reduction
Hello Owen,
I used the Maven build to make use of the Guava collections package renaming;
sbt keeps the old Guava package names intact...
Finally it turned out that I have just upgraded to the latest version of
spark-cassandra-connector: 1.1.0-alpha3 and when I step back to
1.1.0-alpha2 everything
Hi,
I am writing to know if there is any performance data on GraphX? I run 4
workers in AWS (c3.xlarge), 4g memory per executor, 85,331,846 edges from
http://socialcomputing.asu.edu/pages/datasets. For the PageRank algorithm,
the job can not be
Support for dynamic partitioning is available in master and will be part of
Spark 1.2
On Thu, Oct 16, 2014 at 1:08 AM, Banias H banias4sp...@gmail.com wrote:
I got tipped by an expert that the "Unsupported language
features in query" error that I had was due to the fact that SparkSQL does not
Hello all,
I am trying to unit test my classes involved in my Spark job. I am trying to
mock out the Spark classes (like SparkContext and Broadcast) so that I can
unit test my classes in isolation. However I have realised that these are
classes instead of traits. My first question is why?
It is
The warehouse location needs to be specified before the HiveContext
initialization; you can set it via:
./bin/spark-sql --hiveconf hive.metastore.warehouse.dir=/home/spark/hive/warehouse
On 10/15/14 8:55 PM, Hao Ren wrote:
Hi,
The following query in sparkSQL 1.1.0 CLI doesn't work.
On 10/16/14 12:44 PM, neeraj wrote:
I would like to reiterate that I don't have Hive installed on the Hadoop
cluster.
I have some queries on following comment from Cheng Lian-2:
The Thrift server is used to interact with existing Hive data, and thus
needs Hive Metastore to access Hive catalog.
Why do you need to convert a JavaSchemaRDD to SchemaRDD? Are you trying
to use some API that doesn't exist in JavaSchemaRDD?
On 10/15/14 5:50 PM, Earthson wrote:
I don't know why the JavaSchemaRDD.baseSchemaRDD is private[sql]. And I found
that DataTypeConversions is protected[sql].
Finally I
Hello Terry,
I guess you hit this bug https://issues.apache.org/jira/browse/SPARK-3559.
The list of needed column ids was messed up. Can you try the master branch
or apply the code change
https://github.com/apache/spark/commit/e10d71e7e58bf2ec0f1942cb2f0602396ab866b4
to
your 1.1 and see if the
We execute Spark jobs from a Play application but we don't use
spark-submit. I don't know if you really want to use spark-submit, but if
not you can just create a SparkContext programmatically in your app.
In development I typically run Spark locally. Creating the Spark context is
pretty trivial:
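Something along these lines (a minimal sketch; the app name and master URL are illustrative, not from the original mail):

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("my-web-app")
  .setMaster("local[2]")          // or "spark://master:7077" against a cluster
val sc = new SparkContext(conf)

println(sc.parallelize(1 to 100).count())   // quick smoke test
sc.stop()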
Mocking these things is difficult; executing your unit tests in a local
Spark context is preferred, as recommended in the programming guide
http://spark.apache.org/docs/latest/programming-guide.html#unit-testing.
I know this may not technically be a unit test, but it is hopefully close
enough.
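For illustration, a minimal sketch of the local-context approach; the use of ScalaTest and all the names here are assumptions, not from the original mail:

import org.apache.spark.SparkContext._
import org.apache.spark.{SparkConf, SparkContext}
import org.scalatest.{BeforeAndAfterAll, FunSuite}

class MyJobSuite extends FunSuite with BeforeAndAfterAll {
  private var sc: SparkContext = _

  override def beforeAll(): Unit = {
    sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("test"))
  }
  override def afterAll(): Unit = sc.stop()

  test("word count logic") {
    val counts = sc.parallelize(Seq("a", "b", "a")).map((_, 1)).reduceByKey(_ + _).collectAsMap()
    assert(counts("a") === 2)
  }
}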
Hi,
I'm exploring an exercise "Data Exploration using Spark SQL" from Spark Summit
2014. While running the command val wikiData =
sqlContext.parquetFile("data/wiki_parquet") I'm getting the following
output, which doesn't match the expected output.
Output I'm getting:
val wikiData1 =
I just want to pitch in and say that I ran into the same problem with
running with 64GB executors. For example, some of the tasks take 5 minutes
to execute, out of which 4 minutes are spent in GC. I'll try out smaller
executors.
On Mon, Oct 6, 2014 at 6:35 PM, Otis Gospodnetic
Hi,
Does Spark SQL have DDL, DML commands to be executed directly? If yes,
please share the link.
If No, please help me understand why is it not there?
Regards,
Neeraj
Which hbase release are you using ?
Let me refer to 0.94 code hbase.
Take a look at the following method
in src/main/java/org/apache/hadoop/hbase/client/HTable.java :
public void close() throws IOException {
  ...
  if (cleanupConnectionOnClose) {
    if (this.connection != null) {
1. I'm trying to use Spark SQL as data source.. is it possible?
2. Please share the link of ODBC/ JDBC drivers at databricks.. i'm not able
to find the same.
Hi,
Can anyone explain how things get captured in a closure when running through
the REPL. For example:
def foo(..) = { .. }
rdd.map(foo)
sometimes complains about classes not being serializable that are
completely unrelated to foo. This happens even when I write it like this:
object Foo {
def
On 10/16/14 10:48 PM, neeraj wrote:
1. I'm trying to use Spark SQL as data source.. is it possible?
Unfortunately Spark SQL ODBC/JDBC support is based on the Thrift
server, so at least you need HDFS and a working Hive Metastore instance
(used to persist catalogs) to make things work.
2.
Hi Neeraj,
The Spark Summit 2014 tutorial uses Spark 1.0. I guess you're using
Spark 1.1? Parquet support got polished quite a bit since then, and
changed the string representation of the query plan, but this output
should be OK :)
Cheng
On 10/16/14 10:45 PM, neeraj wrote:
Hi,
I'm
Hi,
I'm working on a problem where I'd like to sum items in an RDD *in order*
(approximately). I am currently trying to implement this using a fold, but
I'm having some issues because the sorting key of my data is not the same
as the folding key for my data. I have data that looks like this:
Hi,
I'm running into an error on Windows (x64, 8.1) running Spark 1.1.0 (pre-built
for Hadoop 2.4: spark-1.1.0-bin-hadoop2.4.tgz from
http://d3kbcqa49mib13.cloudfront.net/spark-1.1.0-bin-hadoop2.4.tgz)
with Java SE Version 8 Update 20 (build 1.8.0_20-b26); just getting started
with Spark.
When
What do you mean by "executed directly"?
Best Regards,
Yi Tian
tianyi.asiai...@gmail.com
On Oct 16, 2014, at 22:50, neeraj neeraj_gar...@infosys.com wrote:
Hi,
Does Spark SQL have DDL, DML commands to be executed directly? If yes,
please share the link.
If No, please help me
Hi, my programming model requires me to generate multiple RDDs for various
datasets across a single run and then run an action on each, e.g.
MyFunc myFunc = ... //It implements VoidFunction
//set some extra variables - all serializable
...
for (JavaRDD<String> rdd : rddList) {
...
I'm relatively new to Spark and have got a couple of questions:
*
I've got an IntelliJ SBT project that's using Spark Streaming with a
custom RabbitMQ receiver in the same project. When I run it against
local[2], all's well. When I put in spark://masterip:7077, I get a
ClassNotFoundException
Excuse me - the line inside the loop should read: rdd.foreach(myFunc) - not
sc.
Hi Michael,
I'm not sure I fully understood your question, but I think RDD.aggregate
can be helpful in your case. You can see it as a more general version of
fold.
Cheng
On 10/16/14 11:15 PM, Michael Misiewicz wrote:
Hi,
I'm working on a problem where I'd like to sum items in an RDD in
I guess you're referring to the simple SQL dialect recognized by the
SqlParser component.
Spark SQL supports most DDL and DML of Hive. But the simple SQL dialect
is still very limited. Usually it's used together with some Spark
application written in Java/Scala/Python. Within a Spark
You can first union them into a single RDD and then call foreach. In
Scala:
rddList.reduce(_.union(_)).foreach(myFunc)
For the serialization issue, I don’t have any clue unless more code can
be shared.
On 10/16/14 11:39 PM, soumya wrote:
Hi, my programming model requires me to
Great, it worked.
I don't have an answer for what is special about SPARK_CLASSPATH vs --jars; I
just found the working setting through trial and error.
- Original Message -
From: Fengyun RAO raofeng...@gmail.com
To: Soumitra Kumar kumar.soumi...@gmail.com
Cc: user@spark.apache.org,
Thanks for the suggestion! That does look really helpful, I see what you
mean about it being more general than fold. I think I will replace my fold
with aggregate - it should give me more control over the process.
I think the problem will still exist though - which is that I can't get the
correct
I note that one of the listed variants of aggregateByKey accepts a
partitioner as an argument:
def aggregateByKey[U](zeroValue: U, partitioner: Partitioner)(seqOp: (U, V)
⇒ U, combOp: (U, U) ⇒ U)(implicit arg0: ClassTag[U]): RDD[(K, U)]
Would it be possible to extract my sorted parent's
It's a bug, could you file a JIRA for this? Thanks!
Davies
On Thu, Oct 16, 2014 at 8:28 AM, Griffiths, Michael (NYC-RPM)
michael.griffi...@reprisemedia.com wrote:
Hi,
I’m running into an error on Windows (x64, 8.1) running Spark 1.1.0
(pre-built for Hadoop 2.4:
RDD.aggregate doesn’t require the RDD elements to be pairs, so you don’t
need to use user_id to be the key of the RDD. For example, you can use
an empty Map as the zero value of the aggregation. The key of the Map is
the user_id you extracted from each tuple, and the value is the
aggregated
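To make that concrete, a rough sketch assuming an existing sc and (user_id, value) tuples; the names and data are made up, only the aggregate call itself is the point:

val data = sc.parallelize(Seq(("u1", 1.0), ("u2", 2.0), ("u1", 3.0)))   // (user_id, value)

val sums: Map[String, Double] = data.aggregate(Map.empty[String, Double])(
  // seqOp: fold one element into the per-partition Map
  (acc, kv) => acc + (kv._1 -> (acc.getOrElse(kv._1, 0.0) + kv._2)),
  // combOp: merge two per-partition Maps
  (m1, m2) => m2.foldLeft(m1) { case (acc, (k, v)) => acc + (k -> (acc.getOrElse(k, 0.0) + v)) }
)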
Mohammed,
Jumping in for Daniel, we actually address the configuration issue by
pulling values from environment variables or command line options. Maybe
that may handle at least some of your needs.
For the akka issue, here is the akka version we include in build.sbt:
com.typesafe.akka %%
Hi Rafal,
Thanks for the explanation and solution! I need to write maybe 100 GB to
s3. I will try your way and see whether it works for me.
Thanks again!
On Wed, Oct 15, 2014 at 1:44 AM, Rafal Kwasny m...@entropy.be wrote:
Hi,
How large is the dataset you're saving into S3?
Actually saving
Hi,
I am trying to use the ALS.trainImplicit method in
pyspark.mllib.recommendation. However, it didn't work. So I tried to use the
example in the Python API documentation, such as:
r1 = (1, 1, 1.0)
r2 = (1, 2, 2.0)
r3 = (2, 1, 2.0)
ratings = sc.parallelize([r1, r2, r3])
model =
Does anyone know if there are Spark assemblies created and available for
download that have been built for CDH5 and YARN?
Thanks,
Philip
Sorry - I'll furnish some details below. However, union is not an option for
the business logic I have. The function will generate a specific file based
on a variable passed in as the setter for the function. This variable
changes with each RDD. I annotated the log line where the first run
Hi,
You just need to add list() in the sorted function.
For example:
map((lambda (x, y): (x, (list(y[0]), list(y[1])))),
    sorted(list(rdd1.cogroup(rdd2).collect())))
I think you just forgot the list()...
PS: your post has NOT been accepted by the mailing list yet.
Best
Gen
pm wrote
Hi ,
I actually only ran into this issue recently, after we upgraded to Spark
1.1. Within the REPL for Spark 1.0 everything works fine, but within the
REPL for 1.1 it does not. FYI I am also only doing simple regex matching
functions within an RDD... Now when I am running the same code as an App
everything
We integrated Spark into Play and use SparkSQL extensively on an ec2 spark
cluster on Hadoop hdfs 1.2.1 and tachyon 0.4.
Step 1: Create a play scala application as usual
Step 2. In Build.sbt put all your spark dependencies. What works for us is Play
2.2.3 Scala 2.10.4 Spark 1.1. We have Akka
hello... what is the best way to iterate through an rdd backward (last
element first, first element last)? thanks!
Thanks Akhil. I tried spark-submit and saw the same issue. I double checked
the versions and they look ok. Are you seeing any obvious issues?
sbt:
name := "Simple Project"
version := "1.1"
scalaVersion := "2.10.4"
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "1.1.0",
https://repository.cloudera.com/artifactory/cloudera-repos/org/apache/spark/spark-assembly_2.10/
?
I'm not sure why the 5.2 + 1.1 final artifacts don't show up there yet though.
On Thu, Oct 16, 2014 at 2:12 PM, Philip Ogren philip.og...@oracle.com wrote:
Does anyone know if there Spark
Can you try:
sbt:
name := "Simple Project"
version := "1.1"
scalaVersion := "2.10.4"
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "1.1.0",
  "org.apache.spark" %% "spark-streaming" % "1.1.0",
  "org.apache.spark" %% "spark-streaming-kafka" % "1.1.0"
)
Thanks
Best Regards
On
Since you're concerned with the particular ordering, you will need to
sort your RDD to ensure the ordering you have in mind. Simply reverse
the Ordering with Ordering.reverse() and sort by that instead, and
then use toLocalIterator() I suppose.
Depending on what you're really trying to achieve,
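For example, a minimal sketch assuming an RDD[Int] and an existing sc; using ascending = false is equivalent to sorting by the reversed Ordering:

val rdd = sc.parallelize(1 to 10)
rdd.sortBy(identity, ascending = false)   // sort by the reversed order
   .toLocalIterator                       // pull partitions to the driver one at a time
   .foreach(println)                      // 10, 9, 8, ...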
I tried the same data with scala. It works pretty well.
It seems that it is the problem of pyspark.
In the console, it shows the following logs:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/root/spark/python/pyspark/mllib/recommendation.py", line 76, in trainImplicit
It seems to be a bug. Could you create a JIRA for it? Thanks!
Davies
On Thu, Oct 16, 2014 at 12:27 PM, Gen gen.tan...@gmail.com wrote:
I tried the same data with scala. It works pretty well.
It seems that it is the problem of pyspark.
In the console, it shows the following logs:
Traceback (most
Hi,
I just wanted to say hi all to the Spark community. I'm developing some
stuff right now using Spark (we've started very recently). As the API
documentation of Spark is really, really good, I'd like to get deeper
knowledge of the internal stuff - you know, the goodies. Watching movies
from Spark
Just to have this clear, can you answer with quick yes or no:
Does it mean that when I create RDD from a file and I simply iterate
through it like this:
sc.textFile("some_text_file.txt").foreach(line => println(line))
then the actual lines might come in a different order than they are in the
file?
Nevermind, I've just run the code in the REPL. Indeed, if we do not sort,
then the order is totally random. Which actually makes sense if you think
about it.
On Thu, Oct 16, 2014 at 9:58 PM, Paweł Szulc paul.sz...@gmail.com wrote:
Just to have this clear, can you answer with quick yes or no:
Hello,
I am debugging my code to find out what else to cache.
Following is a line in log:
14/10/16 12:00:01 INFO TransformedDStream: Persisting RDD 6 for time
141348600 ms to StorageLevel(true, true, false, false, 1) at time
141348600 ms
Is there a way to name a DStream? RDD has a
Thanks, Suren and Raju.
Raju – if I remember correctly, the Play package command just creates a jar for
your app. That jar file will not include other dependencies, so it is not
really a full jar as you mentioned below. So how are you passing all the other
dependency jars to Spark? Can you share
Does anyone know anything re: this error? Thank you!
On Wed, Oct 15, 2014 at 3:38 PM, Jimmy Li jimmy...@bluelabs.com wrote:
Hi there, I'm running spark on ec2, and am running into an error there
that I don't get locally. Here's the error:
11335 [handle-read-write-executor-3] ERROR
Thanks for the response. Appreciate the help!
Burke
On Tue, Oct 14, 2014 at 3:00 PM, Xiangrui Meng men...@gmail.com wrote:
You cannot recover the document from the TF-IDF vector, because
HashingTF is not reversible. You can assign each document a unique ID,
and join back the result after
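A rough sketch of that ID idea, assuming an existing sc and (docId, tokens) input; the names, the toy data, and the use of zip (rather than an explicit join) are my own assumptions:

import org.apache.spark.SparkContext._
import org.apache.spark.mllib.feature.{HashingTF, IDF}
import org.apache.spark.mllib.linalg.Vector
import org.apache.spark.rdd.RDD

val docs: RDD[(Long, Seq[String])] = sc.parallelize(Seq(
  (1L, Seq("spark", "rdd", "spark")),
  (2L, Seq("hbase", "table"))
))

val tf = new HashingTF()
val tfVectors: RDD[(Long, Vector)] = docs.mapValues(tokens => tf.transform(tokens))

val idfModel = new IDF().fit(tfVectors.values)
// transform only maps each vector, so order and partitioning still line up with the keys
val tfidfById: RDD[(Long, Vector)] = tfVectors.keys.zip(idfModel.transform(tfVectors.values))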
Hi Philip,
The assemblies are part of the CDH distribution. You can get them here:
http://www.cloudera.com/content/cloudera/en/downloads/cdh/cdh-5-2-0.html
As of Spark 1.1 (and, thus, CDH 5.2), assemblies are not published to
maven repositories anymore (you can see commit [1] for details).
[1]
TL;DR - a spark SQL job fails with an OOM (Out of heap space) error. If
given --executor-memory values, it won't even start. Even (!) if the
values given ARE THE SAME AS THE DEFAULT.
Without --executor-memory:
14/10/16 17:14:58 INFO TaskSetManager: Serialized task 1.0:64 as 14710
bytes in 1
Same error. I saw someone reported the same issue, e.g.
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-streaming-kafka-error-td9106.html
Should I use sbt assembly? It failed for deduplicate though.
[error] (*:assembly) deduplicate: different file contents found in the
following:
On Thu, Oct 16, 2014 at 9:53 AM, Gen gen.tan...@gmail.com wrote:
Hi,
I am trying to use the ALS.trainImplicit method in
pyspark.mllib.recommendation. However, it didn't work. So I tried to use the
example in the Python API documentation, such as:
r1 = (1, 1, 1.0)
r2 = (1, 2, 2.0)
r3 = (2, 1,
Could you post the code that has problems with pyspark? Thanks!
Davies
On Thu, Oct 16, 2014 at 12:27 PM, Gen gen.tan...@gmail.com wrote:
I tried the same data with scala. It works pretty well.
It seems that it is the problem of pyspark.
In the console, it shows the following logs:
Traceback
The plan is to create an EC2 cluster and run the (py) spark on it. Input data
is from s3, output data goes to an hbase in a persistent cluster (also EC2).
My questions are:
1. I need to install some software packages on all the workers (sudo apt-get
install ...). Is there a better way to do this
hello... does anyone know how to resolve this issue? i'm running this
locally on my computer. keep getting this BindException. much appreciated.
14/10/16 17:48:13 WARN component.AbstractLifeCycle: FAILED
SelectChannelConnector@0.0.0.0:4040: java.net.BindException: Address already
in use
I can run the following code against Spark 1.1
sc = SparkContext()
r1 = (1, 1, 1.0)
r2 = (1, 2, 2.0)
r3 = (2, 1, 2.0)
ratings = sc.parallelize([r1, r2, r3])
model = ALS.trainImplicit(ratings, 1)
Davies
On Thu, Oct 16, 2014 at 2:45 PM, Davies Liu dav...@databricks.com wrote:
Could you post the
I have a flatMap function that shouldn't possibly emit duplicates and yet it
does. The output of my function is a HashSet, so the function itself cannot
output duplicates, and yet I see many copies of keys emitted from it (in one
case up to 62). The curious thing is I can't get this to happen
Apologies if this is something very obvious but I've perused the spark
streaming guide and this isn't very evident to me still. So I have files
with data of the format: timestamp,column1,column2,column3.. etc. and I'd
like to use the spark streaming's window operations on them.
However from what
Maybe I should create a private AMI to use for my question No.1? Assuming I
use the default instance type as the base image.. Anyone tried this?
Hello,
Is there a way to print the dependency graph of the complete program or of an
RDD/DStream as a DOT file? It would be very helpful to have such a thing.
Thanks,
-Soumitra.
Hi,
Below is the link for a simple Play + SparkSQL example -
http://blog.knoldus.com/2014/07/14/play-with-spark-building-apache-spark-with-play-framework-part-3/
https://github.com/knoldus/Play-Spark-Scala
Manu
On Thu, Oct 16, 2014 at 1:00 PM, Mohammed Guller moham...@glassbeam.com
wrote:
I’ve read several discussions of the error here and so have wiped all cluster
machines and copied the master’s Spark build to the rest of the cluster. I’ve
built my job on the master using the correct Spark version as a dependency and
even built that version of Spark. I still get the
Hi,
I have an RDD which is my application data and is huge. I want to join this
with reference data which is also too huge to fit in memory, and thus I do not
want to use a Broadcast variable.
What other options do I have to perform such joins?
I am using Cassandra as my data store, so should I just
Hi,
When trying Spark with a Hive table, I got the "java.lang.UnsatisfiedLinkError:
org.xerial.snappy.SnappyNative.maxCompressedLength(I)I" error:
val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
sqlContext.sql("select count(1) from q8_national_market_share")
sqlContext.sql(select
Yes, I removed my Spark dir and scp’ed the master’s build to all cluster
machines suspecting that problem.
My app (Apache Mahout) had Spark 1.0.1 in the POM but changing it to 1.0.2 (the
Spark version installed) gave another error. I guess I’ll have to install Spark
1.0.1 or get Mahout to
thanks marcelo. i only instantiated sparkcontext once, at the beginning,
in this code. the exception was thrown right at the beginning.
i also tried to run other programs, which worked fine previously, but now
also got the same error.
it looks like it put global block on creating sparkcontext
I need help to better trap exceptions in map functions. What is the best way
to catch the exception and provide some helpful diagnostic information, such as
the source of the input (e.g. the file name, and ideally the line number if I
am processing a text file)?
-Yao
i got an exception complaining about serializable. the sample code is
below...
class HelloWorld(val count: Int) {
...
...
}
object Test extends App {
...
val data = sc.parallelize(List(new HelloWorld(1), new HelloWorld(2)))
...
}
what is the best way to serialize HelloWorld so that
You can put a try/catch block in the map function and log the exception.
The only tricky part is that the exception log will be located on the
executor machine. Even if you don't do any trapping you should see the
exception stacktrace in the executors' stderr log, which is visible through
the UI
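A minimal sketch of that pattern, assuming an existing sc; the input path, the parsing logic, and the Either-based bookkeeping are illustrative:

val parsed = sc.textFile("input.txt").map { line =>
  try {
    Right(line.split(",")(1).toInt)        // the real per-line processing goes here
  } catch {
    case e: Exception =>
      // this ends up in the executor's stderr, visible through the web UI
      System.err.println("Failed on line [" + line + "]: " + e.getMessage)
      Left(line)
  }
}
val badLines = parsed.filter(_.isLeft)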
Manu,
I had looked at that example before starting this thread. I was specifically
looking for some suggestions on how to run a Play app with the Spark-submit
script on a real cluster.
Mohammed
From: Manu Suryavansh [mailto:suryavanshi.m...@gmail.com]
Sent: Thursday, October 16, 2014 3:32 PM
I'm trying to give an API interface to Java users. And I need to accept their
JavaSchemaRDDs and convert them to SchemaRDD for Scala users.
Hi Arthur,
I think this is a known issue in Spark; you can check
https://issues.apache.org/jira/browse/SPARK-3958. I'm curious about it: can
you always reproduce this issue? Is it related to some specific data
sets? Would you mind giving me some information about your workload, Spark
HI All,
I try to build Spark 1.1.0 using sbt with the command:
sbt/sbt -Dhadoop.version=2.2.0 -Pyarn assembly
but the resulting spark-assembly-1.1.0-hadoop2.2.0.jar is still missing the
Apache commons-math3 classes.
How do I add math3 into the package?
Best regards,
Henry
The remaining dependencies (Spark libraries) are available for the context from
the sparkhome. I have installed Spark such that all the slaves have the same
sparkhome. The code looks like this:
val conf = new SparkConf()
  .setSparkHome("/home/dev/spark")
  .setMaster("spark://99.99.99.999:7077")
What about all the play dependencies since the jar created by the ‘Play
package’ won’t include the play jar or any of the 100+ jars on which play
itself depends?
Mohammed
From: US Office Admin [mailto:ad...@vectorum.com]
Sent: Thursday, October 16, 2014 7:05 PM
To: Mohammed Guller;
Making it a case class should work.
On Thu, Oct 16, 2014 at 8:30 PM, ll duy.huynh@gmail.com wrote:
i got an exception complaining about serializable. the sample code is
below...
class HelloWorld(val count: Int) {
...
...
}
object Test extends App {
...
val data =
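A minimal, self-contained sketch of that fix; a case class is Serializable out of the box (alternatively, keep the plain class and add "extends Serializable"). The master URL and app name are illustrative:

import org.apache.spark.{SparkConf, SparkContext}

case class HelloWorld(count: Int)

object Test extends App {
  val sc = new SparkContext(new SparkConf().setAppName("hello").setMaster("local[2]"))
  val data = sc.parallelize(List(HelloWorld(1), HelloWorld(2)))
  println(data.map(_.count).reduce(_ + _))   // 3
  sc.stop()
}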
In our case, Play libraries are not required to run spark jobs. Hence they
are available only on master and play runs as a regular scala application.
I can't think of a case where you need play to run on slaves.
Raju
On Thu, Oct 16, 2014 at 10:21 PM, Mohammed Guller moham...@glassbeam.com
computePrincipalComponents returns a local matrix X, whose columns are
the principal components (ordered), while those column vectors are in
the same feature space as the input feature vectors. -Xiangrui
On Thu, Oct 16, 2014 at 2:39 AM, al123 ant.lay...@hotmail.co.uk wrote:
Hi,
I don't think
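A small sketch of what reading that result looks like, assuming an existing sc; the input matrix is made up, the point is only how to interpret the returned local matrix:

import org.apache.spark.mllib.linalg.{Matrix, Vectors}
import org.apache.spark.mllib.linalg.distributed.RowMatrix

val rows = sc.parallelize(Seq(
  Vectors.dense(1.0, 2.0, 3.0),
  Vectors.dense(2.0, 4.0, 5.0),
  Vectors.dense(3.0, 6.0, 7.0)
))
val mat = new RowMatrix(rows)

// One row per original feature, one column per (ordered) principal component,
// so row i tells you how feature i contributes to each component.
val pc: Matrix = mat.computePrincipalComponents(2)
println(pc)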
Thanks, Ted,
We use CDH 5.1 and the HBase version is 0.98.1-cdh5.1.0, in which the
javadoc of HConnectionManager.java still recommends a shutdown hook.
I looked into val table = Util.Connection.getTable(user), and found it
didn't invoke
public HTable(Configuration conf, final byte[] tableName, final
Looking at Apache 0.98 code, you can follow the example in the class
javadoc (line 144 of HConnectionManager.java):
* HTableInterface table = connection.getTable(table1);
* try {
* // Use the table as needed, for a single operation and a single thread
* } finally {
* table.close();
*
Hi All,
I'm using windows 8.1 to build spark 1.1.0 using this command:
C:\apache-maven-3.0.5\bin\mvn -Pyarn -Phadoop-2.2 -Dhadoop.version=2.2.0
-DskipTests clean package -e
Below is the error message:
[ERROR] Failed to execute goal
org.scalastyle:scalastyle-maven-plugin:0.4.0:check (default)