Try to repartition it to a higher number (at least 3-4 times the total # of
CPU cores). What operation are you doing? It may happen that, if you are
doing a join/groupBy sort of operation, the task which is taking so long is
the one holding all the values (data skew); in that case you need to use a
Partitioner which will
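(That reply is cut off in the archive.) A minimal sketch of the repartitioning advice, assuming a pair RDD and a placeholder core count; the hot-key "salting" at the end is a common workaround for skew, not something spelled out in the original reply:

import org.apache.spark.HashPartitioner

val totalCores = 16  // placeholder: total cores across your cluster
val pairRdd = sc.parallelize(Seq(("HOT_KEY", 1), ("a", 2), ("b", 3)))

// Repartition to roughly 3-4x the number of cores, as suggested above.
val repartitioned = pairRdd.partitionBy(new HashPartitioner(4 * totalCores))

// If one key dominates (skew), "salt" it so its records spread across partitions,
// aggregate the salted keys, then strip the salt and combine the partial results.
val salted = pairRdd.map { case (k, v) =>
  if (k == "HOT_KEY") (s"$k#${scala.util.Random.nextInt(8)}", v) else (k, v)
}
val partial = salted.reduceByKey(_ + _)
val totals  = partial.map { case (k, v) => (k.split("#")(0), v) }.reduceByKey(_ + _)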
This is where you can get started
https://spark.apache.org/docs/latest/sql-programming-guide.html
Thanks
Best Regards
On Mon, Jul 13, 2015 at 3:54 PM, vinod kumar vinodsachin...@gmail.com
wrote:
Hi Everyone,
I am developing an application which handles a bulk of data, around
millions (This may
1. Yes, open up the web UI running on 8080 to see the memory/cores allocated
to your workers, and open up the UI running on 4040 and click on the
Executors tab to see the memory allocated for the executor.
2. The MLlib code can be found over here
https://github.com/apache/spark/tree/master/mllib and
Why not add a trigger to your database table and, whenever it's updated, push
the changes to Kafka etc. and use normal Spark Streaming? You can also write
a receiver-based architecture
https://spark.apache.org/docs/latest/streaming-custom-receivers.html for
this, but that will be a bit more time consuming.
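For reference, a rough sketch of what such a receiver could look like; the class name, poll interval and the row-fetching logic are placeholders, not from the linked guide:

import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.receiver.Receiver

class TablePollingReceiver extends Receiver[String](StorageLevel.MEMORY_AND_DISK_2) {
  override def onStart(): Unit = {
    new Thread("table-poller") {
      override def run(): Unit = {
        while (!isStopped()) {
          val newRows: Seq[String] = Seq()  // placeholder: query the table for new/changed rows
          newRows.foreach(store)            // hand each record to Spark Streaming
          Thread.sleep(5000)                // poll interval
        }
      }
    }.start()
  }
  override def onStop(): Unit = {}          // the polling loop exits once isStopped() is true
}

// val changes = ssc.receiverStream(new TablePollingReceiver)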
This will get you started
https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark
Thanks
Best Regards
On Mon, Jul 13, 2015 at 5:29 PM, srinivasraghavansr71
sreenivas.raghav...@gmail.com wrote:
Hello everyone,
I am interested to contribute to apache spark. I
Look in the worker logs and see what's going on.
Thanks
Best Regards
On Tue, Jul 14, 2015 at 4:02 PM, Arthur Chan arthur.hk.c...@gmail.com
wrote:
Hi,
I use Spark 1.4. When saving the model to HDFS, I got an error.
Please help!
Regards
my scala command:
You can try to resolve some JIRA issues; to start with, try out some newbie
JIRAs.
Thanks
Best Regards
On Tue, Jul 14, 2015 at 4:10 PM, srinivasraghavansr71
sreenivas.raghav...@gmail.com wrote:
I saw the contribution sections. As a new contributor, should I try to build
patches or can I add
environment of spark.
I tried Spark SQL but it seems to return data slower compared to
MS SQL. (I have tested with data which has 4 records)
On Tue, Jul 14, 2015 at 3:50 AM, Akhil Das ak...@sigmoidanalytics.com
wrote:
This is where you can get started
https://spark.apache.org/docs
Someone else also reported this error with spark 1.4.0
Thanks
Best Regards
On Tue, Jul 14, 2015 at 6:57 PM, Arthur Chan arthur.hk.c...@gmail.com
wrote:
Hi, below is the log from the worker.
15/07/14 17:18:56 ERROR FileAppender: Error writing stream to file
Try adding it in your SPARK_CLASSPATH inside conf/spark-env.sh file.
Thanks
Best Regards
On Tue, Jul 14, 2015 at 7:05 AM, Jerrick Hoang jerrickho...@gmail.com
wrote:
Hi all,
I have conf/hive-site.xml pointing to my Hive metastore but the Spark SQL
CLI doesn't pick it up. (copying the same
Can you paste your conf/spark-env.sh file? Put SPARK_MASTER_IP as the
master machine's hostname in the spark-env.sh file. Also add your slaves'
hostnames into the conf/slaves file and do a sbin/start-all.sh
Thanks
Best Regards
On Tue, Jul 14, 2015 at 1:26 PM, sivarani whitefeathers...@gmail.com
wrote:
Hi Akhil,
I'm curious whether RDDs are stored internally in a columnar format as
well, or whether an RDD is converted to columnar format only when it is
cached in a SQL context.
What about data frames?
Thanks!
--
Ruslan Dautkhanov
On Fri, Jul 10, 2015 at 2:07 AM, Akhil Das ak
You are a bit confused about master node, slave node and the driver
machine.
1. The master node can be kept as a smaller machine in your dev environment;
in production you will mostly be using the Mesos or YARN cluster manager.
2. Now, if you are running your driver program (the streaming job) on the
Just make sure you are having the same installation of
spark-1.4.0-bin-hadoop2.6 everywhere. (including the slaves, master, and
from where you start the spark-shell).
Thanks
Best Regards
On Mon, Jul 13, 2015 at 4:34 AM, Eduardo erocha@gmail.com wrote:
My installation of spark is not
Yes, that is correct. You can use this boilerplate to avoid spark-submit.
//The configurations
val sconf = new SparkConf()
.setMaster("spark://spark-ak-master:7077")
.setAppName("SigmoidApp")
.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
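(The snippet stops at the configuration.) A minimal continuation under the same assumptions; the master URL and jar path below are only examples:

import org.apache.spark.{SparkConf, SparkContext}

val sconf = new SparkConf()
  .setMaster("spark://spark-ak-master:7077")  // example master URL
  .setAppName("SigmoidApp")
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")

val sc = new SparkContext(sconf)
sc.addJar("/path/to/your-application.jar")    // hypothetical path: ship your classes to the executors
println(sc.parallelize(1 to 100).sum())       // trivial job to verify the connection
sc.stop()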
Did you try setting the HADOOP_CONF_DIR?
Thanks
Best Regards
On Sat, Jul 11, 2015 at 3:17 AM, maxdml maxdemou...@gmail.com wrote:
Also, it's worth noting that I'm using the prebuilt version for hadoop 2.4
and higher from the official website.
Can you not use sc.wholeTextFiles() and use a custom parser or a regex to
extract out the TransactionIDs?
Thanks
Best Regards
On Sat, Jul 11, 2015 at 8:18 AM, ssbiox sergey.korytni...@gmail.com wrote:
Hello,
I have a very specific question on how to do a search between particular
lines of
Can you dig a bit more into the worker logs? Also make sure that Spark has
permission to write to /opt/ on that machine, as it's the one machine that
always throws errors.
Thanks
Best Regards
On Sat, Jul 11, 2015 at 11:18 PM, gaurav sharma sharmagaura...@gmail.com
wrote:
Hi All,
I am facing this issue in
Here's an example https://github.com/przemek1990/spark-streaming
Thanks
Best Regards
On Thu, Jul 9, 2015 at 4:35 PM, diplomatic Guru diplomaticg...@gmail.com
wrote:
Hello all,
I'm trying to configure Flume to push data into a sink so that my
streaming job can pick up the data. My events
When you connect to the machines you can create an ssh tunnel to access the
UI :
ssh -L 8080:127.0.0.1:8080 MasterMachinesIP
And then you can simply open localhost:8080 in your browser and it should
show the UI.
Thanks
Best Regards
On Thu, Jul 9, 2015 at 7:44 PM, rroxanaioana
It seems to be an issue with Azure; there was a discussion over here
https://azure.microsoft.com/en-in/documentation/articles/hdinsight-hadoop-spark-install/
Thanks
Best Regards
On Thu, Jul 9, 2015 at 9:42 PM, Daniel Haviv
daniel.ha...@veracity-group.com wrote:
Hi,
I'm running Spark 1.4 on
https://spark.apache.org/docs/latest/sql-programming-guide.html#caching-data-in-memory
Thanks
Best Regards
On Fri, Jul 10, 2015 at 10:05 AM, vinod kumar vinodsachin...@gmail.com
wrote:
Hi Guys,
Can anyone please tell me how to use the caching feature of Spark via Spark
SQL queries?
-Vinod
That's because sc is already initialized. You can do sc.stop() before you
initialize another one.
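For example, in the spark-shell something like this is what I'd expect to work (the app name is arbitrary):

import org.apache.spark.{SparkConf, SparkContext}

sc.stop()  // stop the context the shell created for you
val sc2 = new SparkContext(new SparkConf().setMaster("local[*]").setAppName("fresh-context"))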
Thanks
Best Regards
On Fri, Jul 10, 2015 at 3:54 PM, Prateek . prat...@aricent.com wrote:
Hi,
I am running a single spark-shell but observing this error when I enter val
sc = new
Looks like a configuration problem with your Spark setup. Are you running
the driver on a different network? Can you try a simple program from
spark-shell to make sure your setup is proper? (like sc.parallelize(1 to
1000).collect())
Thanks
Best Regards
On Thu, Jul 9, 2015 at 1:02 AM, ÐΞ€ρ@Ҝ
On Wed, Jul 8, 2015 at 7:31 PM, Ashish Dutt ashish.du...@gmail.com wrote:
Hi,
We have a cluster with 4 nodes. The cluster uses CDH 5.4. For the past two
days I have been trying to connect my laptop to the server using the Spark
master ip:port but it has been unsuccessful.
The server contains data
Did you try sc.stop() and creating a new one?
Thanks
Best Regards
On Wed, Jul 8, 2015 at 8:12 PM, Terry Hole hujie.ea...@gmail.com wrote:
I am using spark 1.4.1rc1 with default hive settings
Thanks
- Terry
Hi All,
I'd like to use the hive context in spark shell; I need to recreate the
Yes, just to add, see the following scenario of RDD lineage:
RDD1 -> RDD2 -> RDD3 -> RDD4
Here RDD2 depends on RDD1's output and the lineage goes up to RDD4.
Now, if for some reason RDD3 is lost, Spark will recompute it from RDD2.
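A tiny concrete version of that lineage chain, with placeholder transformations:

val rdd1 = sc.parallelize(1 to 1000)      // RDD1
val rdd2 = rdd1.map(_ * 2)                // RDD2 depends on RDD1
val rdd3 = rdd2.filter(_ % 3 == 0)        // RDD3 depends on RDD2
val rdd4 = rdd3.map(x => (x % 10, x))     // RDD4 depends on RDD3
rdd4.count()
// If a partition of rdd3 is lost, Spark uses this lineage to recompute just that
// partition from its parent (rdd2), going further back only if it has to.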
Thanks
Best Regards
On Thu, Jul 9, 2015 at 5:51 AM, canan
Can you look in the datanode logs and see what's going on? Most likely, you
are hitting the ulimit on open file handles.
Thanks
Best Regards
On Wed, Jul 8, 2015 at 10:55 AM, Pankaj Arora pankaj.ar...@guavus.com
wrote:
Hi,
I am running a long-running application over YARN using Spark and I am
multithread it?
Sincerely,
Ashish Dutt
On Wed, Jul 8, 2015 at 3:29 PM, Akhil Das ak...@sigmoidanalytics.com
wrote:
What's the point of creating them in parallel? You can multi-thread it to run
it in parallel though.
Thanks
Best Regards
On Wed, Jul 8, 2015 at 5:34 AM, Brandon White bwwintheho
It's showing connection refused; for some reason it was not able to connect
to the machine. Either it's the machine's start-up time or it's the
security group.
Thanks
Best Regards
On Wed, Jul 8, 2015 at 2:04 AM, Pagliari, Roberto rpagli...@appcomsci.com
wrote:
I'm following the tutorial
What's the point of creating them in parallel? You can multi-thread it to run
it in parallel though.
Thanks
Best Regards
On Wed, Jul 8, 2015 at 5:34 AM, Brandon White bwwintheho...@gmail.com
wrote:
Say I have a Spark job that looks like the following:
def loadTable1() {
val table1 =
Strange. What do you have in $SPARK_MASTER_IP? It may happen that it is
not able to bind to the given IP, but again it should be in the logs.
Thanks
Best Regards
On Tue, Jul 7, 2015 at 12:54 AM, maxdml maxdemou...@gmail.com wrote:
Hi,
I've been compiling spark 1.4.0 with SBT, from the
Did you try Kryo? Wrap everything with Kryo and see if you are still
hitting the exception. (At least you would see a different exception stack.)
Thanks
Best Regards
On Tue, Jul 7, 2015 at 6:05 AM, Yana Kadiyska yana.kadiy...@gmail.com
wrote:
Hi folks, suffering from a pretty strange issue:
updateStateByKey?
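A minimal sketch of keeping per-id state across batches with updateStateByKey, assuming a hypothetical DStream of (id, amount) pairs and a StreamingContext with checkpointing enabled:

import org.apache.spark.streaming.StreamingContext._
import org.apache.spark.streaming.dstream.DStream

// events: DStream[(String, Long)] of (id, amount) pairs, e.g. ("12345", 10L)
def updateTotal(newValues: Seq[Long], state: Option[Long]): Option[Long] =
  Some(newValues.sum + state.getOrElse(0L))

def runningTotals(events: DStream[(String, Long)]): DStream[(String, Long)] =
  events.updateStateByKey(updateTotal)  // data for "12345" from later batches keeps folding in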
Thanks
Best Regards
On Wed, Jul 8, 2015 at 1:05 AM, swetha swethakasire...@gmail.com wrote:
Hi,
Suppose I want the data to be grouped by an Id named 12345, and I have a
certain amount of data coming out from one batch for 12345, and I have data
related to 12345 coming after
Can you try adding sc.stop() at the end of your program? It looks like it's
having a hard time closing off the SparkContext.
Thanks
Best Regards
On Tue, Jul 7, 2015 at 4:08 PM, Hafsa Asif hafsa.a...@matchinguu.com
wrote:
Hi,
I run the following simple Java spark standalone app with maven command
Here's a simplified example:
SparkConf conf = new SparkConf().setAppName("Sigmoid").setMaster("local");
JavaSparkContext sc = new JavaSparkContext(conf);
List<String> user = new ArrayList<String>();
user.add("Jack");
user.add("Jill");
instances having successively run on the same
machine?
--
Henri Maxime Demoulin
2015-07-07 4:10 GMT-04:00 Akhil Das ak...@sigmoidanalytics.com:
Strange. What do you have in $SPARK_MASTER_IP? It may happen that it
is not able to bind to the given IP, but again it should be in the logs.
Thanks
If you don't want those logs flooding your screen, you can disable them simply
with:
import org.apache.log4j.{Level, Logger}
Logger.getLogger("org").setLevel(Level.OFF)
Logger.getLogger("akka").setLevel(Level.OFF)
Thanks
Best Regards
On Sun, Jul 5, 2015 at 7:27 PM, Hellen
Try with *spark.cores.max*; executor cores is usually used when you run it
in yarn mode.
Thanks
Best Regards
On Mon, Jul 6, 2015 at 1:22 AM, nizang ni...@windward.eu wrote:
hi,
We're running spark 1.4.0 on ec2, with 6 machines, 4 cores each. We're
trying to run an application on a number of
While the job is running, just look in the directory and see what's the root
cause of it (is it the logs? is it the shuffle? etc). Here are a few
configuration options which you can try:
- Disable shuffle spill: spark.shuffle.spill=false (it might end up in OOM)
- Enable log rotation (see the property sketch below):
You can also set these in the spark-env.sh file:
export SPARK_WORKER_DIR=/mnt/spark/
export SPARK_LOCAL_DIR=/mnt/spark/
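(The log-rotation bullet above is cut off.) These are the executor log rolling properties I would look at; the names are from the Spark configuration docs as I recall them, so double-check them for your version:

val conf = new org.apache.spark.SparkConf()
  .set("spark.executor.logs.rolling.strategy", "time")        // or "size"
  .set("spark.executor.logs.rolling.time.interval", "daily")
  .set("spark.executor.logs.rolling.maxRetainedFiles", "3")   // older executor logs get cleaned up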
Thanks
Best Regards
On Mon, Jul 6, 2015 at 12:29 PM, Akhil Das ak...@sigmoidanalytics.com
wrote:
While the job is running, just look in the directory and see whats
It's complaining about a JDBC driver. Add it to your driver classpath like:
./bin/spark-sql --driver-class-path
/home/akhld/sigmoid/spark/lib/mysql-connector-java-5.1.32-bin.jar
Thanks
Best Regards
On Mon, Jul 6, 2015 at 11:42 AM, sandeep vura sandeepv...@gmail.com wrote:
Hi Sparkers,
I am
If you want a long-running application, then go with Spark Streaming (which
kind of blocks your resources). On the other hand, if you use the job server
then you can actually use the resources (CPUs) for other jobs also when
your DB job is not using them.
Thanks
Best Regards
On Sun, Jul 5, 2015 at
Looks like it spent more time writing/transferring the 40GB of shuffle
when you used Kryo. And surprisingly, JavaSerializer has only 700MB of shuffle?
Thanks
Best Regards
On Sun, Jul 5, 2015 at 12:01 PM, Gavin Liu ilovesonsofanar...@gmail.com
wrote:
Hi,
I am using TeraSort benchmark from
With the binary I think it might not be possible, although if you download
the sources and build them yourself you can remove this function
https://github.com/apache/spark/blob/master/repl/scala-2.10/src/main/scala/org/apache/spark/repl/SparkILoop.scala#L1023
which initializes the SQLContext.
I think you can open up a JIRA; not sure if this PR
https://github.com/apache/spark/pull/2209/files (SPARK-2890
https://issues.apache.org/jira/browse/SPARK-2890) broke the validation
piece.
Thanks
Best Regards
On Fri, Jul 3, 2015 at 4:29 AM, Koert Kuipers ko...@tresata.com wrote:
i am
Can you paste the code? Something is missing.
Thanks
Best Regards
On Fri, Jul 3, 2015 at 3:14 PM, Jem Tucker jem.tuc...@gmail.com wrote:
In the driver when running spark-submit with --master yarn-client
On Fri, Jul 3, 2015 at 10:23 AM Akhil Das ak...@sigmoidanalytics.com
wrote:
Where does
Did you try:
build/mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -DskipTests clean package
Thanks
Best Regards
On Fri, Jul 3, 2015 at 2:27 PM, 1106944...@qq.com 1106944...@qq.com wrote:
Hi all,
Has anyone built the Spark 1.4 source code for SparkR with maven/sbt? What's
the command? Using
Where does it return null? Within the driver or in the executor? I just
tried System.console.readPassword in spark-shell and it worked.
Thanks
Best Regards
On Fri, Jul 3, 2015 at 2:32 PM, Jem Tucker jem.tuc...@gmail.com wrote:
Hi,
We have an application that requires a username/password to
RDDs which are no longer required will be removed from memory by Spark
itself (which you can consider as lazy?).
Thanks
Best Regards
On Wed, Jul 1, 2015 at 7:48 PM, Jem Tucker jem.tuc...@gmail.com wrote:
Hi,
The current behavior of rdd.unpersist() appears to not be lazily executed
and
Have a look at sc.wholeTextFiles; you can use it to read the whole CSV
contents into the value, then split it on \n, add the lines to a list
and return it (see the sketch below).
*sc.wholeTextFiles:*
Read a directory of text files from HDFS, a local file system (available on
all nodes), or any Hadoop-supported
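A small sketch of that approach (the HDFS path is a placeholder):

val files = sc.wholeTextFiles("hdfs:///data/csv/")    // RDD[(path, full file content)]
val lines = files.flatMap { case (path, content) =>
  content.split("\n").map(line => (path, line))       // one record per CSV line, keyed by file
}
lines.take(5).foreach(println)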
Looks like a jar conflict to me.
java.lang.NoSuchMethodException:
org.apache.hadoop.fs.FileSystem$Statistics$StatisticsData.getBytesWritten()
You have multiple versions of the same jar in the classpath.
Thanks
Best Regards
On Wed, Jul 1, 2015 at 6:58 AM, nkd kalidas.nimmaga...@gmail.com
It says:
Caused by: java.net.ConnectException: Connection refused: slave2/...:54845
Could you look in the executor logs (stderr on slave2) and see what made it
shut down? Since you are doing a join there's a high possibility of OOM etc.
Thanks
Best Regards
On Wed, Jul 1, 2015 at 10:20 AM,
Now I'm tempted to try this on KBOX
http://kevinboone.net/kbox.html :/
Thanks
Best Regards
On Wed, Jul 1, 2015 at 9:10 AM, Exie tfind...@prodevelop.com.au wrote:
FWIW, I had some trouble getting Spark running on a Pi.
My core problem was using snappy for compression as it
Have a look at https://spark.apache.org/docs/latest/job-scheduling.html
Thanks
Best Regards
On Wed, Jul 1, 2015 at 12:01 PM, Nirmal Fernando nir...@wso2.com wrote:
Hi All,
Are there any additional configs that we have to set to perform $subject?
--
Thanks regards,
Nirmal
Associate
Have a look at the window and updateStateByKey operations. If you are looking
for something more sophisticated, you can actually persist these streams in
an intermediate storage (say for x duration) like HBase or Cassandra or any
other DB and do global aggregations with those.
Thanks
.addJar works for me when I run it as a stand-alone application (without
using spark-submit).
Thanks
Best Regards
On Tue, Jun 30, 2015 at 7:47 PM, Yana Kadiyska yana.kadiy...@gmail.com
wrote:
Hi folks, running into a pretty strange issue:
I'm setting
spark.executor.extraClassPath
Since it's a Windows machine, you are very likely to be hitting this one
https://issues.apache.org/jira/browse/SPARK-2356
Thanks
Best Regards
On Wed, Jul 1, 2015 at 12:36 AM, Sourav Mazumder
sourav.mazumde...@gmail.com wrote:
Hi,
I'm running Spark 1.4.0 without Hadoop. I'm using the binary
Have a look at the StageInfo
https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.scheduler.StageInfo
class; it has the method stageFailed. You could make use of it. I don't
understand the point of restarting the entire application.
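A hedged sketch of watching for stage failures from inside the application with a SparkListener; the listener name and what it does with the failure are up to you (here it just prints), and StageInfo's failureReason carries the reason:

import org.apache.spark.scheduler.{SparkListener, SparkListenerStageCompleted}

class StageFailureWatcher extends SparkListener {
  override def onStageCompleted(event: SparkListenerStageCompleted): Unit = {
    val info = event.stageInfo
    info.failureReason.foreach { reason =>   // non-empty only when the stage failed
      println(s"Stage ${info.stageId} (${info.name}) failed: $reason")
    }
  }
}

// sc.addSparkListener(new StageFailureWatcher)  // register it on an existing SparkContext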
Thanks
Best Regards
On Tue, Jun 30, 2015 at
How much memory do you have on that machine? You can increase the heap space
with *export _JAVA_OPTIONS=-Xmx2g*
Thanks
Best Regards
On Tue, Jun 30, 2015 at 11:00 AM, Chintan Bhatt
chintanbhatt...@charusat.ac.in wrote:
Facing the following error message while performing sbt/sbt assembly
Error
This:
Caused by: java.util.concurrent.TimeoutException: Futures timed out after
[30 seconds]
could happen for many reasons; one of them could be insufficient memory. Are
you running all 20 apps on the same node? How are you submitting the apps
(with spark-submit)? I see you have
Try this way:
val data = sc.textFile("s3n://ACCESS_KEY:SECRET_KEY@mybucket/temp/")
Thanks
Best Regards
On Mon, Jun 29, 2015 at 11:59 PM, didi did...@gmail.com wrote:
Hi
*Can't read text file from s3 to create RDD*
after setting the configuration
val
Cool.
On 29 Jun 2015 21:10, 郭谦 buptguoq...@gmail.com wrote:
Akhil Das,
You gave me a new idea to solve the problem.
Vova provided me a way to solve the problem just before
Vova Shelgunov vvs...@gmail.com
Sample code for submitting a job from any other Java app, e.g. a servlet:
http
Here's a bunch of configuration for that
https://spark.apache.org/docs/latest/configuration.html#shuffle-behavior
Thanks
Best Regards
On Fri, Jun 26, 2015 at 10:37 PM, igor.berman igor.ber...@gmail.com wrote:
Hi,
wanted to get some advice regarding tuning a Spark application.
I see for some of
Which version of Spark are you using? You can try changing the heap size
manually with *export _JAVA_OPTIONS=-Xmx5g*
Thanks
Best Regards
On Fri, Jun 26, 2015 at 7:52 PM, Yifan LI iamyifa...@gmail.com wrote:
Hi,
I just encountered the same problem, when I run a PageRank program which
has lots
You can create a SparkContext in your program and run it as a standalone
application without using spark-submit.
Here's something that will get you started:
//Create SparkContext
val sconf = new SparkConf()
.setMaster("spark://spark-ak-master:7077")
.setAppName("Test")
.
The input size is 512.0 MB (hadoop) / 4159106. Can this be reduced to 64
MB so as to increase the number of tasks? Similar to the split size that
increases the number of mappers in Hadoop M/R.
On Thu, Jun 25, 2015 at 12:06 AM, Akhil Das ak...@sigmoidanalytics.com
wrote:
Look in the tuning
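(That reply is also truncated.) Two knobs I would try for getting more tasks out of that input, sketched with placeholder values:

// Ask for more input partitions up front (the second argument is a minimum hint):
val logs = sc.textFile("hdfs:///data/input", 64)

// Or re-split an already loaded RDD:
val finer = logs.repartition(64)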
Which distributed database are you referring to here? Spark can connect with
almost all those databases out there (You just need to pass the
Input/Output Format classes or there are a bunch of connectors also
available).
Thanks
Best Regards
On Fri, Jun 26, 2015 at 12:07 PM, louis.hust
Try adding them to the SPARK_CLASSPATH in your conf/spark-env.sh file.
Thanks
Best Regards
On Thu, Jun 25, 2015 at 9:31 PM, Bin Wang binwang...@gmail.com wrote:
I am trying to run the Spark example code HBaseTest from the command line
using spark-submit instead of run-example; in that case, I can
You just need to set your HADOOP_HOME, which appears to be null in the
stack trace. If you don't have winutils.exe, you can download
https://github.com/srccodes/hadoop-common-2.2.0-bin/archive/master.zip
and put it there.
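On Windows I'd also expect something like this to work from inside the program, before the SparkContext is created (the path is a placeholder for wherever bin\winutils.exe lives); this is an assumption on my part, not from the original reply:

import org.apache.spark.{SparkConf, SparkContext}

System.setProperty("hadoop.home.dir", "C:\\hadoop")  // folder whose bin\ contains winutils.exe
val sc = new SparkContext(new SparkConf().setMaster("local[*]").setAppName("winutils-check"))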
Thanks
Best Regards
On Thu, Jun 25, 2015 at 11:30 PM, Ashic
Why do you want to do that?
Thanks
Best Regards
On Thu, Jun 25, 2015 at 10:16 PM, shahab shahab.mok...@gmail.com wrote:
Hi,
Apparently, the sc.parallelize(..) operation is performed in the driver
program, not in the workers! Is it possible to do this in the worker process
for the sake of
It's a Scala version conflict; can you paste your build.sbt file?
Thanks
Best Regards
On Fri, Jun 26, 2015 at 7:05 AM, stati srikanth...@gmail.com wrote:
Hello,
When I run a spark job with spark-submit it fails with below exception for
code line
/*val webLogDF =
JavaPairInputDStream<String, String> messages =
KafkaUtils.createDirectStream(
jssc,
String.class,
String.class,
StringDecoder.class,
StringDecoder.class,
kafkaParams,
topicsSet
);
Here:
jssc = JavaStreamingContext
String.class = Key ,
/releases/;
)
On Fri, Jun 26, 2015 at 4:13 AM, Akhil Das ak...@sigmoidanalytics.com
wrote:
It's a Scala version conflict; can you paste your build.sbt file?
Thanks
Best Regards
On Fri, Jun 26, 2015 at 7:05 AM, stati srikanth...@gmail.com wrote:
Hello,
When I run a spark job with spark-submit
(๏̯͡๏) deepuj...@gmail.com wrote:
It's taking an hour and on Hadoop it takes 1h 30m; is there a way to make
it run faster?
On Wed, Jun 24, 2015 at 11:39 AM, Akhil Das ak...@sigmoidanalytics.com
wrote:
Cool. :)
On 24 Jun 2015 23:44, ÐΞ€ρ@Ҝ (๏̯͡๏) deepuj...@gmail.com wrote:
It's running now
a different guava dependency but the error
does go away this way
On Wed, Jun 24, 2015 at 10:04 AM, Akhil Das ak...@sigmoidanalytics.com
wrote:
Can you add those jars to the SPARK_CLASSPATH and give it a try?
Thanks
Best Regards
On Wed, Jun 24, 2015 at 12:07 AM, Yana Kadiyska yana.kadiy
Here you go https://amplab-extras.github.io/SparkR-pkg/
Thanks
Best Regards
On Thu, Jun 25, 2015 at 12:39 PM, 1106944...@qq.com 1106944...@qq.com
wrote:
Hi all
I have installed Spark 1.4 and want to use SparkR. Assume the Spark
master ip = node1; how do I start SparkR and submit a job to
Can you look in the worker logs and see what's going on? It may happen that
you ran out of disk space etc.
Thanks
Best Regards
On Thu, Jun 25, 2015 at 12:08 PM, barmaley o...@solver.com wrote:
I'm running Spark 1.3.1 on AWS... Having long-running application (spark
context) which accepts and
That totally depends on the way you extract the data. It will be helpful if
you can paste your code so that we will understand it better.
Thanks
Best Regards
On Wed, Jun 24, 2015 at 2:32 PM, William Ferrell wferr...@gmail.com wrote:
Hello -
I am using Apache Spark 1.2.1 via pyspark. Thanks
,
Is this the official R Package?
It is written : *NOTE: The API from the upcoming Spark release (1.4)
will not have the same API as described here. *
Thanks,
JC
2015-06-25 10:55 GMT+02:00 Akhil Das ak...@sigmoidanalytics.com:
Here you go https://amplab-extras.github.io/SparkR-pkg/
Thanks
Best
Depending on the size of the memory you have, you could allocate 60-80%
of the memory for the Spark worker process. The DataNode doesn't require too
much memory.
On 23 Jun 2015 21:26, maxdml max...@cs.duke.edu wrote:
I'm wondering if there is a real benefit for splitting my memory in two for
Can you look a bit more into the error logs? It could be getting killed
because of OOM etc. One thing you can try is to set the
spark.shuffle.blockTransferService to nio from netty.
Thanks
Best Regards
On Wed, Jun 24, 2015 at 5:46 AM, ÐΞ€ρ@Ҝ (๏̯͡๏) deepuj...@gmail.com wrote:
I have a Spark job
Can you add those jars to the SPARK_CLASSPATH and give it a try?
Thanks
Best Regards
On Wed, Jun 24, 2015 at 12:07 AM, Yana Kadiyska yana.kadiy...@gmail.com
wrote:
Hi folks, I have been using Spark against an external Metastore service
which runs Hive with Cdh 4.6
In Spark 1.2, I was
A screenshot of your framework running would also be helpful. How many
cores does it have?
Did you try running it in coarse-grained mode?
Try adding these to the conf:
sparkConf.set("spark.mesos.coarse", "true")
sparkConf.set("spark.cores.max", "2")
Thanks
Best Regards
On Wed, Jun 24, 2015 at 1:35 AM,
)
at
io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:163)
at
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
On Wed, Jun 24, 2015 at 7:16 AM, Akhil Das ak...@sigmoidanalytics.com
wrote:
Can you look a bit more in the error logs? It could be getting
Why don't you do a normal .saveAsTextFiles?
Thanks
Best Regards
On Mon, Jun 22, 2015 at 11:55 PM, anshu shukla anshushuk...@gmail.com
wrote:
Thanks for the reply!! Yes, either it should write on any machine of the
cluster, or can you please help me with how to do this. Previously I was
Looks like a hostname conflict to me.
15/06/22 17:04:45 WARN Utils: Your hostname, datasci01.dev.abc.com resolves
to a loopback address: 127.0.0.1; using 10.0.3.197 instead (on interface
eth0)
15/06/22 17:04:45 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to
another address
Can you paste
Maybe while producing the messages you can make it a KeyedMessage with
the timestamp as the key, and on the consumer end you can easily identify the
key (which will be the timestamp) from the message. If the network is fast
enough, then I think there would only be a small millisecond lag.
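A rough producer-side sketch against the old Kafka 0.8 producer API that KeyedMessage comes from; the broker list, topic and payload are placeholders, and the property names should be double-checked for your Kafka version:

import java.util.Properties
import kafka.producer.{KeyedMessage, Producer, ProducerConfig}

val props = new Properties()
props.put("metadata.broker.list", "broker1:9092")
props.put("serializer.class", "kafka.serializer.StringEncoder")

val producer = new Producer[String, String](new ProducerConfig(props))
val sentAt = System.currentTimeMillis.toString
producer.send(new KeyedMessage[String, String]("events", sentAt, "payload"))  // key = send timestamp
producer.close()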
Thanks
Best
Well, that (Stage information) is an ASCII representation of the WebUI
(running on port 4040). Since you set local[4] you will have 4 threads for
your computation, and since you have 2 receivers, you are left with 2
threads to process ((0 + 2) -- this 2 is your 2 threads.) And the
Did you happen to try this?
JavaPairRDD<LongWritable, Text> hadoopFile = sc.hadoopFile(
    "/sigmoid", DataInputFormat.class, LongWritable.class,
    Text.class);
Thanks
Best Regards
On Tue, Jun 23, 2015 at 6:58 AM, 付雅丹 yadanfu1...@gmail.com wrote:
Hello, everyone! I'm new to Spark.
Use *spark.cores.max* to limit the CPU per job, then you can easily
accommodate your third job also.
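For example, on a 5 x 8 = 40 core standalone cluster, capping each submitted application at a share of the cores (the numbers here are just illustrative) leaves room for the third job:

val conf = new org.apache.spark.SparkConf()
  .setAppName("job-one")
  .set("spark.cores.max", "13")  // this app grabs at most 13 cores instead of all 40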
Thanks
Best Regards
On Tue, Jun 23, 2015 at 5:07 PM, Wojciech Pituła w.pit...@gmail.com wrote:
I have set up a small standalone cluster: 5 nodes, every node has 5GB of
memory and 8 cores. As you
Yes.
Thanks
Best Regards
On Mon, Jun 22, 2015 at 8:33 PM, Murthy Chelankuri kmurt...@gmail.com
wrote:
I have more than one jar. Can we call sc.addJar multiple times, once for
each dependent jar?
On Mon, Jun 22, 2015 at 8:30 PM, Akhil Das ak...@sigmoidanalytics.com
wrote:
Try sc.addJar instead
You can use fileStream for that; look at the XmlInputFormat
https://github.com/apache/mahout/blob/ad84344e4055b1e6adff5779339a33fa29e1265d/examples/src/main/java/org/apache/mahout/classifier/bayes/XmlInputFormat.java
of Mahout. It should give you the full XML object as one record (as opposed to
an XML
Could you elaborate a bit more? What do you mean by setting up a standalone
server? And what is leading you to those exceptions?
Thanks
Best Regards
On Mon, Jun 22, 2015 at 2:22 AM, nizang ni...@windward.eu wrote:
hi,
I'm trying to set up a standalone server, and in one of my tests, I got the
It totally depends on the use case that you are solving with Spark; for
instance, there was some discussion around the same topic which you can read
over here
http://apache-spark-user-list.1001560.n3.nabble.com/How-does-one-decide-no-of-executors-cores-memory-allocation-td23326.html
Thanks
Best Regards
It's pretty straightforward; this would get you started
http://stackoverflow.com/questions/24896233/how-to-save-apache-spark-schema-output-in-mysql-database
Thanks
Best Regards
On Mon, Jun 22, 2015 at 12:39 PM, Manohar753
manohar.re...@happiestminds.com wrote:
Hi Team,
How to split and
How are you submitting the application? Could you paste the code that you
are running?
Thanks
Best Regards
On Mon, Jun 22, 2015 at 5:37 PM, Sean Barzilay sesnbarzi...@gmail.com
wrote:
I am trying to run a function on every line of a parquet file. The
function is in an object. When I run the
to use
XmlInputFormat of Mahout in Spark Streaming (I am not a Spark Streaming
expert yet ;-)). Can you show me some sample code as an explanation?
Thanks in advance,
Yong
On Mon, Jun 22, 2015 at 6:44 AM, Akhil Das ak...@sigmoidanalytics.com
wrote:
You can use fileStream for that, look