It seems that node is not being allocated enough tasks. Try increasing your
level of parallelism, or do a manual repartition so that every node gets an
even share of tasks to operate on.
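As a rough sketch (rdd here stands in for whatever RDD feeds the slow stage,
and the numbers are only examples to tune for your cluster):

import org.apache.spark.SparkConf

// Option 1: raise the default parallelism for the whole job.
val conf = new SparkConf().set("spark.default.parallelism", "48")

// Option 2: explicitly repartition the skewed RDD so every node gets a fair share of tasks.
val balanced = rdd.repartition(sc.defaultParallelism * 3)
balanced.count()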
Thanks
Best Regards
On Fri, Mar 20, 2015 at 8:05 PM, Yiannis Gkoufas johngou...@gmail.com
wrote:
Hi all,
I have 6
Did you try ssh -L 4040:127.0.0.1:4040 user@host?
Thanks
Best Regards
On Mon, Mar 23, 2015 at 1:12 PM, sergunok ser...@gmail.com wrote:
Is there a way to tunnel the Spark UI?
I tried to tunnel client-node:4040, but my browser was redirected from
localhost to some domain only visible inside the cluster.
What do you mean by not distinct?
It does work for me:
[image: Inline image 1]
Code:
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.{SparkContext, SparkConf}
val ssc = new StreamingContext(sc, Seconds(1))
val data =
From IntelliJ, you can use the remote debugging feature.
http://stackoverflow.com/questions/19128264/how-to-remote-debug-in-intellij-12-1-4
For remote debugging, you need to pass the following JVM options:
-Xdebug -Xrunjdwp:server=y,transport=dt_socket,address=4000,suspend=n
and configure your
clues why it happens only after v1.2.0 and above? Nothing
else changes.
Thanks,
Eason
On Tue, Mar 17, 2015 at 8:39 PM, Akhil Das ak...@sigmoidanalytics.com
wrote:
It's clearly saying:
java.io.InvalidClassException: org.apache.spark.storage.BlockManagerId;
local class incompatible: stream
You could do a cache and see the memory usage under the Storage tab in the
driver UI (which runs on port 4040).
Thanks
Best Regards
On Fri, Mar 20, 2015 at 12:02 PM, anu anamika.guo...@gmail.com wrote:
Hi All
I would like to measure Bytes Read and Peak Memory Usage for a Spark SQL
Query.
Please
1. If you are consuming data from Kafka or any other receiver-based source,
then you can start 1-2 receivers per worker (assuming you'll have a minimum
of 4 cores per worker).
2. If you have a single receiver, or it is a fileStream, then what you can
do to distribute the data across machines is to do a
Isn't that a feature? Rather than keep running a buggy pipeline, it just
kills all the executors. You can always handle exceptions with a proper
try/catch in your code, though.
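For instance, a minimal sketch that keeps the job alive when individual
records are bad (parse and the input path are hypothetical; it uses
scala.util.Try rather than a literal try/catch):

import scala.util.{Failure, Success, Try}

val parsed = sc.textFile("hdfs:///data/input.txt").flatMap { line =>
  Try(parse(line)) match {
    case Success(record) => Some(record)   // keep the good records
    case Failure(_)      => None           // drop (or log) the bad ones instead of failing the task
  }
}
parsed.count()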
Thanks
Best Regards
On Fri, Mar 20, 2015 at 3:51 PM, mrm ma...@skimlinks.com wrote:
Hi,
I recently changed from Spark 1.1 to Spark
It totally depends on your database. If it's a NoSQL database like
MongoDB/HBase etc., then you can use the native .saveAsNewAPIHadoopFile or
.saveAsHadoopDataset etc.
For SQL databases, I think people usually put the overhead on the driver
like you did.
Thanks
Best Regards
On Wed, Mar 18, 2015 at
Can you see where exactly it is spending time? Since, as you said, it goes
to Stage 2, you will be able to see how much time it spent on Stage 1.
See whether it is GC time; if so, try increasing the level of parallelism or
repartition it, e.g. to sc.defaultParallelism * 3.
Thanks
Best Regards
On Thu, Mar
for the model? I have a Spark Master and 2 Workers running on
CDH 5.3...what would the default spark-shell level of parallelism be...I
thought it would be 3?
Thank you for the help!
-Su
On Thu, Mar 19, 2015 at 12:32 AM, Akhil Das ak...@sigmoidanalytics.com
wrote:
Can you see where
How are you running the application? Can you try running the same inside
spark-shell?
Thanks
Best Regards
On Wed, Mar 18, 2015 at 10:51 PM, sprookie cug12...@gmail.com wrote:
Hi All,
I am using Spark version 1.2 running locally. When I try to read a parquet
file I get the below exception, what
You can always throw more machines at this and see if the performance
improves, since you haven't mentioned anything regarding your # of cores etc.
Thanks
Best Regards
On Wed, Mar 18, 2015 at 11:42 AM, nvrs nvior...@gmail.com wrote:
Hi all,
We are having a few issues with the performance
You can simply turn it on using:
./sbin/start-history-server.sh
Read more here http://spark.apache.org/docs/1.3.0/monitoring.html.
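Note the history server only shows applications that wrote event logs, so the
app itself also needs something like the following (the log directory is just
an example path and must exist, be writable, and match
spark.history.fs.logDirectory):

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("MyApp")
  .set("spark.eventLog.enabled", "true")
  .set("spark.eventLog.dir", "hdfs:///spark-events")   // example path
val sc = new SparkContext(conf)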
Thanks
Best Regards
On Wed, Mar 18, 2015 at 4:00 PM, patcharee patcharee.thong...@uni.no
wrote:
Hi,
I am using spark 1.3. I would like to use Spark Job
Did you try ssh tunneling instead of SOCKS?
Thanks
Best Regards
On Wed, Mar 18, 2015 at 5:45 AM, Kelly, Jonathan jonat...@amazon.com
wrote:
I'm trying to figure out how I might be able to use Spark with a SOCKS
proxy. That is, my dream is to be able to write code in my IDE then run it
I think you can disable it with spark.shuffle.spill=false
Thanks
Best Regards
On Wed, Mar 18, 2015 at 3:39 PM, Darren Hoo darren@gmail.com wrote:
Thanks, Shao
On Wed, Mar 18, 2015 at 3:34 PM, Shao, Saisai saisai.s...@intel.com
wrote:
Yeah, as I said your job processing time is much
)
at java.lang.Class.forName(Class.java:191)
at
org.apache.spark.deploy.history.HistoryServer$.main(HistoryServer.scala:183)
at
org.apache.spark.deploy.history.HistoryServer.main(HistoryServer.scala)
Patcharee
On 18 March 2015 at 11:35, Akhil Das wrote:
You can simply turn it on using
Create a SparkContext, set the master as yarn-cluster, and then run it as a
standalone program?
Thanks
Best Regards
On Tue, Mar 17, 2015 at 1:27 AM, rrussell25 rrussel...@gmail.com wrote:
Hi, were you ever able to determine a satisfactory approach for this
problem?
I have a similar situation and would
Did you launch the cluster using the spark-ec2 script? Just make sure all
ports are open in the security group of the master and slave instances. From
the error, it seems it's not able to connect to the driver program (port 58360).
Thanks
Best Regards
On Tue, Mar 17, 2015 at 3:26 AM, Otis Gospodnetic
both versions on the project and the cluster. Any clues?
Even the sample code from Spark website failed to work.
Thanks,
Eason
On Sun, Mar 15, 2015 at 11:56 PM, Akhil Das ak...@sigmoidanalytics.com
wrote:
Did you change both the versions? The one in your build file of your
project
There's one on Freenode. You can join #Apache-Spark; there are around 60
people idling. :)
Thanks
Best Regards
On Mon, Mar 16, 2015 at 10:46 PM, Feng Lin lfliu.x...@gmail.com wrote:
Hi, everyone,
I'm wondering whether there is a possibility to set up an official IRC
channel on Freenode.
I
One approach would be: if you are using fileStream, you can access the
individual filenames from the partitions, and with that filename you can
apply your decompression/parsing logic and get it done.
Like:
UnionPartition upp = (UnionPartition)
ds.values().getPartitions()[i];
Did you change both the versions? The one in your build file of your
project and the spark version of your cluster?
Thanks
Best Regards
On Sat, Mar 14, 2015 at 6:47 AM, EH eas...@gmail.com wrote:
Hi all,
I've been using Spark 1.1.0 for a while, and now would like to upgrade to
Spark 1.1.1
Not sure if this will help, but can you try setting the following:
set("spark.core.connection.ack.wait.timeout", "6000")
Thanks
Best Regards
On Sat, Mar 14, 2015 at 4:08 AM, Chen Song chen.song...@gmail.com wrote:
When I ran Spark SQL query (a simple group by query) via hive support, I
have seen
If you want more partitions, then you have to specify it as:
rdd.groupByKey(10).mapValues...
I think if you don't specify anything, the # of partitions will be the # of
cores that you have for processing.
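A small sketch to see the difference (the numbers are arbitrary):

val pairs = sc.parallelize(1 to 1000).map(n => (n % 10, n))

val explicit = pairs.groupByKey(10).mapValues(_.sum)
println(explicit.partitions.length)      // 10, as requested

val byDefault = pairs.groupByKey().mapValues(_.sum)
println(byDefault.partitions.length)     // falls back to the default parallelism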
Thanks
Best Regards
On Sat, Mar 14, 2015 at 12:28 AM, Adrian Mocanu amoc...@verticalscope.com
If you use fileStream, there's an option to filter out files. In your case
you can easily create a filter to skip the _temporary files. In that case,
you will have to move your code inside foreachRDD of the DStream, since the
application will become a streaming app.
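A minimal sketch of such a filter (the S3 path, batch interval, and input
format are assumptions; adjust them to your job):

import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat
import org.apache.spark.streaming.{Seconds, StreamingContext}

val ssc = new StreamingContext(sc, Seconds(60))

// Ignore Hadoop's in-progress output under _temporary directories.
def skipTemporary(path: Path): Boolean = !path.toString.contains("_temporary")

val lines = ssc
  .fileStream[LongWritable, Text, TextInputFormat](
    "s3n://bucket/output/", skipTemporary _, true)   // last arg: newFilesOnly
  .map(_._2.toString)

lines.foreachRDD { rdd =>
  // the existing batch logic moves in here
}

ssc.start()
ssc.awaitTermination()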
Thanks
Best Regards
On Sat, Mar
How are you setting it, and how are you submitting the job?
Thanks
Best Regards
On Mon, Mar 16, 2015 at 12:52 PM, Xi Shen davidshe...@gmail.com wrote:
Hi,
I have set spark.executor.memory to 2048m, and in the UI Environment
page, I can see this value has been set correctly. But in the
How many threads are you allocating while creating the SparkContext? For
example, local[4] will allocate 4 threads. You can try increasing it to a
higher number, and also try setting the level of parallelism to a higher number.
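For example (the values here are only placeholders to experiment with):

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("LocalTest")
  .setMaster("local[8]")                     // 8 threads; "local[*]" uses every core
  .set("spark.default.parallelism", "16")    // raise the level of parallelism as well
val sc = new SparkContext(conf)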
Thanks
Best Regards
On Mon, Mar 16, 2015 at 9:55 AM, Xi Shen davidshe...@gmail.com
You need to figure out why the receivers failed in the first place. Look in
your worker logs and see what really happened. When you run a streaming job
continuously for a longer period there will usually be a lot of logs (you can
enable log rotation etc.), and if you are doing a groupBy, join, etc. type
Try setting SPARK_MASTER_IP, and you need to use the Spark URI
(spark://yourlinuxhost:7077) as displayed in the top-left corner of the Spark
UI (running on port 8080). Also, when you are connecting from your Mac, make
sure your network/firewall isn't blocking any ports between the two machines.
Thanks
Open sbin/slaves.sh and sbin/spark-daemon.sh, then look for the ssh command
and pass the port argument to that command (in your case -p 58518), save
those files, and do a start-all.sh :)
Thanks
Best Regards
On Mon, Mar 16, 2015 at 1:37 PM, ZhuGe t...@outlook.com wrote:
Hi all:
I am new to spark
, 2015 at 1:52 PM, Xi Shen davidshe...@gmail.com wrote:
I set it in code, not by configuration. I submit my jar file to local. I
am working in my developer environment.
On Mon, 16 Mar 2015 18:28 Akhil Das ak...@sigmoidanalytics.com wrote:
How are you setting it? and how are you submitting the job
1. I don't think textFile is capable of unpacking a .gz file. You need to
use hadoopFile or newAPIHadoopFile for this.
2. Instead of map, do a mapPartitions.
3. You need to open the driver UI and see what's really taking time. If
that is running on a remote machine and you are not able to access
:
Hi Akhil,
Yes, you are right. If I run the program from the IDE as a normal Java
program, the executor's memory is increased...but not to 2048m; it is set
to 6.7GB... Looks like there's some formula to calculate this value.
Thanks,
David
On Mon, Mar 16, 2015 at 7:36 PM Akhil Das ak
of executor memory, it should be 2g * 0.6 = 1.2g.
My machine has 56GB memory, and 0.6 of that should be 33.6G...I hate math
xD
On Mon, Mar 16, 2015 at 7:59 PM Akhil Das ak...@sigmoidanalytics.com
wrote:
How much memory do you have on your machine? I think the default value is
0.6
That totally depends on your data size and your cluster setup.
Thanks
Best Regards
On Thu, Mar 12, 2015 at 7:32 PM, Udbhav Agarwal udbhav.agar...@syncoms.com
wrote:
Hi,
What is the query time for a join query on HBase with Spark SQL? Say the
tables in HBase have 0.5 million records each. I am
You could also add SPARK_MASTER_IP to bind to a specific host/IP so that it
won't get confused with those hosts in your /etc/hosts file.
Thanks
Best Regards
On Fri, Mar 13, 2015 at 12:00 PM, Du Li l...@yahoo-inc.com.invalid wrote:
Hi Spark community,
I searched for a way to configure a
,
Udbhav Agarwal
From: Akhil Das [mailto:ak...@sigmoidanalytics.com]
Sent: 13 March, 2015 12:01 PM
To: Udbhav Agarwal
Cc: user@spark.apache.org
Subject: Re: spark sql performance
That totally depends on your data size and your cluster setup.
Thanks
Best Regards
,
Udbhav Agarwal
From: Akhil Das [mailto:ak...@sigmoidanalytics.com]
Sent: 13 March, 2015 12:27 PM
To: Udbhav Agarwal
Cc: user@spark.apache.org
Subject: Re: spark sql performance
So you can cache up to 8GB of data in memory (hoping your data size of one
table is 2GB
was running the query on
one machine with 3 GB RAM, and the join query was taking around 6 seconds.
Thanks,
Udbhav Agarwal
From: Udbhav Agarwal
Sent: 13 March, 2015 12:45 PM
To: 'Akhil Das'
Cc: user@spark.apache.org
Subject: RE: spark sql performance
Okay Akhil! Thanks
Here's a simple consumer which does that
https://github.com/dibbhatt/kafka-spark-consumer/
Thanks
Best Regards
On Thu, Mar 12, 2015 at 10:28 PM, ColinMc colin.mcqu...@shiftenergy.com
wrote:
Hi,
How do you use KafkaUtils to specify a specific partition? I'm writing
customer Marathon jobs
Make sure your Hadoop is running on port 8020; you can check it in your
core-site.xml file and use that URI like:
sc.textFile("hdfs://myhost:myport/data")
Thanks
Best Regards
On Fri, Mar 13, 2015 at 5:15 AM, Lau, Kawing (GE Global Research)
kawing@ge.com wrote:
Hi
I was running with
:
Let's say I am using 4 machines with 3 GB RAM. My data is customer records
with 5 columns each, in two tables with 0.5 million records. I want to
perform a join query on these two tables.
Thanks,
Udbhav Agarwal
From: Akhil Das [mailto:ak...@sigmoidanalytics.com]
Sent: 13 March, 2015 12
Like this?
dstream.repartition(1).mapPartitions(it => it.take(5))
Thanks
Best Regards
On Fri, Mar 13, 2015 at 4:11 PM, Laeeq Ahmed laeeqsp...@yahoo.com.invalid
wrote:
Hi,
I normally use dstream.transform whenever I need to use methods which are
available in RDD API but not in streaming
SAP HANA can be integrated with Hadoop
(http://saphanatutorial.com/sap-hana-and-hadoop/), so you will be able to
read/write to it using the newAPIHadoopFile API of Spark by passing the
correct Configurations etc.
Thanks
Best Regards
On Thu, Mar 12, 2015 at 1:15 PM, Hafiz Mujadid
))
}
val baseStatus = fs.getFileStatus(basePath)
if (baseStatus.isDir) recurse(basePath) else Array(baseStatus)
}
—
Best Regards!
Yijie Shen
On March 12, 2015 at 2:35:49 PM, Akhil Das (ak...@sigmoidanalytics.com)
wrote:
Hi
We have a custom build to read directories
3. Call sc.newAPIHadoopFile(…) with:
sc.newAPIHadoopFile[LongWritable, Text, UncTextInputFormat](
  "file:10.196.119.230/folder1/abc.txt",
  classOf[UncTextInputFormat],
  classOf[LongWritable],
  classOf[Text], conf)
Ningjun
From: Akhil Das
Like this?
val hBaseRDD = sc.newAPIHadoopRDD(conf, classOf[TableInputFormat],
classOf[org.apache.hadoop.hbase.io.ImmutableBytesWritable],
classOf[org.apache.hadoop.hbase.client.Result]).cache()
Here's a complete example
Spark 1.3.0 is not officially out yet, so I don't think sbt will download
the Hadoop dependencies for your Spark by itself. You could try manually
adding the Hadoop dependencies yourself (hadoop-core, hadoop-common,
hadoop-client).
Thanks
Best Regards
On Wed, Mar 11, 2015 at 9:07 PM, Patcharee
At the end of foreachRDD, I believe.
Thanks
Best Regards
On Thu, Mar 12, 2015 at 6:48 AM, Corey Nolet cjno...@gmail.com wrote:
Given the following scenario:
dstream.map(...).filter(...).window(...).foreachRDD()
When would the onBatchCompleted fire?
Hi
We have a custom build to read directories recursively. Currently we use it
with fileStream like:
val lines = ssc.fileStream[LongWritable, Text,
TextInputFormat]("/datadumps/",
(t: Path) => true, true, true)
Making the 4th argument true to read recursively.
You could give it a try
After setting SPARK_LOCAL_DIRS/SPARK_WORKER_DIR you need to restart your
Spark instances (stop-all.sh and start-all.sh). You can also try setting
java.io.tmpdir while creating the SparkContext.
Thanks
Best Regards
On Wed, Mar 11, 2015 at 1:47 AM, Justin Yip yipjus...@prediction.io wrote:
Can you paste your complete spark-submit command? Also, did you try
specifying --worker-cores?
Thanks
Best Regards
On Tue, Mar 10, 2015 at 9:00 PM, htailor hemant.tai...@live.co.uk wrote:
Hi All,
I need some help with a problem in pyspark which is causing a major issue.
Recently I've
wrote:
I am running on a 4 workers cluster each having between 16 to 30 cores and
50 GB of ram
On Wed, 11 Mar 2015 8:55 am Akhil Das ak...@sigmoidanalytics.com wrote:
Depending on your cluster setup (cores, memory), you need to specify the
parallelism/repartition the data.
Thanks
Best
...@lexisnexis.com wrote:
This sounds like the right approach. Is there any sample code showing
how to use sc.newAPIHadoopFile ? I am new to Spark and don’t know much
about Hadoop. I just want to read a text file from UNC path into an RDD.
Thanks
From: Akhil Das [mailto:ak
Depending on your cluster setup (cores, memory), you need to specify the
parallelism/repartition the data.
Thanks
Best Regards
On Wed, Mar 11, 2015 at 12:18 PM, Sean Barzilay sesnbarzi...@gmail.com
wrote:
Hi, I am currently using Spark 1.3.0-SNAPSHOT to run the FP-growth algorithm
from the MLlib
Maybe you can use this code for your purpose:
https://gist.github.com/akhld/4286df9ab0677a555087
It basically sends the content of the given file through a socket (both
IO/NIO); I used it for a benchmark between IO and NIO.
Thanks
Best Regards
On Wed, Mar 11, 2015 at 11:36 AM, Cui Lin
Does it write anything in BUCKET/SUB_FOLDER/output?
Thanks
Best Regards
On Wed, Mar 11, 2015 at 10:15 AM, cpalm3 cpa...@gmail.com wrote:
Hi All,
I am hoping someone has seen this issue before with S3, as I haven't been
able to find a solution for this problem.
When I try to save as Text
Here's a Java version:
https://github.com/cloudera/parquet-examples/tree/master/MapReduce
It won't be that hard to port it to Scala.
Thanks
Best Regards
On Mon, Mar 9, 2015 at 9:55 PM, Shuai Zheng szheng.c...@gmail.com wrote:
Hi All,
I have a lot of parquet files, and I try to open them
Don't you think 1000 is too few for 160GB of data? Also, you could try
using the KryoSerializer and enabling RDD compression.
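Both settings as a sketch (property names as in the 1.x configuration docs;
values are passed as strings):

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.rdd.compress", "true")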
Thanks
Best Regards
On Mon, Mar 9, 2015 at 11:01 PM, mingweili0x m...@spokeo.com wrote:
I'm basically running a sorting using spark. The spark program will read
from
HDFS,
It would be good if you could explain the entire use case, like what kind of
requests, what sort of processing, etc.
Thanks
Best Regards
On Mon, Mar 9, 2015 at 11:18 PM, Tarun Garg bigdat...@live.com wrote:
Hi,
I have an existing web-based system which receives requests and processes
them. This
Are you using Spark SQL for the join? In that case I'm not quite sure you
have a lot of options to join on the nearest coordinate. If you are using
normal Spark code (by creating a key pair on lat,lon) you can apply
certain logic like trimming the lat,lon etc. If you want more specific
computing
Make sure you don't have two master instances running on the same machine.
It could happen if you were running the job and in the middle you tried
to stop the cluster, which didn't completely stop it, and then you did a
start-all again, which would eventually end up with 2 master instances
running,
Did you try something like:
myRDD.saveAsObjectFile("tachyon://localhost:19998/Y")
val newRDD = sc.objectFile[MyObject]("tachyon://localhost:19998/Y")
Thanks
Best Regards
On Sun, Mar 8, 2015 at 3:59 PM, Yijie Shen henry.yijies...@gmail.com
wrote:
Hi,
I would like to share an RDD in several Spark
Mostly, when you use different versions of jars, it will throw up
incompatible version errors.
Thanks
Best Regards
On Fri, Mar 6, 2015 at 7:38 PM, Zsolt Tóth toth.zsolt@gmail.com wrote:
Hi,
I submit spark jobs in yarn-cluster mode remotely from java code by
calling
You could do it like this:
val transformedFileAndTime = fileAndTime.transformWith(anomaly, (rdd1:
RDD[(String, String)], rdd2: RDD[Int]) => {
var first = ""; var second = ""; var third = 0
Did you follow these steps? https://wiki.apache.org/hadoop/AmazonS3 Also
make sure your jobtracker/mapreduce processes are running fine.
Thanks
Best Regards
On Sun, Mar 8, 2015 at 7:32 AM, roni roni.epi...@gmail.com wrote:
Did you get this to work?
I got past the issues with the cluster not
Can you paste the complete code?
Thanks
Best Regards
On Sat, Mar 7, 2015 at 2:25 AM, Ulanov, Alexander alexander.ula...@hp.com
wrote:
Hi,
I've implemented a class MyClass in MLlib that does some operation on
LabeledPoint. MyClass extends Serializable, so I can map this operation on
data of
Looks like an issue with your YARN setup; could you try doing a simple
example with spark-shell?
Start the spark shell as:
$ MASTER=yarn-client bin/spark-shell
spark-shell> sc.parallelize(1 to 1000).collect
If that doesn't work, then make sure your YARN services are up and running
and in
It works pretty fine for me with the script that comes with the 1.2.0
release. Here are a few things which you can try:
- Add your S3 credentials to core-site.xml:
<property>
  <name>fs.s3.awsAccessKeyId</name>
  <value>ID</value>
</property>
<property>
  <name>fs.s3.awsSecretAccessKey</name>
  <value>SECRET</value>
</property>
- Do a
Why not set up HDFS?
Thanks
Best Regards
On Thu, Mar 5, 2015 at 4:03 PM, didmar marin.did...@gmail.com wrote:
Hi,
I'm having a problem involving file permissions on the local filesystem.
On a first machine, I have two different users :
- launcher, which launches my job from an uber jar
You may exclude the log4j dependency while building. You can have a look at
this build file to see how to exclude libraries
http://databricks.gitbooks.io/databricks-spark-knowledge-base/content/troubleshooting/missing_dependencies_in_jar_files.html
Thanks
Best Regards
On Thu, Mar 5, 2015 at 1:20
When you use KafkaUtils.createStream with StringDecoders, it will return
String objects inside your messages stream. To access the elements from the
JSON, you could do something like the following:
val mapStream = messages.map(x => {
  val mapper = new ObjectMapper() with ScalaObjectMapper
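The snippet above is cut off; here is a hedged completion that uses plain
jackson-databind instead of the version-specific ScalaObjectMapper import,
assuming each message value is a JSON object and "someField" is a
hypothetical element name:

import com.fasterxml.jackson.databind.ObjectMapper

val mapStream = messages.map { case (_, json) =>
  val mapper = new ObjectMapper()
  val node = mapper.readTree(json)     // parse the message body into a JsonNode
  node.get("someField").asText()       // pull out one element as a String
}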
When you say multiple directories, make sure those directories exist and
Spark has permission to write to them. You can look at the worker logs to
see the exact reason for the failure.
Thanks
Best Regards
On Tue, Mar 3, 2015 at 6:45 PM, lisendong lisend...@163.com wrote:
As
You can check the Mesos logs and see what's really happening.
Thanks
Best Regards
On Wed, Mar 4, 2015 at 3:10 PM, lisendong lisend...@163.com wrote:
15/03/04 09:26:36 INFO ClientCnxn: Client session timed out, have not heard
from server in 26679ms for sessionid 0x34bbf3313a8001b, closing
You can look at the following
- spark.akka.timeout
- spark.akka.heartbeat.pauses
from http://spark.apache.org/docs/1.2.0/configuration.html
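As a sketch, both can also be raised programmatically (the values are only
examples; check the 1.2 configuration page for defaults and units):

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.akka.timeout", "300")             // seconds
  .set("spark.akka.heartbeat.pauses", "6000")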
Thanks
Best Regards
On Tue, Mar 3, 2015 at 4:46 PM, twinkle sachdeva twinkle.sachd...@gmail.com
wrote:
Hi,
Is there any relation between removing
You may look at https://issues.apache.org/jira/browse/SPARK-4516
Thanks
Best Regards
On Wed, Mar 4, 2015 at 12:25 AM, Jianshi Huang jianshi.hu...@gmail.com
wrote:
Hi,
I got this error message:
15/03/03 10:22:41 ERROR OneForOneBlockFetcher: Failed while starting block
fetches
Looks like you have 2 Netty jars in the classpath.
Thanks
Best Regards
On Wed, Mar 4, 2015 at 5:14 PM, Sarath Chandra
sarathchandra.jos...@algofusiontech.com wrote:
From the lines pointed to in the exception log, I figured out that my code
is unable to get the Spark context. To isolate
You need to increase the parallelism/repartition the data to a higher
number to get rid of those.
Thanks
Best Regards
On Tue, Mar 3, 2015 at 2:26 PM, lisendong lisend...@163.com wrote:
Why is the GC time so long?
I'm using ALS in MLlib, and the garbage collection time is too long
communication issue. If I try to
take a thread dump of the executor once it appears to be in trouble, then a
timeout happens.
Can it be something related to spark.akka.threads?
On Fri, Feb 27, 2015 at 3:55 PM, Akhil Das ak...@sigmoidanalytics.com
wrote:
Mostly, that particular executor
Can you try increasing your driver memory, reducing the executors and
increasing the executor memory?
Thanks
Best Regards
On Tue, Mar 3, 2015 at 10:09 AM, Gustavo Enrique Salazar Torres
gsala...@ime.usp.br wrote:
Hi there:
I'm using LBFGS optimizer to train a logistic regression model. The
Not sure, but it could be related to the Netty off-heap access as described
here: https://issues.apache.org/jira/browse/SPARK-4516, though the message
was different.
Thanks
Best Regards
On Mon, Mar 2, 2015 at 12:51 AM, Zalzberg, Idan (Agoda)
idan.zalzb...@agoda.com wrote:
Thanks,
We
Here's the whole tech stack around it:
[image: Inline image 1]
For a bit more detail you can refer to this slide deck:
http://www.slideshare.net/jeykottalam/spark-sqlamp-camp2014?related=1
The previous project was Shark (SQL over Spark); you can read about it from
here
Wouldn't it be possible with .saveAsNewAPIHadoopFile? How are you pushing
the filters and projections currently?
Thanks
Best Regards
On Tue, Mar 3, 2015 at 1:11 AM, Addanki, Santosh Kumar
santosh.kumar.adda...@sap.com wrote:
Hi Colleagues,
Currently we have implemented External Data
I think you can do simple operations like foreachRDD or transform to get
access to the RDDs in the stream, and then you can run Spark SQL over them.
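A rough sketch of the foreachRDD route (assuming a Spark 1.3-style
SQLContext, a DStream[String] called lines, and a hypothetical case class
Record for the parsed fields):

import org.apache.spark.sql.SQLContext

case class Record(word: String)

lines.foreachRDD { rdd =>
  val sqlContext = new SQLContext(rdd.sparkContext)
  import sqlContext.implicits._
  rdd.map(Record(_)).toDF().registerTempTable("records")
  sqlContext.sql("SELECT word, COUNT(*) FROM records GROUP BY word").show()
}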
Thanks
Best Regards
On Sat, Feb 28, 2015 at 3:27 PM, Ashish Mukherjee
ashish.mukher...@gmail.com wrote:
Hi,
I have been looking at Spark Streaming
You can use persist(StorageLevel.MEMORY_AND_DISK) if you don't have
sufficient memory to cache everything.
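For example (the input path is hypothetical):

import org.apache.spark.storage.StorageLevel

// Partitions that don't fit in memory spill to local disk instead of being recomputed.
val data = sc.textFile("hdfs:///some/big/input")
data.persist(StorageLevel.MEMORY_AND_DISK)
data.count()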
Thanks
Best Regards
On Fri, Feb 27, 2015 at 7:20 PM, Siddharth Ubale
siddharth.ub...@syncoms.com wrote:
Hi,
How do we manage putting partial data in to memory and partial into
You could be hitting this issue:
https://issues.apache.org/jira/browse/SPARK-4516
Apart from that, a little more information about your job would be helpful.
Thanks
Best Regards
On Wed, Feb 25, 2015 at 11:34 AM, Mukesh Jha me.mukesh@gmail.com
wrote:
Hi Experts,
My Spark Job is failing with
Most likely that particular executor is stuck on a GC pause; what operation
are you performing? You can try increasing the parallelism if you see that
only 1 executor is doing the task.
Thanks
Best Regards
On Fri, Feb 27, 2015 at 11:39 AM, twinkle sachdeva
twinkle.sachd...@gmail.com wrote:
Hi,
I am
at which the
system appears to hang. I'm worried about some sort of message loss or
inconsistency.
* Yes, we are using Kryo.
* I'll try that, but I'm again a little confused why you're recommending
this. I'm stumped so might as well?
On Wed, Feb 25, 2015 at 11:13 PM, Akhil Das ak
? Would I need to create a new tab and add the
metrics? Any good or simple examples showing how this can be done?
On Wed, Feb 25, 2015 at 12:07 AM, Akhil Das ak...@sigmoidanalytics.com
wrote:
Did you have a look at
https://spark.apache.org/docs/1.0.2/api/scala/index.html
By throughput do you mean the number of events processed etc.?
[image: Inline image 1]
The Streaming tab already has these statistics.
Thanks
Best Regards
On Wed, Feb 25, 2015 at 9:59 PM, Josh J joshjd...@gmail.com wrote:
On Wed, Feb 25, 2015 at 7:54 AM, Akhil Das ak...@sigmoidanalytics.com
wrote
You can easily add a function (say setup_pig) inside the function
setup_cluster in this script
https://github.com/apache/spark/blob/master/ec2/spark_ec2.py#L649
Thanks
Best Regards
On Thu, Feb 26, 2015 at 7:08 AM, Sameer Tilak ssti...@live.com wrote:
Hi,
I was looking at the documentation
Which version of Spark do you have? It seems there was a similar JIRA:
https://issues.apache.org/jira/browse/SPARK-2474
Thanks
Best Regards
On Thu, Feb 26, 2015 at 12:03 PM, tridib tridib.sama...@live.com wrote:
Hi,
I need to find the top 10 best-selling samples. So the query looks like:
select
What operation are you trying to do and how big is the data that you are
operating on?
Here are a few things which you can try:
- Repartition the RDD to a number higher than 222
- Specify the master as local[*] or local[10]
- Use the Kryo serializer (.set("spark.serializer",
Did you try setting .set("spark.cores.max", "20")?
Thanks
Best Regards
On Wed, Feb 25, 2015 at 10:21 PM, Akshat Aranya aara...@gmail.com wrote:
I have Spark running in standalone mode with 4 executors, each executor
with 5 cores (spark.executor.cores=5). However, when I'm
processing an
anamika.guo...@gmail.com
wrote:
Hi Akhil
I guess it skipped my attention. I would definitely give it a try.
While I would still like to know what the issue is with the way I have
created the schema?
On Tue, Feb 24, 2015 at 4:35 PM, Akhil Das ak...@sigmoidanalytics.com
wrote:
Did you happen
Did you have a look at
https://spark.apache.org/docs/1.0.2/api/scala/index.html#org.apache.spark.scheduler.SparkListener
And for Streaming:
https://spark.apache.org/docs/1.0.2/api/scala/index.html#org.apache.spark.streaming.scheduler.StreamingListener
Thanks
Best Regards
On Tue, Feb 24,
Did you happen to have a look at
https://spark.apache.org/docs/latest/sql-programming-guide.html#programmatically-specifying-the-schema
Thanks
Best Regards
On Tue, Feb 24, 2015 at 3:39 PM, anu anamika.guo...@gmail.com wrote:
My issue is posted here on stack-overflow. What am I doing wrong
If you sign up for Google Compute Cloud, you will get $300 in free credits
for 3 months, and you can start a pretty good cluster for your testing
purposes. :)
Thanks
Best Regards
On Tue, Feb 24, 2015 at 8:25 PM, Deep Pradhan pradhandeep1...@gmail.com
wrote:
Hi,
I have just signed up for Amazon AWS