Hi,
Try cleaning out your temp dir, i.e. the one returned by System.getProperty("java.io.tmpdir").
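You can check its location with a quick one-liner (e.g. in the spark-shell):

println(System.getProperty("java.io.tmpdir"))  // typically /tmp on Linux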
Also, can you paste a longer stacktrace?
Thanks
Best Regards
On Tue, Mar 4, 2014 at 2:55 PM, goi cto goi@gmail.com wrote:
Hi,
I am running a Spark Java program on a local machine. When I try to write
the output to
Hi Chieh,
You can increase the heap size by exporting the java options (see below,
this will increase the heap size to 10GB)
export _JAVA_OPTIONS=-Xmx10g
On Mon, Apr 21, 2014 at 11:43 AM, Chieh-Yen r01944...@csie.ntu.edu.tw wrote:
Can anybody help me?
Thanks.
Chieh-Yen
On Wed, Apr 16, 2014
Hi,
Would you mind sharing the piece of code that caused this exception? As per
the Javadoc, NoSuchElementException is thrown if you call the nextElement()
method of an Enumeration that has no more elements.
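For illustration, a minimal sketch of my own (not your code) that reproduces it:

import java.util.{Arrays, Collections, Enumeration}

val e: Enumeration[String] = Collections.enumeration(Arrays.asList("a"))
println(e.nextElement())    // prints "a"
// e.nextElement()          // would now throw NoSuchElementException
while (e.hasMoreElements) println(e.nextElement())  // guarding like this avoids it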
Thanks
Best Regards.
On Tue, Apr 22, 2014 at 8:50 AM, gogototo
Hi
SparkContext launches the web interface at 4040; if you have multiple
SparkContexts on the same machine then they will be bound to
successive ports beginning with 4040.
Here's the documentation:
https://spark.apache.org/docs/0.9.0/monitoring.html
And here's a simple scala program to
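A minimal sketch of what such a program could look like (my own, assuming the old (master, appName) constructor):

import org.apache.spark.SparkContext

val sc1 = new SparkContext("local", "app-one")  // web UI binds to 4040
val sc2 = new SparkContext("local", "app-two")  // 4040 is taken, so this one binds to 4041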
Hi Jacob,
This post might give you a brief idea about the ports being used
https://groups.google.com/forum/#!topic/spark-users/PN0WoJiB0TA
On Fri, Apr 25, 2014 at 8:53 PM, Jacob Eisinger jeis...@us.ibm.com wrote:
Howdy,
We tried running Spark 0.9.1 stand-alone inside docker containers
You can always increase the sbt memory by setting
export JAVA_OPTS=-Xmx10g
Thanks
Best Regards
On Sat, Apr 26, 2014 at 2:17 AM, Williams, Ken
ken.willi...@windlogics.com wrote:
No, I haven't done any config for SBT. Is there somewhere you might be
able to point me toward for how to do
Hi
The reason you saw that warning is that the native Hadoop library
$HADOOP_HOME/lib/native/libhadoop.so.1.0.0 was actually compiled for 32 bit.
Anyway, it's just a warning, and it won't impact Hadoop's functionality.
If you do want to eliminate this warning, download the
source code
Hi Sparkers,
We have created a quick spark_gce script which can launch a spark cluster
in the Google Cloud. I'm sharing it because it might be helpful for someone
using the Google Cloud for deployment rather than AWS.
Here's the link to the script
https://github.com/sigmoidanalytics/spark_gce
Hi Aureliano,
You might want to check this script out,
https://github.com/sigmoidanalytics/spark_gce
Let me know if you need any help around that.
Thanks
Best Regards
On Tue, Apr 22, 2014 at 7:12 PM, Aureliano Buendia buendia...@gmail.com wrote:
On Tue, Apr 22, 2014 at 10:50 AM, Andras
I wonder why your / is full. Try clearing out /tmp, and also make sure in
the spark-env.sh you have set SPARK_JAVA_OPTS+="
-Dspark.local.dir=/mnt/spark"
Thanks
Best Regards
On Tue, May 6, 2014 at 9:35 PM, Han JU ju.han.fe...@gmail.com wrote:
Hi,
I've a `no space left on device` exception
Hi Prabeesh,
Do an export _JAVA_OPTIONS=-Xmx10g before starting Shark. You can also
do a ps aux | grep shark and see how much memory it has been allocated;
most likely it is 512MB, in which case increase the limit.
Thanks
Best Regards
On Fri, May 23, 2014 at 10:22 AM, prabeesh k
Hi Gianluca,
I believe your cluster setup wasn't complete. Do check the ec2 script
console output for more details. Also, micro instances have only around
600MB of memory.
Thanks
Best Regards
On Tue, Jun 3, 2014 at 1:59 AM, Gianluca Privitera
gianluca.privite...@studio.unibo.it wrote:
Hi everyone,
As Andrew said, your application is running in Standalone mode. You need
to pass
MASTER=spark://sanjar-local-machine-1:7077
before running your SparkPi example.
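For example, something like this (assuming the bundled run-example script):

MASTER=spark://sanjar-local-machine-1:7077 ./bin/run-example SparkPi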
Thanks
Best Regards
On Tue, Jun 3, 2014 at 1:12 PM, MrAsanjar . afsan...@gmail.com wrote:
Thanks for your reply Andrew. I am
1. Make sure the spark-*.tgz that you created with make_distribution.sh is
accessible by all the slave nodes.
2. Check the worker node logs.
Thanks
Best Regards
On Tue, Jun 3, 2014 at 8:13 PM, praveshjain1991 praveshjain1...@gmail.com
wrote:
I set up Spark-0.9.1 to run on mesos-0.13.0
Ctrl + Z will suspend the job (if you do a *fg/bg* you can resume it); it
does not terminate it. You need to press Ctrl + C to terminate the job!
Thanks
Best Regards
On Wed, Jun 4, 2014 at 10:24 AM, MEETHU MATHEW meethu2...@yahoo.co.in
wrote:
Hi,
I want to know how I can stop a running
http://spark.apache.org/docs/latest/running-on-mesos.html#troubleshooting-and-debugging
If you are not able to find the logs in /var/log/mesos,
do check /tmp/mesos/ and you will see your application IDs and everything,
just like in the $SPARK_HOME/work directory.
Thanks
Best Regards
On Wed,
you can comment out this function and create a new one which will return
your AMI ID, and the rest of the script will run fine.
def get_spark_ami(opts):
    instance_types = {
        "m1.small": "pvm",
        "m1.medium": "pvm",
        "m1.large": "pvm",
        "m1.xlarge": "pvm",
        "t1.micro": "pvm",
        "c1.medium":
be installed? Do certain directories need to
exist? etc...
On Fri, Jun 6, 2014 at 4:40 AM, Akhil Das ak...@sigmoidanalytics.com
wrote:
you can comment out this function and create a new one which will return
your AMI ID, and the rest of the script will run fine.
def get_spark_ami(opts
Can you paste the piece of code!?
Thanks
Best Regards
On Mon, Jun 9, 2014 at 5:24 PM, MEETHU MATHEW meethu2...@yahoo.co.in
wrote:
Hi,
I am getting an ArrayIndexOutOfBoundsException while reading from bz2 files
in HDFS. I have come across the same issue in JIRA at
You can use the master's IP address (or whichever machine you chose to run
the nc command on) instead of localhost.
Hi
Check your driver program's Environment tab (eg:
http://192.168.1.39:4040/environment/). If you don't see the
commons-codec-1.7.jar jar there, then that's the issue.
Thanks
Best Regards
On Mon, Jun 16, 2014 at 5:07 PM, Jianshi Huang jianshi.hu...@gmail.com
wrote:
Hi,
I'm trying to use Accumulo
Open the web UI in your browser and see the Spark URL in the top left corner
of the page, and use it while starting your spark shell instead of
localhost:7077.
Thanks
Best Regards
On Mon, Jun 23, 2014 at 10:56 AM, rapelly kartheek kartheek.m...@gmail.com
wrote:
Hi
Can someone help me with
Can you paste the stderr from the worker logs? (Found in work/
app-20140625133031-0002/ directory)
Most likely you need to set SPARK_MASTER_IP in your spark-env.sh file.
(Not sure why I'm seeing akka.tcp://spark@localhost:56569 instead of
akka.tcp://spark@*serverip*:56569.)
Thanks
Best
Try deleting the .ivy2 directory in your home directory and then doing a
sbt clean assembly; that would solve this issue, I guess.
Thanks
Best Regards
On Thu, Jun 26, 2014 at 3:10 AM, Robert James srobertja...@gmail.com
wrote:
In case anyone else is having this problem, deleting all ivy's cache,
then doing a sbt
You cannot read image files with wholeTextFiles because it uses
CombineFileInputFormat, which cannot read gzipped files because they are not
splittable http://www.bigdataspeak.com/2013_01_01_archive.html (source
proving it):
override def createRecordReader(
split: InputSplit,
Hi Shannon,
It should be a configuration issue; check your /etc/hosts and make sure
localhost is not associated with the SPARK_MASTER_IP you provided.
Thanks
Best Regards
On Thu, Jun 26, 2014 at 6:37 AM, Shannon Quinn squ...@gatech.edu wrote:
Hi all,
I have a 2-machine Spark network
Hi Jamborta,
You can use the following options in your application to limit the usage of
resources (a sketch follows this list):
- spark.cores.max
- spark.executor.memory
It's better to use Mesos if you want to run multiple applications on the
same cluster smoothly.
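For example (the app name and values are made up):

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("MyApp")                 // hypothetical app name
  .set("spark.cores.max", "4")         // cap on total cores for this app
  .set("spark.executor.memory", "2g")  // heap per executor
val sc = new SparkContext(conf)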
Thanks
Best Regards
On Thu, Jun 26,
Yep, it does.
Thanks
Best Regards
On Thu, Jun 26, 2014 at 6:11 PM, jamborta jambo...@gmail.com wrote:
thanks a lot. I have tried restricting the memory usage before, but it
seems
it was the issue with the number of cores available.
I am planning to run this on a yarn cluster, I assume
the master crashes immediately due to the address already being
in use.
Any ideas? Thanks!
Shannon
On 6/26/14, 10:14 AM, Akhil Das wrote:
Can you paste your spark-env.sh file?
Thanks
Best Regards
On Thu, Jun 26, 2014 at 7:01 PM, Shannon Quinn squ...@gatech.edu wrote:
Both /etc/hosts
Hi
Not sure if this will help you.
1. Create one application that will put files into your S3 bucket from a
public data source (you can use public wiki-data).
2. Create another application (a SparkStreaming one) which will listen on
that bucket ^^ and perform some operation (caching, groupBy etc.) as
Something like this???
import java.util.List;
import org.apache.commons.configuration.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import
Is this command working??
java -cp ::/usr/local/spark-1.0.0/conf:/usr/local/spark-1.0.0/
assembly/target/scala-2.10/spark-assembly-1.0.0-hadoop1.2.1.jar
-XX:MaxPermSize=128m -Dspark.akka.logLifecycleEvents=true -Xms512m -Xmx512m
org.apache.spark.deploy.worker.Worker spark://x.x.x.174:7077
Thanks
Do you have an sbt directory inside your spark directory?
Thanks
Best Regards
On Wed, Jul 2, 2014 at 10:17 PM, Imran Akbar im...@infoscoutinc.com wrote:
Hi,
I'm trying to install spark 1 on my hadoop cluster running on EMR. I
didn't have any problem installing the previous versions, but
If you have downloaded the pre-compiled binary, it will not have an sbt
directory inside it.
Thanks
Best Regards
On Thu, Jul 3, 2014 at 12:35 PM, Akhil Das ak...@sigmoidanalytics.com
wrote:
Do you have an sbt directory inside your spark directory?
Thanks
Best Regards
On Wed, Jul 2, 2014
Hi Singh!
For this use case it's better to have a StreamingContext listening to the
directory in HDFS where the files are being dropped; you can set the
streaming interval to 15 minutes and let this driver program run
continuously, so that as soon as new files arrive they are taken for
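A skeleton of such a driver (a sketch; the app name and HDFS path are hypothetical):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Minutes, StreamingContext}

val conf = new SparkConf().setAppName("DirWatcher")             // hypothetical name
val ssc = new StreamingContext(conf, Minutes(15))               // 15-minute batches
val lines = ssc.textFileStream("hdfs://namenode:8020/dropbox")  // hypothetical drop directory
lines.count().print()   // replace with your real processing; this just prints the record count
ssc.start()
ssc.awaitTermination()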
Most likely you are missing the hadoop configuration files (present in
conf/*.xml).
Thanks
Best Regards
On Fri, Jul 4, 2014 at 7:38 AM, Steven Cox s...@renci.org wrote:
They weren't. They are now and the logs look a bit better - like perhaps
some serialization is completing that wasn't
Are you sure this is your master URL: spark://pzxnvm2018:7077?
You can look it up in the web UI (most likely http://pzxnvm2018:8080), top
left corner. Also make sure you are able to telnet pzxnvm2018 7077 from the
machines where you are running the spark shell.
Thanks
Best Regards
On Tue, Jul 8, 2014
Can you try setting SPARK_MASTER_IP in the spark-env.sh file?
Thanks
Best Regards
On Wed, Jul 9, 2014 at 10:58 AM, amin mohebbi aminn_...@yahoo.com wrote:
Hi all,
I have one master and two slave node, I did not set any ip for spark
driver because I thought it uses its default (
...
Amin Mohebbi
PhD candidate in Software Engineering
at university of Malaysia
H/P : +60 18 2040 017
E-Mail : tp025...@ex.apiit.edu.my
amin_...@me.com
On Wednesday, July 9, 2014 2:32 PM, Akhil Das
ak...@sigmoidanalytics.com wrote:
Can you try setting
You can use the spark-ec2/bdutil scripts to set it up on the AWS/GCE cloud
quickly.
If you want to set it up on your own then these are the things that you
will need to do:
1. Make sure you have java (7) installed on all machines.
2. Install and configure spark (add all slave nodes in
Try this out:
JavaStreamingContext ssc = new JavaStreamingContext(...);
JavaDStream<String> lines = ssc.textFileStream("whatever");
JavaDStream<String> words = lines.flatMap(
    new FlatMapFunction<String, String>() {
        public Iterable<String> call(String s) {
            return Arrays.asList(s.split(" "));
        }
    });
Hi Bertrand,
We've updated the document
http://docs.sigmoidanalytics.com/index.php/Setting_up_spork_with_spark_0.9.0
This is our working Github repo
https://github.com/sigmoidanalytics/spork/tree/spork-0.9
Feel free to open issues over here
https://github.com/sigmoidanalytics/spork/issues
Easiest fix would be adding the kafka jars to the SparkContext while
creating it.
Thanks
Best Regards
On Fri, Jul 11, 2014 at 4:39 AM, Dilip dilip_ram...@hotmail.com wrote:
Hi,
I am trying to run a program with spark streaming using Kafka on a stand
alone system. These are my details:
You simply use the *nc* command to do this. like:
nc -p 12345
will open the 12345 port and from the terminal you can provide whatever
input you require for your StreamingCode.
Thanks
Best Regards
On Fri, Jul 11, 2014 at 2:41 AM, kytay kaiyang@gmail.com wrote:
Hi
I am learning spark
Sorry, the command is
nc -lk 12345
Thanks
Best Regards
On Fri, Jul 11, 2014 at 6:46 AM, Akhil Das ak...@sigmoidanalytics.com
wrote:
You simply use the *nc* command to do this. like:
nc -p 12345
will open the 12345 port and from the terminal you can provide whatever
input you require
Can you try this piece of code?
SparkConf sparkConf = new SparkConf().setAppName("JavaNetworkWordCount");
JavaStreamingContext ssc = new JavaStreamingContext(sparkConf, new
Duration(1000));
JavaReceiverInputDStream<String> lines = ssc.socketTextStream(
    args[0],
Is this what you are looking for?
https://spark.apache.org/docs/1.0.0/api/java/org/apache/spark/sql/parquet/InsertIntoParquetTable.html
According to the doc, it says: "Operator that acts as a sink for queries on
RDDs and can be used to store the output inside a directory of Parquet
files." This
You can try the following in the spark-shell:
1. Run it in *cluster mode* by going inside the spark directory:
$ MASTER=spark://masterip:7077 ./bin/spark-shell
val textFile = sc.textFile("hdfs://masterip/data/blah.csv")
textFile.take(10).foreach(println)
2. Now try running in *local mode:*
Hi Neethu,
Your application is running in local mode, and that's the reason why you are
not seeing the driver app in the 8080 web UI. You can pass the master IP to
your pyspark and get it running in cluster mode.
eg: IPYTHON_OPTS="notebook --pylab inline" $SPARK_HOME/bin/pyspark --master
Can you paste the piece of code?
Thanks
Best Regards
On Wed, Jul 23, 2014 at 1:22 AM, Bill Jay bill.jaypeter...@gmail.com
wrote:
Hi all,
I am running a spark streaming job. The job hangs on one stage, which
shows as follows:
Details for Stage 4
Summary Metrics No tasks have started
AFAIK you can use the --hadoop-major-version parameter with the spark-ec2
https://github.com/apache/spark/blob/master/ec2/spark_ec2.py script to
switch the hadoop version.
Thanks
Best Regards
On Wed, Jul 23, 2014 at 6:07 AM, durga durgak...@gmail.com wrote:
Hi,
I am trying to create spark
Hi
Currently this is not supported out of the box. But you can of course
add/remove workers in a running cluster. A better option would be to use a
Mesos cluster, where adding/removing nodes is quite simple. But again, I
believe adding a new worker in the middle of a task won't give you better
Are you sure the RDD that you were saving isn't empty!?
Are you seeing a _SUCCESS file in this location? hdfs://
masteripaddress:9000/root/test-app/test1/
(Do hadoop fs -ls hdfs://masteripaddress:9000/root/test-app/test1/)
Thanks
Best Regards
On Thu, Jul 24, 2014 at 4:24 PM, lmk
Here's the complete overview http://spark.apache.org/docs/latest/
And here are the quick start guidelines
http://spark.apache.org/docs/latest/quick-start.html
I would suggest downloading the Spark pre-compiled binaries
This piece of code
saveAsHadoopFile[TextOutputFormat[NullWritable,Text]]("hdfs://
masteripaddress:9000/root/test-app/test1/")
saves the RDD into HDFS, and yes, you can physically see the files using the
hadoop command (hadoop fs -ls /root/test-app/test1 - yes, you need to log in
to the cluster). In
Most likely you are closing the connection with HDFS. Can you paste the
piece of code that you are executing?
We were having a similar problem when we closed the FileSystem object in our
code.
Thanks
Best Regards
On Thu, Jul 24, 2014 at 11:00 PM, Eric Friedman eric.d.fried...@gmail.com
wrote:
Try without the *
val avroRdd = sc.newAPIHadoopFile("hdfs://url:8020/my dir/",
classOf[AvroSequenceFileInputFormat[AvroKey[GenericRecord],NullWritable]],
classOf[AvroKey[GenericRecord]], classOf[NullWritable])
avroRdd.collect()
Thanks
Best Regards
On Fri, Jul 25, 2014 at 7:22 PM, Sparky
A quick fix would be to implement java.io.Serializable in those classes
which are causing this exception.
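For example, with a hypothetical class of mine (rdd stands for whatever RDD you are mapping over):

// Without "extends Serializable", Spark throws NotSerializableException
// when it ships instances of this class to the executors.
class Model(val weight: Double) extends java.io.Serializable

val model = new Model(0.5)
val scaled = rdd.map(x => x * model.weight)  // rdd is some RDD[Double], assumed here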
Thanks
Best Regards
On Mon, Jul 28, 2014 at 9:21 PM, Juan Rodríguez Hortalá
juan.rodriguez.hort...@gmail.com wrote:
Hi all,
I was wondering if someone has conceived a method for
You need to increase the wait time (-w); the default is 120 seconds, and you
may set it to a higher number like 300-400. The problem is that EC2 takes
some time to initiate the machine (which is sometimes more than 120 seconds).
Thanks
Best Regards
On Wed, Jul 30, 2014 at 8:52 PM, William Cox
at 12:17 PM, Akhil Das ak...@sigmoidanalytics.com
wrote:
You can use a map function like the following and do whatever you want
with the Result.
Function<Tuple2<ImmutableBytesWritable, Result>, Iterator<String>>() {
    public Iterator<String> call(Tuple2<ImmutableBytesWritable,
        Result> test
You need to persist or cache those RDDs for them to appear in Storage.
Unless you do, those RDDs will be computed again.
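For example (the input path is made up):

val data = sc.textFile("hdfs://master:9000/data").cache()  // mark for caching
data.count()  // the first action materializes the cache;
              // only now does the RDD show up under Storage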
Thanks
Best Regards
On Tue, Aug 5, 2014 at 8:03 AM, binbinbin915 binbinbin...@live.cn wrote:
Actually, if you don't use a method like persist or cache, it does not even
store
Are you able to see the job on the WebUI (8080)? If yes, how much memory
are you seeing there specifically for this job?
[image: Inline image 1]
Here you can see I have 11.8GB RAM on both workers and my app is using
11GB.
1. What is the memory that you are seeing in your case?
2. Make sure
You can always start your spark-shell by specifying the master as
MASTER=spark://*whatever*:7077 $SPARK_HOME/bin/spark-shell
Then it will connect to that *whatever* master.
Thanks
Best Regards
On Tue, Aug 5, 2014 at 8:51 PM, Aniket Bhatnagar aniket.bhatna...@gmail.com
wrote:
Hi
Write
1       add1    0    0.0 B / 1766.4 MB    0.0 B    0    0    0    0    0 ms    0.0 B    0.0 B
2       add2    0    0.0 B / 1766.4 MB    0.0 B    0    0    0    0    0 ms    0.0 B    0.0 B
driver  add3    0    0.0 B / 294.6 MB     0.0 B    0    0    0    0    0 ms    0.0 B    0.0 B
On Tue, Aug 5, 2014 at 11:32 AM, Akhil Das ak...@sigmoidanalytics.com
wrote:
Are you able to see the job
Are you sure that you were not running SparkPi in local mode?
Thanks
Best Regards
On Wed, Aug 6, 2014 at 12:43 AM, Sunny Khatri sunny.k...@gmail.com wrote:
Well I was able to run the SparkPi, that also does the similar stuff,
successfully.
On Tue, Aug 5, 2014 at 11:52 AM, Akhil Das ak
Looks like a netty conflict there; most likely you are having multiple
versions of netty jars (eg:
netty-3.6.6.Final.jar, netty-3.2.2.Final.jar, netty-all-4.0.13.Final.jar);
you only require 3.6.6, I believe. A quick fix would be to remove the rest
of them.
Thanks
Best Regards
On Wed, Aug 6, 2014
You can download and compile spark against your existing hadoop version.
Here's a quick start
https://spark.apache.org/docs/latest/cluster-overview.html#cluster-manager-types
You can also read a bit here
http://docs.sigmoidanalytics.com/index.php/Installing_Spark_andSetting_Up_Your_Cluster
( the
Could be some issue with the way you access it.
If you are able to see http://master-public-ip:8080 then ideally the
application UI (if you haven't changed the default) will be available at
http://master-public-ip:4040.
Similarly, you can see the worker UIs at http://worker-public-ip:8081
This is how I used to do it:
*// Create a list of jars*
List<String> jars =
Lists.newArrayList("/home/akhld/mobi/localcluster/x/spark-0.9.1-bin-hadoop2/assembly/target/scala-2.10/spark-assembly-0.9.1-hadoop2.2.0.jar",
"ADD-All-The-Jars-Here");
Hi Darin,
This is the piece of code
https://github.com/mesos/spark-ec2/blob/v3/deploy_templates.py doing the
actual work (setting the memory). As you can see, it leaves 15GB of RAM for
the OS on a 100GB machine... 2GB of RAM on a 10-20GB machine, etc.
You can always set
Hi Ghousia,
You can try the following:
1. Increase the heap size
https://spark.apache.org/docs/0.9.0/configuration.html
2. Increase the number of partitions
http://stackoverflow.com/questions/21698443/spark-best-practice-for-retrieving-big-data-from-rdd-to-local-machine
3. You could try
Looks like your hiveContext is null. Have a look at this documentation.
https://spark.apache.org/docs/latest/sql-programming-guide.html#hive-tables
Thanks
Best Regards
On Mon, Aug 18, 2014 at 12:09 PM, Cesar Arevalo ce...@zephyrhealthinc.com
wrote:
Hello:
I am trying to setup Spark to
18, 2014 at 12:02 PM, Akhil Das ak...@sigmoidanalytics.com
wrote:
Hi Ghousia,
You can try the following:
1. Increase the heap size
https://spark.apache.org/docs/0.9.0/configuration.html
2. Increase the number of partitions
http://stackoverflow.com/questions/21698443/spark-best-practice
You can create an RDD and then do a map or mapPartitions on it, where at
the top you create the database connection and everything, then do
the operations, and at the end close the connection.
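Roughly like this (a sketch; rdd, the JDBC URL, and the lookup helper are all stand-ins):

val results = rdd.mapPartitions { rows =>
  // one connection per partition, created on the worker
  val conn = java.sql.DriverManager.getConnection("jdbc:mysql://dbhost/test")
  // materialize before closing, or the lazy iterator would hit a closed connection
  val out = rows.map(r => lookup(conn, r)).toList
  conn.close()
  out.iterator
}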
Thanks
Best Regards
On Mon, Aug 18, 2014 at 12:34 PM, Henry Hung ythu...@winbond.com wrote:
:00 AM, Akhil Das ak...@sigmoidanalytics.com
wrote:
Looks like your hiveContext is null. Have a look at this documentation.
https://spark.apache.org/docs/latest/sql-programming-guide.html#hive-tables
Thanks
Best Regards
On Mon, Aug 18, 2014 at 12:09 PM, Cesar Arevalo
ce
Spark Streaming
https://spark.apache.org/docs/latest/streaming-programming-guide.html is
the best fit for this use case. Basically you create a streaming context
pointing to that directory; you can also set the streaming interval (in
your case it's 5 minutes). SparkStreaming will only process the
Looks like one worker is doing the whole job. Can you repartition the RDD?
Also, what is the number of cores that you allocated? Things like this you
can easily identify by looking at the worker's web UI (default worker:8081)
Thanks
Best Regards
On Tue, Aug 19, 2014 at 6:35 PM, Laird, Benjamin
One approach would be to set these environment variables inside
spark-env.sh on all workers; then you can access them using
System.getenv("WHATEVER")
Thanks
Best Regards
On Wed, Aug 20, 2014 at 9:49 PM, Darin McBeath ddmcbe...@yahoo.com.invalid
wrote:
Can't seem to figure this out. I've
What operation are you performing before doing the saveAsTextFile? If you
are doing groupBy/sortBy/mapPartitions/reduceByKey operations then you can
specify the number of partitions. We were facing these kinds of problems, and
specifying the correct number of partitions solved the issue.
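For example (pairs and the numbers are stand-ins):

val counts = pairs.reduceByKey(_ + _, 200)  // pairs is some RDD[(K, V)]; 200 output partitions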
Thanks
Best Regards
I think your *sparkUrl* points to an invalid cluster url. Just make sure
you are giving the correct url (the one you see on the top left in the
master:8080 web UI).
Thanks
Best Regards
On Tue, Aug 26, 2014 at 11:07 AM, Forest D dev24a...@gmail.com wrote:
Hi Jonathan,
Thanks for the reply. I ran
Have a look at the history server; looks like you have enabled the history
server on your local machine and not on the remote server.
http://people.apache.org/~tdas/spark-1.0.0-rc11-docs/monitoring.html
Thanks
Best Regards
On Tue, Aug 26, 2014 at 7:01 AM, SK skrishna...@gmail.com wrote:
Hi,
I am
You need to run your app in local mode (aka master=local[2]) to get it
debugged locally. If you are running it on a cluster, then you can use the
remote debugging feature.
http://stackoverflow.com/questions/19128264/how-to-remote-debug-in-intellij-12-1-4
For remote debugging, you need to pass the
Hi
Not sure this is the right way of doing it, but if you can create a
PairRDDFunction from that RDD then you can use the following piece of code
to access the filenames from the RDD.
PairRDDFunctionsK, V ds = .;
//getting the
Yes, you can open a jdbc connection at the beginning of the map method,
then close this connection at the end of map(), and in between you can use
this connection.
Thanks
Best Regards
On Tue, Aug 26, 2014 at 6:12 PM, Ravi Sharma raviprincesha...@gmail.com
wrote:
Hello People,
I'm using java
really impact your performance.
Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi https://twitter.com/mayur_rustagi
On Tue, Aug 26, 2014 at 6:45 PM, Akhil Das ak...@sigmoidanalytics.com
wrote:
Yes, you can open a jdbc connection at the beginning
The statement "java.io.IOException: Could not locate executable
null\bin\winutils.exe"
explains that null was received when expanding or replacing an
environment variable.
I'm guessing that you are missing *HADOOP_HOME* in the environment
variables.
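If you are on Windows, the usual workaround (an assumption on my side, not something from your logs) is to download winutils.exe, put it under e.g. C:\hadoop\bin, and set HADOOP_HOME=C:\hadoop before launching Spark.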
Thanks
Best Regards
On Wed, Aug 27, 2014
of that environment variable? I want to run
the scripts locally on my machine and do not have any Hadoop installed.
Thank you
*From:* Akhil Das [mailto:ak...@sigmoidanalytics.com]
*Sent:* Mittwoch, 27. August 2014 12:54
*To:* Hingorani, Vineet
*Cc:* user@spark.apache.org
*Subject:* Re: Example
am running it on local
machine and it is not able to find some dependencies of Hadoop. Please tell
me what file should I download to work on my local machine (pre-built, so
that I don’t have to build it again).
*From:* Akhil Das [mailto:ak...@sigmoidanalytics.com]
*Sent:* Mittwoch, 27
It bundles all these sources
https://github.com/apache/spark/tree/master/examples together, and it also
uses the pom file to get the dependency list, if I'm not wrong.
Thanks
Best Regards
On Fri, Aug 29, 2014 at 12:39 AM, filipus floe...@gmail.com wrote:
hey guys
i still try to get used to
Hi
You can see this doc
https://spark.apache.org/docs/latest/spark-standalone.html#configuring-ports-for-network-security
for all the available webUI ports.
Yes, there are ways to get the data metrics in JSON format; one of them is
below:
*http://webUI:8080/json/* Or
You can bring those classes out of the library and serialize them (implements
Serializable). It is not the right way of doing it, though it solved a few of
my similar problems.
Thanks
Best Regards
On Fri, Sep 5, 2014 at 7:36 PM, Sarath Chandra
sarathchandra.jos...@algofusiontech.com wrote:
Hi,
the value
across the cluster.
Please correct me if I'm wrong.
Thanks,
Cheers,
Ravi Sharma
On Wed, Sep 10, 2014 at 7:31 PM, Akhil Das ak...@sigmoidanalytics.com
wrote:
Have a look at Broadcasting variables
http://spark.apache.org/docs/latest/programming-guide.html#broadcast-variables
Which version of Spark are you using?
Thanks
Best Regards
On Thu, Sep 11, 2014 at 3:10 PM, mrm ma...@skimlinks.com wrote:
Hi,
I have been launching Spark in the same ways for the past months, but I
have
only recently started to have problems with it. I launch Spark using
spark-ec2
like this?
var temp = ...
for (i <- num)
{
  temp = ..
  {
    do something
  }
  temp.unpersist()
}
Thanks
Best Regards
On Thu, Sep 11, 2014 at 3:26 PM, Deep Pradhan pradhandeep1...@gmail.com
wrote:
I want to create a temporary variables in a spark code.
Can I do this?
for (i <- num)
{
Hi Jim,
This approach will not work right out of the box. You need to understand a
few things. A driver program and the master will be communicating with each
other, for that you need to open up certain ports for your public ip (Read
about port forwarding http://portforward.com/). Also on the
What is your system setup? Can you paste the spark-env.sh? Looks like you
have some issues with your configuration.
Thanks
Best Regards
On Fri, Sep 12, 2014 at 6:31 PM, 남윤민 rony...@dgist.ac.kr wrote:
I got this error from the executor's stderr:
Using Spark's default log4j profile:
Try increasing the number of partitions while doing a reduceByKey()
http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.api.java.JavaPairRDD
Thanks
Best Regards
On Sun, Sep 14, 2014 at 5:11 PM, richiesgr richie...@gmail.com wrote:
Hi
I've written a job (I think not very
up.
:-(
On Mon, Sep 15, 2014 at 1:20 AM, Akhil Das ak...@sigmoidanalytics.com
wrote:
Can you give this a try:
conf = SparkConf().set("spark.executor.memory",
"32G").set("spark.akka.frameSize",
"1000").set("spark.broadcast.factory",
"org.apache.spark.broadcast.TorrentBroadcastFactory")
sc
Ganglia does give you cluster-wide and per-machine utilization of
resources, but I don't think it gives you per-Spark-job metrics. If you want
to build something from scratch then you can follow up like:
1. Login to the machine
2. Get the PIDs
3. For network IO per process, you can have a look at
Can you dump out a small piece of data by doing rdd.collect and
rdd.foreach(println)?
Thanks
Best Regards
On Wed, Sep 17, 2014 at 12:26 PM, vasiliy zadonsk...@gmail.com wrote:
it also appears in streaming hdfs fileStream
--
View this message in context: