Re: Problem with delete spark temp dir on spark 0.8.1

2014-03-04 Thread Akhil Das
Hi, Try cleaning your temp dir, System.getProperty("java.io.tmpdir"). Also, can you paste a longer stacktrace? Thanks Best Regards On Tue, Mar 4, 2014 at 2:55 PM, goi cto goi@gmail.com wrote: Hi, I am running a spark java program on a local machine. when I try to write the output to

Re: Java heap space and spark.akka.frameSize Inbox x

2014-04-21 Thread Akhil Das
Hi Chieh, You can increase the heap size by exporting the java options (see below; this will increase the heap size to 10GB): export _JAVA_OPTIONS=-Xmx10g On Mon, Apr 21, 2014 at 11:43 AM, Chieh-Yen r01944...@csie.ntu.edu.tw wrote: Can anybody help me? Thanks. Chieh-Yen On Wed, Apr 16, 2014

Re: how to solve this problem?

2014-04-22 Thread Akhil Das
Hi, Would you mind sharing the piece of code that caused this exception? As per the Javadoc, NoSuchElementException is thrown if you call the nextElement() method of an Enumeration when there are no more elements in it. Thanks Best Regards. On Tue, Apr 22, 2014 at 8:50 AM, gogototo
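A minimal Scala illustration of the pattern the Javadoc describes (not the original poster's code) - guard nextElement() with hasMoreElements():

    // Check hasMoreElements before nextElement(), otherwise a
    // NoSuchElementException is thrown once the Enumeration is exhausted.
    val names = System.getProperties.propertyNames() // java.util.Enumeration[_]
    while (names.hasMoreElements) {
      println(names.nextElement())
    }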

Re: no response in spark web UI

2014-04-22 Thread Akhil Das
Hi SparkContext launches the web interface at 4040; if you have multiple SparkContexts on the same machine then the UIs will be bound to successive ports beginning with 4040. Here's the documentation: https://spark.apache.org/docs/0.9.0/monitoring.html And here's a simple scala program to
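A small sketch (app name and port are assumed values, not from the thread) showing how to pin the UI port explicitly via spark.ui.port instead of relying on the successive-port probing:

    import org.apache.spark.{SparkConf, SparkContext}

    // Assumed example values: pin this driver's web UI to 4041 instead of
    // letting Spark probe 4040, 4041, ... for the next free port.
    val conf = new SparkConf()
      .setAppName("SecondApp")
      .set("spark.ui.port", "4041")
    val sc = new SparkContext(conf)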

Re: Securing Spark's Network

2014-04-25 Thread Akhil Das
Hi Jacob, This post might give you a brief idea about the ports being used https://groups.google.com/forum/#!topic/spark-users/PN0WoJiB0TA On Fri, Apr 25, 2014 at 8:53 PM, Jacob Eisinger jeis...@us.ibm.com wrote: Howdy, We tried running Spark 0.9.1 stand-alone inside docker containers

Re: Build times for Spark

2014-04-25 Thread Akhil Das
You can always increase the sbt memory by setting export JAVA_OPTS=-Xmx10g Thanks Best Regards On Sat, Apr 26, 2014 at 2:17 AM, Williams, Ken ken.willi...@windlogics.comwrote: No, I haven't done any config for SBT. Is there somewhere you might be able to point me toward for how to do

Re: the spark configuage

2014-04-30 Thread Akhil Das
Hi The reason you saw that warning is that the native Hadoop library $HADOOP_HOME/lib/native/libhadoop.so.1.0.0 was actually compiled on 32 bit. Anyway, it's just a warning and won't impact Hadoop's functionality. If you do want to eliminate this warning, download the source code

Spark GCE Script

2014-05-05 Thread Akhil Das
Hi Sparkers, We have created a quick spark_gce script which can launch a spark cluster in the Google Cloud. I'm sharing it because it might be helpful for someone using the Google Cloud for deployment rather than AWS. Here's the link to the script https://github.com/sigmoidanalytics/spark_gce

Re: Using google cloud storage for spark big data

2014-05-05 Thread Akhil Das
Hi Aureliano, You might want to check this script out, https://github.com/sigmoidanalytics/spark_gce Let me know if you need any help around that. Thanks Best Regards On Tue, Apr 22, 2014 at 7:12 PM, Aureliano Buendia buendia...@gmail.comwrote: On Tue, Apr 22, 2014 at 10:50 AM, Andras

Re: No space left on device error when pulling data from s3

2014-05-06 Thread Akhil Das
I wonder why your / is full. Try clearing out /tmp and also make sure in the spark-env.sh you have put SPARK_JAVA_OPTS+= -Dspark.local.dir=/mnt/spark Thanks Best Regards On Tue, May 6, 2014 at 9:35 PM, Han JU ju.han.fe...@gmail.com wrote: Hi, I've a `no space left on device` exception

Re: java.lang.OutOfMemoryError while running Shark on Mesos

2014-05-23 Thread Akhil Das
Hi Prabeesh, Do an export _JAVA_OPTIONS=-Xmx10g before starting Shark. Also you can do a ps aux | grep shark and see how much memory it has been allocated; most likely it will be 512MB, in which case increase the limit. Thanks Best Regards On Fri, May 23, 2014 at 10:22 AM, prabeesh k

Re: EC2 Simple Cluster

2014-06-03 Thread Akhil Das
Hi Gianluca, I believe your cluster setup wasn't complete. Do check the ec2 script console for more details. Also, micro instances have only about 600MB of memory. Thanks Best Regards On Tue, Jun 3, 2014 at 1:59 AM, Gianluca Privitera gianluca.privite...@studio.unibo.it wrote: Hi everyone,

Re: WebUI's Application count doesn't get updated

2014-06-03 Thread Akhil Das
As Andrew said, your application is running in standalone mode. You need to pass MASTER=spark://sanjar-local-machine-1:7077 before running your SparkPi example. Thanks Best Regards On Tue, Jun 3, 2014 at 1:12 PM, MrAsanjar . afsan...@gmail.com wrote: Thanks for your reply Andrew. I am

Re: Spark not working with mesos

2014-06-03 Thread Akhil Das
1. Make sure your spark-*.tgz that you created with make_distribution.sh is accessible by all the slave nodes. 2. Check the worker node logs. Thanks Best Regards On Tue, Jun 3, 2014 at 8:13 PM, praveshjain1991 praveshjain1...@gmail.com wrote: I set up Spark-0.9.1 to run on mesos-0.13.0

Re: How to stop a running SparkContext in the proper way?

2014-06-04 Thread Akhil Das
Ctrl+Z will suspend the job (if you do a fg/bg you can resume it). You need to press Ctrl+C to terminate the job! Thanks Best Regards On Wed, Jun 4, 2014 at 10:24 AM, MEETHU MATHEW meethu2...@yahoo.co.in wrote: Hi, I want to know how I can stop a running

Re: Spark not working with mesos

2014-06-04 Thread Akhil Das
http://spark.apache.org/docs/latest/running-on-mesos.html#troubleshooting-and-debugging If you are not able to find the logs in /var/log/mesos, do check in /tmp/mesos/ and you can see your application ids and all, just like in the $SPARK_HOME/work directory. Thanks Best Regards On Wed,

Re: creating new ami image for spark ec2 commands

2014-06-06 Thread Akhil Das
You can comment out this function and create a new one which will return your ami-id, and the rest of the script will run fine. def get_spark_ami(opts): instance_types = { "m1.small": "pvm", "m1.medium": "pvm", "m1.large": "pvm", "m1.xlarge": "pvm", "t1.micro": "pvm", "c1.medium":

Re: creating new ami image for spark ec2 commands

2014-06-06 Thread Akhil Das
be installed? Do certain directories need to exist? etc... On Fri, Jun 6, 2014 at 4:40 AM, Akhil Das ak...@sigmoidanalytics.com wrote: you can comment out this function and Create a new one which will return your ami-id and the rest of the script will run fine. def get_spark_ami(opts

Re: ArrayIndexOutOfBoundsException when reading bzip2 files

2014-06-09 Thread Akhil Das
Can you paste the piece of code!? Thanks Best Regards On Mon, Jun 9, 2014 at 5:24 PM, MEETHU MATHEW meethu2...@yahoo.co.in wrote: Hi, I am getting ArrayIndexOutOfBoundsException while reading from bz2 files in HDFS.I have come across the same issue in JIRA at

Re: Spark Streaming socketTextStream

2014-06-10 Thread Akhil Das
You can use the master's IP address (Or whichever machine you chose to run the nc command) instead of localhost.
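A minimal Scala sketch of that change (the host, port, and app name are assumptions); "192.168.1.10" stands in for whichever machine is actually running nc -lk 9999:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    // Use the host running nc instead of "localhost" when the receiver
    // can end up on a different node of the cluster.
    val ssc = new StreamingContext(new SparkConf().setAppName("SocketStream"), Seconds(10))
    val lines = ssc.socketTextStream("192.168.1.10", 9999)
    lines.print()
    ssc.start()
    ssc.awaitTermination()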

Re: Need help. Spark + Accumulo = Error: java.lang.NoSuchMethodError: org.apache.commons.codec.binary.Base64.encodeBase64String

2014-06-16 Thread Akhil Das
Hi Check in your driver programs Environment, (eg: http://192.168.1.39:4040/environment/). If you don't see this commons-codec-1.7.jar jar then that's the issue. Thanks Best Regards On Mon, Jun 16, 2014 at 5:07 PM, Jianshi Huang jianshi.hu...@gmail.com wrote: Hi, I'm trying to use Accumulo

Re: hi

2014-06-22 Thread Akhil Das
Open your webUI in the browser and see the spark url in the top left corner of the page and use it while starting your spark shell instead of localhost:7077. Thanks Best Regards On Mon, Jun 23, 2014 at 10:56 AM, rapelly kartheek kartheek.m...@gmail.com wrote: Hi Can someone help me with

Re: Worker nodes: Error messages

2014-06-26 Thread Akhil Das
Can you paste the stderr from the worker logs? (Found in the work/app-20140625133031-0002/ directory) Most likely you might need to set SPARK_MASTER_IP in your spark-env.sh file (Not sure why I'm seeing akka.tcp://spark@localhost:56569 instead of akka.tcp://spark@*serverip*:56569) Thanks Best

Re: Upgrading to Spark 1.0.0 causes NoSuchMethodError

2014-06-26 Thread Akhil Das
Try deleting the .ivy2 directory in your home and then doing a sbt clean assembly; that would solve this issue, I guess. Thanks Best Regards On Thu, Jun 26, 2014 at 3:10 AM, Robert James srobertja...@gmail.com wrote: In case anyone else is having this problem, deleting all ivy's cache, then doing a sbt

Re: wholeTextFiles like for binary files ?

2014-06-26 Thread Akhil Das
You cannot read image files with wholeTextFiles because it uses CombineFileInputFormat which cannot read gzipped files as they are not splittable http://www.bigdataspeak.com/2013_01_01_archive.html (source proving it): override def createRecordReader( split: InputSplit,

Re: Spark standalone network configuration problems

2014-06-26 Thread Akhil Das
Hi Shannon, It should be a configuration issue, check in your /etc/hosts and make sure localhost is not associated with the SPARK_MASTER_IP you provided. Thanks Best Regards On Thu, Jun 26, 2014 at 6:37 AM, Shannon Quinn squ...@gatech.edu wrote: Hi all, I have a 2-machine Spark network

Re: running multiple applications at the same time

2014-06-26 Thread Akhil Das
Hi Jamborta, You can use the following options in your application to limit the usage of resources, like - spark.cores.max - spark.executor.memory It's better to use Mesos if you want to run multiple applications on the same cluster smoothly. Thanks Best Regards On Thu, Jun 26,
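A minimal Scala sketch of setting those two options (the app name and the values are illustrative assumptions):

    import org.apache.spark.{SparkConf, SparkContext}

    // Cap this application at 4 cores cluster-wide and 2g per executor so
    // other applications on the same cluster can still get resources.
    val conf = new SparkConf()
      .setAppName("LimitedApp")
      .set("spark.cores.max", "4")
      .set("spark.executor.memory", "2g")
    val sc = new SparkContext(conf)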

Re: running multiple applications at the same time

2014-06-26 Thread Akhil Das
Yep, it does. Thanks Best Regards On Thu, Jun 26, 2014 at 6:11 PM, jamborta jambo...@gmail.com wrote: thanks a lot. I have tried restricting the memory usage before, but it seems it was the issue with the number of cores available. I am planning to run this on a yarn cluster, I assume

Re: Spark standalone network configuration problems

2014-06-26 Thread Akhil Das
the master crashes immediately due to the address already being in use. Any ideas? Thanks! Shannon On 6/26/14, 10:14 AM, Akhil Das wrote: Can you paste your spark-env.sh file? Thanks Best Regards On Thu, Jun 26, 2014 at 7:01 PM, Shannon Quinn squ...@gatech.edu wrote: Both /etc/hosts

Re: Memory/Network Intensive Workload

2014-06-30 Thread Akhil Das
Hi Not sure, if this will help you. 1. Create one application that will put files to your S3 bucket from public data source (You can use public wiki-data) 2. Create another application (SparkStreaming one) which will listen on that bucket ^^ and perform some operation (Caching, GroupBy etc) as

Re: Spark Streaming with HBase

2014-06-30 Thread Akhil Das
Something like this??? import java.util.List; import org.apache.commons.configuration.Configuration; import org.apache.hadoop.hbase.HBaseConfiguration; import org.apache.hadoop.hbase.client.Get; import org.apache.hadoop.hbase.client.HTable; import org.apache.hadoop.hbase.client.Result; import

Re: Failed to launch Worker

2014-07-01 Thread Akhil Das
Is this command working? java -cp ::/usr/local/spark-1.0.0/conf:/usr/local/spark-1.0.0/assembly/target/scala-2.10/spark-assembly-1.0.0-hadoop1.2.1.jar -XX:MaxPermSize=128m -Dspark.akka.logLifecycleEvents=true -Xms512m -Xmx512m org.apache.spark.deploy.worker.Worker spark://x.x.x.174:7077 Thanks

Re: installing spark 1 on hadoop 1

2014-07-03 Thread Akhil Das
Do you have an sbt directory inside your spark directory? Thanks Best Regards On Wed, Jul 2, 2014 at 10:17 PM, Imran Akbar im...@infoscoutinc.com wrote: Hi, I'm trying to install spark 1 on my hadoop cluster running on EMR. I didn't have any problem installing the previous versions, but

Re: installing spark 1 on hadoop 1

2014-07-03 Thread Akhil Das
If you have downloaded the pre-compiled binary, it will not have sbt directory inside it. Thanks Best Regards On Thu, Jul 3, 2014 at 12:35 PM, Akhil Das ak...@sigmoidanalytics.com wrote: Are you having sbt directory inside your spark directory? Thanks Best Regards On Wed, Jul 2, 2014

Re: Reading text file vs streaming text files

2014-07-03 Thread Akhil Das
Hi Singh! For this use-case it's better to have a streaming context listening to the directory in HDFS where the files are being dropped. You can set the streaming interval to 15 minutes and let this driver program run continuously, so as soon as new files arrive they are taken for
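A rough Scala sketch of that setup (the HDFS path and app name are assumptions, not from the thread); note textFileStream only picks up files that appear after the job starts:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Minutes, StreamingContext}

    // 15-minute batch interval over an HDFS drop directory.
    val ssc = new StreamingContext(new SparkConf().setAppName("DirWatcher"), Minutes(15))
    val files = ssc.textFileStream("hdfs://namenode:9000/data/incoming")
    files.foreachRDD(rdd => println("New lines in this batch: " + rdd.count()))
    ssc.start()
    ssc.awaitTermination()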

Re: No FileSystem for scheme: hdfs

2014-07-03 Thread Akhil Das
Most likely you are missing the hadoop configuration files (present in conf/*.xml). Thanks Best Regards On Fri, Jul 4, 2014 at 7:38 AM, Steven Cox s...@renci.org wrote: They weren't. They are now and the logs look a bit better - like perhaps some serialization is completing that wasn't

Re: Spark: All masters are unresponsive!

2014-07-08 Thread Akhil Das
Are you sure this is your master URL spark://pzxnvm2018:7077 ? You can look it up in the WebUI (mostly http://pzxnvm2018:8080) top left corner. Also make sure you are able to telnet pzxnvm2018 7077 from the machines where you are running the spark shell. Thanks Best Regards On Tue, Jul 8, 2014

Re: spark Driver

2014-07-09 Thread Akhil Das
Can you try setting SPARK_MASTER_IP in the spark-env.sh file? Thanks Best Regards On Wed, Jul 9, 2014 at 10:58 AM, amin mohebbi aminn_...@yahoo.com wrote: Hi all, I have one master and two slave node, I did not set any ip for spark driver because I thought it uses its default (

Re: spark Driver

2014-07-09 Thread Akhil Das
... Amin Mohebbi PhD candidate in Software Engineering at university of Malaysia H/P : +60 18 2040 017 E-Mail : tp025...@ex.apiit.edu.my amin_...@me.com On Wednesday, July 9, 2014 2:32 PM, Akhil Das ak...@sigmoidanalytics.com wrote: Can you try setting

Re: Requirements for Spark cluster

2014-07-09 Thread Akhil Das
You can use the spark-ec2/bdutil scripts to set it up on the AWS/GCE cloud quickly. If you want to set it up on your own then these are the things that you will need to do: 1. Make sure you have java (7) installed on all machines. 2. Install and configure spark (add all slave nodes in

Re: Spark Streaming using File Stream in Java

2014-07-09 Thread Akhil Das
Try this out: JavaStreamingContext ssc = new JavaStreamingContext(...); JavaDStream<String> lines = ssc.fileStream("whatever"); JavaDStream<String> words = lines.flatMap(new FlatMapFunction<String, String>() { public Iterable<String> call(String s) { return Arrays.asList(s.split(" ")); } });

Re: Re: Pig 0.13, Spark, Spork

2014-07-09 Thread Akhil Das
Hi Bertrand, We've updated the document http://docs.sigmoidanalytics.com/index.php/Setting_up_spork_with_spark_0.9.0 This is our working Github repo https://github.com/sigmoidanalytics/spork/tree/spork-0.9 Feel free to open issues over here https://github.com/sigmoidanalytics/spork/issues

Re: Spark Streaming with Kafka NoClassDefFoundError

2014-07-11 Thread Akhil Das
Easiest fix would be adding the kafka jars to the SparkContext while creating it. Thanks Best Regards On Fri, Jul 11, 2014 at 4:39 AM, Dilip dilip_ram...@hotmail.com wrote: Hi, I am trying to run a program with spark streaming using Kafka on a stand alone system. These are my details:

Re: Streaming. Cannot get socketTextStream to receive anything.

2014-07-11 Thread Akhil Das
You simply use the *nc* command to do this. like: nc -p 12345 will open the 12345 port and from the terminal you can provide whatever input you require for your StreamingCode. Thanks Best Regards On Fri, Jul 11, 2014 at 2:41 AM, kytay kaiyang@gmail.com wrote: Hi I am learning spark

Re: Streaming. Cannot get socketTextStream to receive anything.

2014-07-11 Thread Akhil Das
Sorry, the command is nc -lk 12345 Thanks Best Regards On Fri, Jul 11, 2014 at 6:46 AM, Akhil Das ak...@sigmoidanalytics.com wrote: You simply use the *nc* command to do this. like: nc -p 12345 will open the 12345 port and from the terminal you can provide whatever input you require

Re: Streaming. Cannot get socketTextStream to receive anything.

2014-07-11 Thread Akhil Das
Can you try this piece of code? SparkConf sparkConf = new SparkConf().setAppName("JavaNetworkWordCount"); JavaStreamingContext ssc = new JavaStreamingContext(sparkConf, new Duration(1000)); JavaReceiverInputDStream<String> lines = ssc.socketTextStream(args[0],

Re: can we insert and update with spark sql

2014-07-17 Thread Akhil Das
Is this what you are looking for? https://spark.apache.org/docs/1.0.0/api/java/org/apache/spark/sql/parquet/InsertIntoParquetTable.html According to the doc, it says Operator that acts as a sink for queries on RDDs and can be used to store the output inside a directory of Parquet files. This

Re: Errors accessing hdfs while in local mode

2014-07-17 Thread Akhil Das
You can try the following in the spark-shell: 1. Run it in *cluster mode* by going inside the spark directory: $ SPARK_MASTER=spark://masterip:7077 ./bin/spark-shell val textFile = sc.textFile("hdfs://masterip/data/blah.csv") textFile.take(10).foreach(println) 2. Now try running in *local mode:*

Re: Pysparkshell are not listing in the web UI while running

2014-07-17 Thread Akhil Das
Hi Neethu, Your application is running in local mode and that's the reason why you are not seeing the driver app in the 8080 webUI. You can pass the Master IP to your pyspark and get it running in cluster mode. eg: IPYTHON_OPTS="notebook --pylab inline" $SPARK_HOME/bin/pyspark --master

Re: Spark Streaming: no job has started yet

2014-07-23 Thread Akhil Das
Can you paste the piece of code? Thanks Best Regards On Wed, Jul 23, 2014 at 1:22 AM, Bill Jay bill.jaypeter...@gmail.com wrote: Hi all, I am running a spark streaming job. The job hangs on one stage, which shows as follows: Details for Stage 4 Summary Metrics No tasks have started

Re: How could I start new spark cluster with hadoop2.0.2

2014-07-23 Thread Akhil Das
AFAIK you can use the --hadoop-major-version parameter with the spark-ec2 https://github.com/apache/spark/blob/master/ec2/spark_ec2.py script to switch the hadoop version. Thanks Best Regards On Wed, Jul 23, 2014 at 6:07 AM, durga durgak...@gmail.com wrote: Hi, I am trying to create spark

Re: Down-scaling Spark on EC2 cluster

2014-07-23 Thread Akhil Das
Hi Currently this is not supported out of the box. But you can of course add/remove workers in a running cluster. A better option would be to use a Mesos cluster, where adding/removing nodes is quite simple. But again, I believe adding a new worker in the middle of a task won't give you better

Re: save to HDFS

2014-07-24 Thread Akhil Das
Are you sure the RDD that you were saving isn't empty!? Are you seeing a _SUCCESS file in this location? hdfs://masteripaddress:9000/root/test-app/test1/ (Do hadoop fs -ls hdfs://masteripaddress:9000/root/test-app/test1/) Thanks Best Regards On Thu, Jul 24, 2014 at 4:24 PM, lmk

Re: Starting with spark

2014-07-24 Thread Akhil Das
Here's the complete overview http://spark.apache.org/docs/latest/ And Here's the quick start guidelines http://spark.apache.org/docs/latest/quick-start.html I would suggest you downloading the Spark pre-compiled binaries

Re: save to HDFS

2014-07-24 Thread Akhil Das
This piece of code saveAsHadoopFile[TextOutputFormat[NullWritable,Text]]("hdfs://masteripaddress:9000/root/test-app/test1/") saves the RDD into HDFS, and yes you can physically see the files using the hadoop command (hadoop fs -ls /root/test-app/test1 - yes you need to login to the cluster). In

Re: rdd.saveAsTextFile blows up

2014-07-25 Thread Akhil Das
Most likely you are closing the connection with HDFS. Can you paste the piece of code that you are executing? We were having similar problem when we closed the FileSystem object in our code. Thanks Best Regards On Thu, Jul 24, 2014 at 11:00 PM, Eric Friedman eric.d.fried...@gmail.com wrote:

Re: EOFException when I list all files in hdfs directory

2014-07-25 Thread Akhil Das
Try without the * val avroRdd = sc.newAPIHadoopFile("hdfs://url:8020/my dir/", classOf[AvroSequenceFileInputFormat[AvroKey[GenericRecord],NullWritable]], classOf[AvroKey[GenericRecord]], classOf[NullWritable]) avroRdd.collect() Thanks Best Regards On Fri, Jul 25, 2014 at 7:22 PM, Sparky

Re: Debugging Task not serializable

2014-07-28 Thread Akhil Das
A quick fix would be to implement java.io.Serializable in those classes which are causing this exception. Thanks Best Regards On Mon, Jul 28, 2014 at 9:21 PM, Juan Rodríguez Hortalá juan.rodriguez.hort...@gmail.com wrote: Hi all, I was wondering if someone has conceived a method for
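A minimal Scala sketch of that fix (the class, file path, and sc - the usual SparkContext - are assumptions):

    // Anything referenced inside an RDD closure is shipped to the executors,
    // so it has to be serializable.
    class Tokenizer extends Serializable {
      def split(line: String): Array[String] = line.split("\\s+")
    }

    val tokenizer = new Tokenizer()
    val words = sc.textFile("hdfs:///input/docs").flatMap(line => tokenizer.split(line))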

Re: the EC2 setup script often will not allow me to SSH into my machines. Ideas?

2014-07-30 Thread Akhil Das
You need to increase the wait time, (-w) the default is 120 seconds, you may set it to a higher number like 300-400. The problem is that EC2 takes some time to initiate the machine (which is 120 seconds sometimes.) Thanks Best Regards On Wed, Jul 30, 2014 at 8:52 PM, William Cox

Re: Hbase

2014-08-01 Thread Akhil Das
at 12:17 PM, Akhil Das ak...@sigmoidanalytics.com wrote: You can use a map function like the following and do whatever you want with the Result. Function<Tuple2<ImmutableBytesWritable, Result>, Iterator<String>>() { public Iterator<String> call(Tuple2<ImmutableBytesWritable, Result> test

Re: Can't see any thing one the storage panel of application UI

2014-08-05 Thread Akhil Das
You need to persist or cache those RDDs for them to appear in the Storage tab. Unless you do that, those RDDs will be computed again. Thanks Best Regards On Tue, Aug 5, 2014 at 8:03 AM, binbinbin915 binbinbin...@live.cn wrote: Actually, if you don’t use method like persist or cache, it even not store
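A short Scala sketch (paths assumed; sc is the usual SparkContext) - only cached/persisted RDDs that have been materialized by an action show up under Storage:

    import org.apache.spark.storage.StorageLevel

    val logs = sc.textFile("hdfs:///logs").cache()                        // MEMORY_ONLY
    val parsed = logs.map(_.split(",")).persist(StorageLevel.MEMORY_AND_DISK)
    parsed.count()   // running an action forces the data to actually be cached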

Re: Spark Memory Issues

2014-08-05 Thread Akhil Das
Are you able to see the job on the WebUI (8080)? If yes, how much memory are you seeing there specifically for this job? [image: Inline image 1] Here you can see I have 11.8GB RAM on both workers and my app is using 11GB. 1. What are all the memory values that you are seeing in your case? 2. Make sure

Re: Spark shell creating a local SparkContext instead of connecting to connecting to Spark Master

2014-08-05 Thread Akhil Das
You can always start your spark-shell by specifying the master as MASTER=spark://*whatever*:7077 $SPARK_HOME/bin/spark-shell Then it will connect to that *whatever* master. Thanks Best Regards On Tue, Aug 5, 2014 at 8:51 PM, Aniket Bhatnagar aniket.bhatna...@gmail.com wrote: Hi

Re: Spark Memory Issues

2014-08-05 Thread Akhil Das
[Executors table pasted from the web UI: executor 1 (add1) and executor 2 (add2) each show 0.0 B / 1766.4 MB memory used; the driver (add3) shows 0.0 B / 294.6 MB; all other columns are zero.] On Tue, Aug 5, 2014 at 11:32 AM, Akhil Das ak...@sigmoidanalytics.com wrote: Are you able to see the job

Re: Spark Memory Issues

2014-08-05 Thread Akhil Das
Are you sure that you were not running SparkPi in local mode? Thanks Best Regards On Wed, Aug 6, 2014 at 12:43 AM, Sunny Khatri sunny.k...@gmail.com wrote: Well I was able to run the SparkPi, that also does the similar stuff, successfully. On Tue, Aug 5, 2014 at 11:52 AM, Akhil Das ak

Re: can't submit my application on standalone spark cluster

2014-08-06 Thread Akhil Das
Looks like a netty conflict there; most likely you are having multiple versions of netty jars (eg: netty-3.6.6.Final.jar, netty-3.2.2.Final.jar, netty-all-4.0.13.Final.jar), you only require 3.6.6 I believe. A quick fix would be to remove the rest of them. Thanks Best Regards On Wed, Aug 6, 2014

Re: Spark with HBase

2014-08-07 Thread Akhil Das
You can download and compile spark against your existing hadoop version. Here's a quick start https://spark.apache.org/docs/latest/cluster-overview.html#cluster-manager-types You can also read a bit here http://docs.sigmoidanalytics.com/index.php/Installing_Spark_andSetting_Up_Your_Cluster ( the

Re: Unable to access worker web UI or application UI (EC2)

2014-08-08 Thread Akhil Das
Could be some issues with the way you access it. If you are able to see http://master-public-ip:8080 then ideally the application UI (if you haven't changed the default) will be available on http://master-public-ip:4040. Similarly, you can see the worker UIs at http://worker-public-ip:8081

Re: ClassNotFound exception on class in uber.jar

2014-08-12 Thread Akhil Das
This is how I used to do it: *// Create a list of jars* List<String> jars = Lists.newArrayList("/home/akhld/mobi/localcluster/x/spark-0.9.1-bin-hadoop2/assembly/target/scala-2.10/spark-assembly-0.9.1-hadoop2.2.0.jar", "ADD-All-The-Jars-Here");

Re: Should the memory of worker nodes be constrained to the size of the master node?

2014-08-14 Thread Akhil Das
Hi Darin, This is the piece of code https://github.com/mesos/spark-ec2/blob/v3/deploy_templates.py doing the actual work (Setting the memory). As you can see, it leaves 15Gb of ram for OS on a 100Gb machine... 2Gb RAM on a 10-20Gb machine etc. You can always set

Re: OutOfMemory Error

2014-08-18 Thread Akhil Das
Hi Ghousia, You can try the following: 1. Increase the heap size https://spark.apache.org/docs/0.9.0/configuration.html 2. Increase the number of partitions http://stackoverflow.com/questions/21698443/spark-best-practice-for-retrieving-big-data-from-rdd-to-local-machine 3. You could try

Re: NullPointerException when connecting from Spark to a Hive table backed by HBase

2014-08-18 Thread Akhil Das
Looks like your hiveContext is null. Have a look at this documentation. https://spark.apache.org/docs/latest/sql-programming-guide.html#hive-tables Thanks Best Regards On Mon, Aug 18, 2014 at 12:09 PM, Cesar Arevalo ce...@zephyrhealthinc.com wrote: Hello: I am trying to setup Spark to

Re: OutOfMemory Error

2014-08-18 Thread Akhil Das
18, 2014 at 12:02 PM, Akhil Das ak...@sigmoidanalytics.com wrote: Hi Ghousia, You can try the following: 1. Increase the heap size https://spark.apache.org/docs/0.9.0/configuration.html 2. Increase the number of partitions http://stackoverflow.com/questions/21698443/spark-best-practice

Re: a noob question for how to implement setup and cleanup in Spark map

2014-08-18 Thread Akhil Das
You can create an RDD and then do a map or mapPartitions on it, where at the top you create the database connection and all, then do the operations, and at the end close the connection. Thanks Best Regards On Mon, Aug 18, 2014 at 12:34 PM, Henry Hung ythu...@winbond.com wrote:
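A rough Scala sketch of that pattern with mapPartitions (the records RDD, JDBC URL, and query are assumed placeholders); it gives one setup/cleanup per partition rather than one per record:

    val enriched = records.mapPartitions { iter =>
      // setup: one connection per partition
      val conn = java.sql.DriverManager.getConnection("jdbc:mysql://dbhost/db")
      val stmt = conn.prepareStatement("SELECT name FROM users WHERE id = ?")
      val out = iter.map { id =>
        stmt.setInt(1, id)
        val rs = stmt.executeQuery()
        if (rs.next()) rs.getString(1) else ""
      }.toList            // materialize before closing the connection
      conn.close()        // cleanup
      out.iterator
    }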

Re: NullPointerException when connecting from Spark to a Hive table backed by HBase

2014-08-18 Thread Akhil Das
:00 AM, Akhil Das ak...@sigmoidanalytics.com wrote: Looks like your hiveContext is null. Have a look at this documentation. https://spark.apache.org/docs/latest/sql-programming-guide.html#hive-tables Thanks Best Regards On Mon, Aug 18, 2014 at 12:09 PM, Cesar Arevalo ce

Re: spark - reading hfds files every 5 minutes

2014-08-19 Thread Akhil Das
Spark Streaming https://spark.apache.org/docs/latest/streaming-programming-guide.html is the best fit for this use case. Basically you create a streaming context pointing to that directory, and you can also set the streaming interval (in your case it's 5 minutes). SparkStreaming will only process the

Re: Executor Memory, Task hangs

2014-08-19 Thread Akhil Das
Looks like 1 worker is doing the job. Can you repartition the RDD? Also what is the number of cores that you allocated? Things like this, you can easily identify by looking at the workers webUI (default worker:8081) Thanks Best Regards On Tue, Aug 19, 2014 at 6:35 PM, Laird, Benjamin

Re: How to pass env variables from master to executors within spark-shell

2014-08-21 Thread Akhil Das
One approach would be to set these environment variables inside the spark-env.sh on all workers; then you can access them using System.getenv("WHATEVER"). Thanks Best Regards On Wed, Aug 20, 2014 at 9:49 PM, Darin McBeath ddmcbe...@yahoo.com.invalid wrote: Can't seem to figure this out. I've
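A tiny Scala illustration (the variable name MY_SETTING is an assumption) of reading such a variable, provided it is exported in spark-env.sh on every worker:

    val setting = sys.env.getOrElse("MY_SETTING", "default-value")
    val same    = System.getenv("MY_SETTING")   // the equivalent Java API call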

Re: OOM Java heap space error on saveAsTextFile

2014-08-22 Thread Akhil Das
What operation are you performing before doing the saveAsTextFile? If you are doing groupBy/sortBy/mapPartitions/reduceByKey operations then you can specify the number of partitions. We were facing these kinds of problems, and specifying the correct number of partitions solved the issue. Thanks Best Regards
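A minimal Scala sketch (pairs is an assumed RDD[(String, Int)]; 400 is just an example value) of passing a partition count to reduceByKey so each output task handles a smaller chunk:

    val counts = pairs.reduceByKey(_ + _, 400)   // second arg = number of output partitions
    counts.saveAsTextFile("hdfs:///output/counts")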

Re: amp lab spark streaming twitter example

2014-08-26 Thread Akhil Das
I think your *sparkUrl *points to an invalid cluster url. Just make sure you are giving the correct url (the one you see on top left in the master:8080 webUI). Thanks Best Regards On Tue, Aug 26, 2014 at 11:07 AM, Forest D dev24a...@gmail.com wrote: Hi Jonathan, Thanks for the reply. I ran

Re: Spark webUI - application details page

2014-08-26 Thread Akhil Das
Have a look at the history server, looks like you have enabled history server on your local and not on the remote server. http://people.apache.org/~tdas/spark-1.0.0-rc11-docs/monitoring.html Thanks Best Regards On Tue, Aug 26, 2014 at 7:01 AM, SK skrishna...@gmail.com wrote: Hi, I am

Re: How do you hit breakpoints using IntelliJ In functions used by an RDD

2014-08-26 Thread Akhil Das
You need to run your app in local mode (aka master=local[2]) to get it debugged locally. If you are running it on a cluster, then you can use the remote debugging feature. http://stackoverflow.com/questions/19128264/how-to-remote-debug-in-intellij-12-1-4 For remote debugging, you need to pass the

Re: Request for Help

2014-08-26 Thread Akhil Das
Hi Not sure this is the right way of doing it, but if you can create a PairRDDFunctions from that RDD then you can use the following piece of code to access the filenames from the RDD. PairRDDFunctions<K, V> ds = ...; // getting the

Re: Spark Streaming Output to DB

2014-08-26 Thread Akhil Das
Yes, you can open a JDBC connection at the beginning of the map method, use the connection in between, and close it at the end of map(). Thanks Best Regards On Tue, Aug 26, 2014 at 6:12 PM, Ravi Sharma raviprincesha...@gmail.com wrote: Hello People, I'm using java
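A Scala sketch of the same idea applied per partition (the dstream, JDBC URL, and table are assumptions, not the poster's code): open the connection once per partition, write the records, then close it.

    dstream.foreachRDD { rdd =>
      rdd.foreachPartition { records =>
        val conn = java.sql.DriverManager.getConnection("jdbc:postgresql://dbhost/db")
        val stmt = conn.prepareStatement("INSERT INTO events(line) VALUES (?)")
        records.foreach { line => stmt.setString(1, line); stmt.executeUpdate() }
        conn.close()
      }
    }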

Re: Spark Streaming Output to DB

2014-08-27 Thread Akhil Das
really impact your performance. Mayur Rustagi Ph: +1 (760) 203 3257 http://www.sigmoidanalytics.com @mayur_rustagi https://twitter.com/mayur_rustagi On Tue, Aug 26, 2014 at 6:45 PM, Akhil Das ak...@sigmoidanalytics.com wrote: Yes, you can open a jdbc connection at the beginning

Re: Example File not running

2014-08-27 Thread Akhil Das
The statement java.io.IOException: Could not locate executable null\bin\winutils.exe indicates that a null was substituted when expanding an environment variable. I'm guessing that you are missing *HADOOP_HOME* in the environment variables. Thanks Best Regards On Wed, Aug 27, 2014

Re: Example File not running

2014-08-27 Thread Akhil Das
of that environment variable? I want to run the scripts locally on my machine and do not have any Hadoop installed. Thank you *From:* Akhil Das [mailto:ak...@sigmoidanalytics.com] *Sent:* Mittwoch, 27. August 2014 12:54 *To:* Hingorani, Vineet *Cc:* user@spark.apache.org *Subject:* Re: Example

Re: Example File not running

2014-08-27 Thread Akhil Das
am running it on local machine and it is not able to find some dependencies of Hadoop. Please tell me what file should I download to work on my local machine (pre-built, so that I don’t have to build it again). *From:* Akhil Das [mailto:ak...@sigmoidanalytics.com] *Sent:* Mittwoch, 27

Re: org.apache.spark.examples.xxx

2014-08-30 Thread Akhil Das
It bundles all these src's https://github.com/apache/spark/tree/master/examples together and also it uses the pom file to get the dependencies list if I'm not wrong. Thanks Best Regards On Fri, Aug 29, 2014 at 12:39 AM, filipus floe...@gmail.com wrote: hey guys i still try to get used to

Re: Web UI

2014-09-04 Thread Akhil Das
Hi You can see this doc https://spark.apache.org/docs/latest/spark-standalone.html#configuring-ports-for-network-security for all the available webUI ports. Yes, there are ways to get the data metrics in JSON format; one of them is below: http://webUI:8080/json/ Or
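A minimal Scala sketch (host and port are assumptions) of pulling that JSON status programmatically from the standalone master:

    // The standalone master serves its status (workers, cores, memory,
    // running applications) as JSON at /json/.
    val json = scala.io.Source.fromURL("http://master-host:8080/json/").mkString
    println(json)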

Re: Task not serializable

2014-09-05 Thread Akhil Das
You can bring those classes out of the library and make them serializable (implements Serializable). It is not the right way of doing it, though it solved a few of my similar problems. Thanks Best Regards On Fri, Sep 5, 2014 at 7:36 PM, Sarath Chandra sarathchandra.jos...@algofusiontech.com wrote: Hi,

Re: Global Variables in Spark Streaming

2014-09-10 Thread Akhil Das
the value across the cluster. Please correct me if I'm wrong. Thanks, Cheers, Ravi Sharma On Wed, Sep 10, 2014 at 7:31 PM, Akhil Das ak...@sigmoidanalytics.com wrote: Have a look at Broadcasting variables http://spark.apache.org/docs/latest/programming-guide.html#broadcast-variables

Re: Spark not installed + no access to web UI

2014-09-11 Thread Akhil Das
Which version of Spark are you running? Thanks Best Regards On Thu, Sep 11, 2014 at 3:10 PM, mrm ma...@skimlinks.com wrote: Hi, I have been launching Spark in the same ways for the past months, but I have only recently started to have problems with it. I launch Spark using spark-ec2

Re: Unpersist

2014-09-11 Thread Akhil Das
like this? var temp = ... for (i <- num) { temp = .. { do something } temp.unpersist() } Thanks Best Regards On Thu, Sep 11, 2014 at 3:26 PM, Deep Pradhan pradhandeep1...@gmail.com wrote: I want to create temporary variables in a spark code. Can I do this? for (i <- num) {

Re: Network requirements between Driver, Master, and Slave

2014-09-12 Thread Akhil Das
Hi Jim, This approach will not work right out of the box. You need to understand a few things. A driver program and the master will be communicating with each other, for that you need to open up certain ports for your public ip (Read about port forwarding http://portforward.com/). Also on the

Re: Error Driver disassociated while running the spark job

2014-09-12 Thread Akhil Das
What is your system setup? Can you paste the spark-env.sh? Looks like you have some issues with your configuration. Thanks Best Regards On Fri, Sep 12, 2014 at 6:31 PM, 남윤민 rony...@dgist.ac.kr wrote: I got this error from the executor's stderr: Using Spark's default log4j profile:

Re: Driver fail with out of memory exception

2014-09-14 Thread Akhil Das
Try increasing the number of partitions while doing a reduceByKey() http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.api.java.JavaPairRDD Thanks Best Regards On Sun, Sep 14, 2014 at 5:11 PM, richiesgr richie...@gmail.com wrote: Hi I've written a job (I think not very

Re: Broadcast error

2014-09-15 Thread Akhil Das
up. :-( On Mon, Sep 15, 2014 at 1:20 AM, Akhil Das ak...@sigmoidanalytics.com wrote: Can you give this a try: conf = SparkConf().set("spark.executor.memory", "32G").set("spark.akka.frameSize", "1000").set("spark.broadcast.factory", "org.apache.spark.broadcast.TorrentBroadcastFactory") sc

Re: CPU RAM

2014-09-17 Thread Akhil Das
Ganglia does give you cluster-wide and per-machine utilization of resources, but I don't think it gives you per-Spark-job figures. If you want to build something from scratch then you can do something like: 1. Login to the machine 2. Get the PIDs 3. For network IO per process, you can have a look at

Re: collect on hadoopFile RDD returns wrong results

2014-09-17 Thread Akhil Das
Can you dump out a small piece of data? while doing rdd.collect and rdd.foreach(println) Thanks Best Regards On Wed, Sep 17, 2014 at 12:26 PM, vasiliy zadonsk...@gmail.com wrote: it also appears in streaming hdfs fileStream -- View this message in context:
