Re: Hadoop MapReduce on Spark

2014-02-01 Thread Ashish Rangole
I can see how that would be a valid use case. A lot of folks have code written using the Hadoop MR APIs or other layers that use them. A translation layer like that would help those dev teams migrate their apps to Spark. On Feb 1, 2014 5:01 PM, Ankur Chauhan achau...@brightcove.com

Re: What could be the cause of this Streaming error

2014-01-28 Thread Ashish Rangole
I am using 2.9.3. On Jan 27, 2014 11:50 PM, Khanderao kand khanderao.k...@gmail.com wrote: The Scala version changed to 2.10 in 0.9.0. Are you using the same version? On Tue, Jan 28, 2014 at 11:30 AM, Ashish Rangole arang...@gmail.com wrote: Hi, I am seeing the following error message
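
For context, matching the project's Scala version to the Spark release is the usual fix here. A minimal sbt sketch, assuming a project moving to Spark 0.9.0 (the patch versions shown are illustrative):

    // build.sbt -- hedged sketch: Spark 0.9.0 was built against Scala 2.10,
    // so the project's scalaVersion must match; a 2.9.x build will not link.
    scalaVersion := "2.10.3"

    libraryDependencies += "org.apache.spark" %% "spark-core" % "0.9.0-incubating"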

What could be the cause of this Streaming error

2014-01-27 Thread Ashish Rangole
Hi, I am seeing the following error message when I began testing my Streaming application locally. Could it be due to a mismatch with old spark jars somewhere or is this something else? Thanks, Ashish SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found binding in
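
The usual cause of this SLF4J warning is two bindings on the classpath, one of them pulled in transitively. A minimal sbt sketch of excluding the extra one, assuming (purely for illustration) that hadoop-client is the dependency dragging it in:

    // build.sbt -- hedged sketch: drop a transitive slf4j-log4j12 binding so
    // only one SLF4J binding remains on the classpath.
    libraryDependencies +=
      "org.apache.hadoop" % "hadoop-client" % "1.0.4" exclude("org.slf4j", "slf4j-log4j12")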

Re: Spark streaming vs. spark usage

2013-12-18 Thread Ashish Rangole
I wonder if it will help to have a generic Monad container that wraps either RDD or DStream and provides map, flatMap, foreach and filter methods. case class DataMonad[A](data: A) { def map[B](f: A => B): DataMonad[B] = { DataMonad(f(data)) } def flatMap[B](f: A =>
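
A completed version of that sketch, purely as an illustration (the flatMap, foreach and filter bodies are filled in here; a real RDD/DStream wrapper would delegate to the underlying type rather than hold a plain value):

    // Hedged completion of the DataMonad idea from the message above.
    case class DataMonad[A](data: A) {
      def map[B](f: A => B): DataMonad[B] = DataMonad(f(data))
      def flatMap[B](f: A => DataMonad[B]): DataMonad[B] = f(data)
      def foreach(f: A => Unit): Unit = f(data)
      // filter on a single wrapped value can only say "kept or not",
      // hence the Option result; a collection-backed wrapper would differ.
      def filter(p: A => Boolean): Option[DataMonad[A]] =
        if (p(data)) Some(this) else None
    }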

Re: data storage formats

2013-12-09 Thread Ashish Rangole
You can compress a CSV or tab-delimited file as well :) You can specify the codec of your choice, say Snappy, when writing out. That's what we do. You can also write out data as sequence files. RCFile should also be possible given the flexibility of the Spark API, but we haven't tried that. On Dec 7,
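
As a concrete illustration, a minimal sketch of both write paths, assuming a SparkContext named sc and the Snappy native library available on the cluster (newer Spark versions accept a codec class directly in saveAsTextFile):

    import org.apache.hadoop.io.compress.SnappyCodec

    // Hedged sketch: write tab-delimited lines compressed with Snappy.
    val lines = sc.parallelize(Seq("a\t1", "b\t2"))
    lines.saveAsTextFile("/tmp/out-snappy", classOf[SnappyCodec])

    // Sequence files work via key/value pair RDDs.
    sc.parallelize(Seq(("a", 1), ("b", 2))).saveAsSequenceFile("/tmp/out-seq")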

Re: JavaRDD, Specify number of tasks

2013-12-09 Thread Ashish Rangole
that parameter: http://spark.incubator.apache.org/docs/latest/api/core/index.html#org.apache.spark.api.java.JavaRDD -- *From:* Ashish Rangole [arang...@gmail.com] *Sent:* Monday, December 09, 2013 7:41 PM *To:* user@spark.incubator.apache.org *Subject:* Re

Re: JavaRDD, Specify number of tasks

2013-12-09 Thread Ashish Rangole
/index.html#org.apache.spark.api.java.JavaPairRDD -- *From:* Ashish Rangole [arang...@gmail.com] *Sent:* Monday, December 09, 2013 7:41 PM *To:* user@spark.incubator.apache.org *Subject:* Re: JavaRDD, Specify number of tasks AFAIK yes. IIRC, there is a 2nd parameter

Re: Hadoop RDD incorrect data

2013-12-09 Thread Ashish Rangole
That data size is sufficiently small for the cluster configuration that you mention. Are you doing the sort in local mode or on the master only? Is the default parallelism system property being set prior to creating the SparkContext? On Mon, Dec 9, 2013 at 10:45 PM, Matt Cheah mch...@palantir.com wrote:
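
For reference, in the 0.8.x configuration style that property had to be set before the SparkContext was constructed; a minimal sketch (the master URL and the value 16 are placeholders):

    import org.apache.spark.SparkContext

    // Hedged sketch: 0.8.x read configuration from Java system properties,
    // so this must run before the SparkContext is created to take effect.
    System.setProperty("spark.default.parallelism", "16")
    val sc = new SparkContext("spark://<master>:7077", "SortJob")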

Re: Splitting into partitions and sorting the partitions ... how to do that?

2013-12-04 Thread Ashish Rangole
I am not sure if 32 partitions is a hard limit that you have. Unless you have a strong reason to use only 32 partitions, please try providing the second optional argument (numPartitions) to the reduceByKey and sortByKey methods, which will parallelize these reduce operations. A number 3x the number of
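
A minimal sketch of what that looks like, assuming a SparkContext named sc (96 stands in for roughly 3x the cluster's total cores):

    // Hedged sketch: both reduceByKey and sortByKey accept an optional
    // numPartitions argument that sets the parallelism of the shuffle.
    val pairs = sc.parallelize(Seq(("a", 1), ("b", 2), ("a", 3)))
    val numPartitions = 96
    val counts = pairs.reduceByKey(_ + _, numPartitions)
    val sorted = counts.sortByKey(true, numPartitions)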

Re: Could not find resource path for Web UI: org/apache/spark/ui/static

2013-11-29 Thread Ashish Rangole
I am sure you have already checked this, but is there any chance the classpath has v0.7.x jars in it? On Nov 29, 2013 4:40 PM, Walrus theCat walrusthe...@gmail.com wrote: The full context isn't much -- this is the first thing I do in my main method (assign a value to sc), and it throws this error. On

Re: step-by-step recipe for running spark 0.8 on ec2

2013-11-25 Thread Ashish Rangole
Hi Walrus theCat, We have been successfully using Spark 0.8 on EC2 ever since it was released, and we do this several times a day. We use spark-ec2.py with the new version option (--spark-version=0.8.0) to spin up the Spark 0.8 cluster on EC2. The key is to use the new spark-ec2.py and not the
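
For reference, a launch with the updated script looks roughly like this; the key pair, identity file and cluster name are placeholders, and only the --spark-version flag is confirmed above:

    # Hedged sketch of a Spark 0.8 cluster launch with the new spark-ec2 script.
    ./spark-ec2 -k <keypair> -i <key-file>.pem --spark-version=0.8.0 launch <cluster-name>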

Re: Any help in this exception 'org.apache.hadoop.ipc.RemoteException: Server IPC version 7 cannot communicate with client version 4'

2013-11-22 Thread Ashish Rangole
You likely have mismatched jar versions between the client and the server. See the URL below for a similar problem; it should give you some idea: http://hbase.apache.org/book.html#client_dependencies On Fri, Nov 22, 2013 at 8:58 AM, Sriram Ramachandrasekaran sri.ram...@gmail.com wrote: It's a client and
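
The usual fix is to build against the same Hadoop line the cluster runs (this particular pair of IPC versions is commonly reported when a Hadoop 1.x client talks to a Hadoop 2.x/CDH4 server). A minimal sbt sketch with illustrative coordinates:

    // build.sbt -- hedged sketch: pin hadoop-client to the cluster's own
    // version so the RPC protocol versions match on both sides.
    libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "2.0.0-cdh4.2.0"
    resolvers += "Cloudera" at "https://repository.cloudera.com/artifactory/cloudera-repos/"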

Re: Troubleshooting and how to interpret the logs

2013-10-04 Thread Ashish Rangole
in SPARK_JAVA_OPTS to log the lengths of GC pauses. Matei On Oct 3, 2013, at 1:10 PM, Ashish Rangole arang...@gmail.com wrote: Hi, Trying to figure out what it means when the application (driver program) logs end with lines like the ones below. This is with the application
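
The GC-logging flags Matei refers to would look something like the following in conf/spark-env.sh; the exact flag set is a judgment call, these are standard HotSpot options of that era:

    # Hedged sketch: log the length of each GC pause to stdout.
    SPARK_JAVA_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps"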

Troubleshooting and how to interpret the logs

2013-10-03 Thread Ashish Rangole
Hi, Trying to figure out what it means when the application (driver program) logs end with lines like the ones below. This is with the application running on Spark 0.8.0 on EC2. Any help will be greatly appreciated. Thanks! 13/10/03 16:17:33 INFO cluster.ClusterTaskSetManager: