How to sort an RDD filled with existing data structures?

2014-09-24 Thread Tao Xiao
Hi, I have the following RDD: val conf = new SparkConf() .setAppName("<< Testing Sorting >>") val sc = new SparkContext(conf) val L = List( (new Student("XiaoTao", 80, 29), "I'm Xiaotao"), (new Student("CCC", 100, 24), "I'm CCC"), (new Student("Jack", 90,
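For context, a minimal sketch of how such an RDD can be sorted, assuming a simple Student(name, score, age) class and an ordering by score then age; the last Student literal is completed with made-up numbers because the original snippet is cut off:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.SparkContext._   // brings the sortByKey conversion into scope on older Spark

    case class Student(name: String, score: Int, age: Int)

    object SortStudents {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("<< Testing Sorting >>")
        val sc = new SparkContext(conf)

        // sortByKey needs an implicit Ordering for the key type;
        // here we sort by score, breaking ties on age.
        implicit val studentOrdering: Ordering[Student] =
          Ordering.by(s => (s.score, s.age))

        val L = List(
          (Student("XiaoTao", 80, 29), "I'm Xiaotao"),
          (Student("CCC", 100, 24), "I'm CCC"),
          (Student("Jack", 90, 25), "I'm Jack"))   // the numbers after 90 are invented

        val sorted = sc.parallelize(L).sortByKey()
        sorted.collect().foreach(println)
        sc.stop()
      }
    }

sortByKey picks up whatever implicit Ordering[Student] is in scope, so changing the sort criterion only means changing studentOrdering.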

Re: All-time stream re-processing

2014-09-24 Thread Tobias Pfeiffer
Hi, On Wed, Sep 24, 2014 at 7:23 PM, Dibyendu Bhattacharya < dibyendu.bhattach...@gmail.com> wrote: > So you have a single Kafka topic which has very high retention period ( > that decides the storage capacity of a given Kafka topic) and you want to > process all historical data first using Camus

Re: sortByKey trouble

2014-09-24 Thread david
Thanks, I've already tried this solution but it does not compile (in Eclipse). I'm surprised to see that in spark-shell, sortByKey works fine on both forms: (String,String,String,String) and (String,(String,String,String)).

java.lang.NumberFormatException while starting spark-worker

2014-09-24 Thread jishnu.prathap
Hi, I am getting this weird error while starting the Worker. -bash-4.1$ spark-class org.apache.spark.deploy.worker.Worker spark://osebi-UServer:59468 Spark assembly has been built with Hive, including Datanucleus jars on classpath 14/09/24 16:22:04 INFO worker.Worker: Registered signal handle

Does Spark Driver works with HDFS in HA mode

2014-09-24 Thread Petr Novak
Hello, if our Hadoop cluster is configured with HA and "fs.defaultFS" points to a nameservice instead of a namenode hostname - hdfs:/// - then our Spark job fails with an exception. Is there anything to configure, or is it not implemented? Exception in thread "main" org.apache.spark.SparkException: Job
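For reference, a hedged sketch of one way to make an HA nameservice visible to the driver, assuming a nameservice called "mycluster" with namenodes nn1/nn2 (all hostnames and names below are placeholders). In many setups it is enough to point HADOOP_CONF_DIR at the cluster's core-site.xml and hdfs-site.xml; passing the same settings as spark.hadoop.* properties is an alternative:

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("HdfsHaExample")
      .set("spark.hadoop.fs.defaultFS", "hdfs://mycluster")
      .set("spark.hadoop.dfs.nameservices", "mycluster")
      .set("spark.hadoop.dfs.ha.namenodes.mycluster", "nn1,nn2")
      .set("spark.hadoop.dfs.namenode.rpc-address.mycluster.nn1", "namenode1:8020")
      .set("spark.hadoop.dfs.namenode.rpc-address.mycluster.nn2", "namenode2:8020")
      .set("spark.hadoop.dfs.client.failover.proxy.provider.mycluster",
           "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider")

    val sc = new SparkContext(conf)
    // The path now refers to the nameservice, not a specific namenode.
    val lines = sc.textFile("hdfs://mycluster/user/me/input.txt")
    println(lines.count())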

find subgraph in Graphx

2014-09-24 Thread uuree
Hi, I want to extract a subgraph using two clauses such as (?person worksAt MIT && MIT locatesIn USA); I have this query. My data is constructed as triplets, so I assume GraphX will suit it best. But I found out that the subgraph method only supports checking a single edge, and it seems to me that I cann
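A hedged sketch of one way to approximate such a two-clause pattern, assuming vertex attributes hold entity labels and edge attributes hold predicate names (the toy data and identifiers below are illustrative, not from the original post):

    import org.apache.spark.SparkContext
    import org.apache.spark.graphx._

    def usaWorkers(sc: SparkContext): Graph[String, String] = {
      // Toy triples: (subject, predicate, object).
      val vertices = sc.parallelize(Seq((1L, "Alice"), (2L, "MIT"), (3L, "USA")))
      val edges = sc.parallelize(Seq(Edge(1L, 2L, "worksAt"), Edge(2L, 3L, "locatesIn")))
      val graph = Graph(vertices, edges)

      // Second clause first: ids of organisations with a "locatesIn" edge to "USA".
      val orgsInUsa: Set[VertexId] = graph.triplets
        .filter(t => t.attr == "locatesIn" && t.dstAttr == "USA")
        .map(_.srcId)
        .collect()
        .toSet   // assumed small enough to ship inside the closure

      // First clause: subgraph only evaluates one triplet at a time, so the
      // second clause was pre-computed above and is checked against that set.
      graph.subgraph(epred = t => t.attr == "worksAt" && orgsInUsa.contains(t.dstId))
    }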

Spark Streaming

2014-09-24 Thread Reddy Raja
Given this program, I have the following queries: val sparkConf = new SparkConf().setAppName("NetworkWordCount") sparkConf.set("spark.master", "local[2]") val ssc = new StreamingContext(sparkConf, Seconds(10)) val lines = ssc.socketTextStream(args(0), args(1).toInt, StorageLevel.
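The snippet is cut off; below is a minimal sketch completing it along the lines of the standard NetworkWordCount example (everything after socketTextStream is assumed, not taken from the original mail):

    import org.apache.spark.SparkConf
    import org.apache.spark.storage.StorageLevel
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    object NetworkWordCount {
      def main(args: Array[String]): Unit = {
        val sparkConf = new SparkConf().setAppName("NetworkWordCount")
        sparkConf.set("spark.master", "local[2]")
        val ssc = new StreamingContext(sparkConf, Seconds(10))

        // Read lines from a socket, split into words, and count per batch.
        val lines = ssc.socketTextStream(args(0), args(1).toInt,
          StorageLevel.MEMORY_AND_DISK_SER)
        val words = lines.flatMap(_.split(" "))
        val wordCounts = words.map(w => (w, 1)).reduceByKey(_ + _)
        wordCounts.print()

        ssc.start()
        ssc.awaitTermination()
      }
    }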

All-time stream re-processing

2014-09-24 Thread Tobias Pfeiffer
Hi, I have a setup (in mind) where data is written to Kafka and this data is persisted in HDFS (e.g., using Camus) so that I have an all-time archive of all stream data ever received. Now I want to process that all-time archive and when I am done with that, continue with the live stream, using Spa

Re: sortByKey trouble

2014-09-24 Thread Liquan Pei
Hi David, Can you try val rddToSave = file.map(l => l.split("\\|")).map(r => (r(34)+"-"+r(3), (r(4), r(10), r(12)))) ? That should work. Liquan On Wed, Sep 24, 2014 at 1:29 AM, david wrote: > Hi, > > Does anybody know how to use sortbykey in scala on a RDD like : > > val rddToSave = fi

RE: sortByKey trouble

2014-09-24 Thread Shao, Saisai
Hi, sortByKey is only for RDD[(K, V)]; each tuple can only have two members, and Spark will sort by the first member. If you want to use sortByKey, you have to change your RDD[(String, String, String, String)] into RDD[(String, (String, String, String))]. Thanks Jerry -Original Message- From
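A short sketch of the restructuring Jerry describes, with a placeholder input path since `file` is defined elsewhere in the original thread:

    import org.apache.spark.SparkContext._   // pair-RDD functions (needed on older Spark)

    val file = sc.textFile("input.txt")      // placeholder path
    // Key on the combined field and wrap the rest in a nested tuple:
    // the result is RDD[(String, (String, String, String))].
    val rddToSave = file.map(_.split("\\|"))
      .map(r => (r(34) + "-" + r(3), (r(4), r(10), r(12))))

    val sorted = rddToSave.sortByKey()       // sorts by the String key
    sorted.take(5).foreach(println)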

sortByKey trouble

2014-09-24 Thread david
Hi, Does anybody know how to use sortByKey in Scala on an RDD like: val rddToSave = file.map(l => l.split("\\|")).map(r => (r(34)+"-"+r(3), r(4), r(10), r(12))) because I received an error "sortByKey is not a member of org.apache.spark.rdd.RDD[(String,String,String,String)]". What I t

Re: Converting one RDD to another

2014-09-24 Thread Sean Owen
top returns the specified number of "largest" elements in your RDD. They are returned to the driver as an Array. If you want to make an RDD out of them again, call SparkContext.parallelize(...). Make sure this is what you mean though. On Wed, Sep 24, 2014 at 5:33 AM, Deep Pradhan wrote: > Hi, > I
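A tiny illustration of what this means in practice (the values are made up):

    val rdd = sc.parallelize(Seq(5, 1, 9, 3, 7))
    val largestThree: Array[Int] = rdd.top(3)           // Array(9, 7, 5), held on the driver
    val largestThreeRdd = sc.parallelize(largestThree)  // back into an RDD if that is really needed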

Re: HdfsWordCount only counts some of the words

2014-09-24 Thread Sean Owen
If you look at the code for HdfsWordCount, you see it calls print(), which by default prints only 10 elements from each RDD. If you are just talking about the console output, then it is not expected to print all words to begin with. On Wed, Sep 24, 2014 at 2:29 AM, SK wrote: > > I execute it as follow
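A small sketch of the difference, assuming a wordCounts DStream as in the word-count examples; printing every element is only sensible for small test data:

    wordCounts.print()                  // shows only the first few elements of each batch
    wordCounts.foreachRDD { rdd =>
      rdd.collect().foreach(println)    // prints all elements, collected to the driver
    }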

[Streaming] Non-blocking recommendation in custom receiver documentation and KinesisReceiver's worker.run blocking call

2014-09-24 Thread Aniket Bhatnagar
Hi all, Reading through Spark Streaming's custom receiver documentation, it is recommended that the onStart and onStop methods should not block indefinitely. However, looking at the source code of KinesisReceiver, the onStart method calls worker.run, which blocks until the worker is shut down (via a call to o
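For comparison, a minimal sketch of the non-blocking pattern the custom receiver guide recommends; the receiver name and data source below are made up:

    import org.apache.spark.storage.StorageLevel
    import org.apache.spark.streaming.receiver.Receiver

    class MyReceiver extends Receiver[String](StorageLevel.MEMORY_AND_DISK_2) {

      def onStart(): Unit = {
        // Spawn the long-running work on its own thread and return immediately.
        new Thread("MyReceiver worker") {
          override def run(): Unit = receive()
        }.start()
      }

      def onStop(): Unit = {
        // Nothing to do here: the worker thread checks isStopped() and exits on its own.
      }

      private def receive(): Unit = {
        while (!isStopped()) {
          store("some record")   // placeholder for reading from the real source
          Thread.sleep(1000)
        }
      }
    }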
