output the datas(txt)

2016-02-27 Thread Bonsen
I get results from RDDs, like: Array(Array(1,2,3), Array(2,3,4), Array(3,4,6)). How can I output them to 1.txt like:
1 2 3
2 3 4
3 4 6
-- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/output-the-datas-txt-tp26350.html Sent from the Apache Spark User List mail
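
A minimal sketch of one way to do this (the RDD construction is illustrative; mkString, coalesce and saveAsTextFile are standard Spark/Scala APIs): render each inner array as a space-separated string, then save as text.

    // rows: RDD[Array[Int]], built here from the arrays in the question
    val rows = sc.parallelize(Seq(Array(1, 2, 3), Array(2, 3, 4), Array(3, 4, 6)))
    // Note: saveAsTextFile creates a directory of part files named "1.txt",
    // not a single file; coalesce(1) keeps it to one part file for small data.
    rows.map(_.mkString(" ")).coalesce(1).saveAsTextFile("1.txt")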

Re: Ordering two dimensional arrays of (String, Int) in the order of second element

2016-02-27 Thread Ashok Kumar
No particular reason, just wanted to know if there was another way as well. Thanks. On Saturday, 27 February 2016, 22:12, Yin Yang wrote: Is there a particular reason you cannot use a temporary table? Thanks On Sat, Feb 27, 2016 at 10:59 AM, Ashok Kumar wrote: Thank you sir. Can one do thi

Re: Ordering two dimensional arrays of (String, Int) in the order of second element

2016-02-27 Thread Yin Yang
Is there a particular reason you cannot use a temporary table? Thanks On Sat, Feb 27, 2016 at 10:59 AM, Ashok Kumar wrote: > Thank you sir. > > Can one do this sorting without using a temporary table, if possible? > > Best > > > On Saturday, 27 February 2016, 18:50, Yin Yang wrote: > > > scala> Seq

Re: Spark streaming not remembering previous state

2016-02-27 Thread Vinti Maheshwari
Thanks much Amit, Sebastian. It worked. Regards, ~Vinti On Sat, Feb 27, 2016 at 12:44 PM, Amit Assudani wrote: > Your context is not being created using checkpoints, use get or create, > > From: Vinti Maheshwari > Date: Saturday, February 27, 2016 at 3:28 PM > To: user > Subject: Spark stream

Re: Stateful Operation on JavaPairDStream Help Needed !!

2016-02-27 Thread Abhishek Anand
Hi Ryan, I am using mapWithState after doing reduceByKey. I am right now using mapWithState as you suggested and triggering the count manually. But I am still unable to see any checkpointing taking place. In the DAG I can see that the reduceByKey operations for the previous batches are also being com
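
For reference, a minimal sketch of mapWithState with checkpointing enabled (Spark 1.6 API; the stream pairs and the running-count state function are illustrative, not from the thread):

    import org.apache.spark.streaming._

    val ssc = new StreamingContext(sc, Seconds(10))
    ssc.checkpoint("hdfs:///tmp/checkpoint") // stateful operations require a checkpoint directory

    // Hypothetical running count per key, kept in state across batches.
    val spec = StateSpec.function { (key: String, value: Option[Int], state: State[Int]) =>
      val sum = state.getOption.getOrElse(0) + value.getOrElse(0)
      state.update(sum)
      (key, sum)
    }
    // pairs is an assumed DStream[(String, Int)]
    val counts = pairs.reduceByKey(_ + _).mapWithState(spec)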

Re: Spark streaming not remembering previous state

2016-02-27 Thread Amit Assudani
Your context is not being created using checkpoints; use get or create. From: Vinti Maheshwari Date: Saturday, February 27, 2016 at 3:28 PM To: user Subject: Spark streaming not remembering previous state Hi All, I wrote spark streamin
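
Sketched minimally, the "get or create" pattern being recommended looks like this (the checkpoint path and the body of createContext are assumptions):

    import org.apache.spark.streaming._

    val checkpointDir = "hdfs:///tmp/checkpoint"

    def createContext(): StreamingContext = {
      val ssc = new StreamingContext(sparkConf, Seconds(10)) // sparkConf assumed defined
      ssc.checkpoint(checkpointDir)
      // ... build the DStream graph with the stateful transformation here ...
      ssc
    }

    // Restores the context (and its state) from the checkpoint if one exists,
    // otherwise builds a fresh one via createContext.
    val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
    ssc.start()
    ssc.awaitTermination()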

Re: Spark streaming not remembering previous state

2016-02-27 Thread Sebastian Piu
Here: https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/streaming/RecoverableNetworkWordCount.scala On Sat, 27 Feb 2016, 20:42 Sebastian Piu, wrote: > You need to create the streaming context using an existing checkpoint for > it to work > > See sample

Re: Spark streaming not remembering previous state

2016-02-27 Thread Sebastian Piu
You need to create the streaming context using an existing checkpoint for it to work. See the sample here. On Sat, 27 Feb 2016, 20:28 Vinti Maheshwari, wrote: > Hi All, > > I wrote a spark streaming program with stateful transformation. > It seems like my spark streaming application is doing computatio

Spark streaming not remembering previous state

2016-02-27 Thread Vinti Maheshwari
Hi All, I wrote a spark streaming program with stateful transformation. It seems like my spark streaming application is doing its computation correctly with checkpointing. But when I terminate my program and start it again, it's not reading the previous checkpoint data and is starting from the beginning. I

Re: Ordering two dimensional arrays of (String, Int) in the order of second element

2016-02-27 Thread Ashok Kumar
Thank you sir. Can one do this sorting without using a temporary table, if possible? Best On Saturday, 27 February 2016, 18:50, Yin Yang wrote: scala> Seq((1, "b", "test"), (2, "a", "foo")).toDF("id", "a", "b").registerTempTable("test") scala> val df = sql("SELECT struct(id, b, a) from te

Re: Ordering two dimensional arrays of (String, Int) in the order of second element

2016-02-27 Thread Yin Yang
scala> Seq((1, "b", "test"), (2, "a", "foo")).toDF("id", "a", "b").registerTempTable("test")
scala> val df = sql("SELECT struct(id, b, a) from test order by b")
df: org.apache.spark.sql.DataFrame = [struct(id, b, a): struct]
scala> df.show
+----------------+
|struct(id, b, a)|
+----------------+

Re: Ordering two dimensional arrays of (String, Int) in the order of second element

2016-02-27 Thread Yin Yang
Is this what you are looking for?
scala> Seq((2, "a", "test"), (2, "b", "foo")).toDF("id", "a", "b").registerTempTable("test")
scala> val df = sql("SELECT struct(id, b, a) from test")
df: org.apache.spark.sql.DataFrame = [struct(id, b, a): struct]
scala> df.show
+----------------+
|struct(id, b, a)|
+----------------+

Ordering two dimensional arrays of (String, Int) in the order of second element

2016-02-27 Thread Ashok Kumar
Hello, I would like to be able to solve this using arrays. I have a two dimensional array of (String, Int) with 5 entries, say arr("A",20), arr("B",13), arr("C",18), arr("D",10), arr("E",19). I would like to write a small piece of code to order these by the highest Int column, so I will have arr("A",20), arr("E
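
If plain arrays are acceptable, sortBy on the second tuple element does this directly; a minimal sketch using the values from the question:

    val arr = Array(("A", 20), ("B", 13), ("C", 18), ("D", 10), ("E", 19))
    // Sort by the Int field, descending.
    val sorted = arr.sortBy(-_._2)
    // sorted: Array((A,20), (E,19), (C,18), (B,13), (D,10))

The same idea works on an RDD via rdd.sortBy(_._2, ascending = false).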

RE: Get all vertexes with outDegree equals to 0 with GraphX

2016-02-27 Thread Mohammed Guller
Perhaps the documentation of the filter method would help. Here is the method signature (copied from the API doc):
def filter[VD2, ED2](
    preprocess: (Graph[VD, ED]) => Graph[VD2, ED2],
    epred: (EdgeTriplet[VD2, ED2]) => Boolean = (x: EdgeTriplet[VD2, ED2]) => true,
    vpred: (VertexId, VD2) => Bool

Re: .cache() changes contents of RDD

2016-02-27 Thread Sabarish Sasidharan
This is because Hadoop writables are being reused. Just map them to some custom type and then do further operations, including cache(), on that. Regards Sab On 27-Feb-2016 9:11 am, "Yan Yang" wrote: > Hi > > I am pretty new to Spark, and after experimentation on our pipelines I > ran into this weird

Re: Get all vertexes with outDegree equals to 0 with GraphX

2016-02-27 Thread Guillermo Ortiz
Thank you, I will have to think about what the code does, because I am a bit of a noob in Scala and it's hard for me to understand. 2016-02-27 3:53 GMT+01:00 Mohammed Guller : > Here is another solution (minGraph is the graph from your code. I assume > that is your original graph): > > > > val graphWithNoO
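
For readers following the thread, a minimal sketch of the general idea (not Mohammed's exact code): outDegrees only contains vertices that have at least one outgoing edge, so left-join it back onto the vertex set and keep the misses.

    import org.apache.spark.graphx._

    // graph: Graph[VD, ED], assumed built elsewhere.
    val noOutgoing = graph.vertices
      .leftOuterJoin(graph.outDegrees)
      .filter { case (_, (_, deg)) => deg.isEmpty } // no entry means zero outgoing edges
      .map { case (id, (attr, _)) => (id, attr) }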

Re: Restrictions on SQL operations on Spark temporary tables

2016-02-27 Thread 刘虓
Hi, For now Spark SQL does not support subqueries; I guess that's the reason your query fails. 2016-02-27 20:01 GMT+08:00 Mich Talebzadeh : > It appears that certain SQL on Spark temporary tables does not support Hive > SQL even when they are using HiveContext > > example > > scala> HiveContext.sql("sel

deal with datas' structure

2016-02-27 Thread Bonsen
Now I have a map: val ji = scala.collection.mutable.Map[String,scala.collection.mutable.ArrayBuffer[String]]() There is a lot of data in it, like: ji = Map("a" -> ArrayBuffer("1","2","3"), "b" -> ArrayBuffer("1","2","3"), "c" -> ArrayBuffer("2","3")) If "a" chooses "1", "b" and "c" can't choose "1", for ex

Restrictions on SQL operations on Spark temporary tables

2016-02-27 Thread Mich Talebzadeh
It appears that certain SQL on Spark temporary tables does not support Hive SQL, even when using HiveContext. Example:
scala> HiveContext.sql("select count(1) from tmp where ID in (select max(id) from tmp)")
org.apache.spark.sql.AnalysisException: Unsupported language features in query: selec
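
One common workaround, sketched here under the assumption that id is a numeric (long) column: compute the aggregate in a first query, then substitute the value into the second, avoiding the unsupported IN (SELECT ...) form.

    // Two-step replacement for:
    //   select count(1) from tmp where ID in (select max(id) from tmp)
    val maxId = HiveContext.sql("select max(id) from tmp").first().getLong(0)
    val count = HiveContext.sql(s"select count(1) from tmp where ID = $maxId")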

Re: .cache() changes contents of RDD

2016-02-27 Thread Igor Berman
Are you using the Avro format by any chance? There are some formats that need to be deep-copied before caching or aggregating. Try something like:
val input = sc.newAPIHadoopRDD(...)
val rdd = input.map(deepCopyTransformation).map(...)
rdd.cache()
rdd.saveAsTextFile(...)
where deepCopyTransformation is f
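
A concrete sketch of such a deep copy for Hadoop key/value records (the Writable types here are assumptions; the thread does not show the actual record type):

    import org.apache.hadoop.io.{LongWritable, Text}

    // Hadoop RecordReaders reuse the same Writable instances across records,
    // so materialize plain Scala values before caching.
    // input is assumed to be RDD[(LongWritable, Text)] from newAPIHadoopRDD.
    val rdd = input.map { case (k, v) => (k.get, v.toString) }
    rdd.cache()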

Re: DirectFileOutputCommiter

2016-02-27 Thread Igor Berman
Hi Reynold, thanks for the response. Yes, speculation mode needs some coordination. Regarding job failure: correct me if I'm wrong, but if one of the jobs fails, the client code will be sort of "notified" by an exception or something similar, so the client can decide to re-submit the action (job), i.e. it won't be "si

2 tables join happens at Hive but not in spark

2016-02-27 Thread Sandeep Khurana
Hello We have 2 tables (tab1, tab2) exposed using Hive. The data is in different HDFS folders. We are trying to join these 2 tables on a certain single column using a SparkR join. But in spite of the join columns having the same values, it returns zero rows. But when I run the same join SQL in Hive, from hiv

Re: Is spark.driver.maxResultSize used correctly ?

2016-02-27 Thread Reynold Xin
But sometimes you might have skew, and almost all of the result data ends up in one or a few tasks. On Friday, February 26, 2016, Jeff Zhang wrote: > > My job gets this exception very easily, even when I set a large value of > spark.driver.maxResultSize. After checking the spark code, I found > spark
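
For reference, the setting under discussion is a plain Spark configuration entry (the 4g value is illustrative only):

    import org.apache.spark.SparkConf

    // Cap on the total size of serialized task results returned to the
    // driver for a single action; 1g by default in Spark 1.x.
    val conf = new SparkConf().set("spark.driver.maxResultSize", "4g")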

Re: Clarification on RDD

2016-02-27 Thread Mich Talebzadeh
Hi, The data (in this case the example README.md) is kept in the Hadoop Distributed File System (HDFS), spread among the datanodes in the Hadoop cluster. The metadata used to locate this file's blocks is kept in the namenode. Your data is always stored in HDFS. Spark is an application that can acce
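
A one-line illustration of the access pattern being described (the path and namenode address are hypothetical): Spark asks the namenode where the file's blocks live, then reads them from the datanodes.

    // Block locations come from the namenode; block contents are read
    // from the datanodes, with tasks scheduled for data locality.
    val readme = sc.textFile("hdfs://namenode:8020/user/hadoop/README.md")
    println(readme.count())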