I get results from RDDs, like:
Array(Array(1,2,3), Array(2,3,4), Array(3,4,6))
How can I output them to 1.txt like:
1 2 3
2 3 4
3 4 6
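
For reference, here is a minimal sketch of one way to do this, assuming the data
is an RDD[Array[Int]] and a single output file is acceptable (the variable names
and the output path are placeholders):

  val rdd = sc.parallelize(Seq(Array(1, 2, 3), Array(2, 3, 4), Array(3, 4, 6)))

  // Join each inner array's elements with spaces, one line per array, and
  // collapse to a single partition so only one part file is written.
  rdd.map(_.mkString(" "))
     .coalesce(1)
     .saveAsTextFile("output/1.txt")

Note that saveAsTextFile creates a directory, so "output/1.txt" above would be a
directory containing part-00000 with the three lines in it.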
no particular reason. just wanted to know if there was another way as well.
thanks
On Saturday, 27 February 2016, 22:12, Yin Yang wrote:
Is there a particular reason you cannot use a temporary table?
Thanks
On Sat, Feb 27, 2016 at 10:59 AM, Ashok Kumar wrote:
Thank you sir.
Can one do this sorting without using a temporary table if possible?
Is there a particular reason you cannot use a temporary table?
Thanks
On Sat, Feb 27, 2016 at 10:59 AM, Ashok Kumar wrote:
> Thank you sir.
>
> Can one do this sorting without using a temporary table if possible?
>
> Best
>
>
> On Saturday, 27 February 2016, 18:50, Yin Yang wrote:
>
>
> scala> Seq((1, "b", "test"), (2, "a", "foo")).toDF("id", "a",
> "b").registerTempTable("test")
Thanks much Amit, Sebastian. It worked.
Regards,
~Vinti
On Sat, Feb 27, 2016 at 12:44 PM, Amit Assudani
wrote:
> Your context is not being created using checkpoints; use getOrCreate.
>
> From: Vinti Maheshwari
> Date: Saturday, February 27, 2016 at 3:28 PM
> To: user
> Subject: Spark streaming not remembering previous state
Hi Ryan,
I am using mapWithState after doing reduceByKey.
I am right now using mapWithState as you suggested and triggering the count
manually.
But I am still unable to see any checkpointing taking place. In the DAG I can
see that the reduceByKey operations for the previous batches are also being
computed.
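
For reference, the general shape of the mapWithState-with-checkpointing pattern
being discussed is roughly this (a simplified sketch; the socket source, batch
interval and checkpoint path are placeholders, not the actual application):

  import org.apache.spark.SparkConf
  import org.apache.spark.streaming.{Seconds, State, StateSpec, StreamingContext}

  val conf = new SparkConf().setAppName("StatefulSketch")
  val ssc  = new StreamingContext(conf, Seconds(10))
  ssc.checkpoint("hdfs:///tmp/checkpoint")   // required for mapWithState

  val lines  = ssc.socketTextStream("localhost", 9999)
  val counts = lines.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _)

  // Running total per key, carried across batches in the streaming state store.
  val mappingFunc = (word: String, one: Option[Int], state: State[Int]) => {
    val sum = one.getOrElse(0) + state.getOption.getOrElse(0)
    state.update(sum)
    (word, sum)
  }
  val running = counts.mapWithState(StateSpec.function(mappingFunc))
  running.print()

  ssc.start()
  ssc.awaitTermination()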
Your context is not being created using checkpoints; use getOrCreate.
From: Vinti Maheshwari <vinti.u...@gmail.com>
Date: Saturday, February 27, 2016 at 3:28 PM
To: user <user@spark.apache.org>
Subject: Spark streaming not remembering previous state
Hi All,
I wrote a Spark Streaming program with stateful transformation.
Here:
https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/streaming/RecoverableNetworkWordCount.scala
On Sat, 27 Feb 2016, 20:42 Sebastian Piu, wrote:
> You need to create the streaming context using an existing checkpoint for
> it to work
>
> See sample
You need to create the streaming context using an existing checkpoint for
it to work
See sample here
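
Roughly, the relevant part of that sample looks like this (a simplified sketch;
the checkpoint path and app name are placeholders):

  import org.apache.spark.SparkConf
  import org.apache.spark.streaming.{Seconds, StreamingContext}

  val checkpointDir = "hdfs:///tmp/checkpoint"

  // Factory that builds a fresh context; only invoked when no checkpoint exists.
  def createContext(): StreamingContext = {
    val sparkConf = new SparkConf().setAppName("RecoverableSketch")
    val ssc = new StreamingContext(sparkConf, Seconds(10))
    ssc.checkpoint(checkpointDir)
    // ... set up DStreams and stateful transformations here ...
    ssc
  }

  // On restart this recovers the context (and its state) from the checkpoint
  // instead of building a new one from scratch.
  val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
  ssc.start()
  ssc.awaitTermination()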
On Sat, 27 Feb 2016, 20:28 Vinti Maheshwari, wrote:
> Hi All,
>
> I wrote a Spark Streaming program with stateful transformation.
> It seems like my Spark Streaming application is doing the computation correctly
> with checkpointing.
Hi All,
I wrote a Spark Streaming program with stateful transformation.
It seems like my Spark Streaming application is doing the computation correctly
with checkpointing.
But when I terminate my program and start it again, it's not reading the
previous checkpointing data and starts from the beginning.
Thank you sir.
Can one do this sorting without using a temporary table if possible?
Best
On Saturday, 27 February 2016, 18:50, Yin Yang wrote:
scala> Seq((1, "b", "test"), (2, "a", "foo")).toDF("id", "a",
"b").registerTempTable("test")
scala> val df = sql("SELECT struct(id, b, a) from te
scala> Seq((1, "b", "test"), (2, "a", "foo")).toDF("id", "a",
"b").registerTempTable("test")
scala> val df = sql("SELECT struct(id, b, a) from test order by b")
df: org.apache.spark.sql.DataFrame = [struct(id, b, a): struct]
scala> df.show
+----------------+
|struct(id, b, a)|
+----------------+
Is this what you look for ?
scala> Seq((2, "a", "test"), (2, "b", "foo")).toDF("id", "a",
"b").registerTempTable("test")
scala> val df = sql("SELECT struct(id, b, a) from test")
df: org.apache.spark.sql.DataFrame = [struct(id, b, a): struct]
scala> df.show
+----------------+
|struct(id, b, a)|
+----------------+
Hello,
I would like to be able to solve this using arrays.
I have a two-dimensional array of (String, Int) with 5 entries, say arr("A",20),
arr("B",13), arr("C",18), arr("D",10), arr("E",19).
I would like to write a small piece of code to order these by the highest Int
column, so I will have arr("A",20), arr("E",19), arr("C",18), arr("B",13), arr("D",10).
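
For reference, a minimal sketch of one way to do this on a plain Scala array
(the variable name is illustrative):

  val arr = Array(("A", 20), ("B", 13), ("C", 18), ("D", 10), ("E", 19))

  // Sort by the Int column in descending order.
  val sorted = arr.sortBy { case (_, n) => -n }
  // sorted: Array((A,20), (E,19), (C,18), (B,13), (D,10))

The same ordering can be obtained on an RDD with rdd.sortBy(_._2, ascending = false).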
Perhaps the documentation of the filter method would help. Here is the method
signature (copied from the API doc):

def filter[VD2, ED2](
    preprocess: (Graph[VD, ED]) => Graph[VD2, ED2],
    epred: (EdgeTriplet[VD2, ED2]) => Boolean = (x: EdgeTriplet[VD2, ED2]) => true,
    vpred: (VertexId, VD2) => Boolean = (v: VertexId, d: VD2) => true): Graph[VD, ED]
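
For illustration, a usage sketch in the spirit of the example in the API doc
(graph is assumed to be an existing Graph; this keeps only vertices that have
at least one outgoing edge):

  import org.apache.spark.graphx._

  // preprocess re-labels each vertex with its out-degree; vpred then filters
  // on that derived attribute, while the result keeps the original attributes.
  val filtered = graph.filter(
    g => {
      val degrees: VertexRDD[Int] = g.outDegrees
      g.outerJoinVertices(degrees) { (vid, data, deg) => deg.getOrElse(0) }
    },
    vpred = (vid: VertexId, deg: Int) => deg > 0
  )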
This is because Hadoop writables are being reused. Just map it to some
custom type and then do further operations including cache() on it.
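
A minimal sketch of that idea, assuming for illustration a sequence file of
LongWritable/Text pairs (the path and names are placeholders):

  import org.apache.hadoop.io.{LongWritable, Text}

  val raw = sc.sequenceFile[LongWritable, Text]("hdfs:///data/input")

  // Hadoop reuses the same Writable instances for every record, so copy their
  // contents into plain types before cache(), collect() or similar.
  val safe = raw.map { case (k, v) => (k.get, v.toString) }
  safe.cache()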
Regards
Sab
On 27-Feb-2016 9:11 am, "Yan Yang" wrote:
> Hi
>
> I am pretty new to Spark, and after experimenting with our pipelines I
> ran into this weird
Thank you, I have to think about what the code does, because I am a bit of a
noob in Scala and it's hard for me to understand.
2016-02-27 3:53 GMT+01:00 Mohammed Guller :
> Here is another solution (minGraph is the graph from your code. I assume
> that is your original graph):
>
>
>
> val graphWithNoO
Hi,
For now Spark SQL does not support subqueries; I guess that's the reason your
query fails.
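
As a workaround, that kind of IN/max subquery can usually be rewritten with a
derived table and a join, e.g. (a rough sketch against the same tmp table):

  HiveContext.sql(
    """SELECT count(1)
      |FROM tmp t
      |JOIN (SELECT max(id) AS max_id FROM tmp) m
      |  ON t.ID = m.max_id""".stripMargin)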
2016-02-27 20:01 GMT+08:00 Mich Talebzadeh :
> It appears that certain SQL on Spark temporary tables does not support Hive
> SQL even when using HiveContext
>
> example
>
> scala> HiveContext.sql("sel
Now I have a map:
val ji =
scala.collection.mutable.Map[String,scala.collection.mutable.ArrayBuffer[String]]()
There are many entries like:
ji =
Map("a"->ArrayBuffer("1","2","3"),"b"->ArrayBuffer("1","2","3"),"c"->ArrayBuffer("2","3"))
If "a" chooses "1", then "b" and "c" can't choose "1",
for ex
It appears that certain SQL on Spark temporary tables does not support Hive
SQL even when using HiveContext.
For example:
scala> HiveContext.sql("select count(1) from tmp where ID in (select
max(id) from tmp)")
org.apache.spark.sql.AnalysisException:
Unsupported language features in query: selec
Are you using the Avro format by any chance?
There are some formats that need to be deep-copied before caching or
aggregating.
Try something like:
val input = sc.newAPIHadoopRDD(...)
val rdd = input.map(deepCopyTransformation).map(...)
rdd.cache()
rdd.saveAsTextFile(...)
where deepCopyTransformation is a function that deep-copies each record.
Hi Reynold,
thanks for the response
Yes, speculation mode needs some coordination.
Regarding job failure:
Correct me if I'm wrong: if one of the jobs fails, the client code will be sort
of "notified" by an exception or something similar, so the client can decide to
re-submit the action (job), i.e. it won't be "si
Hello
We have 2 tables (tab1, tab2) exposed using Hive. The data is in different
HDFS folders. We are trying to join these 2 tables on a certain single column
using a SparkR join. But in spite of the join columns having the same values,
it returns zero rows.
But when I run the same join SQL in Hive, from hiv
But sometimes you might have skew, and almost all the result data ends up in
one or a few tasks.
On Friday, February 26, 2016, Jeff Zhang wrote:
>
> My job gets this exception very easily even when I set a large value of
> spark.driver.maxResultSize. After checking the Spark code, I found
> spark
Hi,
The data (in this case the example README.md) is kept in the Hadoop Distributed
File System (HDFS) among all datanodes in the Hadoop cluster. The metadata that
is used to get info about the storage of this file is kept in the namenode.
Your data is always stored in HDFS.
Spark is an application that can acce