I get results from RDDs, like:
Array(Array(1,2,3), Array(2,3,4), Array(3,4,6))
How can I output them to 1.txt like:
1 2 3
2 3 4
3 4 6
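
For reference, here is a minimal sketch of one way to do this, assuming the data
is an RDD[Array[Int]] and a single output file is acceptable (the variable names
and the output path are placeholders):

  val rdd = sc.parallelize(Seq(Array(1, 2, 3), Array(2, 3, 4), Array(3, 4, 6)))

  // Join each inner array's elements with spaces, one line per array, and
  // collapse to a single partition so only one part file is written.
  rdd.map(_.mkString(" "))
     .coalesce(1)
     .saveAsTextFile("output/1.txt")

Note that saveAsTextFile creates a directory, so "output/1.txt" above would be a
directory containing part-00000 with the three lines in it.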
no particular reason. just wanted to know if there was another way as well.
thanks
On Saturday, 27 February 2016, 22:12, Yin Yang wrote:
Is there a particular reason you cannot use a temporary table?
Thanks
On Sat, Feb 27, 2016 at 10:59 AM, Ashok Kumar wrote:
Thank you sir.
Can one do this sorting without using a temporary table if possible?
Is there a particular reason you cannot use a temporary table?
Thanks
On Sat, Feb 27, 2016 at 10:59 AM, Ashok Kumar wrote:
> Thank you sir.
>
> Can one do this sorting without using a temporary table if possible?
>
> Best
>
>
> On Saturday, 27 February 2016, 18:50, Yin Yang wrote:
>
>
> scala> Seq((1, "b", "test"), (2, "a", "foo")).toDF("id", "a",
> "b").registerTempTable("test")
Thanks much Amit, Sebastian. It worked.
Regards,
~Vinti
On Sat, Feb 27, 2016 at 12:44 PM, Amit Assudani
wrote:
> Your context is not being created using checkpoints; use getOrCreate.
>
> From: Vinti Maheshwari
> Date: Saturday, February 27, 2016 at 3:28 PM
> To: user
> Subject: Spark streaming not remembering previous state
Hi Ryan,
I am using mapWithState after doing reduceByKey.
I am right now using mapWithState as you suggested and triggering the count
manually.
But I am still unable to see any checkpointing taking place. In the DAG I can
see that the reduceByKey operations for the previous batches are also being
computed.
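
For reference, the general shape of the mapWithState-with-checkpointing pattern
being discussed is roughly this (a simplified sketch; the socket source, batch
interval and checkpoint path are placeholders, not the actual application):

  import org.apache.spark.SparkConf
  import org.apache.spark.streaming.{Seconds, State, StateSpec, StreamingContext}

  val conf = new SparkConf().setAppName("StatefulSketch")
  val ssc  = new StreamingContext(conf, Seconds(10))
  ssc.checkpoint("hdfs:///tmp/checkpoint")   // required for mapWithState

  val lines  = ssc.socketTextStream("localhost", 9999)
  val counts = lines.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _)

  // Running total per key, carried across batches in the streaming state store.
  val mappingFunc = (word: String, one: Option[Int], state: State[Int]) => {
    val sum = one.getOrElse(0) + state.getOption.getOrElse(0)
    state.update(sum)
    (word, sum)
  }
  val running = counts.mapWithState(StateSpec.function(mappingFunc))
  running.print()

  ssc.start()
  ssc.awaitTermination()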
Your context is not being created using checkpoints; use getOrCreate.
From: Vinti Maheshwari <vinti.u...@gmail.com>
Date: Saturday, February 27, 2016 at 3:28 PM
To: user <user@spark.apache.org>
Subject: Spark streaming not remembering previous state
Hi All,
I wrote a Spark Streaming program with stateful transformation.
Here:
https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/streaming/RecoverableNetworkWordCount.scala
On Sat, 27 Feb 2016, 20:42 Sebastian Piu, wrote:
> You need to create the streaming context using an existing checkpoint for
> it to work
>
> See sample
You need to create the streaming context using an existing checkpoint for
it to work
See sample here
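
Roughly, the relevant part of that sample looks like this (a simplified sketch;
the checkpoint path and app name are placeholders):

  import org.apache.spark.SparkConf
  import org.apache.spark.streaming.{Seconds, StreamingContext}

  val checkpointDir = "hdfs:///tmp/checkpoint"

  // Factory that builds a fresh context; only invoked when no checkpoint exists.
  def createContext(): StreamingContext = {
    val sparkConf = new SparkConf().setAppName("RecoverableSketch")
    val ssc = new StreamingContext(sparkConf, Seconds(10))
    ssc.checkpoint(checkpointDir)
    // ... set up DStreams and stateful transformations here ...
    ssc
  }

  // On restart this recovers the context (and its state) from the checkpoint
  // instead of building a new one from scratch.
  val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
  ssc.start()
  ssc.awaitTermination()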
On Sat, 27 Feb 2016, 20:28 Vinti Maheshwari, wrote:
> Hi All,
>
> I wrote a Spark Streaming program with stateful transformation.
> It seems like my Spark Streaming application is doing the computation correctly
> with checkpointing.
Hi All,
I wrote a Spark Streaming program with stateful transformation.
It seems like my Spark Streaming application is doing the computation correctly
with checkpointing.
But when I terminate my program and start it again, it's not reading the
previous checkpointing data and starts from the beginning.
Thank you sir.
Can one do this sorting without using a temporary table if possible?
Best
On Saturday, 27 February 2016, 18:50, Yin Yang wrote:
scala> Seq((1, "b", "test"), (2, "a", "foo")).toDF("id", "a",
"b").registerTempTable("test")
scala> val df = sql("SELECT struct(id, b, a) from te
scala> Seq((1, "b", "test"), (2, "a", "foo")).toDF("id", "a",
"b").registerTempTable("test")
scala> val df = sql("SELECT struct(id, b, a) from test order by b")
df: org.apache.spark.sql.DataFrame = [struct(id, b, a): struct]
scala> df.show
+----------------+
|struct(id, b, a)|
+----------------+
Is this what you look for ?
scala> Seq((2, "a", "test"), (2, "b", "foo")).toDF("id", "a",
"b").registerTempTable("test")
scala> val df = sql("SELECT struct(id, b, a) from test")
df: org.apache.spark.sql.DataFrame = [struct(id, b, a): struct]
scala> df.show
+----------------+
|struct(id, b, a)|
+----------------+
Hello,
I would like to be able to solve this using arrays.
I have a two-dimensional array of (String, Int) with 5 entries, say arr("A",20),
arr("B",13), arr("C",18), arr("D",10), arr("E",19).
I would like to write a small piece of code to order these by the highest Int
column, so I will have arr("A",20), arr("E",19), arr("C",18), arr("B",13), arr("D",10).
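
For reference, a minimal sketch of one way to do this on a plain Scala array
(the variable name is illustrative):

  val arr = Array(("A", 20), ("B", 13), ("C", 18), ("D", 10), ("E", 19))

  // Sort by the Int column in descending order.
  val sorted = arr.sortBy { case (_, n) => -n }
  // sorted: Array((A,20), (E,19), (C,18), (B,13), (D,10))

The same ordering can be obtained on an RDD with rdd.sortBy(_._2, ascending = false).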
Perhaps the documentation of the filter method would help. Here is the method
signature (copied from the API doc):

def filter[VD2, ED2](
    preprocess: (Graph[VD, ED]) => Graph[VD2, ED2],
    epred: (EdgeTriplet[VD2, ED2]) => Boolean = (x: EdgeTriplet[VD2, ED2]) => true,
    vpred: (VertexId, VD2) => Boolean = (v: VertexId, d: VD2) => true): Graph[VD, ED]
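
For illustration, a usage sketch in the spirit of the example in the API doc
(graph is assumed to be an existing Graph; this keeps only vertices that have
at least one outgoing edge):

  import org.apache.spark.graphx._

  // preprocess re-labels each vertex with its out-degree; vpred then filters
  // on that derived attribute, while the result keeps the original attributes.
  val filtered = graph.filter(
    g => {
      val degrees: VertexRDD[Int] = g.outDegrees
      g.outerJoinVertices(degrees) { (vid, data, deg) => deg.getOrElse(0) }
    },
    vpred = (vid: VertexId, deg: Int) => deg > 0
  )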
This is because Hadoop writables are being reused. Just map it to some
custom type and then do further operations including cache() on it.
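
A minimal sketch of that idea, assuming for illustration a sequence file of
LongWritable/Text pairs (the path and names are placeholders):

  import org.apache.hadoop.io.{LongWritable, Text}

  val raw = sc.sequenceFile[LongWritable, Text]("hdfs:///data/input")

  // Hadoop reuses the same Writable instances for every record, so copy their
  // contents into plain types before cache(), collect() or similar.
  val safe = raw.map { case (k, v) => (k.get, v.toString) }
  safe.cache()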
Regards
Sab
On 27-Feb-2016 9:11 am, "Yan Yang" wrote:
> Hi
>
> I am pretty new to Spark, and after experimenting with our pipelines I
> ran into this weird
Thank you, I have to think about what the code does, because I am a bit of a
noob in Scala and it's hard for me to understand.
2016-02-27 3:53 GMT+01:00 Mohammed Guller :
> Here is another solution (minGraph is the graph from your code. I assume
> that is your original graph):
>
>
>
> val graphWithNoO
Hi,
For now Spark SQL does not support subqueries; I guess that's the reason your
query fails.
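
As a workaround, that kind of IN/max subquery can usually be rewritten with a
derived table and a join, e.g. (a rough sketch against the same tmp table):

  HiveContext.sql(
    """SELECT count(1)
      |FROM tmp t
      |JOIN (SELECT max(id) AS max_id FROM tmp) m
      |  ON t.ID = m.max_id""".stripMargin)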
2016-02-27 20:01 GMT+08:00 Mich Talebzadeh :
> It appears that certain SQL on Spark temporary tables does not support Hive
> SQL even when using HiveContext
>
> example
>
> scala> HiveContext.sql("sel
Now I have a map:
val ji =
scala.collection.mutable.Map[String,scala.collection.mutable.ArrayBuffer[String]]()
There are many entries like:
ji =
Map("a"->ArrayBuffer("1","2","3"),"b"->ArrayBuffer("1","2","3"),"c"->ArrayBuffer("2","3"))
If "a" chooses "1", then "b" and "c" can't choose "1",
for ex
It appears that certain SQL on Spark temporary tables does not support Hive
SQL even when using HiveContext.
For example:
scala> HiveContext.sql("select count(1) from tmp where ID in (select
max(id) from tmp)")
org.apache.spark.sql.AnalysisException:
Unsupported language features in query: selec
Are you using the Avro format by any chance?
There are some formats that need to be deep-copied before caching or
aggregating.
Try something like:
val input = sc.newAPIHadoopRDD(...)
val rdd = input.map(deepCopyTransformation).map(...)
rdd.cache()
rdd.saveAsTextFile(...)
where deepCopyTransformation is a function that deep-copies each record.
Hi Reynold,
thanks for the response
Yes, speculation mode needs some coordination.
Regarding job failure:
Correct me if I'm wrong: if one of the jobs fails, the client code will be sort
of "notified" by an exception or something similar, so the client can decide to
re-submit the action (job), i.e. it won't be "si
Hello
We have 2 tables (tab1, tab2) exposed using Hive. The data is in different
HDFS folders. We are trying to join these 2 tables on a certain single column
using a SparkR join. But in spite of the join columns having the same values,
it returns zero rows.
But when I run the same join SQL in Hive, from hiv
But sometimes you might have skew, and almost all the result data ends up in
one or a few tasks.
On Friday, February 26, 2016, Jeff Zhang wrote:
>
> My job gets this exception very easily even when I set a large value of
> spark.driver.maxResultSize. After checking the Spark code, I found
> spark
Hi,
The data (in this case the example README.md) is kept in the Hadoop Distributed
File System (HDFS) among all datanodes in the Hadoop cluster. The metadata that
is used to get info about the storage of this file is kept in the namenode.
Your data is always stored in HDFS.
Spark is an application that can acce