Hi,
I have the following RDD:
val conf = new SparkConf()
.setAppName("<< Testing Sorting >>")
val sc = new SparkContext(conf)
val L = List(
(new Student("XiaoTao", 80, 29), "I'm Xiaotao"),
(new Student("CCC", 100, 24), "I'm CCC"),
(new Student("Jack", 90,
Hi,
On Wed, Sep 24, 2014 at 7:23 PM, Dibyendu Bhattacharya <
dibyendu.bhattach...@gmail.com> wrote:
> So you have a single Kafka topic with a very high retention period (which
> determines the storage capacity of a given Kafka topic), and you want to
> process all historical data first using Camus
Thanks,
I've already tried this solution, but it does not compile (in Eclipse).
I'm surprised to see that in the Spark shell, sortByKey works fine on both of
these forms:
(String,String,String,String)
(String,(String,String,String))
Hi,
I am getting this weird error while starting the Worker.
-bash-4.1$ spark-class org.apache.spark.deploy.worker.Worker
spark://osebi-UServer:59468
Spark assembly has been built with Hive, including Datanucleus jars on classpath
14/09/24 16:22:04 INFO worker.Worker: Registered signal handle
Hello,
If our Hadoop cluster is configured with HA and "fs.defaultFS" points to a
namespace instead of a namenode hostname - hdfs:/// - then
our Spark job fails with an exception. Is there anything to configure, or is it
not implemented?
Exception in thread "main" org.apache.spark.SparkException: Job
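If the HA client settings are simply not visible to Spark (for example, hdfs-site.xml is not on the classpath), one hedged workaround is to set them programmatically on the Hadoop configuration; the nameservice "mycluster" and the namenode hosts below are placeholders, and sc is assumed to be your SparkContext:

// Sketch: wire the standard HDFS HA client keys into Spark's Hadoop conf.
val hc = sc.hadoopConfiguration
hc.set("fs.defaultFS", "hdfs://mycluster")
hc.set("dfs.nameservices", "mycluster")
hc.set("dfs.ha.namenodes.mycluster", "nn1,nn2")
hc.set("dfs.namenode.rpc-address.mycluster.nn1", "namenode1:8020")
hc.set("dfs.namenode.rpc-address.mycluster.nn2", "namenode2:8020")
hc.set("dfs.client.failover.proxy.provider.mycluster",
  "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider")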
Hi, I want to extract a subgraph using two clauses, such as (?person worksAt
MIT && MIT locatesIn USA), and I have this query. My data is constructed as
triplets, so I assume GraphX will suit it best. But I found that the subgraph
method only supports a single edge check, and it seems to me that I cann
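For what it's worth, a hedged sketch of working around that limitation, assuming a Graph[String, String] named graph whose edge attributes are relation names (a hypothetical schema): filter each clause from the triplets separately, then join on the shared vertex.

import org.apache.spark.SparkContext._ // pair-RDD joins on Spark 1.x
import org.apache.spark.graphx._

// Clause 1: ?person worksAt MIT  =>  (mitId, personId)
val worksAtMit = graph.triplets
  .filter(t => t.attr == "worksAt" && t.dstAttr == "MIT")
  .map(t => (t.dstId, t.srcId))

// Clause 2: MIT locatesIn USA  =>  (mitId, ())
val mitInUsa = graph.triplets
  .filter(t => t.attr == "locatesIn" && t.dstAttr == "USA")
  .map(t => (t.srcId, ()))

// Join on the shared vertex (MIT) to combine both clauses.
val persons = worksAtMit.join(mitInUsa).map { case (_, (person, _)) => person }

If you then need the restricted graph itself, the single-predicate subgraph(epred = ..., vpred = ...) call can still be applied using the resulting vertex set.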
Given this program, I have the following queries:
val sparkConf = new SparkConf().setAppName("NetworkWordCount")
sparkConf.set("spark.master", "local[2]")
val ssc = new StreamingContext(sparkConf, Seconds(10))
val lines = ssc.socketTextStream(args(0), args(1).toInt, StorageLevel.
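For context, here is a hedged completion of that snippet in the shape of the stock NetworkWordCount example; the storage level and the word-count logic after the cut are assumptions:

import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.StreamingContext._ // DStream pair ops on Spark 1.x

object NetworkWordCountSketch {
  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf().setAppName("NetworkWordCount")
    sparkConf.set("spark.master", "local[2]")
    val ssc = new StreamingContext(sparkConf, Seconds(10))
    // Assumed storage level; the original line is cut off at this point.
    val lines = ssc.socketTextStream(args(0), args(1).toInt,
      StorageLevel.MEMORY_AND_DISK_SER)
    val wordCounts = lines.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _)
    wordCounts.print()
    ssc.start()
    ssc.awaitTermination()
  }
}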
Hi,
I have a setup (in mind) where data is written to Kafka and this data is
persisted in HDFS (e.g., using Camus) so that I have an all-time archive of
all stream data ever received. Now I want to process that all-time archive,
and when I am done with that, continue with the live stream, using Spa
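A hedged sketch of that two-phase idea: one batch pass over the HDFS archive, then attach to the live Kafka stream. The paths, topic, ZooKeeper quorum, and the process function are all placeholders.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object ArchiveThenLive {
  def process(msg: String): String = msg // placeholder transformation

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("ArchiveThenLive"))

    // Phase 1: one batch pass over the all-time archive written by Camus.
    sc.textFile("hdfs:///camus/archive/*")
      .map(process)
      .saveAsTextFile("hdfs:///out/archive-pass")

    // Phase 2: continue with the live stream from Kafka.
    val ssc = new StreamingContext(sc, Seconds(10))
    val live = KafkaUtils.createStream(ssc, "zk:2181", "archive-then-live",
      Map("events" -> 1))
    live.map { case (_, msg) => process(msg) }.print()
    ssc.start()
    ssc.awaitTermination()
  }
}

Note that this sketch does not address the handoff itself (records arriving between the end of the archive and the start of the stream, or duplicates across the boundary), which is the hard part of such a setup.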
Hi David,
Can you try val rddToSave = file.map(l => l.split("\\|")).map(r =>
(r(34)+"-"+r(3), (r(4), r(10), r(12)))) ?
That should work.
Liquan
On Wed, Sep 24, 2014 at 1:29 AM, david wrote:
> Hi,
>
> Does anybody know how to use sortByKey in Scala on an RDD like:
>
> val rddToSave = fi
Hi,
sortByKey is only defined on RDD[(K, V)]: each tuple has exactly two members, and
Spark sorts by the first one. If you want to use sortByKey, you have to change
your RDD[(String, String, String, String)] into RDD[(String, (String, String,
String))].
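A minimal sketch of that restructuring, assuming file is the RDD of raw lines:

import org.apache.spark.SparkContext._ // brings sortByKey into scope on Spark 1.x

val rddToSave = file.map(_.split("\\|"))
  .map(r => (r(34) + "-" + r(3), (r(4), r(10), r(12)))) // (K, V): V is one 3-tuple
val sorted = rddToSave.sortByKey() // sorts by the composite string key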
Thanks
Jerry
Hi,
Does anybody know how to use sortByKey in Scala on an RDD like:
val rddToSave = file.map(l => l.split("\\|")).map(r => (r(34)+"-"+r(3),
r(4), r(10), r(12)))
because I received an error: "sortByKey is not a member of
org.apache.spark.rdd.RDD[(String,String,String,String)]".
What i t
top returns the specified number of "largest" elements in your RDD.
They are returned to the driver as an Array. If you want to make an
RDD out of them again, call SparkContext.parallelize(...). Make sure
this is what you mean though.
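A quick sketch of that round trip (assuming an existing SparkContext sc):

val rdd = sc.parallelize(Seq(5, 1, 9, 3, 7))
val top3: Array[Int] = rdd.top(3)   // Array(9, 7, 5), materialized on the driver
val asRdd = sc.parallelize(top3)    // re-distribute only if you really need an RDD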
On Wed, Sep 24, 2014 at 5:33 AM, Deep Pradhan wrote:
> Hi,
> I
If you look at the code for HdfsWordCount, you see it calls print(),
which defaults to printing 10 elements from each RDD. If you are just
talking about the console output, then it is not expected to print all
words to begin with.
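A short sketch of the difference, assuming the DStream in the example is named wordCounts:

wordCounts.print() // shows only the first 10 elements of each batch

// To see everything, collect each RDD explicitly (fine for small debug output):
wordCounts.foreachRDD(rdd => rdd.collect().foreach(println))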
On Wed, Sep 24, 2014 at 2:29 AM, SK wrote:
>
> I execute it as follow
Hi all
Reading through Spark Streaming's custom receiver documentation, it is
recommended that the onStart and onStop methods should not block indefinitely.
However, looking at the source code of KinesisReceiver, the onStart method
calls worker.run, which blocks until the worker is shut down (via a call to
o
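For comparison, the pattern the documentation recommends is to start a thread in onStart and return immediately; a minimal sketch of such a receiver:

import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.receiver.Receiver

class SketchReceiver extends Receiver[String](StorageLevel.MEMORY_ONLY) {
  def onStart(): Unit = {
    // Launch the blocking work on its own thread and return immediately.
    new Thread("sketch-receiver") {
      override def run(): Unit = {
        while (!isStopped()) {
          store("tick")   // push received data into Spark
          Thread.sleep(1000)
        }
      }
    }.start()
  }

  def onStop(): Unit = () // the thread above exits once isStopped() is true
}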