This discussion belongs on the dev list. Please post any replies there.
On Sat, May 23, 2015 at 10:19 PM, Cheolsoo Park piaozhe...@gmail.com
wrote:
Hi,
I've been testing SparkSQL in 1.4 rc and found two issues. I wanted to
confirm whether these are bugs or not before opening a jira.
*1)*
Hello there. I am trying to run an app in which part of it needs to run a
shell command. How do I run a shell command distributed across a Spark cluster? Thanks.
Here's my code:
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.spark.SparkConf;
import
*Problem Description*:
The program runs on a standalone Spark cluster (1 master, 6 workers with
8g RAM and 2 cores each).
Input: a 468MB file with 133433 records stored in HDFS.
Output: just a 2MB file that will be stored in HDFS.
The program has two map operations and one reduceByKey operation.
Finally I
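For concreteness, a minimal sketch of that shape, assuming an existing SparkContext `sc` (e.g. in spark-shell); the paths and record format are made up for illustration, not the original program:

// Two map operations and one reduceByKey, reading from and writing to HDFS.
val result = sc.textFile("hdfs:///input/records")   // ~468MB, 133433 records
  .map(_.split("\t"))                               // map 1: parse each record
  .map(fields => (fields(0), 1L))                   // map 2: key each record
  .reduceByKey(_ + _)                               // the single reduceByKey
result.saveAsTextFile("hdfs:///output/summary")     // small (~2MB) output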
Can you pastebin the classpath?
Thanks
On May 24, 2015, at 5:02 AM, boci boci.b...@gmail.com wrote:
Yeah, I have the same jar with the same result. I run in a Docker container, and
I'm using the same Docker container as my other project... the only difference
is the PostgreSQL JDBC driver and the custom RDD... no additional
dependencies (both single jars generated with the same assembly configuration
with the same
Really good list to brush up on the basics.
Just one input, regarding:
* An RDD's processing is scheduled by the driver's job scheduler as a job. At a
given point in time only one job is active. So, if one job is executing, the
other jobs are queued.
We can have multiple jobs running in a given
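To make that last point concrete: one SparkContext can indeed have several jobs in flight at once, e.g. via async actions. A sketch, assuming a live `sc` (scheduling is FIFO by default, fair sharing if the FAIR scheduler is enabled):

import scala.concurrent.Await
import scala.concurrent.duration.Duration

// Two actions submitted without blocking; the scheduler can interleave
// the two jobs' stages.
val f1 = sc.parallelize(1 to 1000000).countAsync()
val f2 = sc.parallelize(1 to 1000000).map(_ * 2).countAsync()
println(Await.result(f1, Duration.Inf))
println(Await.result(f2, Duration.Inf))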
This may sound like an obvious question, but are you sure that the program
is doing any work when you don't have a saveAsTextFile? If there are
transformations but no actions to actually collect the data, there's no
need for Spark to execute the transformations.
As to the question of 'is this
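A minimal illustration of that laziness (the context `sc` and the paths are assumptions):

// Transformations only build a lineage; nothing executes yet.
val parsed = sc.textFile("hdfs:///input").map(_.split(","))
// The action is what triggers a job.
parsed.saveAsTextFile("hdfs:///output")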
You mean you want to execute some shell commands from Spark? Here's
something I tried a while back. https://github.com/akhld/spark-exploit
Thanks
Best Regards
On Sun, May 24, 2015 at 4:53 PM, luohui20...@sina.com wrote:
Hello there,
I am trying to run an app in which part of it needs to
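For the archive: one common way to run a shell command from Spark is RDD.pipe, which streams each partition's records through an external process on the worker holding that partition. A minimal sketch (not necessarily what the linked repo does; the command is just an example):

// Each partition's lines go to the external process's stdin; its stdout
// becomes the resulting RDD.
val input = sc.parallelize(Seq("spark", "shell", "pipe"))
val piped = input.pipe("grep s")   // any command/script on the workers' PATH
piped.collect().foreach(println)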
I used to hit an NPE when I didn't add all the dependency jars to my context
while running in standalone mode. Can you try adding all these
dependencies to your context?
sc.addJar("/home/akhld/.ivy2/cache/org.apache.spark/spark-streaming-kafka_2.10/jars/spark-streaming-kafka_2.10-1.3.1.jar")
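If listing jars one by one gets tedious, something like this registers every jar in a directory (the directory path here is hypothetical):

// Add each jar in a local lib directory so executors can fetch them.
new java.io.File("/home/akhld/lib").listFiles()
  .filter(_.getName.endsWith(".jar"))
  .foreach(jar => sc.addJar(jar.getAbsolutePath))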
Hi,
I'm running this piece of code in my program:
smallRdd.join(largeRdd)
  .groupBy { case (id, (_, X(a, _, _))) => a }
  .map { case (a, iterable) => a -> iterable.size }
  .sortBy({ case (_, count) => count }, ascending = false)
  .take(k)
where basically
smallRdd is an rdd
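In case it helps, a hedged sketch of an equivalent pipeline that avoids groupBy (smallRdd, largeRdd, and k are taken from the snippet above; the shape of X is an assumption): counting with reduceByKey skips materializing each group's iterable, and top(k) replaces the full sort.

case class X(a: String, b: Int, c: Int)   // assumed three-field shape

// Count occurrences of `a` per key, then take the k largest counts
// (the default tuple ordering compares the count first).
val topK = smallRdd.join(largeRdd)
  .map { case (_, (_, X(a, _, _))) => (a, 1L) }
  .reduceByKey(_ + _)
  .map { case (a, count) => (count, a) }
  .top(k)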
Information Innovators, Inc.
http://www.iiinfo.com/
Spark, Spark Streaming, Spark SQL, MLLib
Developing data analytics systems for federal healthcare, national defense
and other programs using Spark on YARN.
--
This page tracks the users of Spark. To add yourself to the list, please
email
I think the ZooKeeper watcher code should reside in task code.
I haven't found a guide on this subject so far.
Cheers
On Sun, May 24, 2015 at 7:15 PM, bit1...@163.com bit1...@163.com wrote:
Can someone please help me on this?
--
bit1...@163.com
*From:*
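To make the ZooKeeper-in-task-code suggestion above concrete, a hedged sketch (the quorum address and the `stream` DStream are assumptions): create the client lazily inside task code, so each executor JVM opens one connection instead of trying to ship a client from the driver (it isn't serializable).

import org.apache.zookeeper.{WatchedEvent, Watcher, ZooKeeper}

// One lazily created client per executor JVM, initialized on first use
// inside a task rather than on the driver.
object ExecutorZk {
  lazy val client = new ZooKeeper("zk1:2181,zk2:2181", 5000, new Watcher {
    override def process(event: WatchedEvent): Unit = {
      // react to watch events (node changes, session state, ...) here
    }
  })
}

stream.foreachRDD { rdd =>
  rdd.foreachPartition { records =>
    val zk = ExecutorZk.client          // reused across tasks on this executor
    records.foreach { record => /* consult zk while processing */ }
  }
}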
Thanks Akhil,
Your code is a big help to me, because a Perl script is exactly the
thing I want to try to run in Spark. I will give it a try.
Thanks & best regards!
San.Luo
----- Original Message -----
From: Akhil Das ak...@sigmoidanalytics.com
To: 罗辉
Hi!
We are developing a scoring system for recruitment. A recruiter enters vacancy
requirements, and we score tens of thousands of CVs against these requirements,
returning e.g. the top 10 matches.
We do not use full-text search, and sometimes we don't even filter input CVs
prior to scoring (some vacancies do not
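For what it's worth, a sketch of that shape (the names, input path, and toy scoring function are illustrative, not the actual system): score every CV against the vacancy terms and keep the top 10 with RDD.top.

case class CV(id: Long, text: String)

// Toy scoring: how many requirement terms appear in the CV text.
def score(requirements: Set[String], cv: CV): Int =
  requirements.count(cv.text.toLowerCase.contains)

val requirements = Set("scala", "spark", "etl")   // example vacancy terms
val cvs = sc.objectFile[CV]("hdfs:///cvs")        // assumed input
val top10 = cvs.map(cv => (score(requirements, cv), cv.id))
              .top(10)                            // highest scores first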
Can someone please help me on this?
bit1...@163.com
From: bit1...@163.com
Sent: 2015-05-24 13:53
To: user
Subject: How to use zookeeper in Spark Streaming
Hi,
In my Spark Streaming application, when the application starts and gets running,
the tasks running on the worker nodes need to be
Thanks for reporting this.
We intend to support multiple metastore versions in a single
build (hive-0.13.1) by introducing the IsolatedClientLoader, but you're probably
hitting a bug; please file a JIRA issue for it.
I will keep investigating this as well.
Hao
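For reference, the settings that accompany IsolatedClientLoader in 1.4 let you point SparkSQL at a specific metastore version explicitly; a sketch (the values are examples, not a recommendation):

val conf = new org.apache.spark.SparkConf()
  .set("spark.sql.hive.metastore.version", "0.13.1")
  .set("spark.sql.hive.metastore.jars", "maven")  // or "builtin", or a jar classpath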
From: Mark Hamstra
Blocks are replicated immediately, before the driver launches any jobs
using them.
On Thu, May 21, 2015 at 2:05 AM, Hemant Bhanawat hemant9...@gmail.com
wrote:
Honestly, given the length of my email, I didn't expect a reply. :-)
Thanks for reading and replying. However, I have a follow-up