Hi Team,
Is it OK to spawn multiple Spark jobs from within a main Spark job? My main
Spark job's driver, which was launched on a YARN cluster, will do some
preprocessing and, based on the result, needs to launch multiple Spark jobs
on the YARN cluster. I'm not sure if this is the right pattern.
Please share your thoughts.
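For context, a pattern sometimes used instead of launching separate applications is to keep a single driver and submit the follow-up work as concurrent actions inside the same SparkContext. A minimal sketch, assuming an existing SparkSession `spark` and a hypothetical `planPartitions()` helper standing in for the preprocessing step:

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.duration.Duration
import scala.concurrent.ExecutionContext.Implicits.global

// Illustrative sketch, not a recommendation: one driver, several Spark jobs.
// `planPartitions()` is a hypothetical stand-in for the preprocessing step
// that decides what work to launch next.
val inputPaths: Seq[String] = planPartitions()

// Each action submitted from its own thread becomes a separate job scheduled
// within the same application, so no extra driver processes are needed.
val jobs = inputPaths.map { path =>
  Future { spark.read.parquet(path).count() }
}
val counts = jobs.map(f => Await.result(f, Duration.Inf))
```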
Hi,
Can you try the combination of `repartition` + `sortWithinPartitions` on the
dataset?
E.g.,

val df = Seq((2, "b c a"), (1, "c a b"), (3, "a c b")).toDF("number", "letters")
val df2 = df.explode('letters) {
  case Row(letters: String) => letters.split(" ").map(Tuple1(_)).toSeq
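For reference, a hedged sketch of that combination on the same data, using `split` + `explode` from `org.apache.spark.sql.functions` rather than the older `Dataset.explode` call above:

```scala
import org.apache.spark.sql.functions.{explode, split}
import spark.implicits._  // assumes an existing SparkSession `spark`

val df = Seq((2, "b c a"), (1, "c a b"), (3, "a c b")).toDF("number", "letters")

// One row per letter, then cluster by number and sort within each partition.
val df2 = df
  .select($"number", explode(split($"letters", " ")).as("letter"))
  .repartition($"number")            // rows with the same number land together
  .sortWithinPartitions($"letter")   // local sort only, no global shuffle-sort
```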
Hi,
You can't invoke any RDD actions/transformations inside another
transformation; they must be invoked by the driver.
If I understand your purpose correctly, you can partition your data (i.e.,
`partitionBy`) when writing out to Parquet files.
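A minimal sketch of that suggestion, with an assumed output path:

```scala
// Writing with partitionBy creates one directory per zip_code value,
// e.g. .../zip_code=94110/part-*.parquet, so later reads of a single
// zip code only touch that directory.
df.write
  .partitionBy("zip_code")
  .mode("overwrite")
  .parquet("/path/to/output")  // assumed path
```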
-
Liang-Chi Hsieh | @viirya
Spark
Hi All,
PFB sample code:

val df = spark.read.parquet()
df.registerTempTable("df")
val zip = df.select("zip_code").distinct().as[String].rdd
def comp(zipcode: String): Unit = {
  val zipval = "SELECT * FROM df WHERE zip_code='$zipvalrepl'".replace("$zipvalrepl", zipcode)
  val data =
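If per-zip-code processing really is needed, it has to be driven from the driver, for example by collecting the distinct zip codes first. A sketch under that assumption (the output path is illustrative):

```scala
// Collect the distinct zip codes to the driver, then run one Spark job per
// value. Each iteration invokes actions from the driver, which is allowed.
val zipCodes: Array[String] = df.select("zip_code").distinct().as[String].collect()

zipCodes.foreach { zipcode =>
  spark.sql(s"SELECT * FROM df WHERE zip_code='$zipcode'")
    .write
    .mode("overwrite")
    .parquet(s"/path/to/output/zip_code=$zipcode")  // assumed path
}
```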
Hi Spark dev,
I am using Spark 2 to write ORC files to HDFS. I have one question
about the save mode.
My use case is this: when I write data into HDFS and one task fails, I
would like the file that the task created to be deleted, so that the retry
task can write all the data, that is to
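For what it's worth, `SaveMode` governs behavior at the level of the whole write job (what to do when the output path already exists), not per-task retries; a sketch with an assumed path:

```scala
import org.apache.spark.sql.SaveMode

// SaveMode decides what happens when the target path already exists:
// Overwrite replaces it, Append adds files, ErrorIfExists fails, Ignore no-ops.
df.write
  .mode(SaveMode.Overwrite)
  .orc("hdfs:///path/to/output")  // assumed path
```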
Hi Dev,
Sorry to bother you with a non-technical query. I wish to connect in
person with any active contributor / committer in and around Chennai /
Tamil Nadu. Is there a list of all committer details in any location?
Regards,
Siva.
The examples look great indeed, and they seem like a good addition to the
existing documentation.
I understand the UDAF examples don't apply to Python, but is there any
particular reason to skip the Python API altogether in this window
functions documentation?
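For anyone following along, the kind of example under discussion looks roughly like this in the Scala API (the `dept`/`salary` columns are illustrative):

```scala
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.rank

// Rank rows within each department by descending salary.
val byDept = Window.partitionBy($"dept").orderBy($"salary".desc)
val ranked = df.withColumn("salary_rank", rank().over(byDept))
```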
On 20 December 2016 at 16:56, Jim Hughes
Hi Anton,
Your example and documentation look great! I left some comments
suggesting a few additions, but the PR in its current state is a great
improvement!
Thanks,
Jim
On 12/18/2016 09:09 AM, Anton Okolnychyi wrote:
Any comments/suggestions are more than welcome.
Thanks,
Anton
Hi Shixiong,
Thanks for taking a look. I am trying to run it to see whether making
ContextCleaner run more frequently and/or making it non-blocking will help.
--Prashant
On Tue, Dec 20, 2016 at 4:05 AM, Shixiong(Ryan) Zhu wrote:
> Hey Prashant. Thanks for your codes. I did
Hi Nick,
The scope of the PR I submitted is reduced because we can't be sure it is
really the root cause of the error you faced; you can check out the
discussion on the PR. So for now I just change the assert in the code as
shown in the PR.
If you can provide a repro, we can go back to see if it