Here is the screenshot. The status shows FINISHED, but it should be RUNNING for
the next batch to pick up the data.
On Thu, Nov 16, 2017 at 10:01 PM, KhajaAsmath Mohammed <
mdkhajaasm...@gmail.com> wrote:
Hi,
I have scheduled a Spark Streaming job to run every 30 minutes and it was
running fine for 32 hours, then suddenly I see a status of FINISHED instead
of RUNNING (it always runs in the background and shows up in the resource
manager).
Am I doing anything wrong here? How come the job finished without p
Hi all,
Just wanted to announce that Deep Learning Pipelines 0.2.0 has been
released, providing utilities for transfer learning, parallelized
hyperparameter tuning of Keras models, and applying neural networks to
DataFrames as SQL UDFs.
Spark packages:
https://spark-packages.org/package/databrick
I just noticed that there's a problem on the Apache Spark Downloads page at:
https://spark.apache.org/downloads.html
Regardless of which option is selected from the 'Choose a package type:'
pull-down menu, the file listed for download is always:
spark-2.2.0-bin-hadoop2.7.tgz
I'm using the Chrome browser
I don't have experience with Cascading, but we saw a similar issue when
importing data generated by Spark into Hive.
Did you try setting "spark.sql.parquet.writeLegacyFormat" to true?
https://stackoverflow.com/questions/44279870/why-cant-impala-read-parquet-files-after-spark-sqls-write
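In case a concrete example helps, here is a minimal sketch of how that setting
can be applied (assuming Spark 2.x with a SparkSession named spark; the
input/output paths are hypothetical):

  // Write Parquet in the legacy format so Hive/Impala can read the files
  // produced by Spark. The setting must be in place before the write runs.
  import org.apache.spark.sql.SparkSession

  val spark = SparkSession.builder().appName("legacy-parquet-write").getOrCreate()
  spark.conf.set("spark.sql.parquet.writeLegacyFormat", "true")

  val df = spark.read.parquet("/tmp/source")             // hypothetical input path
  df.write.mode("overwrite").parquet("/tmp/hive-table")  // hypothetical output path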
On 16 Nov 2017, at 10:22, Michael Shtelma wrote:
> You can call repartition(1) before starting to process your files. This
> will ensure that you end up with just one partition.
One question and one remark:
Q) val ds = sqlContext.read.parquet(path).repartition(1)
Am I absolutely sure that my file h
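The question above is cut off, but if the point is to verify that the
DataFrame really ends up as a single partition (and therefore a single output
file), one quick way to check is a sketch like this (path is whatever you
read from; the output path is hypothetical):

  // Confirm the partition count after repartition(1); it should print 1.
  val ds = sqlContext.read.parquet(path).repartition(1)
  println(ds.rdd.getNumPartitions)

  // Writing it back out should then yield a single part-* file:
  ds.write.parquet("/tmp/single-partition-output")  // hypothetical output path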
Dear Sparkers,
A while back, I asked how to process non-splittable files in parallel, one file
per executor. The "scheduling within an application" approach Vadim suggested
worked out beautifully (a rough sketch of that pattern is included below for
reference).
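(For reference, a minimal sketch of that pattern as I understood it, assuming
a SparkSession named spark; the file list, pool name, and processOneFile
function are hypothetical, and the fair-scheduler pool is optional.)

  // Submit one Spark job per file from separate threads so several files
  // are processed concurrently within the same application.
  import scala.concurrent.{Await, Future}
  import scala.concurrent.ExecutionContext.Implicits.global
  import scala.concurrent.duration.Duration

  val files = Seq("/data/file-a", "/data/file-b", "/data/file-c")  // hypothetical paths

  val jobs = files.map { path =>
    Future {
      // Optionally pin each thread's jobs to a fair-scheduler pool.
      spark.sparkContext.setLocalProperty("spark.scheduler.pool", "filePool")
      processOneFile(path)  // hypothetical per-file processing function
    }
  }

  Await.result(Future.sequence(jobs), Duration.Inf)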
I am now facing the 'opposite' problem:
- I have a bunch of parquet files to process
- Once proce
Hi,
You're right... killing the Spark Streaming job is the way to go. If a batch
completed successfully, Spark Streaming will recover from the controlled
failure and start where it left off. I don't think there's any other
way to do it.
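For context, the recovery described above relies on checkpointing: the driver
is built via StreamingContext.getOrCreate so that a restart picks up from the
last checkpoint. A minimal sketch (the checkpoint directory and batch logic
are placeholders, and it assumes an existing SparkContext named sc):

  // Rebuild the StreamingContext from the checkpoint if one exists; otherwise
  // create it fresh. On restart the job resumes from where it left off.
  import org.apache.spark.streaming.{Seconds, StreamingContext}

  def createContext(): StreamingContext = {
    val ssc = new StreamingContext(sc, Seconds(30))
    ssc.checkpoint("/tmp/streaming-checkpoint")  // hypothetical checkpoint directory
    // ... define input DStreams and the processing logic here ...
    ssc
  }

  val ssc = StreamingContext.getOrCreate("/tmp/streaming-checkpoint", createContext _)
  ssc.start()
  ssc.awaitTermination()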
Regards,
Jacek Laskowski
https://about.me/JacekLaskow