https://docs.databricks.com/spark/latest/data-sources/read-lzo.html
On Wed, Sep 27, 2017 at 6:36 AM 孫澤恩 wrote:
> Hi All,
>
> Currently, I am following this blog post
> http://blog.cloudera.com/blog/2009/11/hadoop-at-twitter-part-1-splittable-lzo-compression/
> so that I could use HDFS
Can you paste the code? It's unclear to me how/when the out of memory is
occurring without seeing the code.
On Sun, Aug 24, 2014 at 11:37 PM, Gefei Li gefeili.2...@gmail.com wrote:
Hello everyone,
I am porting a clustering algorithm to the Spark platform, and I
have run into a problem
Hi,
I doubt that the broadcast variable is your problem, since you are seeing:
org.apache.spark.SparkException: Task not serializable
Caused by: java.io.NotSerializableException: org.apache.spark.sql.hive.HiveContext$$anon$3
We have a knowledge base article that explains why this happens - it's
Hi Chris,
We have a knowledge base article to explain what's happening here:
https://github.com/databricks/spark-knowledgebase/blob/master/troubleshooting/javaionotserializableexception.md
Let me know if the article is not clear enough - I would be happy to edit
and improve it.
-Vida
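The pattern behind this exception can be illustrated without Spark at all. Below is a minimal sketch using plain `pickle` (Spark actually uses Java serialization or cloudpickle, and `Processor` and `scale` are hypothetical names, not anything from the Spark API): a task that closes over a whole driver-side object drags along any unserializable resource that object holds, while a task that copies out just the value it needs serializes fine.

```python
import pickle
import threading
from functools import partial

def scale(factor, x):
    return factor * x

class Processor:
    """Stands in for a driver-side object holding an unserializable
    resource (here a lock, playing the role of a HiveContext)."""
    def __init__(self):
        self.lock = threading.Lock()  # cannot be pickled
        self.factor = 3

def task_bad(proc, x):
    return scale(proc.factor, x)

# Bad: the task captures the whole object, dragging the lock along.
bad = partial(task_bad, Processor())

# Good: copy out only the serializable value the task actually needs.
good = partial(scale, Processor().factor)

pickle.dumps(good)  # serializes fine
try:
    pickle.dumps(bad)
except TypeError as exc:
    print("not serializable:", exc)
```

The fix in Spark is the same idea: assign the field you need to a local variable before referencing it inside the closure, so only that value, not the enclosing object, is shipped to the executors.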
On Wed,
Hi John,
It seems like the original problem you had was that you were initializing the
RabbitMQ connection on the driver, but then calling the code to write to
RabbitMQ on the workers (I'm guessing, but I don't know, since I didn't see
your code). That's definitely a problem, because the connection
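The usual remedy is to create the connection on the worker, once per partition, rather than shipping it from the driver. Here is a minimal sketch of that pattern; `FakeConnection` is a hypothetical stand-in for a real RabbitMQ channel (real code would use a client library such as pika), and plain lists stand in for RDD partitions.

```python
class FakeConnection:
    """Hypothetical stand-in for a RabbitMQ channel."""
    def __init__(self):
        self.sent = []
        self.open = True

    def publish(self, msg):
        assert self.open, "connection already closed"
        self.sent.append(msg)

    def close(self):
        self.open = False

def send_partition(records):
    # One connection per partition, created on the worker,
    # never serialized from the driver.
    conn = FakeConnection()
    for r in records:
        conn.publish(r)
    conn.close()
    return len(conn.sent)

# In real Spark this would be rdd.foreachPartition(send_partition);
# here two plain lists simulate two partitions.
partitions = [[1, 2, 3], [4, 5]]
counts = [send_partition(p) for p in partitions]
```

Creating the connection inside the per-partition function keeps the unserializable object entirely on the worker, which sidesteps the serialization problem and amortizes the connection cost over the whole partition instead of paying it per record.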
The use case I was thinking of was outputting calculations made in Spark
into a SQL database for the presentation layer to access. So in other
words, having a Spark backend in Java that writes to a SQL database and
then having a Rails front-end that can display the data nicely.
On Thu, Aug 7,
This is not as slow as you think, because Spark
can write the output in parallel to S3, and Redshift, too, can load data
from multiple files in parallel:
http://docs.aws.amazon.com/redshift/latest/dg/c_best-practices-single-copy-command.html
Nick
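The write side of that approach can be sketched without Spark or AWS: split the output into several part files under a common name prefix and write them concurrently, the way Spark writes one file per partition. This is only an illustration of the layout (the directory, file names, and row data are made up); a Redshift COPY pointing at the shared prefix would then load all the files in parallel.

```python
import csv
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

# Made-up sample rows standing in for computed results.
rows = [(i, f"name{i}") for i in range(100)]

# Round-robin split into four "partitions".
num_parts = 4
parts = [rows[i::num_parts] for i in range(num_parts)]

outdir = tempfile.mkdtemp()

def write_part(idx_and_part):
    idx, part = idx_and_part
    # Common prefix + part number, mirroring Spark's part-0000N files.
    path = os.path.join(outdir, f"results_part_{idx:04d}.csv")
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(part)
    return path

# Write all part files concurrently.
with ThreadPoolExecutor() as pool:
    paths = sorted(pool.map(write_part, enumerate(parts)))
```

After uploading the files to S3, a single COPY command given the shared prefix (rather than one COPY per file) lets Redshift load the files in parallel across its slices, which is what the linked best-practices page recommends.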
On Thu, Aug 7, 2014 at 1:52 PM, Vida Ha v
Hi,
I would like to save an RDD to a SQL database. It seems like this would be
a common enough use case. Are there any built in libraries to do it?
Otherwise, I'm just planning on mapping my RDD, and having that call a
method to write to the database. Given that a lot of records are going to