Hi there,
We are currently evaluating Spark to provide streaming analytics for
unbounded data sets: basically creating aggregations over event-time windows
while supporting both early and late/out-of-order messages. For streaming
aggregations (i.e. SQL operations with well-defined semantics, e.g. count
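For reference, a minimal Structured Streaming sketch of that kind of
aggregation (the broker address, topic name, window size and watermark below
are placeholders, not recommendations):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder.appName("EventTimeWindows").getOrCreate()

// Read an unbounded stream; broker and topic are placeholders.
val events = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092")
  .option("subscribe", "events")
  .load()
  .selectExpr("CAST(value AS STRING) AS body", "timestamp AS eventTime")

// Count per 5-minute event-time window, accepting events up to 10 minutes late.
val counts = events
  .withWatermark("eventTime", "10 minutes")
  .groupBy(window(col("eventTime"), "5 minutes"))
  .count()

// "update" mode emits early results on every trigger and revises them as late
// events arrive, until the watermark closes the window.
counts.writeStream
  .outputMode("update")
  .format("console")
  .start()
  .awaitTermination()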
The Data Source V2 API is available in Spark 2.3, but currently there is no
official library using the Data Source V2 API to read/write Avro or ORC
files -- spark-avro and spark-orc both use Data Source V1. I wonder if
there's a plan upstream to implement those readers/writers and make them
official?
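For context, a small sketch of how those formats are typically read today
through the V1 paths (the paths are made up, and an existing SparkSession
named spark is assumed; the Avro format name is the one registered by the
external spark-avro package):

// External spark-avro package (Data Source V1); the path is illustrative.
val avroDF = spark.read
  .format("com.databricks.spark.avro")
  .load("/data/input/avro")

// ORC goes through the built-in reader, which is likewise still V1 here.
val orcDF = spark.read.orc("/data/input/orc")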
I can't... do you think this could be a bug in this version, either in Spark
or in Kafka?
On Wed, Aug 29, 2018 at 10:28 PM, Cody Koeninger () wrote:
> Are you able to try a recent version of spark?
>
> On Wed, Aug 29, 2018 at 2:10 AM, Guillermo Ortiz Fernández
> wrote:
> > I'm using Spark Streaming 2.0.1 with Kafka 0.10; sometimes I get this
> > exception and Spark dies.
Are you able to try a recent version of spark?
On Wed, Aug 29, 2018 at 2:10 AM, Guillermo Ortiz Fernández
wrote:
> > I'm using Spark Streaming 2.0.1 with Kafka 0.10; sometimes I get this
> > exception and Spark dies.
> >
> > I couldn't see any error or problem on any of the machines; does anybody
> > know the reason for this error?
Hi,
I haven't checked my answer (too lazy today), but I think I know what might
be going on.
tl;dr Use cache to preserve the initial set of rows from MySQL.
After you append new rows, you will have twice as many rows as you had
previously. Correct?
Since newDF references the table, every time you use it the table is read
again from MySQL, so the rows you just appended show up as well.
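A minimal sketch of the idea, reusing jdbcUrl, table and properties from your
code below:

// Cache and materialize the initial snapshot so that later appends to the
// MySQL table are not picked up when the DataFrame is reused.
val mysqlDF = spark.read.jdbc(jdbcUrl, table, properties).cache()
mysqlDF.count()  // forces the read, pinning the original rows in the cache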
Sorry, the format of my last mail was not good.

import org.apache.spark.sql.functions.col

println("Going to talk to MySQL")
// Read the table from MySQL.
val mysqlDF = spark.read.jdbc(jdbcUrl, table, properties)
println("I am back from MySQL")
mysqlDF.show()

// Create a new DataFrame with column 'id' increased by 10 to avoid duplicate
// primary keys.
val newDF = mysqlDF.withColumn("id", col("id") + 10)
Hi,
Can anyone help me understand what is happening with my code?
I wrote a Spark application to read from a MySQL table [that already has 4
records], create a new DF by adding 10 to the ID field, and then write the
new DF to MySQL as well as to Hive.
I am surprised to see
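If it helps, a hedged sketch of the write step being described; it assumes
newDF, jdbcUrl, table and properties from the code in the previous mail, and
the Hive table name is made up:

// Append the shifted rows back to the MySQL table...
newDF.write.mode("append").jdbc(jdbcUrl, table, properties)

// ...and also persist them as a Hive table (the name is illustrative).
newDF.write.mode("append").saveAsTable("default.events_copy")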
Dear Apostolos,
Thanks for the response!
Our version is built on 2.1; the problem is that the state-of-the-art
system I'm trying to compare against is built on version 1.2, so I have to
deal with it.
If I understand the level of parallelism correctly, --total-executor-cores
is set to the number of w
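In case it's useful for lining the two installations up, a sketch of pinning
the same parallelism from inside the application (the numbers are
placeholders; spark.cores.max is the property behind --total-executor-cores
on a standalone cluster):

import org.apache.spark.{SparkConf, SparkContext}

// Works on both 1.2 and 2.1; values are illustrative, not recommendations.
val conf = new SparkConf()
  .setAppName("benchmark")
  .set("spark.cores.max", "32")            // total cores across all executors
  .set("spark.default.parallelism", "64")  // default partition count for shuffles
val sc = new SparkContext(conf)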
Dear Jeevan,
Spark 1.2 is quite old, and if I were you I would go for a newer version.
However, is there a parallelism level (e.g., 20, 30) that works for both
installations?
regards,
Apostolos
On 29/08/2018 04:55 PM, jeevan.ks wrote:
Hi,
I've two systems. One is built on Spark 1.2 and
Hi,
I have two systems, one built on Spark 1.2 and the other on 2.1. I am
benchmarking both with the same benchmarks (wordcount, grep, sort, etc.)
and the same data sets from an S3 bucket (sizes range from 50 MB to 10 GB).
The Spark cluster I used is r3.xlarge, 8 instances, 4 cores each, and 2
Hi,
I think the best option is to use the py4j that is either installed
automatically with "pip install pyspark" or, if we unzip the Spark download
from its site, found in the SPARK_HOME/python/lib folder.
Regards,
Gourav Sengupta
On Wed, Aug 29, 2018 at 8:00 AM Aakash Basu
wrote:
> Hi,
>
>
Hi Team,
I am creating a UDF as follows from an external jar:

val spark = SparkSession.builder
  .appName("UdfUser")
  .master("local")
  .enableHiveSupport()
  .getOrCreate()

spark.sql("CREATE FUNCTION uppercase AS 'path.package.udf.UpperCase' " +
  "USING JAR '/home/swapnil/udf-jars/spar
I got this error from the Spark driver; it seems that I should increase the
driver memory, although it is already 5g (with 4 cores). It seems weird to
me because I'm not using Kryo or broadcast in this process, yet the log
contains references to Kryo and broadcast.
How could I figure out the reason?
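If it's any use, a hedged sketch of the settings that are often raised in
this situation (the values are guesses, not a diagnosis, and
spark.driver.memory itself only takes effect when set before the driver JVM
starts, e.g. via spark-submit or spark-defaults.conf, rather than inside a
running application):

import org.apache.spark.sql.SparkSession

// Illustrative values only. Spark can use Kryo and broadcast internally
// (e.g. for task binaries and some shuffle data), which is why they appear
// in the log even though the job never enables them explicitly.
val spark = SparkSession.builder
  .appName("DriverMemoryTuning")
  .config("spark.kryoserializer.buffer.max", "512m")
  .config("spark.driver.maxResultSize", "2g")
  .getOrCreate()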
I'm using Spark Streaming 2.0.1 with Kafka 0.10; sometimes I get this
exception and Spark dies.
I couldn't see any error or problem on any of the machines; does anybody
know the reason for this error?
java.lang.IllegalStateException: This consumer has already been closed.
at org.apache.kafka.clients
Hi,
Which Py4J version goes with Spark 2.3.1? I have py4j-0.10.7, but it throws
an error because of what look like compatibility issues with Spark 2.3.1.
Error:
[2018-08-29] [06:46:56] [ERROR] - Traceback (most recent call last):
  File "", line 120, in run
  File "/data/spark-2.3.1-bin-hadoop2.7/python/li