You can also cache the DataFrame on disk if it does not fit into memory.
An alternative would be to write the DataFrame out as Parquet and then read
it back; you can check whether in this case the whole pipeline works faster
than with the standard cache.
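For example (an untested sketch; df is the DataFrame in question, spark an
existing SparkSession, and the path is illustrative):

import org.apache.spark.storage.StorageLevel

// Variant 1: keep the cache, but put it on disk instead of in memory
df.persist(StorageLevel.DISK_ONLY)

// Variant 2: write the DataFrame out as Parquet and read it back
df.write.mode("overwrite").parquet("/data/tmp/df_checkpoint")
val dfFromDisk = spark.read.parquet("/data/tmp/df_checkpoint")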
Best,
Michael
On Tue, Nov 20, 2018 at 9:14 AM Dipl.-In
Hi,
you can use this project in order to read Avro using Spark Structured
Streaming:
https://github.com/AbsaOSS/ABRiS
Spark 2.4 also has built-in support for Avro, so you can use the from_avro
function in Spark 2.4.
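With the built-in support it could look roughly like this (untested sketch;
in Spark 2.4 from_avro comes from the org.apache.spark.sql.avro package, and
the servers, topic, and schema below are illustrative):

import org.apache.spark.sql.avro._
import org.apache.spark.sql.functions.col

// Avro schema of the Kafka value column, as a JSON string
val jsonFormatSchema =
  """{"type":"record","name":"Event","fields":[{"name":"id","type":"string"}]}"""

val events = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092")
  .option("subscribe", "events")
  .load()
  .select(from_avro(col("value"), jsonFormatSchema).as("event"))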
Best,
Michael
On Sat, Nov 3, 2018 at 4:34 AM Divya Narayan wrote:
> Hi,
>
> I pr
If you configure too many Kafka partitions, you can run into memory issues.
This will increase the memory requirements of the Spark job a lot.
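If you cannot reduce the number of Kafka partitions themselves, one option
that at least bounds how much data a micro-batch pulls in is
maxOffsetsPerTrigger (a sketch; servers, topic, and the limit are illustrative):

val stream = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker:9092")
  .option("subscribe", "my-topic")
  // cap the total number of records read per trigger across all partitions
  .option("maxOffsetsPerTrigger", "100000")
  .load()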
Best,
Michael
On Wed, Nov 7, 2018 at 8:28 AM JF Chen wrote:
> I have a Spark Streaming application which reads data from kafka and save
> the transformatio
Hi Xavier,
Dremio looks really interesting and has a nice UI. I think the idea to
replace SSIS or similar tools with Dremio is not bad, but what about
complex scenarios with a lot of code and transformations?
Is it possible to use Dremio via an API and define your own transformations and
transforma
Hi everybody,
I wanted to test the CBO with histograms enabled.
In order to do this, I have enabled the property
spark.sql.statistics.histogram.enabled.
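Roughly, the setup looks like this (a sketch; the table and column names are
illustrative, and an existing SparkSession named spark is assumed):

spark.conf.set("spark.sql.cbo.enabled", "true")
spark.conf.set("spark.sql.statistics.histogram.enabled", "true")

// recompute column statistics so that histograms are written to the metastore
spark.sql("ANALYZE TABLE my_db.my_table COMPUTE STATISTICS FOR COLUMNS id, amount")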
In this test, Derby was used as the database for the Hive metastore.
The problem is that in some cases the values that are inserted into the table
TABLE_PARAMS e
ltiple directories on different disks. NOTE: In Spark 1.0 and later
> this will be overridden by SPARK_LOCAL_DIRS (Standalone, Mesos) or
> LOCAL_DIRS (YARN) environment variables set by the cluster manager.
>
> Regards,
> Gourav Sengupta
>
> On Mon, Mar 26, 201
you file a jira if this is a bug?
> Thanks!
>
> On Sat, Mar 24, 2018 at 1:23 AM, Michael Shtelma wrote:
>>
>> Hi Maropu,
>>
>> the problem seems to be in FilterEstimation.scala on lines 50 and 52:
>>
>> https://github.com/apache/spark/blob/master/sql/catal
guess you may have to set it through the hdfs
> core-site.xml file. The property you need to set is "hadoop.tmp.dir" which
> defaults to "/tmp/hadoop-${user.name}"
>
> Regards,
> Keith.
>
> http://keith-chapman.com
>
> On Mon, Mar 19, 2018 at 1:05 PM,
maropu
>
> On Fri, Mar 23, 2018 at 6:20 PM, Michael Shtelma wrote:
>>
>> Hi all,
>>
>> I am using Spark 2.3 with activated cost-based optimizer and a couple
>> of hive tables, that were analyzed previously.
>>
>> I am getting the following exceptio
Hi all,
I am using Spark 2.3 with the cost-based optimizer activated and a couple
of Hive tables that were analyzed previously.
I am getting the following exception for different queries:
java.lang.NumberFormatException
at java.math.BigDecimal.<init>(BigDecimal.java:494)
at java.math.BigDecimal.<init>(BigDec
h/appcache/application_1521110306769_0041/container_1521110306769_0041_01_04/tmp
The JVM is using the second -Djava.io.tmpdir parameter and writing
everything to the same directory as before.
Best,
Michael
Sincerely,
Michael Shtelma
On Mon, Mar 19, 2018 at 6:38 PM, Keith Chapman wrote:
> Can yo
Chapman wrote:
> Hi Michael,
>
> You could either set spark.local.dir through spark conf or java.io.tmpdir
> system property.
>
> Regards,
> Keith.
>
> http://keith-chapman.com
>
> On Mon, Mar 19, 2018 at 9:59 AM, Michael Shtelma wrote:
>>
>> Hi ev
Hi everybody,
I am running a Spark job on YARN, and my problem is that the blockmgr-*
folders are being created under
/tmp/hadoop-msh/nm-local-dir/usercache/msh/appcache/application_id/*
This folder can grow to a significant size and does not
really fit into the /tmp file system for one job,
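For reference, a sketch of the workaround suggested in this thread, pointing
spark.local.dir at larger disks (paths are illustrative; note that on YARN
this is overridden by the LOCAL_DIRS variable set by the cluster manager):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("local-dir-example")
  // comma-separated list of scratch directories on large disks
  .config("spark.local.dir", "/data1/spark-tmp,/data2/spark-tmp")
  .getOrCreate()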
Hi all,
I have a problem with the performance of the sparkSession.sql call. It
takes up to a couple of seconds for me right now. I have a lot of
generated temporary tables, which are registered within the session,
and also a lot of temporary DataFrames. Is it possible that the
analysis/resolve/an
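A minimal sketch of the scenario, timing a single sparkSession.sql call in a
session with many generated temporary views (counts and names are illustrative):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("sql-latency").getOrCreate()

// register a large number of generated temporary views
(1 to 1000).foreach { i =>
  spark.range(10).toDF("id").createOrReplaceTempView(s"tmp_table_$i")
}

// spark.sql analyzes/resolves the plan eagerly, so this measures that cost
val t0 = System.nanoTime()
val df = spark.sql("SELECT * FROM tmp_table_1")
println(s"spark.sql took ${(System.nanoTime() - t0) / 1e6} ms")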
> Pozdrawiam,
> Jacek Laskowski
>
> https://about.me/JacekLaskowski
> Mastering Spark SQL https://bit.ly/mastering-spark-sql
> Spark Structured Streaming https://bit.ly/spark-structured-streaming
> Mastering Kafka Streams https://bit.ly/mastering-kafka-streams
> Follow me at https://twitter.com/jaceklaskowski
>
> On Mon, Jan 15, 2018 a
Hi all,
I am trying to compile my UDF with the Janino compiler and then register
it in Spark and use it afterwards. Here is the code:
String s = " \n" +
    "public class MyUDF implements org.apache.spark.sql.api.java.UDF1 {\n" +
    "@Override\n" +
    "public St