Hello,
This JIRA (SPARK-16951) was already closed with the resolution "Won't Fix"
on 23/Feb/17.
But in TPC-H testing we hit a performance issue with Q16, which uses a NOT IN
subquery that gets translated into a broadcast nested loop join. This single
query takes almost half the total time of all 22 queries. For ex
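The broadcast nested loop join plan follows from SQL's three-valued NOT IN semantics. The sketch below is plain Python (not Spark code) showing how a NULL in the subquery result makes the predicate UNKNOWN rather than TRUE, which is why the planner cannot use a simple anti-join unless the subquery column is provably non-nullable.

```python
# Pure-Python sketch of SQL's three-valued NOT IN semantics.
# None stands in for SQL NULL; the result is True, False, or None (UNKNOWN).

def sql_not_in(value, subquery_values):
    """Evaluate `value NOT IN (subquery_values)` with SQL NULL semantics."""
    if value is None:
        return None                      # NULL NOT IN (...) is UNKNOWN
    if value in subquery_values:
        return False                     # a definite match: NOT IN is FALSE
    if None in subquery_values:
        return None                      # value might equal the unknown NULL
    return True                          # no match and no NULLs: TRUE

print(sql_not_in(1, [2, 3]))       # no NULLs in the subquery
print(sql_not_in(2, [2, None]))    # definite match
print(sql_not_in(1, [2, None]))    # UNKNOWN because of the NULL
```

When the subquery side is known to contain no NULLs, manually rewriting the NOT IN as a LEFT ANTI JOIN is a common workaround that avoids the nested loop plan.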
Hi All,
I have a tagging problem at hand where we currently use regular expressions
to tag records. Is there a recommended way to distribute the tagging? The data
is about 10 TB.
--
Regards,
Rishi Shah
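A minimal sketch of the per-record tagging logic, with hypothetical tag rules (the actual patterns are not in the thread). In Spark this function would typically be applied inside `mapPartitions` or a pandas UDF so the patterns are compiled once per task rather than once per record.

```python
import re

# Hypothetical tag rules: tag name -> compiled pattern (illustrative only).
RULES = {
    "email": re.compile(r"[\w.]+@[\w.]+"),
    "phone": re.compile(r"\b\d{3}-\d{4}\b"),
}

def tag_record(text):
    """Return the set of tag names whose pattern matches the record."""
    return {tag for tag, pat in RULES.items() if pat.search(text)}

print(tag_record("reach me at 555-1234"))
print(tag_record("bob@example.com"))
```

Compiling the patterns once and broadcasting them (or defining them at module level on the executors) is the main thing that keeps this efficient at the 10 TB scale.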
Hi Spark Users,
I want to parse XML arriving in the query columns and extract a value. I am
using *xpath_int*, which works as per my requirement, but when I embed it in
the Spark SQL query columns it fails.
select timesheet_profile_id,
*xpath_int(timesheet_profile_code, '(/timesheetprofile/wee
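For reference, a plain-Python analogue of what `xpath_int` does on one value, using the standard library. The element names below are guesses from the truncated query and purely illustrative; note that `ElementTree` paths are relative to the root element, unlike the absolute `/timesheetprofile/...` path Spark's `xpath_int` takes.

```python
import xml.etree.ElementTree as ET

def xpath_int(xml_str, path):
    """Mimic Spark SQL's xpath_int on a single string: evaluate the
    (root-relative) path and return the node's text as an int, else None."""
    node = ET.fromstring(xml_str).find(path)
    return int(node.text) if node is not None else None

# Hypothetical payload shaped like the truncated query suggests.
xml = "<timesheetprofile><week>5</week></timesheetprofile>"
print(xpath_int(xml, "week"))
```

Testing the expression on one literal value like this, outside the full query, is a quick way to tell whether the XPath itself or the surrounding SQL is what fails.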
Hi,
I am tracking states in my Spark streaming application with
MapGroupsWithStateFunction described here:
https://spark.apache.org/docs/2.4.0/api/java/org/apache/spark/sql/streaming/GroupState.html
What are the limiting factors on the number of states a job can track at
the same time? Is it memor
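As a toy analogue (plain Python, not the Spark API): `mapGroupsWithState` keeps one state object per key and evicts by timeout, so the number of trackable states is bounded mainly by executor memory for the state store. The sketch below models that shape; names and the timeout policy are illustrative.

```python
class StateTracker:
    """Toy analogue of per-key GroupState with processing-time timeout."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self.states = {}  # key -> (accumulated_state, last_update_time)

    def update(self, key, value, now):
        """Fold a new value into the key's state, refreshing its timestamp."""
        state, _ = self.states.get(key, (0, now))
        self.states[key] = (state + value, now)

    def evict_expired(self, now):
        """Drop states idle longer than the timeout; return evicted keys."""
        expired = [k for k, (_, t) in self.states.items()
                   if now - t > self.timeout_s]
        for k in expired:
            del self.states[k]
        return expired

tracker = StateTracker(timeout_s=10)
tracker.update("a", 1, now=0)
tracker.update("b", 2, now=5)
print(tracker.evict_expired(now=12))   # "a" has been idle for 12s > 10s
```

In the real API, setting a `GroupStateTimeout` and calling `state.remove()` plays the role of `evict_expired` and is the usual way to keep the state count bounded.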
Thanks Mich, Nilesh.
What also works is creating a schema object and providing it via .schema(X) in
the spark.read statement.
Thanks a lot.
On Sun, May 10, 2020 at 2:37 AM Nilesh Kuchekar
wrote:
> Hi Chetan,
>
> You can have a static parquet file created, and when you
> create a data
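A minimal sketch of the explicit-schema approach. Besides a StructType, Spark's `spark.read.schema(...)` also accepts a DDL-formatted string; the column names and types below are hypothetical.

```python
# Build a DDL schema string; Spark accepts this anywhere a StructType works,
# e.g. spark.read.schema(schema_ddl). Column names/types are illustrative.
columns = [("id", "INT"), ("name", "STRING"), ("created_at", "TIMESTAMP")]
schema_ddl = ", ".join(f"{name} {dtype}" for name, dtype in columns)

# df = spark.read.schema(schema_ddl).parquet(path)  # requires a SparkSession

print(schema_ddl)
```

Supplying the schema up front also skips Spark's schema-inference pass, which matters for large reads.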
Hello sir,
I'm currently working on a project where I would have to detect anomalies in
real-time streaming data, pushing data from Kafka into Apache Spark. I chose
to go with the streaming k-means clustering algorithm, but I couldn't find much
about it. Do you think it is a suitable algorithm to go with o
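For orientation, Spark's StreamingKMeans updates each cluster center per mini-batch with a decay-weighted average. The sketch below is a simplified one-dimensional, single-cluster version of that update rule (the real algorithm works on vectors across all clusters); anomaly detection then typically flags points whose distance to their nearest center exceeds a threshold.

```python
def update_centroid(center, weight, batch_points, decay=0.9):
    """One streaming k-means update for a single 1-D cluster.

    new_center = (weight*decay*center + sum(batch)) / (weight*decay + m)
    where m is the batch size; decay < 1 forgets old data over time.
    """
    m = len(batch_points)
    w = weight * decay
    new_weight = w + m
    new_center = (w * center + sum(batch_points)) / new_weight
    return new_center, new_weight

# With decay=1.0 (no forgetting) this is a plain running mean.
c, w = update_centroid(0.0, 1.0, [2.0, 2.0], decay=1.0)
print(c, w)
```

The decay factor is the key knob: 1.0 weights all history equally, while smaller values let the centers track drift in the stream.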
Please unsubscribe me.
Thanks,
Hi,
We are currently trying to replace Hive with the Spark Thrift Server.
We encountered a problem with the following SQL:
create table test_db.sink_test as select [some columns] from
test_db.test_source
After the SQL ran successfully, we queried the data from test_db.test_sink. The
data is gibberish.
Hi Jeff,
I increased the broadcast timeout, and am now facing a new error.
Caused by: java.lang.InterruptedException
at
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1039)
at
java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSh
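As a hedged aside on the broadcast tuning mentioned above (not a root-cause fix for the InterruptedException itself): the two configurations most often adjusted in this situation are the broadcast timeout and the auto-broadcast threshold. Values below are illustrative.

```
# Raise the broadcast timeout (seconds); the default is 300.
spark.sql.broadcastTimeout=600
# Or disable automatic broadcast joins entirely and fall back to sort-merge join.
spark.sql.autoBroadcastJoinThreshold=-1
```

Disabling auto-broadcast sidesteps the timeout entirely when the "small" side of the join is not actually small enough to broadcast.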