I have two Java applications sharing the same Spark cluster; the applications
should be running on different servers.
Based on my experience, if the Spark driver (inside the Java application) connects
remotely to the Spark master (which is running on a different node), then the
response time to submit a
..._1#0, _2#1, _3#2] You can use "rollup" or "grouping sets" over multiple
dimensions to replace "union" or "union all". On Tue, Nov 20, 2018 at 8:34 PM
onmstester onmstester wrote: I'm using Spark SQL to query Cassandra tables. In
Cassandra, I've partitioned
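A minimal sketch of that suggestion, assuming a hypothetical source view named events with columns bucket, id and value (these names do not come from the thread):

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class GroupingSetsSketch {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder().appName("grouping-sets-sketch").getOrCreate();
        // One aggregation pass with GROUPING SETS instead of a UNION ALL per dimension.
        Dataset<Row> agg = spark.sql(
            "SELECT bucket, id, sum(value) AS total FROM events "
            + "GROUP BY bucket, id GROUPING SETS ((bucket), (id), (bucket, id))");
        agg.show();
    }
}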
I'm using Spark SQL to query Cassandra tables. In Cassandra, I've partitioned
my data by a time bucket and an id, so depending on the query I need to union
multiple partitions with Spark SQL and do the aggregation/group-by on the
union result, something like this:
for (all cassandra partitions) {
    Dataset
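For reference, a hedged sketch of the union-then-aggregate pattern described above, assuming the Cassandra data is already exposed as a view named mytable with columns bucket, id and value (the names and the aggregation are illustrative, not taken from the thread):

import java.util.List;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class UnionPartitionsSketch {
    // Reads each Cassandra partition with its own query, unions the results,
    // then runs the group-by/aggregation on the combined Dataset.
    static Dataset<Row> unionAndAggregate(SparkSession spark, List<Integer> buckets, int id) {
        Dataset<Row> union = null;
        for (Integer bucket : buckets) {
            Dataset<Row> part = spark.sql(
                "SELECT id, value FROM mytable WHERE bucket = " + bucket + " AND id = " + id);
            union = (union == null) ? part : union.union(part);
        }
        return union.groupBy("id").sum("value");
    }
}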
You could use two separate pools with different weights for ETL and REST jobs:
if the ETL pool weight is about 1 and the REST pool weight is 1000, then any time
a REST job comes in it gets essentially all of the resources. Details:
https://spark.apache.org/docs/latest/job-scheduling.html Sent using Zoho Mail
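A minimal sketch of that setup, assuming an allocation file at /opt/spark/conf/fairscheduler.xml that defines pools named "etl" (weight 1) and "rest" (weight 1000); the file path and pool names are only examples:

import org.apache.spark.SparkConf;
import org.apache.spark.sql.SparkSession;

public class FairPoolsSketch {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
            .setAppName("fair-pools-sketch")
            .set("spark.scheduler.mode", "FAIR")
            // XML file containing <pool name="etl"><weight>1</weight></pool>
            // and <pool name="rest"><weight>1000</weight></pool>
            .set("spark.scheduler.allocation.file", "/opt/spark/conf/fairscheduler.xml");
        SparkSession spark = SparkSession.builder().config(conf).getOrCreate();

        // Jobs submitted after this call run in the "rest" pool.
        spark.sparkContext().setLocalProperty("spark.scheduler.pool", "rest");
    }
}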
Refer to: https://spark.apache.org/docs/latest/quick-start.html 1. Create a
singleton SparkContext at application startup; the Spark context or Spark SQL
session would then be accessible through a static method anywhere in your
application. I recommend using fair scheduling on your context, to share
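A sketch of such a singleton holder; the class and method names are made up for illustration:

import org.apache.spark.sql.SparkSession;

public final class SparkHolder {
    private static volatile SparkSession session;

    private SparkHolder() {}

    // Lazily creates one shared SparkSession; callers anywhere in the
    // application use SparkHolder.get() instead of building their own.
    public static SparkSession get() {
        if (session == null) {
            synchronized (SparkHolder.class) {
                if (session == null) {
                    session = SparkSession.builder()
                        .appName("shared-session")
                        .config("spark.scheduler.mode", "FAIR")
                        .getOrCreate();
                }
            }
        }
        return session;
    }
}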
What about using cache() or saving the result as a global temp table for subsequent access?
Sent using Zoho Mail
Forwarded message From: Affan Syed To: "spark users" Date: Thu, 25 Oct 2018 10:58:43 +0330
Subject: Having access to spark results
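A brief sketch of the two options mentioned above, assuming an already-computed Dataset named result (the name is illustrative):

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class ReuseResultSketch {
    static void keepForLater(SparkSession spark, Dataset<Row> result) {
        // Option 1: keep the data in memory for later actions in this session.
        result.cache();

        // Option 2: register it as a global temp view, visible to other sessions
        // of the same application under the global_temp database.
        result.createOrReplaceGlobalTempView("result");
        spark.sql("SELECT * FROM global_temp.result").show();
    }
}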
...the steps to configure this? Thanks.
On Wed, Oct 17, 2018, 9:39 AM onmstester onmstester
wrote: Hi, I failed to configure Spark for in-memory shuffle, so currently I'm
just using a Linux memory-mapped directory (tmpfs) as Spark's working directory,
and everything is fast. Sent using Zoho Mail
On Wed, 17 Oct 2018 16:41:32 +0330, thomas lavocat wrote: Hi everyone,
The possibility to have in
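For reference, a sketch of pointing Spark's scratch space at a tmpfs mount, as described above; the mount point /mnt/spark-tmpfs is only an example and has to exist on every node:

import org.apache.spark.SparkConf;
import org.apache.spark.sql.SparkSession;

public class TmpfsLocalDirSketch {
    public static void main(String[] args) {
        // The directory is assumed to be a tmpfs mount created beforehand, e.g.
        //   mount -t tmpfs -o size=64g tmpfs /mnt/spark-tmpfs
        // In standalone mode, SPARK_LOCAL_DIRS set on the workers takes
        // precedence over this property if both are present.
        SparkConf conf = new SparkConf()
            .setAppName("tmpfs-local-dir-sketch")
            // Shuffle and spill files land here instead of on disk.
            .set("spark.local.dir", "/mnt/spark-tmpfs");
        SparkSession spark = SparkSession.builder().config(conf).getOrCreate();
        spark.stop();
    }
}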
I'm loading some JSON files in a loop, deserializing them into a list of objects,
creating a temp table from the list, and running a select on that table (repeated
for every file):
for (String jsonFile : allJsonFiles) {
    sqlContext.sql("select * from mainTable")
        .filter(...)  // the filter expression is cut off in the original message
        .createOrReplaceTempView("table1");
}
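A hedged sketch of the flow described above, assuming each file is read with spark.read().json() and registered as a temp view before being queried (file paths and view names are illustrative):

import java.util.Arrays;
import java.util.List;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class JsonLoopSketch {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder().appName("json-loop-sketch").getOrCreate();
        List<String> allJsonFiles = Arrays.asList("/data/a.json", "/data/b.json");

        for (String jsonFile : allJsonFiles) {
            // Read one file, expose it as a temp view, then query it.
            Dataset<Row> rows = spark.read().json(jsonFile);
            rows.createOrReplaceTempView("mainTable");
            spark.sql("SELECT * FROM mainTable").show();
        }
    }
}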
I have a Spark cluster containing 3 nodes and my application is a jar file
run by java -jar.
How can I set driver.memory for my application?
spark-defaults.conf is only read by ./spark-submit.
"java --driver-memory -jar" fails with an exception.
Sent using Zoho Mail
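For what it's worth, a hedged sketch: when the application is started with plain java -jar, the driver is that JVM itself, so its heap comes from the usual JVM flags rather than from --driver-memory (a spark-submit option); the 4g figure below is only an example.

// Launch command (heap size is illustrative):
//   java -Xmx4g -jar myapp.jar
// Executor memory, by contrast, can still be set from code before the context
// is created, because executors run in separate JVMs.

import org.apache.spark.SparkConf;
import org.apache.spark.sql.SparkSession;

public class DriverMemorySketch {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
            .setAppName("driver-memory-sketch")
            .set("spark.executor.memory", "4g");
        SparkSession spark = SparkSession.builder().config(conf).getOrCreate();
        spark.stop();
    }
}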
How can I enable JMX for the Spark worker/executor/driver in standalone mode?
I have added these:
spark.driver.extraJavaOptions=-Dcom.sun.management.jmxremote \
-Dcom.sun.management.jmxremote.port=9178 \
-Dcom.sun.management.jmxremote.authenticate=false \
-Dcom.sun.management.jmxremote.ssl=false
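The lines above cover the driver; a sketch of the matching executor side, with the port left at 0 so several executors on one host don't collide (the port-0 choice is an assumption, not from the original message):

import org.apache.spark.SparkConf;
import org.apache.spark.sql.SparkSession;

public class ExecutorJmxSketch {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
            .setAppName("executor-jmx-sketch")
            // Port 0 lets each executor JVM pick a free JMX port.
            .set("spark.executor.extraJavaOptions",
                 "-Dcom.sun.management.jmxremote"
                 + " -Dcom.sun.management.jmxremote.port=0"
                 + " -Dcom.sun.management.jmxremote.authenticate=false"
                 + " -Dcom.sun.management.jmxremote.ssl=false");
        SparkSession spark = SparkSession.builder().config(conf).getOrCreate();
        spark.stop();
    }
}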
Hi,
I'm using Spark on top of Cassandra as the CRUD backend of a RESTful application.
Most of the REST APIs retrieve a huge amount of data from Cassandra and do a lot
of aggregation on it in Spark, which takes a few seconds.
Problem: sometimes the output result would be a big list, which make
I'm reading from this table in Cassandra:
Table mytable (
  Integer Key,
  Integer X,
  Integer Y
)
Using:
sparkSqlContext.sql("select * from mytable where key = 1 and (X,Y) in ((1,2),(3,4))")
Encountered error:
StructType(StructField(X,IntegerType,true), StructField(Y,IntegerType,true))
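A hedged sketch of one way to express the same predicate when the tuple-style IN runs into that struct mismatch: expand it into OR'ed equality pairs (the view name mytable follows the query above; everything else is illustrative):

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class TupleInWorkaroundSketch {
    static Dataset<Row> query(SparkSession spark) {
        // Same meaning as (X, Y) IN ((1,2),(3,4)), written as explicit pairs.
        return spark.sql(
            "SELECT * FROM mytable WHERE key = 1 AND "
            + "((X = 1 AND Y = 2) OR (X = 3 AND Y = 4))");
    }
}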
I could not find how to pass a list to the isin() filter in Java; something like
this can be done in Scala:
val ids = Array(1,2)
df.filter(df("id").isin(ids:_*)).show
But in Java, everything that converts a Java List to a Scala Seq fails with an
"unsupported literal type" exception:
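One workaround that is often suggested relies on Column.isin taking Object varargs, so that toArray() spreads the list into individual values (the column name and ids are illustrative):

import static org.apache.spark.sql.functions.col;

import java.util.Arrays;
import java.util.List;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

public class IsinFromListSketch {
    static Dataset<Row> filterByIds(Dataset<Row> df) {
        List<Integer> ids = Arrays.asList(1, 2);
        // toArray() yields an Object[], which Java expands into the varargs of
        // isin(Object... values), so each id becomes its own literal.
        return df.filter(col("id").isin(ids.toArray()));
    }
}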
Hi,
I need to run some queries on a huge number of input records. The input rate is
100K records/second.
A record looks like (key1, key2, value) and the application should report occurrences
of key1 == something and key2 == somethingElse.
The problem is there are too many filters in my query: more than
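A hedged sketch of one way to avoid a long list of separate filters: count occurrences per (key1, key2) once, then join against a small DataFrame of the combinations of interest (all names and values below are made up):

import java.util.Arrays;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class CountPairsSketch {
    static Dataset<Row> countWanted(SparkSession spark, Dataset<Row> records) {
        // One aggregation over all (key1, key2) pairs instead of one filter per pair.
        Dataset<Row> counts = records.groupBy("key1", "key2").count();

        // Small DataFrame listing the combinations to report on.
        StructType schema = DataTypes.createStructType(Arrays.asList(
            DataTypes.createStructField("k1", DataTypes.StringType, false),
            DataTypes.createStructField("k2", DataTypes.StringType, false)));
        Dataset<Row> wanted = spark.createDataFrame(
            Arrays.asList(RowFactory.create("a", "x"), RowFactory.create("b", "y")), schema);

        return counts.join(wanted,
            counts.col("key1").equalTo(wanted.col("k1"))
                .and(counts.col("key2").equalTo(wanted.col("k2"))));
    }
}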