How to start two Workers connected to two different masters

2019-02-27 Thread onmstester onmstester
I have two Java applications sharing the same Spark cluster; the applications should be running on different servers. Based on my experience, if the Spark driver (inside the Java application) connects remotely to the Spark master (which is running on a different node), then the response time to submit a
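A minimal sketch of the setup described: a driver inside a Java application attaching to a remote standalone master. The host/port and app name are assumptions, not values from the thread:

    import org.apache.spark.sql.SparkSession;

    public class RemoteDriver {
        public static void main(String[] args) {
            // The driver runs here, on this server; only the master URL
            // points at the remote node (assumed spark://master-host:7077).
            SparkSession spark = SparkSession.builder()
                    .appName("remote-driver-app")
                    .master("spark://master-host:7077")
                    .getOrCreate();
            spark.stop();
        }
    }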

Fwd: Re: spark-sql force parallel union

2018-11-20 Thread onmstester onmstester
You can use "rollup" or "grouping sets" for multiple dimensions to replace "union" or "union all". On Tue, Nov 20, 2018 at 8:34 PM onmstester onmstester wrote: I'm using Spark-Sql to query Cassandra tables. In Cassandra, I've partitioned
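A hedged sketch of the suggestion: one GROUPING SETS query computes several aggregation levels in a single scan, instead of running separate GROUP BY queries and unioning them. The table and column names (events, bucket, id, value) are assumptions:

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class GroupingSetsExample {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder().appName("grouping-sets").getOrCreate();
            spark.read().json("events.json").createOrReplaceTempView("events");
            // One scan computes both aggregation levels, instead of
            // running two GROUP BY queries and unioning their results.
            Dataset<Row> agg = spark.sql(
                "SELECT bucket, id, sum(value) AS total FROM events " +
                "GROUP BY bucket, id GROUPING SETS ((bucket), (bucket, id))");
            agg.show();
            spark.stop();
        }
    }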

spark-sql force parallel union

2018-11-20 Thread onmstester onmstester
I'm using Spark-Sql to query Cassandra tables. In Cassandra, I've partitioned my data by time bucket and an id, so based on the queries I need to union multiple partitions with spark-sql and do the aggregations/group-by on the union result, something like this (see the sketch below): for(all cassandra partitions){ DataSet
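A sketch of that loop in the Java API; mytable, the bucket column, and the bucket list are assumptions standing in for the actual Cassandra partitions:

    import java.util.Arrays;
    import java.util.List;
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class UnionPartitions {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder().appName("union-partitions").getOrCreate();
            List<String> buckets = Arrays.asList("2018-11-19", "2018-11-20");
            Dataset<Row> result = null;
            for (String bucket : buckets) {
                // One Dataset per Cassandra partition (time bucket + id).
                Dataset<Row> part = spark.sql(
                    "SELECT * FROM mytable WHERE bucket = '" + bucket + "'");
                result = (result == null) ? part : result.union(part);
            }
            // union() is lazy; the group-by below executes over all
            // unioned partitions in one job.
            result.groupBy("id").count().show();
            spark.stop();
        }
    }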

Fwd: How to avoid long-running jobs blocking short-running jobs

2018-11-03 Thread onmstester onmstester
You could use two separate pools with different weights for ETL and REST jobs: with the ETL pool weight at 1 and the REST pool weight at 1000, any time a REST job comes in it is allocated nearly all the resources. Details: https://spark.apache.org/docs/latest/job-scheduling.html
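A sketch of wiring this up; the pool names, weights, and file path are assumptions following the advice above:

    import org.apache.spark.sql.SparkSession;

    public class FairPools {
        public static void main(String[] args) {
            // Assumed fairscheduler.xml on the driver:
            //   <allocations>
            //     <pool name="etl"><weight>1</weight></pool>
            //     <pool name="rest"><weight>1000</weight></pool>
            //   </allocations>
            SparkSession spark = SparkSession.builder()
                    .appName("shared-cluster-app")
                    .config("spark.scheduler.mode", "FAIR")
                    .config("spark.scheduler.allocation.file", "/path/to/fairscheduler.xml")
                    .getOrCreate();
            // Each thread tags its jobs with a pool before submitting them:
            spark.sparkContext().setLocalProperty("spark.scheduler.pool", "rest");
        }
    }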

Fwd: use spark cluster in java web service

2018-11-01 Thread onmstester onmstester
Refer to: https://spark.apache.org/docs/latest/quick-start.html 1. Create a singleton SparkContext at initialization of your cluster; the spark-context or spark-sql would then be accessible through a static method anywhere in your application. I recommend using fair scheduling on your context, to share
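A sketch of such a singleton holder (class name, master URL, and app name are assumptions): one SparkSession created at startup and shared by all request threads.

    import org.apache.spark.sql.SparkSession;

    public final class SparkHolder {
        private static volatile SparkSession session;

        // Accessible from anywhere in the web application.
        public static SparkSession get() {
            if (session == null) {
                synchronized (SparkHolder.class) {
                    if (session == null) {
                        session = SparkSession.builder()
                                .appName("web-service")
                                .master("spark://master-host:7077") // assumption
                                .config("spark.scheduler.mode", "FAIR")
                                .getOrCreate();
                    }
                }
            }
            return session;
        }

        private SparkHolder() {}
    }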

Fwd: Having access to spark results

2018-10-25 Thread onmstester onmstester
What about using cache() or saving as a global temp table for subsequent access? Forwarded message From: Affan Syed To: "spark users" Date: Thu, 25 Oct 2018 10:58:43 +0330 Subject: Having access to spark results
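A short sketch of both options; the stand-in Dataset below replaces whatever result the earlier job actually produced:

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class KeepResults {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder().appName("keep-results").getOrCreate();
            Dataset<Row> results = spark.range(100).toDF("id"); // stand-in for the real result
            results.cache();                                    // keep blocks in executor memory
            results.createOrReplaceGlobalTempView("results");   // global temp table
            // Global temp views live in the reserved global_temp database and
            // remain visible to other SparkSessions until the application stops:
            spark.sql("SELECT * FROM global_temp.results").show();
        }
    }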

Re: Spark In Memory Shuffle

2018-10-18 Thread onmstester onmstester
...the steps to configure this? Thanks. On Wed, Oct 17, 2018, 9:39 AM onmstester onmstester wrote: Hi, I failed to configure Spark for in-memory shuffle, so currently I'm just using a Linux memory-mapped directory (tmpfs) as Spark's working directory, so everything is fast.

Re: Spark In Memory Shuffle

2018-10-17 Thread onmstester onmstester
Hi, I failed to configure Spark for in-memory shuffle, so currently I'm just using a Linux memory-mapped directory (tmpfs) as Spark's working directory, and everything is fast (see the config sketch below). On Wed, 17 Oct 2018 16:41:32 +0330 thomas lavocat wrote: Hi everyone, The possibility to have in
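A sketch of the tmpfs approach, assuming a mount at /mnt/spark-tmpfs and a 64g size; note that on a standalone cluster, a worker's SPARK_LOCAL_DIRS takes precedence over spark.local.dir:

    # Mount a RAM-backed filesystem (mount point and size are assumptions):
    #   mount -t tmpfs -o size=64g tmpfs /mnt/spark-tmpfs
    # spark-defaults.conf: put shuffle/spill scratch files on it
    spark.local.dir  /mnt/spark-tmpfs
    # or, in conf/spark-env.sh on each worker:
    #   SPARK_LOCAL_DIRS=/mnt/spark-tmpfs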

createOrReplaceTempView causes memory leak

2018-06-21 Thread onmstester onmstester
I'm loading some JSON files in a loop, deserializing them into a list of objects, creating a temp table from the list, and running a select on the table (repeating this for every file): for(jsonFile : allJsonFiles){ sqlcontext.sql("select * from mainTable").filter(...).createOrReplaceTempView("table1");
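A sketch of the loop with explicit cleanup between iterations, which is one common mitigation; dropTempView and unpersist are real API calls, but whether they fully resolve the reported leak is not confirmed in the thread. The file list is an assumption:

    import java.util.Arrays;
    import java.util.List;
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class TempViewLoop {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder().appName("tempview-loop").getOrCreate();
            List<String> allJsonFiles = Arrays.asList("a.json", "b.json");
            for (String jsonFile : allJsonFiles) {
                Dataset<Row> df = spark.read().json(jsonFile);
                df.createOrReplaceTempView("table1");
                spark.sql("SELECT * FROM table1").show();
                spark.catalog().dropTempView("table1"); // release the catalog reference
                df.unpersist();                         // release any cached blocks
            }
            spark.stop();
        }
    }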

How to set spark.driver.memory?

2018-06-19 Thread onmstester onmstester
I have a Spark cluster containing 3 nodes and my application is a jar file run with java -jar . How can I set driver.memory for my application? spark-defaults.conf would only be read by ./spark-submit; "java --driver-memory -jar " fails with an exception.
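When the driver is started directly with java -jar (client mode, no spark-submit), the driver JVM is your own java process, so its heap is set with the standard JVM flag; spark.driver.memory only takes effect when spark-submit launches the JVM. A sketch, with the jar name and heap size as assumptions:

    # Size the driver JVM heap directly:
    java -Xmx4g -jar myapp.jar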

enable jmx in standalone mode

2018-06-19 Thread onmstester onmstester
How do I enable JMX for the Spark worker/executor/driver in standalone mode? I have added these: spark.driver.extraJavaOptions=-Dcom.sun.management.jmxremote \ -Dcom.sun.management.jmxremote.port=9178 \ -Dcom.sun.management.jmxremote.authenticate=false \ -Dcom.sun.management.jmxremote.ssl=false
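A hedged spark-defaults.conf sketch: the driver keeps the port from the thread, while executors use port 0 so several executor JVMs on one host don't collide on a fixed JMX port:

    spark.driver.extraJavaOptions    -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=9178 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false
    spark.executor.extraJavaOptions  -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=0 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false
    # Worker JVMs are configured separately, e.g. via SPARK_WORKER_OPTS in spark-env.sh.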

spark optimized pagination

2018-06-09 Thread onmstester onmstester
Hi, I'm using Spark on top of Cassandra as the backend CRUD store of a RESTful application. Most of the REST APIs retrieve a huge amount of data from Cassandra and do a lot of aggregation on it in Spark, which takes some seconds. Problem: sometimes the output result would be a big list which makes
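One way to serve pages without recomputing the aggregation each request: number the aggregated rows once, cache them, and slice per page. A sketch; the aggregated Dataset, the ordering column, and the page size are assumptions:

    import static org.apache.spark.sql.functions.*;
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.expressions.Window;

    // aggregated: the Dataset produced by the Spark aggregation.
    // Note: a window without partitionBy funnels rows through one partition,
    // tolerable here only because the aggregated result is already reduced.
    Dataset<Row> numbered = aggregated.withColumn(
            "rn", row_number().over(Window.orderBy(col("id"))));
    numbered.cache(); // compute once, serve many page requests

    int page = 3, size = 100;
    Dataset<Row> pageRows = numbered.where(
            col("rn").gt(page * size).and(col("rn").leq((page + 1) * size)));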

spark sql in-clause problem

2018-05-22 Thread onmstester onmstester
I'm reading from this table in Cassandra: Table mytable ( Integer Key, Integer X, Integer Y ) Using: sparkSqlContext.sql("select * from mytable where key = 1 and (X,Y) in ((1,2),(3,4))") Encountered error: StructType(StructField(X,IntegerType,true), StructField(Y,IntegerType,true))
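A workaround sketch while the tuple IN form is rejected: expand it into explicit OR conditions. Names follow the thread; sparkSqlContext is assumed to be the session/context used above:

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;

    Dataset<Row> rows = sparkSqlContext.sql(
        "SELECT * FROM mytable WHERE key = 1 AND " +
        "((X = 1 AND Y = 2) OR (X = 3 AND Y = 4))");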

Scala's Seq:_* equivalent in Java

2018-05-15 Thread onmstester onmstester
I could not find how to pass a list to the isin() filter in Java; something like this can be done in Scala: val ids = Array(1,2) df.filter(df("id").isin(ids:_*)).show But in Java, everything that converts a Java List to a Scala Seq fails with an unsupported literal type exception:
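In the Java API, Column.isin takes Object varargs, so no Scala Seq conversion is needed: pass the values directly, or a collection's toArray(). A sketch, with df assumed to have an id column:

    import java.util.Arrays;
    import java.util.List;
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;

    List<Integer> ids = Arrays.asList(1, 2);
    // isin(Object... list): a list's toArray() matches the varargs directly.
    Dataset<Row> filtered = df.filter(df.col("id").isin(ids.toArray()));
    filtered.show();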

spark sql StackOverflow

2018-05-15 Thread onmstester onmstester
Hi, I need to run some queries on a huge amount of input records. The input rate is 100K records/second. A record is like (key1,key2,value) and the application should report occurrences of key1 == something && key2 == somethingElse. The problem is there are too many filters in my query: more than
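Rather than one filter per (key1, key2) combination, a single group-by can count every combination in one pass and the pairs of interest can be looked up afterwards. A sketch, with records standing in for the input Dataset:

    import static org.apache.spark.sql.functions.*;
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;

    // records: assumed Dataset with columns key1, key2, value.
    Dataset<Row> counts = records.groupBy(col("key1"), col("key2")).count();
    // Look up the combinations of interest from the counted result:
    counts.where(col("key1").equalTo("something")
            .and(col("key2").equalTo("somethingElse"))).show();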