Caching table partition after join

2016-06-05 Thread Zalzberg, Idan (Agoda)
Hi, I have a complicated scenario where I can't seem to explain to Spark how to handle the query in the best way. I am using Spark from the thrift server, so SQL only. To explain the scenario, let's assume: Table A: Key: String, Value: String. Table B: Key: String, Value2: String, Part: String
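A minimal sketch of one way to express this, assuming Spark 1.6 with an existing SparkContext sc and the tables a(key, value) and b(key, value2, part) from the description; the cached table name a_join_b_p1 and partition value 'p1' are illustrative. CACHE TABLE ... AS SELECT is plain Spark SQL, so the same statement can be issued through the thrift server with no Scala at all:

    import org.apache.spark.sql.hive.HiveContext

    val sqlContext = new HiveContext(sc)

    // Materialize and cache only the joined slice of one partition of b.
    sqlContext.sql("""
      CACHE TABLE a_join_b_p1 AS
      SELECT a.key, a.value, b.value2
      FROM a JOIN b ON a.key = b.key
      WHERE b.part = 'p1'
    """)

    // Later queries hit the cached result instead of re-running the join.
    sqlContext.sql("SELECT COUNT(*) FROM a_join_b_p1").show()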

Re: "bootstrapping" DStream state

2016-03-10 Thread Zalzberg, Idan (Agoda)
l, event) }).updateStateByKey[Long](PrintEventCountsByInterval.counter _, new HashPartitioner(3), initialRDD = initialRDD) counts.print() HTH. -Todd On Thu, Mar 10, 2016 at 1:35 AM, Zalzberg, Idan (Agoda) <idan.zalzb...@agoda.com> wrote: Hi, I have a spark-st
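A minimal sketch reconstructing the pattern in this reply: the three-argument updateStateByKey overload accepts an initial RDD that seeds the state before the first batch. It assumes an existing SparkContext sc; the counter function, checkpoint path, and socket source are illustrative stand-ins for the thread's PrintEventCountsByInterval example:

    import org.apache.spark.HashPartitioner
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    def counter(events: Seq[Long], state: Option[Long]): Option[Long] =
      Some(state.getOrElse(0L) + events.sum)

    val ssc = new StreamingContext(sc, Seconds(10))
    ssc.checkpoint("/tmp/checkpoint") // updateStateByKey requires checkpointing

    // Bootstrap state, e.g. counts recovered from an external store.
    val initialRDD = sc.parallelize(Seq(("interval-1", 5L), ("interval-2", 3L)))

    val events = ssc.socketTextStream("localhost", 9999).map(line => (line, 1L))

    val counts = events.updateStateByKey[Long](
      counter _, new HashPartitioner(3), initialRDD = initialRDD)
    counts.print()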

"bootstrapping" DStream state

2016-03-09 Thread Zalzberg, Idan (Agoda)
Hi, I have a spark-streaming application that basically keeps track of a string->string dictionary. So I have messages coming in with updates, like "A"->"B", and I need to update the dictionary. This seems like a simple use case for the updateStateByKey method. However, my issue is that when
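A minimal sketch of the dictionary case as described, assuming the updates arrive as a DStream[(String, String)]; the names latest and dictionary are illustrative. Each batch keeps the newest value per key, falling back to the existing state:

    import org.apache.spark.streaming.dstream.DStream

    // Take the last update seen in this batch, otherwise keep the current value.
    def latest(updates: Seq[String], current: Option[String]): Option[String] =
      updates.lastOption.orElse(current)

    def dictionary(updates: DStream[(String, String)]): DStream[(String, String)] =
      updates.updateStateByKey(latest _)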

RE: FlatMap Explanation

2015-09-03 Thread Zalzberg, Idan (Agoda)
Hi, yes, I can explain: 1 to 3 -> 1,2,3; 2 to 3 -> 2,3; 3 to 3 -> 3; 3 to 3 -> 3. FlatMap then concatenates the results, so you get 1,2,3, 2,3, 3,3. You should get the same with any Scala collection. Cheers From: Ashish Soni [mailto:asoni.le...@gmail.com] Sent: Thursday, September 03, 2015 9:06 AM
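The same mapping as a runnable snippet, assuming the input collection was Seq(1, 2, 3, 3), which the per-element expansions above suggest:

    // 1 -> 1,2,3   2 -> 2,3   3 -> 3   3 -> 3
    val result = Seq(1, 2, 3, 3).flatMap(n => n to 3)
    println(result) // List(1, 2, 3, 2, 3, 3, 3)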

Setting different amount of cache memory for driver

2015-07-16 Thread Zalzberg, Idan (Agoda)
Hi, I am using the Spark thrift server. In my deployment, I need to have more memory for the driver, to be able to get results back from the executors. Currently a lot of the driver memory is spent on caching, but I would prefer that the driver not use memory for that (only the executors). Is

Using different users with spark thriftserver

2015-07-08 Thread Zalzberg, Idan (Agoda)
Hi, We are using spark thrift server as a hive replacement. One of the things we have with hive, is that different users can connect with their own usernames/passwords and get appropriate permissions. So on the same server, one user may have a query that will have permissions to run, while the

RE: unsafe memory access in spark 1.2.1

2015-03-01 Thread Zalzberg, Idan (Agoda)
Thanks, We monitor disk space so I doubt that is it, but I will check again From: Ted Yu [mailto:yuzhih...@gmail.com] Sent: Sunday, March 01, 2015 11:45 PM To: Zalzberg, Idan (Agoda) Cc: user@spark.apache.org Subject: Re: unsafe memory access in spark 1.2.1 Google led me to: https

unsafe memory access in spark 1.2.1

2015-03-01 Thread Zalzberg, Idan (Agoda)
Hi, I am using Spark 1.2.1 and sporadically get these errors: Any thoughts on what could be the cause? Thanks 2015-02-27 15:08:47 ERROR SparkUncaughtExceptionHandler:96 - Uncaught exception in thread Thread[Executor task launch worker-25,5,main] java.lang.InternalError: a fault occurred

RE: unsafe memory access in spark 1.2.1

2015-03-01 Thread Zalzberg, Idan (Agoda)
My runtime version is: java version "1.7.0_75" OpenJDK Runtime Environment (rhel-2.5.4.0.el6_6-x86_64 u75-b13) OpenJDK 64-Bit Server VM (build 24.75-b04, mixed mode) Thanks From: Ted Yu [mailto:yuzhih...@gmail.com] Sent: Sunday, March 01, 2015 10:18 PM To: Zalzberg, Idan (Agoda) Cc: user

Problem when using spark.kryo.registrationRequired=true

2015-02-06 Thread Zalzberg, Idan (Agoda)
Hi, I am trying to restrict my serialized classes, as I am having weird serialization issues. However, my efforts hit a brick wall when I got the exception: Caused by: java.lang.IllegalArgumentException: Class is not registered: scala.reflect.ClassTag$$anon$1 Note: To register
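A minimal sketch of registering classes explicitly, assuming Kryo on Spark 1.2; MyRecord is a hypothetical application class, and the anonymous ClassTag subclass named in the exception has no class literal, so it is looked up by name:

    import org.apache.spark.SparkConf

    case class MyRecord(key: String, value: String) // hypothetical application class

    val conf = new SparkConf()
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .set("spark.kryo.registrationRequired", "true")
      .registerKryoClasses(Array[Class[_]](
        classOf[MyRecord],
        Class.forName("scala.reflect.ClassTag$$anon$1")
      ))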

RE: Backporting spark 1.1.0 to CDH 5.1.3

2014-11-13 Thread Zalzberg, Idan (Agoda)
...@cloudera.com] Sent: Tuesday, November 11, 2014 2:52 AM To: Zalzberg, Idan (Agoda) Cc: user@spark.apache.org Subject: Re: Backporting spark 1.1.0 to CDH 5.1.3 Hello, CDH 5.1.3 ships with a version of Hive that's not entirely the same as the Hive Spark 1.1 supports. So when building your custom Spark

Backporting spark 1.1.0 to CDH 5.1.3

2014-11-10 Thread Zalzberg, Idan (Agoda)
Hello, I have a big cluster running CDH 5.1.3 which I can't upgrade to 5.2.0 at this time. I would like to run Spark-on-YARN on that cluster. I tried to compile Spark against CDH 5.1.3 and got HDFS to work, but I am having problems with the connection to Hive: java.sql.SQLException: Could

Re: Exception with SparkSql and Avro

2014-09-23 Thread Zalzberg, Idan (Agoda)
/pull/2475 On Mon, Sep 22, 2014 at 10:07 PM, Zalzberg, Idan (Agoda) <idan.zalzb...@agoda.com> wrote: Hello, I am trying to read a Hive table that is stored in Avro DEFLATE files, something simple like “SELECT * FROM X LIMIT 10”. I get 2 exceptions in the logs: 2014-09-23
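A minimal sketch of the failing scenario, assuming a Spark 1.1 HiveContext over an existing SparkContext sc; the table name X is from the thread:

    import org.apache.spark.sql.hive.HiveContext

    val hiveContext = new HiveContext(sc)
    hiveContext.sql("SELECT * FROM X LIMIT 10").collect().foreach(println)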