Re: ConcurrentModificationException using Kafka Direct Stream

2017-09-17 Thread HARSH TAKKAR
Hi, changing the Spark version is my last resort. Is there any other workaround for this problem? On Mon, Sep 18, 2017 at 11:43 AM pandees waran wrote: > All, May I know what exactly changed in 2.1.1 which solved this problem? > > Sent from my iPhone > > On Sep 17, 2017, at 11:08 PM, Anastasios Zou

Re: [SPARK-SQL] Does spark-sql have Authorization built in?

2017-09-17 Thread Arun Khetarpal
Ping. I did some digging around in the code base, and I see that this is not present currently. Just looking for an acknowledgement. Regards, Arun > On 15-Sep-2017, at 8:43 PM, Arun Khetarpal wrote: > > Hi - > > Wanted to understand if Spark SQL has GRANT and REVOKE statements available? > I

Re: ConcurrentModificationException using Kafka Direct Stream

2017-09-17 Thread pandees waran
All, May I know what exactly changed in 2.1.1 which solved this problem? Sent from my iPhone > On Sep 17, 2017, at 11:08 PM, Anastasios Zouzias wrote: > > Hi, > > I had a similar issue using 2.1.0 but not with Kafka. Updating to 2.1.1 > solved my issue. Can you try with 2.1.1 as well and repo

Re: ConcurrentModificationException using Kafka Direct Stream

2017-09-17 Thread Anastasios Zouzias
Hi, I had a similar issue using 2.1.0 but not with Kafka. Updating to 2.1.1 solved my issue. Can you try with 2.1.1 as well and report back? Best, Anastasios On 17.09.2017 at 16:48, "HARSH TAKKAR" wrote: Hi, I am using Spark 2.1.0 with Scala 2.11.8, and while iterating over the partitions of e

Re: ConcurrentModificationException using Kafka Direct Stream

2017-09-17 Thread kant kodali
You should paste some code. ConcurrentModificationException normally happens when you modify a list or any other non-thread-safe data structure while you are iterating over it. On Sun, Sep 17, 2017 at 10:25 PM, HARSH TAKKAR wrote: > Hi, > > No we are not creating any thread for kafka DStream > however
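To illustrate the point about modifying a collection while iterating over it, here is a minimal single-threaded Java sketch. It is not the code from this thread (none was posted), and the names `CmeDemo`, `unsafeRemove` and `safeRemove` are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.List;

// Hypothetical demo class; not taken from the application discussed in the thread.
class CmeDemo {

    // Removes even numbers through the list itself while a for-each loop is
    // iterating over it. Returns true if the iterator's fail-fast check threw.
    static boolean unsafeRemove(List<Integer> xs) {
        try {
            for (Integer x : xs) {
                if (x % 2 == 0) {
                    xs.remove(x); // structural change behind the iterator's back
                }
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    // Removes even numbers through the iterator, which keeps its own
    // bookkeeping consistent with the list.
    static List<Integer> safeRemove(List<Integer> xs) {
        Iterator<Integer> it = xs.iterator();
        while (it.hasNext()) {
            if (it.next() % 2 == 0) {
                it.remove();
            }
        }
        return xs;
    }

    public static void main(String[] args) {
        System.out.println(unsafeRemove(new ArrayList<>(Arrays.asList(1, 2, 3, 4)))); // true
        System.out.println(safeRemove(new ArrayList<>(Arrays.asList(1, 2, 3, 4))));   // [1, 3]
    }
}
```

The same fail-fast check can also fire when a second thread mutates the collection during iteration, which is why it matters whether the application creates threads of its own.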

Re: ConcurrentModificationException using Kafka Direct Stream

2017-09-17 Thread HARSH TAKKAR
Hi, no, we are not creating any threads for the Kafka DStream. However, we have a single thread for refreshing a resource cache on the driver, but that is totally separate from this connection. On Mon, Sep 18, 2017 at 12:29 AM kant kodali wrote: > Are you creating threads in your application? > > On Sun, Se

Spark 2.1.1 ml.LogisticRegression with large feature set causes "Kryo serialization failed: Buffer overflow"

2017-09-17 Thread haibo wu
I am trying to train a big model. I have 40 million instances and a sparse feature set of 50 million features. I am using 40 executors with 20 GB each plus a driver with 40 GB. The number of data partitions is 5000, the treeAggregate depth is 4, the spark.kryoserializer.buffer.max is 2016m, the spark.driver.maxR

Spark 2.1.1 Driver OOM when using interaction for large-scale sparse vectors

2017-09-17 Thread haibo wu
I'm working on large-scale logistic regression for CTR prediction, and when I use interaction for feature engineering, the driver runs out of memory (OOM). In detail, I interact userid (one-hot, 30w, i.e. 300,000, dimensions, sparse) with base features (60 dimensions, dense); driver memory is set to 40g. So I tried to debug remotely, a

Re: ConcurrentModificationException using Kafka Direct Stream

2017-09-17 Thread kant kodali
Are you creating threads in your application? On Sun, Sep 17, 2017 at 7:48 AM, HARSH TAKKAR wrote: > > Hi > > I am using Spark 2.1.0 with Scala 2.11.8, and while iterating over the > partitions of each RDD in a DStream formed using KafkaUtils, I am getting > the below exception, please suggest

ConcurrentModificationException using Kafka Direct Stream

2017-09-17 Thread HARSH TAKKAR
Hi, I am using Spark 2.1.0 with Scala 2.11.8, and while iterating over the partitions of each RDD in a DStream formed using KafkaUtils, I am getting the exception below. Please suggest a fix. I have the following Kafka config: enable.auto.commit: "true", auto.commit.interval.ms: "1000", session.timeo