Spark sql creating managed table with location converts it to external table

2018-06-22 Thread Nirav Patel
http://www.gatorsmile.io/table-types-in-spark-external-or-managed/ "We do not allow users to create a MANAGED table with the users supplied LOCATION." Is this supposed to be resolved in 2.2? Thanks
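For reference, a minimal sketch of the behavior being reported, with a made-up table name and path (in Spark 2.x, supplying LOCATION marks the table EXTERNAL even without the EXTERNAL keyword):

    // create a table with an explicit LOCATION (path and name are hypothetical)
    spark.sql("CREATE TABLE demo_t (id INT) USING parquet LOCATION '/tmp/demo_t'")
    // the Type row in the output shows EXTERNAL rather than MANAGED
    spark.sql("DESCRIBE FORMATTED demo_t").show(100, truncate = false)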

Re: Increase no of tasks

2018-06-22 Thread pratik4891
It's the default; I haven't changed that. Is there any specific way I can know that number? Thanks
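A quick way to check, assuming the data sits in a DataFrame named df (a placeholder):

    // number of partitions backing the DataFrame (this drives the task count)
    println(df.rdd.getNumPartitions)
    // default parallelism used for SQL shuffles
    println(spark.conf.get("spark.sql.shuffle.partitions"))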

Re: Increase no of tasks

2018-06-22 Thread Apostolos N. Papadopoulos
How many partitions do you have in your data? On 22/06/2018 09:46 PM, pratik4891 wrote: Hi Gurus, I am running a spark job and in one stage it's creating 9 tasks. So even if I have 25 executors, only 9 are getting utilized. The other executors go to dead status. How can I increase the

Increase no of tasks

2018-06-22 Thread pratik4891
Hi Gurus, I am running a spark job and in one stage it's creating 9 tasks. So even if I have 25 executors, only 9 are getting utilized. The other executors go to dead status. How can I increase the no. of tasks so all my executors can be utilized? Any help/guidance is appreciated :)
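A hedged sketch of the usual fix: give the stage at least as many partitions as there are executor cores, either by repartitioning or by raising the shuffle parallelism (the DataFrame name and the count of 100 are placeholders):

    // spread the data over more partitions so every executor receives tasks
    val wider = df.repartition(100)
    // or, for shuffle stages in Spark SQL:
    spark.conf.set("spark.sql.shuffle.partitions", "100")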

Re: [Spark Structured Streaming] Measure metrics from CsvSink for Rate source

2018-06-22 Thread Dhruv Kumar
I actually tried the File source (reading CSV files as a stream and processing them). The File source seems to generate valid numbers in the metrics log files. I may be wrong, but it seems like an issue with how the Rate source generates metrics in the metrics log files.
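For context, a minimal Rate-source query of the kind being measured (the rows-per-second value is arbitrary):

    // generate a synthetic stream at a fixed rate and echo it to the console
    val rate = spark.readStream
      .format("rate")
      .option("rowsPerSecond", 10)
      .load()
    val query = rate.writeStream
      .format("console")
      .start()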

Dataframe to automatically create Impala table when writing to Impala

2018-06-22 Thread Spico Florin
Hello! I would like to know if there is any feature of the Spark DataFrame API that, when writing data to an Impala table, also creates that table if it was not previously created in Impala. For example, the code: myDataframe.write.mode(SaveMode.Overwrite).jdbc(jdbcURL, "books",
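A hedged sketch of the completed call: with SaveMode.Overwrite the generic JDBC writer drops and recreates the target table itself, so no prior CREATE TABLE is needed, although whether Impala accepts the generated DDL is untested here; the driver class and connection details are assumptions:

    import java.util.Properties
    val props = new Properties()
    // assumed Cloudera Impala JDBC driver class; adjust to whatever is installed
    props.setProperty("driver", "com.cloudera.impala.jdbc41.Driver")
    myDataframe.write.mode(SaveMode.Overwrite).jdbc(jdbcURL, "books", props)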

Re: RepartitionByKey Behavior

2018-06-22 Thread Nathan Kronenfeld
On Thu, Jun 21, 2018 at 4:51 PM, Chawla, Sumit wrote: Hi, I have been trying this simple operation. I want to land all values with one key in the same partition, and not have any different key in the same partition. Is this possible? I am getting b and c

Kafka streaming maxOffsetsPerTrigger

2018-06-22 Thread Girish Subramanian
Hi. I am in the process of migrating from DStreams to Structured Streaming. In the DStreams API job we were using a batch size of 1 minute and it was taking 30-45 secs to process. With Structured Streaming I am using a trigger interval of 30 secs, but it seems to be taking a large amount of time to
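A hedged sketch of the two relevant knobs, with placeholder broker, topic, and limit values:

    import org.apache.spark.sql.streaming.Trigger
    val stream = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "host:9092")
      .option("subscribe", "events")
      // cap how many records each micro-batch pulls from Kafka
      .option("maxOffsetsPerTrigger", 100000)
      .load()
    val query = stream.writeStream
      .format("console")
      .trigger(Trigger.ProcessingTime("30 seconds"))
      .start()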

Re: RepartitionByKey Behavior

2018-06-22 Thread Elior Malul
Hi Chawla, There is nothing wrong with your code, nor with Spark. The situation in which two different keys are mapped to the same partition is perfectly valid, since they are mapped to the same 'bucket'. The promise is that all records with the same key 'k' will be mapped to the same partition.
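A small sketch of that promise, assuming an RDD of key/value pairs (names are made up):

    import org.apache.spark.HashPartitioner
    val pairs = sc.parallelize(Seq(("a", 1), ("b", 2), ("c", 3)))
    // with 2 partitions, three distinct keys must share buckets,
    // yet every record with a given key lands in one partition
    val parted = pairs.partitionBy(new HashPartitioner(2))
    parted.mapPartitionsWithIndex { (i, it) =>
      it.map { case (k, _) => s"partition $i holds key $k" }
    }.collect().foreach(println)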