The Spark SQL ODBC/JDBC driver that supports Kerberos delegation

2019-02-11 Thread luby
Hi, All,

We want to use Spark SQL in Tableau. But according to
https://onlinehelp.tableau.com/current/pro/desktop/en-us/examples_sparksql.htm

the driver provided by Tableau doesn't support Kerberos delegation.

Is there any Spark SQL ODBC or JDBC driver that supports Kerberos
delegation?

Thanks 

Boying 



 





Create Hive table from CSV file

2019-02-11 Thread Soheil Pourbafrani
Hi, using the following code I create a Thrift Server with a Hive table
built from a CSV file, and I expect it to treat the first line as a header.
But when I select data from the resulting table, I see the CSV header
returned as a data row! It seems the line "TBLPROPERTIES(skip.header.line.count
= 1)" didn't work. Is there any way to do this using Spark SQL?

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.hive.thriftserver.HiveThriftServer2

def main(args: Array[String]): Unit = {
  val conf = new SparkConf
  conf
    .set("hive.server2.thrift.port", "1")
    .set("spark.sql.hive.thriftServer.singleSession", "true")
    .set("spark.sql.warehouse.dir", "/metadata/hive")
    .set("spark.sql.catalogImplementation", "hive")
    .set("skip.header.line.count", "1")
    .setMaster("local[*]")
    .setAppName("ThriftServer")
  val sc = new SparkContext(conf)
  val spark = SparkSession.builder()
    .config(conf)
    .enableHiveSupport()
    .getOrCreate()

  spark.sql(
    "CREATE TABLE IF NOT EXISTS freq_back (" +
      "id int," +
      "time_stamp bigint," +
      "time_quality string )" +
      "ROW FORMAT DELIMITED " +
      "FIELDS TERMINATED BY ',' " +
      "STORED AS TEXTFILE " +
      "LOCATION 'hdfs://DB_BackUp/freq' " +
      "TBLPROPERTIES(skip.header.line.count = 1)"
  )

  HiveThriftServer2.startWithContext(spark.sqlContext)
}
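
Not a fix for the table property itself, but a minimal sketch of an alternative,
assuming the goal is only to expose the CSV (minus its header) through the Thrift
server: let Spark's DataFrame CSV reader handle the header and register the result
in the same session. The path and column types below are assumptions taken from
the CREATE TABLE statement above.

// Sketch only: read the CSV with Spark's built-in reader, which handles the
// header natively, and register the result so the Thrift server can query it.
// Path and schema are assumptions based on the CREATE TABLE above.
import org.apache.spark.sql.types.{IntegerType, LongType, StringType, StructType}

val schema = new StructType()
  .add("id", IntegerType)
  .add("time_stamp", LongType)
  .add("time_quality", StringType)

val freq = spark.read
  .option("header", "true")      // first line is treated as the header, not data
  .schema(schema)
  .csv("hdfs://DB_BackUp/freq")

// With spark.sql.hive.thriftServer.singleSession=true this temp view should be
// visible to JDBC clients of the Thrift server started from the same session.
freq.createOrReplaceTempView("freq_back")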


Re: Spark on YARN, HowTo kill executor or individual task?

2019-02-11 Thread Vadim Semenov
something like this

import org.apache.spark.TaskContext

// Throw inside the lambda for one specific partition; the task processing that
// partition fails and Spark retries it up to spark.task.maxFailures times.
ds.map(r => {
  val taskContext = TaskContext.get()
  if (taskContext.partitionId == 1000) {
    throw new RuntimeException
  }
  r
})

On Mon, Feb 11, 2019 at 8:41 AM Serega Sheypak  wrote:
>
> I need to crash task which does repartition.
>
>> Mon, 11 Feb 2019 at 10:37, Gabor Somogyi :
>>
>> What blocks you to put if conditions inside the mentioned map function?
>>
>> On Mon, Feb 11, 2019 at 10:31 AM Serega Sheypak  
>> wrote:
>>>
>>> Yeah, but I don't need to crash entire app, I want to fail several tasks or 
>>> executors and then wait for completion.
>>>
>>> Sun, 10 Feb 2019 at 21:49, Gabor Somogyi :

 Another approach is adding artificial exception into the application's 
 source code like this:

 val query = input.toDS.map(_ / 0).writeStream.format("console").start()

 G


 On Sun, Feb 10, 2019 at 9:36 PM Serega Sheypak  
 wrote:
>
> Hi BR,
> thanks for your reply. I want to mimic the issue and kill tasks at a 
> certain stage. Killing executor is also an option for me.
> I'm curious how do core spark contributors test spark fault tolerance?
>
>
> Sun, 10 Feb 2019 at 16:57, Gabor Somogyi :
>>
>> Hi Serega,
>>
>> If I understand your problem correctly you would like to kill one 
>> executor only and the rest of the app has to be untouched.
>> If that's true yarn -kill is not what you want because it stops the 
>> whole application.
>>
>> I've done similar thing when tested/testing Spark's HA features.
>> - jps -vlm | grep 
>> "org.apache.spark.executor.CoarseGrainedExecutorBackend.*applicationid"
>> - kill -9 pidofoneexecutor
>>
>> Be aware if it's a multi-node cluster check whether at least one process 
>> runs on a specific node(it's not required).
>> Happy killing...
>>
>> BR,
>> G
>>
>>
>> On Sun, Feb 10, 2019 at 4:19 PM Jörn Franke  wrote:
>>>
>>> yarn application -kill applicationid ?
>>>
>>> > On 10.02.2019 at 13:30, Serega Sheypak wrote:
>>> >
>>> > Hi there!
>>> > I have weird issue that appears only when tasks fail at specific 
>>> > stage. I would like to imitate failure on my own.
>>> > The plan is to run problematic app and then kill entire executor or 
>>> > some tasks when execution reaches certain stage.
>>> >
>>> > Is it do-able?
>>>
>>> -
>>> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>>>


-- 
Sent from my iPhone

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



Re: structured streaming handling validation and json flattening

2019-02-11 Thread Jacek Laskowski
Hi Lian,

"What have you tried?" would be a good starting point. Any help on this?

How do you read the JSONs? readStream.json? You could use readStream.text
followed by filter to include/exclude good/bad JSONs.
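
For example, a rough sketch of that idea, assuming an input path of /data/in and
a two-field schema for valid records (both placeholders): from_json yields null
for lines that fail to parse, which gives a cheap split into good and bad records.

// Sketch only: read raw lines, parse them against an assumed schema, and split
// the stream into parsable and unparsable records.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.from_json
import org.apache.spark.sql.types.{LongType, StringType, StructType}

val spark = SparkSession.builder.getOrCreate()
import spark.implicits._

val schema = new StructType()          // assumed shape of a valid record
  .add("id", LongType)
  .add("payload", StringType)

val raw    = spark.readStream.text("/data/in")                      // assumed path
val parsed = raw.select($"value", from_json($"value", schema).as("json"))
val good   = parsed.filter($"json".isNotNull).select("json.*")      // write as parquet
val bad    = parsed.filter($"json".isNull).select("value")          // log / dead-letter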

Regards,
Jacek Laskowski

https://about.me/JacekLaskowski
Mastering Spark SQL https://bit.ly/mastering-spark-sql
Spark Structured Streaming https://bit.ly/spark-structured-streaming
Mastering Kafka Streams https://bit.ly/mastering-kafka-streams
Follow me at https://twitter.com/jaceklaskowski


On Sat, Feb 9, 2019 at 8:25 PM Lian Jiang  wrote:

> Hi,
>
> We have a structured streaming job that converts JSON into Parquet files. We
> want to validate the JSON records. If a JSON record is not valid, we want
> to log a message and refuse to write it to Parquet. Also, the JSON has
> nested JSONs, and we want to flatten the nested JSONs into other Parquet
> files using the same streaming job. My questions are:
>
> 1. How do we validate the JSON records in a structured streaming job?
> 2. How do we flatten the nested JSONs in a structured streaming job?
> 3. Is it possible to use one structured streaming job to validate JSON,
> convert JSON into one Parquet output, and convert nested JSONs into other Parquet files?
>
> I think unstructured streaming can achieve these goals, but structured
> streaming is recommended by the Spark community.
>
> Appreciate your feedback!
>


Re: Spark on YARN, HowTo kill executor or individual task?

2019-02-11 Thread Serega Sheypak
I need to crash a task which does a repartition.

Mon, 11 Feb 2019 at 10:37, Gabor Somogyi :

> What blocks you to put if conditions inside the mentioned map function?
>
> On Mon, Feb 11, 2019 at 10:31 AM Serega Sheypak 
> wrote:
>
>> Yeah, but I don't need to crash entire app, I want to fail several tasks
>> or executors and then wait for completion.
>>
>> Sun, 10 Feb 2019 at 21:49, Gabor Somogyi :
>>
>>> Another approach is adding artificial exception into the application's
>>> source code like this:
>>>
>>> val query = input.toDS.map(_ / 0).writeStream.format("console").start()
>>>
>>> G
>>>
>>>
>>> On Sun, Feb 10, 2019 at 9:36 PM Serega Sheypak 
>>> wrote:
>>>
 Hi BR,
 thanks for your reply. I want to mimic the issue and kill tasks at a
 certain stage. Killing executor is also an option for me.
 I'm curious how do core spark contributors test spark fault tolerance?


 Sun, 10 Feb 2019 at 16:57, Gabor Somogyi :

> Hi Serega,
>
> If I understand your problem correctly you would like to kill one
> executor only and the rest of the app has to be untouched.
> If that's true yarn -kill is not what you want because it stops the
> whole application.
>
> I've done similar thing when tested/testing Spark's HA features.
> - jps -vlm | grep
> "org.apache.spark.executor.CoarseGrainedExecutorBackend.*applicationid"
> - kill -9 pidofoneexecutor
>
> Be aware if it's a multi-node cluster check whether at least one
> process runs on a specific node(it's not required).
> Happy killing...
>
> BR,
> G
>
>
> On Sun, Feb 10, 2019 at 4:19 PM Jörn Franke 
> wrote:
>
>> yarn application -kill applicationid ?
>>
>> > On 10.02.2019 at 13:30, Serega Sheypak <serega.shey...@gmail.com> wrote:
>> >
>> > Hi there!
>> > I have weird issue that appears only when tasks fail at specific
>> stage. I would like to imitate failure on my own.
>> > The plan is to run problematic app and then kill entire executor or
>> some tasks when execution reaches certain stage.
>> >
>> > Is it do-able?
>>
>> -
>> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>>
>>


RE: Multiple column aggregations

2019-02-11 Thread Shiva Prashanth Vallabhaneni
Hi Sonu,

You could use a query that is similar to the below one. You could further 
optimize the below query by adding a WHERE clause. I would suggest that you 
benchmark the performance of both approaches (multiple group-by queries vs 
single query with multiple window functions), before choosing one of these 
options. Before running the benchmark, I would ensure that the underlying data 
is stored in a columnar storage format with compression enabled. For instance, 
you could use parquet file format with block-level compression using Snappy.

SELECT SUM(CASE WHEN accountRank = 2 THEN 1 ELSE 0 END) AS accountsWithMoreThanOneOrder,
       SUM(CASE WHEN orderRank = 2 THEN 1 ELSE 0 END) AS ordersWithMoreThanOneAccount
FROM   (
         SELECT accountNo,
                orderNo,
                rank() OVER (PARTITION BY orderNo ORDER BY accountNo) AS orderRank,
                rank() OVER (PARTITION BY accountNo ORDER BY orderNo) AS accountRank
         FROM   accountOrders
       ) ranked

P.S – You will need to check the above query for any syntax errors.
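
For comparison, here is a rough Spark sketch of the other approach mentioned above
(separate group-by aggregations), so both sides of the benchmark are concrete.
df, accountNo and orderNo are assumed names for a DataFrame loaded from the same
accountOrders data.

// Sketch of the "multiple group-by queries" alternative, for benchmarking
// against the single window-function query above.
import org.apache.spark.sql.functions.{col, countDistinct}

val accountsWithMoreThanOneOrder = df
  .groupBy("accountNo")
  .agg(countDistinct("orderNo").as("orders"))
  .filter(col("orders") > 1)
  .count()

val ordersWithMoreThanOneAccount = df
  .groupBy("orderNo")
  .agg(countDistinct("accountNo").as("accounts"))
  .filter(col("accounts") > 1)
  .count()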

– Shiva

From: Sonu Jyotshna 
Sent: Saturday, February 9, 2019 10:17 AM
To: user@spark.apache.org
Subject: Multiple column aggregations


Hello,

I have a requirement where I need to group by multiple columns and aggregate 
them not at same time .. I mean I have a structure which contains accountid, 
some cols, order id . I need to calculate some scenarios like account having 
multiple orders so group by account and aggregate will work here but I need to 
find orderid associated to multiple accounts so may be group by orderid will 
work here but for better performance on the dataset level can we do in single 
step? Where both will work or any better approach I can follow . Can you help


Regards,
Sonu



Data growth vs Cluster Size planning

2019-02-11 Thread Aakash Basu
Hi,

I ran a dataset of *200 columns and 0.2M records* on a cluster of *1 master
(18 GB) and 2 slaves (32 GB each, 16 cores/slave)*; it took around *772 minutes*
for a *very large ML-tuning-based job* (training).

Now, my requirement is to run the *same operation on 3M records*. Any idea
on how we should proceed? Should we go for a vertical scaling or a
horizontal one? How should this problem be approached in a
stepwise/systematic manner?

Thanks in advance.

Regards,
Aakash.


Re: Spark on YARN, HowTo kill executor or individual task?

2019-02-11 Thread Gabor Somogyi
What blocks you from putting if conditions inside the mentioned map function?

On Mon, Feb 11, 2019 at 10:31 AM Serega Sheypak 
wrote:

> Yeah, but I don't need to crash entire app, I want to fail several tasks
> or executors and then wait for completion.
>
> Sun, 10 Feb 2019 at 21:49, Gabor Somogyi :
>
>> Another approach is adding artificial exception into the application's
>> source code like this:
>>
>> val query = input.toDS.map(_ / 0).writeStream.format("console").start()
>>
>> G
>>
>>
>> On Sun, Feb 10, 2019 at 9:36 PM Serega Sheypak 
>> wrote:
>>
>>> Hi BR,
>>> thanks for your reply. I want to mimic the issue and kill tasks at a
>>> certain stage. Killing executor is also an option for me.
>>> I'm curious how do core spark contributors test spark fault tolerance?
>>>
>>>
>>> Sun, 10 Feb 2019 at 16:57, Gabor Somogyi :
>>>
 Hi Serega,

 If I understand your problem correctly you would like to kill one
 executor only and the rest of the app has to be untouched.
 If that's true yarn -kill is not what you want because it stops the
 whole application.

 I've done similar thing when tested/testing Spark's HA features.
 - jps -vlm | grep
 "org.apache.spark.executor.CoarseGrainedExecutorBackend.*applicationid"
 - kill -9 pidofoneexecutor

 Be aware if it's a multi-node cluster check whether at least one
 process runs on a specific node(it's not required).
 Happy killing...

 BR,
 G


 On Sun, Feb 10, 2019 at 4:19 PM Jörn Franke 
 wrote:

> yarn application -kill applicationid ?
>
> > On 10.02.2019 at 13:30, Serega Sheypak <serega.shey...@gmail.com> wrote:
> >
> > Hi there!
> > I have weird issue that appears only when tasks fail at specific
> stage. I would like to imitate failure on my own.
> > The plan is to run problematic app and then kill entire executor or
> some tasks when execution reaches certain stage.
> >
> > Is it do-able?
>
> -
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>
>


Re: Spark on YARN, HowTo kill executor or individual task?

2019-02-11 Thread Serega Sheypak
Yeah, but I don't need to crash the entire app; I want to fail several tasks
or executors and then wait for completion.

Sun, 10 Feb 2019 at 21:49, Gabor Somogyi :

> Another approach is adding artificial exception into the application's
> source code like this:
>
> val query = input.toDS.map(_ / 0).writeStream.format("console").start()
>
> G
>
>
> On Sun, Feb 10, 2019 at 9:36 PM Serega Sheypak 
> wrote:
>
>> Hi BR,
>> thanks for your reply. I want to mimic the issue and kill tasks at a
>> certain stage. Killing executor is also an option for me.
>> I'm curious how do core spark contributors test spark fault tolerance?
>>
>>
>> Sun, 10 Feb 2019 at 16:57, Gabor Somogyi :
>>
>>> Hi Serega,
>>>
>>> If I understand your problem correctly you would like to kill one
>>> executor only and the rest of the app has to be untouched.
>>> If that's true yarn -kill is not what you want because it stops the
>>> whole application.
>>>
>>> I've done similar thing when tested/testing Spark's HA features.
>>> - jps -vlm | grep
>>> "org.apache.spark.executor.CoarseGrainedExecutorBackend.*applicationid"
>>> - kill -9 pidofoneexecutor
>>>
>>> Be aware if it's a multi-node cluster check whether at least one process
>>> runs on a specific node(it's not required).
>>> Happy killing...
>>>
>>> BR,
>>> G
>>>
>>>
>>> On Sun, Feb 10, 2019 at 4:19 PM Jörn Franke 
>>> wrote:
>>>
 yarn application -kill applicationid ?

 > On 10.02.2019 at 13:30, Serega Sheypak <serega.shey...@gmail.com> wrote:
 >
 > Hi there!
 > I have weird issue that appears only when tasks fail at specific
 stage. I would like to imitate failure on my own.
 > The plan is to run problematic app and then kill entire executor or
 some tasks when execution reaches certain stage.
 >
 > Is it do-able?

 -
 To unsubscribe e-mail: user-unsubscr...@spark.apache.org