Re: How to pass a constant value to a partitioned hive table in spark

2020-04-19 Thread Mich Talebzadeh
Many thanks Ayan.

I tried that as well as follows:

val broadcastValue = "123456789"  // I assume this will be sent as a
constant for the batch
val df = spark.read.
format("com.databricks.spark.xml").
option("rootTag", "hierarchy").
option("rowTag", "sms_request").
load("/tmp/broadcast.xml")

import org.apache.spark.sql.functions.lit
val newDF = df.withColumn("broadcastId", lit(broadcastValue))
So the column broadcastId is a static partition of the Hive table, whereas the other
column, brand, is a dynamic partition.

newDF.createOrReplaceTempView("tmp")
// Need to create and populate the target Parquet table michtest.BroadcastStaging
//
HiveContext.sql("""DROP TABLE IF EXISTS michtest.BroadcastStaging""")

  var sqltext = """
  CREATE TABLE IF NOT EXISTS michtest.BroadcastStaging (
 partyId STRING
   , phoneNumber STRING
  )
  PARTITIONED BY (
 broadcastId STRING
   , brand STRING
)
  STORED AS PARQUET
  """
  HiveContext.sql(sqltext)
  //
  // Put data in Hive table
  //
 // Dynamic partitioning is disabled by default. We turn it on
 //spark.sql("SET hive.exec.dynamic.partition = true")
 spark.sql("SET hive.exec.dynamic.partition.mode = nonstrict")
 // spark.sql("SET hive.exec.max.dynamic.partitions.pernode = 400")

  sqltext = """
  INSERT INTO TABLE michtest.BroadcastStaging PARTITION (broadcastId =
broadcastValue, brand)
  SELECT
  ocis_party_id AS partyId
, target_mobile_no AS phoneNumber
, brand
  FROM tmp
  """

org.apache.spark.sql.catalyst.parser.ParseException:
missing STRING at ','(line 2, pos 85)

== SQL ==

  INSERT INTO TABLE michtest.BroadcastStaging PARTITION (broadcastId =
broadcastValue, brand)
-^^^
  SELECT
  ocis_party_id AS partyId
, target_mobile_no AS phoneNumber
   , brand
  FROM tmp

The thing is that if I pass the literal, (broadcastId = "123456789", brand), it works
with no problem!
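
Presumably the parser only accepts a literal for a static partition value, so the
identifier broadcastValue inside the triple-quoted string is never resolved. A minimal,
untested sketch of a workaround, assuming the same table and temp view as above, is to
build the statement with Scala string interpolation so the value is spliced in and quoted
before spark.sql ever sees it:

// Untested sketch: the s interpolator substitutes the Scala value into the SQL text,
// and the single quotes turn it into the string literal the PARTITION clause expects.
// brand stays dynamic, so hive.exec.dynamic.partition.mode = nonstrict is still needed.
val insertSql =
  s"""
     |INSERT INTO TABLE michtest.BroadcastStaging PARTITION (broadcastId = '$broadcastValue', brand)
     |SELECT
     |    ocis_party_id AS partyId
     |  , target_mobile_no AS phoneNumber
     |  , brand
     |FROM tmp
     |""".stripMargin
spark.sql(insertSql)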

Regards,

Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com


Disclaimer: Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.





Re: How to pass a constant value to a partitioned hive table in spark

2020-04-16 Thread ayan guha
Hi Mitch

Add it in the DF first

from pyspark.sql.functions import lit

df = df.withColumn('broadcastId', lit(broadcastValue))

Then you will be able to access the column in the temp view

Re: partitioning, DataFrame.write also supports a partitionBy clause, and you can use it
along with saveAsTable.
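
In Scala, to match the rest of the thread, a rough sketch of that route might look like
the following (untested; it assumes newDF already carries the broadcastId and brand
columns, e.g. with broadcastId added via lit()):

// Rough, untested sketch of the partitionBy + saveAsTable route. Column and table names
// are taken from the other messages in this thread; appending to a pre-existing Hive
// table via saveAsTable can behave differently from a plain INSERT, so treat this as a
// starting point rather than a drop-in replacement.
newDF.write
  .mode("append")
  .format("parquet")
  .partitionBy("broadcastId", "brand")
  .saveAsTable("michtest.BroadcastStaging")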



Re: How to pass a constant value to a partitioned hive table in spark

2020-04-16 Thread Mich Talebzadeh
Thanks Zhang,

That is not working. I need to send the value of the variable broadcastValue, but the
parser cannot interpret it.

 scala>   sqltext = """
 |   INSERT INTO TABLE michtest.BroadcastStaging PARTITION (broadcastId
= broadcastValue, brand = "dummy")
 |   SELECT
 |   ocis_party_id AS partyId
 | , target_mobile_no AS phoneNumber
 |   FROM tmp
 |   """
sqltext: String =
  INSERT INTO TABLE michtest.BroadcastStaging PARTITION (broadcastId =
broadcastValue, brand = "dummy")
  SELECT
  ocis_party_id AS partyId
, target_mobile_no AS phoneNumber
  FROM tmp


scala>   spark.sql(sqltext)
org.apache.spark.sql.catalyst.parser.ParseException:
missing STRING at ','(line 2, pos 85)

== SQL ==

  INSERT INTO TABLE michtest.BroadcastStaging PARTITION (broadcastId =
broadcastValue, brand = "dummy")
-^^^
  SELECT
  ocis_party_id AS partyId
, target_mobile_no AS phoneNumber
  FROM tmp


  at
org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:241)
  at
org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:117)
  at
org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:48)
  at
org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:69)
  at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:642)
  ... 55 elided
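
I guess one way to sidestep the problem entirely, sketched below and untested, would be
to make both partition columns dynamic and take broadcastId from a column added to the
DataFrame with lit(broadcastValue) before registering tmp:

// Untested sketch: make both partition columns dynamic so no value has to be spliced
// into the SQL text at all. Assumes tmp was registered from a DataFrame that already
// carries broadcastId (added with lit(broadcastValue)) and brand columns; the partition
// columns must come last in the SELECT, in the order declared in PARTITIONED BY.
spark.sql("SET hive.exec.dynamic.partition = true")
spark.sql("SET hive.exec.dynamic.partition.mode = nonstrict")
spark.sql(
  """
    |INSERT INTO TABLE michtest.BroadcastStaging PARTITION (broadcastId, brand)
    |SELECT
    |    ocis_party_id AS partyId
    |  , target_mobile_no AS phoneNumber
    |  , broadcastId
    |  , brand
    |FROM tmp
    |""".stripMargin)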



Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com


Disclaimer: Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.





Re: How to pass a constant value to a partitioned hive table in spark

2020-04-16 Thread ZHANG Wei
> scala>   spark.sql($sqltext)
> :41: error: not found: value $sqltext
>  spark.sql($sqltext)
 ^
 +-- this should be the plain Scala variable, not a $ placeholder

Try this:

scala> spark.sql(sqltext)
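
Note also that the $ placeholders inside the query text ($INSERT, $broadcastValue) are
only substituted when the string is built with the s interpolator. A rough, untested
sketch:

// Rough, untested sketch: use an s-interpolated string so $broadcastValue is substituted
// by Scala, drop the stray $ in front of INSERT (INSERT is SQL, not a Scala value), and
// wrap the substituted value in single quotes so the PARTITION clause sees a literal.
val sqltext =
  s"""
     |INSERT INTO TABLE michtest.BroadcastStaging
     |PARTITION (broadcastId = '$broadcastValue', brand = 'dummy')
     |SELECT
     |    ocis_party_id AS partyId
     |  , target_mobile_no AS phoneNumber
     |FROM tmp
     |""".stripMargin
spark.sql(sqltext)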

-- 
Cheers,
-z


-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



How to pass a constant value to a partitioned hive table in spark

2020-04-16 Thread Mich Talebzadeh
I have a variable to be passed to a column of partition as shown below

val broadcastValue = "123456789"  // I assume this will be sent as a constant for the batch
// Create a DF on top of XML

df.createOrReplaceTempView("tmp")
// Need to create and populate the target Parquet table michtest.BroadcastStaging
//
HiveContext.sql("""DROP TABLE IF EXISTS michtest.BroadcastStaging""")

  var sqltext = """
  CREATE TABLE IF NOT EXISTS michtest.BroadcastStaging (
 partyId STRING
   , phoneNumber STRING
  )
  PARTITIONED BY (
 broadcastId STRING
   , brand STRING)
  STORED AS PARQUET
  """
  HiveContext.sql(sqltext)

// Now insert the data from temp table
  //
  // Put data in Hive table
  //
 // Dynamic partitioning is disabled by default. We turn it on
 spark.sql("SET hive.exec.dynamic.partition = true")
 spark.sql("SET hive.exec.dynamic.partition.mode = nonstrict ")

  sqltext = """

*  $INSERT INTO TABLE michtest.BroadcastStaging PARTITION (broadcastId =
$broadcastValue, brand = "dummy")*  SELECT
  ocis_party_id AS partyId
, target_mobile_no AS phoneNumber
  FROM tmp
  """
  spark.sql($sqltext)


However, this does not work!


scala>   sqltext = """
 |   $INSERT INTO TABLE michtest.BroadcastStaging PARTITION
(broadcastId = $broadcastValue, brand = "dummy")
 |   SELECT
 |   ocis_party_id AS partyId
 | , target_mobile_no AS phoneNumber
 |   FROM tmp
 |   """
sqltext: String =
  $INSERT INTO TABLE michtest.BroadcastStaging PARTITION (broadcastId =
$broadcastValue, brand = "dummy")
  SELECT
  ocis_party_id AS partyId
, target_mobile_no AS phoneNumber
  FROM tmp


scala>   spark.sql($sqltext)
:41: error: not found: value $sqltext
 spark.sql($sqltext)


Any ideas?


Thanks


Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com


Disclaimer: Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.