How to filter the datetime in dataset with java code please?

2018-04-11 Thread 1427357...@qq.com
Hi all,

I want to filter the data by the datetime.
In MySQL, the column is of DATETIME type, named A.
I wrote my code like:
import java.util.Date;
newX.filter(newX.col("A").isNull().or(newX.col("A").lt(new Date()))).show();

I got this error:
Exception in thread "main" java.lang.RuntimeException: Unsupported literal type 
class java.util.Date Wed Apr 11 16:17:31 CST 2018
at org.apache.spark.sql.catalyst.expressions.Literal$.apply(literals.scala:77)
at 
org.apache.spark.sql.catalyst.expressions.Literal$$anonfun$create$2.apply(literals.scala:163)
at 
org.apache.spark.sql.catalyst.expressions.Literal$$anonfun$create$2.apply(literals.scala:163)
at scala.util.Try.getOrElse(Try.scala:79)
at org.apache.spark.sql.catalyst.expressions.Literal$.create(literals.scala:162)
at org.apache.spark.sql.functions$.typedLit(functions.scala:113)
at org.apache.spark.sql.functions$.lit(functions.scala:96)
at org.apache.spark.sql.Column.$less(Column.scala:384)
at org.apache.spark.sql.Column.lt(Column.scala:399)
at Main.main(Main.java:38)

How should I filter on the datetime column in a Dataset, please?
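
For reference, a minimal sketch of one workaround, assuming the same Dataset newX and DATETIME column A as above. Spark accepts java.sql.Timestamp (and java.sql.Date) as literals, whereas java.util.Date is not a supported literal type, which is exactly what the exception complains about:

import java.sql.Timestamp;

// Use java.sql.Timestamp instead of java.util.Date; lit()/typedLit() can turn it
// into a TimestampType literal.
Timestamp cutoff = new Timestamp(System.currentTimeMillis());

// Keep rows where A is null or A is earlier than the cutoff, mirroring the original filter.
newX.filter(newX.col("A").isNull().or(newX.col("A").lt(cutoff))).show();

Depending on the Spark version, comparing against a string such as "2018-04-11 16:17:31" may also work, but the Timestamp literal is unambiguous.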






1427357...@qq.com


Re: Re: how to use the sql join in java please

2018-04-11 Thread 1427357...@qq.com
Hi Yucai,

It works well now.
Thanks.



1427357...@qq.com
 
From: Yu, Yucai
Date: 2018-04-11 16:01
To: 1427357...@qq.com; spark users
Subject: Re: how to use the sql join in java please
Do you really want to do a cartesian product on those two tables?
If yes, you can set spark.sql.crossJoin.enabled=true.
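
For illustration, a minimal Java sketch of the two usual fixes (with org.apache.spark.sql.Dataset and Row imported), reusing the names ss, data_busi_hour and t_pro_ware_partner_rela from the code quoted below; the join key "project_code" is only a hypothetical example, since the real tables may not share a column:

// Option 1: give the join an explicit condition (here on a hypothetical shared key).
Dataset<Row> joined = t_pro_ware_partner_rela.join(
    data_busi_hour,
    t_pro_ware_partner_rela.col("project_code")
        .equalTo(data_busi_hour.col("project_code")));
joined.show();

// Option 2: opt in to the cartesian product, either globally...
ss.conf().set("spark.sql.crossJoin.enabled", "true");
// ...or per join, which needs no configuration flag:
Dataset<Row> product = t_pro_ware_partner_rela.crossJoin(data_busi_hour);
product.show();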
 
Thanks,
Yucai
 
From: "1427357...@qq.com" <1427357...@qq.com>
Date: Wednesday, April 11, 2018 at 3:16 PM
To: spark users
Subject: how to use the sql join in java please
 
Hi  all,
 
I wrote Java code to join two tables.
My code looks like:
 
SparkSession ss = SparkSession.builder()
    .master("local[4]").appName("testSql").getOrCreate();

Properties properties = new Properties();
properties.put("user", "A");
properties.put("password", "B");
String url = "jdbc:mysql://xxx:/xxx?useUnicode=true&characterEncoding=gbk&zeroDateTimeBehavior=convertToNull&serverTimezone=UTC";

Dataset<Row> data_busi_hour = ss.read().jdbc(url, "A", properties);
data_busi_hour.show();
//newemployee.printSchema();

Dataset<Row> t_pro_ware_partner_rela = ss.read().jdbc(url, "B", properties);

Dataset<Row> newX = t_pro_ware_partner_rela.join(data_busi_hour);
newX.show();
 
I get an error like the one below:
Exception in thread "main" org.apache.spark.sql.AnalysisException: Detected 
cartesian product for INNER join between logical plans
Relation[ XXX   FIRST_ORG_ARCHNAME#80,... 11 more fields] 
JDBCRelation(t_pro_ware_partner_rela) [numPartitions=1]
and
Relation[id#0L,project_code#1,project_name#2] JDBCRelation(data_busi_hour) 
[numPartitions=1]
Join condition is missing or trivial.
Use the CROSS JOIN syntax to allow cartesian products between these relations.;
at 
org.apache.spark.sql.catalyst.optimizer.CheckCartesianProducts$$anonfun$apply$21.applyOrElse(Optimizer.scala:1124)
at 
org.apache.spark.sql.catalyst.optimizer.CheckCartesianProducts$$anonfun$apply$21.applyOrElse(Optimizer.scala:1121)
at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:267)
at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:267)
at 
org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:266)
at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)
at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)
at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)
at org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:256)
at 
org.apache.spark.sql.catalyst.optimizer.CheckCartesianProducts$.apply(Optimizer.scala:1121)
at 
org.apache.spark.sql.catalyst.optimizer.CheckCartesianProducts$.apply(Optimizer.scala:1103)
at 
org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:87)
at 
org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:84)
at 
scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:57)
at 
scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:66)
at scala.collection.mutable.WrappedArray.foldLeft(WrappedArray.scala:35)
at 
org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:84)
at 
org.apache.spark.sql.catalyst.

how to use the sql join in java please

2018-04-11 Thread 1427357...@qq.com
ion.sparkPlan(QueryExecution.scala:68)
at 
org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:77)
at 
org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:77)
at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3248)
at org.apache.spark.sql.Dataset.head(Dataset.scala:2484)
at org.apache.spark.sql.Dataset.take(Dataset.scala:2698)
at org.apache.spark.sql.Dataset.showString(Dataset.scala:254)
at org.apache.spark.sql.Dataset.show(Dataset.scala:723)
at org.apache.spark.sql.Dataset.show(Dataset.scala:682)
at org.apache.spark.sql.Dataset.show(Dataset.scala:691)



Tables A and B don't have any column in common.
What can I do, please?


QQ GROUP:296020884




1427357...@qq.com


Re: Re: the issue about the + in column,can we support the string please?

2018-04-01 Thread 1427357...@qq.com
Hi,
I checked the code.
It seems it is hard to change the code.
In the current code, string + int is translated to double + double.
If I change string + int to string + string, it will be incompatible with the old version.

Does anyone have a better idea about this issue, please?



1427357...@qq.com
 
From: Shmuel Blitz
Date: 2018-03-26 17:17
To: 1427357...@qq.com
CC: spark users; dev
Subject: Re: Re: the issue about the + in column,can we support the string 
please?
I agree.

Just pointed out the option, in case you missed it.

Cheers,
Shmuel

On Mon, Mar 26, 2018 at 10:57 AM, 1427357...@qq.com <1427357...@qq.com> wrote:
Hi,

Using concat is one way.
But the + is more intuitive and easier to understand.



1427357...@qq.com
 
From: Shmuel Blitz
Date: 2018-03-26 15:31
To: 1427357...@qq.com
CC: spark users; dev
Subject: Re: the issue about the + in column,can we support the string please?
Hi,

you can get the same with:

import org.apache.spark.sql.functions._
import sqlContext.implicits._
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

val schema = StructType(Array(
  StructField("name", StringType),
  StructField("age", IntegerType)))

val lst = List(Row("Shmuel", 13), Row("Blitz", 23))
val rdd = sc.parallelize(lst)

val df = sqlContext.createDataFrame(rdd, schema)

df.withColumn("newName", concat($"name", lit("abc"))).show()
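
For reference, the same approach in Java (a hedged sketch, assuming a Dataset<Row> named stud_scoreDF as in the original mail):

import static org.apache.spark.sql.functions.concat;
import static org.apache.spark.sql.functions.lit;

// concat() keeps the expression as StringType, so the column becomes
// "leader usabc" / "mycatabc" instead of null.
stud_scoreDF.withColumn("newName",
    concat(stud_scoreDF.col("name"), lit("abc"))).show();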

On Mon, Mar 26, 2018 at 6:36 AM, 1427357...@qq.com <1427357...@qq.com> wrote:
Hi  all,

I have a table like below:

+---+---------+-----------+
| id|     name|sharding_id|
+---+---------+-----------+
|  1|leader us|          1|
|  3|    mycat|          1|
+---+---------+-----------+

My schema is :
root
 |-- id: integer (nullable = false)
 |-- name: string (nullable = true)
 |-- sharding_id: integer (nullable = false)

I want to add a new column named newName. The new column is based on "name" with
"abc" appended after it. My code looks like:

stud_scoreDF.withColumn("newName", stud_scoreDF.col("name") + "abc").show()

When I run the code, I got this result:
+---+---------+-----------+-------+
| id|     name|sharding_id|newName|
+---+---------+-----------+-------+
|  1|leader us|          1|   null|
|  3|    mycat|          1|   null|
+---+---------+-----------+-------+


I checked the code; the key code is in arithmetic.scala, line 165.
It looks like:

override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = dataType match {
  case dt: DecimalType =>
    defineCodeGen(ctx, ev, (eval1, eval2) => s"$eval1.$$plus($eval2)")
  case ByteType | ShortType =>
    defineCodeGen(ctx, ev,
      (eval1, eval2) => s"(${ctx.javaType(dataType)})($eval1 $symbol $eval2)")
  case CalendarIntervalType =>
    defineCodeGen(ctx, ev, (eval1, eval2) => s"$eval1.add($eval2)")
  case _ =>
    defineCodeGen(ctx, ev, (eval1, eval2) => s"$eval1 $symbol $eval2")
}

My issue is: can we add a StringType case to this class to support string append, please?





1427357...@qq.com



-- 
Shmuel Blitz 
Big Data Developer 
Email: shmuel.bl...@similarweb.com 
www.similarweb.com 



-- 
Shmuel Blitz 
Big Data Developer 
Email: shmuel.bl...@similarweb.com 
www.similarweb.com 


Re: Re: the issue about the + in column,can we support the string please?

2018-03-26 Thread 1427357...@qq.com
Hi,

Using concat is one way.
But the + is more intuitive and easier to understand.



1427357...@qq.com
 
From: Shmuel Blitz
Date: 2018-03-26 15:31
To: 1427357...@qq.com
CC: spark users; dev
Subject: Re: the issue about the + in column,can we support the string please?
Hi,

you can get the same with:

import org.apache.spark.sql.functions._
import sqlContext.implicits._
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

val schema = StructType(Array(
  StructField("name", StringType),
  StructField("age", IntegerType)))

val lst = List(Row("Shmuel", 13), Row("Blitz", 23))
val rdd = sc.parallelize(lst)

val df = sqlContext.createDataFrame(rdd, schema)

df.withColumn("newName", concat($"name", lit("abc"))).show()

On Mon, Mar 26, 2018 at 6:36 AM, 1427357...@qq.com <1427357...@qq.com> wrote:
Hi  all,

I have a table like below:

+---+---------+-----------+
| id|     name|sharding_id|
+---+---------+-----------+
|  1|leader us|          1|
|  3|    mycat|          1|
+---+---------+-----------+

My schema is :
root
 |-- id: integer (nullable = false)
 |-- name: string (nullable = true)
 |-- sharding_id: integer (nullable = false)

I want to add a new column named newName. The new column is based on "name" with
"abc" appended after it. My code looks like:

stud_scoreDF.withColumn("newName", stud_scoreDF.col("name") + "abc").show()

When I run the code, I got this result:
+---+---------+-----------+-------+
| id|     name|sharding_id|newName|
+---+---------+-----------+-------+
|  1|leader us|          1|   null|
|  3|    mycat|          1|   null|
+---+---------+-----------+-------+


I checked the code; the key code is in arithmetic.scala, line 165.
It looks like:

override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = dataType match {
  case dt: DecimalType =>
    defineCodeGen(ctx, ev, (eval1, eval2) => s"$eval1.$$plus($eval2)")
  case ByteType | ShortType =>
    defineCodeGen(ctx, ev,
      (eval1, eval2) => s"(${ctx.javaType(dataType)})($eval1 $symbol $eval2)")
  case CalendarIntervalType =>
    defineCodeGen(ctx, ev, (eval1, eval2) => s"$eval1.add($eval2)")
  case _ =>
    defineCodeGen(ctx, ev, (eval1, eval2) => s"$eval1 $symbol $eval2")
}

My issue is: can we add a StringType case to this class to support string append, please?





1427357...@qq.com



-- 
Shmuel Blitz 
Big Data Developer 
Email: shmuel.bl...@similarweb.com 
www.similarweb.com 


the issue about the + in column,can we support the string please?

2018-03-25 Thread 1427357...@qq.com
Hi  all,

I have a table like below:

+---+---------+-----------+
| id|     name|sharding_id|
+---+---------+-----------+
|  1|leader us|          1|
|  3|    mycat|          1|
+---+---------+-----------+

My schema is :
root
 |-- id: integer (nullable = false)
 |-- name: string (nullable = true)
 |-- sharding_id: integer (nullable = false)

I want to add a new column named newName. The new column is based on "name" with
"abc" appended after it. My code looks like:

stud_scoreDF.withColumn("newName", stud_scoreDF.col("name") + "abc").show()

When I run the code, I got this result:
+---+---------+-----------+-------+
| id|     name|sharding_id|newName|
+---+---------+-----------+-------+
|  1|leader us|          1|   null|
|  3|    mycat|          1|   null|
+---+---------+-----------+-------+


I checked the code; the key code is in arithmetic.scala, line 165.
It looks like:

override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = dataType match {
  case dt: DecimalType =>
    defineCodeGen(ctx, ev, (eval1, eval2) => s"$eval1.$$plus($eval2)")
  case ByteType | ShortType =>
    defineCodeGen(ctx, ev,
      (eval1, eval2) => s"(${ctx.javaType(dataType)})($eval1 $symbol $eval2)")
  case CalendarIntervalType =>
    defineCodeGen(ctx, ev, (eval1, eval2) => s"$eval1.add($eval2)")
  case _ =>
    defineCodeGen(ctx, ev, (eval1, eval2) => s"$eval1 $symbol $eval2")
}

My issue is: can we add a StringType case to this class to support string append, please?





1427357...@qq.com


the meaning of "samplePointsPerPartitionHint" in RangePartitioner

2018-03-20 Thread 1427357...@qq.com
Hi all,

Below is the signature of RangePartitioner:

class RangePartitioner[K : Ordering : ClassTag, V](
    partitions: Int,
    rdd: RDD[_ <: Product2[K, V]],
    private var ascending: Boolean = true,
    val samplePointsPerPartitionHint: Int = 20)
I feel puzzled about samplePointsPerPartitionHint.
My questions are:
What is samplePointsPerPartitionHint used for, please?
If I set samplePointsPerPartitionHint to 100 instead of 20, what will happen, please?

Thanks.

Robin Shao




1427357...@qq.com