How to filter the datetime in a dataset with Java code, please?
Hi all,

I want to filter the data by a datetime. In MySQL, the column is of type DATETIME and is named A. I wrote my code like this:

    import java.util.Date;

    newX.filter(newX.col("A").isNull().or(newX.col("A").lt(new Date()))).show();

I got this error:

    Exception in thread "main" java.lang.RuntimeException: Unsupported literal type class java.util.Date Wed Apr 11 16:17:31 CST 2018
        at org.apache.spark.sql.catalyst.expressions.Literal$.apply(literals.scala:77)
        at org.apache.spark.sql.catalyst.expressions.Literal$$anonfun$create$2.apply(literals.scala:163)
        at org.apache.spark.sql.catalyst.expressions.Literal$$anonfun$create$2.apply(literals.scala:163)
        at scala.util.Try.getOrElse(Try.scala:79)
        at org.apache.spark.sql.catalyst.expressions.Literal$.create(literals.scala:162)
        at org.apache.spark.sql.functions$.typedLit(functions.scala:113)
        at org.apache.spark.sql.functions$.lit(functions.scala:96)
        at org.apache.spark.sql.Column.$less(Column.scala:384)
        at org.apache.spark.sql.Column.lt(Column.scala:399)
        at Main.main(Main.java:38)

How should I filter the datetime in the dataset filter, please?

1427357...@qq.com
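The fix is to pass a literal type Spark knows how to convert. A minimal sketch, assuming newX is the Dataset<Row> from the question: java.sql.Timestamp maps to Spark's TimestampType (and java.sql.Date to DateType), while java.util.Date has no mapping, which is exactly what the exception reports.

    import java.sql.Timestamp;
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;

    // java.sql.Timestamp is a supported literal type; java.util.Date is not.
    Timestamp now = new Timestamp(System.currentTimeMillis());
    newX.filter(newX.col("A").isNull().or(newX.col("A").lt(now))).show();

A plain string such as "2018-04-11 16:17:31" also works as the comparison value, since Spark coerces it against the timestamp column.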
Re: Re: How to use the SQL join in Java, please?
Hi Yucai,

It works well now. Thanks.

1427357...@qq.com

From: Yu, Yucai
Date: 2018-04-11 16:01
To: 1427357...@qq.com; spark users
Subject: Re: How to use the SQL join in Java, please?

Do you really want to do a Cartesian product on those two tables? If yes, you can set spark.sql.crossJoin.enabled=true.

Thanks,
Yucai

From: "1427357...@qq.com" <1427357...@qq.com>
Date: Wednesday, April 11, 2018 at 3:16 PM
To: spark users
Subject: How to use the SQL join in Java, please?

Hi all,

I wrote Java code to join two tables. My code looks like this:

    SparkSession ss = SparkSession.builder().master("local[4]").appName("testSql").getOrCreate();
    Properties properties = new Properties();
    properties.put("user", "A");
    properties.put("password", "B");
    String url = "jdbc:mysql://xxx:/xxx?useUnicode=true&characterEncoding=gbk&zeroDateTimeBehavior=convertToNull&serverTimezone=UTC";

    Dataset<Row> data_busi_hour = ss.read().jdbc(url, "A", properties);
    data_busi_hour.show();
    // newemployee.printSchema();
    Dataset<Row> t_pro_ware_partner_rela = ss.read().jdbc(url, "B", properties);
    Dataset<Row> newX = t_pro_ware_partner_rela.join(data_busi_hour);
    newX.show();

I get an error like the one below:

    Exception in thread "main" org.apache.spark.sql.AnalysisException: Detected cartesian product for INNER join between logical plans
    Relation[ XXX FIRST_ORG_ARCHNAME#80,... 11 more fields] JDBCRelation(t_pro_ware_partner_rela) [numPartitions=1]
    and
    Relation[id#0L,project_code#1,project_name#2] JDBCRelation(data_busi_hour) [numPartitions=1]
    Join condition is missing or trivial. Use the CROSS JOIN syntax to allow cartesian products between these relations.;
        at org.apache.spark.sql.catalyst.optimizer.CheckCartesianProducts$$anonfun$apply$21.applyOrElse(Optimizer.scala:1124)
        at org.apache.spark.sql.catalyst.optimizer.CheckCartesianProducts$$anonfun$apply$21.applyOrElse(Optimizer.scala:1121)
        at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:267)
        at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:267)
        at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
        at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:266)
        at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
        at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
        at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
        at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
        at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
        at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)
        at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
        at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
        at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
        at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
        at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
        at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)
        at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
        at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
        at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
        at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
        at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
        at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)
        at org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:256)
        at org.apache.spark.sql.catalyst.optimizer.CheckCartesianProducts$.apply(Optimizer.scala:1121)
        at org.apache.spark.sql.catalyst.optimizer.CheckCartesianProducts$.apply(Optimizer.scala:1103)
        at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:87)
        at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:84)
        at scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:57)
        at scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:66)
        at scala.collection.mutable.WrappedArray.foldLeft(WrappedArray.scala:35)
        at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:84)
        at org.apache.spark.sql.catalyst.
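For reference, a minimal sketch of the fix Yucai suggests; setting the flag on the session builder is one option (it can also go into spark-defaults.conf, or be set on a live session via ss.conf().set(...)):

    import org.apache.spark.sql.SparkSession;

    // Allow the optimizer to accept joins without a join condition
    // (Cartesian products), as suggested in the reply above.
    SparkSession ss = SparkSession.builder()
            .master("local[4]")
            .appName("testSql")
            .config("spark.sql.crossJoin.enabled", "true")
            .getOrCreate();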
How to use the SQL join in Java, please?
    at org.apache.spark.sql.execution.QueryExecution.sparkPlan(QueryExecution.scala:68)
    at org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:77)
    at org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:77)
    at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3248)
    at org.apache.spark.sql.Dataset.head(Dataset.scala:2484)
    at org.apache.spark.sql.Dataset.take(Dataset.scala:2698)
    at org.apache.spark.sql.Dataset.showString(Dataset.scala:254)
    at org.apache.spark.sql.Dataset.show(Dataset.scala:723)
    at org.apache.spark.sql.Dataset.show(Dataset.scala:682)
    at org.apache.spark.sql.Dataset.show(Dataset.scala:691)

Tables A and B do not share a common column. What can I do, please?

QQ GROUP: 296020884
1427357...@qq.com
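Two idiomatic alternatives are worth noting; this is a sketch, and the join columns are hypothetical, since the two tables in the question share no column:

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;

    // Option 1: join on an explicit condition, if some pair of related columns
    // exists (the column names here are made up for illustration).
    Dataset<Row> joined = t_pro_ware_partner_rela.join(
            data_busi_hour,
            t_pro_ware_partner_rela.col("project_code")
                    .equalTo(data_busi_hour.col("project_code")));

    // Option 2: make the Cartesian product explicit; crossJoin (Spark 2.1+)
    // does not require spark.sql.crossJoin.enabled.
    Dataset<Row> product = t_pro_ware_partner_rela.crossJoin(data_busi_hour);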
Re: Re: The issue about the + in column, can we support the string, please?
Hi,

I checked the code, and it seems hard to change. In the current code, string + int is translated to double + double. If I changed string + int to string + string, it would be incompatible with older versions. Does anyone have a better idea about this issue, please?

1427357...@qq.com

From: Shmuel Blitz
Date: 2018-03-26 17:17
To: 1427357...@qq.com
CC: spark users; dev
Subject: Re: Re: The issue about the + in column, can we support the string, please?

I agree. Just pointed out the option, in case you missed it.

Cheers,
Shmuel

--
Shmuel Blitz
Big Data Developer
Email: shmuel.bl...@similarweb.com
www.similarweb.com
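The null result described in this thread can be reproduced and inspected directly. A minimal Java sketch, where df stands for any Dataset<Row> with a string column "name":

    import static org.apache.spark.sql.functions.lit;
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;

    // The analyzer resolves name + 'abc' by casting both sides to double;
    // a non-numeric string casts to null, so the sum is null on every row.
    Dataset<Row> out = df.withColumn("newName", df.col("name").plus(lit("abc")));
    out.show();
    // The inserted casts are visible in the analyzed plan:
    out.explain(true);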
Re: The issue about the + in column, can we support the string, please?
Hi,

Using concat is one way, but + is more intuitive and easier to understand.

1427357...@qq.com

From: Shmuel Blitz
Date: 2018-03-26 15:31
To: 1427357...@qq.com
CC: spark users; dev
Subject: Re: The issue about the + in column, can we support the string, please?

Hi,

You can get the same result with:

    import org.apache.spark.sql.Row
    import org.apache.spark.sql.functions._
    import sqlContext.implicits._
    import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

    val schema = StructType(Array(StructField("name", StringType), StructField("age", IntegerType)))
    val lst = List(Row("Shmuel", 13), Row("Blitz", 23))
    val rdd = sc.parallelize(lst)
    val df = sqlContext.createDataFrame(rdd, schema)
    df.withColumn("newName", concat($"name", lit("abc"))).show()

--
Shmuel Blitz
Big Data Developer
Email: shmuel.bl...@similarweb.com
www.similarweb.com
The issue about the + in column, can we support the string, please?
Hi all,

I have a table like below:

    +---+---------+-----------+
    | id|     name|sharding_id|
    +---+---------+-----------+
    |  1|leader us|          1|
    |  3|    mycat|          1|
    +---+---------+-----------+

My schema is:

    root
     |-- id: integer (nullable = false)
     |-- name: string (nullable = true)
     |-- sharding_id: integer (nullable = false)

I want to add a new column named newName, which is based on "name" with "abc" appended to it. My code looks like this:

    stud_scoreDF.withColumn("newName", stud_scoreDF.col("name") + "abc").show()

When I run the code, I get this result:

    +---+---------+-----------+-------+
    | id|     name|sharding_id|newName|
    +---+---------+-----------+-------+
    |  1|leader us|          1|   null|
    |  3|    mycat|          1|   null|
    +---+---------+-----------+-------+

I checked the code; the key part is in arithmetic.scala, at line 165. It looks like:

    override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = dataType match {
      case dt: DecimalType =>
        defineCodeGen(ctx, ev, (eval1, eval2) => s"$eval1.$$plus($eval2)")
      case ByteType | ShortType =>
        defineCodeGen(ctx, ev, (eval1, eval2) =>
          s"(${ctx.javaType(dataType)})($eval1 $symbol $eval2)")
      case CalendarIntervalType =>
        defineCodeGen(ctx, ev, (eval1, eval2) => s"$eval1.add($eval2)")
      case _ =>
        defineCodeGen(ctx, ev, (eval1, eval2) => s"$eval1 $symbol $eval2")
    }

My issue is: can we add a case StringType in this class to support string append, please?

1427357...@qq.com
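For completeness, the supported way to do the append from Java, mirroring Shmuel's Scala answer; a sketch assuming stud_scoreDF is the DataFrame from the question:

    import static org.apache.spark.sql.functions.concat;
    import static org.apache.spark.sql.functions.lit;
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;

    // concat keeps both operands as StringType, so no numeric cast is inserted
    // and the result is "leader usabc" / "mycatabc" rather than null.
    Dataset<Row> withNewName =
            stud_scoreDF.withColumn("newName", concat(stud_scoreDF.col("name"), lit("abc")));
    withNewName.show();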
The meaning of "samplePointsPerPartitionHint" in RangePartitioner
Hi all,

Below is the signature of RangePartitioner:

    class RangePartitioner[K : Ordering : ClassTag, V](
        partitions: Int,
        rdd: RDD[_ <: Product2[K, V]],
        private var ascending: Boolean = true,
        val samplePointsPerPartitionHint: Int = 20)

I am puzzled by samplePointsPerPartitionHint. My questions are: what is samplePointsPerPartitionHint used for, please? And what will happen if I set samplePointsPerPartitionHint to 100 instead of 20, please?

Thanks.
Robin Shao
1427357...@qq.com
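As far as I can tell from the implementation, RangePartitioner samples the input RDD to estimate the key range boundaries that make the output partitions roughly equal in size, and samplePointsPerPartitionHint is the number of sample points it tries to collect per output partition (the total sample size is also capped). Setting it to 100 instead of 20 means a larger sample, hence more accurate boundaries and more evenly sized partitions, at the cost of a more expensive sampling pass. A Java sketch that exercises a RangePartitioner indirectly, since sortByKey constructs one internally with the default hint:

    import java.util.Arrays;
    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import scala.Tuple2;

    public class RangePartitionerDemo {
        public static void main(String[] args) {
            JavaSparkContext sc = new JavaSparkContext("local[4]", "rangePartitionerDemo");
            JavaPairRDD<Integer, String> pairs = sc.parallelizePairs(Arrays.asList(
                    new Tuple2<>(3, "c"), new Tuple2<>(1, "a"), new Tuple2<>(2, "b")));
            // sortByKey range-partitions its input into 2 partitions; the partition
            // boundaries come from the sampling controlled by the hint.
            System.out.println(pairs.sortByKey(true, 2).collect());
            sc.close();
        }
    }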