Re: ORC Transaction Table - Spark
Are there any plans to include it in a future release of Spark?

Regards,
Aviral Agarwal

On Thu, Aug 24, 2017 at 3:11 PM, Akhil Das wrote:
> How are you reading the data? It's clearly saying
> *java.lang.NumberFormatException: For input string: "0645253_0001"*
>
> On Tue, Aug 22, 2017 at 7:40 PM, Aviral Agarwal wrote:
>
>> Hi,
>>
>> I am trying to read a Hive ORC transactional table through Spark, but I am
>> getting the following error:
>>
>> Caused by: java.lang.RuntimeException: serious problem
>>     at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1021)
>>     at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1048)
>>     at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:202)
>>     ...
>> Caused by: java.util.concurrent.ExecutionException: java.lang.NumberFormatException: For input string: "0645253_0001"
>>     at java.util.concurrent.FutureTask.report(FutureTask.java:122)
>>     at java.util.concurrent.FutureTask.get(FutureTask.java:192)
>>     at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:998)
>>     ... 118 more
>>
>> Any help would be appreciated.
>>
>> Thanks and Regards,
>> Aviral Agarwal
>
> --
> Cheers!
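[Editor's note, not part of the original thread: Spark's built-in ORC reader in this era could not read Hive transactional (ACID) tables directly — OrcInputFormat's split generation expects plain numeric bucket file names and chokes on the delta-file naming produced by ACID writes, which is consistent with the `NumberFormatException: For input string: "0645253_0001"` above. A commonly suggested workaround was to run a major compaction in Hive first, then read the compacted table from Spark. The sketch below assumes a hypothetical transactional table `db.tx_table` and a Spark 2.x session.]

```scala
// Hedged sketch of the compaction workaround (table name is invented).
//
// Step 1 -- in Hive (Beeline/Hive CLI), rewrite the delta files into a
// plain base that Spark's ORC reader can handle:
//
//   ALTER TABLE db.tx_table COMPACT 'major';
//   SHOW COMPACTIONS;   -- poll until the compaction shows as finished
//
// Step 2 -- then read the table from Spark as usual:
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("read-compacted-acid-table")
  .enableHiveSupport()
  .getOrCreate()

val df = spark.table("db.tx_table")
df.show()
```

Rows written after the compaction land in new delta files, so the read may need to be repeated after subsequent compactions; direct ACID reads only arrived later via the Hive Warehouse Connector.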
Fwd: ORC Transaction Table - Spark
Hi,

I am trying to read a Hive ORC transactional table through Spark, but I am getting the following error:

Caused by: java.lang.RuntimeException: serious problem
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1021)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1048)
    at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:202)
    ...
Caused by: java.util.concurrent.ExecutionException: java.lang.NumberFormatException: For input string: "0645253_0001"
    at java.util.concurrent.FutureTask.report(FutureTask.java:122)
    at java.util.concurrent.FutureTask.get(FutureTask.java:192)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:998)
    ... 118 more

Any help would be appreciated.

Thanks and Regards,
Aviral Agarwal
Re: JDBC RDD Timestamp Parsing Issue
This works. Thanks!

- Aviral Agarwal

On Wed, Jun 21, 2017 at 6:07 PM, Eduardo Mello wrote:
> You can add "?zeroDateTimeBehavior=convertToNull" to the connection
> string.
>
> On Wed, Jun 21, 2017 at 9:04 AM, Aviral Agarwal wrote:
>
>> The exception is happening in the JDBC RDD code where getNext() is called
>> to get the next row. I do not have access to the result set; I am
>> operating on a DataFrame.
>>
>> Thanks and Regards,
>> Aviral Agarwal
>>
>> On Jun 21, 2017 17:19, "Mahesh Sawaiker" wrote:
>>
>>> This has to do with how you are creating the timestamp object from the
>>> result set (I guess).
>>>
>>> If you can provide more code it will help, but you could surround the
>>> parsing code with a try/catch and then just ignore the exception.
>>>
>>> *From:* Aviral Agarwal [mailto:aviral12...@gmail.com]
>>> *Sent:* Wednesday, June 21, 2017 2:37 PM
>>> *To:* user@spark.apache.org
>>> *Subject:* JDBC RDD Timestamp Parsing Issue
>>>
>>> Hi,
>>>
>>> I am using a JDBC RDD to read from a MySQL RDBMS.
>>> My Spark job fails with the error below:
>>>
>>> java.sql.SQLException: Value '0000-00-00 00:00:00.000' can not be
>>> represented as java.sql.Timestamp
>>>
>>> Now, instead of the whole job failing, I want to skip this record and
>>> continue processing the rest.
>>> Any leads on how that can be done?
>>>
>>> Thanks and Regards,
>>> Aviral Agarwal
>>>
>>> DISCLAIMER
>>> ==========
>>> This e-mail may contain privileged and confidential information which is
>>> the property of Persistent Systems Ltd. It is intended only for the use
>>> of the individual or entity to which it is addressed. If you are not the
>>> intended recipient, you are not authorized to read, retain, copy, print,
>>> distribute or use this message. If you have received this communication
>>> in error, please notify the sender and delete all copies of this
>>> message. Persistent Systems Ltd. does not accept any liability for virus
>>> infected mails.
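[Editor's note, not part of the original thread: the fix that worked here is a MySQL Connector/J connection property, not a Spark setting — `zeroDateTimeBehavior=convertToNull` makes the driver return NULL for MySQL's '0000-00-00' zero dates instead of throwing while materializing the row. A sketch of how it slots into a Spark JDBC read; the host, database, table, and credentials are invented placeholders.]

```scala
// Hedged sketch: reading over JDBC with the zero-date workaround applied.
// All connection details below are hypothetical.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("jdbc-read").getOrCreate()

val df = spark.read
  .format("jdbc")
  // The key part: zeroDateTimeBehavior=convertToNull appended to the URL,
  // so '0000-00-00 00:00:00' columns come back as SQL NULL.
  .option("url", "jdbc:mysql://dbhost:3306/mydb?zeroDateTimeBehavior=convertToNull")
  .option("dbtable", "events")        // hypothetical table name
  .option("user", "reader")
  .option("password", "secret")
  .load()

// Rows with zero dates now have NULL in the timestamp column and can be
// filtered or logged instead of killing the job.
val skipped = df.filter(df("created_at").isNull)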
RE: JDBC RDD Timestamp Parsing Issue
The exception is happening in the JDBC RDD code where getNext() is called to get the next row. I do not have access to the result set; I am operating on a DataFrame.

Thanks and Regards,
Aviral Agarwal

On Jun 21, 2017 17:19, "Mahesh Sawaiker" wrote:
> This has to do with how you are creating the timestamp object from the
> result set (I guess).
>
> If you can provide more code it will help, but you could surround the
> parsing code with a try/catch and then just ignore the exception.
>
> *From:* Aviral Agarwal [mailto:aviral12...@gmail.com]
> *Sent:* Wednesday, June 21, 2017 2:37 PM
> *To:* user@spark.apache.org
> *Subject:* JDBC RDD Timestamp Parsing Issue
>
> Hi,
>
> I am using a JDBC RDD to read from a MySQL RDBMS.
> My Spark job fails with the error below:
>
> java.sql.SQLException: Value '0000-00-00 00:00:00.000' can not be
> represented as java.sql.Timestamp
>
> Now, instead of the whole job failing, I want to skip this record and
> continue processing the rest.
> Any leads on how that can be done?
>
> Thanks and Regards,
> Aviral Agarwal
JDBC RDD Timestamp Parsing Issue
Hi,

I am using a JDBC RDD to read from a MySQL RDBMS. My Spark job fails with the error below:

java.sql.SQLException: Value '0000-00-00 00:00:00.000' can not be represented as java.sql.Timestamp

Now, instead of the whole job failing, I want to skip this record and continue processing the rest. Any leads on how that can be done?

Thanks and Regards,
Aviral Agarwal
[SparkSQL] Project using NamedExpression
Hi guys,

I want to transform a Row using NamedExpression. Below is the code snippet that I am using:

def apply(dataFrame: DataFrame, selectExpressions: java.util.List[String]): RDD[UnsafeRow] = {
  val exprArray = selectExpressions.map(s =>
    Column(SqlParser.parseExpression(s)).named
  )
  val inputSchema = dataFrame.logicalPlan.output
  val transformedRDD = dataFrame.mapPartitions(iter => {
    val project = UnsafeProjection.create(exprArray, inputSchema)
    iter.map { row =>
      project(InternalRow.fromSeq(row.toSeq))
    }
  })
  transformedRDD
}

The problem is that the expression becomes unevaluable:

Caused by: java.lang.UnsupportedOperationException: Cannot evaluate expression: 'a
    at org.apache.spark.sql.catalyst.expressions.Unevaluable$class.genCode(Expression.scala:233)
    at org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute.genCode(unresolved.scala:53)
    at org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$gen$2.apply(Expression.scala:106)
    at org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$gen$2.apply(Expression.scala:102)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.sql.catalyst.expressions.Expression.gen(Expression.scala:102)
    at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenContext$$anonfun$generateExpressions$1.apply(CodeGenerator.scala:464)
    at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenContext$$anonfun$generateExpressions$1.apply(CodeGenerator.scala:464)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
    at scala.collection.AbstractTraversable.map(Traversable.scala:105)
    at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenContext.generateExpressions(CodeGenerator.scala:464)
    at org.apache.spark.sql.catalyst.expressions.codegen.GenerateUnsafeProjection$.createCode(GenerateUnsafeProjection.scala:281)
    at org.apache.spark.sql.catalyst.expressions.codegen.GenerateUnsafeProjection$.create(GenerateUnsafeProjection.scala:324)
    at org.apache.spark.sql.catalyst.expressions.codegen.GenerateUnsafeProjection$.create(GenerateUnsafeProjection.scala:317)
    at org.apache.spark.sql.catalyst.expressions.codegen.GenerateUnsafeProjection$.create(GenerateUnsafeProjection.scala:32)
    at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator.generate(CodeGenerator.scala:635)
    at org.apache.spark.sql.catalyst.expressions.UnsafeProjection$.create(Projection.scala:125)
    at org.apache.spark.sql.catalyst.expressions.UnsafeProjection$.create(Projection.scala:135)
    at org.apache.spark.sql.ScalaTransform$$anonfun$3.apply(ScalaTransform.scala:31)
    at org.apache.spark.sql.ScalaTransform$$anonfun$3.apply(ScalaTransform.scala:30)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
    at org.apache.spark.scheduler.Task.run(Task.scala:89)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

This might be because the Expression is unresolved. Any help would be appreciated.

Thanks and Regards,
Aviral Agarwal
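[Editor's note, not part of the original thread: the `UnresolvedAttribute` frame in the stack trace confirms the poster's suspicion — `SqlParser.parseExpression` only parses; it never runs the analyzer, so `'a` stays an `UnresolvedAttribute`, which is `Unevaluable` and cannot be code-generated. A hedged sketch of one way around this: route the expression strings through `selectExpr` so the analyzer resolves them against the DataFrame's schema, then take the already-projected internal rows from `queryExecution.toRdd`. This leans on an internal API (`queryExecution`) and may differ across Spark versions.]

```scala
// Sketch under the assumptions above; relies on internal Spark APIs.
import scala.collection.JavaConverters._
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.catalyst.InternalRow

def project(dataFrame: DataFrame,
            selectExpressions: java.util.List[String]): RDD[InternalRow] = {
  // selectExpr runs the strings through the parser *and* the analyzer,
  // so 'a becomes a resolved AttributeReference rather than an
  // UnresolvedAttribute, and codegen can proceed.
  dataFrame
    .selectExpr(selectExpressions.asScala: _*)
    .queryExecution
    .toRdd // RDD[InternalRow]; rows are produced by an UnsafeProjection internally
}
```

This avoids hand-building the `UnsafeProjection` entirely, which also sidesteps the `InternalRow.fromSeq(row.toSeq)` round-trip in the original snippet.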
Spark SQL Skip and Log bad records
Hi guys,

Is there a way to skip some bad records and log them when using the DataFrame API?

Thanks and Regards,
Aviral Agarwal
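[Editor's note, not part of the original thread: for file-based sources (JSON/CSV) one common approach is PERMISSIVE parse mode combined with `columnNameOfCorruptRecord`, which keeps malformed lines in a dedicated column instead of failing the job; the bad rows can then be split off and written somewhere for inspection. The paths and column name below are invented for illustration.]

```scala
// Hedged sketch: tolerate and log malformed JSON records (paths are made up).
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("skip-bad-records").getOrCreate()

val parsed = spark.read
  .option("mode", "PERMISSIVE")                          // keep going on bad rows
  .option("columnNameOfCorruptRecord", "_corrupt_record") // park raw bad lines here
  .json("/data/input.json")

// Split good rows from bad ones and "log" the bad ones to a side output.
val bad  = parsed.filter(parsed("_corrupt_record").isNotNull)
bad.write.mode("overwrite").json("/data/bad-records")

val good = parsed
  .filter(parsed("_corrupt_record").isNull)
  .drop("_corrupt_record")
```

For non-file sources the same effect can usually be had by mapping rows through a `scala.util.Try` and filtering out the failures, at the cost of doing the parsing yourself.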
Mismatched datatype in Case statement
Hi,

I was trying Spark version 1.6.0 when I ran into the error described in the following Hive JIRA:
https://issues.apache.org/jira/browse/HIVE-5825

The error occurred in both cases: using SQLContext as well as HiveContext.

Is there any indication whether this has been fixed in a later Spark version? If yes, which version?

Thanks and Regards,
Aviral Agarwal
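[Editor's note, not part of the original thread: HIVE-5825 concerns CASE expressions whose branches return mismatched types. Independent of whether a given Spark version resolves the branch types automatically, the usual workaround is to CAST every branch to one common type explicitly. The table and column names below are invented for illustration.]

```scala
// Hedged sketch: make all CASE branches the same type via explicit CASTs.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("case-cast").getOrCreate()

// Without the CASTs, one branch yielding e.g. INT and another DOUBLE can
// trigger the type-mismatch error tracked in HIVE-5825 on affected versions.
val fixed = spark.sql("""
  SELECT CASE
           WHEN amount > 100 THEN CAST(amount AS DOUBLE)
           ELSE CAST(0 AS DOUBLE)
         END AS normalized_amount
  FROM transactions
""")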