Cannot find usage of the classTag variable defined in abstract class AtomicType in the Spark project

2016-08-15 Thread Andy Zhao
While reading the Spark source code, I came across an abstract class AtomicType.
It's defined like this:

protected[sql] abstract class AtomicType extends DataType {
  private[sql] type InternalType
  private[sql] val tag: TypeTag[InternalType]
  private[sql] val ordering: Ordering[InternalType]

  @transient private[sql] val classTag = ScalaReflectionLock.synchronized {
    val mirror = runtimeMirror(Utils.getSparkClassLoader)
    ClassTag[InternalType](mirror.runtimeClass(tag.tpe))
  }
}
I understand that classTag is used to get runtime type information for the type
member InternalType, but I cannot find where this classTag is used in any
subclass of AtomicType, nor can I find any other reference to this variable in
the Spark project.
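
To make the idiom concrete, here is a self-contained toy sketch (my own code,
not Spark's) of why such a derived ClassTag is useful: it lets generic code
build a correctly-typed JVM array of the abstract type member at runtime.

import scala.reflect.ClassTag
import scala.reflect.runtime.universe._

// Toy version of the same idiom: an abstract type member plus a
// TypeTag, from which a ClassTag is derived at runtime.
abstract class MyAtomicType {
  type InternalType
  val tag: TypeTag[InternalType]

  // lazy, so it is not evaluated before the subclass initialises `tag`
  lazy val classTag: ClassTag[InternalType] = {
    val mirror = runtimeMirror(getClass.getClassLoader)
    ClassTag[InternalType](mirror.runtimeClass(tag.tpe))
  }

  // The payoff: creating a JVM array of the abstract member type
  // requires a ClassTag, which we can now supply explicitly.
  def newArray(size: Int): Array[InternalType] = classTag.newArray(size)
}

object MyIntType extends MyAtomicType {
  type InternalType = Int
  val tag = typeTag[Int]
}

// MyIntType.newArray(4) really is an Array[Int] of length 4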

Can anyone tell me how this classTag is used in the Spark project?








Spark Streaming loses data when ReceiverTracker's BlockInfo write to HDFS times out

2016-07-26 Thread Andy Zhao
Hi guys, 

I wrote a Spark Streaming program that consumes 1000 messages from one Kafka
topic, applies some transformations, and writes the results back to another
topic, but I found only 988 messages in the second topic. I checked the logs
and confirmed that all messages were received by the receivers. However, I also
found an HDFS write-timeout message printed by the class BatchedWriteAheadLog.

I checked out the source code and found this:
  
/** Add received block. This event will get written to the write ahead log (if enabled). */
def addBlock(receivedBlockInfo: ReceivedBlockInfo): Boolean = {
  try {
    val writeResult = writeToLog(BlockAdditionEvent(receivedBlockInfo))
    if (writeResult) {
      synchronized {
        getReceivedBlockQueue(receivedBlockInfo.streamId) += receivedBlockInfo
      }
      logDebug(s"Stream ${receivedBlockInfo.streamId} received " +
        s"block ${receivedBlockInfo.blockStoreResult.blockId}")
    } else {
      logDebug(s"Failed to acknowledge stream ${receivedBlockInfo.streamId} receiving " +
        s"block ${receivedBlockInfo.blockStoreResult.blockId} in the Write Ahead Log.")
    }
    writeResult
  } catch {
    case NonFatal(e) =>
      logError(s"Error adding block $receivedBlockInfo", e)
      false
  }
}


It seems that ReceiverTracker tried to write the block info to HDFS, but the
write timed out, so writeToLog returned false and the line
"getReceivedBlockQueue(receivedBlockInfo.streamId) += receivedBlockInfo"
was skipped. As a result, the block info was lost.

The Spark version I use is 1.6.1, and I did not turn on
spark.streaming.receiver.writeAheadLog.enable.
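
For reference, a hedged sketch of how the WAL-related settings would be turned
on, as I understand the 1.6 code; the app name, checkpoint path, and timeout
value below are placeholders, not my real job:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf()
  .setAppName("kafka-roundtrip")
  // receiver-side write-ahead log for received blocks
  .set("spark.streaming.receiver.writeAheadLog.enable", "true")
  // driver-side BatchedWriteAheadLog write timeout, in milliseconds
  .set("spark.streaming.driver.writeAheadLog.batchingTimeout", "10000")

val ssc = new StreamingContext(conf, Seconds(5))
// WAL files are stored under the checkpoint directory
ssc.checkpoint("hdfs:///user/andy/streaming-checkpoints")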

I want to know whether this is the intended behaviour.

Thanks
  







Spark SQL OOM

2015-10-14 Thread Andy Zhao
Hi guys, 

I'm testing Spark SQL 1.5.1 on hadoop-2.5.0-cdh5.3.2.
A query that runs successfully in Hive fails when I run it with Spark SQL,
with the following error:

[the error output was attached as an inline image and is not preserved in
this archive]

Reading the source code, it seems the compute method of HadoopRDD is called an
unbounded number of times; every call allocates new instances on the heap,
which eventually causes the OOM.

Has anyone seen the same problem?

Thanks






Re: Spark SQL OOM

2015-10-14 Thread Andy Zhao
I increased executor memory from 6g to 10g, but the job still failed with the
same error. Because of my company's security policy, I cannot post the SQL.
I am sure, though, that the error occurred in the compute method of HadoopRDD,
on one of the executors.
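
For completeness, the only setting I changed between the two failing runs
(a sketch; the app name is a placeholder):

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("sql-oom-repro")
  .set("spark.executor.memory", "10g") // was 6g on the first attempt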







Job hangs when running random forest

2015-07-28 Thread Andy Zhao
Hi guys,

A job hung for about 16 hours when I ran the random forest algorithm, and I
don't know why.
I use Spark 1.4.0 on YARN. Here is the code (attached as an image):
http://apache-spark-user-list.1001560.n3.nabble.com/file/n24047/1.png

And the following picture is from the Spark UI:
http://apache-spark-user-list.1001560.n3.nabble.com/file/n24047/2.png
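
Since the code is only attached as a screenshot, here is a minimal sketch of
the kind of MLlib 1.4 random-forest job I am running; the input path and all
parameter values below are placeholders, not my real settings:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.tree.RandomForest
import org.apache.spark.mllib.util.MLUtils

object RandomForestDemo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("rf-demo"))

    // LIBSVM-format training data; the path is hypothetical
    val data = MLUtils.loadLibSVMFile(sc, "hdfs:///data/train.libsvm")

    val model = RandomForest.trainClassifier(
      data,
      numClasses = 2,
      categoricalFeaturesInfo = Map[Int, Int](), // all features continuous
      numTrees = 100,
      featureSubsetStrategy = "auto",
      impurity = "gini",
      maxDepth = 8,
      maxBins = 32,
      seed = 42)

    // print the first part of the learned forest
    println(model.toDebugString.take(500))
    sc.stop()
  }
}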
Can anybody help?


Thanks






Job hangs when running random forest

2015-07-28 Thread Andy Zhao
Hi guys,

When I run the random forest algorithm, a job hangs for 15.8 h, and I cannot
figure out why.
Here is the code (attached as an image):
http://apache-spark-user-list.1001560.n3.nabble.com/file/n24046/%E5%B1%8F%E5%B9%95%E5%BF%AB%E7%85%A7_2015-07-29_%E4%B8%8A%E5%8D%8810.png

And I use Spark 1.4.0 on YARN; here is the Spark UI:
http://apache-spark-user-list.1001560.n3.nabble.com/file/n24046/%E5%B1%8F%E5%B9%95%E5%BF%AB%E7%85%A7_2015-07-29_%E4%B8%8A%E5%8D%8810.png

Can anyone help?






How can I add my custom Rule to Spark SQL?

2015-04-13 Thread Andy Zhao
Hi guys,

I want to add my own custom Rules (whatever the rules do) to run while the SQL
logical plan is being analysed.
Is there a way to do that from Spark application code? (A toy sketch of the
kind of rule I mean is below.)
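
A minimal toy sketch of such a Catalyst rule (an identity rule that rewrites
nothing; registering it with the analyzer is the part I am asking about):

import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.catalyst.rules.Rule

object MyCustomRule extends Rule[LogicalPlan] {
  override def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    // a real rule would match specific plan nodes here
    case node => node
  }
}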


Thanks


