PIG and Spark actually similar?

Yang Sun, 19 Jul 2015 14:03:38 -0700

Spark is very hot right now, but after reading the paper, I found it
surprisingly similar to PIG in concept: an RDD is essentially a relation
(a set) in PIG's terminology.


I think a great strength of Spark is that it pipelines consecutive
"narrow dependency" transformations into a single stage to avoid excessive
IO. Does PIG do that too? Otherwise, I can't figure out what other major
design differences would lead to a huge performance difference, if Spark
also uses on-disk storage. The overhead of starting an MR task should not
be that big.
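
To make the "narrow dependency" point concrete, here is a rough sketch in
plain Python. It is not Spark or PIG code: the function names
(run_materialized, run_pipelined, and the three steps) are made up for
illustration, and an in-memory list stands in for a stage's output being
written out and read back. It only shows why chaining per-record steps in
one pass stores fewer intermediate records than running each step as its
own stage.

```python
# Toy model only: plain Python, not the Spark or Pig APIs.
# All names below are hypothetical and exist just for this illustration.

def run_materialized(records, steps):
    """Run each step as a separate stage and store its full output."""
    stored = 0
    data = list(records)
    for step in steps:
        # list(...) stands in for writing a stage's output and reading it back.
        data = list(step(data))
        stored += len(data)
    return data, stored

def run_pipelined(records, steps):
    """Chain the steps lazily so each record passes through all of them at once."""
    stream = iter(records)
    for step in steps:
        stream = step(stream)  # builds a generator; nothing is evaluated yet
    result = list(stream)      # only the final output is stored
    return result, len(result)

# Three per-record steps; each output record depends on one input record,
# which is the sense of "narrow dependency" assumed in this sketch.
def double(rows):
    return (r * 2 for r in rows)

def keep_multiples_of_three(rows):
    return (r for r in rows if r % 3 == 0)

def add_one(rows):
    return (r + 1 for r in rows)

STEPS = [double, keep_multiples_of_three, add_one]

staged, staged_cost = run_materialized(range(10), STEPS)
fused, fused_cost = run_pipelined(range(10), STEPS)
```

Both runs return the same records, but in this toy model the staged run
stores 18 intermediate records while the pipelined run stores only the 4
final ones. Whether that alone explains the performance gap in practice is
exactly the open question above.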
