Hello Kevin,

You can take a look at our generic load function
<https://spark.apache.org/docs/1.3.0/sql-programming-guide.html#generic-loadsave-functions>.
For example, you can use

    val df = sqlContext.load("/myData", "parquet")

to load a Parquet dataset stored in "/myData" as a DataFrame
<https://spark.apache.org/docs/1.3.0/sql-programming-guide.html#dataframes>.
You can use it to load data stored in various formats, such as JSON (Spark
built-in), Parquet (Spark built-in), Avro
<https://github.com/databricks/spark-avro>, and CSV
<https://github.com/databricks/spark-csv>.

Thanks,

Yin

On Mon, Mar 23, 2015 at 7:14 PM, Dai, Kevin <yun...@ebay.com> wrote:

> Hi, Paul
>
> You are right.
>
> The story is that we have a lot of Pig load functions to load our
> different data.
>
> And now we want to use Spark to read and process these data.
>
> So we want to figure out a way to reuse our existing load functions in
> Spark to read these data.
>
> Any idea?
>
> Best Regards,
> Kevin
>
> *From:* Paul Brown [mailto:p...@mult.ifario.us]
> *Sent:* March 24, 2015, 4:11
> *To:* Dai, Kevin
> *Subject:* Re: Use pig load function in spark
>
> The answer is "Maybe, but you probably don't want to do that."
>
> A typical Pig load function is devoted to bridging external data into
> Pig's type system, but you don't really need to do that in Spark because
> it is (thankfully) not encumbered by Pig's type system. What you probably
> want to do is figure out a way to use native Spark facilities (e.g.,
> textFile) coupled with some of the logic from your Pig load function
> needed to turn your external data into an RDD.
>
> —
> p...@mult.ifario.us | Multifarious, Inc. | http://mult.ifario.us/
>
> On Mon, Mar 23, 2015 at 2:29 AM, Dai, Kevin <yun...@ebay.com> wrote:
>
> Hi, all
>
> Can Spark use Pig's load function to load data?
>
> Best Regards,
> Kevin
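
[For reference, the generic load path Yin describes above can be sketched as follows in the Spark 1.3-era API. This is a minimal illustration, not runnable standalone: it assumes a local Spark 1.3 installation, and the paths "/myJson", "/myAvro", and "/myCsv" are hypothetical stand-ins alongside the "/myData" path from the thread. The Avro and CSV loads additionally require the spark-avro and spark-csv packages on the classpath.]

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    // Minimal setup using the pre-2.0 entry points.
    val sc = new SparkContext(
      new SparkConf().setAppName("load-example").setMaster("local[*]"))
    val sqlContext = new SQLContext(sc)

    // Built-in formats: the second argument names the data source.
    val parquetDf = sqlContext.load("/myData", "parquet")
    val jsonDf    = sqlContext.load("/myJson", "json")

    // External data sources are addressed by their package name,
    // e.g. the Databricks spark-avro and spark-csv connectors.
    val avroDf = sqlContext.load("/myAvro", "com.databricks.spark.avro")
    val csvDf  = sqlContext.load("/myCsv", "com.databricks.spark.csv")

    parquetDf.printSchema()

[Note that sqlContext.load was deprecated in Spark 1.4 in favor of the sqlContext.read.format(...).load(...) builder, so code written against newer versions should use that form instead.]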