Re: Hadoop Input Format - newAPIHadoopFile
Here is a tutorial on how to write your own file format in Hadoop: https://developer.yahoo.com/hadoop/tutorial/module5.html#fileformat. Once you have your own file format, you can use it the same way as TextInputFormat in Spark, as you have done in this post.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Hadoop-Input-Format-newAPIHadoopFile-tp2860p10762.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
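For reference, a custom format along the lines of that tutorial boils down to subclassing FileInputFormat and supplying a RecordReader. Below is a minimal sketch against the new (org.apache.hadoop.mapreduce) API; the class name StockInputFormat is illustrative only, not from the tutorial:

```scala
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.{InputSplit, RecordReader, TaskAttemptContext}
import org.apache.hadoop.mapreduce.lib.input.{FileInputFormat, LineRecordReader}

// Hypothetical "stock" format: delegates to LineRecordReader, so each
// record is one line of the file, keyed by its byte offset. A real format
// would parse the line into a domain-specific Writable instead.
class StockInputFormat extends FileInputFormat[LongWritable, Text] {
  override def createRecordReader(
      split: InputSplit,
      context: TaskAttemptContext): RecordReader[LongWritable, Text] =
    new LineRecordReader()
}
```

Compile this into a jar, put it on the Spark classpath, and it can be used exactly like TextInputFormat, e.g. sc.newAPIHadoopFile[LongWritable, Text, StockInputFormat](path).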
Re: Hadoop Input Format - newAPIHadoopFile
Thanks, it worked. A very basic question: I have created a custom InputFormat, e.g. for stock data. How do I refer to this class as the custom InputFormat? That is, where do I keep this class on the Linux file system? Do I need to add its jar, and if so, how? I am running the code through spark-shell.

Thanks
Pari

On 19-Mar-2014 7:35 pm, "Shixiong Zhu" wrote:
> The correct import statement is "import
> org.apache.hadoop.mapreduce.lib.input.TextInputFormat".
>
> Best Regards,
> Shixiong Zhu
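On the question of where to keep the class: it does not need to live in any special directory. You package it into a jar and tell spark-shell about that jar. How you do so depends on the Spark version; a sketch with myformats.jar as a placeholder name:

```shell
# Spark 1.0+: pass the jar on the command line
./bin/spark-shell --jars /path/to/myformats.jar

# Spark 0.9.x: use the ADD_JARS environment variable (and SPARK_CLASSPATH
# for the driver) before launching the shell
ADD_JARS=/path/to/myformats.jar SPARK_CLASSPATH=/path/to/myformats.jar ./bin/spark-shell
```

After the shell starts, the custom InputFormat class can be imported and used in newAPIHadoopFile like any built-in format.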
Re: Hadoop Input Format - newAPIHadoopFile
The correct import statement is "import org.apache.hadoop.mapreduce.lib.input.TextInputFormat".

Best Regards,
Shixiong Zhu

2014-03-19 18:46 GMT+08:00 Pariksheet Barapatre :
> Seems like an import issue; it ran with hadoopFile and it worked. I am not
> finding the import statement for the TextInputFormat class location for
> the new API.
>
> Can anybody help?
>
> Thanks
> Pariksheet
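Putting that import together with the original snippet, the working spark-shell session should look roughly like this (the three-type-parameter overload of newAPIHadoopFile resolves via implicit ClassTags, so no explicit Class arguments are needed):

```scala
// New-API TextInputFormat, as required by newAPIHadoopFile
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat
import org.apache.hadoop.io.{LongWritable, Text}

val file2 = sc.newAPIHadoopFile[LongWritable, Text, TextInputFormat](
  "hdfs://192.168.100.130:8020/user/hue/pig/examples/data/sonnets.txt")

// Text is a mutable, reused Writable, so convert values to plain Strings
// before collecting or caching them
file2.map(_._2.toString).take(5).foreach(println)
```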
Re: Hadoop Input Format - newAPIHadoopFile
Seems like an import issue; it ran with hadoopFile and it worked. I am not finding the import statement for the TextInputFormat class location for the new API.

Can anybody help?

Thanks
Pariksheet

On 19 March 2014 16:05, Bertrand Dechoux wrote:
> I don't know the Spark issue but the Hadoop context is clear.
>
> old api -> org.apache.hadoop.mapred
> new api -> org.apache.hadoop.mapreduce
>
> You might only need to change your import.
>
> Regards
>
> Bertrand

--
Cheers,
Pari
Re: Hadoop Input Format - newAPIHadoopFile
I don't know the Spark issue but the Hadoop context is clear.

old api -> org.apache.hadoop.mapred
new api -> org.apache.hadoop.mapreduce

You might only need to change your import.

Regards

Bertrand

On Wed, Mar 19, 2014 at 11:29 AM, Pariksheet Barapatre wrote:
> Trying to read an HDFS file with TextInputFormat:
>
> scala> import org.apache.hadoop.mapred.TextInputFormat
> scala> import org.apache.hadoop.io.{LongWritable, Text}
> scala> val file2 = sc.newAPIHadoopFile[LongWritable,Text,TextInputFormat]("hdfs://192.168.100.130:8020/user/hue/pig/examples/data/sonnets.txt")
>
> This gives an error: the type arguments conform to the bounds of none of
> the overloaded alternatives of newAPIHadoopFile.
>
> Thanks
> Pariksheet
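Concretely, the two Hadoop APIs pair with different SparkContext methods, so the import must match the call. A sketch of both, assuming sc is a live SparkContext and hdfs://host:8020/path is a placeholder:

```scala
import org.apache.hadoop.io.{LongWritable, Text}

// old API: org.apache.hadoop.mapred pairs with sc.hadoopFile
import org.apache.hadoop.mapred.{TextInputFormat => OldTextInputFormat}
val oldRdd = sc.hadoopFile[LongWritable, Text, OldTextInputFormat](
  "hdfs://host:8020/path")

// new API: org.apache.hadoop.mapreduce.lib.input pairs with sc.newAPIHadoopFile
import org.apache.hadoop.mapreduce.lib.input.{TextInputFormat => NewTextInputFormat}
val newRdd = sc.newAPIHadoopFile[LongWritable, Text, NewTextInputFormat](
  "hdfs://host:8020/path")
```

Passing an old-API format to newAPIHadoopFile (or vice versa) fails the F <: InputFormat[K,V] bound, which is exactly the "conform to the bounds of none of the overloaded alternatives" error in this thread.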
Hadoop Input Format - newAPIHadoopFile
Hi,

Trying to read an HDFS file with TextInputFormat:

scala> import org.apache.hadoop.mapred.TextInputFormat
scala> import org.apache.hadoop.io.{LongWritable, Text}
scala> val file2 = sc.newAPIHadoopFile[LongWritable,Text,TextInputFormat]("hdfs://192.168.100.130:8020/user/hue/pig/examples/data/sonnets.txt")

This is giving me the error:

:14: error: type arguments [org.apache.hadoop.io.LongWritable,org.apache.hadoop.io.Text,org.apache.hadoop.mapred.TextInputFormat] conform to the bounds of none of the overloaded alternatives of value newAPIHadoopFile:
  [K, V, F <: org.apache.hadoop.mapreduce.InputFormat[K,V]](path: String, fClass: Class[F], kClass: Class[K], vClass: Class[V], conf: org.apache.hadoop.conf.Configuration)org.apache.spark.rdd.RDD[(K, V)]
  [K, V, F <: org.apache.hadoop.mapreduce.InputFormat[K,V]](path: String)(implicit km: scala.reflect.ClassTag[K], implicit vm: scala.reflect.ClassTag[V], implicit fm: scala.reflect.ClassTag[F])org.apache.spark.rdd.RDD[(K, V)]
       val file2 = sc.newAPIHadoopFile[LongWritable,Text,TextInputFormat]("hdfs://192.168.100.130:8020/user/hue/pig/examples/data/sonnets.txt")

What is the correct syntax if I want to use TextInputFormat?

Also, how do I use a custom InputFormat? A very silly question, but I am not sure how and where to keep the jar file containing the custom InputFormat class.

Thanks
Pariksheet

--
Cheers,
Pari