Re: Hadoop Input Format - newAPIHadoopFile

2014-07-28 Thread chang cheng
Here is a tutorial on how to write a custom file format in Hadoop:

https://developer.yahoo.com/hadoop/tutorial/module5.html#fileformat

Once you have your own file format, you can use it in Spark the same way as
TextInputFormat, as you have done earlier in this thread.
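As a rough sketch (the class name is made up for illustration), a minimal new-API input format can extend FileInputFormat and return a RecordReader; this trivial one just delegates to LineRecordReader, so a real custom format would plug in its own reader instead:

```scala
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.{InputSplit, RecordReader, TaskAttemptContext}
import org.apache.hadoop.mapreduce.lib.input.{FileInputFormat, LineRecordReader}

// Hypothetical example: a trivial new-API input format that just reads lines.
// A real custom format would return its own RecordReader implementation here.
class StockInputFormat extends FileInputFormat[LongWritable, Text] {
  override def createRecordReader(split: InputSplit,
                                  context: TaskAttemptContext): RecordReader[LongWritable, Text] =
    new LineRecordReader()
}
```

Once compiled into a jar and put on the classpath, such a class can be passed to sc.newAPIHadoopFile exactly like TextInputFormat.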



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Hadoop-Input-Format-newAPIHadoopFile-tp2860p10762.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.


Re: Hadoop Input Format - newAPIHadoopFile

2014-03-19 Thread Pariksheet Barapatre
Thanks, it worked.

A very basic question: I have created a custom input format (e.g. for stock
data). How do I refer to this class as a custom InputFormat? That is, where
should I keep the class on the Linux filesystem, and do I need to add it as a
jar? If so, how?
I am running the code through spark-shell.

Thanks
Pari
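For what it's worth, a common way to make a jar containing a custom InputFormat visible to spark-shell (the jar path here is hypothetical, and the exact mechanism depends on the Spark version) is:

```
# Spark 0.9.x: pass extra jars via the ADD_JARS environment variable
ADD_JARS=/path/to/stock-inputformat.jar ./bin/spark-shell

# Spark 1.0 and later: use the --jars option
./bin/spark-shell --jars /path/to/stock-inputformat.jar
```

The classes in the jar are then loadable from the shell, and the jar is shipped to the executors as well.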
On 19-Mar-2014 7:35 pm, "Shixiong Zhu"  wrote:

> The correct import statement is "import
> org.apache.hadoop.mapreduce.lib.input.TextInputFormat".
>
> Best Regards,
> Shixiong Zhu


Re: Hadoop Input Format - newAPIHadoopFile

2014-03-19 Thread Shixiong Zhu
The correct import statement is "import
org.apache.hadoop.mapreduce.lib.input.TextInputFormat".

Best Regards,
Shixiong Zhu
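Putting it together, the session from the original message should work with only the import changed (this is a sketch reusing the same HDFS path as in the thread):

```scala
scala> import org.apache.hadoop.io.{LongWritable, Text}
scala> import org.apache.hadoop.mapreduce.lib.input.TextInputFormat
scala> val file2 = sc.newAPIHadoopFile[LongWritable, Text, TextInputFormat]("hdfs://192.168.100.130:8020/user/hue/pig/examples/data/sonnets.txt")
scala> file2.map(_._2.toString).take(2)
```

The type parameter F of newAPIHadoopFile is bounded by org.apache.hadoop.mapreduce.InputFormat, which is why the old-API class from org.apache.hadoop.mapred is rejected.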




Re: Hadoop Input Format - newAPIHadoopFile

2014-03-19 Thread Pariksheet Barapatre
Seems like an import issue; running with hadoopFile worked. I cannot find the
correct import for the TextInputFormat class in the new API.

Can anybody help?

Thanks
Pariksheet




-- 
Cheers,
Pari


Re: Hadoop Input Format - newAPIHadoopFile

2014-03-19 Thread Bertrand Dechoux
I don't know the Spark issue but the Hadoop context is clear.

old api -> org.apache.hadoop.mapred
new api -> org.apache.hadoop.mapreduce

You might only need to change your import.

Regards

Bertrand




Hadoop Input Format - newAPIHadoopFile

2014-03-19 Thread Pariksheet Barapatre
Hi,

I am trying to read an HDFS file with TextInputFormat.

scala> import org.apache.hadoop.mapred.TextInputFormat
scala> import org.apache.hadoop.io.{LongWritable, Text}
scala> val file2 = sc.newAPIHadoopFile[LongWritable,Text,TextInputFormat]("hdfs://192.168.100.130:8020/user/hue/pig/examples/data/sonnets.txt")


This gives me the following error:

<console>:14: error: type arguments [org.apache.hadoop.io.LongWritable,org.apache.hadoop.io.Text,org.apache.hadoop.mapred.TextInputFormat] conform to the bounds of none of the overloaded alternatives of value newAPIHadoopFile:
  [K, V, F <: org.apache.hadoop.mapreduce.InputFormat[K,V]](path: String, fClass: Class[F], kClass: Class[K], vClass: Class[V], conf: org.apache.hadoop.conf.Configuration)org.apache.spark.rdd.RDD[(K, V)] <and>
  [K, V, F <: org.apache.hadoop.mapreduce.InputFormat[K,V]](path: String)(implicit km: scala.reflect.ClassTag[K], implicit vm: scala.reflect.ClassTag[V], implicit fm: scala.reflect.ClassTag[F])org.apache.spark.rdd.RDD[(K, V)]
       val file2 = sc.newAPIHadoopFile[LongWritable,Text,TextInputFormat]("hdfs://192.168.100.130:8020/user/hue/pig/examples/data/sonnets.txt")


What is the correct syntax if I want to use TextInputFormat?

Also, how do I use a custom InputFormat? A very silly question, but I am not
sure how and where to keep the jar file containing the custom InputFormat class.

Thanks
Pariksheet



-- 
Cheers,
Pari