Re: Reading 2 table data in MapReduce for Performing Join
This is solved. Using Writable instead of LongWritable or NullWritable as the Mapper input key type fixed it: the generic Writable key works for both TEXTFILE and ORC tables.

--
Thanks
Suraj Nayak M
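The fix above can be sketched with simplified stand-ins for Hadoop's key types (the Writable, LongWritable, and NullWritable below are illustrations only, not the real org.apache.hadoop.io classes): a map method declared against the common Writable supertype accepts both the LongWritable byte-offset keys a TEXTFILE split produces and the NullWritable keys seen here from the ORC path.

```java
// Simplified stand-ins for org.apache.hadoop.io key types (illustration
// only, not the real Hadoop classes).
interface Writable {}

class LongWritable implements Writable {
    final long offset;
    LongWritable(long offset) { this.offset = offset; }
    @Override public String toString() { return "LongWritable(" + offset + ")"; }
}

class NullWritable implements Writable {
    @Override public String toString() { return "NullWritable"; }
}

public class WritableKeyDemo {
    // Declaring the input key as the Writable supertype, as in the fix,
    // lets the same map method handle either concrete key type.
    static String map(Writable key) { return "got " + key; }

    public static void main(String[] args) {
        System.out.println(map(new LongWritable(0)));  // TEXTFILE-style key
        System.out.println(map(new NullWritable()));   // ORC-style key
    }
}
```

In a real job the same idea means declaring the mapper as, e.g., Mapper&lt;Writable, HCatRecord, ...&gt; rather than pinning the key to LongWritable.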
Re: Reading 2 table data in MapReduce for Performing Join
Hi All,

I was able to integrate HCatMultipleInputs with the patch for tables created as TEXTFILE, but I get an error when reading a table stored as ORC. The error is below:

15/03/19 10:51:32 INFO mapreduce.Job: Task Id : attempt_1425012118520_9756_m_00_0, Status : FAILED
Error: java.lang.ClassCastException: org.apache.hadoop.io.NullWritable cannot be cast to org.apache.hadoop.io.LongWritable
    at com.abccompany.mapreduce.MyMapper.map(MyMapper.java:15)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)

Can anyone help? Thanks in advance!

--
Thanks
Suraj Nayak M
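The root cause of the stack trace above can be reproduced in miniature (the nested Writable types below are simplified stand-ins, not the real Hadoop classes): the ORC reader supplies a NullWritable key, but a mapper declared with a LongWritable key parameter implicitly casts it, which fails at runtime.

```java
public class OrcKeyCastDemo {
    // Simplified stand-ins for the Hadoop key types (illustration only).
    interface Writable {}
    static class LongWritable implements Writable {}
    static class NullWritable implements Writable {}

    public static void main(String[] args) {
        // The ORC path hands the mapper a NullWritable key; a mapper typed
        // as Mapper<LongWritable, ...> effectively performs this cast:
        Writable keyFromOrc = new NullWritable();
        try {
            LongWritable key = (LongWritable) keyFromOrc;  // the failing cast
            System.out.println(key);
        } catch (ClassCastException e) {
            System.out.println("ClassCastException: NullWritable is not a LongWritable");
        }
    }
}
```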
Re: Reading 2 table data in MapReduce for Performing Join
Is this related to https://issues.apache.org/jira/browse/HIVE-4329 ? Is there a workaround?

--
Thanks
Suraj Nayak M
Re: Reading 2 table data in MapReduce for Performing Join
Hi All,

The patch from https://issues.apache.org/jira/browse/HIVE-4997 helped!

--
Thanks
Suraj Nayak M
Reading 2 table data in MapReduce for Performing Join
Hi,

I tried reading data from one Hive table via HCatalog in MapReduce, using something similar to https://cwiki.apache.org/confluence/display/Hive/HCatalog+InputOutput#HCatalogInputOutput-RunningMapReducewithHCatalog, and was able to read it successfully. Now I am trying to read 2 tables, as the requirement is to join them. I did not find an API similar to *FileInputFormat.addInputPaths* in *HCatInputFormat*. What is the HCatalog equivalent? I had previously performed a join using FileInputFormat on HDFS (by getting the split information in the mapper); this article helped me code that join: http://www.codingjunkie.com/mapreduce-reduce-joins/

Can someone suggest how I can perform a join using HCatalog? Briefly, the aim is to:
- Read 2 tables (with almost identical schemas).
- If a key exists in both tables, send those records to the same reducer.
- Do some processing on the records in the reducer.
- Save the output to a file or Hive table.

*P.S.: The reason for using MapReduce to perform the join is a complex requirement that can't be solved via Hive/Pig directly.*

Any help will be greatly appreciated :)

--
Thanks & Regards
Suraj Nayak M
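The reduce-side join described in the thread can be sketched in plain Java, independent of HCatalog (the table contents and keys below are hypothetical sample data): the "map" phase tags each record with its source table, the shuffle groups records by key, and the "reduce" phase emits a joined row only when both tables contributed for that key.

```java
import java.util.*;

public class ReduceSideJoinDemo {
    public static void main(String[] args) {
        // Hypothetical sample rows (key -> value), one map per input table.
        Map<String, String> tableA = new HashMap<>();
        tableA.put("k1", "a1"); tableA.put("k2", "a2"); tableA.put("k3", "a3");
        Map<String, String> tableB = new HashMap<>();
        tableB.put("k2", "b2"); tableB.put("k3", "b3"); tableB.put("k4", "b4");

        // "Map" phase: tag each value with its source table. The shuffle
        // groups tagged records by key, modeled here as a sorted map of lists.
        Map<String, List<String>> grouped = new TreeMap<>();
        for (Map.Entry<String, String> e : tableA.entrySet())
            grouped.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add("A:" + e.getValue());
        for (Map.Entry<String, String> e : tableB.entrySet())
            grouped.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add("B:" + e.getValue());

        // "Reduce" phase: separate each key's records by source table and
        // emit joined rows only for keys present in both tables.
        for (Map.Entry<String, List<String>> e : grouped.entrySet()) {
            List<String> fromA = new ArrayList<>(), fromB = new ArrayList<>();
            for (String tagged : e.getValue()) {
                if (tagged.startsWith("A:")) fromA.add(tagged.substring(2));
                else fromB.add(tagged.substring(2));
            }
            for (String a : fromA)
                for (String b : fromB)
                    System.out.println(e.getKey() + "\t" + a + "\t" + b);
        }
    }
}
```

In a real MapReduce job the tag travels inside the map output value (or a composite key), and the grouping is done by the framework's shuffle rather than an in-memory map.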