Re: Reading 2 table data in MapReduce for Performing Join
This is solved. Using Writable instead of LongWritable or NullWritable as the Mapper input key type fixed it: the generic Writable key works for both TEXTFILE and ORC tables.

--
Thanks
Suraj Nayak M
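The fix above can be sketched with simplified stand-ins for Hadoop's key types (the Writable, LongWritable, and NullWritable below are illustrations only, not the real org.apache.hadoop.io classes): a map method declared against the common Writable supertype accepts both the LongWritable byte-offset keys a TEXTFILE split produces and the NullWritable keys seen here from the ORC path.

```java
// Simplified stand-ins for org.apache.hadoop.io key types (illustration
// only, not the real Hadoop classes).
interface Writable {}

class LongWritable implements Writable {
    final long offset;
    LongWritable(long offset) { this.offset = offset; }
    @Override public String toString() { return "LongWritable(" + offset + ")"; }
}

class NullWritable implements Writable {
    @Override public String toString() { return "NullWritable"; }
}

public class WritableKeyDemo {
    // Declaring the input key as the Writable supertype, as in the fix,
    // lets the same map method handle either concrete key type.
    static String map(Writable key) { return "got " + key; }

    public static void main(String[] args) {
        System.out.println(map(new LongWritable(0)));  // TEXTFILE-style key
        System.out.println(map(new NullWritable()));   // ORC-style key
    }
}
```

In a real job the same idea means declaring the mapper as, e.g., Mapper&lt;Writable, HCatRecord, ...&gt; rather than pinning the key to LongWritable.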
Re: Reading 2 table data in MapReduce for Performing Join
Hi All,

I was able to integrate HCatMultipleInputs with the patch for tables created as TEXTFILE, but I get an error when reading a table stored as ORC. The error is below:

15/03/19 10:51:32 INFO mapreduce.Job: Task Id : attempt_1425012118520_9756_m_00_0, Status : FAILED
Error: java.lang.ClassCastException: org.apache.hadoop.io.NullWritable cannot be cast to org.apache.hadoop.io.LongWritable
    at com.abccompany.mapreduce.MyMapper.map(MyMapper.java:15)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)

Can anyone help? Thanks in advance!

--
Thanks
Suraj Nayak M
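The root cause of the stack trace above can be reproduced in miniature (the nested Writable types below are simplified stand-ins, not the real Hadoop classes): the ORC reader supplies a NullWritable key, but a mapper declared with a LongWritable key parameter implicitly casts it, which fails at runtime.

```java
public class OrcKeyCastDemo {
    // Simplified stand-ins for the Hadoop key types (illustration only).
    interface Writable {}
    static class LongWritable implements Writable {}
    static class NullWritable implements Writable {}

    public static void main(String[] args) {
        // The ORC path hands the mapper a NullWritable key; a mapper typed
        // as Mapper<LongWritable, ...> effectively performs this cast:
        Writable keyFromOrc = new NullWritable();
        try {
            LongWritable key = (LongWritable) keyFromOrc;  // the failing cast
            System.out.println(key);
        } catch (ClassCastException e) {
            System.out.println("ClassCastException: NullWritable is not a LongWritable");
        }
    }
}
```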
Re: Reading 2 table data in MapReduce for Performing Join
Is this related to https://issues.apache.org/jira/browse/HIVE-4329 ? Is there a workaround?

--
Thanks
Suraj Nayak M
Re: Reading 2 table data in MapReduce for Performing Join
Hi All,

The patch from https://issues.apache.org/jira/browse/HIVE-4997 helped!

--
Thanks
Suraj Nayak M
Reading 2 table data in MapReduce for Performing Join
Hi,

I tried reading data from one Hive table via HCatalog in MapReduce, using something similar to https://cwiki.apache.org/confluence/display/Hive/HCatalog+InputOutput#HCatalogInputOutput-RunningMapReducewithHCatalog, and was able to read it successfully. Now I am trying to read 2 tables, as the requirement is to join them. I did not find an API similar to *FileInputFormat.addInputPaths* in *HCatInputFormat*. What is the HCatalog equivalent? I had previously performed a join using FileInputFormat on HDFS (by getting the split information in the mapper); this article helped me code that join: http://www.codingjunkie.com/mapreduce-reduce-joins/

Can someone suggest how I can perform a join using HCatalog? Briefly, the aim is to:
- Read 2 tables (with almost identical schemas).
- If a key exists in both tables, send those records to the same reducer.
- Do some processing on the records in the reducer.
- Save the output to a file or Hive table.

*P.S.: The reason for using MapReduce to perform the join is a complex requirement that can't be solved via Hive/Pig directly.*

Any help will be greatly appreciated :)

--
Thanks & Regards
Suraj Nayak M
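The reduce-side join described in the thread can be sketched in plain Java, independent of HCatalog (the table contents and keys below are hypothetical sample data): the "map" phase tags each record with its source table, the shuffle groups records by key, and the "reduce" phase emits a joined row only when both tables contributed for that key.

```java
import java.util.*;

public class ReduceSideJoinDemo {
    public static void main(String[] args) {
        // Hypothetical sample rows (key -> value), one map per input table.
        Map<String, String> tableA = new HashMap<>();
        tableA.put("k1", "a1"); tableA.put("k2", "a2"); tableA.put("k3", "a3");
        Map<String, String> tableB = new HashMap<>();
        tableB.put("k2", "b2"); tableB.put("k3", "b3"); tableB.put("k4", "b4");

        // "Map" phase: tag each value with its source table. The shuffle
        // groups tagged records by key, modeled here as a sorted map of lists.
        Map<String, List<String>> grouped = new TreeMap<>();
        for (Map.Entry<String, String> e : tableA.entrySet())
            grouped.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add("A:" + e.getValue());
        for (Map.Entry<String, String> e : tableB.entrySet())
            grouped.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add("B:" + e.getValue());

        // "Reduce" phase: separate each key's records by source table and
        // emit joined rows only for keys present in both tables.
        for (Map.Entry<String, List<String>> e : grouped.entrySet()) {
            List<String> fromA = new ArrayList<>(), fromB = new ArrayList<>();
            for (String tagged : e.getValue()) {
                if (tagged.startsWith("A:")) fromA.add(tagged.substring(2));
                else fromB.add(tagged.substring(2));
            }
            for (String a : fromA)
                for (String b : fromB)
                    System.out.println(e.getKey() + "\t" + a + "\t" + b);
        }
    }
}
```

In a real MapReduce job the tag travels inside the map output value (or a composite key), and the grouping is done by the framework's shuffle rather than an in-memory map.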