hadoop.mapred.join.Parser does not work with KeyValueTextInputFormat

2008-05-05 Haijun Cao
Hi, Chris, thanks for adding the map-side join feature (http://issues.apache.org/jira/browse/HADOOP-2085). I tried the join example with KeyValueTextInputFormat as the input format, but got the following exception: java.lang.NullPointerException at org.apache.hadoop.mapred.KeyValu
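For context, a map-side join only works when every input is sorted by key and partitioned the same way, so each map task can merge its inputs in a single pass. The plain-Java sketch below illustrates that merge step. It assumes each record is a `key<TAB>value` line split at the first tab, with a tab-less line becoming the key with an empty value (assumed to mirror KeyValueTextInputFormat's default). `MergeJoinSketch`, `splitKeyValue` and `innerJoin` are hypothetical names. This is not the code in `org.apache.hadoop.mapred.join`, and it does not reproduce or diagnose the NullPointerException above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a sort-merge inner join; not Hadoop's CompositeInputFormat.
class MergeJoinSketch {

    // Split at the first tab; a line with no tab becomes (line, ""), assumed to mirror KeyValueTextInputFormat.
    static String[] splitKeyValue(String line) {
        int tab = line.indexOf('\t');
        if (tab < 0) {
            return new String[] {line, ""};
        }
        return new String[] {line.substring(0, tab), line.substring(tab + 1)};
    }

    // Inner-join two inputs that are ALREADY sorted by key; emits "key<TAB>left<TAB>right".
    static List<String> innerJoin(List<String> left, List<String> right) {
        List<String> out = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < left.size() && j < right.size()) {
            String leftKey = splitKeyValue(left.get(i))[0];
            String rightKey = splitKeyValue(right.get(j))[0];
            int cmp = leftKey.compareTo(rightKey);
            if (cmp < 0) {
                i++; // left key is smaller, so it cannot match anything further right
            } else if (cmp > 0) {
                j++;
            } else {
                // Find the run of right-hand records sharing this key.
                int runEnd = j;
                while (runEnd < right.size() && splitKeyValue(right.get(runEnd))[0].equals(leftKey)) {
                    runEnd++;
                }
                // Pair every left-hand record with this key against that run.
                while (i < left.size() && splitKeyValue(left.get(i))[0].equals(leftKey)) {
                    String leftValue = splitKeyValue(left.get(i))[1];
                    for (int k = j; k < runEnd; k++) {
                        out.add(leftKey + "\t" + leftValue + "\t" + splitKeyValue(right.get(k))[1]);
                    }
                    i++;
                }
                j = runEnd;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> users = List.of("u1\tAlice", "u2\tBob", "u4\tDan");
        List<String> orders = List.of("u1\tbook", "u1\tpen", "u3\tlamp", "u4\tmug");
        for (String row : innerJoin(users, orders)) {
            System.out.println(row);
        }
    }
}
```

Running `main` prints the two `u1` matches and the single `u4` match; `u2` and `u3` have no partner on the other side and are dropped, as an inner join requires.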

Re: Groovy Scripting for Hadoop

2008-05-05 Ted Dunning
Chris, I have been meaning to write to you. Have you seen my grool system, which allows simple MR programs to be written simply? I have been thinking for some time that it would make a good match with Cascades. In addition, I have been working on a layer over Zookeeper to handle collection of d

Groovy Scripting for Hadoop

2008-05-05 Chris K Wensel
Hey all, just wanted to let interested parties know that we just released 0.1.0 of our Groovy 'builder' extension. We think this will be a great tool for groups that need to expose Hadoop to the 'casual' user who needs to get and manipulate valuable data on a Hadoop cluster, but doesn't h

Re: {start,stop}-balancer.sh paths (HADOOP-2930)

2008-05-05 Hairong Kuang
I got it committed. Thank you, Spiros! Hairong On 5/4/08 10:52 AM, "Spiros Papadimitriou" <[EMAIL PROTECTED]> wrote: > Hi, > > Those two scripts assume that the bin directory is in the path, unlike all > other scripts. > > I opened a JIRA (https://issues.apache.org/jira/browse/HADOOP-2930) an

Re: How to perform FILE IO with Hadoop DFS

2008-05-05 Ted Dunning
Keep in mind that many applications can do without real append if they don't have massive reliability requirements. Just accumulate data on the side and burp it into HDFS periodically. Then, on some longer time scale, accumulate your data burps into a full-sized data belch. The cost is surprising
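A minimal plain-Java sketch of the accumulate-then-flush pattern described here. The in-memory lists stand in for HDFS files, an assumption made so the example runs without a cluster: `append` buffers records locally, `flush` writes the buffer out as a new small file (the "burp"), and `compact` later concatenates the small files into one large file (the "belch"). `BurpAndBelch` and its threshold parameter are hypothetical. Records still sitting in the buffer are lost if the process dies, which is the reliability trade-off mentioned above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: buffer locally, flush small files, compact them later.
// The lists model HDFS files; no Hadoop API is used.
class BurpAndBelch {
    private final int flushThreshold;
    private final List<String> buffer = new ArrayList<>();      // not durable: lost on a crash
    private final List<List<String>> smallFiles = new ArrayList<>(); // one entry per flush ("burp")
    private final List<List<String>> largeFiles = new ArrayList<>(); // one entry per compaction ("belch")

    BurpAndBelch(int flushThreshold) {
        this.flushThreshold = flushThreshold;
    }

    // Buffer a record; write a new small file once the buffer reaches the threshold.
    void append(String record) {
        buffer.add(record);
        if (buffer.size() >= flushThreshold) {
            flush();
        }
    }

    // Write buffered records as a new, never-modified file, so no append is needed.
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        smallFiles.add(List.copyOf(buffer));
        buffer.clear();
    }

    // On a longer time scale, concatenate all small files into one large file.
    void compact() {
        if (smallFiles.isEmpty()) {
            return;
        }
        List<String> merged = new ArrayList<>();
        for (List<String> file : smallFiles) {
            merged.addAll(file);
        }
        largeFiles.add(List.copyOf(merged));
        smallFiles.clear();
    }

    List<List<String>> smallFiles() { return smallFiles; }

    List<List<String>> largeFiles() { return largeFiles; }

    int buffered() { return buffer.size(); }
}
```

Because every flush creates a new file instead of extending an existing one, nothing in this pattern depends on HDFS supporting append.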

Re: How to perform FILE IO with Hadoop DFS

2008-05-05 Ted Dunning
Yeah... Submit code. Failing that, help design the code. On 5/5/08 1:03 PM, "vikas" <[EMAIL PROTECTED]> wrote: > Is there any thing which I can do to raise its priority.

Re: How to perform FILE IO with Hadoop DFS

2008-05-05 vikas
Thank you very much for the right link. It really helped. Like many others, I'm waiting for "Append to files in HDFS". Is there anything I can do to raise its priority? Does the Hadoop developer community track a request counter for a particular feature to raise its priority? If that

Re: where is the documentation for MiniDFSCluster

2008-05-05 Doug Cutting
Maneesha Jain wrote: I'm looking for any documentation or javadoc for MiniDFSCluster and have not been able to find it anywhere. Can someone please point me to it. http://svn.apache.org/repos/asf/hadoop/core/trunk/src/test/org/apache/hadoop/dfs/MiniDFSCluster.java This is part of the test cod

where is the documentation for MiniDFSCluster

2008-05-05 Maneesha Jain
Hi, I'm looking for any documentation or javadoc for MiniDFSCluster and have not been able to find it anywhere. Can someone please point me to it? Regards, Maneesha

Re: Query against different data types within HDFS using Map/Reduce

2008-05-05 Kayla Jay
Awesome. Thanks for the replies. Do you mind sharing your code or providing high-level details on the implementation? ----- Original Message ----- From: Jason Venner <[EMAIL PROTECTED]> To: core-user@hadoop.apache.org Sent: Monday, May 5, 2008 12:41:26 PM Subject: Re: Query against different dat

Re: Query against different data types within HDFS using Map/Reduce

2008-05-05 Jason Venner
We do this all the time. In one case we have the mapper work out the input type by examining the input file name and the record data. We tend to do this for the textual key<TAB>value records. In another case we have a container object that can hold any Writable, which we pass around. We do this f
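One way to build a container object that can hold any Writable is a tagged wrapper: write a type tag, then the payload, and let the reader dispatch on the tag. The sketch below uses plain `java.io.DataOutput`/`DataInput`, the same interfaces Hadoop's `Writable.write` and `readFields` take, with two hypothetical record kinds (text and long). It is an assumption about the general approach, not Jason's code; Hadoop's own `org.apache.hadoop.io.GenericWritable` follows a similar tag-plus-payload idea.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical tagged container: a one-byte type tag, then the payload.
final class TaggedRecord {
    static final byte TEXT = 0; // payload is a String
    static final byte LONG = 1; // payload is a Long

    final byte tag;
    final Object value;

    private TaggedRecord(byte tag, Object value) {
        this.tag = tag;
        this.value = value;
    }

    static TaggedRecord text(String s) { return new TaggedRecord(TEXT, s); }

    static TaggedRecord number(long n) { return new TaggedRecord(LONG, n); }

    // Write the tag first so the reader knows how to decode what follows.
    void write(DataOutput out) throws IOException {
        out.writeByte(tag);
        switch (tag) {
            case TEXT: out.writeUTF((String) value); break;
            case LONG: out.writeLong((Long) value); break;
            default: throw new IOException("unknown tag " + tag);
        }
    }

    // Read the tag, then dispatch to the matching decoder.
    static TaggedRecord read(DataInput in) throws IOException {
        byte tag = in.readByte();
        switch (tag) {
            case TEXT: return text(in.readUTF());
            case LONG: return number(in.readLong());
            default: throw new IOException("unknown tag " + tag);
        }
    }

    // Serialize and deserialize through a byte array; for demonstration only.
    static TaggedRecord roundTrip(TaggedRecord r) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            r.write(new DataOutputStream(bytes));
            return read(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Adding a new record kind means adding a tag constant and one case in each switch; a mapper receiving a `TaggedRecord` can branch on `tag` without inspecting file names.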

Re: How to perform FILE IO with Hadoop DFS

2008-05-05 Arun C Murthy
On May 4, 2008, at 6:27 PM, vikas wrote: Hi All, I was looking for how multiple inputs can be written to the same output, at different intervals of time (i.e. I want to re-open the same file to append data to it). This link did not contain anything related to my question. http://issues.a

RE: Query against different data types within HDFS using Map/Reduce

2008-05-05 Ted Dunning
You just have to write an adapted input format that reads multiple kinds of input. It can key off the contents of the file or the name. Depending on names is bad, but it has a long lineage, so people tend to deal with it reasonably well. It isn't very hard to write. -----Original Message----- Fr
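A sketch of the "key off the contents or the name" decision: inspect the first bytes of a file, and fall back on the file name only when the content is inconclusive, in line with the remark that depending on names is bad. The content checks assume SequenceFiles begin with the magic bytes `SEQ`, XML begins with `<`, and key/value text contains a tab; the extensions `.seq`, `.tsv`, `.txt` and `.xml` are hypothetical naming conventions. `FormatSniffer` is a hypothetical helper; an adapted input format would use its result to pick a record reader, which is not shown.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical format sniffer: content first, file name only as a fallback.
final class FormatSniffer {
    enum Kind { SEQUENCE_FILE, XML, KEY_VALUE_TEXT, UNKNOWN }

    private static final byte[] SEQ_MAGIC = {'S', 'E', 'Q'};

    static Kind sniff(String fileName, byte[] head) {
        // SequenceFile headers start with "SEQ" followed by a version byte.
        if (head.length >= 3 && Arrays.equals(Arrays.copyOf(head, 3), SEQ_MAGIC)) {
            return Kind.SEQUENCE_FILE;
        }
        String text = new String(head, StandardCharsets.UTF_8).stripLeading();
        if (text.startsWith("<")) {
            return Kind.XML;
        }
        // Heuristic: a tab suggests key<TAB>value text; binary data with a tab byte would be misread.
        if (text.indexOf('\t') >= 0) {
            return Kind.KEY_VALUE_TEXT;
        }
        // Name-based fallback: fragile, but sometimes all you have.
        if (fileName.endsWith(".xml")) {
            return Kind.XML;
        }
        if (fileName.endsWith(".seq")) {
            return Kind.SEQUENCE_FILE;
        }
        if (fileName.endsWith(".tsv") || fileName.endsWith(".txt")) {
            return Kind.KEY_VALUE_TEXT;
        }
        return Kind.UNKNOWN;
    }
}
```

Checking content before the name means a misnamed file is still routed correctly whenever its first bytes are distinctive.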

Query against different data types within HDFS using Map/Reduce

2008-05-05 Kayla Jay
Has anyone come across this scenario, and if not, does anyone have any suggestions? Suppose you store different types of data within HDFS: XML, text, binary, sequence files, etc. You now want to run a query against ALL of the data stored within HDFS via a map/reduce job. How do you