Re: Setting input paths

2011-04-06 Thread Harsh Chouraria
Hello Mark and Robert,

On Wed, Apr 6, 2011 at 9:55 PM, Mark wrote:
>> On 4/6/11 9:53 AM, "Mark" wrote:
>>
>> How can I tell my job to include all the subdirectories and their
>> content of a certain path?

Also worth noting is that in future releases of Hadoop, it will be possible to ask the Fil
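
A minimal sketch of the recursive-listing option alluded to above, assuming a later Hadoop release that supports it; the mapreduce.input.fileinputformat.input.dir.recursive property belongs to newer versions and is not part of the release being discussed in this thread, and the class and job names are illustrative:

    // Sketch of recursive input listing; only available in later Hadoop
    // releases, and the exact property / helper name depends on the version.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

    public class RecursiveInputSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Ask FileInputFormat to descend into subdirectories of each
            // input path instead of failing on them.
            conf.setBoolean("mapreduce.input.fileinputformat.input.dir.recursive", true);
            Job job = new Job(conf, "recursive-logs");
            FileInputFormat.addInputPath(job, new Path("logs/"));
            // ... mapper, reducer, and output settings would follow as usual.
        }
    }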

Re: Setting input paths

2011-04-06 Thread Mark
OK, so the behavior is a little different when using FileInputFormat.addInputPath as opposed to using Pig. I'll try the glob. Thanks.

On 4/6/11 8:41 AM, Robert Evans wrote:
> I believe that opening a directory as a file will result in a file not found. You probably need to set it to a glob, tha
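
Before committing to a glob, it can help to check what it actually expands to. A minimal sketch using FileSystem.globStatus; the logs/*/*/* pattern mirrors the directory layout described later in the thread and is only illustrative:

    // Sanity-check a glob by expanding it the same way the job would.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class GlobCheck {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            // Expand the pattern and print each match.
            FileStatus[] matches = fs.globStatus(new Path("logs/*/*/*"));
            if (matches != null) {
                for (FileStatus status : matches) {
                    System.out.println(status.getPath());
                }
            }
        }
    }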

Re: Setting input paths

2011-04-06 Thread hadoopman
I have a process which is loading data into Hive hourly. Loading data hourly isn't a problem; however, when I load historical data, say 24-48 hours' worth, I receive the error message below. In googling I've come across some suggestions that JVM memory needs to be increased. Are there any other options or
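
For reference, the "increase JVM memory" suggestions of this era usually point at the per-task child heap. A hypothetical sketch of setting it programmatically for a MapReduce job via the pre-YARN property mapred.child.java.opts; whether this is the right knob for the Hive load error above is an assumption, and the 1 GB value is only an example:

    // Hypothetical sketch: raise the per-task child JVM heap.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class HeapSettingSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Pre-YARN property name; the heap size here is illustrative.
            conf.set("mapred.child.java.opts", "-Xmx1024m");
            Job job = new Job(conf, "historical-load");
            // ... input/output paths and formats would be set as usual.
        }
    }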

Re: Setting input paths

2011-04-06 Thread Robert Evans
I believe that opening a directory as a file will result in a file not found. You probably need to set it to a glob that points to the actual files. Something like /user/root/logs/2011/*/*/* for all entries in 2011, or /user/root/logs/2011/01/*/* if you want to restrict it to just January.
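
A minimal sketch of wiring such a glob into a job with FileInputFormat.addInputPath; the paths come from the message above, while the class name, job name, and the new Job(conf, ...) constructor are illustrative and may differ slightly across Hadoop versions:

    // Point the job at all day-level entries for 2011 via a glob.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

    public class GlobInputSketch {
        public static void main(String[] args) throws Exception {
            Job job = new Job(new Configuration(), "log-processor");
            // All day-level entries for 2011:
            FileInputFormat.addInputPath(job, new Path("/user/root/logs/2011/*/*/*"));
            // Or, restricted to January only:
            // FileInputFormat.addInputPath(job, new Path("/user/root/logs/2011/01/*/*"));
        }
    }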

Setting input paths

2011-04-06 Thread Mark
How can I tell my job to include all the subdirectories and their content of a certain path? My directory structure is as follows: logs/{YEAR}/{MONTH}/{DAY} and I tried setting my input path to 'logs/' using FileInputFormat.addInputPath; however, I keep receiving the following error: java.io.F
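
As an alternative to a single glob string, one workaround (not taken from this thread's replies) is to expand the logs/{YEAR}/{MONTH}/{DAY} layout yourself and add each existing day directory as its own input path. A hedged sketch; all class and job names are illustrative:

    // Hedged alternative: enumerate the day-level directories and add each
    // one explicitly, so only directories that actually exist are registered.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

    public class AddDayDirsSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Job job = new Job(conf, "all-logs");
            FileSystem fs = FileSystem.get(conf);
            // logs/*/*/* matches the day-level directories three levels down.
            FileStatus[] days = fs.globStatus(new Path("logs/*/*/*"));
            if (days != null) {
                for (FileStatus day : days) {
                    if (day.isDir()) {
                        FileInputFormat.addInputPath(job, day.getPath());
                    }
                }
            }
        }
    }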