Hello Mark and Robert,
On Wed, Apr 6, 2011 at 9:55 PM, Mark wrote:
>> On 4/6/11 9:53 AM, "Mark" wrote:
>>
>> How can I tell my job to include all the subdirectories of a certain
>> path and their contents?
Also worth noting is that in future releases of Hadoop, it will be
possible to ask the FileInputFormat to recurse into the subdirectories of
an input path.
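If you want to try it once it lands, my understanding is that it will be a
plain configuration switch. Here is a minimal sketch; the property name
mapreduce.input.fileinputformat.input.dir.recursive and the
Job.getInstance API are my assumptions, not something confirmed in this
thread:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

    public class RecursiveLogsJob {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Assumed property name: asks FileInputFormat to descend into
            // subdirectories instead of failing when it finds one.
            conf.setBoolean(
                "mapreduce.input.fileinputformat.input.dir.recursive", true);
            Job job = Job.getInstance(conf, "recursive logs");
            // With recursion on, the top-level directory alone is enough;
            // every file under logs/{YEAR}/{MONTH}/{DAY} gets picked up.
            FileInputFormat.addInputPath(job, new Path("logs/"));
            // ... mapper/reducer/output setup elided ...
        }
    }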
OK, so the behavior is a little different when using
FileInputFormat.addInputPath as opposed to using Pig. I'll try the glob.
Thanks
On 4/6/11 8:41 AM, Robert Evans wrote:
I believe that opening a directory as a file will result in a file not found.
You probably need to set it to a glob that points to the actual files.
I believe that opening a directory as a file will result in a file not found.
You probably need to set it to a glob that points to the actual files.
Something like
/user/root/logs/2011/*/*/* for all entries in 2011, or
/user/root/logs/2011/01/*/* if you want to restrict it to just January.
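In Java, that glob can be handed straight to FileInputFormat, since input
paths are expanded as patterns. A minimal sketch, with the job name made
up and the mapper/reducer/output setup elided:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

    public class GlobLogsJob {
        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "2011 logs");
            // The glob matches every file under every month/day of 2011,
            // so only files (never directories) become job input.
            FileInputFormat.addInputPath(
                job, new Path("/user/root/logs/2011/*/*/*"));
            // ... mapper/reducer/output setup elided ...
        }
    }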
How can I tell my job to include all the subdirectories of a certain
path and their contents?
My directory structure is as follows: logs/{YEAR}/{MONTH}/{DAY}, and I
tried setting my input path to 'logs/' using
FileInputFormat.addInputPath; however, I keep receiving the following error:
java.io.FileNotFoundException
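For reference, this is roughly the setup that produces that exception; a
minimal sketch with the job details assumed:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

    public class FailingLogsJob {
        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "logs");
            // logs/ holds only the {YEAR} subdirectories, no plain files,
            // so FileInputFormat ends up opening a directory as a file and
            // fails with java.io.FileNotFoundException.
            FileInputFormat.addInputPath(job, new Path("logs/"));
            // ... mapper/reducer/output setup elided ...
        }
    }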