Creation of output path should be done by storage function
----------------------------------------------------------
Key: PIG-1174
URL: https://issues.apache.org/jira/browse/PIG-1174
Project: Pig
Issue Type: Bug
Reporter: Bill Graham

When executing a STORE command, Pig creates the output location before the
storage function is called. This breaks storage functions that contain their
own logic for determining the output location. See this thread:
http://www.mail-archive.com/pig-user%40hadoop.apache.org/msg01538.html
For example, given a statement like this:

    STORE A INTO '/my/home/output' USING MultiStorage('/my/home/output','0',
    'none', '\t');

Pig creates a file at '/my/home/output', and an exception is then thrown when
MultiStorage tries to create a directory under '/my/home/output', because that
path already exists as a regular file. The workaround is to specify a dummy
location as the first path instead:

    STORE A INTO '/my/home/output/temp' USING MultiStorage('/my/home/output','0',
    'none', '\t');
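
The order of operations behind the failure and the workaround can be
reproduced outside Pig. The following is a minimal sketch in Python against a
local filesystem, not HDFS and not Pig's actual code path; the function name
simulate_store and its parameters are hypothetical and exist only for
illustration. It assumes the framework creates the INTO location as a regular
file and the storage function then creates one directory per key under its own
root.

```python
import os
import tempfile


def simulate_store(into_path, storage_root, key):
    """Return True if the storage step succeeds, False if it fails.

    Hypothetical model of the reported behaviour, using the local filesystem.
    """
    # Framework step: create the INTO location as a regular file
    # (parent directories are created as needed).
    os.makedirs(os.path.dirname(into_path), exist_ok=True)
    open(into_path, "w").close()

    # Storage-function step: create a per-key directory under its own root.
    try:
        os.makedirs(os.path.join(storage_root, key), exist_ok=True)
    except OSError:
        # The root already exists as a regular file, so no directory
        # can be created beneath it.
        return False
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as base:
        out = os.path.join(base, "output")
        # Same path in INTO and in the storage function: the storage step fails.
        print(simulate_store(out, out, "0"))
    with tempfile.TemporaryDirectory() as base:
        out = os.path.join(base, "output")
        # Dummy INTO path below the real root: the storage step succeeds.
        print(simulate_store(os.path.join(out, "temp"), out, "0"))
```

In the second call the pre-created file lands at 'output/temp', which leaves
'output' itself as a directory, and that is why the workaround helps.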
Two changes should be made:
1. The path specified in the INTO clause should be available to the storage
function so it doesn't need to be duplicated.
2. The creation of the output paths should be delegated to the storage function.
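
One way to picture the two changes together is the sketch below. It is written
in Python against a local filesystem purely for illustration and is not Pig's
API: the class KeyedDirectoryStorage, its methods set_location, prepare_output
and put, the store function, and the 'part-0' file name are all hypothetical.
The point is only the division of responsibility: the framework passes the
INTO path to the storage function and never creates it, and the storage
function creates whatever layout it needs.

```python
import os
import tempfile


class KeyedDirectoryStorage:
    """Hypothetical storage function: one directory per key value."""

    def __init__(self, key_index):
        self.key_index = key_index
        self.location = None

    def set_location(self, location):
        # Change 1: the INTO path is handed to the storage function,
        # so it does not have to be repeated as a constructor argument.
        self.location = location

    def prepare_output(self):
        # Change 2: the storage function, not the framework,
        # creates the output location (here as a directory).
        os.makedirs(self.location, exist_ok=True)

    def put(self, record):
        # Write each record into the directory named after its key field.
        key_dir = os.path.join(self.location, str(record[self.key_index]))
        os.makedirs(key_dir, exist_ok=True)
        with open(os.path.join(key_dir, "part-0"), "a") as out:
            out.write("\t".join(str(field) for field in record) + "\n")


def store(records, into_path, storage):
    """Hypothetical framework side: it never touches into_path itself."""
    storage.set_location(into_path)
    storage.prepare_output()
    for record in records:
        storage.put(record)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as base:
        out = os.path.join(base, "output")
        store([("a", 1), ("b", 2), ("a", 3)], out, KeyedDirectoryStorage(0))
        print(sorted(os.listdir(out)))
```

With this split, the path appears once in the STORE statement and the dummy
location is no longer needed.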