[ https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12857253#action_12857253 ]
Ankur commented on PIG-1229:
----------------------------
I read the complete thread, and here are my thoughts:
- Speculative execution issue: With the recent move to Hadoop's I/O formats
in Load/Store, DBStorage has been modified to commit the data to the DB in the
OutputCommitter's commitTask() method. Hadoop itself guarantees that this
method will be called only for the first successful attempt of each task, so
it shouldn't matter whether speculative execution is on or off.
BUT this does NOT solve the problem where some tasks finish successfully but
the JOB itself fails, in which case the data from the successful attempts
should be rolled back.
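To make the two points above concrete, here is a toy, in-memory model in plain Java with no Hadoop dependency. `TaskCommitModel`, `SimulatedTable` and the attempt ids are hypothetical names used only for illustration; this is not the DBStorage patch and not Hadoop's `OutputCommitter` API, just a sketch (assuming one commit per task, as described above) of why per-task commit copes with speculative attempts but not with a job that fails after some tasks have committed.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical toy model of per-task commit semantics. None of these
 * classes or methods exist in Hadoop or Pig; they only illustrate the idea.
 */
public class TaskCommitModel {

    /** Stands in for the target DB table: rows appear only on commit. */
    static class SimulatedTable {
        final List<String> rows = new ArrayList<>();
    }

    private final SimulatedTable table;
    // Task ids whose output has already been committed (one commit per task).
    private final Set<String> committedTasks = new HashSet<>();
    // Rows buffered per attempt, like an open, uncommitted JDBC transaction.
    private final Map<String, List<String>> pending = new HashMap<>();

    TaskCommitModel(SimulatedTable table) {
        this.table = table;
    }

    /** A task attempt writes a row; it stays pending until commitTask. */
    void write(String attemptId, String row) {
        pending.computeIfAbsent(attemptId, k -> new ArrayList<>()).add(row);
    }

    /**
     * Models the guarantee that only the first successful attempt of a task
     * is committed. A later (e.g. speculative) attempt of the same task has
     * its buffered rows discarded and returns false.
     */
    boolean commitTask(String taskId, String attemptId) {
        List<String> rows = pending.remove(attemptId);
        if (committedTasks.contains(taskId)) {
            return false; // duplicate attempt: nothing reaches the table
        }
        committedTasks.add(taskId);
        if (rows != null) {
            table.rows.addAll(rows);
        }
        return true;
    }

    public static void main(String[] args) {
        SimulatedTable table = new SimulatedTable();
        TaskCommitModel model = new TaskCommitModel(table);

        // Task m_0 runs twice because of speculative execution.
        model.write("m_0_attempt_0", "a");
        model.write("m_0_attempt_1", "a");
        model.commitTask("m_0", "m_0_attempt_0");
        model.commitTask("m_0", "m_0_attempt_1"); // ignored, no duplicate row

        // Task m_1 succeeds; task m_2 fails and never commits.
        model.write("m_1_attempt_0", "b");
        model.commitTask("m_1", "m_1_attempt_0");
        model.write("m_2_attempt_0", "c");

        // The job now fails because of m_2, yet "a" and "b" are already stored.
        System.out.println(table.rows); // prints [a, b]
    }
}
```

The speculative duplicate never reaches the table, but the rows from m_0 and m_1 stay there even though the job as a whole failed, which is exactly the case that still needs a rollback.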
- Writing to a temporary table: Even this does not handle the case above,
since some of the tasks would already have moved their data to the actual table.
- Bulk loading: In my opinion this is the most suitable option if the data is
large. However, for small to medium data sizes (like aggregate summaries), I
found the DBStorage UDF to be the most helpful.
It eliminates one more layer of processing from the application. In fact,
that was precisely the reason it was written.
So, in a nutshell, using a single mapper/reducer with this patch should be fine
regardless of whether speculative execution is on or off. When multiple
mappers/reducers write to the DB, it should be the application's
responsibility to clean up the data ONLY IN CASE of job failure.
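A minimal sketch of what that application-side cleanup could look like, assuming every row the job writes carries a hypothetical `run_id` column holding a value unique to that job run. The table name `summary_table`, the `run_id` column and the class `FailedRunCleanup` are illustrative assumptions, not part of DBStorage. The JDBC method uses only standard `java.sql` calls; the in-memory method performs the same filtering so the logic can be checked without a database.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical cleanup helper, run by the application only when the job as
 * a whole has failed. Assumes each stored row is tagged with a run_id.
 */
public class FailedRunCleanup {

    /** Hypothetical statement removing whatever a failed run managed to commit. */
    static final String CLEANUP_SQL = "DELETE FROM summary_table WHERE run_id = ?";

    /** JDBC version: deletes all rows written by the failed run. */
    static int cleanupFailedRun(Connection conn, String runId) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(CLEANUP_SQL)) {
            ps.setString(1, runId);
            return ps.executeUpdate(); // number of rows removed
        }
    }

    /** A stored row: the run that wrote it plus its payload. */
    static final class Row {
        final String runId;
        final String value;

        Row(String runId, String value) {
            this.runId = runId;
            this.value = value;
        }
    }

    /** In-memory equivalent of CLEANUP_SQL: keeps rows from other runs only. */
    static List<Row> cleanupInMemory(List<Row> table, String failedRunId) {
        List<Row> kept = new ArrayList<>();
        for (Row r : table) {
            if (!r.runId.equals(failedRunId)) {
                kept.add(r);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Row> table = new ArrayList<>();
        table.add(new Row("run-1", "from an earlier, successful job"));
        table.add(new Row("run-2", "from a task that committed"));
        table.add(new Row("run-2", "from another committed task"));

        // run-2 failed overall, so the application removes its partial output.
        List<Row> cleaned = cleanupInMemory(table, "run-2");
        System.out.println(cleaned.size()); // prints 1
    }
}
```

Tagging rows with a run id keeps the cleanup to one targeted DELETE and leaves data from earlier successful runs untouched; whether such a column is acceptable depends on the application's schema.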
> allow pig to write output into a JDBC db
> ----------------------------------------
>
> Key: PIG-1229
> URL: https://issues.apache.org/jira/browse/PIG-1229
> Project: Pig
> Issue Type: New Feature
> Components: impl
> Reporter: Ian Holsman
> Assignee: Ankur
> Priority: Minor
> Fix For: 0.8.0
>
> Attachments: jira-1229-v2.patch
>
>
> UDF to store data into a DB