[jira] Commented: (PIG-1229) allow pig to write output into a JDBC db
[ https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12910629#action_12910629 ] Sandesh Devaraju commented on PIG-1229:
I narrowed down the problem to org.apache.hadoop.mapred.Task.java lines 411-418.
{code:title=org.apache.hadoop.mapred.Task.java|linenumbers=true|firstline=411}
if (useNewApi) {
  LOG.debug("using new api for output committer");
  outputFormat =
    ReflectionUtils.newInstance(taskContext.getOutputFormatClass(), job);
  committer = outputFormat.getOutputCommitter(taskContext);
} else {
  committer = conf.getOutputCommitter();
}
{code}
But the DBStorage UDF assumes that the OutputFormat is in a closure.
allow pig to write output into a JDBC db Key: PIG-1229 URL: https://issues.apache.org/jira/browse/PIG-1229 Project: Pig Issue Type: New Feature Components: impl Reporter: Ian Holsman Assignee: Ankur Priority: Minor Fix For: 0.8.0 Attachments: jira-1229-final.patch, jira-1229-final.test-fix.patch, jira-1229-v2.patch, jira-1229-v3.patch, pig-1229.2.patch, pig-1229.patch UDF to store data into a DB -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1229) allow pig to write output into a JDBC db
[ https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12910331#action_12910331 ] Sandesh Devaraju commented on PIG-1229:
I upgraded to 0.7 and tried the updated patch. However, I don't see any entries in the database. Upon further investigation, I noticed that in my particular case the batch size was 100 and the number of output records that ended up at every reducer was below this threshold. I added a debug statement to the OutputCommitter's commitTask method and found that count was 0. Any ideas why this might be happening?
allow pig to write output into a JDBC db Key: PIG-1229 URL: https://issues.apache.org/jira/browse/PIG-1229 Project: Pig Issue Type: New Feature Components: impl Reporter: Ian Holsman Assignee: Ankur Priority: Minor Fix For: 0.8.0 Attachments: jira-1229-final.patch, jira-1229-final.test-fix.patch, jira-1229-v2.patch, jira-1229-v3.patch, pig-1229.2.patch, pig-1229.patch UDF to store data into a DB
[jira] Commented: (PIG-1229) allow pig to write output into a JDBC db
[ https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12910441#action_12910441 ] Ankur commented on PIG-1229:
In the putNext() method, count is reset to 0 every time the number of tuples added to the batch exceeds 'batchSize'. The batch is then executed and its parameters cleared. There is currently an ExecException in the putNext() method that is being ignored. Can you try adding some debugging System.outs and check the stdout/stderr of your reducers to see if that is the problem?
allow pig to write output into a JDBC db Key: PIG-1229 URL: https://issues.apache.org/jira/browse/PIG-1229 Project: Pig Issue Type: New Feature Components: impl Reporter: Ian Holsman Assignee: Ankur Priority: Minor Fix For: 0.8.0 Attachments: jira-1229-final.patch, jira-1229-final.test-fix.patch, jira-1229-v2.patch, jira-1229-v3.patch, pig-1229.2.patch, pig-1229.patch UDF to store data into a DB
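The batching behaviour discussed in the two comments above can be sketched in isolation. The following is a minimal model, not the DBStorage code: the class `BatchBuffer` and its method names are hypothetical, rows are plain strings, and "executing" a batch only updates counters. It also flushes when the batch size is reached rather than exceeded, which is an assumption of the sketch. What it illustrates is that when fewer rows than the batch size arrive, nothing is executed until the commit step flushes the partial batch, and that this only works if the commit step sees the same buffer instance that received the rows.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a batching writer; it does not talk to a database. */
class BatchBuffer {
    private final int batchSize;
    private final List<String> pending = new ArrayList<>(); // rows buffered but not yet executed
    private int rowsExecuted = 0;    // rows that went out in some batch
    private int batchesExecuted = 0; // number of simulated executeBatch() calls

    BatchBuffer(int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
    }

    /** Models putNext(): buffer the row and execute once the batch is full. */
    void putNext(String row) {
        pending.add(row);
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    /** Models commitTask(): execute whatever is left, even a partial batch. */
    void commitTask() {
        if (!pending.isEmpty()) {
            flush();
        }
    }

    private void flush() {
        rowsExecuted += pending.size();
        batchesExecuted++;
        pending.clear();
    }

    int pendingRows() { return pending.size(); }
    int rowsExecuted() { return rowsExecuted; }
    int batchesExecuted() { return batchesExecuted; }

    public static void main(String[] args) {
        BatchBuffer buffer = new BatchBuffer(100);
        for (int i = 0; i < 42; i++) {
            buffer.putNext("row-" + i);
        }
        System.out.println("before commit: executed=" + buffer.rowsExecuted()
                + ", pending=" + buffer.pendingRows());
        buffer.commitTask();
        System.out.println("after commit: executed=" + buffer.rowsExecuted()
                + ", pending=" + buffer.pendingRows());
    }
}
```

If the commit step were invoked on a freshly constructed `BatchBuffer` instead of the one that received the rows, its pending count would be 0 and nothing would be written, which is one way to read the symptom reported above.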
[jira] Commented: (PIG-1229) allow pig to write output into a JDBC db
[ https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12894963#action_12894963 ] Ashutosh Chauhan commented on PIG-1229:
I am still getting the same exception:
{code}
java.io.IOException: JDBC Error
    at org.apache.pig.piggybank.storage.DBStorage.prepareToWrite(DBStorage.java:291)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.<init>(PigOutputFormat.java:124)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.getRecordWriter(PigOutputFormat.java:85)
    at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.<init>(MapTask.java:488)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:610)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)
Caused by: java.sql.SQLException: Table not found in statement [insert into ttt (id, name, ratio) values (?,?,?)]
    at org.hsqldb.jdbc.Util.throwError(Unknown Source)
    at org.hsqldb.jdbc.jdbcPreparedStatement.<init>(Unknown Source)
    at org.hsqldb.jdbc.jdbcConnection.prepareStatement(Unknown Source)
    at org.apache.pig.piggybank.storage.DBStorage.prepareToWrite(DBStorage.java:288)
    ... 6 more
{code}
Reading through a few internet forums, it seems that there are subtle differences between the stand-alone mode and the server mode of hsqldb. Maybe starting the hsqldb instance in server mode would alleviate the problem.
allow pig to write output into a JDBC db Key: PIG-1229 URL: https://issues.apache.org/jira/browse/PIG-1229 Project: Pig Issue Type: New Feature Components: impl Reporter: Ian Holsman Assignee: Ankur Priority: Minor Fix For: 0.8.0 Attachments: jira-1229-final.patch, jira-1229-final.test-fix.patch, jira-1229-v2.patch, jira-1229-v3.patch, pig-1229.2.patch, pig-1229.patch UDF to store data into a DB
[jira] Commented: (PIG-1229) allow pig to write output into a JDBC db
[ https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12894975#action_12894975 ] Aaron Kimball commented on PIG-1229:
Haven't looked at how you're using hsqldb in this patch, but I've got a lot of experience using HSQLDB for testing.
If you're running one or more tests in a single process that requires an HSQLDB-backed database, you do not need to create a new instance of Server. You can just set your JDBC connect string to {{jdbc:hsqldb:mem:foodbname}} and get a {{Connection}} instance to a memory-backed single-process database called {{foodbname}}. This database will exist for the lifetime of the Java process. You can have multiple {{Connection}} instances (concurrently or serially) open to this database and it will behave as you would expect a database to.
The advantage of not using a server is that this does not require binding a port; therefore you can run multiple tests concurrently without worrying about collisions. Similarly, there's no need to use the {{jdbc:hsqldb:file}} protocol unless you want to restore the contents of the database in a subsequent process. When your Java process ends, you won't have a bonus file to clean up with {{jdbc:hsqldb:mem}}.
Of course, if you're testing with {{MiniMRCluster}} or something, you'll want to start a Server so that the external mapper processes can connect to the same database via {{jdbc:hsqldb:hsql://server:port/dbname}}.
allow pig to write output into a JDBC db Key: PIG-1229 URL: https://issues.apache.org/jira/browse/PIG-1229 Project: Pig Issue Type: New Feature Components: impl Reporter: Ian Holsman Assignee: Ankur Priority: Minor Fix For: 0.8.0 Attachments: jira-1229-final.patch, jira-1229-final.test-fix.patch, jira-1229-v2.patch, jira-1229-v3.patch, pig-1229.2.patch, pig-1229.patch UDF to store data into a DB
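The three connect-string forms mentioned in the comment above can be summarised with a small helper. This is only a sketch: the class `HsqldbUrls` and its method names are made up for illustration, and the host, port and path values in `main` are arbitrary examples. Only the URL prefixes themselves come from the comment.

```java
/** Hypothetical helper that builds the HSQLDB connect strings named in the comment. */
class HsqldbUrls {
    /** Memory-backed database that lives only as long as the Java process. */
    static String inMemory(String dbName) {
        return "jdbc:hsqldb:mem:" + dbName;
    }

    /** File-backed database whose contents can be reopened by a later process. */
    static String file(String path) {
        return "jdbc:hsqldb:file:" + path;
    }

    /** Database served by an HSQLDB Server, reachable from other processes. */
    static String server(String host, int port, String dbName) {
        return "jdbc:hsqldb:hsql://" + host + ":" + port + "/" + dbName;
    }

    public static void main(String[] args) {
        System.out.println(inMemory("foodbname"));
        System.out.println(file("/tmp/batchtest"));
        // Host and port are illustrative values, not taken from the thread.
        System.out.println(server("localhost", 9001, "foodbname"));
    }
}
```

For a single-process unit test the in-memory form is usually enough; the server form matters once separate task processes need to see the same tables.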
[jira] Commented: (PIG-1229) allow pig to write output into a JDBC db
[ https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12892999#action_12892999 ] Hadoop QA commented on PIG-1229:
-1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12450586/jira-1229-final.patch against trunk revision 979781.
    +1 @author. The patch does not contain any @author tags.
    +1 tests included. The patch appears to include 4 new or modified tests.
    +1 javadoc. The javadoc tool did not generate any warning messages.
    +1 javac. The applied patch does not increase the total number of javac compiler warnings.
    +1 findbugs. The patch does not introduce any new Findbugs warnings.
    +1 release audit. The applied patch does not increase the total number of release audit warnings.
    -1 core tests. The patch failed core unit tests.
    -1 contrib tests. The patch failed contrib unit tests.
Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/360/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/360/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/360/console
This message is automatically generated.
allow pig to write output into a JDBC db Key: PIG-1229 URL: https://issues.apache.org/jira/browse/PIG-1229 Project: Pig Issue Type: New Feature Components: impl Reporter: Ian Holsman Assignee: Ankur Priority: Minor Fix For: 0.8.0 Attachments: jira-1229-final.patch, jira-1229-v2.patch, jira-1229-v3.patch, pig-1229.2.patch, pig-1229.patch UDF to store data into a DB
[jira] Commented: (PIG-1229) allow pig to write output into a JDBC db
[ https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12892378#action_12892378 ] Ashutosh Chauhan commented on PIG-1229:
Since the fix to PIG-1424 doesn't look straightforward and I don't think anyone is working on it, I suggest we unblock this useful piggybank functionality from Pig's issues. We can take the original approach suggested in the first patch of passing the JDBC URL string as a constructor argument instead of as the store location. Ankur, do you have cycles to generate the patch, which we will commit now so it makes it into 0.8?
allow pig to write output into a JDBC db Key: PIG-1229 URL: https://issues.apache.org/jira/browse/PIG-1229 Project: Pig Issue Type: New Feature Components: impl Reporter: Ian Holsman Assignee: Ankur Priority: Minor Fix For: 0.8.0 Attachments: jira-1229-v2.patch, jira-1229-v3.patch, pig-1229.2.patch, pig-1229.patch UDF to store data into a DB
[jira] Commented: (PIG-1229) allow pig to write output into a JDBC db
[ https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12869552#action_12869552 ] Ankur commented on PIG-1229:
Hi Ashutosh, Thanks for helping out here. The error that you see, "...The database is already in use by another process", is due to locking issues in hsqldb 1.8.0.7. Upgrading to 1.8.0.10 alleviates the problem and the test passes successfully.
A few changes that I made:
1. Added a placeholder record writer, because PigOutputFormat calls close() on it and throws a null pointer exception if we return null from our output format.
2. Looks like you missed the ivy.xml and build.xml changes to pull the correct hsqldb jar.
allow pig to write output into a JDBC db Key: PIG-1229 URL: https://issues.apache.org/jira/browse/PIG-1229 Project: Pig Issue Type: New Feature Components: impl Reporter: Ian Holsman Assignee: Ankur Priority: Minor Fix For: 0.8.0 Attachments: jira-1229-v2.patch, jira-1229-v3.patch, pig-1229.2.patch, pig-1229.patch UDF to store data into a DB
[jira] Commented: (PIG-1229) allow pig to write output into a JDBC db
[ https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12869692#action_12869692 ] Ashutosh Chauhan commented on PIG-1229: --- Cool. I created PIG-1424 to track the Pig issue. allow pig to write output into a JDBC db Key: PIG-1229 URL: https://issues.apache.org/jira/browse/PIG-1229 Project: Pig Issue Type: New Feature Components: impl Reporter: Ian Holsman Assignee: Ankur Priority: Minor Fix For: 0.8.0 Attachments: jira-1229-v2.patch, jira-1229-v3.patch, pig-1229.2.patch, pig-1229.patch UDF to store data into a DB -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1229) allow pig to write output into a JDBC db
[ https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12861177#action_12861177 ] Ashutosh Chauhan commented on PIG-1229:
Ankur, The stack trace above is out of sync with trunk. Can you upload the patch with the alternative approach that you are trying? I think it might be possible to get this working.
allow pig to write output into a JDBC db Key: PIG-1229 URL: https://issues.apache.org/jira/browse/PIG-1229 Project: Pig Issue Type: New Feature Components: impl Reporter: Ian Holsman Assignee: Ankur Priority: Minor Fix For: 0.8.0 Attachments: jira-1229-v2.patch UDF to store data into a DB
[jira] Commented: (PIG-1229) allow pig to write output into a JDBC db
[ https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12857154#action_12857154 ] Ashutosh Chauhan commented on PIG-1229:
As per the http://www.mail-archive.com/pig-u...@hadoop.apache.org/msg02257.html thread, I am wondering if it would be safe and possible to make sure that a job using this storage has speculative execution turned off. Otherwise, with S.E. turned on, there are too many scenarios we would have to handle. What do you think?
allow pig to write output into a JDBC db Key: PIG-1229 URL: https://issues.apache.org/jira/browse/PIG-1229 Project: Pig Issue Type: New Feature Components: impl Reporter: Ian Holsman Assignee: Ankur Priority: Minor Fix For: 0.8.0 Attachments: jira-1229-v2.patch UDF to store data into a DB -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (PIG-1229) allow pig to write output into a JDBC db
[ https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12856761#action_12856761 ] Ankur commented on PIG-1229: Any updates ? allow pig to write output into a JDBC db Key: PIG-1229 URL: https://issues.apache.org/jira/browse/PIG-1229 Project: Pig Issue Type: New Feature Components: impl Reporter: Ian Holsman Assignee: Ankur Priority: Minor Fix For: 0.8.0 Attachments: jira-1229-v2.patch UDF to store data into a DB -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (PIG-1229) allow pig to write output into a JDBC db
[ https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12855835#action_12855835 ] Ankur commented on PIG-1229:
*Sigh* The problem is that Hadoop's Path implementation does not parse JDBC URLs correctly, so turning relToAbsPathForStoreLocation() into a no-op does NOT help. The URISyntaxException is now propagated to the point where the output path is set for the job. Here is the new trace from the test execution failure with the suggested workaround:
org.apache.pig.backend.executionengine.ExecException: ERROR 2043: Unexpected error during execution.
    at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:332)
    at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:835)
    at org.apache.pig.PigServer.execute(PigServer.java:828)
    at org.apache.pig.PigServer.access$100(PigServer.java:105)
    at org.apache.pig.PigServer$Graph.execute(PigServer.java:1080)
    at org.apache.pig.PigServer.executeBatch(PigServer.java:288)
    at org.apache.pig.piggybank.test.storage.TestDBStorage.testWriteToDB(Unknown Source)
Caused by: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobCreationException: ERROR 2017: Internal error creating job configuration.
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:624)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.compile(JobControlCompiler.java:246)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:131)
    at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:308)
Caused by: java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: jdbc:hsqldb:file:/tmp/batchtest;hsqldb.default_table_type=cached;hsqldb.cache_rows=100
    at org.apache.hadoop.fs.Path.initialize(Path.java:140)
    at org.apache.hadoop.fs.Path.<init>(Path.java:126)
    at org.apache.hadoop.fs.Path.<init>(Path.java:45)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:459)
Caused by: java.net.URISyntaxException: Relative path in absolute URI: jdbc:hsqldb:file:/tmp/batchtest;hsqldb.default_table_type=cached;hsqldb.cache_rows=100
    at java.net.URI.checkPath(URI.java:1787)
    at java.net.URI.<init>(URI.java:735)
    at org.apache.hadoop.fs.Path.initialize(Path.java:137)
allow pig to write output into a JDBC db Key: PIG-1229 URL: https://issues.apache.org/jira/browse/PIG-1229 Project: Pig Issue Type: New Feature Components: impl Reporter: Ian Holsman Assignee: Ankur Priority: Minor Fix For: 0.8.0 Attachments: jira-1229-v2.patch UDF to store data into a DB
[jira] Commented: (PIG-1229) allow pig to write output into a JDBC db
[ https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12854740#action_12854740 ] Ashutosh Chauhan commented on PIG-1229:
You can get rid of this stack trace by overriding relToAbsPathForStoreLocation() of StoreFunc, which DBStorage extends, and turning it into a no-op. Since the DB location is always absolute, there is no need for the default behavior in StoreFunc.
As for DataType.find(), I found that even PigStorage does the same, so this patch is no worse than PigStorage in that way.
allow pig to write output into a JDBC db Key: PIG-1229 URL: https://issues.apache.org/jira/browse/PIG-1229 Project: Pig Issue Type: New Feature Components: impl Reporter: Ian Holsman Assignee: Ankur Priority: Minor Fix For: 0.8.0 Attachments: jira-1229-v2.patch UDF to store data into a DB
[jira] Commented: (PIG-1229) allow pig to write output into a JDBC db
[ https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12853843#action_12853843 ] Ankur commented on PIG-1229:
So accepting the JDBC URL in setStoreLocation() exposes a flaw in Hadoop's Path class, and it causes the test case to fail with the following exception:
java.net.URISyntaxException: Relative path in absolute URI: jdbc:hsqldb:file:/tmp/batchtest;hsqldb.default_table_type=cached;hsqldb.cache_rows=100
java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: jdbc:hsqldb:file:/tmp/batchtest;hsqldb.default_table_type=cached;hsqldb.cache_rows=100
    at org.apache.hadoop.fs.Path.initialize(Path.java:140)
    at org.apache.hadoop.fs.Path.<init>(Path.java:126)
    at org.apache.pig.LoadFunc.getAbsolutePath(LoadFunc.java:238)
    at org.apache.pig.StoreFunc.relToAbsPathForStoreLocation(StoreFunc.java:60)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.StoreClause(QueryParser.java:3587)
    ...
    ...
Caused by: java.net.URISyntaxException: Relative path in absolute URI: jdbc:hsqldb:file:/tmp/batchtest;hsqldb.default_table_type=cached;hsqldb.cache_rows=100
    at java.net.URI.checkPath(URI.java:1787)
    at java.net.URI.<init>(URI.java:735)
    at org.apache.hadoop.fs.Path.initialize(Path.java:137)
Looking at the code of Path.java, it seems that it extracts the scheme based on the first occurrence of ':'. This causes the authority and path to be extracted incorrectly, resulting in the above exception being thrown by java.net.URI. However, if I try to initialize a URI directly with the URL string, no exception is thrown.
As for the DB reachability check, I think it is OK to check availability at runtime and fail if the DB is unavailable. We do this in prepareToWrite().
As for the performance enhancement, I think we can track that via a separate issue.
This patch has taken quite a while now and I wouldn't want to delay it further by depending on a Hadoop fix. So if a reviewer does not find any blocking issues, my suggestion is to go ahead with the commit.
allow pig to write output into a JDBC db Key: PIG-1229 URL: https://issues.apache.org/jira/browse/PIG-1229 Project: Pig Issue Type: New Feature Components: impl Reporter: Ian Holsman Assignee: Ankur Priority: Minor Fix For: 0.8.0 Attachments: jira-1229-v2.patch UDF to store data into a DB
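The difference described in the comment above can be reproduced with java.net.URI alone, without Hadoop on the classpath. The sketch below is an approximation under stated assumptions: `JdbcUriDemo` and its method names are hypothetical, and `parseSplitAtFirstColon` only imitates the "split at the first ':' and rebuild" behaviour that the comment attributes to Path; it is not the Path source. Parsing the whole string treats the JDBC URL as an opaque URI, while rebuilding it from a scheme plus a path that does not start with '/' triggers the same "Relative path in absolute URI" complaint seen in the trace.

```java
import java.net.URI;
import java.net.URISyntaxException;

/** Hypothetical demo class; uses only java.net.URI. */
class JdbcUriDemo {
    static final String JDBC_URL =
        "jdbc:hsqldb:file:/tmp/batchtest;hsqldb.default_table_type=cached;hsqldb.cache_rows=100";

    /** Parses the whole string at once (URI.create wraps new URI(String)). */
    static URI parseDirect(String s) {
        return URI.create(s);
    }

    /**
     * Splits at the first ':' and rebuilds a hierarchical URI from the pieces.
     * Returns the failure reason, or null if construction succeeds.
     */
    static String parseSplitAtFirstColon(String s) {
        int colon = s.indexOf(':');
        String scheme = s.substring(0, colon);
        String path = s.substring(colon + 1); // "hsqldb:file:/tmp/..." has no leading '/'
        try {
            new URI(scheme, null, path, null, null);
            return null;
        } catch (URISyntaxException e) {
            return e.getReason();
        }
    }

    public static void main(String[] args) {
        URI direct = parseDirect(JDBC_URL);
        System.out.println("direct parse: scheme=" + direct.getScheme()
                + ", opaque=" + direct.isOpaque());
        System.out.println("split parse failure: " + parseSplitAtFirstColon(JDBC_URL));
    }
}
```

This is consistent with the observation that constructing a URI directly from the URL string raises no exception, while the path-based construction does.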
[jira] Commented: (PIG-1229) allow pig to write output into a JDBC db
[ https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12852190#action_12852190 ] Ashutosh Chauhan commented on PIG-1229:
A few suggestions:
Reading from the test case, store statements currently look like:
{code}
b = store a into 'dummy' using org.apache.pig.piggybank.storage.DBStorage('org.hsqldb.jdbcDriver','jdbc:hsqldb:file:/tmp/batchtest;hsqldb.default_table_type=cached;hsqldb.cache_rows=100','insert into a...');
{code}
Here 'dummy' is totally ignored. While this works, from a user-experience standpoint the following might be better:
{code}
b = store a into 'jdbc:hsqldb:file:/tmp/batchtest' using org.apache.pig.piggybank.storage.DBStorage('org.hsqldb.jdbcDriver','hsqldb.default_table_type=cached;hsqldb.cache_rows=100','insert into a');
{code}
That is, have the DB URL as the store location and the DB params as the second parameter of the store func. You can use setStoreLocation() to store the URL. Apart from giving a more intuitive store statement, this will also allow you to check whether the DB is reachable at compile time, instead of at runtime. You can do that via checkOutputSpecs().
Doing DataType.findType() on every element of every tuple will be expensive. I am wondering if you can get hold of the schema in your store func and use that to map Pig types to SQL types.
All of these suggestions may come in as later patches. So, if you want to get this committed and track these separately, I think that will also work, as this patch is functionally complete.
allow pig to write output into a JDBC db Key: PIG-1229 URL: https://issues.apache.org/jira/browse/PIG-1229 Project: Pig Issue Type: New Feature Components: impl Reporter: Ian Holsman Assignee: Ankur Priority: Minor Fix For: 0.8.0 Attachments: jira-1229-v2.patch UDF to store data into a DB
[jira] Commented: (PIG-1229) allow pig to write output into a JDBC db
[ https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12852243#action_12852243 ] Ankur commented on PIG-1229:
Ashutosh, Thanks for the review comments. Accepting the store location via setStoreLocation() definitely makes sense. However, I am not sure about checking database reachability in checkOutputSpecs(), since that may be called on the client side as well and the DB machine may not be reachable from the client machine. Isn't OutputFormat's setupTask() a better place to do a DB availability check? This sounds like a reasonable ask before a commit. I will incorporate this and submit a new patch.
Regarding DataType.find(), I assume this is what you have in mind:
1. Get the DB schema information for the table we are writing to.
2. Use the checkSchema() API to validate this against the Pig-supplied schema and cache it.
3. Use the cached information in the putNext() method.
This is more of a performance enhancement and looks like more work, so I would prefer that we track this as a separate JIRA for DBStorage.
allow pig to write output into a JDBC db Key: PIG-1229 URL: https://issues.apache.org/jira/browse/PIG-1229 Project: Pig Issue Type: New Feature Components: impl Reporter: Ian Holsman Assignee: Ankur Priority: Minor Fix For: 0.8.0 Attachments: jira-1229-v2.patch UDF to store data into a DB
[jira] Commented: (PIG-1229) allow pig to write output into a JDBC db
[ https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12851455#action_12851455 ] Olga Natkovich commented on PIG-1229:
Since we have already branched, this feature will not go into the 0.7.0 branch but will instead be committed to trunk and released as part of the 0.8.0 release. I think this patch should work just fine against trunk since we have not deviated much.
allow pig to write output into a JDBC db Key: PIG-1229 URL: https://issues.apache.org/jira/browse/PIG-1229 Project: Pig Issue Type: New Feature Components: impl Reporter: Ian Holsman Assignee: Ankur Priority: Minor Fix For: 0.8.0 Attachments: jira-1229-v2.patch UDF to store data into a DB
[jira] Commented: (PIG-1229) allow pig to write output into a JDBC db
[ https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12851665#action_12851665 ] Hadoop QA commented on PIG-1229:
+1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12440249/jira-1229-v2.patch against trunk revision 928950.
    +1 @author. The patch does not contain any @author tags.
    +1 tests included. The patch appears to include 4 new or modified tests.
    +1 javadoc. The javadoc tool did not generate any warning messages.
    +1 javac. The applied patch does not increase the total number of javac compiler warnings.
    +1 findbugs. The patch does not introduce any new Findbugs warnings.
    +1 release audit. The applied patch does not increase the total number of release audit warnings.
    +1 core tests. The patch passed core unit tests.
    +1 contrib tests. The patch passed contrib unit tests.
Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/260/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/260/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/260/console
This message is automatically generated.
allow pig to write output into a JDBC db Key: PIG-1229 URL: https://issues.apache.org/jira/browse/PIG-1229 Project: Pig Issue Type: New Feature Components: impl Reporter: Ian Holsman Assignee: Ankur Priority: Minor Fix For: 0.8.0 Attachments: jira-1229-v2.patch UDF to store data into a DB
[jira] Commented: (PIG-1229) allow pig to write output into a JDBC db
[ https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12847909#action_12847909 ] Ankur commented on PIG-1229:
@Ashutosh Chauhan I read the HSQLDB license and it looked OK to me, but I am not a lawyer :-). Besides that, Apache Cocoon uses it. I think we should be OK pulling it through ivy. I'll make the ivy and load-store related changes and submit a new patch on Monday. Sorry for the delay.
allow pig to write output into a JDBC db Key: PIG-1229 URL: https://issues.apache.org/jira/browse/PIG-1229 Project: Pig Issue Type: New Feature Components: impl Reporter: Ian Holsman Assignee: Ankur Priority: Minor Fix For: 0.7.0 Attachments: hsqldb.jar, jira-1229.patch UDF to store data into a DB
[jira] Commented: (PIG-1229) allow pig to write output into a JDBC db
[ https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12840963#action_12840963 ] Olga Natkovich commented on PIG-1229: - Ashutosh, please, review and see if we can pull the jar from IVY. allow pig to write output into a JDBC db Key: PIG-1229 URL: https://issues.apache.org/jira/browse/PIG-1229 Project: Pig Issue Type: New Feature Components: impl Reporter: Ian Holsman Assignee: Ankur Priority: Minor Fix For: 0.7.0 Attachments: hsqldb.jar, jira-1229.patch UDF to store data into a DB -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1229) allow pig to write output into a JDBC db
[ https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12841003#action_12841003 ] Ashutosh Chauhan commented on PIG-1229:
Ankur, With the recent Load-Store interface changes, the patch doesn't compile. Can you regenerate it? And while you are at it, can you also make changes in ivy.xml so that hsqldb.jar is pulled over the internet instead of needing to be bundled with the Pig distribution?
allow pig to write output into a JDBC db Key: PIG-1229 URL: https://issues.apache.org/jira/browse/PIG-1229 Project: Pig Issue Type: New Feature Components: impl Reporter: Ian Holsman Assignee: Ankur Priority: Minor Fix For: 0.7.0 Attachments: hsqldb.jar, jira-1229.patch UDF to store data into a DB
[jira] Commented: (PIG-1229) allow pig to write output into a JDBC db
[ https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12833998#action_12833998 ] Aaron Kimball commented on PIG-1229: Looks much better - thanks for adding the test case too. Including hsqldb.jar in your patch didn't work, by the way -- you'll need to attach that jar separately to the issue I think. allow pig to write output into a JDBC db Key: PIG-1229 URL: https://issues.apache.org/jira/browse/PIG-1229 Project: Pig Issue Type: New Feature Components: impl Reporter: Ian Holsman Assignee: Ankur Priority: Minor Fix For: 0.6.0 Attachments: jira-1229.patch UDF to store data into a DB -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1229) allow pig to write output into a JDBC db
[ https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12831052#action_12831052 ] Aaron Kimball commented on PIG-1229:
Ian, This class looks reasonable to me. You'll probably need to format this as a patch to get it accepted into the project, though. Is there a test plan for this code and/or unit tests?
Some database-specific things I've noticed:
* You create a PreparedStatement, call its executeUpdate() method several times, and then call close() on the statement. This assumes you're in auto-commit mode; I think you should configure the commit mode explicitly when creating the connection. Also, you'll probably get much better performance if you use addBatch() / executeBatch() for your batch size rather than individual executeUpdate() statements. You should then call connection.commit() and ps.clear() rather than closing the prepared statement and compiling a new one.
* If user and pass are null, I think you may need to use DriverManager.getConnection(jdbcUrl) instead of DriverManager.getConnection(jdbcUrl, null, null). Worth a unit test.
* See org.apache.hadoop.mapreduce.lib.db.DBOutputFormat in the MapReduce project for some similar code to take inspiration from.
allow pig to write output into a JDBC db Key: PIG-1229 URL: https://issues.apache.org/jira/browse/PIG-1229 Project: Pig Issue Type: New Feature Components: impl Reporter: Ian Holsman Priority: Minor Attachments: DbStorage.java UDF to store data into a DB
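The JDBC suggestions in the review above can be sketched with the standard java.sql interfaces. This is a hedged illustration, not the attached DbStorage.java: the class `BatchInsertSketch`, its method names and the row representation (`Object[]` per row) are assumptions made for the example. Note that java.sql.PreparedStatement exposes clearParameters() and clearBatch(); the sketch relies on executeBatch() followed by commit() and reuses the same statement for the next batch.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

/** Hypothetical sketch of batched inserts over plain JDBC. */
class BatchInsertSketch {
    /** Opens a connection with an explicit commit mode; skips credentials when none are given. */
    static Connection open(String jdbcUrl, String user, String pass) throws SQLException {
        Connection conn = (user == null && pass == null)
                ? DriverManager.getConnection(jdbcUrl)
                : DriverManager.getConnection(jdbcUrl, user, pass);
        conn.setAutoCommit(false); // commits are issued explicitly after each batch
        return conn;
    }

    /**
     * Inserts all rows using addBatch()/executeBatch(), committing after every
     * full batch and once more for a trailing partial batch.
     * Returns the number of batches executed.
     */
    static int insertAll(Connection conn, String sql, List<Object[]> rows, int batchSize)
            throws SQLException {
        int batches = 0;
        int pending = 0;
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            for (Object[] row : rows) {
                for (int i = 0; i < row.length; i++) {
                    ps.setObject(i + 1, row[i]); // JDBC parameters are 1-based
                }
                ps.addBatch();
                pending++;
                if (pending == batchSize) {
                    ps.executeBatch();
                    conn.commit();
                    pending = 0;
                    batches++;
                }
            }
            if (pending > 0) { // flush the final partial batch
                ps.executeBatch();
                conn.commit();
                batches++;
            }
        }
        return batches;
    }
}
```

A caller would typically obtain the connection with `open(url, null, null)` for a credential-less database, pass a parameterised statement such as `insert into ttt (id, name, ratio) values (?,?,?)`, and close the connection when done.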
[jira] Commented: (PIG-1229) allow pig to write output into a JDBC db
[ https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12831337#action_12831337 ] Ankur commented on PIG-1229: Aaron, Thanks for the suggestions. I'll have an updated patch coming soon. allow pig to write output into a JDBC db Key: PIG-1229 URL: https://issues.apache.org/jira/browse/PIG-1229 Project: Pig Issue Type: New Feature Components: impl Reporter: Ian Holsman Assignee: Ankur Priority: Minor Attachments: DbStorage.java UDF to store data into a DB -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.