[jira] Commented: (PIG-1229) allow pig to write output into a JDBC db

2010-09-17 Thread Sandesh Devaraju (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12910629#action_12910629
 ] 

Sandesh Devaraju commented on PIG-1229:
---

I narrowed down the problem to org.apache.hadoop.mapred.Task.java lines 411-418.

{code:title=org.apache.hadoop.mapred.Task.java|linenumbers=true|firstline=411}
if (useNewApi) {
  LOG.debug("using new api for output committer");
  outputFormat =
    ReflectionUtils.newInstance(taskContext.getOutputFormatClass(), job);
  committer = outputFormat.getOutputCommitter(taskContext);
} else {
  committer = conf.getOutputCommitter();
}
{code}

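But the DBStorage UDF assumes that the OutputFormat is in a closure, i.e. that
it shares state with the DBStorage instance that created it, whereas the code
above creates a fresh instance by reflection.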
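
To illustrate the mismatch, here is a minimal sketch that does not use Hadoop at all. {{newInstance}} below is only a stand-in for {{ReflectionUtils.newInstance}}, and {{CountingFormat}} is a made-up class playing the role of the UDF's OutputFormat; both are assumptions for illustration, not code from the patch. It shows that a count accumulated on one instance is not visible on an instance created by reflection.

```java
import java.lang.reflect.Constructor;

class FreshInstanceDemo {

    /** Hypothetical stand-in for an OutputFormat that counts pending records. */
    static class CountingFormat {
        private int count = 0;

        void write() { count++; }

        int pendingCount() { return count; }
    }

    /** Stand-in for reflective instantiation: always returns a brand-new object. */
    static <T> T newInstance(Class<T> cls) {
        try {
            Constructor<T> ctor = cls.getDeclaredConstructor();
            ctor.setAccessible(true);
            return ctor.newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Writes records on one instance, then asks a reflectively created one. */
    static int countSeenByFreshInstance(int records) {
        CountingFormat writerSide = new CountingFormat();
        for (int i = 0; i < records; i++) {
            writerSide.write();
        }
        // The committer side gets its own object, so it starts from zero.
        CountingFormat committerSide = newInstance(CountingFormat.class);
        return committerSide.pendingCount();
    }

    public static void main(String[] args) {
        System.out.println("count seen by committer side: "
                + countSeenByFreshInstance(42));
    }
}
```

Whatever the writer side recorded, the reflectively created instance reports 0, which matches the symptom seen in commitTask.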

 allow pig to write output into a JDBC db
 

 Key: PIG-1229
 URL: https://issues.apache.org/jira/browse/PIG-1229
 Project: Pig
  Issue Type: New Feature
  Components: impl
Reporter: Ian Holsman
Assignee: Ankur
Priority: Minor
 Fix For: 0.8.0

 Attachments: jira-1229-final.patch, jira-1229-final.test-fix.patch, 
 jira-1229-v2.patch, jira-1229-v3.patch, pig-1229.2.patch, pig-1229.patch


 UDF to store data into a DB

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1229) allow pig to write output into a JDBC db

2010-09-16 Thread Sandesh Devaraju (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12910331#action_12910331
 ] 

Sandesh Devaraju commented on PIG-1229:
---

I upgraded to 0.7 and tried the updated patch. However, I don't see any entries 
in the database.
Upon further investigation, I noticed that in my particular case, the batch 
size was 100 and the number of output records that ended up at every reducer 
was below this threshold.
I added a debug statement to the OutputCommitter's commitTask method and found 
that count was 0.
Any ideas why this might be happening?




[jira] Commented: (PIG-1229) allow pig to write output into a JDBC db

2010-09-16 Thread Ankur (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12910441#action_12910441
 ] 

Ankur commented on PIG-1229:


In the putNext() method, count is reset to 0 every time the number of tuples 
added to the batch exceeds 'batchSize'. The batch is then executed and its 
parameters cleared. There is currently an ExecException in the putNext() 
method that is being ignored. Can you try adding some debugging System.outs 
and check the stdout/stderr of your reducers to see if that is the problem?
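
As a rough illustration of the batching behaviour described here, the following sketch keeps records in memory instead of talking to a database. All names are hypothetical, and it flushes when the count reaches the batch size; the exact comparison used in the patch is not shown in this thread. The point is that with fewer records than the batch size, nothing is written until the commit step flushes the remainder.

```java
import java.util.ArrayList;
import java.util.List;

class BatchingSketch {
    private final int batchSize;
    private final List<String> batch = new ArrayList<>();
    // Stands in for the database table; a real store func would use JDBC.
    private final List<String> stored = new ArrayList<>();
    private int count = 0;

    BatchingSketch(int batchSize) {
        this.batchSize = batchSize;
    }

    /** Adds one record and executes the batch once it is full. */
    void putNext(String record) {
        batch.add(record);
        count++;
        if (count >= batchSize) {
            executeBatch();
        }
    }

    /** Flushes whatever is still pending; must see the same count as putNext. */
    void commitTask() {
        if (count > 0) {
            executeBatch();
        }
    }

    private void executeBatch() {
        stored.addAll(batch);
        batch.clear();
        count = 0;
    }

    int storedCount() { return stored.size(); }

    int pendingCount() { return count; }

    public static void main(String[] args) {
        BatchingSketch sketch = new BatchingSketch(100);
        for (int i = 0; i < 40; i++) {
            sketch.putNext("record-" + i);
        }
        System.out.println("stored before commit: " + sketch.storedCount());
        sketch.commitTask();
        System.out.println("stored after commit: " + sketch.storedCount());
    }
}
```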




[jira] Commented: (PIG-1229) allow pig to write output into a JDBC db

2010-08-03 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12894963#action_12894963
 ] 

Ashutosh Chauhan commented on PIG-1229:
---

I am still getting the same exception 
{code}
java.io.IOException: JDBC Error
    at org.apache.pig.piggybank.storage.DBStorage.prepareToWrite(DBStorage.java:291)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.<init>(PigOutputFormat.java:124)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.getRecordWriter(PigOutputFormat.java:85)
    at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.<init>(MapTask.java:488)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:610)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)
Caused by: java.sql.SQLException: Table not found in statement [insert into ttt (id, name, ratio) values (?,?,?)]
    at org.hsqldb.jdbc.Util.throwError(Unknown Source)
    at org.hsqldb.jdbc.jdbcPreparedStatement.<init>(Unknown Source)
    at org.hsqldb.jdbc.jdbcConnection.prepareStatement(Unknown Source)
    at org.apache.pig.piggybank.storage.DBStorage.prepareToWrite(DBStorage.java:288)
    ... 6 more
{code}

Reading through a few internet forums, it seems that there are subtle 
differences between the stand-alone mode and the server mode of hsqldb. Maybe 
starting the hsqldb instance in server mode would alleviate the problem.




[jira] Commented: (PIG-1229) allow pig to write output into a JDBC db

2010-08-03 Thread Aaron Kimball (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12894975#action_12894975
 ] 

Aaron Kimball commented on PIG-1229:


Haven't looked at how you're using hsqldb in this patch, but I've got a lot of 
experience using HSQLDB for testing.

If you're running one or more tests in a single process that requires an 
HSQLDB-backed database, you do not need to create a new instance of Server. You 
can just set your JDBC connect string to {{jdbc:hsqldb:mem:foodbname}} and get 
a {{Connection}} instance to a memory-backed single-process database called 
{{foodbname}}. This database will exist for the lifetime of the Java process. 
You can have multiple {{Connection}} instances (concurrently or serially) open 
to this database and it will function as you would expect a database to. 
The advantage of not using a server is that this does not require binding a 
port; therefore you can run multiple tests concurrently without worrying about 
collisions. Similarly, there's no need to use the {{jdbc:hsqldb:file}} protocol 
unless you want to restore the contents of the database in a subsequent 
process. When your Java process ends, you won't have a bonus file to clean up 
with {{jdbc:hsqldb:mem}}.

Of course, if you're testing with {{MiniMRCluster}} or something, you'll want 
to start a Server so that the external mapper processes can connect to the same 
database via {{jdbc:hsqldb:hsql://server:port/dbname}}. 
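
For reference, the three connect-string forms mentioned above can be sketched as small helpers. The class and method names are hypothetical, and the host, port and database names in {{main}} are placeholders; the helpers only build the strings and do not open a connection, so no HSQLDB jar is needed to run them.

```java
class HsqldbUrls {

    /** Memory-backed database that lives as long as the Java process. */
    static String inMemory(String dbName) {
        return "jdbc:hsqldb:mem:" + dbName;
    }

    /** File-backed database whose contents survive into a later process. */
    static String file(String path) {
        return "jdbc:hsqldb:file:" + path;
    }

    /** Database served by an HSQLDB Server, reachable from other processes. */
    static String server(String host, int port, String dbName) {
        return "jdbc:hsqldb:hsql://" + host + ":" + port + "/" + dbName;
    }

    public static void main(String[] args) {
        System.out.println(inMemory("foodbname"));
        System.out.println(file("/tmp/batchtest"));
        // Host, port and name below are placeholders, not values from the patch.
        System.out.println(server("localhost", 9001, "dbname"));
    }
}
```

A test that stays in one JVM would pass the first form to {{DriverManager.getConnection}}, while a MiniMRCluster-style test would use the third so that external task processes reach the same database.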






[jira] Commented: (PIG-1229) allow pig to write output into a JDBC db

2010-07-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12892999#action_12892999
 ] 

Hadoop QA commented on PIG-1229:


-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12450586/jira-1229-final.patch
  against trunk revision 979781.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 4 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

-1 contrib tests.  The patch failed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/360/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/360/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/360/console

This message is automatically generated.

 allow pig to write output into a JDBC db
 

 Key: PIG-1229
 URL: https://issues.apache.org/jira/browse/PIG-1229
 Project: Pig
  Issue Type: New Feature
  Components: impl
Reporter: Ian Holsman
Assignee: Ankur
Priority: Minor
 Fix For: 0.8.0

 Attachments: jira-1229-final.patch, jira-1229-v2.patch, 
 jira-1229-v3.patch, pig-1229.2.patch, pig-1229.patch


 UDF to store data into a DB




[jira] Commented: (PIG-1229) allow pig to write output into a JDBC db

2010-07-26 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12892378#action_12892378
 ] 

Ashutosh Chauhan commented on PIG-1229:
---

Since the fix for PIG-1424 doesn't look straightforward and I don't think 
anyone is working on it, I suggest that we unblock this useful piggybank 
functionality from Pig's issues. We can take the original approach suggested 
in the first patch of passing the JDBC URL string as a constructor argument 
instead of as the store location. 
Ankur, do you have cycles to generate the patch, which we will commit now so 
that it makes it into 0.8?

 allow pig to write output into a JDBC db
 

 Key: PIG-1229
 URL: https://issues.apache.org/jira/browse/PIG-1229
 Project: Pig
  Issue Type: New Feature
  Components: impl
Reporter: Ian Holsman
Assignee: Ankur
Priority: Minor
 Fix For: 0.8.0

 Attachments: jira-1229-v2.patch, jira-1229-v3.patch, 
 pig-1229.2.patch, pig-1229.patch


 UDF to store data into a DB




[jira] Commented: (PIG-1229) allow pig to write output into a JDBC db

2010-05-20 Thread Ankur (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12869552#action_12869552
 ] 

Ankur commented on PIG-1229:


Hi Ashutosh,
   Thanks for helping out here. The error that you see, 
"...The database is already in use by another process", is due to locking 
issues in hsqldb 1.8.0.7. Upgrading to 1.8.0.10 alleviates the problem and the 
test passes successfully. A few changes that I made:

1. Added a placeholder record writer, as PigOutputFormat calls close() on it, 
throwing a null pointer exception if we return null from our output format.
2. Looks like you missed the ivy.xml and build.xml changes to pull the correct 
hsqldb jar.
 

 allow pig to write output into a JDBC db
 

 Key: PIG-1229
 URL: https://issues.apache.org/jira/browse/PIG-1229
 Project: Pig
  Issue Type: New Feature
  Components: impl
Reporter: Ian Holsman
Assignee: Ankur
Priority: Minor
 Fix For: 0.8.0

 Attachments: jira-1229-v2.patch, jira-1229-v3.patch, 
 pig-1229.2.patch, pig-1229.patch


 UDF to store data into a DB




[jira] Commented: (PIG-1229) allow pig to write output into a JDBC db

2010-05-20 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12869692#action_12869692
 ] 

Ashutosh Chauhan commented on PIG-1229:
---

Cool. I created PIG-1424 to track the Pig issue.

 allow pig to write output into a JDBC db
 

 Key: PIG-1229
 URL: https://issues.apache.org/jira/browse/PIG-1229
 Project: Pig
  Issue Type: New Feature
  Components: impl
Reporter: Ian Holsman
Assignee: Ankur
Priority: Minor
 Fix For: 0.8.0

 Attachments: jira-1229-v2.patch, jira-1229-v3.patch, 
 pig-1229.2.patch, pig-1229.patch


 UDF to store data into a DB




[jira] Commented: (PIG-1229) allow pig to write output into a JDBC db

2010-04-26 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12861177#action_12861177
 ] 

Ashutosh Chauhan commented on PIG-1229:
---

Ankur,

The stack trace above is out of sync with trunk. Can you upload the patch with 
the alternative approach that you are trying? I think it might be possible to 
get this working.

 allow pig to write output into a JDBC db
 

 Key: PIG-1229
 URL: https://issues.apache.org/jira/browse/PIG-1229
 Project: Pig
  Issue Type: New Feature
  Components: impl
Reporter: Ian Holsman
Assignee: Ankur
Priority: Minor
 Fix For: 0.8.0

 Attachments: jira-1229-v2.patch


 UDF to store data into a DB




[jira] Commented: (PIG-1229) allow pig to write output into a JDBC db

2010-04-14 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12857154#action_12857154
 ] 

Ashutosh Chauhan commented on PIG-1229:
---

Per the thread at 
http://www.mail-archive.com/pig-u...@hadoop.apache.org/msg02257.html, I am 
wondering whether it would be safe and possible to make sure that any job 
using this storage has speculative execution turned off. Otherwise, with 
speculative execution turned on, there are too many scenarios we would have to 
handle. What do you think?
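
A minimal sketch of what "turned off" could mean in terms of job settings follows. The two property names are the ones used by Hadoop releases of this period for map-side and reduce-side speculation; the {{Properties}} object is only a stand-in for the job configuration, and the helper name is hypothetical. Where exactly the storage would apply these settings is left open here.

```java
import java.util.Properties;

class SpeculativeExecutionOff {
    static final String MAP_KEY = "mapred.map.tasks.speculative.execution";
    static final String REDUCE_KEY = "mapred.reduce.tasks.speculative.execution";

    /** Returns a copy of the given settings with speculation disabled. */
    static Properties disableSpeculation(Properties jobProps) {
        Properties copy = new Properties();
        copy.putAll(jobProps);
        copy.setProperty(MAP_KEY, "false");
        copy.setProperty(REDUCE_KEY, "false");
        return copy;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty(MAP_KEY, "true");
        Properties safe = disableSpeculation(props);
        System.out.println(MAP_KEY + "=" + safe.getProperty(MAP_KEY));
        System.out.println(REDUCE_KEY + "=" + safe.getProperty(REDUCE_KEY));
    }
}
```

With speculation off, each task attempt that writes to the database is the only attempt for its partition unless it fails, which removes the duplicate-insert scenario caused by two live attempts.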

 allow pig to write output into a JDBC db
 

 Key: PIG-1229
 URL: https://issues.apache.org/jira/browse/PIG-1229
 Project: Pig
  Issue Type: New Feature
  Components: impl
Reporter: Ian Holsman
Assignee: Ankur
Priority: Minor
 Fix For: 0.8.0

 Attachments: jira-1229-v2.patch


 UDF to store data into a DB

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Commented: (PIG-1229) allow pig to write output into a JDBC db

2010-04-13 Thread Ankur (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12856761#action_12856761
 ] 

Ankur commented on PIG-1229:


Any updates?

 allow pig to write output into a JDBC db
 

 Key: PIG-1229
 URL: https://issues.apache.org/jira/browse/PIG-1229
 Project: Pig
  Issue Type: New Feature
  Components: impl
Reporter: Ian Holsman
Assignee: Ankur
Priority: Minor
 Fix For: 0.8.0

 Attachments: jira-1229-v2.patch


 UDF to store data into a DB





[jira] Commented: (PIG-1229) allow pig to write output into a JDBC db

2010-04-11 Thread Ankur (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12855835#action_12855835
 ] 

Ankur commented on PIG-1229:


* Sigh *
The problem is with hadoop's Path implementation, which has problems 
understanding JDBC URLs correctly. So turning relToAbsPathForStoreLocation() 
into a no-op does NOT help. 
The URISyntaxException is now propagated to the point of setting the output 
path for the job. Here is the new trace from the test execution failure with 
the suggested workaround:

org.apache.pig.backend.executionengine.ExecException: ERROR 2043: Unexpected error during execution.
    at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:332)
    at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:835)
    at org.apache.pig.PigServer.execute(PigServer.java:828)
    at org.apache.pig.PigServer.access$100(PigServer.java:105)
    at org.apache.pig.PigServer$Graph.execute(PigServer.java:1080)
    at org.apache.pig.PigServer.executeBatch(PigServer.java:288)
    at org.apache.pig.piggybank.test.storage.TestDBStorage.testWriteToDB(Unknown Source)
Caused by: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobCreationException: ERROR 2017: Internal error creating job configuration.
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:624)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.compile(JobControlCompiler.java:246)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:131)
    at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:308)
Caused by: java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: jdbc:hsqldb:file:/tmp/batchtest;hsqldb.default_table_type=cached;hsqldb.cache_rows=100
    at org.apache.hadoop.fs.Path.initialize(Path.java:140)
    at org.apache.hadoop.fs.Path.<init>(Path.java:126)
    at org.apache.hadoop.fs.Path.<init>(Path.java:45)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:459)
Caused by: java.net.URISyntaxException: Relative path in absolute URI: jdbc:hsqldb:file:/tmp/batchtest;hsqldb.default_table_type=cached;hsqldb.cache_rows=100
    at java.net.URI.checkPath(URI.java:1787)
    at java.net.URI.<init>(URI.java:735)
    at org.apache.hadoop.fs.Path.initialize(Path.java:137)


  

 allow pig to write output into a JDBC db
 

 Key: PIG-1229
 URL: https://issues.apache.org/jira/browse/PIG-1229
 Project: Pig
  Issue Type: New Feature
  Components: impl
Reporter: Ian Holsman
Assignee: Ankur
Priority: Minor
 Fix For: 0.8.0

 Attachments: jira-1229-v2.patch


 UDF to store data into a DB





[jira] Commented: (PIG-1229) allow pig to write output into a JDBC db

2010-04-07 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12854740#action_12854740
 ] 

Ashutosh Chauhan commented on PIG-1229:
---

You can get rid of this stack trace by overriding 
relToAbsPathForStoreLocation() of StoreFunc, which DBStorage extends, and 
turning it into a no-op. Since the DB location is always absolute, there is no 
need for the default behavior that StoreFunc provides.

As for DataType.findType(), I found that even PigStorage does the same, so 
this patch is no worse than PigStorage in that way.

 allow pig to write output into a JDBC db
 

 Key: PIG-1229
 URL: https://issues.apache.org/jira/browse/PIG-1229
 Project: Pig
  Issue Type: New Feature
  Components: impl
Reporter: Ian Holsman
Assignee: Ankur
Priority: Minor
 Fix For: 0.8.0

 Attachments: jira-1229-v2.patch


 UDF to store data into a DB




[jira] Commented: (PIG-1229) allow pig to write output into a JDBC db

2010-04-06 Thread Ankur (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12853843#action_12853843
 ] 

Ankur commented on PIG-1229:


So accepting the JDBC URL in setStoreLocation() exposes a flaw in Hadoop's Path 
class, and it causes the test case to fail with the following exception:

java.net.URISyntaxException: Relative path in absolute URI: jdbc:hsqldb:file:/tmp/batchtest;hsqldb.default_table_type=cached;hsqldb.cache_rows=100
java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: jdbc:hsqldb:file:/tmp/batchtest;hsqldb.default_table_type=cached;hsqldb.cache_rows=100
    at org.apache.hadoop.fs.Path.initialize(Path.java:140)
    at org.apache.hadoop.fs.Path.<init>(Path.java:126)
    at org.apache.pig.LoadFunc.getAbsolutePath(LoadFunc.java:238)
    at org.apache.pig.StoreFunc.relToAbsPathForStoreLocation(StoreFunc.java:60)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.StoreClause(QueryParser.java:3587)
    ...
    ...
Caused by: java.net.URISyntaxException: Relative path in absolute URI: jdbc:hsqldb:file:/tmp/batchtest;hsqldb.default_table_type=cached;hsqldb.cache_rows=100
    at java.net.URI.checkPath(URI.java:1787)
    at java.net.URI.<init>(URI.java:735)
    at org.apache.hadoop.fs.Path.initialize(Path.java:137)

Looking at the code of Path.java, it seems to extract the scheme based on the 
first occurrence of ':'. This causes the authority and path to be extracted 
incorrectly, resulting in the above exception thrown by java.net.URI. 
However, if I try to initialize a URI directly with the URL string, no 
exception is thrown.
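
The difference can be reproduced with java.net.URI alone. The sketch below is a simplified approximation of the splitting described above, not Hadoop's actual Path code: it cuts the string at the first ':' and hands the pieces to the multi-argument URI constructor, which rejects a path that does not start with '/' when a scheme is present. Parsing the whole string in one go succeeds because the URL is then treated as an opaque URI. Class and method names are made up for illustration.

```java
import java.net.URI;
import java.net.URISyntaxException;

class JdbcUriDemo {
    static final String JDBC_URL =
            "jdbc:hsqldb:file:/tmp/batchtest;"
            + "hsqldb.default_table_type=cached;hsqldb.cache_rows=100";

    /** Parses the whole string at once; true if java.net.URI accepts it. */
    static boolean parsesDirectly(String url) {
        try {
            new URI(url);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    /**
     * Splits at the first ':' and rebuilds the URI from its components.
     * Returns the failure reason, or null if construction succeeded.
     */
    static String splitAtFirstColon(String url) {
        int colon = url.indexOf(':');
        String scheme = url.substring(0, colon);
        String path = url.substring(colon + 1);
        try {
            new URI(scheme, null, path, null, null);
            return null;
        } catch (URISyntaxException e) {
            return e.getReason();
        }
    }

    public static void main(String[] args) {
        System.out.println("direct parse ok: " + parsesDirectly(JDBC_URL));
        System.out.println("component parse: " + splitAtFirstColon(JDBC_URL));
    }
}
```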

As for the DB reachability check, I think it is ok to check the availability 
at runtime and fail if it is not available. We do this in prepareToWrite(). 
As for the performance enhancement, I think we can track that via a separate 
issue.

This patch has taken quite a while now and I wouldn't want to delay it further 
by depending on a hadoop fix.

So if a reviewer does not find any blocking issues, my suggestion is to go 
ahead with the commit. 

 allow pig to write output into a JDBC db
 

 Key: PIG-1229
 URL: https://issues.apache.org/jira/browse/PIG-1229
 Project: Pig
  Issue Type: New Feature
  Components: impl
Reporter: Ian Holsman
Assignee: Ankur
Priority: Minor
 Fix For: 0.8.0

 Attachments: jira-1229-v2.patch


 UDF to store data into a DB




[jira] Commented: (PIG-1229) allow pig to write output into a JDBC db

2010-03-31 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12852190#action_12852190
 ] 

Ashutosh Chauhan commented on PIG-1229:
---

A few suggestions:

Reading from the test case, store statements currently look like:
{code}
 b = store a into 'dummy' using org.apache.pig.piggybank.storage.DBStorage('org.hsqldb.jdbcDriver',
     'jdbc:hsqldb:file:/tmp/batchtest;hsqldb.default_table_type=cached;hsqldb.cache_rows=100',
     'insert into a...');
{code}
Here 'dummy' is totally ignored. While this works, from a user-experience 
standpoint the following might be better:

{code}
 b = store a into 'jdbc:hsqldb:file:/tmp/batchtest' using org.apache.pig.piggybank.storage.DBStorage('org.hsqldb.jdbcDriver',
     'hsqldb.default_table_type=cached;hsqldb.cache_rows=100',
     'insert into a');
{code}
That is, have the DB URL as the store location and the second parameter of the 
store func as the DB params. You can use setStoreLocation() to store the URL. 
Apart from a more intuitive store statement, this will also allow you to check 
whether the DB is reachable at compile time itself, instead of at runtime. You 
can do that via checkOutputSpecs(). 

Doing DataType.findType() on every element of every tuple will be expensive. I 
am wondering if you can get hold of the schema in your store func and use that 
to map Pig types to SQL types.

All of these suggestions may come in as later patches. So, if you want to get 
this committed and track these separately, I think that will also work, as 
this patch is functionally complete. 

 allow pig to write output into a JDBC db
 

 Key: PIG-1229
 URL: https://issues.apache.org/jira/browse/PIG-1229
 Project: Pig
  Issue Type: New Feature
  Components: impl
Reporter: Ian Holsman
Assignee: Ankur
Priority: Minor
 Fix For: 0.8.0

 Attachments: jira-1229-v2.patch


 UDF to store data into a DB




[jira] Commented: (PIG-1229) allow pig to write output into a JDBC db

2010-03-31 Thread Ankur (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12852243#action_12852243
 ] 

Ankur commented on PIG-1229:


Ashutosh,
   Thanks for the review comments. Accepting the store location via 
setStoreLocation() definitely makes sense. However, I am not sure about 
checking database reachability in checkOutputSpecs(), since that may be called 
on the client side as well and the DB machine may not be reachable from the 
client machine. Isn't the OutputCommitter's setupTask() a better place to do a 
DB availability check?
This sounds like a reasonable ask before a commit. I will incorporate this and 
submit a new patch.

Regarding DataType.findType(), I assume this is what you have in mind:
1. Get the DB schema information for the table we are writing to.
2. Use the checkSchema() API to validate this against the Pig-supplied schema 
and cache it.
3. Use the cached information in the putNext() method.

This is more of a performance enhancement and looks like more work, so I would 
prefer that we track this as a separate JIRA for DBStorage.
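
The three steps above can be sketched roughly as follows. This is not Pig code: the class is hypothetical, column types are represented by plain Java classes rather than a Pig schema, and the mapping to java.sql.Types covers only a few types for illustration. It shows the idea of resolving the SQL type of each column once and reusing the cached result for every tuple.

```java
import java.sql.Types;
import java.util.List;

class SchemaTypeCacheSketch {
    private int[] sqlTypes;       // filled once, reused for every tuple
    private int resolutions = 0;  // how many type lookups were performed

    /** Illustrative mapping from a column's Java class to a JDBC type code. */
    static int toSqlType(Class<?> columnClass) {
        if (columnClass == Integer.class) return Types.INTEGER;
        if (columnClass == Long.class) return Types.BIGINT;
        if (columnClass == Float.class) return Types.FLOAT;
        if (columnClass == Double.class) return Types.DOUBLE;
        if (columnClass == String.class) return Types.VARCHAR;
        return Types.OTHER;
    }

    /** Plays the role of checkSchema(): resolve and cache types up front. */
    void checkSchema(List<Class<?>> columns) {
        sqlTypes = new int[columns.size()];
        for (int i = 0; i < sqlTypes.length; i++) {
            sqlTypes[i] = toSqlType(columns.get(i));
            resolutions++;
        }
    }

    /** What putNext() would consult per field instead of inspecting the value. */
    int sqlTypeOfColumn(int index) {
        return sqlTypes[index];
    }

    int resolutions() {
        return resolutions;
    }
}
```

With this arrangement the number of type lookups depends on the number of columns, not on the number of tuples written.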

 allow pig to write output into a JDBC db
 

 Key: PIG-1229
 URL: https://issues.apache.org/jira/browse/PIG-1229
 Project: Pig
  Issue Type: New Feature
  Components: impl
Reporter: Ian Holsman
Assignee: Ankur
Priority: Minor
 Fix For: 0.8.0

 Attachments: jira-1229-v2.patch


 UDF to store data into a DB




[jira] Commented: (PIG-1229) allow pig to write output into a JDBC db

2010-03-30 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12851455#action_12851455
 ] 

Olga Natkovich commented on PIG-1229:
-

Since we have already branched, this feature will not go into the 0.7.0 branch 
but will instead be committed to trunk and released as part of the 0.8.0 
release. I think this patch should work just fine against trunk since we have 
not deviated much.

 allow pig to write output into a JDBC db
 

 Key: PIG-1229
 URL: https://issues.apache.org/jira/browse/PIG-1229
 Project: Pig
  Issue Type: New Feature
  Components: impl
Reporter: Ian Holsman
Assignee: Ankur
Priority: Minor
 Fix For: 0.8.0

 Attachments: jira-1229-v2.patch


 UDF to store data into a DB




[jira] Commented: (PIG-1229) allow pig to write output into a JDBC db

2010-03-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12851665#action_12851665
 ] 

Hadoop QA commented on PIG-1229:


+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12440249/jira-1229-v2.patch
  against trunk revision 928950.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 4 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/260/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/260/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/260/console

This message is automatically generated.

 allow pig to write output into a JDBC db
 

 Key: PIG-1229
 URL: https://issues.apache.org/jira/browse/PIG-1229
 Project: Pig
  Issue Type: New Feature
  Components: impl
Reporter: Ian Holsman
Assignee: Ankur
Priority: Minor
 Fix For: 0.8.0

 Attachments: jira-1229-v2.patch


 UDF to store data into a DB




[jira] Commented: (PIG-1229) allow pig to write output into a JDBC db

2010-03-21 Thread Ankur (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12847909#action_12847909
 ] 

Ankur commented on PIG-1229:


@Ashutosh Chauhan 
I read the HSQLDB license and it looked ok to me, but I am not a lawyer :-) . 
Besides that, Apache Cocoon uses it. I think we should be ok pulling it 
through ivy.

I'll make the ivy and load-store related changes and submit a new patch on 
Monday.

Sorry for the delay.
 

 allow pig to write output into a JDBC db
 

 Key: PIG-1229
 URL: https://issues.apache.org/jira/browse/PIG-1229
 Project: Pig
  Issue Type: New Feature
  Components: impl
Reporter: Ian Holsman
Assignee: Ankur
Priority: Minor
 Fix For: 0.7.0

 Attachments: hsqldb.jar, jira-1229.patch


 UDF to store data into a DB




[jira] Commented: (PIG-1229) allow pig to write output into a JDBC db

2010-03-03 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12840963#action_12840963
 ] 

Olga Natkovich commented on PIG-1229:
-

Ashutosh, please, review and see if we can pull the jar from IVY.

 allow pig to write output into a JDBC db
 

 Key: PIG-1229
 URL: https://issues.apache.org/jira/browse/PIG-1229
 Project: Pig
  Issue Type: New Feature
  Components: impl
Reporter: Ian Holsman
Assignee: Ankur
Priority: Minor
 Fix For: 0.7.0

 Attachments: hsqldb.jar, jira-1229.patch


 UDF to store data into a DB




[jira] Commented: (PIG-1229) allow pig to write output into a JDBC db

2010-03-03 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12841003#action_12841003
 ] 

Ashutosh Chauhan commented on PIG-1229:
---

Ankur,

With the recent Load-Store interface changes, the patch no longer compiles. Can 
you regenerate it? While you are at it, can you also change ivy.xml so that 
hsqldb.jar is pulled from the internet instead of being bundled with the Pig 
distribution?

 allow pig to write output into a JDBC db
 

 Key: PIG-1229
 URL: https://issues.apache.org/jira/browse/PIG-1229
 Project: Pig
  Issue Type: New Feature
  Components: impl
Reporter: Ian Holsman
Assignee: Ankur
Priority: Minor
 Fix For: 0.7.0

 Attachments: hsqldb.jar, jira-1229.patch


 UDF to store data into a DB




[jira] Commented: (PIG-1229) allow pig to write output into a JDBC db

2010-02-15 Thread Aaron Kimball (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12833998#action_12833998
 ] 

Aaron Kimball commented on PIG-1229:


Looks much better - thanks for adding the test case too. Including hsqldb.jar 
in your patch didn't work, by the way -- I think you'll need to attach that jar 
to the issue separately.


 allow pig to write output into a JDBC db
 

 Key: PIG-1229
 URL: https://issues.apache.org/jira/browse/PIG-1229
 Project: Pig
  Issue Type: New Feature
  Components: impl
Reporter: Ian Holsman
Assignee: Ankur
Priority: Minor
 Fix For: 0.6.0

 Attachments: jira-1229.patch


 UDF to store data into a DB




[jira] Commented: (PIG-1229) allow pig to write output into a JDBC db

2010-02-08 Thread Aaron Kimball (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12831052#action_12831052
 ] 

Aaron Kimball commented on PIG-1229:


Ian, 

This class looks reasonable to me. You'll probably need to format this as a 
patch to get it accepted into the project though.

Is there a test plan for this code and/or unit tests?

Some database-specific things I've noticed: 
* You create a PreparedStatement, call its executeUpdate() method several 
times, and then call close() on the statement. This assumes you're in 
auto-commit mode; I think you should configure the commit mode explicitly when 
creating the connection. Also, you'll probably get much better performance if 
you use addBatch() / executeBatch() with your batch size rather than individual 
executeUpdate() calls. You should then call connection.commit() and 
ps.clearBatch() rather than closing the prepared statement and compiling a new 
one. 
* If user and pass are null, I think you may need to use 
DriverManager.getConnection(jdbcUrl) instead of 
DriverManager.getConnection(jdbcUrl, null, null). Worth a unit test.
* See org.apache.hadoop.mapreduce.lib.db.DBOutputFormat in the MapReduce 
project for some similar code to take inspiration from. 
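
The database-specific suggestions above can be sketched roughly as follows. 
This is only an illustration, not the code from the attached DbStorage.java: 
the class name BatchedJdbcWriter, the methods open() and writeAll(), and the 
representation of each record as an Object[] are all assumptions made for the 
example. It sets the commit mode explicitly, batches inserts with addBatch() / 
executeBatch(), commits after each batch, reuses one PreparedStatement, and 
flushes a final partial batch so that fewer rows than the batch size are still 
written.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

// Hypothetical helper for illustration; not part of Pig or the attached patch.
final class BatchedJdbcWriter {

    /** Opens a connection and turns auto-commit off explicitly. */
    static Connection open(String jdbcUrl, String user, String pass) throws SQLException {
        // With no credentials, use the single-argument overload rather than
        // passing nulls for user and password.
        Connection con = (user == null && pass == null)
                ? DriverManager.getConnection(jdbcUrl)
                : DriverManager.getConnection(jdbcUrl, user, pass);
        con.setAutoCommit(false); // commits are issued explicitly in writeAll()
        return con;
    }

    /**
     * Writes all rows using one prepared statement, committing once per batch.
     * Returns the number of commits issued.
     */
    static int writeAll(Connection con, String insertSql, List<Object[]> rows, int batchSize)
            throws SQLException {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int commits = 0;
        try (PreparedStatement ps = con.prepareStatement(insertSql)) {
            int pending = 0;
            for (Object[] row : rows) {
                for (int i = 0; i < row.length; i++) {
                    ps.setObject(i + 1, row[i]); // JDBC parameters are 1-based
                }
                ps.addBatch();
                if (++pending == batchSize) {
                    ps.executeBatch();
                    con.commit();
                    commits++;
                    pending = 0; // the same compiled statement is reused
                }
            }
            if (pending > 0) {
                // Flush the last, partial batch; without this, output smaller
                // than batchSize would never reach the database.
                ps.executeBatch();
                con.commit();
                commits++;
            }
        }
        return commits;
    }
}
```

A caller would obtain a connection with open(jdbcUrl, user, pass), pass it to 
writeAll() together with a parameterized INSERT statement, and close the 
connection afterwards. Taking the Connection as a parameter keeps the batching 
logic testable without a real database.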


 allow pig to write output into a JDBC db
 

 Key: PIG-1229
 URL: https://issues.apache.org/jira/browse/PIG-1229
 Project: Pig
  Issue Type: New Feature
  Components: impl
Reporter: Ian Holsman
Priority: Minor
 Attachments: DbStorage.java


 UDF to store data into a DB




[jira] Commented: (PIG-1229) allow pig to write output into a JDBC db

2010-02-08 Thread Ankur (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12831337#action_12831337
 ] 

Ankur commented on PIG-1229:


Aaron, Thanks for the suggestions.
I'll have an updated patch coming soon.

 allow pig to write output into a JDBC db
 

 Key: PIG-1229
 URL: https://issues.apache.org/jira/browse/PIG-1229
 Project: Pig
  Issue Type: New Feature
  Components: impl
Reporter: Ian Holsman
Assignee: Ankur
Priority: Minor
 Attachments: DbStorage.java


 UDF to store data into a DB
