Hi,
Apache's JIRA is down, so let me answer this by mail so I don't forget it by tomorrow. :) I will attach this text to the issue as a reference as soon as JIRA is available again.
From a first look into the code, I guess this is what happens.
The data is stored successfully on disk the first time you call store. POStore then adds an entry to materializedResults, which is basically a hash map from an OperatorKey (just a name) to a LocalResult (a pointer to the file you just wrote).
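
To make the caching step concrete, here is a minimal sketch. It is not Pig's actual code: the class and method names are made up for illustration, and plain strings stand in for OperatorKey and LocalResult.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for the cache described above; not Pig's actual classes.
class MaterializedResultsSketch {
    // operator key (just a name) -> path of the file that was written for it
    private final Map<String, String> materializedResults = new HashMap<>();

    /** Records that the result for the given operator key now lives in the given file. */
    void recordStore(String operatorKey, String outputPath) {
        materializedResults.put(operatorKey, outputPath);
    }

    /** Returns the file written by an earlier store for this key, or null if there is none. */
    String lookup(String operatorKey) {
        return materializedResults.get(operatorKey);
    }

    public static void main(String[] args) {
        MaterializedResultsSketch cache = new MaterializedResultsSketch();
        System.out.println(cache.lookup("A"));        // no entry before the first store
        cache.recordStore("A", "/tmp/testPigOutput");
        System.out.println(cache.lookup("A"));        // a second store would find this entry
    }
}
```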

If you now trigger store again for the same alias, Pig tries to optimize performance by reusing the output file you just stored. It first checks whether there is already a materializedResults entry. That is the case here, so in theory the earlier result could be reused: just read it and write it again to a new path. But there are a couple of problems. First, in your test case the second store writes to the same output file (/tmp/testPigOutput) that Pig tries to read back in, which means reading from and writing to the same file at the same time. Second, your test deletes this file between the store calls, so it can't be read back.

Now a bug comes in. Pig tries to read this file back in with the same object you used for storing it. So the object needs to implement both LoadFunc and StoreFunc, which is not the case in your test: you only implement StoreFunc, which makes sense from my point of view. See POLoad, line 57:
lf = (LoadFunc) PigContext.instantiateFuncFromSpec(fileSpec.getFuncSpec()); // the return value can be a StoreFunc only as well.
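
The failing cast can be reproduced in isolation. The sketch below uses hypothetical stand-in interfaces (SketchLoadFunc, SketchStoreFunc) rather than Pig's real ones, and only shows why an unchecked cast fails for a store-only object while an instanceof check would catch it.

```java
// Hypothetical stand-ins for Pig's LoadFunc and StoreFunc interfaces.
interface SketchLoadFunc { }
interface SketchStoreFunc { }

// Implements only the store side, like the DummyStoreFunc in the test.
class StoreOnlyFunc implements SketchStoreFunc { }

class CastSketch {
    /** Returns true if the object could be used to read a stored file back. */
    static boolean canLoad(Object func) {
        return func instanceof SketchLoadFunc;
    }

    /** Mimics the unchecked cast; throws ClassCastException for store-only objects. */
    static SketchLoadFunc asLoadFunc(Object func) {
        return (SketchLoadFunc) func;
    }

    public static void main(String[] args) {
        Object func = new StoreOnlyFunc();
        System.out.println("can load: " + canLoad(func));
        try {
            asLoadFunc(func);
        } catch (ClassCastException e) {
            System.out.println("cast failed: " + e.getMessage());
        }
    }
}
```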

This has worked so far because most StoreFunc and LoadFunc implementations live in one class, but relying on that is not a good idea.

So now the question to the Pig developers: how can we solve this problem? Should we only cache materialized files when both a LoadFunc and a StoreFunc are available? Or should we reprocess all required plans when a materialized result cannot be loaded?
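
To make the two options easier to discuss, here is a rough sketch of how they could look. Everything in it is hypothetical (CanLoad, CanStore, ReuseDecisionSketch and its methods are invented names, not Pig's API); it is only meant to show the decision logic.

```java
import java.io.File;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins for the load and store interfaces.
interface CanLoad { }
interface CanStore { }

class StoreOnlySketch implements CanStore { }
class LoadAndStoreSketch implements CanLoad, CanStore { }

class ReuseDecisionSketch {
    enum Action { REUSE_FILE, REPROCESS_PLAN }

    private final Map<String, File> materializedResults = new HashMap<>();

    /** Option 1: remember the output only if the same object can load it again later. */
    void afterStore(String operatorKey, File output, CanStore func) {
        if (func instanceof CanLoad) {
            materializedResults.put(operatorKey, output);
        }
    }

    /** Option 2: fall back to re-running the plan when no usable cached file exists. */
    Action beforeStore(String operatorKey) {
        File cached = materializedResults.get(operatorKey);
        if (cached != null && cached.isFile()) {
            return Action.REUSE_FILE;
        }
        return Action.REPROCESS_PLAN;
    }

    public static void main(String[] args) throws Exception {
        ReuseDecisionSketch sketch = new ReuseDecisionSketch();
        File output = File.createTempFile("testPigOutput", ".tmp");
        output.deleteOnExit();
        sketch.afterStore("A", output, new StoreOnlySketch());
        System.out.println(sketch.beforeStore("A"));   // store-only: nothing was cached
        sketch.afterStore("B", output, new LoadAndStoreSketch());
        System.out.println(sketch.beforeStore("B"));   // cached and the file still exists
        output.delete();
        System.out.println(sketch.beforeStore("B"));   // file was deleted in between
    }
}
```

The two options are not exclusive: the first avoids the ClassCastException, the second also covers the case where the file was deleted between the store calls.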

Any thoughts?

Stefan





On Feb 20, 2008, at 3:59 PM, Johannes Zillmann (JIRA) wrote:


[ https://issues.apache.org/jira/browse/PIG-114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Johannes Zillmann updated PIG-114:
----------------------------------

   Attachment: pigPatch-storeTwice-620665.patch

store one alias/logicalPlan twice leads to instantiation of StoreFunc as LoadFunc
---------------------------------------------------------------------------------

               Key: PIG-114
               URL: https://issues.apache.org/jira/browse/PIG-114
           Project: Pig
        Issue Type: Bug
        Components: impl
          Reporter: Johannes Zillmann
       Attachments: pigPatch-storeTwice-620665.patch


Calling PigServer#store() twice for an alias results in the following exception:
{noformat}
java.lang.RuntimeException: java.lang.ClassCastException: org.apache.pig.test.DummyStoreFunc cannot be cast to org.apache.pig.LoadFunc
        at org.apache.pig.backend.local.executionengine.POLoad.<init>(POLoad.java:59)
        at org.apache.pig.backend.local.executionengine.LocalExecutionEngine.doCompile(LocalExecutionEngine.java:167)
        at org.apache.pig.backend.local.executionengine.LocalExecutionEngine.doCompile(LocalExecutionEngine.java:184)
        at org.apache.pig.backend.local.executionengine.LocalExecutionEngine.doCompile(LocalExecutionEngine.java:184)
        at org.apache.pig.backend.local.executionengine.LocalExecutionEngine.compile(LocalExecutionEngine.java:111)
        at org.apache.pig.backend.local.executionengine.LocalExecutionEngine.compile(LocalExecutionEngine.java:90)
        at org.apache.pig.backend.local.executionengine.LocalExecutionEngine.compile(LocalExecutionEngine.java:1)
        at org.apache.pig.PigServer.store(PigServer.java:330)
        at org.apache.pig.PigServer.store(PigServer.java:317)
        at org.apache.pig.test.StoreTwiceTest.testIt(StoreTwiceTest.java:31)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:589)
        at junit.framework.TestCase.runTest(TestCase.java:164)
        at junit.framework.TestCase.runBare(TestCase.java:130)
        at junit.framework.TestResult$1.protect(TestResult.java:110)
        at junit.framework.TestResult.runProtected(TestResult.java:128)
        at junit.framework.TestResult.run(TestResult.java:113)
        at junit.framework.TestCase.run(TestCase.java:120)
        at junit.framework.TestSuite.runTest(TestSuite.java:228)
        at junit.framework.TestSuite.run(TestSuite.java:223)
        at org.junit.internal.runners.OldTestClassRunner.run(OldTestClassRunner.java:35)
        at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:45)
        at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
        at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:460)
        at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:673)
        at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:386)
        at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:196)
Caused by: java.lang.ClassCastException: org.apache.pig.test.DummyStoreFunc cannot be cast to org.apache.pig.LoadFunc
        at org.apache.pig.backend.local.executionengine.POLoad.<init>(POLoad.java:57)
        ... 28 more
{noformat}
I will attach a patch with a test scenario for this. Basically the code is as follows:
{noformat}
PigServer pig = new PigServer(ExecType.LOCAL);
pig.registerQuery("A = LOAD 'test/org/apache/pig/test/StoreTwiceTest.java' USING "
        + DummyLoadFunc.class.getName() + "();");
pig.registerQuery("B = FOREACH A GENERATE * ;");
File outputFile = new File("/tmp/testPigOutput");
outputFile.delete();
pig.store("A", outputFile.getAbsolutePath(), DummyStoreFunc.class.getName() + "()");
outputFile.delete();
pig.store("B", outputFile.getAbsolutePath(), DummyStoreFunc.class.getName() + "()");
outputFile.delete();
assertEquals(2, _storedTuples.size());
{noformat}

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
101tec Inc.
Menlo Park, California, USA
http://www.101tec.com

