Hi,
Apache's JIRA is down, so let me answer this by mail so I don't forget
until tomorrow. :) I will attach this text to the issue as a reference
as soon as JIRA is available again.
From a first look into the code, I guess this is what happens.
The data is stored successfully on disk the first time you call
store. POStore then adds an entry to materializedResults, which is
basically a hash map from an OperatorKey (just a name) to a
LocalResult (a pointer to the file you just wrote).
If you now trigger store again for the same alias, Pig tries to
optimize performance by reusing the output file you just stored.
It does this by first checking whether there is already a
materializedResults entry.
That is the case here, so in theory the result could be reused: just
read it back and write it again to a new path.
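
To make the reuse step concrete, here is a rough sketch of how I read
the mechanism. All class and method names below are made up for
illustration; they are not Pig's real OperatorKey/LocalResult classes,
and the returned strings only stand in for a compiled plan.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for OperatorKey: just a name.
final class SketchOperatorKey {
    final String name;

    SketchOperatorKey(String name) { this.name = name; }

    @Override public boolean equals(Object o) {
        return o instanceof SketchOperatorKey
                && ((SketchOperatorKey) o).name.equals(name);
    }

    @Override public int hashCode() { return name.hashCode(); }
}

// Hypothetical stand-in for LocalResult: a pointer to the stored file.
final class SketchLocalResult {
    final String path;      // file written by the first store
    final String funcSpec;  // function spec that was used to write it

    SketchLocalResult(String path, String funcSpec) {
        this.path = path;
        this.funcSpec = funcSpec;
    }
}

final class MaterializedResultsSketch {
    private final Map<SketchOperatorKey, SketchLocalResult> materializedResults =
            new HashMap<SketchOperatorKey, SketchLocalResult>();

    // Called after a successful store: remember where the data went.
    void recordStore(SketchOperatorKey key, String path, String funcSpec) {
        materializedResults.put(key, new SketchLocalResult(path, funcSpec));
    }

    // Called when compiling a later store: reuse the file if we have one.
    String planStore(SketchOperatorKey key) {
        SketchLocalResult cached = materializedResults.get(key);
        if (cached == null) {
            return "EXECUTE_PLAN";
        }
        // The stored file is read back with the same function spec.
        return "LOAD " + cached.path + " USING " + cached.funcSpec;
    }
}
```

Note that the lookup hands the store function's spec to the load side,
which is where the trouble described next starts.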
Now there are a couple of problems. First, both store calls in your
test case use the same output file (/tmp/testPigOutput), and Pig tries
to read this file back in order to write it out again. That means you
read from and write to the same file at the same time. Second, your
test deletes this file between the store calls, so it can't be read
back at all.
Now a Pig problem comes in. Pig tries to read this file back with the
same object you used for storing it.
So the object needs to implement both LoadFunc and StoreFunc, which is
not the case in your test: you only implement StoreFunc, which makes
sense from my point of view. See POLoad, line 57:
    lf = (LoadFunc) PigContext.instantiateFuncFromSpec(fileSpec.getFuncSpec());
    // the return value can be a StoreFunc only as well
This has worked so far because most StoreFunc and LoadFunc
implementations live in one class, but relying on that is not a good
idea.
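
A tiny standalone illustration of why that cast fails. The interfaces
and classes here are hypothetical stand-ins, not Pig's LoadFunc and
StoreFunc; the point is only that an unchecked cast blows up at
runtime for a store-only object, while an instanceof check would let
the caller decide what to do instead.

```java
// Hypothetical stand-ins for the two roles a function object can play.
interface SketchLoadFunc { }
interface SketchStoreFunc { }

// Like the DummyStoreFunc in the test: can store, cannot load.
final class StoreOnlyFunc implements SketchStoreFunc { }

// Like most existing functions: both roles in one class.
final class LoadAndStoreFunc implements SketchLoadFunc, SketchStoreFunc { }

final class CastSketch {
    // Mirrors the unchecked cast: throws ClassCastException
    // when the object is a store-only function.
    static SketchLoadFunc uncheckedCast(Object func) {
        return (SketchLoadFunc) func;
    }

    // A check the caller could make before trying to load.
    static boolean canLoad(Object func) {
        return func instanceof SketchLoadFunc;
    }
}
```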
So now the question to the Pig developers: how can we solve this
problem?
Only cache materialized files when both a load and a store func are
available?
Re-process all required plans when we cannot load a materialized
result?
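
Roughly what I have in mind for the two options, as a sketch only. All
names are invented for illustration and nothing here is existing Pig
code; the returned strings stand in for whatever plan would be built.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins for the load and store roles.
interface LoaderSketch { }
interface StorerSketch { }

final class StoreOnlySketch implements StorerSketch { }
final class LoadStoreSketch implements LoaderSketch, StorerSketch { }

// Hypothetical cache entry: the stored file plus the function used.
final class CachedResultSketch {
    final String path;
    final StorerSketch func;

    CachedResultSketch(String path, StorerSketch func) {
        this.path = path;
        this.func = func;
    }
}

final class FallbackSketch {
    private final Map<String, CachedResultSketch> materializedResults =
            new HashMap<String, CachedResultSketch>();

    // Option 1: only cache when the function can also load the file.
    void afterStoreGuarded(String alias, String path, StorerSketch func) {
        if (func instanceof LoaderSketch) {
            materializedResults.put(alias, new CachedResultSketch(path, func));
        }
    }

    // Current behaviour as I understand it: always cache.
    void afterStoreAlways(String alias, String path, StorerSketch func) {
        materializedResults.put(alias, new CachedResultSketch(path, func));
    }

    // Option 2: at compile time, fall back to re-running the plan
    // when the cached result cannot be loaded.
    String planFor(String alias) {
        CachedResultSketch cached = materializedResults.get(alias);
        if (cached != null && cached.func instanceof LoaderSketch) {
            return "LOAD " + cached.path;
        }
        return "RERUN " + alias;
    }
}
```

Either guard alone would avoid the ClassCastException in this sketch;
the second one also covers entries that were cached before the check
existed.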
Any thoughts?
Stefan
On Feb 20, 2008, at 3:59 PM, Johannes Zillmann (JIRA) wrote:
[ https://issues.apache.org/jira/browse/PIG-114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Johannes Zillmann updated PIG-114:
----------------------------------
Attachment: pigPatch-storeTwice-620665.patch
store one alias/logicalPlan twice leads to instantiation of StoreFunc as LoadFunc
---------------------------------------------------------------------------------
Key: PIG-114
URL: https://issues.apache.org/jira/browse/PIG-114
Project: Pig
Issue Type: Bug
Components: impl
Reporter: Johannes Zillmann
Attachments: pigPatch-storeTwice-620665.patch
Calling PigServer#store() twice for an alias results in the following
exception:
{noformat}
java.lang.RuntimeException: java.lang.ClassCastException: org.apache.pig.test.DummyStoreFunc cannot be cast to org.apache.pig.LoadFunc
    at org.apache.pig.backend.local.executionengine.POLoad.<init>(POLoad.java:59)
    at org.apache.pig.backend.local.executionengine.LocalExecutionEngine.doCompile(LocalExecutionEngine.java:167)
    at org.apache.pig.backend.local.executionengine.LocalExecutionEngine.doCompile(LocalExecutionEngine.java:184)
    at org.apache.pig.backend.local.executionengine.LocalExecutionEngine.doCompile(LocalExecutionEngine.java:184)
    at org.apache.pig.backend.local.executionengine.LocalExecutionEngine.compile(LocalExecutionEngine.java:111)
    at org.apache.pig.backend.local.executionengine.LocalExecutionEngine.compile(LocalExecutionEngine.java:90)
    at org.apache.pig.backend.local.executionengine.LocalExecutionEngine.compile(LocalExecutionEngine.java:1)
    at org.apache.pig.PigServer.store(PigServer.java:330)
    at org.apache.pig.PigServer.store(PigServer.java:317)
    at org.apache.pig.test.StoreTwiceTest.testIt(StoreTwiceTest.java:31)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:589)
    at junit.framework.TestCase.runTest(TestCase.java:164)
    at junit.framework.TestCase.runBare(TestCase.java:130)
    at junit.framework.TestResult$1.protect(TestResult.java:110)
    at junit.framework.TestResult.runProtected(TestResult.java:128)
    at junit.framework.TestResult.run(TestResult.java:113)
    at junit.framework.TestCase.run(TestCase.java:120)
    at junit.framework.TestSuite.runTest(TestSuite.java:228)
    at junit.framework.TestSuite.run(TestSuite.java:223)
    at org.junit.internal.runners.OldTestClassRunner.run(OldTestClassRunner.java:35)
    at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:45)
    at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:460)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:673)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:386)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:196)
Caused by: java.lang.ClassCastException: org.apache.pig.test.DummyStoreFunc cannot be cast to org.apache.pig.LoadFunc
    at org.apache.pig.backend.local.executionengine.POLoad.<init>(POLoad.java:57)
    ... 28 more
{noformat}
I will attach a patch with a test scenario for this. Basically the
code is as follows:
{noformat}
PigServer pig = new PigServer(ExecType.LOCAL);
pig.registerQuery("A = LOAD 'test/org/apache/pig/test/StoreTwiceTest.java' USING "
        + DummyLoadFunc.class.getName() + "();");
pig.registerQuery("B = FOREACH A GENERATE * ;");
File outputFile = new File("/tmp/testPigOutput");
outputFile.delete();
pig.store("A", outputFile.getAbsolutePath(), DummyStoreFunc.class.getName() + "()");
outputFile.delete();
pig.store("B", outputFile.getAbsolutePath(), DummyStoreFunc.class.getName() + "()");
outputFile.delete();
assertEquals(2, _storedTuples.size());
{noformat}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
101tec Inc.
Menlo Park, California, USA
http://www.101tec.com