[jira] Updated: (PIG-879) Pig should provide a way for input location string in load statement to be passed as-is to the Loader
[ https://issues.apache.org/jira/browse/PIG-879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Richard Ding updated PIG-879:
----------------------------
    Attachment: PIG-879.patch

This patch addresses the above comments.

     [exec] -1 overall.
     [exec]
     [exec]     +1 @author.  The patch does not contain any @author tags.
     [exec]
     [exec]     +1 tests included.  The patch appears to include 53 new or modified tests.
     [exec]
     [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
     [exec]
     [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
     [exec]
     [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
     [exec]
     [exec]     -1 release audit.  The applied patch generated 339 release audit warnings (more than the trunk's current 338 warnings).

> Pig should provide a way for input location string in load statement to be passed as-is to the Loader
> ----------------------------
>
>                 Key: PIG-879
>                 URL: https://issues.apache.org/jira/browse/PIG-879
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.3.0
>            Reporter: Pradeep Kamath
>            Assignee: Richard Ding
>         Attachments: PIG-879.patch, PIG-879.patch, PIG-879.patch, PIG-879.patch, PIG-879.patch
>
>
> Due to multiquery optimization, Pig always converts the filenames to
> absolute URIs (see
> http://wiki.apache.org/pig/PigMultiQueryPerformanceSpecification - section
> about Incompatible Changes - Path Names and Schemes). This is necessary
> since the script may have "cd .." statements between load or store
> statements, and if the load statements have relative paths, we would need
> to convert them to absolute paths to know where to load/store from. To do
> this, QueryParser.massageFilename() has the code below[1], which basically
> gives the fully qualified hdfs path.
>
> However, the issue with this approach is that if the filename string is
> something like
> "hdfs://localhost.localdomain:39125/user/bla/1,hdfs://localhost.localdomain:39125/user/bla/2",
> the code below[1] actually translates this to
> hdfs://localhost.localdomain:38264/user/bla/1,hdfs://localhost.localdomain:38264/user/bla/2
> and throws an exception that it is an incorrect path.
>
> Some loaders may want to interpret the filenames (the input location
> string in the load statement) in any way they wish and may want Pig to not
> make absolute paths out of them.
>
> There are a few options to address this:
>    1) A command line switch to indicate to Pig that pathnames in the
> script are all absolute and hence Pig should not alter them and should
> pass them as-is to Loaders and Storers.
>    2) A keyword in the load and store statements to indicate the same
> intent to Pig.
>    3) A property which users can supply on the command line or in
> pig.properties to indicate the same intent.
>    4) A method in LoadFunc - relativeToAbsolutePath(String filename,
> String curDir) - which does the conversion to absolute; this way a Loader
> can choose to implement it as a noop.
> Thoughts?
>

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
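
Option 4 in the issue description above could be sketched roughly as follows. This is only an illustration under stated assumptions: the interface and class names (LocationAwareLoader, SinglePathLoader, MultiPathLoader, PassThroughLoader) are hypothetical and are not Pig's LoadFunc API, and the string-based path resolution stands in for whatever QueryParser.massageFilename() actually does. Only the method name and its (String, String) signature are taken from the description.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only; none of these types are part of Pig. */
class LocationSketch {

    /** Hypothetical hook modeled on option 4: the loader owns the conversion. */
    interface LocationAwareLoader {
        String relativeToAbsolutePath(String filename, String curDir);
    }

    /** Assumed default: treat the location as one path and resolve it against curDir. */
    static class SinglePathLoader implements LocationAwareLoader {
        @Override
        public String relativeToAbsolutePath(String filename, String curDir) {
            // Locations that already carry a scheme or start at the root are left alone.
            if (filename.contains("://") || filename.startsWith("/")) {
                return filename;
            }
            return curDir.endsWith("/") ? curDir + filename : curDir + "/" + filename;
        }
    }

    /** A loader that reads the location as a comma-separated list of paths. */
    static class MultiPathLoader extends SinglePathLoader {
        @Override
        public String relativeToAbsolutePath(String filename, String curDir) {
            List<String> resolved = new ArrayList<>();
            for (String part : filename.split(",")) {
                // Resolve each entry on its own instead of the whole string at once.
                resolved.add(super.relativeToAbsolutePath(part.trim(), curDir));
            }
            return String.join(",", resolved);
        }
    }

    /** A loader that wants the location string untouched: the conversion is a no-op. */
    static class PassThroughLoader implements LocationAwareLoader {
        @Override
        public String relativeToAbsolutePath(String filename, String curDir) {
            return filename;
        }
    }

    public static void main(String[] args) {
        String curDir = "/user/bla";
        String multi = "hdfs://localhost.localdomain:39125/user/bla/1,"
                     + "hdfs://localhost.localdomain:39125/user/bla/2";

        System.out.println(new SinglePathLoader().relativeToAbsolutePath("data", curDir));
        System.out.println(new MultiPathLoader().relativeToAbsolutePath("1,2", curDir));
        System.out.println(new PassThroughLoader().relativeToAbsolutePath(multi, curDir));
    }
}
```

With a hook of this shape, Pig would not need a global switch or a new keyword: each loader decides whether its location string is a single path, a list of paths, or something opaque that must reach it unchanged.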
[jira] Updated: (PIG-879) Pig should provide a way for input location string in load statement to be passed as-is to the Loader
[ https://issues.apache.org/jira/browse/PIG-879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Richard Ding updated PIG-879:
----------------------------
    Attachment: PIG-879.patch

This patch is generated after applying the patch for PIG-1094. Here are the test-patch results:

     [exec] -1 overall.
     [exec]
     [exec]     +1 @author.  The patch does not contain any @author tags.
     [exec]
     [exec]     +1 tests included.  The patch appears to include 41 new or modified tests.
     [exec]
     [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
     [exec]
     [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
     [exec]
     [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
     [exec]
     [exec]     -1 release audit.  The applied patch generated 332 release audit warnings (more than the trunk's current 331 warnings).

The only additional audit warning is about html:

< [java] !? /homes/rding/apache-pig/load-store-redesign/build/pig-0.6.0-dev/docs/jdiff/changes/org.apache.pig.ReversibleLoadStoreFunc.html
[jira] Updated: (PIG-879) Pig should provide a way for input location string in load statement to be passed as-is to the Loader
[ https://issues.apache.org/jira/browse/PIG-879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Richard Ding updated PIG-879:
----------------------------
    Attachment: PIG-879.patch

This patch adds the Apache license header to the new unit test file. The remaining audit warning is HTML-related.
[jira] Updated: (PIG-879) Pig should provide a way for input location string in load statement to be passed as-is to the Loader
[ https://issues.apache.org/jira/browse/PIG-879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Richard Ding updated PIG-879:
----------------------------
    Attachment: PIG-879.patch
[jira] Updated: (PIG-879) Pig should provide a way for input location string in load statement to be passed as-is to the Loader
[ https://issues.apache.org/jira/browse/PIG-879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Richard Ding updated PIG-879:
----------------------------
    Attachment: PIG-879.patch

This patch implements option 1 of the above comment and is for the load-store-redesign branch.