[Pig Wiki] Update of "Pig070IncompatibleChanges" by OlgaN
Dear Wiki user, You have subscribed to a wiki page or wiki category on "Pig Wiki" for change notification.
The "Pig070IncompatibleChanges" page has been changed by OlgaN.
http://wiki.apache.org/pig/Pig070IncompatibleChanges?action=diff&rev1=29&rev2=30

--
|| Change || Section || Impact || Steps to address || Comments ||
|| Load/Store interface changes || Changes to the Load and Store Functions || High || [[LoadStoreMigrationGuide || Load Store Migration Guide]] || ||
- || Data compression becomes load/store function specific || Handling Compressed Data || Unknown but hopefully low || If compression is needed the underlying Input/Output format would need to support it || ||
+ || Data compression becomes load/store function specific || Handling Compressed Data || Unknown but hopefully low || If compression is needed, the underlying Input/Output format would need to support it || ||
|| Bzip compressed files in PigStorage format can no longer have .bz extension || Handling Compressed Data || Low || 1. Rename existing .bz files to .bz2 files. 2. Update scripts to read/write files with bz2 extension || This change is due to the fact that Text{Input/Output}Format only supports bz2 extension ||
|| Switching to Hadoop's local mode || Local Mode || Low || None || Main change is 10-20x performance slowdown. Also, local mode now uses the same UDF interfaces to execute UDFs as the MR mode. ||
|| Removing support for Load-Stream or Stream-Store optimization || Streaming || Low to None || None || This feature was never documented so it is unlikely it was ever used ||
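The rename in step 1 of the table above can also be done from Pig's Grunt shell before the load. A minimal sketch, assuming the Grunt `fs` command forwards its arguments to `hadoop fs`; the file names here are hypothetical:

{{{
fs -mv /data/part-00000.bz /data/part-00000.bz2
A = load '/data/part-00000.bz2' using PigStorage();
}}}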
http://wiki.apache.org/pig/Pig070IncompatibleChanges?action=diff&rev1=28&rev2=29

--
|| Change || Section || Impact || Steps to address || Comments ||
|| Load/Store interface changes || Changes to the Load and Store Functions || High || [[LoadStoreMigrationGuide || Load Store Migration Guide]] || ||
- || Data compression becomes load/store function specific || Handling Compressed Data || Unknow but hopefully low || If compression is needed the underlying Input/Output format would need to support it || ||
+ || Data compression becomes load/store function specific || Handling Compressed Data || Unknown but hopefully low || If compression is needed the underlying Input/Output format would need to support it || ||
- || Bzip compressed files can no longer have .bz extension || Handling Compressed Data || Low || 1. Rename existing .bz files to .bz2 files. 2. Update scripts to read/write files with bz2 extension || This change is due to the fact that Text{Input/Output}Format only supports bz2 extension ||
+ || Bzip compressed files in PigStorage format can no longer have .bz extension || Handling Compressed Data || Low || 1. Rename existing .bz files to .bz2 files. 2. Update scripts to read/write files with bz2 extension || This change is due to the fact that Text{Input/Output}Format only supports bz2 extension ||
- || Switch to Hadoop's local mode || Local Mode || Low || None || Main change is 10-20x performance slowdown. Also, local mode now uses the same UDF interfaces to execute UDFs as the MR mode. ||
+ || Switching to Hadoop's local mode || Local Mode || Low || None || Main change is 10-20x performance slowdown. Also, local mode now uses the same UDF interfaces to execute UDFs as the MR mode. ||
- || Load-Stream or Stream-Store optimizations no longer supported || Streaming || Low to None || None || This feature was never documented so it is unlikely it was ever used ||
+ || Removing support for Load-Stream or Stream-Store optimization || Streaming || Low to None || None || This feature was never documented so it is unlikely it was ever used ||
- || No longer support serialization and deserialization via load/store functions || Streaming || Unknown but hopefully low to medium || Implement new Serializer/Deserializer interfaces for non-standard serialization || ||
+ || We no longer support serialization and deserialization via load/store functions || Streaming || Unknown but hopefully low to medium || Implement new PigToStream and StreamToPig interfaces for non-standard serialization || LoadStoreRedesignProposal ||
- || Removed BinaryStorage builtin || Streaming || Low to None || None || As far as we know, this class was only used internally by streaming ||
+ || Removing BinaryStorage builtin || Streaming || Low to None || None || As far as we know, this class was only used internally by streaming ||
- || Split by file feature is removed || Split by File || Low to None || Input format of the loader would need to be used || We don't know that this feature was widely/ever used ||
+ || Removing Split by file feature || Split by File || Low to None || Input format of the loader would need to support this || We don't know that this feature was widely/ever used ||
- || Local files no longer accessible from hadoop || Access to Local Files from Map-Reduce Mode || low to none || copy the file to the cluster using copyFromLocal command || This feature was not documented ||
+ || Local files no longer accessible from cluster || Access to Local Files from Map-Reduce Mode || low to none || copy the file to the cluster using copyFromLocal command prior to the load || This feature was not documented ||
- || Removing Custom Comparators || Removing Custom Comparators || Low to None || None || This feature has been depricated since Pig 0.5.0 release. We don't have a single known use case ||
+ || Removing Custom Comparators || Removing Custom Comparators || Low to None || None || This feature has been deprecated since Pig 0.5.0 release. We don't have a single known use case ||

== Changes to the Load and Store Functions ==
http://wiki.apache.org/pig/Pig070IncompatibleChanges?action=diff&rev1=27&rev2=28

--
|| Load-Stream or Stream-Store optimizations no longer supported || Streaming || Low to None || None || This feature was never documented so it is unlikely it was ever used ||
|| No longer support serialization and deserialization via load/store functions || Streaming || Unknown but hopefully low to medium || Implement new Serializer/Deserializer interfaces for non-standard serialization || ||
|| Removed BinaryStorage builtin || Streaming || Low to None || None || As far as we know, this class was only used internally by streaming ||
+ || Split by file feature is removed || Split by File || Low to None || Input format of the loader would need to be used || We don't know that this feature was widely/ever used ||
+ || Local files no longer accessible from hadoop || Access to Local Files from Map-Reduce Mode || low to none || copy the file to the cluster using copyFromLocal command || This feature was not documented ||
+ || Removing Custom Comparators || Removing Custom Comparators || Low to None || None || This feature has been depricated since Pig 0.5.0 release. We don't have a single known use case ||

== Changes to the Load and Store Functions ==
http://wiki.apache.org/pig/Pig070IncompatibleChanges?action=diff&rev1=26&rev2=27

--
|| Data compression becomes load/store function specific || Handling Compressed Data || Unknow but hopefully low || If compression is needed the underlying Input/Output format would need to support it || ||
|| Bzip compressed files can no longer have .bz extension || Handling Compressed Data || Low || 1. Rename existing .bz files to .bz2 files. 2. Update scripts to read/write files with bz2 extension || This change is due to the fact that Text{Input/Output}Format only supports bz2 extension ||
|| Switch to Hadoop's local mode || Local Mode || Low || None || Main change is 10-20x performance slowdown. Also, local mode now uses the same UDF interfaces to execute UDFs as the MR mode. ||
+ || Load-Stream or Stream-Store optimizations no longer supported || Streaming || Low to None || None || This feature was never documented so it is unlikely it was ever used ||
+ || No longer support serialization and deserialization via load/store functions || Streaming || Unknown but hopefully low to medium || Implement new Serializer/Deserializer interfaces for non-standard serialization || ||
+ || Removed BinaryStorage builtin || Streaming || Low to None || None || As far as we know, this class was only used internally by streaming ||

== Changes to the Load and Store Functions ==
http://wiki.apache.org/pig/Pig070IncompatibleChanges?action=diff&rev1=25&rev2=26

--
== Summary ==
- || Change || Impact || Steps to address || Comments ||
+ || Change || Section || Impact || Steps to address || Comments ||
- || Load/Store interface changes || High || [[LoadStoreMigrationGuide || Load Store Migration Guide]] || ||
+ || Load/Store interface changes || Changes to the Load and Store Functions || High || [[LoadStoreMigrationGuide || Load Store Migration Guide]] || ||
+ || Data compression becomes load/store function specific || Handling Compressed Data || Unknow but hopefully low || If compression is needed the underlying Input/Output format would need to support it || ||
- || Bzip compressed files can no longer have .bz extension || Low || 1. Rename existing .bz files to .bz2 files. 2. Update scripts to read/write files with bz2 extension || This change is due to the fact that Text{Input/Output}Format only supports bz2 extension ||
+ || Bzip compressed files can no longer have .bz extension || Handling Compressed Data || Low || 1. Rename existing .bz files to .bz2 files. 2. Update scripts to read/write files with bz2 extension || This change is due to the fact that Text{Input/Output}Format only supports bz2 extension ||
+ || Switch to Hadoop's local mode || Local Mode || Low || None || Main change is 10-20x performance slowdown. Also, local mode now uses the same UDF interfaces to execute UDFs as the MR mode. ||

== Changes to the Load and Store Functions ==
http://wiki.apache.org/pig/Pig070IncompatibleChanges?action=diff&rev1=24&rev2=25

--
= Backward incompatible changes in Pig 0.7.0 =

Pig 0.7.0 will include some major changes to Pig, most of them driven by the [[LoadStoreRedesignProposal | Load-Store redesign]]. Some of these changes will not be backward compatible and will require users to change their pig scripts or their UDFs. This document is intended to keep track of such changes so that we can document them for the release.
+
+ == Summary ==
|| Change || Impact || Steps to address || Comments ||
|| Load/Store interface changes || High || [[LoadStoreMigrationGuide || Load Store Migration Guide]] || ||
http://wiki.apache.org/pig/Pig070IncompatibleChanges?action=diff&rev1=23&rev2=24

--
Pig 0.7.0 will include some major changes to Pig, most of them driven by the [[LoadStoreRedesignProposal | Load-Store redesign]]. Some of these changes will not be backward compatible and will require users to change their pig scripts or their UDFs. This document is intended to keep track of such changes so that we can document them for the release.

- | Change | Impact | Steps to address | Comments |
+ || Change || Impact || Steps to address || Comments ||
- | Load/Store interface changes | High | [[LoadStoreMigrationGuide | Load Store Migration Guide]] | |
+ || Load/Store interface changes || High || [[LoadStoreMigrationGuide || Load Store Migration Guide]] || ||
- | Bzip compressed files can no longer have .bz extension | Low | 1. Rename existing .bz files to .bz2 files. 2. Update scripts to read/write files with bz2 extension | This change is due to the fact that Text{Input/Output}Format only supports bz2 extension |
+ || Bzip compressed files can no longer have .bz extension || Low || 1. Rename existing .bz files to .bz2 files. 2. Update scripts to read/write files with bz2 extension || This change is due to the fact that Text{Input/Output}Format only supports bz2 extension ||

== Changes to the Load and Store Functions ==
http://wiki.apache.org/pig/Pig070IncompatibleChanges?action=diff&rev1=22&rev2=23

--
= Backward incompatible changes in Pig 0.7.0 =

Pig 0.7.0 will include some major changes to Pig, most of them driven by the [[LoadStoreRedesignProposal | Load-Store redesign]]. Some of these changes will not be backward compatible and will require users to change their pig scripts or their UDFs. This document is intended to keep track of such changes so that we can document them for the release.
+
+ | Change | Impact | Steps to address | Comments |
+ | Load/Store interface changes | High | [[LoadStoreMigrationGuide | Load Store Migration Guide]] | |
+ | Bzip compressed files can no longer have .bz extension | Low | 1. Rename existing .bz files to .bz2 files. 2. Update scripts to read/write files with bz2 extension | This change is due to the fact that Text{Input/Output}Format only supports bz2 extension |

== Changes to the Load and Store Functions ==
http://wiki.apache.org/pig/Pig070IncompatibleChanges?action=diff&rev1=21&rev2=22

--
With Pig 0.7.0 the read/write functionality is taken over by Hadoop's Input/OutputFormat, and how compression is handled, or whether it is handled at all, depends on the Input/OutputFormat used by the loader/store function.

- The main input format that supports compression is TextInputFormat. It supports bzip files with .bz2 extension and gzip files with .gz extension. '''Note that it does not support .bz files'''. PigStorage is the only loader that comes with Pig that is derived from TextInputFormat which means it will be able to handle .bz2 and .gz files. Other laders such as BinStorage will no longer support compression.
+ The main input format that supports compression is TextInputFormat. It supports bzip files with .bz2 extension and gzip files with .gz extension. '''Note that it does not support .bz files'''. PigStorage is the only loader that comes with Pig that is derived from TextInputFormat which means it will be able to handle .bz2 and .gz files. Other loaders such as BinStorage will no longer support compression.

On the store side, TextOutputFormat also supports compression but the store function needs to do additional work to enable it. Again, PigStorage will support compression while other functions will not.
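As a sketch of the behavior described above: with PigStorage, compression is driven entirely by the extension on the path being read or written. The file names below are hypothetical:

{{{
-- read: TextInputFormat decompresses the input because of the .bz2 extension
A = load 'logs/input.txt.bz2' using PigStorage('\t');
-- write: a .bz2 extension on the output location requests compressed output
store A into 'logs/output.bz2' using PigStorage('\t');
}}}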
http://wiki.apache.org/pig/Pig070IncompatibleChanges?action=diff&rev1=14&rev2=15

--
This functionality was added to deal with gaps in Pig's early functionality: lack of numeric comparison in order by as well as lack of descending sort. This functionality has been present in the last 4 releases, and custom comparators have been deprecated in the last several releases. The functionality is removed in this release.

- == Open Questions ==
-
- Q: Should String->Text conversion be part of this release.
- A: Pros: 20-30% improved memory utilization; cons: more compatibility is broken.
-
http://wiki.apache.org/pig/Pig070IncompatibleChanges?action=diff&rev1=11&rev2=12

--
}}}

- == Removing Custom Comparators
+ == Removing Custom Comparators ==

This functionality was added to deal with gaps in Pig's early functionality: lack of numeric comparison in order by as well as lack of descending sort. This functionality has been present in the last 4 releases, and custom comparators have been deprecated in the last several releases. The functionality is removed in this release.
http://wiki.apache.org/pig/Pig070IncompatibleChanges?action=diff&rev1=10&rev2=11

--
}}}

+ == Removing Custom Comparators
+
+ This functionality was added to deal with gaps in Pig's early functionality: lack of numeric comparison in order by as well as lack of descending sort. This functionality has been present in the last 4 releases, and custom comparators have been deprecated in the last several releases. The functionality is removed in this release.
+
== Open Questions ==

Q: Should String->Text conversion be part of this release.
http://wiki.apache.org/pig/Pig070IncompatibleChanges?action=diff&rev1=8&rev2=9

--
First, in the initial (0.7.0) release, '''we will not support optimization''' where if streaming follows load of compatible format or is followed by format compatible store the data is not parsed but passed in chunks from the loader or to the store. The main reason we are not porting the optimization is that the work is not trivial and the optimization was never documented and so unlikely to be used.

- Second, '''you can no longer use load/store functions for (de)serialization.''' A new interface has been defined that needs to be implemented for custom (de)serializations. The default (PigStorage) format will continue to work. Details of the new interface are described in http://wiki.apache.org/pig/LoadStoreRedesignProposal.
+ Second, '''you can no longer use load/store functions for (de)serialization.''' A new interface has been defined that needs to be implemented for custom (de)serializations. The default (PigStorage) format will continue to work. This format is now implemented by a class called org.apache.pig.impl.streaming.PigStreaming that can also be used directly in the streaming statement. Details of the new interface are described in http://wiki.apache.org/pig/LoadStoreRedesignProposal.
+
+ We have also removed the org.apache.pig.builtin.BinaryStorage loader/store function and org.apache.pig.builtin.PigDump, which were only used from within streaming. They can be restored if needed - we would just need to implement the corresponding Input/OutputFormats.

== Split by File ==

@@ -44, +46 @@

We will have a different approach for streaming optimization if that functionality is necessary.
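For illustration, a sketch of how the PigStreaming class might be referenced in a streaming statement; the script name and delimiter below are hypothetical, and the exact syntax should be checked against LoadStoreRedesignProposal:

{{{
DEFINE cmd `myscript.pl` input(stdin using org.apache.pig.impl.streaming.PigStreaming(',')) output(stdout using org.apache.pig.impl.streaming.PigStreaming(','));
A = load 'data';
B = stream A through cmd;
store B into 'out';
}}}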
- == Access to Local Files from Map-Reduce Mode
+ == Access to Local Files from Map-Reduce Mode ==

In the earlier version of Pig, you could access a local file from map-reduce mode by prepending file:// to the file location:
http://wiki.apache.org/pig/Pig070IncompatibleChanges?action=diff&rev1=7&rev2=8

--
We will have a different approach for streaming optimization if that functionality is necessary.

+ == Access to Local Files from Map-Reduce Mode
+
+ In the earlier version of Pig, you could access a local file from map-reduce mode by prepending file:// to the file location:
+
+ {{{
+ A = load 'file:/mydir/myfile';
+ ...
+ }}}
+
+ When Pig processed this statement, it would first copy the data to DFS and then import it into the execution pipeline.
+
+ In Pig 0.7.0, you can no longer do this and if this functionality is still desired, you can add the copy into your script manually:
+
+ {{{
+ fs copyFromLocal src dist
+ A = load 'dist';
+
+ }}}
+
== Open Questions ==

Q: Should String->Text conversion be part of this release.
http://wiki.apache.org/pig/Pig070IncompatibleChanges?action=diff&rev1=6&rev2=7

--
== Local Mode ==

- The main change here is that we switched from Pig's native local mode to Hadoop's local mode. This change should be transparent for most applications. Possible differences you will see:
+ The main change here is that we switched from Pig's native local mode to Hadoop's local mode. This change should be transparent for most applications. Possible differences you will see are:

1. Hadoop local mode is about an order of magnitude slower than Pig's local mode. Something that the Hadoop team promised to address.
- 2. For algebraic functions, no the entire Algebraic interface will be used which is likely a good think if you are using local mode for testing your production applications.
+ 2. For algebraic functions, now the entire Algebraic interface will be used which is likely a good thing if you are using local mode for testing your production applications.

== Streaming ==

There are two things that are changing in streaming.

- First, in the initial (0.7.0) release, '''we will not support for optimization''' where if streaming follows load of compatible format or is followed by format compatible store the data is not parsed but passed in chunks from the loader or to the store. The main reason we are not porting the optimization is that the work is not trivial and that the optimization was never documented and so unlikely to be used.
+ First, in the initial (0.7.0) release, '''we will not support optimization''' where if streaming follows load of compatible format or is followed by format compatible store the data is not parsed but passed in chunks from the loader or to the store. The main reason we are not porting the optimization is that the work is not trivial and the optimization was never documented and so unlikely to be used.

Second, '''you can no longer use load/store functions for (de)serialization.''' A new interface has been defined that needs to be implemented for custom (de)serializations. The default (PigStorage) format will continue to work. Details of the new interface are described in http://wiki.apache.org/pig/LoadStoreRedesignProposal.

@@ -46, +46 @@

== Open Questions ==

+ Q: Should String->Text conversion be part of this release.
+ A: Pros: 20-30% improved memory utilization; cons: more compatibility is broken.
+
http://wiki.apache.org/pig/Pig070IncompatibleChanges?action=diff&rev1=5&rev2=6

--
== Changes to the Load and Store Functions ==

- TBW [Need to take a load (with and withoutcustom slicer) and a store function and create new versions as examples. Can use PigStorage for (1) and (3) but need some loader for (2).]
+ TBW [Need to take a load (with and without custom slicer) and a store function and create new versions as examples. Can use PigStorage for (1) and (3) but need to choose a loader for (2).]

== Handling Compressed Data ==

@@ -32, +32 @@

There are two things that are changing in streaming.

- First, in the initial (0.7.0) release, '''we will not support for optimization''' where if streaming follows load of compatible format or is followed by format compatible store the data is not parsed but passed in chunks from the loader or to the store. The main reason we are not porting the optimization is that the work is not trivial and that the optimization was never documented and so unlekly to be used.
+ First, in the initial (0.7.0) release, '''we will not support for optimization''' where if streaming follows load of compatible format or is followed by format compatible store the data is not parsed but passed in chunks from the loader or to the store. The main reason we are not porting the optimization is that the work is not trivial and that the optimization was never documented and so unlikely to be used.

- Second, '''you can no longer use load/store functions for (de)serialization.''' A new interface has been defined that needs to be implemented for custom (de)serializations. The defaul (PigStorage) format will continue to work. Details of the new interface are described in http://wiki.apache.org/pig/LoadStoreRedesignProposal.
+ Second, '''you can no longer use load/store functions for (de)serialization.''' A new interface has been defined that needs to be implemented for custom (de)serializations. The default (PigStorage) format will continue to work. Details of the new interface are described in http://wiki.apache.org/pig/LoadStoreRedesignProposal.

== Split by File ==
http://wiki.apache.org/pig/Pig070IncompatibleChanges?action=diff&rev1=4&rev2=5

--
== Changes to the Load and Store Functions ==

- TBW
+ TBW [Need to take a load (with and withoutcustom slicer) and a store function and create new versions as examples. Can use PigStorage for (1) and (3) but need some loader for (2).]
+
== Handling Compressed Data ==

@@ -21, +22 @@

If you have a custom load/store function that needs to support compression, you would need to make sure that the underlying Input/OutputFormat supports this type of compression.

== Local Mode ==
+
+ The main change here is that we switched from Pig's native local mode to Hadoop's local mode. This change should be transparent for most applications. Possible differences you will see:
+
+ 1. Hadoop local mode is about an order of magnitude slower than Pig's local mode. Something that the Hadoop team promised to address.
+ 2. For algebraic functions, no the entire Algebraic interface will be used which is likely a good think if you are using local mode for testing your production applications.
+
== Streaming ==

There are two things that are changing in streaming.

First, in the initial (0.7.0) release, '''we will not support for optimization''' where if streaming follows load of compatible format or is followed by format compatible store the data is not parsed but passed in chunks from the loader or to the store. The main reason we are not porting the optimization is that the work is not trivial and that the optimization was never documented and so unlekly to be used.

- Second, '''you can no longer use load/store functions for (de)serialization.'''
+ Second, '''you can no longer use load/store functions for (de)serialization.''' A new interface has been defined that needed to be implemented for custom (de)serializations. The defaul (PigStorage) format will continue to work. Details of the new interface are described in http://wiki.apache.org/pig/LoadStoreRedesignProposal.

== Split by File ==
http://wiki.apache.org/pig/Pig070IncompatibleChanges?action=diff&rev1=3&rev2=4

--
Pig 0.7.0 will include some major changes to Pig, most of them driven by the [[LoadStoreRedesignProposal | Load-Store redesign]]. Some of these changes will not be backward compatible and will require users to change their pig scripts or their UDFs. This document is intended to keep track of such changes so that we can document them for the release.

== Changes to the Load and Store Functions ==
+
+ TBW
+
== Handling Compressed Data ==

In 0.6.0 or earlier versions Pig supported bzip compressed files with extensions of .bz or .bz2 as well as gzip compressed files with .gz extension. Pig was able to both read and write files in this format, with the understanding that gzip compressed files could not be split across multiple maps while bzip compressed files could. Also, data compression was completely decoupled from the data format and Load/Store functions, meaning that any loader could read compressed data and any store function could write it just by virtue of having the right extension on the files it was reading or writing.

@@ -19, +22 @@

== Local Mode ==

== Streaming ==
+
+ There are two things that are changing in streaming.
+
+ First, in the initial (0.7.0) release, '''we will not support for optimization''' where if streaming follows load of compatible format or is followed by format compatible store the data is not parsed but passed in chunks from the loader or to the store. The main reason we are not porting the optimization is that the work is not trivial and that the optimization was never documented and so unlekly to be used.
+
+ Second, '''you can no longer use load/store functions for (de)serialization.'''
+
== Split by File ==

In the earlier versions of Pig, a user could specify "split by file" on the loader statement, which would make sure that each map got the entire file rather than having the files further divided into blocks. This feature was primarily designed for streaming optimization but could also be used with loaders that can't deal with incomplete records. We don't believe that this functionality has been widely used.
http://wiki.apache.org/pig/Pig070IncompatibleChanges?action=diff&rev1=2&rev2=3

--
== Local Mode ==

== Streaming ==

- == Other Changes ==
+ == Split by File ==
-
- Split by file
+ In the earlier versions of Pig, a user could specify "split by file" on the loader statement, which would make sure that each map got the entire file rather than having the files further divided into blocks. This feature was primarily designed for streaming optimization but could also be used with loaders that can't deal with incomplete records. We don't believe that this functionality has been widely used.
+
+ Because the slicing of the data is no longer in Pig's control, we can't support this feature generically for every loader. If a particular loader needs this functionality, it will need to make sure that the underlying InputFormat supports it.
+
+ We will have a different approach for streaming optimization if that functionality is necessary.

== Open Questions ==
http://wiki.apache.org/pig/Pig070IncompatibleChanges?action=diff&rev1=1&rev2=2

--
= Backward incompatible changes in Pig 0.7.0 =

- Pig 0.7.0 will include some major changes to Pig most of them driven by the [[LoadStoreRedesignProposal | Load-Store redesign]]. Some of this changes will not be backward compatible and will require users to change the pig scripts or their UDFs. This document is intended to keep track of this changes to that we can document them for the release.
+ Pig 0.7.0 will include some major changes to Pig, most of them driven by the [[LoadStoreRedesignProposal | Load-Store redesign]]. Some of these changes will not be backward compatible and will require users to change their pig scripts or their UDFs. This document is intended to keep track of such changes so that we can document them for the release.

- == Changes to the Load and Store functions ==
+ == Changes to the Load and Store Functions ==

== Handling Compressed Data ==
+
+ In 0.6.0 or earlier versions Pig supported bzip compressed files with extensions of .bz or .bz2 as well as gzip compressed files with .gz extension. Pig was able to both read and write files in this format, with the understanding that gzip compressed files could not be split across multiple maps while bzip compressed files could. Also, data compression was completely decoupled from the data format and Load/Store functions, meaning that any loader could read compressed data and any store function could write it just by virtue of having the right extension on the files it was reading or writing.
+
+ With Pig 0.7.0 the read/write functionality is taken over by Hadoop's Input/OutputFormat, and how compression is handled, or whether it is handled at all, depends on the Input/OutputFormat used by the loader/store function.
+
+ The main input format that supports compression is TextInputFormat. It supports bzip files with .bz2 extension and gzip files with .gz extension. '''Note that it does not support .bz files'''. PigStorage is the only loader that comes with Pig that is derived from TextInputFormat which means it will be able to handle .bz2 and .gz files. Other laders such as BinStorage will no longer support compression.
+
+ On the store side, TextOutputFormat also supports compression but the store function needs to do additional work to enable it. Again, PigStorage will support compression while other functions will not.
+
+ If you have a custom load/store function that needs to support compression, you would need to make sure that the underlying Input/OutputFormat supports this type of compression.
+
== Local Mode ==

== Streaming ==

== Other Changes ==
http://wiki.apache.org/pig/Pig070IncompatibleChanges

--
New page:

= Backward incompatible changes in Pig 0.7.0 =

Pig 0.7.0 will include some major changes to Pig most of them driven by the [[LoadStoreRedesignProposal | Load-Store redesign]]. Some of this changes will not be backward compatible and will require users to change the pig scripts or their UDFs. This document is intended to keep track of this changes to that we can document them for the release.

== Changes to the Load and Store functions ==

== Handling Compressed Data ==

== Local Mode ==

== Streaming ==

== Other Changes ==

- Split by file

== Open Questions ==