[Pig Wiki] Update of "Pig070IncompatibleChanges" by OlgaN

2010-02-25 Thread Apache Wiki
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Pig Wiki" for change 
notification.

The "Pig070IncompatibleChanges" page has been changed by OlgaN.
http://wiki.apache.org/pig/Pig070IncompatibleChanges?action=diff&rev1=29&rev2=30

--

  
  || Change || Section || Impact || Steps to address || Comments ||
  || Load/Store interface changes || Changes to the Load and Store Functions || 
High || [[LoadStoreMigrationGuide || Load Store Migration Guide]] || ||
- || Data compression becomes load/store function specific || Handling 
Compressed Data || Unknown but hopefully low || If compression is needed the 
underlying Input/Output format would need to support it || ||
+ || Data compression becomes load/store function specific || Handling 
Compressed Data || Unknown but hopefully low || If compression is needed, the 
underlying Input/Output format would need to support it || ||
  || Bzip compressed files in PigStorage format can no longer have .bz 
extension  || Handling Compressed Data || Low || 1. Rename existing .bz files 
to .bz2 files. 2. Update scripts to read/write files with bz2 extension || This 
change is due to the fact that Text{Input/Output}Format only supports bz2 
extension ||
  || Switching to Hadoop's local mode || Local Mode || Low || None || Main 
change is 10-20x performance slowdown. Also, local mode now uses the same UDF 
interfaces to execute UDFs as the MR mode. ||
  || Removing support for Load-Stream or Stream-Store optimization || Streaming 
|| Low to None || None || This feature was never documented so it is unlikely 
it was ever used ||


[Pig Wiki] Update of "Pig070IncompatibleChanges" by OlgaN

2010-02-25 Thread Apache Wiki

The "Pig070IncompatibleChanges" page has been changed by OlgaN.
http://wiki.apache.org/pig/Pig070IncompatibleChanges?action=diff&rev1=28&rev2=29

--

  
  || Change || Section || Impact || Steps to address || Comments ||
  || Load/Store interface changes || Changes to the Load and Store Functions || 
High || [[LoadStoreMigrationGuide || Load Store Migration Guide]] || ||
- || Data compression becomes load/store function specific || Handling 
Compressed Data || Unknow but hopefully low || If compression is needed the 
underlying Input/Output format would need to support it || ||
+ || Data compression becomes load/store function specific || Handling 
Compressed Data || Unknown but hopefully low || If compression is needed the 
underlying Input/Output format would need to support it || ||
- || Bzip compressed files can no longer have .bz extension || Handling 
Compressed Data || Low || 1. Rename existing .bz files to .bz2 files. 2. Update 
scripts to read/write files with bz2 extension || This change is due to the 
fact that Text{Input/Output}Format only supports bz2 extension ||
+ || Bzip compressed files in PigStorage format can no longer have .bz 
extension  || Handling Compressed Data || Low || 1. Rename existing .bz files 
to .bz2 files. 2. Update scripts to read/write files with bz2 extension || This 
change is due to the fact that Text{Input/Output}Format only supports bz2 
extension ||
- || Switch to Hadoop's local mode || Local Mode || Low || None || Main change 
is 10-20x performance slowdown. Also, local mode now uses the same UDF 
interfaces to execute UDFs as the MR mode. ||
+ || Switching to Hadoop's local mode || Local Mode || Low || None || Main 
change is 10-20x performance slowdown. Also, local mode now uses the same UDF 
interfaces to execute UDFs as the MR mode. ||
- || Load-Stream or Stream-Store optimizations no longer supported || Streaming 
|| Low to None || None || This feature was never documented so it is unlikely 
it was ever used ||
+ || Removing support for Load-Stream or Stream-Store optimization || Streaming 
|| Low to None || None || This feature was never documented so it is unlikely 
it was ever used ||
- || No longer support serialization and decerialization via load/store 
functions || Streaming || Unknown but hopefully low to medium || Implement new 
Serializer/Deserializer interfaces for non-standard serialization || ||
+ || We no longer support serialization and deserialization via load/store 
functions || Streaming || Unknown but hopefully low to medium || Implement new 
PigToStream and StreamToPig interfaces for non-standard serialization || 
LoadStoreRedesignProposal ||
- || Removed BinaryStorage builtin || Streaming || Low to None || None || As 
far as we know, this class was only used internally by streaming ||
+ || Removing BinaryStorage builtin || Streaming || Low to None || None || As 
far as we know, this class was only used internally by streaming ||
- || Split by file feature is removed || Split by File || Low to None || Input 
format of the loader would need to be used || We don't know that this feature 
was widely/ever used ||
+ || Removing Split by file feature || Split by File || Low to None || Input 
format of the loader would need to support this || We don't know that this 
feature was widely/ever used ||
- || Local files no longer accessible from hadoop || Access to Local Files from 
Map-Reduce Mode || low to none || copy the file to the cluster using 
copyToLocal command || This feature was not documented ||
+ || Local files no longer accessible from cluster || Access to Local Files 
from Map-Reduce Mode || Low to None || Copy the file to the cluster using the 
copyFromLocal command prior to the load || This feature was not documented ||
- || Removing Custom Comparators || Removing Custom Comparators || Low to None 
|| None || This feature has been depricated since Pig 0.5.0 release. We don't 
have a single known use case ||
+ || Removing Custom Comparators || Removing Custom Comparators || Low to None 
|| None || This feature has been deprecated since Pig 0.5.0 release. We don't 
have a single known use case ||
  
  == Changes to the Load and Store Functions ==
  


[Pig Wiki] Update of "Pig070IncompatibleChanges" by OlgaN

2010-02-24 Thread Apache Wiki

The "Pig070IncompatibleChanges" page has been changed by OlgaN.
http://wiki.apache.org/pig/Pig070IncompatibleChanges?action=diff&rev1=27&rev2=28

--

  || Load-Stream or Stream-Store optimizations no longer supported || Streaming 
|| Low to None || None || This feature was never documented so it is unlikely 
it was ever used ||
  || No longer support serialization and decerialization via load/store 
functions || Streaming || Unknown but hopefully low to medium || Implement new 
Serializer/Deserializer interfaces for non-standard serialization || ||
  || Removed BinaryStorage builtin || Streaming || Low to None || None || As 
far as we know, this class was only used internally by streaming ||
+ || Split by file feature is removed || Split by File || Low to None || Input 
format of the loader would need to be used || We don't know that this feature 
was widely/ever used ||
+ || Local files no longer accessible from hadoop || Access to Local Files from 
Map-Reduce Mode || low to none || copy the file to the cluster using 
copyToLocal command || This feature was not documented ||
+ || Removing Custom Comparators || Removing Custom Comparators || Low to None 
|| None || This feature has been depricated since Pig 0.5.0 release. We don't 
have a single known use case ||
  
  == Changes to the Load and Store Functions ==
  


[Pig Wiki] Update of "Pig070IncompatibleChanges" by OlgaN

2010-02-24 Thread Apache Wiki

The "Pig070IncompatibleChanges" page has been changed by OlgaN.
http://wiki.apache.org/pig/Pig070IncompatibleChanges?action=diff&rev1=26&rev2=27

--

  || Data compression becomes load/store function specific || Handling 
Compressed Data || Unknow but hopefully low || If compression is needed the 
underlying Input/Output format would need to support it || ||
  || Bzip compressed files can no longer have .bz extension || Handling 
Compressed Data || Low || 1. Rename existing .bz files to .bz2 files. 2. Update 
scripts to read/write files with bz2 extension || This change is due to the 
fact that Text{Input/Output}Format only supports bz2 extension ||
  || Switch to Hadoop's local mode || Local Mode || Low || None || Main change 
is 10-20x performance slowdown. Also, local mode now uses the same UDF 
interfaces to execute UDFs as the MR mode. ||
+ || Load-Stream or Stream-Store optimizations no longer supported || Streaming 
|| Low to None || None || This feature was never documented so it is unlikely 
it was ever used ||
+ || No longer support serialization and decerialization via load/store 
functions || Streaming || Unknown but hopefully low to medium || Implement new 
Serializer/Deserializer interfaces for non-standard serialization || ||
+ || Removed BinaryStorage builtin || Streaming || Low to None || None || As 
far as we know, this class was only used internally by streaming ||
  
  == Changes to the Load and Store Functions ==
  


[Pig Wiki] Update of "Pig070IncompatibleChanges" by OlgaN

2010-02-24 Thread Apache Wiki

The "Pig070IncompatibleChanges" page has been changed by OlgaN.
http://wiki.apache.org/pig/Pig070IncompatibleChanges?action=diff&rev1=25&rev2=26

--

  
  == Summary ==
  
- || Change || Impact || Steps to address || Comments ||
+ || Change || Section || Impact || Steps to address || Comments ||
- || Load/Store interface changes || High || [[LoadStoreMigrationGuide || Load 
Store Migration Guide]] || ||
+ || Load/Store interface changes || Changes to the Load and Store Functions || 
High || [[LoadStoreMigrationGuide || Load Store Migration Guide]] || ||
+ || Data compression becomes load/store function specific || Handling 
Compressed Data || Unknow but hopefully low || If compression is needed the 
underlying Input/Output format would need to support it || ||
- || Bzip compressed files can no longer have .bz extension || Low || 1. Rename 
existing .bz files to .bz2 files. 2. Update scripts to read/write files with 
bz2 extension || This change is due to the fact that Text{Input/Output}Format 
only supports bz2 extension ||
+ || Bzip compressed files can no longer have .bz extension || Handling 
Compressed Data || Low || 1. Rename existing .bz files to .bz2 files. 2. Update 
scripts to read/write files with bz2 extension || This change is due to the 
fact that Text{Input/Output}Format only supports bz2 extension ||
+ || Switch to Hadoop's local mode || Local Mode || Low || None || Main change 
is 10-20x performance slowdown. Also, local mode now uses the same UDF 
interfaces to execute UDFs as the MR mode. ||
  
  == Changes to the Load and Store Functions ==
  


[Pig Wiki] Update of "Pig070IncompatibleChanges" by OlgaN

2010-02-24 Thread Apache Wiki

The "Pig070IncompatibleChanges" page has been changed by OlgaN.
http://wiki.apache.org/pig/Pig070IncompatibleChanges?action=diff&rev1=24&rev2=25

--

  = Backward incompatible changes in Pig 0.7.0 =
  
  Pig 0.7.0 will include some major changes to Pig, most of them driven by the 
[[LoadStoreRedesignProposal | Load-Store redesign]]. Some of these changes will 
not be backward compatible and will require users to change their Pig scripts 
or their UDFs. This document is intended to keep track of such changes so that 
we can document them for the release.
+ 
+ == Summary ==
  
  || Change || Impact || Steps to address || Comments ||
  || Load/Store interface changes || High || [[LoadStoreMigrationGuide || Load 
Store Migration Guide]] || ||


[Pig Wiki] Update of "Pig070IncompatibleChanges" by OlgaN

2010-02-24 Thread Apache Wiki

The "Pig070IncompatibleChanges" page has been changed by OlgaN.
http://wiki.apache.org/pig/Pig070IncompatibleChanges?action=diff&rev1=23&rev2=24

--

  
  Pig 0.7.0 will include some major changes to Pig, most of them driven by the 
[[LoadStoreRedesignProposal | Load-Store redesign]]. Some of these changes will 
not be backward compatible and will require users to change their Pig scripts 
or their UDFs. This document is intended to keep track of such changes so that 
we can document them for the release.
  
- | Change | Impact | Steps to address | Comments |
+ || Change || Impact || Steps to address || Comments ||
- | Load/Store interface changes | High | [[LoadStoreMigrationGuide | Load 
Store Migration Guide]] | |
+ || Load/Store interface changes || High || [[LoadStoreMigrationGuide || Load 
Store Migration Guide]] || ||
- | Bzip compressed files can no longer have .bz extension | Low | 1. Rename 
existing .bz files to .bz2 files. 2. Update scripts to read/write files with 
bz2 extension | This change is due to the fact that Text{Input/Output}Format 
only supports bz2 extension |
+ || Bzip compressed files can no longer have .bz extension || Low || 1. Rename 
existing .bz files to .bz2 files. 2. Update scripts to read/write files with 
bz2 extension || This change is due to the fact that Text{Input/Output}Format 
only supports bz2 extension ||
  
  == Changes to the Load and Store Functions ==
  


[Pig Wiki] Update of "Pig070IncompatibleChanges" by OlgaN

2010-02-24 Thread Apache Wiki

The "Pig070IncompatibleChanges" page has been changed by OlgaN.
http://wiki.apache.org/pig/Pig070IncompatibleChanges?action=diff&rev1=22&rev2=23

--

  = Backward incompatible changes in Pig 0.7.0 =
  
  Pig 0.7.0 will include some major changes to Pig, most of them driven by the 
[[LoadStoreRedesignProposal | Load-Store redesign]]. Some of these changes will 
not be backward compatible and will require users to change their Pig scripts 
or their UDFs. This document is intended to keep track of such changes so that 
we can document them for the release.
+ 
+ | Change | Impact | Steps to address | Comments |
+ | Load/Store interface changes | High | [[LoadStoreMigrationGuide | Load 
Store Migration Guide]] | |
+ | Bzip compressed files can no longer have .bz extension | Low | 1. Rename 
existing .bz files to .bz2 files. 2. Update scripts to read/write files with 
bz2 extension | This change is due to the fact that Text{Input/Output}Format 
only supports bz2 extension |
  
  == Changes to the Load and Store Functions ==
  


[Pig Wiki] Update of "Pig070IncompatibleChanges" by OlgaN

2010-02-23 Thread Apache Wiki

The "Pig070IncompatibleChanges" page has been changed by OlgaN.
http://wiki.apache.org/pig/Pig070IncompatibleChanges?action=diff&rev1=21&rev2=22

--

  
  With Pig 0.7.0 the read/write functionality is taken over by Hadoop's 
Input/OutputFormat, and how compression is handled, or whether it is handled at 
all, depends on the Input/OutputFormat used by the load/store function.
  
- The main input format that supports compression is TextInputFormat. It 
supports bzip files with .bz2 extension and gzip files with .gz extension. 
'''Note that it does not support .bz files'''. PigStorage is the only loader 
that comes with Pig that is derived from TextInputFormat which means it will be 
able to handle .bz2 and .gz files. Other laders such as BinStorage will no 
longer support compression.
+ The main input format that supports compression is TextInputFormat. It 
supports bzip files with .bz2 extension and gzip files with .gz extension. 
'''Note that it does not support .bz files'''. PigStorage is the only loader 
that comes with Pig that is derived from TextInputFormat which means it will be 
able to handle .bz2 and .gz files. Other loaders such as BinStorage will no 
longer support compression.
  
  On the store side, TextOutputFormat also supports compression, but the store 
function needs to do additional work to enable it. Again, PigStorage will 
support compression while other functions will not.
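
The compression rules above can be sketched in a short script. This is only an illustration: the paths under /data are hypothetical, and it assumes that PigStorage picks the codec from the file extension on both load and store. The page itself only says that the store function must do extra work to enable compression.

{{{
-- Hypothetical paths. Rename old .bz files first, since TextInputFormat does not recognize .bz
mv /data/logs.bz /data/logs.bz2;

-- PigStorage is derived from TextInputFormat, so .bz2 and .gz inputs remain readable
A = LOAD '/data/logs.bz2' USING PigStorage('\t');
B = LOAD '/data/more_logs.gz' USING PigStorage('\t');

-- Assumption: a .bz2 extension on the output location asks PigStorage to compress the result
STORE A INTO '/data/out.bz2' USING PigStorage('\t');
}}}

A loader such as BinStorage would not get this behavior, because its underlying input format does not handle compression.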
  


[Pig Wiki] Update of "Pig070IncompatibleChanges" by OlgaN

2010-02-01 Thread Apache Wiki

The "Pig070IncompatibleChanges" page has been changed by OlgaN.
http://wiki.apache.org/pig/Pig070IncompatibleChanges?action=diff&rev1=14&rev2=15

--

  
  This functionality was added to deal with a gap in Pig's early functionality: 
the lack of numeric comparison in order by as well as the lack of descending 
sort. The missing functionality has been present in the last 4 releases, and 
custom comparators have been deprecated for the last several. Custom 
comparators are removed in this release.
  
- == Open Questions ==
- 
- Q: Should String->Text conversion be part of this release.
- A: Pros: 20-30% improved memory utilization; cons: more compatibility is 
broken.
- 


[Pig Wiki] Update of "Pig070IncompatibleChanges" by OlgaN

2009-12-24 Thread Apache Wiki

The "Pig070IncompatibleChanges" page has been changed by OlgaN.
http://wiki.apache.org/pig/Pig070IncompatibleChanges?action=diff&rev1=11&rev2=12

--

  
  }}}
  
- == Removing Custom Comparators
+ == Removing Custom Comparators ==
  
  This functionality was added to deal with a gap in Pig's early functionality: 
the lack of numeric comparison in order by as well as the lack of descending 
sort. The missing functionality has been present in the last 4 releases, and 
custom comparators have been deprecated for the last several. Custom 
comparators are removed in this release.
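
As a sketch of what replaces a custom comparator, the built-in order by handles numeric and descending sorts directly once the field is typed. The file name and schema here are hypothetical:

{{{
-- Hypothetical input. Declaring score as int makes the sort numeric
A = LOAD 'scores.txt' AS (name:chararray, score:int);
-- DESC gives the descending sort that once required a custom comparator
B = ORDER A BY score DESC;
DUMP B;
}}}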
  


[Pig Wiki] Update of "Pig070IncompatibleChanges" by OlgaN

2009-12-24 Thread Apache Wiki

The "Pig070IncompatibleChanges" page has been changed by OlgaN.
http://wiki.apache.org/pig/Pig070IncompatibleChanges?action=diff&rev1=10&rev2=11

--

  
  }}}
  
+ == Removing Custom Comparators
+ 
+ This functionality was added to deal with a gap in Pig's early functionality: 
the lack of numeric comparison in order by as well as the lack of descending 
sort. The missing functionality has been present in the last 4 releases, and 
custom comparators have been deprecated for the last several. Custom 
comparators are removed in this release.
+ 
  == Open Questions ==
  
  Q: Should String->Text conversion be part of this release.


[Pig Wiki] Update of "Pig070IncompatibleChanges" by OlgaN

2009-12-22 Thread Apache Wiki

The "Pig070IncompatibleChanges" page has been changed by OlgaN.
http://wiki.apache.org/pig/Pig070IncompatibleChanges?action=diff&rev1=8&rev2=9

--

  
  First, in the initial (0.7.0) release, '''we will not support optimization''' 
where if streaming follows load of compatible format or is followed by format 
compatible store the data is not parsed but passed in chunks from the loader or 
to the store. The main reason we are not porting the optimization is that the 
work is not trivial and the optimization was never documented and so unlikely 
to be used.
  
- Second, '''you can no longer use load/store functions for 
(de)serialization.''' A new interface has been defined that needed to be 
implemented for custom (de)serializations. The default (PigStorage) format will 
continue to work. Details of the new interface are describe in 
http://wiki.apache.org/pig/LoadStoreRedesignProposal.
+ Second, '''you can no longer use load/store functions for 
(de)serialization.''' A new interface has been defined that needs to be 
implemented for custom (de)serialization. The default (PigStorage) format will 
continue to work. This format is now implemented by a class called 
org.apache.pig.impl.streaming.PigStreaming that can also be used directly in 
the streaming statement. Details of the new interface are described in 
http://wiki.apache.org/pig/LoadStoreRedesignProposal.
+ 
+ We have also removed the org.apache.pig.builtin.BinaryStorage loader/store 
function and org.apache.pig.builtin.PigDump, which were only used from within 
streaming. They can be restored if needed; we would just need to implement the 
corresponding Input/OutputFormats.
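
For the default format, a streaming statement might name PigStreaming explicitly, as below. This is a sketch only: the script name, paths, and delimiter are made up, and the single-argument constructor taking a field delimiter is an assumption not stated on this page.

{{{
-- Hypothetical command and paths. PigStreaming(',') is assumed to take the field delimiter
DEFINE cmd `perl process.pl`
    INPUT(stdin USING org.apache.pig.impl.streaming.PigStreaming(','))
    OUTPUT(stdout USING org.apache.pig.impl.streaming.PigStreaming(','));

A = LOAD 'input' USING PigStorage(',');
B = STREAM A THROUGH cmd;
STORE B INTO 'output';
}}}

A non-standard serialization would instead name a user class implementing the new interfaces in the INPUT and OUTPUT clauses.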
  
  == Split by File ==
  
@@ -44, +46 @@

  
  We will have a different approach for streaming optimization if that 
functionality is necessary.
  
- == Access to Local Files from Map-Reduce Mode
+ == Access to Local Files from Map-Reduce Mode ==
  
  In the earlier version of Pig, you could access a local file from map-reduce 
mode by prepending file:// to the file location:
  


[Pig Wiki] Update of "Pig070IncompatibleChanges" by OlgaN

2009-12-22 Thread Apache Wiki

The "Pig070IncompatibleChanges" page has been changed by OlgaN.
http://wiki.apache.org/pig/Pig070IncompatibleChanges?action=diff&rev1=7&rev2=8

--

  
  We will have a different approach for streaming optimization if that 
functionality is necessary.
  
+ == Access to Local Files from Map-Reduce Mode
+ 
+ In the earlier version of Pig, you could access a local file from map-reduce 
mode by prepending file:// to the file location:
+ 
+ {{{
+ A = load 'file:/mydir/myfile';
+ ...
+ }}}
+ 
+ When Pig processed this statement, it would first copy the data to DFS and 
then import it into the execution pipeline.
+ 
+ In Pig 0.7.0, you can no longer do this and if this functionality is still 
desired, you can add the copy into your script manually:
+ 
+ {{{
+ fs -copyFromLocal src dest;
+ A = load 'dest';
+ }}}
+ 
  == Open Questions ==
  
  Q: Should String->Text conversion be part of this release.


[Pig Wiki] Update of "Pig070IncompatibleChanges" by OlgaN

2009-12-21 Thread Apache Wiki

The "Pig070IncompatibleChanges" page has been changed by OlgaN.
http://wiki.apache.org/pig/Pig070IncompatibleChanges?action=diff&rev1=6&rev2=7

--

  
  == Local Mode ==
  
- The main change here is that we switched from Pig's native local mode to 
Hadoop's local mode. This change should be transparent for most applications. 
Possible differnces you will see:
+ The main change here is that we switched from Pig's native local mode to 
Hadoop's local mode. This change should be transparent for most applications. 
Possible differences you will see are:
  
   1. Hadoop local mode is about an order of magnitude slower than Pig's local 
mode, something that the Hadoop team has promised to address.
-  2. For algebraic functions, no the entire Algebraic interface will be used 
which is likely a good think if you are using local mode for testing your 
production applications.
+  2. For algebraic functions, now the entire Algebraic interface will be used 
which is likely a good thing if you are using local mode for testing your 
production applications.
  
  == Streaming ==
  
  There are two things that are changing in streaming.
  
- First, in the initial (0.7.0) release, '''we will not support for 
optimization''' where if streaming follows load of compatible format or is 
followed by format compatible store the data is not parsed but passed in chunks 
from the loader or to the store. The main reason we are not porting the 
optimization is that the work is not trivial and that the optimization was 
never documented and so unlikely to be used.
+ First, in the initial (0.7.0) release, '''we will not support optimization''' 
where if streaming follows load of compatible format or is followed by format 
compatible store the data is not parsed but passed in chunks from the loader or 
to the store. The main reason we are not porting the optimization is that the 
work is not trivial and the optimization was never documented and so unlikely 
to be used.
  
  Second, '''you can no longer use load/store functions for 
(de)serialization.''' A new interface has been defined that needed to be 
implemented for custom (de)serializations. The default (PigStorage) format will 
continue to work. Details of the new interface are describe in 
http://wiki.apache.org/pig/LoadStoreRedesignProposal.
  
@@ -46, +46 @@

  
  == Open Questions ==
  
+ Q: Should String->Text conversion be part of this release.
+ A: Pros: 20-30% improved memory utilization; cons: more compatibility is 
broken.
+ 


[Pig Wiki] Update of "Pig070IncompatibleChanges" by OlgaN

2009-12-21 Thread Apache Wiki

The "Pig070IncompatibleChanges" page has been changed by OlgaN.
http://wiki.apache.org/pig/Pig070IncompatibleChanges?action=diff&rev1=5&rev2=6

--

  
  == Changes to the Load and Store Functions ==
  
- TBW [Need to take a load (with and withoutcustom slicer) and a store function 
and create new versions as examples. Can use PigStorage for (1) and (3) but 
need some loader for (2).]
+ TBW [Need to take a load (with and without custom slicer) and a store 
function and create new versions as examples. Can use PigStorage for (1) and 
(3) but need to choose a loader for (2).]
  
  
  == Handling Compressed Data ==
@@ -32, +32 @@

  
  There are two things that are changing in streaming.
  
- First, in the initial (0.7.0) release, '''we will not support for 
optimization''' where if streaming follows load of compatible format or is 
followed by format compatible store the data is not parsed but passed in chunks 
from the loader or to the store. The main reason we are not porting the 
optimization is that the work is not trivial and that the optimization was 
never documented and so unlekly to be used.
+ First, in the initial (0.7.0) release, '''we will not support for 
optimization''' where if streaming follows load of compatible format or is 
followed by format compatible store the data is not parsed but passed in chunks 
from the loader or to the store. The main reason we are not porting the 
optimization is that the work is not trivial and that the optimization was 
never documented and so unlikely to be used.
  
- Second, '''you can no longer use load/store functions for 
(de)serialization.''' A new interface has been defined that needed to be 
implemented for custom (de)serializations. The defaul (PigStorage) format will 
continue to work. Details of the new interface are describe in 
http://wiki.apache.org/pig/LoadStoreRedesignProposal.
+ Second, '''you can no longer use load/store functions for 
(de)serialization.''' A new interface has been defined that needed to be 
implemented for custom (de)serializations. The default (PigStorage) format will 
continue to work. Details of the new interface are describe in 
http://wiki.apache.org/pig/LoadStoreRedesignProposal.
  
  == Split by File ==
  


[Pig Wiki] Update of "Pig070IncompatibleChanges" by OlgaN

2009-12-21 Thread Apache Wiki

The "Pig070IncompatibleChanges" page has been changed by OlgaN.
http://wiki.apache.org/pig/Pig070IncompatibleChanges?action=diff&rev1=4&rev2=5

--

  
  == Changes to the Load and Store Functions ==
  
- TBW
+ TBW [Need to take a load (with and withoutcustom slicer) and a store function 
and create new versions as examples. Can use PigStorage for (1) and (3) but 
need some loader for (2).]
+ 
  
  == Handling Compressed Data ==
  
@@ -21, +22 @@

  If you have a custom load/store function that needs to support compression, 
you would need to make sure that the underlying Input/OutputFormat supports 
this type of compression.
  
  == Local Mode ==
+ 
+ The main change here is that we switched from Pig's native local mode to 
Hadoop's local mode. This change should be transparent for most applications. 
Possible differnces you will see:
+ 
+  1. Hadoop local mode is about an order of magnitude slower than Pig's local 
mode, something that the Hadoop team has promised to address.
+  2. For algebraic functions, no the entire Algebraic interface will be used 
which is likely a good think if you are using local mode for testing your 
production applications.
+ 
  == Streaming ==
  
  There are two things that are changing in streaming.
  
  First, in the initial (0.7.0) release, '''we will not support for 
optimization''' where if streaming follows load of compatible format or is 
followed by format compatible store the data is not parsed but passed in chunks 
from the loader or to the store. The main reason we are not porting the 
optimization is that the work is not trivial and that the optimization was 
never documented and so unlekly to be used.
  
- Second, '''you can no longer use load/store functions for 
(de)serialization.''' 
+ Second, '''you can no longer use load/store functions for 
(de)serialization.''' A new interface has been defined that needed to be 
implemented for custom (de)serializations. The defaul (PigStorage) format will 
continue to work. Details of the new interface are describe in 
http://wiki.apache.org/pig/LoadStoreRedesignProposal.
  
  == Split by File ==
  


[Pig Wiki] Update of "Pig070IncompatibleChanges" by OlgaN

2009-12-21 Thread Apache Wiki

The "Pig070IncompatibleChanges" page has been changed by OlgaN.
http://wiki.apache.org/pig/Pig070IncompatibleChanges?action=diff&rev1=3&rev2=4

--

  Pig 0.7.0 will include some major changes to Pig, most of them driven by the 
[[LoadStoreRedesignProposal | Load-Store redesign]]. Some of these changes will 
not be backward compatible and will require users to change their Pig scripts 
or their UDFs. This document is intended to keep track of such changes so that 
we can document them for the release.
  
  == Changes to the Load and Store Functions ==
+ 
+ TBW
+ 
  == Handling Compressed Data ==
  
  In 0.6.0 or earlier versions Pig supported bzip compressed files with 
extensions of .bz or .bz2 as well as gzip compressed files with .gz extension. 
Pig was able to both read and write files in this format with the understanding 
that gzip compressed files could not be split across multiple maps while bzip 
compressed files could. Also, data compression was completely decoupled from 
the data format and Load/Store functions meaning that any loader could read 
compressed data and any store function could write it just by the virtue of 
having the right extension on the files it was reading or writing.
@@ -19, +22 @@

  
  == Local Mode ==
  == Streaming ==
+ 
+ There are two things that are changing in streaming.
+ 
+ First, in the initial (0.7.0) release, '''we will not support for 
optimization''' where if streaming follows load of compatible format or is 
followed by format compatible store the data is not parsed but passed in chunks 
from the loader or to the store. The main reason we are not porting the 
optimization is that the work is not trivial and that the optimization was 
never documented and so unlekly to be used.
+ 
+ Second, '''you can no longer use load/store functions for 
(de)serialization.''' 
+ 
  == Split by File ==
  
  In the earlier versions of Pig, a user could specify "split by file" on the 
loader statement, which would make sure that each map got an entire file rather 
than having the files further divided into blocks. This feature was primarily 
designed for streaming optimization but could also be used with loaders that 
can't deal with incomplete records. We don't believe that this functionality 
has been widely used.


[Pig Wiki] Update of "Pig070IncompatibleChanges" by OlgaN

2009-12-15 Thread Apache Wiki

The "Pig070IncompatibleChanges" page has been changed by OlgaN.
http://wiki.apache.org/pig/Pig070IncompatibleChanges?action=diff&rev1=2&rev2=3

--

  
  == Local Mode ==
  == Streaming ==
- == Other Changes ==
+ == Split by File ==
  
- - Split by file
+ In earlier versions of Pig, a user could specify "split by file" on the
loader statement, which would make sure that each map got the entire file rather
than having the files further divided into blocks. This feature was primarily
designed for streaming optimization but could also be used with loaders that
can't deal with incomplete records. We don't believe that this functionality
has been widely used.
+ 
+ Because the slicing of the data is no longer in Pig's control, we can't 
support this feature generically for every loader. If a particular loader needs 
this functionality, it will need to make sure that the underlying InputFormat 
supports it. 
+ 
+ We will have a different approach for streaming optimization if that 
functionality is necessary.
  
  == Open Questions ==
  


[Pig Wiki] Update of "Pig070IncompatibleChanges" by OlgaN

2009-12-15 Thread Apache Wiki

The "Pig070IncompatibleChanges" page has been changed by OlgaN.
http://wiki.apache.org/pig/Pig070IncompatibleChanges?action=diff&rev1=1&rev2=2

--

  = Backward incompatible changes in Pig 0.7.0 =
  
- Pig 0.7.0 will include some major changes to Pig most of them driven by the 
[[LoadStoreRedesignProposal | Load-Store redesign]]. Some of this changes will 
not be backward compatible and will require users to change the pig scripts or 
their UDFs. This document is intended to keep track of this changes to that we 
can document them for the release.
+ Pig 0.7.0 will include some major changes to Pig, most of them driven by the
[[LoadStoreRedesignProposal | Load-Store redesign]]. Some of these changes will
not be backward compatible and will require users to change their Pig scripts
or their UDFs. This document is intended to keep track of such changes so that
we can document them for the release.
  
- == Changes to the Load and Store functions ==
+ == Changes to the Load and Store Functions ==
  == Handling Compressed Data ==
+ 
+ In 0.6.0 and earlier versions, Pig supported bzip-compressed files with
extensions of .bz or .bz2 as well as gzip-compressed files with the .gz
extension. Pig was able to both read and write files in these formats, with the
understanding that gzip-compressed files could not be split across multiple maps
while bzip-compressed files could. Also, data compression was completely
decoupled from the data format and Load/Store functions, meaning that any loader
could read compressed data and any store function could write it just by virtue
of having the right extension on the files it was reading or writing.
+ 
+ With Pig 0.7.0 the read/write functionality is taken over by Hadoop's
Input/OutputFormat, and how compression is handled, or whether it is handled at
all, depends on the Input/OutputFormat used by the loader/store function.
+ 
+ The main input format that supports compression is TextInputFormat. It
supports bzip files with the .bz2 extension and gzip files with the .gz
extension. '''Note that it does not support .bz files'''. PigStorage is the only
loader that comes with Pig that is derived from TextInputFormat, which means it
will be able to handle .bz2 and .gz files. Other loaders such as BinStorage will
no longer support compression.
+ 
+ On the store side, TextOutputFormat also supports compression, but the store
function needs to do additional work to enable it. Again, PigStorage will
support compression while other functions will not.
+ 
+ If you have a custom load/store function that needs to support compression, 
you would need to make sure that the underlying Input/OutputFormat supports 
this type of compression.
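
As a rough illustration of the extension rules described above, the following Python sketch classifies file names the way the text says PigStorage will behave in 0.7.0 (.bz2 and .gz recognized, .bz not) and computes the .bz2 name a legacy .bz file would need. The helper names `compression_for` and `migrated_name` and the codec labels are hypothetical, made up for this example; they are not part of Pig or Hadoop, and the sketch only inspects names, it does not touch any files.

```python
from pathlib import PurePosixPath

# Extensions the text says a TextInputFormat-based loader will recognize in 0.7.0.
# The codec labels on the right are illustrative names, not Hadoop class names.
SUPPORTED = {".bz2": "bzip2", ".gz": "gzip"}


def compression_for(path):
    """Return the codec label implied by the final extension, or None."""
    return SUPPORTED.get(PurePosixPath(path).suffix)


def migrated_name(path):
    """Map a legacy .bz file name to its .bz2 equivalent; leave others as-is."""
    p = PurePosixPath(path)
    if p.suffix == ".bz":
        return str(p.with_suffix(".bz2"))
    return path
```

A script author could run `migrated_name` over the paths used in existing scripts to find the ones that need renaming, then confirm with `compression_for` that the new names fall into a supported category.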
+ 
  == Local Mode ==
  == Streaming ==
  == Other Changes ==


[Pig Wiki] Update of "Pig070IncompatibleChanges" by OlgaN

2009-12-15 Thread Apache Wiki

The "Pig070IncompatibleChanges" page has been changed by OlgaN.
http://wiki.apache.org/pig/Pig070IncompatibleChanges

--

New page:
= Backward incompatible changes in Pig 0.7.0 =

Pig 0.7.0 will include some major changes to Pig most of them driven by the 
[[LoadStoreRedesignProposal | Load-Store redesign]]. Some of this changes will 
not be backward compatible and will require users to change the pig scripts or 
their UDFs. This document is intended to keep track of this changes to that we 
can document them for the release.

== Changes to the Load and Store functions ==
== Handling Compressed Data ==
== Local Mode ==
== Streaming ==
== Other Changes ==

- Split by file

== Open Questions ==