Dear Wiki user, You have subscribed to a wiki page or wiki category on "Pig Wiki" for change notification.
The "LoadStoreRedesignProposal" page has been changed by AlanGates:
http://wiki.apache.org/pig/LoadStoreRedesignProposal?action=diff&rev1=6&rev2=7

  type conversion on this data will be done in the same way as noted above for !InputFormatLoader.

  Open Questions:
-  1. Does all this force us to switch to Hadoop for local mode as well? We aren't opposed to using Hadoop for local mode; it just needs to get reasonably fast. Can we use !InputFormat ''et al.'' on local files without using the whole HDFS structure?
+  1. Does all this force us to switch to Hadoop for local mode as well? We aren't opposed to using Hadoop for local mode; it just needs to get reasonably fast. Can we use !InputFormat ''et al.'' on local files without using the whole HDFS structure? '''Answer''' According to the Hadoop documentation, !TextInputFormat works on local files as well as HDFS files. We may need to detect that we are in local mode and change the filename to use the `file://` scheme.
+  1. How will we work with compressed files? !FileInputFormat already works with bzip and gzip compressed files, producing reasonable splits. !PigStorage will be reworked to depend on !FileInputFormat (or a descendant thereof, see next item) and should therefore be able to use this functionality.
+  1. How will the need for mark and seek in index construction for merge join be handled? In the long term we'd like Hadoop to handle this for us by creating a !SeekableInputFormat that would add this functionality. In the meantime we can extend !FileInputFormat to !PigFileInputFormat. We can add a getPos() call to this class that will provide a position to start reading at to find the tuple being indexed. Note that this position will not necessarily be the exact position of the tuple, but a position from which the tuple can be found. We can also change the getSplits call in this class to return a split that is specific to a given position so that it can be used during the join.

  == Changes ==
  Sept 23 2009, Gates
@@ -478, +480 @@
  Sept 25 2009, Gates
   * Added allFinished call to !StoreFunc

+ Sept 29 2009, Gates
+  * Added answer for open question 1. Added and answered open questions 2 and 3.
+
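
The indexing idea in open question 3 (store a position from which the tuple can be found, rather than its exact offset) can be illustrated with a minimal sketch. This is not Pig or Hadoop code: the class SeekSketch and the methods alignToRecord and findTuple are hypothetical names, and the sketch assumes newline-delimited records with a tab-separated key in the first field, held in an in-memory byte array instead of a real input stream.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; none of these names exist in Pig or Hadoop.
final class SeekSketch {

    // Returns the offset of the first record that starts at or after pos.
    // Assumes records are terminated by '\n'.
    static int alignToRecord(byte[] data, int pos) {
        if (pos <= 0) {
            return 0;
        }
        if (pos >= data.length) {
            return data.length;
        }
        // pos is already a record start if the previous byte ends a record.
        if (data[pos - 1] == '\n') {
            return pos;
        }
        // Otherwise skip the partial record we landed in.
        int i = pos;
        while (i < data.length && data[i] != '\n') {
            i++;
        }
        return Math.min(i + 1, data.length);
    }

    // Starts at an indexed position and scans forward until a record whose
    // first tab-separated field equals key is found. Returns null if absent.
    static String findTuple(byte[] data, int pos, String key) {
        int start = alignToRecord(data, pos);
        while (start < data.length) {
            int end = start;
            while (end < data.length && data[end] != '\n') {
                end++;
            }
            String line = new String(data, start, end - start, StandardCharsets.UTF_8);
            String firstField = line.split("\t", 2)[0];
            if (firstField.equals(key)) {
                return line;
            }
            start = end + 1;
        }
        return null;
    }
}
```

The point of the sketch is that the stored position only has to be at or before the start of the tuple: a reader that seeks there and scans forward still finds it, while a position past the tuple's start misses it.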