Hi,

In Pig 0.6 you can extend PigStorage and capture the name of the file being
read with something like this:

  private String fileName; // name of the file currently being read

  @Override
  public void bindTo(String fileName, BufferedPositionedInputStream is,
                     long offset, long end) throws IOException {
    super.bindTo(fileName, is, offset, end);

    // In your case, match with a regexp and keep only the group holding
    // the name (e.g. google, baidu).
    this.fileName = fileName;
  }

  @Override
  public Tuple getNext() throws IOException {
    Tuple next = super.getNext();

    if (next != null) {
      next.append(fileName);
    }

    return next;
  }

Each tuple then carries the name as its last field, so you can group on it
and split on it in your script.
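
For the regexp step mentioned in the code comment, here is a minimal sketch.
It assumes file names of the form <name>_<yyyy>_<mm>_<dd>.csv, as in the
examples in the quoted message, possibly preceded by a directory or URI. The
class SourceNameExtractor and the method extractSourceName are hypothetical
names for illustration, not part of Pig:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not part of Pig): pulls "google" out of "google_2009_12_21.csv". */
final class SourceNameExtractor {

    // Assumption: names look like <name>_<yyyy>_<mm>_<dd>.csv, optionally
    // preceded by a path. Group 1 is the part before the date, without directories.
    private static final Pattern SOURCE_NAME =
            Pattern.compile("([^/]+?)_\\d{4}_\\d{2}_\\d{2}\\.csv$");

    /** Returns the name prefix, or the input unchanged if the pattern does not match. */
    static String extractSourceName(String fileName) {
        Matcher m = SOURCE_NAME.matcher(fileName);
        return m.find() ? m.group(1) : fileName;
    }

    public static void main(String[] args) {
        System.out.println(extractSourceName("hdfs://host/data/google_2009_12_21.csv"));
        System.out.println(extractSourceName("baidu_2010_02_03.csv"));
    }
}
```

In bindTo you would then store extractSourceName(fileName) instead of the raw
fileName, so that every file from the same source gets the same key.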

Thanks,

Romain

On Mon, Mar 1, 2010 at 3:09 AM, Jumping <[email protected]> wrote:

> Hi,
> Can Pig recognize the names of the files it is importing? If so, how? I want
> to combine them according to filename.
>
> Example:
> google_2009_12_21.csv, google_2010_01_21.csv, google_2010_02_21.csv,
> baidu_2009_11_22.csv, baidu_2010_01_01.csv, baidu_2010_02_03.csv, ....
>
> Sort and combine by name, then output two files:  google_all.csv,
> baidu_all.csv  in a pig script.
>
>
> Best Regards,
> Jumping Qu
>
> ------
> Don't tell me how many enemies we have, but where they are!
> (ADV:Perl -- It's like Java, only it lets you deliver on time and under
> budget.)
>
