RE: [jira] Commented: (PIG-777) Code refactoring: Create optimization out of store/load post processing code

2009-04-28 Thread Richard Ding
Hi David,

This is exactly the problem that the multi-query optimization project is
addressing. Please see the following link for details:

http://wiki.apache.org/pig/PigMultiQueryPerformanceSpecification


Thanks,
-Richard

-----Original Message-----
From: David Ciemiewicz (JIRA) [mailto:j...@apache.org] 
Sent: Tuesday, April 28, 2009 7:43 AM
To: pig-dev@hadoop.apache.org
Subject: [jira] Commented: (PIG-777) Code refactoring: Create
optimization out of store/load post processing code


[ 
https://issues.apache.org/jira/browse/PIG-777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12703659#action_12703659
 ] 

David Ciemiewicz commented on PIG-777:
--

This seems like it could be useful but I don't understand the full issue
as a user.

I often want to compute intermediate summaries, store them, and then
continue computation.

{code}A = load ...
...
store D into ...
E = group D by ...
...
store H into ...{code}

The problem I encountered in earlier versions of Pig was that to PREVENT
two executions of steps A thru D, I had to introduce a load step before E:

{code}A = load ...
...
store D into ...
D = load ...
E = group D by ...
...
store H into ...{code}

It's great that you will be introducing code that possibly eliminates
D = load in the execution.

However, is anything being done so that I don't need to introduce
D = load in the first place?

 Code refactoring: Create optimization out of store/load post
processing code




 Key: PIG-777
 URL: https://issues.apache.org/jira/browse/PIG-777
 Project: Pig
  Issue Type: Improvement
Reporter: Gunther Hagleitner

 The postProcessing method in the pig server checks whether a logical
graph contains stores to and loads from the same location. If so, it
will either connect the store and load, or optimize by throwing out the
load and connecting the store predecessor with the successor of the
load.
 Ideally the introduction of the store and load connection should
happen in the query compiler, while the optimization should then happen
in a separate optimizer step as part of the optimizer framework.
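
The rewrite described above can be sketched on a toy plan. This is only an illustration under stated assumptions, not Pig's postProcessing code: the function name `eliminate_store_load`, the `(kind, location)` operator tuples, and the edge-list plan representation are all hypothetical.

```python
def eliminate_store_load(edges, ops):
    """Rewire a toy plan so a load of a just-stored location is bypassed.

    edges: list of (source_op, target_op) pairs (hypothetical representation)
    ops:   dict mapping op name -> (kind, location); location is None
           for operators that are neither loads nor stores
    """
    # Remember which operator feeds each store, keyed by stored location.
    store_input = {}
    for src, dst in edges:
        kind, location = ops[dst]
        if kind == "store":
            store_input[location] = src

    # A load is redundant if it reads a location stored in the same plan.
    redundant = {
        name: store_input[location]
        for name, (kind, location) in ops.items()
        if kind == "load" and location in store_input
    }

    # Connect the store's predecessor directly to the load's successors.
    return [(redundant.get(src, src), dst) for src, dst in edges]


# Toy version of the A..H script: D is stored to 'mid' and loaded again.
ops = {
    "A": ("load", "in"), "D": ("filter", None), "S": ("store", "mid"),
    "L": ("load", "mid"), "E": ("group", None), "T": ("store", "out"),
}
edges = [("A", "D"), ("D", "S"), ("L", "E"), ("E", "T")]
print(eliminate_store_load(edges, ops))
```

After the rewrite the load `L` no longer appears in any edge, and `E` reads directly from `D`, while the store `S` is kept.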

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-777) Code refactoring: Create optimization out of store/load post processing code

2009-04-28 Thread David Ciemiewicz (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12703764#action_12703764
 ] 

David Ciemiewicz commented on PIG-777:
--

Another thing ...

If you eliminate the D = load statement, could you provide some information to 
the user that this optimization is taking place?

It would help me immensely with code maintenance if I could eliminate the D = 
load steps which often require recoding the AS clause schema.





[jira] Commented: (PIG-627) PERFORMANCE: multi-query optimization

2009-04-28 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12703792#action_12703792
 ] 

Alan Gates commented on PIG-627:


Checked in multiquery-phase3_0423.patch to multiquery branch.

 PERFORMANCE: multi-query optimization
 -

 Key: PIG-627
 URL: https://issues.apache.org/jira/browse/PIG-627
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.2.0
Reporter: Olga Natkovich
 Attachments: doc-fix.patch, error_handling_0415.patch, 
 error_handling_0416.patch, file_cmds-0305.patch, fix_store_prob.patch, 
 merge-041409.patch, merge_741727_HEAD__0324.patch, 
 merge_741727_HEAD__0324_2.patch, merge_trunk_to_branch.patch, 
 multi-store-0303.patch, multi-store-0304.patch, multiquery-phase2_0313.patch, 
 multiquery-phase2_0323.patch, multiquery-phase3_0423.patch, 
 multiquery_0223.patch, multiquery_0224.patch, multiquery_0306.patch, 
 multiquery_explain_fix.patch, non_reversible_store_load_dependencies.patch, 
 non_reversible_store_load_dependencies_2.patch, 
 noop_filter_absolute_path_flag.patch, 
 noop_filter_absolute_path_flag_0401.patch, streaming-fix.patch


 Currently, if your Pig script contains multiple stores and some shared 
 computation, Pig will execute several independent queries. For instance:
 A = load 'data' as (a, b, c);
 B = filter A by a > 5;
 store B into 'output1';
 C = group B by b;
 store C into 'output2';
 This script will result in a map-only job that generates output1, followed by a 
 map-reduce job that generates output2. As a result, data is read, parsed, and 
 filtered twice, which is unnecessary and costly. 
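
The saving that multi-query execution is after can be sketched in plain Python. This is a toy model, not Pig's execution engine: `run_shared` is a hypothetical name, rows are `(a, b, c)` tuples, and the comparison `a > 5` is an assumption because the operator is missing from the archived script.

```python
def run_shared(rows):
    """Filter the input once and feed both outputs from that one result."""
    # B = filter A by a <op> 5;  ('>' is assumed, see the note above)
    b = [row for row in rows if row[0] > 5]

    # store B into 'output1';
    output1 = list(b)

    # C = group B by b;  store C into 'output2';
    output2 = {}
    for row in b:
        output2.setdefault(row[1], []).append(row)

    return output1, output2


rows = [(1, "x", 0), (7, "x", 1), (9, "y", 2)]
out1, out2 = run_shared(rows)
print(out1)
print(out2)
```

Without sharing, the load and filter steps would run once for each store, which is the duplicated work the issue describes.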




[jira] Commented: (PIG-774) Pig does not handle Chinese characters (in both the parameter subsitution using -param_file or embedded in the Pig script) correctly

2009-04-28 Thread Viraj Bhat (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12703937#action_12703937
 ] 

Viraj Bhat commented on PIG-774:


Daniel, 
 Thanks again for your patch. I worked with Pradeep and changed the parser code 
to invoke the behavior you suggested, and then filed Jira PIG-774. 
Here is one problem that I faced.
Suppose I have a script like this, known as chinese_data.pig
{code}
rmf chineseoutput;
%default querystring 'myquery';
I = load '/user/viraj/chinese.txt' using PigStorage('\u0001');

--dump I;

J = filter I by $0 == '$querystring';
--J = filter I by $0 == '   歌手香港情牽女人心演唱會';

--store J into 'chineseoutput';
dump J;
{code}

I have a parameter file known as nextgen_paramfile which contains the 
$querystring variable.

{code}
querystring=   歌手香港情牽女人心演唱會
{code}

I run the above script and parameter file as:
{code}
java -cp pig.jar:/home/viraj/hadoop-0.18.0-dev/conf/ -Dhod.server='' 
org.apache.pig.Main -param_file nextgen_paramfile chinese_data.pig
{code}

I get the following error:

2009-04-29 01:05:14,979 [main] INFO  
org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to 
hadoop file system at: hdfs://localhost:9000
2009-04-29 01:05:16,328 [main] INFO  
org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to 
map-reduce job tracker at: localhost:9001
2009-04-29 01:05:16,907 [main] INFO  org.apache.pig.PigServer - Create a new 
graph.
2009-04-29 01:05:17,794 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
1000: Error during parsing. Lexical error at line 7, column 33.  Encountered: 
\u6b4c (27468), after : 

I realized that it was something to do with the commented line in the pig 
script. 
{code}
--J = filter I by $0 == '   歌手香港情牽女人心演唱會';
{code}
Why is that so? I am attaching the pig_*log to this Jira.
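
The character named in the lexical error above can be checked independently of Pig. The short Python sketch below only confirms that `\u6b4c (27468)` is the first Chinese character of the query string and shows its UTF-8 bytes; it says nothing about how the Pig parser reads the script.

```python
ch = "歌"  # first Chinese character of the query string in the script

code_point = ord(ch)
print(hex(code_point), code_point)  # 0x6b4c 27468

# The same character occupies three bytes when the script is saved as UTF-8.
utf8_bytes = ch.encode("utf-8")
print(utf8_bytes)  # b'\xe6\xad\x8c'
```

So the lexer is stopping at the first non-ASCII character on the commented-out line, which matches the line and column reported in the error.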

Additionally I found that the parameter substitution is happening correctly 
when I run the script as:
{code}
java -cp pig.jar:/home/viraj/hadoop-0.18.0-dev/conf/ -Dhod.server='' 
org.apache.pig.Main -param_file nextgen_paramfile -r chinese_data.pig
{code}
The substituted file, chinese_data.pig.substituted is correct.

Viraj

 Pig does not handle Chinese characters (in both the parameter subsitution 
 using -param_file or embedded in the Pig script) correctly
 

 Key: PIG-774
 URL: https://issues.apache.org/jira/browse/PIG-774
 Project: Pig
  Issue Type: Bug
  Components: grunt, impl
Affects Versions: 0.0.0
Reporter: Viraj Bhat
Priority: Critical
 Fix For: 0.0.0

 Attachments: chinese.txt, chinese_data.pig, nextgen_paramfile, 
 utf8_parser-1.patch


 I created a very small test case in which I did the following.
 1) Created a UTF-8 file which contained a query string in Chinese and wrote 
 it to HDFS. I used this dfs file as an input for the tests.
 2) Created a parameter file which also contained the same query string as in 
 Step 1.
 3) Created a Pig script which takes in the parametrized query string and hard 
 coded Chinese character.
 
 Pig script: chinese_data.pig
 
 {code}
 rmf chineseoutput;
 I = load '/user/viraj/chinese.txt' using PigStorage('\u0001');
 J = filter I by $0 == '$querystring';
 --J = filter I by $0 == ' 歌手香港情牽女人心演唱會';
 store J into 'chineseoutput';
 dump J;
 {code}
 =
 Parameter file: nextgen_paramfile
 =
 queryid=20090311
 querystring='   歌手香港情牽女人心演唱會'
 =
 Input file: /user/viraj/chinese.txt
 =
 shell$ hadoop fs -cat /user/viraj/chinese.txt
 歌手香港情牽女人心演唱會
 =
 I ran the above set of inputs in the following ways:
 Run 1:
 =
 {code}
 java -cp pig.jar:/home/viraj/hadoop-0.18.0-dev/conf/ -Dhod.server='' 
 org.apache.pig.Main -param_file nextgen_paramfile chinese_data.pig
 {code}
 =
 2009-04-22 01:31:35,703 [Thread-7] WARN  org.apache.hadoop.mapred.JobClient - 
 Use GenericOptionsParser for parsing the
 arguments. Applications should implement Tool for the same.
 2009-04-22 01:31:40,700 [main] INFO  
 

[jira] Commented: (PIG-619) Dumping empty results produces Unable to get results for /tmp/temp-1964806069/tmp256878619 org.apache.pig.builtin.BinStorage message

2009-04-28 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12703940#action_12703940
 ] 

Alan Gates commented on PIG-619:


Does fixing this still make sense?  IIRC the main reason for doing the 
store/load thing in the middle was to deal with the fact that Pig couldn't do 
multiple stores in one script without re-running the entire script.  But since 
that is in the process of being changed (see PIG-627), this should no longer be 
necessary.

 Dumping empty results produces Unable to get results for 
 /tmp/temp-1964806069/tmp256878619  org.apache.pig.builtin.BinStorage message
 ---

 Key: PIG-619
 URL: https://issues.apache.org/jira/browse/PIG-619
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.2.0
 Environment: Hadoop 18, Multi-node hadoop installation
Reporter: Viraj Bhat
Assignee: Alan Gates
 Attachments: mydata.txt, tmpfileload.pig


 The following Pig script stores empty filter results into the 'emptyfilteredlogs' 
 HDFS dir. It later reloads this data from the empty HDFS dir for additional 
 grouping and counting. The script succeeds on a 
 single-node Hadoop installation with the following message, as the alias 
 COUNT_EMPTYFILTERED_LOGS contains empty data.
 ==
 2009-01-13 21:47:08,988 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
  - Success!
 ==
 But on a multi-node Hadoop installation, the script fails with the following 
 error:
 ==
 2009-01-13 13:48:34,602 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
  - Success!
 java.io.IOException: Unable to open iterator for alias: 
 COUNT_EMPTYFILTERED_LOGS [Unable to get results for 
 /tmp/temp-1964806069/tmp256878619:org.apache.pig.builtin.BinStorage]
 at 
 org.apache.pig.backend.hadoop.executionengine.HJob.getResults(HJob.java:74)
 at org.apache.pig.PigServer.openIterator(PigServer.java:408)
 at 
 org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:269)
 at 
 org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:178)
 at 
 org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:84)
 at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:64)
 at org.apache.pig.Main.main(Main.java:306)
 Caused by: org.apache.pig.backend.executionengine.ExecException: Unable to 
 get results for 
 /tmp/temp-1964806069/tmp256878619:org.apache.pig.builtin.BinStorage
 ... 7 more
 Caused by: java.io.IOException: /tmp/temp-1964806069/tmp256878619 does not 
 exist
 at 
 org.apache.pig.impl.io.FileLocalizer.openDFSFile(FileLocalizer.java:188)
 at org.apache.pig.impl.io.FileLocalizer.open(FileLocalizer.java:291)
 at 
 org.apache.pig.backend.hadoop.executionengine.HJob.getResults(HJob.java:69)
 ... 6 more
 ==
 {code}
 RAW_LOGS = load 'mydata.txt' as (url:chararray, numvisits:int);
 RAW_LOGS = limit RAW_LOGS 2;
 FILTERED_LOGS = filter RAW_LOGS by numvisits < 0;
 store FILTERED_LOGS into 'emptyfilteredlogs' using PigStorage();
 EMPTY_FILTERED_LOGS = load 'emptyfilteredlogs' as (url:chararray, 
 numvisits:int);
 GROUP_EMPTYFILTERED_LOGS = group EMPTY_FILTERED_LOGS by numvisits;
 COUNT_EMPTYFILTERED_LOGS = foreach GROUP_EMPTYFILTERED_LOGS generate
  group, COUNT(EMPTY_FILTERED_LOGS);
 explain COUNT_EMPTYFILTERED_LOGS;
 dump COUNT_EMPTYFILTERED_LOGS;
 {code}




[jira] Created: (PIG-789) coupling load and store in script no longer works

2009-04-28 Thread Alan Gates (JIRA)
coupling load and store in script no longer works
-

 Key: PIG-789
 URL: https://issues.apache.org/jira/browse/PIG-789
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.3.0
Reporter: Alan Gates


Many users' Pig scripts do something like this:

a = load '/user/pig/tests/data/singlefile/studenttab10k' as (name, age, gpa);
c = filter a by age < 500;
e = group c by (name, age);
f = foreach e generate group, COUNT($1);
store f into 'bla';
f1 = load 'bla';
g = order f1 by $1;
dump g;

With the inclusion of the multi-query phase2 patch this appears to no longer 
work.  You get an error:

2009-04-28 18:24:50,776 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
2100: hdfs://wilbur11.labs.corp.sp1.yahoo.com/user/gates/bla does not exist.

We shouldn't be checking for bla's existence here because it will be created 
eventually by the script.
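
One way to express the behavior asked for here is to skip the existence check for any path that an earlier store in the same script will create. The sketch below is a hypothetical model, not Pig's validation code: `missing_inputs`, the `(kind, path)` statement tuples, and the `exists` callback are all assumptions made for illustration.

```python
def missing_inputs(statements, exists):
    """Return load paths that neither exist nor are produced by the script.

    statements: list of ('load' | 'store', path) tuples in script order
    exists:     callable taking a path and returning True if it exists now
    """
    will_exist = set()  # paths an earlier store in this script will create
    missing = []
    for kind, path in statements:
        if kind == "store":
            will_exist.add(path)
        elif kind == "load" and path not in will_exist and not exists(path):
            missing.append(path)
    return missing


# Mirrors the script above: 'bla' is stored before it is loaded.
script = [("load", "studenttab10k"), ("store", "bla"), ("load", "bla")]
on_disk = {"studenttab10k"}
print(missing_inputs(script, lambda p: p in on_disk))
```

A load that appears before the matching store would still be reported, since nothing in the script has produced that path yet.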




[jira] Commented: (PIG-619) Dumping empty results produces Unable to get results for /tmp/temp-1964806069/tmp256878619 org.apache.pig.builtin.BinStorage message

2009-04-28 Thread Viraj Bhat (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12703941#action_12703941
 ] 

Viraj Bhat commented on PIG-619:


So when does the multi-store query optimization get committed/merged into the 
main branch (where this is the default way multi-store happens)? 
Viraj





[jira] Created: (PIG-790) Error message should indicate in which line number in the Pig script the error occured (debugging BinCond)

2009-04-28 Thread Viraj Bhat (JIRA)
Error message should indicate in which line number in the Pig script the error 
occured (debugging BinCond)
--

 Key: PIG-790
 URL: https://issues.apache.org/jira/browse/PIG-790
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.0.0
Reporter: Viraj Bhat
Priority: Minor


I have a simple Pig script which loads integer data and does a BinCond, where 
it compares col1 eq ''. An error message is generated in this 
case, but it does not specify the line number in the script. 
{code}
MYDATA = load '/user/viraj/myerrordata.txt' using PigStorage() as (col1:int, 
col2:int);

MYDATA_PROJECT = FOREACH MYDATA GENERATE ((col1 eq '') ? 1 : 0) as newcol1,
 ((col1 neq '') ? col1 - col2 : 16)
as time_diff;

dump MYDATA_PROJECT;
{code}

==
2009-04-29 02:33:07,182 [main] INFO  
org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to 
hadoop file system at: hdfs://localhost:9000
2009-04-29 02:33:08,584 [main] INFO  
org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to 
map-reduce job tracker at: localhost:9001
2009-04-29 02:33:08,836 [main] INFO  org.apache.pig.PigServer - Create a new 
graph.
2009-04-29 02:33:10,040 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
1039: Incompatible types in EqualTo Operator left hand side:int right hand 
side:chararray
Details at logfile: /home/viraj/pig-svn/trunk/pig_1240972386081.log
==
It would be good if the error message included the line number and a copy of the line 
in the script that is causing the problem.
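
The kind of message requested above might look like the output of the sketch below. This is a hypothetical format, not what Pig prints: `format_error` and its layout are assumptions, and the caller is assumed to already know the 1-based line number of the failing statement.

```python
def format_error(script, line_no, code, message):
    """Build an error message that names the script line and quotes it.

    script:  full script text
    line_no: 1-based line number of the failing statement (assumed known)
    """
    line = script.splitlines()[line_no - 1].strip()
    return f"ERROR {code}: {message}\n  at line {line_no}: {line}"


script = (
    "a = load 'x' as (c:int);\n"
    "b = foreach a generate ((c eq '') ? 1 : 0);\n"
    "dump b;"
)
print(format_error(script, 2, 1039, "Incompatible types in EqualTo Operator"))
```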

Attaching data, script and log file. 




[jira] Commented: (PIG-774) Pig does not handle Chinese characters (in both the parameter subsitution using -param_file or embedded in the Pig script) correctly

2009-04-28 Thread Viraj Bhat (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12703977#action_12703977
 ] 

Viraj Bhat commented on PIG-774:


I modified the file PigScriptParser.jj,  and it works.

 Pig does not handle Chinese characters (in both the parameter subsitution 
 using -param_file or embedded in the Pig script) correctly
 

 Key: PIG-774
 URL: https://issues.apache.org/jira/browse/PIG-774
 Project: Pig
  Issue Type: Bug
  Components: grunt, impl
Affects Versions: 0.0.0
Reporter: Viraj Bhat
Priority: Critical
 Fix For: 0.0.0

 Attachments: chinese.txt, chinese_data.pig, nextgen_paramfile, 
 pig_1240967860835.log, utf8_parser-1.patch, utf8_parser-2.patch


 I created a very small test case in which I did the following.
 1) Created a UTF-8 file which contained a query string in Chinese and wrote 
 it to HDFS. I used this dfs file as an input for the tests.
 2) Created a parameter file which also contained the same query string as in 
 Step 1.
 3) Created a Pig script which takes in the parametrized query string and hard 
 coded Chinese character.
 
 Pig script: chinese_data.pig
 
 {code}
 rmf chineseoutput;
 I = load '/user/viraj/chinese.txt' using PigStorage('\u0001');
 J = filter I by $0 == '$querystring';
 --J = filter I by $0 == ' 歌手香港情牽女人心演唱會';
 store J into 'chineseoutput';
 dump J;
 {code}
 =
 Parameter file: nextgen_paramfile
 =
 queryid=20090311
 querystring='   歌手香港情牽女人心演唱會'
 =
 Input file: /user/viraj/chinese.txt
 =
 shell$ hadoop fs -cat /user/viraj/chinese.txt
 歌手香港情牽女人心演唱會
 =
 I ran the above set of inputs in the following ways:
 Run 1:
 =
 {code}
 java -cp pig.jar:/home/viraj/hadoop-0.18.0-dev/conf/ -Dhod.server='' 
 org.apache.pig.Main -param_file nextgen_paramfile chinese_data.pig
 {code}
 =
 2009-04-22 01:31:35,703 [Thread-7] WARN  org.apache.hadoop.mapred.JobClient - 
 Use GenericOptionsParser for parsing the
 arguments. Applications should implement Tool for the same.
 2009-04-22 01:31:40,700 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
  -
 0% complete
 2009-04-22 01:31:50,720 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
  -
 100% complete
 2009-04-22 01:31:50,720 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
  -
 Success!
 =
 Run 2: removed the parameter substitution in the Pig script instead used the 
 following statement.
 =
 {code}
 J = filter I by $0 == ' 歌手香港情牽女人心演唱會';
 {code}
 =
 java -cp pig.jar:/home/viraj/hadoop-0.18.0-dev/conf/ -Dhod.server='' 
 org.apache.pig.Main chinese_data_withoutparam.pig
 =
 2009-04-22 01:35:22,402 [Thread-7] WARN  org.apache.hadoop.mapred.JobClient - 
 Use GenericOptionsParser for parsing the
 arguments. Applications should implement Tool for the same.
 2009-04-22 01:35:27,399 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
  -
 0% complete
 2009-04-22 01:35:32,415 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
  -
 100% complete
 2009-04-22 01:35:32,415 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
  -
 Success!
 =
 In both cases:
 =
 {code}
 shell $ hadoop fs -ls /user/viraj/chineseoutput
 Found 2 items
 drwxr-xr-x   - viraj supergroup  0 2009-04-22 01:37 
 /user/viraj/chineseoutput/_logs
 -rw-r--r--   3 viraj supergroup  0 2009-04-22 01:37 
 /user/viraj/chineseoutput/part-0
 {code}
 =
 Additionally tried the dry-run option to figure out if the parameter 
 substitution was occurring