Hi Daniel,

PIG-4047 has been created to track this discussion, and a patch is available for review. Could you help review it? The patch file is based on the trunk branch; I assume "site" refers to the trunk branch. Please correct me if I am wrong.
Thanks

At 2014-07-02 08:33:21, "Daniel Dai" <da...@hortonworks.com> wrote:
>I see what's happening. JarManager finds the enclosing jar from the
>classpath and wraps those into job.jar. Originally I wanted to ship
>dependent jars separately through the distributed cache, so we don't
>have to create job.jar every time; those jars would get reused thanks
>to PIG-2672, and for Pig on Tez there is a better chance of
>session/container reuse since we wouldn't rely on a job.jar that
>differs job by job. That would solve the pig-withouthadoop.jar problem
>eventually. But now I realize the pig-withouthadoop.jar issue can be
>solved independently by just putting the dependent jars in lib and
>using pig-core.jar instead of pig-withouthadoop.jar. That's doable,
>and even if I go for the distributed cache later, I would still need
>pig-core.jar in place of pig-withouthadoop.jar and the dependent jars
>in lib, so there is no conflict. So go ahead and do that.
>
>Thanks,
>Daniel
>
>On Mon, Jun 30, 2014 at 11:34 PM, lulynn_2008 <lulynn_2...@163.com> wrote:
>> Hi Daniel,
>> With the newly structured Pig package, the scripts ran successfully with pig-0.12.0.
>> I did the following:
>> 1. divided the withouthadoop jar into pig core and pig core dependencies;
>> 2. saved the jars from step 1 in the lib directory;
>> 3. in the pig script, always add all the jars in the lib directory
>>    plus the pig core jar to the classpath.
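The three steps above amount to a classpath loop in the launcher script. A minimal sketch of step 3, assuming the restructured layout described above (the demo directory, `pig-core.jar` name, and dependency jar names are illustrative, not the actual bin/pig contents):

```shell
# Sketch only, not the real bin/pig: put pig-core.jar plus every jar
# under lib/ on the classpath. All paths and jar names are made up.
PIG_HOME=${PIG_HOME:-/tmp/pig-layout-demo}
mkdir -p "$PIG_HOME/lib"
touch "$PIG_HOME/pig-core.jar" \
      "$PIG_HOME/lib/guava-11.0.jar" \
      "$PIG_HOME/lib/joda-time-2.1.jar"

CLASSPATH="$PIG_HOME/pig-core.jar"
for jar in "$PIG_HOME"/lib/*.jar; do
    CLASSPATH="$CLASSPATH:$jar"   # append each dependency jar one by one
done
echo "$CLASSPATH"
```

The same loop works regardless of which jars land in lib/, which is the maintainability point made later in the thread: swapping a dependency version only means replacing a file in lib/.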
>>
>> Here is the test case and result:
>> [pig@hostname bin]$ ./pig
>> grunt> a = load 'studenttab10k' as (name:chararray, age:int, gpa:double);
>> grunt> b = filter a by name matches '.*or.*';
>> grunt> dump b;
>> INFO [JobControl] org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
>> INFO [main] org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
>> (.hor.h,11,8.0)
>> grunt> dump a;
>> INFO [JobControl] org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
>> INFO [main] org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
>> (lynn,18,4.5)
>> (lulu,12,4.4)
>> (fat,12,5.0)
>> (.hor.h,11,8.0)
>>
>> Please correct me if I am wrong. I think it should look the same to the
>> backend whether we add the withouthadoop jar to the classpath or add
>> pig core and its dependencies one by one. Pig should be fine once all
>> the dependencies are on the classpath, no matter which way they got there.
>>
>> Do you have any suggestions for other tests, or any comments?
>>
>> Thanks
>>
>> At 2014-06-25 11:51:24, "Daniel Dai" <da...@hortonworks.com> wrote:
>>>Pig will not fail every time, but you will hit it depending on your
>>>script. Try the following script:
>>>
>>>a = load 'studenttab10k' as (name:chararray, age:int, gpa:double);
>>>b = filter a by name matches '.*or.*';
>>>dump b;
>>>
>>>Thanks,
>>>Daniel
>>>
>>>On Tue, Jun 24, 2014 at 7:23 PM, lulynn_2008 <lulynn_2...@163.com> wrote:
>>>> Hi Daniel,
>>>> Thanks for the details.
>>>>
>>>> Regarding your reply:
>>>> "Yes, that's a better solution. However, those jars are also needed in the
>>>> backend and we need to ship those jars into distributed cache. There will
>>>> be some code change to make it happen. Just change pig script is not
>>>> sufficient."
>>>> On the distributed cache part, could you provide an example showing
>>>> how those jars are shipped?
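For context on the shipping question above: one common way Hadoop jobs get extra jars to the backend is the distributed cache, driven by a comma-separated list of jar paths (for example via the generic `-libjars` option, which populates the `tmpjars` job property); the framework copies each listed jar to the cache and adds it to the task classpath. A minimal, hedged sketch of building such a list from a lib directory, with made-up directory and jar names:

```shell
# Sketch: assemble the comma-separated jar list that Hadoop's
# distributed-cache machinery expects (e.g. for -libjars).
# Directory and jar names are illustrative only.
LIB_DIR=${LIB_DIR:-/tmp/pig-libjars-demo}
mkdir -p "$LIB_DIR"
touch "$LIB_DIR/guava-11.0.jar" "$LIB_DIR/joda-time-2.1.jar"

LIBJARS=""
for jar in "$LIB_DIR"/*.jar; do
    LIBJARS="${LIBJARS:+$LIBJARS,}$jar"   # comma-separate, no leading comma
done
echo "$LIBJARS"
# A job launcher could then pass something like:
#   hadoop jar app.jar MainClass -libjars "$LIBJARS" ...
```

As Daniel notes, doing this from inside Pig would require code changes (the client must register the jars with the job), not just a launcher-script change.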
>>>> Thanks
>>>>
>>>> Since release 0.10.0, I have been using Pig with the structure I
>>>> mentioned in 2#. That is:
>>>> - dividing the withouthadoop jar into pig core and its dependencies;
>>>> - placing all the dependencies in the lib directory;
>>>> - adding pig core and these dependencies to the classpath in the pig
>>>>   script, instead of the withouthadoop jar.
>>>> I have also used pig-0.11.1 and 0.12.0, always running Pig through the
>>>> grunt shell, and have not encountered issues/errors so far. I run
>>>> avro/hbase/hive/hadoop related things. Did I miss anything in testing
>>>> this part? Please give your suggestions.
>>>> Thanks
>>>>
>>>> At 2014-06-25 02:34:03, "Daniel Dai" <da...@hortonworks.com> wrote:
>>>>>On Mon, Jun 23, 2014 at 8:13 PM, lulynn_2008 <lulynn_2...@163.com> wrote:
>>>>>> Hi All,
>>>>>> In build.xml in branch-0.13:
>>>>>> 1. the following jars are copied into the lib directory during packaging:
>>>>>> <copy todir="${tar.dist.dir}/lib">
>>>>>>     <fileset dir="${ivy.lib.dir}" includes="jython-*.jar"/>
>>>>>>     <fileset dir="${ivy.lib.dir}" includes="jruby-*.jar"/>
>>>>>>     <fileset dir="${ivy.lib.dir}" includes="groovy-*.jar"/>
>>>>>>     <fileset dir="${ivy.lib.dir}" includes="js-*.jar"/>
>>>>>>     <fileset dir="${ivy.lib.dir}" includes="hbase-*.jar" excludes="hbase-*tests.jar"/>
>>>>>>     <fileset dir="${ivy.lib.dir}" includes="protobuf-java-*.jar"/>
>>>>>>     <fileset dir="${ivy.lib.dir}" includes="zookeeper-*.jar"/>
>>>>>>     <fileset dir="${ivy.lib.dir}" includes="accumulo-*.jar" excludes="accumulo-minicluster*.jar"/>
>>>>>>     <fileset dir="${ivy.lib.dir}" includes="avro-*.jar" excludes="avro-*tests.jar"/>
>>>>>>     <fileset dir="${ivy.lib.dir}" includes="json-simple-*.jar"/>
>>>>>> </copy>
>>>>>> 2.
>>>>>> The following jars are included in the withouthadoop jar file:
>>>>>> <fileset dir="${ivy.lib.dir}" id="runtime.dependencies-withouthadoop.jar">
>>>>>>     <patternset id="pattern.runtime.dependencies-withouthadoop.jar">
>>>>>>         <include name="antlr-runtime-${antlr.version}.jar"/>
>>>>>>         <include name="ST4-${stringtemplate.version}.jar"/>
>>>>>>         <include name="jline-${jline.version}.jar"/>
>>>>>>         <include name="jackson-mapper-asl-${jackson.version}.jar"/>
>>>>>>         <include name="jackson-core-asl-${jackson.version}.jar"/>
>>>>>>         <include name="joda-time-${joda-time.version}.jar"/>
>>>>>>         <include name="guava-${guava.version}.jar"/>
>>>>>>         <include name="automaton-${automaton.version}.jar"/>
>>>>>>         <include name="jansi-${jansi.version}.jar"/>
>>>>>>         <include name="avro-${avro.version}.jar"/>
>>>>>>         <include name="avro-mapred-${avro.version}.jar"/>
>>>>>>         <include name="trevni-core-${avro.version}.jar"/>
>>>>>>         <include name="trevni-avro-${avro.version}.jar"/>
>>>>>>         <include name="snappy-java-${snappy.version}.jar"/>
>>>>>>         <include name="asm*.jar"/>
>>>>>>     </patternset>
>>>>>> </fileset>
>>>>>> Questions:
>>>>>> 1. Could you tell what the jars in 1# and 2# are used for? What are
>>>>>> the differences between them?
>>>>>> 2. It seems all the jars in 1# and 2# are necessary when Pig runs on
>>>>>> a hadoop cluster. Correct?
>>>>>
>>>>>#2 are the core dependencies Pig depends on; those are wrapped into
>>>>>pig-withouthadoop.jar. #1 are convenience jars used only when the user
>>>>>uses a specific loader, such as HBaseStorage/AvroStorage; they are not
>>>>>needed if you don't use such a loader, and are not wrapped into
>>>>>pig-withouthadoop.jar.
>>>>>
>>>>>> 3. The withouthadoop jar file is invoked while Pig runs on a hadoop
>>>>>> cluster, and it includes pig core plus some dependencies. Could we
>>>>>> just generate a pig core jar and move all the dependencies into the
>>>>>> lib directory?
>>>>>> In the pig script, we could then always add pig core to the classpath,
>>>>>> and add the dependencies to the classpath in the
>>>>>> "if [ -n "$HADOOP_BIN" ]; then" branch. With this:
>>>>>> - it would be clear for users to check dependency versions;
>>>>>> - it would be easier to maintain dependencies, e.g. version updates.
>>>>>> Sometimes I would like to check whether Pig works with different
>>>>>> versions of its dependencies.
>>>>>
>>>>>Yes, that's a better solution. However, those jars are also needed on
>>>>>the backend, and we need to ship those jars via the distributed cache.
>>>>>There will be some code change to make it happen. Just changing the
>>>>>pig script is not sufficient.
>>>>>
>>>>>> Do you have any concerns?
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>
>>>>>--
>>>>>CONFIDENTIALITY NOTICE
>>>>>NOTICE: This message is intended for the use of the individual or entity to
>>>>>which it is addressed and may contain information that is confidential,
>>>>>privileged and exempt from disclosure under applicable law. If the reader
>>>>>of this message is not the intended recipient, you are hereby notified that
>>>>>any printing, copying, dissemination, distribution, disclosure or
>>>>>forwarding of this communication is strictly prohibited. If you have
>>>>>received this communication in error, please contact the sender immediately
>>>>>and delete it from your system. Thank You.