[jira] [Commented] (PIG-5157) Upgrade to Spark 2.0
[ https://issues.apache.org/jira/browse/PIG-5157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16097845#comment-16097845 ] liyunzhang_intel commented on PIG-5157:

[~szita]:
{quote}
it looks like you've missed adding an entry to CHANGES.txt upon commit. I've added it now:
{quote}
Thanks for catching that. [~szita] or [~nkollar]: please spend some time reviewing PIG-5246 if you have time, thanks!

> Upgrade to Spark 2.0
> --------------------
>
>                 Key: PIG-5157
>                 URL: https://issues.apache.org/jira/browse/PIG-5157
>             Project: Pig
>          Issue Type: Improvement
>          Components: spark
>            Reporter: Nandor Kollar
>            Assignee: Nandor Kollar
>             Fix For: 0.18.0
>
>         Attachments: PIG-5157_15.patch, PIG-5157.patch, SkewedJoinInput1.txt, SkewedJoinInput2.txt
>
> Upgrade to Spark 2.0 (or latest)

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
[ https://issues.apache.org/jira/browse/PIG-5157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16097247#comment-16097247 ] Adam Szita commented on PIG-5157:

[~nkollar] thanks for adding this feature! [~kellyzly] it looks like you've missed adding an entry to CHANGES.txt upon commit. I've added it now: https://github.com/apache/pig/commit/ce8aa41d16dbb3e7c3b97b23c18fd2473e9a5938
[ https://issues.apache.org/jira/browse/PIG-5157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16092811#comment-16092811 ] Nandor Kollar commented on PIG-5157:

Thanks [~rohini], [~kellyzly] and [~szita] for your help in resolving this feature!
[ https://issues.apache.org/jira/browse/PIG-5157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16092460#comment-16092460 ] liyunzhang_intel commented on PIG-5157:

[~nkollar]: committed PIG-5157_15.patch to trunk. Thanks for your development work, as upgrading to Spark 2 is a big feature, and thanks also to [~szita] and [~rohini] for the review.
[ https://issues.apache.org/jira/browse/PIG-5157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16089381#comment-16089381 ] liyunzhang_intel commented on PIG-5157:

[~nkollar]: left a comment on the review board, just a small fix! Meanwhile, please help review PIG-5246, thanks!
[ https://issues.apache.org/jira/browse/PIG-5157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16085443#comment-16085443 ] liyunzhang_intel commented on PIG-5157:

[~nkollar]: attached SkewedJoinInput1.txt and SkewedJoinInput2.txt, which I used in testJoin.pig. Please check whether you see a similar error in your environment.
[ https://issues.apache.org/jira/browse/PIG-5157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16079868#comment-16079868 ] liyunzhang_intel commented on PIG-5157:

[~nkollar]: sorry for the late reply. Here is the result after solving the exception I mentioned last time, on Spark 1 in yarn-client mode:
{noformat}
export SPARK_JAR=hdfs://zly1.sh.intel.com:8020/user/root/spark-assembly-1.6.1-hadoop2.6.0.jar
export SPARK_HOME=$SPARK161   # download spark 1.6.1
export HADOOP_USER_CLASSPATH_FIRST="true"
$PIG_HOME/bin/pig -x spark $PIG_HOME/bin/testJoin.pig
{noformat}
pig.properties:
{noformat}
pig.sort.readonce.loadfuncs=org.apache.pig.backend.hadoop.hbase.HBaseStorage,org.apache.pig.backend.hadoop.accumulo.AccumuloStorage
spark.master=yarn-client
{noformat}
testJoin.pig:
{code}
A = load './SkewedJoinInput1.txt' as (id,name,n);
B = load './SkewedJoinInput2.txt' as (id,name);
D = join A by (id,name), B by (id,name) parallel 10;
store D into './testJoin.out';
{code}
The script fails to generate a result, and the exception found in the log is:
{noformat}
[task-result-getter-0] 2017-07-10 12:16:45,667 WARN scheduler.TaskSetManager (Logging.scala:logWarning(70)) - Lost task 0.0 in stage 0.0 (TID 0, zly1.sh.intel.com): java.lang.IllegalStateException: unread block data
    at java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2424)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1383)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
    at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:76)
    at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:115)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:194)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
{noformat}
Can you verify whether the same problem occurs on your cluster in yarn-client mode? (In my cluster it passed in local mode but failed in yarn-client mode.) The error looks like a datanode problem, but I verified the environment with the Spark branch code and it passed, so I guess the problem is caused by the patch.
[ https://issues.apache.org/jira/browse/PIG-5157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16069791#comment-16069791 ] Adam Szita commented on PIG-5157:

[~kellyzly] can you check which Spark jars are on the classpath of the process that is throwing this exception? It looks like a mismatch between Spark classes (what Pig was built with versus what is on the cluster now used for execution).
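One quick way to do the check suggested above is to print which jar each Spark class is actually loaded from at runtime. A minimal sketch (the `WhichJar` class name is made up for illustration; on a real cluster you would pass e.g. `org.apache.spark.scheduler.SparkListenerInterface` as the argument):

```java
// Hypothetical helper (not part of Pig): prints which jar/location a class was
// loaded from, to spot Spark-version mismatches on the classpath.
public class WhichJar {
    static String locationOf(Class<?> c) {
        java.security.CodeSource src = c.getProtectionDomain().getCodeSource();
        return (src == null || src.getLocation() == null)
                ? "(bootstrap/unknown)"
                : src.getLocation().toString();
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // e.g. java WhichJar org.apache.spark.scheduler.SparkListenerInterface
        for (String name : args) {
            System.out.println(name + " -> " + locationOf(Class.forName(name)));
        }
    }
}
```

If the printed location differs from the Spark distribution the cluster runs, the NoSuchMethodError above is the expected symptom.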
[ https://issues.apache.org/jira/browse/PIG-5157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16069728#comment-16069728 ] liyunzhang_intel commented on PIG-5157:

[~nkollar]: with PIG-5157_13.patch on the review board, the test results:
- passes on Spark 1 or Spark 2 in local mode
- fails in yarn-client mode (add spark.master=yarn-client in conf/pig.properties)

Exception message found in the log:
{noformat}
[shuffle-server-0] 2017-06-30 14:24:25,501 WARN server.TransportChannelHandler (TransportChannelHandler.java:exceptionCaught(79)) - Exception in connection from /10.239.47.58:58214
java.lang.NoSuchMethodError: org.apache.spark.network.client.TransportClient.getChannel()Lio/netty/channel/Channel;
{noformat}
If you cannot reproduce this in your environment, please tell me as well; I will check whether it is caused by configuration or not.
[ https://issues.apache.org/jira/browse/PIG-5157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16062815#comment-16062815 ] Nandor Kollar commented on PIG-5157:

[~kellyzly] ok, I'm now executing the full e2e test suite on Spark 1.x, waiting for the results.
[ https://issues.apache.org/jira/browse/PIG-5157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16062805#comment-16062805 ] liyunzhang_intel commented on PIG-5157:

[~nkollar]: with PIG-5157_11.patch, the TestGrunt unit test passes, but there is a problem in the yarn-cluster environment; I will investigate more.
[ https://issues.apache.org/jira/browse/PIG-5157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16060633#comment-16060633 ] Nandor Kollar commented on PIG-5157:

I'll update on RB soon; this is a run with my latest patch:
{code}
ant -Dtest.junit.output.format=xml clean -Dtestcase=TestGrunt -Dexectype=spark -Dhadoopversion=2 test
...
[junit] Tests run: 67, Failures: 0, Errors: 0, Skipped: 4, Time elapsed: 64.505 sec
[delete] Deleting directory /var/folders/0n/97lfzsrs3dj1nlgfghj3221wgp/T/pig_junit_tmp1871324592

BUILD SUCCESSFUL
Total time: 3 minutes 8 seconds
{code}
[ https://issues.apache.org/jira/browse/PIG-5157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16060622#comment-16060622 ] Nandor Kollar commented on PIG-5157:

[~kellyzly] I'll have a look at it. There's another issue I noticed while trying to execute a simple script (not from a unit test): Pig hangs. The problem is with the synchronized methods in JobMetricsListener and SparkListener: since these are now two different objects, synchronized will lock on different objects, and wait/notify is also called on different instances. I'll update the patch so that every method in JobMetricsListener synchronizes on its sparkListener private field, and calls wait() on that instance.
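The locking fix described above can be illustrated with a small, self-contained sketch (class and method names are illustrative, not Pig's actual code): the listener callback thread and the waiting driver thread must synchronize and wait/notify on the same monitor object, otherwise the waiter never wakes up.

```java
// Simplified sketch: both the listener callback and the waiting thread use the
// SAME monitor, so notifyAll() reliably wakes the wait() in the driver thread.
class JobDoneNotifier {
    private final Object lock = new Object();   // single shared monitor
    private boolean jobDone = false;

    // invoked from the Spark listener thread when the job finishes
    void onJobEnd() {
        synchronized (lock) {
            jobDone = true;
            lock.notifyAll();
        }
    }

    // invoked from the driver thread; blocks until onJobEnd() fires or timeout
    boolean waitForJobToEnd(long timeoutMs) {
        long deadline = System.currentTimeMillis() + timeoutMs;
        synchronized (lock) {
            while (!jobDone) {                  // guard against spurious wakeups
                long remaining = deadline - System.currentTimeMillis();
                if (remaining <= 0) {
                    return false;
                }
                try {
                    lock.wait(remaining);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
            return true;
        }
    }
}
```

With two separate objects each using `synchronized` methods, the two threads would lock (and wait/notify on) different monitors, which matches the hang described above.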
[ https://issues.apache.org/jira/browse/PIG-5157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16060574#comment-16060574 ] liyunzhang_intel commented on PIG-5157:

[~nkollar]: looks good, but I met some problems when testing in local and yarn-client mode; give me more time to verify whether the problem is caused by configuration or something else, thanks!

After applying this patch, running the unit test
{code}
ant -Dtest.junit.output.format=xml clean -Dtestcase=TestGrunt -Dexectype=spark -Dhadoopversion=2 test
{code}
gives:
{noformat}
Tests run: 67, Failures: 1, Errors: 5, Skipped: 4, Time elapsed: 138.459 sec
{noformat}
[ https://issues.apache.org/jira/browse/PIG-5157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16059727#comment-16059727 ] Rohini Palaniswamy commented on PIG-5157:

+1 for https://reviews.apache.org/r/59530/diff/7/ . [~nkollar], can you upload the final patch here? [~kellyzly], can you retry with the new patch and verify that it works for you? Please go ahead and commit if you are +1 on the patch.
[ https://issues.apache.org/jira/browse/PIG-5157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16057075#comment-16057075 ] liyunzhang_intel commented on PIG-5157:

[~nkollar]: after applying the patch, I tested a simple query in a yarn-client environment.

Build the jar:
{noformat}
ant clean -v -Dhadoopversion=2 jar-spark12
{noformat}
testJoin.pig:
{code}
A = load './SkewedJoinInput1.txt' as (id,name,n);
B = load './SkewedJoinInput2.txt' as (id,name);
D = join A by (id,name), B by (id,name) parallel 10;
store D into './testJoin.out';
{code}
Spark 1:
{noformat}
export SPARK_HOME=
export SPARK_JAR=hdfs://:8020/user/root/spark-assembly-1.6.1-hadoop2.6.0.jar
$PIG_HOME/bin/pig -x spark -logfile $PIG_HOME/logs/pig.log testJoin.pig
{noformat}
Error in logs/pig:
{noformat}
java.lang.NoClassDefFoundError: org/apache/spark/scheduler/SparkListenerInterface
    at org.apache.pig.backend.hadoop.executionengine.spark.SparkExecutionEngine.<init>(SparkExecutionEngine.java:35)
    at org.apache.pig.backend.hadoop.executionengine.spark.SparkExecType.getExecutionEngine(SparkExecType.java:42)
    at org.apache.pig.impl.PigContext.<init>(PigContext.java:269)
    at org.apache.pig.impl.PigContext.<init>(PigContext.java:256)
    at org.apache.pig.Main.run(Main.java:389)
    at org.apache.pig.Main.main(Main.java:175)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:483)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.scheduler.SparkListenerInterface
    at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 12 more
{noformat}
Spark 2 (with PIG-5246_2.patch applied):
{noformat}
export SPARK_HOME=
$PIG_HOME/bin/pig -x spark -logfile $PIG_HOME/logs/pig.log testJoin.pig
{noformat}
Error in logs/pig:
{noformat}
[main] 2017-06-21 14:14:05,791 ERROR spark.JobGraphBuilder (JobGraphBuilder.java:sparkOperToRDD(187)) - throw exception in sparkOperToRDD:
org.apache.spark.SparkException: Task not serializable
    at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:298)
    at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:288)
    at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:108)
    at org.apache.spark.SparkContext.clean(SparkContext.scala:2037)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1.apply(RDD.scala:763)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1.apply(RDD.scala:762)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:358)
    at org.apache.spark.rdd.RDD.mapPartitions(RDD.scala:762)
    at org.apache.spark.api.java.JavaRDDLike$class.mapPartitions(JavaRDDLike.scala:166)
    at org.apache.spark.api.java.AbstractJavaRDDLike.mapPartitions(JavaRDDLike.scala:45)
    at org.apache.pig.backend.hadoop.executionengine.spark.converter.ForEachConverter.convert(ForEachConverter.java:64)
    at org.apache.pig.backend.hadoop.executionengine.spark.converter.ForEachConverter.convert(ForEachConverter.java:45)
    at org.apache.pig.backend.hadoop.executionengine.spark.JobGraphBuilder.physicalToRDD(JobGraphBuilder.java:292)
    at org.apache.pig.backend.hadoop.executionengine.spark.JobGraphBuilder.physicalToRDD(JobGraphBuilder.java:248)
    at org.apache.pig.backend.hadoop.executionengine.spark.JobGraphBuilder.physicalToRDD(JobGraphBuilder.java:248)
    at org.apache.pig.backend.hadoop.executionengine.spark.JobGraphBuilder.physicalToRDD(JobGraphBuilder.java:248)
    at org.apache.pig.backend.hadoop.executionengine.spark.JobGraphBuilder.sparkOperToRDD(JobGraphBuilder.java:182)
    at org.apache.pig.backend.hadoop.executionengine.spark.JobGraphBuilder.visitSparkOp(JobGraphBuilder.java:112)
    at org.apache.pig.backend.hadoop.executionengine.spark.plan.SparkOperator.visit(SparkOperator.java:140)
    at org.apache.pig.backend.hadoop.executionengine.spark.plan.SparkOperator.visit(SparkOperator.java:37)
{noformat}
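As background on the second trace: Spark's ClosureCleaner throws "Task not serializable" when Java serialization of the function object's captured state fails. A self-contained sketch of the typical cause (names here are illustrative, not Pig's actual ForEachConverter code):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Illustrative sketch: the function shipped to executors captures a reference
// to a non-serializable driver-side object, so Java serialization fails.
public class CaptureDemo {
    // stands in for some driver-side, non-serializable state (e.g. a conf object)
    static class NonSerializableConfig { }

    // the serializable lambda captures 'conf', dragging it into the closure
    static Serializable makeTask(NonSerializableConfig conf) {
        return (Serializable & Runnable) () -> System.out.println(conf);
    }

    // mimics the check ClosureCleaner effectively performs: try to serialize it
    static boolean serializes(Serializable s) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(s);
            return true;
        } catch (IOException e) {   // NotSerializableException is an IOException
            return false;
        }
    }
}
```

Whether this is the exact mechanism in the converter above would need to be confirmed against the patch, but it is the usual reason this exception appears after an API upgrade.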
[ https://issues.apache.org/jira/browse/PIG-5157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16053644#comment-16053644 ] liyunzhang_intel commented on PIG-5157:

[~nkollar]: it builds successfully in my environment when using {{ant clean jar-spark12}}, but give me more time to test it on Spark 1 and Spark 2.
[ https://issues.apache.org/jira/browse/PIG-5157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16050198#comment-16050198 ] Adam Szita commented on PIG-5157:

[~nkollar], [~kellyzly]: I checked the latest patch on RB and built the jar successfully by:
{code}
ant clean jar-spark12
{code}
[ https://issues.apache.org/jira/browse/PIG-5157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16050123#comment-16050123 ] liyunzhang_intel commented on PIG-5157:

[~nkollar]: after downloading the latest patch from RB, how should I compile now? When I use the following command
{code}
ant clean -v -Dhadoopversion=2 jar-spark12
{code}
I get the following error:
{noformat}
[javac] warning: [options] bootstrap class path not set in conjunction with -source 1.7
[javac] /home/zly/prj/oss/pig/src/org/apache/pig/backend/hadoop/executionengine/spark/SparkShim2.java:29: error: cannot find symbol
[javac] import org.apache.spark.api.java.Optional;
[javac]                                 ^
[javac]   symbol:   class Optional
[javac]   location: package org.apache.spark.api.java
[javac] /home/zly/prj/oss/pig/src/org/apache/pig/backend/hadoop/executionengine/spark/SparkShim2.java:99: error: no interface expected here
[javac] private static class JobMetricsListener extends SparkListener {
[javac]                                                 ^
[javac] /home/zly/prj/oss/pig/build/ivy/lib/Pig/hbase-client-1.2.4.jar(org/apache/hadoop/hbase/filter/FilterList.class): warning: Cannot find annotation method 'value()' in type 'SuppressWarnings': class file for edu.umd.cs.findbugs.annotations.SuppressWarnings not found
[javac] /home/zly/prj/oss/pig/build/ivy/lib/Pig/hbase-client-1.2.4.jar(org/apache/hadoop/hbase/filter/FilterList.class): warning: Cannot find annotation method 'justification()' in type 'SuppressWarnings'
[javac] /home/zly/prj/oss/pig/build/ivy/lib/Pig/hbase-common-1.2.4.jar(org/apache/hadoop/hbase/io/ImmutableBytesWritable.class): warning: Cannot find annotation method 'value()' in type 'SuppressWarnings'
[javac] /home/zly/prj/oss/pig/build/ivy/lib/Pig/hbase-common-1.2.4.jar(org/apache/hadoop/hbase/io/ImmutableBytesWritable.class): warning: Cannot find annotation method 'justification()' in type 'SuppressWarnings'
[javac] /home/zly/prj/oss/pig/build/ivy/lib/Pig/hbase-server-1.2.4.jar(org/apache/hadoop/hbase/mapreduce/TableInputFormat.class): warning: Cannot find annotation method 'value()' in type 'SuppressWarnings'
[javac] /home/zly/prj/oss/pig/build/ivy/lib/Pig/hbase-server-1.2.4.jar(org/apache/hadoop/hbase/mapreduce/TableInputFormat.class): warning: Cannot find annotation method 'justification()' in type 'SuppressWarnings'
[javac] /home/zly/prj/oss/pig/src/org/apache/pig/backend/hadoop/executionengine/spark/SparkShim2.java:49: error: is not abstract and does not override abstract method call(T) in FlatMapFunction
[javac] return new FlatMapFunction() {
[javac]        ^
{noformat}
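For context on the last javac error: Spark 1.x's {{FlatMapFunction.call()}} returns an Iterable, while Spark 2.x's returns an Iterator, so an anonymous class written against one API shape fails to compile against the other. A sketch with simplified stand-in interfaces (these are not the real Spark types; the real ones live in org.apache.spark.api.java.function and also declare throws Exception):

```java
import java.util.Arrays;
import java.util.Iterator;

// Simplified stand-ins for the two FlatMapFunction shapes; only the call()
// return type matters for the compile error quoted above.
interface FlatMapFunction1x<T, R> { Iterable<R> call(T t); }   // Spark 1.x shape
interface FlatMapFunction2x<T, R> { Iterator<R> call(T t); }   // Spark 2.x shape

class ShimDemo {
    // a Spark 1.x-style implementation returns an Iterable...
    static FlatMapFunction1x<String, String> splitWords1x() {
        return line -> Arrays.asList(line.split(" "));
    }

    // ...while the Spark 2.x equivalent of the same logic returns an Iterator
    static FlatMapFunction2x<String, String> splitWords2x() {
        return line -> Arrays.asList(line.split(" ")).iterator();
    }
}
```

Compiling a 1.x-style body against the 2.x interface yields exactly a "does not override abstract method call(T)" error, which suggests the shim was built against the wrong Spark version's jars.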
[ https://issues.apache.org/jira/browse/PIG-5157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16048906#comment-16048906 ] Nandor Kollar commented on PIG-5157:

[~kellyzly] updated the patch on RB, can you apply it now?
[ https://issues.apache.org/jira/browse/PIG-5157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16047531#comment-16047531 ] liyunzhang_intel commented on PIG-5157:

[~nkollar]: left some comments on the review board. Can you update the patch with the latest code? Latest code:
{noformat}
* 5c55102 - (origin/trunk, origin/HEAD) PIG-4700: Enable progress reporting for Tasks in Tez (satishsaley via rohini) (7 days ago)
{noformat}
When I download the patch from the review board and apply it like the following:
{code}
patch -p0
{code}
[ https://issues.apache.org/jira/browse/PIG-5157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16047080#comment-16047080 ] Nandor Kollar commented on PIG-5157:

Updated RB with Rohini's proposal (see the discussion on PIG-5246 for details). [~szita], [~rohini], [~kellyzly] could you please review? Let me know your ideas about it.
[ https://issues.apache.org/jira/browse/PIG-5157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16032514#comment-16032514 ] liyunzhang_intel commented on PIG-5157:

[~nkollar]: I have tested that we can remove JobLogger in Spark 1.6.
[ https://issues.apache.org/jira/browse/PIG-5157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16030991#comment-16030991 ] Nandor Kollar commented on PIG-5157:

Ok, thanks, then I'll uncomment that and test it. As for {{spark.eventLog.enabled}}, it requires {{spark.eventLog.dir}} to be defined too. I think for Spark 2.x we don't have to care about it, since the user can set these if required. However, my change removed the logger completely, and it seems these properties are not available for Spark 1.x. My question is: do we need this for Spark 1.x? If so, I'm afraid this should be included in the shims too.
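The two properties discussed above have to be set together when event logging is wanted; a minimal illustration (the directory value is made up, and the directory must exist before the job starts):

```properties
# illustrative values only -- spark.eventLog.dir must point at an existing directory
spark.eventLog.enabled=true
spark.eventLog.dir=hdfs:///spark-history
```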
[ https://issues.apache.org/jira/browse/PIG-5157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16030799#comment-16030799 ] liyunzhang_intel commented on PIG-5157:

[~nkollar]:
bq. in JobMetricsListener.java there's a huge code section commented out (uncomment and remove the code onTaskEnd until we fix PIG-5157). Should we enable that?
The reason to modify it is that [~rohini] suggested a lot of memory is used if we update metric info in onTaskEnd() (suppose there are thousands of tasks). In org.apache.pig.backend.hadoop.executionengine.spark.JobMetricsListener for Spark 2.1, we should use code like the following (notice: not fully tested, I cannot guarantee it is right):
{code}
public void onStageCompleted(SparkListenerStageCompleted stageCompleted) {
    // if we update taskMetrics in onTaskEnd(), it consumes a lot of memory
    int stageId = stageCompleted.stageInfo().stageId();
    int stageAttemptId = stageCompleted.stageInfo().attemptId();
    String stageIdentifier = stageId + "_" + stageAttemptId;
    Integer jobId = stageIdToJobId.get(stageId);
    if (jobId == null) {
        LOG.warn("Cannot find job id for stage[" + stageId + "].");
    } else {
        Map<String, List<TaskMetrics>> jobMetrics = allJobMetrics.get(jobId);
        if (jobMetrics == null) {
            jobMetrics = Maps.newHashMap();
            allJobMetrics.put(jobId, jobMetrics);
        }
        List<TaskMetrics> stageMetrics = jobMetrics.get(stageIdentifier);
        if (stageMetrics == null) {
            stageMetrics = Lists.newLinkedList();
            jobMetrics.put(stageIdentifier, stageMetrics);
        }
        stageMetrics.add(stageCompleted.stageInfo().taskMetrics());
    }
}

public synchronized void onTaskEnd(SparkListenerTaskEnd taskEnd) {
    // intentionally left empty
}
{code}
bq. I removed JobLogger, do we need it? It seems that a property called 'spark.eventLog.enabled' is the proper replacement for this class, should we use it instead? It looks like JobLogger became deprecated and was removed from Spark 2.
It seems we can remove JobLogger and enable {{spark.eventLog.enabled}} in Spark 2.
[jira] [Commented] (PIG-5157) Upgrade to Spark 2.0
[ https://issues.apache.org/jira/browse/PIG-5157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16030795#comment-16030795 ] hsj commented on PIG-5157: -- [~nkollar]: bq. in JobMetricsListener.java there's a huge code section commented out (uncomment and remove the code onTaskEnd until we fix PIG-5157). Should we enable that? the reason to modify it is because [~rohini] suggested that [memory| is used a lot if we update metric info in onTaskEnd()(suppose there are thousand tasks) in org.apache.pig.backend.hadoop.executionengine.spark.JobMetricsListener of spark21, we should use code like following notice: not fully test, can not guarantee it is right. {code} public void onStageCompleted(SparkListenerStageCompleted stageCompleted) { if we update taskMetrics in onTaskEnd(), it consumes lot of memory. int stageId = stageCompleted.stageInfo().stageId(); int stageAttemptId = stageCompleted.stageInfo().attemptId(); String stageIdentifier = stageId + "_" + stageAttemptId; Integer jobId = stageIdToJobId.get(stageId); if (jobId == null) { LOG.warn("Cannot find job id for stage[" + stageId + "]."); } else { MapjobMetrics = allJobMetrics.get(jobId); if (jobMetrics == null) { jobMetrics = Maps.newHashMap(); allJobMetrics.put(jobId, jobMetrics); } List stageMetrics = jobMetrics.get(stageIdentifier); if (stageMetrics == null) { stageMetrics = Lists.newLinkedList(); jobMetrics.put(stageIdentifier, stageMetrics); } stageMetrics.add(stageCompleted.stageInfo().taskMetrics()); } } public synchronized void onTaskEnd(SparkListenerTaskEnd taskEnd) { } {code} bq. I removed JobLogger, do we need it? It seems that a property called 'spark.eventLog.enabled' is the proper replacement for this class, should we use it instead? It looks like JobLogger became deprecated and was removed from Spark 2. 
It seems we can remove JobLogger and enable {{spark.eventLog.enabled}} in spark2 > Upgrade to Spark 2.0 > > > Key: PIG-5157 > URL: https://issues.apache.org/jira/browse/PIG-5157 > Project: Pig > Issue Type: Improvement > Components: spark >Reporter: Nandor Kollar >Assignee: Nandor Kollar > Fix For: 0.18.0 > > Attachments: PIG-5157.patch > > > Upgrade to Spark 2.0 (or latest) -- This message was sent by Atlassian JIRA (v6.3.15#6346)
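For reference, the JobLogger replacement is configuration rather than code. A sketch of the relevant spark-defaults.conf entries (the property names are standard Spark configuration; the directory value is only an example, not something from this thread):

```
spark.eventLog.enabled  true
spark.eventLog.dir      hdfs:///var/log/spark-events
```

With these set, the driver writes event logs that the Spark history server can replay, which covers what JobLogger used to provide.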
[ https://issues.apache.org/jira/browse/PIG-5157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16028651#comment-16028651 ] liyunzhang_intel commented on PIG-5157: --- [~nkollar]: I will review tomorrow, as I am out of office on Monday and Tuesday.
[ https://issues.apache.org/jira/browse/PIG-5157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16028477#comment-16028477 ] Nandor Kollar commented on PIG-5157: [~kellyzly] could you please have a look at my patch? There are two questionable changes:
- in JobMetricsListener.java there's a huge code section commented out (uncomment and remove the code onTaskEnd until we fix PIG-5157). Should we enable that?
- I didn't find a proper replacement for JobLogger, hence it is removed. What was it used for? It looks like it became deprecated and was removed from Spark.
[ https://issues.apache.org/jira/browse/PIG-5157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16026932#comment-16026932 ] Rohini Palaniswamy commented on PIG-5157: - +1. [~szita], could you take care of committing this as well after the spark branch merge along with PIG-5207 and PIG-5194?
[ https://issues.apache.org/jira/browse/PIG-5157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16022694#comment-16022694 ] Jeff Zhang commented on PIG-5157: - I think we can move to DataFrame for both Spark 1.6 and 2.x.
[ https://issues.apache.org/jira/browse/PIG-5157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16022529#comment-16022529 ] Nandor Kollar commented on PIG-5157: Thanks Jeff! Looks like you're more familiar with Spark than me. :) What do you think: should we keep RDDs for Spark 1.6, or should we move it to DataFrames too?
[ https://issues.apache.org/jira/browse/PIG-5157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16022499#comment-16022499 ] Jeff Zhang commented on PIG-5157: -
bq. I think (and correct me if I'm wrong) we don't have to change physical and logical plan, but we've to modify how the plan is mapped to Spark: modify the converters from RDD converter to DataSet converter.
That's correct.
bq. we should try to migrate to DataSet API only for spark 2.1. As far as I know Spark 1.6 has DataFrames API, but since it was experimental that time, I think we shouldn't change that, RDDs are fine for Spark 1.6
The DataFrame API is not experimental in Spark 1.6; it is pretty stable there. I guess you mean the DataSet API instead of the DataFrame API. In Spark 2.x, DataFrame is just an alias of DataSet[Row]. I think Pig doesn't need DataSet, only DataFrame: DataSet is for strong typing such as Java beans, but Pig only uses Tuple, so it doesn't need that feature of DataSet. DataFrame is sufficient for Pig.
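Jeff's point about typing can be illustrated without Spark at all. The sketch below is a toy, Spark-free analogy (the Row class here is a hand-rolled stand-in, not Spark's actual API): a DataFrame-style row is positional and untyped, which is exactly the shape of Pig's Tuple, while a typed DataSet requires a dedicated bean class per schema.

```java
import java.util.Arrays;
import java.util.List;

public class RowVsBean {
    // DataFrame/Tuple style: fields are accessed by position and typed as Object.
    public static final class Row {
        private final Object[] fields;
        public Row(Object... fields) { this.fields = fields; }
        public Object get(int i) { return fields[i]; }
    }

    // DataSet style: a dedicated bean per schema gives compile-time field types,
    // a feature Pig's generic Tuple model would not take advantage of.
    public static final class User {
        public final String name;
        public final int age;
        public User(String name, int age) { this.name = name; this.age = age; }
    }

    public static void main(String[] args) {
        List<Row> frame = Arrays.asList(new Row("alice", 30), new Row("bob", 25));
        System.out.println(frame.get(0).get(0));       // prints alice
        System.out.println(new User("alice", 30).age); // prints 30
    }
}
```

Since Pig already moves untyped positional Tuples through its converters, the Row-shaped API is the natural target and the bean machinery buys nothing.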
[ https://issues.apache.org/jira/browse/PIG-5157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16022491#comment-16022491 ] Nandor Kollar commented on PIG-5157: [~kellyzly], [~jeffzhang] I think (and correct me if I'm wrong) we don't have to change the physical and logical plans, but we have to modify how the plan is mapped to Spark: change the converters from RDD converters to DataSet converters. I'd recommend splitting this into two tasks. The first is upgrading to Spark 2.1 while still being able to compile with Spark 1.6. I'm close to finishing this; there were a few API changes, and I'll attach the patch soon for comments. Once this is done, we should try to migrate to the DataSet API for Spark 2.1 only. As far as I know Spark 1.6 has the DataFrames API, but since it was experimental at that time, I think we shouldn't change that; RDDs are fine for Spark 1.6. Any thoughts? [~pallavi.rao] I saw you investigated the DataFrames API for PoS before, but didn't find it suitable. What was the issue with it?
[ https://issues.apache.org/jira/browse/PIG-5157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16022455#comment-16022455 ] Jeff Zhang commented on PIG-5157: - I think Pig would still have its LogicalPlan & PhysicalPlan for the Spark engine. But there would be no difference between them on the Spark side, because that is delegated to Spark's DataFrame.
[ https://issues.apache.org/jira/browse/PIG-5157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16022446#comment-16022446 ] liyunzhang_intel commented on PIG-5157: --- [~nkollar]:
bq. the optimizations offered (project Tungsten and Catalyst optimizer) looks promising
If we use the Catalyst optimizer, do we still need {{org.apache.pig.newplan.logical.relational.LogicalPlan}} and {{org.apache.pig.backend.hadoop.executionengine.physicalLayer.plans.PhysicalPlan}}? The {{Catalyst optimizer}} optimizes the Spark plan generated by Spark SQL.
bq. however it seems that it is build around Java beans
I guess the DataSet/DataFrame API provides row-based operations. See the [patch|https://issues.apache.org/jira/secure/attachment/12847623/PIG-5080-1.patch] of PIG-5080:
{code}
SparkContext context = SparkContext.getOrCreate();
SQLContext sqlContext = SQLContext.getOrCreate(context);
DataFrame df = sqlContext.table("complex_data");
Row[] rows = df.collect();
assertEquals(10, rows.length);
for (int i = 0; i < rows.length; i++) {
    assertEquals(i, rows[i].getJavaMap(0).get("key_" + i));
}
{code}
[~zjffdu]: we would appreciate it if you could give us your suggestion, as you are more familiar with Spark.
[ https://issues.apache.org/jira/browse/PIG-5157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16022041#comment-16022041 ] Jeff Zhang commented on PIG-5157: - [~nkollar] [~kellyzly] IMO we should use DataFrame (aka DataSet[Row]), which matches Pig's Tuple perfectly.
[ https://issues.apache.org/jira/browse/PIG-5157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16021051#comment-16021051 ] Nandor Kollar commented on PIG-5157: [~kellyzly] what do you think, should we try Spark's DataFrames or DataSets API? I read a couple of blog posts, and the optimizations offered (project Tungsten and the Catalyst optimizer) look promising; however, it seems to be built around Java beans, and I'm not sure this fits well into our generic Tuple-based data model.
[ https://issues.apache.org/jira/browse/PIG-5157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16019070#comment-16019070 ] liyunzhang_intel commented on PIG-5157: --- [~zjffdu] and [~rohini]: thanks for your suggestions. [~zjffdu]:
bq. Supporting to spark2 could be done in the next release, maybe also changing from the rdd api to dataframe api in the next release.
Yes, we will definitely not support Spark 2 in the first release.
[ https://issues.apache.org/jira/browse/PIG-5157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16017893#comment-16017893 ] Jeff Zhang commented on PIG-5157: - A lot of users are still using Spark 1.x, as Spark 2 is incompatible with Spark 1.x, and I don't think Spark 1.x will be dropped in a short time. So I think we should still support Spark 1.x. Actually, I would suggest using Spark 1.x as the only supported version for Pig on Spark: Pig on Spark is already behind schedule, and lots of people are looking forward to it. Adding support for Spark 2 would take more time and effort, and may bring in some issues, so I would suggest supporting only Spark 1.x in the first release of Pig on Spark. For users it is transparent, and it is easy to upgrade from Spark 1 to Spark 2. Support for Spark 2 could be done in the next release, maybe also changing from the RDD API to the DataFrame API then.
[ https://issues.apache.org/jira/browse/PIG-5157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16017712#comment-16017712 ] Rohini Palaniswamy commented on PIG-5157: - I am fine with supporting Spark 2.x or supporting both versions. This depends on two things:
1) How well Spark 2 is adopted, and how many distributions or users are still on Spark 1.x.
2) When the Spark community is planning to deprecate or EOL Spark 1.x.
[~szita] and [~nkollar] might be able to make a better call based on their users. [~zjffdu], do you know when support for Spark 1.x will be dropped by the Spark community?
[ https://issues.apache.org/jira/browse/PIG-5157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16017009#comment-16017009 ] Nandor Kollar commented on PIG-5157: We can use reflection, or we can also use shims instead.
[ https://issues.apache.org/jira/browse/PIG-5157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16017007#comment-16017007 ] liyunzhang_intel commented on PIG-5157: --- [~rohini], [~xuefuz], [~zjffdu]: Should we support only Spark 2, or both Spark 1.6 and Spark 2? We may use reflection to support both versions (still under investigation). Please give us your opinion; in my view, we shouldn't support Spark 1.6 if we upgrade to Spark 2.0.
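One way reflection can cover two Spark versions is to probe for a method at runtime and fall back when it is absent. The sketch below is Spark-free and purely illustrative: the length()/size() pair stands in for a hypothetical method whose name differs between Spark 1.6 and 2.x, and is not taken from the actual Spark API.

```java
import java.lang.reflect.Method;

public class VersionShim {
    // Prefer "length()" (standing in for the newer API); if the target class
    // does not have it, fall back to "size()" (standing in for the older API).
    public static int callSizeCompat(Object target) {
        try {
            Method m;
            try {
                m = target.getClass().getMethod("length");
            } catch (NoSuchMethodException e) {
                m = target.getClass().getMethod("size");
            }
            return ((Number) m.invoke(target)).intValue();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("neither length() nor size() usable", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(callSizeCompat("spark"));                     // 5
        System.out.println(callSizeCompat(new java.util.ArrayList<>())); // 0
    }
}
```

Shims, as Nandor mentions, are the compile-time alternative: one interface with a per-version implementation selected at build or load time, avoiding reflection's runtime cost and loss of type checking.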
[ https://issues.apache.org/jira/browse/PIG-5157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15940012#comment-15940012 ] liyunzhang_intel commented on PIG-5157: --- after discussion with [~nkollar], [~szita], [~kexianda], we don't plan to support spark2.0 before the first release in April.
[ https://issues.apache.org/jira/browse/PIG-5157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15939967#comment-15939967 ] Jeff Zhang commented on PIG-5157: - Yeah, that's what I mean. So for this ticket, we need to run tests for two Spark versions.
[ https://issues.apache.org/jira/browse/PIG-5157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15939939#comment-15939939 ] liyunzhang_intel commented on PIG-5157: --- [~zjffdu]: it is better to let the user choose the Spark version when building Pig.
[ https://issues.apache.org/jira/browse/PIG-5157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15939467#comment-15939467 ] Jeff Zhang commented on PIG-5157: - BTW, Spark 2.1.1 will be released soon. Another thing I want to bring up: does this ticket mean Spark 1.6 is not supported, or can the user choose the Spark version when building Pig?
[ https://issues.apache.org/jira/browse/PIG-5157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15892269#comment-15892269 ] Nandor Kollar commented on PIG-5157: Thanks Jeff!
[ https://issues.apache.org/jira/browse/PIG-5157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15889267#comment-15889267 ] liyunzhang_intel commented on PIG-5157: --- [~zjffdu]: thanks for your suggestion.
[ https://issues.apache.org/jira/browse/PIG-5157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15888079#comment-15888079 ] Jeff Zhang commented on PIG-5157: - I would suggest using Spark 2.0.2 or Spark 2.1.0, which are much more stable.
[ https://issues.apache.org/jira/browse/PIG-5157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15887655#comment-15887655 ] Nandor Kollar commented on PIG-5157: I don't know which would make more sense, but my guess is that upgrading to the latest one is better unless we have a good reason not to.
[ https://issues.apache.org/jira/browse/PIG-5157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15886570#comment-15886570 ] liyunzhang_intel commented on PIG-5157: --- [~nkollar]: it is sensible to upgrade to the latest Spark version. Which is more sensible, 2.0 or 2.1, or is either OK?
[ https://issues.apache.org/jira/browse/PIG-5157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15885596#comment-15885596 ] Nandor Kollar commented on PIG-5157: Once PIG-5132 is completed we should upgrade to the latest Spark version. [~kellyzly] what do you think?