[jira] [Created] (SPARK-15285) Generated SpecificSafeProjection.apply method grows beyond 64 KB

2016-05-11 Thread Konstantin Shaposhnikov (JIRA)
Konstantin Shaposhnikov created SPARK-15285:
---

 Summary: Generated SpecificSafeProjection.apply method grows 
beyond 64 KB
 Key: SPARK-15285
 URL: https://issues.apache.org/jira/browse/SPARK-15285
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.6.1, 2.0.0
Reporter: Konstantin Shaposhnikov


The following code snippet results in 
{noformat}
 org.codehaus.janino.JaninoRuntimeException: Code of method 
"(Ljava/lang/Object;)Ljava/lang/Object;" of class 
"org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificSafeProjection"
 grows beyond 64 KB
  at org.codehaus.janino.CodeContext.makeSpace(CodeContext.java:941)
{noformat}

{code}
case class S100(s1:String="1", s2:String="2", s3:String="3", s4:String="4", 
s5:String="5", s6:String="6", s7:String="7", s8:String="8", s9:String="9", 
s10:String="10", s11:String="11", s12:String="12", s13:String="13", 
s14:String="14", s15:String="15", s16:String="16", s17:String="17", 
s18:String="18", s19:String="19", s20:String="20", s21:String="21", 
s22:String="22", s23:String="23", s24:String="24", s25:String="25", 
s26:String="26", s27:String="27", s28:String="28", s29:String="29", 
s30:String="30", s31:String="31", s32:String="32", s33:String="33", 
s34:String="34", s35:String="35", s36:String="36", s37:String="37", 
s38:String="38", s39:String="39", s40:String="40", s41:String="41", 
s42:String="42", s43:String="43", s44:String="44", s45:String="45", 
s46:String="46", s47:String="47", s48:String="48", s49:String="49", 
s50:String="50", s51:String="51", s52:String="52", s53:String="53", 
s54:String="54", s55:String="55", s56:String="56", s57:String="57", 
s58:String="58", s59:String="59", s60:String="60", s61:String="61", 
s62:String="62", s63:String="63", s64:String="64", s65:String="65", 
s66:String="66", s67:String="67", s68:String="68", s69:String="69", 
s70:String="70", s71:String="71", s72:String="72", s73:String="73", 
s74:String="74", s75:String="75", s76:String="76", s77:String="77", 
s78:String="78", s79:String="79", s80:String="80", s81:String="81", 
s82:String="82", s83:String="83", s84:String="84", s85:String="85", 
s86:String="86", s87:String="87", s88:String="88", s89:String="89", 
s90:String="90", s91:String="91", s92:String="92", s93:String="93", 
s94:String="94", s95:String="95", s96:String="96", s97:String="97", 
s98:String="98", s99:String="99", s100:String="100")

case class S(s1: S100=S100(), s2: S100=S100(), s3: S100=S100(), s4: 
S100=S100(), s5: S100=S100(), s6: S100=S100(), s7: S100=S100(), s8: 
S100=S100(), s9: S100=S100(), s10: S100=S100())

val ds = Seq(S(),S(),S()).toDS
ds.show()
{code}

I could reproduce this with Spark built from the 1.6 branch and with 
https://home.apache.org/~pwendell/spark-nightly/spark-master-bin/spark-2.0.0-SNAPSHOT-2016_05_11_01_03-8beae59-bin/






[jira] [Commented] (SPARK-2984) FileNotFoundException on _temporary directory

2016-01-12 Thread Konstantin Shaposhnikov (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15093511#comment-15093511
 ] 

Konstantin Shaposhnikov commented on SPARK-2984:


I am seeing the same error message with Spark 1.6 and HDFS. This happens after 
an earlier job failure (ClassCastException).

> FileNotFoundException on _temporary directory
> -
>
> Key: SPARK-2984
> URL: https://issues.apache.org/jira/browse/SPARK-2984
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.1.0
>Reporter: Andrew Ash
>Assignee: Josh Rosen
>Priority: Critical
> Fix For: 1.3.0
>
>
> We've seen several stacktraces and threads on the user mailing list where 
> people are having issues with a {{FileNotFoundException}} stemming from an 
> HDFS path containing {{_temporary}}.
> I ([~aash]) think this may be related to {{spark.speculation}}.  I think the 
> error condition might manifest in this circumstance:
> 1) task T starts on an executor E1
> 2) it takes a long time, so task T' is started on another executor E2
> 3) T finishes in E1 so moves its data from {{_temporary}} to the final 
> destination and deletes the {{_temporary}} directory during cleanup
> 4) T' finishes in E2 and attempts to move its data from {{_temporary}}, but 
> those files no longer exist!  exception
> Some samples:
> {noformat}
> 14/08/11 08:05:08 ERROR JobScheduler: Error running job streaming job 
> 140774430 ms.0
> java.io.FileNotFoundException: File 
> hdfs://hadoopc/user/csong/output/human_bot/-140774430.out/_temporary/0/task_201408110805__m_07
>  does not exist.
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:654)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.access$600(DistributedFileSystem.java:102)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$14.doCall(DistributedFileSystem.java:712)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$14.doCall(DistributedFileSystem.java:708)
> at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:708)
> at 
> org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:360)
> at 
> org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitJob(FileOutputCommitter.java:310)
> at 
> org.apache.hadoop.mapred.FileOutputCommitter.commitJob(FileOutputCommitter.java:136)
> at 
> org.apache.spark.SparkHadoopWriter.commitJob(SparkHadoopWriter.scala:126)
> at 
> org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopDataset(PairRDDFunctions.scala:841)
> at 
> org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:724)
> at 
> org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:643)
> at org.apache.spark.rdd.RDD.saveAsTextFile(RDD.scala:1068)
> at 
> org.apache.spark.streaming.dstream.DStream$$anonfun$8.apply(DStream.scala:773)
> at 
> org.apache.spark.streaming.dstream.DStream$$anonfun$8.apply(DStream.scala:771)
> at 
> org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply$mcV$sp(ForEachDStream.scala:41)
> at 
> org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:40)
> at 
> org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:40)
> at scala.util.Try$.apply(Try.scala:161)
> at org.apache.spark.streaming.scheduler.Job.run(Job.scala:32)
> at 
> org.apache.spark.streaming.scheduler.JobScheduler$JobHandler.run(JobScheduler.scala:172)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
> -- Chen Song at 
> http://apache-spark-user-list.1001560.n3.nabble.com/saveAsTextFiles-file-not-found-exception-td10686.html
> {noformat}
> I am running a Spark Streaming job that uses saveAsTextFiles to save results 
> into hdfs files. However, it has an exception after 20 batches
> result-140631234/_temporary/0/task_201407251119__m_03 does not 
> exist.
> {noformat}
> and
> {noformat}
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
>  No lease on /apps/data/vddil/real-time/checkpoint/temp: File does not exist. 
> Holder DFSClient_NONMAPREDUCE_327993456_13 does not have any open files.
>   at 
> 
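
The description above attributes the race to {{spark.speculation}}. Below is a minimal sketch (my own illustration, not a fix from this ticket) of ruling speculation out while investigating: with speculation disabled there is no second attempt T' to race the first attempt for the {{_temporary}} output directory.

{code}
import org.apache.spark.{SparkConf, SparkContext}

// Disable speculative execution so only one attempt of each task writes to and
// commits from the _temporary directory. "false" is already the default; it is
// set explicitly here only to make the assumption visible.
val conf = new SparkConf()
  .setAppName("no-speculation-repro")
  .set("spark.speculation", "false")
val sc = new SparkContext(conf)
{code}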

[jira] [Commented] (SPARK-10066) Can't create HiveContext with spark-shell or spark-sql on snapshot

2015-10-21 Thread Konstantin Shaposhnikov (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14966346#comment-14966346
 ] 

Konstantin Shaposhnikov commented on SPARK-10066:
-

I have the same problem when creating HiveContext programmatically (from a 
Scala app) on Windows.
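
For context, a minimal sketch of what "creating HiveContext programmatically" looks like here (the app name and {{local[*]}} master are illustrative assumptions, not from the ticket); on Windows the {{new HiveContext(sc)}} call is where the {{/tmp/hive}} permission check reported below fails:

{code}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object HiveContextRepro {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("hive-context-repro").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // On Windows this constructor fails with the reported error:
    // "The root scratch dir: /tmp/hive on HDFS should be writable."
    val hiveContext = new HiveContext(sc)
    hiveContext.sql("SHOW TABLES").show()

    sc.stop()
  }
}
{code}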


> Can't create HiveContext with spark-shell or spark-sql on snapshot
> --
>
> Key: SPARK-10066
> URL: https://issues.apache.org/jira/browse/SPARK-10066
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell, SQL
>Affects Versions: 1.5.0
> Environment: Centos 6.6
>Reporter: Robert Beauchemin
>Priority: Minor
>
> Built the 1.5.0-preview-20150812 with the following:
> ./make-distribution.sh -Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.0 -Phive 
> -Phive-thriftserver -Psparkr -DskipTests
> Starting spark-shell or spark-sql returns the following error: 
> java.lang.RuntimeException: java.lang.RuntimeException: The root scratch dir: 
> /tmp/hive on HDFS should be writable. Current permissions are: rwx--
> at 
> org.apache.hadoop.hive.ql.session.SessionState.createRootHDFSDir(SessionState.java:612)
>  [elided]
> at 
> org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:508)   
> 
> It's trying to create a new HiveContext. Running pySpark or sparkR works and 
> creates a HiveContext successfully. SqlContext can be created successfully 
> with any shell.
> I've tried changing permissions on that HDFS directory (even as far as making 
> it world-writable) without success. Tried changing SPARK_USER and also 
> running spark-shell as different users without success.
> This works on the same machine on 1.4.1 and on earlier pre-release versions of 
> Spark 1.5.0 (same make-distribution parms) successfully. Just trying the 
> snapshot... 






[jira] [Comment Edited] (SPARK-10066) Can't create HiveContext with spark-shell or spark-sql on snapshot

2015-10-21 Thread Konstantin Shaposhnikov (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14966346#comment-14966346
 ] 

Konstantin Shaposhnikov edited comment on SPARK-10066 at 10/21/15 7:19 AM:
---

I have the same problem when creating HiveContext programmatically (from a 
Scala app) on Windows using the latest Spark from the 1.5 branch.



was (Author: k.shaposhni...@gmail.com):
I have the same problem when creating HiveContext programmatically (from a 
Scala app) on Windows.


> Can't create HiveContext with spark-shell or spark-sql on snapshot
> --
>
> Key: SPARK-10066
> URL: https://issues.apache.org/jira/browse/SPARK-10066
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell, SQL
>Affects Versions: 1.5.0
> Environment: Centos 6.6
>Reporter: Robert Beauchemin
>Priority: Minor
>
> Built the 1.5.0-preview-20150812 with the following:
> ./make-distribution.sh -Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.0 -Phive 
> -Phive-thriftserver -Psparkr -DskipTests
> Starting spark-shell or spark-sql returns the following error: 
> java.lang.RuntimeException: java.lang.RuntimeException: The root scratch dir: 
> /tmp/hive on HDFS should be writable. Current permissions are: rwx--
> at 
> org.apache.hadoop.hive.ql.session.SessionState.createRootHDFSDir(SessionState.java:612)
>  [elided]
> at 
> org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:508)   
> 
> It's trying to create a new HiveContext. Running pySpark or sparkR works and 
> creates a HiveContext successfully. SqlContext can be created successfully 
> with any shell.
> I've tried changing permissions on that HDFS directory (even as far as making 
> it world-writable) without success. Tried changing SPARK_USER and also 
> running spark-shell as different users without success.
> This works on the same machine on 1.4.1 and on earlier pre-release versions of 
> Spark 1.5.0 (same make-distribution parms) successfully. Just trying the 
> snapshot... 






[jira] [Commented] (SPARK-8824) Support Parquet logical types TIMESTAMP_MILLIS and TIMESTAMP_MICROS

2015-08-11 Thread Konstantin Shaposhnikov (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14681418#comment-14681418
 ] 

Konstantin Shaposhnikov commented on SPARK-8824:


Ok, thank you for the update.

 Support Parquet logical types TIMESTAMP_MILLIS and TIMESTAMP_MICROS
 ---

 Key: SPARK-8824
 URL: https://issues.apache.org/jira/browse/SPARK-8824
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 1.5.0
Reporter: Cheng Lian








[jira] [Commented] (SPARK-8824) Support Parquet logical types TIMESTAMP_MILLIS and TIMESTAMP_MICROS

2015-08-10 Thread Konstantin Shaposhnikov (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14681146#comment-14681146
 ] 

Konstantin Shaposhnikov commented on SPARK-8824:


parquet-mr 1.7.0+ depends on parquet-format 2.3.0-incubating, which includes 
support for TIMESTAMP_MILLIS: 
https://github.com/apache/parquet-format/blob/apache-parquet-format-2.3.0-incubating/src/thrift/parquet.thrift#L104

TIMESTAMP_MICROS, however, is indeed not there.

 Support Parquet logical types TIMESTAMP_MILLIS and TIMESTAMP_MICROS
 ---

 Key: SPARK-8824
 URL: https://issues.apache.org/jira/browse/SPARK-8824
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 1.5.0
Reporter: Cheng Lian








[jira] [Commented] (SPARK-8824) Support Parquet logical types TIMESTAMP_MILLIS and TIMESTAMP_MICROS

2015-08-03 Thread Konstantin Shaposhnikov (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14653060#comment-14653060
 ] 

Konstantin Shaposhnikov commented on SPARK-8824:


Can TIMESTAMP_MILLIS support (for INT64 type) be implemented first? Would a 
pull request for this be accepted for Spark 1.5? Thank you.

 Support Parquet logical types TIMESTAMP_MILLIS and TIMESTAMP_MICROS
 ---

 Key: SPARK-8824
 URL: https://issues.apache.org/jira/browse/SPARK-8824
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 1.5.0
Reporter: Cheng Lian








[jira] [Commented] (SPARK-8819) Spark doesn't compile with maven 3.3.x

2015-07-30 Thread Konstantin Shaposhnikov (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14648680#comment-14648680
 ] 

Konstantin Shaposhnikov commented on SPARK-8819:


[MSHADE-148] has been fixed in maven-shade-plugin 2.4.1.

It would be good to update pom.xml to use it and remove the 
{{create.dependency.reduced.pom}} workaround.

 Spark doesn't compile with maven 3.3.x
 --

 Key: SPARK-8819
 URL: https://issues.apache.org/jira/browse/SPARK-8819
 Project: Spark
  Issue Type: Bug
  Components: Build
Affects Versions: 1.3.2, 1.4.0, 1.5.0
Reporter: Andrew Or
Assignee: Andrew Or
Priority: Blocker
 Fix For: 1.3.2, 1.4.1, 1.5.0


 Simple reproduction: Install maven 3.3.3 and run build/mvn clean package 
 -DskipTests
 This works just fine for maven 3.2.1 but not for 3.3.x. The result is an 
 infinite loop caused by MSHADE-148:
 {code}
 [INFO] Replacing 
 /Users/andrew/Documents/dev/spark/andrew-spark/bagel/target/spark-bagel_2.10-1.5.0-SNAPSHOT.jar
  with 
 /Users/andrew/Documents/dev/spark/andrew-spark/bagel/target/spark-bagel_2.10-1.5.0-SNAPSHOT-shaded.jar
 [INFO] Dependency-reduced POM written at: 
 /Users/andrew/Documents/dev/spark/andrew-spark/bagel/dependency-reduced-pom.xml
 [INFO] Dependency-reduced POM written at: 
 /Users/andrew/Documents/dev/spark/andrew-spark/bagel/dependency-reduced-pom.xml
 ...
 {code}
 This is ultimately caused by SPARK-7558 (master 
 9eb222c13991c2b4a22db485710dc2e27ccf06dd) but is recently revealed through 
 SPARK-8781 (master 82cf3315e690f4ac15b50edea6a3d673aa5be4c0).






[jira] [Commented] (SPARK-8781) Published POMs are no longer effective POMs

2015-07-02 Thread Konstantin Shaposhnikov (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14611600#comment-14611600
 ] 

Konstantin Shaposhnikov commented on SPARK-8781:


I believe this will affect both released and SNAPSHOT artefacts. 

Basically, as part of SPARK-3812 the build was changed to deploy effective 
POMs into the Maven repository. E.g. in 
https://repo1.maven.org/maven2/org/apache/spark/spark-core_2.11/1.4.0/spark-core_2.11-1.4.0.pom
 you won't find {{$\{scala.binary.version}}; it was resolved to 2.11 by Maven 
during the build. This is required for the Scala 2.11 build to make sure that 
jars that are built with Scala 2.11 reference Scala 2.11 jars (e.g. 
spark-core_2.11 should depend on spark-launcher_2.11, not on 
spark-launcher_2.10). By default {{$\{scala.binary.version}} will be resolved 
to 2.10 because the scala-2.10 Maven profile is active by default.

Publishing of effective POMs is implemented using maven-shade-plugin. To be 
honest I am not sure how exactly it works. However, when I removed the following 
line from the parent POM, 
{{<createDependencyReducedPom>false</createDependencyReducedPom>}}, the build 
started to deploy effective POMs again.

I hope my explanation helps.

 Published POMs are no longer effective POMs
 

 Key: SPARK-8781
 URL: https://issues.apache.org/jira/browse/SPARK-8781
 Project: Spark
  Issue Type: Bug
  Components: Build
Affects Versions: 1.3.2, 1.4.1, 1.5.0
Reporter: Konstantin Shaposhnikov

 Published to maven repository POMs are no longer effective POMs. E.g. 
 In 
 https://repository.apache.org/content/repositories/snapshots/org/apache/spark/spark-core_2.11/1.4.2-SNAPSHOT/spark-core_2.11-1.4.2-20150702.043114-52.pom:
 {noformat}
 ...
 <dependency>
 <groupId>org.apache.spark</groupId>
 <artifactId>spark-launcher_${scala.binary.version}</artifactId>
 <version>${project.version}</version>
 </dependency>
 ...
 {noformat}
 while it should be
 {noformat}
 ...
 <dependency>
 <groupId>org.apache.spark</groupId>
 <artifactId>spark-launcher_2.11</artifactId>
 <version>${project.version}</version>
 </dependency>
 ...
 {noformat}
 The following commits are most likely the cause of it:
 - for branch-1.3: 
 https://github.com/apache/spark/commit/ce137b8ed3b240b7516046699ac96daa55ddc129
 - for branch-1.4: 
 https://github.com/apache/spark/commit/84da653192a2d9edb82d0dbe50f577c4dc6a0c78
 - for master: 
 https://github.com/apache/spark/commit/984ad60147c933f2d5a2040c87ae687c14eb1724
 On branch-1.4 reverting the commit fixed the issue.
 See SPARK-3812 for additional details






[jira] [Commented] (SPARK-8781) Published POMs are no longer effective POMs

2015-07-02 Thread Konstantin Shaposhnikov (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14611606#comment-14611606
 ] 

Konstantin Shaposhnikov commented on SPARK-8781:


The original commit that adds effective POM publishing: 
https://github.com/apache/spark/commit/6e09c98b5d7ad92cf01a3b415008f48782f2f1a3

 Published POMs are no longer effective POMs
 

 Key: SPARK-8781
 URL: https://issues.apache.org/jira/browse/SPARK-8781
 Project: Spark
  Issue Type: Bug
  Components: Build
Affects Versions: 1.3.2, 1.4.1, 1.5.0
Reporter: Konstantin Shaposhnikov

 Published to maven repository POMs are no longer effective POMs. E.g. 
 In 
 https://repository.apache.org/content/repositories/snapshots/org/apache/spark/spark-core_2.11/1.4.2-SNAPSHOT/spark-core_2.11-1.4.2-20150702.043114-52.pom:
 {noformat}
 ...
 <dependency>
 <groupId>org.apache.spark</groupId>
 <artifactId>spark-launcher_${scala.binary.version}</artifactId>
 <version>${project.version}</version>
 </dependency>
 ...
 {noformat}
 while it should be
 {noformat}
 ...
 <dependency>
 <groupId>org.apache.spark</groupId>
 <artifactId>spark-launcher_2.11</artifactId>
 <version>${project.version}</version>
 </dependency>
 ...
 {noformat}
 The following commits are most likely the cause of it:
 - for branch-1.3: 
 https://github.com/apache/spark/commit/ce137b8ed3b240b7516046699ac96daa55ddc129
 - for branch-1.4: 
 https://github.com/apache/spark/commit/84da653192a2d9edb82d0dbe50f577c4dc6a0c78
 - for master: 
 https://github.com/apache/spark/commit/984ad60147c933f2d5a2040c87ae687c14eb1724
 On branch-1.4 reverting the commit fixed the issue.
 See SPARK-3812 for additional details






[jira] [Created] (SPARK-8781) Published POMs are no longer effective POMs

2015-07-01 Thread Konstantin Shaposhnikov (JIRA)
Konstantin Shaposhnikov created SPARK-8781:
--

 Summary: Published POMs are no longer effective POMs
 Key: SPARK-8781
 URL: https://issues.apache.org/jira/browse/SPARK-8781
 Project: Spark
  Issue Type: Bug
  Components: Build
Affects Versions: 1.3.2, 1.4.1, 1.5.0
Reporter: Konstantin Shaposhnikov


POMs published to the Maven repository are no longer effective POMs. E.g. 

In 
https://repository.apache.org/content/repositories/snapshots/org/apache/spark/spark-core_2.11/1.4.2-SNAPSHOT/spark-core_2.11-1.4.2-20150702.043114-52.pom:

{noformat}
...
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-launcher_${scala.binary.version}</artifactId>
<version>${project.version}</version>
</dependency>
...
{noformat}

while it should be

{noformat}
...
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-launcher_2.11</artifactId>
<version>${project.version}</version>
</dependency>
...
{noformat}


The following commits are most likely the cause of it:
- for branch-1.3: 
https://github.com/apache/spark/commit/ce137b8ed3b240b7516046699ac96daa55ddc129
- for branch-1.4: 
https://github.com/apache/spark/commit/84da653192a2d9edb82d0dbe50f577c4dc6a0c78
- for master: 
https://github.com/apache/spark/commit/984ad60147c933f2d5a2040c87ae687c14eb1724

On branch-1.4 reverting the commit fixed the issue.

See SPARK-3812 for additional details






[jira] [Created] (SPARK-8585) Support LATERAL VIEW in Spark SQL parser

2015-06-23 Thread Konstantin Shaposhnikov (JIRA)
Konstantin Shaposhnikov created SPARK-8585:
--

 Summary: Support LATERAL VIEW in Spark SQL parser
 Key: SPARK-8585
 URL: https://issues.apache.org/jira/browse/SPARK-8585
 Project: Spark
  Issue Type: Improvement
Reporter: Konstantin Shaposhnikov


It would be good to support LATERAL VIEW SQL syntax without need to create 
HiveContext.

Docs: 
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+LateralView






[jira] [Commented] (SPARK-8122) ParquetRelation.enableLogForwarding() may fail to configure loggers

2015-06-07 Thread Konstantin Shaposhnikov (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14576536#comment-14576536
 ] 

Konstantin Shaposhnikov commented on SPARK-8122:


Yes, a strong reference is required for all Logger instances that are 
configured in enableLogForwarding.

http://docs.oracle.com/javase/7/docs/api/java/util/logging/Logger.html#getLogger(java.lang.String)

 ParquetRelation.enableLogForwarding() may fail to configure loggers
 ---

 Key: SPARK-8122
 URL: https://issues.apache.org/jira/browse/SPARK-8122
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 1.5.0
Reporter: Konstantin Shaposhnikov
Priority: Minor

 _enableLogForwarding()_ doesn't hold to the created loggers that can be 
 garbage collected and all configuration changes will be gone. From 
 https://docs.oracle.com/javase/6/docs/api/java/util/logging/Logger.html 
 javadocs:  _It is important to note that the Logger returned by one of the 
 getLogger factory methods may be garbage collected at any time if a strong 
 reference to the Logger is not kept._
 All created logger references need to be kept, e.g. in static variables.






[jira] [Commented] (SPARK-8122) ParquetRelation.enableLogForwarding() may fail to configure loggers

2015-06-05 Thread Konstantin Shaposhnikov (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14574419#comment-14574419
 ] 

Konstantin Shaposhnikov commented on SPARK-8122:


Parquet itself suffers from this issue too, but it's almost impossible to hit 
because the static block in Log is most likely called very shortly before some 
Logger instance is strongly referenced from a static LOG field (Log -> Logger 
-> parent Logger). It is very unlikely that a GC happens between these two events.

But when there is a bigger interval between when a Logger is configured in 
`enableLogForwarding()` and when it is actually used to log something, there is 
a bigger chance to see this.

In one of my applications I used similar code to redirect parquet logging to 
slf4j and saw once that the redirect wasn't set up properly due to GC.

To be honest I wish parquet just used slf4j and didn't mess with the logging 
setup ;)
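
For reference, a minimal sketch of the kind of redirect described above, under the assumption that holding the JUL logger in a {{val}} of a Scala object provides the strong reference that keeps the configuration alive (the object name is mine; the logger name is "parquet" before parquet-mr 1.7.0 and "org.apache.parquet" afterwards, per SPARK-8118):

{code}
import java.util.logging.{Handler, Level, LogRecord, Logger => JulLogger}
import org.slf4j.LoggerFactory

// Illustrative only; this is not the ParquetRelation.enableLogForwarding() code.
object ParquetLogRedirect {
  private val target = LoggerFactory.getLogger("org.apache.parquet")

  // Strong reference: lives as long as this object, so the handler and level
  // configured below cannot be discarded when the JUL logger is GCed.
  private val julLogger: JulLogger = JulLogger.getLogger("org.apache.parquet")

  def install(): Unit = {
    julLogger.getHandlers.foreach(julLogger.removeHandler) // drop the default console handler
    julLogger.setUseParentHandlers(false)                  // avoid double logging via the root logger
    julLogger.setLevel(Level.ALL)
    julLogger.addHandler(new Handler {
      override def publish(record: LogRecord): Unit = {
        if (record.getLevel.intValue >= Level.WARNING.intValue) {
          target.warn(record.getMessage, record.getThrown)
        } else {
          target.debug(record.getMessage, record.getThrown)
        }
      }
      override def flush(): Unit = ()
      override def close(): Unit = ()
    })
  }
}
{code}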

 ParquetRelation.enableLogForwarding() may fail to configure loggers
 ---

 Key: SPARK-8122
 URL: https://issues.apache.org/jira/browse/SPARK-8122
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 1.5.0
Reporter: Konstantin Shaposhnikov
Priority: Minor

 _enableLogForwarding()_ doesn't hold to the created loggers that can be 
 garbage collected and all configuration changes will be gone. From 
 https://docs.oracle.com/javase/6/docs/api/java/util/logging/Logger.html 
 javadocs:  _It is important to note that the Logger returned by one of the 
 getLogger factory methods may be garbage collected at any time if a strong 
 reference to the Logger is not kept._
 All created logger references need to be kept, e.g. in static variables.






[jira] [Created] (SPARK-8122) A few problems in ParquetRelation.enableLogForwarding()

2015-06-04 Thread Konstantin Shaposhnikov (JIRA)
Konstantin Shaposhnikov created SPARK-8122:
--

 Summary: A few problems in ParquetRelation.enableLogForwarding()
 Key: SPARK-8122
 URL: https://issues.apache.org/jira/browse/SPARK-8122
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.5.0
Reporter: Konstantin Shaposhnikov


_enableLogForwarding()_ should be updated after the Parquet 1.7.0 update, because 
the name of the logger has been changed to `org.apache.parquet`. From parquet-mr 
Log:

{code}
// add a default handler in case there is none
Logger logger = Logger.getLogger(Log.class.getPackage().getName());
{code}

Another problem with _enableLogForwarding()_ is that it doesn't hold on to the 
created loggers, which can therefore be garbage collected, and then all 
configuration changes will be gone. From 
https://docs.oracle.com/javase/6/docs/api/java/util/logging/Logger.html 
javadocs:  _It is important to note that the Logger returned by one of the 
getLogger factory methods may be garbage collected at any time if a strong 
reference to the Logger is not kept._
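
A minimal sketch of the second point, assuming the fix is simply to keep the configured loggers reachable from an always-live object (Scala's equivalent of a static field), for both the old and the new logger name:

{code}
import java.util.logging.{Level, Logger}

// Illustrative holder, not the actual Spark patch: the object keeps strong
// references to the loggers so their configuration cannot be lost to GC.
object ParquetLoggerHolder {
  val legacy: Logger = Logger.getLogger("parquet")              // parquet-mr < 1.7.0
  val current: Logger = Logger.getLogger("org.apache.parquet")  // parquet-mr >= 1.7.0

  // Example configuration that should now survive garbage collection.
  current.setLevel(Level.INFO)
}
{code}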






[jira] [Updated] (SPARK-8122) ParquetRelation.enableLogForwarding() may fail to configure loggers

2015-06-04 Thread Konstantin Shaposhnikov (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shaposhnikov updated SPARK-8122:
---
Description: 
_enableLogForwarding()_ doesn't hold to the created loggers that can be garbage 
collected and all configuration changes will be gone. From 
https://docs.oracle.com/javase/6/docs/api/java/util/logging/Logger.html 
javadocs:  _It is important to note that the Logger returned by one of the 
getLogger factory methods may be garbage collected at any time if a strong 
reference to the Logger is not kept._

All created logger references need to be kept, e.g. in static variables.


  was:

Another problem with _enableLogForwarding()_ is that it doesn't hold to the 
created loggers that can be garbage collected and all configuration changes 
will be gone. From 
https://docs.oracle.com/javase/6/docs/api/java/util/logging/Logger.html 
javadocs:  _It is important to note that the Logger returned by one of the 
getLogger factory methods may be garbage collected at any time if a strong 
reference to the Logger is not kept._


 ParquetRelation.enableLogForwarding() may fail to configure loggers
 ---

 Key: SPARK-8122
 URL: https://issues.apache.org/jira/browse/SPARK-8122
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 1.5.0
Reporter: Konstantin Shaposhnikov
Priority: Minor

 _enableLogForwarding()_ doesn't hold to the created loggers that can be 
 garbage collected and all configuration changes will be gone. From 
 https://docs.oracle.com/javase/6/docs/api/java/util/logging/Logger.html 
 javadocs:  _It is important to note that the Logger returned by one of the 
 getLogger factory methods may be garbage collected at any time if a strong 
 reference to the Logger is not kept._
 All created logger references need to be kept, e.g. in static variables.






[jira] [Commented] (SPARK-8118) Turn off noisy log output produced by Parquet 1.7.0

2015-06-04 Thread Konstantin Shaposhnikov (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573981#comment-14573981
 ] 

Konstantin Shaposhnikov commented on SPARK-8118:


The name of the logger has been changed to _org.apache.parquet_. From parquet-mr 
Log:

{code}
// add a default handler in case there is none
Logger logger = Logger.getLogger(Log.class.getPackage().getName());
{code}


 Turn off noisy log output produced by Parquet 1.7.0
 ---

 Key: SPARK-8118
 URL: https://issues.apache.org/jira/browse/SPARK-8118
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 1.4.1, 1.5.0
Reporter: Cheng Lian
Assignee: Cheng Lian
Priority: Minor

 Parquet 1.7.0 renames package name to org.apache.parquet, need to adjust 
 {{ParquetRelation.enableLogForwarding}} accordingly to avoid noisy log output.






[jira] [Commented] (SPARK-8122) A few problems in ParquetRelation.enableLogForwarding()

2015-06-04 Thread Konstantin Shaposhnikov (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573973#comment-14573973
 ] 

Konstantin Shaposhnikov commented on SPARK-8122:


I believe that currently `ParquetRelation.enableLogForwarding` doesn't do 
anything as it configures the wrong logger (parquet instead of 
org.apache.parquet). I haven't tested it though.


 A few problems in ParquetRelation.enableLogForwarding()
 ---

 Key: SPARK-8122
 URL: https://issues.apache.org/jira/browse/SPARK-8122
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 1.5.0
Reporter: Konstantin Shaposhnikov

 _enableLogForwarding()_ should be updated after parquet 1.7.0 update , 
 because name of the logger has been changed to `org.apache.parquet`. From 
 parquet-mr Log:
 {code}
 // add a default handler in case there is none
 Logger logger = Logger.getLogger(Log.class.getPackage().getName());
 {code}
 Another problem with _enableLogForwarding()_ is that it doesn't hold to the 
 created loggers that can be garbage collected and all configuration changes 
 will be gone. From 
 https://docs.oracle.com/javase/6/docs/api/java/util/logging/Logger.html 
 javadocs:  _It is important to note that the Logger returned by one of the 
 getLogger factory methods may be garbage collected at any time if a strong 
 reference to the Logger is not kept._






[jira] [Updated] (SPARK-8122) ParquetRelation.enableLogForwarding() may fail to configure loggers

2015-06-04 Thread Konstantin Shaposhnikov (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shaposhnikov updated SPARK-8122:
---
Priority: Minor  (was: Major)

 ParquetRelation.enableLogForwarding() may fail to configure loggers
 ---

 Key: SPARK-8122
 URL: https://issues.apache.org/jira/browse/SPARK-8122
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 1.5.0
Reporter: Konstantin Shaposhnikov
Priority: Minor

 _enableLogForwarding()_ should be updated after parquet 1.7.0 update , 
 because name of the logger has been changed to `org.apache.parquet`. From 
 parquet-mr Log:
 {code}
 // add a default handler in case there is none
 Logger logger = Logger.getLogger(Log.class.getPackage().getName());
 {code}
 Another problem with _enableLogForwarding()_ is that it doesn't hold to the 
 created loggers that can be garbage collected and all configuration changes 
 will be gone. From 
 https://docs.oracle.com/javase/6/docs/api/java/util/logging/Logger.html 
 javadocs:  _It is important to note that the Logger returned by one of the 
 getLogger factory methods may be garbage collected at any time if a strong 
 reference to the Logger is not kept._






[jira] [Updated] (SPARK-8122) ParquetRelation.enableLogForwarding() may fail to configure loggers

2015-06-04 Thread Konstantin Shaposhnikov (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shaposhnikov updated SPARK-8122:
---
Summary: ParquetRelation.enableLogForwarding() may fail to configure 
loggers  (was: A few problems in ParquetRelation.enableLogForwarding())

 ParquetRelation.enableLogForwarding() may fail to configure loggers
 ---

 Key: SPARK-8122
 URL: https://issues.apache.org/jira/browse/SPARK-8122
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 1.5.0
Reporter: Konstantin Shaposhnikov

 _enableLogForwarding()_ should be updated after parquet 1.7.0 update , 
 because name of the logger has been changed to `org.apache.parquet`. From 
 parquet-mr Log:
 {code}
 // add a default handler in case there is none
 Logger logger = Logger.getLogger(Log.class.getPackage().getName());
 {code}
 Another problem with _enableLogForwarding()_ is that it doesn't hold to the 
 created loggers that can be garbage collected and all configuration changes 
 will be gone. From 
 https://docs.oracle.com/javase/6/docs/api/java/util/logging/Logger.html 
 javadocs:  _It is important to note that the Logger returned by one of the 
 getLogger factory methods may be garbage collected at any time if a strong 
 reference to the Logger is not kept._






[jira] [Updated] (SPARK-8122) ParquetRelation.enableLogForwarding() may fail to configure loggers

2015-06-04 Thread Konstantin Shaposhnikov (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shaposhnikov updated SPARK-8122:
---
Description: 

Another problem with _enableLogForwarding()_ is that it doesn't hold to the 
created loggers that can be garbage collected and all configuration changes 
will be gone. From 
https://docs.oracle.com/javase/6/docs/api/java/util/logging/Logger.html 
javadocs:  _It is important to note that the Logger returned by one of the 
getLogger factory methods may be garbage collected at any time if a strong 
reference to the Logger is not kept._

  was:
_enableLogForwarding()_ should be updated after parquet 1.7.0 update , because 
name of the logger has been changed to `org.apache.parquet`. From parquet-mr 
Log:

{code}
// add a default handler in case there is none
Logger logger = Logger.getLogger(Log.class.getPackage().getName());
{code}

Another problem with _enableLogForwarding()_ is that it doesn't hold to the 
created loggers that can be garbage collected and all configuration changes 
will be gone. From 
https://docs.oracle.com/javase/6/docs/api/java/util/logging/Logger.html 
javadocs:  _It is important to note that the Logger returned by one of the 
getLogger factory methods may be garbage collected at any time if a strong 
reference to the Logger is not kept._


 ParquetRelation.enableLogForwarding() may fail to configure loggers
 ---

 Key: SPARK-8122
 URL: https://issues.apache.org/jira/browse/SPARK-8122
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 1.5.0
Reporter: Konstantin Shaposhnikov
Priority: Minor

 Another problem with _enableLogForwarding()_ is that it doesn't hold to the 
 created loggers that can be garbage collected and all configuration changes 
 will be gone. From 
 https://docs.oracle.com/javase/6/docs/api/java/util/logging/Logger.html 
 javadocs:  _It is important to note that the Logger returned by one of the 
 getLogger factory methods may be garbage collected at any time if a strong 
 reference to the Logger is not kept._






[jira] [Commented] (SPARK-8122) A few problems in ParquetRelation.enableLogForwarding()

2015-06-04 Thread Konstantin Shaposhnikov (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573978#comment-14573978
 ] 

Konstantin Shaposhnikov commented on SPARK-8122:


SPARK-8118 is for the first problem described in this issue.

The second problem (the loggers can be garbage collected) is another issue and 
should be fixed separately.

I will update the JIRA.

 A few problems in ParquetRelation.enableLogForwarding()
 ---

 Key: SPARK-8122
 URL: https://issues.apache.org/jira/browse/SPARK-8122
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 1.5.0
Reporter: Konstantin Shaposhnikov

 _enableLogForwarding()_ should be updated after parquet 1.7.0 update , 
 because name of the logger has been changed to `org.apache.parquet`. From 
 parquet-mr Log:
 {code}
 // add a default handler in case there is none
 Logger logger = Logger.getLogger(Log.class.getPackage().getName());
 {code}
 Another problem with _enableLogForwarding()_ is that it doesn't hold to the 
 created loggers that can be garbage collected and all configuration changes 
 will be gone. From 
 https://docs.oracle.com/javase/6/docs/api/java/util/logging/Logger.html 
 javadocs:  _It is important to note that the Logger returned by one of the 
 getLogger factory methods may be garbage collected at any time if a strong 
 reference to the Logger is not kept._






[jira] [Commented] (SPARK-7042) Spark version of akka-actor_2.11 is not compatible with the official akka-actor_2.11 2.3.x

2015-05-29 Thread Konstantin Shaposhnikov (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14564339#comment-14564339
 ] 

Konstantin Shaposhnikov commented on SPARK-7042:


I've created a pull request with the akka 2.3.11 update. You can merge it if 
updating akka to version 2.3.11 looks reasonable.

 Spark version of akka-actor_2.11 is not compatible with the official 
 akka-actor_2.11 2.3.x
 --

 Key: SPARK-7042
 URL: https://issues.apache.org/jira/browse/SPARK-7042
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 1.3.1
Reporter: Konstantin Shaposhnikov
Assignee: Konstantin Shaposhnikov
Priority: Minor
 Fix For: 1.5.0


 When connecting to a remote Spark cluster (that runs Spark branch-1.3 built 
 with Scala 2.11) from an application that uses akka 2.3.9 I get the following 
 error:
 {noformat}
 2015-04-22 09:01:38,924 - [WARN] - [akka.remote.ReliableDeliverySupervisor] 
 [sparkDriver-akka.actor.default-dispatcher-5] -
 Association with remote system [akka.tcp://sparkExecutor@server:59007] has 
 failed, address is now gated for [5000] ms.
 Reason is: [akka.actor.Identify; local class incompatible: stream classdesc 
 serialVersionUID = -213377755528332889, local class serialVersionUID = 1].
 {noformat}
 It looks like akka-actor_2.11 2.3.4-spark that is used by Spark has been 
 built using Scala compiler 2.11.0 that ignores SerialVersionUID annotations 
 (see https://issues.scala-lang.org/browse/SI-8549).
 The following steps can resolve the issue:
 - re-build the custom akka library that is used by Spark with the more recent 
 version of Scala compiler (e.g. 2.11.6) 
 - deploy a new version (e.g. 2.3.4.1-spark) to a maven repo
 - update version of akka used by spark (master and 1.3 branch)
 I would also suggest to upgrade to the latest version of akka 2.3.9 (or 
 2.3.10 that should be released soon).






[jira] [Commented] (SPARK-7042) Spark version of akka-actor_2.11 is not compatible with the official akka-actor_2.11 2.3.x

2015-05-27 Thread Konstantin Shaposhnikov (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14560688#comment-14560688
 ] 

Konstantin Shaposhnikov commented on SPARK-7042:


Yes, I've just tested it locally: the 2.11 Spark build works with akka 2.3.11.

 Spark version of akka-actor_2.11 is not compatible with the official 
 akka-actor_2.11 2.3.x
 --

 Key: SPARK-7042
 URL: https://issues.apache.org/jira/browse/SPARK-7042
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 1.3.1
Reporter: Konstantin Shaposhnikov
Assignee: Konstantin Shaposhnikov
Priority: Minor
 Fix For: 1.5.0


 When connecting to a remote Spark cluster (that runs Spark branch-1.3 built 
 with Scala 2.11) from an application that uses akka 2.3.9 I get the following 
 error:
 {noformat}
 2015-04-22 09:01:38,924 - [WARN] - [akka.remote.ReliableDeliverySupervisor] 
 [sparkDriver-akka.actor.default-dispatcher-5] -
 Association with remote system [akka.tcp://sparkExecutor@server:59007] has 
 failed, address is now gated for [5000] ms.
 Reason is: [akka.actor.Identify; local class incompatible: stream classdesc 
 serialVersionUID = -213377755528332889, local class serialVersionUID = 1].
 {noformat}
 It looks like akka-actor_2.11 2.3.4-spark that is used by Spark has been 
 built using Scala compiler 2.11.0 that ignores SerialVersionUID annotations 
 (see https://issues.scala-lang.org/browse/SI-8549).
 The following steps can resolve the issue:
 - re-build the custom akka library that is used by Spark with the more recent 
 version of Scala compiler (e.g. 2.11.6) 
 - deploy a new version (e.g. 2.3.4.1-spark) to a maven repo
 - update version of akka used by spark (master and 1.3 branch)
 I would also suggest to upgrade to the latest version of akka 2.3.9 (or 
 2.3.10 that should be released soon).






[jira] [Commented] (SPARK-7042) Spark version of akka-actor_2.11 is not compatible with the official akka-actor_2.11 2.3.x

2015-05-26 Thread Konstantin Shaposhnikov (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14560180#comment-14560180
 ] 

Konstantin Shaposhnikov commented on SPARK-7042:


It looks like akka-zeromq_2.11 is only available for versions 2.3.7+, though 
the rest of the akka libraries are available for 2.3.4.

I wonder if the akka version can just be updated to the latest 2.3.11?

 Spark version of akka-actor_2.11 is not compatible with the official 
 akka-actor_2.11 2.3.x
 --

 Key: SPARK-7042
 URL: https://issues.apache.org/jira/browse/SPARK-7042
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 1.3.1
Reporter: Konstantin Shaposhnikov
Assignee: Konstantin Shaposhnikov
Priority: Minor
 Fix For: 1.5.0


 When connecting to a remote Spark cluster (that runs Spark branch-1.3 built 
 with Scala 2.11) from an application that uses akka 2.3.9 I get the following 
 error:
 {noformat}
 2015-04-22 09:01:38,924 - [WARN] - [akka.remote.ReliableDeliverySupervisor] 
 [sparkDriver-akka.actor.default-dispatcher-5] -
 Association with remote system [akka.tcp://sparkExecutor@server:59007] has 
 failed, address is now gated for [5000] ms.
 Reason is: [akka.actor.Identify; local class incompatible: stream classdesc 
 serialVersionUID = -213377755528332889, local class serialVersionUID = 1].
 {noformat}
 It looks like akka-actor_2.11 2.3.4-spark that is used by Spark has been 
 built using Scala compiler 2.11.0 that ignores SerialVersionUID annotations 
 (see https://issues.scala-lang.org/browse/SI-8549).
 The following steps can resolve the issue:
 - re-build the custom akka library that is used by Spark with the more recent 
 version of Scala compiler (e.g. 2.11.6) 
 - deploy a new version (e.g. 2.3.4.1-spark) to a maven repo
 - update version of akka used by spark (master and 1.3 branch)
 I would also suggest to upgrade to the latest version of akka 2.3.9 (or 
 2.3.10 that should be released soon).






[jira] [Commented] (SPARK-7042) Spark version of akka-actor_2.11 is not compatible with the official akka-actor_2.11 2.3.x

2015-05-26 Thread Konstantin Shaposhnikov (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14560181#comment-14560181
 ] 

Konstantin Shaposhnikov commented on SPARK-7042:


It looks like akka-zeromq_2.11 is only available for versions 2.3.7+, though 
the rest of the akka libraries are available for 2.3.4.

I wonder if the akka version can just be updated to the latest 2.3.11?

 Spark version of akka-actor_2.11 is not compatible with the official 
 akka-actor_2.11 2.3.x
 --

 Key: SPARK-7042
 URL: https://issues.apache.org/jira/browse/SPARK-7042
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 1.3.1
Reporter: Konstantin Shaposhnikov
Assignee: Konstantin Shaposhnikov
Priority: Minor
 Fix For: 1.5.0


 When connecting to a remote Spark cluster (that runs Spark branch-1.3 built 
 with Scala 2.11) from an application that uses akka 2.3.9 I get the following 
 error:
 {noformat}
 2015-04-22 09:01:38,924 - [WARN] - [akka.remote.ReliableDeliverySupervisor] 
 [sparkDriver-akka.actor.default-dispatcher-5] -
 Association with remote system [akka.tcp://sparkExecutor@server:59007] has 
 failed, address is now gated for [5000] ms.
 Reason is: [akka.actor.Identify; local class incompatible: stream classdesc 
 serialVersionUID = -213377755528332889, local class serialVersionUID = 1].
 {noformat}
 It looks like akka-actor_2.11 2.3.4-spark that is used by Spark has been 
 built using Scala compiler 2.11.0 that ignores SerialVersionUID annotations 
 (see https://issues.scala-lang.org/browse/SI-8549).
 The following steps can resolve the issue:
 - re-build the custom akka library that is used by Spark with the more recent 
 version of Scala compiler (e.g. 2.11.6) 
 - deploy a new version (e.g. 2.3.4.1-spark) to a maven repo
 - update version of akka used by spark (master and 1.3 branch)
 I would also suggest to upgrade to the latest version of akka 2.3.9 (or 
 2.3.10 that should be released soon).






[jira] [Issue Comment Deleted] (SPARK-7042) Spark version of akka-actor_2.11 is not compatible with the official akka-actor_2.11 2.3.x

2015-05-26 Thread Konstantin Shaposhnikov (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shaposhnikov updated SPARK-7042:
---
Comment: was deleted

(was: It looks like akka-zeromq_2.11 is only available for versions 2.3.7+, 
though the rest of the akka libraries are available for 2.3.4.

I wonder if akka version can just be updated to the latest 2.3.11?)

 Spark version of akka-actor_2.11 is not compatible with the official 
 akka-actor_2.11 2.3.x
 --

 Key: SPARK-7042
 URL: https://issues.apache.org/jira/browse/SPARK-7042
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 1.3.1
Reporter: Konstantin Shaposhnikov
Assignee: Konstantin Shaposhnikov
Priority: Minor
 Fix For: 1.5.0


 When connecting to a remote Spark cluster (that runs Spark branch-1.3 built 
 with Scala 2.11) from an application that uses akka 2.3.9 I get the following 
 error:
 {noformat}
 2015-04-22 09:01:38,924 - [WARN] - [akka.remote.ReliableDeliverySupervisor] 
 [sparkDriver-akka.actor.default-dispatcher-5] -
 Association with remote system [akka.tcp://sparkExecutor@server:59007] has 
 failed, address is now gated for [5000] ms.
 Reason is: [akka.actor.Identify; local class incompatible: stream classdesc 
 serialVersionUID = -213377755528332889, local class serialVersionUID = 1].
 {noformat}
 It looks like akka-actor_2.11 2.3.4-spark that is used by Spark has been 
 built using Scala compiler 2.11.0 that ignores SerialVersionUID annotations 
 (see https://issues.scala-lang.org/browse/SI-8549).
 The following steps can resolve the issue:
 - re-build the custom akka library that is used by Spark with the more recent 
 version of Scala compiler (e.g. 2.11.6) 
 - deploy a new version (e.g. 2.3.4.1-spark) to a maven repo
 - update version of akka used by spark (master and 1.3 branch)
 I would also suggest to upgrade to the latest version of akka 2.3.9 (or 
 2.3.10 that should be released soon).






[jira] [Commented] (SPARK-7042) Spark version of akka-actor_2.11 is not compatible with the official akka-actor_2.11 2.3.x

2015-05-26 Thread Konstantin Shaposhnikov (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14560166#comment-14560166
 ] 

Konstantin Shaposhnikov commented on SPARK-7042:


That is not true: http://search.maven.org/#browse%7C-1552622333 
(http://search.maven.org/#artifactdetails%7Ccom.typesafe.akka%7Cakka-actor_2.11%7C2.3.4%7Cjar)

What exactly was broken in the Scala 2.11 build?

 Spark version of akka-actor_2.11 is not compatible with the official 
 akka-actor_2.11 2.3.x
 --

 Key: SPARK-7042
 URL: https://issues.apache.org/jira/browse/SPARK-7042
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 1.3.1
Reporter: Konstantin Shaposhnikov
Assignee: Konstantin Shaposhnikov
Priority: Minor
 Fix For: 1.5.0


 When connecting to a remote Spark cluster (that runs Spark branch-1.3 built 
 with Scala 2.11) from an application that uses akka 2.3.9 I get the following 
 error:
 {noformat}
 2015-04-22 09:01:38,924 - [WARN] - [akka.remote.ReliableDeliverySupervisor] 
 [sparkDriver-akka.actor.default-dispatcher-5] -
 Association with remote system [akka.tcp://sparkExecutor@server:59007] has 
 failed, address is now gated for [5000] ms.
 Reason is: [akka.actor.Identify; local class incompatible: stream classdesc 
 serialVersionUID = -213377755528332889, local class serialVersionUID = 1].
 {noformat}
 It looks like akka-actor_2.11 2.3.4-spark that is used by Spark has been 
 built using Scala compiler 2.11.0 that ignores SerialVersionUID annotations 
 (see https://issues.scala-lang.org/browse/SI-8549).
 The following steps can resolve the issue:
 - re-build the custom akka library that is used by Spark with the more recent 
 version of Scala compiler (e.g. 2.11.6) 
 - deploy a new version (e.g. 2.3.4.1-spark) to a maven repo
 - update version of akka used by spark (master and 1.3 branch)
 I would also suggest to upgrade to the latest version of akka 2.3.9 (or 
 2.3.10 that should be released soon).






[jira] [Commented] (SPARK-7042) Spark version of akka-actor_2.11 is not compatible with the official akka-actor_2.11 2.3.x

2015-05-26 Thread Konstantin Shaposhnikov (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14560283#comment-14560283
 ] 

Konstantin Shaposhnikov commented on SPARK-7042:


It looks like the Spark-specific akka-zeromq version (2.3.4-spark) has been 
modified to work with Scala 2.11.

In fact, the standard build of akka-zeromq_2.11 (which is available for versions 
2.3.7+) depends on the zeromq Scala bindings created by the Spark project 
(org.spark-project.zeromq:zeromq-scala-binding_2.11:0.0.7-spark).


 Spark version of akka-actor_2.11 is not compatible with the official 
 akka-actor_2.11 2.3.x
 --

 Key: SPARK-7042
 URL: https://issues.apache.org/jira/browse/SPARK-7042
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 1.3.1
Reporter: Konstantin Shaposhnikov
Assignee: Konstantin Shaposhnikov
Priority: Minor
 Fix For: 1.5.0


 When connecting to a remote Spark cluster (that runs Spark branch-1.3 built 
 with Scala 2.11) from an application that uses akka 2.3.9 I get the following 
 error:
 {noformat}
 2015-04-22 09:01:38,924 - [WARN] - [akka.remote.ReliableDeliverySupervisor] 
 [sparkDriver-akka.actor.default-dispatcher-5] -
 Association with remote system [akka.tcp://sparkExecutor@server:59007] has 
 failed, address is now gated for [5000] ms.
 Reason is: [akka.actor.Identify; local class incompatible: stream classdesc 
 serialVersionUID = -213377755528332889, local class serialVersionUID = 1].
 {noformat}
 It looks like akka-actor_2.11 2.3.4-spark that is used by Spark has been 
 built using Scala compiler 2.11.0 that ignores SerialVersionUID annotations 
 (see https://issues.scala-lang.org/browse/SI-8549).
 The following steps can resolve the issue:
 - re-build the custom akka library that is used by Spark with the more recent 
 version of Scala compiler (e.g. 2.11.6) 
 - deploy a new version (e.g. 2.3.4.1-spark) to a maven repo
 - update version of akka used by spark (master and 1.3 branch)
 I would also suggest to upgrade to the latest version of akka 2.3.9 (or 
 2.3.10 that should be released soon).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-7042) Spark version of akka-actor_2.11 is not compatible with the official akka-actor_2.11 2.3.x

2015-05-15 Thread Konstantin Shaposhnikov (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14545375#comment-14545375
 ] 

Konstantin Shaposhnikov commented on SPARK-7042:


There is nothing wrong with the standard Akka 2.11 build. In fact, we now have 
a custom build of Spark that uses the standard Akka 2.3.9 from the Maven 
Central repository without any problems.

The error appears only with the custom build of Akka that ships with Spark by 
default, because it was compiled with a buggy version of Scala.

I agree that the number of users affected by this problem is probably quite 
small (only 1? ;)
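
For reference, "standard Akka 2.3.9 from the Maven Central repository" above 
means the official com.typesafe.akka artifacts; a hypothetical build.sbt 
fragment for a build that pulls them instead of the org.spark-project.akka 
fork might look like this (it only helps if the Spark build itself was also 
compiled against the same official Akka):
{code}
// Hypothetical build.sbt fragment: use the official akka-actor_2.11 artifact
// from Maven Central instead of the org.spark-project.akka fork. The
// serialVersionUID mismatch goes away only if the Spark build on the cluster
// was compiled against the official Akka as well.
libraryDependencies += "com.typesafe.akka" %% "akka-actor" % "2.3.9"
{code}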

 Spark version of akka-actor_2.11 is not compatible with the official 
 akka-actor_2.11 2.3.x
 --

 Key: SPARK-7042
 URL: https://issues.apache.org/jira/browse/SPARK-7042
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 1.3.1
Reporter: Konstantin Shaposhnikov
Priority: Minor

 When connecting to a remote Spark cluster (that runs Spark branch-1.3 built 
 with Scala 2.11) from an application that uses akka 2.3.9 I get the following 
 error:
 {noformat}
 2015-04-22 09:01:38,924 - [WARN] - [akka.remote.ReliableDeliverySupervisor] 
 [sparkDriver-akka.actor.default-dispatcher-5] -
 Association with remote system [akka.tcp://sparkExecutor@server:59007] has 
 failed, address is now gated for [5000] ms.
 Reason is: [akka.actor.Identify; local class incompatible: stream classdesc 
 serialVersionUID = -213377755528332889, local class serialVersionUID = 1].
 {noformat}
 It looks like akka-actor_2.11 2.3.4-spark that is used by Spark has been 
 built using Scala compiler 2.11.0 that ignores SerialVersionUID annotations 
 (see https://issues.scala-lang.org/browse/SI-8549).
 The following steps can resolve the issue:
 - re-build the custom akka library that is used by Spark with the more recent 
 version of Scala compiler (e.g. 2.11.6) 
 - deploy a new version (e.g. 2.3.4.1-spark) to a maven repo
 - update version of akka used by spark (master and 1.3 branch)
 I would also suggest to upgrade to the latest version of akka 2.3.9 (or 
 2.3.10 that should be released soon).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-7249) Updated Hadoop dependencies due to inconsistency in the versions

2015-05-05 Thread Konstantin Shaposhnikov (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529771#comment-14529771
 ] 

Konstantin Shaposhnikov commented on SPARK-7249:


SPARK-7042 is somewhat related to this issue.

If my understanding is correct, the standard Akka build can be used with 
Hadoop 2.x because both of them use protobuf 2.5.

  Updated Hadoop dependencies due to inconsistency in the versions
 -

 Key: SPARK-7249
 URL: https://issues.apache.org/jira/browse/SPARK-7249
 Project: Spark
  Issue Type: Dependency upgrade
  Components: Build
Affects Versions: 1.3.1
 Environment: Ubuntu 14.04. Apache Mesos in cluster mode with HDFS 
 from cloudera 2.5.0-cdh5.3.3.
Reporter: Favio Vázquez
Priority: Blocker

 Updated Hadoop dependencies due to inconsistency in the versions. Now the 
 global properties are the ones used by the hadoop-2.2 profile, and the 
 profile was set to empty but kept for backwards compatibility reasons.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-7042) Spark version of akka-actor_2.11 is not compatible with the official akka-actor_2.11 2.3.x

2015-04-22 Thread Konstantin Shaposhnikov (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14507243#comment-14507243
 ] 

Konstantin Shaposhnikov commented on SPARK-7042:


Is my understanding correct that the custom version of Akka is required only 
for Hadoop 1?

In that case it might be a good idea to use the standard Akka jars when 
building with the hadoop-2 profile enabled.

 Spark version of akka-actor_2.11 is not compatible with the official 
 akka-actor_2.11 2.3.x
 --

 Key: SPARK-7042
 URL: https://issues.apache.org/jira/browse/SPARK-7042
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.3.1
Reporter: Konstantin Shaposhnikov

 When connecting to a remote Spark cluster (that runs Spark branch-1.3 built 
 with Scala 2.11) from an application that uses akka 2.3.9 I get the following 
 error:
 {noformat}
 2015-04-22 09:01:38,924 - [WARN] - [akka.remote.ReliableDeliverySupervisor] 
 [sparkDriver-akka.actor.default-dispatcher-5] -
 Association with remote system [akka.tcp://sparkExecutor@server:59007] has 
 failed, address is now gated for [5000] ms.
 Reason is: [akka.actor.Identify; local class incompatible: stream classdesc 
 serialVersionUID = -213377755528332889, local class serialVersionUID = 1].
 {noformat}
 It looks like akka-actor_2.11 2.3.4-spark that is used by Spark has been 
 built using Scala compiler 2.11.0 that ignores SerialVersionUID annotations 
 (see https://issues.scala-lang.org/browse/SI-8549).
 The following steps can resolve the issue:
 - re-build the custom akka library that is used by Spark with the more recent 
 version of Scala compiler (e.g. 2.11.6) 
 - deploy a new version (e.g. 2.3.4.1-spark) to a maven repo
 - update version of akka used by spark (master and 1.3 branch)
 I would also suggest to upgrade to the latest version of akka 2.3.9 (or 
 2.3.10 that should be released soon).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-7042) Spark version of akka-actor_2.11 is not compatible with the official akka-actor_2.11 2.3.x

2015-04-21 Thread Konstantin Shaposhnikov (JIRA)
Konstantin Shaposhnikov created SPARK-7042:
--

 Summary: Spark version of akka-actor_2.11 is not compatible with 
the official akka-actor_2.11 2.3.x
 Key: SPARK-7042
 URL: https://issues.apache.org/jira/browse/SPARK-7042
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.3.1
Reporter: Konstantin Shaposhnikov


When connecting to a remote Spark cluster (that runs Spark branch-1.3 built 
with Scala 2.11) from an application that uses akka 2.3.9 I get the following 
error:

{noformat}
2015-04-22 09:01:38,924 - [WARN] - [akka.remote.ReliableDeliverySupervisor] 
[sparkDriver-akka.actor.default-dispatcher-5] -
Association with remote system [akka.tcp://sparkExecutor@server:59007] has 
failed, address is now gated for [5000] ms.
Reason is: [akka.actor.Identify; local class incompatible: stream classdesc 
serialVersionUID = -213377755528332889, local class serialVersionUID = 1].
{noformat}

It looks like akka-actor_2.11 2.3.4-spark that is used by Spark has been built 
using Scala compiler 2.11.0 that ignores SerialVersionUID annotations (see 
https://issues.scala-lang.org/browse/SI-8549).
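
To make the failure mode concrete, here is a minimal sketch (the Identify 
class below is a local stand-in, not the actual Akka source) of how an 
explicit @SerialVersionUID annotation is supposed to pin the UID, and how to 
check what the compiler actually emitted:
{code}
import java.io.ObjectStreamClass

// Stand-in for a serializable message class such as akka.actor.Identify.
// A compiler that honours the annotation emits serialVersionUID = 1L;
// Scala 2.11.0 (SI-8549) silently dropped it, so the JVM computed a value
// such as -213377755528332889L instead, breaking wire compatibility.
@SerialVersionUID(1L)
case class Identify(messageId: Any)

object CheckSerialVersionUID extends App {
  // Prints 1 when the annotation was honoured, a computed hash otherwise.
  println(ObjectStreamClass.lookup(classOf[Identify]).getSerialVersionUID)
}
{code}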

The following steps can resolve the issue:
- re-build the custom akka library that is used by Spark with a more recent 
version of the Scala compiler (e.g. 2.11.6)
- deploy a new version (e.g. 2.3.4.1-spark) to a maven repo
- update the version of akka used by Spark (master and the 1.3 branch)

I would also suggest upgrading to the latest version of akka, 2.3.9 (or 
2.3.10, which should be released soon).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-6566) Update Spark to use the latest version of Parquet libraries

2015-03-29 Thread Konstantin Shaposhnikov (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14386170#comment-14386170
 ] 

Konstantin Shaposhnikov commented on SPARK-6566:


Thank you for the update [~lian cheng]

 Update Spark to use the latest version of Parquet libraries
 ---

 Key: SPARK-6566
 URL: https://issues.apache.org/jira/browse/SPARK-6566
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 1.3.0
Reporter: Konstantin Shaposhnikov

 There are a lot of bug fixes in the latest version of parquet (1.6.0rc7). 
 E.g. PARQUET-136
 It would be good to update Spark to use the latest parquet version.
 The following changes are required:
 {code}
 diff --git a/pom.xml b/pom.xml
 index 5ad39a9..095b519 100644
 --- a/pom.xml
 +++ b/pom.xml
 @@ -132,7 +132,7 @@
  <!-- Version used for internal directory structure -->
  <hive.version.short>0.13.1</hive.version.short>
  <derby.version>10.10.1.1</derby.version>
 -<parquet.version>1.6.0rc3</parquet.version>
 +<parquet.version>1.6.0rc7</parquet.version>
  <jblas.version>1.2.3</jblas.version>
  <jetty.version>8.1.14.v20131031</jetty.version>
  <orbit.version>3.0.0.v201112011016</orbit.version>
 {code}
 and
 {code}
 --- 
 a/sql/core/src/main/scala/org/apache/spark/sql/parquet/ParquetTableOperations.scala
 +++ 
 b/sql/core/src/main/scala/org/apache/spark/sql/parquet/ParquetTableOperations.scala
 @@ -480,7 +480,7 @@ private[parquet] class FilteringParquetRowInputFormat
  globalMetaData = new GlobalMetaData(globalMetaData.getSchema,
mergedMetadata, globalMetaData.getCreatedBy)
  
 -val readContext = getReadSupport(configuration).init(
 +val readContext = 
 ParquetInputFormat.getReadSupportInstance(configuration).init(
new InitContext(configuration,
  globalMetaData.getKeyValueMetaData,
  globalMetaData.getSchema))
 {code}
 I am happy to prepare a pull request if necessary.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-6566) Update Spark to use the latest version of Parquet libraries

2015-03-27 Thread Konstantin Shaposhnikov (JIRA)
Konstantin Shaposhnikov created SPARK-6566:
--

 Summary: Update Spark to use the latest version of Parquet 
libraries
 Key: SPARK-6566
 URL: https://issues.apache.org/jira/browse/SPARK-6566
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 1.3.0
Reporter: Konstantin Shaposhnikov


There are a lot of bug fixes in the latest version of parquet (1.6.0rc7). E.g. 
PARQUET-136

It would be good to update Spark to use the latest parquet version.

The following changes are required:
{code}
diff --git a/pom.xml b/pom.xml
index 5ad39a9..095b519 100644
--- a/pom.xml
+++ b/pom.xml
@@ -132,7 +132,7 @@
 <!-- Version used for internal directory structure -->
 <hive.version.short>0.13.1</hive.version.short>
 <derby.version>10.10.1.1</derby.version>
-<parquet.version>1.6.0rc3</parquet.version>
+<parquet.version>1.6.0rc7</parquet.version>
 <jblas.version>1.2.3</jblas.version>
 <jetty.version>8.1.14.v20131031</jetty.version>
 <orbit.version>3.0.0.v201112011016</orbit.version>
{code}
and
{code}
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/parquet/ParquetTableOperations.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/parquet/ParquetTableOperations.scala
@@ -480,7 +480,7 @@ private[parquet] class FilteringParquetRowInputFormat
 globalMetaData = new GlobalMetaData(globalMetaData.getSchema,
   mergedMetadata, globalMetaData.getCreatedBy)
 
-val readContext = getReadSupport(configuration).init(
+val readContext = 
ParquetInputFormat.getReadSupportInstance(configuration).init(
   new InitContext(configuration,
 globalMetaData.getKeyValueMetaData,
 globalMetaData.getSchema))

{code}

I am happy to prepare a pull request if necessary.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-6489) Optimize lateral view with explode to not read unnecessary columns

2015-03-23 Thread Konstantin Shaposhnikov (JIRA)
Konstantin Shaposhnikov created SPARK-6489:
--

 Summary: Optimize lateral view with explode to not read 
unnecessary columns
 Key: SPARK-6489
 URL: https://issues.apache.org/jira/browse/SPARK-6489
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 1.3.0
Reporter: Konstantin Shaposhnikov


Currently a query with lateral view explode(...) results in an execution plan 
that reads all columns of the underlying RDD.

E.g. given *ppl* table is DF created from Person case class:
{code}
case class Person(val name: String, val age: Int, val data: Array[Int])
{code}
the following SQL:
{code}
select name, sum(d) from ppl lateral view explode(data) d as d group by name
{code}
executes as follows:
{noformat}
== Physical Plan ==
Aggregate false, [name#0], [name#0,SUM(PartialSum#8L) AS _c1#3L]
 Exchange (HashPartitioning [name#0], 200)
  Aggregate true, [name#0], [name#0,SUM(CAST(d#6, LongType)) AS PartialSum#8L]
   Project [name#0,d#6]
Generate explode(data#2), true, false
 PhysicalRDD [name#0,age#1,data#2], MapPartitionsRDD[1] at mapPartitions at 
ExistingRDD.scala:35
{noformat}

Note that *age* column is not needed to produce the output but it is still read 
from the underlying RDD.

A sample program to demonstrate the issue:
{code}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

case class Person(val name: String, val age: Int, val data: Array[Int])

object ExplodeDemo extends App {
  val ppl = Array(
    Person("A", 20, Array(10, 12, 19)),
    Person("B", 25, Array(7, 8, 4)),
    Person("C", 19, Array(12, 4, 232)))

  val conf = new SparkConf().setMaster("local[2]").setAppName("sql")
  val sc = new SparkContext(conf)
  val sqlCtx = new HiveContext(sc)
  import sqlCtx.implicits._
  val df = sc.makeRDD(ppl).toDF
  df.registerTempTable("ppl")
  val s = sqlCtx.sql(
    "select name, sum(d) from ppl lateral view explode(data) d as d group by name")
  s.explain(true)
}
{code}
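
Until the optimizer prunes such columns automatically, one possible workaround 
(a hypothetical sketch, reusing df and sqlCtx from the sample above; the 
ppl_narrow table name is made up) is to register a projected temporary table 
that exposes only the columns the query needs:
{code}
// Hypothetical workaround sketch: expose only the columns the query uses.
// Whether the underlying scan actually narrows depends on the relation
// (e.g. a cached table supports column pruning, a plain ExistingRDD does not).
val projected = df.select("name", "data")
projected.registerTempTable("ppl_narrow")
val s2 = sqlCtx.sql(
  "select name, sum(d) from ppl_narrow lateral view explode(data) d as d group by name")
s2.explain(true)
{code}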



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-6489) Optimize lateral view with explode to not read unnecessary columns

2015-03-23 Thread Konstantin Shaposhnikov (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shaposhnikov updated SPARK-6489:
---
Description: 
Currently a query with lateral view explode(...) results in an execution plan 
that reads all columns of the underlying RDD.

E.g. given *ppl* table is DF created from Person case class:
{code}
case class Person(val name: String, val age: Int, val data: Array[Int])
{code}
the following SQL:
{code}
select name, sum(d) from ppl lateral view explode(data) d as d group by name
{code}
executes as follows:
{noformat}
== Physical Plan ==
Aggregate false, [name#0], [name#0,SUM(PartialSum#38L) AS _c1#18L]
 Exchange (HashPartitioning [name#0], 200)
  Aggregate true, [name#0], [name#0,SUM(CAST(d#21, LongType)) AS PartialSum#38L]
   Project [name#0,d#21]
Generate explode(data#2), true, false
 InMemoryColumnarTableScan [name#0,age#1,data#2], [], (InMemoryRelation 
[name#0,age#1,data#2], true, 1, StorageLevel(true, true, false, true, 1), 
(PhysicalRDD [name#0,age#1,data#2], MapPartitionsRDD[1] at mapPartitions at 
ExistingRDD.scala:35), Some(ppl))
{noformat}

Note that *age* column is not needed to produce the output but it is still read 
from the underlying RDD.

A sample program to demonstrate the issue:
{code}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

case class Person(val name: String, val age: Int, val data: Array[Int])

object ExplodeDemo extends App {
  val ppl = Array(
    Person("A", 20, Array(10, 12, 19)),
    Person("B", 25, Array(7, 8, 4)),
    Person("C", 19, Array(12, 4, 232)))

  val conf = new SparkConf().setMaster("local[2]").setAppName("sql")
  val sc = new SparkContext(conf)
  val sqlCtx = new HiveContext(sc)
  import sqlCtx.implicits._
  val df = sc.makeRDD(ppl).toDF
  df.registerTempTable("ppl")
  // cache the table, otherwise ExistingRDD is used, which does not support column pruning
  sqlCtx.cacheTable("ppl")
  val s = sqlCtx.sql(
    "select name, sum(d) from ppl lateral view explode(data) d as d group by name")
  s.explain(true)
}
{code}

  was:
Currently a query with lateral view explode(...) results in an execution plan 
that reads all columns of the underlying RDD.

E.g. given *ppl* table is DF created from Person case class:
{code}
case class Person(val name: String, val age: Int, val data: Array[Int])
{code}
the following SQL:
{code}
select name, sum(d) from ppl lateral view explode(data) d as d group by name
{code}
executes as follows:
{noformat}
== Physical Plan ==
Aggregate false, [name#0], [name#0,SUM(PartialSum#8L) AS _c1#3L]
 Exchange (HashPartitioning [name#0], 200)
  Aggregate true, [name#0], [name#0,SUM(CAST(d#6, LongType)) AS PartialSum#8L]
   Project [name#0,d#6]
Generate explode(data#2), true, false
 PhysicalRDD [name#0,age#1,data#2], MapPartitionsRDD[1] at mapPartitions at 
ExistingRDD.scala:35
{noformat}

Note that *age* column is not needed to produce the output but it is still read 
from the underlying RDD.

A sample program to demonstrate the issue:
{code}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

case class Person(val name: String, val age: Int, val data: Array[Int])

object ExplodeDemo extends App {
  val ppl = Array(
    Person("A", 20, Array(10, 12, 19)),
    Person("B", 25, Array(7, 8, 4)),
    Person("C", 19, Array(12, 4, 232)))

  val conf = new SparkConf().setMaster("local[2]").setAppName("sql")
  val sc = new SparkContext(conf)
  val sqlCtx = new HiveContext(sc)
  import sqlCtx.implicits._
  val df = sc.makeRDD(ppl).toDF
  df.registerTempTable("ppl")
  val s = sqlCtx.sql(
    "select name, sum(d) from ppl lateral view explode(data) d as d group by name")
  s.explain(true)
}
{code}


 Optimize lateral view with explode to not read unnecessary columns
 --

 Key: SPARK-6489
 URL: https://issues.apache.org/jira/browse/SPARK-6489
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 1.3.0
Reporter: Konstantin Shaposhnikov

 Currently a query with lateral view explode(...) results in an execution 
 plan that reads all columns of the underlying RDD.
 E.g. given *ppl* table is DF created from Person case class:
 {code}
 case class Person(val name: String, val age: Int, val data: Array[Int])
 {code}
 the following SQL:
 {code}
 select name, sum(d) from ppl lateral view explode(data) d as d group by name
 {code}
 executes as follows:
 {noformat}
 == Physical Plan ==
 Aggregate false, [name#0], [name#0,SUM(PartialSum#38L) AS _c1#18L]
  Exchange (HashPartitioning [name#0], 200)
   Aggregate true, [name#0], [name#0,SUM(CAST(d#21, LongType)) AS 
 PartialSum#38L]
Project [name#0,d#21]
 Generate explode(data#2), true, false
  InMemoryColumnarTableScan [name#0,age#1,data#2], [], (InMemoryRelation 
 [name#0,age#1,data#2], true, 1, StorageLevel(true, true, false, true, 1), 
 (PhysicalRDD [name#0,age#1,data#2], MapPartitionsRDD[1] at mapPartitions at 
 ExistingRDD.scala:35),