[jira] [Created] (SPARK-12121) Remote Spark init.sh called from spark_ec2.py points to incorrect prebuilt image URL
Andre Schumacher created SPARK-12121:
-------------------------------------

Summary: Remote Spark init.sh called from spark_ec2.py points to incorrect prebuilt image URL
Key: SPARK-12121
URL: https://issues.apache.org/jira/browse/SPARK-12121
Project: Spark
Issue Type: Bug
Components: EC2
Affects Versions: 1.5.2
Reporter: Andre Schumacher

The tarball for 1.5.2 with Hadoop 1 (the default) has the Scala version appended to the file name, which causes spark_ec2.py to fail when starting up the cluster. Here is the record from the S3 bucket listing:

spark-1.5.2-bin-hadoop1-scala2.11.tgz 2015-11-10T06:45:17.000Z "056fc68e549db27d986da707f19e39c8-4" 234574403 STANDARD

Maybe one could provide a tarball without the Scala suffix (the default naming)? A workaround is to set the Hadoop version to a version other than 1 when calling spark-ec2.
[jira] [Created] (SPARK-6462) UpdateStateByKey should allow inner join of new with old keys
Andre Schumacher created SPARK-6462:
------------------------------------

Summary: UpdateStateByKey should allow inner join of new with old keys
Key: SPARK-6462
URL: https://issues.apache.org/jira/browse/SPARK-6462
Project: Spark
Issue Type: Improvement
Components: Streaming
Affects Versions: 1.3.0
Reporter: Andre Schumacher

In a nutshell: provide an inner join instead of a cogroup for updateStateByKey in StateDStream.

Details: It is common to read data (say, weblog data) from a streaming source (say, Kafka) and each time update the state of a relatively small number of keys. If only the state changes need to be propagated to a downstream sink, then one could avoid filtering out unchanged state in the user program and instead provide this functionality in the API (say, by adding an updateStateChangesByKey method); see the sketch below.

Note that this is related but not identical to: https://issues.apache.org/jira/browse/SPARK-2629
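A minimal user-level sketch of the pattern described above, assuming a DStream of per-key counts; the proposed updateStateChangesByKey does not exist, so this workaround tracks a per-key "changed" flag by hand and filters on it afterwards:

{code}
import org.apache.spark.streaming.dstream.DStream
import org.apache.spark.streaming.StreamingContext._   // pair operations (needed pre-1.3)

// Workaround sketch: carry a dirty flag in the state so that only keys
// updated in this batch are propagated downstream. updateStateByKey still
// cogroups over the full state internally; the proposed API would avoid that.
def trackChanges(events: DStream[(String, Long)]): DStream[(String, Long)] = {
  val updated = events.updateStateByKey[(Long, Boolean)] {
    (newValues: Seq[Long], state: Option[(Long, Boolean)]) =>
      val old = state.map(_._1).getOrElse(0L)
      if (newValues.isEmpty) Some((old, false))   // key untouched in this batch
      else Some((old + newValues.sum, true))      // key updated in this batch
  }
  updated.filter { case (_, (_, changed)) => changed }
         .mapValues { case (value, _) => value }
}
{code}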
[jira] [Commented] (SPARK-2620) case class cannot be used as key for reduce
[ https://issues.apache.org/jira/browse/SPARK-2620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14194576#comment-14194576 ]

Andre Schumacher commented on SPARK-2620:
-----------------------------------------

I also bumped into this issue (on Spark 1.1.0) and it is extremely annoying, even though it only affects the REPL. Is anybody actively working on resolving this? Given that it's already a few months old: are there any blockers to making this work? Matei mentioned the way code is wrapped inside the REPL.

case class cannot be used as key for reduce
-------------------------------------------

Key: SPARK-2620
URL: https://issues.apache.org/jira/browse/SPARK-2620
Project: Spark
Issue Type: Bug
Components: Spark Core
Affects Versions: 1.0.0, 1.1.0
Environment: reproduced on spark-shell local[4]
Reporter: Gerard Maas
Priority: Critical
Labels: case-class, core

Using a case class as a key doesn't seem to work properly in Spark 1.0.0. A minimal example:

{code}
case class P(name: String)
val ps = Array(P("alice"), P("bob"), P("charly"), P("bob"))
sc.parallelize(ps).map(x => (x, 1)).reduceByKey((x, y) => x + y).collect
// [Spark shell local mode]
// res: Array[(P, Int)] = Array((P(bob),1), (P(bob),1), (P(abe),1), (P(charly),1))
{code}

This contrasts with the expected behavior, which should be equivalent to:

{code}
sc.parallelize(ps).map(x => (x.name, 1)).reduceByKey((x, y) => x + y).collect
// Array[(String, Int)] = Array((charly,1), (abe,1), (bob,2))
{code}

groupByKey and distinct also exhibit the same behavior.
[jira] [Commented] (SPARK-2195) Parquet extraMetadata can contain key information
[ https://issues.apache.org/jira/browse/SPARK-2195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14038661#comment-14038661 ]

Andre Schumacher commented on SPARK-2195:
-----------------------------------------

Since commit https://github.com/apache/spark/commit/f479cf3743e416ee08e62806e1b34aff5998ac22 the path is no longer stored in the extraMetadata. So I guess this issue can be closed?

Parquet extraMetadata can contain key information
-------------------------------------------------

Key: SPARK-2195
URL: https://issues.apache.org/jira/browse/SPARK-2195
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 1.0.0
Reporter: Michael Armbrust
Priority: Blocker

{code}
14/06/19 01:52:05 INFO NewHadoopRDD: Input split: ParquetInputSplit{
  part: file:/Users/pat/Projects/spark-summit-training-2014/usb/data/wiki-parquet/part-r-1.parquet
  start: 0 length: 24971040 hosts: [localhost] blocks: 1
  requestedSchema: same as file
  fileSchema: message root {
    optional int32 id;
    optional binary title;
    optional int64 modified;
    optional binary text;
    optional binary username;
  }
  extraMetadata: {org.apache.spark.sql.parquet.row.metadata=StructType(List(StructField(id,IntegerType,true), StructField(title,StringType,true), StructField(modified,LongType,true), StructField(text,StringType,true), StructField(username,StringType,true))), path= MY AWS KEYS!!! }
  readSupportMetadata: {org.apache.spark.sql.parquet.row.metadata=StructType(List(StructField(id,IntegerType,true), StructField(title,StringType,true), StructField(modified,LongType,true), StructField(text,StringType,true), StructField(username,StringType,true))), path= MY AWS KEYS *** }}
{code}
[jira] [Commented] (SPARK-2112) ParquetTypesConverter should not create its own conf
[ https://issues.apache.org/jira/browse/SPARK-2112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14038666#comment-14038666 ]

Andre Schumacher commented on SPARK-2112:
-----------------------------------------

Since commit https://github.com/apache/spark/commit/f479cf3743e416ee08e62806e1b34aff5998ac22 the SparkContext's Hadoop configuration should be used when reading metadata from the file source. I wasn't yet able to test this with, say, S3 bucket names. Are the S3 credentials copied from the SparkConf to its Hadoop configuration? If someone could confirm this to be working, we could close this issue.

ParquetTypesConverter should not create its own conf
----------------------------------------------------

Key: SPARK-2112
URL: https://issues.apache.org/jira/browse/SPARK-2112
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 1.0.0
Reporter: Michael Armbrust

[~adav]: this actually makes it so that we can't use S3 credentials set in the SparkContext, or add new FileSystems at runtime, for instance.
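A rough sketch of such a check, assuming a spark-shell session with a SQLContext bound to sqlContext; the bucket name and credential placeholders are made up, and s3n is just one of the S3 filesystem schemes:

{code}
// Set S3 credentials on the SparkContext's Hadoop configuration and read
// Parquet metadata through it. If ParquetTypesConverter picks up this
// configuration instead of creating its own, the read should succeed.
sc.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", "<ACCESS_KEY>")
sc.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", "<SECRET_KEY>")

val data = sqlContext.parquetFile("s3n://some-bucket/some-table.parquet")
println(data.count())   // forces a metadata read followed by a scan
{code}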
[jira] [Commented] (SPARK-1982) saveToParquetFile doesn't support ByteType
[ https://issues.apache.org/jira/browse/SPARK-1982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14014945#comment-14014945 ]

Andre Schumacher commented on SPARK-1982:
-----------------------------------------

It turns out that ByteType primitive types weren't correctly treated earlier. Since Parquet doesn't have these, one fix is to use fixed-length byte arrays (which are also treated as primitives). This is fine until there is support for nested types; even then, I think one may want to treat these as actual arrays and not primitives. Anyway, a PR is available here: https://github.com/apache/spark/pull/934

saveToParquetFile doesn't support ByteType
------------------------------------------

Key: SPARK-1982
URL: https://issues.apache.org/jira/browse/SPARK-1982
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 1.0.0
Reporter: Michael Armbrust
Assignee: Andre Schumacher

{code}
java.lang.RuntimeException: Unsupported datatype ByteType
    at scala.sys.package$.error(package.scala:27)
    at org.apache.spark.sql.parquet.ParquetTypesConverter$.fromDataType(ParquetRelation.scala:201)
    ...
{code}
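For reference, a minimal reproduction sketch against the 1.0 API (the class name and output path are made up) that fails with the exception above before the fix:

{code}
// A case class with a Byte field, converted implicitly to a SchemaRDD and
// written out as Parquet.
case class ByteRecord(b: Byte, label: String)

val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.createSchemaRDD   // implicit RDD[case class] => SchemaRDD

val records = sc.parallelize(Seq(ByteRecord(1, "one"), ByteRecord(2, "two")))
records.saveAsParquetFile("/tmp/byte-records.parquet")   // threw "Unsupported datatype ByteType"
{code}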
[jira] [Commented] (SPARK-1913) Parquet table column pruning error caused by filter pushdown
[ https://issues.apache.org/jira/browse/SPARK-1913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14008193#comment-14008193 ]

Andre Schumacher commented on SPARK-1913:
-----------------------------------------

Sigh, can't believe I missed this when adding the original tests. Good that there is a fix now.

Parquet table column pruning error caused by filter pushdown
------------------------------------------------------------

Key: SPARK-1913
URL: https://issues.apache.org/jira/browse/SPARK-1913
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 1.0.0
Environment: Mac OS 10.9.2
Reporter: Chen Chao

When scanning Parquet tables, attributes referenced only in predicates that are pushed down are not passed to the `ParquetTableScan` operator, which causes an exception. Verified in the {{sbt hive/console}}:

{code}
loadTestTable("src")
table("src").saveAsParquetFile("src.parquet")
parquetFile("src.parquet").registerAsTable("src_parquet")
hql("SELECT value FROM src_parquet WHERE key < 10").collect().foreach(println)
{code}

Exception:

{code}
parquet.io.ParquetDecodingException: Can not read value at 0 in block -1 in file file:/scratch/rxin/spark/src.parquet/part-r-2.parquet
    at parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:177)
    at parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:130)
    at org.apache.spark.rdd.NewHadoopRDD$$anon$1.hasNext(NewHadoopRDD.scala:122)
    at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
    at scala.collection.Iterator$$anon$14.hasNext(Iterator.scala:388)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
    at scala.collection.Iterator$class.foreach(Iterator.scala:727)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
    at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
    at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
    at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
    at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
    at scala.collection.AbstractIterator.to(Iterator.scala:1157)
    at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
    at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
    at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
    at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
    at org.apache.spark.rdd.RDD$$anonfun$15.apply(RDD.scala:717)
    at org.apache.spark.rdd.RDD$$anonfun$15.apply(RDD.scala:717)
    at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1080)
    at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1080)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111)
    at org.apache.spark.scheduler.Task.run(Task.scala:51)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:187)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
Caused by: java.lang.IllegalArgumentException: Column key does not exist.
    at parquet.filter.ColumnRecordFilter$1.bind(ColumnRecordFilter.java:51)
    at org.apache.spark.sql.parquet.ComparisonFilter.bind(ParquetFilters.scala:306)
    at parquet.io.FilteredRecordReader.init(FilteredRecordReader.java:46)
    at parquet.io.MessageColumnIO.getRecordReader(MessageColumnIO.java:74)
    at parquet.hadoop.InternalParquetRecordReader.checkRead(InternalParquetRecordReader.java:110)
    at parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:172)
    ... 28 more
{code}
[jira] [Updated] (SPARK-1913) Parquet table column pruning error caused by filter pushdown
[ https://issues.apache.org/jira/browse/SPARK-1913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andre Schumacher updated SPARK-1913:
------------------------------------

Affects Version/s: 1.1.0 (was: 1.0.0)

Parquet table column pruning error caused by filter pushdown
------------------------------------------------------------

Key: SPARK-1913
URL: https://issues.apache.org/jira/browse/SPARK-1913
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 1.1.0
Environment: Mac OS 10.9.2
Reporter: Chen Chao

When scanning Parquet tables, attributes referenced only in predicates that are pushed down are not passed to the `ParquetTableScan` operator, which causes an exception. Verified in the {{sbt hive/console}}:

{code}
loadTestTable("src")
table("src").saveAsParquetFile("src.parquet")
parquetFile("src.parquet").registerAsTable("src_parquet")
hql("SELECT value FROM src_parquet WHERE key < 10").collect().foreach(println)
{code}

Exception (same ParquetDecodingException and "Column key does not exist" stack trace as quoted in full above):

{code}
parquet.io.ParquetDecodingException: Can not read value at 0 in block -1 in file file:/scratch/rxin/spark/src.parquet/part-r-2.parquet
    at parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:177)
    ...
Caused by: java.lang.IllegalArgumentException: Column key does not exist.
    at parquet.filter.ColumnRecordFilter$1.bind(ColumnRecordFilter.java:51)
    at org.apache.spark.sql.parquet.ComparisonFilter.bind(ParquetFilters.scala:306)
    ... 28 more
{code}
[jira] [Updated] (SPARK-1487) Support record filtering via predicate pushdown in Parquet
[ https://issues.apache.org/jira/browse/SPARK-1487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andre Schumacher updated SPARK-1487:
------------------------------------

Affects Version/s: 1.1.0 (was: 1.0.0)

Support record filtering via predicate pushdown in Parquet
----------------------------------------------------------

Key: SPARK-1487
URL: https://issues.apache.org/jira/browse/SPARK-1487
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 1.1.0
Reporter: Andre Schumacher
Assignee: Andre Schumacher
Fix For: 1.1.0

Parquet has support for column filters, which can be used to avoid reading and de-serializing records that fail the column filter condition. This can lead to potentially large savings, depending on the number of columns filtered by and how many records actually pass the filter.
[jira] [Closed] (SPARK-1487) Support record filtering via predicate pushdown in Parquet
[ https://issues.apache.org/jira/browse/SPARK-1487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andre Schumacher closed SPARK-1487.
-----------------------------------

Resolution: Fixed

Support record filtering via predicate pushdown in Parquet
----------------------------------------------------------

Key: SPARK-1487
URL: https://issues.apache.org/jira/browse/SPARK-1487
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 1.1.0
Reporter: Andre Schumacher
Assignee: Andre Schumacher
Fix For: 1.1.0

Parquet has support for column filters, which can be used to avoid reading and de-serializing records that fail the column filter condition. This can lead to potentially large savings, depending on the number of columns filtered by and how many records actually pass the filter.
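As an illustration of where this pays off (the table and column names are hypothetical), a selective scan like the following only needs to materialize records whose key passes the filter once the predicate is pushed into the Parquet reader:

{code}
// With record filtering pushed down, rows failing "key < 10" can be skipped
// inside Parquet instead of being deserialized and then filtered in Spark.
val selective = sqlContext.sql("SELECT value FROM events_parquet WHERE key < 10")
selective.collect()
{code}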
[jira] [Commented] (SPARK-1649) DataType should contain nullable bit
[ https://issues.apache.org/jira/browse/SPARK-1649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13983963#comment-13983963 ]

Andre Schumacher commented on SPARK-1649:
-----------------------------------------

OK, I now understand that this would be a bigger change: it's not just struct fields for nested types, but also array element types, map value types, etc. IMHO it would be cleaner to have it inside the DataType. But since this seems to be mostly relevant only for nested types, could one have a special DataType for them, something like NestedDataType(val nullable: Boolean) extends DataType?

DataType should contain nullable bit
------------------------------------

Key: SPARK-1649
URL: https://issues.apache.org/jira/browse/SPARK-1649
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 1.1.0
Reporter: Andre Schumacher
Priority: Critical

For the underlying storage layer it would simplify things such as schema conversions and predicate filter determination to record in the data type itself whether a column can be nullable. The DataType type could then look like this:

abstract class DataType(nullable: Boolean = true)

Concrete subclasses could then override the nullable val. Mostly this could be left at the default, but when types can be contained in nested types one could optimize for, e.g., arrays with elements that are nullable and those that are not.
[jira] [Commented] (SPARK-1649) DataType should contain nullable bit
[ https://issues.apache.org/jira/browse/SPARK-1649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13983985#comment-13983985 ]

Andre Schumacher commented on SPARK-1649:
-----------------------------------------

Thinking about it a bit longer: could Nullable maybe be a mixin? But what should the default be, nullable or not nullable?

DataType should contain nullable bit
------------------------------------

Key: SPARK-1649
URL: https://issues.apache.org/jira/browse/SPARK-1649
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 1.1.0
Reporter: Andre Schumacher
Priority: Critical

For the underlying storage layer it would simplify things such as schema conversions and predicate filter determination to record in the data type itself whether a column can be nullable. The DataType type could then look like this:

abstract class DataType(nullable: Boolean = true)

Concrete subclasses could then override the nullable val. Mostly this could be left at the default, but when types can be contained in nested types one could optimize for, e.g., arrays with elements that are nullable and those that are not.
[jira] [Created] (SPARK-1649) DataType should contain nullable bit
Andre Schumacher created SPARK-1649:
------------------------------------

Summary: DataType should contain nullable bit
Key: SPARK-1649
URL: https://issues.apache.org/jira/browse/SPARK-1649
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 1.1.0
Reporter: Andre Schumacher
Priority: Critical

For the underlying storage layer it would simplify things such as schema conversions and predicate filter determination to record in the data type itself whether a column can be nullable. The DataType type could then look like this:

abstract class DataType(nullable: Boolean = true)

Concrete subclasses could then override the nullable val. Mostly this could be left at the default, but when types can be contained in nested types one could optimize for, e.g., arrays with elements that are nullable and those that are not.
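A sketch of the shape proposed here, purely illustrative and not the actual Spark SQL class hierarchy:

{code}
// The nullable bit lives in DataType itself; subclasses override it where a
// tighter guarantee is known.
abstract class DataType(val nullable: Boolean = true)

// A primitive type could opt out of nullability entirely:
case object IntegerType extends DataType(nullable = false)

// Nested types can then distinguish nullable from non-nullable elements:
case class ArrayType(elementType: DataType, elementsNullable: Boolean)
  extends DataType(nullable = true)
{code}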
[jira] [Commented] (SPARK-1293) Support for reading/writing complex types in Parquet
[ https://issues.apache.org/jira/browse/SPARK-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13968332#comment-13968332 ]

Andre Schumacher commented on SPARK-1293:
-----------------------------------------

There is a preliminary pull request now: https://github.com/apache/spark/pull/360

Support for reading/writing complex types in Parquet
----------------------------------------------------

Key: SPARK-1293
URL: https://issues.apache.org/jira/browse/SPARK-1293
Project: Spark
Issue Type: Improvement
Components: SQL
Reporter: Michael Armbrust
Assignee: Andre Schumacher
Fix For: 1.1.0

Complex types include: Arrays, Maps, and Nested rows (structs).
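For concreteness, the kinds of schemas this covers (the names are made up), expressed as case classes that map onto Parquet's nested groups:

{code}
// A schema exercising all three complex types: a nested row (struct), an
// array, and a map.
case class Location(lat: Double, lon: Double)   // nested row / struct
case class Event(
    id: Int,
    tags: Seq[String],             // array
    counters: Map[String, Long],   // map
    where: Location)               // struct
{code}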
[jira] [Resolved] (SPARK-1383) Spark-SQL: ParquetRelation improvements
[ https://issues.apache.org/jira/browse/SPARK-1383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andre Schumacher resolved SPARK-1383.
-------------------------------------

Resolution: Fixed

Fixed by https://github.com/apache/spark/commit/fbebaedf26286ee8a75065822a3af1148351f828

Spark-SQL: ParquetRelation improvements
---------------------------------------

Key: SPARK-1383
URL: https://issues.apache.org/jira/browse/SPARK-1383
Project: Spark
Issue Type: Improvement
Affects Versions: 1.0.0
Reporter: Andre Schumacher
Assignee: Andre Schumacher

Improve Spark-SQL's ParquetRelation as follows:
- Instead of single files, a ParquetRelation should be backed by a directory, which simplifies importing data from other sources
- The InsertIntoParquetTable operation should support switching between overwriting and appending (at least in HiveQL)
- Tests should use the new API
- Parquet logging should be forwarded to Log4J
- It should be possible to enable compression (default compression for Parquet files: GZIP, as in parquet-mr)
- OverwriteCatalog should support dropping of tables
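The overwrite-vs-append switch mentioned above would look roughly like this in HiveQL (the table names are hypothetical), using the hql entry point that appears elsewhere in these issues:

{code}
// Appending keeps the existing Parquet files in the backing directory;
// overwriting replaces them.
hql("INSERT INTO TABLE dest_parquet SELECT * FROM src")        // append
hql("INSERT OVERWRITE TABLE dest_parquet SELECT * FROM src")   // overwrite
{code}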
[jira] [Assigned] (SPARK-1367) NPE when joining Parquet Relations
[ https://issues.apache.org/jira/browse/SPARK-1367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andre Schumacher reassigned SPARK-1367:
---------------------------------------

Assignee: Andre Schumacher (was: Michael Armbrust)

NPE when joining Parquet Relations
----------------------------------

Key: SPARK-1367
URL: https://issues.apache.org/jira/browse/SPARK-1367
Project: Spark
Issue Type: Bug
Components: SQL
Reporter: Michael Armbrust
Assignee: Andre Schumacher
Priority: Blocker
Fix For: 1.0.0

{code}
test("self-join parquet files") {
  val x = ParquetTestData.testData.subquery('x)
  val y = ParquetTestData.testData.newInstance.subquery('y)
  val query = x.join(y).where("x.myint".attr === "y.myint".attr)
  query.collect()
}
{code}