HiveContext: cache table not supported for partitioned table?
Hi, In the Spark 1.1 HiveContext, I ran a command to create a partitioned table, followed by a CACHE TABLE command, and got a java.sql.SQLSyntaxErrorException: Table/View 'PARTITIONS' does not exist. CACHE TABLE worked fine if the table was not partitioned. Can anybody confirm that caching a partitioned table is not supported yet in the current version? Thanks, Du
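A minimal sketch of the two cases, assuming the Spark 1.1 HiveContext API and hypothetical table names (plain_t, part_t):

  import org.apache.spark.{SparkConf, SparkContext}
  import org.apache.spark.sql.hive.HiveContext

  val sc = new SparkContext(new SparkConf().setAppName("cache-table-test"))
  val hc = new HiveContext(sc)

  // Unpartitioned table: CACHE TABLE works.
  hc.sql("CREATE TABLE plain_t (k STRING, v STRING)")
  hc.sql("CACHE TABLE plain_t")

  // Partitioned table: CACHE TABLE reportedly fails with
  // java.sql.SQLSyntaxErrorException: Table/View 'PARTITIONS' does not exist.
  hc.sql("CREATE TABLE part_t (k STRING, v STRING) PARTITIONED BY (dt STRING)")
  hc.sql("CACHE TABLE part_t")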
Re: SparkSQL: map type MatchError when inserting into Hive table
It turned out to be a bug in my code. In the select clause, the list of fields was misaligned with the schema of the target table. As a consequence, the map data couldn't be cast to some other type in the schema. Thanks anyway. On 9/26/14, 8:08 PM, Cheng Lian lian.cs@gmail.com wrote: Would you mind providing the DDL of this partitioned table together with the query you tried? The stacktrace suggests that the query was trying to cast a map into something else, which is not supported in Spark SQL. And I doubt whether Hive supports casting a complex type to some other type. On 9/27/14 7:48 AM, Du Li wrote: Hi, I was loading data into a partitioned table on the Spark 1.1.0 beeline/thriftserver. The table has complex data types such as map<string,string> and array<map<string,string>>. The query is like "insert overwrite table a partition (…) select …", and the select clause worked when run separately. However, when running the insert query, there was an error as follows. The source code of Cast.scala seems to only handle the primitive data types, which is perhaps why the MatchError was thrown. I just wonder if this is still work in progress, or whether I should do it differently. Thanks, Du

scala.MatchError: MapType(StringType,StringType,true) (of class org.apache.spark.sql.catalyst.types.MapType)
 org.apache.spark.sql.catalyst.expressions.Cast.cast$lzycompute(Cast.scala:247)
 org.apache.spark.sql.catalyst.expressions.Cast.cast(Cast.scala:247)
 org.apache.spark.sql.catalyst.expressions.Cast.eval(Cast.scala:263)
 org.apache.spark.sql.catalyst.expressions.Alias.eval(namedExpressions.scala:84)
 org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection.apply(Projection.scala:66)
 org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection.apply(Projection.scala:50)
 scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
 scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
 org.apache.spark.sql.hive.execution.InsertIntoHiveTable.org$apache$spark$sql$hive$execution$InsertIntoHiveTable$$writeToFile$1(InsertIntoHiveTable.scala:149)
 org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$saveAsHiveFile$1.apply(InsertIntoHiveTable.scala:158)
 org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$saveAsHiveFile$1.apply(InsertIntoHiveTable.scala:158)
 org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
 org.apache.spark.scheduler.Task.run(Task.scala:54)
 org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 java.lang.Thread.run(Thread.java:722)
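For reference, a minimal sketch of the failure mode described above, with hypothetical table and column names (staging is assumed to have columns id and props):

  import org.apache.spark.sql.hive.HiveContext

  val hc = new HiveContext(sc)
  hc.sql("CREATE TABLE a (id STRING, props MAP<STRING,STRING>) PARTITIONED BY (dt STRING)")

  // Misaligned select list: the map column lines up with the STRING column,
  // so a cast from MapType is attempted and Cast.scala throws the MatchError.
  hc.sql("INSERT OVERWRITE TABLE a PARTITION (dt='2014-09-27') SELECT props, id FROM staging")

  // A select list aligned with the target schema works:
  hc.sql("INSERT OVERWRITE TABLE a PARTITION (dt='2014-09-27') SELECT id, props FROM staging")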
view not supported in spark thrift server?
Can anybody confirm whether or not views are currently supported in Spark? I found "create view translate" in the blacklist of HiveCompatibilitySuite.scala, and the following scenario threw a NullPointerException on beeline/thriftserver (1.1.0). Any plan to support it soon?

create table src(k string, v string);
load data local inpath '/home/y/share/yspark/examples/src/main/resources/kv1.txt' into table src;
create view kv as select k, v from src;
select * from kv;
Error: java.lang.NullPointerException (state=,code=0)
Re: view not supported in spark thrift server?
Thanks, Michael, for your quick response. Views are critical for my project, which is migrating from Shark to Spark SQL. I have implemented and tested everything else. It would be perfect if views could be implemented soon. Du

From: Michael Armbrust mich...@databricks.com
Date: Sunday, September 28, 2014 at 12:13 PM
To: Du Li l...@yahoo-inc.com.invalid
Cc: dev@spark.apache.org, u...@spark.apache.org
Subject: Re: view not supported in spark thrift server?

Views are not supported yet. It's not currently on the near-term roadmap, but that can change if there is sufficient demand or someone in the community is interested in implementing them. I do not think it would be very hard. Michael
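Until views are implemented, one possible stopgap is to register the view's defining query as a temporary table. A sketch against the Spark 1.1 API; note the temporary table lives only in the HiveContext that registered it, so this is not a general replacement for views in the thrift server:

  import org.apache.spark.sql.hive.HiveContext

  val hc = new HiveContext(sc)
  // Register the defining query under a name, like a session-local view.
  hc.sql("SELECT k, v FROM src").registerTempTable("kv")
  hc.sql("SELECT * FROM kv").collect().foreach(println)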
Re: Spark SQL use of alias in where clause
Thanks, Yanbo and Nicholas. Now it makes more sense: query optimization is the answer. /Du

From: Nicholas Chammas nicholas.cham...@gmail.com
Date: Thursday, September 25, 2014 at 6:43 AM
To: Yanbo Liang yanboha...@gmail.com
Cc: Du Li l...@yahoo-inc.com.invalid, dev@spark.apache.org, u...@spark.apache.org
Subject: Re: Spark SQL use of alias in where clause

That is correct. Aliases in the SELECT clause can only be referenced in the ORDER BY and HAVING clauses. Otherwise, you'll have to just repeat the expression, as with concat() in this case. A more elegant alternative, which is probably not available in Spark SQL yet, is to use Common Table Expressions (http://technet.microsoft.com/en-us/library/ms190766(v=sql.105).aspx).

On Wed, Sep 24, 2014 at 11:32 PM, Yanbo Liang yanboha...@gmail.com wrote: Maybe it's just the way SQL works. The select list is evaluated after the where filter is applied, so you cannot reference an alias declared in the select list from the where clause. Hive and Oracle behave the same as Spark SQL.
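Besides repeating the expression, the alias can also be pushed into a subquery and filtered in the outer query; a sketch using the table from the original post:

  val result = hc.sql("""
    SELECT key, value, combined
    FROM (SELECT key, value, concat(key, value) AS combined FROM src) t
    WHERE combined LIKE '11%'
  """)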
Spark SQL use of alias in where clause
Hi, The following query does not work in Shark, nor in the new Spark SQLContext or HiveContext:

SELECT key, value, concat(key, value) as combined from src where combined like '11%';

The following tweak of the syntax works fine, although it is a bit ugly:

SELECT key, value, concat(key, value) as combined from src where concat(key, value) like '11%' order by combined;

Are you going to support aliases in the where clause soon? Thanks, Du
Re: NullWritable not serializable
Hi, The test case is separated out as follows. The call to rdd2.first() breaks when the Spark version is changed to 1.1.0, reporting the exception that NullWritable is not serializable. However, the same test passed with Spark 1.0.2. The pom.xml file is attached. The test data README.md was copied from Spark. Thanks, Du

package com.company.project.test

import org.scalatest._

class WritableTestSuite extends FunSuite {
  test("generated sequence file should be readable from spark") {
    import org.apache.hadoop.io.{NullWritable, Text}
    import org.apache.spark.{SparkContext, SparkConf}
    import org.apache.spark.SparkContext._

    val conf = new SparkConf(false).setMaster("local").setAppName("test data exchange with spark")
    val sc = new SparkContext(conf)
    val rdd = sc.textFile("README.md")
    val res = rdd.map(x => (NullWritable.get(), new Text(x)))
    res.saveAsSequenceFile("./test_data")
    val rdd2 = sc.sequenceFile("./test_data", classOf[NullWritable], classOf[Text])
    assert(rdd.first == rdd2.first._2.toString)
  }
}

From: Matei Zaharia matei.zaha...@gmail.com
Date: Monday, September 15, 2014 at 10:52 PM
To: Du Li l...@yahoo-inc.com
Cc: u...@spark.apache.org, dev@spark.apache.org
Subject: Re: NullWritable not serializable

Can you post the exact code for the test that worked in 1.0? I can't think of much that could've changed. The one possibility is if we had some operations that were computed locally on the driver (this happens with things like first() and take(), which will try to do the first partition locally). But generally speaking these operations should *not* work over a network, so you'll have to make sure that you only send serializable types through shuffles or collects, or use a serialization framework like Kryo that might be okay with Writables. Matei

On September 15, 2014 at 9:13:13 PM, Du Li (l...@yahoo-inc.com) wrote: Hi Matei, Thanks for your reply. The Writable classes have never been serializable, and this is why it is weird. I did try, as you suggested, to map the Writables to integers and strings. It didn't pass either; similar exceptions were thrown, except that the messages said IntWritable and Text are not serializable. The reason is the implicits defined in the SparkContext object that convert those values into their corresponding Writable classes before saving the data in a sequence file. My original code was actually some test cases to try out the SequenceFile-related APIs. The tests all passed when the Spark version was specified as 1.0.2, but this one failed after I changed the Spark version to 1.1.0, the new release; nothing else changed. In addition, it failed when I called rdd2.collect(), take(1), and first(), but it worked fine when calling rdd2.count(). As you can see, count() does not need to serialize and ship data while the other three methods do. Do you recall any difference between Spark 1.0 and 1.1 that might cause this problem?
Thanks, Du

From: Matei Zaharia matei.zaha...@gmail.com
Date: Friday, September 12, 2014 at 9:10 PM
To: Du Li l...@yahoo-inc.com.invalid, u...@spark.apache.org, dev@spark.apache.org
Subject: Re: NullWritable not serializable

Hi Du, I don't think NullWritable has ever been serializable, so you must be doing something differently from your previous program. In this case though, just use a map() to turn your Writables to serializable types (e.g. null and String). Matei

On September 12, 2014 at 8:48:36 PM, Du Li (l...@yahoo-inc.com.invalid) wrote: Hi, I was trying the following on spark-shell (built with apache master and hadoop 2.4.0). Both calling rdd2.collect and calling rdd3.collect threw java.io.NotSerializableException: org.apache.hadoop.io.NullWritable. I got the same problem in similar code of my app, which uses the newly released Spark 1.1.0 under hadoop 2.4.0. Previously it worked fine with Spark 1.0.2 under either hadoop 2.4.0 or 0.23.10. Does anybody know what caused the problem? Thanks, Du

import org.apache.hadoop.io.{NullWritable, Text}
val rdd = sc.textFile("README.md")
val res = rdd.map(x => (NullWritable.get(), new Text(x)))
res.saveAsSequenceFile("./test_data")
val rdd2 = sc.sequenceFile("./test_data", classOf[NullWritable], classOf[Text])
rdd2.collect
val rdd3 = sc.sequenceFile[NullWritable, Text]("./test_data")
rdd3.collect
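A sketch of the map-based workaround Matei suggests above, applied on the read side: convert the Writables to plain Scala types before any collect/take/first, so the task results contain only serializable values (untested against 1.1.0):

  import org.apache.hadoop.io.{NullWritable, Text}

  val rdd2 = sc.sequenceFile("./test_data", classOf[NullWritable], classOf[Text])
  // Text is converted to String inside the map, before the action runs,
  // so nothing non-serializable is shipped back to the driver.
  val strings = rdd2.map { case (_, v) => v.toString }
  strings.first()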
NullWritable not serializable
Hi, I was trying the following on spark-shell (built with apache master and hadoop 2.4.0). Both calling rdd2.collect and calling rdd3.collect threw java.io.NotSerializableException: org.apache.hadoop.io.NullWritable. I got the same problem in similar code of my app, which uses the newly released Spark 1.1.0 under hadoop 2.4.0. Previously it worked fine with Spark 1.0.2 under either hadoop 2.4.0 or 0.23.10. Does anybody know what caused the problem? Thanks, Du

import org.apache.hadoop.io.{NullWritable, Text}
val rdd = sc.textFile("README.md")
val res = rdd.map(x => (NullWritable.get(), new Text(x)))
res.saveAsSequenceFile("./test_data")
val rdd2 = sc.sequenceFile("./test_data", classOf[NullWritable], classOf[Text])
rdd2.collect
val rdd3 = sc.sequenceFile[NullWritable, Text]("./test_data")
rdd3.collect