Re: Error while creating tables in Parquet format in 2.0.1 (No plan for InsertIntoTable)
I get the same error with the JDBC datasource as well:

0: jdbc:hive2://localhost:1> CREATE TABLE jtest USING jdbc OPTIONS ("url" "jdbc:mysql://localhost/test", "driver" "com.mysql.jdbc.Driver", "dbtable" "stats");
+---------+--+
| Result  |
+---------+--+
+---------+--+
No rows selected (0.156 seconds)

0: jdbc:hive2://localhost:1> CREATE TABLE test_stored STORED AS PARQUET LOCATION '/Users/kiran/spark/test5.parquet' AS SELECT * FROM jtest;
Error: java.lang.AssertionError: assertion failed: No plan for InsertIntoTable
Relation[id#14,stat_repository_type#15,stat_repository_id#16,stat_holder_type#17,stat_holder_id#18,stat_coverage_type#19,stat_coverage_id#20,stat_membership_type#21,stat_membership_id#22,context#23] parquet, true, false
+- Relation[id#4,stat_repository_type#5,stat_repository_id#6,stat_holder_type#7,stat_holder_id#8,stat_coverage_type#9,stat_coverage_id#10,stat_membership_type#11,stat_membership_id#12,context#13] JDBCRelation(stats) (state=,code=0)

JDBCRelation also extends BaseRelation. Is there any workaround for datasources that extend BaseRelation?

On Sun, Nov 6, 2016 at 8:08 PM, Kiran Chitturi <kiran.chitt...@lucidworks.com> wrote:

> Hello,
>
> I am encountering a new problem with Spark 2.0.1 that didn't happen with
> Spark 1.6.x.
>
> These SQL statements ran successfully against the spark-thrift-server in 1.6.x:
>
>> CREATE TABLE test2 USING solr OPTIONS (zkhost "localhost:9987", collection "test", fields "id");
>>
>> CREATE TABLE test_stored STORED AS PARQUET LOCATION '/Users/kiran/spark/test.parquet' AS SELECT * FROM test;
>
> but with Spark 2.0.x, the last statement throws the error below:
>
>> CREATE TABLE test_stored1 STORED AS PARQUET LOCATION '/Users/kiran/spark/test.parquet' AS SELECT * FROM test2;
>>
>> Error: java.lang.AssertionError: assertion failed: No plan for
>> InsertIntoTable Relation[id#3] parquet, true, false
>> +- Relation[id#2] com.lucidworks.spark.SolrRelation@57d735e9
>> (state=,code=0)
>
> The full stack trace is at
> https://gist.github.com/kiranchitturi/8b3637723e0887f31917f405ef1425a1
>
> SolrRelation class:
> https://github.com/lucidworks/spark-solr/blob/master/src/main/scala/com/lucidworks/spark/SolrRelation.scala
>
> This error message doesn't seem very meaningful to me, and I am not quite sure
> how to track this down or fix it. Is there something I need to implement
> in the SolrRelation class to be able to create Parquet tables from Solr
> tables?
>
> Looking forward to your suggestions.
>
> Thanks,
> --
> Kiran Chitturi

--
Kiran Chitturi
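Until the SQL CTAS path handles these BaseRelation sources again, one workaround sketch (untested against 2.0.1; the options and paths simply follow the JDBC example above) is to drive the same copy through the DataFrame API, where the read and the Parquet write are planned separately rather than through a single CREATE TABLE ... AS SELECT:

```scala
// Sketch under the assumptions above: read the JDBC table as a DataFrame
// and write Parquet directly, bypassing the CREATE TABLE ... AS SELECT path
// that triggers the "No plan for InsertIntoTable" assertion.
val jdbcDF = spark.read
  .format("jdbc")
  .option("url", "jdbc:mysql://localhost/test")
  .option("driver", "com.mysql.jdbc.Driver")
  .option("dbtable", "stats")
  .load()

jdbcDF.write.parquet("/Users/kiran/spark/test5.parquet")
```

The same shape should work for any datasource that extends BaseRelation (e.g. the Solr source), since the write side here is a plain Parquet DataFrameWriter rather than an InsertIntoTable over the source relation.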
Error while creating tables in Parquet format in 2.0.1 (No plan for InsertIntoTable)
Hello,

I am encountering a new problem with Spark 2.0.1 that didn't happen with Spark 1.6.x.

These SQL statements ran successfully against the spark-thrift-server in 1.6.x:

> CREATE TABLE test2 USING solr OPTIONS (zkhost "localhost:9987", collection "test", fields "id");
>
> CREATE TABLE test_stored STORED AS PARQUET LOCATION '/Users/kiran/spark/test.parquet' AS SELECT * FROM test;

but with Spark 2.0.x, the last statement throws the error below:

> CREATE TABLE test_stored1 STORED AS PARQUET LOCATION '/Users/kiran/spark/test.parquet' AS SELECT * FROM test2;
>
> Error: java.lang.AssertionError: assertion failed: No plan for
> InsertIntoTable Relation[id#3] parquet, true, false
> +- Relation[id#2] com.lucidworks.spark.SolrRelation@57d735e9
> (state=,code=0)

The full stack trace is at https://gist.github.com/kiranchitturi/8b3637723e0887f31917f405ef1425a1

SolrRelation class: https://github.com/lucidworks/spark-solr/blob/master/src/main/scala/com/lucidworks/spark/SolrRelation.scala

This error message doesn't seem very meaningful to me, and I am not quite sure how to track this down or fix it. Is there something I need to implement in the SolrRelation class to be able to create Parquet tables from Solr tables?

Looking forward to your suggestions.

Thanks,
--
Kiran Chitturi
Re: 2.0.0: AnalysisException when reading csv/json files with dots (periods) in field names
Never mind, there is already a JIRA open for this:
https://issues.apache.org/jira/browse/SPARK-16698

On Fri, Aug 5, 2016 at 5:33 PM, Kiran Chitturi <kiran.chitt...@lucidworks.com> wrote:

> Hi,
>
> During our upgrade to 2.0.0, we found this issue with one of our failing tests.
>
> Any csv/json files that contain field names with dots are unreadable using DataFrames.
>
> My sample csv file:
>
>> flag_s,params.url_s
>> test,http://www.google.com
>
> In spark-shell, I ran the following code:
>
>> scala> val csvDF = spark.read.format("com.databricks.spark.csv").option("header", "true").option("inferSchema", "true").load("test.csv")
>> csvDF: org.apache.spark.sql.DataFrame = [flag_s: string, params.url_s: string]
>> scala> csvDF.take(1)
>> org.apache.spark.sql.AnalysisException: Unable to resolve params.url_s given [flag_s, params.url_s];
>>   at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolve$1$$anonfun$apply$5.apply(LogicalPlan.scala:134)
>>   at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolve$1$$anonfun$apply$5.apply(LogicalPlan.scala:134)
>>   at scala.Option.getOrElse(Option.scala:121)
>>   at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolve$1.apply(LogicalPlan.scala:133)
>>   at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolve$1.apply(LogicalPlan.scala:129)
>>   at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>>   at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>>   at scala.collection.Iterator$class.foreach(Iterator.scala:893)
>>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
>>   at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>>   at org.apache.spark.sql.types.StructType.foreach(StructType.scala:95)
>>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>>   at org.apache.spark.sql.types.StructType.map(StructType.scala:95)
>>   at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolve(LogicalPlan.scala:129)
>>   at org.apache.spark.sql.execution.datasources.FileSourceStrategy$.apply(FileSourceStrategy.scala:87)
>>   at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:60)
>>   at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:60)
>>   at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
>>   at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
>>   at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:61)
>>   at org.apache.spark.sql.execution.SparkPlanner.plan(SparkPlanner.scala:47)
>>   at org.apache.spark.sql.execution.SparkPlanner$$anonfun$plan$1$$anonfun$apply$1.applyOrElse(SparkPlanner.scala:51)
>>   at org.apache.spark.sql.execution.SparkPlanner$$anonfun$plan$1$$anonfun$apply$1.applyOrElse(SparkPlanner.scala:48)
>>   at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:301)
>>   at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:301)
>>   at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:69)
>>   at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:300)
>>   at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:298)
>>   at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:298)
>>   at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:321)
>>   at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:179)
>>   at org.apache.spark.sql.catalyst.trees.TreeNode.transformChildren(TreeNode.scala:319)
>>   at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:298)
>>   at org.apache.spark.sql.execution.SparkPlanner$$anonfun$plan$1.apply(SparkPlanner.scala:48)
>>   at org.apache.spark.sql.execution.SparkPlanner$$anonfun$plan$1.apply(SparkPlanner.scala:48)
>>   at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
>>   at org.apache.spark.sql.execution.QueryExecution.sparkPlan$lzycompute(QueryExecution.scala:78)
>>   at org.apache.spark.sql.execution.QueryExecution.sparkPlan(QueryExecution.scala:76)
2.0.0: Hive metastore uses a different version of derby than the Spark package
Hi,

In 2.0.0, I encountered this error while using the spark-shell:

> Caused by: java.lang.SecurityException: sealing violation: can't seal package org.apache.derby.impl.services.timer: already loaded

Full stacktrace: https://gist.github.com/kiranchitturi/9ae38f07d9836a75f233019eb2b65236

While looking at the dependency tree, I found that hive-metastore ('org.spark-project.hive:hive-metastore:jar:1.2.1.spark2:compile') uses Derby *10.10.2.0*, while the jars folder in the 2.0.0 binary ships *derby-10.11.1.1.jar*.

Excluding Derby from my shaded jar fixed the issue, but this could trip up someone else. Would it make sense to update the dependencies so that hive-metastore and the Spark package are on the same Derby version?

Thanks,
--
Kiran Chitturi
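For anyone hitting the same sealing violation, a minimal sketch of the exclusion described above, assuming an sbt build and the spark-hive 2.0.0 artifact (both are assumptions; adapt the coordinates to whatever actually pulls Derby into your shaded jar):

```scala
// build.sbt fragment (sketch): keep Derby out of the shaded application jar
// so the derby-10.11.1.1.jar shipped in Spark's jars/ directory is the only
// copy on the classpath at runtime.
libraryDependencies += ("org.apache.spark" %% "spark-hive" % "2.0.0")
  .excludeAll(ExclusionRule(organization = "org.apache.derby"))
```

A Maven build would use the equivalent `<exclusions>` element on the offending dependency.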
2.0.0: AnalysisException when reading csv/json files with dots (periods) in field names
Hi,

During our upgrade to 2.0.0, we found this issue with one of our failing tests.

Any csv/json files that contain field names with dots are unreadable using DataFrames.

My sample csv file:

> flag_s,params.url_s
> test,http://www.google.com

In spark-shell, I ran the following code:

> scala> val csvDF = spark.read.format("com.databricks.spark.csv").option("header", "true").option("inferSchema", "true").load("test.csv")
> csvDF: org.apache.spark.sql.DataFrame = [flag_s: string, params.url_s: string]
> scala> csvDF.take(1)
> org.apache.spark.sql.AnalysisException: Unable to resolve params.url_s given [flag_s, params.url_s];
>   at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolve$1$$anonfun$apply$5.apply(LogicalPlan.scala:134)
>   at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolve$1$$anonfun$apply$5.apply(LogicalPlan.scala:134)
>   at scala.Option.getOrElse(Option.scala:121)
>   at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolve$1.apply(LogicalPlan.scala:133)
>   at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolve$1.apply(LogicalPlan.scala:129)
>   at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:893)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
>   at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>   at org.apache.spark.sql.types.StructType.foreach(StructType.scala:95)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>   at org.apache.spark.sql.types.StructType.map(StructType.scala:95)
>   at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolve(LogicalPlan.scala:129)
>   at org.apache.spark.sql.execution.datasources.FileSourceStrategy$.apply(FileSourceStrategy.scala:87)
>   at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:60)
>   at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:60)
>   at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
>   at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
>   at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:61)
>   at org.apache.spark.sql.execution.SparkPlanner.plan(SparkPlanner.scala:47)
>   at org.apache.spark.sql.execution.SparkPlanner$$anonfun$plan$1$$anonfun$apply$1.applyOrElse(SparkPlanner.scala:51)
>   at org.apache.spark.sql.execution.SparkPlanner$$anonfun$plan$1$$anonfun$apply$1.applyOrElse(SparkPlanner.scala:48)
>   at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:301)
>   at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:301)
>   at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:69)
>   at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:300)
>   at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:298)
>   at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:298)
>   at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:321)
>   at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:179)
>   at org.apache.spark.sql.catalyst.trees.TreeNode.transformChildren(TreeNode.scala:319)
>   at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:298)
>   at org.apache.spark.sql.execution.SparkPlanner$$anonfun$plan$1.apply(SparkPlanner.scala:48)
>   at org.apache.spark.sql.execution.SparkPlanner$$anonfun$plan$1.apply(SparkPlanner.scala:48)
>   at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
>   at org.apache.spark.sql.execution.QueryExecution.sparkPlan$lzycompute(QueryExecution.scala:78)
>   at org.apache.spark.sql.execution.QueryExecution.sparkPlan(QueryExecution.scala:76)
>   at org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:83)
>   at org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:83)
>   at org.apache.spark.sql.Dataset.withTypedCallback(Dataset.scala:2558)
>   at org.apache.spark.sql.Dataset.head(Dataset.scala:1924)
>   at org.apache.spark.sql.Dataset.take(Dataset.scala:2139)
>   ... 48 elided
> scala>

The same happens for json files too. Is this a known issue in 2.0.0? Removing the field with dots from the csv/json file fixes the issue :)

Thanks,
--
Kiran Chitturi
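One possible workaround sketch until the resolution bug is fixed upstream: supply an explicit schema at read time with the dots already replaced, so the relation's schema never contains a dotted name that FileSourceStrategy has to resolve. This is untested against 2.0.0, and the replacement name `params_url_s` is my own invention:

```scala
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Assumed workaround: give the reader a schema whose names contain no dots.
// With header=true the header row is still skipped, but the column names
// come from this schema instead of the file.
val schema = StructType(Seq(
  StructField("flag_s", StringType),
  StructField("params_url_s", StringType) // renamed from params.url_s
))

val csvDF = spark.read
  .option("header", "true")
  .schema(schema)
  .csv("test.csv")

csvDF.take(1) // should resolve, since no dotted names remain in the plan
```

The trade-off is that `inferSchema` can no longer be used for the renamed columns, since the schema is supplied by hand.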
Re: 2.0.0 packages for twitter streaming, flume and other connectors
Thank you!

On Wed, Aug 3, 2016 at 8:45 PM, Marcelo Vanzin <van...@cloudera.com> wrote:

> The Flume connector is still available from Spark:
>
> http://search.maven.org/#artifactdetails%7Corg.apache.spark%7Cspark-streaming-flume-assembly_2.11%7C2.0.0%7Cjar
>
> Many of the others have indeed been removed from Spark, and can be
> found at the Apache Bahir project: http://bahir.apache.org/
>
> I don't think there's a release for Spark 2.0.0 yet, though (only for
> the preview version).
>
> On Wed, Aug 3, 2016 at 8:40 PM, Kiran Chitturi <kiran.chitt...@lucidworks.com> wrote:
>> Hi,
>>
>> When Spark 2.0.0 was released, the 'spark-streaming-twitter' package and
>> several other packages were not released/published to Maven Central. It
>> looks like these packages were removed from the official Spark repo.
>>
>> I found the replacement git repos for these missing packages at
>> https://github.com/spark-packages. This seems to have been created by
>> some of the Spark committers.
>>
>> However, the repos haven't been active for the last 5 months and none of
>> the versions are released/published.
>>
>> Is https://github.com/spark-packages supposed to be the new official
>> place for these missing streaming packages?
>>
>> If so, how can we get someone to release and publish new versions
>> officially?
>>
>> I would like to help in any way possible to get these packages released
>> and published.
>>
>> Thanks,
>> --
>> Kiran Chitturi
>
> --
> Marcelo

--
Kiran Chitturi
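Following Marcelo's pointer, pulling the still-published Flume connector into an sbt build would look roughly like this (the group and version come from the Maven Central link above; treat the exact artifact choice, plain vs. assembly, as an assumption to verify for your deployment):

```scala
// build.sbt fragment (sketch): the streaming-flume connector is still
// released under org.apache.spark; %% appends the _2.11 Scala suffix.
libraryDependencies += "org.apache.spark" %% "spark-streaming-flume" % "2.0.0"
```

Connectors that moved to Apache Bahir would instead use Bahir's own coordinates once a 2.0.0-compatible release exists.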
2.0.0 packages for twitter streaming, flume and other connectors
Hi,

When Spark 2.0.0 was released, the 'spark-streaming-twitter' package and several other packages were not released/published to Maven Central. It looks like these packages were removed from the official Spark repo.

I found the replacement git repos for these missing packages at https://github.com/spark-packages. This seems to have been created by some of the Spark committers.

However, the repos haven't been active for the last 5 months and none of the versions are released/published.

Is https://github.com/spark-packages supposed to be the new official place for these missing streaming packages? If so, how can we get someone to release and publish new versions officially?

I would like to help in any way possible to get these packages released and published.

Thanks,
--
Kiran Chitturi
Spark executor crashes when the tasks are cancelled
Hi,

We are seeing this issue with Spark 1.6.1. The executor exits when one of its running tasks is cancelled, logging the error below before crashing:

> 16/04/27 16:34:13 ERROR SparkUncaughtExceptionHandler: [Container in shutdown] Uncaught exception in thread Thread[Executor task launch worker-2,5,main]
> java.lang.Error: java.nio.channels.ClosedByInterruptException
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1148)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.nio.channels.ClosedByInterruptException
>   at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
>   at java.nio.channels.Channels$WritableByteChannelImpl.write(Channels.java:460)
>   at org.apache.spark.util.SerializableBuffer$$anonfun$writeObject$1.apply(SerializableBuffer.scala:49)
>   at org.apache.spark.util.SerializableBuffer$$anonfun$writeObject$1.apply(SerializableBuffer.scala:47)
>   at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1219)
>   at org.apache.spark.util.SerializableBuffer.writeObject(SerializableBuffer.scala:47)
>   at sun.reflect.GeneratedMethodAccessor30.invoke(Unknown Source)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:497)
>   at java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:988)
>   at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1496)
>   at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
>   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
>   at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
>   at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
>   at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
>   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
>   at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
>   at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
>   at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
>   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
>   at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
>   at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:44)
>   at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:101)
>   at org.apache.spark.rpc.netty.NettyRpcEnv.serialize(NettyRpcEnv.scala:252)
>   at org.apache.spark.rpc.netty.NettyRpcEnv.send(NettyRpcEnv.scala:195)
>   at org.apache.spark.rpc.netty.NettyRpcEndpointRef.send(NettyRpcEnv.scala:516)
>   at org.apache.spark.executor.CoarseGrainedExecutorBackend.statusUpdate(CoarseGrainedExecutorBackend.scala:132)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:288)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   ... 2 more

I have attached the full logs at this gist: https://gist.github.com/kiranchitturi/3bd3a083a7c956cff73040c1a140c88f

On the driver side, the following is logged (https://gist.github.com/kiranchitturi/3bd3a083a7c956cff73040c1a140c88f). These lines show that the executor exited because of the running tasks:

> 2016-04-27T16:34:13,723 - WARN [dispatcher-event-loop-1:Logging$class@70] - Lost task 0.0 in stage 89.0 (TID 173, 10.0.0.42): ExecutorLostFailure (executor 2 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
> 2016-04-27T16:34:13,723 - WARN [dispatcher-event-loop-1:Logging$class@70] - Lost task 1.0 in stage 89.0 (TID 174, 10.0.0.42): ExecutorLostFailure (executor 2 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.

Is it possible for an executor to die when the jobs in the SparkContext are cancelled? Apart from https://issues.apache.org/jira/browse/SPARK-14234, I could not find any JIRAs that report this error.

Sometimes we notice a scenario where the executor dies and the driver doesn't request a new one, which causes the jobs to hang indefinitely. We are using dynamic allocation for our jobs.

Thanks,
Kiran Chitturi
Re: Spark sql not pushing down timestamp range queries
>>> resolve cast (e.g. long to integer).
>>>
>>> For a workaround, the implementation of the Solr data source should be
>>> changed to one with CatalystScan, which takes all the filters.
>>>
>>> But CatalystScan is not designed to be binary-compatible across
>>> releases; however, some think it is stable now, as mentioned here:
>>> https://github.com/apache/spark/pull/10750#issuecomment-175400704
>>>
>>> Thanks!
>>>
>>> 2016-04-15 3:30 GMT+09:00 Mich Talebzadeh <mich.talebza...@gmail.com>:
>>>
>>>> Hi Josh,
>>>>
>>>> Can you please clarify whether date comparisons as two strings work at
>>>> all?
>>>>
>>>> I was under the impression that with string comparison only the first
>>>> characters are compared?
>>>>
>>>> Thanks
>>>>
>>>> Dr Mich Talebzadeh
>>>>
>>>> LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>>
>>>> http://talebzadehmich.wordpress.com
>>>>
>>>> On 14 April 2016 at 19:26, Josh Rosen <joshro...@databricks.com> wrote:
>>>>
>>>>> AFAIK this is not being pushed down because it involves an implicit
>>>>> cast and we currently don't push casts into data sources or scans; see
>>>>> https://github.com/databricks/spark-redshift/issues/155 for a
>>>>> possibly-related discussion.
>>>>>
>>>>> On Thu, Apr 14, 2016 at 10:27 AM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>>>>
>>>>>> Are you comparing strings here or timestamps?
>>>>>>
>>>>>> Filter ((cast(registration#37 as string) >= 2015-05-28) &&
>>>>>> (cast(registration#37 as string) <= 2015-05-29))
>>>>>>
>>>>>> Dr Mich Talebzadeh
>>>>>>
>>>>>> LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>>>>
>>>>>> http://talebzadehmich.wordpress.com
>>>>>>
>>>>>> On 14 April 2016 at 18:04, Kiran Chitturi <kiran.chitt...@lucidworks.com> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> Timestamp range filter queries in SQL are not getting pushed down to
>>>>>>> the PrunedFilteredScan instances. The filtering is happening at the
>>>>>>> Spark layer.
>>>>>>>
>>>>>>> The physical plan for timestamp range queries is not showing the
>>>>>>> pushed filters, whereas range queries on other types work fine, as
>>>>>>> the physical plan shows the pushed filters.
>>>>>>>
>>>>>>> Please see below for code and examples.
>>>>>>>
>>>>>>> Example:
>>>>>>>
>>>>>>> 1. Range filter queries on Timestamp types
>>>>>>>
>>>>>>> code:
>>>>>>>
>>>>>>>> sqlContext.sql("SELECT * from events WHERE `registration` >= '2015-05-28' AND `registration` <= '2015-05-29' ")
>>>>>>>
>>>>>>> Full example: https://github.com/lucidworks/spark-solr/blob/master/src/test/scala/com/lucidworks/spark/EventsimTestSuite.scala#L151
>>>>>>> plan: https://gist.github.com/kiranchitturi/4a52688c9f0abe3d4b2bd8b938044421#file-time-range-sql
>>>>>>>
>>>>>>> 2. Range filter queries on Long types
>>>>>>>
>>>>>>> code:
>>>>>>>
>>>>>>>> sqlContext.sql("SELECT * from events WHERE `length` >= '700' and `length` <= '1000'")
>>>>>>>
>>>>>>> Full example: https://github.com/lucidworks/spark-solr/blob/master/src/test/scala/com/lucidworks/spark/EventsimTestSuite.scala#L151
>>>>>>> plan: https://gist.github.com/kiranchitturi/4a52688c9f0abe3d4b2bd8b938044421#file-length-range-sql
>>>>>>>
>>>>>>> The SolrRelation class we use extends the PrunedFilteredScan
>>>>>>> (https://github.com/lucidworks/spark-solr/blob/master/src/main/scala/com/lucidworks/spark/SolrRelation.scala#L37).
>>>>>>>
>>>>>>> Since Solr supports date ranges, I would like the timestamp
>>>>>>> filters to be pushed down to the Solr query.
>>>>>>>
>>>>>>> Are there limitations on the type of filters that are passed down
>>>>>>> with Timestamp types?
>>>>>>> Is there something that I should do in my code to fix this?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> --
>>>>>>> Kiran Chitturi
>>
>> --
>> ---
>> Takeshi Yamamuro

--
Kiran Chitturi
Re: Spark sql not pushing down timestamp range queries
Thanks Hyukjin for the suggestion. I will take a look at implementing Solr datasource with CatalystScan.
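To make Hyukjin's suggestion concrete, here is a rough sketch of the shape such a relation might take. This is an assumption-laden outline, not spark-solr's actual code: CatalystScan is an experimental, internal trait that is not guaranteed binary-compatible across releases, so its exact signature should be checked against the Spark version in use, and the class and helper names below are invented:

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.sql.catalyst.expressions.{Attribute, Expression}
import org.apache.spark.sql.sources.{BaseRelation, CatalystScan}
import org.apache.spark.sql.types.StructType

// Hypothetical relation: unlike PrunedFilteredScan, CatalystScan hands over
// raw Catalyst expressions, including the Cast nodes wrapped around
// timestamp comparisons, so they can be translated into Solr date ranges.
class SolrCatalystRelation(override val sqlContext: SQLContext,
                           override val schema: StructType)
    extends BaseRelation with CatalystScan {

  override def buildScan(requiredColumns: Seq[Attribute],
                         filters: Seq[Expression]): RDD[Row] = {
    // translateToSolrQuery is a placeholder for the real translation logic
    // that would walk the expressions and emit Solr range syntax.
    val solrQuery = translateToSolrQuery(filters)
    ??? // execute solrQuery and build the RDD[Row]
  }

  private def translateToSolrQuery(filters: Seq[Expression]): String = ???
}
```

The cost of this approach, as noted in the thread, is coupling to Catalyst internals that can change between releases.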
Spark sql not pushing down timestamp range queries
Hi,

Timestamp range filter queries in SQL are not getting pushed down to the PrunedFilteredScan instances. The filtering is happening at the Spark layer.

The physical plan for timestamp range queries is not showing the pushed filters, whereas range queries on other types work fine, as the physical plan shows the pushed filters.

Please see below for code and examples.

Example:

1. Range filter queries on Timestamp types

code:

> sqlContext.sql("SELECT * from events WHERE `registration` >= '2015-05-28' AND `registration` <= '2015-05-29' ")

Full example: https://github.com/lucidworks/spark-solr/blob/master/src/test/scala/com/lucidworks/spark/EventsimTestSuite.scala#L151
plan: https://gist.github.com/kiranchitturi/4a52688c9f0abe3d4b2bd8b938044421#file-time-range-sql

2. Range filter queries on Long types

code:

> sqlContext.sql("SELECT * from events WHERE `length` >= '700' and `length` <= '1000'")

Full example: https://github.com/lucidworks/spark-solr/blob/master/src/test/scala/com/lucidworks/spark/EventsimTestSuite.scala#L151
plan: https://gist.github.com/kiranchitturi/4a52688c9f0abe3d4b2bd8b938044421#file-length-range-sql

The SolrRelation class we use extends the PrunedFilteredScan (https://github.com/lucidworks/spark-solr/blob/master/src/main/scala/com/lucidworks/spark/SolrRelation.scala#L37).

Since Solr supports date ranges, I would like the timestamp filters to be pushed down to the Solr query.

Are there limitations on the type of filters that are passed down with Timestamp types?
Is there something that I should do in my code to fix this?

Thanks,
--
Kiran Chitturi
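Given the diagnosis earlier in the thread (the implicit cast around the timestamp column blocks pushdown), one hedged thing to try — untested, and whether it actually yields a pushable range filter depends on the Spark version — is to cast the literals to timestamps explicitly so the column itself no longer needs a cast:

```scala
// Sketch: cast the literals instead of letting Spark cast the column.
// With timestamp literals on the right-hand side, the comparison may be
// planned as a plain range filter on `registration` and offered to the
// PrunedFilteredScan, instead of being evaluated in the Spark layer.
sqlContext.sql(
  """SELECT * FROM events
    |WHERE `registration` >= CAST('2015-05-28' AS TIMESTAMP)
    |  AND `registration` <= CAST('2015-05-29' AS TIMESTAMP)""".stripMargin)
```

Checking `df.explain(true)` before and after would confirm whether the cast moved off the column and into the literals.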
supporting adoc files in spark-packages.org
Hi,

We want to add the spark-solr repo (https://github.com/LucidWorks/spark-solr) to spark-packages.org, but registration is currently failing with "Cannot find README.md" (http://spark-packages.org/staging?id=882).

We use AsciiDoc for our internal and external documentation, and we are wondering if spark-packages.org can support AsciiDoc files in addition to README.md files.

Thanks,
--
Kiran Chitturi