Re: Error while creating tables in Parquet format in 2.0.1 (No plan for InsertIntoTable)

2016-11-06 Thread Kiran Chitturi
I get the same error with the JDBC Datasource as well

0: jdbc:hive2://localhost:1> CREATE TABLE jtest USING jdbc OPTIONS
> ("url" "jdbc:mysql://localhost/test", "driver" "com.mysql.jdbc.Driver",
> "dbtable" "stats");
> +-+--+
> | Result  |
> +-+--+
> +-+--+
> No rows selected (0.156 seconds)
>


0: jdbc:hive2://localhost:1> CREATE TABLE test_stored STORED AS PARQUET
> LOCATION  '/Users/kiran/spark/test5.parquet' AS SELECT * FROM jtest;
> Error: java.lang.AssertionError: assertion failed: No plan for
> InsertIntoTable
> Relation[id#14,stat_repository_type#15,stat_repository_id#16,stat_holder_type#17,stat_holder_id#18,stat_coverage_type#19,stat_coverage_id#20,stat_membership_type#21,stat_membership_id#22,context#23]
> parquet, true, false
> +-
> Relation[id#4,stat_repository_type#5,stat_repository_id#6,stat_holder_type#7,stat_holder_id#8,stat_coverage_type#9,stat_coverage_id#10,stat_membership_type#11,stat_membership_id#12,context#13]
> JDBCRelation(stats) (state=,code=0)
>

JDBCRelation also extends BaseRelation. Is there any workaround
for data sources that extend BaseRelation?
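
For reference, this is the DataFrame-API version of the same flow that I plan
to compare against (a rough spark-shell sketch on Spark 2.0.x, reusing the
options and path from the SQL above):

// Sketch: bypass the Hive CTAS path by reading the MySQL table through the
// JDBC data source and writing Parquet directly.
val jdbcDF = spark.read
  .format("jdbc")
  .option("url", "jdbc:mysql://localhost/test")
  .option("driver", "com.mysql.jdbc.Driver")
  .option("dbtable", "stats")
  .load()

// Write the rows out as Parquet at the location the CTAS statement used.
jdbcDF.write.parquet("/Users/kiran/spark/test5.parquet")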



On Sun, Nov 6, 2016 at 8:08 PM, Kiran Chitturi <
kiran.chitt...@lucidworks.com> wrote:

> Hello,
>
> I am encountering a new problem with Spark 2.0.1 that didn't happen with
> Spark 1.6.x.
>
> These SQL statements ran successfully on the spark-thrift-server in 1.6.x:
>
>
>> CREATE TABLE test2 USING solr OPTIONS (zkhost "localhost:9987",
>> collection "test", fields "id" );
>>
>> CREATE TABLE test_stored STORED AS PARQUET LOCATION
>>  '/Users/kiran/spark/test.parquet' AS SELECT * FROM test;
>
>
> but with Spark 2.0.x, the last statement throws the error below:
>
>
>> CREATE TABLE test_stored1 STORED AS PARQUET LOCATION
>> '/Users/kiran/spark/test.parquet' AS SELECT * FROM test2;
>
>
>
>
>
> Error: java.lang.AssertionError: assertion failed: No plan for
>> InsertIntoTable Relation[id#3] parquet, true, false
>> +- Relation[id#2] com.lucidworks.spark.SolrRelation@57d735e9
>> (state=,code=0)
>
>
> The full stack trace is at https://gist.github.com/kiranchitturi/
> 8b3637723e0887f31917f405ef1425a1
>
> SolrRelation class (https://github.com/lucidworks/spark-solr/blob/
> master/src/main/scala/com/lucidworks/spark/SolrRelation.scala)
>
> This error message doesn't seem very meaningful to me. I am not quite sure
> how to track this down or fix this. Is there something I need to implement
> in the SolrRelation class to be able to create Parquet tables from Solr
> tables?
>
> Looking forward to your suggestions.
>
> Thanks,
> --
> Kiran Chitturi
>
>


-- 
Kiran Chitturi


Error while creating tables in Parquet format in 2.0.1 (No plan for InsertIntoTable)

2016-11-06 Thread Kiran Chitturi
Hello,

I am encountering a new problem with Spark 2.0.1 that didn't happen with
Spark 1.6.x.

These SQL statements ran successfully on the spark-thrift-server in 1.6.x:


> CREATE TABLE test2 USING solr OPTIONS (zkhost "localhost:9987", collection
> "test", fields "id" );
>
> CREATE TABLE test_stored STORED AS PARQUET LOCATION
>  '/Users/kiran/spark/test.parquet' AS SELECT * FROM test;


but with Spark 2.0.x, the last statement throws the error below:


> CREATE TABLE test_stored1 STORED AS PARQUET LOCATION
> '/Users/kiran/spark/test.parquet' AS SELECT * FROM test2;





Error: java.lang.AssertionError: assertion failed: No plan for
> InsertIntoTable Relation[id#3] parquet, true, false
> +- Relation[id#2] com.lucidworks.spark.SolrRelation@57d735e9
> (state=,code=0)


The full stack trace is at
https://gist.github.com/kiranchitturi/8b3637723e0887f31917f405ef1425a1

SolrRelation class (
https://github.com/lucidworks/spark-solr/blob/master/src/main/scala/com/lucidworks/spark/SolrRelation.scala
)

This error message doesn't seem very meaningful to me. I am not quite sure
how to track this down or fix this. Is there something I need to implement
in the SolrRelation class to be able to create Parquet tables from Solr
tables?

Looking forward to your suggestions.

Thanks,
-- 
Kiran Chitturi


Re: 2.0.0: AnalysisException when reading csv/json files with dots in field names

2016-08-05 Thread Kiran Chitturi
Never mind, there is already a Jira open for this:
https://issues.apache.org/jira/browse/SPARK-16698

On Fri, Aug 5, 2016 at 5:33 PM, Kiran Chitturi <
kiran.chitt...@lucidworks.com> wrote:

> Hi,
>
> During our upgrade to 2.0.0, we found this issue with one of our failing
> tests.
>
> Any csv/json file that contains field names with dots is unreadable
> using DataFrames.
>
> My sample csv file:
>
> flag_s,params.url_s
>> test,http://www.google.com
>
>
> In spark-shell, I ran the following code:
>
> scala> val csvDF = 
> spark.read.format("com.databricks.spark.csv").option("header",
>> "true").option("inferSchema", "true").load("test.csv")
>> csvDF: org.apache.spark.sql.DataFrame = [flag_s: string, params.url_s:
>> string]
>> scala> csvDF.take(1)
>> org.apache.spark.sql.AnalysisException: Unable to resolve params.url_s
>> given [flag_s, params.url_s];
>>   at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolve$1$$anonfun$apply$5.apply(LogicalPlan.scala:134)
>>   at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolve$1$$anonfun$apply$5.apply(LogicalPlan.scala:134)
>>   at scala.Option.getOrElse(Option.scala:121)
>>   at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolve$1.apply(LogicalPlan.scala:133)
>>   at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolve$1.apply(LogicalPlan.scala:129)
>>   at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>>   at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>>   at scala.collection.Iterator$class.foreach(Iterator.scala:893)
>>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
>>   at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>>   at org.apache.spark.sql.types.StructType.foreach(StructType.scala:95)
>>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>>   at org.apache.spark.sql.types.StructType.map(StructType.scala:95)
>>   at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolve(LogicalPlan.scala:129)
>>   at org.apache.spark.sql.execution.datasources.FileSourceStrategy$.apply(FileSourceStrategy.scala:87)
>>   at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:60)
>>   at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:60)
>>   at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
>>   at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
>>   at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:61)
>>   at org.apache.spark.sql.execution.SparkPlanner.plan(SparkPlanner.scala:47)
>>   at org.apache.spark.sql.execution.SparkPlanner$$anonfun$plan$1$$anonfun$apply$1.applyOrElse(SparkPlanner.scala:51)
>>   at org.apache.spark.sql.execution.SparkPlanner$$anonfun$plan$1$$anonfun$apply$1.applyOrElse(SparkPlanner.scala:48)
>>   at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:301)
>>   at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:301)
>>   at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:69)
>>   at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:300)
>>   at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:298)
>>   at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:298)
>>   at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:321)
>>   at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:179)
>>   at org.apache.spark.sql.catalyst.trees.TreeNode.transformChildren(TreeNode.scala:319)
>>   at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:298)
>>   at org.apache.spark.sql.execution.SparkPlanner$$anonfun$plan$1.apply(SparkPlanner.scala:48)
>>   at org.apache.spark.sql.execution.SparkPlanner$$anonfun$plan$1.apply(SparkPlanner.scala:48)
>>   at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
>>   at org.apache.spark.sql.execution.QueryExecution.sparkPlan$lzycompute(QueryExecution.scala:78)
>>   at org.apache.spark.sql.execution.QueryExecution.sparkPlan(QueryExe

2.0.0: Hive metastore uses a different version of derby than the Spark package

2016-08-05 Thread Kiran Chitturi
Hi,

In 2.0.0, I encountered this error while using the spark-shell.

Caused by: java.lang.SecurityException: sealing violation: can't seal
package org.apache.derby.impl.services.timer: already loaded

Full stacktrace:
https://gist.github.com/kiranchitturi/9ae38f07d9836a75f233019eb2b65236

While looking at the dependency tree, I found that hive-metastore
('org.spark-project.hive:hive-metastore:jar:1.2.1.spark2:compile') uses derby
*10.10.2.0*, while the jars folder in the 2.0.0 binary ships *derby-10.11.1.1.jar*.

Excluding derby from my shaded jar fixed the issue.
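
Concretely, the exclusion looks roughly like this in a build.sbt (a sketch:
the hive-metastore coordinate is the one from the dependency tree above, but
the dependency that actually drags derby into a given shaded jar may be
different in other builds):

// Keep derby out of the shaded jar so that only the derby-10.11.1.1.jar
// shipped in the Spark 2.0.0 jars folder ends up on the classpath.
libraryDependencies += ("org.spark-project.hive" % "hive-metastore" % "1.2.1.spark2")
  .exclude("org.apache.derby", "derby")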

This could be an issue for someone else. Would it make sense to update so
that hive-metastore and the Spark package are on the same derby version?

Thanks,

-- 
Kiran Chitturi


2.0.0: AnalysisException when reading csv/json files with dots in field names

2016-08-05 Thread Kiran Chitturi
Hi,

During our upgrade to 2.0.0, we found this issue with one of our failing
tests.

Any csv/json file that contains field names with dots is unreadable using
DataFrames.

My sample csv file:

flag_s,params.url_s
> test,http://www.google.com


In spark-shell, I ran the following code:

scala> val csvDF =
> spark.read.format("com.databricks.spark.csv").option("header",
> "true").option("inferSchema", "true").load("test.csv")
> csvDF: org.apache.spark.sql.DataFrame = [flag_s: string, params.url_s:
> string]
> scala> csvDF.take(1)
> org.apache.spark.sql.AnalysisException: Unable to resolve params.url_s
> given [flag_s, params.url_s];
>   at
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolve$1$$anonfun$apply$5.apply(LogicalPlan.scala:134)
>   at
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolve$1$$anonfun$apply$5.apply(LogicalPlan.scala:134)
>   at scala.Option.getOrElse(Option.scala:121)
>   at
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolve$1.apply(LogicalPlan.scala:133)
>   at
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolve$1.apply(LogicalPlan.scala:129)
>   at
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:893)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
>   at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>   at org.apache.spark.sql.types.StructType.foreach(StructType.scala:95)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>   at org.apache.spark.sql.types.StructType.map(StructType.scala:95)
>   at
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolve(LogicalPlan.scala:129)
>   at
> org.apache.spark.sql.execution.datasources.FileSourceStrategy$.apply(FileSourceStrategy.scala:87)
>   at
> org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:60)
>   at
> org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:60)
>   at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
>   at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
>   at
> org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:61)
>   at
> org.apache.spark.sql.execution.SparkPlanner.plan(SparkPlanner.scala:47)
>   at
> org.apache.spark.sql.execution.SparkPlanner$$anonfun$plan$1$$anonfun$apply$1.applyOrElse(SparkPlanner.scala:51)
>   at
> org.apache.spark.sql.execution.SparkPlanner$$anonfun$plan$1$$anonfun$apply$1.applyOrElse(SparkPlanner.scala:48)
>   at
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:301)
>   at
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:301)
>   at
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:69)
>   at
> org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:300)
>   at
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:298)
>   at
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:298)
>   at
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:321)
>   at
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:179)
>   at
> org.apache.spark.sql.catalyst.trees.TreeNode.transformChildren(TreeNode.scala:319)
>   at
> org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:298)
>   at
> org.apache.spark.sql.execution.SparkPlanner$$anonfun$plan$1.apply(SparkPlanner.scala:48)
>   at
> org.apache.spark.sql.execution.SparkPlanner$$anonfun$plan$1.apply(SparkPlanner.scala:48)
>   at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
>   at
> org.apache.spark.sql.execution.QueryExecution.sparkPlan$lzycompute(QueryExecution.scala:78)
>   at
> org.apache.spark.sql.execution.QueryExecution.sparkPlan(QueryExecution.scala:76)
>   at
> org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:83)
>   at
> org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:83)
>   at org.apache.spark.sql.Dataset.withTypedCallback(Dataset.scala:2558)
>   at org.apache.spark.sql.Dataset.head(Dataset.scala:1924)
>   at org.apache.spark.sql.Dataset.take(Dataset.scala:2139)
>   ... 48 elided
> scala>

The same happens for json files too. Is this a known issue in 2.0.0?

Removing the field with dots from the csv/json file fixes the issue :)
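
Another workaround I am considering, though I have not fully verified it:
supply an explicit schema with dot-free column names at read time so the
dotted header names never make it into the DataFrame (sketch below; the
renamed column params_url_s is just an illustration):

import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Sketch: schema applied positionally to the CSV columns, with a dot-free
// name in place of params.url_s.
val renamedSchema = StructType(Seq(
  StructField("flag_s", StringType),
  StructField("params_url_s", StringType)))

val csvDF = spark.read
  .format("com.databricks.spark.csv")
  .option("header", "true") // the header row should still be skipped
  .schema(renamedSchema)
  .load("test.csv")

csvDF.take(1)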

Thanks,

-- 
Kiran Chitturi


Re: 2.0.0 packages for twitter streaming, flume and other connectors

2016-08-03 Thread Kiran Chitturi
Thank you!

On Wed, Aug 3, 2016 at 8:45 PM, Marcelo Vanzin <van...@cloudera.com> wrote:

> The Flume connector is still available from Spark:
>
> http://search.maven.org/#artifactdetails%7Corg.apache.spark%7Cspark-streaming-flume-assembly_2.11%7C2.0.0%7Cjar
>
> Many of the others have indeed been removed from Spark, and can be
> found at the Apache Bahir project: http://bahir.apache.org/
>
> I don't think there's a release for Spark 2.0.0 yet, though (only for
> the preview version).
>
>
> On Wed, Aug 3, 2016 at 8:40 PM, Kiran Chitturi
> <kiran.chitt...@lucidworks.com> wrote:
> > Hi,
> >
> > When Spark 2.0.0 was released, the 'spark-streaming-twitter' package and
> > several other packages were not released/published to Maven Central. It
> > looks like these packages have been removed from the official Spark repo.
> >
> > I found the replacement git repos for these missing packages at
> > https://github.com/spark-packages. This seems to have been created by
> some
> > of the Spark committers.
> >
> > However, the repos haven't been active for the last 5 months and none of
> > the versions have been released/published.
> >
> > Is https://github.com/spark-packages supposed to be the new official
> place
> > for these missing streaming packages ?
> >
> > If so, how can we get someone to release and publish new versions
> officially
> > ?
> >
> > I would like to help in any way possible to get these packages released
> and
> > published.
> >
> > Thanks,
> > --
> > Kiran Chitturi
> >
>
>
>
> --
> Marcelo
>



-- 
Kiran Chitturi


2.0.0 packages for twitter streaming, flume and other connectors

2016-08-03 Thread Kiran Chitturi
Hi,

When Spark 2.0.0 was released, the 'spark-streaming-twitter' package and
several other packages were not released/published to Maven Central. It
looks like these packages have been removed from the official Spark repo.

I found the replacement git repos for these missing packages at
https://github.com/spark-packages. These seem to have been created by some
of the Spark committers.

However, the repos haven't been active for the last 5 months and none of the
versions have been released/published.

Is https://github.com/spark-packages supposed to be the new official place
for these missing streaming packages?

If so, how can we get someone to release and publish new versions
officially?

I would like to help in any way possible to get these packages released and
published.

Thanks,
-- 
Kiran Chitturi


Spark executor crashes when the tasks are cancelled

2016-04-27 Thread Kiran Chitturi
Hi,

We are seeing this issue with Spark 1.6.1. The executor is exiting when one
of the running tasks is cancelled.

The executor logs show the error below before the executor crashes.

16/04/27 16:34:13 ERROR SparkUncaughtExceptionHandler: [Container in
> shutdown] Uncaught exception in thread Thread[Executor task launch
> worker-2,5,main]
> java.lang.Error: java.nio.channels.ClosedByInterruptException
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1148)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.nio.channels.ClosedByInterruptException
> at
> java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
> at
> java.nio.channels.Channels$WritableByteChannelImpl.write(Channels.java:460)
> at
> org.apache.spark.util.SerializableBuffer$$anonfun$writeObject$1.apply(SerializableBuffer.scala:49)
> at
> org.apache.spark.util.SerializableBuffer$$anonfun$writeObject$1.apply(SerializableBuffer.scala:47)
> at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1219)
> at
> org.apache.spark.util.SerializableBuffer.writeObject(SerializableBuffer.scala:47)
> at sun.reflect.GeneratedMethodAccessor30.invoke(Unknown Source)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:497)
> at java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:988)
> at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1496)
> at
> java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
> at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
> at
> java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
> at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
> at
> java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
> at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
> at
> java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
> at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
> at
> java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
> at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
> at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
> at
> org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:44)
> at
> org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:101)
> at org.apache.spark.rpc.netty.NettyRpcEnv.serialize(NettyRpcEnv.scala:252)
> at org.apache.spark.rpc.netty.NettyRpcEnv.send(NettyRpcEnv.scala:195)
> at
> org.apache.spark.rpc.netty.NettyRpcEndpointRef.send(NettyRpcEnv.scala:516)
> at
> org.apache.spark.executor.CoarseGrainedExecutorBackend.statusUpdate(CoarseGrainedExecutorBackend.scala:132)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:288)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> ... 2 more


I have attached the full logs at this gist:
https://gist.github.com/kiranchitturi/3bd3a083a7c956cff73040c1a140c88f

On the driver side, the following info is logged (
https://gist.github.com/kiranchitturi/3bd3a083a7c956cff73040c1a140c88f)

The following lines show that the executor exited because of one of the running tasks:

2016-04-27T16:34:13,723 - WARN [dispatcher-event-loop-1:Logging$class@70] -
> Lost task 0.0 in stage 89.0 (TID 173, 10.0.0.42): ExecutorLostFailure
> (executor 2 exited caused by one of the running tasks) Reason: Remote RPC
> client disassociated. Likely due to containers exceeding thresholds, or network
> issues. Check driver logs for WARN messages.
> 2016-04-27T16:34:13,723 - WARN [dispatcher-event-loop-1:Logging$class@70]
> - Lost task 1.0 in stage 89.0 (TID 174, 10.0.0.42): ExecutorLostFailure
> (executor 2 exited caused by one of the running tasks) Reason: Remote RPC
> client di


Is it possible for the executor to die when the jobs in the SparkContext are
cancelled? Apart from https://issues.apache.org/jira/browse/SPARK-14234, I
could not find any Jiras that report this error.
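
For context, a simplified sketch of how such a cancellation can be triggered
on the driver (the group id is made up, and interruptOnCancel = true is an
assumption about the setup rather than a confirmed detail):

// Jobs run under a job group; cancelling the group with interruptOnCancel set
// interrupts the running task threads, which is a plausible source of the
// ClosedByInterruptException in the executor logs above.
sc.setJobGroup("query-42", "interactive query", interruptOnCancel = true)

// ... the job runs on another thread ...

sc.cancelJobGroup("query-42")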

Sometimes, we notice a scenario where the executor dies and the driver doesn't
request a new one. This causes the jobs to hang indefinitely. We are
using dynamic allocation for our jobs.

Thanks,


Kiran Chitturi


Re: Spark sql not pushing down timestamp range queries

2016-04-15 Thread Kiran Chitturi
>>> ... resolve cast (eg. long to integer)
>>>
>>> For a workaround, the implementation of the Solr data source should be
>>> changed to one with CatalystScan, which takes all the filters.
>>>
>>> But CatalystScan is not designed to be binary compatible across
>>> releases; however, it looks like some think it is stable now, as mentioned
>>> here: https://github.com/apache/spark/pull/10750#issuecomment-175400704.
>>>
>>>
>>> Thanks!
>>>
>>>
>>> 2016-04-15 3:30 GMT+09:00 Mich Talebzadeh <mich.talebza...@gmail.com>:
>>>
>>>> Hi Josh,
>>>>
>>>> Can you please clarify whether date comparisons as two strings work at
>>>> all?
>>>>
>>>> I was under the impression that with string comparison only the first
>>>> characters are compared?
>>>>
>>>> Thanks
>>>>
>>>> Dr Mich Talebzadeh
>>>>
>>>>
>>>>
>>>> LinkedIn * 
>>>> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>> <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>>>
>>>>
>>>>
>>>> http://talebzadehmich.wordpress.com
>>>>
>>>>
>>>>
>>>> On 14 April 2016 at 19:26, Josh Rosen <joshro...@databricks.com> wrote:
>>>>
>>>>> AFAIK this is not being pushed down because it involves an implicit
>>>>> cast and we currently don't push casts into data sources or scans; see
>>>>> https://github.com/databricks/spark-redshift/issues/155 for a
>>>>> possibly-related discussion.
>>>>>
>>>>> On Thu, Apr 14, 2016 at 10:27 AM Mich Talebzadeh <
>>>>> mich.talebza...@gmail.com> wrote:
>>>>>
>>>>>> Are you comparing strings in here or timestamp?
>>>>>>
>>>>>> Filter ((cast(registration#37 as string) >= 2015-05-28) &&
>>>>>> (cast(registration#37 as string) <= 2015-05-29))
>>>>>>
>>>>>>
>>>>>> Dr Mich Talebzadeh
>>>>>>
>>>>>>
>>>>>>
>>>>>> LinkedIn * 
>>>>>> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>>>> <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>>>>>
>>>>>>
>>>>>>
>>>>>> http://talebzadehmich.wordpress.com
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 14 April 2016 at 18:04, Kiran Chitturi <
>>>>>> kiran.chitt...@lucidworks.com> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> Timestamp range filter queries in SQL are not getting pushed down to
>>>>>>> the PrunedFilteredScan instances. The filtering is happening at the 
>>>>>>> Spark
>>>>>>> layer.
>>>>>>>
>>>>>>> The physical plan for timestamp range queries is not showing the
>>>>>>> pushed filters where as range queries on other types is working fine as 
>>>>>>> the
>>>>>>> physical plan is showing the pushed filters.
>>>>>>>
>>>>>>> Please see below for code and examples.
>>>>>>>
>>>>>>> *Example:*
>>>>>>>
>>>>>>> *1.* Range filter queries on Timestamp types
>>>>>>>
>>>>>>>*code: *
>>>>>>>
>>>>>>>> sqlContext.sql("SELECT * from events WHERE `registration` >=
>>>>>>>> '2015-05-28' AND `registration` <= '2015-05-29' ")
>>>>>>>
>>>>>>>*Full example*:
>>>>>>> https://github.com/lucidworks/spark-solr/blob/master/src/test/scala/com/lucidworks/spark/EventsimTestSuite.scala#L151
>>>>>>> *plan*:
>>>>>>> https://gist.github.com/kiranchitturi/4a52688c9f0abe3d4b2bd8b938044421#file-time-range-sql
>>>>>>>
>>>>>>> *2. * Range filter queries on Long types
>>>>>>>
>>>>>>> *code*:
>>>>>>>
>>>>>>>> sqlContext.sql("SELECT * from events WHERE `length` >= '700' and
>>>>>>>> `length` <= '1000'")
>>>>>>>
>>>>>>> *Full example*:
>>>>>>> https://github.com/lucidworks/spark-solr/blob/master/src/test/scala/com/lucidworks/spark/EventsimTestSuite.scala#L151
>>>>>>> *plan*:
>>>>>>> https://gist.github.com/kiranchitturi/4a52688c9f0abe3d4b2bd8b938044421#file-length-range-sql
>>>>>>>
>>>>>>> The SolrRelation class we use extends
>>>>>>> <https://github.com/lucidworks/spark-solr/blob/master/src/main/scala/com/lucidworks/spark/SolrRelation.scala#L37>
>>>>>>> the PrunedFilteredScan.
>>>>>>>
>>>>>>> Since Solr supports date ranges, I would like for the timestamp
>>>>>>> filters to be pushed down to the Solr query.
>>>>>>>
>>>>>>> Are there limitations on the type of filters that are passed down
>>>>>>> with Timestamp types ?
>>>>>>> Is there something that I should do in my code to fix this ?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> --
>>>>>>> Kiran Chitturi
>>>>>>>
>>>>>>>
>>>>>>
>>>>
>>>
>>
>>
>> --
>> ---
>> Takeshi Yamamuro
>>
>
>


-- 
Kiran Chitturi


Re: Spark sql not pushing down timestamp range queries

2016-04-15 Thread Kiran Chitturi
Thanks, Hyukjin, for the suggestion. I will take a look at implementing the
Solr data source with CatalystScan.
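
For my own notes, the rough shape of a CatalystScan-based relation (a sketch
only, not the real SolrRelation; the class name and the single-field schema
are made up for illustration):

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.sql.catalyst.expressions.{Attribute, Expression}
import org.apache.spark.sql.sources.{BaseRelation, CatalystScan}
import org.apache.spark.sql.types.{StructField, StructType, TimestampType}

// A CatalystScan relation receives the raw Catalyst expressions, including
// the cast-wrapped timestamp comparisons, so the data source can translate
// them into Solr date-range queries itself.
class SolrCatalystRelationSketch(override val sqlContext: SQLContext)
    extends BaseRelation with CatalystScan {

  override def schema: StructType =
    StructType(Seq(StructField("registration", TimestampType)))

  override def buildScan(requiredColumns: Seq[Attribute],
                         filters: Seq[Expression]): RDD[Row] = {
    // `filters` would contain expressions such as
    //   cast(registration as string) >= '2015-05-28'
    // which we would map to a Solr fq on the date field; this sketch simply
    // returns an empty RDD.
    sqlContext.sparkContext.emptyRDD[Row]
  }
}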




Spark sql not pushing down timestamp range queries

2016-04-14 Thread Kiran Chitturi
Hi,

Timestamp range filter queries in SQL are not getting pushed down to the
PrunedFilteredScan instances. The filtering is happening at the Spark layer.

The physical plan for timestamp range queries does not show the pushed
filters, whereas range queries on other types work fine and their physical
plan shows the pushed filters.

Please see below for code and examples.

*Example:*

*1.* Range filter queries on Timestamp types

   *code: *

> sqlContext.sql("SELECT * from events WHERE `registration` >= '2015-05-28'
> AND `registration` <= '2015-05-29' ")

   *Full example*:
https://github.com/lucidworks/spark-solr/blob/master/src/test/scala/com/lucidworks/spark/EventsimTestSuite.scala#L151
*plan*:
https://gist.github.com/kiranchitturi/4a52688c9f0abe3d4b2bd8b938044421#file-time-range-sql

*2. * Range filter queries on Long types

*code*:

> sqlContext.sql("SELECT * from events WHERE `length` >= '700' and `length`
> <= '1000'")

*Full example*:
https://github.com/lucidworks/spark-solr/blob/master/src/test/scala/com/lucidworks/spark/EventsimTestSuite.scala#L151
*plan*:
https://gist.github.com/kiranchitturi/4a52688c9f0abe3d4b2bd8b938044421#file-length-range-sql

The SolrRelation class we use
(https://github.com/lucidworks/spark-solr/blob/master/src/main/scala/com/lucidworks/spark/SolrRelation.scala#L37)
extends the PrunedFilteredScan.

Since Solr supports date ranges, I would like for the timestamp filters to
be pushed down to the Solr query.

Are there limitations on the type of filters that are passed down with
Timestamp types?
Is there something that I should do in my code to fix this?
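
One thing I plan to try on the query side, in case it sidesteps the cast (a
sketch, not yet verified against our data source):

import java.sql.Timestamp
import org.apache.spark.sql.functions.col

// Compare the TimestampType column against Timestamp literals instead of
// strings, so no cast is wrapped around `registration`.
val start = Timestamp.valueOf("2015-05-28 00:00:00")
val end   = Timestamp.valueOf("2015-05-29 00:00:00")

val df = sqlContext.table("events")
  .filter(col("registration") >= start && col("registration") <= end)

df.explain(true) // check whether the range now appears under PushedFilters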

Thanks,
-- 
Kiran Chitturi


supporting adoc files in spark-packages.org

2016-02-10 Thread Kiran Chitturi
Hi,

We want to add the spark-solr repo (https://github.com/LucidWorks/spark-solr)
to spark-packages.org, but the submission is currently failing due to "Cannot
find README.md" (http://spark-packages.org/staging?id=882).

We use AsciiDoc (adoc) for our internal and external documentation, and we are
wondering if spark-packages.org can support AsciiDoc files in addition to
README.md files.


Thanks,
-- 
Kiran Chitturi