Re: Issue using kryo serialization

2014-08-01 Thread gpatcham
Any pointers on this issue? Thanks.

saveAsSequenceFile with codec and compression type

2014-10-20 Thread gpatcham
Hi All, I'm trying to save an RDD as a SequenceFile and am not able to use a compression type (BLOCK or RECORD). Can anyone let me know how to set the compression type? Here is the code I'm using: RDD.saveAsSequenceFile(target, Some(classOf[org.apache.hadoop.io.compress.GzipCodec])) Thanks
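
A minimal sketch of one way to do this (app name, paths, and RDD contents are made up): saveAsSequenceFile only accepts an optional codec, so the compression type (BLOCK vs RECORD) has to come from the Hadoop configuration before the save runs.

  import org.apache.hadoop.io.SequenceFile.CompressionType
  import org.apache.hadoop.io.compress.GzipCodec
  import org.apache.spark.{SparkConf, SparkContext}

  val sc = new SparkContext(new SparkConf().setAppName("seqfile-compression"))
  // BLOCK compression for SequenceFile output; RECORD is the other option.
  // Older Hadoop configs use the key mapred.output.compression.type instead.
  sc.hadoopConfiguration.set("mapreduce.output.fileoutputformat.compress.type",
    CompressionType.BLOCK.toString)

  val rdd = sc.parallelize(Seq(("key1", "value1"), ("key2", "value2")))
  rdd.saveAsSequenceFile("hdfs:///tmp/seqfile-out", Some(classOf[GzipCodec]))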

resource allocation spark on yarn

2014-12-12 Thread gpatcham
Hi All, I have Spark on YARN and there are multiple Spark jobs on the cluster. Sometimes some jobs do not get enough resources even when there are enough free resources available on the cluster, and even when I use the settings below: --num-workers 75 \ --worker-cores 16 Jobs stick with the resources
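
A minimal sketch of the same limits expressed as Spark configuration (values are made up); capping what one job may claim, or enabling dynamic allocation, is the usual way to keep a single job from holding resources that other jobs on the cluster need.

  import org.apache.spark.{SparkConf, SparkContext}

  val conf = new SparkConf()
    .setAppName("yarn-resource-capped-job")
    .set("spark.executor.instances", "25")        // rather than --num-workers 75
    .set("spark.executor.cores", "4")             // rather than --worker-cores 16
    .set("spark.dynamicAllocation.enabled", "true")
    .set("spark.shuffle.service.enabled", "true") // required by dynamic allocation
  val sc = new SparkContext(conf)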

query avro hive table in spark sql

2015-08-26 Thread gpatcham
Hi, I'm trying to query a Hive table backed by Avro in Spark SQL and am seeing the errors below. 15/08/26 17:51:12 WARN avro.AvroSerdeUtils: Encountered AvroSerdeException determining schema. Returning signal schema to indicate problem org.apache.hadoop.hive.serde2.avro.AvroSerdeException:
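
A minimal sketch (table name is hypothetical) of the usual precondition for this warning: the AvroSerDe has to be able to resolve the table schema, so avro.schema.url or avro.schema.literal must be present in the table's TBLPROPERTIES and reachable from the executors.

  import org.apache.spark.sql.hive.HiveContext

  val hiveContext = new HiveContext(sc)
  // If AvroSerdeUtils still cannot determine the schema, check that the table
  // (not only the SerDe) carries avro.schema.url / avro.schema.literal and that
  // the referenced schema file is readable from every node.
  val df = hiveContext.sql("SELECT * FROM avro_events LIMIT 10") // avro_events is made up
  df.show()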

Incorrect results with spark sql

2015-09-16 Thread gpatcham
Hi, I'm trying to query a Hive view using Spark, and it gives different row counts compared to Hive. Here is the view definition in Hive: create view test_hive_view as select col1, col2 from tab1 left join tab2 on tab1.col1 = tab2.col1 left join tab3 on tab1.col1 = tab3.col1 where col1

using spark context in map function: Task not serializable error

2016-01-18 Thread gpatcham
Hi, I have a use case where I need to pass the SparkContext into a map function: reRDD.map(row => method1(row, sc)).saveAsTextFile(outputDir) method1 needs the Spark context to query Cassandra, but I see the error below: java.io.NotSerializableException: org.apache.spark.SparkContext Is there a way we can fix
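
SparkContext lives only on the driver and is not serializable, so it cannot be captured by a map closure. A minimal sketch of the usual restructuring, assuming the DataStax spark-cassandra-connector and assuming method1 is rewritten to take a Cassandra Session instead of a SparkContext:

  import com.datastax.spark.connector.cql.CassandraConnector

  // CassandraConnector is serializable, so it can be built on the driver
  // and used inside executor-side code.
  val connector = CassandraConnector(sc.getConf)

  val result = reRDD.mapPartitions { rows =>
    // One Cassandra session per partition instead of one SparkContext per row.
    connector.withSessionDo { session =>
      rows.map(row => method1(row, session)).toList.iterator // materialize while the session is open
    }
  }
  result.saveAsTextFile(outputDir)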

Not able to pass 3rd party jars to mesos executors

2016-05-10 Thread gpatcham
Hi All, I'm using the --jars option in spark-submit to ship 3rd party jars, but I don't see them actually being passed to the Mesos slaves and I'm getting class-not-found exceptions. This is how I'm using the --jars option: --jars hdfs://namenode:8082/user/path/to/jar Am I missing something here or what's the
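
A minimal sketch (jar names are made up) of the same thing expressed as configuration: spark.jars takes a comma-separated list, and on Mesos each URI has to be reachable from every agent; if the classes are needed when the executor JVM starts, spark.executor.extraClassPath may be required as well.

  import org.apache.spark.{SparkConf, SparkContext}

  val conf = new SparkConf()
    .setAppName("mesos-with-third-party-jars")
    .set("spark.jars",
      "hdfs://namenode:8082/user/libs/dep-a.jar,hdfs://namenode:8082/user/libs/dep-b.jar")
  val sc = new SparkContext(conf)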

Spark UI error spark 2.0.1 hadoop 2.6

2016-10-27 Thread gpatcham
Hi, I'm running spark-shell in YARN client mode; the SparkContext starts and I'm able to run commands, but the UI does not come up and I see the errors below in the Spark shell: 20:51:20 WARN servlet.ServletHandler: javax.servlet.ServletException: Could not determine the proxy server for redirection at

Re: Spark UI error spark 2.0.1 hadoop 2.6

2016-10-27 Thread gpatcham
I was able to fix it by adding servlet 3.0 to the classpath.

Apache Spark orc read performance when reading large number of small files

2018-10-31 Thread gpatcham
When reading a large number of ORC files from HDFS under a directory, Spark doesn't launch any tasks for quite some time, and I don't see any tasks running during that period. I'm using the command and spark.sql configs below to read the ORC files. What is Spark doing under the hood when spark.read.orc is

Re: Apache Spark orc read performance when reading large number of small files

2018-10-31 Thread gpatcham
Spark version 2.2.0, Hive version 1.1.0. There are a lot of small files. Spark code: "spark.sql.orc.enabled": "true", "spark.sql.orc.filterPushdown": "true" val logs = spark.read.schema(schema).orc("hdfs://test/date=201810").filter("date > 20181003") Hive: "spark.sql.orc.enabled": "true",

Re: Apache Spark orc read performance when reading large number of small files

2018-11-01 Thread gpatcham
When I run spark.read.orc("hdfs://test").filter("conv_date = 20181025").count with "spark.sql.orc.filterPushdown=true", I see the following in the executor logs, so predicate pushdown is happening: 18/11/01 17:31:17 INFO OrcInputFormat: ORC pushdown predicate: leaf-0 = (IS_NULL conv_date) leaf-1 = (EQUALS
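
One common remedy, not discussed in this thread, is to compact the small files once so that later reads list and open far fewer of them; a minimal sketch (paths and partition count are guesses):

  import org.apache.spark.sql.SparkSession

  val spark = SparkSession.builder().appName("orc-compaction").getOrCreate()

  // Read the directory of small ORC files once and rewrite it as fewer,
  // larger files; later jobs then point at the compacted location.
  spark.read.orc("hdfs://test/date=201810")
    .repartition(64)                  // target file count is a guess; tune to the data size
    .write.mode("overwrite")
    .orc("hdfs://test/compacted/date=201810")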

How to set Description in UI SQL tab

2020-06-04 Thread gpatcham
Is there a way we can set a description to display in the UI SQL tab, like sc.setJobDescription for jobs and stages? Thanks Giri
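
A minimal sketch, assuming the same job-description property used for jobs and stages; depending on the Spark version the SQL tab may show this text as the query description, otherwise it falls back to the call site.

  import org.apache.spark.sql.SparkSession

  val spark = SparkSession.builder().appName("sql-tab-description").getOrCreate()

  // Set the description on the driver before triggering the action.
  spark.sparkContext.setJobDescription("daily aggregation: load events and count")
  val n = spark.read.parquet("hdfs://test/events").count() // path is made up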