Re: Spark 1.5.2 memory error

2016-02-02 Thread Jim Green
Look at part #3 in the blog below: http://www.openkb.info/2015/06/resource-allocation-configurations-for.html You may want to increase the executor memory itself, not just spark.yarn.executor.memoryOverhead. On Tue, Feb 2, 2016 at 2:14 PM, Stefan Panayotov wrote: > For the
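A minimal sketch of the tuning the reply suggests, as it might look in Spark 1.x Scala code; the memory values are illustrative assumptions, not recommendations from the thread:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Illustrative values only -- tune for your cluster and workload.
val conf = new SparkConf()
  .setAppName("memory-tuning-example")
  // The executor heap itself: the main knob when tasks run out of memory.
  .set("spark.executor.memory", "4g")
  // Extra off-heap headroom YARN grants on top of the heap (MB in Spark 1.x).
  .set("spark.yarn.executor.memoryOverhead", "768")
val sc = new SparkContext(conf)
```

Raising only the overhead helps when YARN kills containers for exceeding their memory limit; if the JVM heap itself is exhausted, spark.executor.memory is the setting to grow.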

Array column stored as “.bag” in parquet file instead of “REPEATED INT64”

2015-08-27 Thread Jim Green
Hi Team, Say I have a test.json file: {"c1":[1,2,3]} I can create a parquet file like: var df = sqlContext.load("/tmp/test.json", "json"); var df_c = df.repartition(1); df_c.select("*").save("/tmp/testjson_spark", "parquet") The output parquet file’s schema is like: c1: OPTIONAL F:1 .bag:
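The snippet above, cleaned up as it could be run in a Spark 1.x spark-shell (where sqlContext is predefined); the paths are the ones from the post:

```scala
// Spark 1.x DataFrame API, as used in the original message.
val df = sqlContext.load("/tmp/test.json", "json")
val df_c = df.repartition(1)
df_c.select("*").save("/tmp/testjson_spark", "parquet")

// Inspect how the array column was encoded in the written file:
sqlContext.load("/tmp/testjson_spark", "parquet").printSchema()
```

The printSchema call at the end is an assumed addition to make the reported “.bag” encoding visible; it is not part of the original snippet.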

Re: Is SPARK-3322 fixed in latest version of Spark?

2015-08-05 Thread Jim Green
...@gmail.com wrote: ConnectionManager has been deprecated and is no longer used by default (NettyBlockTransferService is the replacement). Hopefully you would no longer see these messages unless you have explicitly flipped it back on. On Tue, Aug 4, 2015 at 6:14 PM, Jim Green openkbi...@gmail.com

Re: Is SPARK-3322 fixed in latest version of Spark?

2015-08-04 Thread Jim Green
And also https://issues.apache.org/jira/browse/SPARK-3106 This one is still open. On Tue, Aug 4, 2015 at 6:12 PM, Jim Green openkbi...@gmail.com wrote: *Symptom:* Even the sample job fails: $ MASTER=spark://xxx:7077 run-example org.apache.spark.examples.SparkPi 10 Pi is roughly 3.140636 ERROR

Is SPARK-3322 fixed in latest version of Spark?

2015-08-04 Thread Jim Green
*Symptom:* Even the sample job fails: $ MASTER=spark://xxx:7077 run-example org.apache.spark.examples.SparkPi 10 Pi is roughly 3.140636 ERROR ConnectionManager: Corresponding SendingConnection to ConnectionManagerId(xxx,) not found WARN ConnectionManager: All connections not cleaned up Found

Resource allocation configurations for Spark on Yarn

2015-06-12 Thread Jim Green
Hi Team, Sharing an article that summarizes the resource allocation configurations for Spark on YARN: Resource allocation configurations for Spark on Yarn http://www.openkb.info/2015/06/resource-allocation-configurations-for.html -- Thanks, www.openkb.info (Open KnowledgeBase for

Spark impersonation

2015-02-02 Thread Jim Green
Hi Team, Does Spark support impersonation? For example, when Spark runs on YARN/Hive/HBase/etc., which user is used by default? The user that starts the Spark job? Any suggestions related to impersonation? -- Thanks, www.openkb.info (Open KnowledgeBase for Hadoop/Database/OS/Network/Tool)

Scala on Spark functions examples cheatsheet.

2015-02-02 Thread Jim Green
Hi Team, I spent some time over the past two weeks on Scala and tried all the Scala-on-Spark functions in the Spark Programming Guide http://spark.apache.org/docs/1.2.0/programming-guide.html. If you need example code for Scala-on-Spark functions, I created this cheat sheet

Bulk loading into hbase using saveAsNewAPIHadoopFile

2015-01-27 Thread Jim Green
Hi Team, I need some help writing a Scala job to bulk load some data into HBase. *Env:* hbase 0.94, spark-1.0.2. I am trying the code below to bulk load some data into the HBase table “t1”. import org.apache.spark._ import org.apache.spark.rdd.NewHadoopRDD import

Re: Bulk loading into hbase using saveAsNewAPIHadoopFile

2015-01-27 Thread Jim Green
: public void write(ImmutableBytesWritable row, KeyValue kv) Meaning, KeyValue is expected, not Put. On Tue, Jan 27, 2015 at 10:54 AM, Jim Green openkbi...@gmail.com wrote: Hi Team, I need some help writing a Scala job to bulk load some data into HBase. *Env:* hbase 0.94 spark-1.0.2

Re: Bulk loading into hbase using saveAsNewAPIHadoopFile

2015-01-27 Thread Jim Green
).getBytes()) (new ImmutableBytesWritable(Bytes.toBytes(x)), put) }) rdd.saveAsNewAPIHadoopFile("/tmp/13", classOf[ImmutableBytesWritable], classOf[KeyValue], classOf[HFileOutputFormat], conf) On Tue, Jan 27, 2015 at 12:17 PM, Jim Green openkbi...@gmail.com wrote: Thanks Ted. Could you give me
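Putting the pieces of this thread together, a hedged sketch of what the working bulk load could look like on HBase 0.94 with Spark 1.0.2. HFileOutputFormat’s writer expects KeyValue (not Put), as noted above; the column family "cf", qualifier "c1", sample data, and output path are assumptions for illustration:

```scala
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.KeyValue
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("hbase-bulkload-sketch"))
val conf = HBaseConfiguration.create()

// Sample rows; HFiles must be written in sorted row-key order.
val rdd = sc.parallelize(1 to 10)
  .map(_.toString)
  .sortBy(identity)
  .map { x =>
    // Build a KeyValue directly -- this is what HFileOutputFormat accepts.
    val kv = new KeyValue(
      Bytes.toBytes(x),     // row key
      Bytes.toBytes("cf"),  // column family (assumed)
      Bytes.toBytes("c1"),  // qualifier (assumed)
      Bytes.toBytes(x))     // value
    (new ImmutableBytesWritable(Bytes.toBytes(x)), kv)
  }

rdd.saveAsNewAPIHadoopFile(
  "/tmp/hfiles_t1",  // assumed output directory for the generated HFiles
  classOf[ImmutableBytesWritable], classOf[KeyValue],
  classOf[HFileOutputFormat], conf)
```

After the HFiles are written, they would still need to be handed to HBase, e.g. with the completebulkload tool, to appear in table “t1”.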