Re: How to specify default value for StructField?

2017-02-14 Thread smartzjp
You can try the code below.

val df = spark.read.format("orc").load("/user/hos/orc_files_test_together")
df.select("f1", "f2").show()
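Since StructField itself carries no default-value slot, a common workaround is to fill defaults after loading. A minimal sketch, assuming columns f1 (string) and f2 (numeric); the fill values are placeholders:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("defaults").getOrCreate()
val df = spark.read.format("orc").load("/user/hos/orc_files_test_together")

// Substitute defaults for nulls; the fill values here are illustrative
val withDefaults = df.na.fill(Map("f1" -> "unknown", "f2" -> 0))
withDefaults.select("f1", "f2").show()
```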





On 2017/2/14

Re: why does spark web UI keeps changing its port?

2017-01-24 Thread smartzjp
To fix the master web UI port, set the environment variable SPARK_MASTER_WEBUI_PORT=<port> in conf/spark-env.sh. Note that 4040 is the per-application UI port; the master UI (the MasterWebUI line below) defaults to 8080.
You can run netstat -nao | grep 4040 to check whether 4040 is already in use.
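The port check above can be sketched as a small shell helper (port_in_use is an illustrative name; ss is used here as the modern replacement for netstat):

```shell
# Check whether a TCP port is already bound on this host.
port_in_use() {
  if ss -ltn 2>/dev/null | grep -q ":$1 "; then
    echo "in use"
  else
    echo "free"
  fi
}

port_in_use 4040
```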

———


I am not sure why the Spark web UI keeps changing its port every time I restart
a cluster. How can I make it always run on one port? I made sure there is no
process running on 4040 (Spark's default web UI port), yet it still starts at
8080. Any ideas?



MasterWebUI: Bound MasterWebUI to 0.0.0.0, and started at http://x.x.x.x:8080



Thanks!



Re: Writing Spark SQL output in Local and HDFS path

2017-01-19 Thread smartzjp

The DataFrameWriter.csv shortcut exists since Spark 2.0.0; if you are on an
earlier version, you can try the code below (on 1.x this requires the spark-csv
package on the classpath).

result.write.format("csv").save(path)
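The thread below also asks for a "|" delimiter; on Spark 2.x that can be sketched as follows (the output path is a placeholder, and Spark writes a directory of part files, not a single file):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("csvOut").getOrCreate()
val result = spark.sql("select empno, name from emp")

result.write
  .option("sep", "|")              // field delimiter for the CSV writer
  .mode("overwrite")
  .csv("hdfs:///user/Prasad/out")  // placeholder path; use file:/// for local
```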


--

Hi,
  I tried the code below:
result.write.csv("home/Prasad/")
It is not working; it says:

Error: value csv is not a member of org.apache.spark.sql.DataFrameWriter.

Regards
Prasad


On Thu, Jan 19, 2017 at 4:35 PM, smartzjp <zjp_j...@163.com> wrote:
Because the number of reducers will not be one, the output on HDFS will be a
folder rather than a single file. You can use result.write.csv(folderPath).



--

Hi,
  Can anyone please let us know how to write the output of Spark SQL to a
local and an HDFS path using Scala code?

Code :-

scala>  val result = sqlContext.sql("select empno , name from emp");
scala > result.show();

If I run the command result.show() it prints the output to the console.
I need to redirect the output to a local file as well as an HDFS file,
with "|" as the delimiter.

We tried the code below:
 result.saveAsTextFile("home/Prasad/result.txt")
It is not working as expected.


-- 
--
Prasad. T



-- 
--
Regards,
RAVI PRASAD. T





Re: how the sparksession initialization, set currentDatabase value?

2017-01-10 Thread smartzjp

If you run Spark SQL from the CLI, this configuration will work; but if you
want a distributed query engine, start the JDBC/ODBC (Thrift) server and point
it at the Hive metastore address.

You can refer to this section for more detail:
http://spark.apache.org/docs/latest/sql-programming-guide.html#distributed-sql-engine
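Once the metastore is reachable, the current database can be switched on the session's catalog. A minimal sketch (the database name mydb is a placeholder):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("catalogDemo")
  .enableHiveSupport()   // picks up hive-site.xml, incl. hive.metastore.uris
  .getOrCreate()

// Switch the session's current database; "mydb" is a placeholder name
spark.catalog.setCurrentDatabase("mydb")
println(spark.catalog.currentDatabase)
```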


-


When Spark reads a Hive table, catalog.currentDatabase is "default". How can I
set the currentDatabase value when the SparkSession is initialized?




<property>
  <name>hive.metastore.uris</name>
  <value>thrift://localhost:9083</value>
  <description>IP address (or fully-qualified domain name) and port of
  the metastore host</description>
</property>






Re: Spark 2.0.2, KyroSerializer, double[] is not registered.

2017-01-07 Thread smartzjp
You can try the following code. Note that the exception names the primitive
array class double[], not the boxed Double[]:

kryo.register(double[].class);

If boxed Double[] arrays are serialized as well, register them too:

ObjectArraySerializer serializer = new ObjectArraySerializer(kryo,
Double[].class);
kryo.register(Double[].class, serializer);
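With spark.kryo.registrationRequired enabled, the registration in Spark usually goes through a custom KryoRegistrator rather than a raw kryo handle. A sketch (MyRegistrator is an illustrative class name):

```scala
import com.esotericsoftware.kryo.Kryo
import org.apache.spark.serializer.KryoRegistrator

// Registers the array classes named in the exception message.
class MyRegistrator extends KryoRegistrator {
  override def registerClasses(kryo: Kryo): Unit = {
    kryo.register(classOf[Array[Double]])            // primitive double[]
    kryo.register(classOf[Array[java.lang.Double]])  // boxed Double[]
  }
}

// spark-defaults.conf (use the registrator's fully qualified name):
//   spark.kryo.registrator  MyRegistrator
```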


---

Hi, all.
I enable kyro in spark with spark-defaults.conf:
 spark.serializer org.apache.spark.serializer.KryoSerializer
 spark.kryo.registrationRequired  true

A KryoException is raised when a logistic regression algorithm is running:
 Note: To register this class use: kryo.register(double[].class);
 Serialization trace:
 currL1 (org.apache.spark.mllib.stat.MultivariateOnlineSummarizer)
at 
com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.write(FieldSerializer.java:585)
at 
com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:213)
at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:568)
at 
com.twitter.chill.Tuple2Serializer.write(TupleSerializers.scala:36)
at 
com.twitter.chill.Tuple2Serializer.write(TupleSerializers.scala:33)
   at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:568)
 
My question is:
Isn't double[].class supported by default?

Thanks.