pyspark split pair rdd to multiple

2016-04-19 Thread pth001
Hi, How can I split a pair RDD [K, V] into a map [K, Array(V)] efficiently in PySpark? Best, Patcharee
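
The usual answer is groupByKey, which produces exactly the (K, Iterable[V]) shape; a minimal sketch (names are illustrative, and reduceByKey/aggregateByKey are cheaper when a full list per key is not actually needed):

    from pyspark import SparkContext

    sc = SparkContext(appName="group-pairs")
    pairs = sc.parallelize([("a", 1), ("b", 2), ("a", 3)])

    # groupByKey yields (K, Iterable[V]); materialize each iterable as a list
    grouped = pairs.groupByKey().mapValues(list)
    print(grouped.collect())  # e.g. [('a', [1, 3]), ('b', [2])]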

dataframe access hive complex type

2016-01-19 Thread pth001
Hi, How can a DataFrame (which API?) access Hive complex types (Struct, Array, Map)? Thanks, Patcharee
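
For reference, a sketch of how the 1.x DataFrame API reaches into complex columns; the table and column names here are hypothetical:

    from pyspark import SparkContext
    from pyspark.sql import HiveContext

    sc = SparkContext(appName="complex-types")
    sqlContext = HiveContext(sc)
    df = sqlContext.table("my_table")  # hypothetical Hive table

    # Struct field: bracket access on the column
    df.select(df["my_struct"]["field1"]).show()

    # Array element and map value: getItem
    df.select(df["my_array"].getItem(0), df["my_map"].getItem("key1")).show()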

OrcNewOutputFormat write partitioned orc file

2015-11-16 Thread pth001
Hi, How can I write a partitioned ORC file using OrcNewOutputFormat in MapReduce? Thanks, Patcharee

override log4j level

2015-11-16 Thread pth001
Hi, How can I override the log4j level using --hiveconf? I want to use the ERROR level for some tasks. Thanks, Patcharee
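
One commonly cited form, shown here as an unverified sketch, overrides the root logger for a single invocation:

    hive --hiveconf hive.root.logger=ERROR,console -f my_script.hql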

Re: character '' not supported here

2015-07-18 Thread pth001
Hi, The query result> [one row of numeric and NULL column values; the tab separators were lost in the archive, so the figures run together]

alter table on multiple partitions

2015-06-30 Thread pth001
Hi, I have a table partitioned by columns a, b, c, and d. I want to run ALTER TABLE ... CONCATENATE on this table. Is it possible to use a wildcard in the ALTER command to alter several partitions at a time? For example: alter table TestHive partition (a=1, b=*, c=2, d=*) CONCATENATE; BR, Patcharee
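
Hive's ALTER TABLE ... PARTITION syntax expects a concrete partition spec, so the wildcard form above is not accepted; the usual workaround is to generate one statement per matching partition. A sketch, assuming unquoted partition values:

    # List partitions, keep those with a=1 and c=2, concatenate each one.
    hive -e "SHOW PARTITIONS TestHive" \
      | grep '^a=1/' | grep '/c=2/' \
      | while read part; do
          spec=$(echo "$part" | tr '/' ',')
          hive -e "ALTER TABLE TestHive PARTITION ($spec) CONCATENATE;"
        done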

How to use KryoSerializer : ClassNotFoundException

2015-06-24 Thread pth001
Hi, I am using Spark 1.4. I wanted to serialize with KryoSerializer but got a ClassNotFoundException. The configuration and exception are below. When I submitted the job, I also provided --jars mylib.jar, which contains WRFVariableZ. conf.set("spark.serializer", "org.apache.spark.serializer.KryoS
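
For context, a sketch of the registration that typically accompanies this setup (registerKryoClasses is on SparkConf from 1.2 onward; WRFVariableZ and mylib.jar come from the message above). A common cause of the ClassNotFoundException is that the class is needed on the driver's classpath as well, not only in --jars:

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      // register the custom class; WRFVariableZ comes from mylib.jar (import omitted)
      .registerKryoClasses(Array(classOf[WRFVariableZ]))
    val sc = new SparkContext(conf)

Submitted, for example, with: spark-submit --jars mylib.jar --driver-class-path mylib.jar ...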

memory needed for each executor

2015-06-21 Thread pth001
Hi, How can I know the size of memory needed for each executor (one core) to execute each job? If there are many cores per executor, will the memory needed be the product (memory needed for one core * number of cores)? Any suggestions/guidelines? BR, Patcharee
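
As a rough rule (not from the thread): tasks running in one executor share that executor's heap, so with N cores you would provision roughly N times the single-task working set, plus overhead. A hypothetical submission:

    # two concurrent tasks per executor sharing one 8g heap
    spark-submit \
      --executor-cores 2 \
      --executor-memory 8g \
      --class MyJob myjob.jar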

Re: Dataframe Write : Tables created with SQLContext must be TEMPORARY. Use a HiveContext instead.

2015-06-13 Thread pth001
http://spark.apache.org/docs/latest/sql-programming-guide.html#hive-tables Hope this helps, Will On June 13, 2015, at 3:36 PM, pth001 wrote: Hi, I am using Spark 1.4. I am trying to insert data into a Hive table (in ORC format) from a DF. partitionedTestDF.write.format

Dataframe Write : Tables created with SQLContext must be TEMPORARY. Use a HiveContext instead.

2015-06-13 Thread pth001
Hi, I am using Spark 1.4. I am trying to insert data into a Hive table (in ORC format) from a DF. partitionedTestDF.write.format("org.apache.spark.sql.hive.orc.DefaultSource") .mode(org.apache.spark.sql.SaveMode.Append).partitionBy("zone","z","year","month").saveAsTable("testorc") When this job is s
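
The fix the reply points at, sketched against the 1.x API (the DataFrame must come from a HiveContext for saveAsTable to create a persistent Hive table; the source table here is hypothetical):

    import org.apache.spark.sql.SaveMode
    import org.apache.spark.sql.hive.HiveContext

    val hiveContext = new HiveContext(sc)  // sc: existing SparkContext
    // build the DataFrame from the HiveContext, not a plain SQLContext
    val partitionedTestDF = hiveContext.table("source_table")

    partitionedTestDF.write
      .format("orc")
      .mode(SaveMode.Append)
      .partitionBy("zone", "z", "year", "month")
      .saveAsTable("testorc")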

ERROR 2135: Received error from store function.Premature EOF: no length prefix available

2015-06-09 Thread pth001
Hi, My Pig-on-Tez job (storing a dataset into a partitioned Hive table) throws the following exception. What can be wrong? How can I fix it? 2015-06-09 10:59:57,268 ERROR [TezChild] runtime.PigProcessor: Encountered exception while processing: org.apache.pig.backend.executionengine.ExecException:

Cast relation to scalar: ClassCastException: java.lang.Integer cannot be cast to java.lang.String

2015-05-27 Thread pth001
Hi, I tried to cast a relation (one row) to a scalar. It works well when the cast field is an Integer, but if the cast field is a FLOAT, I get ClassCastException: java.lang.Integer cannot be cast to java.lang.String. coordinate_cossin_xy = FOREACH join_coordinate_cossin_xy GENERATE coordinate_xy::xlo
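
For reference, a sketch of the scalar pattern being described, with hypothetical relation and field names; the explicit (float) cast on the scalar projection is the detail in question:

    -- single-row relation produced earlier in the script
    one_row = FOREACH (GROUP src ALL) GENERATE MAX(src.xlong_u) AS xlong_u;

    -- scalar projection: one_row.xlong_u is read as a single value;
    -- cast it explicitly rather than letting Pig infer the type
    out = FOREACH data GENERATE (float) one_row.xlong_u AS xval;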

filter by query result

2015-05-27 Thread pth001
Hi, I am new to Pig. First I queried a Hive table (x = LOAD 'x' USING org.apache.hive.hcatalog.pig.HCatLoader();) and got a single record/value. How can I use this single value as a filter in another query? I hope to get better performance by filtering as early as possible. BR, Patcharee
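
Pig's scalar projection covers this: a field of a single-row relation can be referenced as a plain value in a later statement. A sketch with hypothetical field names:

    x = LOAD 'x' USING org.apache.hive.hcatalog.pig.HCatLoader();

    -- collapse x to one row holding the threshold value
    threshold = FOREACH (GROUP x ALL) GENERATE MAX(x.val) AS maxval;

    y = LOAD 'y' USING org.apache.hive.hcatalog.pig.HCatLoader();

    -- threshold.maxval acts as a scalar, so the filter can run early
    filtered = FILTER y BY val <= threshold.maxval;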

EOFException - TezJob - Cannot submit DAG

2015-05-22 Thread pth001
Hi, I ran a Pig script on Tez and got an EOFException (see http://wiki.apache.org/hadoop/EOFException). I have no idea how to fix it. However, I did not get the exception when I executed the same Pig script on MR. I am using HadoopVersion: 2.6.0.2.2.4.2-2, PigVersion: 0.14.0.2.2.4.

create a pipeline

2015-04-15 Thread pth001
Hi, How can I create a pipeline (containing a sequence of Pig scripts)? BR, Patcharee
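
One option, sketched here with hypothetical script names, is a driver script that chains the steps with exec (each exec runs as its own batch):

    -- driver.pig: run the steps in sequence
    exec step1.pig
    exec step2.pig
    exec step3.pig

From the shell, pig -f step1.pig && pig -f step2.pig does the same job, and Oozie is the usual tool once the pipeline grows into a DAG.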