Hi Mich,
Thanks for your support. Here are some of my thoughts.

> BTW can you log in to thrift server and do select * from <TABLE> limit 10
> 
> Do you see the rows?

Yes, I can see the rows, but all the field values are NULL.

> Works OK for me

You only tested the number of rows. In my case the count also looks right (it 
shows 117 rows), but the problem is that the data is NULL in all fields.
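
To make this concrete, here is a rough sketch of the two checks (written as 
spark.sql calls for illustration; in practice I run the equivalent statements 
through beeline against STS, and "topic" is the table from my earlier mail):

// Spark 2.0 shell with Hive support, querying the table created in STS.
// The row count looks correct, but projecting the columns returns only NULLs.
spark.sql("SELECT COUNT(*) FROM topic").show()    // 117 rows, as expected
spark.sql("SELECT * FROM topic LIMIT 10").show()  // every field comes back NULL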


> AS I see it the issue is that Hive table created as external on Parquet table 
> somehow does not see data. Rows are all nulls.
> 
> I don't think this is specific to thrift server. Just log in to Hive and see 
> you can read the data from your table topic created as external.
> 
> I noticed the same issue

I don’t think it’s a Hive issue. Right now I am using Spark and Zeppelin.


And the point is: why can the same Parquet file (which I converted from CSV) be 
read in Spark but not in STS?

One more thing: with the same file and the same method of creating the table in 
STS, everything works fine in Spark 1.6.1.
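
To be explicit about what I am comparing, here is roughly the sketch (the path 
and schema are the ones from my first mail below; the CREATE TABLE normally goes 
through beeline against STS, but I show it as spark.sql so everything is in one 
place):

// 1) Reading the Parquet file directly in Spark: data comes back as expected.
val df = spark.read.parquet("alluxio://master2:19998/etl_info/TOPIC")
df.show(5)

// 2) The same file exposed as a Hive-style external table (as in STS): all NULLs for me.
spark.sql(
  """CREATE EXTERNAL TABLE IF NOT EXISTS topic (
    |  topic_id int, topic_name_vn string, topic_name_en string,
    |  parent_id int, full_parent string, level_id int)
    |STORED AS PARQUET
    |LOCATION 'alluxio://master2:19998/etl_info/TOPIC'""".stripMargin)
spark.sql("SELECT * FROM topic LIMIT 5").show()

The same statements, run against the Spark 1.6.1 thrift server, read the data 
fine.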


Regards,
Chanh



> On Jul 30, 2016, at 2:10 PM, Mich Talebzadeh <mich.talebza...@gmail.com> 
> wrote:
> 
> BTW can you log in to thrift server and do select * from <TABLE> limit 10
> 
> Do you see the rows?
> 
> Dr Mich Talebzadeh
>  
> LinkedIn  
> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>  
> http://talebzadehmich.wordpress.com
> 
> Disclaimer: Use it at your own risk. Any and all responsibility for any loss, 
> damage or destruction of data or any other property which may arise from 
> relying on this email's technical content is explicitly disclaimed. The 
> author will in no case be liable for any monetary damages arising from such 
> loss, damage or destruction.
>  
> 
> On 30 July 2016 at 07:20, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
> Works OK for me
> 
> scala> val df = 
> sqlContext.read.format("com.databricks.spark.csv").option("inferSchema", 
> "true").option("header", 
> "false").load("hdfs://rhes564:9000/data/stg/accounts/ll/18740868")
> df: org.apache.spark.sql.DataFrame = [C0: string, C1: string, C2: string, C3: 
> string, C4: string, C5: string, C6: string, C7: string, C8: string]
> scala> df.write.mode("overwrite").parquet("/user/hduser/ll_18740868.parquet")
> scala> sqlContext.read.parquet("/user/hduser/ll_18740868.parquet").count
> res2: Long = 3651
> scala> val ff = sqlContext.read.parquet("/user/hduser/ll_18740868.parquet")
> ff: org.apache.spark.sql.DataFrame = [C0: string, C1: string, C2: string, C3: 
> string, C4: string, C5: string, C6: string, C7: string, C8: string]
> scala> ff.take(5)
> res3: Array[org.apache.spark.sql.Row] = Array([Transaction Date,Transaction 
> Type,Sort Code,Account Number,Transaction Description,Debit Amount,Credit 
> Amount,Balance,], [31/12/2009,CPT,'30-64-72,18740868,LTSB STH KENSINGTO CD 
> 5710 31DEC09 ,90.00,,400.00,null], [31/12/2009,CPT,'30-64-72,18740868,LTSB 
> CHELSEA (3091 CD 5710 31DEC09 ,10.00,,490.00,null], 
> [31/12/2009,DEP,'30-64-72,18740868,CHELSEA ,,500.00,500.00,null], 
> [Transaction Date,Transaction Type,Sort Code,Account Number,Transaction 
> Description,Debit Amount,Credit Amount,Balance,])
> 
> Now in Zeppelin create an external table and read it
> 
> <image.png>
> 
> 
> HTH
> 
> 
> 
> Dr Mich Talebzadeh
>  
> LinkedIn  
> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>  
> http://talebzadehmich.wordpress.com
> 
> Disclaimer: Use it at your own risk. Any and all responsibility for any loss, 
> damage or destruction of data or any other property which may arise from 
> relying on this email's technical content is explicitly disclaimed. The 
> author will in no case be liable for any monetary damages arising from such 
> loss, damage or destruction.
>  
> 
> On 29 July 2016 at 09:04, Chanh Le <giaosu...@gmail.com> wrote:
> I continued to debug:
> 
> 16/07/29 13:57:35 INFO FileScanRDD: Reading File path: 
> file:///Users/giaosudau/Documents/Topics.parquet/part-r-00000-8997050f-e063-427e-b53c-f0a61739706f.gz.parquet,
>  range: 0-3118, partition values: [empty row]
> vs OK one
> 16/07/29 15:02:47 INFO FileScanRDD: Reading File path: 
> file:///Users/giaosudau/data_example/FACT_ADMIN_HOURLY/time=2016-07-24-18/network_id=30206/part-r-00000-c5f5e18d-c8a1-4831-8903-3c60b02bdfe8.snappy.parquet,
>  range: 0-6050, partition values: [2016-07-24-18,30206]
> 
> I attached 2 files.
> 
> 
> 
> 
> 
> 
>> On Jul 29, 2016, at 9:44 AM, Chanh Le <giaosu...@gmail.com> wrote:
>> 
>> Hi everyone,
>> 
>> For more investigation I attached the file that I convert CSV to parquet.
>> 
>> Spark Code
>> 
>> I loaded from CSV file
>> val df = spark.sqlContext.read 
>> .format("com.databricks.spark.csv").option("delimiter", 
>> ",").option("header", "true").option("inferSchema", 
>> "true").load("/Users/giaosudau/Downloads/Topics.xls - Sheet 1.csv")
>> I create a Parquet
>> df.write.mode("overwrite").parquet("/Users/giaosudau/Documents/Topics.parquet")
>> 
>> It’s OK in Spark-Shell
>> 
>> scala> df.take(5)
>> res22: Array[org.apache.spark.sql.Row] = Array([124,Nghệ thuật & Giải 
>> trí,Arts & Entertainment,0,124,1], [53,Scandal,Scandal,124,124,53,2], 
>> [54,Showbiz - World,Showbiz-World,124,124,54,2], [52,Âm 
>> nhạc,Entertainment-Music,124,124,52,2], [47,Bar - Karaoke - 
>> Massage,Bar-Karaoke-Massage-Prostitution,124,124,47,2])
>> 
>> When I create a table in STS:
>> 
>> 0: jdbc:hive2://localhost:10000> CREATE EXTERNAL TABLE topic (TOPIC_ID int, 
>> TOPIC_NAME_VN String, TOPIC_NAME_EN String, PARENT_ID int, FULL_PARENT 
>> String, LEVEL_ID int) STORED AS PARQUET LOCATION 
>> '/Users/giaosudau/Documents/Topics.parquet';
>> 
>> But every field in the result comes back NULL:
>> 
>> <Screen Shot 2016-07-29 at 9.42.26 AM.png>
>> 
>> 
>> 
>> I think it's really a BUG, right?
>> 
>> Regards,
>> Chanh
>> 
>> 
>> <Topics.parquet>
>> 
>> 
>> <Topics.xls - Sheet 1.csv>
>> 
>> 
>> 
>> 
>> 
>>> On Jul 28, 2016, at 4:25 PM, Chanh Le <giaosu...@gmail.com> wrote:
>>> 
>>> Hi everyone,
>>> 
>>> I have a problem when I create an external table in Spark Thrift Server 
>>> (STS) and query the data.
>>> 
>>> Scenario:
>>> Spark 2.0
>>> Alluxio 1.2.0 
>>> Zeppelin 0.7.0
>>> STS start script 
>>> /home/spark/spark-2.0.0-bin-hadoop2.6/sbin/start-thriftserver.sh --master 
>>> mesos://zk://master1:2181,master2:2181,master3:2181/mesos --conf 
>>> spark.driver.memory=5G --conf spark.scheduler.mode=FAIR --class 
>>> org.apache.spark.sql.hive.thriftserver.HiveThriftServer2 --jars 
>>> /home/spark/spark-2.0.0-bin-hadoop2.6/jars/alluxio-core-client-spark-1.2.0-jar-with-dependencies.jar
>>>  --total-executor-cores 35 spark-internal --hiveconf 
>>> hive.server2.thrift.port=10000 --hiveconf 
>>> hive.metastore.warehouse.dir=/user/hive/warehouse --hiveconf 
>>> hive.metastore.metadb.dir=/user/hive/metadb --conf 
>>> spark.sql.shuffle.partitions=20
>>> 
>>> I have a file store in Alluxio alluxio://master2:19998/etl_info/TOPIC
>>> 
>>> Then I create a table in STS with:
>>> CREATE EXTERNAL TABLE topic (topic_id int, topic_name_vn String, 
>>> topic_name_en String, parent_id int, full_parent String, level_id int)
>>> STORED AS PARQUET LOCATION 'alluxio://master2:19998/etl_info/TOPIC';
>>> 
>>> To compare STS with Spark, I create a temp table named topics:
>>> spark.sqlContext.read.parquet("alluxio://master2:19998/etl_info/TOPIC").registerTempTable("topics")
>>> 
>>> Then I run the queries and compare.
>>> <Screen Shot 2016-07-28 at 4.18.59 PM.png>
>>> 
>>> 
>>> As you can see, the results are different.
>>> Is that a bug, or did I do something wrong?
>>> 
>>> Regards,
>>> Chanh
>> 
> 
> 
> 
> 
