Hi Zhan, any idea why this is happening? I can load ORC files created by a Hive table, but I can't load ORC files created by Spark itself. It looks like a bug.
On Wed, Sep 30, 2015 at 12:03 PM, Umesh Kacha <umesh.ka...@gmail.com> wrote:

> Hi Zhan, thanks much. Please find the code below.
>
> Working code, loading data from a path created by a Hive table using the
> Hive console outside of Spark:
>
> DataFrame df =
> hiveContext.read().format("orc").load("/hdfs/path/to/hive/table/partition");
>
> Non-working code, for Hive tables created inside Spark using hiveContext.sql
> insert into partition queries:
>
> DataFrame df =
> hiveContext.read().format("orc").load("/hdfs/path/to/hive/table/partition/created/by/spark");
>
> As you can see, the code is the same in both cases; the second one just
> tries to load ORC data created by Spark.
> On Sep 30, 2015 11:22 AM, "Zhan Zhang" <zzh...@hortonworks.com> wrote:
>
>> Hi Umesh,
>>
>> The potential reason is that Hive and Spark do not use the same
>> OrcInputFormat. In the new Hive version there is a NewOrcInputFormat, but
>> it is not in Spark because of backward compatibility (which is not
>> available in hive-0.12).
>> Do you mind posting the code that works and the code that does not work
>> for you?
>>
>> Thanks.
>>
>> Zhan Zhang
>>
>> On Sep 29, 2015, at 10:05 PM, Umesh Kacha <umesh.ka...@gmail.com> wrote:
>>
>> Hi, I can read/load ORC data created by a Hive table into a dataframe. Why
>> is it throwing a Malformed ORC exception when I try to load data created
>> by hiveContext.sql into a dataframe?
>> On Sep 30, 2015 2:37 AM, "Hortonworks" <zzh...@hortonworks.com> wrote:
>>
>>> You can try to use data frame for both read and write.
>>>
>>> Thanks
>>>
>>> Zhan Zhang
>>>
>>> On Sep 29, 2015, at 1:56 PM, Umesh Kacha <umesh.ka...@gmail.com> wrote:
>>>
>>> Hi Zhan, thanks for the response. The table is created using Spark
>>> hiveContext.sql, and data is inserted into the table also using
>>> hiveContext.sql. Insert into partition table.
>>> When I try to load ORC data into a dataframe, I am loading a particular
>>> partition's data, stored in a path such as
>>> /user/xyz/Hive/xyz.db/sparktable/partition1=abc
>>>
>>> Regards,
>>> Umesh
>>> On Sep 30, 2015 02:21, "Hortonworks" <zzh...@hortonworks.com> wrote:
>>>
>>>> How was the table generated, by Hive or by Spark?
>>>>
>>>> If you generate the table using Hive but read it by data frame, it may
>>>> have some compatibility issue.
>>>>
>>>> Thanks
>>>>
>>>> Zhan Zhang
>>>>
>>>> > On Sep 29, 2015, at 1:47 PM, unk1102 <umesh.ka...@gmail.com> wrote:
>>>> >
>>>> > Hi, I have a Spark job which creates Hive tables in ORC format with
>>>> > partitions. It works well: I can read the data back from the Hive
>>>> > table using the Hive console. But if I try to further process the ORC
>>>> > files generated by the Spark job by loading them into a dataframe, I
>>>> > get the following exception:
>>>> > Caused by: java.io.IOException: Malformed ORC file
>>>> > hdfs://localhost:9000/user/hive/warehouse/partorc/part_tiny.txt.
>>>> > Invalid postscript.
>>>> >
>>>> > DataFrame df = hiveContext.read().format("orc").load("to/path");
>>>> >
>>>> > Please guide.
>>>> >
>>>> > --
>>>> > View this message in context:
>>>> > http://apache-spark-user-list.1001560.n3.nabble.com/Hive-ORC-Malformed-while-loading-into-spark-data-frame-tp24876.html
>>>> > Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>>> >
>>>> > ---------------------------------------------------------------------
>>>> > To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>>>> > For additional commands, e-mail: user-h...@spark.apache.org
>>>>
>>>> --
>>>> CONFIDENTIALITY NOTICE
>>>> NOTICE: This message is intended for the use of the individual or
>>>> entity to which it is addressed and may contain information that is
>>>> confidential, privileged and exempt from disclosure under applicable law.
>>>> If the reader of this message is not the intended recipient, you are
>>>> hereby notified that any printing, copying, dissemination, distribution,
>>>> disclosure or forwarding of this communication is strictly prohibited.
>>>> If you have received this communication in error, please contact the
>>>> sender immediately and delete it from your system. Thank You.
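
A note on the exception quoted in the original message: the path it reports ends in part_tiny.txt, which may mean (this is a guess, not something confirmed in the thread) that a plain-text file sits in the directory being loaded. ORC files start with the three ASCII bytes "ORC", so a quick header check can rule that in or out. The sketch below is plain Java with no Spark or Hive dependency. The class name OrcMagicCheck and its methods are hypothetical, and it assumes the partition's files have first been copied to a local directory. A matching header does not prove a file is valid ORC; a missing one shows that it is not.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper, not part of Spark or Hive.
public class OrcMagicCheck {
    // ORC files begin with the three ASCII bytes "ORC".
    private static final byte[] ORC_MAGIC = "ORC".getBytes(StandardCharsets.US_ASCII);

    /** Returns true if the given leading bytes start with the ORC magic. */
    public static boolean startsWithOrcMagic(byte[] head) {
        return head != null
            && head.length >= ORC_MAGIC.length
            && Arrays.equals(Arrays.copyOf(head, ORC_MAGIC.length), ORC_MAGIC);
    }

    /** Reads up to the first three bytes of a local file and checks them. */
    public static boolean fileStartsWithOrcMagic(Path file) throws IOException {
        byte[] head = new byte[ORC_MAGIC.length];
        int read = 0;
        try (InputStream in = Files.newInputStream(file)) {
            while (read < head.length) {
                int n = in.read(head, read, head.length - read);
                if (n < 0) {
                    break; // file is shorter than the magic
                }
                read += n;
            }
        }
        return startsWithOrcMagic(Arrays.copyOf(head, read));
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: OrcMagicCheck <local-directory>");
            return;
        }
        // args[0]: a local copy of the partition directory (assumption:
        // fetched beforehand, for example with `hdfs dfs -get`).
        try (DirectoryStream<Path> files = Files.newDirectoryStream(Paths.get(args[0]))) {
            for (Path f : files) {
                if (Files.isRegularFile(f)) {
                    String verdict = fileStartsWithOrcMagic(f) ? "ORC header found" : "no ORC header";
                    System.out.println(f.getFileName() + " -> " + verdict);
                }
            }
        }
    }
}
```

If any file in the directory reports no ORC header, the load path is picking up something that is not ORC, and the Hive-versus-Spark writer difference discussed above may not be the cause.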