Hi
I'll try the same settings as you tomorrow to see if I can reproduce the same issues.
Will report back once done
Thanks
On 20 Mar 2016 3:50 pm, "Vincent Ohprecio" wrote:
Thanks Mich and Marco for your help. I have created a ticket on the dev channel to look into it.
Here is the issue https://issues.apache.org/jira/browse/SPARK-14031
On Sun, Mar 20, 2016 at 2:57 AM, Mich Talebzadeh wrote:
Hi Vincent,
I downloaded the CSV file and ran the test.
Spark version 1.5.2
The full code is as follows, with minor changes to delete the yearAndCancelled.parquet
and output.csv files if they already exist:
//$SPARK_HOME/bin/spark-shell --packages com.databricks:spark-csv_2.11:1.3.0
val HiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
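The cleanup Mich describes (removing yearAndCancelled.parquet and output.csv before re-running, since DataFrame writers fail when the output path already exists) can be sketched in plain Scala for the local filesystem; the helper name is mine, not from his code, and on HDFS one would use org.apache.hadoop.fs.FileSystem.delete instead:

```scala
import java.nio.file.{Files, Path, Paths}
import java.util.Comparator

// Hypothetical helper: remove a file or directory tree if it exists,
// since Spark's DataFrame writers refuse to overwrite an existing path.
def deleteRecursively(p: Path): Unit =
  if (Files.exists(p)) {
    Files.walk(p)
      .sorted(Comparator.reverseOrder[Path]()) // children before parents
      .forEach(f => Files.delete(f))
  }

// Clear both outputs before re-running the job.
Seq("yearAndCancelled.parquet", "output.csv")
  .foreach(name => deleteRecursively(Paths.get(name)))
```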
Hi Vince,
We had a similar case a while back. I tried two solutions: Spark on the Hive
metastore, and Hive on Spark as the engine.
Hive version 2
Spark as Hive engine 1.3.1
Basically:
--1 Move the .CSV data into HDFS
--2 Create an external table (all columns as string)
--3 Create the ORC table (majority
Have you tried df.saveAsParquetFile? I think that method is on the DataFrame API.
HTH
Marco
On 19 Mar 2016 7:18 pm, "Vincent Ohprecio" wrote:
For some reason, writing data from the Spark shell to CSV using the `csv
package` takes almost an hour to dump to disk. Am I going crazy, or did I do
this wrong? I tried writing to Parquet first and it's fast as normal.
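For context on why CSV output can cost so much more CPU than Parquet: a CSV writer has to format and escape every field of every row as text. A minimal sketch of that per-row work (illustrative only, not spark-csv's actual code):

```scala
// Quote fields containing commas, quotes, or newlines, doubling
// embedded quotes (RFC 4180 style), then join with commas.
def toCsvLine(fields: Seq[String]): String =
  fields.map { f =>
    if (f.exists(c => c == ',' || c == '"' || c == '\n'))
      "\"" + f.replace("\"", "\"\"") + "\""
    else f
  }.mkString(",")

// Every output row pays this formatting and escaping cost, while
// Parquet writes typed columnar binary with no per-field text work.
val line = toCsvLine(Seq("2008", "WN", "cancelled,late"))
// line == "2008,WN,\"cancelled,late\""
```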
On my MacBook Pro (16 GB, 2.2 GHz Intel Core i7, 1 TB) the machine's CPUs go
crazy and i