Hi
I want to write something in Structured Streaming:
1. I have a dataset with 3 columns: id, last_update_timestamp,
attribute
2. I am receiving the data through Kinesis.
I want to deduplicate records based on last_update_timestamp. In batch, it looks
like:
spark.sql("select * from (Select *, row_nu
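The batch query above is cut off, but it appears to use the common row_number() deduplication pattern: number rows per id by descending timestamp and keep row 1. As an illustration only, here is a pure-Python sketch of those semantics (keep the newest row per id). It assumes the three columns from the question and integer timestamps; the function name `latest_per_id` is hypothetical, and this is not Spark code. In Structured Streaming, the same "newest wins per id" rule would have to be maintained as per-key state, because non-time-based windows such as row_number() are not supported on streaming DataFrames.

```python
from typing import Dict, Iterable, List, Tuple

# Assumed record shape: (id, last_update_timestamp, attribute).
Record = Tuple[str, int, str]


def latest_per_id(records: Iterable[Record]) -> List[Record]:
    """Keep only the most recent record for each id.

    Mirrors the batch pattern
      ROW_NUMBER() OVER (PARTITION BY id ORDER BY last_update_timestamp DESC) = 1
    On timestamp ties this keeps the first record seen, whereas row_number
    would pick an arbitrary one among the tied rows.
    """
    latest: Dict[str, Record] = {}
    for rec in records:
        rec_id, ts, _ = rec
        current = latest.get(rec_id)
        # Replace the stored row only when this one is strictly newer.
        if current is None or ts > current[1]:
            latest[rec_id] = rec
    # Sort by id so the output order is deterministic.
    return sorted(latest.values(), key=lambda r: r[0])


rows = [
    ("a", 1, "x"),
    ("b", 5, "y"),
    ("a", 3, "z"),
    ("b", 2, "w"),
]
print(latest_per_id(rows))  # [('a', 3, 'z'), ('b', 5, 'y')]
```

Note that an older record arriving late (("b", 2, "w") above) does not overwrite a newer one, which is the property a streaming version would also need to preserve.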
Hi All,
I would like to know how and when Spark determines the types of a result
set. For example, say I have the following DataFrame.
*inputdf*
col1 | col2 | col3
---
1 | 2 | 5
2 | 3 | 6
Now say I do something like the following (pseudo-SQL):
resultdf = select c
Consider a situation where multiple workflows write different partitions of the
same table.
Example:
10 different processes are writing Parquet or ORC files for different
partitions of the same table, foo, at
/staging/tables/foo/partition_field=1,/staging/tables/foo/partition_field=2,/staging/tables/fo
After setting `parquet.strings.signed-min-max.enabled` to `true` in
`ShowMetaCommand.java`, `parquet-tools meta` shows the min and max statistics.
@@ -57,8 +57,9 @@ public class ShowMetaCommand extends ArgsOnlyCommand {
String[] args = options.getArgs();
String input = args[0];
Configu
Hi, Nicolas.
Yes. In Apache Spark 2.3, there are new sub-improvements for SPARK-20901
(Feature parity for ORC with Parquet).
Regarding your questions, the following three items are related.
1. spark.sql.orc.impl="native"
By default, the `native` ORC implementation (based on the latest ORC 1.4.1)
is added.
Hi
Thanks for this work.
Will this affect both of the following?
1) spark.read.format("orc").load("...")
2) spark.sql("select ... from my_orc_table_in_hive")
On 10 Jan 2018, at 20:14, Dongjoon Hyun wrote:
> Hi, All.
>
> Vectorized ORC Reader is now supported in Apache Spark 2.3.
>
> https://issues.
He is using CSV; either ORC or Parquet would be fine.
> On 28. Jan 2018, at 06:49, Gourav Sengupta wrote:
>
> Hi,
>
> There is definitely a parameter, when creating temporary security credentials,
> for specifying the number of minutes those credentials will be active. There is
> an upper limit