from:"\"李斌松\""

In spark 2.4 Upgrade Apache Arrow to version 0.12.0

2019-05-14 Thread 李斌松

With pyarrow 0.12.1 and Arrow jar 0.10, jobs run correctly. With pyarrow 0.12.1 and Arrow jar 0.12, this exception occurs: > *Expected schema message in stream, was null or length 0* pyarrow was upgraded to 0.12.1 by another package's dependency, resulting in inconsistency betw

Re: [spark-sql] Hive failing on insert empty array into parquet table

2018-12-29 Thread 李斌松

https://issues.apache.org/jira/browse/HIVE-13632 On Saturday, 2018-12-29 at 4:08 PM, 李斌松 wrote: > Hive has fixed this problem, but the fix is not included in > hive-exec-1.2.1.spark2.jar > > [image: image.png]

[Spark SQL]use zstd, No enum constant parquet.hadoop.metadata.CompressionCodecName.ZSTD

2018-12-19 Thread 李斌松

The parquet-hadoop-bundle jar is imported into the Spark Hive project. When you compress data using zstd, classes may be loaded preferentially from parquet-hadoop-bundle, and the enum constant parquet.hadoop.metadata.CompressionCodecName.ZSTD can't be found. > > 18/12/20 10:35:28 ERROR Executor: Excep

If there is timestamp type data in DF, Spark 2.3 toPandas is much slower than spark 2.2.

2018-06-06 Thread 李斌松

If there is timestamp type data in DF, Spark 2.3 toPandas is much slower than spark 2.2.

spark 2.3 dataframe join bug

2018-03-26 Thread 李斌松

Hi all, I'm using Spark 2.3 and have found a bug in Spark DataFrame. Here is my code:
    sc = sparkSession.sparkContext
    tmp = sparkSession.createDataFrame(sc.parallelize([[1, 2, 3, 4], [1, 2, 5, 6], [2, 3, 4, 5], [2, 3, 5, 6]])).toDF('a', 'b', 'c', 'd')
    tmp.createOrRep

How to do data desensitization (example: select bank_no from users)?

2017-08-19 Thread 李斌松

For example, a user's bank card number must not be visible to an analyst and should be replaced by asterisks. How do you do that in Spark?
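One possible approach to the masking asked about above, as a minimal sketch only: a plain Python function that replaces a card number with asterisks. The function name `mask_bank_no` and its `visible` parameter are hypothetical and not from the original thread; the sketch assumes the value arrives as a string.

```python
def mask_bank_no(bank_no, visible=0, mask_char="*"):
    """Replace a card number with mask characters.

    `visible` (hypothetical parameter) is how many trailing characters
    stay readable; the default of 0 masks the whole value.
    """
    if bank_no is None:
        # Preserve NULLs instead of turning them into an empty mask.
        return None
    keep = max(visible, 0)
    hidden = max(len(bank_no) - keep, 0)
    return mask_char * hidden + bank_no[hidden:]
```

In PySpark a function like this could be wrapped with `pyspark.sql.functions.udf` and applied to the `bank_no` column; deciding which users see the masked column is a separate permission question that this sketch does not address.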

Limit the number of tasks submitted: spark.submit.tasks.threshold.enabled & spark.submit.tasks.threshold

2017-07-11 Thread 李斌松

Limit the number of tasks submitted to prevent a single job from occupying a large share of resources, while also guiding users to set reasonable conditions. [image: inline image 1] spark_submit_tasks_threshold.patch
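The attached patch is not shown in the archive, so the following is only an illustrative sketch of the idea, not the patch's implementation. It assumes the two settings named in the subject behave as an on/off switch and a maximum task count; the function `check_task_threshold` and the exception class are hypothetical.

```python
class TaskThresholdExceeded(Exception):
    """Raised when a job would submit more tasks than allowed (hypothetical)."""


def check_task_threshold(num_tasks, conf):
    """Reject a job whose task count exceeds the configured threshold.

    `conf` is a plain dict standing in for the Spark configuration.
    Assumed semantics: the check runs only when the 'enabled' key is
    "true" and a threshold value is present.
    """
    enabled = str(conf.get("spark.submit.tasks.threshold.enabled", "false")).lower() == "true"
    threshold = conf.get("spark.submit.tasks.threshold")
    if not enabled or threshold is None:
        return
    if num_tasks > int(threshold):
        raise TaskThresholdExceeded(
            f"{num_tasks} tasks exceeds the limit of {int(threshold)}; "
            "consider adding a more selective filter condition"
        )
```

The error message is where users could be guided toward narrower query conditions, which is the second goal the message mentions.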

Does spark support hive table(parquet) column renaming?

2017-06-19 Thread 李斌松

Does spark support hive table(parquet) column renaming?

Custom function cannot be accessed across databases

2017-05-22 Thread 李斌松

A custom function cannot be accessed across databases. Example: the function json_extract_value is registered in database A, and A.json_extract_value cannot be called from database B. In SessionCatalog.java, change externalCatalog.getFunction(currentDb, name.funcName) to externalCatalog.getFunction(name.d

How can spark hiveserver dynamically update a function's dependent jar?

2017-05-18 Thread 李斌松

I created a temporary function that references a jar file on HDFS. After the jar file is updated, the change does not take effect immediately; hiveserver has to be restarted.

Spark thrift server hiveStatement.getQueryLog returns empty

2017-03-12 Thread 李斌松

Why does hiveStatement.getQueryLog return empty on the Spark thrift server?

Spark SQL table permission control?

2017-02-25 Thread 李斌松

When connecting to the Spark thriftserver over JDBC and executing Hive SQL, how can read and write permissions on a table be checked? In Hive on Spark, permissions can be controlled by extending a hook. What is the corresponding extension point in Spark on Hive?

How to set the currentDatabase value when the SparkSession is initialized?

2017-01-10 Thread 李斌松

When Spark reads a Hive table, the value of catalog.currentDatabase is default. How can the currentDatabase value be set when the SparkSession is initialized? hive.metastore.uris thrift://localhost:9083 IP address (or fully-qualified domain name) and port of the metastore host

What is the difference between hive on spark and spark on hive?

2017-01-09 Thread 李斌松

What is the difference between hive on spark and spark on hive?

Can a Spark Hive UDF read broadcast variables?

2016-12-17 Thread 李斌松

Can a Spark Hive UDF read broadcast variables?

How to dynamically register a UDF via reflection?

2016-12-15 Thread 李斌松

How can a UDF be registered dynamically via reflection?
java.lang.UnsupportedOperationException: Schema for type _$13 is not supported
    at org.apache.spark.sql.catalyst.ScalaReflection$class.schemaFor(ScalaReflection.scala:153)
    at org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:29)

16 matches
