Re: Spark 3.3 + parquet 1.10

2023-07-24 Thread Pralabh Kumar
Spark 3.3 in OSS is built with parquet 1.12. Just compiling with parquet 1.10 results in a build failure, so I am wondering if anyone has built and compiled Spark 3.3 with parquet 1.10. Regards Pralabh Kumar On Mon, Jul 24, 2023 at 3:04 PM Mich Talebzadeh wrote: > Hi, > > Where is th
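
For anyone attempting the same experiment, a minimal sketch of the override, assuming Spark 3.3's root pom exposes a parquet.version property that can be set from the command line (as this thread notes, the compile may still fail against 1.10 because of API changes):

    ./build/mvn -Dparquet.version=1.10.1 -DskipTests clean package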

Spark 3.3 + parquet 1.10

2023-07-24 Thread Pralabh Kumar
3.3 with parquet 1.10? What are the dos/don'ts for it? Regards Pralabh Kumar

Spark 3.0.0 EOL

2023-07-24 Thread Pralabh Kumar
Hi Dev Team If possible, can you please provide the Spark 3.0.0 EOL timelines. Regards Pralabh Kumar

SPARK-43235

2023-05-03 Thread Pralabh Kumar
Hi Dev Please find some time to review the Jira. Regards Pralabh Kumar

Re: Setting spark.kubernetes.driver.connectionTimeout, spark.kubernetes.submission.connectionTimeout to default spark.network.timeout

2022-08-02 Thread Pralabh Kumar
figuration if you have very > limited control plane resources. > > spark.kubernetes.executor.enablePollingWithResourceVersion=true > > Dongjoon. > > On Mon, Aug 1, 2022 at 7:52 AM Pralabh Kumar > wrote: > > > > Hi Dev team > > > > > > > > Since spa
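
A sketch of the submit-time settings being discussed, assuming current property names; the values are illustrative (the Kubernetes client timeouts are in milliseconds, while spark.network.timeout takes a duration):

    spark-submit \
      --conf spark.network.timeout=120s \
      --conf spark.kubernetes.driver.connectionTimeout=120000 \
      --conf spark.kubernetes.driver.requestTimeout=120000 \
      --conf spark.kubernetes.submission.connectionTimeout=120000 \
      --conf spark.kubernetes.submission.requestTimeout=120000 \
      <application jar and arguments>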

Setting spark.kubernetes.driver.connectionTimeout, spark.kubernetes.submission.connectionTimeout to default spark.network.timeout

2022-08-01 Thread Pralabh Kumar
understanding is correct Regards Pralabh Kumar

Spark-39755 Review/comment

2022-07-14 Thread Pralabh Kumar
Hi Dev community Please review/comment https://issues.apache.org/jira/browse/SPARK-39755 Regards Pralabh kumar

Re: Spark32 + Java 11 . Reading parquet java.lang.NoSuchMethodError: 'sun.misc.Cleaner sun.nio.ch.DirectBuffer.cleaner()'

2022-06-14 Thread Pralabh Kumar
for Spark on K8s on Spark32 running on version < Hadoop 3.2 (since the default value in the Docker file for Spark32 is Java 11). Please let me know if it makes sense to you. Regards Pralabh Kumar On Tue, Jun 14, 2022 at 4:21 PM Steve Loughran wrote: > hadoop 3.2.x is the oldest of the

Re: Spark32 + Java 11 . Reading parquet java.lang.NoSuchMethodError: 'sun.misc.Cleaner sun.nio.ch.DirectBuffer.cleaner()'

2022-06-13 Thread Pralabh Kumar
our environment (with Hadoop 3.1). Regards Pralabh Kumar On Mon, Jun 13, 2022 at 3:25 PM Steve Loughran wrote: > > > On Mon, 13 Jun 2022 at 08:52, Pralabh Kumar > wrote: > >> Hi Dev team >> >> I have a spark32 image with Java 11 (Running Spark on K8s) . Whil

Re: Spark32 + Java 11 . Reading parquet java.lang.NoSuchMethodError: 'sun.misc.Cleaner sun.nio.ch.DirectBuffer.cleaner()'

2022-06-13 Thread Pralabh Kumar
Steve. Thx for your help, please ignore the last comment. Regards Pralabh Kumar On Mon, 13 Jun 2022, 15:43 Pralabh Kumar, wrote: > Hi steve > > Thx for help . We are on Hadoop3.2 ,however we are building Hadoop3.2 with > Java 8 . > > Do you suggest to build Hadoop with J

Re: Spark32 + Java 11 . Reading parquet java.lang.NoSuchMethodError: 'sun.misc.Cleaner sun.nio.ch.DirectBuffer.cleaner()'

2022-06-13 Thread Pralabh Kumar
Hi Steve, Thx for the help. We are on Hadoop 3.2, however we are building Hadoop 3.2 with Java 8. Do you suggest building Hadoop with Java 11? Regards Pralabh kumar On Mon, 13 Jun 2022, 15:25 Steve Loughran, wrote: > > > On Mon, 13 Jun 2022 at 08:52, Pralabh Kumar > wrote: >

Spark32 + Java 11 . Reading parquet java.lang.NoSuchMethodError: 'sun.misc.Cleaner sun.nio.ch.DirectBuffer.cleaner()'

2022-06-13 Thread Pralabh Kumar
Hi Dev team I have a spark32 image with Java 11 (running Spark on K8s). While reading a huge parquet file via spark.read.parquet(""), I am getting the following error. The same error is mentioned in the Spark docs https://spark.apache.org/docs/latest/#downloading but w.r.t. Apache Arrow.
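
The error usually points at pre-Java-9 Hadoop client code that still expects sun.misc.Cleaner. A sketch of building a Spark 3.2 distribution against the Hadoop 3.2 profile for a Java 11 image, assuming the standard build scripts and profile names as they appear in the Spark 3.2 source tree:

    ./dev/make-distribution.sh --name java11-hadoop3.2 --tgz \
      -Phadoop-3.2 -Pkubernetes -Phive -Phive-thriftserver -DskipTests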

CVE-2020-13936

2022-05-05 Thread Pralabh Kumar
Hi Dev Team Please let me know if there is a JIRA to track changes for this CVE with respect to Spark. I searched JIRA but couldn't find anything. Please help. Regards Pralabh Kumar

CVE-2021-22569

2022-05-04 Thread Pralabh Kumar
Hi Dev Team Spark is using protobuf 2.5.0, which is vulnerable to CVE-2021-22569. The CVE recommends using protobuf 3.19.2. Please let me know if there is a JIRA to track the update w.r.t. the CVE and Spark, or should I create one? Regards Pralabh Kumar

Re: Issue on Spark on K8s with Proxy user on Kerberized HDFS : Spark-25355

2022-05-03 Thread Pralabh Kumar
. Regards Pralabh Kumar On Tue, May 3, 2022 at 7:39 PM Steve Loughran wrote: > > Pralabh, did you follow the URL provided in the exception message? I put a > lot of effort into improving the diagnostics, where the wiki articles are > part of the troubleshooting process > https://issues.

Issue on Spark on K8s with Proxy user on Kerberized HDFS : Spark-25355

2022-04-29 Thread Pralabh Kumar
y.$Proxy14.getFileInfo(Unknown Source) at On debugging deeper, we found the proxy user doesn't have access to delegation tokens in the case of K8s. SparkSubmit.submit explicitly creates the proxy user, and this user doesn't have a delegation token. Please help me with the same. Regards Pralabh Kumar

Spark3.2 on K8s with proxy-user kerberized environment

2022-04-25 Thread Pralabh Kumar
Hi dev team Please help me with the below problem. I have a kerberized cluster and am also doing the kinit. The problem only comes when the proxy user is being used. > > Running Spark 3.2 on K8s with --proxy-user and getting the below error and > then the job fails . However when running without a
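
A sketch of the failing pattern described in this thread, with placeholder hosts, principals, images and paths; HdfsTest is only a stand-in application that reads from kerberized HDFS:

    kinit -kt /etc/security/keytabs/spark-job.keytab spark-job@EXAMPLE.COM
    spark-submit \
      --master k8s://https://k8s-apiserver.example.com:6443 \
      --deploy-mode cluster \
      --proxy-user etl_user \
      --conf spark.kubernetes.container.image=example-registry/spark:3.2.0 \
      --class org.apache.spark.examples.HdfsTest \
      local:///opt/spark/examples/jars/spark-examples_2.12-3.2.0.jar hdfs:///user/etl_user/input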

Re: CVE -2020-28458, How to upgrade datatables dependency

2022-04-17 Thread Pralabh Kumar
er upgrade to 1.10.22 is also sufficient. >> >> On Wed, Apr 13, 2022 at 7:43 AM Pralabh Kumar >> wrote: >> >>> Hi Dev Team >>> >>> Spark 3.2 (and 3.3 might also) have CVE 2020-28458. Therefore in my >>> local repo of Spark I would

CVE -2020-28458, How to upgrade datatables dependency

2022-04-13 Thread Pralabh Kumar
Hi Dev Team Spark 3.2 (and maybe 3.3 as well) has CVE-2020-28458. Therefore, in my local repo of Spark I would like to update DataTables to 1.11.5. Can you please point out where I should upgrade the DataTables dependency? Regards Pralabh Kumar
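
A best-guess sketch for locating the bundled copy in a Spark checkout (the DataTables assets ship as static UI resources rather than as a Maven dependency); verify the paths against your branch:

    git grep -li datatables -- core/src/main/resources/org/apache/spark/ui/static \
      core/src/main/scala/org/apache/spark/ui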

Spark 3.0.1 and spark 3.2 compatibility

2022-04-07 Thread Pralabh Kumar
Hi Spark community I have a quick question. I am planning to migrate from Spark 3.0.1 to Spark 3.2. Do I need to recompile my application with 3.2 dependencies, or will an application compiled with 3.0.1 work fine on 3.2? Regards Pralabh kumar
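
On the application side the bump itself is just a dependency change; a minimal sbt sketch, assuming Scala 2.12 artifacts and provided scope (recompiling against 3.2 is generally the safer route):

    // build.sbt (sketch)
    libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-core" % "3.2.0" % "provided",
      "org.apache.spark" %% "spark-sql"  % "3.2.0" % "provided"
    )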

Spark on K8s , some applications ended ungracefully

2022-03-31 Thread Pralabh Kumar
) at org.apache.spark.util.ThreadUtils$.shutdown(ThreadUtils.scala:348) Please let me know if there is a solution for it. Regards Pralabh Kumar

Skip single integration test case in Spark on K8s

2022-03-16 Thread Pralabh Kumar
m successfully able to run some test cases and some are failing. For example, "Run SparkRemoteFileTest using a Remote data file" in KubernetesSuite is failing. Is there a way to skip running some of the test cases? Please help me with the same. Regards Pralabh Kumar
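
One local workaround, assuming the suite is plain ScalaTest: mark the failing case ignore(...) instead of test(...) before running. A minimal, self-contained sketch of the mechanism (not the actual KubernetesSuite code):

    import org.scalatest.funsuite.AnyFunSuite

    class ExampleSuite extends AnyFunSuite {
      test("this case runs") {
        assert(1 + 1 == 2)
      }
      // ScalaTest still compiles the body but reports the case as ignored instead of running it.
      ignore("Run SparkRemoteFileTest using a Remote data file") {
        assert(false, "never executed")
      }
    }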

Spark on K8s : property similar to yarn.max.application.attempt

2022-02-04 Thread Pralabh Kumar
machine. Is there a way to do the same? Regards Pralabh Kumar

Re: Spark on k8s : spark 3.0.1 spark.kubernetes.executor.deleteontermination issue

2022-01-18 Thread Pralabh Kumar
Does this property spark.kubernetes.executor.deleteOnTermination check whether the executor being deleted has shuffle data or not? On Tue, 18 Jan 2022, 11:20 Pralabh Kumar, wrote: > Hi spark team > > Have the cluster-wide property spark.kubernetes.executor.deleteOnTermination

Spark on k8s : spark 3.0.1 spark.kubernetes.executor.deleteontermination issue

2022-01-17 Thread Pralabh Kumar
Hi spark team We have the cluster-wide property spark.kubernetes.executor.deleteOnTermination set to true. During a long-running job, some of the executors that had shuffle data got deleted. Because of this, in the subsequent stage we get a lot of Spark shuffle fetch-failed exceptions. Please let me
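
As far as the documentation goes, the flag only controls whether terminated executor pods are deleted; it does not consider shuffle data, so shuffle files on a lost executor are gone either way. A sketch of keeping the pods around, which mainly helps post-mortem log inspection:

    spark-submit --conf spark.kubernetes.executor.deleteOnTermination=false <other submit arguments>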

Difference in behavior for Spark 3.0 vs Spark 3.1 "create database "

2022-01-10 Thread Pralabh Kumar
to prefix with hdfs to create the DB on HDFS. Why is there a difference in the behavior? Can you please point me to the JIRA which causes this change. Note: spark.sql.warehouse.dir and hive.metastore.warehouse.dir both have default values (not explicitly set). Regards Pralabh Kumar
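
A sketch of a workaround that sidesteps the version difference by making the location explicit; the URI is illustrative and assumes a spark-shell session:

    spark.sql(
      "CREATE DATABASE IF NOT EXISTS mydb " +
      "LOCATION 'hdfs://namenode:8020/user/hive/warehouse/mydb.db'")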

ivy unit test case filing for Spark

2021-12-21 Thread Pralabh Kumar
3) at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) at org.scalatest.Transformer.apply(Transformer.scala:22) Regards Pralabh Kumar

Log4j 1.2.17 spark CVE

2021-12-12 Thread Pralabh Kumar
Hi developers, users Spark is built using log4j 1.2.17. Is there a plan to upgrade based on the recently detected CVE? Regards Pralabh kumar

https://issues.apache.org/jira/browse/SPARK-36622

2021-09-01 Thread Pralabh Kumar
Hi Spark dev Community Please let me know your opinion about https://issues.apache.org/jira/browse/SPARK-36622 Regards Pralabh Kumar

Spark Thriftserver is failing for when submitting command from beeline

2021-08-20 Thread Pralabh Kumar
abase(Hive.java:1556) at org.apache.hadoop.hive.ql.metadata.Hive.databaseExists(Hive.java:1545) at org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$databaseExists$1(HiveClientImpl.scala:384) My guess is that authorization through the proxy user is not working. Please help. Regards Pralabh Kumar

Re: Hive on Spark vs Spark on Hive(HiveContext)

2021-07-01 Thread Pralabh Kumar

Hive on Spark vs Spark on Hive(HiveContext)

2021-07-01 Thread Pralabh Kumar
please guide me on which option to go for. I am personally inclined to go for option 2. It also allows the use of the latest Spark. Please help me on the same, as there are not many comparisons available online that keep Spark 3.0 in perspective. Regards Pralabh Kumar
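
A minimal sketch of option 2 (Spark reading the existing Hive metastore directly), assuming hive-site.xml is on the classpath; the table name is illustrative:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("spark-with-hive-support")
      .enableHiveSupport()   // routes catalog calls to the existing Hive metastore
      .getOrCreate()

    spark.sql("SELECT COUNT(*) FROM default.some_hive_table").show()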

Unable to pickle pySpark PipelineModel

2020-12-10 Thread Pralabh Kumar
Hi Dev, User I want to store Spark ML models in databases so that I can reuse them later on. I am unable to pickle them. However, while using Scala I am able to convert them into a byte array stream. So, for e.g., I am able to do something like the below in Scala but not in Python: val modelToByteArray
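
A sketch of the supported persistence route (the same save/load API exists in pyspark.ml, so it also works from Python); the data, column names and path are illustrative:

    import org.apache.spark.ml.{Pipeline, PipelineModel}
    import org.apache.spark.ml.classification.LogisticRegression
    import org.apache.spark.ml.feature.VectorAssembler
    import org.apache.spark.sql.SparkSession

    object PipelinePersistence {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("pipeline-persistence").getOrCreate()
        import spark.implicits._

        val df = Seq((1.0, 2.0, 0.0), (2.0, 3.0, 1.0), (3.0, 1.0, 0.0), (4.0, 5.0, 1.0))
          .toDF("f1", "f2", "label")
        val assembler = new VectorAssembler().setInputCols(Array("f1", "f2")).setOutputCol("features")
        val lr = new LogisticRegression().setMaxIter(5)
        val model: PipelineModel = new Pipeline().setStages(Array(assembler, lr)).fit(df)

        // Writes a directory of metadata + Parquet, which can then be archived and stored as bytes.
        model.write.overwrite().save("/tmp/models/my_pipeline_model")
        val reloaded = PipelineModel.load("/tmp/models/my_pipeline_model")
        reloaded.transform(df).show()
      }
    }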

Re: org.apache.spark.shuffle.FetchFailedException: Too large frame:

2018-05-02 Thread Pralabh Kumar
ur query is select sum(x), a from t group by a, then try select > sum(partial), a from (select sum(x) as partial, a, b from t group by a, b) > group by a. > > rb > ​ > > On Tue, May 1, 2018 at 4:21 AM, Pralabh Kumar <pralabhku...@gmail.com> > wrote: > >> Hi >>
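
A sketch of the two-step aggregation suggested above, using the reply's own table and column names (t, x, a, b) and assuming t is already registered as a view or table in a spark-shell session:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("two-stage-agg").getOrCreate()
    // First aggregate on the finer key (a, b) to shrink the skewed groups...
    val partial = spark.sql("SELECT a, b, SUM(x) AS partial_sum FROM t GROUP BY a, b")
    partial.createOrReplaceTempView("t_partial")
    // ...then roll the partial sums up to the original key a.
    spark.sql("SELECT a, SUM(partial_sum) AS total FROM t_partial GROUP BY a").show()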

org.apache.spark.shuffle.FetchFailedException: Too large frame:

2018-05-01 Thread Pralabh Kumar
Hi I am getting the above error in Spark SQL. I have increased the number of partitions (to 5000) but am still getting the same error. My data most probably is skewed. org.apache.spark.shuffle.FetchFailedException: Too large frame: 4247124829 at

Re: Best way to Hive to Spark migration

2018-04-05 Thread Pralabh Kumar
expect from the migration. > > On 5. Apr 2018, at 05:43, Pralabh Kumar <pralabhku...@gmail.com> wrote: > > Hi Spark group > > What's the best way to Migrate Hive to Spark > > 1) Use HiveContext of Spark > 2) Use Hive on Spark (https://cwiki.apache.org/ > confluence/di

Best way to Hive to Spark migration

2018-04-04 Thread Pralabh Kumar
Hi Spark group What's the best way to migrate Hive to Spark? 1) Use HiveContext of Spark 2) Use Hive on Spark ( https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started ) 3) Migrate Hive to Calcite to Spark SQL Regards

Are there any alternatives to Hive "stored by" clause as Spark 2.0 does not support it

2018-02-07 Thread Pralabh Kumar
Hi Spark 2.0 doesn't support STORED BY. Is there any alternative to achieve the same?
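
For what it is worth, the closest Spark-native construct is a data source table declared with USING instead of a Hive storage handler. The sketch below uses parquet purely as an illustration; for handler-backed systems (e.g. HBase) a dedicated Spark connector data source would take the handler's place, assuming one exists for that system:

    spark.sql(
      """CREATE TABLE IF NOT EXISTS events_ds (id BIGINT, payload STRING)
        |USING parquet""".stripMargin)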

Re: Kryo serialization failed: Buffer overflow : Broadcast Join

2018-02-02 Thread Pralabh Kumar
I am using spark 2.1.0 On Fri, Feb 2, 2018 at 5:08 PM, Pralabh Kumar <pralabhku...@gmail.com> wrote: > Hi > > I am performing broadcast join where my small table is 1 gb . I am > getting following error . > > I am using > > > org.apache.spark.SparkException: >

Kryo serialization failed: Buffer overflow : Broadcast Join

2018-02-02 Thread Pralabh Kumar
Hi I am performing a broadcast join where my small table is 1 GB. I am getting the following error. I am using org.apache.spark.SparkException: . Available: 0, required: 28869232. To avoid this, increase spark.kryoserializer.buffer.max value. I increased the value to
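
A sketch of the setting named in the error message; 1g is illustrative and the property caps at 2g, so for a 1 GB broadcast side it may also be worth reconsidering whether to broadcast at all:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("kryo-buffer-example")
      .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .config("spark.kryoserializer.buffer.max", "1g")
      .getOrCreate()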

Does Spark and Hive use Same SQL parser : ANTLR

2018-01-18 Thread Pralabh Kumar
Hi Do Hive and Spark use the same SQL parser provided by ANTLR? Do they generate the same logical plan? Please help on the same. Regards Pralabh Kumar

Spark build is failing in amplab Jenkins

2017-11-02 Thread Pralabh Kumar
JUnit test result report? failed: No test report files were found. Configuration error? Please help Regards Pralabh Kumar

[SPARK-20199][ML] : Provided featureSubsetStrategy to GBTClassifier and GBTRegressor

2017-09-11 Thread Pralabh Kumar
mpjlu <https://github.com/apache/spark/pull/18118/files/16ccbdfd8862c528c90fdde94c8ec20d6631126e> ? Please review it. Regards Pralabh Kumar

Re: How to tune the performance of Tpch query5 within Spark

2017-07-17 Thread Pralabh Kumar
ool.awaitTermination(Long.MaxValue, TimeUnit.NANOSECONDS) for(finalData<-rddList){ finalData.show() } This will read data in parallel, which I think is your main bottleneck. Regards Pralabh Kumar On Mon, Jul 17, 2017 at 6:25 PM, vaquar khan <vaquar.k...@gmail.com> wrote: > Coul
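
A self-contained sketch of the parallel-read idea from this reply, using Scala Futures instead of a hand-rolled thread pool; the table paths are illustrative:

    import scala.concurrent.{Await, Future}
    import scala.concurrent.duration.Duration
    import scala.concurrent.ExecutionContext.Implicits.global
    import org.apache.spark.sql.{DataFrame, SparkSession}

    object ParallelRead {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("tpch-parallel-read").getOrCreate()
        val paths = Seq("/data/tpch/lineitem", "/data/tpch/orders", "/data/tpch/customer")
        val futures: Seq[Future[DataFrame]] = paths.map { p =>
          Future {
            val df = spark.read.parquet(p).cache()
            df.count()   // materialize the cache so the scans actually run concurrently
            df
          }
        }
        val tables = Await.result(Future.sequence(futures), Duration.Inf)
        tables.foreach(_.show())
      }
    }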

Re: Memory issue in pyspark for 1.6 mb file

2017-06-17 Thread Pralabh Kumar
Pralabh Kumar On Sun, Jun 18, 2017 at 12:06 AM, Naga Guduru <gudurun...@gmail.com> wrote: > Hi, > > I am trying to load 1.6 mb excel file which has 16 tabs. We converted > excel to csv and loaded 16 csv files to 8 tables. Job was running > successful in 1st run in pyspar

Re: featureSubsetStrategy parameter for GradientBoostedTreesModel

2017-06-15 Thread Pralabh Kumar
level. Jira SPARK-20199 <https://issues.apache.org/jira/browse/SPARK-20199> Please let me know if my understanding is correct. Regards Pralabh Kumar On Fri, Jun 16, 2017 at 7:53 AM, Pralabh Kumar <pralabhku...@gmail.com> wrote: > Hi everyone > > Currently GBT doesn

featureSubsetStrategy parameter for GradientBoostedTreesModel

2017-06-15 Thread Pralabh Kumar
level. Jira SPARK-20199 <https://issues.apache.org/jira/browse/SPARK-20199> Please let me know if my understanding is correct. Regards Pralabh Kumar
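
As a note for readers of the archive: SPARK-20199 later landed, so on Spark 2.4+ the strategy can be set directly on the estimator. A minimal sketch (the "sqrt" value is illustrative):

    import org.apache.spark.ml.classification.GBTClassifier

    val gbt = new GBTClassifier()
      .setLabelCol("label")
      .setFeaturesCol("features")
      .setFeatureSubsetStrategy("sqrt")   // available on GBTs from Spark 2.4 onward
      .setMaxIter(20)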