REGEX Spark - Dataframe

2021-06-26 Thread KhajaAsmath Mohammed
Hi,

What is the equivalent of the following using the DataFrame API in Spark? I
was able to make it work with Spark SQL, but I would like to use DataFrames
instead.

df11 = self.spark.sql("""
    SELECT transaction_card_bin,
           (CASE
                WHEN transaction_card_bin REGEXP '^5[1-5][\d]*' THEN "MC"
                WHEN transaction_card_bin REGEXP '^4[\d]*' THEN "VISA"
                WHEN transaction_card_bin REGEXP '^3[47][\d]*' THEN "AMEX"
                WHEN transaction_card_bin REGEXP
                    '^(6011|622(12[6-9]|1[3-9][0-9]|[2-8][0-9][0-9]|9[0-1][0-9]|92[0-5])|64[4-9]|65)[\d]*'
                    THEN "DISC"
                ELSE "OTHER"
            END) AS cardtype
    FROM test12
""")


Thanks,

Asmath
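
One possible DataFrame-API equivalent of the CASE expression above chains
when(...).otherwise(...) with Column.rlike. This is a minimal sketch, not a
tested answer from the thread: the helper names card_type_column and classify
and the CARD_PATTERNS list are made up for illustration, and the patterns are
copied unchanged from the SQL. classify is only a plain-Python mirror for
checking the patterns locally; it assumes Python's re.search behaves like
Spark's rlike (a partial match) for these particular patterns.

```python
import re

# Ordered (pattern, label) pairs copied from the CASE expression.
# The first matching pattern wins, as in SQL CASE WHEN.
CARD_PATTERNS = [
    (r"^5[1-5][\d]*", "MC"),
    (r"^4[\d]*", "VISA"),
    (r"^3[47][\d]*", "AMEX"),
    (
        r"^(6011|622(12[6-9]|1[3-9][0-9]|[2-8][0-9][0-9]|9[0-1][0-9]|92[0-5])"
        r"|64[4-9]|65)[\d]*",
        "DISC",
    ),
]


def card_type_column(col_name="transaction_card_bin"):
    """Build the CASE WHEN ... END expression as a PySpark Column (hypothetical helper)."""
    # Imported lazily so classify() below also works without PySpark installed.
    from pyspark.sql import functions as F

    col = F.col(col_name)
    first_pattern, first_label = CARD_PATTERNS[0]
    expr = F.when(col.rlike(first_pattern), first_label)
    for pattern, label in CARD_PATTERNS[1:]:
        expr = expr.when(col.rlike(pattern), label)
    # Rows that match no pattern (including NULLs) fall through to "OTHER".
    return expr.otherwise("OTHER").alias("cardtype")


def classify(card_bin):
    """Plain-Python mirror of the same rules, for checking patterns locally."""
    if card_bin is None:
        return "OTHER"
    for pattern, label in CARD_PATTERNS:
        if re.search(pattern, card_bin):
            return label
    return "OTHER"
```

Assuming the table test12 from the query, usage might look like:
df11 = self.spark.table("test12").select("transaction_card_bin", card_type_column())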


Fwd: Fail to run benchmark in Github Action

2021-06-26 Thread Kevin Su
---------- Forwarded message ---------
From: Kevin Su
Date: Fri, Jun 25, 2021, 8:23 PM
Subject: Fail to run benchmark in Github Action
To: 


Hi all,

I tried to run a benchmark in GitHub Actions on my fork and hit the error
below.
https://github.com/pingsutw/spark/runs/2867617238?check_suite_focus=true
java.lang.AssertionError: assertion failed: spark.test.home is not set!
    at scala.Predef$.assert(Predef.scala:223)
    at org.apache.spark.deploy.worker.Worker.<init>(Worker.scala:148)
    at org.apache.spark.deploy.worker.Worker$.startRpcEnvAndEndpoint(Worker.scala:954)
    at org.apache.spark.deploy.LocalSparkCluster.$anonfun$start$2(LocalSparkCluster.scala:68)
    at org.apache.spark.deploy.LocalSparkCluster.$anonfun$start$2$adapted(LocalSparkCluster.scala:65)
    at scala.collection.immutable.Range.foreach(Range.scala:158)

After I added
  --driver-java-options "-Dspark.test.home=$GITHUB_WORKSPACE" \
to benchmark.yml, the run still failed with the error below:
https://github.com/pingsutw/spark/runs/2911027350?check_suite_focus=true

Do I need to set something up in my fork?
after 1900, vec on, rebase EXCEPTION     7474   7511    58   13.4    74.7   2.7X
after 1900, vec on, rebase LEGACY        9228   9296    60   10.8    92.3   2.2X
after 1900, vec on, rebase CORRECTED     7553   7678   128   13.2    75.5   2.7X
before 1900, vec off, rebase LEGACY     23280  23362    71    4.3   232.8   0.9X
before 1900, vec off, rebase CORRECTED  20548  20630   119    4.9   205.5   1.0X
before 1900, vec on, rebase LEGACY      12210  12239    37    8.2   122.1   1.7X
before 1900, vec on, rebase CORRECTED    7486   7489     2   13.4    74.9   2.7X

Running benchmark: Save TIMESTAMP_MICROS to parquet
  Running case: after 1900, noop
  Stopped after 1 iterations, 4003 ms
  Running case: before 1900, noop
  Stopped after 1 iterations, 3965 ms
  Running case: after 1900, rebase EXCEPTION
  Stopped after 1 iterations, 18339 ms
  Running case: after 1900, rebase LEGACY
  Stopped after 1 iterations, 18375 ms
  Running case: after 1900, rebase CORRECTED
  Stopped after 1 iterations, 18716 ms
  Running case: before 1900, rebase LEGACY
Error: The operation was canceled.