Re: Spark MySQL Invalid DateTime value killing job

2019-06-05 Thread Anthony May
Murphy's Law struck right after I asked the question: I just discovered the
solution.
The JDBC URL should set the zeroDateTimeBehavior option.
https://dev.mysql.com/doc/connector-j/5.1/en/connector-j-reference-configuration-properties.html
https://stackoverflow.com/questions/11133759/-00-00-00-can-not-be-represented-as-java-sql-timestamp-error
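
For anyone hitting the same error, here is a minimal PySpark sketch of the fix;
the host, database, table and credentials below are placeholders, and note that
Connector/J 8.x spells the value CONVERT_TO_NULL instead of convertToNull.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # zeroDateTimeBehavior=convertToNull makes Connector/J return NULL for
    # '0000-00-00 00:00:00' instead of throwing the SQLException above.
    url = "jdbc:mysql://host:3306/mydb?zeroDateTimeBehavior=convertToNull"

    df = (spark.read.format("jdbc")
          .option("url", url)
          .option("dbtable", "my_table")
          .option("user", "user")
          .option("password", "password")
          .load())

    df.write.json("/tmp/my_table_json")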

On Wed, Jun 5, 2019 at 6:29 PM Anthony May  wrote:

> Hi,
>
> We have a legacy process of scraping a MySQL Database. The Spark job uses
> the DataFrame API and MySQL JDBC driver to read the tables and save them as
> JSON files. One table has DateTime columns that contain values invalid for
> java.sql.Timestamp so it's throwing the exception:
> java.sql.SQLException: Value '0000-00-00 00:00:00' can not be represented
> as java.sql.Timestamp
>
> Unfortunately, I can't edit the values in the table to make them valid.
> There doesn't seem to be a way to specify row level exception handling in
> the DataFrame API. Is there a way to handle this that would scale for
> hundreds of tables?
>
> Any help is appreciated.
>
> Anthony
>


Spark MySQL Invalid DateTime value killing job

2019-06-05 Thread Anthony May
Hi,

We have a legacy process of scraping a MySQL Database. The Spark job uses
the DataFrame API and MySQL JDBC driver to read the tables and save them as
JSON files. One table has DateTime columns that contain values invalid for
java.sql.Timestamp so it's throwing the exception:
java.sql.SQLException: Value '0000-00-00 00:00:00' can not be represented
as java.sql.Timestamp

Unfortunately, I can't edit the values in the table to make them valid.
There doesn't seem to be a way to specify row level exception handling in
the DataFrame API. Is there a way to handle this that would scale for
hundreds of tables?

Any help is appreciated.

Anthony


Blog post: DataFrame.transform -- Spark function composition

2019-06-05 Thread Daniel Mateus Pires
Hi everyone!

I just published this blog post on how Spark Scala custom transformations
can be rearranged so that they compose better and can be used with .transform:


https://medium.com/@dmateusp/dataframe-transform-spark-function-composition-eb8ec296c108
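
For readers who haven't seen the pattern, a minimal sketch of the same
composition idea in PySpark (the post itself uses Scala, DataFrame.transform
only arrived in PySpark with Spark 3.0, and the helper names here are purely
illustrative):

    from pyspark.sql import SparkSession, DataFrame, functions as F

    spark = SparkSession.builder.getOrCreate()

    def with_doubled(col_name):
        # Return a DataFrame -> DataFrame function instead of applying the
        # change directly, so transformations can be chained with .transform.
        def _(df: DataFrame) -> DataFrame:
            return df.withColumn(col_name + "_doubled", F.col(col_name) * 2)
        return _

    def with_label(label):
        def _(df: DataFrame) -> DataFrame:
            return df.withColumn("label", F.lit(label))
        return _

    result = (spark.range(5)
              .withColumnRenamed("id", "value")
              .transform(with_doubled("value"))
              .transform(with_label("example")))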

I've found the discussions in this group to be largely around issues and
performance; this post only concerns itself with code readability, so
hopefully it's not off-topic!

I welcome any feedback I can get :)

Daniel


Spark on K8S - --packages not working for cluster mode?

2019-06-05 Thread pacuna
I'm trying to run sample code that reads a file from S3, so I need the AWS
SDK and hadoop-aws dependencies.
If I assemble these deps into the main jar, everything works fine. But when I
try using --packages, the deps are not seen by the pods.

This is my submit command:

spark-submit \
  --master k8s://https://xx.xx.xx.xx \
  --class "SimpleApp" \
  --deploy-mode cluster \
  --conf spark.kubernetes.container.image=docker.io/pacuna/spark:0.2 \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark-test-user \
  --packages com.amazonaws:aws-java-sdk:1.7.4,org.apache.hadoop:hadoop-aws:2.7.3 \
  --conf spark.hadoop.fs.s3a.access.key=... \
  --conf spark.hadoop.fs.s3a.secret.key=... \
  https://x/simple-project_2.11-1.0.jar

And the error I'm getting in the driver pod is:

19/06/05 20:13:50 ERROR SparkContext: Failed to add
file:///home/dev/.ivy2/jars/com.fasterxml.jackson.core_jackson-core-2.2.3.jar
to Spark environment
java.io.FileNotFoundException: Jar
/home/dev/.ivy2/jars/com.fasterxml.jackson.core_jackson-core-2.2.3.jar not
found

I'm getting that error for all the deps jars needed.

Any ideas?

Thanks.



Re: Spark structured streaming leftOuter join not working as I expect

2019-06-05 Thread Jungtaek Lim
Nice to hear you're investigating the issue deeply.

Btw, if attaching code is not easy, maybe you could share the logical/physical
plan of any batch: "Details" in the SQL tab shows the plan as a string.
Plans from sequential batches would be very helpful, and the streaming query
status in these batches (especially the watermark) would be helpful too.
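
If it helps, that information can also be pulled straight from the
StreamingQuery handle in PySpark; here "query" stands for whatever
writeStream.start() returned in your app:

    # query is the handle returned by writeStream.start()
    query.explain(extended=True)      # prints the logical and physical plans
    progress = query.lastProgress     # dict for the latest micro-batch (None before the first one)
    print(progress["eventTime"])      # includes the current watermark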


On Wed, Jun 5, 2019 at 11:57 PM Joe Ammann  wrote:

> Hi Jungtaek
>
> Thanks for your response!
>
> I actually have set watermarks on all the streams A/B/C with the
> respective event time
> column A/B/C_LAST_MOD. So I think this should not be the reason.
>
> Of course, the event time on the C stream (the "optional one") progresses
> much slower
> than on the other 2. I try to adjust for this by setting
>
>spark.sql.streaming.multipleWatermarkPolicy=max
>
> and judging from the microbatch results, this also works. The global
> watermark seems
> to progress as expected with the event time from A/B stream.
>
> I will try to put together an isolated test case to reproduce the issue;
> that whole code is embedded in a larger app and hence not easy to rip out.
>
> I did some more testing, and for now these are my observations
>  - inner join followed by aggregation works as expected
>  - inner join with 1 left outer (and no aggregation) works as expected
>  - inner join with 2 left outer only produces results where both outer
> have a match
>  - inner join with 1 left outer followed by aggregation only produces the
> messages with a match
>
> Of course, all are stream-stream joins
>
> CU, Joe
>
> On Wednesday, June 5, 2019 09:17 CEST, Jungtaek Lim 
> wrote:
> > I would suspect that rows are never evicted from state in the second join. To
> > determine whether a row is NOT matched on the other side, Spark has to check
> > whether the row was ever matched before it is evicted. You need to set a
> > watermark on either B_LAST_MOD or C_LAST_MOD.
> >
> > If you already did but it's not shown here, please paste all the code
> > (redacted as needed) to a gist or attach a zipped project.
> >
> > Btw, there's a known "correctness" issue with stream-stream left/right
> > outer joins. Please refer to SPARK-26154 [1] for details. It's not the same
> > case, but it's good to know about once you're dealing with stream-stream
> > joins.
> >
> > Thanks,
> > Jungtaek Lim (HeartSaVioR)
> >
> > 1. https://issues.apache.org/jira/browse/SPARK-26154
> >
> > On Tue, Jun 4, 2019 at 9:31 PM Joe Ammann  wrote:
> >
> > > Hi all
> > >
> > > sorry, tl;dr
> > >
> > > I'm on my first Python Spark structured streaming app, in the end
> joining
> > > messages from ~10 different Kafka topics. I've recently upgraded to
> Spark
> > > 2.4.3, which has resolved all my issues with the time handling
> (watermarks,
> > > join windows) I had before with Spark 2.3.2.
> > >
> > > My current problem happens during a leftOuter join, where messages
> from 3
> > > topics are joined, the results are then aggregated with a groupBy and
> > > finally put onto a result Kafka topic. On the 3 input topics involved,
> all
> > > messages have ID and LAST_MOD fields. I use the ID for joining, and the
> > > LAST_MOD as event timestamp on all incoming streams. Since the fields
> on
> > > the incoming messages are all named the same (ID and LAST_MOD), I
> rename
> > > them on all incoming streams with
> > >
> > >  aDf = aStream.selectExpr("*", "ID as A_ID", "LAST_MOD as
> > > A_LAST_MOD").drop(*["ID", "LAST_MOD"])
> > >
> > > For those data frames, I then take the watermark with the
> A/B/C_LAST_MOD
> > > as event time, before joining. I know that the LAST_MOD timestamps are
> > > equal on the messages that I want to join together.
> > >
> > > The first join is an inner join, where a field on stream A links with
> the
> > > ID of stream B. So I have
> > >
> > >  aDf
> > > .join(bDf, expr("B_FK = B_ID"))   # B_FK is the field in
> stream A
> > > .groupBy("SOME_FIELD", window("A_LAST_MOD", "10 seconds"))
> > > .agg(
> > > collect_list(struct("*")).alias("RESULTS"),
> > > count("A_ID").alias("NUM_RESULTS"),
> > > # just add a timestamp to watermark on, they are all the
> > > min("A_LAST_MOD").alias("RESULT_LAST_MOD")
> > > )
> > > .withWatermark("RESULT_LAST_MOD", "30 seconds")
> > > )
> > >
> > > This works perfectly and generates (on my current data set) some 10'000
> > > records. This is the expected result.
> > >
> > > When I add the leftOuter join of the third topic as follows
> > >
> > >  aDf
> > > .join(bDf, expr("B_FK = B_ID"))   # B_FK is the field in
> stream A
> > > # here the additional left join
> > > .join(cDF, expr("C_FK = C_ID and B_LAST_MOD = C_LAST_MOD"),
> > > "leftOuter") # C_FK is the field in stream B
> > > .groupBy("SOME_FIELD", window("A_LAST_MOD", "10 seconds"))
> > > .agg(
> > > collect_list(struct("*")).alias("RESULTS"),
> > > 

Re: Spark structured streaming leftOuter join not working as I expect

2019-06-05 Thread Joe Ammann
Hi Jungtaek

Thanks for your response!

I actually have set watermarks on all the streams A/B/C with the respective 
event time
column A/B/C_LAST_MOD. So I think this should not be the reason.

Of course, the event time on the C stream (the "optional one") progresses much 
slower
than on the other 2. I try to adjust for this by setting 

   spark.sql.streaming.multipleWatermarkPolicy=max

and judging from the microbatch results, this also works. The global watermark 
seems
to progress as expected with the event time from A/B stream.

I will try to put together an isolated test case to reproduce the issue; that
whole code is embedded in a larger app and hence not easy to rip out.

I did some more testing, and for now these are my observations
 - inner join followed by aggregation works as expected
 - inner join with 1 left outer (and no aggregation) works as expected
 - inner join with 2 left outer only produces results where both outer have a 
match
 - inner join with 1 left outer followed by aggregation only produces the 
messages with a match 

Of course, all are stream-stream joins

CU, Joe
 
On Wednesday, June 5, 2019 09:17 CEST, Jungtaek Lim  wrote: 
> I would suspect that rows are never evicted from state in the second join. To
> determine whether a row is NOT matched on the other side, Spark has to check
> whether the row was ever matched before it is evicted. You need to set a
> watermark on either B_LAST_MOD or C_LAST_MOD.
>
> If you already did but it's not shown here, please paste all the code
> (redacted as needed) to a gist or attach a zipped project.
>
> Btw, there's a known "correctness" issue with stream-stream left/right
> outer joins. Please refer to SPARK-26154 [1] for details. It's not the same
> case, but it's good to know about once you're dealing with stream-stream
> joins.
> 
> Thanks,
> Jungtaek Lim (HeartSaVioR)
> 
> 1. https://issues.apache.org/jira/browse/SPARK-26154
> 
> On Tue, Jun 4, 2019 at 9:31 PM Joe Ammann  wrote:
> 
> > Hi all
> >
> > sorry, tl;dr
> >
> > I'm on my first Python Spark structured streaming app, in the end joining
> > messages from ~10 different Kafka topics. I've recently upgraded to Spark
> > 2.4.3, which has resolved all my issues with the time handling (watermarks,
> > join windows) I had before with Spark 2.3.2.
> >
> > My current problem happens during a leftOuter join, where messages from 3
> > topics are joined, the results are then aggregated with a groupBy and
> > finally put onto a result Kafka topic. On the 3 input topics involved, all
> > messages have ID and LAST_MOD fields. I use the ID for joining, and the
> > LAST_MOD as event timestamp on all incoming streams. Since the fields on
> > the incoming messages are all named the same (ID and LAST_MOD), I rename
> > them on all incoming streams with
> >
> >  aDf = aStream.selectExpr("*", "ID as A_ID", "LAST_MOD as
> > A_LAST_MOD").drop(*["ID", "LAST_MOD"])
> >
> > For those data frames, I then take the watermark with the A/B/C_LAST_MOD
> > as event time, before joining. I know that the LAST_MOD timestamps are
> > equal on the messages that I want to join together.
> >
> > The first join is an inner join, where a field on stream A links with the
> > ID of stream B. So I have
> >
> >  aDf
> > .join(bDf, expr("B_FK = B_ID"))   # B_FK is the field in stream A
> > .groupBy("SOME_FIELD", window("A_LAST_MOD", "10 seconds"))
> > .agg(
> > collect_list(struct("*")).alias("RESULTS"),
> > count("A_ID").alias("NUM_RESULTS"),
> > # just add a timestamp to watermark on, they are all the
> > min("A_LAST_MOD").alias("RESULT_LAST_MOD")
> > )
> > .withWatermark("RESULT_LAST_MOD", "30 seconds")
> > )
> >
> > This works perfectly and generates (on my current data set) some 10'000
> > records. This is the expected result.
> >
> > When I add the leftOuter join of the third topic as follows
> >
> >  aDf
> > .join(bDf, expr("B_FK = B_ID"))   # B_FK is the field in stream A
> > # here the additional left join
> > .join(cDF, expr("C_FK = C_ID and B_LAST_MOD = C_LAST_MOD"),
> > "leftOuter") # C_FK is the field in stream B
> > .groupBy("SOME_FIELD", window("A_LAST_MOD", "10 seconds"))
> > .agg(
> > collect_list(struct("*")).alias("RESULTS"),
> > count("A_ID").alias("NUM_RESULTS"),
> > # just add a timestamp to watermark on, they are all the
> > min("A_LAST_MOD").alias("RESULT_LAST_MOD")
> > )
> > .withWatermark("RESULT_LAST_MOD", "30 seconds")
> > )
> >
> > then what I would expect is that I get the same number of output records
> > (~10'000), and some of them have the additional fields from the C stream.
> >
> > But what happens is that my output is reduced to ~1'500 records, exactly
> > those which have a successful join on records on topic C. The other are not
> > shown on the output.
> >
> > I 

[Pyspark 2.4] Best way to define activity within different time window

2019-06-05 Thread Rishi Shah
Hi All,

Is there a best practice around calculating daily, weekly, monthly,
quarterly, yearly active users?

One approach is to create a window of daily bitmaps and aggregate it by
period later. However, I was wondering if anyone has a better approach to
tackling this problem.
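
In case it is useful as a starting point, here is a minimal PySpark sketch of
that daily-rollup approach; the events table and the user_id/event_ts column
names are assumptions, not from the original question:

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.getOrCreate()

    # Hypothetical events table with columns (user_id, event_ts).
    events = spark.table("events")

    # Collapse raw events to one row per user per day (the daily "bitmap").
    daily = (events
             .withColumn("day", F.to_date("event_ts"))
             .select("user_id", "day")
             .distinct())

    # Daily active users.
    dau = daily.groupBy("day").agg(F.countDistinct("user_id").alias("dau"))

    # Monthly active users: re-aggregate the same daily table by period;
    # weekly/quarterly/yearly work the same way with a different truncation.
    mau = (daily
           .withColumn("month", F.date_trunc("month", "day"))
           .groupBy("month")
           .agg(F.countDistinct("user_id").alias("mau")))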

-- 
Regards,

Rishi Shah


Re: installation of spark

2019-06-05 Thread Alonso Isidoro Roman
When using macOS, it is recommended to install Java, Scala and Spark using
Homebrew.

Run these commands on a terminal:

brew update

brew install scala

brew install sbt

brew cask install java

brew install spark


There is no need to install HDFS; you can use your local file system
without a problem.


How to set JAVA_HOME on Mac OS X temporarily

   1. Open Terminal.
   2. Confirm you have a JDK by typing “which java”. ...
   3. Check you have the needed version of Java by typing “java -version”.
   4. Set JAVA_HOME using this command in Terminal: export JAVA_HOME=/Library/Java/Home
   5. echo $JAVA_HOME in Terminal to confirm the path.
   6. You should now be able to run your application.


How to set JAVA_HOME on Mac OS X permanently

$ vim .bash_profile

$ export JAVA_HOME=$(/usr/libexec/java_home)

$ source .bash_profile

$ echo $JAVA_HOME


Have fun!

Alonso


On Wed, Jun 5, 2019 at 6:10, Jack Kolokasis ()
wrote:

> Hello,
>
> at first you will need to make sure that Java is installed, and install
> it otherwise. Then install Scala and a build tool (sbt or Maven). In my
> view, IntelliJ IDEA is a good option for creating your Spark
> applications. At the end you have to install a distributed file system, e.g.
> HDFS.
>
> I think there is no all-in-one configuration, but there are
> examples of how to configure your Spark cluster (e.g.
> https://github.com/jaceklaskowski/mastering-apache-spark-book/blob/master/spark-standalone-example-2-workers-on-1-node-cluster.adoc
> ).
> Best,
> --Iacovos
> On 5/6/19 5:50 AM, ya wrote:
>
> Dear list,
>
> I am very new to Spark, and I am having trouble installing it on my Mac. I
> have the following questions; please give me some guidance. Thank you very much.
>
> 1. How many and which pieces of software should I install before installing
> Spark? I have been searching online, and people discuss their experiences on
> this topic with different opinions: some say there is no need to install
> Hadoop before installing Spark, while others say Hadoop has to be installed
> before Spark. Some other people say Scala has to be installed, whereas others
> say Scala is included in Spark and is installed automatically once Spark is
> installed. So I am confused about what to install to start.
>
> 2. Is there a simple way to configure these pieces of software? For instance,
> an all-in-one configuration file? It takes forever for me to configure things
> before I can really use it for data analysis.
>
> I hope my questions make sense. Thank you very much.
>
> Best regards,
>
> YA
>
>

-- 
Alonso Isidoro Roman
https://about.me/alonso.isidoro.roman



spark ./build/mvn test failed on aarch64

2019-06-05 Thread Tianhua huang
Hi all,
Recently I ran './build/mvn test' for Spark on aarch64, and both master and
branch-2.4 failed; the log pieces are below:

..

[INFO] T E S T S
[INFO] ---
[INFO] Running org.apache.spark.util.kvstore.LevelDBTypeInfoSuite
[INFO] Tests run: 10, Failures: 0, Errors: 0, Skipped: 0, Time elapsed:
0.081 s - in org.apache.spark.util.kvstore.LevelDBTypeInfoSuite
[INFO] Running org.apache.spark.util.kvstore.InMemoryStoreSuite
[INFO] Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed:
0.001 s - in org.apache.spark.util.kvstore.InMemoryStoreSuite
[INFO] Running org.apache.spark.util.kvstore.InMemoryIteratorSuite
[INFO] Tests run: 38, Failures: 0, Errors: 0, Skipped: 0, Time elapsed:
0.219 s - in org.apache.spark.util.kvstore.InMemoryIteratorSuite
[INFO] Running org.apache.spark.util.kvstore.LevelDBIteratorSuite
[ERROR] Tests run: 38, Failures: 0, Errors: 38, Skipped: 0, Time elapsed:
0.23 s <<< FAILURE! - in org.apache.spark.util.kvstore.LevelDBIteratorSuite
[ERROR] 
copyIndexDescendingWithStart(org.apache.spark.util.kvstore.LevelDBIteratorSuite)
Time elapsed: 0.2 s <<< ERROR!
java.lang.UnsatisfiedLinkError: Could not load library. Reasons: [no
leveldbjni64-1.8 in java.library.path, no leveldbjni-1.8 in
java.library.path, no leveldbjni in java.library.path,
/usr/local/src/spark/common/kvstore/target/tmp/libleveldbjni-64-1-610267671268036503.8:
/usr/local/src/spark/common/kvstore/target/tmp/libleveldbjni-64-1-610267671268036503.8:
cannot open shared object file: No such file or directory (Possible cause:
can't load AMD 64-bit .so on a AARCH64-bit platform)]
at
org.apache.spark.util.kvstore.LevelDBIteratorSuite.createStore(LevelDBIteratorSuite.java:44)

..

There is a dependency on leveldbjni-all, but there is no native library for
aarch64 in the leveldbjni-all 1.8 jar. I found that aarch64 support was added
by PR https://github.com/fusesource/leveldbjni/pull/82, but it was not
included in the 1.8 release, and unfortunately the repo has not been updated
for almost two years.

So I have a question: does Spark support aarch64? If yes, how can this
problem be fixed; if not, what is the plan for it? Thank you all!


Re: Spark structured streaming leftOuter join not working as I expect

2019-06-05 Thread Jungtaek Lim
I would suspect that rows are never evicted from state in the second join. To
determine whether a row is NOT matched on the other side, Spark has to check
whether the row was ever matched before it is evicted. You need to set a
watermark on either B_LAST_MOD or C_LAST_MOD.
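
For reference, a minimal PySpark sketch of the shape the Structured Streaming
guide describes for a stream-stream left outer join, reusing Joe's column
names (the 30-second and 5-second values are just examples): watermarks on
both inputs plus an event-time range condition in the join itself.

    from pyspark.sql.functions import expr

    # bDf and cDf are the two streaming DataFrames from Joe's snippet.
    bWm = bDf.withWatermark("B_LAST_MOD", "30 seconds")
    cWm = cDf.withWatermark("C_LAST_MOD", "30 seconds")

    joined = bWm.join(
        cWm,
        expr("""
            C_FK = C_ID AND
            C_LAST_MOD >= B_LAST_MOD - interval 5 seconds AND
            C_LAST_MOD <= B_LAST_MOD + interval 5 seconds
        """),
        "leftOuter")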

If you already did but it's not shown here, please paste all the code
(redacted as needed) to a gist or attach a zipped project.

Btw, there's a known "correctness" issue with stream-stream left/right
outer joins. Please refer to SPARK-26154 [1] for details. It's not the same
case, but it's good to know about once you're dealing with stream-stream
joins.

Thanks,
Jungtaek Lim (HeartSaVioR)

1. https://issues.apache.org/jira/browse/SPARK-26154

On Tue, Jun 4, 2019 at 9:31 PM Joe Ammann  wrote:

> Hi all
>
> sorry, tl;dr
>
> I'm on my first Python Spark structured streaming app, in the end joining
> messages from ~10 different Kafka topics. I've recently upgraded to Spark
> 2.4.3, which has resolved all my issues with the time handling (watermarks,
> join windows) I had before with Spark 2.3.2.
>
> My current problem happens during a leftOuter join, where messages from 3
> topics are joined, the results are then aggregated with a groupBy and
> finally put onto a result Kafka topic. On the 3 input topics involved, all
> messages have ID and LAST_MOD fields. I use the ID for joining, and the
> LAST_MOD as event timestamp on all incoming streams. Since the fields on
> the incoming messages are all named the same (ID and LAST_MOD), I rename
> them on all incoming streams with
>
>  aDf = aStream.selectExpr("*", "ID as A_ID", "LAST_MOD as
> A_LAST_MOD").drop(*["ID", "LAST_MOD"])
>
> For those data frames, I then take the watermark with the A/B/C_LAST_MOD
> as event time, before joining. I know that the LAST_MOD timestamps are
> equal on the messages that I want to join together.
>
> The first join is an inner join, where a field on stream A links with the
> ID of stream B. So I have
>
>  aDf
> .join(bDf, expr("B_FK = B_ID"))   # B_FK is the field in stream A
> .groupBy("SOME_FIELD", window("A_LAST_MOD", "10 seconds"))
> .agg(
> collect_list(struct("*")).alias("RESULTS"),
> count("A_ID").alias("NUM_RESULTS"),
> # just add a timestamp to watermark on, they are all the
> min("A_LAST_MOD").alias("RESULT_LAST_MOD")
> )
> .withWatermark("RESULT_LAST_MOD", "30 seconds")
> )
>
> This works perfectly and generates (on my current data set) some 10'000
> records. This is the expected result.
>
> When I add the leftOuter join of the third topic as follows
>
>  aDf
> .join(bDf, expr("B_FK = B_ID"))   # B_FK is the field in stream A
> # here the additional left join
> .join(cDF, expr("C_FK = C_ID and B_LAST_MOD = C_LAST_MOD"),
> "leftOuter") # C_FK is the field in stream B
> .groupBy("SOME_FIELD", window("A_LAST_MOD", "10 seconds"))
> .agg(
> collect_list(struct("*")).alias("RESULTS"),
> count("A_ID").alias("NUM_RESULTS"),
> # just add a timestamp to watermark on, they are all the
> min("A_LAST_MOD").alias("RESULT_LAST_MOD")
> )
> .withWatermark("RESULT_LAST_MOD", "30 seconds")
> )
>
> then what I would expect is that I get the same number of output records
> (~10'000), and some of them have the additional fields from the C stream.
>
> But what happens is that my output is reduced to ~1'500 records, exactly
> those which have a successful join on records on topic C. The others are not
> shown in the output.
>
> I already tried
>
>* make sure that the optional FK on topic B is never null, by using an
> NVL2(C_FK, C_FK, '')
>* widen the time window join on the leftOuter to "B_LAST_MOD <
> C_LAST_LAST_MOD - interval 5 seconds ..."
>* use various combinations of joinWindows and watermarkLateThreshold
>
> The result is always the same: I'm "losing" the ~8'500 records for which
> the optional join FK is NULL on topic B.
>
> Did I totally misunderstand the concept of stream-stream left outer join?
> Or what could be wrong?
>
> --
> CU, Joe
>
>
>

-- 
Name : Jungtaek Lim
Blog : http://medium.com/@heartsavior
Twitter : http://twitter.com/heartsavior
LinkedIn : http://www.linkedin.com/in/heartsavior


spark ./build/mvn test failed on aarch64

2019-06-05 Thread Tianhua huang
Hi all,
Recently I ran './build/mvn test' for Spark on aarch64, and both master and
branch-2.4 failed; the log pieces are below:

..

[INFO] T E S T S
[INFO] ---
[INFO] Running org.apache.spark.util.kvstore.LevelDBTypeInfoSuite
[INFO] Tests run: 10, Failures: 0, Errors: 0, Skipped: 0, Time elapsed:
0.081 s - in org.apache.spark.util.kvstore.LevelDBTypeInfoSuite
[INFO] Running org.apache.spark.util.kvstore.InMemoryStoreSuite
[INFO] Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed:
0.001 s - in org.apache.spark.util.kvstore.InMemoryStoreSuite
[INFO] Running org.apache.spark.util.kvstore.InMemoryIteratorSuite
[INFO] Tests run: 38, Failures: 0, Errors: 0, Skipped: 0, Time elapsed:
0.219 s - in org.apache.spark.util.kvstore.InMemoryIteratorSuite
[INFO] Running org.apache.spark.util.kvstore.LevelDBIteratorSuite
[ERROR] Tests run: 38, Failures: 0, Errors: 38, Skipped: 0, Time elapsed:
0.23 s <<< FAILURE! - in org.apache.spark.util.kvstore.LevelDBIteratorSuite
[ERROR] 
copyIndexDescendingWithStart(org.apache.spark.util.kvstore.LevelDBIteratorSuite)
Time elapsed: 0.2 s <<< ERROR!
java.lang.UnsatisfiedLinkError: Could not load library. Reasons: [no
leveldbjni64-1.8 in java.library.path, no leveldbjni-1.8 in
java.library.path, no leveldbjni in java.library.path,
/usr/local/src/spark/common/kvstore/target/tmp/libleveldbjni-64-1-610267671268036503.8:
/usr/local/src/spark/common/kvstore/target/tmp/libleveldbjni-64-1-610267671268036503.8:
cannot open shared object file: No such file or directory (Possible cause:
can't load AMD 64-bit .so on a AARCH64-bit platform)]
at
org.apache.spark.util.kvstore.LevelDBIteratorSuite.createStore(LevelDBIteratorSuite.java:44)

..

There is a dependency on leveldbjni:

    <dependency>
      <groupId>org.fusesource.leveldbjni</groupId>
      <artifactId>leveldbjni-all</artifactId>
      <version>1.8</version>
    </dependency>