Cannot start thrift-server on Spark 2.4

2019-02-04 Thread Moein Hosseini
I'd like to start the Spark Thrift Server on a cluster of 3 machines with HDFS and
standalone HA Spark (v2.4).
I started it with the following command as user spark24, but I get a runtime
exception about HDFS permissions.
Command:
./start-thriftserver.sh --master spark://master:7077

Exception:
Caused by: java.lang.RuntimeException: java.lang.RuntimeException: The root
scratch dir: /tmp/hive on HDFS should be writable. Current permissions
are: rwxr-xr-x

But I think the HDFS permissions are fine, because I chmodded everything under
/tmp/hive to 777 and granted rwx access to the spark24 and hive users and to all
groups.

$ hdfs dfs -ls -d /tmp/hive
drwxrwxrwx+  - hive hdfs  0 2019-02-03 14:13 /tmp/hive

$ hdfs dfs -getfacl /tmp/hive
# file: /tmp/hive
# owner: hive
# group: hdfs
user::rwx
user:hive:rwx
user:spark24:rwx
group::rwx
group:hive:rwx
group:spark24:rwx
mask::rwx
other::rwx

What is wrong in my case?
-- 

Moein Hosseini
Data Engineer
mobile: +98 912 468 1859
site: www.moein.xyz
email: moein...@gmail.com
linkedin: https://www.linkedin.com/in/moeinhm
twitter: https://twitter.com/moein7tl


Re: How to sleep Spark job

2019-01-22 Thread Moein Hosseini
In this manner, your application creates a distinct job each time. On the first
pass the driver builds the DAG and executes it with the help of the executors,
then the job finishes and the driver/application goes to sleep. When it wakes
up, it creates a new job and DAG, and so on.
It is roughly the same as creating a cron job that submits your single application
to the cluster each time.
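
For illustration, here is a minimal sketch of such a driver loop. It assumes the
DataStax spark-cassandra-connector is on the class path, and the keyspace, table,
and output path are made-up placeholders, not anything from this thread:

import org.apache.spark.sql.SparkSession

object HourlyCassandraToHdfs {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hourly-cassandra-to-hdfs")
      .getOrCreate()

    while (true) {
      // one batch job per iteration: read from Cassandra, write to HDFS
      val df = spark.read
        .format("org.apache.spark.sql.cassandra")
        .options(Map("keyspace" -> "my_keyspace", "table" -> "my_table"))
        .load()

      df.write.mode("append").parquet("hdfs:///data/my_table")

      // the driver (and the whole application) sleeps between runs
      Thread.sleep(60 * 60 * 1000L) // one hour
    }
  }
}

Note that the executors stay allocated on the cluster while the driver sleeps,
which is one reason an external scheduler is often the preferred approach.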

On Wed, Jan 23, 2019 at 10:04 AM Kevin Mellott wrote:

> I’d recommend using a scheduler of some kind to trigger your job each
> hour, and have the Spark job exit when it completes. Spark is not meant to
> run in any type of “sleep mode”, unless you want to run a structured
> streaming job and create a separate process to pull data from Cassandra and
> publish it to your streaming endpoint. That decision really depends more on
> your use case.
>
> On Tue, Jan 22, 2019 at 11:56 PM Soheil Pourbafrani wrote:
>
>> Hi,
>>
>> I want to submit a job to a YARN cluster that reads data from Cassandra and
>> writes it to HDFS every hour, for example.
>>
>> Is it possible to make the Spark application sleep in a while-true loop and
>> wake up every hour to process data?
>>
>

-- 

Moein Hosseini
Data Engineer
mobile: +98 912 468 1859
site: www.moein.xyz
email: moein...@gmail.com
linkedin: https://www.linkedin.com/in/moeinhm
twitter: https://twitter.com/moein7tl


Re: How to sleep Spark job

2019-01-22 Thread Moein Hosseini
Hi Soheil,

Yes, it's possible to force your application to sleep after each job:
do {
   // your Spark job goes here
   Thread.sleep(60 * 60 * 1000);  // e.g. sleep one hour between runs
} while (true);

But Airflow may be a better option if you need a scheduler for your Spark job.


On Wed, Jan 23, 2019 at 9:26 AM Soheil Pourbafrani wrote:

> Hi,
>
> I want to submit a job to a YARN cluster that reads data from Cassandra and
> writes it to HDFS every hour, for example.
>
> Is it possible to make the Spark application sleep in a while-true loop and
> wake up every hour to process data?
>


-- 

Moein Hosseini
Data Engineer
mobile: +98 912 468 1859
site: www.moein.xyz
email: moein...@gmail.com
linkedin: https://www.linkedin.com/in/moeinhm
twitter: https://twitter.com/moein7tl


userClassPathFirst fails

2019-01-21 Thread Moein Hosseini
Hi everyone,

I have a standalone Spark 2.4.0 cluster (the without-hadoop build) with both
spark.executor.userClassPathFirst and spark.driver.userClassPathFirst set to
true.
The cluster runs on HDP (v3.1.0) with SPARK_DIST_CLASSPATH set to $(hadoop
classpath).
My application fails to run because of slf4j, which I pass to the driver and
executors myself.


How I submit my job:
./spark-submit \
--packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.4.0 \
--master $SPARK_MASTER \
--class $MAIN_CLASS \
--driver-class-path $FAT_JAR \
$FAT_JAR

Exception:
Exception in thread "main" java.lang.LinkageError: loader constraint violation:
when resolving method
"org.slf4j.impl.StaticLoggerBinder.getLoggerFactory()Lorg/slf4j/ILoggerFactory;"
the class loader (instance of org/apache/spark/util/ChildFirstURLClassLoader) of
the current class, org/slf4j/LoggerFactory, and the class loader (instance of
sun/misc/Launcher$AppClassLoader) for the method's defining class,
org/slf4j/impl/StaticLoggerBinder, have different Class objects for the type
org/slf4j/ILoggerFactory used in the signature
        at org.slf4j.LoggerFactory.getILoggerFactory(LoggerFactory.java:418)
        at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:357)
        at com.tap30.combine.Launcher$.(Launcher.scala:17)
        at com.tap30.combine.Launcher$.(Launcher.scala)
        at com.tap30.combine.Launcher.main(Launcher.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
        at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:849)
        at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:167)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:195)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
        at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:924)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:933)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

I've used jinfo to extract the loaded slf4j jars on both the executor and the
driver, which are:

/opt/spark-2.4.0-bin-without-hadoop/jars/jcl-over-slf4j-1.7.16.jar
/opt/spark-2.4.0-bin-without-hadoop/jars/jul-to-slf4j-1.7.16.jar
/usr/hdp/3.1.0.0-78/hadoop/lib/slf4j-log4j12-1.7.25.jar
/usr/hdp/3.1.0.0-78/hadoop/lib/slf4j-api-1.7.25.jar
/usr/hdp/3.1.0.0-78/hadoop/lib/jul-to-slf4j-1.7.25.jar
/usr/hdp/3.1.0.0-78/tez/lib/slf4j-api-1.7.10.jar

But spark-sql-kafka depends on kafka-clients v2.0.0, which uses slf4j v1.7.25,
and that makes things go wrong. How can I get around this issue?
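
For what it's worth, one common way to avoid this kind of LinkageError is to keep
slf4j out of the fat jar entirely, so only the copy already on the Spark/Hadoop
class path is loaded. This is only a sketch, assuming the application is built
with sbt-assembly; it has not been verified against this exact setup:

// build.sbt (sketch): resolve slf4j at compile time but do not bundle it
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-sql" % "2.4.0" % "provided",
  "org.slf4j" % "slf4j-api" % "1.7.25" % "provided"
)

// or drop slf4j from transitive dependencies of the fat jar altogether
excludeDependencies += ExclusionRule("org.slf4j", "slf4j-api")

With only one copy of slf4j visible, the ChildFirstURLClassLoader and the app
class loader no longer resolve org.slf4j.ILoggerFactory to different Class
objects; whether this fits depends on how the fat jar is built.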
-- 

Moein Hosseini
Data Engineer
mobile: +98 912 468 1859
site: www.moein.xyz
email: moein...@gmail.com
linkedin: https://www.linkedin.com/in/moeinhm
twitter: https://twitter.com/moein7tl