Adding OpenSearch as a secondary index provider to SparkSQL

2023-03-24 Thread Anirudha Jadhav
Hello community, I would like your opinion on this implementation demo.

The demo covers support for materialized views, skipping indices, and covering
indices with bloom filter optimizations, using OpenSearch via SparkSQL:

https://github.com/opensearch-project/sql/discussions/1465
(see the video with voice-over)

Ani
-- 
Anirudha P. Jadhav


Re: spark worker on mesos slave | possible networking config issue

2015-03-25 Thread Anirudha Jadhav
Is there a way to have this dynamically pick the local IP?

Static assignment does not work because the workers are dynamically allocated
on Mesos.
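
One possible approach, sketched under assumptions rather than confirmed in this
thread: compute the address inside spark-env.sh each time the file is sourced,
so that every node resolves its own IP. The helper name first_non_loopback_ip
is hypothetical, and the sketch assumes Linux hosts where `hostname -I` is
available and lists the host's configured addresses.

```shell
# Hypothetical helper: print the first argument that is neither an IPv4
# loopback address (127.x.x.x) nor an IPv6 address; return 1 if none qualifies.
first_non_loopback_ip() {
  for addr in "$@"; do
    case "$addr" in
      127.*|*:*) continue ;;                  # skip loopback and IPv6 entries
      *) printf '%s\n' "$addr"; return 0 ;;
    esac
  done
  return 1
}

# Assumption: `hostname -I` (Linux) prints all addresses of this host.
# The command substitution is left unquoted on purpose so that each address
# becomes a separate argument.
if detected_ip="$(first_non_loopback_ip $(hostname -I 2>/dev/null))"; then
  export SPARK_LOCAL_IP="$detected_ip"
fi
# If nothing suitable is found, SPARK_LOCAL_IP is left untouched.
```

On hosts with several interfaces the first address may not be the one the
cluster should use, so the selection rule would need adjusting there.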

On Wed, Mar 25, 2015 at 3:04 AM, Akhil Das 
wrote:

> It says:
> Tried to associate with unreachable remote address
> [akka.tcp://sparkDriver@localhost:51849].
> Address is now gated for 5000 ms, all messages to this address will be
> delivered to dead letters. Reason: Connection refused:
> localhost/127.0.0.1:51849
>
> I'd suggest changing this property:
> export SPARK_LOCAL_IP=127.0.0.1
>
> Point it to your network address, for example 192.168.1.10
>
> Thanks
> Best Regards
>
> On Tue, Mar 24, 2015 at 11:18 PM, Anirudha Jadhav 
> wrote:
>
>> is there some setting i am missing:
>> this is my spark-env.sh>>>
>>
>> export MESOS_NATIVE_LIBRARY=/usr/local/lib/libmesos.so
>> export SPARK_EXECUTOR_URI=http://100.125.5.93/sparkx.tgz
>> export SPARK_LOCAL_IP=127.0.0.1

spark worker on mesos slave | possible networking config issue

2015-03-24 Thread Anirudha Jadhav
Is there some setting I am missing?
This is my spark-env.sh:

export MESOS_NATIVE_LIBRARY=/usr/local/lib/libmesos.so
export SPARK_EXECUTOR_URI=http://100.125.5.93/sparkx.tgz
export SPARK_LOCAL_IP=127.0.0.1



Here is what I see on the slave node:

less 20150226-160708-78932-5050-8971-S0/frameworks/20150323-205508-78932-5050-29804-0012/executors/20150226-160708-78932-5050-8971-S0/runs/cceea834-c4d9-49d6-a579-8352f1889b56/stderr

WARNING: Logging before InitGoogleLogging() is written to STDERR
I0324 02:30:29.389225 27755 fetcher.cpp:76] Fetching URI 'http://100.125.5.93/sparkx.tgz'
I0324 02:30:29.389361 27755 fetcher.cpp:126] Downloading 'http://100.125.5.93/sparkx.tgz' to '/tmp/mesos/slaves/20150226-160708-78932-5050-8971-S0/frameworks/20150323-205508-78932-5050-29804-0012/executors/20150226-160708-78932-5050-8971-S0/runs/cceea834-c4d9-49d6-a579-8352f1889b56/sparkx.tgz'
I0324 02:30:35.353446 27755 fetcher.cpp:64] Extracted resource '/tmp/mesos/slaves/20150226-160708-78932-5050-8971-S0/frameworks/20150323-205508-78932-5050-29804-0012/executors/20150226-160708-78932-5050-8971-S0/runs/cceea834-c4d9-49d6-a579-8352f1889b56/sparkx.tgz' into '/tmp/mesos/slaves/20150226-160708-78932-5050-8971-S0/frameworks/20150323-205508-78932-5050-29804-0012/executors/20150226-160708-78932-5050-8971-S0/runs/cceea834-c4d9-49d6-a579-8352f1889b56'
Spark assembly has been built with Hive, including Datanucleus jars on classpath
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
15/03/24 02:30:37 INFO MesosExecutorBackend: Registered signal handlers for [TERM, HUP, INT]
I0324 02:30:37.071077 27863 exec.cpp:132] Version: 0.21.1
I0324 02:30:37.080971 27885 exec.cpp:206] Executor registered on slave 20150226-160708-78932-5050-8971-S0
15/03/24 02:30:37 INFO MesosExecutorBackend: Registered with Mesos as executor ID 20150226-160708-78932-5050-8971-S0 with 1 cpus
15/03/24 02:30:37 INFO SecurityManager: Changing view acls to: ubuntu
15/03/24 02:30:37 INFO SecurityManager: Changing modify acls to: ubuntu
15/03/24 02:30:37 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(ubuntu); users with modify permissions: Set(ubuntu)
15/03/24 02:30:37 INFO Slf4jLogger: Slf4jLogger started
15/03/24 02:30:37 INFO Remoting: Starting remoting
15/03/24 02:30:38 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkExecutor@mesos-si2:50542]
15/03/24 02:30:38 INFO Utils: Successfully started service 'sparkExecutor' on port 50542.
15/03/24 02:30:38 INFO AkkaUtils: Connecting to MapOutputTracker: akka.tcp://sparkDriver@localhost:51849/user/MapOutputTracker
15/03/24 02:30:38 WARN Remoting: Tried to associate with unreachable remote address [akka.tcp://sparkDriver@localhost:51849]. Address is now gated for 5000 ms, all messages to this address will be delivered to dead letters. Reason: Connection refused: localhost/127.0.0.1:51849
akka.actor.ActorNotFound: Actor not found for: ActorSelection[Anchor(akka.tcp://sparkDriver@localhost:51849/), Path(/user/MapOutputTracker)]
    at akka.actor.ActorSelection$$anonfun$resolveOne$1.apply(ActorSelection.scala:65)
    at akka.actor.ActorSelection$$anonfun$resolveOne$1.apply(ActorSelection.scala:63)
    at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
    at akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.processBatch$1(BatchingExecutor.scala:67)
    at akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:82)
    at akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59)
    at akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59)
    at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)
    at akka.dispatch.BatchingExecutor$Batch.run(BatchingExecutor.scala:58)
    at akka.dispatch.ExecutionContexts$sameThreadExecutionContext$.unbatchedExecute(Future.scala:74)
    at akka.dispatch.BatchingExecutor$class.execute(BatchingExecutor.scala:110)
    at akka.dispatch.ExecutionContexts$sameThreadExecutionContext$.execute(Future.scala:73)
    at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:40)
    at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:248)
    at akka.pattern.PromiseActorRef.$bang(AskSupport.scala:267)
    at akka.actor.EmptyLocalActorRef.specialHandle(ActorRef.scala:508)
    at akka.actor.DeadLetterActorRef.specialHandle(ActorRef.scala:541)
    at akka.actor.DeadLetterActorRef.$bang(ActorRef.scala:531)
    at akka.remote.RemoteActorRefProvider$RemoteDeadLetterActorRef.$bang(RemoteActorRefProvider.scala:87)


Re: newbie question - spark with mesos

2015-03-23 Thread Anirudha Jadhav
ider$RemoteDeadLetterActorRef.$bang(RemoteActorRefProvider.scala:87)





On Mar 23, 2015, at 3:02 PM, Dean Wampler  wrote:

That's a very old page, try this instead:

http://spark.apache.org/docs/latest/running-on-mesos.html

When you run your Spark job on Mesos, tasks will be started on the slave
nodes as needed, since "fine-grained" mode is the default.

For a job like your example, very few tasks will be needed. Actually only
one would be enough, but the default number of partitions will be used. I
believe 8 is the default for Mesos. For local mode ("local[*]"), it's the
number of cores. You can also set the property "spark.default.parallelism".
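
As a rough sketch of that last point, the property can be passed when the
shell is launched. The master URL mesos://host:5050 and the value 4 are
placeholders, not values taken from this thread; the snippet only assembles
and prints the command so it can be reviewed before running.

```shell
# Placeholder values: substitute your own Mesos master URL and partition count.
MESOS_MASTER="mesos://host:5050"
DEFAULT_PARALLELISM=4

# Assemble the launch command; --conf sets a Spark property for this session.
launch_cmd="./bin/spark-shell --master $MESOS_MASTER --conf spark.default.parallelism=$DEFAULT_PARALLELISM"

# Print the command for review; to actually launch, run: eval "$launch_cmd"
echo "$launch_cmd"
```

The same property could instead be placed in conf/spark-defaults.conf if it
should apply to every job started from that installation.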

HTH,

Dean

Dean Wampler, Ph.D.
Author: Programming Scala, 2nd Edition
<http://shop.oreilly.com/product/0636920033073.do> (O'Reilly)
Typesafe <http://typesafe.com>
@deanwampler <http://twitter.com/deanwampler>
http://polyglotprogramming.com

On Mon, Mar 23, 2015 at 11:46 AM, Anirudha Jadhav  wrote:

> I have a Mesos cluster, to which I deployed Spark following the instructions at
> http://spark.apache.org/docs/0.7.2/running-on-mesos.html
>
> After that, the Spark shell starts up fine.
> Then I try the following in the shell:
>
> val data = 1 to 1
>
> val distData = sc.parallelize(data)
>
> distData.filter(_ < 10).collect()
>
> I open the Spark web UI at host:4040 and see an active job.
>
> Now, how do I start workers (Spark workers) on Mesos? Who completes my
> job?
> Thanks,
>
> --
> Ani
>


newbie question - spark with mesos

2015-03-23 Thread Anirudha Jadhav
I have a Mesos cluster, to which I deployed Spark following the instructions at
http://spark.apache.org/docs/0.7.2/running-on-mesos.html

After that, the Spark shell starts up fine.
Then I try the following in the shell:

val data = 1 to 1

val distData = sc.parallelize(data)

distData.filter(_ < 10).collect()

I open the Spark web UI at host:4040 and see an active job.

Now, how do I start workers (Spark workers) on Mesos? Who completes my
job?
Thanks,

-- 
Ani