Re: Spark and RabbitMQ

2015-05-20 Thread Abel Rincón
Hi,


There is a RabbitMQ receiver for Spark Streaming:

http://search.maven.org/#artifactdetails|com.stratio.receiver|rabbitmq|0.1.0-RELEASE|jar

https://github.com/Stratio/RabbitMQ-Receiver
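
For anyone wanting to try it, a minimal sketch of pulling the receiver into an sbt build. The coordinates come from the Maven link above; the spark-streaming line and its version are assumptions about the rest of your build:

// build.sbt sketch -- receiver coordinates taken from the Maven link above;
// the spark-streaming dependency and its version are assumed.
libraryDependencies ++= Seq(
  "org.apache.spark"     %% "spark-streaming" % "1.3.1" % "provided",
  "com.stratio.receiver"  % "rabbitmq"        % "0.1.0-RELEASE"
)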


2015-05-12 14:49 GMT+02:00 Dmitry Goldenberg :

> Thanks, Akhil. It looks like in the second example, for Rabbit they're
> doing this: https://www.rabbitmq.com/mqtt.html.
>
> On Tue, May 12, 2015 at 7:37 AM, Akhil Das 
> wrote:
>
>> I found two examples: a Java version and a Scala version.
>>
>> Thanks
>> Best Regards
>>
>> On Tue, May 12, 2015 at 2:31 AM, dgoldenberg 
>> wrote:
>>
>>> Are there existing or under development versions/modules for streaming
>>> messages out of RabbitMQ with SparkStreaming, or perhaps a RabbitMQ RDD?
>>>
>>>
>>>
>>> --
>>> View this message in context:
>>> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-and-RabbitMQ-tp22852.html
>>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>>
>>>
>>>
>>
>


Re: spark streaming doubt

2015-05-20 Thread Akhil Das
One receiver basically runs on one core, so if your single node has 4
cores, there are still 3 cores left for processing (for the executors). And
yes, the receiver stays on the same machine unless a failure happens.
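For illustration, a minimal sketch of that setup; the master string, port and batch interval are assumed example values, not anything from this thread:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// On a 4-core node: the single receiver pins one core,
// leaving the other 3 for batch processing.
val conf = new SparkConf().setMaster("local[4]").setAppName("ReceiverCoreSketch")
val ssc  = new StreamingContext(conf, Seconds(1))

// One receiver-based input stream (assumed socket source for the sketch).
val lines = ssc.socketTextStream("localhost", 9999)
lines.count().print()

ssc.start()
ssc.awaitTermination()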

Thanks
Best Regards

On Tue, May 19, 2015 at 10:57 PM, Shushant Arora 
wrote:

> Thanks Akhil andDibyendu.
>
> Does in high level receiver based streaming executors run on receivers
> itself to have data localisation ? Or its always data is transferred to
> executor nodes and executor nodes differ in each run of job but receiver
> node remains same(same machines) throughout life of streaming application
> unless node failure happens?
>
>
>
> On Tue, May 19, 2015 at 9:29 PM, Dibyendu Bhattacharya <
> dibyendu.bhattach...@gmail.com> wrote:
>
>> Just to add, there is a Receiver based Kafka consumer which uses Kafka
>> Low Level Consumer API.
>>
>> http://spark-packages.org/package/dibbhatt/kafka-spark-consumer
>>
>>
>> Regards,
>> Dibyendu
>>
>> On Tue, May 19, 2015 at 9:00 PM, Akhil Das 
>> wrote:
>>
>>>
>>> On Tue, May 19, 2015 at 8:10 PM, Shushant Arora <
>>> shushantaror...@gmail.com> wrote:
>>>
 So for Kafka+spark streaming, Receiver based streaming used highlevel
 api and non receiver based streaming used low level api.

 1.In high level receiver based streaming does it registers consumers at
 each job start(whenever a new job is launched by streaming application say
 at each second)?

>>>
>>> ​-> Receiver based streaming will always have the receiver running
>>> parallel while your job is running, So by default for every 200ms
>>> (spark.streaming.blockInterval) the receiver will generate a block of data
>>> which is read from Kafka.
>>> ​
>>>
>>>
 2.No of executors in highlevel receiver based jobs will always equal to
 no of partitions in topic ?

>>>
>>> ​-> Not sure from where did you came up with this. For the non stream
>>> based one, i think the number of partitions in spark will be equal to the
>>> number of kafka partitions for the given topic.
>>> ​
>>>
>>>
 3.Will data from a single topic be consumed by executors in parllel or
 only one receiver consumes in multiple threads and assign to executors in
 high level receiver based approach ?

 ​-> They will consume the data parallel.​ For the receiver based
>>> approach, you can actually specify the number of receiver that you want to
>>> spawn for consuming the messages.
>>>



 On Tue, May 19, 2015 at 2:38 PM, Akhil Das 
 wrote:

> spark.streaming.concurrentJobs takes an integer value, not boolean.
> If you set it as 2 then 2 jobs will run parallel. Default value is 1 and
> the next job will start once it completes the current one.
>
>
>> Actually, in the current implementation of Spark Streaming and under
>> default configuration, only job is active (i.e. under execution) at any
>> point of time. So if one batch's processing takes longer than 10 seconds,
>> then then next batch's jobs will stay queued.
>> This can be changed with an experimental Spark property
>> "spark.streaming.concurrentJobs" which is by default set to 1. Its not
>> currently documented (maybe I should add it).
>> The reason it is set to 1 is that concurrent jobs can potentially
>> lead to weird sharing of resources and which can make it hard to debug 
>> the
>> whether there is sufficient resources in the system to process the 
>> ingested
>> data fast enough. With only 1 job running at a time, it is easy to see 
>> that
>> if batch processing time < batch interval, then the system will be 
>> stable.
>> Granted that this may not be the most efficient use of resources under
>> certain conditions. We definitely hope to improve this in the future.
>
>
> Copied from TD's answer written in SO
> 
> .
>
> Non-receiver based streaming for example you can say are the
> fileStream, directStream ones. You can read a bit of information from here
> https://spark.apache.org/docs/1.3.1/streaming-kafka-integration.html
>
> Thanks
> Best Regards
>
> On Tue, May 19, 2015 at 2:13 PM, Shushant Arora <
> shushantaror...@gmail.com> wrote:
>
>> Thanks Akhil.
>> When I don't  set spark.streaming.concurrentJobs to true. Will the
>> all pending jobs starts one by one after 1 jobs completes,or it does not
>> creates jobs which could not be started at its desired interval.
>>
>> And Whats the difference and usage of Receiver vs non-receiver based
>> streaming. Is there any documentation for that?
>>
>> On Tue, May 19, 2015 at 1:35 PM, Akhil Das <
>> ak...@sigmoidanalytics.com> wrote:
>>
>>> It will be a single job running at a time by default (you can also
>>> configure the spark.streaming.concurren

Re: Reading Binary files in Spark program

2015-05-20 Thread Akhil Das
If you can share the complete code and a sample file, maybe I can try to
reproduce it on my end.

Thanks
Best Regards

On Wed, May 20, 2015 at 7:00 AM, Tapan Sharma 
wrote:

> Problem is still there.
> Exception is not coming at the time of reading.
> Also the count of JavaPairRDD is as expected. It is when we are calling
> collect() or toArray() methods, the exception is coming.
> Something to do with Text class even though I haven't used it in the
> program.
>
> Regards
> Tapan
>
> On Tue, May 19, 2015 at 6:26 PM, Akhil Das 
> wrote:
>
>> Try something like:
>>
>> JavaPairRDD<IntWritable, Text> output = sc.newAPIHadoopFile(inputDir,
>> org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat.class,
>> IntWritable.class, Text.class, new Job().getConfiguration());
>>
>> With the type of input format that you require.
>>
>> Thanks
>> Best Regards
>>
>> On Tue, May 19, 2015 at 3:57 PM, Tapan Sharma 
>> wrote:
>>
>>> Hi Team,
>>>
>>> I am new to Spark and learning.
>>> I am trying to read image files into spark job. This is how I am doing:
>>> Step 1. Created sequence files with FileName as Key and Binary image as
>>> value. i.e.  Text and BytesWritable.
>>> I am able to read these sequence files into Map Reduce programs.
>>>
>>> Step 2.
>>> I understand that Text and BytesWritable are Non Serializable therefore,
>>> I
>>> read the sequence file in Spark as following:
>>>
>>> SparkConf sparkConf = new SparkConf().setAppName("JavaSequenceFile");
>>> JavaSparkContext ctx = new JavaSparkContext(sparkConf);
>>> JavaPairRDD<String, Byte> seqFiles = ctx.sequenceFile(args[0],
>>> String.class, Byte.class);
>>> final List<Tuple2<String, Byte>> tuple2s = seqFiles.collect();
>>>
>>>
>>>
>>>
>>> The moment I try to call collect() method to get the keys of sequence
>>> file,
>>> following exception has been thrown
>>>
>>> Can any one help me understanding why collect() method is failing? If I
>>> use
>>> toArray() on seqFiles object then also I am getting same call stack.
>>>
>>> Regards
>>> Tapan
>>>
>>>
>>>
>>> java.io.NotSerializableException: org.apache.hadoop.io.Text
>>> at
>>> java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1183)
>>> at
>>>
>>> java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
>>> at
>>> java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
>>> at
>>>
>>> java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
>>> at
>>> java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
>>> at
>>> java.io.ObjectOutputStream.writeArray(ObjectOutputStream.java:1377)
>>> at
>>> java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1173)
>>> at
>>> java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347)
>>> at
>>>
>>> org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:42)
>>> at
>>>
>>> org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:73)
>>> at
>>> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:206)
>>> at
>>>
>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>> at
>>>
>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>> at java.lang.Thread.run(Thread.java:745)
>>> 2015-05-19 15:15:03,705 ERROR [task-result-getter-0]
>>> scheduler.TaskSetManager (Logging.scala:logError(75)) - Task 0.0 in stage
>>> 0.0 (TID 0) had a not serializable result: org.apache.hadoop.io.Text; not
>>> retrying
>>> 2015-05-19 15:15:03,731 INFO  [task-result-getter-0]
>>> scheduler.TaskSchedulerImpl (Logging.scala:logInfo(59)) - Removed TaskSet
>>> 0.0, whose tasks have all completed, from pool
>>> 2015-05-19 15:15:03,739 INFO
>>> [sparkDriver-akka.actor.default-dispatcher-2]
>>> scheduler.TaskSchedulerImpl (Logging.scala:logInfo(59)) - Cancelling
>>> stage 0
>>> 2015-05-19 15:15:03,747 INFO  [main] scheduler.DAGScheduler
>>> (Logging.scala:logInfo(59)) - Job 0 failed: collect at
>>> JavaSequenceFile.java:44, took 4.421397 s
>>> Exception in thread "main" org.apache.spark.SparkException: Job aborted
>>> due
>>> to stage failure: Task 0.0 in stage 0.0 (TID 0) had a not serializable
>>> result: org.apache.hadoop.io.Text
>>> at
>>> org.apache.spark.scheduler.DAGScheduler.org
>>> $apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1214)
>>> at
>>>
>>> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1203)
>>> at
>>>
>>> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1202)
>>> at
>>>
>>> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>>> at
>>> scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>>> at
>>>
>>> org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1202)

Re: Hive on Spark VS Spark SQL

2015-05-20 Thread Sean Owen
I don't think that's quite the difference. Any SQL engine has a query
planner and an execution engine. Both of them use Spark for execution. HoS
uses Hive for query planning. Although it's not optimized for execution on
Spark per se, it's got a lot of language support and is stable/mature.
Spark SQL's query planner is less developed at this point but purpose-built
for Spark as an execution engine. Spark SQL is also how you put SQL-like
operations in a Spark program -- programmatic SQL, if you will -- which
isn't what Hive (or therefore HoS) does. HoS is good if you're already using
Hive, need its language features, need it as it works today, and want a
faster batch execution version of it.
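
As a small illustration of that last point (programmatic SQL inside a Spark program), a sketch against the 1.3-era API; the input path, table name and columns are made up:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.functions.desc

val sc         = new SparkContext(new SparkConf().setAppName("SparkSqlSketch"))
val sqlContext = new SQLContext(sc)

// Assumed JSON input; registerTempTable makes it queryable by name.
val events = sqlContext.jsonFile("hdfs:///path/to/events.json")
events.registerTempTable("events")

// SQL text mixed with programmatic DataFrame operations in the same program.
val topUsers = sqlContext
  .sql("SELECT user, COUNT(*) AS n FROM events GROUP BY user")
  .orderBy(desc("n"))
  .limit(10)

topUsers.show()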

On Wed, May 20, 2015 at 7:18 AM, Debasish Das 
wrote:

> SparkSQL was built to improve upon Hive on Spark runtime further...
>
> On Tue, May 19, 2015 at 10:37 PM, guoqing0...@yahoo.com.hk <
> guoqing0...@yahoo.com.hk> wrote:
>
>> Hive on Spark and SparkSQL which should be better , and what are the key
>> characteristics and the advantages and the disadvantages between ?
>>
>> --
>> guoqing0...@yahoo.com.hk
>>
>
>


Re: Mesos Spark Tasks - Lost

2015-05-20 Thread Tim Chen
Can you share your exact spark-submit command line?

Also, cluster mode is not released yet (it's coming in 1.4) and doesn't support
spark-shell, so I think you're just using client mode unless you're on the
latest master.

Tim

On Tue, May 19, 2015 at 8:57 AM, Panagiotis Garefalakis 
wrote:

> Hello all,
>
> I am facing a weird issue for the last couple of days running Spark on top
> of Mesos and I need your help. I am running Mesos in a private cluster and
> managed to successfully deploy HDFS, Cassandra, Marathon and Play, but
> Spark is not working for some reason. I have tried so far:
> different java versions (1.6 and 1.7 oracle and openjdk), different
> spark-env configuration, different Spark versions (from 0.8.8 to 1.3.1),
> different HDFS versions (hadoop 5.1 and 4.6), and updating pom dependencies.
>
> More specifically while local tasks complete fine, in cluster mode all the
> tasks get lost.
> (both using spark-shell and spark-submit)
> From the worker log I see something like this:
>
> ---
> I0519 02:36:30.475064 12863 fetcher.cpp:214] Fetching URI
> 'hdfs:/:8020/spark-1.1.0-bin-2.0.0-cdh4.7.0.tgz'
> I0519 02:36:30.747372 12863 fetcher.cpp:99] Fetching URI
> 'hdfs://X:8020/spark-1.1.0-bin-2.0.0-cdh4.7.0.tgz' using Hadoop
> Client
> I0519 02:36:30.747546 12863 fetcher.cpp:109] Downloading resource from
> 'hdfs://:8020/spark-1.1.0-bin-2.0.0-cdh4.7.0.tgz' to
> '/tmp/mesos/slaves/20150515-164602-2877535122-5050-32131-S2/frameworks/20150517-162701-2877535122-5050-28705-0084/executors/20150515-164602-2877535122-5050-32131-S2/runs/660d78ec-e2f4-4d38-881b-7209cbd3c5c3/spark-1.1.0-bin-2.0.0-cdh4.7.0.tgz'
> I0519 02:36:34.205878 12863 fetcher.cpp:78] Extracted resource
> '/tmp/mesos/slaves/20150515-164602-2877535122-5050-32131-S2/frameworks/20150517-162701-2877535122-5050-28705-0084/executors/20150515-164602-2877535122-5050-32131-S2/runs/660d78ec-e2f4-4d38-881b-7209cbd3c5c3/spark-1.1.0-bin-2.0.0-cdh4.7.0.tgz'
> into
> '/tmp/mesos/slaves/20150515-164602-2877535122-5050-32131-S2/frameworks/20150517-162701-2877535122-5050-28705-0084/executors/20150515-164602-2877535122-5050-32131-S2/runs/660d78ec-e2f4-4d38-881b-7209cbd3c5c3'
> *Error: Could not find or load main class two*
>
> ---
>
> And from the Spark Terminal:
>
> ---
> 15/05/19 02:36:39 INFO scheduler.TaskSchedulerImpl: Cancelling stage 0
> 15/05/19 02:36:39 INFO scheduler.TaskSchedulerImpl: Stage 0 was cancelled
> 15/05/19 02:36:39 INFO scheduler.DAGScheduler: Failed to run reduce at
> SparkPi.scala:35
> 15/05/19 02:36:39 INFO scheduler.DAGScheduler: Failed to run reduce at
> SparkPi.scala:35
> Exception in thread "main" org.apache.spark.SparkException: Job aborted
> due to stage failure: Task 7 in stage 0.0 failed 4 times, most recent
> failure: Lost task 7.3 in stage 0.0 (TID 26, ): ExecutorLostFailure
> (executor lost)
> Driver stacktrace: at
> org.apache.spark.scheduler.DAGScheduler.org
> $apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1185)at
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1174)atorg.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1173)at
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
> ..
> at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
> at
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>
> ---
>
> Any help will be greatly appreciated!
>
> Regards,
> Panagiotis
>


Re: Multi user setup and saving a DataFrame / RDD to a network exported file system

2015-05-20 Thread Tomasz Fruboes

Hi,

 thanks for the answer. The permissions are:

drwxr-xr-x 3 tfruboes all 5632 05-19 15:40 test19EE/

 I have tried setting the rights to 777 for this directory prior to 
execution. This does not get propagated down the chain, ie the directory 
created as a result of the "save" call (namesAndAges.parquet2 in the 
path in the dump [1] below) is created with the drwxr-xr-x rights (owned 
by the user submitting the job, ie tfruboes). The temp directories 
created inside


namesAndAges.parquet2/_temporary/0/

(e.g. task_201505200920_0009_r_01) are owned by root, again with 
drwxr-xr-x access rights


 Cheers,
  Tomasz

W dniu 19.05.2015 o 23:56, Davies Liu pisze:

It surprises me, could you list the owner information of
/mnt/lustre/bigdata/med_home/tmp/test19EE/ ?

On Tue, May 19, 2015 at 8:15 AM, Tomasz Fruboes
 wrote:

Dear Experts,

  we have a spark cluster (standalone mode) in which master and workers are
started from root account. Everything runs correctly to the point when we
try doing operations such as

 dataFrame.select("name", "age").save(ofile, "parquet")

or

 rdd.saveAsPickleFile(ofile)

, where ofile is path on a network exported filesystem (visible on all
nodes, in our case this is lustre, I guess on nfs effect would be similar).

  Unsurprisingly temp files created on workers are owned by root, which then
leads to a crash (see [1] below). Is there a solution/workaround for this
(e.g. controlling file creation mode of the temporary files)?

Cheers,
  Tomasz


ps I've tried to google this problem, couple of similar reports, but no
clear answer/solution found

ps2 For completeness - running master/workers as a regular user solves the
problem only for the given user. For other users submitting to this master
the result is given in [2] below


[0] Cluster details:
Master/workers: centos 6.5
Spark 1.3.1 prebuilt for hadoop 2.4 (same behaviour for the 2.6 build)


[1]
##
File
"/mnt/home/tfruboes/2015.05.SparkLocal/spark-1.3.1-bin-hadoop2.4/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py",
line 300, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o27.save.
: java.io.IOException: Failed to rename
DeprecatedRawLocalFileStatus{path=file:/mnt/lustre/bigdata/med_home/tmp/test19EE/namesAndAges.parquet2/_temporary/0/task_201505191540_0009_r_01/part-r-2.parquet;
isDirectory=false; length=534; replication=1; blocksize=33554432;
modification_time=1432042832000; access_time=0; owner=; group=;
permission=rw-rw-rw-; isSymlink=false} to
file:/mnt/lustre/bigdata/med_home/tmp/test19EE/namesAndAges.parquet2/part-r-2.parquet
 at
org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:346)
 at
org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:362)
 at
org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitJob(FileOutputCommitter.java:310)
 at
parquet.hadoop.ParquetOutputCommitter.commitJob(ParquetOutputCommitter.java:43)
 at
org.apache.spark.sql.parquet.ParquetRelation2.insert(newParquet.scala:690)
 at
org.apache.spark.sql.parquet.DefaultSource.createRelation(newParquet.scala:129)
 at
org.apache.spark.sql.sources.ResolvedDataSource$.apply(ddl.scala:240)
 at org.apache.spark.sql.DataFrame.save(DataFrame.scala:1196)
 at org.apache.spark.sql.DataFrame.save(DataFrame.scala:1181)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
 at
py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
 at py4j.Gateway.invoke(Gateway.java:259)
 at
py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
 at py4j.commands.CallCommand.execute(CallCommand.java:79)
 at py4j.GatewayConnection.run(GatewayConnection.java:207)
 at java.lang.Thread.run(Thread.java:745)
##



[2]
##
15/05/19 14:45:19 WARN TaskSetManager: Lost task 0.0 in stage 2.0 (TID 3,
wn23023.cis.gov.pl): java.io.IOException: Mkdirs failed to create
file:/mnt/lustre/bigdata/med_home/tmp/test18/namesAndAges.parquet2/_temporary/0/_temporary/attempt_201505191445_0009_r_00_0
 at
org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:438)
 at
org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:424)
 at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:906)
 at org.apache.hadoop.fs.FileSystem.crea

Is this a good use case for Spark?

2015-05-20 Thread jakeheller
Hi all, I'm new to Spark -- so new that we're deciding whether to use it in
the first place, and I was hoping someone here could help me figure that
out. 

We're doing a lot of processing of legal documents -- in particular, the
entire corpus of American law. It's about 10m documents, many of which are
quite large as far as text goes (100s of pages). 

We'd like to 
(a) transform these documents from the various (often borked) formats they
come to us in into a standard XML format, 
(b) when it is in a standard format, extract information from them (e.g.,
which judicial cases cite each other?) and annotate the documents with the
information extracted, and then 
(c) deliver the end result to a repository (like s3) where it can be
accessed by the user-facing application.

Of course, we'd also like to do all of this quickly -- optimally, running
the entire database through the whole pipeline in a few hours.

We currently use a mix of Python and Java scripts (including XSLT, and
NLP/unstructured data tools like UIMA and Stanford's CoreNLP) in various
places along the pipeline we built for ourselves to handle these tasks. The
current pipeline infrastructure was built a while back -- it's basically a
number of HTTP servers that each have a single task and pass the document
along from server to server as it goes through the processing pipeline. It's
great although it's having trouble scaling, and there are some reliability
issues. It's also a headache to handle all the infrastructure. For what it's
worth, metadata about the documents resides in SQL, and the actual text of
the documents lives in s3. 

It seems like Spark would be ideal for this, but after some searching I
wasn't able to find too many examples of people using it for
document-processing tasks (like transforming documents from one XML format
into another) and I'm not clear if I can chain those sorts of tasks and NLP
tasks, especially if some happen in Python and others in Java. Finally, I
don't know if the size of the data (i.e., we'll likely want to run
operations on whole documents, rather than just lines) imposes
issues/constraints. 

Thanks all!
Jake



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Is-this-a-good-use-case-for-Spark-tp22954.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.




Re: spark 1.3.1 jars in repo1.maven.org

2015-05-20 Thread Sean Owen
Yes, the published artifacts can only refer to one version of anything
(OK, modulo publishing a large number of variants under classifiers).

You aren't intended to rely on Spark's transitive dependencies for
anything. Compiling against the Spark API has no relation to which
version of Hadoop it binds against at runtime, because that's not part of
any API. You can even mark the Spark dependency as "provided" in your build
and get all the Spark/Hadoop bindings at runtime from the cluster.
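
For example (sbt syntax; the version is just the one discussed in this thread):

// build.sbt sketch of the "provided" approach: compile against the Spark API,
// but take the actual Spark/Hadoop jars from the cluster at runtime.
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.3.1" % "provided"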

What problem are you experiencing?

On Wed, May 20, 2015 at 2:17 AM, Edward Sargisson  wrote:
> Hi,
> I'd like to confirm an observation I've just made. Specifically that spark
> is only available in repo1.maven.org for one Hadoop variant.
>
> The Spark source can be compiled against a number of different Hadoops using
> profiles. Yay.
> However, the spark jars in repo1.maven.org appear to be compiled against one
> specific Hadoop and no other differentiation is made. (I can see a
> difference with hadoop-client being 2.2.0 in repo1.maven.org and 1.0.4 in
> the version I compiled locally).
>
> The implication here is that if you have a pom file asking for
> spark-core_2.10 version 1.3.1 then Maven will only give you an Hadoop 2
> version. Maven assumes that non-snapshot artifacts never change so trying to
> load an Hadoop 1 version will end in tears.
>
> This then means that if you compile code against spark-core then there will
> probably be classpath NoClassDefFound issues unless the Hadoop 2 version is
> exactly the one you want.
>
> Have I gotten this correct?
>
> It happens that our little app is using a Spark context directly from a
> Jetty webapp and the classpath differences were/are causing some confusion.
> We are currently installing a Hadoop 1 spark master and worker.
>
> Thanks a lot!
> Edward




Re: spark streaming doubt

2015-05-20 Thread Shushant Arora
So I can explicitly specify the number of receivers and executors in
receiver-based streaming? Can you share a sample program, if any?

Also, in low-level (non-receiver-based) streaming, will data be fetched and
processed by the same worker executor node? And if I have concurrent jobs set
to 1, fetching and processing will be delayed until the next job starts. Say
I have a 1-second stream interval but job1 takes 5 seconds to complete, so
job2 starts at the end of second 5: will it then process all data from second
1 to second 5, or only the data for the interval from second 1 to second 2?

And if it processes data for the complete duration (seconds 1-5), is there
any option to suppress the start of the other queued jobs (for intervals
2-3, 3-4 and 4-5), since their work is already done by job2?


On Wed, May 20, 2015 at 12:36 PM, Akhil Das 
wrote:

> One receiver basically runs on 1 core, so if your single node is having 4
> cores, there are still 3 cores left for the processing (for executors). And
> yes receiver remains on the same machine unless some failure happens.
>
> Thanks
> Best Regards
>
> On Tue, May 19, 2015 at 10:57 PM, Shushant Arora <
> shushantaror...@gmail.com> wrote:
>
>> Thanks Akhil andDibyendu.
>>
>> Does in high level receiver based streaming executors run on receivers
>> itself to have data localisation ? Or its always data is transferred to
>> executor nodes and executor nodes differ in each run of job but receiver
>> node remains same(same machines) throughout life of streaming application
>> unless node failure happens?
>>
>>
>>
>> On Tue, May 19, 2015 at 9:29 PM, Dibyendu Bhattacharya <
>> dibyendu.bhattach...@gmail.com> wrote:
>>
>>> Just to add, there is a Receiver based Kafka consumer which uses Kafka
>>> Low Level Consumer API.
>>>
>>> http://spark-packages.org/package/dibbhatt/kafka-spark-consumer
>>>
>>>
>>> Regards,
>>> Dibyendu
>>>
>>> On Tue, May 19, 2015 at 9:00 PM, Akhil Das 
>>> wrote:
>>>

 On Tue, May 19, 2015 at 8:10 PM, Shushant Arora <
 shushantaror...@gmail.com> wrote:

> So for Kafka+spark streaming, Receiver based streaming used highlevel
> api and non receiver based streaming used low level api.
>
> 1.In high level receiver based streaming does it registers consumers
> at each job start(whenever a new job is launched by streaming application
> say at each second)?
>

 ​-> Receiver based streaming will always have the receiver running
 parallel while your job is running, So by default for every 200ms
 (spark.streaming.blockInterval) the receiver will generate a block of data
 which is read from Kafka.
 ​


> 2.No of executors in highlevel receiver based jobs will always equal
> to no of partitions in topic ?
>

 ​-> Not sure from where did you came up with this. For the non stream
 based one, i think the number of partitions in spark will be equal to the
 number of kafka partitions for the given topic.
 ​


> 3.Will data from a single topic be consumed by executors in parllel or
> only one receiver consumes in multiple threads and assign to executors in
> high level receiver based approach ?
>
> ​-> They will consume the data parallel.​ For the receiver based
 approach, you can actually specify the number of receiver that you want to
 spawn for consuming the messages.

>
>
>
> On Tue, May 19, 2015 at 2:38 PM, Akhil Das  > wrote:
>
>> spark.streaming.concurrentJobs takes an integer value, not boolean.
>> If you set it as 2 then 2 jobs will run parallel. Default value is 1 and
>> the next job will start once it completes the current one.
>>
>>
>>> Actually, in the current implementation of Spark Streaming and under
>>> default configuration, only job is active (i.e. under execution) at any
>>> point of time. So if one batch's processing takes longer than 10 
>>> seconds,
>>> then then next batch's jobs will stay queued.
>>> This can be changed with an experimental Spark property
>>> "spark.streaming.concurrentJobs" which is by default set to 1. Its not
>>> currently documented (maybe I should add it).
>>> The reason it is set to 1 is that concurrent jobs can potentially
>>> lead to weird sharing of resources and which can make it hard to debug 
>>> the
>>> whether there is sufficient resources in the system to process the 
>>> ingested
>>> data fast enough. With only 1 job running at a time, it is easy to see 
>>> that
>>> if batch processing time < batch interval, then the system will be 
>>> stable.
>>> Granted that this may not be the most efficient use of resources under
>>> certain conditions. We definitely hope to improve this in the future.
>>
>>
>> Copied from TD's answer written in SO
>> 

Re: spark streaming doubt

2015-05-20 Thread Akhil Das
On Wed, May 20, 2015 at 1:12 PM, Shushant Arora 
wrote:

> So I can explicitly specify no of receivers and executors in receiver
> based streaming? Can you share a sample program if any?
>

​
​
- You can look at the low-level consumer repo shared by Dibyendu for
sample code.

> ​
> ​
>
Also in Low level non receiver based , will data be fetched by same worker
> executor node and processed ? Also if I have concurrent jobs set to 1- so
> in low level
> fetching and processing will be delayed till next job starts ,say a
> situation where I have 1 sec of stream interval but my job1 takes 5 sec to
> complete , hence job2 starts at end of 5 sec, so now will it process all
> data from sec1 to sec 5 in low level non receiver streaming or only for
> interval sec1-sec2 ?​
>

> And if it processes data for complete duration sec1-sec5.Is there any
> option to suppress start of other queued jobs(for interval sec2-3,
> sec3-4,sec4-5) since there work is already done by job2 ?
>

​
​
​- I believe all your data from sec2-sec5 will be available in Kafka and
when the second batch starts at 5 sec, it will consume it (you can also
limit the rate with spark.streaming.kafka.maxRatePerPartition).

Read more here:
https://github.com/koeninger/kafka-exactly-once/blob/master/blogpost.md
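
For reference, a hedged sketch of the direct (non-receiver) Kafka stream with the rate limit mentioned above; the broker address, topic and batch interval are assumed placeholder values:

import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

val conf = new SparkConf()
  .setAppName("DirectKafkaSketch")
  // cap how much each Kafka partition is read per second per batch
  .set("spark.streaming.kafka.maxRatePerPartition", "1000")

val ssc = new StreamingContext(conf, Seconds(1))

val kafkaParams = Map("metadata.broker.list" -> "broker1:9092") // assumed broker
val topics      = Set("mytopic")                                // assumed topic

// One RDD partition per Kafka partition; no long-running receiver is used.
val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
  ssc, kafkaParams, topics)

stream.count().print()
ssc.start()
ssc.awaitTermination()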


>
>
> On Wed, May 20, 2015 at 12:36 PM, Akhil Das 
> wrote:
>
>> One receiver basically runs on 1 core, so if your single node is having 4
>> cores, there are still 3 cores left for the processing (for executors). And
>> yes receiver remains on the same machine unless some failure happens.
>>
>> Thanks
>> Best Regards
>>
>> On Tue, May 19, 2015 at 10:57 PM, Shushant Arora <
>> shushantaror...@gmail.com> wrote:
>>
>>> Thanks Akhil andDibyendu.
>>>
>>> Does in high level receiver based streaming executors run on receivers
>>> itself to have data localisation ? Or its always data is transferred to
>>> executor nodes and executor nodes differ in each run of job but receiver
>>> node remains same(same machines) throughout life of streaming application
>>> unless node failure happens?
>>>
>>>
>>>
>>> On Tue, May 19, 2015 at 9:29 PM, Dibyendu Bhattacharya <
>>> dibyendu.bhattach...@gmail.com> wrote:
>>>
 Just to add, there is a Receiver based Kafka consumer which uses Kafka
 Low Level Consumer API.

 http://spark-packages.org/package/dibbhatt/kafka-spark-consumer


 Regards,
 Dibyendu

 On Tue, May 19, 2015 at 9:00 PM, Akhil Das 
 wrote:

>
> On Tue, May 19, 2015 at 8:10 PM, Shushant Arora <
> shushantaror...@gmail.com> wrote:
>
>> So for Kafka+spark streaming, Receiver based streaming used highlevel
>> api and non receiver based streaming used low level api.
>>
>> 1.In high level receiver based streaming does it registers consumers
>> at each job start(whenever a new job is launched by streaming application
>> say at each second)?
>>
>
> ​-> Receiver based streaming will always have the receiver running
> parallel while your job is running, So by default for every 200ms
> (spark.streaming.blockInterval) the receiver will generate a block of data
> which is read from Kafka.
> ​
>
>
>> 2.No of executors in highlevel receiver based jobs will always equal
>> to no of partitions in topic ?
>>
>
> ​-> Not sure from where did you came up with this. For the non stream
> based one, i think the number of partitions in spark will be equal to the
> number of kafka partitions for the given topic.
> ​
>
>
>> 3.Will data from a single topic be consumed by executors in parllel
>> or only one receiver consumes in multiple threads and assign to executors
>> in high level receiver based approach ?
>>
>> ​-> They will consume the data parallel.​ For the receiver based
> approach, you can actually specify the number of receiver that you want to
> spawn for consuming the messages.
>
>>
>>
>>
>> On Tue, May 19, 2015 at 2:38 PM, Akhil Das <
>> ak...@sigmoidanalytics.com> wrote:
>>
>>> spark.streaming.concurrentJobs takes an integer value, not boolean.
>>> If you set it as 2 then 2 jobs will run parallel. Default value is 1 and
>>> the next job will start once it completes the current one.
>>>
>>>
 Actually, in the current implementation of Spark Streaming and
 under default configuration, only job is active (i.e. under execution) 
 at
 any point of time. So if one batch's processing takes longer than 10
 seconds, then then next batch's jobs will stay queued.
 This can be changed with an experimental Spark property
 "spark.streaming.concurrentJobs" which is by default set to 1. Its not
 currently documented (maybe I should add it).
 The reason it is set to 1 is that concurrent jobs can potentially
 lead to weir

Re: Hive on Spark VS Spark SQL

2015-05-20 Thread ayan guha
And if I am not wrong, the Spark SQL API is intended to move closer to SQL
standards. I feel it's a clever decision on Spark's part to keep both APIs
operational. These short-term confusions are worth the long-term benefits.
On 20 May 2015 17:19, "Sean Owen"  wrote:

> I don't think that's quite the difference. Any SQL  engine has a query
> planner and an execution engine. Both of these Spark for execution. HoS
> uses Hive for query planning. Although it's not optimized for execution on
> Spark per se, it's got a lot of language support and is stable/mature.
> Spark SQL's query planner is less developed at this point but purpose-built
> for Spark as an execution engine. Spark SQL is also how you put SQL-like
> operations in a Spark program -- programmatic SQL if you will -- which
> isn't what Hive or therefore HoS does. HoS is good if you're already using
> Hive and need its language features and need it as it works today, and want
> a faster batch execution version of it.
>
> On Wed, May 20, 2015 at 7:18 AM, Debasish Das 
> wrote:
>
>> SparkSQL was built to improve upon Hive on Spark runtime further...
>>
>> On Tue, May 19, 2015 at 10:37 PM, guoqing0...@yahoo.com.hk <
>> guoqing0...@yahoo.com.hk> wrote:
>>
>>> Hive on Spark and SparkSQL which should be better , and what are the key
>>> characteristics and the advantages and the disadvantages between ?
>>>
>>> --
>>> guoqing0...@yahoo.com.hk
>>>
>>
>>
>


Re: Multi user setup and saving a DataFrame / RDD to a network exported file system

2015-05-20 Thread Iulian Dragoș
You could try setting `SPARK_USER` to the user under which your workers are
running. I couldn't find many references to this variable, but at least
Yarn and Mesos take it into account when spawning executors. Chances are
that standalone mode also does it.

iulian

On Wed, May 20, 2015 at 9:29 AM, Tomasz Fruboes 
wrote:

> Hi,
>
>  thanks for answer. The rights are
>
> drwxr-xr-x 3 tfruboes all 5632 05-19 15:40 test19EE/
>
>  I have tried setting the rights to 777 for this directory prior to
> execution. This does not get propagated down the chain, ie the directory
> created as a result of the "save" call (namesAndAges.parquet2 in the path
> in the dump [1] below) is created with the drwxr-xr-x rights (owned by the
> user submitting the job, ie tfruboes). The temp directories created inside
>
> namesAndAges.parquet2/_temporary/0/
>
> (e.g. task_201505200920_0009_r_01) are owned by root, again with
> drwxr-xr-x access rights
>
>  Cheers,
>   Tomasz
>
> W dniu 19.05.2015 o 23:56, Davies Liu pisze:
>
>  It surprises me, could you list the owner information of
>> /mnt/lustre/bigdata/med_home/tmp/test19EE/ ?
>>
>> On Tue, May 19, 2015 at 8:15 AM, Tomasz Fruboes
>>  wrote:
>>
>>> Dear Experts,
>>>
>>>   we have a spark cluster (standalone mode) in which master and workers
>>> are
>>> started from root account. Everything runs correctly to the point when we
>>> try doing operations such as
>>>
>>>  dataFrame.select("name", "age").save(ofile, "parquet")
>>>
>>> or
>>>
>>>  rdd.saveAsPickleFile(ofile)
>>>
>>> , where ofile is path on a network exported filesystem (visible on all
>>> nodes, in our case this is lustre, I guess on nfs effect would be
>>> similar).
>>>
>>>   Unsurprisingly temp files created on workers are owned by root, which
>>> then
>>> leads to a crash (see [1] below). Is there a solution/workaround for this
>>> (e.g. controlling file creation mode of the temporary files)?
>>>
>>> Cheers,
>>>   Tomasz
>>>
>>>
>>> ps I've tried to google this problem, couple of similar reports, but no
>>> clear answer/solution found
>>>
>>> ps2 For completeness - running master/workers as a regular user solves
>>> the
>>> problem only for the given user. For other users submitting to this
>>> master
>>> the result is given in [2] below
>>>
>>>
>>> [0] Cluster details:
>>> Master/workers: centos 6.5
>>> Spark 1.3.1 prebuilt for hadoop 2.4 (same behaviour for the 2.6 build)
>>>
>>>
>>> [1]
>>> ##
>>> File
>>>
>>> "/mnt/home/tfruboes/2015.05.SparkLocal/spark-1.3.1-bin-hadoop2.4/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py",
>>> line 300, in get_return_value
>>> py4j.protocol.Py4JJavaError: An error occurred while calling o27.save.
>>> : java.io.IOException: Failed to rename
>>>
>>> DeprecatedRawLocalFileStatus{path=file:/mnt/lustre/bigdata/med_home/tmp/test19EE/namesAndAges.parquet2/_temporary/0/task_201505191540_0009_r_01/part-r-2.parquet;
>>> isDirectory=false; length=534; replication=1; blocksize=33554432;
>>> modification_time=1432042832000; access_time=0; owner=; group=;
>>> permission=rw-rw-rw-; isSymlink=false} to
>>>
>>> file:/mnt/lustre/bigdata/med_home/tmp/test19EE/namesAndAges.parquet2/part-r-2.parquet
>>>  at
>>>
>>> org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:346)
>>>  at
>>>
>>> org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:362)
>>>  at
>>>
>>> org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitJob(FileOutputCommitter.java:310)
>>>  at
>>>
>>> parquet.hadoop.ParquetOutputCommitter.commitJob(ParquetOutputCommitter.java:43)
>>>  at
>>>
>>> org.apache.spark.sql.parquet.ParquetRelation2.insert(newParquet.scala:690)
>>>  at
>>>
>>> org.apache.spark.sql.parquet.DefaultSource.createRelation(newParquet.scala:129)
>>>  at
>>> org.apache.spark.sql.sources.ResolvedDataSource$.apply(ddl.scala:240)
>>>  at org.apache.spark.sql.DataFrame.save(DataFrame.scala:1196)
>>>  at org.apache.spark.sql.DataFrame.save(DataFrame.scala:1181)
>>>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>  at
>>>
>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>>  at
>>>
>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>  at java.lang.reflect.Method.invoke(Method.java:606)
>>>  at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
>>>  at
>>> py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
>>>  at py4j.Gateway.invoke(Gateway.java:259)
>>>  at
>>> py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
>>>  at py4j.commands.CallCommand.execute(CallCommand.java:79)
>>>  at py4j.GatewayConnection.run(GatewayConnection.java:207)
>>>  at java.lang.Th

Intermittent difficulties for Worker to contact Master on same machine in standalone

2015-05-20 Thread Stephen Boesch
What conditions would cause the following delays / failures for a standalone
machine/cluster when the Worker tries to contact the Master?

15/05/20 02:02:53 INFO WorkerWebUI: Started WorkerWebUI at
http://10.0.0.3:8081
15/05/20 02:02:53 INFO Worker: Connecting to master
akka.tcp://sparkMaster@mellyrn.local:7077/user/Master...
15/05/20 02:02:53 WARN Remoting: Tried to associate with unreachable remote
address [akka.tcp://sparkMaster@mellyrn.local:7077]. Address is now gated
for 5000 ms, all messages to this address will be delivered to dead
letters. Reason: Connection refused: mellyrn.local/10.0.0.3:7077
15/05/20 02:03:04 INFO Worker: Retrying connection to master (attempt # 1)
..
..
15/05/20 02:03:26 INFO Worker: Retrying connection to master (attempt # 3)
15/05/20 02:03:26 INFO Worker: Connecting to master
akka.tcp://sparkMaster@mellyrn.local:7077/user/Master...
15/05/20 02:03:26 WARN Remoting: Tried to associate with unreachable remote
address [akka.tcp://sparkMaster@mellyrn.local:7077]. Address is now gated
for 5000 ms, all messages to this address will be delivered to dead
letters. Reason: Connection refused: mellyrn.local/10.0.0.3:7077


saveasorcfile on partitioned orc

2015-05-20 Thread patcharee

Hi,

I followed the information on
https://www.mail-archive.com/reviews@spark.apache.org/msg141113.html to
save an ORC file with Spark 1.2.1.

I can save data to a new ORC file. I wonder how to save data to an
existing, partitioned ORC file. Any suggestions?


BR,
Patcharee




RE: Intermittent difficulties for Worker to contact Master on same machine in standalone

2015-05-20 Thread Evo Eftimov
Check whether the name can be resolved in the /etc/hosts file (or DNS) of the
worker.

(The same btw applies to the node where you run the driver app – all other
nodes must be able to resolve its name.)

 

From: Stephen Boesch [mailto:java...@gmail.com] 
Sent: Wednesday, May 20, 2015 10:07 AM
To: user
Subject: Intermittent difficulties for Worker to contact Master on same machine 
in standalone

 

 

What conditions would cause the following delays / failure for a standalone 
machine/cluster to have the Worker contact the Master?

 

15/05/20 02:02:53 INFO WorkerWebUI: Started WorkerWebUI at http://10.0.0.3:8081

15/05/20 02:02:53 INFO Worker: Connecting to master 
akka.tcp://sparkMaster@mellyrn.local:7077/user/Master...

15/05/20 02:02:53 WARN Remoting: Tried to associate with unreachable remote 
address [akka.tcp://sparkMaster@mellyrn.local:7077]. Address is now gated for 
5000 ms, all messages to this address will be delivered to dead letters. 
Reason: Connection refused: mellyrn.local/10.0.0.3:7077

15/05/20 02:03:04 INFO Worker: Retrying connection to master (attempt # 1)

..

..

15/05/20 02:03:26 INFO Worker: Retrying connection to master (attempt # 3)

15/05/20 02:03:26 INFO Worker: Connecting to master 
akka.tcp://sparkMaster@mellyrn.local:7077/user/Master...

15/05/20 02:03:26 WARN Remoting: Tried to associate with unreachable remote 
address [akka.tcp://sparkMaster@mellyrn.local:7077]. Address is now gated for 
5000 ms, all messages to this address will be delivered to dead letters. 
Reason: Connection refused: mellyrn.local/10.0.0.3:7077



Re: Code error

2015-05-20 Thread Romain Sagean
Hi Ricardo,
instead of filtering out the header at runtime you could simply remove the
header line from your file. Either way, in your code you create a filter for
the header but you don't use it to compute parsedData. It should be:

val parsedData = filter_data.map(s => Vectors.dense(s.split(',').map(_.toDouble))).cache()
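
For reference, the original snippet with that fix applied (spark-shell style, so `sc` is assumed to exist; everything else is taken from the code quoted below):

import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors

val data = sc.textFile("/user/p_loadbd/fraude5.csv")
  .map(_.toLowerCase.split(','))
  .map(x => x(0) + "," + x(1))

val header      = data.first()
val filter_data = data.filter(_ != header)        // actually drop the header line

// parse the filtered RDD, not the original one that still contains the header
val parsedData = filter_data
  .map(s => Vectors.dense(s.split(',').map(_.toDouble)))
  .cache()

val clusters = KMeans.train(parsedData, 2, 20)    // numClusters = 2, numIterations = 20
val WSSSE    = clusters.computeCost(parsedData)
println("Within Set Sum of Squared Errors = " + WSSSE)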

2015-05-19 21:23 GMT+02:00 Stephen Boesch :

> Hi Ricardo,
>  providing the error output would help . But in any case you need to do a
> collect() on the rdd returned from computeCost.
>
> 2015-05-19 11:59 GMT-07:00 Ricardo Goncalves da Silva <
> ricardog.si...@telefonica.com>:
>
>  Hi,
>>
>>
>>
>> Can anybody see what’s wrong in this piece of code:
>>
>>
>>
>>
>>
>> ./bin/spark-shell --num-executors 2 --executor-memory 512m --master
>> yarn-client
>>
>> import org.apache.spark.mllib.clustering.KMeans
>>
>> import org.apache.spark.mllib.linalg.Vectors
>>
>>
>>
>>
>>
>> val data = sc.textFile("/user/p_loadbd/fraude5.csv").map(x =>
>> x.toLowerCase.split(',')).map(x => x(0)+","+x(1))
>>
>> val header = data.first()
>>
>> val filter_data = data.filter(x => x != header)
>>
>> val parsedData = data.map(s =>
>> Vectors.dense(s.split(',').map(_.toDouble))).cache()
>>
>>
>>
>> val numClusters = 2
>>
>> val numIterations = 20
>>
>> val clusters = KMeans.train(parsedData, numClusters, numIterations)
>>
>>
>>
>> val WSSSE = clusters.computeCost(parsedData)
>>
>> println("Within Set Sum of Squared Errors = " + WSSSE)
>>
>>
>>
>> Thanks.
>>
>>
>>
>>
>>
>>
>> *Ricardo Goncalves da Silva*
>> Lead Data Scientist *|* Seção de Desenvolvimento de Sistemas de
>>
>> Business Intelligence – Projetos de Inovação *| *IDPB02
>>
>> Av. Eng. Luis Carlos Berrini, 1.376 – 7º – 04571-000 - SP
>>
>> ricardog.si...@telefonica.com *|* www.telefonica.com.br
>>
>> Tel +55 11 3430 4955 *| *Cel +55 11 94292 9526
>>
>>
>>
>>
>>
>>
>
>


-- 
Romain Sagean


How to set HBaseConfiguration in Spark

2015-05-20 Thread donhoff_h
Hi, all

I wrote a program to get an HBaseConfiguration object in Spark. But after I 
printed the contents of this hbase-conf object, I found they were wrong. For 
example, the property "hbase.zookeeper.quorum" should be 
"bgdt01.dev.hrb,bgdt02.dev.hrb,bgdt03.hrb", but the printed value is 
"localhost".

Could anybody tell me how to set up the HBase configuration in Spark, whether 
it should be set in a configuration file or by a Spark API? Many thanks!

The code of my program is listed below:
object TestHBaseConf {
  def main(args: Array[String]) {
    val conf = new SparkConf()
    val sc = new SparkContext(conf)

    val hbConf = HBaseConfiguration.create()
    hbConf.addResource("""file:///etc/hbase/conf/hbase-site.xml""")
    val it = hbConf.iterator()
    while (it.hasNext) {
      val e = it.next()
      println("Key=" + e.getKey + " Value=" + e.getValue)
    }

    val rdd = sc.parallelize(Array(1, 2, 3, 4, 5, 6, 7, 8, 9))
    val result = rdd.sum()
    println("result=" + result)
    sc.stop()
  }
}
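
One hedged observation, not something stated in this thread: Hadoop's Configuration.addResource(String) treats its argument as a classpath resource name, so a "file:///..." string that is not on the classpath is silently ignored. Loading the file as a Path, or setting the property directly, avoids that; a sketch using the quorum value from the post above:

import org.apache.hadoop.fs.Path
import org.apache.hadoop.hbase.HBaseConfiguration

val hbConf = HBaseConfiguration.create()

// Load the site file from the local filesystem rather than the classpath...
hbConf.addResource(new Path("/etc/hbase/conf/hbase-site.xml"))
// ...or set the property explicitly (value taken from the post above).
hbConf.set("hbase.zookeeper.quorum", "bgdt01.dev.hrb,bgdt02.dev.hrb,bgdt03.hrb")

println("hbase.zookeeper.quorum = " + hbConf.get("hbase.zookeeper.quorum"))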

Re: Multi user setup and saving a DataFrame / RDD to a network exported file system

2015-05-20 Thread Tomasz Fruboes
Thanks for the suggestion. I have tried playing with it; sc.sparkUser() 
gives me the expected user name, but it doesn't solve the problem. From a 
quick search through the Spark code it seems to me that this setting is 
effective only for YARN and Mesos.

 I think the workaround for the problem could be using "--deploy-mode 
cluster" (not 100% convenient, since it disallows any interactive work), 
but this is not supported for Python-based programs.


Cheers,
  Tomasz



W dniu 20.05.2015 o 10:57, Iulian Dragoș pisze:

You could try setting `SPARK_USER` to the user under which your workers
are running. I couldn't find many references to this variable, but at
least Yarn and Mesos take it into account when spawning executors.
Chances are that standalone mode also does it.

iulian

On Wed, May 20, 2015 at 9:29 AM, Tomasz Fruboes
mailto:tomasz.frub...@fuw.edu.pl>> wrote:

Hi,

  thanks for answer. The rights are

drwxr-xr-x 3 tfruboes all 5632 05-19 15 :40
test19EE/

  I have tried setting the rights to 777 for this directory prior to
execution. This does not get propagated down the chain, ie the
directory created as a result of the "save" call
(namesAndAges.parquet2 in the path in the dump [1] below) is created
with the drwxr-xr-x rights (owned by the user submitting the job, ie
tfruboes). The temp directories created inside

namesAndAges.parquet2/_temporary/0/

(e.g. task_201505200920_0009_r_01) are owned by root, again with
drwxr-xr-x access rights

  Cheers,
   Tomasz

W dniu 19.05.2015 o 23:56, Davies Liu pisze:

It surprises me, could you list the owner information of
/mnt/lustre/bigdata/med_home/tmp/test19EE/ ?

On Tue, May 19, 2015 at 8:15 AM, Tomasz Fruboes
mailto:tomasz.frub...@fuw.edu.pl>>
wrote:

Dear Experts,

   we have a spark cluster (standalone mode) in which master
and workers are
started from root account. Everything runs correctly to the
point when we
try doing operations such as

  dataFrame.select("name", "age").save(ofile, "parquet")

or

  rdd.saveAsPickleFile(ofile)

, where ofile is path on a network exported filesystem
(visible on all
nodes, in our case this is lustre, I guess on nfs effect
would be similar).

   Unsurprisingly temp files created on workers are owned by
root, which then
leads to a crash (see [1] below). Is there a
solution/workaround for this
(e.g. controlling file creation mode of the temporary files)?

Cheers,
   Tomasz


ps I've tried to google this problem, couple of similar
reports, but no
clear answer/solution found

ps2 For completeness - running master/workers as a regular
user solves the
problem only for the given user. For other users submitting
to this master
the result is given in [2] below


[0] Cluster details:
Master/workers: centos 6.5
Spark 1.3.1 prebuilt for hadoop 2.4 (same behaviour for the
2.6 build)


[1]
##
 File

"/mnt/home/tfruboes/2015.05.SparkLocal/spark-1.3.1-bin-hadoop2.4/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py",
line 300, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling
o27.save.
: java.io.IOException: Failed to rename

DeprecatedRawLocalFileStatus{path=file:/mnt/lustre/bigdata/med_home/tmp/test19EE/namesAndAges.parquet2/_temporary/0/task_201505191540_0009_r_01/part-r-2.parquet;
isDirectory=false; length=534; replication=1;
blocksize=33554432;
modification_time=1432042832000; access_time=0; owner=; group=;
permission=rw-rw-rw-; isSymlink=false} to

file:/mnt/lustre/bigdata/med_home/tmp/test19EE/namesAndAges.parquet2/part-r-2.parquet
  at

org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:346)
  at

org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:362)
  at

org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitJob(FileOutputCommitter.java:310)
  at

parquet.hadoop.ParquetOutputCommitter.commitJob(ParquetOutputCommitter.java:43)
  at

org.apache.spark.sql.parquet.ParquetRelation2.insert(newParquet.scala:690)
  at

org.apache.spark.sql.parquet.DefaultSource.createRelation(newParquet.s

LATERAL VIEW explode issue

2015-05-20 Thread kiran mavatoor
Hi,
When I use "LATERAL VIEW explode" on the registered temp table in spark shell, 
it works.  But when I use the same in spark-submit (as jar file) it is not 
working. its giving error -  "failure: ``union'' expected but identifier VIEW 
found"
sql statement i am using is
SELECT id,mapKey FROM locations LATERAL VIEW 
explode(map_keys(jsonStringToMapUdf(countries))) countries AS mapKey
I registered "jsonStringToMapUdf" as my sql function.
ThanksKiran9008099770  
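
One hedged guess, not confirmed in this thread: that particular parse error comes from the plain SQLContext parser, which does not understand HiveQL constructs such as LATERAL VIEW, whereas spark-shell (when built with Hive support) gives you a HiveContext. If that is the difference, constructing a HiveContext in the submitted jar might help; a sketch with a placeholder UDF body:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

val sc          = new SparkContext(new SparkConf().setAppName("LateralViewSketch"))
val hiveContext = new HiveContext(sc)

// Placeholder UDF body -- only the name comes from the post above.
hiveContext.udf.register("jsonStringToMapUdf", (s: String) => Map("key" -> s))

// "locations" must already be registered as a temp table, as in the post.
val result = hiveContext.sql(
  """SELECT id, mapKey
    |FROM locations
    |LATERAL VIEW explode(map_keys(jsonStringToMapUdf(countries))) countries AS mapKey
  """.stripMargin)
result.show()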

Spark Streaming - Design considerations/Knobs

2015-05-20 Thread Hemant Bhanawat
Hi,

I have compiled a list (from online sources) of knobs/design considerations
that need to be taken care of by applications running on spark streaming.
Is my understanding correct?  Any other important design consideration that
I should take care of?


   - A DStream is associated with a single receiver. For attaining read
   parallelism multiple receivers i.e. multiple DStreams need to be created.
   - A receiver is run within an executor. It occupies one core. Ensure
   that there are enough cores for processing after receiver slots are booked
   i.e. spark.cores.max should take the receiver slots into account.
   - The receivers are allocated to executors in a round robin fashion.
   - When data is received from a stream source, receiver creates blocks of
   data.  A new block of data is generated every blockInterval milliseconds. N
   blocks of data are created during the batchInterval where N =
   batchInterval/blockInterval.
   - These blocks are distributed by the BlockManager of the current
   executor to the block managers of other executors. After that, the Network
   Input Tracker running on the driver is informed about the block locations
   for further processing.
   - A RDD is created on the driver for the blocks created during the
   batchInterval. The blocks generated during the batchInterval are partitions
   of the RDD. Each partition is a task in spark. blockInterval==
   batchinterval would mean that a single partition is created and probably it
   is processed locally.
   - Having bigger blockinterval means bigger blocks. A high value of
   spark.locality.wait increases the chance of processing a block on the local
   node. A balance needs to be found out between these two parameters to
   ensure that the bigger blocks are processed locally.
   - Instead of relying on batchInterval and blockInterval, you can define
   the number of partitions by calling dstream.repartition(n). This reshuffles
   the data in RDD randomly to create n number of partitions.
   - An RDD's processing is scheduled by driver's jobscheduler as a job. At
   a given point of time only one job is active. So, if one job is executing
   the other jobs are queued.
   - If you have two DStreams there will be two RDDs formed and there will
   be two jobs created, which will be scheduled one after the other.
   - To avoid this, you can union two dstreams. This will ensure that a
   single unionRDD is formed for the two RDDs of the dstreams. This unionRDD
   is then considered as a single job. However the partitioning of the RDDs is
   not impacted.
   - If the batch processing time is more than the batchInterval then obviously
   the receiver's memory will start filling up and will end up throwing
   exceptions (most probably BlockNotFoundException). Currently there is no
   way to pause the receiver.
   - For being fully fault tolerant, spark streaming needs to enable
   checkpointing. Checkpointing increases the batch processing time.
   - The frequency of metadata checkpoint cleaning can be controlled using
   spark.cleaner.ttl. But, data checkpoint cleaning happens automatically when
   the RDDs in the checkpoint are no more required.



Thanks,
Hemant
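
To make a few of those points concrete, a minimal sketch; all hosts, ports, core counts and intervals are assumed example values. It shows multiple receivers for read parallelism, a union so that each batch produces a single job, and an explicit repartition instead of tuning blockInterval:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf()
  .setAppName("StreamingKnobsSketch")
  .set("spark.cores.max", "8")            // leave cores free: each receiver pins one

val ssc = new StreamingContext(conf, Seconds(2))   // batchInterval = 2s

// Two receivers => two input DStreams for read parallelism.
val streams = (1 to 2).map(i => ssc.socketTextStream("localhost", 9998 + i))

// Union them so each batch interval yields a single job instead of two.
val unioned = ssc.union(streams)

// Control the number of partitions explicitly rather than via blockInterval.
val repartitioned = unioned.repartition(8)

repartitioned.count().print()

ssc.checkpoint("/tmp/streaming-checkpoint")        // needed for full fault tolerance
ssc.start()
ssc.awaitTermination()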


PySpark Logs location

2015-05-20 Thread Oleg Ruchovets
Hi,

I am executing a PySpark job on YARN (Hortonworks distribution).

Could someone point me to where the logs are located?

Thanks
Oleg.


Re: Reading Binary files in Spark program

2015-05-20 Thread Tapan Sharma
I am not doing anything special.


*Here is the code :*


SparkConf sparkConf = new SparkConf().setAppName("JavaSequenceFile");
JavaSparkContext ctx = new JavaSparkContext(sparkConf);
JavaPairRDD<String, Byte> seqFiles = ctx.sequenceFile(args[0], String.class, Byte.class);

// The following statements are giving the exception

final List<Tuple2<String, Byte>> tuple2s = seqFiles.toArray();

// Or

final List<Tuple2<String, Byte>> tuple2s = seqFiles.collect();


*And this is how I have created a sequence file:*

http://stuartsierra.com/2008/04/24/a-million-little-files


Regards

Tapan



On Wed, May 20, 2015 at 12:42 PM, Akhil Das 
wrote:

> If you can share the complete code and a sample file, may be i can try to
> reproduce it on my end.
>
> Thanks
> Best Regards
>
> On Wed, May 20, 2015 at 7:00 AM, Tapan Sharma 
> wrote:
>
>> Problem is still there.
>> Exception is not coming at the time of reading.
>> Also the count of JavaPairRDD is as expected. It is when we are calling
>> collect() or toArray() methods, the exception is coming.
>> Something to do with Text class even though I haven't used it in the
>> program.
>>
>> Regards
>> Tapan
>>
>> On Tue, May 19, 2015 at 6:26 PM, Akhil Das 
>> wrote:
>>
>>> Try something like:
>>>
>>> JavaPairRDD output = sc.newAPIHadoopFile(inputDir,
>>>
>>> org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat.class,
>>> IntWritable.class,
>>>   Text.class, new Job().getConfiguration());
>>>
>>> With the type of input format that you require.
>>>
>>> Thanks
>>> Best Regards
>>>
>>> On Tue, May 19, 2015 at 3:57 PM, Tapan Sharma 
>>> wrote:
>>>
 Hi Team,

 I am new to Spark and learning.
 I am trying to read image files into spark job. This is how I am doing:
 Step 1. Created sequence files with FileName as Key and Binary image as
 value. i.e.  Text and BytesWritable.
 I am able to read these sequence files into Map Reduce programs.

 Step 2.
 I understand that Text and BytesWritable are Non Serializable
 therefore, I
 read the sequence file in Spark as following:

 SparkConf sparkConf = new
 SparkConf().setAppName("JavaSequenceFile");
 JavaSparkContext ctx = new JavaSparkContext(sparkConf);
 JavaPairRDD seqFiles = ctx.sequenceFile(args[0],
 String.class, Byte.class) ;
 final List> tuple2s = seqFiles.collect();




 The moment I try to call collect() method to get the keys of sequence
 file,
 following exception has been thrown

 Can any one help me understanding why collect() method is failing? If I
 use
 toArray() on seqFiles object then also I am getting same call stack.

 Regards
 Tapan



 java.io.NotSerializableException: org.apache.hadoop.io.Text
 at
 java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1183)
 at

 java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
 at
 java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
 at

 java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
 at
 java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
 at
 java.io.ObjectOutputStream.writeArray(ObjectOutputStream.java:1377)
 at
 java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1173)
 at
 java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347)
 at

 org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:42)
 at

 org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:73)
 at
 org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:206)
 at

 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at

 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:745)
 2015-05-19 15:15:03,705 ERROR [task-result-getter-0]
 scheduler.TaskSetManager (Logging.scala:logError(75)) - Task 0.0 in
 stage
 0.0 (TID 0) had a not serializable result: org.apache.hadoop.io.Text;
 not
 retrying
 2015-05-19 15:15:03,731 INFO  [task-result-getter-0]
 scheduler.TaskSchedulerImpl (Logging.scala:logInfo(59)) - Removed
 TaskSet
 0.0, whose tasks have all completed, from pool
 2015-05-19 15:15:03,739 INFO
 [sparkDriver-akka.actor.default-dispatcher-2]
 scheduler.TaskSchedulerImpl (Logging.scala:logInfo(59)) - Cancelling
 stage 0
 2015-05-19 15:15:03,747 INFO  [main] scheduler.DAGScheduler
 (Logging.scala:logInfo(59)) - Job 0 failed: collect at
 JavaSequenceFile.java:44, took 4.421397 s
 Exception in thread "main" org.apache.spark.SparkException: Job aborted
 due
 to stage failure: Task 0.0 in stage 0.0 (TID 0) had a

Re: How to run multiple jobs in one sparkcontext from separate threads in pyspark?

2015-05-20 Thread MEETHU MATHEW
Hi Davies,
Thank you for pointing to spark streaming. I am confused about how to
return the result after running a function via a thread. I tried using Queue to
add the results to it and print it at the end. But here, I can see the results
after all threads are finished. How to get the result of the function once a
thread is finished, rather than waiting for all other threads to finish?

Thanks & Regards,
Meethu M


 On Tuesday, 19 May 2015 2:43 AM, Davies Liu  wrote:
   

 SparkContext can be used in multiple threads (Spark streaming works
with multiple threads), for example:

import threading
import time

def show(x):
    time.sleep(1)
    print x

def job():
    sc.parallelize(range(100)).foreach(show)

threading.Thread(target=job).start()


On Mon, May 18, 2015 at 12:34 AM, ayan guha  wrote:
> Hi
>
> So to be clear, do you want to run one operation in multiple threads within
> a function or you want run multiple jobs using multiple threads? I am
> wondering why python thread module can't be used? Or you have already gave
> it a try?
>
> On 18 May 2015 16:39, "MEETHU MATHEW"  wrote:
>>
>> Hi Akhil,
>>
>> The python wrapper for Spark Job Server did not help me. I actually need
>> the pyspark code sample  which shows how  I can call a function from 2
>> threads and execute it simultaneously.
>>
>> Thanks & Regards,
>> Meethu M
>>
>>
>>
>> On Thursday, 14 May 2015 12:38 PM, Akhil Das 
>> wrote:
>>
>>
>> Did you happened to have a look at the spark job server? Someone wrote a
>> python wrapper around it, give it a try.
>>
>> Thanks
>> Best Regards
>>
>> On Thu, May 14, 2015 at 11:10 AM, MEETHU MATHEW 
>> wrote:
>>
>> Hi all,
>>
>>  Quote
>>  "Inside a given Spark application (SparkContext instance), multiple
>> parallel jobs can run simultaneously if they were submitted from separate
>> threads. "
>>
>> How to run multiple jobs in one SPARKCONTEXT using separate threads in
>> pyspark? I found some examples in scala and java, but couldn't find python
>> code. Can anyone help me with a pyspark example?
>>
>> Thanks & Regards,
>> Meethu M
>>
>>
>>
>>
>




  

Initial job has not accepted any resources

2015-05-20 Thread podioss
Hi,
I am running Spark jobs with the standalone resource manager and I am gathering
several performance metrics from my cluster nodes. I am also gathering disk
I/O metrics from my nodes, and because many of my jobs use the same
dataset I am trying to prevent the operating system from caching the dataset
in memory on every node, so as to gather the correct metrics for every job.
Therefore, before I submit my jobs to Spark I clear my caches with the
commands:
sync ; echo 3 >/proc/sys/vm/drop_caches

The problem is that when I do so I see this error at the beginning of the
job:

WARN TaskSchedulerImpl: Initial job has not accepted any resources; check
your cluster UI to ensure that workers are registered and have sufficient
memory

Ultimately the job runs successfully in most cases, but I feel like this
error has a significant effect on the overall execution time of the job,
which I try to avoid.
I am also pretty confident that there is nothing wrong in my configuration,
because when I run jobs without clearing my nodes' caches the above error
doesn't come up.
I would really appreciate it if anyone could help me with this error.

Thanks.   



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Initial-job-has-not-accepted-any-resources-tp22955.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.




RE: LATERAL VIEW explode issue

2015-05-20 Thread yana
Just a guess, but are you using HiveContext in one case vs SqlContext in another?
You don't show a stacktrace, but this looks like a parser error... which would make
me guess a different context or a different Spark version on the cluster you are
submitting to...


Sent on the new Sprint Network from my Samsung Galaxy S®4.

-------- Original message --------
From: kiran mavatoor
Date: 05/20/2015 5:57 AM (GMT-05:00)
To: User
Subject: LATERAL VIEW explode issue
Hi,

When I use "LATERAL VIEW explode" on the registered temp table in spark shell,
it works. But when I use the same in spark-submit (as a jar file) it is not
working. It's giving the error: "failure: ``union'' expected but identifier VIEW
found"

sql statement i am using is

SELECT id,mapKey FROM locations LATERAL VIEW 
explode(map_keys(jsonStringToMapUdf(countries))) countries AS mapKey

I registered "jsonStringToMapUdf" as my sql function.

Thanks
Kiran
9008099770 
 

java program got Stuck at broadcasting

2015-05-20 Thread allanjie
The variable I need to broadcast is just 468 MB.

When broadcasting, it just "stops" here:


15/05/20 11:36:14 INFO Configuration.deprecation: mapred.tip.id is
deprecated. Instead, use mapreduce.task.id 
15/05/20 11:36:14 INFO Configuration.deprecation: mapred.task.id is
deprecated. Instead, use mapreduce.task.attempt.id 
15/05/20 11:36:14 INFO Configuration.deprecation: mapred.task.is.map is
deprecated. Instead, use mapreduce.task.ismap 
15/05/20 11:36:14 INFO Configuration.deprecation: mapred.task.partition is
deprecated. Instead, use mapreduce.task.partition 
15/05/20 11:36:14 INFO Configuration.deprecation: mapred.job.id is
deprecated. Instead, use mapreduce.job.id 
15/05/20 11:36:14 INFO mapred.FileInputFormat: Total input paths to process
: 1 
15/05/20 11:36:14 INFO spark.SparkContext: Starting job: saveAsTextFile at
Test1.java:90 
15/05/20 11:36:15 INFO scheduler.DAGScheduler: Got job 0 (saveAsTextFile at
Test1.java:90) with 4 output partitions (allowLocal=false) 
15/05/20 11:36:15 INFO scheduler.DAGScheduler: Final stage: Stage
0(saveAsTextFile at Test1.java:90) 
15/05/20 11:36:15 INFO scheduler.DAGScheduler: Parents of final stage:
List() 
15/05/20 11:36:15 INFO scheduler.DAGScheduler: Missing parents: List() 
15/05/20 11:36:15 INFO scheduler.DAGScheduler: Submitting Stage 0
(MapPartitionsRDD[3] at saveAsTextFile at Test1.java:90), which has no
missing parents 
15/05/20 11:36:15 INFO storage.MemoryStore: ensureFreeSpace(129264) called
with curMem=988453294, maxMem=2061647216 
15/05/20 11:36:15 INFO storage.MemoryStore: Block broadcast_2 stored as
values in memory (estimated size 126.2 KB, free 1023.4 MB) 
15/05/20 11:36:15 INFO storage.MemoryStore: ensureFreeSpace(78190) called
with curMem=988582558, maxMem=2061647216 
15/05/20 11:36:15 INFO storage.MemoryStore: Block broadcast_2_piece0 stored
as bytes in memory (estimated size 76.4 KB, free 1023.3 MB) 
15/05/20 11:36:15 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in
memory on HadoopV26Master:44855 (size: 76.4 KB, free: 1492.4 MB) 
15/05/20 11:36:15 INFO storage.BlockManagerMaster: Updated info of block
broadcast_2_piece0 
15/05/20 11:36:15 INFO spark.SparkContext: Created broadcast 2 from
broadcast at DAGScheduler.scala:839 
15/05/20 11:36:15 INFO scheduler.DAGScheduler: Submitting 4 missing tasks
from Stage 0 (MapPartitionsRDD[3] at saveAsTextFile at Test1.java:90) 
15/05/20 11:36:15 INFO scheduler.TaskSchedulerImpl: Adding task set 0.0 with
4 tasks 
15/05/20 11:36:15 INFO scheduler.TaskSetManager: Starting task 0.0 in stage
0.0 (TID 0, HadoopV26Slave5, NODE_LOCAL, 1387 bytes) 
15/05/20 11:36:15 INFO scheduler.TaskSetManager: Starting task 1.0 in stage
0.0 (TID 1, HadoopV26Slave3, NODE_LOCAL, 1387 bytes) 
15/05/20 11:36:15 INFO scheduler.TaskSetManager: Starting task 2.0 in stage
0.0 (TID 2, HadoopV26Slave4, NODE_LOCAL, 1387 bytes) 
15/05/20 11:36:15 INFO scheduler.TaskSetManager: Starting task 3.0 in stage
0.0 (TID 3, HadoopV26Slave1, NODE_LOCAL, 1387 bytes) 
15/05/20 11:36:15 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in
memory on HadoopV26Slave5:45357 (size: 76.4 KB, free: 2.1 GB) 
15/05/20 11:36:15 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in
memory on HadoopV26Slave3:57821 (size: 76.4 KB, free: 2.1 GB)   
……. 
15/05/20 11:36:28 INFO storage.BlockManagerInfo: Added broadcast_1_piece1 in
memory on HadoopV26Slave5:45357 (size: 4.0 MB, free: 1646.3 MB) 

And it didn't go forward while I kept waiting; it didn't exactly stop, it was more like
stuck.

I have 6 workers/VMs: each of them has 8GB memory and 12GB disk storage.
After a few minutes passed, the program stopped and showed something like this:


15/05/20 11:42:45 WARN scheduler.TaskSetManager: Lost task 1.0 in stage 0.0
(TID 1, HadoopV26Slave3):
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File
/user/output/_temporary/0/_temporary/attempt_201505201136__m_01_1/part-1
could only be replicated to 0 nodes instead of minReplication (=1).  There
are 6 datanode(s) running and no node(s) are excluded in this operation. 
at
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1549)
 
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3200)
 
at
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:641)
 
at
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:482)
 
at
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
 
at
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
 
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962) 
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039) 
at org.apache.hadoop.ip

Re: save column values of DataFrame to text file

2015-05-20 Thread allanjie
Sorry, but how does that work?
Can you give more detail?

On 20 May 2015 at 21:32, oubrik [via Apache Spark User List] <
ml-node+s1001560n2295...@n3.nabble.com> wrote:

> hi,
> try like this
>
> DataFrame df = sqlContext.load("com.databricks.spark.csv", options);
> df.select("year", "model").save("newcars.csv",
> "com.databricks.spark.csv");
>
> for more information: https://github.com/databricks/spark-csv
>
> Regards
>
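
(A minimal Scala sketch of what the quoted suggestion boils down to on Spark 1.3,
assuming the spark-csv package is on the classpath; file and column names are
placeholders:)

import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)
val options = Map("path" -> "cars.csv", "header" -> "true")

val df = sqlContext.load("com.databricks.spark.csv", options)
df.select("year", "model").save("newcars.csv", "com.databricks.spark.csv")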



-- 
PhD student,
Social Media Laboratory,
Department of Electronic & Computer Engineering,
The Hong Kong University of Science and Technology.
Website: http://www.allanjie.net




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/save-column-values-of-DataFrame-to-text-file-tp22718p22958.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: Reading Binary files in Spark program

2015-05-20 Thread Akhil Das
Hi

Basically, you need to convert it to a serializable format before doing the
collect.

You can fire up a spark shell and paste this:

val sFile = sc.sequenceFile[LongWritable, Text]("/home/akhld/sequence/sigmoid")
  .map(_._2.toString)
sFile.take(5).foreach(println)


Use the attached sequence file generator and the generated sequence file that I
used for testing.

Also note: If you don't do the .map to convert to String, it will end
up with the serialization exception that you are hitting.
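
For the Text/BytesWritable layout described in the quoted mail below, a slightly
fuller Scala sketch of the same idea (the input path is a placeholder):

import java.util.Arrays
import org.apache.hadoop.io.{BytesWritable, Text}
import org.apache.spark.{SparkConf, SparkContext}

object ReadImageSeqFile {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("ReadImageSeqFile"))

    // Keys are file names (Text), values are raw image bytes (BytesWritable).
    // Convert both to serializable types *before* collect()/take().
    val images = sc.sequenceFile(args(0), classOf[Text], classOf[BytesWritable])
      .map { case (name, bytes) =>
        (name.toString, Arrays.copyOf(bytes.getBytes, bytes.getLength))
      }

    images.take(5).foreach { case (name, data) =>
      println(name + " -> " + data.length + " bytes")
    }
    sc.stop()
  }
}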


Thanks
Best Regards

On Wed, May 20, 2015 at 5:48 PM, Tapan Sharma 
wrote:

> I am not doing anything special.
>
>
> *Here is the code :*
>
>
> SparkConf sparkConf = new SparkConf().setAppName("JavaSequenceFile");
> JavaSparkContext ctx = new JavaSparkContext(sparkConf);
> JavaPairRDD seqFiles = ctx.sequenceFile(args[0], String.class, 
> Byte.class) ;
>
> // Following statements is giving exception
>
> final List> tuple2s = seqFiles.toArray();
>
> // Or
>
> final List> tuple2s = seqFiles.collect();
>
>
> *And this is how I have created a sequence file:*
>
> http://stuartsierra.com/2008/04/24/a-million-little-files
>
>
> Regards
>
> Tapan
>
>
>
> On Wed, May 20, 2015 at 12:42 PM, Akhil Das 
> wrote:
>
>> If you can share the complete code and a sample file, may be i can try to
>> reproduce it on my end.
>>
>> Thanks
>> Best Regards
>>
>> On Wed, May 20, 2015 at 7:00 AM, Tapan Sharma 
>> wrote:
>>
>>> Problem is still there.
>>> Exception is not coming at the time of reading.
>>> Also the count of JavaPairRDD is as expected. It is when we are calling
>>> collect() or toArray() methods, the exception is coming.
>>> Something to do with Text class even though I haven't used it in the
>>> program.
>>>
>>> Regards
>>> Tapan
>>>
>>> On Tue, May 19, 2015 at 6:26 PM, Akhil Das 
>>> wrote:
>>>
 Try something like:

 JavaPairRDD output = sc.newAPIHadoopFile(inputDir,

 org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat.class,
 IntWritable.class,
   Text.class, new Job().getConfiguration());

 With the type of input format that you require.

 Thanks
 Best Regards

 On Tue, May 19, 2015 at 3:57 PM, Tapan Sharma 
 wrote:

> Hi Team,
>
> I am new to Spark and learning.
> I am trying to read image files into spark job. This is how I am doing:
> Step 1. Created sequence files with FileName as Key and Binary image as
> value. i.e.  Text and BytesWritable.
> I am able to read these sequence files into Map Reduce programs.
>
> Step 2.
> I understand that Text and BytesWritable are Non Serializable
> therefore, I
> read the sequence file in Spark as following:
>
> SparkConf sparkConf = new
> SparkConf().setAppName("JavaSequenceFile");
> JavaSparkContext ctx = new JavaSparkContext(sparkConf);
> JavaPairRDD seqFiles = ctx.sequenceFile(args[0],
> String.class, Byte.class) ;
> final List> tuple2s = seqFiles.collect();
>
>
>
>
> The moment I try to call collect() method to get the keys of sequence
> file,
> following exception has been thrown
>
> Can any one help me understanding why collect() method is failing? If
> I use
> toArray() on seqFiles object then also I am getting same call stack.
>
> Regards
> Tapan
>
>
>
> java.io.NotSerializableException: org.apache.hadoop.io.Text
> at
> java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1183)
> at
>
> java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
> at
> java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
> at
>
> java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
> at
> java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
> at
> java.io.ObjectOutputStream.writeArray(ObjectOutputStream.java:1377)
> at
> java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1173)
> at
> java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347)
> at
>
> org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:42)
> at
>
> org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:73)
> at
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:206)
> at
>
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at
>
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> 2015-05-19 15:15:03,705 ERROR [task-result-getter-0]
> scheduler.TaskSetManager (Logging.scala:logError(75)) - Task 0.0 in
> stag

Re: How to use spark to access HBase with Security enabled

2015-05-20 Thread Bill Q
I have a similar problem: I cannot pass the HBase configuration file as
an extra classpath entry to Spark any more using
spark.executor.extraClassPath=MY_HBASE_CONF_DIR in Spark 1.3. We used
to run this in 1.2 without any problem.

On Tuesday, May 19, 2015, donhoff_h <165612...@qq.com> wrote:

>
> Sorry, this ref does not help me.  I have set up the configuration in
> hbase-site.xml. But it seems there are still some extra configurations to
> be set or APIs to be called to make my spark program be able to pass the
> authentication with the HBase.
>
> Does anybody know how to set authentication to a secured HBase in a spark
> program which use the API "newAPIHadoopRDD" to get information from HBase?
>
> Many Thanks!
>
> ------ Original Message ------
> *From:* "yuzhihong"; >;
> *Sent:* Tuesday, May 19, 2015, 9:54 PM
> *To:* "donhoff_h"<165612...@qq.com
> >;
> *Cc:* "user" >;
> *Subject:* Re: How to use spark to access HBase with Security enabled
>
> Please take a look at:
>
> http://hbase.apache.org/book.html#_client_side_configuration_for_secure_operation
>
> Cheers
>
> On Tue, May 19, 2015 at 5:23 AM, donhoff_h <165612...@qq.com
> > wrote:
>
>>
>> The principal is sp...@bgdt.dev.hrb. It is the user that I used to run
>> my spark programs. I am sure I have run the kinit command to make it take
>> effect. And I also used the HBase Shell to verify that this user has the
>> right to scan and put the tables in HBase.
>>
>> Now I still have no idea how to solve this problem. Can anybody help me
>> to figure it out? Many Thanks!
>>
>> ------ Original Message ------
>> *From:* "yuzhihong";> >;
>> *Sent:* Tuesday, May 19, 2015, 7:55 PM
>> *To:* "donhoff_h"<165612...@qq.com
>> >;
>> *Cc:* "user"> >;
>> *Subject:* Re: How to use spark to access HBase with Security enabled
>>
>> Which user did you run your program as ?
>>
>> Have you granted proper permission on hbase side ?
>>
>> You should also check master log to see if there was some clue.
>>
>> Cheers
>>
>>
>>
>> On May 19, 2015, at 2:41 AM, donhoff_h <165612...@qq.com
>> > wrote:
>>
>> Hi, experts.
>>
>> I ran the "HBaseTest" program which is an example from the Apache Spark
>> source code to learn how to use spark to access HBase. But I met the
>> following exception:
>> Exception in thread "main"
>> org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after
>> attempts=36, exceptions:
>> Tue May 19 16:59:11 CST 2015, null, java.net.SocketTimeoutException:
>> callTimeout=6, callDuration=68648: row 'spark_t01,,00' on
>> table 'hbase:meta' at region=hbase:meta,,1.1588230740,
>> hostname=bgdt01.dev.hrb,16020,1431412877700, seqNum=0
>>
>> I also checked the RegionServer Log of the host "bgdt01.dev.hrb" listed
>> in the above exception. I found a few entries like the following one:
>> 2015-05-19 16:59:11,143 DEBUG
>> [RpcServer.reader=2,bindAddress=bgdt01.dev.hrb,port=16020] ipc.RpcServer:
>> RpcServer.listener,port=16020: Caught exception while
>> reading:Authentication is required
>>
>> The above entry did not point to my program clearly. But the time is very
>> near. Since my hbase version is HBase1.0.0 and I set security enabled, I
>> doubt the exception was caused by the Kerberos authentication.  But I am
>> not sure.
>>
>> Do anybody know if my guess is right? And if I am right, could anybody
>> tell me how to set Kerberos Authentication in a spark program? I don't know
>> how to do it. I already checked the API doc , but did not found any API
>> useful. Many Thanks!
>>
>> By the way, my spark version is 1.3.0. I also paste the code of
>> "HBaseTest" in the following:
>> ***Source Code**
>> object HBaseTest {
>>   def main(args: Array[String]) {
>> val sparkConf = new SparkConf().setAppName("HBaseTest")
>> val sc = new SparkContext(sparkConf)
>> val conf = HBaseConfiguration.create()
>> conf.set(TableInputFormat.INPUT_TABLE, args(0))
>>
>> // Initialize hBase table if necessary
>> val admin = new HBaseAdmin(conf)
>> if (!admin.isTableAvailable(args(0))) {
>>   val tableDesc = new HTableDescriptor(args(0))
>>   admin.createTable(tableDesc)
>> }
>>
>> val hBaseRDD = sc.newAPIHadoopRDD(conf, classOf[TableInputFormat],
>>   classOf[org.apache.hadoop.hbase.io.ImmutableBytesWritable],
>>   classOf[org.apache.hadoop.hbase.client.Result])
>>
>> hBaseRDD.count()
>>
>> sc.stop()
>>   }
>> }
>>
>>
>

-- 
Many thanks.


Bill
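
(For the Kerberos question quoted above, a hedged driver-side sketch in Scala; the
principal and keytab path are placeholders, and shipping HBase delegation tokens to
the executors is not covered here:)

import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.security.UserGroupInformation

// hbase-site.xml (with hbase.security.authentication=kerberos) must be on the classpath.
val hbConf = HBaseConfiguration.create()
UserGroupInformation.setConfiguration(hbConf)

// Placeholder principal/keytab; log in before creating the HBase-backed RDD.
UserGroupInformation.loginUserFromKeytab("user@YOUR.REALM", "/path/to/user.keytab")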


Re: Intermittent difficulties for Worker to contact Master on same machine in standalone

2015-05-20 Thread Yana Kadiyska
But if I'm reading his email correctly he's saying that:

1. The master and slave are on the same box (so network hiccups are
unlikely culprit)
2. The failures are intermittent -- i.e program works for a while then
worker gets disassociated...

Is it possible that the master restarted? We used to have problems like
this where we'd restart the master process, it won't be listening on 7077
for some time, but the worker process is trying to connect and by the time
the master is up the worker has given up...


On Wed, May 20, 2015 at 5:16 AM, Evo Eftimov  wrote:

> Check whether the name can be resolved in the /etc/hosts file (or DNS) of
> the worker
>
>
>
> (the same btw applies for the Node where you run the driver app – all
> other nodes must be able to resolve its name)
>
>
>
> *From:* Stephen Boesch [mailto:java...@gmail.com]
> *Sent:* Wednesday, May 20, 2015 10:07 AM
> *To:* user
> *Subject:* Intermittent difficulties for Worker to contact Master on same
> machine in standalone
>
>
>
>
>
> What conditions would cause the following delays / failure for a
> standalone machine/cluster to have the Worker contact the Master?
>
>
>
> 15/05/20 02:02:53 INFO WorkerWebUI: Started WorkerWebUI at
> http://10.0.0.3:8081
>
> 15/05/20 02:02:53 INFO Worker: Connecting to master
> akka.tcp://sparkMaster@mellyrn.local:7077/user/Master...
>
> 15/05/20 02:02:53 WARN Remoting: Tried to associate with unreachable
> remote address [akka.tcp://sparkMaster@mellyrn.local:7077]. Address is
> now gated for 5000 ms, all messages to this address will be delivered to
> dead letters. Reason: Connection refused: mellyrn.local/10.0.0.3:7077
>
> 15/05/20 02:03:04 INFO Worker: Retrying connection to master (attempt # 1)
>
> ..
>
> ..
>
> 15/05/20 02:03:26 INFO Worker: Retrying connection to master (attempt # 3)
>
> 15/05/20 02:03:26 INFO Worker: Connecting to master
> akka.tcp://sparkMaster@mellyrn.local:7077/user/Master...
>
> 15/05/20 02:03:26 WARN Remoting: Tried to associate with unreachable
> remote address [akka.tcp://sparkMaster@mellyrn.local:7077]. Address is
> now gated for 5000 ms, all messages to this address will be delivered to
> dead letters. Reason: Connection refused: mellyrn.local/10.0.0.3:7077
>


Re: Mesos Spark Tasks - Lost

2015-05-20 Thread Panagiotis Garefalakis
Tim thanks for your reply,

I am following this quite clear mesos-spark tutorial:
https://docs.mesosphere.com/tutorials/run-spark-on-mesos/
So mainly I tried running spark-shell which locally works fine but when the
jobs are submitted through mesos something goes wrong!

My question is: is there some extra configuration needed for the workers
(that is not mentioned in the tutorial)?

The Executor Lost message I get is really generic, so I don't know what's
going on.
Please check the attached mesos execution event log.

Thanks again,
Panagiotis


On Wed, May 20, 2015 at 8:21 AM, Tim Chen  wrote:

> Can you share your exact spark-submit command line?
>
> And also cluster mode is not yet released yet (1.4) and doesn't support
> spark-shell, so I think you're just using client mode unless you're using
> latest master.
>
> Tim
>
> On Tue, May 19, 2015 at 8:57 AM, Panagiotis Garefalakis <
> panga...@gmail.com> wrote:
>
>> Hello all,
>>
>> I am facing a weird issue for the last couple of days running Spark on
>> top of Mesos and I need your help. I am running Mesos in a private cluster
>> and managed to deploy successfully  hdfs, cassandra, marathon and play but
>> Spark is not working for a reason. I have tried so far:
>> different java versions (1.6 and 1.7 oracle and openjdk), different
>> spark-env configuration, different Spark versions (from 0.8.8 to 1.3.1),
>> different HDFS versions (hadoop 5.1 and 4.6), and updating pom dependencies.
>>
>> More specifically while local tasks complete fine, in cluster mode all
>> the tasks get lost.
>> (both using spark-shell and spark-submit)
>> From the worker log I see something like this:
>>
>> ---
>> I0519 02:36:30.475064 12863 fetcher.cpp:214] Fetching URI
>> 'hdfs:/:8020/spark-1.1.0-bin-2.0.0-cdh4.7.0.tgz'
>> I0519 02:36:30.747372 12863 fetcher.cpp:99] Fetching URI
>> 'hdfs://X:8020/spark-1.1.0-bin-2.0.0-cdh4.7.0.tgz' using Hadoop
>> Client
>> I0519 02:36:30.747546 12863 fetcher.cpp:109] Downloading resource from
>> 'hdfs://:8020/spark-1.1.0-bin-2.0.0-cdh4.7.0.tgz' to
>> '/tmp/mesos/slaves/20150515-164602-2877535122-5050-32131-S2/frameworks/20150517-162701-2877535122-5050-28705-0084/executors/20150515-164602-2877535122-5050-32131-S2/runs/660d78ec-e2f4-4d38-881b-7209cbd3c5c3/spark-1.1.0-bin-2.0.0-cdh4.7.0.tgz'
>> I0519 02:36:34.205878 12863 fetcher.cpp:78] Extracted resource
>> '/tmp/mesos/slaves/20150515-164602-2877535122-5050-32131-S2/frameworks/20150517-162701-2877535122-5050-28705-0084/executors/20150515-164602-2877535122-5050-32131-S2/runs/660d78ec-e2f4-4d38-881b-7209cbd3c5c3/spark-1.1.0-bin-2.0.0-cdh4.7.0.tgz'
>> into
>> '/tmp/mesos/slaves/20150515-164602-2877535122-5050-32131-S2/frameworks/20150517-162701-2877535122-5050-28705-0084/executors/20150515-164602-2877535122-5050-32131-S2/runs/660d78ec-e2f4-4d38-881b-7209cbd3c5c3'
>> *Error: Could not find or load main class two*
>>
>> ---
>>
>> And from the Spark Terminal:
>>
>> ---
>> 15/05/19 02:36:39 INFO scheduler.TaskSchedulerImpl: Cancelling stage 0
>> 15/05/19 02:36:39 INFO scheduler.TaskSchedulerImpl: Stage 0 was cancelled
>> 15/05/19 02:36:39 INFO scheduler.DAGScheduler: Failed to run reduce at
>> SparkPi.scala:35
>> 15/05/19 02:36:39 INFO scheduler.DAGScheduler: Failed to run reduce at
>> SparkPi.scala:35
>> Exception in thread "main" org.apache.spark.SparkException: Job aborted
>> due to stage failure: Task 7 in stage 0.0 failed 4 times, most recent
>> failure: Lost task 7.3 in stage 0.0 (TID 26, ): ExecutorLostFailure
>> (executor lost)
>> Driver stacktrace: at
>> org.apache.spark.scheduler.DAGScheduler.org
>> $apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1185)at
>> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1174)atorg.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1173)at
>> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>> ..
>> at
>> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>> at
>> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>>
>> ---
>>
>> Any help will be greatly appreciated!
>>
>> Regards,
>> Panagiotis
>>
>
>


-sparklogs-spark-shell-1431993674182-EVENT_LOG_1
Description: Binary data


Re: Wish for 1.4: upper bound on # tasks in Mesos

2015-05-20 Thread Nicholas Chammas
To put this on the devs' radar, I suggest creating a JIRA for it (and
checking first if one already exists).

issues.apache.org/jira/

Nick

On Tue, May 19, 2015 at 1:34 PM Matei Zaharia 
wrote:

> Yeah, this definitely seems useful there. There might also be some ways to
> cap the application in Mesos, but I'm not sure.
>
> Matei
>
> On May 19, 2015, at 1:11 PM, Thomas Dudziak  wrote:
>
> I'm using fine-grained for a multi-tenant environment which is why I would
> welcome the limit of tasks per job :)
>
> cheers,
> Tom
>
> On Tue, May 19, 2015 at 10:05 AM, Matei Zaharia 
> wrote:
>
>> Hey Tom,
>>
>> Are you using the fine-grained or coarse-grained scheduler? For the
>> coarse-grained scheduler, there is a spark.cores.max config setting that
>> will limit the total # of cores it grabs. This was there in earlier
>> versions too.
>>
>> Matei
>>
>> > On May 19, 2015, at 12:39 PM, Thomas Dudziak  wrote:
>> >
>> > I read the other day that there will be a fair number of improvements
>> in 1.4 for Mesos. Could I ask for one more (if it isn't already in there):
>> a configurable limit for the number of tasks for jobs run on Mesos ? This
>> would be a very simple yet effective way to prevent a job dominating the
>> cluster.
>> >
>> > cheers,
>> > Tom
>> >
>>
>>
>
>


Re: Incrementally add/remove vertices in GraphX

2015-05-20 Thread vzaychik
Any updates on GraphX Streaming? There was mention of this about a year ago,
but nothing much since.
Thanks!



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Incrementally-add-remove-vertices-in-GraphX-tp2227p22963.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.




Re: How to set HBaseConfiguration in Spark

2015-05-20 Thread Naveen Madhire
Cloudera blog has some details.

Please check if this is helpful to you.

http://blog.cloudera.com/blog/2014/12/new-in-cloudera-labs-sparkonhbase/

Thanks.

On Wed, May 20, 2015 at 4:21 AM, donhoff_h <165612...@qq.com> wrote:

> Hi, all
>
> I wrote a program to get HBaseConfiguration object in Spark. But after I
> printed the content of this hbase-conf object, I found they were wrong. For
> example, the property "hbase.zookeeper.quorum" should be
> "bgdt01.dev.hrb,bgdt02.dev.hrb,bgdt03.hrb". But the printed value is
> "localhost".
>
> Could anybody tell me how to set up the HBase Configuration in Spark? No
> matter it should be set in a configuration file or be set by a Spark API.
> Many Thanks!
>
> The code of my program is listed below:
> object TestHBaseConf {
>  def main(args: Array[String]) {
>val conf = new SparkConf()
>val sc = new SparkContext(conf)
>val hbConf = HBaseConfiguration.create()
>hbConf.addResource("""file:///etc/hbase/conf/hbase-site.xml""")
>val it = hbConf.iterator()
>while(it.hasNext) {
>  val e = it.next()
>  println("Key="+ e.getKey +" Value="+e.getValue)
>}
>
>val rdd = sc.parallelize(Array(1,2,3,4,5,6,7,8,9))
>val result = rdd.sum()
>println("result="+result)
>sc.stop()
>  }
> }
>
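
(For the configuration question quoted above, one hedged alternative sketch: set the
key properties explicitly instead of relying on addResource; the quorum hosts and
port below are placeholders:)

import org.apache.hadoop.hbase.HBaseConfiguration

val hbConf = HBaseConfiguration.create()
hbConf.set("hbase.zookeeper.quorum", "zk1.example.com,zk2.example.com,zk3.example.com")
hbConf.set("hbase.zookeeper.property.clientPort", "2181")

// Alternatively, keep addResource but make sure hbase-site.xml is also visible to the
// executors, e.g. by adding the HBase conf directory to spark.executor.extraClassPath.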


Re: How to run multiple jobs in one sparkcontext from separate threads in pyspark?

2015-05-20 Thread Davies Liu
I think this is a general multi-threading question; Queue is the
right direction to go.

Have you tried something like this?

import threading
import Queue

results = Queue.Queue()

def run_job(f, args):
    r = f(*args)
    results.put(r)

# start multiple threads to run jobs
threading.Thread(target=run_job, args=(f, args,)).start()

# print each result as soon as the thread that produced it finishes
while True:
    r = results.get()
    print r


On Wed, May 20, 2015 at 5:56 AM, MEETHU MATHEW  wrote:
> Hi Davies,
> Thank you for pointing to spark streaming.
> I am confused about how to return the result after running a function via  a
> thread.
> I tried using Queue to add the results to it and print it at the end.But
> here, I can see the results after all threads are finished.
> How to get the result of the function once a thread is finished, rather than
> waiting for all other threads to finish?
>
> Thanks & Regards,
> Meethu M
>
>
>
> On Tuesday, 19 May 2015 2:43 AM, Davies Liu  wrote:
>
>
> SparkContext can be used in multiple threads (Spark streaming works
> with multiple threads), for example:
>
> import threading
> import time
>
> def show(x):
> time.sleep(1)
> print x
>
> def job():
> sc.parallelize(range(100)).foreach(show)
>
> threading.Thread(target=job).start()
>
>
> On Mon, May 18, 2015 at 12:34 AM, ayan guha  wrote:
>> Hi
>>
>> So to be clear, do you want to run one operation in multiple threads
>> within
>> a function or you want run multiple jobs using multiple threads? I am
>> wondering why python thread module can't be used? Or you have already gave
>> it a try?
>>
>> On 18 May 2015 16:39, "MEETHU MATHEW"  wrote:
>>>
>>> Hi Akhil,
>>>
>>> The python wrapper for Spark Job Server did not help me. I actually need
>>> the pyspark code sample  which shows how  I can call a function from 2
>>> threads and execute it simultaneously.
>>>
>>> Thanks & Regards,
>>> Meethu M
>>>
>>>
>>>
>>> On Thursday, 14 May 2015 12:38 PM, Akhil Das 
>>> wrote:
>>>
>>>
>>> Did you happened to have a look at the spark job server? Someone wrote a
>>> python wrapper around it, give it a try.
>>>
>>> Thanks
>>> Best Regards
>>>
>>> On Thu, May 14, 2015 at 11:10 AM, MEETHU MATHEW 
>>> wrote:
>>>
>>> Hi all,
>>>
>>>  Quote
>>>  "Inside a given Spark application (SparkContext instance), multiple
>>> parallel jobs can run simultaneously if they were submitted from separate
>>> threads. "
>>>
>>> How to run multiple jobs in one SPARKCONTEXT using separate threads in
>>> pyspark? I found some examples in scala and java, but couldn't find
>>> python
>>> code. Can anyone help me with a pyspark example?
>>>
>>> Thanks & Regards,
>>> Meethu M
>
>>>
>>>
>>>
>>>
>>
>
> -
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>
>
>
>




Re: LATERAL VIEW explode issue

2015-05-20 Thread kiran mavatoor
Hi Yana,
I was using sqlContext in the program by creating new SQLContext(sc). This
created the problem when I submit the job using spark-submit. Whereas, when I
run the same program in spark-shell, the default context is a HiveContext (it
seems) and everything seems to be fine. This created the confusion.
As a solution, I called new HiveContext(sc) instead of SQLContext.
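
(Roughly what this boils down to, as a minimal sketch; the UDF body and the
registration of the "locations" table are placeholders:)

import org.apache.spark.sql.hive.HiveContext

val hiveContext = new HiveContext(sc)

// Placeholder implementation; the real jsonStringToMapUdf parses a JSON string.
hiveContext.udf.register("jsonStringToMapUdf",
  (json: String) => Map("placeholder" -> json))

// The "locations" DataFrame must be registered against this same context, e.g.
// locationsDF.registerTempTable("locations")

val keys = hiveContext.sql(
  "SELECT id, mapKey FROM locations " +
  "LATERAL VIEW explode(map_keys(jsonStringToMapUdf(countries))) countries AS mapKey")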
cheers,
kiran.


 On Wednesday, May 20, 2015 6:38 PM, yana  wrote:
   

 Just a guess but are you using HiveContext in one case vs SqlContext 
inanother? You dont show a stacktrace but this looks like parser error...Which 
would make me guess different  context or different spark versio on the cluster 
you are submitting to...

Sent on the new Sprint Network from my Samsung Galaxy S®4.

 Original message From: kiran mavatoor Date:05/20/2015 5:57 AM 
(GMT-05:00) To: User Subject: LATERAL VIEW explode issue 
Hi,
When I use "LATERAL VIEW explode" on the registered temp table in spark shell, 
it works.  But when I use the same in spark-submit (as jar file) it is not 
working. its giving error -  "failure: ``union'' expected but identifier VIEW 
found"
sql statement i am using is
SELECT id,mapKey FROM locations LATERAL VIEW 
explode(map_keys(jsonStringToMapUdf(countries))) countries AS mapKey
I registered "jsonStringToMapUdf" as my sql function.
ThanksKiran9008099770  

  

Re: java program Get Stuck at broadcasting

2015-05-20 Thread Akhil Das
This is more like an issue with your HDFS setup, can you check in the
datanode logs? Also try putting a new file in HDFS and see if that works.

Thanks
Best Regards

On Wed, May 20, 2015 at 11:47 AM, allanjie  wrote:

> ​Hi All,
> The variable I need to broadcast is just 468 MB.
>
>
> When broadcasting, it just “stop” at here:
>
> *
> 15/05/20 11:36:14 INFO Configuration.deprecation: mapred.tip.id is
> deprecated. Instead, use mapreduce.task.id
> 15/05/20 11:36:14 INFO Configuration.deprecation: mapred.task.id is
> deprecated. Instead, use mapreduce.task.attempt.id
> 15/05/20 11:36:14 INFO Configuration.deprecation: mapred.task.is.map is
> deprecated. Instead, use mapreduce.task.ismap
> 15/05/20 11:36:14 INFO Configuration.deprecation: mapred.task.partition is
> deprecated. Instead, use mapreduce.task.partition
> 15/05/20 11:36:14 INFO Configuration.deprecation: mapred.job.id is
> deprecated. Instead, use mapreduce.job.id
> 15/05/20 11:36:14 INFO mapred.FileInputFormat: Total input paths to process
> : 1
> 15/05/20 11:36:14 INFO spark.SparkContext: Starting job: saveAsTextFile at
> Test1.java:90
> 15/05/20 11:36:15 INFO scheduler.DAGScheduler: Got job 0 (saveAsTextFile at
> Test1.java:90) with 4 output partitions (allowLocal=false)
> 15/05/20 11:36:15 INFO scheduler.DAGScheduler: Final stage: Stage
> 0(saveAsTextFile at Test1.java:90)
> 15/05/20 11:36:15 INFO scheduler.DAGScheduler: Parents of final stage:
> List()
> 15/05/20 11:36:15 INFO scheduler.DAGScheduler: Missing parents: List()
> 15/05/20 11:36:15 INFO scheduler.DAGScheduler: Submitting Stage 0
> (MapPartitionsRDD[3] at saveAsTextFile at Test1.java:90), which has no
> missing parents
> 15/05/20 11:36:15 INFO storage.MemoryStore: ensureFreeSpace(129264) called
> with curMem=988453294, maxMem=2061647216
> 15/05/20 11:36:15 INFO storage.MemoryStore: Block broadcast_2 stored as
> values in memory (estimated size 126.2 KB, free 1023.4 MB)
> 15/05/20 11:36:15 INFO storage.MemoryStore: ensureFreeSpace(78190) called
> with curMem=988582558, maxMem=2061647216
> 15/05/20 11:36:15 INFO storage.MemoryStore: Block broadcast_2_piece0 stored
> as bytes in memory (estimated size 76.4 KB, free 1023.3 MB)
> 15/05/20 11:36:15 INFO storage.BlockManagerInfo: Added broadcast_2_piece0
> in
> memory on HadoopV26Master:44855 (size: 76.4 KB, free: 1492.4 MB)
> 15/05/20 11:36:15 INFO storage.BlockManagerMaster: Updated info of block
> broadcast_2_piece0
> 15/05/20 11:36:15 INFO spark.SparkContext: Created broadcast 2 from
> broadcast at DAGScheduler.scala:839
> 15/05/20 11:36:15 INFO scheduler.DAGScheduler: Submitting 4 missing tasks
> from Stage 0 (MapPartitionsRDD[3] at saveAsTextFile at Test1.java:90)
> 15/05/20 11:36:15 INFO scheduler.TaskSchedulerImpl: Adding task set 0.0
> with
> 4 tasks
> 15/05/20 11:36:15 INFO scheduler.TaskSetManager: Starting task 0.0 in stage
> 0.0 (TID 0, HadoopV26Slave5, NODE_LOCAL, 1387 bytes)
> 15/05/20 11:36:15 INFO scheduler.TaskSetManager: Starting task 1.0 in stage
> 0.0 (TID 1, HadoopV26Slave3, NODE_LOCAL, 1387 bytes)
> 15/05/20 11:36:15 INFO scheduler.TaskSetManager: Starting task 2.0 in stage
> 0.0 (TID 2, HadoopV26Slave4, NODE_LOCAL, 1387 bytes)
> 15/05/20 11:36:15 INFO scheduler.TaskSetManager: Starting task 3.0 in stage
> 0.0 (TID 3, HadoopV26Slave1, NODE_LOCAL, 1387 bytes)
> 15/05/20 11:36:15 INFO storage.BlockManagerInfo: Added broadcast_2_piece0
> in
> memory on HadoopV26Slave5:45357 (size: 76.4 KB, free: 2.1 GB)
> 15/05/20 11:36:15 INFO storage.BlockManagerInfo: Added broadcast_2_piece0
> in
> memory on HadoopV26Slave3:57821 (size: 76.4 KB, free: 2.1 GB)
> …….
> 15/05/20 11:36:28 INFO storage.BlockManagerInfo: Added broadcast_1_piece1
> in
> memory on HadoopV26Slave5:45357 (size: 4.0 MB, free: 1646.3 MB)
> *
>
> And didn’t go forward as I still waiting, basically not stop, but more like
> stuck.
>
> I have 6 workers/VMs: each of them has 8GB memory and 12GB disk storage.
> After a few mins pass, the program stopped and showed something like this:
>
>
> 15/05/20 11:42:45 WARN scheduler.TaskSetManager: Lost task 1.0 in stage 0.0
> (TID 1, HadoopV26Slave3):
> org.apache.hadoop.ipc.RemoteException(java.io.IOException): File
>
> /user/output/_temporary/0/_temporary/attempt_201505201136__m_01_1/part-1
> could only be replicated to 0 nodes instead of minReplication (=1).  There
> are 6 datanode(s) running and no node(s) are excluded in this operation.
> at
>
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1549)
> at
>
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3200)
> at
>
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:641)
> at
>
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:482)
> at
>
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolP

Re: Is this a good use case for Spark?

2015-05-20 Thread Davies Liu
Spark is a great framework for doing things in parallel across multiple machines,
and it will be really helpful for your case.

Once you can wrap your entire pipeline into a single Python function:

def process_document(path, text):
 # you can call other tools or services here
 return xxx

then you can process all the documents in parallel as easy as:

sc.wholeTextFiles("path/to/documents").map(lambda (k, v):
process_document(k, v)).saveAsXXX("path/in/s3")

On Wed, May 20, 2015 at 12:38 AM, jakeheller  wrote:
> Hi all, I'm new to Spark -- so new that we're deciding whether to use it in
> the first place, and I was hoping someone here could help me figure that
> out.
>
> We're doing a lot of processing of legal documents -- in particular, the
> entire corpus of American law. It's about 10m documents, many of which are
> quite large as far as text goes (100s of pages).
>
> We'd like to
> (a) transform these documents from the various (often borked) formats they
> come to us in into a standard XML format,
> (b) when it is in a standard format, extract information from them (e.g.,
> which judicial cases cite each other?) and annotate the documents with the
> information extracted, and then
> (c) deliver the end result to a repository (like s3) where it can be
> accessed by the user-facing application.
>
> Of course, we'd also like to do all of this quickly -- optimally, running
> the entire database through the whole pipeline in a few hours.
>
> We currently use a mix of Python and Java scripts (including XSLT, and
> NLP/unstructured data tools like UIMA and Stanford's CoreNLP) in various
> places along the pipeline we built for ourselves to handle these tasks. The
> current pipeline infrastructure was built a while back -- it's basically a
> number of HTTP servers that each have a single task and pass the document
> along from server to server as it goes through the processing pipeline. It's
> great although it's having trouble scaling, and there are some reliability
> issues. It's also a headache to handle all the infrastructure. For what it's
> worth, metadata about the documents resides in SQL, and the actual text of
> the documents lives in s3.
>
> It seems like Spark would be ideal for this, but after some searching I
> wasn't able to find too many examples of people using it for
> document-processing tasks (like transforming documents from one XML format
> into another) and I'm not clear if I can chain those sorts of tasks and NLP
> tasks, especially if some happen in Python and others in Java. Finally, I
> don't know if the size of the data (i.e., we'll likely want to run
> operations on whole documents, rather than just lines) imposes
> issues/constraints.
>
> Thanks all!
> Jake
>
>
>
> --
> View this message in context: 
> http://apache-spark-user-list.1001560.n3.nabble.com/Is-this-a-good-use-case-for-Spark-tp22954.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> -
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>




Re: Spark users

2015-05-20 Thread Akhil Das
Yes, this is the user group. Feel free to ask your questions in this list.

Thanks
Best Regards

On Wed, May 20, 2015 at 5:58 AM, Ricardo Goncalves da Silva <
ricardog.si...@telefonica.com> wrote:

>  Hi
> I'm learning spark focused on data and machine learning. Migrating from
> SAS.
>
> Is there a group for it? My questions are basic for now and I am getting very
> few answers.
>
> Tal
>
> Rick.
>
>
>
> Enviado do meu smartphone Samsung Galaxy.
>
> --
>
>
> The information contained in this transmission is privileged and
> confidential information intended only for the use of the individual or
> entity named above. If the reader of this message is not the intended
> recipient, you are hereby notified that any dissemination, distribution or
> copying of this communication is strictly prohibited. If you have received
> this transmission in error, do not read it. Please immediately reply to the
> sender that you have received this communication in error and then delete
> it.
>
>


IPv6 support

2015-05-20 Thread Kevin Liu
Hello, I have to work with IPv6-only servers and when I installed the
1.3.1 hadoop 2.6 build, I couldn't get the example to run due to IPv6
issues (errors below). I tried to add the
-Djava.net.preferIPv6Addresses=true setting but it still doesn't work. A
search on Spark's support for IPv6 is inconclusive. Can someone help
clarify the current status for IPv6?

Thanks
Kevin


-- errors --

5/05/20 10:17:30 INFO Executor: Fetching
http://2401:db00:2030:709b:face:0:9:0:51453/jars/spark-examples-1.3.1-hadoo
p2.6.0.jar with timestamp 1432142250197
15/05/20 10:17:30 INFO Executor: Fetching
http://2401:db00:2030:709b:face:0:9:0:51453/jars/spark-examples-1.3.1-hadoo
p2.6.0.jar with timestamp 1432142250197
15/05/20 10:17:30 ERROR Executor: Exception in task 5.0 in stage 0.0 (TID
5)
java.net.MalformedURLException: For input string:
"db00:2030:709b:face:0:9:0:51453"
at java.net.URL.(URL.java:620)
at java.net.URL.(URL.java:483)
at java.net.URL.(URL.java:432)
at org.apache.spark.util.Utils$.doFetchFile(Utils.scala:603)
at org.apache.spark.util.Utils$.fetchFile(Utils.scala:431)
at 
org.apache.spark.executor.Executor$$anonfun$org$apache$spark$executor$Execu
tor$$updateDependencies$5.apply(Executor.scala:374)
at 
org.apache.spark.executor.Executor$$anonfun$org$apache$spark$executor$Execu
tor$$updateDependencies$5.apply(Executor.scala:366)
at 
scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(Traver
sableLike.scala:772)
at 
scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
at 
scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
at 
scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226)
at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
at scala.collection.mutable.HashMap.foreach(HashMap.scala:98)
at 
scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:7
71)
at 
org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$upda
teDependencies(Executor.scala:366)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:184)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1
142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:
617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NumberFormatException: For input string:
"db00:2030:709b:face:0:9:0:51453"
at 
java.lang.NumberFormatException.forInputString(NumberFormatException.java:6
5)
at java.lang.Integer.parseInt(Integer.java:580)
at java.lang.Integer.parseInt(Integer.java:615)
at java.net.URLStreamHandler.parseURL(URLStreamHandler.java:216)
at java.net.URL.(URL.java:615)
... 18 more









Fwd: Re: spark 1.3.1 jars in repo1.maven.org

2015-05-20 Thread Edward Sargisson
Hi Sean and Ted,
Thanks for your replies.

I don't have our current problems nicely written up as good questions yet.
I'm still sorting out classpath issues, etc.
In case it is of help, I'm seeing:
* "Exception in thread "Spark Context Cleaner"
java.lang.NoClassDefFoundError: 0
at
org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1.apply$mcV$sp(ContextCleaner.scala:149)"
* We've been having clashing dependencies between a colleague and me because
of the aforementioned classpath issue
* The clashing dependencies are also causing issues with what jetty
libraries are available in the classloader from Spark and don't clash with
existing libraries we have.

More anon,

Cheers,
Edward



-------- Original Message --------
Subject: Re: spark 1.3.1 jars in repo1.maven.org
Date: 2015-05-20 00:38
From: Sean Owen
To: Edward Sargisson
Cc: user


Yes, the published artifacts can only refer to one version of anything
(OK, modulo publishing a large number of variants under classifiers).

You aren't intended to rely on Spark's transitive dependencies for
anything. Compiling against the Spark API has no relation to what
version of Hadoop it binds against because it's not part of any API.
You mark the Spark dependency even as "provided" in your build and get
all the Spark/Hadoop bindings at runtime from our cluster.

What problem are you experiencing?
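
(For illustration, this is the kind of build declaration meant by "provided" above;
the sbt syntax and version are an assumption, not taken from the original thread:)

// build.sbt -- mark Spark as "provided" so the cluster supplies it at runtime
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.3.1" % "provided"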


On Wed, May 20, 2015 at 2:17 AM, Edward Sargisson  wrote:

Hi,
I'd like to confirm an observation I've just made. Specifically that spark
is only available in repo1.maven.org for one Hadoop variant.

The Spark source can be compiled against a number of different Hadoops using
profiles. Yay.
However, the spark jars in repo1.maven.org appear to be compiled against one
specific Hadoop and no other differentiation is made. (I can see a
difference with hadoop-client being 2.2.0 in repo1.maven.org and 1.0.4 in
the version I compiled locally).

The implication here is that if you have a pom file asking for
spark-core_2.10 version 1.3.1 then Maven will only give you an Hadoop 2
version. Maven assumes that non-snapshot artifacts never change so trying to
load an Hadoop 1 version will end in tears.

This then means that if you compile code against spark-core then there will
probably be classpath NoClassDefFound issues unless the Hadoop 2 version is
exactly the one you want.

Have I gotten this correct?

It happens that our little app is using a Spark context directly from a
Jetty webapp and the classpath differences were/are causing some confusion.
We are currently installing a Hadoop 1 spark master and worker.

Thanks a lot!
Edward


Re: Re: spark 1.3.1 jars in repo1.maven.org

2015-05-20 Thread Sean Owen
I don't think any of those problems are related to Hadoop. Have you looked
at userClassPathFirst settings?

On Wed, May 20, 2015 at 6:46 PM, Edward Sargisson  wrote:

> Hi Sean and Ted,
> Thanks for your replies.
>
> I don't have our current problems nicely written up as good questions yet.
> I'm still sorting out classpath issues, etc.
> In case it is of help, I'm seeing:
> * "Exception in thread "Spark Context Cleaner"
> java.lang.NoClassDefFoundError: 0
> at
> org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1.apply$mcV$sp(ContextCleaner.scala:149)"
> * We've been having clashing dependencies between a colleague and I
> because of the aforementioned classpath issue
> * The clashing dependencies are also causing issues with what jetty
> libraries are available in the classloader from Spark and don't clash with
> existing libraries we have.
>
> More anon,
>
> Cheers,
> Edward
>
>
>
>  Original Message 
>  Subject: Re: spark 1.3.1 jars in repo1.maven.org Date: 2015-05-20 00:38
> From: Sean Owen  To: Edward Sargisson <
> esa...@pobox.com> Cc: user 
>
>
> Yes, the published artifacts can only refer to one version of anything
> (OK, modulo publishing a large number of variants under classifiers).
>
> You aren't intended to rely on Spark's transitive dependencies for
> anything. Compiling against the Spark API has no relation to what
> version of Hadoop it binds against because it's not part of any API.
> You mark the Spark dependency even as "provided" in your build and get
> all the Spark/Hadoop bindings at runtime from our cluster.
>
> What problem are you experiencing?
>
>
> On Wed, May 20, 2015 at 2:17 AM, Edward Sargisson 
> wrote:
>
> Hi,
> I'd like to confirm an observation I've just made. Specifically that spark
> is only available in repo1.maven.org for one Hadoop variant.
>
> The Spark source can be compiled against a number of different Hadoops
> using
> profiles. Yay.
> However, the spark jars in repo1.maven.org appear to be compiled against
> one
> specific Hadoop and no other differentiation is made. (I can see a
> difference with hadoop-client being 2.2.0 in repo1.maven.org and 1.0.4 in
> the version I compiled locally).
>
> The implication here is that if you have a pom file asking for
> spark-core_2.10 version 1.3.1 then Maven will only give you an Hadoop 2
> version. Maven assumes that non-snapshot artifacts never change so trying
> to
> load an Hadoop 1 version will end in tears.
>
> This then means that if you compile code against spark-core then there will
> probably be classpath NoClassDefFound issues unless the Hadoop 2 version is
> exactly the one you want.
>
> Have I gotten this correct?
>
> It happens that our little app is using a Spark context directly from a
> Jetty webapp and the classpath differences were/are causing some confusion.
> We are currently installing a Hadoop 1 spark master and worker.
>
> Thanks a lot!
> Edward
>
>
>
>


Re: User Defined Type (UDT)

2015-05-20 Thread Justin Uang
Xiangrui, is there a timeline for when UDTs will become a public API? I'm
currently using them to support java 8's ZonedDateTime.

On Tue, May 19, 2015 at 3:14 PM Xiangrui Meng  wrote:

> (Note that UDT is not a public API yet.)
>
> On Thu, May 7, 2015 at 7:11 AM, wjur  wrote:
> > Hi all!
> >
> > I'm using Spark 1.3.0 and I'm struggling with a definition of a new type for
> > a project I'm working on. I've created a case class Person(name: String) and
> > now I'm trying to make Spark able to serialize and deserialize the
> > defined type. I made a couple of attempts but none of them worked 100%
> > (there were issues either in serialization or deserialization).
> >
> > This is my class and the corresponding UDT.
> >
> > @SQLUserDefinedType(udt = classOf[PersonUDT])
> > case class Person(name: String)
> >
> > class PersonUDT extends UserDefinedType[Person] {
> >   override def sqlType: DataType = StructType(Seq(StructField("name",
> > StringType)))
> >
> >   override def serialize(obj: Any): Seq[Any] = {
>
> This should return a Row instance instead of Seq[Any], because the
> sqlType is a struct type.
>
> > obj match {
> >   case c: Person =>
> > Seq(c.name)
> > }
> >   }
> >
> >   override def userClass: Class[Person] = classOf[Person]
> >
> >   override def deserialize(datum: Any): Person = {
> > datum match {
> >   case values: Seq[_] =>
> > assert(values.length == 1)
> > Person(values.head.asInstanceOf[String])
> >   case values: util.ArrayList[_] =>
> > Person(values.get(0).asInstanceOf[String])
> > }
> >   }
> >
> >   // In some other attempt I was creating RDD of Seq with manually
> > serialized data and
> >   // I had to override equals because two DFs with the same type weren't
> > actually equal
> >   // StructField(person,...types.PersonUDT@a096ac3)
> >   // StructField(person,...types.PersonUDT@613fd937)
> >   def canEqual(other: Any): Boolean = other.isInstanceOf[PersonUDT]
> >
> >   override def equals(other: Any): Boolean = other match {
> > case that: PersonUDT => true
> > case _ => false
> >   }
> >
> >   override def hashCode(): Int = 1
> > }
> >
> > This is how I create RDD of Person and then try to create a DataFrame
> > val rdd = sparkContext.parallelize((1 to 100).map(i =>
> Person(i.toString)))
> > val sparkDataFrame = sqlContext.createDataFrame(rdd)
> >
> > The second line throws an exception:
> > java.lang.ClassCastException: types.PersonUDT cannot be cast to
> > org.apache.spark.sql.types.StructType
> > at org.apache.spark.sql.SQLContext.createDataFrame(SQLContext.scala:316)
> >
> > I looked into the code in SQLContext.scala and it seems that the code
> > requires UDT to be extending StructType but in fact it extends
> > UserDefinedType which extends directly DataType.
> > I'm not sure whether it is a bug or I just don't know how to use UDTs.
> >
> > Do you have any suggestions how to solve this? I based my UDT on
> > ExamplePointUDT but it seems to be incorrect. Is there a working example
> for
> > UDT?
> >
> >
> > Thank you for the reply in advance!
> > wjur
> >
> >
> >
> > --
> > View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/User-Defined-Type-UDT-tp22796.html
> > Sent from the Apache Spark User List mailing list archive at Nabble.com.
> >
> > -
> > To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> > For additional commands, e-mail: user-h...@spark.apache.org
> >
>
> -
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>
>


Re: --jars works in "yarn-client" but not "yarn-cluster" mode, why?

2015-05-20 Thread Marcelo Vanzin
Hello,

Sorry for the delay. The issue you're running into is because most HBase
classes are in the system class path, while jars added with "--jars" are
only visible to the application class loader created by Spark. So classes
in the system class path cannot see them.

You can work around this by setting "--driver-classpath
/opt/.../htrace-core-3.1.0-incubating.jar" and "--conf
spark.executor.extraClassPath=
/opt/.../htrace-core-3.1.0-incubating.jar" in your spark-submit command
line. (You can also add those configs to your spark-defaults.conf to avoid
having to type them all the time; and don't forget to include any other
jars that might be needed.)


On Mon, May 18, 2015 at 11:14 PM, Fengyun RAO  wrote:

> Thanks, Marcelo!
>
>
> Below is the full log,
>
>
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in 
> [jar:file:/opt/cloudera/parcels/CDH-5.4.0-1.cdh5.4.0.p0.27/jars/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/opt/cloudera/parcels/CDH-5.4.0-1.cdh5.4.0.p0.27/jars/avro-tools-1.7.6-cdh5.4.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
> explanation.
> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
> 15/05/19 14:08:58 INFO yarn.ApplicationMaster: Registered signal handlers for 
> [TERM, HUP, INT]
> 15/05/19 14:08:59 INFO yarn.ApplicationMaster: ApplicationAttemptId: 
> appattempt_1432015548391_0003_01
> 15/05/19 14:09:00 INFO spark.SecurityManager: Changing view acls to: 
> nobody,raofengyun
> 15/05/19 14:09:00 INFO spark.SecurityManager: Changing modify acls to: 
> nobody,raofengyun
> 15/05/19 14:09:00 INFO spark.SecurityManager: SecurityManager: authentication 
> disabled; ui acls disabled; users with view permissions: Set(nobody, 
> raofengyun); users with modify permissions: Set(nobody, raofengyun)
> 15/05/19 14:09:00 INFO yarn.ApplicationMaster: Starting the user application 
> in a separate Thread
> 15/05/19 14:09:00 INFO yarn.ApplicationMaster: Waiting for spark context 
> initialization
> 15/05/19 14:09:00 INFO yarn.ApplicationMaster: Waiting for spark context 
> initialization ...
> 15/05/19 14:09:00 INFO spark.SparkContext: Running Spark version 1.3.0
> 15/05/19 14:09:00 INFO spark.SecurityManager: Changing view acls to: 
> nobody,raofengyun
> 15/05/19 14:09:00 INFO spark.SecurityManager: Changing modify acls to: 
> nobody,raofengyun
> 15/05/19 14:09:00 INFO spark.SecurityManager: SecurityManager: authentication 
> disabled; ui acls disabled; users with view permissions: Set(nobody, 
> raofengyun); users with modify permissions: Set(nobody, raofengyun)
> 15/05/19 14:09:01 INFO slf4j.Slf4jLogger: Slf4jLogger started
> 15/05/19 14:09:01 INFO Remoting: Starting remoting
> 15/05/19 14:09:01 INFO Remoting: Remoting started; listening on addresses 
> :[akka.tcp://sparkDriver@gs-server-v-127:7191]
> 15/05/19 14:09:01 INFO Remoting: Remoting now listens on addresses: 
> [akka.tcp://sparkDriver@gs-server-v-127:7191]
> 15/05/19 14:09:01 INFO util.Utils: Successfully started service 'sparkDriver' 
> on port 7191.
> 15/05/19 14:09:01 INFO spark.SparkEnv: Registering MapOutputTracker
> 15/05/19 14:09:01 INFO spark.SparkEnv: Registering BlockManagerMaster
> 15/05/19 14:09:01 INFO storage.DiskBlockManager: Created local directory at 
> /data1/cdh/yarn/nm/usercache/raofengyun/appcache/application_1432015548391_0003/blockmgr-3250910b-693e-46ff-b057-26d552fd8abd
> 15/05/19 14:09:01 INFO storage.MemoryStore: MemoryStore started with capacity 
> 259.7 MB
> 15/05/19 14:09:01 INFO spark.HttpFileServer: HTTP File server directory is 
> /data1/cdh/yarn/nm/usercache/raofengyun/appcache/application_1432015548391_0003/httpd-5bc614bc-d8b1-473d-a807-4d9252eb679d
> 15/05/19 14:09:01 INFO spark.HttpServer: Starting HTTP Server
> 15/05/19 14:09:01 INFO server.Server: jetty-8.y.z-SNAPSHOT
> 15/05/19 14:09:01 INFO server.AbstractConnector: Started 
> SocketConnector@0.0.0.0:9349
> 15/05/19 14:09:01 INFO util.Utils: Successfully started service 'HTTP file 
> server' on port 9349.
> 15/05/19 14:09:01 INFO spark.SparkEnv: Registering OutputCommitCoordinator
> 15/05/19 14:09:01 INFO ui.JettyUtils: Adding filter: 
> org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
> 15/05/19 14:09:01 INFO server.Server: jetty-8.y.z-SNAPSHOT
> 15/05/19 14:09:01 INFO server.AbstractConnector: Started 
> SelectChannelConnector@0.0.0.0:63023
> 15/05/19 14:09:01 INFO util.Utils: Successfully started service 'SparkUI' on 
> port 63023.
> 15/05/19 14:09:01 INFO ui.SparkUI: Started SparkUI at 
> http://gs-server-v-127:63023
> 15/05/19 14:09:02 INFO cluster.YarnClusterScheduler: Created 
> YarnClusterScheduler
> 15/05/19 14:09:02 INFO netty.NettyBlockTransferService: Server created on 
> 33526
> 15/05/19 14:09:02 INFO storage.BlockManagerMaster: Trying to register 
> BlockManager
> 15/05/19 14:09:02 INFO storage.BlockManagerMasterActor: Registering block 
> mana

FP Growth saveAsTextFile

2015-05-20 Thread Eric Tanner
I am having trouble with saving an FP-Growth model as a text file.  I can
print out the results, but when I try to save the model I get a
NullPointerException.

model.freqItemsets.saveAsTextFile("c://fpGrowth/model")

Thanks,

Eric


Re: PySpark Logs location

2015-05-20 Thread Ruslan Dautkhanov
You could use

yarn logs -applicationId application_1383601692319_0008



-- 
Ruslan Dautkhanov

On Wed, May 20, 2015 at 5:37 AM, Oleg Ruchovets 
wrote:

> Hi ,
>
>   I am executing PySpark job on yarn ( hortonworks distribution).
>
> Could someone pointing me where is the log locations?
>
> Thanks
> Oleg.
>


Re: PySpark Logs location

2015-05-20 Thread Oleg Ruchovets
Hi Ruslan.
  Could you add more details, please?
Where do I get the applicationId? In case I have a lot of log files, would it
make sense to view them from a single point?
How can I actually configure / manage the log location of PySpark?

Thanks
Oleg.

On Wed, May 20, 2015 at 10:24 PM, Ruslan Dautkhanov 
wrote:

> You could use
>
> yarn logs -applicationId application_1383601692319_0008
>
>
>
> --
> Ruslan Dautkhanov
>
> On Wed, May 20, 2015 at 5:37 AM, Oleg Ruchovets 
> wrote:
>
>> Hi ,
>>
>>   I am executing PySpark job on yarn ( hortonworks distribution).
>>
>> Could someone pointing me where is the log locations?
>>
>> Thanks
>> Oleg.
>>
>
>


Read multiple files from S3

2015-05-20 Thread lovelylavs
Hi,

I am trying to get a collection of files according to LastModifiedDate from
S3

List<String> FileNames = new ArrayList<String>();

ListObjectsRequest listObjectsRequest = new ListObjectsRequest()
.withBucketName(s3_bucket)
.withPrefix(logs_dir);

ObjectListing objectListing;


do {
objectListing = s3Client.listObjects(listObjectsRequest);
for (S3ObjectSummary objectSummary :
objectListing.getObjectSummaries()) {

if
((objectSummary.getLastModified().compareTo(dayBefore) > 0)  &&
(objectSummary.getLastModified().compareTo(dayAfter) <1) &&
objectSummary.getKey().contains(".log"))
FileNames.add(objectSummary.getKey());
}
listObjectsRequest.setMarker(objectListing.getNextMarker());
} while (objectListing.isTruncated());

I would like to process these files using Spark

I understand that textFile reads a single text file. Is there any way to
read all these files that are part of the List?

Thanks for your help.
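(A minimal sketch of one option, in Scala: sc.textFile accepts a comma-separated
list of paths, so the collected keys can be joined into a single argument. The
names fileNames and s3Bucket below stand in for the values built above, and the
"s3n://" scheme is an assumption -- use whichever S3 filesystem scheme your
cluster is configured for.)

// Join the selected S3 keys into one comma-separated path and read them as a single RDD.
val paths = fileNames.map(key => s"s3n://$s3Bucket/$key").mkString(",")
val logLines = sc.textFile(paths)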





--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Read-multiple-files-from-S3-tp22965.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: PySpark Logs location

2015-05-20 Thread Ruslan Dautkhanov
Oleg,

You can see applicationId in your Spark History Server.
Go to http://historyserver:18088/

Also check
https://spark.apache.org/docs/1.1.0/running-on-yarn.html#debugging-your-application

It should be no different with PySpark.


-- 
Ruslan Dautkhanov

On Wed, May 20, 2015 at 2:12 PM, Oleg Ruchovets 
wrote:

> Hi Ruslan.
>   Could you add more details please.
> Where do I get applicationId? In case I have a lot of log files would it
> make sense to view it from single point.
> How actually I can configure / manage log location of PySpark?
>
> Thanks
> Oleg.
>
> On Wed, May 20, 2015 at 10:24 PM, Ruslan Dautkhanov 
> wrote:
>
>> You could use
>>
>> yarn logs -applicationId application_1383601692319_0008
>>
>>
>>
>> --
>> Ruslan Dautkhanov
>>
>> On Wed, May 20, 2015 at 5:37 AM, Oleg Ruchovets 
>> wrote:
>>
>>> Hi ,
>>>
>>>   I am executing PySpark job on yarn ( hortonworks distribution).
>>>
>>> Could someone pointing me where is the log locations?
>>>
>>> Thanks
>>> Oleg.
>>>
>>
>>
>


Re: User Defined Type (UDT)

2015-05-20 Thread Xiangrui Meng
Probably in 1.5. I made a JIRA for it:
https://issues.apache.org/jira/browse/SPARK-7768. You can watch that
JIRA (and vote). -Xiangrui
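(On the serialize question quoted below: a minimal sketch of what returning a Row
instead of Seq[Any] could look like inside PersonUDT. It is written against the
non-public 1.3 UDT API, so treat it as illustrative only.)

import org.apache.spark.sql.Row

// Inside class PersonUDT: serialize to a Row matching the struct sqlType,
// and deserialize back from a Row.
override def serialize(obj: Any): Any = obj match {
  case p: Person => Row(p.name)
}
override def deserialize(datum: Any): Person = datum match {
  case row: Row => Person(row.getString(0))
}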

On Wed, May 20, 2015 at 11:03 AM, Justin Uang  wrote:
> Xiangrui, is there a timeline for when UDTs will become a public API? I'm
> currently using them to support java 8's ZonedDateTime.
>
> On Tue, May 19, 2015 at 3:14 PM Xiangrui Meng  wrote:
>>
>> (Note that UDT is not a public API yet.)
>>
>> On Thu, May 7, 2015 at 7:11 AM, wjur  wrote:
>> > Hi all!
>> >
>> > I'm using Spark 1.3.0 and I'm struggling with a definition of a new type
>> > for
>> > a project I'm working on. I've created a case class Person(name: String)
>> > and
>> > now I'm trying to make Spark to be able serialize and deserialize the
>> > defined type. I made a couple of attempts but none of them did not work
>> > in
>> > 100% (there were issues either in serialization or deserialization).
>> >
>> > This is my class and the corresponding UDT.
>> >
>> > @SQLUserDefinedType(udt = classOf[PersonUDT])
>> > case class Person(name: String)
>> >
>> > class PersonUDT extends UserDefinedType[Person] {
>> >   override def sqlType: DataType = StructType(Seq(StructField("name",
>> > StringType)))
>> >
>> >   override def serialize(obj: Any): Seq[Any] = {
>>
>> This should return a Row instance instead of Seq[Any], because the
>> sqlType is a struct type.
>>
>> > obj match {
>> >   case c: Person =>
>> > Seq(c.name)
>> > }
>> >   }
>> >
>> >   override def userClass: Class[Person] = classOf[Person]
>> >
>> >   override def deserialize(datum: Any): Person = {
>> > datum match {
>> >   case values: Seq[_] =>
>> > assert(values.length == 1)
>> > Person(values.head.asInstanceOf[String])
>> >   case values: util.ArrayList[_] =>
>> > Person(values.get(0).asInstanceOf[String])
>> > }
>> >   }
>> >
>> >   // In some other attempt I was creating RDD of Seq with manually
>> > serialized data and
>> >   // I had to override equals because two DFs with the same type weren't
>> > actually equal
>> >   // StructField(person,...types.PersonUDT@a096ac3)
>> >   // StructField(person,...types.PersonUDT@613fd937)
>> >   def canEqual(other: Any): Boolean = other.isInstanceOf[PersonUDT]
>> >
>> >   override def equals(other: Any): Boolean = other match {
>> > case that: PersonUDT => true
>> > case _ => false
>> >   }
>> >
>> >   override def hashCode(): Int = 1
>> > }
>> >
>> > This is how I create RDD of Person and then try to create a DataFrame
>> > val rdd = sparkContext.parallelize((1 to 100).map(i =>
>> > Person(i.toString)))
>> > val sparkDataFrame = sqlContext.createDataFrame(rdd)
>> >
>> > The second line throws an exception:
>> > java.lang.ClassCastException: types.PersonUDT cannot be cast to
>> > org.apache.spark.sql.types.StructType
>> > at org.apache.spark.sql.SQLContext.createDataFrame(SQLContext.scala:316)
>> >
>> > I looked into the code in SQLContext.scala and it seems that the code
>> > requires UDT to be extending StructType but in fact it extends
>> > UserDefinedType which extends directly DataType.
>> > I'm not sure whether it is a bug or I just don't know how to use UDTs.
>> >
>> > Do you have any suggestions how to solve this? I based my UDT on
>> > ExamplePointUDT but it seems to be incorrect. Is there a working example
>> > for
>> > UDT?
>> >
>> >
>> > Thank you for the reply in advance!
>> > wjur
>> >
>> >
>> >
>> > --
>> > View this message in context:
>> > http://apache-spark-user-list.1001560.n3.nabble.com/User-Defined-Type-UDT-tp22796.html
>> > Sent from the Apache Spark User List mailing list archive at Nabble.com.
>> >
>> > -
>> > To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>> > For additional commands, e-mail: user-h...@spark.apache.org
>> >
>>
>> -
>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>> For additional commands, e-mail: user-h...@spark.apache.org
>>
>

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Spark Streaming - Design considerations/Knobs

2015-05-20 Thread Tathagata Das
Correcting the ones that are incorrect or incomplete. BUT this is a good list
of things to remember about Spark Streaming.


On Wed, May 20, 2015 at 3:40 AM, Hemant Bhanawat 
wrote:

> Hi,
>
> I have compiled a list (from online sources) of knobs/design
> considerations that need to be taken care of by applications running on
> spark streaming. Is my understanding correct?  Any other important design
> consideration that I should take care of?
>
>
>- A DStream is associated with a single receiver. For attaining read
>parallelism multiple receivers i.e. multiple DStreams need to be created.
>- A receiver is run within an executor. It occupies one core. Ensure
>that there are enough cores for processing after receiver slots are booked
>i.e. spark.cores.max should take the receiver slots into account.
>- The receivers are allocated to executors in a round robin fashion.
>- When data is received from a stream source, receiver creates blocks
>of data.  A new block of data is generated every blockInterval
>milliseconds. N blocks of data are created during the batchInterval where N
>= batchInterval/blockInterval.
>- These blocks are distributed by the BlockManager of the current
>executor to the block managers of other executors. After that, the Network
>Input Tracker running on the driver is informed about the block locations
>for further processing.
>- A RDD is created on the driver for the blocks created during the
>batchInterval. The blocks generated during the batchInterval are partitions
>of the RDD. Each partition is a task in spark. blockInterval==
>batchinterval would mean that a single partition is created and probably it
>is processed locally.
>
> The map tasks on the blocks are processed in the executors (one that
received the block, and another where the block was replicated) that has
the blocks irrespective of block interval, unless non-local scheduling
kicks in (as you observed next).

>
>- Having bigger blockinterval means bigger blocks. A high value of
>spark.locality.wait increases the chance of processing a block on the local
>node. A balance needs to be found out between these two parameters to
>ensure that the bigger blocks are processed locally.
>- Instead of relying on batchInterval and blockInterval, you can
>define the number of partitions by calling dstream.repartition(n). This
>reshuffles the data in RDD randomly to create n number of partitions.
>
> Yes, for greater parallelism. Though comes at the cost of a shuffle.

>
>- An RDD's processing is scheduled by driver's jobscheduler as a job.
>At a given point of time only one job is active. So, if one job is
>executing the other jobs are queued.
>
>
>- If you have two dstreams there will be two RDDs formed and there
>will be two jobs created which will be scheduled one after the another.
>
>
>- To avoid this, you can union two dstreams. This will ensure that a
>single unionRDD is formed for the two RDDs of the dstreams. This unionRDD
>is then considered as a single job. However the partitioning of the RDDs is
>not impacted.
>
> To further clarify, the jobs depend on the number of output operations
(print, foreachRDD, saveAsXFiles) and the number of RDD actions in those
output operations.

dstream1.union(dstream2).foreachRDD { rdd => rdd.count() }// one Spark
job per batch

dstream1.union(dstream2).foreachRDD { rdd => { rdd.count() ; rdd.count() }
}// TWO Spark jobs per batch

dstream1.foreachRDD { rdd => rdd.count } ; dstream2.foreachRDD { rdd =>
rdd.count }  // TWO Spark jobs per batch

>
>
>

>
>-
>- If the batch processing time is more than batchinterval then
>obviously the receiver's memory will start filling up and will end up in
>throwing exceptions (most probably BlockNotFoundException). Currently there
>is  no way to pause the receiver.
>
> You can limit the rate of receiver using SparkConf config
spark.streaming.receiver.maxRate
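For example (a minimal sketch; the 1000 records/second value is arbitrary):

import org.apache.spark.SparkConf
val conf = new SparkConf().set("spark.streaming.receiver.maxRate", "1000")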

>
>-
>- For being fully fault tolerant, spark streaming needs to enable
>checkpointing. Checkpointing increases the batch processing time.
>
> Incomplete. There are two types of checkpointing - data and metadata. Only
data checkpointing, needed by only some operations, increases batch
processing time. Read -
http://spark.apache.org/docs/latest/streaming-programming-guide.html#checkpointing
Furthermore, with checkpointing you can recover the computation, but you may lose
some data (that was received but not processed before the driver failed) for
some sources. Enabling write ahead logs and a reliable source + receiver
allows zero data loss. Read - WAL in
http://spark.apache.org/docs/latest/streaming-programming-guide.html#fault-tolerance-semantics
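A minimal sketch of enabling both write ahead logs and checkpointing (the app name
and checkpoint directory are placeholders):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf()
  .setAppName("my-streaming-app")
  .set("spark.streaming.receiver.writeAheadLog.enable", "true")  // WAL for received data
val ssc = new StreamingContext(conf, Seconds(10))
ssc.checkpoint("hdfs:///tmp/streaming-checkpoints")               // metadata + data checkpointing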

>
>- The frequency of metadata checkpoint cleaning can be controlled
>using spark.cleaner.ttl. But, data checkpoint cleaning happens
>automatically when the RDDs in the checkpoint are n

Re: FP Growth saveAsTextFile

2015-05-20 Thread Xiangrui Meng
Could you post the stack trace? If you are using Spark 1.3 or 1.4, it
would be easier to save freq itemsets as a Parquet file. -Xiangrui
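A rough sketch of that route with 1.3 syntax (it assumes an existing SQLContext
named sqlContext; the output path is a placeholder):

import sqlContext.implicits._

val freqDF = model.freqItemsets
  .map(itemset => (itemset.items.mkString(","), itemset.freq))
  .toDF("items", "freq")
freqDF.saveAsParquetFile("/tmp/fpgrowth-freq-itemsets")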

On Wed, May 20, 2015 at 12:16 PM, Eric Tanner
 wrote:
> I am having trouble with saving an FP-Growth model as a text file.  I can
> print out the results, but when I try to save the model I get a
> NullPointerException.
>
> model.freqItemsets.saveAsTextFile("c://fpGrowth/model")
>
> Thanks,
>
> Eric

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Multi user setup and saving a DataFrame / RDD to a network exported file system

2015-05-20 Thread Davies Liu
Could you file a JIRA for this?

The executor should run under the user who submits the job, I think.

On Wed, May 20, 2015 at 2:40 AM, Tomasz Fruboes
 wrote:
> Thanks for the suggestion. I have tried playing with it; sc.sparkUser() gives
> me the expected user name, but it doesn't solve the problem. From a quick search
> through the Spark code it seems to me that this setting is effective only
> for YARN and Mesos.
>
>  I think the workaround for the problem could be using "--deploy-mode
> cluster" (not 100% convenient, since it disallows any interactive work), but
> this is not supported for Python-based programs.
>
> Cheers,
>   Tomasz
>
>
>
> On 20.05.2015 at 10:57, Iulian Dragoș wrote:
>>
>> You could try setting `SPARK_USER` to the user under which your workers
>> are running. I couldn't find many references to this variable, but at
>> least Yarn and Mesos take it into account when spawning executors.
>> Chances are that standalone mode also does it.
>>
>> iulian
>>
>> On Wed, May 20, 2015 at 9:29 AM, Tomasz Fruboes
>> tomasz.frub...@fuw.edu.pl wrote:
>>
>> Hi,
>>
>>   thanks for answer. The rights are
>>
>> drwxr-xr-x 3 tfruboes all 5632 05-19 15 :40
>> test19EE/
>>
>>   I have tried setting the rights to 777 for this directory prior to
>> execution. This does not get propagated down the chain, ie the
>> directory created as a result of the "save" call
>> (namesAndAges.parquet2 in the path in the dump [1] below) is created
>> with the drwxr-xr-x rights (owned by the user submitting the job, ie
>> tfruboes). The temp directories created inside
>>
>> namesAndAges.parquet2/_temporary/0/
>>
>> (e.g. task_201505200920_0009_r_01) are owned by root, again with
>> drwxr-xr-x access rights
>>
>>   Cheers,
>>Tomasz
>>
>> On 19.05.2015 at 23:56, Davies Liu wrote:
>>
>> It surprises me, could you list the owner information of
>> /mnt/lustre/bigdata/med_home/tmp/test19EE/ ?
>>
>> On Tue, May 19, 2015 at 8:15 AM, Tomasz Fruboes
>> mailto:tomasz.frub...@fuw.edu.pl>>
>>
>> wrote:
>>
>> Dear Experts,
>>
>>we have a spark cluster (standalone mode) in which master
>> and workers are
>> started from root account. Everything runs correctly to the
>> point when we
>> try doing operations such as
>>
>>   dataFrame.select("name", "age").save(ofile, "parquet")
>>
>> or
>>
>>   rdd.saveAsPickleFile(ofile)
>>
>> , where ofile is path on a network exported filesystem
>> (visible on all
>> nodes, in our case this is lustre, I guess on nfs effect
>> would be similar).
>>
>>Unsurprisingly temp files created on workers are owned by
>> root, which then
>> leads to a crash (see [1] below). Is there a
>> solution/workaround for this
>> (e.g. controlling file creation mode of the temporary files)?
>>
>> Cheers,
>>Tomasz
>>
>>
>> ps I've tried to google this problem, couple of similar
>> reports, but no
>> clear answer/solution found
>>
>> ps2 For completeness - running master/workers as a regular
>> user solves the
>> problem only for the given user. For other users submitting
>> to this master
>> the result is given in [2] below
>>
>>
>> [0] Cluster details:
>> Master/workers: centos 6.5
>> Spark 1.3.1 prebuilt for hadoop 2.4 (same behaviour for the
>> 2.6 build)
>>
>>
>> [1]
>>
>> ##
>>  File
>>
>> "/mnt/home/tfruboes/2015.05.SparkLocal/spark-1.3.1-bin-hadoop2.4/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py",
>> line 300, in get_return_value
>> py4j.protocol.Py4JJavaError: An error occurred while calling
>> o27.save.
>> : java.io.IOException: Failed to rename
>>
>> DeprecatedRawLocalFileStatus{path=file:/mnt/lustre/bigdata/med_home/tmp/test19EE/namesAndAges.parquet2/_temporary/0/task_201505191540_0009_r_01/part-r-2.parquet;
>> isDirectory=false; length=534; replication=1;
>> blocksize=33554432;
>> modification_time=1432042832000; access_time=0; owner=;
>> group=;
>> permission=rw-rw-rw-; isSymlink=false} to
>>
>> file:/mnt/lustre/bigdata/med_home/tmp/test19EE/namesAndAges.parquet2/part-r-2.parquet
>>   at
>>
>> org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:346)
>>   at
>>
>> org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:362)
>>   at
>>
>> org.apache.hadoop.mapreduce.lib

Re: Spark 1.3.1 - SQL Issues

2015-05-20 Thread Davies Liu
The docs had been updated.

You should convert the DataFrame to RDD by `df.rdd`

On Mon, Apr 20, 2015 at 5:23 AM, ayan guha  wrote:
> Hi
> Just upgraded to Spark 1.3.1.
>
> I am getting a warning
>
> Warning (from warnings module):
>   File
> "D:\spark\spark-1.3.1-bin-hadoop2.6\spark-1.3.1-bin-hadoop2.6\spark-1.3.1-bin-hadoop2.6\python\pyspark\sql\context.py",
> line 191
> warnings.warn("inferSchema is deprecated, please use createDataFrame
> instead")
> UserWarning: inferSchema is deprecated, please use createDataFrame instead
>
> However, documentation still says to use inferSchema.
> Here: http://spark.apache.org/docs/latest/sql-programming-guide.htm in
> section
>
> Also, I am getting an error in mlib.ALS.train function when passing
> dataframe (do I need to convert the DF to RDD?)
>
> Code:
> training = ssc.sql("select userId,movieId,rating from ratings where
> partitionKey < 6").cache()
> print type(training)
> model = ALS.train(training,rank,numIter,lmbda)
>
> Error:
> 
> Rank:8 Lmbda:1.0 iteration:10
>
> Traceback (most recent call last):
>   File "D:\Project\Spark\code\movie_sql.py", line 109, in 
> bestConf = getBestModel(sc,ssc,training,validation,validationNoRating)
>   File "D:\Project\Spark\code\movie_sql.py", line 54, in getBestModel
> model = ALS.train(trainingRDD,rank,numIter,lmbda)
>   File
> "D:\spark\spark-1.3.1-bin-hadoop2.6\spark-1.3.1-bin-hadoop2.6\spark-1.3.1-bin-hadoop2.6\python\pyspark\mllib\recommendation.py",
> line 139, in train
> model = callMLlibFunc("trainALSModel", cls._prepare(ratings), rank,
> iterations,
>   File
> "D:\spark\spark-1.3.1-bin-hadoop2.6\spark-1.3.1-bin-hadoop2.6\spark-1.3.1-bin-hadoop2.6\python\pyspark\mllib\recommendation.py",
> line 127, in _prepare
> assert isinstance(ratings, RDD), "ratings should be RDD"
> AssertionError: ratings should be RDD
>
> --
> Best Regards,
> Ayan Guha

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



GradientBoostedTrees.trainRegressor with categoricalFeaturesInfo

2015-05-20 Thread Don Drake
I'm running Spark v1.3.1 and when I run the following against my dataset:

model = GradientBoostedTrees.trainRegressor(trainingData,
categoricalFeaturesInfo=catFeatu
res, maxDepth=6, numIterations=3)

The job will fail with the following message:
Traceback (most recent call last):
  File "/Users/drake/fd/spark/mltest.py", line 73, in 
model = GradientBoostedTrees.trainRegressor(trainingData,
categoricalFeaturesInfo=catFeatures, maxDepth=6, numIterations=3)
  File
"/Users/drake/spark/spark-1.3.1-bin-hadoop2.6/python/pyspark/mllib/tree.py",
line 553, in trainRegressor
loss, numIterations, learningRate, maxDepth)
  File
"/Users/drake/spark/spark-1.3.1-bin-hadoop2.6/python/pyspark/mllib/tree.py",
line 438, in _train
loss, numIterations, learningRate, maxDepth)
  File
"/Users/drake/spark/spark-1.3.1-bin-hadoop2.6/python/pyspark/mllib/common.py",
line 120, in callMLlibFunc
return callJavaFunc(sc, api, *args)
  File
"/Users/drake/spark/spark-1.3.1-bin-hadoop2.6/python/pyspark/mllib/common.py",
line 113, in callJavaFunc
return _java2py(sc, func(*args))
  File
"/Users/drake/spark/spark-1.3.1-bin-hadoop2.6/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py",
line 538, in __call__
  File
"/Users/drake/spark/spark-1.3.1-bin-hadoop2.6/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py",
line 300, in get_return_value
15/05/20 16:40:12 INFO BlockManager: Removing block rdd_32_95
py4j.protocol.Py4JJavaError: An error occurred while calling
o69.trainGradientBoostedTreesModel.
: java.lang.IllegalArgumentException: requirement failed: DecisionTree
requires maxBins (= 32) >= max categories in categorical features (= 1895)
at scala.Predef$.require(Predef.scala:233)
at
org.apache.spark.mllib.tree.impl.DecisionTreeMetadata$.buildMetadata(DecisionTreeMetadata.scala:128)
at org.apache.spark.mllib.tree.RandomForest.run(RandomForest.scala:138)
at org.apache.spark.mllib.tree.DecisionTree.run(DecisionTree.scala:60)
at
org.apache.spark.mllib.tree.GradientBoostedTrees$.org$apache$spark$mllib$tree$GradientBoostedTrees$$boost(GradientBoostedTrees.scala:150)
at
org.apache.spark.mllib.tree.GradientBoostedTrees.run(GradientBoostedTrees.scala:63)
at
org.apache.spark.mllib.tree.GradientBoostedTrees$.train(GradientBoostedTrees.scala:96)
at
org.apache.spark.mllib.api.python.PythonMLLibAPI.trainGradientBoostedTreesModel(PythonMLLibAPI.scala:595)

So, it's complaining about the maxBins, if I provide maxBins=1900 and
re-run it:

model = GradientBoostedTrees.trainRegressor(trainingData,
categoricalFeaturesInfo=catFeatu
res, maxDepth=6, numIterations=3, maxBins=1900)

Traceback (most recent call last):
  File "/Users/drake/fd/spark/mltest.py", line 73, in 
model = GradientBoostedTrees.trainRegressor(trainingData,
categoricalFeaturesInfo=catF
eatures, maxDepth=6, numIterations=3, maxBins=1900)
TypeError: trainRegressor() got an unexpected keyword argument 'maxBins'

It now says it knows nothing of maxBins.

If I run the same command against DecisionTree or RandomForest (with
maxBins=1900) it works just fine.

Seems like a bug in GradientBoostedTrees.

Suggestions?

-Don

-- 
Donald Drake
Drake Consulting
http://www.drakeconsulting.com/
800-733-2143


Storing data in MySQL from spark hive tables

2015-05-20 Thread roni
Hi ,
I am trying to set up the Hive metastore and the MySQL DB connection.
 I have a Spark cluster, I ran some programs, and I have data stored in
some Hive tables.
Now I want to store this data in MySQL so that it is available for
further processing.

I setup the hive-site.xml file.

<configuration>

  <property>
    <name>hive.semantic.analyzer.factory.impl</name>
    <value>org.apache.hcatalog.cli.HCatSemanticAnalyzerFactory</value>
  </property>

  <property>
    <name>hive.metastore.sasl.enabled</name>
    <value>false</value>
  </property>

  <property>
    <name>hive.server2.authentication</name>
    <value>NONE</value>
  </property>

  <property>
    <name>hive.server2.enable.doAs</name>
    <value>true</value>
  </property>

  <property>
    <name>hive.warehouse.subdir.inherit.perms</name>
    <value>true</value>
  </property>

  <property>
    <name>hive.metastore.schema.verification</name>
    <value>false</value>
  </property>

  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://<*ip address*>:3306/metastore_db?createDatabaseIfNotExist=true</value>
    <description>metadata is stored in a MySQL server</description>
  </property>

  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
    <description>MySQL JDBC driver class</description>
  </property>

  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
  </property>

  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value></value>
  </property>

  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/${user.name}/hive-warehouse</value>
    <description>location of default database for the warehouse</description>
  </property>

</configuration>

 --
My MySQL server is on a separate machine from where my Spark server is. If
I use MySQL Workbench, I use an SSH connection with a certificate file to
connect.
How do I specify all that information from Spark to the DB?
I want to store the data generated by my Spark program in MySQL.
Thanks
_R


Spark Application Dependency Issue

2015-05-20 Thread Snehal Nagmote
Hi All,

I am on spark 1.1 with Datastax DSE.

The application is Spark Streaming and has Couchbase dependencies which use
http-core 4.3.2.

While running application I get this error

This is the error I get

NoSuchMethodError:
org.apache.http.protocol.RequestUserAgent.<init>(Ljava/lang/String;)V

at com.couchbase.client.ViewConnection.<init>(ViewConnection.java:157) at
com.couchbase.client.CouchbaseConnectionFactory.createViewConnection(CouchbaseConnectionFactory.java:254)

at com.couchbase.client.CouchbaseClient.<init>(CouchbaseClient.java:266)

at
com.walmart.platform.cache.CouchBaseFactoryImpl.create(CouchBaseFactoryImpl.java:76)

There are different versions of http-core dependencies in the Spark classpath:

http-core 4.1.3 and http-core 4.2.4. My application uses 4.3.2.

I tried using the user-classpath-first option but it does not work for me since
I am on Spark 1.1.


Any help or pointers would be really useful ,


Thanks,

Snehal


Re: GradientBoostedTrees.trainRegressor with categoricalFeaturesInfo

2015-05-20 Thread Burak Yavuz
Could you please open a JIRA for it? The maxBins input is missing for the
Python API.

Is it possible for you to use the current master? In the current master,
you should be able to use trees with the Pipeline API and DataFrames.

Best,
Burak

On Wed, May 20, 2015 at 2:44 PM, Don Drake  wrote:

> I'm running Spark v1.3.1 and when I run the following against my dataset:
>
> model = GradientBoostedTrees.trainRegressor(trainingData,
> categoricalFeaturesInfo=catFeatu
> res, maxDepth=6, numIterations=3)
>
> The job will fail with the following message:
> Traceback (most recent call last):
>   File "/Users/drake/fd/spark/mltest.py", line 73, in 
> model = GradientBoostedTrees.trainRegressor(trainingData,
> categoricalFeaturesInfo=catFeatures, maxDepth=6, numIterations=3)
>   File
> "/Users/drake/spark/spark-1.3.1-bin-hadoop2.6/python/pyspark/mllib/tree.py",
> line 553, in trainRegressor
> loss, numIterations, learningRate, maxDepth)
>   File
> "/Users/drake/spark/spark-1.3.1-bin-hadoop2.6/python/pyspark/mllib/tree.py",
> line 438, in _train
> loss, numIterations, learningRate, maxDepth)
>   File
> "/Users/drake/spark/spark-1.3.1-bin-hadoop2.6/python/pyspark/mllib/common.py",
> line 120, in callMLlibFunc
> return callJavaFunc(sc, api, *args)
>   File
> "/Users/drake/spark/spark-1.3.1-bin-hadoop2.6/python/pyspark/mllib/common.py",
> line 113, in callJavaFunc
> return _java2py(sc, func(*args))
>   File
> "/Users/drake/spark/spark-1.3.1-bin-hadoop2.6/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py",
> line 538, in __call__
>   File
> "/Users/drake/spark/spark-1.3.1-bin-hadoop2.6/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py",
> line 300, in get_return_value
> 15/05/20 16:40:12 INFO BlockManager: Removing block rdd_32_95
> py4j.protocol.Py4JJavaError: An error occurred while calling
> o69.trainGradientBoostedTreesModel.
> : java.lang.IllegalArgumentException: requirement failed: DecisionTree
> requires maxBins (= 32) >= max categories in categorical features (= 1895)
> at scala.Predef$.require(Predef.scala:233)
> at
> org.apache.spark.mllib.tree.impl.DecisionTreeMetadata$.buildMetadata(DecisionTreeMetadata.scala:128)
> at org.apache.spark.mllib.tree.RandomForest.run(RandomForest.scala:138)
> at org.apache.spark.mllib.tree.DecisionTree.run(DecisionTree.scala:60)
> at
> org.apache.spark.mllib.tree.GradientBoostedTrees$.org$apache$spark$mllib$tree$GradientBoostedTrees$$boost(GradientBoostedTrees.scala:150)
> at
> org.apache.spark.mllib.tree.GradientBoostedTrees.run(GradientBoostedTrees.scala:63)
> at
> org.apache.spark.mllib.tree.GradientBoostedTrees$.train(GradientBoostedTrees.scala:96)
> at
> org.apache.spark.mllib.api.python.PythonMLLibAPI.trainGradientBoostedTreesModel(PythonMLLibAPI.scala:595)
>
> So, it's complaining about the maxBins, if I provide maxBins=1900 and
> re-run it:
>
> model = GradientBoostedTrees.trainRegressor(trainingData,
> categoricalFeaturesInfo=catFeatu
> res, maxDepth=6, numIterations=3, maxBins=1900)
>
> Traceback (most recent call last):
>   File "/Users/drake/fd/spark/mltest.py", line 73, in 
> model = GradientBoostedTrees.trainRegressor(trainingData,
> categoricalFeaturesInfo=catF
> eatures, maxDepth=6, numIterations=3, maxBins=1900)
> TypeError: trainRegressor() got an unexpected keyword argument 'maxBins'
>
> It now says it knows nothing of maxBins.
>
> If I run the same command against DecisionTree or RandomForest (with
> maxBins=1900) it works just fine.
>
> Seems like a bug in GradientBoostedTrees.
>
> Suggestions?
>
> -Don
>
> --
> Donald Drake
> Drake Consulting
> http://www.drakeconsulting.com/
> 800-733-2143
>


Re: Spark 1.3.1 - SQL Issues

2015-05-20 Thread ayan guha
Thanks a bunch
On 21 May 2015 07:11, "Davies Liu"  wrote:

> The docs had been updated.
>
> You should convert the DataFrame to RDD by `df.rdd`
>
> On Mon, Apr 20, 2015 at 5:23 AM, ayan guha  wrote:
> > Hi
> > Just upgraded to Spark 1.3.1.
> >
> > I am getting an warning
> >
> > Warning (from warnings module):
> >   File
> >
> "D:\spark\spark-1.3.1-bin-hadoop2.6\spark-1.3.1-bin-hadoop2.6\spark-1.3.1-bin-hadoop2.6\python\pyspark\sql\context.py",
> > line 191
> > warnings.warn("inferSchema is deprecated, please use createDataFrame
> > instead")
> > UserWarning: inferSchema is deprecated, please use createDataFrame
> instead
> >
> > However, documentation still says to use inferSchema.
> > Here: http://spark.apache.org/docs/latest/sql-programming-guide.htm in
> > section
> >
> > Also, I am getting an error in mlib.ALS.train function when passing
> > dataframe (do I need to convert the DF to RDD?)
> >
> > Code:
> > training = ssc.sql("select userId,movieId,rating from ratings where
> > partitionKey < 6").cache()
> > print type(training)
> > model = ALS.train(training,rank,numIter,lmbda)
> >
> > Error:
> > 
> > Rank:8 Lmbda:1.0 iteration:10
> >
> > Traceback (most recent call last):
> >   File "D:\Project\Spark\code\movie_sql.py", line 109, in 
> > bestConf =
> getBestModel(sc,ssc,training,validation,validationNoRating)
> >   File "D:\Project\Spark\code\movie_sql.py", line 54, in getBestModel
> > model = ALS.train(trainingRDD,rank,numIter,lmbda)
> >   File
> >
> "D:\spark\spark-1.3.1-bin-hadoop2.6\spark-1.3.1-bin-hadoop2.6\spark-1.3.1-bin-hadoop2.6\python\pyspark\mllib\recommendation.py",
> > line 139, in train
> > model = callMLlibFunc("trainALSModel", cls._prepare(ratings), rank,
> > iterations,
> >   File
> >
> "D:\spark\spark-1.3.1-bin-hadoop2.6\spark-1.3.1-bin-hadoop2.6\spark-1.3.1-bin-hadoop2.6\python\pyspark\mllib\recommendation.py",
> > line 127, in _prepare
> > assert isinstance(ratings, RDD), "ratings should be RDD"
> > AssertionError: ratings should be RDD
> >
> > --
> > Best Regards,
> > Ayan Guha
>


Re: Spark Job not using all nodes in cluster

2015-05-20 Thread Shailesh Birari
No. I am not setting the number of executors anywhere (in the env file or in
the program).

Is it due to the large number of small files?

On Wed, May 20, 2015 at 5:11 PM, ayan guha  wrote:

> What is your spark env file says? Are you setting number of executors in
> spark context?
> On 20 May 2015 13:16, "Shailesh Birari"  wrote:
>
>> Hi,
>>
>> I have a 4 node Spark 1.3.1 cluster. All four nodes have 4 cores and 64 GB
>> of RAM.
>> I have around 600,000+ Json files on HDFS. Each file is small around 1KB
>> in
>> size. Total data is around 16GB. Hadoop block size is 256MB.
>> My application reads these files with sc.textFile() (or sc.jsonFile()
>> tried
>> both) API. But all the files are getting read by only one node (4
>> executors). Spark UI shows all 600K+ tasks on one node and 0 on other
>> nodes.
>>
>> I confirmed that all files are accessible from all nodes. Some other
>> application which uses big files uses all nodes on same cluster.
>>
>> Can you please let me know why it is behaving in such way ?
>>
>> Thanks,
>>   Shailesh
>>
>>
>>
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Job-not-using-all-nodes-in-cluster-tp22951.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>
>> -
>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>> For additional commands, e-mail: user-h...@spark.apache.org
>>
>>


Compare LogisticRegression results using Mllib with those using other libraries (e.g. statsmodel)

2015-05-20 Thread Xin Liu
Hi,

I have tried a few models in Mllib to train a LogisticRegression model.
However, I consistently get much better results using other libraries such
as statsmodel (which gives similar results as R) in terms of AUC. For
illustration purpose, I used a small data (I have tried much bigger data)
 http://www.ats.ucla.edu/stat/data/binary.csv in
http://www.ats.ucla.edu/stat/r/dae/logit.htm

Here is the snippet of my usage of LogisticRegressionWithLBFGS.

val algorithm = new LogisticRegressionWithLBFGS
 algorithm.setIntercept(true)
 algorithm.optimizer
   .setNumIterations(100)
   .setRegParam(0.01)
   .setConvergenceTol(1e-5)
 val model = algorithm.run(training)
 model.clearThreshold()
 val scoreAndLabels = test.map { point =>
   val score = model.predict(point.features)
   (score, point.label)
 }
 val metrics = new BinaryClassificationMetrics(scoreAndLabels)
 val auROC = metrics.areaUnderROC()

I did a (0.6, 0.4) split for training/test. The response is "admit" and
features are "GRE score", "GPA", and "college Rank".

Spark:
Weights (GRE, GPA, Rank):
[0.0011576276331509304,0.048544858567336854,-0.394202150286076]
Intercept: -0.6488972641282202
Area under ROC: 0.6294070512820512

StatsModel:
Weights [0.0018, 0.7220, -0.3148]
Intercept: -3.5913
Area under ROC: 0.69

The weights from statsmodel seem more reasonable if you consider that, for a one
unit increase in GPA, the log odds of being admitted to graduate school
increase by 0.72 in statsmodel versus 0.04 in Spark.

I have seen a much bigger difference with other data. So my question is: has
anyone compared the results with other libraries, and is anything wrong with
my code invoking LogisticRegressionWithLBFGS?

The real data I am processing is pretty big and I really want to use Spark
to get this to work. Please let me know if you have similar experience and
how you resolved it.

Thanks,
Xin


How to process data in chronological order

2015-05-20 Thread roy
I have a key-value RDD, key is a timestamp (femto-second resolution, so
grouping buys me nothing) and I want to reduce it in the chronological
order.

How do I do that in spark?

I am fine with reducing contiguous sections of the set separately and then
aggregating the resulting objects locally.
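For example, something along these lines (a rough Scala sketch; keyed stands in
for the timestamp-keyed RDD, and combine / merge stand in for the application's
own aggregation functions):

val sorted = keyed.sortByKey()                        // range-partitioned, sorted within partitions
val partials = sorted.mapPartitions { iter =>
  val values = iter.map(_._2)
  if (values.hasNext) Iterator(values.reduce(combine)) else Iterator.empty
}
val result = partials.collect().reduceLeft(merge)     // collect() preserves partition order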

Thanks



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/How-to-process-data-in-chronological-order-tp22966.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Storing data in MySQL from spark hive tables

2015-05-20 Thread Yana Kadiyska
I'm afraid you misunderstand the purpose of hive-site.xml. It configures
access to the Hive metastore. You can read more here:
http://www.hadoopmaterial.com/2013/11/metastore.html.

So the MySQL DB in hive-site.xml would be used to store hive-specific data
such as schema info, partition info, etc.

Now, for what you want to do, you can search the user list -- I know there
have been posts about Postgres but you can do the same with MySQL. The idea
is to create an object holding a connection pool (so each of your executors
would have its own instance), or alternatively, to open a connection within
mapPartitions (so you don't end up with a ton of connections). But the
write to a DB is largely a manual process -- open a connection, create a
statement, sync the data. If your data is small enough you probably could
just collect on the driver and write...though that would certainly be
slower than writing in parallel from each executor.
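A rough sketch of the per-partition variant in Scala (df stands in for the
DataFrame being written, and the JDBC URL, credentials, table and column names
are placeholders; the MySQL driver jar has to be on the executor classpath):

import java.sql.DriverManager

// One connection per partition, batching the inserts.
df.rdd.foreachPartition { rows =>
  Class.forName("com.mysql.jdbc.Driver")
  val conn = DriverManager.getConnection("jdbc:mysql://dbhost:3306/mydb", "user", "password")
  val stmt = conn.prepareStatement("INSERT INTO results (name, age) VALUES (?, ?)")
  try {
    rows.foreach { row =>
      stmt.setString(1, row.getString(0))
      stmt.setInt(2, row.getInt(1))
      stmt.addBatch()
    }
    stmt.executeBatch()
  } finally {
    stmt.close()
    conn.close()
  }
}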

On Wed, May 20, 2015 at 5:48 PM, roni  wrote:

> Hi ,
> I am trying to setup the hive metastore and mysql DB connection.
>  I have a spark cluster and I ran some programs and I have data stored in
> some hive tables.
> Now I want to store this data into Mysql  so that it is available for
> further processing.
>
> I setup the hive-site.xml file.
>
> 
>
> 
>
>
> 
>
>   
>
> hive.semantic.analyzer.factory.impl
>
> org.apache.hcatalog.cli.HCatSemanticAnalyzerFactory
>
>   
>
>
>   
>
> hive.metastore.sasl.enabled
>
> false
>
>   
>
>
>   
>
> hive.server2.authentication
>
> NONE
>
>   
>
>
>   
>
> hive.server2.enable.doAs
>
> true
>
>   
>
>
>   
>
> hive.warehouse.subdir.inherit.perms
>
> true
>
>   
>
>
>   
>
> hive.metastore.schema.verification
>
> false
>
>   
>
>
>   
>
> javax.jdo.option.ConnectionURL
>
> jdbc:mysql://<*ip address*
> >:3306/metastore_db?createDatabaseIfNotExist=true
>
> metadata is stored in a MySQL server
>
>   
>
>
>   
>
> javax.jdo.option.ConnectionDriverName
>
> com.mysql.jdbc.Driver
>
> MySQL JDBC driver class
>
>   
>
>
>   
>
> javax.jdo.option.ConnectionUserName
>
> root
>
>   
>
>
>   
>
> javax.jdo.option.ConnectionPassword
>
> 
>
>   
>
>   
>
> hive.metastore.warehouse.dir
>
> /user/${user.name}/hive-warehouse
>
> location of default database for
> the warehouse
>
> 
>
>
> 
>  --
> My mysql server is on a separate server than where my spark server is . If
> I use mySQLWorkbench , I use a SSH connection  with a certificate file to
> connect .
> How do I specify all that information from spark to the DB ?
> I want to store the data generated by my spark program into mysql.
> Thanks
> _R
>


Re: GradientBoostedTrees.trainRegressor with categoricalFeaturesInfo

2015-05-20 Thread Joseph Bradley
One more comment: That's a lot of categories for a feature.  If it makes
sense for your data, it will run faster if you can group the categories or
split the 1895 categories into a few features which have fewer categories.
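As an illustration of the grouping idea (a Scala sketch, even though this thread
uses PySpark; rawCategories and the frequency cutoff of 100 are hypothetical):

// Collapse rare values of a high-cardinality categorical column into an "other" bucket.
val counts = rawCategories.map(c => (c, 1L)).reduceByKey(_ + _)
val keep = counts.filter(_._2 >= 100).keys.collect().toSet
val bucketed = rawCategories.map(c => if (keep.contains(c)) c else "other")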

On Wed, May 20, 2015 at 3:17 PM, Burak Yavuz  wrote:

> Could you please open a JIRA for it? The maxBins input is missing for the
> Python Api.
>
> Is it possible if you can use the current master? In the current master,
> you should be able to use trees with the Pipeline Api and DataFrames.
>
> Best,
> Burak
>
> On Wed, May 20, 2015 at 2:44 PM, Don Drake  wrote:
>
>> I'm running Spark v1.3.1 and when I run the following against my dataset:
>>
>> model = GradientBoostedTrees.trainRegressor(trainingData,
>> categoricalFeaturesInfo=catFeatu
>> res, maxDepth=6, numIterations=3)
>>
>> The job will fail with the following message:
>> Traceback (most recent call last):
>>   File "/Users/drake/fd/spark/mltest.py", line 73, in 
>> model = GradientBoostedTrees.trainRegressor(trainingData,
>> categoricalFeaturesInfo=catFeatures, maxDepth=6, numIterations=3)
>>   File
>> "/Users/drake/spark/spark-1.3.1-bin-hadoop2.6/python/pyspark/mllib/tree.py",
>> line 553, in trainRegressor
>> loss, numIterations, learningRate, maxDepth)
>>   File
>> "/Users/drake/spark/spark-1.3.1-bin-hadoop2.6/python/pyspark/mllib/tree.py",
>> line 438, in _train
>> loss, numIterations, learningRate, maxDepth)
>>   File
>> "/Users/drake/spark/spark-1.3.1-bin-hadoop2.6/python/pyspark/mllib/common.py",
>> line 120, in callMLlibFunc
>> return callJavaFunc(sc, api, *args)
>>   File
>> "/Users/drake/spark/spark-1.3.1-bin-hadoop2.6/python/pyspark/mllib/common.py",
>> line 113, in callJavaFunc
>> return _java2py(sc, func(*args))
>>   File
>> "/Users/drake/spark/spark-1.3.1-bin-hadoop2.6/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py",
>> line 538, in __call__
>>   File
>> "/Users/drake/spark/spark-1.3.1-bin-hadoop2.6/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py",
>> line 300, in get_return_value
>> 15/05/20 16:40:12 INFO BlockManager: Removing block rdd_32_95
>> py4j.protocol.Py4JJavaError: An error occurred while calling
>> o69.trainGradientBoostedTreesModel.
>> : java.lang.IllegalArgumentException: requirement failed: DecisionTree
>> requires maxBins (= 32) >= max categories in categorical features (= 1895)
>> at scala.Predef$.require(Predef.scala:233)
>> at
>> org.apache.spark.mllib.tree.impl.DecisionTreeMetadata$.buildMetadata(DecisionTreeMetadata.scala:128)
>> at org.apache.spark.mllib.tree.RandomForest.run(RandomForest.scala:138)
>> at org.apache.spark.mllib.tree.DecisionTree.run(DecisionTree.scala:60)
>> at
>> org.apache.spark.mllib.tree.GradientBoostedTrees$.org$apache$spark$mllib$tree$GradientBoostedTrees$$boost(GradientBoostedTrees.scala:150)
>> at
>> org.apache.spark.mllib.tree.GradientBoostedTrees.run(GradientBoostedTrees.scala:63)
>> at
>> org.apache.spark.mllib.tree.GradientBoostedTrees$.train(GradientBoostedTrees.scala:96)
>> at
>> org.apache.spark.mllib.api.python.PythonMLLibAPI.trainGradientBoostedTreesModel(PythonMLLibAPI.scala:595)
>>
>> So, it's complaining about the maxBins, if I provide maxBins=1900 and
>> re-run it:
>>
>> model = GradientBoostedTrees.trainRegressor(trainingData,
>> categoricalFeaturesInfo=catFeatu
>> res, maxDepth=6, numIterations=3, maxBins=1900)
>>
>> Traceback (most recent call last):
>>   File "/Users/drake/fd/spark/mltest.py", line 73, in 
>> model = GradientBoostedTrees.trainRegressor(trainingData,
>> categoricalFeaturesInfo=catF
>> eatures, maxDepth=6, numIterations=3, maxBins=1900)
>> TypeError: trainRegressor() got an unexpected keyword argument 'maxBins'
>>
>> It now says it knows nothing of maxBins.
>>
>> If I run the same command against DecisionTree or RandomForest (with
>> maxBins=1900) it works just fine.
>>
>> Seems like a bug in GradientBoostedTrees.
>>
>> Suggestions?
>>
>> -Don
>>
>> --
>> Donald Drake
>> Drake Consulting
>> http://www.drakeconsulting.com/
>> 800-733-2143
>>
>
>


Re: Compare LogisticRegression results using Mllib with those using other libraries (e.g. statsmodel)

2015-05-20 Thread Joseph Bradley
Hi Xin,

2 suggestions:

1) Feature scaling: spark.mllib's LogisticRegressionWithLBFGS uses feature
scaling, which scales feature values to have unit standard deviation.  That
improves optimization behavior, and it often improves statistical
estimation (though maybe not for your dataset).  However, it effectively
changes the model being learned, so you should expect different results
from other libraries like R.  You could instead use LogisticRegressionWithSGD,
which does not do feature scaling.  With SGD, you may need to play around
with the stepSize more to get it to converge, but it should be able to
learn exactly the same model as R (a minimal sketch follows after point 2).

2) Convergence: I'd do a sanity check and make sure the algorithm is
converging.  (Compare with running for more iterations or using a lower
convergenceTol.)
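For reference on point 1), a minimal sketch of the SGD variant, mirroring the
LBFGS snippet in the original message (the step size, iteration count and zero
regularization are just starting points to tune):

import org.apache.spark.mllib.classification.LogisticRegressionWithSGD

val sgd = new LogisticRegressionWithSGD()
sgd.setIntercept(true)
sgd.optimizer
  .setNumIterations(200)   // SGD typically needs more iterations than LBFGS
  .setStepSize(1.0)        // convergence is sensitive to this
  .setRegParam(0.0)        // no regularization, to mirror a plain R / statsmodels fit
val sgdModel = sgd.run(training)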

Note: If you can use the Spark master branch (or wait for Spark 1.4), then
the spark.ml Pipelines API will be a good option.  It now has
LogisticRegression which does not do feature scaling, and it uses LBFGS or
OWLQN (depending on the regularization type) for optimization.  It's also
been compared with R in unit tests.

Good luck!
Joseph

On Wed, May 20, 2015 at 3:42 PM, Xin Liu  wrote:

> Hi,
>
> I have tried a few models in Mllib to train a LogisticRegression model.
> However, I consistently get much better results using other libraries such
> as statsmodel (which gives similar results as R) in terms of AUC. For
> illustration purpose, I used a small data (I have tried much bigger data)
>  http://www.ats.ucla.edu/stat/data/binary.csv in
> http://www.ats.ucla.edu/stat/r/dae/logit.htm
>
> Here is the snippet of my usage of LogisticRegressionWithLBFGS.
>
> val algorithm = new LogisticRegressionWithLBFGS
>  algorithm.setIntercept(true)
>  algorithm.optimizer
>.setNumIterations(100)
>.setRegParam(0.01)
>.setConvergenceTol(1e-5)
>  val model = algorithm.run(training)
>  model.clearThreshold()
>  val scoreAndLabels = test.map { point =>
>val score = model.predict(point.features)
>(score, point.label)
>  }
>  val metrics = new BinaryClassificationMetrics(scoreAndLabels)
>  val auROC = metrics.areaUnderROC()
>
> I did a (0.6, 0.4) split for training/test. The response is "admit" and
> features are "GRE score", "GPA", and "college Rank".
>
> Spark:
> Weights (GRE, GPA, Rank):
> [0.0011576276331509304,0.048544858567336854,-0.394202150286076]
> Intercept: -0.6488972641282202
> Area under ROC: 0.6294070512820512
>
> StatsModel:
> Weights [0.0018, 0.7220, -0.3148]
> Intercept: -3.5913
> Area under ROC: 0.69
>
> The weights from statsmodel seems more reasonable if you consider for a
> one unit increase in gpa, the log odds of being admitted to graduate school
> increases by 0.72 in statsmodel than 0.04 in Spark.
>
> I have seen much bigger difference with other data. So my question is has
> anyone compared the results with other libraries and is anything wrong with
> my code to invoke LogisticRegressionWithLBFGS?
>
> As the real data I am processing is pretty big and really want to use
> Spark to get this to work. Please let me know if you have similar
> experience and how you resolve it.
>
> Thanks,
> Xin
>


Re: Compare LogisticRegression results using Mllib with those using other libraries (e.g. statsmodel)

2015-05-20 Thread DB Tsai
Hi Xin,

If you take a look at the model you trained, the intercept from Spark
is significantly smaller than StatsModel's, and the intercept represents
a prior on the categories in LOR, which causes the low accuracy in the Spark
implementation. In LogisticRegressionWithLBFGS, the intercept is
regularized due to the implementation of Updater, but the intercept
should not be regularized.

In the new pipleline APIs, a LOR with elasticNet is implemented, and
the intercept is properly handled.
https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala

As you can see the tests,
https://github.com/apache/spark/blob/master/mllib/src/test/scala/org/apache/spark/ml/classification/LogisticRegressionSuite.scala
the result is exactly the same as R now.

BTW, in both version, the feature scalings are done before training,
and we train the model in scaled space but transform the model weights
back to original space. The only difference is in the mllib version,
LogisticRegressionWithLBFGS regularizes the intercept while in the ml
version, the intercept is excluded from regularization. As a result,
if lambda is zero, the model should be the same.



On Wed, May 20, 2015 at 3:42 PM, Xin Liu  wrote:
> Hi,
>
> I have tried a few models in Mllib to train a LogisticRegression model.
> However, I consistently get much better results using other libraries such
> as statsmodel (which gives similar results as R) in terms of AUC. For
> illustration purpose, I used a small data (I have tried much bigger data)
>  http://www.ats.ucla.edu/stat/data/binary.csv in
> http://www.ats.ucla.edu/stat/r/dae/logit.htm
>
> Here is the snippet of my usage of LogisticRegressionWithLBFGS.
>
> val algorithm = new LogisticRegressionWithLBFGS
>  algorithm.setIntercept(true)
>  algorithm.optimizer
>.setNumIterations(100)
>.setRegParam(0.01)
>.setConvergenceTol(1e-5)
>  val model = algorithm.run(training)
>  model.clearThreshold()
>  val scoreAndLabels = test.map { point =>
>val score = model.predict(point.features)
>(score, point.label)
>  }
>  val metrics = new BinaryClassificationMetrics(scoreAndLabels)
>  val auROC = metrics.areaUnderROC()
>
> I did a (0.6, 0.4) split for training/test. The response is "admit" and
> features are "GRE score", "GPA", and "college Rank".
>
> Spark:
> Weights (GRE, GPA, Rank):
> [0.0011576276331509304,0.048544858567336854,-0.394202150286076]
> Intercept: -0.6488972641282202
> Area under ROC: 0.6294070512820512
>
> StatsModel:
> Weights [0.0018, 0.7220, -0.3148]
> Intercept: -3.5913
> Area under ROC: 0.69
>
> The weights from statsmodel seems more reasonable if you consider for a one
> unit increase in gpa, the log odds of being admitted to graduate school
> increases by 0.72 in statsmodel than 0.04 in Spark.
>
> I have seen much bigger difference with other data. So my question is has
> anyone compared the results with other libraries and is anything wrong with
> my code to invoke LogisticRegressionWithLBFGS?
>
> As the real data I am processing is pretty big and really want to use Spark
> to get this to work. Please let me know if you have similar experience and
> how you resolve it.
>
> Thanks,
> Xin

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Spatial function in spark

2015-05-20 Thread developer developer
Hello,
I am fairly new to Spark and Python programming. I have an RDD with
polygons, and I need to perform spatial joins, geohash calculations and other
spatial operations on these RDDs in parallel.

I run Spark jobs on a YARN cluster, and I develop Spark applications in Python.

So, can you please suggest some pointers on how to enable spatial support for
Spark applications?

Thanks !


Help needed with Py4J

2015-05-20 Thread Addanki, Santosh Kumar
Hi Colleagues

We need to call a Scala class from PySpark in an IPython notebook.

We tried something like below :

from py4j.java_gateway import java_import

java_import(sparkContext._jvm,'')

myScalaClass =  sparkContext._jvm.SimpleScalaClass ()

myScalaClass.sayHello("World") Works Fine

But

When we try to pass sparkContext to our class it fails  like below

myContext  = _jvm.MySQLContext(sparkContext) fails with


AttributeErrorTraceback (most recent call last)

 in ()

> 1 z = _jvm.MySQLContext(sparkContext)



C:\Users\i033085\spark\python\lib\py4j-0.8.2.1-src.zip\py4j\java_gateway.py in 
__call__(self, *args)

690

691 args_command = ''.join(

--> 692 [get_command_part(arg, self._pool) for arg in new_args])

693

694 command = CONSTRUCTOR_COMMAND_NAME +\



C:\Users\i033085\spark\python\lib\py4j-0.8.2.1-src.zip\py4j\protocol.py in 
get_command_part(parameter, python_proxy_pool)

263 command_part += ';' + interface

264 else:

--> 265 command_part = REFERENCE_TYPE + parameter._get_object_id()

266

267 command_part += '\n'
attributeError: 'SparkContext' object has no attribute '_get_object_id'




And

myContext  = _jvm.MySQLContext(sparkContext._jsc) fails with


Constructor org.apache.spark.sql.MySQLContext([class 
org.apache.spark.api.java.JavaSparkContext]) does not exist





Would this be possible, or are there serialization issues that make it
impossible?

If not, what options do we have to instantiate our own SQLContext written
in Scala from PySpark?



Best Regards,

Santosh






Re: --jars works in "yarn-client" but not "yarn-cluster" mode, why?

2015-05-20 Thread Fengyun RAO
Thank you so much, Marcelo!

It WORKS!

2015-05-21 2:05 GMT+08:00 Marcelo Vanzin :

> Hello,
>
> Sorry for the delay. The issue you're running into is because most HBase
> classes are in the system class path, while jars added with "--jars" are
> only visible to the application class loader created by Spark. So classes
> in the system class path cannot see them.
>
> You can work around this by setting "--driver-classpath
> /opt/.../htrace-core-3.1.0-incubating.jar" and "--conf
> spark.executor.extraClassPath=
> /opt/.../htrace-core-3.1.0-incubating.jar" in your spark-submit command
> line. (You can also add those configs to your spark-defaults.conf to avoid
> having to type them all the time; and don't forget to include any other
> jars that might be needed.)
>
>
> On Mon, May 18, 2015 at 11:14 PM, Fengyun RAO 
> wrote:
>
>> Thanks, Marcelo!
>>
>>
>> Below is the full log,
>>
>>
>> SLF4J: Class path contains multiple SLF4J bindings.
>> SLF4J: Found binding in 
>> [jar:file:/opt/cloudera/parcels/CDH-5.4.0-1.cdh5.4.0.p0.27/jars/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>> SLF4J: Found binding in 
>> [jar:file:/opt/cloudera/parcels/CDH-5.4.0-1.cdh5.4.0.p0.27/jars/avro-tools-1.7.6-cdh5.4.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
>> explanation.
>> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
>> 15/05/19 14:08:58 INFO yarn.ApplicationMaster: Registered signal handlers 
>> for [TERM, HUP, INT]
>> 15/05/19 14:08:59 INFO yarn.ApplicationMaster: ApplicationAttemptId: 
>> appattempt_1432015548391_0003_01
>> 15/05/19 14:09:00 INFO spark.SecurityManager: Changing view acls to: 
>> nobody,raofengyun
>> 15/05/19 14:09:00 INFO spark.SecurityManager: Changing modify acls to: 
>> nobody,raofengyun
>> 15/05/19 14:09:00 INFO spark.SecurityManager: SecurityManager: 
>> authentication disabled; ui acls disabled; users with view permissions: 
>> Set(nobody, raofengyun); users with modify permissions: Set(nobody, 
>> raofengyun)
>> 15/05/19 14:09:00 INFO yarn.ApplicationMaster: Starting the user application 
>> in a separate Thread
>> 15/05/19 14:09:00 INFO yarn.ApplicationMaster: Waiting for spark context 
>> initialization
>> 15/05/19 14:09:00 INFO yarn.ApplicationMaster: Waiting for spark context 
>> initialization ...
>> 15/05/19 14:09:00 INFO spark.SparkContext: Running Spark version 1.3.0
>> 15/05/19 14:09:00 INFO spark.SecurityManager: Changing view acls to: 
>> nobody,raofengyun
>> 15/05/19 14:09:00 INFO spark.SecurityManager: Changing modify acls to: 
>> nobody,raofengyun
>> 15/05/19 14:09:00 INFO spark.SecurityManager: SecurityManager: 
>> authentication disabled; ui acls disabled; users with view permissions: 
>> Set(nobody, raofengyun); users with modify permissions: Set(nobody, 
>> raofengyun)
>> 15/05/19 14:09:01 INFO slf4j.Slf4jLogger: Slf4jLogger started
>> 15/05/19 14:09:01 INFO Remoting: Starting remoting
>> 15/05/19 14:09:01 INFO Remoting: Remoting started; listening on addresses 
>> :[akka.tcp://sparkDriver@gs-server-v-127:7191]
>> 15/05/19 14:09:01 INFO Remoting: Remoting now listens on addresses: 
>> [akka.tcp://sparkDriver@gs-server-v-127:7191]
>> 15/05/19 14:09:01 INFO util.Utils: Successfully started service 
>> 'sparkDriver' on port 7191.
>> 15/05/19 14:09:01 INFO spark.SparkEnv: Registering MapOutputTracker
>> 15/05/19 14:09:01 INFO spark.SparkEnv: Registering BlockManagerMaster
>> 15/05/19 14:09:01 INFO storage.DiskBlockManager: Created local directory at 
>> /data1/cdh/yarn/nm/usercache/raofengyun/appcache/application_1432015548391_0003/blockmgr-3250910b-693e-46ff-b057-26d552fd8abd
>> 15/05/19 14:09:01 INFO storage.MemoryStore: MemoryStore started with 
>> capacity 259.7 MB
>> 15/05/19 14:09:01 INFO spark.HttpFileServer: HTTP File server directory is 
>> /data1/cdh/yarn/nm/usercache/raofengyun/appcache/application_1432015548391_0003/httpd-5bc614bc-d8b1-473d-a807-4d9252eb679d
>> 15/05/19 14:09:01 INFO spark.HttpServer: Starting HTTP Server
>> 15/05/19 14:09:01 INFO server.Server: jetty-8.y.z-SNAPSHOT
>> 15/05/19 14:09:01 INFO server.AbstractConnector: Started 
>> SocketConnector@0.0.0.0:9349
>> 15/05/19 14:09:01 INFO util.Utils: Successfully started service 'HTTP file 
>> server' on port 9349.
>> 15/05/19 14:09:01 INFO spark.SparkEnv: Registering OutputCommitCoordinator
>> 15/05/19 14:09:01 INFO ui.JettyUtils: Adding filter: 
>> org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
>> 15/05/19 14:09:01 INFO server.Server: jetty-8.y.z-SNAPSHOT
>> 15/05/19 14:09:01 INFO server.AbstractConnector: Started 
>> SelectChannelConnector@0.0.0.0:63023
>> 15/05/19 14:09:01 INFO util.Utils: Successfully started service 'SparkUI' on 
>> port 63023.
>> 15/05/19 14:09:01 INFO ui.SparkUI: Started SparkUI at 
>> http://gs-server-v-127:63023
>> 15/05/19 14:09:02 INFO cluster.YarnClusterScheduler: Created 
>> YarnClusterScheduler
>> 15/05/19 14:09:02 INFO netty.NettyBlockTransferService:

Re: Help needed with Py4J

2015-05-20 Thread Holden Karau
Are your jars included in both the driver and worker class paths?

On Wednesday, May 20, 2015, Addanki, Santosh Kumar <
santosh.kumar.adda...@sap.com> wrote:

>  Hi Colleagues
>
>
>
> We need to call a Scala Class from pySpark in Ipython notebook.
>
>
>
> We tried something like below :
>
>
>
> from py4j.java_gateway import java_import
>
>
>
> java_import(sparkContext._jvm,'')
>
>
>
> myScalaClass =  sparkContext._jvm.SimpleScalaClass ()
>
>
>
> myScalaClass.sayHello(“World”) Works Fine
>
>
>
> But
>
>
>
> When we try to pass sparkContext to our class it fails  like below
>
>
>
> myContext  = _jvm.MySQLContext(sparkContext) fails with
>
>
>
> AttributeErrorTraceback (most recent call last)
>
>  in ()
>
> > 1 z = _jvm.MySQLContext(sparkContext)
>
>
>
> C:\Users\i033085\spark\python\lib\py4j-0.8.2.1-src.zip\py4j\java_gateway.py 
> in __call__(self, *args)
>
> 690
>
> 691 args_command = ''.join(
>
> --> 692 [get_command_part(arg, self._pool) for arg in 
> new_args])
>
> 693
>
> 694 command = CONSTRUCTOR_COMMAND_NAME +\
>
>
>
> C:\Users\i033085\spark\python\lib\py4j-0.8.2.1-src.zip\py4j\protocol.py in 
> get_command_part(parameter, python_proxy_pool)
>
> 263 command_part += ';' + interface
>
> 264 else:
>
> --> 265 command_part = REFERENCE_TYPE + parameter._get_object_id()
>
> 266
>
> 267 command_part += '\n'
>
>  attributeError: 'SparkContext' object has no attribute '_get_object_id'
>
>
>
>
>
>
>
> And
>
>
>
> myContext  = _*jvm.MySQLContext(sparkContext.*_jsc) fails with
>
>
>
> Constructor org.apache.spark.sql.MySQLContext([class 
> org.apache.spark.api.java.JavaSparkContext]) does not exist
>
>
>
>
>
> Would this be possible … or there are serialization issues and hence not 
> possible.
>
> If not what are the options we have to instantiate our own SQLContext written 
> in scala from pySpark…
>
>
>
> Best Regards,
>
> Santosh
>
>
>
>
>
>
>
>
>


-- 
Cell : 425-233-8271
Twitter: https://twitter.com/holdenkarau
Linked In: https://www.linkedin.com/in/holdenkarau


Re: Help needed with Py4J

2015-05-20 Thread Addanki, Santosh Kumar
Yeah ... I am able to instantiate the simple Scala class as explained below,
which is from the same JAR.

Regards
Santosh


On May 20, 2015, at 7:26 PM, Holden Karau 
mailto:hol...@pigscanfly.ca>> wrote:

Are your jars included in both the driver and worker class paths?

On Wednesday, May 20, 2015, Addanki, Santosh Kumar 
mailto:santosh.kumar.adda...@sap.com>> wrote:
Hi Colleagues

We need to call a Scala Class from pySpark in Ipython notebook.

We tried something like below :

from py4j.java_gateway import java_import

java_import(sparkContext._jvm,'')

myScalaClass =  sparkContext._jvm.SimpleScalaClass ()

myScalaClass.sayHello(“World”) Works Fine

But

When we try to pass sparkContext to our class it fails  like below

myContext  = _jvm.MySQLContext(sparkContext) fails with


AttributeErrorTraceback (most recent call last)

 in ()

> 1 z = _jvm.MySQLContext(sparkContext)



C:\Users\i033085\spark\python\lib\py4j-0.8.2.1-src.zip\py4j\java_gateway.py in 
__call__(self, *args)

690

691 args_command = ''.join(

--> 692 [get_command_part(arg, self._pool) for arg in new_args])

693

694 command = CONSTRUCTOR_COMMAND_NAME +\



C:\Users\i033085\spark\python\lib\py4j-0.8.2.1-src.zip\py4j\protocol.py in 
get_command_part(parameter, python_proxy_pool)

263 command_part += ';' + interface

264 else:

--> 265 command_part = REFERENCE_TYPE + parameter._get_object_id()

266

267 command_part += '\n'
attributeError: 'SparkContext' object has no attribute '_get_object_id'




And

myContext  = _jvm.MySQLContext(sparkContext._jsc) fails with


Constructor org.apache.spark.sql.MySQLContext([class 
org.apache.spark.api.java.JavaSparkContext]) does not exist





Would this be possible … or there are serialization issues and hence not 
possible.

If not what are the options we have to instantiate our own SQLContext written 
in scala from pySpark…



Best Regards,

Santosh






--
Cell : 425-233-8271
Twitter: https://twitter.com/holdenkarau
Linked In: https://www.linkedin.com/in/holdenkarau



Re: Spark Streaming graceful shutdown in Spark 1.4

2015-05-20 Thread Tathagata Das
If you are talking about handling driver crash failures, then all bets are
off anyways! Adding a shutdown hook in the hope of handling driver process
failure handles only some cases (Ctrl-C), but does not handle cases like
SIGKILL (which does not run JVM shutdown hooks) or a driver machine crash. So
it's not a good idea to rely on that.

Nonetheless I have opened a PR to handle the shutdown of the
StreamingContext in the same way as SparkContext.
https://github.com/apache/spark/pull/6307
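
A minimal sketch of what I mean by stopping in the main exit path (the app
setup here is just a placeholder):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf().setAppName("graceful-stop-example")
val ssc = new StreamingContext(conf, Seconds(1))
// ... set up DStreams here ...
ssc.start()
try {
  ssc.awaitTermination()
} finally {
  // drains the received blocks before tearing down the SparkContext
  ssc.stop(stopSparkContext = true, stopGracefully = true)
}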


On Tue, May 19, 2015 at 12:51 AM, Dibyendu Bhattacharya <
dibyendu.bhattach...@gmail.com> wrote:

> Thenka Sean . you are right. If driver program is running then I can
> handle shutdown in main exit path  . But if Driver machine is crashed (if
> you just stop the application, for example killing the driver process ),
> then Shutdownhook is the only option isn't it ? What I try to say is , just
> doing ssc.stop in  sys.ShutdownHookThread  or
>  Runtime.getRuntime().addShutdownHook ( in java) wont work anymore. I need
> to use the Utils.addShutdownHook with a priority .. So just checking if
> Spark Streaming can make graceful shutdown as default shutdown mechanism.
>
> Dibyendu
>
> On Tue, May 19, 2015 at 1:03 PM, Sean Owen  wrote:
>
>> I don't think you should rely on a shutdown hook. Ideally you try to
>> stop it in the main exit path of your program, even in case of an
>> exception.
>>
>> On Tue, May 19, 2015 at 7:59 AM, Dibyendu Bhattacharya
>>  wrote:
>> > You mean to say within Runtime.getRuntime().addShutdownHook I call
>> > ssc.stop(stopSparkContext  = true, stopGracefully  = true) ?
>> >
>> > This won't work anymore in 1.4.
>> >
>> > The SparkContext got stopped before Receiver processed all received
>> blocks
>> > and I see below exception in logs. But if I add the
>> Utils.addShutdownHook
>> > with the priority as I mentioned , then only graceful shutdown works .
>> In
>> > that case shutdown-hook run in priority order.
>> >
>>
>
>


Re: [Unit Test Failure] Test org.apache.spark.streaming.JavaAPISuite.testCount failed

2015-05-20 Thread Tathagata Das
Has this been fixed for you now? There have been a number of patches since
then, and it may have been fixed.

On Thu, May 14, 2015 at 7:20 AM, Wangfei (X)  wrote:

>  Yes it is repeatedly on my locally Jenkins.
>
> Sent from my iPhone
>
> On May 14, 2015, at 18:30, "Tathagata Das"  wrote:
>
>   Do you get this failure repeatedly?
>
>
>
> On Thu, May 14, 2015 at 12:55 AM, kf  wrote:
>
>> Hi, all, i got following error when i run unit test of spark by
>> dev/run-tests
>> on the latest "branch-1.4" branch.
>>
>> the latest commit id:
>> commit d518c0369fa412567855980c3f0f426cde5c190d
>> Author: zsxwing 
>> Date:   Wed May 13 17:58:29 2015 -0700
>>
>> error
>>
>> [info] Test org.apache.spark.streaming.JavaAPISuite.testCount started
>> [error] Test org.apache.spark.streaming.JavaAPISuite.testCount failed:
>> org.apache.spark.SparkException: Error communicating with MapOutputTracker
>> [error] at
>> org.apache.spark.MapOutputTracker.askTracker(MapOutputTracker.scala:113)
>> [error] at
>> org.apache.spark.MapOutputTracker.sendTracker(MapOutputTracker.scala:119)
>> [error] at
>> org.apache.spark.MapOutputTrackerMaster.stop(MapOutputTracker.scala:324)
>> [error] at org.apache.spark.SparkEnv.stop(SparkEnv.scala:93)
>> [error] at org.apache.spark.SparkContext.stop(SparkContext.scala:1577)
>> [error] at
>>
>> org.apache.spark.streaming.StreamingContext.stop(StreamingContext.scala:626)
>> [error] at
>>
>> org.apache.spark.streaming.StreamingContext.stop(StreamingContext.scala:597)
>> [error] at
>>
>> org.apache.spark.streaming.TestSuiteBase$class.runStreamsWithPartitions(TestSuiteBase.scala:403)
>> [error] at
>>
>> org.apache.spark.streaming.JavaTestUtils$.runStreamsWithPartitions(JavaTestUtils.scala:102)
>> [error] at
>>
>> org.apache.spark.streaming.TestSuiteBase$class.runStreams(TestSuiteBase.scala:344)
>> [error] at
>>
>> org.apache.spark.streaming.JavaTestUtils$.runStreams(JavaTestUtils.scala:102)
>> [error] at
>>
>> org.apache.spark.streaming.JavaTestBase$class.runStreams(JavaTestUtils.scala:74)
>> [error] at
>>
>> org.apache.spark.streaming.JavaTestUtils$.runStreams(JavaTestUtils.scala:102)
>> [error] at
>> org.apache.spark.streaming.JavaTestUtils.runStreams(JavaTestUtils.scala)
>> [error] at
>> org.apache.spark.streaming.JavaAPISuite.testCount(JavaAPISuite.java:103)
>> [error] ...
>> [error] Caused by: org.apache.spark.SparkException: Error sending message
>> [message = StopMapOutputTracker]
>> [error] at
>> org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:116)
>> [error] at
>> org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:78)
>> [error] at
>> org.apache.spark.MapOutputTracker.askTracker(MapOutputTracker.scala:109)
>> [error] ... 52 more
>> [error] Caused by: java.util.concurrent.TimeoutException: Futures timed
>> out
>> after [120 seconds]
>> [error] at
>> scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
>> [error] at
>> scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
>> [error] at
>> scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
>> [error] at
>>
>> scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
>> [error] at scala.concurrent.Await$.result(package.scala:107)
>> [error] at
>> org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:102)
>> [error] ... 54 more
>>
>>
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/Unit-Test-Failure-Test-org-apache-spark-streaming-JavaAPISuite-testCount-failed-tp22879.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>
>> -
>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>> For additional commands, e-mail: user-h...@spark.apache.org
>>
>>
>


Spark build with Hive

2015-05-20 Thread guoqing0...@yahoo.com.hk
Hi, can Spark 1.3.1 be built with Hive 1.2? It seems Spark 1.3.1
can only be built with Hive 0.13 or 0.12, according to the documentation.

# Apache Hadoop 2.4.X with Hive 13 support
mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -Phive -Phive-thriftserver 
-DskipTests clean package
# Apache Hadoop 2.4.X with Hive 12 support
mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -Phive -Phive-0.12.0 
-Phive-thriftserver -DskipTests clean package



guoqing0...@yahoo.com.hk


Re: Spark build with Hive

2015-05-20 Thread Ted Yu
I am afraid even Hive 1.0 is not supported, let alone Hive 1.2

Cheers

On Wed, May 20, 2015 at 8:08 PM, guoqing0...@yahoo.com.hk <
guoqing0...@yahoo.com.hk> wrote:

> Hi , is the Spark-1.3.1 can build with the Hive-1.2 ? it seem to
> Spark-1.3.1 can only build with 0.13 , 0.12 according to the document .
>
> # Apache Hadoop 2.4.X with Hive 13 support
> mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -Phive -Phive-thriftserver 
> -DskipTests clean package
> # Apache Hadoop 2.4.X with Hive 12 support
> mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -Phive -Phive-0.12.0 
> -Phive-thriftserver -DskipTests clean package
>
>
> --
> guoqing0...@yahoo.com.hk
>


RE: Spark build with Hive

2015-05-20 Thread Cheng, Hao
Yes, we ONLY support 0.12.0 and 0.13.1 currently. Hopefully we can support higher
versions in the next 1 or 2 releases.

From: Ted Yu [mailto:yuzhih...@gmail.com]
Sent: Thursday, May 21, 2015 11:12 AM
To: guoqing0...@yahoo.com.hk
Cc: user
Subject: Re: Spark build with Hive

I am afraid even Hive 1.0 is not supported, let alone Hive 1.2

Cheers

On Wed, May 20, 2015 at 8:08 PM, 
guoqing0...@yahoo.com.hk 
mailto:guoqing0...@yahoo.com.hk>> wrote:
Hi , is the Spark-1.3.1 can build with the Hive-1.2 ? it seem to Spark-1.3.1 
can only build with 0.13 , 0.12 according to the document .


# Apache Hadoop 2.4.X with Hive 13 support

mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -Phive -Phive-thriftserver 
-DskipTests clean package

# Apache Hadoop 2.4.X with Hive 12 support

mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -Phive -Phive-0.12.0 
-Phive-thriftserver -DskipTests clean package


guoqing0...@yahoo.com.hk



Re: Help needed with Py4J

2015-05-20 Thread Holden Karau
Ah sorry, I missed that part (I've been dealing with some py4j stuff today
as well and maybe skimmed it a bit too quickly). Do you have your code
somewhere I could take a look at? Also does your constructor expect a
JavaSparkContext or a regular SparkContext (if you look at how the
SQLContext is constructed in Python, it's done using a regular SparkContext,
so _jsc.sc() is used).
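
For example, a rough sketch of what I'd expect on the Scala side if
MySQLContext takes a regular SparkContext (just an assumption, since I
haven't seen your class):

package org.apache.spark.sql

import org.apache.spark.SparkContext

// accepting a plain SparkContext lets PySpark pass sparkContext._jsc.sc(),
// i.e. the Scala context underneath the JavaSparkContext wrapper
class MySQLContext(sc: SparkContext) extends SQLContext(sc) {
  // custom methods go here
}

// from Python (untested sketch):
//   java_import(sparkContext._jvm, "org.apache.spark.sql.MySQLContext")
//   myContext = sparkContext._jvm.MySQLContext(sparkContext._jsc.sc())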

On Wed, May 20, 2015 at 7:32 PM, Addanki, Santosh Kumar <
santosh.kumar.adda...@sap.com> wrote:

>  Yeah ... I am able to instantiate the simple scala class as explained
> below which is from the same JAR
>
>  Regards
> Santosh
>
>
> On May 20, 2015, at 7:26 PM, Holden Karau  wrote:
>
>  Are your jars included in both the driver and worker class paths?
>
> On Wednesday, May 20, 2015, Addanki, Santosh Kumar <
> santosh.kumar.adda...@sap.com> wrote:
>
>>  Hi Colleagues
>>
>>
>>
>> We need to call a Scala Class from pySpark in Ipython notebook.
>>
>>
>>
>> We tried something like below :
>>
>>
>>
>> from py4j.java_gateway import java_import
>>
>>
>>
>> java_import(sparkContext._jvm,'')
>>
>>
>>
>> myScalaClass =  sparkContext._jvm.SimpleScalaClass ()
>>
>>
>>
>> myScalaClass.sayHello(“World”) Works Fine
>>
>>
>>
>> But
>>
>>
>>
>> When we try to pass sparkContext to our class it fails  like below
>>
>>
>>
>> myContext  = _jvm.MySQLContext(sparkContext) fails with
>>
>>
>>
>> AttributeErrorTraceback (most recent call last)
>>
>>  in ()
>>
>> > 1 z = _jvm.MySQLContext(sparkContext)
>>
>>
>>
>> C:\Users\i033085\spark\python\lib\py4j-0.8.2.1-src.zip\py4j\java_gateway.py 
>> in __call__(self, *args)
>>
>> 690
>>
>> 691 args_command = ''.join(
>>
>> --> 692 [get_command_part(arg, self._pool) for arg in 
>> new_args])
>>
>> 693
>>
>> 694 command = CONSTRUCTOR_COMMAND_NAME +\
>>
>>
>>
>> C:\Users\i033085\spark\python\lib\py4j-0.8.2.1-src.zip\py4j\protocol.py in 
>> get_command_part(parameter, python_proxy_pool)
>>
>> 263 command_part += ';' + interface
>>
>> 264 else:
>>
>> --> 265 command_part = REFERENCE_TYPE + parameter._get_object_id()
>>
>> 266
>>
>> 267 command_part += '\n'
>>
>>  attributeError: 'SparkContext' object has no attribute '_get_object_id'
>>
>>
>>
>>
>>
>>
>>
>> And
>>
>>
>>
>> myContext  = _*jvm.MySQLContext(sparkContext.*_jsc) fails with
>>
>>
>>
>> Constructor org.apache.spark.sql.MySQLContext([class 
>> org.apache.spark.api.java.JavaSparkContext]) does not exist
>>
>>
>>
>>
>>
>> Would this be possible … or there are serialization issues and hence not 
>> possible.
>>
>> If not what are the options we have to instantiate our own SQLContext 
>> written in scala from pySpark…
>>
>>
>>
>> Best Regards,
>>
>> Santosh
>>
>>
>>
>>
>>
>>
>>
>>
>>
>
>
> --
>  Cell : 425-233-8271
> Twitter: https://twitter.com/holdenkarau
> Linked In: https://www.linkedin.com/in/holdenkarau
>
>


-- 
Cell : 425-233-8271
Twitter: https://twitter.com/holdenkarau
Linked In: https://www.linkedin.com/in/holdenkarau


Re: GradientBoostedTrees.trainRegressor with categoricalFeaturesInfo

2015-05-20 Thread Don Drake
JIRA created: https://issues.apache.org/jira/browse/SPARK-7781

Joseph, I agree, I'm debating removing this feature altogether, but I'm
putting the model through its paces.

Thanks.

-Don

On Wed, May 20, 2015 at 7:52 PM, Joseph Bradley 
wrote:

> One more comment: That's a lot of categories for a feature.  If it makes
> sense for your data, it will run faster if you can group the categories or
> split the 1895 categories into a few features which have fewer categories.
>
> On Wed, May 20, 2015 at 3:17 PM, Burak Yavuz  wrote:
>
>> Could you please open a JIRA for it? The maxBins input is missing for the
>> Python Api.
>>
>> Is it possible if you can use the current master? In the current master,
>> you should be able to use trees with the Pipeline Api and DataFrames.
>>
>> Best,
>> Burak
>>
>> On Wed, May 20, 2015 at 2:44 PM, Don Drake  wrote:
>>
>>> I'm running Spark v1.3.1 and when I run the following against my dataset:
>>>
>>> model = GradientBoostedTrees.trainRegressor(trainingData,
>>> categoricalFeaturesInfo=catFeatu
>>> res, maxDepth=6, numIterations=3)
>>>
>>> The job will fail with the following message:
>>> Traceback (most recent call last):
>>>   File "/Users/drake/fd/spark/mltest.py", line 73, in 
>>> model = GradientBoostedTrees.trainRegressor(trainingData,
>>> categoricalFeaturesInfo=catFeatures, maxDepth=6, numIterations=3)
>>>   File
>>> "/Users/drake/spark/spark-1.3.1-bin-hadoop2.6/python/pyspark/mllib/tree.py",
>>> line 553, in trainRegressor
>>> loss, numIterations, learningRate, maxDepth)
>>>   File
>>> "/Users/drake/spark/spark-1.3.1-bin-hadoop2.6/python/pyspark/mllib/tree.py",
>>> line 438, in _train
>>> loss, numIterations, learningRate, maxDepth)
>>>   File
>>> "/Users/drake/spark/spark-1.3.1-bin-hadoop2.6/python/pyspark/mllib/common.py",
>>> line 120, in callMLlibFunc
>>> return callJavaFunc(sc, api, *args)
>>>   File
>>> "/Users/drake/spark/spark-1.3.1-bin-hadoop2.6/python/pyspark/mllib/common.py",
>>> line 113, in callJavaFunc
>>> return _java2py(sc, func(*args))
>>>   File
>>> "/Users/drake/spark/spark-1.3.1-bin-hadoop2.6/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py",
>>> line 538, in __call__
>>>   File
>>> "/Users/drake/spark/spark-1.3.1-bin-hadoop2.6/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py",
>>> line 300, in get_return_value
>>> 15/05/20 16:40:12 INFO BlockManager: Removing block rdd_32_95
>>> py4j.protocol.Py4JJavaError: An error occurred while calling
>>> o69.trainGradientBoostedTreesModel.
>>> : java.lang.IllegalArgumentException: requirement failed: DecisionTree
>>> requires maxBins (= 32) >= max categories in categorical features (= 1895)
>>> at scala.Predef$.require(Predef.scala:233)
>>> at
>>> org.apache.spark.mllib.tree.impl.DecisionTreeMetadata$.buildMetadata(DecisionTreeMetadata.scala:128)
>>> at org.apache.spark.mllib.tree.RandomForest.run(RandomForest.scala:138)
>>> at org.apache.spark.mllib.tree.DecisionTree.run(DecisionTree.scala:60)
>>> at
>>> org.apache.spark.mllib.tree.GradientBoostedTrees$.org$apache$spark$mllib$tree$GradientBoostedTrees$$boost(GradientBoostedTrees.scala:150)
>>> at
>>> org.apache.spark.mllib.tree.GradientBoostedTrees.run(GradientBoostedTrees.scala:63)
>>> at
>>> org.apache.spark.mllib.tree.GradientBoostedTrees$.train(GradientBoostedTrees.scala:96)
>>> at
>>> org.apache.spark.mllib.api.python.PythonMLLibAPI.trainGradientBoostedTreesModel(PythonMLLibAPI.scala:595)
>>>
>>> So, it's complaining about the maxBins, if I provide maxBins=1900 and
>>> re-run it:
>>>
>>> model = GradientBoostedTrees.trainRegressor(trainingData,
>>> categoricalFeaturesInfo=catFeatu
>>> res, maxDepth=6, numIterations=3, maxBins=1900)
>>>
>>> Traceback (most recent call last):
>>>   File "/Users/drake/fd/spark/mltest.py", line 73, in 
>>> model = GradientBoostedTrees.trainRegressor(trainingData,
>>> categoricalFeaturesInfo=catF
>>> eatures, maxDepth=6, numIterations=3, maxBins=1900)
>>> TypeError: trainRegressor() got an unexpected keyword argument 'maxBins'
>>>
>>> It now says it knows nothing of maxBins.
>>>
>>> If I run the same command against DecisionTree or RandomForest (with
>>> maxBins=1900) it works just fine.
>>>
>>> Seems like a bug in GradientBoostedTrees.
>>>
>>> Suggestions?
>>>
>>> -Don
>>>
>>> --
>>> Donald Drake
>>> Drake Consulting
>>> http://www.drakeconsulting.com/
>>> 800-733-2143
>>>
>>
>>
>


-- 
Donald Drake
Drake Consulting
http://www.drakeconsulting.com/
http://www.MailLaunder.com/
http://www.DrudgeSiren.com/
http://plu.gd/
800-733-2143


Re: RE: Spark build with Hive

2015-05-20 Thread guoqing0...@yahoo.com.hk
Thanks very much. Which versions will be supported in the upcoming 1.4? I hope it
will support more versions.



guoqing0...@yahoo.com.hk
 
From: Cheng, Hao
Date: 2015-05-21 11:20
To: Ted Yu; guoqing0...@yahoo.com.hk
CC: user
Subject: RE: Spark build with Hive
Yes, ONLY support 0.12.0 and 0.13.1 currently. Hopefully we can support higher 
versions in next 1 or 2 releases.
 
From: Ted Yu [mailto:yuzhih...@gmail.com] 
Sent: Thursday, May 21, 2015 11:12 AM
To: guoqing0...@yahoo.com.hk
Cc: user
Subject: Re: Spark build with Hive
 
I am afraid even Hive 1.0 is not supported, let alone Hive 1.2
 
Cheers
 
On Wed, May 20, 2015 at 8:08 PM, guoqing0...@yahoo.com.hk 
 wrote:
Hi , is the Spark-1.3.1 can build with the Hive-1.2 ? it seem to Spark-1.3.1 
can only build with 0.13 , 0.12 according to the document .
 
# Apache Hadoop 2.4.X with Hive 13 support
mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -Phive -Phive-thriftserver -DskipTests clean package
# Apache Hadoop 2.4.X with Hive 12 support
mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -Phive -Phive-0.12.0 -Phive-thriftserver -DskipTests clean package
 


guoqing0...@yahoo.com.hk
 


RE: RE: Spark build with Hive

2015-05-20 Thread Wang, Daoyuan
In 1.4 I think we still only support 0.12.0 and 0.13.1.

From: guoqing0...@yahoo.com.hk [mailto:guoqing0...@yahoo.com.hk]
Sent: Thursday, May 21, 2015 12:03 PM
To: Cheng, Hao; Ted Yu
Cc: user
Subject: Re: RE: Spark build with Hive

Thanks very much , Which version will be support In the upcome 1.4 ?  I hope it 
will be support more versions.


guoqing0...@yahoo.com.hk

From: Cheng, Hao
Date: 2015-05-21 11:20
To: Ted Yu; 
guoqing0...@yahoo.com.hk
CC: user
Subject: RE: Spark build with Hive
Yes, ONLY support 0.12.0 and 0.13.1 currently. Hopefully we can support higher 
versions in next 1 or 2 releases.

From: Ted Yu [mailto:yuzhih...@gmail.com]
Sent: Thursday, May 21, 2015 11:12 AM
To: guoqing0...@yahoo.com.hk
Cc: user
Subject: Re: Spark build with Hive

I am afraid even Hive 1.0 is not supported, let alone Hive 1.2

Cheers

On Wed, May 20, 2015 at 8:08 PM, 
guoqing0...@yahoo.com.hk 
mailto:guoqing0...@yahoo.com.hk>> wrote:
Hi , is the Spark-1.3.1 can build with the Hive-1.2 ? it seem to Spark-1.3.1 
can only build with 0.13 , 0.12 according to the document .


# Apache Hadoop 2.4.X with Hive 13 support

mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -Phive -Phive-thriftserver 
-DskipTests clean package

# Apache Hadoop 2.4.X with Hive 12 support

mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -Phive -Phive-0.12.0 
-Phive-thriftserver -DskipTests clean package


guoqing0...@yahoo.com.hk



Cannot submit SparkPi to Standalone (1.3.1) running on another Server (Both Linux)

2015-05-20 Thread Carey Sublette
I am attempting to submit a job (using SparkPi) from one Linux machine
(Ubuntu 14.04) to Spark 1.3.1 running in standalone mode on another Linux
machine (Xubuntu 12.04; spartacus.servile.war), but I cannot make a
connection.

I have investigated everything I can think of to diagnose/fix the problem
but have run out of ideas.

Here are the facts;
On the Xubuntu machine I can submit SparkPi without a problem. I can also
test successfully that the master is listening on port 7077 by connecting
with Telnet.
Netstat shows:
tcp6   0  0 spartacus.servile.war:7077   [::]:*   LISTEN
Iptables is not running, it is not even installed.
I have log4j set to log in DEBUG mode to a file.

On the Ubuntu client machine I can view the Spark Master web page at port
8080:
http://spartacus:8080/
I can of course telnet to port 8080 on spartacus as well. If I try to
telnet to port 7077 I get "connection refused".

If I try to submit SparkPI on this machine like so:

./bin/spark-submit   --class org.apache.spark.examples.SparkPi   --master
spark://spartacus.servile.war:7077   --executor-memory 10G
--total-executor-cores 8
/home/carey/dev/spark-1.3.1-bin-hadoop2.6/lib/spark-examples-1.3.1-hadoop2.6.0.jar
  1

I get the following messages:
15/05/20 13:38:19 WARN AppClient$ClientActor: Could not connect to
akka.tcp://sparkmas...@spartacus.servile.war:7077:
akka.remote.InvalidAssociation: Invalid address:
akka.tcp://sparkmas...@spartacus.servile.war:7077
15/05/20 13:38:19 WARN Remoting: Tried to associate with unreachable remote
address [akka.tcp://sparkmas...@spartacus.servile.war:7077]. Address is now
gated for 5000 ms, all messages to this address will be delivered to dead
letters. Reason: Connection refused: spartacus.servile.war/
192.168.0.113:7077

Using "spartacus" or "192.168.0.113" instead of "spartacus.servile.war"
makes no difference.

Absolutely nothing shows up in the Spark log on spartacus when I try to
submit, I just see the worker heartbeat exchange.

In my hosts file on this machine I have:
192.168.0.113 spartacus.servile.war spartacus

Using the default spark-env.sh or setting:
export SPARK_MASTER_IP=spartacus.servile.war
(or just spartacus, or 192.168.0.113) makes no difference.

I have tried each combination of host ID in the submit and in the
spark-env.sh file together (3x4 = 12 combinations) with the same result
each time.

Iptables is not running on the Ubuntu machine either.

What is it I am missing?


Re: Compare LogisticRegression results using Mllib with those using other libraries (e.g. statsmodel)

2015-05-20 Thread Chris Gore
I tried running this data set as described with my own implementation of L2 
regularized logistic regression using LBFGS to compare:
https://github.com/cdgore/fitbox 

Intercept: -0.886745823033
Weights (['gre', 'gpa', 'rank']):[ 0.28862268  0.19402388 -0.36637964]
Area under ROC: 0.724056603774

The difference could be from the feature preprocessing as mentioned.  I 
normalized the features around 0:

binary_train_normalized = (binary_train - binary_train.mean()) / 
binary_train.std()
binary_test_normalized = (binary_test - binary_train.mean()) / 
binary_train.std()

On a data set this small, the difference in models could also be the result of 
how the training/test sets were split.

Have you tried running k-folds cross validation on a larger data set?
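
For example, MLUtils.kFold makes that straightforward (sketch only, reusing
the algorithm object from your snippet; data is assumed to be the full
RDD[LabeledPoint] before splitting):

import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics
import org.apache.spark.mllib.util.MLUtils

val folds = MLUtils.kFold(data, 5, 11)   // 5 folds, seed 11
val aucs = folds.map { case (train, test) =>
  val model = algorithm.run(train)
  model.clearThreshold()
  val scoreAndLabels = test.map(p => (model.predict(p.features), p.label))
  new BinaryClassificationMetrics(scoreAndLabels).areaUnderROC()
}
println(s"Mean AUC over ${aucs.length} folds: ${aucs.sum / aucs.length}")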

Chris

> On May 20, 2015, at 6:15 PM, DB Tsai  wrote:
> 
> Hi Xin,
> 
> If you take a look at the model you trained, the intercept from Spark
> is significantly smaller than StatsModel, and the intercept represents
> a prior on categories in LOR which causes the low accuracy in Spark
> implementation. In LogisticRegressionWithLBFGS, the intercept is
> regularized due to the implementation of Updater, and the intercept
> should not be regularized.
> 
> In the new pipleline APIs, a LOR with elasticNet is implemented, and
> the intercept is properly handled.
> https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala
> 
> As you can see the tests,
> https://github.com/apache/spark/blob/master/mllib/src/test/scala/org/apache/spark/ml/classification/LogisticRegressionSuite.scala
> the result is exactly the same as R now.
> 
> BTW, in both version, the feature scalings are done before training,
> and we train the model in scaled space but transform the model weights
> back to original space. The only difference is in the mllib version,
> LogisticRegressionWithLBFGS regularizes the intercept while in the ml
> version, the intercept is excluded from regularization. As a result,
> if lambda is zero, the model should be the same.
> 
> 
> 
> On Wed, May 20, 2015 at 3:42 PM, Xin Liu  wrote:
>> Hi,
>> 
>> I have tried a few models in Mllib to train a LogisticRegression model.
>> However, I consistently get much better results using other libraries such
>> as statsmodel (which gives similar results as R) in terms of AUC. For
>> illustration purpose, I used a small data (I have tried much bigger data)
>> http://www.ats.ucla.edu/stat/data/binary.csv in
>> http://www.ats.ucla.edu/stat/r/dae/logit.htm
>> 
>> Here is the snippet of my usage of LogisticRegressionWithLBFGS.
>> 
>> val algorithm = new LogisticRegressionWithLBFGS
>> algorithm.setIntercept(true)
>> algorithm.optimizer
>>   .setNumIterations(100)
>>   .setRegParam(0.01)
>>   .setConvergenceTol(1e-5)
>> val model = algorithm.run(training)
>> model.clearThreshold()
>> val scoreAndLabels = test.map { point =>
>>   val score = model.predict(point.features)
>>   (score, point.label)
>> }
>> val metrics = new BinaryClassificationMetrics(scoreAndLabels)
>> val auROC = metrics.areaUnderROC()
>> 
>> I did a (0.6, 0.4) split for training/test. The response is "admit" and
>> features are "GRE score", "GPA", and "college Rank".
>> 
>> Spark:
>> Weights (GRE, GPA, Rank):
>> [0.0011576276331509304,0.048544858567336854,-0.394202150286076]
>> Intercept: -0.6488972641282202
>> Area under ROC: 0.6294070512820512
>> 
>> StatsModel:
>> Weights [0.0018, 0.7220, -0.3148]
>> Intercept: -3.5913
>> Area under ROC: 0.69
>> 
>> The weights from statsmodel seems more reasonable if you consider for a one
>> unit increase in gpa, the log odds of being admitted to graduate school
>> increases by 0.72 in statsmodel than 0.04 in Spark.
>> 
>> I have seen much bigger difference with other data. So my question is has
>> anyone compared the results with other libraries and is anything wrong with
>> my code to invoke LogisticRegressionWithLBFGS?
>> 
>> As the real data I am processing is pretty big and really want to use Spark
>> to get this to work. Please let me know if you have similar experience and
>> how you resolve it.
>> 
>> Thanks,
>> Xin
> 
> -
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
> 



Re: Spark Streaming graceful shutdown in Spark 1.4

2015-05-20 Thread Dibyendu Bhattacharya
Thanks Tathagata for making this change..

Dibyendu

On Thu, May 21, 2015 at 8:24 AM, Tathagata Das  wrote:

> If you are talking about handling driver crash failures, then all bets are
> off anyways! Adding a shutdown hook in the hope of handling driver process
> failure, handles only a some cases (Ctrl-C), but does not handle cases like
> SIGKILL (does not run JVM shutdown hooks) or driver machine crash. So its
> not a good idea to rely on that.
>
> Nonetheless I have opened a PR to handle the shutdown of the
> StreamigntContext in the same way as SparkContext.
> https://github.com/apache/spark/pull/6307
>
>
> On Tue, May 19, 2015 at 12:51 AM, Dibyendu Bhattacharya <
> dibyendu.bhattach...@gmail.com> wrote:
>
>> Thenka Sean . you are right. If driver program is running then I can
>> handle shutdown in main exit path  . But if Driver machine is crashed (if
>> you just stop the application, for example killing the driver process ),
>> then Shutdownhook is the only option isn't it ? What I try to say is , just
>> doing ssc.stop in  sys.ShutdownHookThread  or
>>  Runtime.getRuntime().addShutdownHook ( in java) wont work anymore. I need
>> to use the Utils.addShutdownHook with a priority .. So just checking if
>> Spark Streaming can make graceful shutdown as default shutdown mechanism.
>>
>> Dibyendu
>>
>> On Tue, May 19, 2015 at 1:03 PM, Sean Owen  wrote:
>>
>>> I don't think you should rely on a shutdown hook. Ideally you try to
>>> stop it in the main exit path of your program, even in case of an
>>> exception.
>>>
>>> On Tue, May 19, 2015 at 7:59 AM, Dibyendu Bhattacharya
>>>  wrote:
>>> > You mean to say within Runtime.getRuntime().addShutdownHook I call
>>> > ssc.stop(stopSparkContext  = true, stopGracefully  = true) ?
>>> >
>>> > This won't work anymore in 1.4.
>>> >
>>> > The SparkContext got stopped before Receiver processed all received
>>> blocks
>>> > and I see below exception in logs. But if I add the
>>> Utils.addShutdownHook
>>> > with the priority as I mentioned , then only graceful shutdown works .
>>> In
>>> > that case shutdown-hook run in priority order.
>>> >
>>>
>>
>>
>


Storing spark processed output to Database asynchronously.

2015-05-20 Thread Gautam Bajaj
Hi,

From my understanding of Spark Streaming, I created a Spark entry point,
for continuous UDP data, using:

SparkConf conf = new SparkConf().setMaster("local[2]").setAppName("NetworkWordCount");
JavaStreamingContext jssc = new JavaStreamingContext(conf, new Duration(1));
JavaReceiverInputDStream lines = jssc.receiverStream(new CustomReceiver(8060));

Now, when I process this input stream using:

JavaDStream hash = lines.flatMap()
JavaPairDStream tuple = hash.mapToPair()
JavaPairDStream output = tuple.reduceByKey()
output.foreachRDD(
new
Function2>,Time,Void>(){
@Override
public Void call(
JavaPairRDD> arg0,
Time arg1) throws Exception {
// TODO Auto-generated method stub
new AsyncRDDActions(arg0.rdd(), null);
arg0.foreachPartition(
new
VoidFunction>>>(){

@Override
public void call(
Iterator>> arg0)
throws Exception {

// TODO Auto-generated method stub
GraphDatabaseService graphDb =
new 
GraphDatabaseFactory().newEmbeddedDatabaseBuilder("/dev/shm/Advertisement/data/")

.setConfig("remote_shell_enabled", "true")
.newGraphDatabase();

try (Transaction tx =
graphDb.beginTx()) {
while (arg0.hasNext()) {
Tuple2 < String,
ArrayList < String >> tuple = arg0.next();
Node
HMac=Neo4jOperations.getHMacFromValue(graphDb, tuple._1);
boolean oldHMac=false;
if (HMac!= null){

System.out.println("Alread in Database:" + tuple._1);
oldHMac=true;
}
else

HMac=Neo4jOperations.createHMac(graphDb, tuple._1);

ArrayList
zipcodes=tuple._2;
for(String zipcode : zipcodes){
Node
Zipcode=Neo4jOperations.getZipcodeFromValue(graphDb, zipcode);
if(Zipcode!=null){

System.out.println("Already in Database:" + zipcode);

if(oldHMac==true && Neo4jOperations.getRelationshipBetween(HMac,
Zipcode)!=null)

Neo4jOperations.updateToCurrentTime(HMac, Zipcode);
else

Neo4jOperations.travelTo(HMac, Zipcode);
}
else{

Zipcode=Neo4jOperations.createZipcode(graphDb, zipcode);

Neo4jOperations.travelTo(HMac, Zipcode);
}
}
}
tx.success();
}
graphDb.shutdown();
}
});
return null;
}
});

The code in output.foreachRDD pushes the output of Spark into the Neo4j
database, checking for duplicate values.

This part of the code is very time consuming, so my processing time
exceeds the batch interval. Because of that, it *results in data loss*. So I
was thinking of pushing the output into the database asynchronously.
I found AsyncRDDActions(
https://spark.apache.org/docs/1.1.1/api/java/org/apache/spark/rdd/AsyncRDDActions.html)
for this purpose, but cannot find a working example of it in Java,
especially for the function foreachPartitionAsync, inside which we have to use
"Function1".

Any help is appreciated.

Thanks,
Gautam


View all user's application logs in history server

2015-05-20 Thread Jianshi Huang
Hi,

I'm using Spark 1.4.0-rc1 and I'm using default settings for history server.

But I can only see my own logs. Is it possible to view all user's logs? The
permission is fine for the user group.

-- 
Jianshi Huang

LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/


Unable to use hive queries with constants in predicates

2015-05-20 Thread Devarajan Srinivasan
Hi,

I was testing Spark reading data from Hive using HiveContext. I got the
following error when I used a simple query with constants in predicates.

I am using Spark 1.3. Has anyone encountered an error like this?


Error:


Exception in thread "main" org.apache.spark.sql.AnalysisException:
Unsupported language features in query: SELECT * from test_table where
daily_partition='20150101'
TOK_QUERY 1, 0,20, 81
  TOK_FROM 1, 10,14, 81
TOK_TABREF 1, 12,14, 81
  TOK_TABNAME 1, 12,14, 81
everest_marts_test 1, 12,12, 81
voice_cdr 1, 14,14, 100
  TOK_INSERT 0, -1,-1, 0
TOK_DESTINATION 0, -1,-1, 0
  TOK_DIR 0, -1,-1, 0
TOK_TMP_FILE 0, -1,-1, 0
TOK_SELECT 1, 0,8, 7
  TOK_SELEXPR 1, 2,2, 7
TOK_TABLE_OR_COL 1, 2,2, 7
  callingpartynumber 1, 2,2, 7
  TOK_SELEXPR 1, 4,4, 26
TOK_TABLE_OR_COL 1, 4,4, 26
  calledpartynumber 1, 4,4, 26
  TOK_SELEXPR 1, 6,6, 44
TOK_TABLE_OR_COL 1, 6,6, 44
  chargingtime 1, 6,6, 44
  TOK_SELEXPR 1, 8,8, 57
TOK_TABLE_OR_COL 1, 8,8, 57
  call_direction_key 1, 8,8, 57
TOK_WHERE 1, 16,20, 131
  = 1, 18,20, 131
TOK_TABLE_OR_COL 1, 18,18, 116
  daily_partition 1, 18,18, 116
'20150101' 1, 20,20, 132

scala.NotImplementedError: No parse rules for ASTNode type: 294, text:
'20150101' :
'20150101' 1, 20,20, 132
" +
org.apache.spark.sql.hive.HiveQl$.nodeToExpr(HiveQl.scala:1261)
  ;
at org.apache.spark.sql.hive.HiveQl$.createPlan(HiveQl.scala:261)
at org.apache.spark.sql.hive.ExtendedHiveQlParser$$anonfun$
hiveQl$1.apply(ExtendedHiveQlParser.scala:41)
at org.apache.spark.sql.hive.ExtendedHiveQlParser$$anonfun$
hiveQl$1.apply(ExtendedHiveQlParser.scala:40)
at scala.util.parsing.combinator.Parsers$Success.map(Parsers.scala:136)
at scala.util.parsing.combinator.Parsers$Success.map(Parsers.scala:135)
at scala.util.parsing.combinator.Parsers$Parser$$anonfun$map$1.
apply(Parsers.scala:242)
at scala.util.parsing.combinator.Parsers$Parser$$anonfun$map$1.
apply(Parsers.scala:242)
at scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.
scala:222)
at scala.util.parsing.combinator.Parsers$Parser$$anonfun$
append$1$$anonfun$apply$2.apply(Parsers.scala:254)
at scala.util.parsing.combinator.Parsers$Parser$$anonfun$
append$1$$anonfun$apply$2.apply(Parsers.scala:254)
at scala.util.parsing.combinator.Parsers$Failure.append(
Parsers.scala:202)
at scala.util.parsing.combinator.Parsers$Parser$$anonfun$
append$1.apply(Parsers.scala:254)
at scala.util.parsing.combinator.Parsers$Parser$$anonfun$
append$1.apply(Parsers.scala:254)
at scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.
scala:222)
at scala.util.parsing.combinator.Parsers$$anon$2$$anonfun$
apply$14.apply(Parsers.scala:891)
at scala.util.parsing.combinator.Parsers$$anon$2$$anonfun$
apply$14.apply(Parsers.scala:891)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
at scala.util.parsing.combinator.Parsers$$anon$2.apply(Parsers.
scala:890)
at scala.util.parsing.combinator.PackratParsers$$anon$1.apply(
PackratParsers.scala:110)
at org.apache.spark.sql.catalyst.AbstractSparkSQLParser.apply(
AbstractSparkSQLParser.scala:38)
at org.apache.spark.sql.hive.HiveQl$$anonfun$3.apply(HiveQl.scala:138)
at org.apache.spark.sql.hive.HiveQl$$anonfun$3.apply(HiveQl.scala:138)
at org.apache.spark.sql.SparkSQLParser$$anonfun$org$apache$spark$sql$
SparkSQLParser$$others$1.apply(SparkSQLParser.scala:96)
at org.apache.spark.sql.SparkSQLParser$$anonfun$org$apache$spark$sql$
SparkSQLParser$$others$1.apply(SparkSQLParser.scala:95)
at scala.util.parsing.combinator.Parsers$Success.map(Parsers.scala:136)
at scala.util.parsing.combinator.Parsers$Success.map(Parsers.scala:135)
at scala.util.parsing.combinator.Parsers$Parser$$anonfun$map$1.
apply(Parsers.scala:242)
at scala.util.parsing.combinator.Parsers$Parser$$anonfun$map$1.
apply(Parsers.scala:242)
at scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.
scala:222)
at scala.util.parsing.combinator.Parsers$Parser$$anonfun$
append$1$$anonfun$apply$2.apply(Parsers.scala:254)
at scala.util.parsing.combinator.Parsers$Parser$$anonfun$
append$1$$anonfun$apply$2.apply(Parsers.scala:254)
at scala.util.parsing.combinator.Parsers$Failure.append(
Parsers.scala:202)
at scala.util.parsing.combinator.Parsers$Parser$$anonfun$
append$1.apply(Parsers.scala:254)
at scala.util.parsing.combinator.Parsers$Parser$$anonfun$
append$1.apply(Parsers.scala:254)
at scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.
scala:222)
at scala.util.parsing.combinator.Parsers$$anon$2$$anonfun$
apply$14.apply(Parsers.scala:891)
at scala.util.parsing.combinator.Parsers$$anon$2$$anonfun$
apply$14.apply(Parsers.scala:891)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
at scala.util.parsing.combinat

Re: FP Growth saveAsTextFile

2015-05-20 Thread Xiangrui Meng
+user

If this was in cluster mode, you should provide a path on a shared file
system, e.g., HDFS, instead of a local path. If this is in local mode, I'm
not sure what went wrong.
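
For example (the HDFS path below is just an illustration):

model.freqItemsets.saveAsTextFile("hdfs:///user/etanner/fpgrowth/freqItemsets")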

On Wed, May 20, 2015 at 2:09 PM, Eric Tanner 
wrote:

> Here is the stack trace. Thanks for looking at this.
>
> scala>
> model.freqItemsets.saveAsTextFile("c:///repository/trunk/Scala_210_wspace/fpGrowth/modelText1")
> 15/05/20 14:07:47 INFO SparkContext: Starting job: saveAsTextFile at
> :33
> 15/05/20 14:07:47 INFO DAGScheduler: Got job 15 (saveAsTextFile at
> :33) with 2 output partitions (allowLocal=false)
> 15/05/20 14:07:47 INFO DAGScheduler: Final stage: Stage 30(saveAsTextFile
> at :33)
> 15/05/20 14:07:47 INFO DAGScheduler: Parents of final stage: List(Stage 29)
> 15/05/20 14:07:47 INFO DAGScheduler: Missing parents: List()
> 15/05/20 14:07:47 INFO DAGScheduler: Submitting Stage 30
> (MapPartitionsRDD[21] at saveAsTextFile at :33), which has no
> missing parents
> 15/05/20 14:07:47 INFO MemoryStore: ensureFreeSpace(131288) called with
> curMem=724428, maxMem=278302556
> 15/05/20 14:07:47 INFO MemoryStore: Block broadcast_18 stored as values in
> memory (estimated size 128.2 KB, free 264.6 MB)
> 15/05/20 14:07:47 INFO MemoryStore: ensureFreeSpace(78995) called with
> curMem=855716, maxMem=278302556
> 15/05/20 14:07:47 INFO MemoryStore: Block broadcast_18_piece0 stored as
> bytes in memory (estimated size 77.1 KB, free 264.5 MB)
> 15/05/20 14:07:47 INFO BlockManagerInfo: Added broadcast_18_piece0 in
> memory on localhost:52396 (size: 77.1 KB, free: 265.1 MB)
> 15/05/20 14:07:47 INFO BlockManagerMaster: Updated info of block
> broadcast_18_piece0
> 15/05/20 14:07:47 INFO SparkContext: Created broadcast 18 from broadcast
> at DAGScheduler.scala:839
> 15/05/20 14:07:47 INFO DAGScheduler: Submitting 2 missing tasks from Stage
> 30 (MapPartitionsRDD[21] at saveAsTextFile at :33)
> 15/05/20 14:07:47 INFO TaskSchedulerImpl: Adding task set 30.0 with 2 tasks
> 15/05/20 14:07:47 INFO BlockManager: Removing broadcast 17
> 15/05/20 14:07:47 INFO TaskSetManager: Starting task 0.0 in stage 30.0
> (TID 33, localhost, PROCESS_LOCAL, 1056 bytes)
> 15/05/20 14:07:47 INFO BlockManager: Removing block broadcast_17_piece0
> 15/05/20 14:07:47 INFO MemoryStore: Block broadcast_17_piece0 of size 4737
> dropped from memory (free 277372582)
> 15/05/20 14:07:47 INFO TaskSetManager: Starting task 1.0 in stage 30.0
> (TID 34, localhost, PROCESS_LOCAL, 1056 bytes)
> 15/05/20 14:07:47 INFO BlockManagerInfo: Removed broadcast_17_piece0 on
> localhost:52396 in memory (size: 4.6 KB, free: 265.1 MB)
> 15/05/20 14:07:47 INFO Executor: Running task 1.0 in stage 30.0 (TID 34)
> 15/05/20 14:07:47 INFO Executor: Running task 0.0 in stage 30.0 (TID 33)
> 15/05/20 14:07:47 INFO BlockManagerMaster: Updated info of block
> broadcast_17_piece0
> 15/05/20 14:07:47 INFO BlockManager: Removing block broadcast_17
> 15/05/20 14:07:47 INFO MemoryStore: Block broadcast_17 of size 6696
> dropped from memory (free 277379278)
> 15/05/20 14:07:47 INFO ContextCleaner: Cleaned broadcast 17
> 15/05/20 14:07:47 INFO BlockManager: Removing broadcast 16
> 15/05/20 14:07:47 INFO BlockManager: Removing block broadcast_16_piece0
> 15/05/20 14:07:47 INFO MemoryStore: Block broadcast_16_piece0 of size 4737
> dropped from memory (free 277384015)
> 15/05/20 14:07:47 INFO BlockManagerInfo: Removed broadcast_16_piece0 on
> localhost:52396 in memory (size: 4.6 KB, free: 265.1 MB)
> 15/05/20 14:07:47 INFO BlockManagerMaster: Updated info of block
> broadcast_16_piece0
> 15/05/20 14:07:47 INFO BlockManager: Removing block broadcast_16
> 15/05/20 14:07:47 INFO MemoryStore: Block broadcast_16 of size 6696
> dropped from memory (free 277390711)
> 15/05/20 14:07:47 INFO ContextCleaner: Cleaned broadcast 16
> 15/05/20 14:07:47 INFO ShuffleBlockFetcherIterator: Getting 2 non-empty
> blocks out of 2 blocks
> 15/05/20 14:07:47 INFO ShuffleBlockFetcherIterator: Started 0 remote
> fetches in 1 ms
> 15/05/20 14:07:47 INFO ShuffleBlockFetcherIterator: Getting 2 non-empty
> blocks out of 2 blocks
> 15/05/20 14:07:47 INFO ShuffleBlockFetcherIterator: Started 0 remote
> fetches in 0 ms
> 15/05/20 14:07:47 ERROR Executor: Exception in task 1.0 in stage 30.0 (TID
> 34)
> java.lang.NullPointerException
> at java.lang.ProcessBuilder.start(ProcessBuilder.java:1010)
> at org.apache.hadoop.util.Shell.runCommand(Shell.java:482)
> at org.apache.hadoop.util.Shell.run(Shell.java:455)
> at
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)
> at org.apache.hadoop.util.Shell.execCommand(Shell.java:808)
> at org.apache.hadoop.util.Shell.execCommand(Shell.java:791)
> at
> org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:656)
> at
> org.apache.hadoop.fs.FilterFileSystem.setPermission(FilterFileSystem.java:490)
> at
> org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:462)
>  

Re: How to process data in chronological order

2015-05-20 Thread Sonal Goyal
Would partitioning your data based on the key and then running
mapPartitions help?
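
Something along these lines, as a rough sketch (assumes an RDD keyed by the
timestamp; the string concatenation stands in for your real aggregation):

import org.apache.spark.RangePartitioner
import org.apache.spark.rdd.RDD

val events: RDD[(Long, String)] = ???              // your timestamp-keyed data
val partitioned = events.repartitionAndSortWithinPartitions(
  new RangePartitioner(8, events))                 // contiguous time ranges per partition
val perSlice = partitioned.mapPartitions { iter =>
  Iterator(iter.map(_._2).reduce(_ + _))           // reduce one contiguous slice in order
}
// partitions come back in key order, so the final merge is chronological
val result = perSlice.collect().reduce(_ + _)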

Best Regards,
Sonal
Founder, Nube Technologies 





On Thu, May 21, 2015 at 4:33 AM, roy  wrote:

> I have a key-value RDD, key is a timestamp (femto-second resolution, so
> grouping buys me nothing) and I want to reduce it in the chronological
> order.
>
> How do I do that in spark?
>
> I am fine with reducing contiguous sections of the set separately and then
> aggregating the resulting objects locally.
>
> Thanks
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/How-to-process-data-in-chronological-order-tp22966.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> -
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>
>


Re: rdd.saveAsTextFile problem

2015-05-20 Thread Keerthi
Hi ,

I tried the workaround shared here, but I am still facing the same issue...

Thanks.



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/rdd-saveAsTextFile-problem-tp176p22970.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: rdd.saveAsTextFile problem

2015-05-20 Thread Akhil Das
This thread is from a year back. Can you please share what issue you are
facing? Which version of Spark are you using? What is your system
environment? Do you have the exception stack trace?

Thanks
Best Regards

On Thu, May 21, 2015 at 12:19 PM, Keerthi 
wrote:

> Hi ,
>
> I had tried the workaround shared here, but still facing the same issue...
>
> Thanks.
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/rdd-saveAsTextFile-problem-tp176p22970.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> -
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>
>