Just out of curiosity, what would happen if you put your 10K values into a
temp table and then did a join against it?
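To make the suggestion concrete, here is a minimal sketch of the temp-table-plus-join pattern. It uses Python's built-in sqlite3 rather than Spark so that it runs anywhere; the table names (`events`, `wanted`), the column names and the helper functions are all hypothetical, and how Spark would plan the equivalent join is not shown here.

```python
import sqlite3


def query_with_temp_join(conn, wanted_ids):
    """Select rows of the (hypothetical) `events` table whose id is in
    wanted_ids by joining against a temp table instead of a long IN list."""
    cur = conn.cursor()
    # Load the lookup values into a temporary table.
    cur.execute("CREATE TEMP TABLE wanted (id INTEGER PRIMARY KEY)")
    cur.executemany(
        "INSERT OR IGNORE INTO wanted (id) VALUES (?)",
        ((i,) for i in wanted_ids),
    )
    # Join against the temp table rather than writing IN (v1, v2, ..., v10000).
    rows = cur.execute(
        "SELECT e.id, e.payload "
        "FROM events AS e JOIN wanted AS w ON e.id = w.id "
        "ORDER BY e.id"
    ).fetchall()
    cur.execute("DROP TABLE wanted")
    return rows


def demo():
    """Build a small in-memory table and look up 10,000 ids via the join."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
    conn.executemany(
        "INSERT INTO events VALUES (?, ?)",
        ((i, f"row-{i}") for i in range(50_000)),
    )
    wanted = list(range(0, 20_000, 2))  # 10,000 even ids
    return query_with_temp_join(conn, wanted)
```

In Spark the analogous step would be to load the values into a DataFrame or temporary view and join on the key; whether that beats a large IN list depends on the version and the query plan, so it would need to be measured.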
> On Apr 5, 2017, at 4:30 PM, Maciej Bryński wrote:
>
> Hi,
> I'm trying to run queries with many values in IN operator.
>
> The result is that for more
Guys… please take what I say with a grain of salt…
The issue is that the input is a stream of messages where they are addressed in
a LIFO manner. This means that messages may be ignored. The stream of data
(user@spark for example) is semi-structured in that the stream contains a lot
of
Hi,
Apologies if I’ve asked this question before but I didn’t see it in the list
and I’m certain that my last surviving brain cell has gone on strike over my
attempt to reduce my caffeine intake…
Posting this to both user and dev because I think the question / topic falls
into both camps.
Silly question?
When you talk about ‘user specified schema’ do you mean for the user to supply
an additional schema, or that you’re using the schema that’s described by the
JSON string?
(or both? [either/or] )
Thx
On Sep 28, 2016, at 12:52 PM, Michael Armbrust
Hi,
There are a lot of moving parts and a lot of unknowns from your description.
Besides the version stuff.
How many executors, how many cores? How much memory?
Are you persisting (memory and disk) or just caching (memory)?
During the execution… same tables… are you seeing a lot of
> http://talebzadehmich.wordpress.com
>
>
> On 30 May 2016 at 17:08, Michael Segel <msegel_had...@hotmail.com> wrote:
I’m not sure where to post this since it’s a bit of a philosophical question in
terms of design and vision for Spark.
If we look at SparkSQL and performance… where does secondary indexing fit in?
The reason this is a bit awkward is that if you view Spark as querying RDDs
which are temporary,
Hi,
I saw a replay of a talk about what’s coming in Spark 2.0 and the speed
improvements…
I am curious about indexing of data sets.
In HBase/MapRDB you can create ordered sets of indexes through an inverted
table.
Here, you can take the intersection of the indexes to find the result set of
>> On Wed, Mar 30, 2016 at 4:33 AM, Steve Loughran <ste...@hortonworks.com>
>> wrote:
>>>
>>>> On 29 Mar 2016, at 22:19, Michael Segel <msegel_had...@hotmail.com> wrote:
>>>>
>>>> Hi,
>>>>
>>>> So yeah, I kn
Hi,
So yeah, I know that Spark jobs running on a Hadoop cluster will inherit their
security from the underlying YARN job.
However… that’s not really saying much when you think about some use cases.
Like using the thrift service …
I’m wondering what else is new and what people have been
Hi,
I’m looking at the online docs for building Spark 1.4.1 …
http://spark.apache.org/docs/latest/building-spark.html
I was interested in building Spark for Scala 2.11 (latest Scala) and also for
Hive and JDBC support.
The docs say: