Lucene Index with Spark Cassandra

2017-12-17 Thread Junaid Nasir
Hi everyone, I am trying to run Lucene with Spark, but Spark SQL returns zero results, whereas the same query run through cqlsh returns the correct rows. Same issue as https://github.com/Stratio/cassandra-lucene-index/issues/79. I can see in the Spark logs that Lucene is working, but as mentioned in the
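
(A minimal sketch of one commonly suggested workaround, not necessarily the thread's resolution: since the Stratio expr() predicate is not pushed down through Spark SQL, the Lucene search can be passed as a raw CQL WHERE clause via the connector's RDD API. The host, keyspace, table, index name and query JSON below are hypothetical.)

    import org.apache.spark.{SparkConf, SparkContext}
    import com.datastax.spark.connector._

    val conf = new SparkConf()
      .setAppName("lucene-where-clause")
      .set("spark.cassandra.connection.host", "127.0.0.1") // hypothetical host
    val sc = new SparkContext(conf)

    // Pass the Lucene expr() search straight through as a CQL WHERE clause;
    // Spark SQL would otherwise ignore it because it cannot push the predicate down.
    val rows = sc.cassandraTable("my_keyspace", "my_table") // hypothetical names
      .where("expr(my_lucene_index, '{filter: {type: \"match\", field: \"name\", value: \"spark\"}}')")

    rows.collect().foreach(println)
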

Iterate over grouped df to create new rows/df

2017-07-07 Thread Junaid Nasir
Hi everyone, I am kind of stuck on a problem and was hoping for some pointers or help :) I have tried different things but couldn't achieve the desired results. I want to *create a single row from multiple rows if those rows are continuous* (based on time, i.e. if the next row's time is within 2 minutes
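
(A minimal sketch of one way to do this with window functions, assuming a DataFrame with hypothetical columns id, ts and value: mark a new group whenever the gap to the previous row exceeds 2 minutes, take a cumulative sum of those markers as a group id, then aggregate each group into a single row.)

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.expressions.Window
    import org.apache.spark.sql.functions._

    val spark = SparkSession.builder.appName("merge-continuous-rows").getOrCreate()
    import spark.implicits._

    // Hypothetical sample data: (id, ts, value)
    val df = Seq(
      ("a", "2017-07-07 10:00:00", 1.0),
      ("a", "2017-07-07 10:01:30", 2.0),  // within 2 minutes of the previous row
      ("a", "2017-07-07 10:10:00", 3.0)   // gap > 2 minutes, starts a new group
    ).toDF("id", "ts", "value")
      .withColumn("ts", unix_timestamp($"ts").cast("timestamp"))

    val byId = Window.partitionBy("id").orderBy("ts")

    val grouped = df
      .withColumn("prev_ts", lag($"ts", 1).over(byId))
      // 1 marks the start of a new run of continuous rows, 0 continues the current run
      .withColumn("new_group",
        when($"prev_ts".isNull ||
             unix_timestamp($"ts") - unix_timestamp($"prev_ts") > 120, 1).otherwise(0))
      // cumulative sum of the markers gives each continuous run its own id
      .withColumn("group_id",
        sum($"new_group").over(byId.rowsBetween(Window.unboundedPreceding, Window.currentRow)))

    // collapse each run into a single row
    val merged = grouped.groupBy("id", "group_id")
      .agg(min("ts").as("start_ts"), max("ts").as("end_ts"), sum("value").as("total_value"))

    merged.show()
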

Re: spark cluster performance decreases by adding more nodes

2017-05-18 Thread Junaid Nasir
be group by, which under certain circumstances can cause a lot of traffic to one node. This transfer is of course smaller the fewer nodes you have. Have you checked in the UI what it reports? > On 17. May 2017, at 17:13, Junaid Nasir <jna...@an10.io> wrote: > I
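
(A small sketch of how to see the shuffle this reply is describing, assuming a hypothetical DataFrame df with a device_id column and an existing SparkSession spark: the Exchange step in the physical plan is the network transfer, and a skewed grouping key funnels most of it to one node.)

    // The Exchange node in the physical plan is the shuffle that moves all rows
    // sharing a key to the same executor; a hot key means one node receives
    // most of the traffic.
    df.groupBy("device_id").count().explain()

    // More shuffle partitions spread the reducers out, but cannot fix key skew.
    spark.conf.set("spark.sql.shuffle.partitions", "200")
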

spark cluster performance decreases by adding more nodes

2017-05-17 Thread Junaid Nasir
I have a large data set of 1B records and want to run analytics using Apache Spark because of the scaling it provides, but I am seeing an anti-pattern here: the more nodes I add to the Spark cluster, the longer completion takes. The data store is Cassandra, and queries are run through Zeppelin. I have tried many
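
(A minimal sketch of one thing worth ruling out in this kind of setup, with hypothetical host, keyspace, table and column names: if filters are not pushed down to Cassandra, every query drags the full 1B-row table over the network and extra Spark nodes mostly add transfer; explain() shows what actually reaches Cassandra.)

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder
      .appName("cassandra-pushdown-check")
      .config("spark.cassandra.connection.host", "10.0.0.1") // hypothetical host
      .getOrCreate()
    import spark.implicits._

    // Hypothetical keyspace and table
    val df = spark.read
      .format("org.apache.spark.sql.cassandra")
      .options(Map("keyspace" -> "metrics", "table" -> "readings"))
      .load()

    val filtered = df.filter($"day" === "2017-05-17")

    // Look for PushedFilters in the plan; if the filter is missing there,
    // Spark is scanning the whole table and only filtering afterwards.
    filtered.explain()
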