sliding Top N window

2016-03-11 Thread Yakubovich, Alexey
Good day,

I have the following task: a stream of “page views” coming into a Kafka topic. Each 
view contains the list of product IDs from the visited page. The task: to maintain 
the Top N products in “real time”.

I am interested in a solution that requires a minimum of intermediate writes… So I 
need to build a sliding window for the top N products, where the product counters 
change dynamically and the window presents the top products for the specified 
period of time.

I believe there is no way to avoid maintaining all the product counters in 
memory/storage. But at least I would like to do all the logic, all the calculation, 
on the fly, in memory, without spilling multiple RDDs from memory to disk.

So I believe I see one way of doing it (a rough sketch follows below):
  1. Take each msg from Kafka and line up the elementary actions (increase by 1 the 
counter for the product PID).
  2. Implement each action as a call to HTable.increment()  // or, easier, 
incrementColumnValue()…
  3. After each increment, apply my own “offer” operation, which guarantees that 
only the top N products with their counters are kept in another HBase table (also 
with atomic operations).
 But there is another stream of events: decreasing product counters when a view 
expires past the length of the sliding window….
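
Very roughly, this is the kind of code I have in mind for the increment part 
(Spark Streaming + HBase, Scala). It is only a sketch under my own assumptions: 
the table name "product_counters", the column family/qualifier "c"/"views", and a 
DStream that already carries the list of product IDs per page view are all made 
up, and the "offer"/expiry logic is left out:

import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.HTable
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.streaming.dstream.DStream

object CounterUpdates {
  // views: one record per page view, holding the product IDs from that page.
  // (How the DStream is created from Kafka is omitted here.)
  def incrementCounters(views: DStream[Seq[String]]): Unit = {
    views.foreachRDD { rdd =>
      rdd.foreachPartition { partition =>
        // One HTable per partition rather than per record; table and column
        // names ("product_counters", "c", "views") are placeholders.
        val table = new HTable(HBaseConfiguration.create(), "product_counters")
        try {
          partition.foreach { productIds =>
            productIds.foreach { pid =>
              // Elementary action: atomic +1 for this product's counter.
              table.incrementColumnValue(
                Bytes.toBytes(pid), Bytes.toBytes("c"), Bytes.toBytes("views"), 1L)
              // The "offer" into the top-N table and the expiry decrements
              // would go here; they are not sketched.
            }
          }
        } finally {
          table.close()
        }
      }
    }
  }
}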

So my question: does anybody know of, or have and can share, a piece of code / 
know-how for implementing a “sliding Top N window” in a better way?
If nothing is offered, I will share what I end up doing myself.
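
For reference, here is the pure Spark Streaming variant I am also considering, 
where reduceByKeyAndWindow with an inverse function makes the framework itself 
subtract the counts that fall out of the window. The window/slide durations and 
the assumption that each record is already a list of product IDs are just 
placeholders:

import org.apache.spark.streaming.Seconds
import org.apache.spark.streaming.StreamingContext._  // pair-DStream implicits on older Spark
import org.apache.spark.streaming.dstream.DStream

object TopNWindow {
  // pageViews: one Seq of product IDs per page view (Kafka parsing omitted).
  def topNProducts(pageViews: DStream[Seq[String]], n: Int): DStream[(String, Long)] = {
    val counts = pageViews
      .flatMap(identity)                    // one record per product occurrence
      .map(pid => (pid, 1L))
      .reduceByKeyAndWindow(
        (a: Long, b: Long) => a + b,        // counts entering the window
        (a: Long, b: Long) => a - b,        // counts leaving the window
        Seconds(600),                       // window length (placeholder)
        Seconds(10))                        // slide interval (placeholder)

    counts.transform { rdd =>
      // Keep only the N largest counters per batch; top() brings back only N
      // records, so collecting to the driver is cheap.
      rdd.context.parallelize(rdd.top(n)(Ordering.by[(String, Long), Long](_._2)))
    }
  }
}

With the inverse function Spark requires checkpointing to be enabled 
(ssc.checkpoint(...)), and all per-product counters for the window still live in 
Spark’s state, which matches my concern above about keeping all counters in memory.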

Thank you
Alexey

This message, including any attachments, is the property of Sears Holdings 
Corporation and/or one of its subsidiaries. It is confidential and may contain 
proprietary or legally privileged information. If you are not the intended 
recipient, please delete it without reading the contents. Thank you.


Unsupported major.minor version 51.0

2015-08-11 Thread Yakubovich, Alexey
I found some discussions online, but they all come down to advice to use JDK 1.7 
(or 1.8).
Well, I do use JDK 1.7 on OS X Yosemite. Both
java -version

java version "1.7.0_80"

Java(TM) SE Runtime Environment (build 1.7.0_80-b15)

Java HotSpot(TM) 64-Bit Server VM (build 24.80-b11, mixed mode)

and
echo $JAVA_HOME
/Library/Java/JavaVirtualMachines/jdk1.7.0_80.jdk/Contents/Home
show JDK 1.7.
But for Spark 1.4.1 (and for Spark 1.2.2, downloaded 07/10/2015) I get the same 
error when building with Maven (as sudo mvn -DskipTests -X clean package > abra.txt):

Exception in thread "main" java.lang.UnsupportedClassVersionError: 
org/apache/maven/cli/MavenCli : Unsupported major.minor version 51.0
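
As far as I understand, class-file version 51.0 is what a Java 7 compiler 
produces, so Maven’s own classes apparently need a Java 7 runtime even though my 
shell shows JDK 1.7. In case it is useful for diagnosing, a tiny check like the 
following (just a sketch; pass it the path to some .class file, e.g. 
MavenCli.class) prints which class-file version a class was compiled for:

import java.io.{DataInputStream, FileInputStream}

// Reads the class-file header: u4 magic, u2 minor_version, u2 major_version.
// major 50 = Java 6, 51 = Java 7, 52 = Java 8.
object ClassVersion {
  def main(args: Array[String]): Unit = {
    val in = new DataInputStream(new FileInputStream(args(0)))
    try {
      in.readInt()                        // magic number 0xCAFEBABE
      val minor = in.readUnsignedShort()
      val major = in.readUnsignedShort()
      println(s"major.minor = $major.$minor")
    } finally in.close()
  }
}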


Please help me figure out how to build this.

Thanks

Alexey



Can't build Spark 1.3

2015-06-02 Thread Yakubovich, Alexey
I downloaded the latest Spark (1.3) from GitHub. Then I tried to build it.
First, for Scala 2.10 (and Hadoop 2.4):

build/mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -DskipTests clean package

That resulted in a hang after printing a bunch of lines like

[INFO] Dependency-reduced POM written at ……

repeated over and over.

Then I tried Scala 2.11:

mvn -Pyarn -Phadoop-2.4 -Dscala-2.11 -DskipTests clean package

That resulted in multiple compilation errors.

What I actually want is:
mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -Phive -Phive-0.12.0 
-Phive-thriftserver -DskipTests clean package

Is it only me who can’t build Spark 1.3?
And is there any site to download Spark prebuilt for Hadoop 2.5 and Hive?

Thank you for any help.
Alexey
